deepinsight.doctor.clustering package
Submodules
deepinsight.doctor.clustering.anomaly_detection module
class deepinsight.doctor.clustering.anomaly_detection.DkuIsolationForest(n_estimators, max_samples, max_features, contamination, bootstrap, max_anomalies)
    Bases: object
    - fit(X)
    - fit_predict(X)
    - get_additional_scoring_columns(X)
    - get_cluster_labels()
    - get_top_outliers(train_X, rescalers, extra_profiling_df)
    - predict(X)
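The constructor parameters n_estimators, max_samples, max_features, contamination and bootstrap match those of scikit-learn's sklearn.ensemble.IsolationForest, which suggests that this class wraps it. max_anomalies is specific to the wrapper and its behavior is not documented on this page. The sketch below assumes it caps how many rows get flagged. So that it runs on the standard library alone, it takes precomputed scores in the IsolationForest.score_samples convention (lower means more abnormal). cap_anomalies is a hypothetical helper, not part of this package.

```python
def cap_anomalies(scores, contamination, max_anomalies):
    """Flag the most abnormal rows, capped at max_anomalies (assumed semantics).

    `scores` follows the IsolationForest.score_samples convention: lower means
    more abnormal. The number of flagged rows is contamination * n rounded to
    the nearest integer, then capped at max_anomalies and at n.
    """
    n = len(scores)
    n_flagged = min(round(contamination * n), max_anomalies, n)
    # Row indices ordered from most to least abnormal; ties keep input order.
    order = sorted(range(n), key=lambda i: scores[i])
    flagged = set(order[:n_flagged])
    # True marks an anomaly; the output keeps the input row order.
    return [i in flagged for i in range(n)]
```

scikit-learn's own IsolationForest.predict returns -1 for outliers and 1 for inliers, so a boolean mask like this one would need mapping if that convention is expected downstream.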
deepinsight.doctor.clustering.clustering_fit module
class deepinsight.doctor.clustering.clustering_fit.ClusteringModelInspector(modeling_params, clf)
    Bases: object
    - get_actual_params()
deepinsight.doctor.clustering.clustering_fit.clustering_fit(modeling_params, transformed_train)
    Returns (clf, actual_params, cluster_labels).
deepinsight.doctor.clustering.clustering_fit.clustering_model_from_params(modeling_params, rows=0)
deepinsight.doctor.clustering.clustering_fit.clustering_predict(modeling_params, clusterer, transformed_data)
    Returns (labels as a NumPy array, additional columns as a DataFrame).
deepinsight.doctor.clustering.clustering_fit.scikit_model(modeling_params)
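As a minimal, hypothetical sketch of the return contracts listed above, toy_clustering_fit returns a (clf, actual_params, cluster_labels) triple and toy_clustering_predict returns labels plus extra per-row columns, using a tiny k-means in plain Python. Several details are assumptions rather than this package's behavior: the {"k": ...} parameter layout, the capping of k at the row count (one possible reason why actual parameters can differ from the requested ones), the dict standing in for a DataFrame, and the distance_to_centroid column.

```python
import math


def _nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def toy_clustering_fit(modeling_params, points, n_iter=20):
    """Hypothetical stand-in for clustering_fit: returns (clf, actual_params, cluster_labels).

    Assumes modeling_params == {"k": requested_cluster_count}. The effective k is
    capped at the number of rows, so actual_params can differ from the request.
    """
    k = min(modeling_params["k"], len(points))
    actual_params = {"k": k}
    # Deterministic initialisation for the sketch: the first k rows become centroids.
    centroids = [list(p) for p in points[:k]]
    for _ in range(n_iter):
        labels = [_nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    labels = [_nearest(p, centroids) for p in points]
    # The "fitted model" here is simply the list of centroids.
    return centroids, actual_params, labels


def toy_clustering_predict(centroids, points):
    """Hypothetical stand-in for clustering_predict: (labels, additional columns).

    A dict of columns stands in for the DataFrame; the single extra column is
    each row's distance to its assigned centroid.
    """
    labels = [_nearest(p, centroids) for p in points]
    distances = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    return labels, {"distance_to_centroid": distances}
```

For example, fitting {"k": 10} on four rows yields actual_params == {"k": 4}, which is the kind of discrepancy a get_actual_params() call would need to surface.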
deepinsight.doctor.clustering.clustering_scorer module
clustering_scorer: takes a trained clusterer and a DataFrame, and outputs the corresponding scoring data.
class deepinsight.doctor.clustering.clustering_scorer.ClusteringModelScorer(cluster_model, transformed_source, source_index, cluster_labels, preprocessing_params, modeling_params, pipeline, run_folder)
    Bases: object
    - add_metric(measure, value)
    - build_facts()
    - build_heatmap()
    - build_numerical_cluster_stats()
    - build_scatter()
    - cluster_description()
    - cluster_profiling()
    - cluster_summary()
    - drop_outliers()
    - gap_statistic(model, nboot)
    - iter_facts()
    - pk_path(path)
    - score()
    - silhouette_score()
    - variables_importance()
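silhouette_score() presumably reports the mean silhouette coefficient of the fitted clustering; this page does not say whether it delegates to scikit-learn's sklearn.metrics.silhouette_score. The standard-library sketch below computes the textbook definition with Euclidean distance. silhouette is a hypothetical name, not this package's API.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient of a labelling, using Euclidean distance.

    For each point i: a = mean distance to the other members of its cluster,
    b = smallest mean distance to the members of any other cluster, and
    s(i) = (b - a) / max(a, b). Points in singleton clusters score 0, as in
    scikit-learn.
    """
    labels = list(labels)
    clusters = set(labels)
    if not 2 <= len(clusters) <= len(points) - 1:
        raise ValueError("silhouette needs 2 <= n_clusters <= n_samples - 1")
    total = 0.0
    for i, (p, own) in enumerate(zip(points, labels)):
        same = [math.dist(p, q) for j, (q, lab) in enumerate(zip(points, labels))
                if lab == own and j != i]
        if not same:
            continue  # singleton cluster: s(i) = 0 adds nothing to the total
        a = sum(same) / len(same)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in clusters if other != own
        )
        denom = max(a, b)
        # Guard against duplicate points, where both a and b are 0.
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)
```

This pairwise computation is O(n²) in the number of rows; scikit-learn's silhouette_score accepts a sample_size argument to score a random subset instead.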
class deepinsight.doctor.clustering.clustering_scorer.IntrinsicClusteringModelScorer(modeling_params, clf, train_X, pipeline, out_folder, profiling_df=None)
    Bases: object
    - pk_path(path)
    - score()
deepinsight.doctor.clustering.clustering_scorer.make_percentile(vals)
deepinsight.doctor.clustering.clustering_scorer.no_nan(vals)
deepinsight.doctor.clustering.clustering_scorer.value_counts(series, n_most_common=100)
    Returns an ordered dict mapping value -> count. Handles nulls.
    n.b. Newer versions of pandas can also count nulls in value_counts (dropna=False).
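Below is a standard-library sketch of the behavior described for value_counts. It assumes that every null (None or a float NaN) is counted under a single None key and that ties keep first-seen order. Nulls need normalizing because NaN compares unequal to itself, so distinct NaN objects would otherwise be counted under separate keys. value_counts_sketch is a hypothetical name; with pandas, series.value_counts(dropna=False).head(n_most_common) gives a similar result as a Series.

```python
import math
from collections import Counter, OrderedDict


def _is_null(value):
    """True for None and for float NaN (numpy.float64 is a float subclass)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def value_counts_sketch(values, n_most_common=100):
    """Ordered dict value -> count, most common first, nulls counted under None."""
    # Map every null to one key; distinct NaN objects would otherwise stay separate.
    counts = Counter(None if _is_null(v) else v for v in values)
    # most_common() sorts by count; equal counts keep first-seen order.
    return OrderedDict(counts.most_common(n_most_common))
```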
deepinsight.doctor.clustering.common module
deepinsight.doctor.clustering.common.prepare_multiframe(train_X, modeling_params)
deepinsight.doctor.clustering.reg_cluster_recipe module
Execute a clustering training recipe in PyRegular mode. Must be called in a Flow environment.
deepinsight.doctor.clustering.reg_cluster_recipe.main(exec_folder, output_dataset, keptInputColumns)
deepinsight.doctor.clustering.reg_scoring_recipe module
Execute a clustering scoring recipe in PyRegular mode. Must be called in a Flow environment.
deepinsight.doctor.clustering.reg_scoring_recipe.main(model_folder, input_dataset_smartname, output_dataset_smartname, recipe_desc, script, preparation_output_schema)
deepinsight.doctor.clustering.reg_train_recipe module
Execute a clustering training recipe in PyRegular mode. Must be called in a Flow environment.
deepinsight.doctor.clustering.reg_train_recipe.main(exec_folder)