deepinsight.doctor.clustering package
Submodules
deepinsight.doctor.clustering.anomaly_detection module
class deepinsight.doctor.clustering.anomaly_detection.DkuIsolationForest(n_estimators, max_samples, max_features, contamination, bootstrap, max_anomalies)
Bases: object
- fit(X)
- fit_predict(X)
- get_additional_scoring_columns(X)
- get_cluster_labels()
- get_top_outliers(train_X, rescalers, extra_profiling_df)
- predict(X)
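The first five DkuIsolationForest constructor parameters share their names with scikit-learn's IsolationForest (n_estimators, max_samples, max_features, contamination, bootstrap). max_anomalies has no scikit-learn counterpart, and this page does not document it. The sketch below assumes, purely for illustration, that it acts as a hard cap on how many rows are flagged once the contamination rate has been applied. cap_anomalies is a hypothetical helper, not part of this package, and the score convention (higher means more anomalous) is also an assumption.

```python
from typing import List, Sequence


def cap_anomalies(scores: Sequence[float], contamination: float, max_anomalies: int) -> List[int]:
    """Flag the most anomalous rows (hypothetical helper, not the package's code).

    Assumes a higher score means "more anomalous"; returns 1 for flagged rows, 0 otherwise.
    """
    if not 0.0 <= contamination <= 1.0:
        raise ValueError("contamination must be in [0, 1]")
    if max_anomalies < 0:
        raise ValueError("max_anomalies must be non-negative")
    # Number of rows the contamination rate alone would flag (floored).
    n_flagged = int(contamination * len(scores))
    # Assumed semantics: max_anomalies is a hard upper bound on that number.
    n_flagged = min(n_flagged, max_anomalies)
    # Rank row indices from most to least anomalous; Python's sort is stable,
    # so rows with tied scores keep their original order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_flagged])
    return [1 if i in flagged else 0 for i in range(len(scores))]


scores = [0.1, 0.9, 0.2, 0.8, 0.95, 0.3]
print(cap_anomalies(scores, contamination=0.5, max_anomalies=2))  # [0, 1, 0, 0, 1, 0]
```

Here the contamination rate alone would flag three of the six rows, but the assumed cap keeps only the two highest-scoring ones.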
deepinsight.doctor.clustering.clustering_fit module
class deepinsight.doctor.clustering.clustering_fit.ClusteringModelInspector(modeling_params, clf)
Bases: object
- get_actual_params()
deepinsight.doctor.clustering.clustering_fit.clustering_fit(modeling_params, transformed_train)
Returns (clf, actual_params, cluster_labels).
deepinsight.doctor.clustering.clustering_fit.clustering_model_from_params(modeling_params, rows=0)
deepinsight.doctor.clustering.clustering_fit.clustering_predict(modeling_params, clusterer, transformed_data)
Returns (labels as a NumPy array, additional columns as a DataFrame).
deepinsight.doctor.clustering.clustering_fit.scikit_model(modeling_params)
deepinsight.doctor.clustering.clustering_scorer module
clustering_scorer: takes a trained clusterer and a DataFrame, and outputs the appropriate scoring data.
class deepinsight.doctor.clustering.clustering_scorer.ClusteringModelScorer(cluster_model, transformed_source, source_index, cluster_labels, preprocessing_params, modeling_params, pipeline, run_folder)
Bases: object
- add_metric(measure, value)
- build_facts()
- build_heatmap()
- build_numerical_cluster_stats()
- build_scatter()
- cluster_description()
- cluster_profiling()
- cluster_summary()
- drop_outliers()
- gap_statistic(model, nboot)
- iter_facts()
- pk_path(path)
- score()
- silhouette_score()
- variables_importance()
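ClusteringModelScorer lists a silhouette_score() method without documenting it. The standard silhouette coefficient averages s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all rows, where a(i) is the mean distance from row i to the other members of its own cluster and b(i) is the smallest mean distance from row i to the members of any other cluster. The plain-Python sketch below implements that textbook definition with Euclidean distance. Whether the method uses the same distance, subsampling, or a library routine such as scikit-learn's sklearn.metrics.silhouette_score is an assumption this page does not settle.

```python
import math
from typing import Dict, List, Sequence


def silhouette_score(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient with Euclidean distance (textbook definition)."""
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    # Group row indices by cluster label.
    clusters: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if not 2 <= len(clusters) <= len(points) - 1:
        raise ValueError("need between 2 and n_samples - 1 distinct labels")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # by convention s(i) = 0 for a singleton cluster
        # a(i): mean distance to the other members of i's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(pts, [0, 0, 1, 1])  # about 0.90: tight, well-separated clusters
bad = silhouette_score(pts, [0, 1, 0, 1])   # about -0.45: rows sit closer to the other cluster
```

Scores near 1 indicate compact, well-separated clusters; negative scores indicate rows that would fit better in a neighbouring cluster.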
class deepinsight.doctor.clustering.clustering_scorer.IntrinsicClusteringModelScorer(modeling_params, clf, train_X, pipeline, out_folder, profiling_df=None)
Bases: object
- pk_path(path)
- score()
deepinsight.doctor.clustering.clustering_scorer.make_percentile(vals)
deepinsight.doctor.clustering.clustering_scorer.no_nan(vals)
deepinsight.doctor.clustering.clustering_scorer.value_counts(series, n_most_common=100)
Returns an ordered dict mapping value -> count. Handles nulls. N.B. newer versions of pandas' value_counts can handle nulls as well.
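The value_counts docstring describes an ordered value -> count mapping that counts nulls and keeps at most n_most_common entries. A minimal standard-library sketch of that contract follows. It assumes that None and float NaN are the null markers and that they are grouped under the single key None, which may differ from the real helper. NaN needs explicit handling because NaN != NaN, so distinct NaN objects would otherwise be counted as separate keys.

```python
import math
from collections import Counter, OrderedDict
from typing import Any, Iterable


def _is_null(value: Any) -> bool:
    # Assumed null markers: None and float NaN.
    return value is None or (isinstance(value, float) and math.isnan(value))


def value_counts(values: Iterable[Any], n_most_common: int = 100) -> "OrderedDict[Any, int]":
    """Map value -> count, most frequent first, nulls grouped under None (sketch)."""
    # Normalise every null marker to None so that all NaNs share one key.
    counts = Counter(None if _is_null(v) else v for v in values)
    # most_common() sorts by descending count; ties keep first-seen order.
    return OrderedDict(counts.most_common(n_most_common))


result = value_counts(["a", "b", "a", None, float("nan"), "c", "a"], n_most_common=2)
print(list(result.items()))  # [('a', 3), (None, 2)]
```

With pandas itself, Series.value_counts(dropna=False) includes NaN in the counts directly.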
deepinsight.doctor.clustering.common module
deepinsight.doctor.clustering.common.prepare_multiframe(train_X, modeling_params)
deepinsight.doctor.clustering.reg_cluster_recipe module
Execute a clustering training recipe in PyRegular mode. Must be called in a Flow environment.
deepinsight.doctor.clustering.reg_cluster_recipe.main(exec_folder, output_dataset, keptInputColumns)
deepinsight.doctor.clustering.reg_scoring_recipe module
Execute a clustering scoring recipe in PyRegular mode. Must be called in a Flow environment.
deepinsight.doctor.clustering.reg_scoring_recipe.main(model_folder, input_dataset_smartname, output_dataset_smartname, recipe_desc, script, preparation_output_schema)
deepinsight.doctor.clustering.reg_train_recipe module
Execute a clustering training recipe in PyRegular mode. Must be called in a Flow environment.
deepinsight.doctor.clustering.reg_train_recipe.main(exec_folder)