deepinsight.doctor.clustering package
Submodules
deepinsight.doctor.clustering.anomaly_detection module
class deepinsight.doctor.clustering.anomaly_detection.DkuIsolationForest(n_estimators, max_samples, max_features, contamination, bootstrap, max_anomalies)
    Bases: object
    fit(X)
    fit_predict(X)
    get_additional_scoring_columns(X)
    get_cluster_labels()
    get_top_outliers(train_X, rescalers, extra_profiling_df)
    predict(X)
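The constructor parameters n_estimators, max_samples, max_features, contamination and bootstrap match those of scikit-learn's IsolationForest, which suggests (but this page does not confirm) that DkuIsolationForest wraps it. How max_anomalies is applied is not documented here. The sketch below assumes it caps the number of points labelled as anomalies after contamination has picked a share of the data. cap_anomalies is a hypothetical helper, not part of deepinsight, and int(contamination * n) only approximates scikit-learn's own threshold, which is a percentile of the anomaly scores.

```python
def cap_anomalies(scores, contamination, max_anomalies):
    """Label each point -1 (anomaly) or 1 (inlier) from its anomaly score.

    Hypothetical helper, assuming max_anomalies caps the flagged count.
    Lower scores count as more anomalous, the same convention as
    scikit-learn's IsolationForest.score_samples.
    """
    n = len(scores)
    # Share of points chosen by contamination, then capped by max_anomalies.
    n_flagged = min(int(contamination * n), max_anomalies)
    # Indices ordered from most to least anomalous.
    order = sorted(range(n), key=lambda i: scores[i])
    flagged = set(order[:n_flagged])
    return [-1 if i in flagged else 1 for i in range(n)]


scores = [0.1, -0.5, 0.2, -0.9, 0.0, 0.3]
# contamination alone would flag 3 points; the cap keeps only the 2 lowest scores.
print(cap_anomalies(scores, contamination=0.5, max_anomalies=2))  # prints [1, -1, 1, -1, 1, 1]
```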
 
deepinsight.doctor.clustering.clustering_fit module
class deepinsight.doctor.clustering.clustering_fit.ClusteringModelInspector(modeling_params, clf)
    Bases: object
    get_actual_params()
 
deepinsight.doctor.clustering.clustering_fit.clustering_fit(modeling_params, transformed_train)
    Returns (clf, actual_params, cluster_labels).
deepinsight.doctor.clustering.clustering_fit.clustering_model_from_params(modeling_params, rows=0)
deepinsight.doctor.clustering.clustering_fit.clustering_predict(modeling_params, clusterer, transformed_data)
    Returns (labels np array, additional columns DF).
deepinsight.doctor.clustering.clustering_fit.scikit_model(modeling_params)
deepinsight.doctor.clustering.clustering_scorer module
clustering_scorer: Takes a trained clusterer and a dataframe, and outputs the appropriate scoring data.
class deepinsight.doctor.clustering.clustering_scorer.ClusteringModelScorer(cluster_model, transformed_source, source_index, cluster_labels, preprocessing_params, modeling_params, pipeline, run_folder)
    Bases: object
    add_metric(measure, value)
    build_facts()
    build_heatmap()
    build_numerical_cluster_stats()
    build_scatter()
    cluster_description()
    cluster_profiling()
    cluster_summary()
    drop_outliers()
    gap_statistic(model, nboot)
    iter_facts()
    pk_path(path)
    score()
    silhouette_score()
    variables_importance()
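ClusteringModelScorer lists silhouette_score() without describing it. For reference, here is a minimal sketch of the standard silhouette coefficient. It is not the scorer's implementation: the function name and its (points, labels) signature are hypothetical, it uses Euclidean distance, and it relies only on the standard library. It follows the usual convention of scoring points alone in their cluster as 0.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient of a labelled point set (Euclidean).

    Hypothetical helper. For point i, a is its mean distance to the rest of
    its own cluster and b its smallest mean distance to another cluster;
    s(i) = (b - a) / max(a, b). Points alone in their cluster score 0.
    """
    clusters = set(labels)
    if not 2 <= len(clusters) <= len(points) - 1:
        raise ValueError("need between 2 and n_samples - 1 distinct labels")
    total = 0.0
    for i, (p, li) in enumerate(zip(points, labels)):
        # Sum and count of distances from p to every other point, per cluster.
        sums = dict.fromkeys(clusters, 0.0)
        counts = dict.fromkeys(clusters, 0)
        for j, (q, lj) in enumerate(zip(points, labels)):
            if j != i:
                sums[lj] += math.dist(p, q)
                counts[lj] += 1
        if counts[li] == 0:
            continue  # singleton cluster: s(i) = 0
        a = sums[li] / counts[li]
        b = min(sums[c] / counts[c] for c in clusters if c != li)
        denom = max(a, b)
        if denom > 0:  # guard against identical points in different clusters
            total += (b - a) / denom
    return total / len(points)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(round(silhouette_score(points, [0, 0, 1, 1]), 3))  # prints 0.9
```

Scores near 1 indicate compact, well-separated clusters. The same points labelled [0, 1, 0, 1] score below 0 because each point sits closer to the other cluster than to its own.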
 
class deepinsight.doctor.clustering.clustering_scorer.IntrinsicClusteringModelScorer(modeling_params, clf, train_X, pipeline, out_folder, profiling_df=None)
    Bases: object
    pk_path(path)
    score()
 
deepinsight.doctor.clustering.clustering_scorer.make_percentile(vals)
deepinsight.doctor.clustering.clustering_scorer.no_nan(vals)
deepinsight.doctor.clustering.clustering_scorer.value_counts(series, n_most_common=100)
    Returns an ordered dict mapping value -> count. Handles nulls. N.B. newer versions of pandas can also count nulls in value_counts (with dropna=False).
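value_counts is documented as returning an ordered dict of value -> count that handles nulls. Below is a minimal standard-library sketch of that contract. It assumes nulls are None or float NaN and pools them under a single None key. The real function takes a pandas Series, and this page does not say how it labels the null bucket, so treat the key choice as an assumption.

```python
import math
from collections import Counter, OrderedDict


def _is_null(value):
    # None and float NaN (numpy.float64 subclasses float) count as null.
    return value is None or (isinstance(value, float) and math.isnan(value))


def value_counts(values, n_most_common=100):
    """Ordered mapping value -> count, most common first, nulls pooled under None.

    Hypothetical stdlib sketch: it accepts any iterable, not only a pandas Series.
    """
    # NaN != NaN, so distinct NaN objects would otherwise land under separate keys.
    counts = Counter(None if _is_null(v) else v for v in values)
    # most_common orders by count, keeping first-seen order among ties.
    return OrderedDict(counts.most_common(n_most_common))


data = ["a", "b", "a", None, float("nan"), "a"]
print(value_counts(data, n_most_common=2))  # 'a' -> 3, then None -> 2
```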
deepinsight.doctor.clustering.common module
deepinsight.doctor.clustering.common.prepare_multiframe(train_X, modeling_params)
deepinsight.doctor.clustering.reg_cluster_recipe module
Execute a clustering training recipe in PyRegular mode. Must be called in a Flow environment.
deepinsight.doctor.clustering.reg_cluster_recipe.main(exec_folder, output_dataset, keptInputColumns)
deepinsight.doctor.clustering.reg_scoring_recipe module
Execute a clustering scoring recipe in PyRegular mode. Must be called in a Flow environment.
deepinsight.doctor.clustering.reg_scoring_recipe.main(model_folder, input_dataset_smartname, output_dataset_smartname, recipe_desc, script, preparation_output_schema)
deepinsight.doctor.clustering.reg_train_recipe module
Execute a clustering training recipe in PyRegular mode. Must be called in a Flow environment.
deepinsight.doctor.clustering.reg_train_recipe.main(exec_folder)