deepinsight.doctor.clustering package

Submodules

deepinsight.doctor.clustering.anomaly_detection module

class deepinsight.doctor.clustering.anomaly_detection.DkuIsolationForest(n_estimators, max_samples, max_features, contamination, bootstrap, max_anomalies)

Bases: object

fit(X)
fit_predict(X)
get_additional_scoring_columns(X)
get_cluster_labels()
get_top_outliers(train_X, rescalers, extra_profiling_df)
predict(X)
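The constructor arguments n_estimators, max_samples, max_features, contamination and bootstrap share their names with parameters of scikit-learn's sklearn.ensemble.IsolationForest. max_anomalies has no scikit-learn counterpart. The sketch below is a minimal illustration of how such a cap could be combined with contamination. It assumes that a higher score means "more anomalous" and that the flagged set is the top min(ceil(contamination * n), max_anomalies) points. The name cap_anomalies is hypothetical and is not part of this module.

```python
import math
from typing import List, Sequence


def cap_anomalies(scores: Sequence[float], contamination: float,
                  max_anomalies: int) -> List[int]:
    """Hypothetical helper: flag the highest-scoring points as anomalies.

    Assumes a higher score means "more anomalous". The number of flagged
    points is ceil(contamination * n), capped at max_anomalies.
    Returns 1 for an anomaly and 0 for a regular point.
    """
    n = len(scores)
    n_flagged = min(math.ceil(contamination * n), max_anomalies)
    # Indices from most to least anomalous; equal scores keep input order.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    labels = [0] * n
    for i in order[:n_flagged]:
        labels[i] = 1
    return labels
```

For example, cap_anomalies([0.1, 0.9, 0.4, 0.8, 0.2], contamination=0.3, max_anomalies=1) would flag two points by contamination alone, but the cap reduces this to index 1 only. scikit-learn's IsolationForest.score_samples uses the opposite convention (lower means more abnormal), so its scores would need to be negated before they are passed to this helper.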

deepinsight.doctor.clustering.clustering_fit module

class deepinsight.doctor.clustering.clustering_fit.ClusteringModelInspector(modeling_params, clf)

Bases: object

get_actual_params()
deepinsight.doctor.clustering.clustering_fit.clustering_fit(modeling_params, transformed_train)

Returns (clf, actual_params, cluster_labels)

deepinsight.doctor.clustering.clustering_fit.clustering_model_from_params(modeling_params, rows=0)
deepinsight.doctor.clustering.clustering_fit.clustering_predict(modeling_params, clusterer, transformed_data)

Returns (labels as a NumPy array, additional columns as a DataFrame)

deepinsight.doctor.clustering.clustering_fit.scikit_model(modeling_params)

deepinsight.doctor.clustering.clustering_scorer module

clustering_scorer: takes a trained clusterer and a dataframe, and outputs the appropriate scoring data.

class deepinsight.doctor.clustering.clustering_scorer.ClusteringModelScorer(cluster_model, transformed_source, source_index, cluster_labels, preprocessing_params, modeling_params, pipeline, run_folder)

Bases: object

add_metric(measure, value)
build_facts()
build_heatmap()
build_numerical_cluster_stats()
build_scatter()
cluster_description()
cluster_profiling()
cluster_summary()
drop_outliers()
gap_statistic(model, nboot)
iter_facts()
pk_path(path)
score()
silhouette_score()
variables_importance()
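silhouette_score() takes no arguments, so it presumably scores the clustering held by the scorer. Its exact computation is not documented here. For reference, the sketch below is a minimal NumPy version of the standard mean silhouette coefficient, s(i) = (b(i) - a(i)) / max(a(i), b(i)). Here a(i) is the mean distance from point i to the other members of its cluster, and b(i) is the smallest mean distance from i to the members of any other cluster. The name mean_silhouette is hypothetical. scikit-learn's sklearn.metrics.silhouette_score computes the same quantity with its default Euclidean metric.

```python
import numpy as np


def mean_silhouette(X, labels):
    """Hypothetical helper: mean silhouette coefficient, Euclidean distance.

    Points in singleton clusters get s(i) = 0, following the usual convention.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    # Pairwise Euclidean distance matrix, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum() - 1  # other members of i's cluster
        if n_same == 0:
            continue  # singleton cluster: s(i) stays 0
        # dist[i, i] is 0, so summing over the whole cluster is safe.
        a = dist[i, same].sum() / n_same
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return float(scores.mean())
```

For two tight, well-separated groups such as [[0, 0], [0, 1], [10, 0], [10, 1]] with labels [0, 0, 1, 1], every point has a = 1 and b of roughly 10.02, so the mean is close to 0.9.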
class deepinsight.doctor.clustering.clustering_scorer.IntrinsicClusteringModelScorer(modeling_params, clf, train_X, pipeline, out_folder, profiling_df=None)

Bases: object

pk_path(path)
score()
deepinsight.doctor.clustering.clustering_scorer.make_percentile(vals)
deepinsight.doctor.clustering.clustering_scorer.no_nan(vals)
deepinsight.doctor.clustering.clustering_scorer.value_counts(series, n_most_common=100)

Returns an ordered dict, value -> count

Handles null values. Note that in newer versions of pandas, Series.value_counts can also count nulls (with dropna=False).
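value_counts is documented as returning an ordered dict of value -> count that also handles nulls. The sketch below is a minimal pure-Python version of that behaviour. It assumes that nulls (None and float NaN) are counted together under a single None key and that only the n_most_common most frequent values are kept. The name value_counts_sketch is hypothetical, and the real function's null key and tie-breaking may differ.

```python
import math
from collections import Counter, OrderedDict


def _is_null(v):
    # Treat None and float NaN as null (numpy.float64 is a float subclass).
    return v is None or (isinstance(v, float) and math.isnan(v))


def value_counts_sketch(series, n_most_common=100):
    """Hypothetical: return an OrderedDict value -> count, most frequent first.

    `series` can be any iterable, e.g. a pandas Series. All nulls are counted
    together under the key None.
    """
    counts = Counter(None if _is_null(v) else v for v in series)
    # most_common orders by count, descending; equal counts keep first-seen order.
    return OrderedDict(counts.most_common(n_most_common))
```

Normalizing every NaN to None matters because NaN != NaN, so each NaN could otherwise be counted as a separate key.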

deepinsight.doctor.clustering.common module

deepinsight.doctor.clustering.common.prepare_multiframe(train_X, modeling_params)

deepinsight.doctor.clustering.reg_cluster_recipe module

Execute a clustering training recipe in PyRegular mode. Must be called in a Flow environment.

deepinsight.doctor.clustering.reg_cluster_recipe.main(exec_folder, output_dataset, keptInputColumns)

deepinsight.doctor.clustering.reg_scoring_recipe module

Execute a clustering scoring recipe in PyRegular mode. Must be called in a Flow environment.

deepinsight.doctor.clustering.reg_scoring_recipe.main(model_folder, input_dataset_smartname, output_dataset_smartname, recipe_desc, script, preparation_output_schema)

deepinsight.doctor.clustering.reg_train_recipe module

Execute a clustering training recipe in PyRegular mode. Must be called in a Flow environment.

deepinsight.doctor.clustering.reg_train_recipe.main(exec_folder)

deepinsight.doctor.clustering.two_step_clustering module

class deepinsight.doctor.clustering.two_step_clustering.Tree(id, left_son=None, right_son=None, parent=None)

Bases: object

clean_statistics()
compute_statistics(data, clusters)
is_leaf()
merge(other, id)
to_json(shifts, inv_scales)
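The Tree constructor takes an id and optional left_son, right_son and parent links, and merge(other, id) takes a second tree plus an id. That shape fits a binary merge tree in which merging two nodes creates a new parent whose sons are the two inputs, as in agglomerative clustering. The sketch below assumes that reading. The class MergeNode is hypothetical, keeps the documented parameter name id for readability, and leaves out the statistics and JSON methods.

```python
class MergeNode:
    """Hypothetical binary merge-tree node (not the module's Tree class)."""

    def __init__(self, id, left_son=None, right_son=None, parent=None):
        self.id = id
        self.left_son = left_son
        self.right_son = right_son
        self.parent = parent

    def is_leaf(self):
        # A leaf has no sons; it stands for one initial cluster.
        return self.left_son is None and self.right_son is None

    def merge(self, other, id):
        # Create a new parent node whose sons are self and other.
        node = MergeNode(id, left_son=self, right_son=other)
        self.parent = node
        other.parent = node
        return node

    def leaves(self):
        # Ids of the leaves under this node, left to right.
        if self.is_leaf():
            return [self.id]
        return self.left_son.leaves() + self.right_son.leaves()
```

Merging leaves 0 and 1 into node 3, then node 3 with leaf 2 into node 4, gives a root whose leaves() are [0, 1, 2].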
class deepinsight.doctor.clustering.two_step_clustering.TwoStepClustering(k, kmeans_k, max_iterations, seed)

Bases: object

fit(X)
fit_predict(X)
get_cluster_labels()
post_process(user_meta)
predict(X)
to_json(data, rescalers)
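The signature TwoStepClustering(k, kmeans_k, max_iterations, seed) suggests a common two-step scheme. In the first step, a k-means pass compresses the data into kmeans_k centroids. In the second step, those centroids are merged hierarchically until k clusters remain, and each point inherits the final cluster of its centroid. The sketch below is a minimal NumPy version under that assumption. The function name two_step_sketch and the size-weighted Ward merge cost are illustrative choices, not taken from this module.

```python
import numpy as np


def _assign(X, centroids):
    # Index of the nearest centroid (squared Euclidean distance) for each row.
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dist.argmin(axis=1)


def _kmeans(X, n_clusters, max_iterations, rng):
    # Step 1: Lloyd's k-means, initialised from randomly chosen rows.
    centroids = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(max_iterations):
        labels = _assign(X, centroids)
        # An empty cluster keeps its previous centroid.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(n_clusters)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, _assign(X, centroids)


def two_step_sketch(X, k, kmeans_k, max_iterations=100, seed=0):
    """Hypothetical two-step clustering: k-means, then Ward-style merging."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids, fine = _kmeans(X, kmeans_k, max_iterations, rng)
    # Step 2: each non-empty k-means cluster starts as its own group.
    groups = [{"centroid": centroids[j], "size": int((fine == j).sum()),
               "members": [j]} for j in range(kmeans_k) if np.any(fine == j)]
    while len(groups) > k:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                ga, gb = groups[a], groups[b]
                # Ward cost: increase in within-cluster sum of squares.
                cost = (ga["size"] * gb["size"] / (ga["size"] + gb["size"])
                        * ((ga["centroid"] - gb["centroid"]) ** 2).sum())
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        ga, gb = groups[a], groups[b]
        size = ga["size"] + gb["size"]
        merged = {"centroid": (ga["size"] * ga["centroid"]
                               + gb["size"] * gb["centroid"]) / size,
                  "size": size, "members": ga["members"] + gb["members"]}
        groups = [g for i, g in enumerate(groups) if i not in (a, b)] + [merged]
    # Map each k-means cluster to its final group, then each point.
    final = np.full(kmeans_k, -1)
    for g_index, g in enumerate(groups):
        final[g["members"]] = g_index
    return final[fine]
```

Running k-means first keeps the quadratic merge loop small, because it operates on kmeans_k centroids rather than on every row. The Ward cost weights each centroid by the number of points it summarizes, so large groups are not merged as readily as small ones.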

Module contents