deepinsight.doctor package¶

Subpackages¶

Submodules¶

deepinsight.doctor.clustering_entrypoints module¶

deepinsight.doctor.clustering_entrypoints.clustering_train_score_save(transformed_src, src_index, preprocessing_params, modeling_params, run_folder, listener, update_fn, pipeline)¶: Trains one model and saves results to run_folder

deepinsight.doctor.commands module¶

Commands available from the doctor main kernel server.

To add a command, simply add a method. Methods whose names start with an underscore (_) are not exposed.

Arguments with default values are supported. *args and **kwargs are not supported.

If the name of one of your JSON parameters is already a global name in Python, you can suffix the parameter with an underscore (e.g. input_).
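The conventions above can be illustrated with a minimal sketch. This is not the package's actual dispatcher: the Commands class, exposed_commands, dispatch and the body of trim_underscores below are all hypothetical stand-ins, written only to show how underscore-prefixed methods could be hidden and how a trailing underscore could be mapped back to a JSON key.

```python
import inspect
import json


def trim_underscores(name):
    # Hypothetical helper: "input_" and "input" map to the same JSON key.
    return name.strip("_")


class Commands:
    """Illustrative stand-in for a commands module."""

    @staticmethod
    def ping():
        return "pong"

    @staticmethod
    def echo(input_, repeat=1):
        # "input" is a Python built-in, so the argument is declared as "input_".
        return input_ * repeat

    @staticmethod
    def _internal():
        return "not exposed"


def exposed_commands(namespace):
    # Only public callables are exposed: names starting with "_" are skipped.
    return {
        name: func
        for name, func in inspect.getmembers(namespace, callable)
        if not name.startswith("_")
    }


def dispatch(namespace, command, json_payload):
    func = exposed_commands(namespace)[command]
    payload = json.loads(json_payload)
    kwargs = {}
    for arg_name, param in inspect.signature(func).parameters.items():
        key = trim_underscores(arg_name)
        if key in payload:
            kwargs[arg_name] = payload[key]
        elif param.default is inspect.Parameter.empty:
            # No JSON value and no default value: the call cannot be built.
            raise TypeError("missing parameter: " + key)
    return func(**kwargs)


echoed = dispatch(Commands, "echo", '{"input": "ab", "repeat": 2}')
```

In this sketch, arguments with default values (repeat) may be omitted from the JSON payload, which mirrors the rule stated above.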

deepinsight.doctor.commands.build_pipeline_and_handler(collector_data, core_params, run_folder, preprocessing_params, selection_state_folder=None, allow_empty_mf=False)¶

deepinsight.doctor.commands.clustering_rescore(split_desc, preprocessing_folder, model_folder)¶

deepinsight.doctor.commands.compute_pdp(job_id, split_desc, core_params, preprocessing_folder, model_folder, computation_parameters=None)¶

deepinsight.doctor.commands.compute_subpopulation(job_id, split_desc, core_params, preprocessing_folder, model_folder, computation_parameters=None)¶

deepinsight.doctor.commands.create_clustering_notebook(model_name, model_date, dataset_smartname, script, preparation_output_schema, split_stuff, preprocessing_params, pre_train, post_train)¶

deepinsight.doctor.commands.create_ensemble(split_desc, core_params, model_folder, preprocessing_folder, model_folders, preprocessing_folders)¶

deepinsight.doctor.commands.create_prediction_notebook(model_name, model_date, dataset_smartname, script, preparation_output_schema, split_stuff, core_params, preprocessing_params, pre_train, post_train)¶

deepinsight.doctor.commands.ping()¶

deepinsight.doctor.commands.train_clustering_models_nosave(split_desc, preprocessing_set)¶

Regular (mode 1) train:

- Non-streamed single split: fit the preprocessing on the train set, then preprocess the test set
- Fit N models sequentially. For each model:
  - Fit
  - Save the clf
  - Compute and save the clf performance
  - Score, then save the scored test set and the scored performance

deepinsight.doctor.commands.train_prediction_keras(core_params, preprocessing_set, split_desc)¶

deepinsight.doctor.commands.train_prediction_kfold(core_params, preprocessing_set, split_desc)¶

deepinsight.doctor.commands.train_prediction_models_nosave(core_params, preprocessing_set, split_desc)¶

Regular (mode 1) train:

- Non-streamed single split: fit the preprocessing on the train set, then preprocess the test set
- Fit N models sequentially. For each model:
  - Fit
  - Save the clf
  - Compute and save the clf performance
  - Score, then save the scored test set and the scored performance
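The training steps listed above can be sketched as follows. This is a toy illustration under strong assumptions, not the package's code: MeanCenterer, ConstantModel, train_models, the file names and the use of JSON for saving are all hypothetical, chosen so that the example runs with the standard library only.

```python
import json
import statistics
import tempfile
from pathlib import Path


class MeanCenterer:
    """Toy preprocessing step: centers values on the training mean."""

    def fit(self, values):
        self.mean_ = statistics.fmean(values)
        return self

    def transform(self, values):
        return [v - self.mean_ for v in values]


class ConstantModel:
    """Toy model: predicts the training target mean plus a fixed offset."""

    def __init__(self, offset):
        self.offset = offset

    def fit(self, X, y):
        self.prediction_ = statistics.fmean(y) + self.offset
        return self

    def predict(self, X):
        return [self.prediction_ for _ in X]


def mean_absolute_error(y_true, y_pred):
    return statistics.fmean([abs(t - p) for t, p in zip(y_true, y_pred)])


def train_models(train, test, models, run_folder):
    (X_train, y_train), (X_test, y_test) = train, test
    # Fit the preprocessing on the train split only, then apply it to both splits.
    prep = MeanCenterer().fit(X_train)
    X_train_t, X_test_t = prep.transform(X_train), prep.transform(X_test)
    results = []
    for i, model in enumerate(models):  # N models, fitted one after the other
        folder = Path(run_folder) / "model_{}".format(i)
        folder.mkdir(parents=True, exist_ok=True)
        model.fit(X_train_t, y_train)  # fit
        (folder / "clf.json").write_text(json.dumps(vars(model)))  # save the model
        train_perf = {"mae": mean_absolute_error(y_train, model.predict(X_train_t))}
        (folder / "train_perf.json").write_text(json.dumps(train_perf))
        scored = model.predict(X_test_t)  # score the test set
        test_perf = {"mae": mean_absolute_error(y_test, scored)}
        (folder / "scored_test.json").write_text(json.dumps(scored))
        (folder / "test_perf.json").write_text(json.dumps(test_perf))
        results.append({"train": train_perf, "test": test_perf})
    return results


with tempfile.TemporaryDirectory() as run_folder:
    results = train_models(
        train=([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),
        test=([4.0, 5.0], [4.0, 4.0]),
        models=[ConstantModel(0.0), ConstantModel(1.0)],
        run_folder=run_folder,
    )
    saved = sorted(p.name for p in Path(run_folder, "model_0").iterdir())
```

The point of the sketch is the ordering: the preprocessing never sees the test split during fitting, and each model writes its own artifacts before the next one is trained.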

deepinsight.doctor.constants module¶

deepinsight.doctor.dtapi module¶

deepinsight.doctor.dtapi.json_api(api)¶

deepinsight.doctor.dtapi.trim_underscores(s)¶

deepinsight.doctor.exception module¶

exception deepinsight.doctor.exception.DoctorException(message='', code=400, errorType='ExpectedException')¶: Bases: Exception

deepinsight.doctor.forest module¶

class deepinsight.doctor.forest.ClassificationIML(**params)¶: Bases: deepinsight.doctor.forest.IML

class deepinsight.doctor.forest.IML(**params)¶

Bases: object

classes_¶

estimators_¶

feature_importances_¶

fit(X, Y, sample_weight=None)¶

get_params(**kwargs)¶

merge(clf2)¶

model(params)¶

predict(X)¶

predict_proba(X)¶

set_params(**params)¶

should_continue(Ytest, Y1, Y2)¶

class deepinsight.doctor.forest.RandomForestClassifierIML(**params)¶

Bases: deepinsight.doctor.forest.ClassificationIML

Random forest that automatically stops growing the forest (autostop).

i = 0¶

merge(clf2)¶

model(params)¶

class deepinsight.doctor.forest.RandomForestRegressorIML(**params)¶

Bases: deepinsight.doctor.forest.RegressionIML

Random forest that automatically stops growing the forest (autostop).

i = 0¶

merge(clf2)¶

model(params)¶

class deepinsight.doctor.forest.RegressionIML(**params)¶: Bases: deepinsight.doctor.forest.IML

deepinsight.doctor.multiframe module¶

class deepinsight.doctor.multiframe.DataFrameBuilder(prefix='')¶

Bases: object

A dataframe builder just receives columns to ultimately create a dataframe, respecting the insertion order.

add_column(column_name, column_values)¶

columns¶

prefix¶

to_dataframe()¶

class deepinsight.doctor.multiframe.DataFrameWrapper(df)¶

Bases: object

shape¶

class deepinsight.doctor.multiframe.MultiFrame¶

Bases: object

The multiframe agglomerates horizontally several blocks of columns. All blocks must have the same number of rows. Each block is named.

Blocks can be:

Pandas DataFrames
Numpy arrays
Scipy sparse matrices

The MultiFrame also gives a single dataframe builder that allows you to build a dataframe from several series.
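To make the idea of horizontally agglomerated, named blocks concrete, here is a heavily simplified sketch. MiniMultiFrame is a hypothetical stand-in that stores each block as plain Python lists rather than Pandas, Numpy or Scipy objects, and its method names are assumptions; it only illustrates the row-count invariant and the column-wise concatenation.

```python
class MiniMultiFrame:
    """Simplified stand-in: each named block is a (column_names, rows) pair."""

    def __init__(self):
        self.blocks = {}  # dicts keep insertion order, so block order is stable
        self.n_rows = None

    def append_block(self, name, col_names, rows):
        if any(len(row) != len(col_names) for row in rows):
            raise ValueError("row width does not match the column names")
        if self.n_rows is None:
            self.n_rows = len(rows)
        elif len(rows) != self.n_rows:
            # All blocks must have the same number of rows.
            raise ValueError(
                "block {!r} has {} rows, expected {}".format(name, len(rows), self.n_rows)
            )
        self.blocks[name] = (list(col_names), [list(row) for row in rows])

    def columns(self):
        return [col for cols, _ in self.blocks.values() for col in cols]

    def shape(self):
        return (self.n_rows or 0, len(self.columns()))

    def as_rows(self):
        # Concatenate the blocks horizontally, row by row.
        merged = [[] for _ in range(self.n_rows or 0)]
        for _, rows in self.blocks.values():
            for target, row in zip(merged, rows):
                target.extend(row)
        return merged


mf = MiniMultiFrame()
mf.append_block("numeric", ["age", "income"], [[30, 1000], [40, 2000]])
mf.append_block("dummies", ["city_A", "city_B"], [[1, 0], [0, 1]])
```

Keeping blocks separate until the final concatenation is what would let a real implementation mix dense and sparse storage, which this list-based sketch does not attempt.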

append_df(name, df, keep=True)¶

append_np_block(name, array, col_names)¶

append_sparse(name, matrix)¶

as_csr_matrix()¶

as_dataframe()¶

as_np_array()¶

static block_as_np_array(blk)¶

col_as_series(block, col_name)¶

columns()¶

drop_rows(deletion_mask)¶

flush_df_builder(name)¶

get_block(name)¶

get_df_builder(name)¶: Helper for building a dataframe from series

has_df_builder(name)¶

iter_blocks(with_keep_info=False)¶

iter_columns()¶

iter_dataframes()¶

nnz()¶

select_columns(names)¶

set_index_from_df(df)¶

shape()¶

stats()¶

class deepinsight.doctor.multiframe.NamedNPArray(array, names)¶

Bases: object

shape¶

class deepinsight.doctor.multiframe.SparseMatrixWithNames(matrix, names)¶

Bases: object

shape¶

deepinsight.doctor.multiframe.delete_rows_csr(mat, indices)¶: Remove the rows denoted by indices from the CSR sparse matrix mat. Taken from http://stackoverflow.com/questions/13077527
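Row deletion in CSR format can be illustrated directly on the three arrays that define a CSR matrix (data, indices, indptr). The function below is a hypothetical pure-Python sketch of the technique, not the implementation used by delete_rows_csr, which operates on a Scipy matrix.

```python
def delete_rows_from_csr_arrays(data, indices, indptr, rows_to_delete):
    """Return new (data, indices, indptr) lists without the given rows."""
    drop = set(rows_to_delete)
    new_data, new_indices, new_indptr = [], [], [0]
    for row in range(len(indptr) - 1):
        if row in drop:
            continue
        # The non-zero entries of a row live in data[indptr[row]:indptr[row + 1]].
        start, end = indptr[row], indptr[row + 1]
        new_data.extend(data[start:end])
        new_indices.extend(indices[start:end])
        # Each kept row ends where the copied data currently ends.
        new_indptr.append(len(new_data))
    return new_data, new_indices, new_indptr


# CSR encoding of the 3x3 matrix:
# [[1, 0, 2],
#  [0, 0, 3],
#  [4, 5, 0]]
data = [1, 2, 3, 4, 5]
indices = [0, 2, 2, 0, 1]
indptr = [0, 2, 3, 5]

kept = delete_rows_from_csr_arrays(data, indices, indptr, [1])
```

Because only the row pointer array needs to be rebuilt, removing rows from a CSR matrix never requires converting it to a dense array.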

deepinsight.doctor.multiframe.is_series_like(series)¶

deepinsight.doctor.notebook_builder module¶

notebook_builder.py: base classes for creating IPython notebooks.

class deepinsight.doctor.notebook_builder.ClusteringNotebookBuilder(model_name, model_date, dataset_smartname, script_steps, preparation_output_schema, split_stuff, preprocessing_params, pre_train, post_train)¶

Bases: deepinsight.doctor.notebook_builder.NotebookBuilder

context()¶

is_supervized()¶

template_name()¶

title()¶

class deepinsight.doctor.notebook_builder.NotebookBuilder¶

Bases: object

algorithm¶

categorical_preprocessing_context()¶

context()¶

create_notebook()¶

handle_missing_context()¶

is_supervized()¶

rescale_context()¶

template()¶

template_name()¶

text_preprocessing_context()¶

title()¶

class deepinsight.doctor.notebook_builder.PredictionNotebookBuilder(model_name, model_date, dataset_smartname, script_steps, preparation_output_schema, split_stuff, core_params, preprocessing_params, pre_train, post_train)¶

Bases: deepinsight.doctor.notebook_builder.NotebookBuilder

categorical_preprocessing_context()¶

context()¶

is_supervized()¶

prediction_type¶

target_variable¶

template_name()¶

title()¶

deepinsight.doctor.notebook_builder.code_cell(code)¶

deepinsight.doctor.notebook_builder.comment_cell(comment)¶

deepinsight.doctor.notebook_builder.extract_input_columns(preprocessing_params, with_target=False, with_profiling=True)¶

deepinsight.doctor.notebook_builder.header_cell(msg=None, level=1)¶

deepinsight.doctor.notebook_builder.parse_cells_from_render(content)¶

deepinsight.doctor.prediction_entrypoints module¶

deepinsight.doctor.prediction_entrypoints.prediction_train_model_keras(transformed_normal, train_df, test_df, pipeline, modeling_params, core_params, per_feature, run_folder, listener, update_fn, target_map, generated_features_mapping, save_model=True)¶: Fits a CLF on Keras, saves it, computes and writes its intrinsic scores, scores a test set with it, then writes the scores and the extrinsic performance.

deepinsight.doctor.prediction_entrypoints.prediction_train_model_kfold(full_df_clean, core_params, split_desc, preprocessing_params, optimized_params, pp_folder, m_folder, listener, update_fn, with_sample_weight, with_class_weight, calibrate_proba=False)¶

deepinsight.doctor.prediction_entrypoints.prediction_train_score_save(transformed_train, transformed_test, test_df_index, core_params, split_desc, modeling_params, run_folder, listener, target_map, update_fn, pipeline, m_folder)¶: Fits a CLF, saves it, computes and writes its intrinsic scores, scores a test set with it, then writes the scores and the extrinsic performance.

deepinsight.doctor.prediction_entrypoints.prediction_train_score_save_ensemble(train, test, core_params, split_desc, modeling_params, run_folder, listener, target_map, update_fn, pipeline, with_sample_weight)¶: Fits a CLF, saves it, computes and writes its intrinsic scores, scores a test set with it, then writes the scores and the extrinsic performance.

deepinsight.doctor.preprocessing_collector module¶

Performs the initial feature analysis that will drive the actual preprocessor for prediction. Takes the preprocessing params and the train dataframe, and outputs the feature analysis data.

class deepinsight.doctor.preprocessing_collector.ClusteringPreprocessingDataCollector(train_df, preprocessing_params)¶

Bases: deepinsight.doctor.preprocessing_collector.PreprocessingDataCollector

feature_needs_analysis(params)¶: params is the params object from preprocessing params

class deepinsight.doctor.preprocessing_collector.PredictionPreprocessingDataCollector(train_df, preprocessing_params)¶

Bases: deepinsight.doctor.preprocessing_collector.PreprocessingDataCollector

feature_needs_analysis(params)¶: params is the params object from preprocessing params

class deepinsight.doctor.preprocessing_collector.PreprocessingDataCollector(train_df, preprocessing_params)¶

Bases: object

build()¶

get_feature_analysis_data(name, params)¶: Analyzes a single feature (preprocessing params -> feature analysis data). params is the preprocessing params for this feature.

It must contain:

- name, type, role (role_reason)
- missing_handling, missing_impute_with, category_handling, rescaling

deepinsight.doctor.preprocessing_handler module¶

class deepinsight.doctor.preprocessing_handler.BinaryClassificationPreprocessingHandler(core_params, preprocessing_params, data_path)¶

Bases: deepinsight.doctor.preprocessing_handler.PredictionPreprocessingHandler

target_map¶

class deepinsight.doctor.preprocessing_handler.ClusteringPreprocessingHandler(core_params, preprocessing_params, data_path)¶

Bases: deepinsight.doctor.preprocessing_handler.PreprocessingHandler

Build the preprocessing pipeline for clustering projects

Clustering preprocessing is especially difficult for several reasons. We need to keep track of the multiframe at different stages of its processing:

train
    The model used for clustering operates on preprocessed INPUT columns, on which we may or may not remove outliers, and may or may not apply a PCA.
    - TRAIN
profiling
    Columns that are not actually INPUT should still be preprocessed (e.g. dummified) in order to compute statistics on their different values. Such columns have a role called “PROFILING”.

    Preprocessed dataframe (including PROFILING columns).
    - PREPROCESSED
feature importance
    Feature importance is computed by running a classification on the variables. To keep the result human-readable, this analysis must be done on pre-PCA values.
    - TRAIN_PREPCA
outliers
    The outlier labels are used to make sure we can re-annotate the initial datasets (for feature importance and profiling).
    - OUTLIERS

preprocessing_steps()¶

class deepinsight.doctor.preprocessing_handler.MulticlassPreprocessingHandler(core_params, preprocessing_params, data_path)¶

Bases: deepinsight.doctor.preprocessing_handler.PredictionPreprocessingHandler

target_map¶

class deepinsight.doctor.preprocessing_handler.PredictionPreprocessingHandler(core_params, preprocessing_params, data_path)¶

Bases: deepinsight.doctor.preprocessing_handler.PreprocessingHandler

has_sample_weight_variable¶

preprocessing_steps(with_target=False, verbose=True, allow_empty_mf=False)¶

sample_weight_variable¶

set_selection_state_folder(selection_state_folder)¶

target_map¶

weight_map¶

class deepinsight.doctor.preprocessing_handler.PreprocessingHandler(core_params, preprocessing_params, data_path)¶

Bases: object

Manager class for the preprocessing

static build(core_params, preprocessing_params, data_path)¶: Build the proper type of preprocessing handling depending on the preprocessing params

build_preprocessing_pipeline(*args, **kwargs)¶

get_impact_coder(column)¶

get_pca_resource()¶

get_resource(resource_name, type)¶: Resources are just dictionaries, either pickled in a .pkl file named after their resource name, or dumped to a .json file named after their resource name.
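The resource convention described above can be sketched with the standard library. The load_resource function below is hypothetical: the accepted values of resource_type ("pkl" and "json") and the choice to return None for a missing file are assumptions, not the documented behavior of get_resource.

```python
import json
import pickle
import tempfile
from pathlib import Path


def load_resource(folder, resource_name, resource_type):
    """Load a resource dictionary stored as <resource_name>.pkl or <resource_name>.json."""
    if resource_type not in ("pkl", "json"):
        raise ValueError("unknown resource type: {!r}".format(resource_type))
    path = Path(folder) / "{}.{}".format(resource_name, resource_type)
    if not path.exists():
        # Assumption: a missing resource is reported as None rather than an error.
        return None
    if resource_type == "pkl":
        with path.open("rb") as f:
            return pickle.load(f)
    with path.open("r") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as folder:
    # Write one resource of each kind, then read them back by name.
    Path(folder, "pca.pkl").write_bytes(pickle.dumps({"n_components": 2}))
    Path(folder, "impact.json").write_text(json.dumps({"column": "city"}))
    pca = load_resource(folder, "pca", "pkl")
    impact = load_resource(folder, "impact", "json")
    missing = load_resource(folder, "unknown", "json")
```

Only unpickle files that come from a trusted source, since loading a pickle can execute arbitrary code.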

get_texthash_svd_data(column)¶

input_columns(with_target=True, with_profiling=True)¶

Return the list of input features.

Passing this list to get_dataframe can help limit RAM usage.

(includes profiling columns)

open(relative_filepath, *args, **kargs)¶: Opens a file relative to self.folder_path

prediction_type¶

preprocessing_steps(verbose=True, **kwargs)¶

report(pipeline)¶

save_data()¶

target_variable¶

class deepinsight.doctor.preprocessing_handler.RegressionPreprocessingHandler(core_params, preprocessing_params, data_path)¶

Bases: deepinsight.doctor.preprocessing_handler.PredictionPreprocessingHandler

target_map¶

deepinsight.doctor.preprocessing_handler.extract_input_columns(preprocessing_params, with_target=False, with_profiling=True, with_sample_weight=False)¶

deepinsight.doctor.preprocessing_handler.get_rescaler(in_block, column_name, column_params, column_collector)¶: Build a rescaler for the original column

deepinsight.doctor.preprocessing_handler.load_relfilepath(basepath, relative_filepath)¶: Returns None if the file does not exist

deepinsight.doctor.server module¶

Main doctor entry point. This is an HTTP server that receives commands from the AnalysisMLKernel Java class.

deepinsight.doctor.server.serve(port, secret)¶
