deepinsight.doctor package
Subpackages
- deepinsight.doctor.clustering package
- Submodules
- deepinsight.doctor.clustering.anomaly_detection module
- deepinsight.doctor.clustering.clustering_fit module
- deepinsight.doctor.clustering.clustering_scorer module
- deepinsight.doctor.clustering.common module
- deepinsight.doctor.clustering.reg_cluster_recipe module
- deepinsight.doctor.clustering.reg_scoring_recipe module
- deepinsight.doctor.clustering.reg_train_recipe module
- deepinsight.doctor.clustering.two_step_clustering module
- Module contents
- deepinsight.doctor.crossval package
- deepinsight.doctor.deep_learning package
- Submodules
- deepinsight.doctor.deep_learning.gpu module
- deepinsight.doctor.deep_learning.keras_callbacks module
- deepinsight.doctor.deep_learning.keras_support module
- deepinsight.doctor.deep_learning.keras_utils module
- deepinsight.doctor.deep_learning.load_model module
- deepinsight.doctor.deep_learning.preprocessing module
- deepinsight.doctor.deep_learning.sequences module
- deepinsight.doctor.deep_learning.shared_variables module
- Module contents
- deepinsight.doctor.posttraining package
- deepinsight.doctor.prediction package
- Submodules
- deepinsight.doctor.prediction.classification_fit module
- deepinsight.doctor.prediction.classification_scoring module
- deepinsight.doctor.prediction.common module
- deepinsight.doctor.prediction.dt_xgboost module
- deepinsight.doctor.prediction.ensembles module
- deepinsight.doctor.prediction.feature_selection module
- deepinsight.doctor.prediction.keras_evaluation_recipe module
- deepinsight.doctor.prediction.keras_scoring_recipe module
- deepinsight.doctor.prediction.lars module
- deepinsight.doctor.prediction.prediction_model_serialization module
- deepinsight.doctor.prediction.reg_evaluation_recipe module
- deepinsight.doctor.prediction.reg_scoring_recipe module
- deepinsight.doctor.prediction.reg_train_recipe module
- deepinsight.doctor.prediction.regression_fit module
- deepinsight.doctor.prediction.regression_scoring module
- deepinsight.doctor.prediction.scoring_base module
- Module contents
- deepinsight.doctor.preprocessing package
- deepinsight.doctor.utils package
- Submodules
- deepinsight.doctor.utils.calibration module
- deepinsight.doctor.utils.crossval module
- deepinsight.doctor.utils.dataframe_cache module
- deepinsight.doctor.utils.interrupt_optimization module
- deepinsight.doctor.utils.lift_curve module
- deepinsight.doctor.utils.listener module
- deepinsight.doctor.utils.magic_main module
- deepinsight.doctor.utils.metrics module
- deepinsight.doctor.utils.split module
- deepinsight.doctor.utils.subsampler module
- Module contents
Submodules
deepinsight.doctor.clustering_entrypoints module
-
deepinsight.doctor.clustering_entrypoints.
clustering_train_score_save
(transformed_src, src_index, preprocessing_params, modeling_params, run_folder, listener, update_fn, pipeline) Trains one model and saves the results to run_folder.
deepinsight.doctor.commands module
Commands available from the doctor main kernel server.
To add a command, simply add a method. Methods whose names start with an _ are not exposed.
Arguments with default values are supported. *args and **kwargs are not supported.
If one of your JSON parameters has the same name as a Python global (for example a builtin), you can suffix the parameter name with an _ (e.g. input_).
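The conventions above can be illustrated with a minimal sketch. It is not the doctor's actual dispatcher: the helper names exposed_commands and call_command, and the echo command, are hypothetical and exist only for this example. The sketch assumes commands are plain functions and that JSON keys map onto parameter names once trailing underscores are dropped.

```python
import inspect


def ping():
    return "pong"


def echo(input_, times=1):
    # "input" is a Python builtin, so the parameter carries a trailing underscore.
    return [input_] * times


def _helper():
    return "never exposed"


def exposed_commands(namespace):
    """Collect public functions, skipping names that start with an underscore."""
    return {
        name: fn
        for name, fn in namespace.items()
        if inspect.isfunction(fn) and not name.startswith("_")
    }


def call_command(commands, name, json_params):
    """Map JSON keys onto parameter names, ignoring trailing underscores."""
    fn = commands[name]
    kwargs = {}
    for param in inspect.signature(fn).parameters.values():
        if param.kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD):
            raise TypeError("*args and **kwargs are not supported")
        json_key = param.name.rstrip("_")
        if json_key in json_params:
            kwargs[param.name] = json_params[json_key]
    # Parameters missing from the JSON payload fall back to their defaults.
    return fn(**kwargs)


COMMANDS = exposed_commands({"ping": ping, "echo": echo, "_helper": _helper})
print(sorted(COMMANDS))
print(call_command(COMMANDS, "echo", {"input": "hi", "times": 2}))
```

Here _helper is filtered out, and the JSON key input reaches the input_ parameter.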
-
deepinsight.doctor.commands.
build_pipeline_and_handler
(collector_data, core_params, run_folder, preprocessing_params, selection_state_folder=None, allow_empty_mf=False)¶
-
deepinsight.doctor.commands.
clustering_rescore
(split_desc, preprocessing_folder, model_folder)¶
-
deepinsight.doctor.commands.
compute_pdp
(job_id, split_desc, core_params, preprocessing_folder, model_folder, computation_parameters=None)¶
-
deepinsight.doctor.commands.
compute_subpopulation
(job_id, split_desc, core_params, preprocessing_folder, model_folder, computation_parameters=None)¶
-
deepinsight.doctor.commands.
create_clustering_notebook
(model_name, model_date, dataset_smartname, script, preparation_output_schema, split_stuff, preprocessing_params, pre_train, post_train)¶
-
deepinsight.doctor.commands.
create_ensemble
(split_desc, core_params, model_folder, preprocessing_folder, model_folders, preprocessing_folders)¶
-
deepinsight.doctor.commands.
create_prediction_notebook
(model_name, model_date, dataset_smartname, script, preparation_output_schema, split_stuff, core_params, preprocessing_params, pre_train, post_train)¶
-
deepinsight.doctor.commands.
ping
()¶
-
deepinsight.doctor.commands.
train_clustering_models_nosave
(split_desc, preprocessing_set) Regular (mode 1) train:
- Non-streamed single split + fit preprocessing on train + preprocess test
- Fit N models sequentially
- Fit
- Save clf
- Compute and save clf performance
- Score, save scored test set + scored performance
-
deepinsight.doctor.commands.
train_prediction_keras
(core_params, preprocessing_set, split_desc)¶
-
deepinsight.doctor.commands.
train_prediction_kfold
(core_params, preprocessing_set, split_desc)¶
-
deepinsight.doctor.commands.
train_prediction_models_nosave
(core_params, preprocessing_set, split_desc) Regular (mode 1) train:
- Non-streamed single split + fit preprocessing on train + preprocess test
- Fit N models sequentially
- Fit
- Save clf
- Compute and save clf performance
- Score, save scored test set + scored performance
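The sequential flow listed above can be condensed into a rough sketch. Everything in it is an assumption made for illustration: MeanRegressor is a trivial stand-in model, train_models and the file names are hypothetical, the preprocessing step is omitted, and the model is saved as JSON rather than in whatever format the doctor really uses.

```python
import json
import os
import tempfile


class MeanRegressor:
    """Trivial stand-in model: always predicts the mean of the training target."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def mean_abs_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def train_models(models, train, test, run_folder):
    """Fit each model in turn, save it, then score the test set and save the result."""
    X_train, y_train = train
    X_test, y_test = test
    summary = {}
    for name, model in models.items():
        model.fit(X_train, y_train)  # fit
        with open(os.path.join(run_folder, name + "_model.json"), "w") as f:
            json.dump(vars(model), f)  # save the fitted model
        predictions = model.predict(X_test)  # score the test set
        perf = {"mae": mean_abs_error(y_test, predictions)}
        with open(os.path.join(run_folder, name + "_perf.json"), "w") as f:
            json.dump({"perf": perf, "predictions": predictions}, f)
        summary[name] = perf
    return summary


with tempfile.TemporaryDirectory() as run_folder:
    summary = train_models(
        {"mean": MeanRegressor()},
        train=([[0], [1], [2]], [1.0, 2.0, 3.0]),
        test=([[3], [4]], [2.0, 4.0]),
        run_folder=run_folder,
    )
    saved = sorted(os.listdir(run_folder))
print(summary, saved)
```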
deepinsight.doctor.constants module
deepinsight.doctor.dtapi module
-
deepinsight.doctor.dtapi.
json_api
(api)¶
-
deepinsight.doctor.dtapi.
trim_underscores
(s)¶
deepinsight.doctor.exception module
-
exception
deepinsight.doctor.exception.
DoctorException
(message='', code=400, errorType='ExpectedException')¶ Bases:
Exception
deepinsight.doctor.forest module
-
class
deepinsight.doctor.forest.
ClassificationIML
(**params)¶
-
class
deepinsight.doctor.forest.
IML
(**params)¶ Bases:
object
-
classes_
¶
-
estimators_
¶
-
feature_importances_
¶
-
fit
(X, Y, sample_weight=None)¶
-
get_params
(**kwargs)¶
-
merge
(clf2)¶
-
model
(params)¶
-
predict
(X)¶
-
predict_proba
(X)¶
-
set_params
(**params)¶
-
should_continue
(Ytest, Y1, Y2)¶
-
-
class
deepinsight.doctor.forest.
RandomForestClassifierIML
(**params)¶ Bases:
deepinsight.doctor.forest.ClassificationIML
Random forest that automatically stops growing the forest
-
i
= 0¶
-
merge
(clf2)¶
-
model
(params)¶
-
-
class
deepinsight.doctor.forest.
RandomForestRegressorIML
(**params)¶ Bases:
deepinsight.doctor.forest.RegressionIML
Random forest that automatically stops growing the forest
-
i
= 0¶
-
merge
(clf2)¶
-
model
(params)¶
-
-
class
deepinsight.doctor.forest.
RegressionIML
(**params)¶
deepinsight.doctor.multiframe module
-
class
deepinsight.doctor.multiframe.
DataFrameBuilder
(prefix='')¶ Bases:
object
A dataframe builder just receives columns to ultimately create a dataframe, respecting the insertion order.
-
add_column
(column_name, column_values)¶
-
columns
¶
-
prefix
¶
-
to_dataframe
()¶
-
-
class
deepinsight.doctor.multiframe.
MultiFrame
¶ Bases:
object
The multiframe horizontally concatenates several blocks of columns. All blocks must have the same number of rows. Each block is named.
Blocks can be:
- Pandas DataFrames
- Numpy arrays
- Scipy sparse matrices
The MultiFrame also gives a single dataframe builder that allows you to build a dataframe from several series.
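The idea of named blocks glued together side by side can be shown with a toy stand-in. This is not the MultiFrame implementation: ToyMultiFrame, append_block and as_rows are hypothetical names, blocks are plain lists of rows instead of DataFrames, arrays or sparse matrices, and prefixing column names with the block name is only an illustrative choice.

```python
class ToyMultiFrame:
    """Toy stand-in: named blocks of columns concatenated horizontally."""

    def __init__(self):
        self.blocks = []  # (name, column_names, rows), kept in insertion order
        self.n_rows = None

    def append_block(self, name, column_names, rows):
        # Every block must bring the same number of rows as the others.
        if self.n_rows is None:
            self.n_rows = len(rows)
        elif len(rows) != self.n_rows:
            raise ValueError(
                "block %r has %d rows, expected %d" % (name, len(rows), self.n_rows)
            )
        self.blocks.append((name, list(column_names), [list(r) for r in rows]))

    def columns(self):
        # Prefix each column with its block name so names stay unique.
        return ["%s:%s" % (name, col) for name, cols, _ in self.blocks for col in cols]

    def as_rows(self):
        # Row i of the result chains row i of every block, in block order.
        return [
            [value for _, _, rows in self.blocks for value in rows[i]]
            for i in range(self.n_rows or 0)
        ]


mf = ToyMultiFrame()
mf.append_block("num", ["age", "height"], [[31, 1.8], [45, 1.6]])
mf.append_block("dummies", ["is_red"], [[1], [0]])
print(mf.columns())
print(mf.as_rows())
```

Appending a block with a different row count raises a ValueError, which mirrors the same-number-of-rows constraint stated above.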
-
append_df
(name, df, keep=True)¶
-
append_np_block
(name, array, col_names)¶
-
append_sparse
(name, matrix)¶
-
as_csr_matrix
()¶
-
as_dataframe
()¶
-
as_np_array
()¶
-
static
block_as_np_array
(blk)¶
-
col_as_series
(block, col_name)¶
-
columns
()¶
-
drop_rows
(deletion_mask)¶
-
flush_df_builder
(name)¶
-
get_block
(name)¶
-
get_df_builder
(name)¶ Helper for building a dataframe from series
-
has_df_builder
(name)¶
-
iter_blocks
(with_keep_info=False)¶
-
iter_columns
()¶
-
iter_dataframes
()¶
-
nnz
()¶
-
select_columns
(names)¶
-
set_index_from_df
(df)¶
-
shape
()¶
-
stats
()¶
-
deepinsight.doctor.multiframe.
delete_rows_csr
(mat, indices) Remove the rows denoted by indices from the CSR sparse matrix mat. Taken from http://stackoverflow.com/questions/13077527
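To make the row-removal idea concrete, here is a small sketch that works on the three raw CSR arrays as plain Python lists. It is not the function documented above, which operates on a SciPy matrix; delete_rows_csr_lists is a hypothetical name, and the only assumption is the standard CSR layout in which indptr[i]:indptr[i + 1] delimits the stored entries of row i.

```python
def delete_rows_csr_lists(data, indices, indptr, rows_to_delete):
    """Drop rows from a CSR matrix given as three plain lists."""
    drop = set(rows_to_delete)
    new_data, new_indices, new_indptr = [], [], [0]
    for row in range(len(indptr) - 1):
        if row in drop:
            continue  # skip every stored entry of a deleted row
        start, end = indptr[row], indptr[row + 1]
        new_data.extend(data[start:end])
        new_indices.extend(indices[start:end])
        # The next row starts where the entries copied so far end.
        new_indptr.append(len(new_data))
    return new_data, new_indices, new_indptr


# 3x3 matrix [[1, 0, 2], [0, 0, 3], [4, 5, 0]] in CSR form
data = [1, 2, 3, 4, 5]
indices = [0, 2, 2, 0, 1]
indptr = [0, 2, 3, 5]
result = delete_rows_csr_lists(data, indices, indptr, [1])
print(result)
```

Removing row 1 leaves the entries of rows 0 and 2, with indptr rebuilt for the two remaining rows.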
-
deepinsight.doctor.multiframe.
is_series_like
(series)¶
deepinsight.doctor.notebook_builder module
notebook_builder.py Base classes for creating IPython notebooks
-
class
deepinsight.doctor.notebook_builder.
ClusteringNotebookBuilder
(model_name, model_date, dataset_smartname, script_steps, preparation_output_schema, split_stuff, preprocessing_params, pre_train, post_train)¶ Bases:
deepinsight.doctor.notebook_builder.NotebookBuilder
-
context
()¶
-
is_supervized
()¶
-
template_name
()¶
-
title
()¶
-
-
class
deepinsight.doctor.notebook_builder.
NotebookBuilder
¶ Bases:
object
-
algorithm
¶
-
categorical_preprocessing_context
()¶
-
context
()¶
-
create_notebook
()¶
-
handle_missing_context
()¶
-
is_supervized
()¶
-
rescale_context
()¶
-
template
()¶
-
template_name
()¶
-
text_preprocessing_context
()¶
-
title
()¶
-
-
class
deepinsight.doctor.notebook_builder.
PredictionNotebookBuilder
(model_name, model_date, dataset_smartname, script_steps, preparation_output_schema, split_stuff, core_params, preprocessing_params, pre_train, post_train)¶ Bases:
deepinsight.doctor.notebook_builder.NotebookBuilder
-
categorical_preprocessing_context
()¶
-
context
()¶
-
is_supervized
()¶
-
prediction_type
¶
-
target_variable
¶
-
template_name
()¶
-
title
()¶
-
-
deepinsight.doctor.notebook_builder.
code_cell
(code)¶
-
deepinsight.doctor.notebook_builder.
comment_cell
(comment)¶
-
deepinsight.doctor.notebook_builder.
extract_input_columns
(preprocessing_params, with_target=False, with_profiling=True)¶
-
deepinsight.doctor.notebook_builder.
header_cell
(msg=None, level=1)¶
-
deepinsight.doctor.notebook_builder.
parse_cells_from_render
(content)¶
deepinsight.doctor.prediction_entrypoints module
-
deepinsight.doctor.prediction_entrypoints.
prediction_train_model_keras
(transformed_normal, train_df, test_df, pipeline, modeling_params, core_params, per_feature, run_folder, listener, update_fn, target_map, generated_features_mapping, save_model=True) Fit a CLF on Keras, save it, compute intrinsic scores, write them, score a test set with it, then write the scores and extrinsic performance.
-
deepinsight.doctor.prediction_entrypoints.
prediction_train_model_kfold
(full_df_clean, core_params, split_desc, preprocessing_params, optimized_params, pp_folder, m_folder, listener, update_fn, with_sample_weight, with_class_weight, calibrate_proba=False)¶
-
deepinsight.doctor.prediction_entrypoints.
prediction_train_score_save
(transformed_train, transformed_test, test_df_index, core_params, split_desc, modeling_params, run_folder, listener, target_map, update_fn, pipeline, m_folder) Fit a CLF, save it, compute intrinsic scores, write them, score a test set with it, then write the scores and extrinsic performance.
-
deepinsight.doctor.prediction_entrypoints.
prediction_train_score_save_ensemble
(train, test, core_params, split_desc, modeling_params, run_folder, listener, target_map, update_fn, pipeline, with_sample_weight) Fit a CLF, save it, compute intrinsic scores, write them, score a test set with it, then write the scores and extrinsic performance.
deepinsight.doctor.preprocessing_collector module
Performs the initial feature analysis that will drive the actual preprocessor for prediction. Takes the preprocessing params and the train dataframe and outputs the feature analysis data.
-
class
deepinsight.doctor.preprocessing_collector.
ClusteringPreprocessingDataCollector
(train_df, preprocessing_params)¶ Bases:
deepinsight.doctor.preprocessing_collector.PreprocessingDataCollector
-
feature_needs_analysis
(params)¶ params is the params object from preprocessing params
-
-
class
deepinsight.doctor.preprocessing_collector.
PredictionPreprocessingDataCollector
(train_df, preprocessing_params)¶ Bases:
deepinsight.doctor.preprocessing_collector.PreprocessingDataCollector
-
feature_needs_analysis
(params)¶ params is the params object from preprocessing params
-
-
class
deepinsight.doctor.preprocessing_collector.
PreprocessingDataCollector
(train_df, preprocessing_params)¶ Bases:
object
-
build
()¶
-
get_feature_analysis_data
(name, params) Analyzes a single feature (preprocessing params -> feature analysis data). params is the preprocessing params for this feature.
It must contain:
- name, type, role (role_reason)
- missing_handling, missing_impute_with, category_handling, rescaling
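The required keys listed above can be checked with a short sketch. The helper missing_feature_keys is hypothetical, role_reason is treated as optional (an assumption based on the parentheses), and every value in the example dictionary is a placeholder except the INPUT role, which this page mentions elsewhere.

```python
REQUIRED_KEYS = (
    "name",
    "type",
    "role",
    "missing_handling",
    "missing_impute_with",
    "category_handling",
    "rescaling",
)


def missing_feature_keys(feature_params):
    """Return the required keys that are absent from one feature's params."""
    return [key for key in REQUIRED_KEYS if key not in feature_params]


# Placeholder values only: the real set of allowed values is not documented here.
example_params = {
    "name": "age",
    "type": "placeholder-type",
    "role": "INPUT",
    "missing_handling": "placeholder-handling",
    "missing_impute_with": "placeholder-strategy",
    "category_handling": "placeholder-handling",
}

print(missing_feature_keys(example_params))
```

The example deliberately leaves out rescaling, so the helper reports it as missing.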
-
deepinsight.doctor.preprocessing_handler module
-
class
deepinsight.doctor.preprocessing_handler.
BinaryClassificationPreprocessingHandler
(core_params, preprocessing_params, data_path)¶ Bases:
deepinsight.doctor.preprocessing_handler.PredictionPreprocessingHandler
-
target_map
¶
-
-
class
deepinsight.doctor.preprocessing_handler.
ClusteringPreprocessingHandler
(core_params, preprocessing_params, data_path)¶ Bases:
deepinsight.doctor.preprocessing_handler.PreprocessingHandler
Build the preprocessing pipeline for clustering projects
Clustering preprocessing is especially tricky for several reasons. We need to keep track of the multiframe at different stages of its processing:
- train
The model used for clustering performs on preprocessed INPUT columns, on which we may or may not remove outliers, and may or may not apply a PCA.
- TRAIN
- profiling
Columns that are not actually INPUT should still be preprocessed (e.g. dummified) in order to compute statistics on their different values. Such columns have a role called “PROFILING”.
Preprocessed dataframe (including PROFILING columns)
- PREPROCESSED
- feature importance
Feature importance is computed by running a classification on the variables. To keep its results human-readable, this analysis must be done on pre-PCA values.
- TRAIN_PREPCA
- outliers
The outlier labels are used to make sure we can re-annotate the initial datasets (for feature importance and profiling).
- OUTLIERS
-
preprocessing_steps
()¶
-
class
deepinsight.doctor.preprocessing_handler.
MulticlassPreprocessingHandler
(core_params, preprocessing_params, data_path)¶ Bases:
deepinsight.doctor.preprocessing_handler.PredictionPreprocessingHandler
-
target_map
¶
-
-
class
deepinsight.doctor.preprocessing_handler.
PredictionPreprocessingHandler
(core_params, preprocessing_params, data_path)¶ Bases:
deepinsight.doctor.preprocessing_handler.PreprocessingHandler
-
has_sample_weight_variable
¶
-
preprocessing_steps
(with_target=False, verbose=True, allow_empty_mf=False)¶
-
sample_weight_variable
¶
-
set_selection_state_folder
(selection_state_folder)¶
-
target_map
¶
-
weight_map
¶
-
-
class
deepinsight.doctor.preprocessing_handler.
PreprocessingHandler
(core_params, preprocessing_params, data_path)¶ Bases:
object
Manager class for the preprocessing
-
static
build
(core_params, preprocessing_params, data_path)¶ Build the proper type of preprocessing handling depending on the preprocessing params
-
build_preprocessing_pipeline
(*args, **kwargs)¶
-
get_impact_coder
(column)¶
-
get_pca_resource
()¶
-
get_resource
(resource_name, type) Resources are just dictionaries, either:
- pickled in a .pkl named after their resource name
- dumped to a .json named after their resource name
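The storage convention described above can be sketched as follows. The functions save_resource and load_resource are hypothetical and are not the handler's API; the sketch only assumes that a resource is a dictionary stored in a file named after the resource, with the extension picked by its type.

```python
import json
import os
import pickle
import tempfile


def save_resource(folder, resource_name, resource, kind):
    """Write a dictionary as <resource_name>.pkl or <resource_name>.json."""
    if kind == "pkl":
        with open(os.path.join(folder, resource_name + ".pkl"), "wb") as f:
            pickle.dump(resource, f)
    elif kind == "json":
        with open(os.path.join(folder, resource_name + ".json"), "w") as f:
            json.dump(resource, f)
    else:
        raise ValueError("unknown resource type: %r" % kind)


def load_resource(folder, resource_name, kind):
    """Read a dictionary back from the file named after the resource."""
    if kind == "pkl":
        with open(os.path.join(folder, resource_name + ".pkl"), "rb") as f:
            return pickle.load(f)
    if kind == "json":
        with open(os.path.join(folder, resource_name + ".json")) as f:
            return json.load(f)
    raise ValueError("unknown resource type: %r" % kind)


with tempfile.TemporaryDirectory() as folder:
    save_resource(folder, "pca", {"n_components": 2}, "json")
    save_resource(folder, "impact", {"a": 0.5}, "pkl")
    loaded_json = load_resource(folder, "pca", "json")
    loaded_pkl = load_resource(folder, "impact", "pkl")
print(loaded_json, loaded_pkl)
```

JSON suits resources that must stay human-readable, while pickle can hold arbitrary Python values. Which resources use which format in the doctor is not stated on this page.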
-
get_texthash_svd_data
(column)¶
-
input_columns
(with_target=True, with_profiling=True) Return the list of input features (including profiling columns).
Passing this list to get_dataframe can help limit RAM usage.
-
open
(relative_filepath, *args, **kargs) Open a file relative to self.folder_path
-
prediction_type
¶
-
preprocessing_steps
(verbose=True, **kwargs)¶
-
report
(pipeline)¶
-
save_data
()¶
-
target_variable
¶
-
-
class
deepinsight.doctor.preprocessing_handler.
RegressionPreprocessingHandler
(core_params, preprocessing_params, data_path)¶ Bases:
deepinsight.doctor.preprocessing_handler.PredictionPreprocessingHandler
-
target_map
¶
-
-
deepinsight.doctor.preprocessing_handler.
extract_input_columns
(preprocessing_params, with_target=False, with_profiling=True, with_sample_weight=False)¶
-
deepinsight.doctor.preprocessing_handler.
get_rescaler
(in_block, column_name, column_params, column_collector)¶ Build a rescaler for the original column
-
deepinsight.doctor.preprocessing_handler.
load_relfilepath
(basepath, relative_filepath) Returns None if the file does not exist
deepinsight.doctor.server module
Main doctor entry point. This is an HTTP server which receives commands from the AnalysisMLKernel Java class.
-
deepinsight.doctor.server.
serve
(port, secret)¶