deepinsight.doctor.utils package¶

Submodules¶

deepinsight.doctor.utils.calibration module¶

deepinsight.doctor.utils.calibration.dt_calibration_curve(y_true, y_prob, sample_weight=None, n_bins=10, pos_label=None)¶

deepinsight.doctor.utils.calibration.dt_calibration_loss(freqs, avg_preds, weights, reducer='sum', normalize=True)¶

deepinsight.doctor.utils.crossval module¶

class deepinsight.doctor.utils.crossval.DKULeaveOneGroupOut(column_name)¶

Bases: object

get_n_splits(X, y, groups=None)¶

set_column_labels(column_labels)¶

split(X, y, groups=None)¶

class deepinsight.doctor.utils.crossval.DKULeavePGroupsOut(column_name, p)¶

Bases: object

get_n_splits(X, y, groups=None)¶

set_column_labels(column_labels)¶

split(X, y, groups=None)¶

deepinsight.doctor.utils.dataframe_cache module¶

deepinsight.doctor.utils.dataframe_cache.clear_cache()¶

deepinsight.doctor.utils.dataframe_cache.get_dataframe(dataset, *args, **kwargs)¶

deepinsight.doctor.utils.dataframe_cache.hashablify(c)¶

deepinsight.doctor.utils.interrupt_optimization module¶

deepinsight.doctor.utils.interrupt_optimization.create_interrupt_file()¶

deepinsight.doctor.utils.interrupt_optimization.must_interrupt()¶

deepinsight.doctor.utils.interrupt_optimization.set_before_interrupt_check_callback(new_callback)¶

deepinsight.doctor.utils.interrupt_optimization.set_interrupt_folder(folder_p)¶

deepinsight.doctor.utils.lift_curve module¶

class deepinsight.doctor.utils.lift_curve.LiftBuilder(data, actual, predicted, with_weight=False)¶

Bases: object

Builds the data for lift curves

build()¶

deepinsight.doctor.utils.listener module¶

class deepinsight.doctor.utils.listener.ExitState(listener)¶

Bases: object

class deepinsight.doctor.utils.listener.ProgressListener(verbose=True)¶

Bases: object

add_future_step(name)¶

add_future_steps(names)¶

pop_state()¶

push_state(name, target=None)¶

reset()¶

set_current_progress(progress)¶

to_jsonifiable()¶

deepinsight.doctor.utils.listener.unix_time_millis()¶

deepinsight.doctor.utils.magic_main module¶

deepinsight.doctor.utils.magic_main.magic_main(main)¶

deepinsight.doctor.utils.metrics module¶

deepinsight.doctor.utils.metrics.check_test_set_ok_for_classification(y_true)¶

deepinsight.doctor.utils.metrics.log_loss(y_true, y_pred, eps=1e-15, normalize=True, sample_weight=None)¶

Log loss, aka logistic loss or cross-entropy loss.

The scikit-learn version is buggy when a class never appears in the predictions.
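For illustration only, here is a minimal sketch of a multi-class log loss with clipping. It is not the package's implementation. It assumes that `y_true` holds integer class indices and that `y_pred` is an (n_samples, n_classes) probability matrix with one column per known class, so a class that never occurs still has a column. The name `log_loss_sketch` is hypothetical.

```python
import numpy as np


def log_loss_sketch(y_true, y_pred, eps=1e-15, normalize=True, sample_weight=None):
    """Cross-entropy between integer labels and predicted class probabilities."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=float)

    # Clip away exact 0 and 1 so the logarithm stays finite, then renormalize rows.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    y_pred = y_pred / y_pred.sum(axis=1, keepdims=True)

    # Probability that each row assigns to its true class.
    picked = y_pred[np.arange(len(y_true)), y_true]
    losses = -np.log(picked)

    if sample_weight is None:
        weights = np.ones_like(losses)
    else:
        weights = np.asarray(sample_weight, dtype=float)

    total = np.sum(weights * losses)
    # normalize=True returns the weighted mean, otherwise the weighted sum.
    return float(total / np.sum(weights)) if normalize else float(total)
```

Because the column count of `y_pred` defines the set of classes, the result does not depend on which classes happen to appear in `y_true`.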

deepinsight.doctor.utils.metrics.log_odds(array, clip_min=0.0, clip_max=1.0)¶

Computes the log odds of each element of a numpy array: logodd = log(p / (1 - p)), with p a probability.

Parameters:
array (numpy array)
clip_min (float): minimum value
clip_max (float): maximum value

Returns: a numpy array with the same dimensions as the input array
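A minimal sketch of the log odds transform, log(p / (1 - p)), is shown here for illustration. It assumes that clipping is applied to the probabilities before the transform. The name `log_odds_sketch` is hypothetical and the package's own code may differ.

```python
import numpy as np


def log_odds_sketch(array, clip_min=0.0, clip_max=1.0):
    """Return log(p / (1 - p)) element-wise, after clipping p to [clip_min, clip_max]."""
    p = np.clip(np.asarray(array, dtype=float), clip_min, clip_max)
    # With the default bounds, p = 0 maps to -inf and p = 1 maps to +inf;
    # the floating-point warnings for those two cases are silenced.
    with np.errstate(divide="ignore"):
        return np.log(p / (1.0 - p))
```

Tightening the bounds, for example `clip_min=0.01` and `clip_max=0.99`, keeps every output finite.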

deepinsight.doctor.utils.metrics.mcalibration_loss(y_true, y_pred, sample_weight=None)¶

deepinsight.doctor.utils.metrics.mean_absolute_percentage_error(y_true, y_pred, sample_weight=None)¶

deepinsight.doctor.utils.metrics.mroc_auc_score(y_true, y_predictions, sample_weight=None)¶

Returns an AUC score. Handles multi-class problems.

For multi-class problems, the AUC score is the MAUC score described in:

David J. Hand and Robert J. Till. 2001. A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems. Mach. Learn. 45, 2 (October 2001), 171-186. DOI=10.1023/A:1010920819831

http://dx.doi.org/10.1023/A:1010920819831
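The MAUC measure of Hand and Till averages a pairwise AUC over every pair of classes. The sketch below is an unweighted illustration, not the package's code. It assumes that labels are integer indices matching the columns of `y_prob` and that every class appears in `y_true`. The names `pairwise_auc` and `mauc_sketch` are hypothetical.

```python
from itertools import combinations

import numpy as np


def pairwise_auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return np.mean((pos > neg) + 0.5 * (pos == neg))


def mauc_sketch(y_true, y_prob):
    """Average of A(i, j) = (A(i|j) + A(j|i)) / 2 over all class pairs i < j."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    pair_scores = []
    for i, j in combinations(np.unique(y_true), 2):
        in_i, in_j = y_true == i, y_true == j
        # A(i|j): separate class i from class j using the class-i probability.
        a_i_given_j = pairwise_auc(y_prob[in_i, i], y_prob[in_j, i])
        # A(j|i): separate class j from class i using the class-j probability.
        a_j_given_i = pairwise_auc(y_prob[in_j, j], y_prob[in_i, j])
        pair_scores.append(0.5 * (a_i_given_j + a_j_given_i))
    return float(np.mean(pair_scores))
```

The all-pairs comparison in `pairwise_auc` is quadratic in the number of samples, which keeps the sketch short; a rank-based computation would scale better on large test sets.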

deepinsight.doctor.utils.metrics.rmse_score(y, y_pred, sample_weight=None)¶: Root Mean Square Error, more readable than MSE

deepinsight.doctor.utils.metrics.rmsle_score(y, y_pred, sample_weight=None)¶: Root Mean Square Logarithmic Error https://www.kaggle.com/wiki/RootMeanSquaredLogarithmicError
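The two regression metrics above can be sketched as follows. This is an illustration under the assumption that `sample_weight` acts as a weight in the mean of the squared errors. The names `rmse_sketch` and `rmsle_sketch` are hypothetical and not part of the package.

```python
import numpy as np


def rmse_sketch(y, y_pred, sample_weight=None):
    """Square root of the (optionally weighted) mean squared error."""
    err = np.asarray(y, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.average(err ** 2, weights=sample_weight)))


def rmsle_sketch(y, y_pred, sample_weight=None):
    """RMSE computed on log(1 + value); inputs must be greater than -1."""
    log_y = np.log1p(np.asarray(y, dtype=float))
    log_pred = np.log1p(np.asarray(y_pred, dtype=float))
    return rmse_sketch(log_y, log_pred, sample_weight)
```

RMSE is expressed in the same unit as the target, which is why it reads more easily than MSE. RMSLE penalizes relative rather than absolute differences, so it suits targets that span several orders of magnitude.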

deepinsight.doctor.utils.split module¶

deepinsight.doctor.utils.split.df_from_split_desc(split_desc, split, feature_params, prediction_type=None)¶

deepinsight.doctor.utils.split.df_from_split_desc_no_normalization(split_desc, split, feature_params, prediction_type=None)¶

deepinsight.doctor.utils.subsampler module¶

class deepinsight.doctor.utils.subsampler.Subsampler(df, variable, sampling_type='stratified', ratio=0.1)¶

Bases: object

balanced_subsampling()¶

Subsamples with the goal of preserving how clusters are represented in a scatter plot. This has no statistical property whatsoever.

Proper stratified subsampling may lead to clusters with too few samples to be visible.

This method tries to draw the same number of points for each class.

The number of rows output is approximately ratio * nb_rows.

TODO: we may want to change this code to make big clusters actually look big.
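The idea of drawing the same number of points per class can be sketched on a plain list of labels. This is an assumption-laden illustration and not the method's actual code: it works on row indices instead of a dataframe, and the function name and the `seed` parameter are hypothetical.

```python
import random
from collections import defaultdict


def balanced_indices_sketch(labels, ratio=0.1, seed=0):
    """Pick about ratio * len(labels) row indices, the same count for each class."""
    if not labels:
        return []
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Every class gets the same target, so the total is about ratio * nb_rows.
    target = max(1, round(ratio * len(labels) / len(by_class)))

    picked = []
    for indices in by_class.values():
        # A class smaller than the target contributes all of its rows.
        picked.extend(rng.sample(indices, min(target, len(indices))))
    return sorted(picked)
```

With 90 rows of one class and 10 of another at `ratio=0.1`, this returns 5 rows of each, so the small class stays visible at the cost of misrepresenting relative cluster sizes.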

cluster_sampling()¶

Samples on the categories themselves.

Select a proportion (prop) of the categories.

run()¶

stratified_forced_subsampling()¶: Pick samples from each category proportionally, but force a minimal sample size per category.

stratified_subsampling()¶: Pick samples from each category proportionally.

deepinsight.doctor.utils.subsampler.subsample(df, variable, sampling_type='stratified', ratio=0.1)¶

Module contents¶

deepinsight.doctor.utils.datetime_to_epoch(series)¶

deepinsight.doctor.utils.dt_isnan(val)¶: Safe isnan that accepts non-numeric values

deepinsight.doctor.utils.dt_nonan(val)¶: Replaces numerical NaNs with None

deepinsight.doctor.utils.dt_nonaninf(val)¶: Replaces numerical NaNs and Inf with None
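The three helpers above could behave roughly as in the following sketch. It assumes that only `float` values (including subclasses of `float`) count as numeric; the real helpers may accept more types. All three function names are hypothetical.

```python
import math


def isnan_sketch(val):
    """True only for a float NaN; strings, None and other objects give False."""
    return isinstance(val, float) and math.isnan(val)


def nonan_sketch(val):
    """Return None for NaN, otherwise return the value unchanged."""
    return None if isnan_sketch(val) else val


def nonaninf_sketch(val):
    """Return None for NaN, +Inf and -Inf, otherwise return the value unchanged."""
    if isinstance(val, float) and (math.isnan(val) or math.isinf(val)):
        return None
    return val
```

Calling `math.isnan` directly on a string raises `TypeError`, which is what the type check guards against.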

deepinsight.doctor.utils.dt_write_mode_for_pickling()¶

deepinsight.doctor.utils.make_running_traininfo(folder, start_time, listener)¶

deepinsight.doctor.utils.merge_listeners(plistener, mlistener)¶

deepinsight.doctor.utils.ml_dtype_from_deepinsight_column(schema_column, feature_type, feature_role, prediction_type=None)¶

deepinsight.doctor.utils.ml_dtypes_from_deepinsight_schema(schema, params, prediction_type=None)¶

deepinsight.doctor.utils.normalize_dataframe(df, params, missing_columns='ERROR')¶

Normalizes a dataframe so that it can be used as input for a preprocessing pipeline. You should not have to add anything here …

Does 2 things:

Adds missing columns (for API node)
Converts datetime to epoch

deepinsight.doctor.utils.remove_all_nan(obj)¶: Removes all NaN values from an object, recursively. Needed because the JSON spec does not allow NaN
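A recursive cleanup of this kind can be sketched as follows. The sketch assumes that NaN values are replaced by None, which serializes to JSON null; the actual function may instead drop the entries. The name `remove_all_nan_sketch` is hypothetical.

```python
import math


def remove_all_nan_sketch(obj):
    """Return a copy of obj in which every float NaN is replaced by None."""
    if isinstance(obj, dict):
        return {key: remove_all_nan_sketch(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Tuples come back as lists, which is also how JSON would encode them.
        return [remove_all_nan_sketch(value) for value in obj]
    if isinstance(obj, float) and math.isnan(obj):
        return None
    return obj
```

After this pass, `json.dumps(result, allow_nan=False)` succeeds, whereas it raises `ValueError` on any remaining NaN.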

deepinsight.doctor.utils.strip_accents(s)¶

deepinsight.doctor.utils.update_gridsearch_info(folder, grid_search_scores)¶

deepinsight.doctor.utils.write_done_traininfo(folder, start_time, start_training_time, end_time, listener, end_preprocessing_time=None)¶

deepinsight.doctor.utils.write_model_status(modeling_set, status)¶

deepinsight.doctor.utils.write_preproc_file(run_folder, filename, obj)¶

deepinsight.doctor.utils.write_running_traininfo(folder, start_time, listener)¶