deepinsight.doctor.utils package¶
Submodules¶
deepinsight.doctor.utils.calibration module¶
-
deepinsight.doctor.utils.calibration.dt_calibration_curve(y_true, y_prob, sample_weight=None, n_bins=10, pos_label=None)¶
-
deepinsight.doctor.utils.calibration.dt_calibration_loss(freqs, avg_preds, weights, reducer='sum', normalize=True)¶
deepinsight.doctor.utils.crossval module¶
deepinsight.doctor.utils.dataframe_cache module¶
-
deepinsight.doctor.utils.dataframe_cache.clear_cache()¶
-
deepinsight.doctor.utils.dataframe_cache.get_dataframe(dataset, *args, **kwargs)¶
-
deepinsight.doctor.utils.dataframe_cache.hashablify(c)¶
deepinsight.doctor.utils.interrupt_optimization module¶
-
deepinsight.doctor.utils.interrupt_optimization.create_interrupt_file()¶
-
deepinsight.doctor.utils.interrupt_optimization.must_interrupt()¶
-
deepinsight.doctor.utils.interrupt_optimization.set_before_interrupt_check_callback(new_callback)¶
-
deepinsight.doctor.utils.interrupt_optimization.set_interrupt_folder(folder_p)¶
deepinsight.doctor.utils.lift_curve module¶
deepinsight.doctor.utils.listener module¶
-
class deepinsight.doctor.utils.listener.ExitState(listener)¶ Bases: object
-
class deepinsight.doctor.utils.listener.ProgressListener(verbose=True)¶ Bases: object
-
add_future_step(name)¶
-
add_future_steps(names)¶
-
pop_state()¶
-
push_state(name, target=None)¶
-
reset()¶
-
set_current_progress(progress)¶
-
to_jsonifiable()¶
-
-
deepinsight.doctor.utils.listener.unix_time_millis()¶
deepinsight.doctor.utils.metrics module¶
-
deepinsight.doctor.utils.metrics.check_test_set_ok_for_classification(y_true)¶
-
deepinsight.doctor.utils.metrics.log_loss(y_true, y_pred, eps=1e-15, normalize=True, sample_weight=None)¶ Log loss, aka logistic loss or cross-entropy loss.
The scikit-learn version is buggy when a class never appears in the predictions.
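For orientation, here is a minimal sketch of a multi-class log loss. It is not this module's implementation: the name `sketch_log_loss` is hypothetical, labels are assumed to be integer class indices `0..K-1`, and plain Python lists stand in for arrays so the example has no dependencies. The parameters mirror the signature above.

```python
import math

def sketch_log_loss(y_true, y_pred, eps=1e-15, normalize=True, sample_weight=None):
    """Hypothetical sketch: cross-entropy for integer labels 0..K-1.

    y_pred is a list of rows, each holding one probability per class.
    """
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    total = 0.0
    for label, row, weight in zip(y_true, y_pred, sample_weight):
        # Clip to [eps, 1 - eps] to avoid log(0), then renormalize the row.
        clipped = [min(max(p, eps), 1.0 - eps) for p in row]
        p_true = clipped[label] / sum(clipped)
        total += -weight * math.log(p_true)
    # normalize=True returns the weighted mean, otherwise the weighted sum.
    return total / sum(sample_weight) if normalize else total
```

Because the class set comes from the columns of `y_pred` rather than from the labels observed in `y_true`, this sketch does not depend on which classes happen to be present in a given test set.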
-
deepinsight.doctor.utils.metrics.log_odds(array, clip_min=0.0, clip_max=1.0)¶ Compute the log-odds of each element of a numpy array: log_odds = log(p / (1 - p)), with p a probability.
Parameters: array (numpy array); clip_min (float), minimum value; clip_max (float), maximum value.
Returns: a numpy array with the same dimensions as the input array.
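The log-odds transform can be sketched as follows. This is an illustration only: `sketch_log_odds` is a hypothetical name, a plain list stands in for the numpy array, and the sketch assumes that `clip_min` and `clip_max` are applied to the input probabilities before the transform, which the docstring does not spell out.

```python
import math

def sketch_log_odds(values, clip_min=0.0, clip_max=1.0):
    """Hypothetical sketch: log(p / (1 - p)) for each probability p."""
    result = []
    for p in values:
        # Assumption: clipping is applied to the probability itself.
        p = min(max(p, clip_min), clip_max)
        if p <= 0.0:
            result.append(-math.inf)  # log-odds of an impossible event
        elif p >= 1.0:
            result.append(math.inf)   # log-odds of a certain event
        else:
            result.append(math.log(p / (1.0 - p)))
    return result
```

With the default bounds of 0 and 1, probabilities of exactly 0 or 1 map to infinite values; passing tighter bounds such as `clip_min=0.01, clip_max=0.99` keeps the output finite.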
-
deepinsight.doctor.utils.metrics.mcalibration_loss(y_true, y_pred, sample_weight=None)¶
-
deepinsight.doctor.utils.metrics.mean_absolute_percentage_error(y_true, y_pred, sample_weight=None)¶
-
deepinsight.doctor.utils.metrics.mroc_auc_score(y_true, y_predictions, sample_weight=None)¶ Returns an AUC score. Handles multi-class problems.
For multi-class problems, the AUC score is in fact the MAUC score described in:
David J. Hand and Robert J. Till. 2001. A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems. Mach. Learn. 45, 2 (October 2001), 171-186. DOI=10.1023/A:1010920819831
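The Hand and Till measure averages a pairwise AUC over all unordered pairs of classes. The sketch below illustrates that definition under stated assumptions and is not the code of `mroc_auc_score`: the names `sketch_mauc` and `_auc_one_vs_other` are hypothetical, labels are assumed to be integers `0..K-1` that all occur at least once, sample weights are ignored, and the pairwise AUC is computed with a simple quadratic loop instead of ranks.

```python
from itertools import combinations

def _auc_one_vs_other(scores_pos, scores_neg):
    """P(score of a positive > score of a negative), ties counted as 1/2."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def sketch_mauc(y_true, y_prob):
    """Hypothetical sketch of the Hand & Till (2001) M measure.

    y_true: integer labels 0..K-1; y_prob: rows of K class probabilities.
    """
    n_classes = len(y_prob[0])
    pair_aucs = []
    for i, j in combinations(range(n_classes), 2):
        # A(i|j): separate class i from class j using the class-i probability.
        a_ij = _auc_one_vs_other(
            [p[i] for y, p in zip(y_true, y_prob) if y == i],
            [p[i] for y, p in zip(y_true, y_prob) if y == j],
        )
        # A(j|i): same pair of classes, using the class-j probability.
        a_ji = _auc_one_vs_other(
            [p[j] for y, p in zip(y_true, y_prob) if y == j],
            [p[j] for y, p in zip(y_true, y_prob) if y == i],
        )
        pair_aucs.append((a_ij + a_ji) / 2.0)
    # Mean over the K(K-1)/2 unordered class pairs.
    return sum(pair_aucs) / len(pair_aucs)
```

A perfectly separating classifier scores 1.0, and a classifier that outputs the same probabilities for every sample scores 0.5, as with the binary AUC.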
-
deepinsight.doctor.utils.metrics.rmse_score(y, y_pred, sample_weight=None)¶ Root Mean Square Error, more readable than MSE
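RMSE is easier to read than MSE because it is expressed in the same units as the target. A minimal sketch of a weighted RMSE follows; `sketch_rmse` is a hypothetical name, and the way `sample_weight` enters (a weighted mean of squared errors) is an assumption.

```python
import math

def sketch_rmse(y, y_pred, sample_weight=None):
    """Hypothetical sketch: square root of the weighted mean squared error."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y)
    squared = sum(w * (a - b) ** 2 for a, b, w in zip(y, y_pred, sample_weight))
    return math.sqrt(squared / sum(sample_weight))
```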
-
deepinsight.doctor.utils.metrics.rmsle_score(y, y_pred, sample_weight=None)¶ Root Mean Square Logarithmic Error https://www.kaggle.com/wiki/RootMeanSquaredLogarithmicError
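RMSLE applies the RMSE formula to `log(1 + value)` instead of the raw values, so it penalizes relative rather than absolute errors. A minimal sketch, with the hypothetical name `sketch_rmsle`, assuming all values are greater than -1 and that `sample_weight` defines a weighted mean:

```python
import math

def sketch_rmsle(y, y_pred, sample_weight=None):
    """Hypothetical sketch: RMSE computed on log(1 + value)."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y)
    squared = sum(
        w * (math.log1p(b) - math.log1p(a)) ** 2
        for a, b, w in zip(y, y_pred, sample_weight)
    )
    return math.sqrt(squared / sum(sample_weight))
```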
deepinsight.doctor.utils.split module¶
-
deepinsight.doctor.utils.split.df_from_split_desc(split_desc, split, feature_params, prediction_type=None)¶
-
deepinsight.doctor.utils.split.df_from_split_desc_no_normalization(split_desc, split, feature_params, prediction_type=None)¶
deepinsight.doctor.utils.subsampler module¶
-
class deepinsight.doctor.utils.subsampler.Subsampler(df, variable, sampling_type='stratified', ratio=0.1)¶ Bases: object
-
balanced_subsampling()¶ Subsample targeting the representation of clusters in a scatter plot. This has no statistical property whatsoever.
Proper stratified subsampling may lead to clusters with too few samples to be visible.
This method tries to draw the same number of points for each class.
The number of rows output is about ratio * nb_rows.
TODO: we may want to change this code to make big clusters actually look big.
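The idea of balanced subsampling can be sketched as follows. This is an illustration under assumptions, not the method's code: `sketch_balanced_subsample`, its `key` and `seed` parameters, and the use of a list of dicts in place of a dataframe are all hypothetical, and the budget is simply split evenly across categories.

```python
import random
from collections import defaultdict

def sketch_balanced_subsample(rows, key, ratio=0.1, seed=0):
    """Hypothetical sketch: aim for the same number of rows per category."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    # Equal share of the overall budget (ratio * nb_rows) for every category.
    per_class = max(1, round(ratio * len(rows) / len(groups)))
    sample = []
    for members in groups.values():
        # Categories smaller than their share contribute everything they have.
        sample.extend(rng.sample(members, min(per_class, len(members))))
    return sample
```

With 90 rows of one class and 10 of another and `ratio=0.2`, this draws 10 rows from each class, so the small cluster stays as visible as the large one.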
-
cluster_sampling()¶ Sample on the categories themselves.
Select a proportion (prop) of the categories.
-
run()¶
-
stratified_forced_subsampling()¶ Pick samples from each category proportionally, but force a minimal sample size per category.
-
stratified_subsampling()¶ Pick samples from each category proportionally.
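The two stratified variants above differ only in whether a minimum sample size per category is enforced. A minimal sketch covering both, with every name hypothetical (`sketch_stratified_subsample`, `key`, `min_per_class`, `seed`) and a list of dicts standing in for the dataframe:

```python
import random
from collections import defaultdict

def sketch_stratified_subsample(rows, key, ratio=0.1, min_per_class=0, seed=0):
    """Hypothetical sketch: proportional sampling with an optional floor."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sample = []
    for members in groups.values():
        # Proportional share, raised to the floor, capped by the category size.
        n = max(round(ratio * len(members)), min_per_class)
        sample.extend(rng.sample(members, min(n, len(members))))
    return sample
```

With `min_per_class=0` the sketch is plain proportional sampling; a positive floor corresponds to the forced variant, in which rare categories are over-represented relative to their true share.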
-
-
deepinsight.doctor.utils.subsampler.subsample(df, variable, sampling_type='stratified', ratio=0.1)¶
Module contents¶
-
deepinsight.doctor.utils.datetime_to_epoch(series)¶
-
deepinsight.doctor.utils.dt_isnan(val)¶ Safe isnan that accepts non-numeric values
-
deepinsight.doctor.utils.dt_nonan(val)¶ Replaces numerical NaNs with None
-
deepinsight.doctor.utils.dt_nonaninf(val)¶ Replaces numerical NaNs and Infs with None
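The behaviour described for the three helpers above can be sketched as follows. The `sketch_*` names are hypothetical and the real functions may treat other numeric types differently; the sketch only considers Python floats.

```python
import math

def sketch_isnan(val):
    """True only for a float NaN; strings, None, etc. are simply not NaN."""
    return isinstance(val, float) and math.isnan(val)

def sketch_nonan(val):
    """Return None for NaN, and the value unchanged otherwise."""
    return None if sketch_isnan(val) else val

def sketch_nonaninf(val):
    """Return None for NaN and for +/- infinity, the value unchanged otherwise."""
    if isinstance(val, float) and (math.isnan(val) or math.isinf(val)):
        return None
    return val
```

Calling `math.isnan("abc")` directly raises a `TypeError`, which is why the type check comes first.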
-
deepinsight.doctor.utils.dt_write_mode_for_pickling()¶
-
deepinsight.doctor.utils.make_running_traininfo(folder, start_time, listener)¶
-
deepinsight.doctor.utils.merge_listeners(plistener, mlistener)¶
-
deepinsight.doctor.utils.ml_dtype_from_deepinsight_column(schema_column, feature_type, feature_role, prediction_type=None)¶
-
deepinsight.doctor.utils.ml_dtypes_from_deepinsight_schema(schema, params, prediction_type=None)¶
-
deepinsight.doctor.utils.normalize_dataframe(df, params, missing_columns='ERROR')¶ Normalizes a dataframe so that it can be used as input for a preprocessing pipeline. You should not have to add anything here …
Does 2 things:
- Adds missing columns (for API node)
- Converts datetimes to epoch
-
deepinsight.doctor.utils.remove_all_nan(obj)¶ Removes all nan values from an object, recursively. No thanks to the stupid JSON spec
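The JSON specification has no literal for NaN, so NaN values have to be dealt with before serialization. The sketch below shows one way to do this recursively. It is an assumption-laden illustration: `sketch_remove_all_nan` is a hypothetical name, and it replaces NaN with `None` (serialized as `null`), whereas the docstring does not say whether the real function replaces values or drops them.

```python
import json
import math

def sketch_remove_all_nan(obj):
    """Hypothetical sketch: recursively replace float NaN with None."""
    if isinstance(obj, dict):
        return {k: sketch_remove_all_nan(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [sketch_remove_all_nan(v) for v in obj]
    if isinstance(obj, float) and math.isnan(obj):
        return None
    return obj

# allow_nan=False makes json.dumps raise on NaN, so this call confirms
# that the cleaned object is strictly valid JSON.
cleaned = sketch_remove_all_nan({"a": [1.0, float("nan")], "b": {"c": float("nan")}})
serialized = json.dumps(cleaned, allow_nan=False)
```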
-
deepinsight.doctor.utils.strip_accents(s)¶
-
deepinsight.doctor.utils.update_gridsearch_info(folder, grid_search_scores)¶
-
deepinsight.doctor.utils.write_done_traininfo(folder, start_time, start_training_time, end_time, listener, end_preprocessing_time=None)¶
-
deepinsight.doctor.utils.write_model_status(modeling_set, status)¶
-
deepinsight.doctor.utils.write_preproc_file(run_folder, filename, obj)¶
-
deepinsight.doctor.utils.write_running_traininfo(folder, start_time, listener)¶