deepinsight.doctor.utils package
Submodules
deepinsight.doctor.utils.calibration module
deepinsight.doctor.utils.calibration.dt_calibration_curve(y_true, y_prob, sample_weight=None, n_bins=10, pos_label=None)
deepinsight.doctor.utils.calibration.dt_calibration_loss(freqs, avg_preds, weights, reducer='sum', normalize=True)
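Only the signatures of these two functions are documented. The sketch below shows one way a weighted, binned calibration curve and a loss over its bins could fit together. Everything about the internals is an assumption: equal-width bins on [0, 1], a curve that returns `(freqs, avg_preds, weights)` for non-empty bins, a `pos_label` default of 1, and a loss that is the weighted absolute gap between observed frequency and mean prediction. The helper names `calibration_curve_sketch` and `calibration_loss_sketch` are hypothetical.

```python
import numpy as np

# Hypothetical sketch, not the deepinsight implementation.
def calibration_curve_sketch(y_true, y_prob, sample_weight=None, n_bins=10, pos_label=1):
    """Weighted calibration curve over equal-width bins on [0, 1] (assumed binning)."""
    is_pos = (np.asarray(y_true) == pos_label).astype(float)
    y_prob = np.asarray(y_prob, dtype=float)
    w = np.ones_like(y_prob) if sample_weight is None else np.asarray(sample_weight, dtype=float)
    # Bin index in 0..n_bins-1; a probability of exactly 1.0 goes into the last bin.
    bins = np.minimum((y_prob * n_bins).astype(int), n_bins - 1)
    weights = np.bincount(bins, weights=w, minlength=n_bins)
    pos = np.bincount(bins, weights=w * is_pos, minlength=n_bins)
    pred = np.bincount(bins, weights=w * y_prob, minlength=n_bins)
    keep = weights > 0  # drop empty bins
    freqs = pos[keep] / weights[keep]       # observed positive rate per bin
    avg_preds = pred[keep] / weights[keep]  # mean predicted probability per bin
    return freqs, avg_preds, weights[keep]


def calibration_loss_sketch(freqs, avg_preds, weights, reducer="sum", normalize=True):
    """Gap between observed frequency and mean prediction, aggregated over bins."""
    gaps = np.abs(np.asarray(freqs, dtype=float) - np.asarray(avg_preds, dtype=float))
    weights = np.asarray(weights, dtype=float)
    if reducer == "sum":
        # With normalize=True this is the expected calibration error (ECE).
        w = weights / weights.sum() if normalize else weights
        return float(np.sum(w * gaps))
    if reducer == "max":
        return float(np.max(gaps))  # maximum calibration error over non-empty bins
    raise ValueError(f"unknown reducer: {reducer!r}")
```

A perfectly calibrated prediction gives a loss of 0; the weighted sum keeps sparsely populated bins from dominating the score.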
deepinsight.doctor.utils.crossval module
deepinsight.doctor.utils.dataframe_cache module
deepinsight.doctor.utils.dataframe_cache.clear_cache()
deepinsight.doctor.utils.dataframe_cache.get_dataframe(dataset, *args, **kwargs)
deepinsight.doctor.utils.dataframe_cache.hashablify(c)
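The three names suggest a memoizing cache: `hashablify` makes call arguments usable as dictionary keys, `get_dataframe` loads once per distinct argument set, and `clear_cache` empties the cache. The sketch below illustrates that pattern under those assumptions. The `load_fn` parameter and the `_sketch` helpers are hypothetical, and the assumption that `dataset` is itself hashable (for example a dataset name) is not confirmed by the source.

```python
# Hypothetical sketch of a memoizing dataframe cache; not the deepinsight source.
_cache = {}


def hashablify(c):
    """Recursively convert dicts, lists and sets into tuples usable as dict keys."""
    if isinstance(c, dict):
        # Keys are unique, so sorting the (key, value) pairs only compares keys.
        return tuple(sorted((k, hashablify(v)) for k, v in c.items()))
    if isinstance(c, (list, tuple)):
        return tuple(hashablify(v) for v in c)
    if isinstance(c, (set, frozenset)):
        return tuple(sorted(hashablify(v) for v in c))  # elements must be comparable
    return c


def get_dataframe_sketch(dataset, load_fn, *args, **kwargs):
    """Return a cached result of load_fn(dataset, ...), loading it on first use."""
    key = (dataset, hashablify(args), hashablify(kwargs))
    if key not in _cache:
        _cache[key] = load_fn(dataset, *args, **kwargs)
    return _cache[key]


def clear_cache_sketch():
    """Drop every cached entry."""
    _cache.clear()
```

Sorting dictionary items makes `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` map to the same key, so keyword order does not cause spurious cache misses.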
deepinsight.doctor.utils.interrupt_optimization module
deepinsight.doctor.utils.interrupt_optimization.create_interrupt_file()
deepinsight.doctor.utils.interrupt_optimization.must_interrupt()
deepinsight.doctor.utils.interrupt_optimization.set_before_interrupt_check_callback(new_callback)
deepinsight.doctor.utils.interrupt_optimization.set_interrupt_folder(folder_p)
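These signatures suggest a file-based interrupt flag: a folder is registered, another process creates a marker file in it, and the optimization loop polls `must_interrupt()`, optionally running a callback before each check. The sketch below reuses the documented names for readability, but its bodies are assumptions, and the marker file name `INTERRUPT_FILENAME` is hypothetical.

```python
import os

# Hypothetical re-implementation for illustration; not the deepinsight source.
INTERRUPT_FILENAME = "interrupt"  # hypothetical marker file name
_state = {"folder": None, "callback": None}


def set_interrupt_folder(folder_p):
    _state["folder"] = folder_p


def set_before_interrupt_check_callback(new_callback):
    _state["callback"] = new_callback


def _interrupt_path():
    return os.path.join(_state["folder"], INTERRUPT_FILENAME)


def create_interrupt_file():
    # An empty file is enough: only its existence is checked.
    with open(_interrupt_path(), "w"):
        pass


def must_interrupt():
    if _state["callback"] is not None:
        _state["callback"]()  # e.g. flush progress before deciding
    return _state["folder"] is not None and os.path.exists(_interrupt_path())
```

A file on disk works across processes without shared memory, which fits a setup where a separate UI process asks a training job to stop.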
deepinsight.doctor.utils.lift_curve module
deepinsight.doctor.utils.listener module
class deepinsight.doctor.utils.listener.ExitState(listener)
    Bases: object
class deepinsight.doctor.utils.listener.ProgressListener(verbose=True)
    Bases: object
    add_future_step(name)
    add_future_steps(names)
    pop_state()
    push_state(name, target=None)
    reset()
    set_current_progress(progress)
    to_jsonifiable()
deepinsight.doctor.utils.listener.unix_time_millis()
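The listener API (announced future steps, a push/pop state stack, millisecond timestamps, and an `ExitState(listener)` helper) suggests nested, timed progress states. The sketch below assumes that `ExitState` is a context manager that pops the current state on exit and that `push_state` returns it; neither is confirmed by the source. `ProgressListenerSketch` and `ExitStateSketch` are hypothetical names, and the `target` and `verbose` parameters are omitted.

```python
import time


def unix_time_millis():
    """Milliseconds since the Unix epoch, as an int."""
    return int(time.time() * 1000)


class ExitStateSketch:
    """Context manager that pops the listener's current state on exit (assumed role)."""

    def __init__(self, listener):
        self.listener = listener

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.listener.pop_state()
        return False  # never swallow exceptions


class ProgressListenerSketch:
    def __init__(self):
        self.future = []  # announced step names not started yet
        self.stack = []   # open states, innermost last
        self.done = []    # finished states with their duration in ms

    def add_future_step(self, name):
        self.future.append(name)

    def add_future_steps(self, names):
        for name in names:
            self.add_future_step(name)

    def push_state(self, name):
        if name in self.future:
            self.future.remove(name)
        self.stack.append({"name": name, "start": unix_time_millis()})
        return ExitStateSketch(self)

    def pop_state(self):
        state = self.stack.pop()
        state["time_ms"] = unix_time_millis() - state.pop("start")
        self.done.append(state)

    def to_jsonifiable(self):
        return {
            "future": list(self.future),
            "running": [s["name"] for s in self.stack],
            "done": [dict(s) for s in self.done],
        }
```

Usage: `with listener.push_state("fit"): ...` guarantees the state is popped even if the block raises.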
deepinsight.doctor.utils.metrics module
deepinsight.doctor.utils.metrics.check_test_set_ok_for_classification(y_true)
deepinsight.doctor.utils.metrics.log_loss(y_true, y_pred, eps=1e-15, normalize=True, sample_weight=None)
    Log loss, also known as logistic loss or cross-entropy loss.
    The scikit-learn version is buggy when a class never appears in the predictions.
deepinsight.doctor.utils.metrics.log_odds(array, clip_min=0.0, clip_max=1.0)
    Compute the log-odds of each element of a numpy array: log_odds = log(p / (1 - p)), with p a probability.
    :param array: (numpy array) input probabilities
    :param clip_min: (float) minimum clipping value
    :param clip_max: (float) maximum clipping value
    :return: a numpy array with the same shape as the input array
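A minimal sketch of the log-odds formula, assuming the input is clipped to `[clip_min, clip_max]` before the transform. `log_odds_sketch` is a hypothetical name. With the default bounds 0.0 and 1.0, endpoints map to -inf and +inf; the sketch silences NumPy's divide-by-zero warnings for those cases.

```python
import numpy as np


def log_odds_sketch(array, clip_min=0.0, clip_max=1.0):
    """Element-wise log(p / (1 - p)) after clipping p to [clip_min, clip_max]."""
    p = np.clip(np.asarray(array, dtype=float), clip_min, clip_max)
    # p == 0 gives log(0) = -inf and p == 1 gives log(1 / 0) = +inf; suppress the warnings.
    with np.errstate(divide="ignore"):
        return np.log(p / (1.0 - p))
```

For example, p = 0.75 has odds 3, so its log-odds is log(3), and p = 0.5 maps to 0. Passing bounds such as `clip_min=1e-6, clip_max=1 - 1e-6` keeps the output finite.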
deepinsight.doctor.utils.metrics.mcalibration_loss(y_true, y_pred, sample_weight=None)
deepinsight.doctor.utils.metrics.mean_absolute_percentage_error(y_true, y_pred, sample_weight=None)
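The mean absolute percentage error is usually defined as the mean of |y_true - y_pred| / |y_true|. A minimal weighted sketch, assuming `sample_weight` gives a weighted mean and the result is a fraction rather than a percentage; `mape_sketch` is a hypothetical name, and how the documented function handles zero targets is not stated in the source.

```python
import numpy as np


def mape_sketch(y_true, y_pred, sample_weight=None):
    """Weighted mean of |y_true - y_pred| / |y_true|, as a fraction (0.1 == 10%)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Zero targets give inf/nan here; the documented function's behaviour is unknown.
    ape = np.abs(y_true - y_pred) / np.abs(y_true)
    return float(np.average(ape, weights=sample_weight))  # weights=None -> plain mean
```

For targets [100, 200] and predictions [110, 180], both errors are 10%, so the score is 0.1.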
deepinsight.doctor.utils.metrics.mroc_auc_score(y_true, y_predictions, sample_weight=None)
    Returns an AUC score. Handles multi-class problems.
    For multi-class problems, the AUC score is in fact the MAUC score described in:
    David J. Hand and Robert J. Till. 2001. A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems. Mach. Learn. 45, 2 (October 2001), 171-186. DOI=10.1023/A:1010920819831
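Hand and Till's M measure averages a symmetric pairwise AUC over all class pairs: for classes i and j, A(i|j) is the probability that a random member of class i gets a higher class-i score than a random member of class j (ties count one half), and the pair score is (A(i|j) + A(j|i)) / 2. The sketch below follows that definition with a direct O(n_i * n_j) comparison. It is not the deepinsight code. It assumes integer labels that index the columns of `y_proba` and ignores `sample_weight`. `pairwise_auc` and `mauc_sketch` are hypothetical names.

```python
import itertools

import numpy as np


def pairwise_auc(scores_pos, scores_neg):
    """P(score of a positive > score of a negative), counting ties as 1/2."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


def mauc_sketch(y_true, y_proba):
    """Hand & Till (2001) M measure; y_proba[:, k] is the probability of class k."""
    y_true = np.asarray(y_true)
    y_proba = np.asarray(y_proba, dtype=float)
    classes = np.unique(y_true)  # assumption: labels are column indices of y_proba
    if len(classes) < 2:
        raise ValueError("y_true must contain at least two classes")
    pair_scores = []
    for i, j in itertools.combinations(classes, 2):
        in_i, in_j = y_true == i, y_true == j
        a_ij = pairwise_auc(y_proba[in_i, i], y_proba[in_j, i])  # A(i|j)
        a_ji = pairwise_auc(y_proba[in_j, j], y_proba[in_i, j])  # A(j|i)
        pair_scores.append((a_ij + a_ji) / 2.0)
    # Mean over the c(c-1)/2 pairs equals 2 / (c(c-1)) * sum of pair scores.
    return float(np.mean(pair_scores))
```

With two classes the M measure reduces to the ordinary ROC AUC, which is a convenient sanity check.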
deepinsight.doctor.utils.metrics.rmse_score(y, y_pred, sample_weight=None)
    Root mean square error. It is more readable than MSE because it is expressed in the same unit as the target.
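A minimal sketch of a weighted RMSE, assuming `sample_weight` turns the mean of squared errors into a weighted mean; `rmse_sketch` is a hypothetical name. Taking the square root brings the error back to the unit of the target, which is what makes RMSE easier to read than MSE.

```python
import numpy as np


def rmse_sketch(y, y_pred, sample_weight=None):
    """sqrt(weighted mean of (y - y_pred)^2); weights=None gives the plain mean."""
    err = np.asarray(y, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.average(err ** 2, weights=sample_weight)))
```

Predicting 3 for two targets of 0 gives an RMSE of 3, in the target's own unit, whereas the MSE would be 9.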
deepinsight.doctor.utils.metrics.rmsle_score(y, y_pred, sample_weight=None)
    Root mean square logarithmic error: https://www.kaggle.com/wiki/RootMeanSquaredLogarithmicError
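RMSLE is the RMSE computed on log(1 + value), so it penalizes relative rather than absolute errors. A minimal sketch, assuming the same weighted-mean treatment of `sample_weight` as for RMSE; `rmsle_sketch` is a hypothetical name, and inputs must be greater than -1 for `log1p` to be defined.

```python
import numpy as np


def rmsle_sketch(y, y_pred, sample_weight=None):
    """sqrt(weighted mean of (log(1 + y_pred) - log(1 + y))^2)."""
    log_err = np.log1p(np.asarray(y_pred, dtype=float)) - np.log1p(np.asarray(y, dtype=float))
    return float(np.sqrt(np.average(log_err ** 2, weights=sample_weight)))
```

`np.log1p` is used instead of `np.log(1 + x)` because it stays accurate for values close to zero.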
deepinsight.doctor.utils.split module
deepinsight.doctor.utils.split.df_from_split_desc(split_desc, split, feature_params, prediction_type=None)
deepinsight.doctor.utils.split.df_from_split_desc_no_normalization(split_desc, split, feature_params, prediction_type=None)
deepinsight.doctor.utils.subsampler module
class deepinsight.doctor.utils.subsampler.Subsampler(df, variable, sampling_type='stratified', ratio=0.1)
    Bases: object
    balanced_subsampling()
        Subsample so that every cluster stays visible in a scatter plot. This method has no statistical property whatsoever.
        Proper stratified subsampling may leave a cluster with too few samples to be visible.
        This method tries to keep the same number of points for each class.
        The number of output rows is about ratio * nb_rows.
        # TODO: we may want to change this code to make big clusters actually look big.
    cluster_sampling()
        Sample on the categories themselves: select a proportion (prop) of the categories.
    run()
    stratified_forced_subsampling()
        Pick samples from each category proportionally, but force a minimum sample size per category.
    stratified_subsampling()
        Pick samples from each category proportionally.
deepinsight.doctor.utils.subsampler.subsample(df, variable, sampling_type='stratified', ratio=0.1)
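The three per-category strategies described above can be contrasted with pandas. This is a sketch, not the `Subsampler` code. The function names, the `seed` argument and the `min_per_category` floor used for forced stratification are hypothetical, since the source does not say how the minimum is chosen.

```python
import pandas as pd


def stratified_sketch(df, variable, ratio=0.1, seed=0):
    """Keep about `ratio` of each category, so proportions are preserved."""
    parts = [g.sample(frac=ratio, random_state=seed) for _, g in df.groupby(variable)]
    return pd.concat(parts)


def stratified_forced_sketch(df, variable, ratio=0.1, min_per_category=5, seed=0):
    """Proportional, but each category keeps at least min_per_category rows (if it has them)."""
    parts = []
    for _, g in df.groupby(variable):
        n = min(len(g), max(min_per_category, round(ratio * len(g))))
        parts.append(g.sample(n=n, random_state=seed))
    return pd.concat(parts)


def balanced_sketch(df, variable, ratio=0.1, seed=0):
    """Same number of rows per category; total is about ratio * len(df)."""
    groups = df.groupby(variable)
    per_category = max(1, round(ratio * len(df) / groups.ngroups))
    parts = [g.sample(n=min(len(g), per_category), random_state=seed) for _, g in groups]
    return pd.concat(parts)
```

On 100 rows of category "a" and 20 of "b" with ratio 0.1, stratified sampling keeps 10 and 2 rows, the forced variant (minimum 5) keeps 10 and 5, and balanced sampling keeps 6 and 6. This is why the balanced variant keeps small clusters visible but makes big clusters look no bigger than small ones.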
Module contents
deepinsight.doctor.utils.datetime_to_epoch(series)
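A minimal sketch of converting a datetime series to epoch values with pandas. The unit (seconds, as floats), the treatment of naive timestamps as UTC, and NaT becoming NaN are all assumptions; the source does not state them. `datetime_to_epoch_sketch` is a hypothetical name.

```python
import pandas as pd


def datetime_to_epoch_sketch(series):
    """Convert a datetime-like Series to seconds since 1970-01-01 UTC; NaT becomes NaN."""
    s = pd.to_datetime(series)
    # Naive timestamps are treated as UTC; tz-aware ones are compared against a UTC epoch.
    if s.dt.tz is not None:
        epoch = pd.Timestamp("1970-01-01", tz="UTC")
    else:
        epoch = pd.Timestamp("1970-01-01")
    return (s - epoch).dt.total_seconds()
```

Representing dates as plain numbers lets downstream numeric preprocessing (scaling, imputation) treat them like any other numeric feature.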
deepinsight.doctor.utils.dt_isnan(val)
    Safe isnan that accepts non-numeric values.
deepinsight.doctor.utils.dt_nonan(val)
    Replaces numerical NaNs with None.
deepinsight.doctor.utils.dt_nonaninf(val)
    Replaces numerical NaNs and infinities with None.
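The three helpers above can be sketched with the standard library, assuming "safe" means returning False instead of raising `TypeError` on non-numeric input. The `_sketch` names are hypothetical.

```python
import math


def dt_isnan_sketch(val):
    """math.isnan that returns False for non-numeric values instead of raising."""
    try:
        return math.isnan(val)
    except TypeError:
        return False


def dt_nonan_sketch(val):
    """Replace a numerical NaN with None; leave everything else unchanged."""
    return None if dt_isnan_sketch(val) else val


def dt_nonaninf_sketch(val):
    """Replace a numerical NaN or +/-inf with None; leave everything else unchanged."""
    if dt_isnan_sketch(val):
        return None
    try:
        if math.isinf(val):
            return None
    except TypeError:
        pass  # non-numeric values pass through
    return val
```

Mapping these values to None matters because `None` serializes to JSON `null`, while NaN and infinity have no standard JSON representation.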
deepinsight.doctor.utils.dt_write_mode_for_pickling()
deepinsight.doctor.utils.make_running_traininfo(folder, start_time, listener)
deepinsight.doctor.utils.merge_listeners(plistener, mlistener)
deepinsight.doctor.utils.ml_dtype_from_deepinsight_column(schema_column, feature_type, feature_role, prediction_type=None)
deepinsight.doctor.utils.ml_dtypes_from_deepinsight_schema(schema, params, prediction_type=None)
deepinsight.doctor.utils.normalize_dataframe(df, params, missing_columns='ERROR')
    Normalizes a dataframe so that it can be used as input for a preprocessing pipeline. You should not have to add anything here …
    It does two things:
    - adds missing columns (for the API node);
    - converts datetimes to epoch.
deepinsight.doctor.utils.remove_all_nan(obj)
    Recursively removes all NaN values from an object. This is needed because the JSON specification has no representation for NaN.
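A minimal recursive sketch, assuming "removes" means replacing each NaN with None (which serializes as JSON `null`). The documented function might instead drop the entries. `remove_all_nan_sketch` is a hypothetical name.

```python
import json
import math


def remove_all_nan_sketch(obj):
    """Return a copy of obj with every float NaN replaced by None, at any depth."""
    if isinstance(obj, float) and math.isnan(obj):  # also covers numpy.float64
        return None
    if isinstance(obj, dict):
        return {k: remove_all_nan_sketch(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [remove_all_nan_sketch(v) for v in obj]
    return obj
```

Passing `allow_nan=False` to `json.dumps` is a quick way to verify the result: it raises `ValueError` if any NaN survived.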
deepinsight.doctor.utils.strip_accents(s)
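The function is undocumented. A common way to strip accents in Python is to decompose the string with Unicode normalization and drop the combining marks. The sketch below uses NFKD for this. That the documented function works this way is an assumption, and `strip_accents_sketch` is a hypothetical name.

```python
import unicodedata


def strip_accents_sketch(s):
    """Remove diacritics: decompose (NFKD) and drop combining characters."""
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

NFKD also applies compatibility mappings (for example the ligature "ﬁ" becomes "fi"); NFD would strip accents while leaving such characters intact.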
deepinsight.doctor.utils.update_gridsearch_info(folder, grid_search_scores)
deepinsight.doctor.utils.write_done_traininfo(folder, start_time, start_training_time, end_time, listener, end_preprocessing_time=None)
deepinsight.doctor.utils.write_model_status(modeling_set, status)
deepinsight.doctor.utils.write_preproc_file(run_folder, filename, obj)
deepinsight.doctor.utils.write_running_traininfo(folder, start_time, listener)