Parallelize the fit and decision_function methods of FeatureBagging (!197) · Yue Zhao / pyod

Status: Open. Shihab Shahriar Khan requested to merge github/fork/Shihab-Shahriar/parallel_feat_bag into development on May 24, 2020.

This PR parallelizes the fit and decision_function methods of FeatureBagging. The earlier implementation only used n_jobs when the base_estimator parameter was None. Besides fixing that, this PR applies parallelism at a coarser, model level (across the base estimators rather than within each one), which noticeably improves performance.
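The idea of model-level parallelism can be illustrated with a minimal, stdlib-only sketch. This is not the code in this PR. All feature subsets are drawn up front and split into n_jobs chunks, and each worker fits (and later scores) a whole chunk of base estimators. MeanDistanceDetector, partition, parallel_fit and parallel_decision_function are hypothetical names. The toy detector stands in for LOF, and both the subset sizes (between d//2 and d-1 features) and plain score averaging are assumptions of the sketch. ThreadPoolExecutor is used only to keep it dependency-free. For CPU-bound estimators in CPython, a process-based backend (for example joblib's default loky backend) would be needed for real speedups.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from math import sqrt


class MeanDistanceDetector:
    """Hypothetical toy base detector (assumption, stands in for LOF):
    scores a sample by its Euclidean distance to the training mean
    on the selected feature subset."""

    def __init__(self, features):
        self.features = features  # column indices this detector sees

    def fit(self, X):
        cols = [[row[j] for row in X] for j in self.features]
        self.means_ = [sum(c) / len(c) for c in cols]
        return self

    def decision_function(self, X):
        return [
            sqrt(sum((row[j] - m) ** 2 for j, m in zip(self.features, self.means_)))
            for row in X
        ]


def partition(items, n_jobs):
    """Split items into at most n_jobs contiguous, near-equal chunks."""
    n_jobs = max(1, min(n_jobs, len(items)))
    size, extra = divmod(len(items), n_jobs)
    chunks, start = [], 0
    for i in range(n_jobs):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def _fit_chunk(X, feature_sets):
    # One task fits a whole chunk of estimators, not a single estimator.
    return [MeanDistanceDetector(f).fit(X) for f in feature_sets]


def _score_chunk(X, estimators):
    # One task scores X with every estimator in its chunk.
    return [est.decision_function(X) for est in estimators]


def parallel_fit(X, n_estimators=20, n_jobs=4, seed=0):
    rng = random.Random(seed)
    d = len(X[0])
    # Draw all feature subsets in the caller so results do not depend on n_jobs.
    feature_sets = [
        sorted(rng.sample(range(d), rng.randint(d // 2, d - 1)))
        for _ in range(n_estimators)
    ]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        futures = [pool.submit(_fit_chunk, X, c) for c in partition(feature_sets, n_jobs)]
        # Concatenate chunks in submission order to keep estimator order stable.
        return [est for fut in futures for est in fut.result()]


def parallel_decision_function(X, estimators, n_jobs=4):
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        futures = [pool.submit(_score_chunk, X, c) for c in partition(estimators, n_jobs)]
        all_scores = [s for fut in futures for s in fut.result()]
    # Average the per-estimator scores for each sample (assumed combination rule).
    return [sum(col) / len(col) for col in zip(*all_scores)]
```

Submitting one task per chunk rather than one per estimator keeps dispatch overhead at n_jobs tasks per call, and drawing the subsets before dispatching makes the scores identical for any n_jobs.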

Benchmark results with n_estimators=20 and base_estimator=None, averaged over 3 runs. Each cell shows the fit time in seconds, with the decision_function time in parentheses:

Dataset (n_samples, n_features)	Original (n_jobs=1)	Original (n_jobs=4)	This PR (n_jobs=4)
pima (768, 8)	0.19 (0.094)	2.30 (2.155)	0.64 (0.63)
vowels (1456, 12)	0.71 (0.42)	2.36 (2.17)	0.66 (0.64)
pendigits (6870, 16)	9.12 (5.02)	5.87 (4.32)	1.78 (1.42)
musk (3062, 166)	18.92 (8.32)	7.46 (5.88)	3.90 (2.79)
shuttle (49097, 9)	59.09 (38.67)	46.10 (28.11)	33.43 (18.01)
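A minimal timing harness matching the described methodology (fit and decision_function timed separately, averaged over 3 runs) might look like the following. benchmark and make_model are hypothetical names. Timing decision_function on the same data used for fitting is an assumption, since the PR does not say which data was scored.

```python
import statistics
import time


def benchmark(make_model, X, n_runs=3):
    """Return (mean fit seconds, mean decision_function seconds) over n_runs.

    make_model is a hypothetical zero-argument factory returning a fresh,
    unfitted model with fit(X) and decision_function(X) methods.
    """
    fit_times, score_times = [], []
    for _ in range(n_runs):
        model = make_model()  # fresh model each run so no fitted state is reused
        t0 = time.perf_counter()
        model.fit(X)
        t1 = time.perf_counter()
        model.decision_function(X)  # assumption: scored on the training data
        t2 = time.perf_counter()
        fit_times.append(t1 - t0)
        score_times.append(t2 - t1)
    return statistics.mean(fit_times), statistics.mean(score_times)


# Example (requires pyod; not run here):
# from pyod.models.feature_bagging import FeatureBagging
# fit_s, score_s = benchmark(lambda: FeatureBagging(n_estimators=20, n_jobs=4), X)
```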

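On small datasets the parallel version can still be slower than the single-process run: for pima, fit takes 0.64 s with this PR versus 0.19 s with n_jobs=1. I think that is expected, since the overhead of dispatching work to parallel workers likely outweighs the gain when each base estimator is cheap to fit. Compared with the original n_jobs=4 path, this PR is faster on every dataset listed, for both fit and decision_function.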
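The size of the effect can be read off the fit times in the table by computing speedup ratios (values above 1 mean this PR is faster). The snippet only restates the table's numbers. FIT_TIMES and speedups are illustrative names, and decision_function times are left out.

```python
# Fit times in seconds, copied from the benchmark table in this PR.
FIT_TIMES = {
    # dataset: (original n_jobs=1, original n_jobs=4, this PR n_jobs=4)
    "pima": (0.19, 2.30, 0.64),
    "vowels": (0.71, 2.36, 0.66),
    "pendigits": (9.12, 5.87, 1.78),
    "musk": (18.92, 7.46, 3.90),
    "shuttle": (59.09, 46.10, 33.43),
}


def speedups(times):
    """Speedup of this PR over each original configuration (>1 means faster)."""
    serial, orig_parallel, pr = times
    return serial / pr, orig_parallel / pr


for name, times in FIT_TIMES.items():
    vs_serial, vs_orig4 = speedups(times)
    print(f"{name:10s} vs n_jobs=1: {vs_serial:.2f}x   vs original n_jobs=4: {vs_orig4:.2f}x")
```

For pima the ratio against n_jobs=1 is about 0.30 (slower), while pendigits and musk reach roughly 5.1x and 4.9x. Against the original n_jobs=4 code, the ratio ranges from about 1.4x (shuttle) to 3.6x (pima).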

Please let me know if further changes are needed. Thanks.