IForest
ikpykit.anomaly.IForest
IForest(
    n_estimators=100,
    max_samples="auto",
    contamination=0.1,
    max_features=1.0,
    bootstrap=False,
    n_jobs=1,
    random_state=None,
    verbose=0,
)
Bases: OutlierMixin, BaseEstimator
Wrapper of scikit-learn Isolation Forest for anomaly detection.
The IsolationForest 'isolates' observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of the selected feature.
Since recursive partitioning can be represented by a tree structure, the number of splittings required to isolate a sample is equivalent to the path length from the root node to the terminating node.
This path length, averaged over a forest of such random trees, is a measure of normality and serves as the decision function. Random partitioning produces noticeably shorter paths for anomalies. Hence, when a forest of random trees collectively produces shorter path lengths for particular samples, those samples are highly likely to be anomalies.
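The isolation idea described above can be sketched in a few lines of plain Python. This is an illustrative toy, not the code used by IForest or scikit-learn: the helper names `path_length` and `mean_path_length` are hypothetical, every "tree" is grown on the full data set instead of a subsample, and only the branch containing the query point is followed.

```python
import random


def path_length(point, data, rng, max_depth=64):
    """Count the random splits needed to isolate `point` (a member of `data`)."""
    current = data
    depth = 0
    while len(current) > 1 and depth < max_depth:
        # Only features that still vary in the current partition can be split.
        splittable = [
            f for f in range(len(point))
            if min(r[f] for r in current) < max(r[f] for r in current)
        ]
        if not splittable:
            break  # remaining rows are exact duplicates of the point
        f = rng.choice(splittable)
        lo = min(r[f] for r in current)
        hi = max(r[f] for r in current)
        split = rng.uniform(lo, hi)
        # Follow the side of the split that contains the query point.
        if point[f] < split:
            current = [r for r in current if r[f] < split]
        else:
            current = [r for r in current if r[f] >= split]
        depth += 1
    return depth


rng = random.Random(0)
inliers = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(63)]
outlier = (10.0, 10.0)
data = inliers + [outlier]


def mean_path_length(point, n_trees=100):
    """Average the path length over several independent random partitionings."""
    return sum(path_length(point, data, rng) for _ in range(n_trees)) / n_trees


outlier_mean = mean_path_length(outlier)
inlier_mean = sum(mean_path_length(p, n_trees=20) for p in inliers) / len(inliers)
print(f"outlier: {outlier_mean:.2f} splits, inliers: {inlier_mean:.2f} splits")
```

The isolated point far from the cluster needs markedly fewer splits on average than the clustered points, which is the signal the forest aggregates.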
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n_estimators | int | The number of base estimators (trees) in the ensemble. | 100 |
max_samples | int or float | The number of samples to draw from X to train each base estimator. - If int, then draw | "auto" |
contamination | float or 'auto' | The proportion of outliers in the data set. Used to define the threshold on the scores of the samples. - If 'auto', the threshold is determined as in the original paper. - If float, the contamination should be in the range (0, 0.5]. | 0.1 |
max_features | int or float | The number of features to draw from X to train each base estimator. - If int, then draw | 1.0 |
bootstrap | bool | If True, individual trees are fit on random subsets of the training data sampled with replacement. If False, sampling without replacement is performed. | False |
n_jobs | int | The number of jobs to run in parallel for both | 1 |
random_state | int, RandomState instance or None | Controls the pseudo-randomness of the selection of the feature and split values for each branching step and each tree in the forest. Pass an int for reproducible results across multiple function calls. | None |
verbose | int | Controls the verbosity of the tree building process. | 0 |
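To make the role of a float `contamination` concrete, the sketch below picks a threshold so that roughly that fraction of the training scores falls beneath it. scikit-learn's IsolationForest takes a percentile of the training scores in this way; that IForest forwards `contamination` to it unchanged is an assumption here. The function name `contamination_threshold` and the score values are hypothetical.

```python
import math


def contamination_threshold(train_scores, contamination):
    """Return the score below which about `contamination` of the samples fall."""
    if not 0.0 < contamination <= 0.5:
        raise ValueError("contamination must be in the range (0, 0.5]")
    ordered = sorted(train_scores)
    # Linearly interpolated percentile over the sorted scores.
    rank = contamination * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


# Made-up raw scores; lower means more abnormal.
scores = [-0.80, -0.45, -0.44, -0.42, -0.41, -0.40, -0.43, -0.46, -0.47, -0.48]
threshold = contamination_threshold(scores, contamination=0.1)
flagged = [s for s in scores if s - threshold < 0]
print(threshold, flagged)
```

With ten scores and `contamination=0.1`, only the single lowest score ends up below the threshold.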
Attributes:
Name | Type | Description |
---|---|---|
detector_ | IsolationForest | The underlying scikit-learn IsolationForest object. |
is_fitted_ | bool | Indicates whether the estimator has been fitted. |
References
.. [1] Liu, F. T., Ting, K. M., & Zhou, Z. H. (2008, December). "Isolation forest." In 2008 Eighth IEEE International Conference on Data Mining (pp. 413-422). IEEE.
.. [2] Liu, F. T., Ting, K. M., & Zhou, Z. H. (2012). "Isolation-based anomaly detection." ACM Transactions on Knowledge Discovery from Data (TKDD), 6(1), 1-39.
Examples:
>>> from ikpykit.anomaly import IForest
>>> import numpy as np
>>> X = np.array([[-1.1, 0.2], [0.3, 0.5], [0.5, 1.1], [100, 90]])
>>> clf = IForest(contamination=0.25).fit(X)
>>> clf.predict([[0.1, 0.3], [0, 0.7], [90, 85]])
array([ 1,  1, -1])
Source code in ikpykit/anomaly/_iforest.py
fit
fit(X, y=None)
Fit the isolation forest model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | array-like of shape (n_samples, n_features) | The input samples. Use | required |
y | Ignored | Not used, present for API consistency by convention. | None |
Returns:
Name | Type | Description |
---|---|---|
self | object | Fitted estimator. |
Source code in ikpykit/anomaly/_iforest.py
predict
predict(X)
Predict if a particular sample is an outlier or not.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | array-like of shape (n_samples, n_features) | The input samples. | required |
Returns:
Name | Type | Description |
---|---|---|
is_inlier | ndarray of shape (n_samples,) | The predicted labels. +1 for inliers, -1 for outliers. |
Source code in ikpykit/anomaly/_iforest.py
decision_function
decision_function(X)
Compute the anomaly score for each sample.
The anomaly score of an input sample is computed as the mean anomaly score of the trees in the forest.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | array-like of shape (n_samples, n_features) | The input samples. | required |
Returns:
Name | Type | Description |
---|---|---|
scores | ndarray of shape (n_samples,) | The anomaly score of the input samples. The lower, the more abnormal. Negative scores represent outliers, positive scores represent inliers. |
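The sign convention above maps directly onto the +1 / -1 labels that predict returns. A minimal sketch of that mapping follows. The helper `labels_from_decision_scores` is hypothetical, the score values are made up, and treating a score of exactly zero as an inlier is an assumption borrowed from scikit-learn's behaviour.

```python
def labels_from_decision_scores(decision_scores):
    """Map decision scores to labels: -1 for negative scores, +1 otherwise."""
    return [-1 if score < 0 else 1 for score in decision_scores]


# Made-up decision scores for three samples.
labels = labels_from_decision_scores([0.12, 0.03, -0.21])
print(labels)
```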
Source code in ikpykit/anomaly/_iforest.py
score_samples
score_samples(X)
Return the raw anomaly score of samples.
The anomaly score of an input sample is computed as the mean anomaly score of the trees in the forest.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | array-like of shape (n_samples, n_features) | The input samples. | required |
Returns:
Name | Type | Description |
---|---|---|
scores | ndarray of shape (n_samples,) | The raw anomaly score of the input samples. The lower, the more abnormal. |
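For intuition about where a raw score comes from, the sketch below evaluates the anomaly score defined in reference [1], s(x, n) = 2^(-E[h(x)] / c(n)), where E[h(x)] is the mean path length and c(n) normalises by the average path length for n samples. Negating it so that lower means more abnormal follows scikit-learn's convention. That IForest returns exactly this quantity is an assumption, and the function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant used in reference [1]


def average_path_length(n):
    """c(n): average path length of an unsuccessful search among n samples."""
    if n < 2:
        raise ValueError("n must be at least 2")
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def raw_score(mean_path_length, n):
    """Negated paper score: values near -1 are abnormal, values near 0 are normal."""
    return -(2.0 ** (-mean_path_length / average_path_length(n)))


n = 256
short_path = raw_score(2.0, n)     # isolated after few splits
typical_path = raw_score(average_path_length(n), n)
long_path = raw_score(10.0 * average_path_length(n), n)
print(short_path, typical_path, long_path)
```

A sample whose mean path length equals c(n) scores exactly -0.5, shorter paths push the score toward -1, and much longer paths push it toward 0.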
Source code in ikpykit/anomaly/_iforest.py