Isolation Distribution Kernel
ikpykit.kernel.IsoDisKernel ¶
IsoDisKernel(
method="anne",
n_estimators=200,
max_samples="auto",
random_state=None,
)
Bases: BaseEstimator, TransformerMixin
Isolation Distributional Kernel is a new way to measure the similarity between two distributions.
It addresses two key issues of kernel mean embedding, where the kernel employed has (i) a feature map with intractable dimensionality, which leads to high computational cost, and (ii) data independence, which leads to poor detection accuracy in anomaly detection.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
method | str | The method to compute the isolation kernel feature. The available methods are: | "anne" |
n_estimators | int | The number of base estimators in the ensemble. | 200 |
max_samples | int | The number of samples to draw from X to train each base estimator. | "auto" |
random_state | int, RandomState instance or None | Controls the pseudo-randomness of the selection of the feature and split values for each branching step and each tree in the forest. Pass an int for reproducible results across multiple function calls. See :term: | None |
References
[1] Kai Ming Ting, Bi-Cun Xu, Takashi Washio, and Zhi-Hua Zhou. 2020. "Isolation Distributional Kernel: A New Tool for Kernel based Anomaly Detection". In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '20). Association for Computing Machinery, New York, NY, USA, 198-206.
Examples:
>>> from ikpykit.kernel import IsoDisKernel
>>> import numpy as np
>>> X = [[0.4,0.3], [0.3,0.8], [0.5,0.4], [0.5,0.1]]
>>> idk = IsoDisKernel(max_samples=3).fit(X)
>>> D_i = [[0.4,0.3], [0.3,0.8]]
>>> D_j = [[0.5, 0.4], [0.5, 0.1]]
>>> idk.similarity(D_j, D_j)
1.0
Source code in ikpykit/kernel/_isodiskernel.py
fit ¶
fit(X)
Fit the model on data X.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | np.array of shape (n_samples, n_features) | The input instances. | required |
Returns:
Name | Type | Description |
---|---|---|
self | object | The fitted estimator. |
Source code in ikpykit/kernel/_isodiskernel.py
kernel_mean ¶
kernel_mean(X)
Compute the kernel mean embedding of X.
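As a rough illustration of what a kernel mean embedding is, the sketch below averages the rows of a feature-map matrix into a single vector. It is a minimal NumPy sketch, not the library's implementation: the helper name `kernel_mean_sketch` and the toy matrix `phi` are hypothetical, and it assumes the input is already an (n_instances, n_features) feature-map matrix.

```python
import numpy as np


def kernel_mean_sketch(features):
    """Average the rows of a feature-map matrix into one embedding vector.

    `features` is assumed to be an (n_instances, n_features) array in which
    each row is the feature-map representation of one instance.
    """
    features = np.asarray(features, dtype=float)
    return features.mean(axis=0)


# Hypothetical binary feature map for three instances (made-up values).
phi = np.array(
    [
        [1, 0, 0, 1],
        [0, 1, 0, 1],
        [1, 0, 1, 0],
    ]
)
embedding = kernel_mean_sketch(phi)
print(embedding)
```

The result has one entry per feature, so a dataset of any size is summarised by a vector of fixed length.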
Source code in ikpykit/kernel/_isodiskernel.py
similarity ¶
similarity(D_i, D_j, is_normalize=True)
Compute the isolation distribution kernel of D_i and D_j.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
D_i | | The input instances. | required |
D_j | | The input instances. | required |
is_normalize | | | True |
Returns:
Type | Description |
---|---|
| | The isolation distribution similarity of the two given datasets. |
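One plausible reading of this method is a dot product between the kernel mean embeddings of the two datasets, with `is_normalize=True` dividing by the product of their norms. The page does not state the formula, so the normalization below is an assumption, and the helper `similarity_sketch` and the toy matrices are hypothetical. The sketch takes feature-map matrices directly rather than raw instances.

```python
import numpy as np


def similarity_sketch(features_i, features_j, is_normalize=True):
    """Dot product of two kernel mean embeddings, optionally cosine-normalized."""
    mean_i = np.asarray(features_i, dtype=float).mean(axis=0)
    mean_j = np.asarray(features_j, dtype=float).mean(axis=0)
    score = float(mean_i @ mean_j)
    if is_normalize:
        # Assumed normalization: divide by the product of the embedding norms.
        score /= float(np.linalg.norm(mean_i) * np.linalg.norm(mean_j))
    return score


# Hypothetical feature-map rows for two small datasets (made-up values).
phi_i = np.array([[1, 0, 0, 1], [0, 1, 0, 1]])
phi_j = np.array([[1, 0, 1, 0], [1, 0, 1, 0]])

self_score = similarity_sketch(phi_j, phi_j)
cross_score = similarity_sketch(phi_i, phi_j)
print(self_score, cross_score)
```

Under this assumption, comparing a dataset with itself gives a score of 1, which is consistent with the `1.0` returned for `idk.similarity(D_j, D_j)` in the class example.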
Source code in ikpykit/kernel/_isodiskernel.py
transform ¶
transform(D_i, D_j)
Compute the isolation kernel feature of D_i and D_j.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
D_i | | The input instances. | required |
D_j | | The input instances. | required |
Returns:
Type | Description |
---|---|
| | The finite binary features based on the kernel feature map. The features are organised as an n_instances by psi*t matrix. |
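To make the n_instances by psi*t layout concrete, here is a toy nearest-neighbour partitioning feature map written in plain NumPy. It is only a sketch and not ikpykit's code: the function `anne_feature_map_sketch` and its `seed` argument are hypothetical, and it assumes that psi corresponds to `max_samples` and t to `n_estimators`. For each estimator it draws psi reference points and one-hot encodes which of them is nearest to each query point.

```python
import numpy as np


def anne_feature_map_sketch(X_fit, X_query, n_estimators=4, max_samples=3, seed=0):
    """Toy nearest-neighbour partitioning feature map (not ikpykit's code)."""
    rng = np.random.default_rng(seed)
    X_fit = np.asarray(X_fit, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    blocks = []
    for _ in range(n_estimators):
        # Draw psi = max_samples reference points without replacement.
        idx = rng.choice(len(X_fit), size=max_samples, replace=False)
        centers = X_fit[idx]
        # Squared distances from every query point to every reference point.
        dists = ((X_query[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        nearest = dists.argmin(axis=1)
        # One-hot encode the index of the nearest reference point.
        block = np.zeros((len(X_query), max_samples), dtype=int)
        block[np.arange(len(X_query)), nearest] = 1
        blocks.append(block)
    # Concatenate the per-estimator blocks side by side.
    return np.hstack(blocks)


X = [[0.4, 0.3], [0.3, 0.8], [0.5, 0.4], [0.5, 0.1]]
features = anne_feature_map_sketch(X, X, n_estimators=4, max_samples=3)
print(features.shape)  # (4, 12): 4 instances by psi*t = 3*4 columns
```

Each row contains exactly one 1 per estimator, so every row sums to `n_estimators` in this sketch.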
Source code in ikpykit/kernel/_isodiskernel.py