Isolation Kernel

ikpykit.kernel.IsoKernel

IsoKernel(
    method="anne",
    n_estimators=200,
    max_samples="auto",
    random_state=None,
)

Bases: TransformerMixin, BaseEstimator

Isolation Kernel.

Build Isolation Kernel feature vector representations via the feature map for a given dataset.

Isolation Kernel is a data-dependent kernel that adapts to the local data distribution, which gives it more flexibility in capturing local characteristics of the data. It has shown promising performance on density-based and distance-based classification and clustering problems.

Parameters:

Name Type Description Default
method str

The method to compute the isolation kernel feature. The available methods are: anne, inne, and iforest.

"anne"
n_estimators int

The number of base estimators in the ensemble.

200
max_samples int, float or "auto"

The number of samples to draw from X to train each base estimator.

- If int, then draw `max_samples` samples.
- If float, then draw `max_samples * X.shape[0]` samples.
- If "auto", then `max_samples=min(16, n_samples)`.
"auto"
random_state int, RandomState instance or None

Controls the pseudo-randomness of the selection of the feature and split values for each branching step and each tree in the forest.

Pass an int for reproducible results across multiple function calls. See the scikit-learn Glossary entry for `random_state`.

None
References

[1] Qin, X., Ting, K.M., Zhu, Y. and Lee, V.C. "Nearest-neighbour-induced isolation similarity and its impact on density-based clustering". In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, July 2019, pp. 4755-4762.

Examples:

>>> from ikpykit.kernel import IsoKernel
>>> import numpy as np
>>> X = [[0.4,0.3], [0.3,0.8], [0.5, 0.4], [0.5, 0.1]]
>>> ik = IsoKernel().fit(X)
>>> X_trans = ik.transform(X)
>>> X_sim = ik.similarity(X)
Source code in ikpykit/kernel/_isokernel.py
def __init__(
    self, method="anne", n_estimators=200, max_samples="auto", random_state=None
) -> None:
    self.n_estimators = n_estimators
    self.max_samples = max_samples
    self.random_state = random_state
    self.method = method

fit

fit(X, y=None)

Fit the model on data X.

Parameters:

Name Type Description Default
X np.array of shape (n_samples, n_features)

The input instances.

required

Returns:

Name Type Description
self object
Source code in ikpykit/kernel/_isokernel.py
def fit(self, X, y=None):
    """Fit the model on data X.
    Parameters
    ----------
    X : np.array of shape (n_samples, n_features)
        The input instances.
    Returns
    -------
    self : object
    """

    X = check_array(X)
    n_samples = X.shape[0]
    if isinstance(self.max_samples, str):
        if self.max_samples == "auto":
            max_samples = min(16, n_samples)
        else:
            raise ValueError(
                "max_samples (%s) is not supported. "
                'Valid choices are: "auto", int or '
                "float." % self.max_samples
            )
    elif isinstance(self.max_samples, numbers.Integral):
        if self.max_samples > n_samples:
            warn(
                "max_samples (%s) is greater than the "
                "total number of samples (%s). max_samples "
                "will be set to n_samples for estimation."
                % (self.max_samples, n_samples)
            )
            max_samples = n_samples
        else:
            max_samples = self.max_samples
    else:  # float
        if not 0.0 < self.max_samples <= 1.0:
            raise ValueError(
                "max_samples must be in (0, 1], got %r" % self.max_samples
            )
        max_samples = int(self.max_samples * X.shape[0])
    self.max_samples_ = max_samples

    if self.method == "anne":
        self.iso_kernel_ = IK_ANNE(
            self.n_estimators, self.max_samples_, self.random_state
        )
    elif self.method == "inne":
        self.iso_kernel_ = IK_INNE(
            self.n_estimators, self.max_samples_, self.random_state
        )
    elif self.method == "iforest":
        self.iso_kernel_ = IK_IForest(
            self.n_estimators, self.max_samples_, self.random_state
        )
    else:
        raise ValueError(
            "method (%s) is not supported. "
            'Valid choices are: "anne", "inne" or "iforest".' % self.method
        )

    self.iso_kernel_.fit(X)
    self.is_fitted_ = True
    return self
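
The way `fit` turns `max_samples` into an effective per-estimator sample size can be sketched as a standalone function. This is a minimal illustration that mirrors the branching in the source listing above; `resolve_max_samples` is a hypothetical name and is not part of ikpykit.

```python
import numbers
import warnings


def resolve_max_samples(max_samples, n_samples):
    """Return the effective sample size per estimator (hypothetical helper)."""
    if isinstance(max_samples, str):
        if max_samples != "auto":
            raise ValueError(f"max_samples ({max_samples}) is not supported.")
        # "auto" caps the sample size at 16, as in the fit listing.
        return min(16, n_samples)
    if isinstance(max_samples, numbers.Integral):
        if max_samples > n_samples:
            warnings.warn("max_samples exceeds n_samples; using n_samples.")
            return n_samples
        return int(max_samples)
    # Anything else is treated as a fraction of the dataset.
    if not 0.0 < max_samples <= 1.0:
        raise ValueError(f"max_samples must be in (0, 1], got {max_samples!r}")
    return int(max_samples * n_samples)


print(resolve_max_samples("auto", 4))
print(resolve_max_samples("auto", 1000))
print(resolve_max_samples(0.25, 1000))
```

Note that a small fraction on a small dataset truncates toward zero, so an explicit int is the safer choice when `n_samples` is small.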

similarity

similarity(X, dense_output=True)

Compute the isolation kernel similarity matrix of X.

Parameters:

Name Type Description Default
X array-like of shape (n_instances, n_features)

The input instances.

required
dense_output bool

Whether to return dense matrix of output.

True

Returns:

Type Description
The similarity matrix, organised as an n_instances by n_instances matrix.
Source code in ikpykit/kernel/_isokernel.py
def similarity(self, X, dense_output=True):
    """Compute the isolation kernel similarity matrix of X.
    Parameters
    ----------
    X: array-like of shape (n_instances, n_features)
        The input instances.
    dense_output: bool, default=True
        Whether to return dense matrix of output.
    Returns
    -------
    The similarity matrix, organised as an n_instances by n_instances matrix.
    """
    check_is_fitted(self)
    X = check_array(X)
    embed_X = self.transform(X)
    return (
        safe_sparse_dot(embed_X, embed_X.T, dense_output=dense_output)
        / self.n_estimators
    )
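
The normalised dot product returned by `similarity` can be read as the fraction of estimators in which two points land in the same partition. The sketch below illustrates that reading in plain Python. It assumes each estimator assigns a point to exactly one of `psi` partitions (as in a nearest-neighbour partitioning); the partition indices are made up for illustration and the helper names are hypothetical.

```python
def one_hot_features(cell_ids, psi):
    """Map per-estimator partition indices to a binary vector of length psi * t."""
    features = [0] * (psi * len(cell_ids))
    for estimator, cell in enumerate(cell_ids):
        features[estimator * psi + cell] = 1
    return features


def similarity(features_a, features_b, n_estimators):
    """Dot product of two feature vectors divided by the number of estimators."""
    return sum(a * b for a, b in zip(features_a, features_b)) / n_estimators


psi, t = 4, 5
# Hypothetical partition indices (one per estimator) for two points.
x = one_hot_features([0, 2, 1, 3, 0], psi)
y = one_hot_features([0, 2, 3, 3, 1], psi)

print(len(x))               # psi * t entries per point
print(similarity(x, x, t))  # a point always shares every partition with itself
print(similarity(x, y, t))  # x and y agree in 3 of the 5 estimators
```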

transform

transform(X, dense_output=False)

Compute the isolation kernel feature of X.

Parameters:

Name Type Description Default
X array-like of shape (n_instances, n_features)

The input instances.

required
dense_output bool

Whether to return dense matrix of output.

False

Returns:

Type Description
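The finite binary features based on the kernel feature map.
The features are organised as an n_instances by psi*t matrix, where psi is the sample size per estimator (max_samples) and t is the number of estimators (n_estimators).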
Source code in ikpykit/kernel/_isokernel.py
def transform(self, X, dense_output=False):
    """Compute the isolation kernel feature of X.
    Parameters
    ----------
    X: array-like of shape (n_instances, n_features)
        The input instances.
    dense_output: bool, default=False
        Whether to return dense matrix of output.
    Returns
    -------
    The finite binary features based on the kernel feature map.
    The features are organised as an n_instances by psi*t matrix.
    """

    check_is_fitted(self)
    X = check_array(X)
    X_trans = self.iso_kernel_.transform(X)
    if dense_output:
        if sp.issparse(X_trans) and hasattr(X_trans, "toarray"):
            return X_trans.toarray()
        else:
            warn("The IsoKernel transform output is already dense.")
    return X_trans
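
To make the shape of the binary feature matrix concrete, here is a from-scratch sketch of a nearest-neighbour feature map in the spirit of the "anne" method. It is not ikpykit's implementation: `fit_anne` and `transform_anne` are hypothetical names, and the sampling and distance details are assumptions chosen for brevity.

```python
import math
import random


def fit_anne(X, n_estimators, psi, seed=0):
    """Draw psi reference points from X for each estimator (hypothetical helper)."""
    rng = random.Random(seed)
    return [rng.sample(X, psi) for _ in range(n_estimators)]


def transform_anne(centres, X):
    """Return one binary row of length psi * t per point in X."""
    psi = len(centres[0])
    rows = []
    for x in X:
        row = [0] * (psi * len(centres))
        for e, refs in enumerate(centres):
            # The nearest reference point defines the partition x falls into.
            nearest = min(range(psi), key=lambda j: math.dist(x, refs[j]))
            row[e * psi + nearest] = 1
        rows.append(row)
    return rows


X = [[0.4, 0.3], [0.3, 0.8], [0.5, 0.4], [0.5, 0.1]]
centres = fit_anne(X, n_estimators=10, psi=2)
features = transform_anne(centres, X)

print(len(features), len(features[0]))  # n_instances rows, psi * t columns
print(sum(features[0]))                 # exactly one active entry per estimator
```

Each row has exactly one nonzero entry per estimator, so the matrix is very sparse for realistic values of psi, which is consistent with `transform` defaulting to `dense_output=False`.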