IKGAD

ikpykit.group.IKGAD

IKGAD(
    n_estimators_1=200,
    max_samples_1="auto",
    n_estimators_2=200,
    max_samples_2="auto",
    method="inne",
    contamination="auto",
    random_state=None,
)

Bases: OutlierMixin, BaseEstimator

Isolation Kernel-based Group Anomaly Detection.

IKGAD applies isolation kernel techniques to detect anomalies in groups of data points. It leverages a two-step approach: first transforming the data using an isolation kernel, then calculating kernel mean embeddings for each group to detect anomalous groups. The algorithm is effective for detecting both global and local group anomalies.
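
As a rough illustration of the second step, the kernel mean embedding of a group is simply the mean of the isolation-kernel feature map over the points in that group. The sketch below uses plain NumPy and a made-up binary feature map (phi) to show the idea; it is not the library's implementation.

import numpy as np

# Hypothetical binary feature map produced by a first-level isolation kernel:
# one row per point, one block of indicator features per estimator.
phi = np.array([
    [1, 0, 0, 1],  # point 1 of group A
    [0, 1, 0, 1],  # point 2 of group A
    [1, 0, 1, 0],  # point 1 of group B
])
group_sizes = [2, 1]

# Kernel mean embedding: average the feature-map rows within each group.
splits = np.split(phi, np.cumsum(group_sizes)[:-1], axis=0)
embeddings = np.vstack([g.mean(axis=0) for g in splits])
# Each row now represents one group; groups with unusual embeddings are the
# candidates flagged by the second-level kernel.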

Parameters:

n_estimators_1 : int, default=200
    The number of base estimators in the first-level ensemble.

max_samples_1 : int, float or "auto", default="auto"
    The number of samples to draw for training each first-level base estimator:
      • If int, draws exactly max_samples_1 samples.
      • If float, draws max_samples_1 * X.shape[0] samples.
      • If "auto", uses min(8, n_samples).

n_estimators_2 : int, default=200
    The number of base estimators in the second-level ensemble.

max_samples_2 : int, float or "auto", default="auto"
    The number of samples to draw for training each second-level base estimator:
      • If int, draws exactly max_samples_2 samples.
      • If float, draws max_samples_2 * X.shape[0] samples.
      • If "auto", uses min(8, n_samples).

method : {"inne", "anne", "auto"}, default="inne"
    Isolation method to use. The "inne" option corresponds to the approach described in the original paper.

contamination : "auto" or float, default="auto"
    Proportion of outliers in the data set:
      • If "auto", the threshold is determined as in the original paper.
      • If float, it should be in the range (0, 0.5] (see the sketch after this parameter list).

random_state : int, RandomState instance or None, default=None
    Controls the random seed for reproducibility.
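
When contamination is a float, the decision threshold offset_ is the corresponding percentile of the training scores (mirroring the logic in fit below). A minimal sketch of that relationship with made-up scores:

import numpy as np

scores = np.array([-0.2, -0.3, -0.25, -0.9])  # hypothetical score_samples output
contamination = 0.25
offset = np.percentile(scores, 100.0 * contamination)
# Groups whose score falls below the offset get a negative decision value
# and are therefore predicted as outliers (-1).
print(scores - offset)  # only the last (most anomalous) group is negative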

Attributes:

iso_kernel_1_ : IsoKernel
    First-level trained isolation kernel.

offset_ : float
    Decision threshold for outlier detection.

References

.. [1] Kai Ming Ting, Bi-Cun Xu, Takashi Washio, Zhi-Hua Zhou (2022). Isolation Distributional Kernel: A new tool for kernel based point and group anomaly detections. IEEE Transactions on Knowledge and Data Engineering.

Examples:

>>> from ikpykit.group import IKGAD
>>> import numpy as np
>>> X = [[[1.0, 1.1], [1.2, 1.3]], [[1.3, 1.2], [1.1, 1.0]], [[1.0, 1.2], [1.4, 1.3]], [[5.0, 5.1], [5.2, 5.3]]]
>>> clf = IKGAD(max_samples_1=2, max_samples_2=2, contamination=0.25, random_state=42)
>>> clf = clf.fit(X)
>>> clf.predict(X)
array([ 1,  1,  1, -1])
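
A brief (non-doctest) follow-up showing the continuous scores behind these labels, reusing the clf fitted above:

scores = clf.decision_function(X)  # negative entries correspond to the -1 predictions
raw = clf.score_samples(X)         # lower values indicate more anomalous groups
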
Source code in ikpykit/group/anomaly/_ikgad.py
def __init__(
    self,
    n_estimators_1=200,
    max_samples_1="auto",
    n_estimators_2=200,
    max_samples_2="auto",
    method="inne",
    contamination="auto",
    random_state=None,
):
    self.n_estimators_1 = n_estimators_1
    self.max_samples_1 = max_samples_1
    self.n_estimators_2 = n_estimators_2
    self.max_samples_2 = max_samples_2
    self.random_state = random_state
    self.contamination = contamination
    self.method = method

fit

fit(X)

Fit the IKGAD model.

Parameters:

X : array-like of shape (n_groups, n_samples, n_features)
    The input data, where n_groups is the number of groups, n_samples is the number of instances per group, and n_features is the number of features.

Returns:

self : object
    Fitted estimator.

Notes

Sets the is_fitted_ attribute to True.
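
A minimal fit sketch with equally sized groups arranged as a 3-D array; the data and parameter choices here are illustrative only:

import numpy as np
from ikpykit.group import IKGAD

rng = np.random.RandomState(0)
X_train = rng.rand(10, 5, 2)  # 10 groups, 5 points per group, 2 features
model = IKGAD(random_state=0).fit(X_train)  # sets model.is_fitted_ and model.offset_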

Source code in ikpykit/group/anomaly/_ikgad.py
def fit(self, X):
    """Fit the IKGAD model.

    Parameters
    ----------
    X : array-like of shape (n_groups, n_samples, n_features)
        The input data, where n_groups is the number of groups,
        n_samples is the number of instances per group, and
        n_features is the number of features.

    Returns
    -------
    self : object
        Fitted estimator.

    Notes
    -----
    Sets the `is_fitted_` attribute to `True`.
    """
    # Validate input data
    X = check_format(X)
    # Fit the model
    self._fit(X)
    self.is_fitted_ = True

    # Set threshold
    if self.contamination != "auto":
        if not (0.0 < self.contamination <= 0.5):
            raise ValueError(
                f"contamination must be in (0, 0.5], got: {self.contamination}"
            )
        # Define threshold based on contamination parameter
        self.offset_ = np.percentile(
            self.score_samples(X), 100.0 * self.contamination
        )
    else:
        # Use default threshold as described in the original paper
        self.offset_ = -0.5

    return self

predict

predict(X)

Predict if groups are outliers or inliers.

Parameters:

X : array-like of shape (n_groups, n_samples, n_features)
    The input groups to evaluate.

Returns:

is_inlier : ndarray of shape (n_groups,)
    For each group, returns whether it is an inlier (+1) or outlier (-1) according to the fitted model.
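
Since the labels are +1/-1, selecting the flagged groups is a one-liner (illustrative, assuming a fitted clf and groups X):

import numpy as np

labels = clf.predict(X)                     # +1 inlier, -1 outlier, one label per group
outlier_idx = np.flatnonzero(labels == -1)  # indices of the anomalous groups
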

Source code in ikpykit/group/anomaly/_ikgad.py
def predict(self, X):
    """Predict if groups are outliers or inliers.

    Parameters
    ----------
    X : array-like of shape (n_groups, n_samples, n_features)
        The input groups to evaluate

    Returns
    -------
    is_inlier : ndarray of shape (n_groups,)
        For each group, returns whether it is an inlier (+1) or
        outlier (-1) according to the fitted model.
    """
    check_is_fitted(self, "is_fitted_")
    decision_func = self.decision_function(X)
    is_inlier = np.ones_like(decision_func, dtype=int)
    is_inlier[decision_func < 0] = -1
    return is_inlier

decision_function

decision_function(X)

Compute decision scores for groups.

Parameters:

X : array-like of shape (n_groups, n_samples, n_features)
    The input groups to evaluate.

Returns:

scores : ndarray of shape (n_groups,)
    Decision scores. Negative scores represent outliers, positive scores represent inliers.
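
As the source below confirms, the decision score is just the anomaly score shifted by the fitted threshold; assuming clf was built with a fixed random_state, the two quantities agree:

import numpy as np

# decision_function(X) == score_samples(X) - offset_ (up to floating-point noise)
assert np.allclose(clf.decision_function(X), clf.score_samples(X) - clf.offset_)
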

Source code in ikpykit/group/anomaly/_ikgad.py
def decision_function(self, X):
    """Compute decision scores for groups.

    Parameters
    ----------
    X : array-like of shape (n_groups, n_samples, n_features)
        The input groups to evaluate

    Returns
    -------
    scores : ndarray of shape (n_groups,)
        Decision scores. Negative scores represent outliers,
        positive scores represent inliers.
    """
    return self.score_samples(X) - self.offset_

score_samples

score_samples(X)

Compute anomaly scores for groups.

Parameters:

X : array-like of shape (n_groups, n_samples, n_features)
    The input groups to evaluate.

Returns:

scores : ndarray of shape (n_groups,)
    Anomaly scores, where lower values indicate more anomalous groups.
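
Because lower scores mean more anomalous, ranking groups from most to least suspicious is straightforward (illustrative, assuming a fitted clf and groups X):

import numpy as np

scores = clf.score_samples(X)
ranking = np.argsort(scores)  # group indices ordered from most to least anomalous
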

Source code in ikpykit/group/anomaly/_ikgad.py
def score_samples(self, X):
    """Compute anomaly scores for groups.

    Parameters
    ----------
    X : array-like of shape (n_groups, n_samples, n_features)
        The input groups to evaluate

    Returns
    -------
    scores : ndarray of shape (n_groups,)
        Anomaly scores where lower values indicate more anomalous groups.
    """
    check_is_fitted(self, "is_fitted_")
    X = check_format(X)

    X_full = np.vstack(X)
    # Create kernel mean embeddings for each group
    split_idx = np.cumsum([len(x) for x in X])
    X_trans = self.iso_kernel_1_.transform(X_full)
    group_embeddings = np.split(X_trans.toarray(), split_idx[:-1], axis=0)
    X_embeddings = np.asarray(
        [
            self._kernel_mean_embedding(x, self.n_estimators_1)
            for x in group_embeddings
        ]
    )

    # Second level isolation kernel on the embeddings
    iso_kernel_2 = IsoKernel(
        n_estimators=self.n_estimators_2,
        max_samples=self.max_samples_2,
        random_state=self.random_state,
        method=self.method,
    )

    X_trans = iso_kernel_2.fit_transform(X_embeddings)
    kme = self._kernel_mean_embedding(X_trans, self.n_estimators_2)

    # For sparse matrices, .A1 converts to 1D array
    if hasattr(X_trans, "A1"):
        scores = safe_sparse_dot(X_trans, kme.T).A1
    else:
        scores = safe_sparse_dot(X_trans, kme.T)
        if hasattr(scores, "A1"):
            scores = scores.A1
        elif scores.ndim > 1:
            scores = scores.ravel()

    return -scores