IDKC
ikpykit.cluster.IDKC
IDKC(
n_estimators,
max_samples,
method,
k,
kn,
v,
n_init_samples,
init_center=None,
is_post_process=True,
random_state=None,
)
Bases: BaseEstimator, ClusterMixin
Isolation Distributional Kernel Clustering.
A clustering algorithm that leverages Isolation Kernels to transform data into a feature space where cluster structures are more distinguishable. The algorithm first constructs Isolation Kernel representations, then performs clustering in this transformed space using a threshold-based assignment mechanism.
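The threshold-based assignment can be pictured with a small sketch. The code below is an illustration only, not ikpykit's implementation. It assumes a fixed point-to-cluster similarity matrix (a real implementation would presumably recompute similarities in the Isolation Kernel feature space as clusters grow), a geometric schedule in which the threshold is multiplied by the decay factor v after each round, and a hypothetical function name `assign_with_decay`. Points whose best similarity never reaches the threshold keep the label -1.

```python
def assign_with_decay(similarity, tau0, v, max_rounds=50):
    """Assign each point to its most similar cluster once that similarity
    reaches the current threshold; unassigned points keep label -1.

    similarity[i][j] is the (assumed, fixed) similarity of point i to cluster j.
    """
    if not 0 < v < 1:
        raise ValueError("v must satisfy 0 < v < 1")

    labels = [-1] * len(similarity)
    tau = tau0
    for _ in range(max_rounds):
        for i, row in enumerate(similarity):
            if labels[i] != -1:
                continue  # already assigned in an earlier round
            best = max(range(len(row)), key=row.__getitem__)
            if row[best] >= tau:
                labels[i] = best
        if -1 not in labels:
            break  # every point has a cluster
        tau *= v  # assumed geometric decay of the threshold
    return labels


# Made-up similarities for three points and two clusters.
similarity = [[0.9, 0.1], [0.2, 0.6], [0.05, 0.1]]
print(assign_with_decay(similarity, tau0=0.8, v=0.5, max_rounds=3))  # [0, 1, -1]
```

With a smaller v the threshold drops faster, so weakly similar points are absorbed in fewer rounds.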
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n_estimators | int | Number of base estimators in the ensemble for the Isolation Kernel. Higher values generally lead to more stable results but increase computation time. | required |
max_samples | int | Number of samples to draw from X to train each base estimator in the Isolation Kernel. Controls the granularity of the kernel representation. | required |
method | 'inne', 'anne' or 'iforest' | Method used to calculate the Isolation Kernel: 'inne' (Isolation Nearest Neighbor Ensemble), 'anne' (Approximate Nearest Neighbor Ensemble), or 'iforest' (Isolation Forest). | 'inne' |
k | int | Number of clusters to form in the dataset. | required |
kn | int | Number of nearest neighbors used for local contrast density calculation during initialization. Higher values consider more neighbors when determining density. | required |
v | float | Decay factor (0 < v < 1) for reducing the similarity threshold during clustering. Smaller values cause faster decay, leading to more aggressive cluster assignments. | required |
n_init_samples | int or float | If int, the number of samples to consider when initializing cluster centers. If float, the fraction of total samples to consider. Larger values may produce better initial centers but increase computation. | required |
init_center | int or array-like of shape (k,) | Index or indices of initial cluster centers. If None, centers are selected automatically based on density and distance considerations. | None |
is_post_process | bool | Whether to perform post-processing refinement of clusters through iterative reassignment. Improves cluster quality but adds computational overhead. | True |
random_state | int, RandomState instance or None | Controls the randomness of the algorithm. Pass an int for reproducible results. | None |
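The int-or-float convention for n_init_samples can be made concrete with a small helper. This is a sketch under stated assumptions, not the library's code: the function name `resolve_n_init_samples` is hypothetical, and rounding a fractional count up is an assumption (the library may round differently).

```python
import math


def resolve_n_init_samples(n_init_samples, n_samples):
    """Turn an int count or a float fraction into a concrete sample count."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(n_init_samples, bool):
        raise TypeError("n_init_samples must be an int or a float")
    if isinstance(n_init_samples, int):
        count = n_init_samples
    elif isinstance(n_init_samples, float):
        # Assumption: a fraction of the data, rounded up to a whole sample.
        count = math.ceil(n_init_samples * n_samples)
    else:
        raise TypeError("n_init_samples must be an int or a float")
    if not 1 <= count <= n_samples:
        raise ValueError("resolved count must lie in [1, n_samples]")
    return count


print(resolve_n_init_samples(4, 6))    # 4
print(resolve_n_init_samples(0.5, 6))  # 3
```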
Attributes:
Name | Type | Description |
---|---|---|
clusters_ | list of KCluster objects | The cluster objects containing assignment and centroid information. |
it_ | int | Number of iterations performed during the initial clustering phase. |
labels_ | ndarray of shape (n_samples,) | Cluster labels for each point. Points not assigned to any cluster have label -1 (outliers). |
is_fitted_ | bool | Whether the model has been fitted to data. |
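Because labels_ uses -1 for unassigned points, it is often convenient to separate cluster sizes from the outlier count. The sketch below uses a made-up label list standing in for `labels_`; the helper name `summarize_labels` is hypothetical. It also accepts a NumPy array, since it only iterates over the values.

```python
from collections import Counter


def summarize_labels(labels):
    """Return ({cluster_label: size}, n_outliers), treating -1 as outlier."""
    counts = Counter(int(label) for label in labels)
    n_outliers = counts.pop(-1, 0)  # remove the outlier bucket if present
    return dict(sorted(counts.items())), n_outliers


# Made-up labels for illustration; in practice pass a fitted model's labels_.
labels = [1, 1, 0, 0, -1, 0]
cluster_sizes, n_outliers = summarize_labels(labels)
print(cluster_sizes)  # {0: 3, 1: 2}
print(n_outliers)     # 1
```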
Examples:
>>> from ikpykit.cluster import IDKC
>>> import numpy as np
>>> X = np.array([[1, 2], [1, 4], [5, 2], [5, 5], [1, 0], [5, 0]])
>>> clustering = IDKC(
... n_estimators=100, max_samples=3, method='anne',
... k=2, kn=5, v=0.5, n_init_samples=4, random_state=42
... )
>>> clustering.fit_predict(X)
array([1, 1, 0, 0, 1, 0])
References
[1] Ye Zhu, Kai Ming Ting (2023). Kernel-based Clustering via Isolation Distributional Kernel. Information Systems.
Source code in ikpykit/cluster/_idkc.py
fit
fit(X, y=None)
Fit the IDKC clustering model on data X.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | ndarray of shape (n_samples, n_features) | The input instances to cluster. | required |
y | Ignored | Not used; present for API consistency by convention. | None |
Returns:
Name | Type | Description |
---|---|---|
self | object | Fitted estimator. |
Source code in ikpykit/cluster/_idkc.py
predict
predict(X)
Predict the cluster labels for each point in X.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | ndarray of shape (n_samples, n_features) | The input instances to predict cluster labels for. | required |
Returns:
Name | Type | Description |
---|---|---|
labels | ndarray of shape (n_samples,) | Cluster labels for each point. Points not assigned to any cluster have label -1 (outliers). |
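One way to consume the returned labels is to group the input points by label, keeping the -1 group as the outliers. The sketch below is illustrative only: the points and labels are made up rather than produced by `predict`, and the helper name `group_by_label` is hypothetical.

```python
from collections import defaultdict


def group_by_label(points, labels):
    """Map each label to the list of points carrying it (-1 holds outliers)."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[int(label)].append(point)
    return dict(groups)


# Made-up inputs standing in for X and the labels returned for X.
new_points = [[1, 1], [5, 4], [40, 40]]
predicted = [1, 0, -1]
groups = group_by_label(new_points, predicted)
print(groups.get(-1, []))  # [[40, 40]]
```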
Source code in ikpykit/cluster/_idkc.py