IKAHC
ikpykit.cluster.IKAHC ¶
IKAHC(
    n_estimators=200,
    max_samples="auto",
    lk_method="single",
    ik_method="anne",
    return_flat=False,
    t=None,
    n_clusters=None,
    criterion="distance",
    random_state=None,
)
Bases: BaseEstimator, ClusterMixin
IKAHC is an agglomerative hierarchical clustering algorithm that uses a data-dependent kernel, the Isolation Kernel, to measure the similarity between clusters.
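As a rough illustration of the idea (not ikpykit's implementation), the sketch below estimates an Isolation-Kernel-style similarity in plain Python. It assumes that `ik_method='anne'` corresponds to nearest-neighbour partitioning: each estimator draws `max_samples` points as centres, assigns every point to its nearest centre, and the similarity of two points is the fraction of estimators in which they share a cell. The function name `anne_similarity` and its `seed` argument are hypothetical.

```python
import math
import random


def anne_similarity(X, n_estimators=200, max_samples=2, seed=0):
    """Hypothetical helper: fraction of random partitions in which two points share a cell."""
    rng = random.Random(seed)
    n = len(X)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_estimators):
        # Draw max_samples distinct points to act as centres of one partition.
        centres = rng.sample(range(n), max_samples)
        # Assign every point to the cell of its nearest centre.
        cells = [min(centres, key=lambda c: math.dist(X[i], X[c])) for i in range(n)]
        # Count the pairs that fall into the same cell in this partition.
        for i in range(n):
            for j in range(n):
                if cells[i] == cells[j]:
                    counts[i][j] += 1
    return [[c / n_estimators for c in row] for row in counts]


X = [[0.4, 0.3], [0.3, 0.8], [0.5, 0.4], [0.5, 0.1]]
K = anne_similarity(X)
```

Because the partitions depend on where the sampled centres fall, the resulting similarity adapts to the data distribution rather than depending on distance alone, which is the sense in which the kernel is data-dependent.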
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n_estimators | int | The number of base estimators in the ensemble. | 200 |
max_samples | int or float or str | The number of samples to draw from X to train each base estimator. | "auto" |
ik_method | Literal['inne', 'anne'] | Isolation method to use. The original algorithm in paper is | 'anne' |
lk_method | (single, complete, average, weighted) | The linkage algorithm to use. The supported linkage methods are 'single', 'complete', 'average' and 'weighted'. | "single" |
return_flat | bool | Whether to return flat clusters extracted from the fitted dendrogram. | False |
t | float | The threshold to apply when forming flat clusters. Either t or n_clusters should be provided. | None |
n_clusters | int | The number of flat clusters to form. Either t or n_clusters should be provided. | None |
criterion | str | The criterion to use in forming flat clusters. Valid options are 'distance', 'inconsistent', 'maxclust', or 'monocrit'. | 'distance' |
random_state | int, RandomState instance or None | Controls the pseudo-randomness of the selection of the samples to fit the Isolation Kernel. Pass an int for reproducible results across multiple function calls. See :term: | None |
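The `criterion` names match those accepted by `scipy.cluster.hierarchy.fcluster`, so the effect of `t` and `n_clusters` can be previewed with SciPy alone. The sketch below assumes that flat extraction behaves like `fcluster`, and it uses plain Euclidean distances as a stand-in for the kernel-based dissimilarity. It does not call ikpykit, and the threshold value is chosen only for this toy data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

X = np.array([[0.4, 0.3], [0.3, 0.8], [0.5, 0.4], [0.5, 0.1]])

# Stand-in hierarchy: single linkage on Euclidean distances (an assumption,
# IKAHC itself builds the hierarchy from Isolation Kernel similarities).
Z = linkage(pdist(X), method="single")

# Cut by a distance threshold, analogous to passing t with criterion="distance".
labels_by_distance = fcluster(Z, t=0.35, criterion="distance")

# Cut to at most two clusters, analogous to asking for n_clusters=2.
labels_by_count = fcluster(Z, t=2, criterion="maxclust")
```

On this toy data both cuts separate the second point from the other three. A threshold is convenient when a meaningful merge height is known, while a cluster count is convenient when it is not.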
Attributes:
Name | Type | Description |
---|---|---|
isokernel | IsoKernel | Fitted isolation kernel. |
dendrogram | ndarray | Cluster hierarchy as computed by scipy.cluster.hierarchy.linkage. |
References
.. [1] Xin Han, Ye Zhu, Kai Ming Ting, and Gang Li, "The Impact of Isolation Kernel on Agglomerative Hierarchical Clustering Algorithms", Pattern Recognition, 2023, 139: 109517.
Examples:
>>> from ikpykit.cluster import IKAHC
>>> import numpy as np
>>> X = [[0.4,0.3], [0.3,0.8], [0.5, 0.4], [0.5, 0.1]]
>>> clf = IKAHC(n_estimators=200, max_samples=2, lk_method='single', n_clusters=2, return_flat=True)
>>> clf.fit_predict(X)
array([1, 2, 1, 1], dtype=int32)
Source code in ikpykit/cluster/_ikahc.py
dendrogram property ¶
dendrogram
Get the dendrogram of the hierarchical clustering.
Returns:
Name | Type | Description |
---|---|---|
dendrogram_ | ndarray | The dendrogram representing the hierarchical clustering. |
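Since the hierarchy is described as coming from `scipy.cluster.hierarchy.linkage`, the returned array should follow SciPy's linkage-matrix layout: one row per merge, holding the two merged cluster indices, the merge height, and the size of the new cluster. The sketch below shows that layout on a hand-written similarity matrix with illustrative values. Converting similarity to dissimilarity as `1 - K` is an assumption made for the example, not a statement about ikpykit's internals.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Illustrative pairwise similarities for four points (hand-written values).
K = np.array([
    [1.00, 0.33, 0.83, 0.67],
    [0.33, 1.00, 0.50, 0.00],
    [0.83, 0.50, 1.00, 0.50],
    [0.67, 0.00, 0.50, 1.00],
])

# Assumed conversion: treat 1 - similarity as a dissimilarity.
D = 1.0 - K

# linkage expects a condensed distance vector, so flatten the square matrix.
Z = linkage(squareform(D), method="single")

# Each row of Z is [cluster_a, cluster_b, merge_height, new_cluster_size].
for a, b, height, size in Z:
    print(int(a), int(b), round(float(height), 2), int(size))
```

With `n` samples the matrix has `n - 1` rows, and indices of `n` or greater refer to clusters created by earlier rows.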
isokernel property ¶
isokernel
Get the fitted isolation kernel.
Returns:
Name | Type | Description |
---|---|---|
isokernel_ | IsoKernel | The fitted isolation kernel. |
fit ¶
fit(X)
Fit the IKAHC clustering model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | array-like of shape (n_samples, n_features) | The input samples. | required |
Returns:
Name | Type | Description |
---|---|---|
self | object | Fitted estimator. |
fit_transform ¶
fit_transform(X, y=None)
Fit algorithm to data and return the dendrogram.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | array-like of shape (n_samples, n_features) | The input samples. | required |
y | Ignored | Not used, present for API consistency by convention. | None |
Returns:
Name | Type | Description |
---|---|---|
dendrogram | ndarray | Dendrogram representing the hierarchical clustering. |
fit_predict ¶
fit_predict(X, y=None)
Fit algorithm to data and return the cluster labels.