echo

Echo

Bases: NoveltyDetectionClassifier

Implementation of the ECHO algorithm for novelty detection. [1]

[1] Haque, Ahsanul, et al. "Efficient handling of concept drift and concept evolution over stream data." 2016 IEEE 32nd international conference on data engineering (ICDE). IEEE, 2016.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| K | int | Number of pseudopoints per classifier (the number of clusters K for the clustering algorithm) | required |
| min_examples_cluster | int | Minimum number of examples required to declare a novel class | required |
| ensemble_size | int | Number of classifiers used to create the ensemble | required |
| W | int | Maximum allowable size of the dynamic sliding window | required |
| tau | float | Confidence threshold for labeling data points | 0.9 |
| verbose | int | Controls the level of verbosity; the higher the value, the more messages are displayed. Can be 0 (default), 1, 2, or 3. | 0 |
| random_state | int | Seed for the random number generation. Note: due to the nature of the algorithm, the same seed won't lead to the exact same results. | None |
| init_algorithm | str | Clustering algorithm used to initialize the clusters; supports 'kmeans' and 'mcikmeans' | 'mcikmeans' |
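
The role of W can be illustrated with a small sketch. In the class source, both windows are created as `collections.deque(maxlen=self.W)`, so once W entries are stored, each new entry evicts the oldest one. The scores and the `change_point` value below are made up for illustration; the truncation step mirrors what `_update_classifier` does after a change is detected.

```python
from collections import deque

W = 5  # hypothetical maximum window size

# A bounded deque drops its oldest entry once maxlen is reached.
confidence_window = deque(maxlen=W)
for score in [0.91, 0.88, 0.93, 0.42, 0.39, 0.35, 0.31]:
    confidence_window.append(score)

window_contents = list(confidence_window)
print(window_contents)  # only the five most recent scores remain

# After a change is detected at index `change_point`, only the entries
# from that index onwards are kept, still bounded by W.
change_point = 2  # illustrative value, not computed here
confidence_window = deque(list(confidence_window)[change_point:], maxlen=W)
remaining = list(confidence_window)
print(remaining)
```

A larger W lets the change detector look further back, at the cost of more memory and more candidate change points to evaluate.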

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| models | list of ClusterModel | List containing the models of the ensemble |
| short_mem | ShortMem | Buffer memory containing the samples temporarily labeled as unknown for the novelty detection process |
| association_coefficients | list of float | Point-biserial correlation coefficients for the association heuristic of each model |
| purity_coefficients | list of float | Point-biserial correlation coefficients for the purity heuristic of each model |
| confidence_window | deque | Sliding window of confidence scores |
| window | deque | Sliding window of ShortMemInstance samples |
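
To make the two coefficient attributes concrete, here is a minimal sketch of how a point-biserial coefficient relates a heuristic to classification correctness. The point-biserial correlation is the Pearson correlation between a continuous variable and a 0/1 variable, so it is computed by hand here instead of calling `scipy.stats.pointbiserialr` as the library does. The helper name `point_biserial` and all sample values are assumptions for illustration, and the normalization step applied in the source is omitted.

```python
from math import sqrt

def point_biserial(values, binary):
    """Pearson correlation between a continuous list and a 0/1 list."""
    n = len(values)
    mean_v = sum(values) / n
    mean_b = sum(binary) / n
    cov = sum((v - mean_v) * (b - mean_b) for v, b in zip(values, binary))
    var_v = sum((v - mean_v) ** 2 for v in values)
    var_b = sum((b - mean_b) ** 2 for b in binary)
    return cov / sqrt(var_v * var_b)

# Hypothetical heuristic values for six training samples of one model.
associations = [0.8, 0.6, 0.7, -0.2, 0.1, -0.4]  # radius minus distance
purities = [1.0, 0.9, 1.0, 0.6, 0.7, 0.5]        # majority-class fraction
correct = [1, 1, 1, 0, 1, 0]                     # 1 if the model's label was right

assoc_coef = point_biserial(associations, correct)
purity_coef = point_biserial(purities, correct)

# Unnormalized confidence of this model for one new sample:
# each heuristic is weighted by its coefficient.
new_association, new_purity = 0.5, 0.9
confidence = new_association * assoc_coef + new_purity * purity_coef
print(round(assoc_coef, 3), round(purity_coef, 3), round(confidence, 3))
```

A heuristic that tracks correctness well receives a coefficient close to 1 and therefore contributes more to the confidence score.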

Source code in streamndr/model/echo.py
class Echo(NoveltyDetectionClassifier):
    """Implementation of the ECHO algorithm for novelty detection. [1]

    [1] Haque, Ahsanul, et al. "Efficient handling of concept drift and concept evolution over stream data."
    2016 IEEE 32nd international conference on data engineering (ICDE). IEEE, 2016.

    Parameters
    ----------
    K : int
        Number of pseudopoints per classifier (number of K clusters for the clustering algorithm)
    min_examples_cluster : int
        Minimum number of examples to declare a novel class
    ensemble_size : int
        Number of classifiers to use to create the ensemble
    W : int
        Maximum allowable size for the dynamic sliding window
    tau : float
        Confidence threshold for labeling data points
    verbose : int
        Controls the level of verbosity; the higher the value, the more messages are displayed. Can be 0 (default), 1, 2, or 3.
    random_state : int
        Seed for the random number generation. Note: due to the nature of the algorithm, the same seed won't lead to the exact same results.
    init_algorithm : str
        Clustering algorithm to use to initialize the clusters, supports 'kmeans' and 'mcikmeans'

    Attributes
    ----------
    models : list of ClusterModel
        List containing the models of the ensemble.
    short_mem : ShortMem
        Buffer memory containing the samples labeled as unknown temporarily for the novelty detection process
    association_coefficients : list of float
        Point-biserial correlation coefficients for the association heuristic of each model
    purity_coefficients : list of float
        Point-biserial correlation coefficients for the purity heuristic of each model
    confidence_window : deque
        Sliding window of confidence scores
    window : deque
        Sliding window of ShortMemInstance samples
    """
    def __init__(self, 
                 K,
                 min_examples_cluster,
                 ensemble_size,
                 W,
                 tau=0.9,
                 verbose=0,
                 random_state=None, # Note: due to the nature of the algorithm, the same seed won't lead to the exact same results
                 init_algorithm="mcikmeans"):

        super().__init__(verbose, random_state)
        self.K = K
        self.min_examples_cluster = min_examples_cluster
        self.ensemble_size = ensemble_size
        self.W = W # Maximum allowable size for the dynamic sliding window
        self.tau = tau # Confidence threshold

        accepted_algos = ['kmeans', 'mcikmeans']
        if init_algorithm not in accepted_algos:
            raise ValueError(f"Invalid algorithm '{init_algorithm}'. Available algorithms: {', '.join(accepted_algos)}")
        self.init_algorithm = init_algorithm

        self.models = []
        self.short_mem = ShortMem() # Potential novel class instances
        self.association_coefficients = []
        self.purity_coefficients = []
        self.confidence_window = deque(maxlen=self.W)
        self.window = deque(maxlen=self.W)

    def learn_one(self, x, y, w=1.0):
        # Method used by River algorithms to learn one sample. It does not apply to this algorithm, since the offline phase requires all samples
        # to arrive at once. It is only defined to follow River's API.
        pass

    def learn_many(self, X, y, w=1.0):
        """Represents the offline phase of the algorithm. Receives a number of samples and their given labels and learns all of the known classes.

        Parameters
        ----------
        X : pandas.DataFrame or numpy.ndarray
            Samples to be learned by the model
        y : list of int
            Labels corresponding to the given samples, must be the same length as the number of samples
        w : float, optional
            Weights, not used, by default 1.0

        Returns
        -------
        Echo
            Fitted estimator
        """
        if isinstance(X, pd.DataFrame):
            X = X.to_numpy()
        y = np.asarray(y)
        if len(X) != len(y):
            raise ValueError("X and y must contain the same number of samples.")

        # in offline phase, consider all instances arriving at the same time in the microclusters:
        timestamp = len(X)

        for i in range(0, self.ensemble_size):
            state = None if self.random_state is None else self.random_state + i
            microclusters = generate_microclusters(X, y, timestamp, self.K, min_samples=0, algorithm=self.init_algorithm, random_state=state)

            model = ClusterModel(microclusters, list(np.unique(y)))
            if len(microclusters) > 0:
                self.models.append(model)

        # Calculate the heuristic values - Iterate over all of the models in the ensemble
        for model in self.models:
            # Get the model's closest microcluster and its corresponding distance for each X
            closest_clusters_model, dist = get_closest_clusters(X, [microcluster.centroid for microcluster in model.microclusters])
            model_label = [model.microclusters[closest_cluster].label for closest_cluster in closest_clusters_model]

            # Compute the association with: {Radius of closest microcluster} - {Distance of x from microcluster's center}
            associations = np.array([model.microclusters[closest_cluster].max_distance for closest_cluster in closest_clusters_model]) - dist

            # Compute the purity with: {Number of samples of the most occurring class} / {Number of all samples}
            purities = np.array([model.microclusters[closest_cluster].n_label_instances for closest_cluster in closest_clusters_model]) / np.array([model.microclusters[closest_cluster].n for closest_cluster in closest_clusters_model])

            # Compute the binary vector indicating whether each classification is correct
            vector = [1 if y1 == y2 else 0 for y1, y2 in zip(y, model_label)]

            # Compute the Point-biserial correlation coefficients between the heuristic values and the vector
            self.association_coefficients.append(pointbiserialr(associations, vector).statistic)
            self.purity_coefficients.append(pointbiserialr(purities, vector).statistic)

        self.before_offline_phase = False

        return self


    def predict_one(self, X, y):
        """Represents the online phase. Equivalent to predict_many() with only one sample. Receives a single sample and predicts its label if it is
        within the decision boundary of the ensemble. Otherwise, the sample is unknown: it is added to the short-term memory and novelty detection is
        performed.

        Parameters
        ----------
        X : dict
            Sample
        y : int
            True y value of the sample
        """
        return self.predict_many(np.array(list(X.values()))[None,:], [y])

    def predict_many(self, X, y):
        """Represents the online phase. Receives multiple samples and, for each sample, predicts its label if it is within the decision
        boundary of the ensemble. Otherwise, the sample is unknown: it is added to the short-term memory, and novelty detection is performed once the trigger has been reached.

        Parameters
        ----------
        X : pandas.DataFrame or numpy.ndarray
            Samples
        y : list of int
            True y values of the samples

        Returns
        -------
        numpy.ndarray
            Array of length len(X) containing the predicted labels, predicts -1 if the corresponding sample is labeled as unknown

        Raises
        ------
        Exception
            If the model has not been trained first with learn_many() (offline phase)
        """
        if self.before_offline_phase:
            raise Exception("Model must be fitted first")

        if isinstance(X, pd.DataFrame):
            X = X.to_numpy() #Converting DataFrame to numpy array
        y = np.asarray(y)
        if len(X) != len(y):
            raise ValueError("X and y must contain the same number of samples.")

        f_outliers = check_f_outlier(X, self.models)
        closest_model_cluster, average_confidences, y_preds = self._majority_voting(X, True)

        pred_label = []
        for i in range(len(X)):
            self.sample_counter += 1
            if y is not None:
                if y[i] not in self.class_sample_counter:
                    self.class_sample_counter[y[i]] = 1
                else:
                    self.class_sample_counter[y[i]] += 1

            #Get the closest microcluster with our list of tuples self.models[closest_model_index][closest_cluster_index]
            closest_cluster = self.models[closest_model_cluster[i][0]].microclusters[closest_model_cluster[i][1]]

            #If X is not an F-outlier (inside the closest cluster radius), then we classify it with the label from the majority voting
            if not f_outliers[i]:
                pred_label.append(y_preds[i])
                closest_cluster.update_cluster(X[i], self.sample_counter, False)

            else: #X is an F-outlier (outside the boundary of all classifiers)
                pred_label.append(-1)

                if y is not None:
                    self.short_mem.append(ShortMemInstance(X[i], self.sample_counter, y[i]))
                    if y[i] not in self.nb_class_unknown:
                        self.nb_class_unknown[y[i]] = 1
                    else:
                        self.nb_class_unknown[y[i]] += 1
                else:
                    self.short_mem.append(ShortMemInstance(X[i], self.sample_counter))

                if (len(self.short_mem) > self.min_examples_cluster):
                    #Find the novel cluster, if any
                    novel_cluster = self._novelty_detect()
                    if novel_cluster is not None:
                        if self.verbose > 1:
                            print("Novel cluster: ", novel_cluster)
                        elif self.verbose > 0:
                            print("Novel cluster: ", novel_cluster.small_str())

                        #Change the predicted label for the new class label
                        pred_label[-1] = novel_cluster.label

                        #Add the novel cluster to all models
                        for model in self.models:
                            model.microclusters.append(novel_cluster)
                            model.labels.append(novel_cluster.label)

                    #Remove all instances from the buffer: those that were not detected as a novel class are considered classified, as per the ECHO paper
                    for _ in range(len(self.short_mem)):
                        self._remove_sample_from_short_mem(0)

            #Add point and confidence to window
            self.window.append(ShortMemInstance(X[i], self.sample_counter, y[i], pred_label[-1]))
            self.confidence_window.append(average_confidences[i])

            change_point = self._detect_change()

            if change_point != -1:
                if self.verbose > 1:
                    print("Change detected at point: ", change_point)
                self._update_classifier(change_point)


        return np.array(pred_label)

    def _majority_voting(self, X, return_labels=True):
        closest_clusters = []
        labels = []
        dists = []
        confidences = []

        #Iterate over all of the models in the ensemble
        for i, model in enumerate(self.models):
            #Get the model's closest microcluster and its corresponding distance for each X
            closest_clusters_model, dist = get_closest_clusters(X, [microcluster.centroid for microcluster in model.microclusters])
            closest_clusters.append(closest_clusters_model)
            model_label = [model.microclusters[closest_cluster].label for closest_cluster in closest_clusters_model]
            labels.append(model_label)
            dists.append(dist)

            #Compute the heuristic values
            #Compute the association with: {Radius of closest microcluster} - {Distance of x from microcluster's center}
            associations = np.array([model.microclusters[closest_cluster].max_distance for closest_cluster in closest_clusters_model]) - dist
            #Compute the purity with: {Number of samples of the most occurring class} / {Number of all samples}
            purities = np.array([model.microclusters[closest_cluster].n_label_instances for closest_cluster in closest_clusters_model]) / np.array([model.microclusters[closest_cluster].n for closest_cluster in closest_clusters_model])

            #Compute the confidence score on each X sample using the dot product between the heuristics and coefficients
            confidences.append(np.dot(associations, self.association_coefficients[i]) + np.dot(purities, self.purity_coefficients[i]))

        #Normalize the confidence score between 0 and 1 and compute the average for each sample independently
        scaler = MinMaxScaler()
        average_confidences = np.mean(scaler.fit_transform(confidences), axis=0)

        #From all the closest microclusters of each model, get the index of the closest model for each X
        best_models = np.argmin(dists, axis=0)

        #Finally, create a list of tuples, which contain the index of the closest model and the index of the closest microcluster within that model for each X
        closest_model_cluster = []
        for i in range(len(X)):
            closest_model_cluster.append((best_models[i], closest_clusters[best_models[i]][i]))

        #Return the list of tuples (index of closest model, index of closest microcluster within that model), the average confidences,
        # and a list containing the label Y with the most occurrences across all of the models (majority voting) for each X.
        if return_labels:
            return closest_model_cluster, average_confidences, get_most_occurring_by_column(labels)
        else:
            return closest_model_cluster, average_confidences

    def _novelty_detect(self):
        if self.verbose > 1:
            print("Novelty detection started")

        X = self.short_mem.get_all_points()
        new_class_vote = 0
        potential_novel_points_idx = {index: 0 for index in range(len(X))}

        for model in self.models:
            qnscs = qnsc(X, model.microclusters, self.min_examples_cluster)
            nb_potential_novel_points = 0

            for i, qnsc_value in enumerate(qnscs):
                if qnsc_value > 0:
                    potential_novel_points_idx[i] += 1
                    nb_potential_novel_points += 1

            if nb_potential_novel_points > self.min_examples_cluster:
                new_class_vote += 1

        if new_class_vote == len(self.models):
            novel_points_idx = [index for index, vote in potential_novel_points_idx.items() if vote == len(self.models)]

            if len(novel_points_idx) >= self.min_examples_cluster:
                novel_points = [X[i] for i in novel_points_idx]
                label = max(set(element for sublist in self.models for element in sublist.labels)) + 1
                return MicroCluster(label, instances=np.array(novel_points), timestamp=self.sample_counter, n_label_instances=len(novel_points))

        return None

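    def _detect_change(self, alpha=0.05, gamma=100):
        """
        Detect-Change algorithm implementation. Operates on self.confidence_window,
        the dynamic sliding window of confidence scores (bounded by self.W).

        Parameters:
        alpha: Sensitivity
        gamma: Cushion period size

        Returns:
        The change point if one exists; -1 otherwise. If the mean confidence of
        the window is not above 0.3, the window length n is returned instead.
        """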
        Th = -np.log(alpha)
        n = len(self.confidence_window)
        omega_n = 0
        k_max = -1

        confidence_list = list(self.confidence_window)

        if n <= self.W and np.mean(self.confidence_window) > 0.3:
            for k in range(gamma, n - gamma):
                # Estimate pre and post-beta distributions
                pre_data = confidence_list[:k]
                post_data = confidence_list[k:]

                alpha0, beta0 = self._estimate_beta_params(pre_data)
                alpha1, beta1 = self._estimate_beta_params(post_data)

                # Calculate S_k,n
                S_k_n = self._calculate_S_k_n(post_data, alpha0, beta0, alpha1, beta1)

                # Update omega_n and k_max
                if S_k_n > omega_n:
                    omega_n = S_k_n
                    k_max = k

            if omega_n >= Th:
                return k_max
            else:
                return -1
        else:
            print(f'Returning n because n={n} and mean={np.mean(self.confidence_window)}')
            return n

    def _estimate_beta_params(self, data):
        """Estimate beta distribution parameters using method of moments"""
        mean = np.mean(data)
        var = np.var(data)
        if var == 0:
            return 1, 1  # Default to uniform distribution if variance is zero
        alpha = mean * (mean * (1 - mean) / var - 1)
        beta_param = (1 - mean) * (mean * (1 - mean) / var - 1)
        return max(alpha, 0.01), max(beta_param, 0.01)  # Ensure positive parameters

    def _calculate_S_k_n(self, data, alpha0, beta0, alpha1, beta1):
        """Calculate S_k,n using log likelihood ratios"""
        pdf1 = beta.pdf(data, alpha1, beta1)
        pdf0 = beta.pdf(data, alpha0, beta0)

        # Avoid division by zero or log(0)
        ratio = np.divide(pdf1, pdf0, out=np.ones_like(pdf1), where=pdf0!=0)
        log_ratio = np.log(ratio, out=np.zeros_like(ratio), where=ratio>0)

        return np.sum(log_ratio)

    def _update_classifier(self, change_point):
        labeled_data = [self.window[i] for i, confidence in enumerate(self.confidence_window) if confidence <= self.tau]
        unlabeled_data = [self.window[i] for i, confidence in enumerate(self.confidence_window) if confidence > self.tau]

        labeled_X = [instance.point for instance in labeled_data]
        labeled_y = [instance.y_true for instance in labeled_data]

        unlabeled_X = [instance.point for instance in unlabeled_data]
        unlabeled_y = [instance.y_pred for instance in unlabeled_data]

        X_train = np.array(labeled_X + unlabeled_X)
        y_train = np.array(labeled_y + unlabeled_y)

        new_model = self._train_new_model(X_train, y_train)

        if len(self.models) < self.ensemble_size:
            self.models.append(new_model)
        else:
            # Replace the oldest model with the new one
            oldest_model_index = np.argmin([model.microclusters[0].timestamp for model in self.models])
            self.models[oldest_model_index] = new_model

        # Discard the samples before the change point: keep only the window and confidence window entries from the change point onwards
        self.window = deque(list(self.window)[change_point:], maxlen=self.W)
        self.confidence_window = deque(list(self.confidence_window)[change_point:], maxlen=self.W)


    def _train_new_model(self, X, y):
        K0 = min(self.K, len(X)) # Can't create K clusters if K is higher than the number of samples
        microclusters = generate_microclusters(X, y, self.sample_counter, K0, min_samples=0, algorithm=self.init_algorithm, random_state=self.random_state)
        return ClusterModel(microclusters, list(np.unique(y)))
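
The change detection helpers above can be easier to follow in isolation. The sketch below reproduces the method-of-moments estimate from `_estimate_beta_params` and the log-likelihood ratio from `_calculate_S_k_n` using only the standard library. It is an illustration under assumptions, not the library's code: the confidence scores are made up, the function names are hypothetical, and the ratio is computed from log densities (via `math.lgamma`) rather than by dividing `scipy.stats.beta.pdf` values as the source does.

```python
from math import lgamma, log
from statistics import fmean, pvariance

def estimate_beta_params(data):
    """Method-of-moments estimate of Beta(alpha, beta) from values in (0, 1)."""
    mean = fmean(data)
    var = pvariance(data)  # population variance, like numpy.var
    if var == 0:
        return 1.0, 1.0  # fall back to the uniform distribution
    common = mean * (1 - mean) / var - 1
    # Clamp to keep both parameters strictly positive.
    return max(mean * common, 0.01), max((1 - mean) * common, 0.01)

def beta_logpdf(x, a, b):
    """Log density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return log_norm + (a - 1) * log(x) + (b - 1) * log(1 - x)

def log_likelihood_ratio(post, pre_params, post_params):
    """Sum over `post` of log(f_post(x) / f_pre(x))."""
    return sum(
        beta_logpdf(x, *post_params) - beta_logpdf(x, *pre_params) for x in post
    )

# Made-up confidence scores: high before the drift, lower after it.
pre = [0.90, 0.85, 0.92, 0.88, 0.91, 0.87]
post = [0.40, 0.35, 0.45, 0.38, 0.42, 0.36]

pre_params = estimate_beta_params(pre)
post_params = estimate_beta_params(post)
score = log_likelihood_ratio(post, pre_params, post_params)

threshold = -log(0.05)  # Th = -log(alpha) with alpha = 0.05
print(score > threshold)
```

When the scores after a candidate split are poorly explained by the distribution fitted before it, the ratio grows large and exceeds the threshold, which is the condition `_detect_change` uses to report a change point.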

learn_many(X, y, w=1.0)

Represents the offline phase of the algorithm. Receives a number of samples and their given labels and learns all of the known classes.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | DataFrame or ndarray | Samples to be learned by the model | required |
| y | list of int | Labels corresponding to the given samples; must have the same length as the number of samples | required |
| w | float | Sample weights; not used by this algorithm | 1.0 |

Returns:

| Type | Description |
| --- | --- |
| Echo | Fitted estimator |
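
In the class source, `learn_many` clusters the data once per ensemble member and gives each member its own seed, derived as `random_state + i`, so that members differ from one another while remaining tied to the user's seed. A minimal sketch of that derivation follows; `ensemble_seeds` is a hypothetical helper name that does not exist in the library.

```python
def ensemble_seeds(random_state, ensemble_size):
    """Return one seed per ensemble member, or None for each when unseeded."""
    return [
        None if random_state is None else random_state + i
        for i in range(ensemble_size)
    ]

seeded = ensemble_seeds(42, 3)
unseeded = ensemble_seeds(None, 3)
print(seeded)    # [42, 43, 44]
print(unseeded)  # [None, None, None]
```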


predict_many(X, y)

Represents the online phase. Receives multiple samples and, for each sample, predicts its label if it is within the decision boundary of the ensemble. Otherwise, the sample is unknown: it is added to the short-term memory, and novelty detection is performed once the trigger has been reached.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | DataFrame or ndarray | Samples | required |
| y | list of int | True y values of the samples | required |

Returns:

| Type | Description |
| --- | --- |
| ndarray | Array of length len(X) containing the predicted labels; the prediction is -1 if the corresponding sample is labeled as unknown |

Raises:

| Type | Description |
| --- | --- |
| Exception | If the model has not been trained first with learn_many() (offline phase) |
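
The decision made for each sample can be sketched without the library. The example below assumes that a sample is an F-outlier when it lies outside the radius of its nearest microcluster in every model (the internals of `check_f_outlier` are not shown on this page, so this rule is an assumption), and that known samples take the label chosen by majority vote across models. The `models` structure and the helper names are hypothetical.

```python
from collections import Counter
from math import dist

# Each hypothetical model is a list of (centroid, radius, label) microclusters.
models = [
    [((0.0, 0.0), 1.0, 0), ((5.0, 5.0), 1.0, 1)],
    [((0.2, 0.1), 1.2, 0), ((5.1, 4.9), 0.9, 1)],
    [((0.1, -0.1), 0.8, 0), ((4.8, 5.2), 1.1, 1)],
]

def closest(model, x):
    """Return (distance, radius, label) of the microcluster nearest to x."""
    return min((dist(x, c), r, label) for c, r, label in model)

def predict(x):
    nearest = [closest(model, x) for model in models]
    # Assumed F-outlier rule: x lies outside the nearest cluster of every model.
    if all(d > r for d, r, _ in nearest):
        return -1
    # Otherwise each model votes with the label of its nearest microcluster.
    votes = Counter(label for _, _, label in nearest)
    return votes.most_common(1)[0][0]

print(predict((0.3, 0.2)))  # inside the class 0 clusters
print(predict((5.0, 5.1)))  # inside the class 1 clusters
print(predict((2.5, 2.5)))  # far from every cluster, reported as unknown
```

In the library, samples that receive -1 are also stored in the short-term memory so that a later novelty detection pass can group them into a new class.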


predict_one(X, y)

Represents the online phase. Equivalent to predict_many() with only one sample. Receives a single sample and predicts its label if it is within the decision boundary of the ensemble. Otherwise, the sample is unknown: it is added to the short-term memory and novelty detection is performed.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | dict | Sample | required |
| y | int | True y value of the sample | required |
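
In the class source, `predict_one` turns the feature dict into a batch of one row with `np.array(list(X.values()))[None, :]` and wraps the label in a list before delegating to `predict_many`. The sketch below shows the same reshaping with plain lists; `as_single_row` and the feature names are hypothetical. Because only the dict values are used, features are taken in insertion order, so the keys should appear in the same order as the columns used during the offline phase.

```python
def as_single_row(x):
    """Turn a feature dict into a batch containing one row of values."""
    return [list(x.values())]

sample = {"f1": 0.5, "f2": 1.2, "f3": -0.3}  # hypothetical feature names
batch = as_single_row(sample)
labels = [2]  # the true label is wrapped in a list of length one
print(batch, labels)
```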