Reconstruction Detector
Base class for reconstruction-based anomaly detection and implementations of such algorithms.
- class ecgan.anomaly_detection.detector.reconstruction_detector.ReconstructionDetector(module, reconstructor, tracker)[source]
Bases: ecgan.anomaly_detection.detector.base_detector.AnomalyDetector, abc.ABC
Base class for anomaly detectors which reconstruct data.
The reconstructed data is used to calculate the anomalousness of the data.
- class ecgan.anomaly_detection.detector.reconstruction_detector.GANAnomalyDetector(module, reconstructor, tracker)[source]
Bases: ecgan.anomaly_detection.detector.reconstruction_detector.ReconstructionDetector
A GAN based anomaly detector which utilizes a reconstructed series.
Data is reconstructed via latent interpolation as in AnoGAN. Given an input sample x, an \(\epsilon\)-similar sample \(\hat{x}\) is retrieved by interpolating through latent space (see
ecgan.detection.reconstruction.InterpolationReconstructor
). Afterwards, an anomaly score is calculated by:
- Comparing real and synthetic data in data space using the reconstruction error R(x). R(x) is e.g. the \(L_1\) or \(L_2\) distance, or any other distance in data space.
- Comparing real and synthetic data using the output of the discriminator via the discrimination error D(x). D(x) is e.g. the deviation of the output score from a target value. Since this can be unreliable and depends on the training progress of the discriminator, feature matching is most commonly used.
Both components are weighted using \(\lambda\) according to the AnoGAN paper. Additionally, we allow a second weight \(\gamma\) to incorporate a third component, the latent norm Z(x). Z(x) compares the norm of the latent vector to its expected value: the norm of latent vectors of training data usually follows a chi distribution (depending on the generative net used). The deviation from the mode of this distribution can be used to measure how likely it is that the latent vector produced the output data.
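The weighted score described above can be sketched as follows. This is a minimal sketch, assuming an L2 reconstruction error, a feature-matching discrimination error, and a \((1 - \lambda - \gamma)\) weighting; the function names are illustrative, not ecgan's exact implementation:

```python
import numpy as np


def reconstruction_error(x: np.ndarray, x_hat: np.ndarray) -> np.ndarray:
    """R(x): L2 distance between real and reconstructed series, per sample."""
    return np.linalg.norm((x - x_hat).reshape(x.shape[0], -1), axis=1)


def discrimination_error(feat_real: np.ndarray, feat_fake: np.ndarray) -> np.ndarray:
    """D(x): feature-matching distance between discriminator features."""
    return np.linalg.norm(feat_real - feat_fake, axis=1)


def latent_norm_error(z: np.ndarray) -> np.ndarray:
    """Z(x): deviation of the latent norm from the chi-distribution mode.

    For z ~ N(0, I_d), ||z|| follows a chi distribution with mode sqrt(d - 1).
    """
    d = z.shape[1]
    mode = np.sqrt(d - 1)
    return np.abs(np.linalg.norm(z, axis=1) - mode)


def anomaly_score(r, d, z, lam=0.1, gamma=0.05):
    """Weighted combination of R, D and Z; lam/gamma mirror lambda/gamma above.

    The weights here are placeholder values, not ecgan defaults.
    """
    return (1.0 - lam - gamma) * r + lam * d + gamma * z
```

Higher scores indicate more anomalous samples; a perfect reconstruction with matching discriminator features and a typical latent norm yields a score near zero.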
- detect(test_x, test_y)[source]
Detect anomalies in the test data and return predicted labels.
The original detect method has to be overridden since part of the score (and label) optimization requires all reconstructed samples.
- Return type
ndarray
- _optimize_metric(reconstruction_error, discriminator_error, test_y)[source]
Predict labels based on the selected strategy.
The deciding parameters (e.g. an SVM or manually set parameters) should be retrieved from the validation dataset. If none exist, they are trained and evaluated on the test data; this should be avoided.
- Return type
Tensor
- Returns
The predicted labels.
- _reconstruct(data)[source]
Detect anomalies inside the data Tensor.
- Parameters
data (
Tensor
) -- Tensor (usually of size [series, channel, seq_len]) of real data.
- Return type
Tensor
- Returns
Tensor with the corresponding predicted labels.
- track_batch(reconstructed_series, real_series, labels, plot_num_samples=4)[source]
Track a batch of reconstructed data.
By default, the first plot_num_samples samples are visualized for a visual comparison. If the reconstruction is an interpolation reconstruction, further metrics are tracked. These include the norm of the reconstructed data in latent space, the L1 distance between real and fake samples, and interpolation grids.
- Parameters
reconstructed_series (
Tensor
) -- Reconstructed data.
real_series (
Tensor
) -- Real data.
labels (
Tensor
) -- Labels corresponding to real data.
plot_num_samples (
int
) -- Amount of samples that shall be plotted.
- Return type
None
- class ecgan.anomaly_detection.detector.reconstruction_detector.GANInverseAnomalyDetector(module, reconstructor, tracker)[source]
Bases: ecgan.anomaly_detection.detector.reconstruction_detector.GANAnomalyDetector
Anomaly detector with an inverse mapping from data to latent space.
The detector can use a pretrained network for mapping a datum to a latent vector. Alternatively, a novel mapping can be trained using a fully trained GAN. The resulting detection follows the
ecgan.anomaly_detection.reconstruction_detector.GANAnomalyDetector
; the only difference is how the reconstructed sample is retrieved. Using the inverse mapping, the sample is not necessarily \(\epsilon\)-similar to the input, but the process is significantly sped up and the runtime is linear in the number of samples.
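The speed-up can be illustrated with a toy linear generator and its pseudo-inverse acting as the encoder: instead of iteratively optimizing a latent vector, the inverse mapping reconstructs each sample in a single forward pass. In ecgan both mappings are trained networks; everything below is a toy stand-in for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy "generator": linear map from a 4-dim latent space to 8-dim data space.
W_g = rng.normal(size=(8, 4))
# Toy "encoder": the learned inverse mapping, here simply the pseudo-inverse.
W_e = np.linalg.pinv(W_g)


def generator(z: np.ndarray) -> np.ndarray:
    """Map latent vectors to data space."""
    return z @ W_g.T


def encoder(x: np.ndarray) -> np.ndarray:
    """Inverse mapping: one forward pass per batch, no iterative search."""
    return x @ W_e.T


z_true = rng.normal(size=(3, 4))   # latent vectors that produced the data
x = generator(z_true)              # "real" data lying on the generator manifold
z_hat = encoder(x)                 # single-pass inverse mapping
x_hat = generator(z_hat)           # reconstruction without latent interpolation
```

For data on the generator manifold this toy inverse recovers the latent vector exactly; a trained encoder only approximates it, which is why the reconstruction is not guaranteed to be \(\epsilon\)-similar.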