Input space anomaly detection
Our variational autoencoders (VAEs) have been trained on "healthy" tool wear data. As such, if we feed the trained VAEs data that is unhealthy, or simply abnormal, they should produce a large reconstruction error. A threshold can be set on this reconstruction error, so that any data producing a reconstruction error above the threshold is considered an anomaly. This is input space anomaly detection.
Note: For brevity, I won't cover all the code in this post. Open the Colab notebook for an interactive experience and to see all the code.
The reconstruction error is measured using the mean squared error (MSE). Since the reconstruction covers all six signals, we can calculate the MSE for each individual signal (the mse function) and for all six signals combined (the mse_total function). These two functions look like this:
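A minimal sketch of what these two functions might look like, written with NumPy. It assumes that X and recon are arrays of shape (n_samples, n_timesteps, n_signals), with six signals in the last axis; that layout is an assumption for illustration, so adjust the axes if your data is arranged differently.

```python
import numpy as np


def mse(X, recon):
    """Per-signal MSE: average the squared error over the time axis only.

    Returns an array of shape (n_samples, n_signals).
    """
    X = np.asarray(X, dtype="float32")
    recon = np.asarray(recon, dtype="float32")
    return np.mean(np.square(X - recon), axis=1)


def mse_total(X, recon):
    """Overall MSE: average the squared error over the time and signal axes.

    Returns an array of shape (n_samples,).
    """
    X = np.asarray(X, dtype="float32")
    recon = np.asarray(recon, dtype="float32")
    return np.mean(np.square(X - recon), axis=(1, 2))
```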
The reconstructed values (recon) are produced by feeding the windowed cut signals (also called sub-cuts) into a trained VAE, like this:
recon = model.predict(X, batch_size=64)
Reconstruction probability is another method of input space anomaly detection (sort of?). An and Cho introduced the method in their 2015 paper.
I’m not as familiar with the reconstruction probability method, but James McCaffrey has a good explanation (and an implementation in PyTorch) on his blog. He says: “The idea of detecting reconstruction probability deviations is to calculate another probability distribution and use it to calculate the probability that the input object is from the distribution. Data objects with a low reconstruction probability are unlikely to come from the distribution, so they are somehow abnormal.”
We do not use reconstruction probabilities to detect anomalies, but it would be interesting to implement. Maybe you can try it?
Latent space anomaly detection
Anomaly detection can also be performed using the mean and standard deviation codings in the latent space, which is what we do. Here is the general method:
- Measure the relative difference in entropy between data samples using KL divergence. A threshold can be set on this relative difference, indicating when a data sample is anomalous.
Adam Lineberry has a good example of KL divergence anomaly detection, implemented in PyTorch, on his blog. Here is the KL divergence function (implemented with Keras and TensorFlow) that we use:
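A minimal sketch of such a function, written here with NumPy rather than the Keras backend so that it runs on its own. It assumes the usual VAE setup, where a diagonal Gaussian is compared against a standard normal prior, and that mu and log_var have shape (n_samples, latent_dim).

```python
import numpy as np


def kl_divergence(mu, log_var):
    """KL divergence between N(mu, exp(log_var)) and N(0, 1).

    Sums over the latent dimensions and returns one value per sample.
    """
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return -0.5 * np.sum(1.0 + log_var - np.exp(log_var) - np.square(mu), axis=-1)
```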
mu is the mean (μ) and log_var is the logarithm of the variance (log σ²). The log of the variance is used when training the VAE because it is more stable than the variance alone.
To generate the KL divergence scores, we use the following function:
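A rough sketch of how such a scoring function could be written. The name build_kls_scores and the DummyEncoder class are hypothetical, and the sketch assumes that the encoder's predict method returns three arrays: the mean codings, the log-variance codings, and the sampled latent vector. Check what your own encoder returns before reusing it.

```python
import numpy as np


def build_kls_scores(encoder, X, batch_size=64):
    """Return one KL divergence score per sample in X.

    Assumes encoder.predict returns (mean codings, log-variance codings, sampled z).
    """
    mu, log_var, _ = encoder.predict(X, batch_size=batch_size)
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    # KL divergence between N(mu, exp(log_var)) and N(0, 1), summed over latent dims
    return -0.5 * np.sum(1.0 + log_var - np.exp(log_var) - np.square(mu), axis=-1)


class DummyEncoder:
    """Stand-in for a trained encoder, used only to exercise the function."""

    def predict(self, X, batch_size=64):
        n = len(X)
        mu = np.zeros((n, 2))
        mu[:, 0] = np.arange(n)  # sample i gets the mean coding (i, 0)
        log_var = np.zeros((n, 2))  # unit variance in every latent dimension
        return mu, log_var, mu


# Three fake windows of 64 time steps and six signals
scores = build_kls_scores(DummyEncoder(), np.zeros((3, 64, 6)))
```

With a real model, the dummy class would be replaced by the trained encoder half of the VAE, and the scores would grow as a sample's codings drift away from the standard normal prior.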
Once we have calculated the reconstruction errors or the KL divergence scores, we are ready to set a decision threshold. Any value above the threshold is anomalous (likely a worn tool) and any value below it is normal (a healthy tool).
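Applying a decision threshold is a single comparison. The sketch below uses a hypothetical helper, flag_anomalies, along with made-up scores and a made-up threshold purely for illustration; none of these numbers come from the experiments.

```python
import numpy as np


def flag_anomalies(scores, threshold):
    """Return a boolean array that is True where a score exceeds the threshold."""
    return np.asarray(scores, dtype=float) > threshold


scores = np.array([0.02, 0.05, 0.40, 0.03, 0.90])  # made-up anomaly scores
is_anomaly = flag_anomalies(scores, threshold=0.1)  # made-up threshold
```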
To fully evaluate a model's performance, we need to look at a range of possible decision thresholds. Two common approaches are the receiver operating characteristic (ROC) curve and the precision-recall curve. The ROC curve plots the true positive rate against the false positive rate. The precision-recall curve, as the name implies, plots precision against recall. Measuring the area under the curve then provides a good way to compare different models.
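To make the precision-recall idea concrete, here is a small NumPy sketch that sweeps every distinct score as a threshold and sums the area under the resulting curve step by step (the quantity usually called average precision). The function names are hypothetical, the labels and scores in the usage lines are made up, and the sketch assumes at least one anomalous sample is present. In practice, scikit-learn's precision_recall_curve and average_precision_score do the same job.

```python
import numpy as np


def precision_recall_points(y_true, scores):
    """Precision and recall with every distinct score used as a threshold.

    y_true: 1 for anomalous (worn tool), 0 for normal.
    scores: higher means more anomalous.
    A sample counts as predicted anomalous when its score is >= the threshold.
    """
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    thresholds = np.unique(scores)[::-1]  # sweep from the highest score down
    precision, recall = [], []
    for t in thresholds:
        pred = scores >= t
        tp = np.sum(pred & y_true)  # true positives at this threshold
        precision.append(tp / np.sum(pred))
        recall.append(tp / np.sum(y_true))
    return np.array(precision), np.array(recall), thresholds


def pr_auc(precision, recall):
    """Step-wise area under the precision-recall curve (average precision)."""
    recall_steps = np.diff(np.concatenate(([0.0], recall)))
    return float(np.sum(recall_steps * precision))


y_true = [1, 0, 1, 0]  # made-up labels
scores = [0.9, 0.8, 0.7, 0.1]  # made-up anomaly scores
precision, recall, thresholds = precision_recall_points(y_true, scores)
area = pr_auc(precision, recall)
```

A model that ranks every worn-tool sample above every healthy one reaches an area of 1.0; the made-up example above ranks one healthy sample too high, so its area is lower.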
We use the area under the precision-recall curve (PR-AUC) to evaluate model performance. PR-AUC works well on imbalanced data, in contrast to ROC-AUC. [2, 3] Below is a figure that explains precision and recall and shows how a precision-recall curve is constructed.