Production machine learning models should be monitored for data changes, anomalies, and unexpected behavior.
When a machine learning model is deployed to production, it becomes part of the production application: its context changes from the training environment to the production stack. As a result, problems with deployed machine learning models must be addressed in the context of the production application. One important difference between the training and production environments is data dependencies.
Model training data can come in many forms: local files, S3 buckets, APIs, and other sources. This largely depends on the use case, which can range from one-time manual training to fully automated retraining. In most cases, however, model prediction/scoring takes place in a different environment. For example, a fraud detection model is trained on features fetched and built from a database, but makes predictions on real-time data in a web application that serves API requests and responds with the probability that the input data is fraudulent.
Dependencies in many applications are typically:
- Data stores – relational databases, graph databases, NoSQL, cloud storage, etc.
- Upstream APIs – internal microservices, third-party APIs, etc.
- Incoming HTTP requests – user inputs, downstream API calls, etc.
- Local files – configuration files, temporary image files, etc.
From the model's perspective, the dependencies can be direct, i.e., implemented in the script that runs inference. An example is a database query that retrieves data, which is then transformed and passed to the model's predict() method. For model servers that serve stored models as an API, the dependencies are implemented in the components that consume the model server's REST API endpoint. Examples of such model servers are TensorFlow Serving, TorchServe, and KFServing.
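A minimal sketch of both patterns may help; the database query, feature names, host, and model name below are illustrative assumptions, though the REST path follows TensorFlow Serving's predict API convention:

```python
import sqlite3

import requests


def predict_direct(model, user_id: int) -> float:
    """Direct dependency: the inference script queries the database,
    transforms the row into features, and calls predict() in-process."""
    conn = sqlite3.connect("transactions.db")  # illustrative data source
    row = conn.execute(
        "SELECT amount, merchant_risk, account_age_days "
        "FROM features WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    conn.close()
    features = [list(row)]             # transform: row -> feature vector
    return model.predict(features)[0]  # in-process model call


def predict_via_model_server(instance: list) -> float:
    """Indirect dependency: a downstream component calls the model
    server's REST endpoint (TensorFlow Serving-style predict API)."""
    resp = requests.post(
        "http://model-server:8501/v1/models/fraud_detector:predict",
        json={"instances": [instance]},
        timeout=2.0,
    )
    resp.raise_for_status()
    return resp.json()["predictions"][0]
```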
It is practical to distinguish between code and data dependencies in order to properly scope testing and monitoring:
- Code and infrastructure issues – connection problems, software errors, runtime exceptions, hardware failures, and so on.
- Data issues – unexpected values, missing features, outliers, data drift, etc.
Code and infrastructure issues have been with us for decades. There are many mature practices and tools to address them, including monitoring and observability solutions that provide log management, metrics collection, and tracing to detect and troubleshoot such issues.
Data issues are a relatively new class of failures in the context of model serving. The challenge with data issues is that, unlike code exceptions, they are hard to detect. Because models are fully data-driven, they pose a risk of silent failures: incorrect model input data does not raise exceptions, but leads to junk model output. I will discuss this topic in more detail in a follow-up article, Monitoring Machine Learning Models for Silent Failures.
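To make the silent failure concrete, here is a small illustration on synthetic data (scikit-learn assumed; the fraud scenario and the units mix-up are invented for demonstration): passing an amount in cents instead of dollars raises no exception, it just produces a confidently wrong score.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Train a toy fraud model where the first feature is an amount in dollars.
rng = np.random.default_rng(0)
X = rng.normal(loc=[50.0, 0.5], scale=[20.0, 0.2], size=(1000, 2))
y = (X[:, 0] > 80).astype(int)  # synthetic label: large amounts are "fraud"
model = LogisticRegression().fit(X, y)

# Correct input: amount in dollars.
print(model.predict_proba([[60.0, 0.4]])[0, 1])

# Silent failure: upstream starts sending the amount in cents.
# No exception is raised -- the model just returns a junk probability.
print(model.predict_proba([[6000.0, 0.4]])[0, 1])
```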
In real applications, data dependencies are not constant; they can change, for example, with application releases. Releases fix bugs, add product features, introduce new services, change configuration, modify data processing, and so on. In particular, they may change the model's data dependencies and/or the model's internal behavior. This is another aspect that needs to be taken into account.
The model development process differs from the software development process. Depending on the model type and use case, model training can be manual or automated, performed locally or in a pipeline. In other words, model deployment follows the cadence of the model development process, not the application release cycle.
This means that, from the application's perspective, the model can be replaced with a new version independently of the application's CI/CD and releases. And while the data dependencies may remain unchanged, the model itself changes. Even though the model was validated and tested before deployment, the new version can still cause problems on unseen production data.
The changing nature of production applications creates a new requirement: continuous monitoring of model input and output data for consistency and quality, in order to detect and resolve data issues.
At training time, we can run tests on the training dataset, but this becomes challenging for large-scale online production applications where models serve prediction requests 24/7. In this case, running dataset-level tests becomes impractical because the data is in fact a continuous stream. And depending on how prediction requests are handled, i.e., one at a time or in batches, prediction data must be aggregated before statistics can be computed.
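Since the data arrives as a stream, statistics have to be computed over windows rather than over a fixed dataset. Below is a minimal, framework-agnostic sketch (all names are illustrative) that accumulates per-feature statistics over a tumbling time window using Welford's online algorithm, so the full stream is never stored:

```python
import math
import time
from collections import defaultdict


class WindowedFeatureStats:
    """Accumulates per-feature count/mean/variance over a tumbling window."""

    def __init__(self, window_seconds: float = 60.0):
        self.window_seconds = window_seconds
        self._reset()

    def _reset(self):
        self.window_start = time.time()
        self.count = defaultdict(int)
        self.mean = defaultdict(float)
        self.m2 = defaultdict(float)
        self.missing = defaultdict(int)

    def observe(self, features: dict):
        """Record one prediction request's feature values."""
        for name, value in features.items():
            if value is None:
                self.missing[name] += 1
                continue
            self.count[name] += 1
            delta = value - self.mean[name]
            self.mean[name] += delta / self.count[name]
            self.m2[name] += delta * (value - self.mean[name])
        if time.time() - self.window_start >= self.window_seconds:
            self.flush()

    def flush(self):
        """Emit the window's statistics; here we print, in practice
        they would be shipped to the monitoring backend."""
        for name in self.count:
            var = self.m2[name] / self.count[name]
            print(f"{name}: n={self.count[name]} mean={self.mean[name]:.3f} "
                  f"std={math.sqrt(var):.3f} missing={self.missing[name]}")
        self._reset()
```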
To better understand what should be exposed as metrics for monitoring, it is useful to classify potential data issues. The following list is by no means exhaustive; a sketch of corresponding checks follows it.
- Input data schema changes, e.g., a change in a feature's value type.
- Input data outliers, e.g., feature values the model has never seen.
- Input data drift, i.e., a change in the distribution of feature values.
- Missing features, i.e., feature values that contain None or empty values.
- Output data outliers, e.g., due to input data issues.
- Output data drift, e.g., due to input data issues.
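Here is one way such checks could look in practice (a sketch only: the expected schema, the z-score threshold, and the significance level are illustrative assumptions; the drift test uses SciPy's two-sample Kolmogorov-Smirnov test, which applies equally to input features and model outputs):

```python
import numpy as np
from scipy.stats import ks_2samp

EXPECTED_SCHEMA = {"amount": float, "merchant_risk": float}  # illustrative


def check_schema(features: dict) -> list:
    """Schema changes and missing features: unexpected types or None values."""
    issues = []
    for name, expected_type in EXPECTED_SCHEMA.items():
        if name not in features or features[name] is None:
            issues.append(f"missing feature: {name}")
        elif not isinstance(features[name], expected_type):
            issues.append(f"type change: {name} is {type(features[name]).__name__}")
    return issues


def check_outliers(values: np.ndarray, train_mean: float,
                   train_std: float, z: float = 6.0) -> np.ndarray:
    """Outliers: values far outside the training distribution."""
    return np.abs(values - train_mean) > z * train_std


def check_drift(train_sample: np.ndarray, window_sample: np.ndarray,
                alpha: float = 0.01):
    """Drift: compare the training sample against the current window."""
    stat, p_value = ks_2samp(train_sample, window_sample)
    return p_value < alpha, stat
```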
In order to detect and resolve the data issues listed above, a model monitoring system should provide two main capabilities:
- Automatic issue detection – real-time monitoring of feature and prediction statistics for every prediction, with alerting on detected issues.
- Root cause analysis tools – enabling more precise sampling of the detected problem and visual inspection.
To implement these capabilities, computing data statistics and storing data samples is necessary. I will discuss prediction logging strategies in a follow-up blog post, Prediction Logging for Model Serving.
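As a minimal illustration of storing samples (the record fields and the local JSONL file are assumptions for the sketch; the follow-up post covers real strategies), each prediction can be logged together with its inputs and model version so detected issues can later be traced back:

```python
import json
import time
import uuid


def log_prediction(features: dict, prediction, model_version: str,
                   path: str = "predictions.jsonl"):
    """Append one prediction record as a JSON line. In production this
    would go to a log pipeline or object storage, not a local file."""
    record = {
        "request_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model_version": model_version,
        "features": features,
        "prediction": prediction,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```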
The data monitoring system has two parts: a logger/agent that computes statistics over time windows or individual data items and stores samples, and a server side where these statistics are monitored for issues and visualized.
In some applications, it may be good enough to build such a monitoring system manually on the ELK stack or Prometheus, together with a logger/agent that computes data statistics. An ML-specific, integrated solution such as Graphsignal is another option if an end-to-end, out-of-the-box system is preferable.
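For the Prometheus route, the agent side could expose windowed feature statistics as metrics for the server to scrape (a sketch using the prometheus_client library; the metric names, port, and stats layout are assumptions):

```python
from prometheus_client import Gauge, start_http_server

# Illustrative gauges; Prometheus scrapes them, and alerting rules
# (e.g., in Alertmanager) flag anomalous values.
FEATURE_MEAN = Gauge(
    "feature_mean", "Windowed mean of a model input feature", ["feature"])
FEATURE_MISSING = Gauge(
    "feature_missing_count", "Missing values per window", ["feature"])


def export_window_stats(stats: dict):
    """Push the latest window's per-feature statistics to the gauges.
    `stats` maps feature name -> {"mean": float, "missing": int}."""
    for name, s in stats.items():
        FEATURE_MEAN.labels(feature=name).set(s["mean"])
        FEATURE_MISSING.labels(feature=name).set(s["missing"])


if __name__ == "__main__":
    start_http_server(9100)  # expose /metrics for Prometheus to scrape
    export_window_stats({"amount": {"mean": 52.3, "missing": 0}})
```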
When designing monitoring for models deployed in production, it is important to consider it in the context of the production application environment, and not just as part of the machine learning training environment or process.