2024 is shaping up to be the year when many applied Large Language Model (LLM) products from enterprise companies, beyond the creators of the foundation models themselves, move out of the Proof of Concept (POC) phase and into the hands of real customers. It will be a year of trial and error: some deployments will post fantastic user metrics, others less desirable ones, but nearly all will surface valuable insights. And we all know that what isn't measured won't be improved. So how do we measure and quantify how our deployed LLMs are doing? Given the significant upfront and hidden costs of training, fine-tuning, and maintaining these models, it's high time AI product leaders invested serious time and effort in post-deployment metrics and insights. Historically, this phase has received the least attention, but the earlier and more thoroughly we measure, the more Return On Investment (ROI) we can capture.
What is ML / LLM Observability?
Observability in the context of Machine Learning (ML) and Large Language Models (LLMs) refers to the ability to understand, monitor, and gain insights into the internal workings and behaviors of these models during training, validation, and inference. It involves tracking various metrics, logging relevant information, and visualizing key aspects to ensure that these systems are performing as expected and to troubleshoot issues when they arise. ML and LLM observability is crucial for maintaining model performance, understanding model behavior, and ensuring the reliability and effectiveness of these systems.
Here are some aspects of ML and LLM observability:
- Data Monitoring:
- Observing the input data distribution and characteristics to ensure that the model is trained on representative and relevant data.
- Model Performance Metrics:
- Tracking key metrics related to model performance, accuracy, precision, recall, and other relevant metrics during both training and inference phases.
- Resource Utilization:
- Monitoring the utilization of computational resources such as GPU/CPU usage, memory consumption, and storage to ensure efficient usage and identify potential bottlenecks.
- Training Progress:
- Observing the progress of model training, including metrics such as loss functions, learning curves, and convergence, to understand how well the model is learning from the data.
- Inference Latency:
- Measuring the time it takes for the model to make predictions during inference and ensuring that it meets any latency requirements, especially for real-time applications.
- Model Explainability:
- Implementing tools and methods to explain the decisions made by the model, providing insights into why a particular prediction was made, especially in critical applications where interpretability is essential.
- Error Analysis:
- Analyzing and understanding the types of errors the model is making, identifying patterns, and iteratively improving the model based on this feedback.
- Data Drift and Concept Drift:
- Detecting changes in the input data distribution (data drift) and changes in the relationship between input and output (concept drift) over time, and adapting the model accordingly.
- Model Versioning:
- Managing different versions of models, tracking changes, and maintaining a clear history to facilitate reproducibility and model governance.
- Security Monitoring:
- Observing and responding to potential security threats, such as adversarial attacks, to ensure the robustness and security of the ML or LLM system.
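To make the data-drift item above concrete, here is a minimal sketch of a drift check using a two-sample Kolmogorov-Smirnov test via SciPy. The function name and the `alpha` threshold are illustrative choices, not part of any standard API:

```python
# Minimal data-drift check: compare a live feature sample against the
# reference sample the model was trained on. Names here are illustrative.
from scipy.stats import ks_2samp

def detect_drift(reference, live, alpha=0.05):
    """Return (drifted, p_value): drifted is True when the live
    distribution differs significantly from the reference."""
    statistic, p_value = ks_2samp(reference, live)
    return p_value < alpha, p_value
```

In practice you would run a check like this per feature over a rolling window of production traffic and alert (or trigger retraining) when drift is flagged.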
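Similarly, the inference-latency item can be sketched with a small wrapper that times each model call; the class and method names below are illustrative, not from any particular observability library:

```python
# Minimal latency tracker: wraps a prediction call, records wall-clock
# time in milliseconds, and reports simple percentiles.
import time

class LatencyMonitor:
    def __init__(self):
        self.samples_ms = []

    def record(self, predict_fn, *args, **kwargs):
        # Time a single inference call and pass its result through.
        start = time.perf_counter()
        result = predict_fn(*args, **kwargs)
        self.samples_ms.append((time.perf_counter() - start) * 1000.0)
        return result

    def percentile(self, p):
        # Nearest-rank percentile: simple and dependency-free.
        ordered = sorted(self.samples_ms)
        index = max(0, int(round(p / 100.0 * len(ordered))) - 1)
        return ordered[index]
```

A monitor like this makes it easy to track p50/p95/p99 latencies against a real-time SLO and spot regressions after a model or hardware change.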
MLOps / LLMOps for AI Product Leaders
It’s time Product Managers working in the AI / LLM domain not only understood the technical concepts of ML and LLM Ops, but also kept up with the tools and frameworks emerging from the explosion of activity in the LLM space. Here are some of the newest (and a few already established) LLMOps tools and companies with promising observability frameworks and offerings:
- Arize AI
- Titan ML
- Zen ML
- Comet ML
- Aporia
- Valohai
- Deepset AI
- Lamini AI
- FedML
- Qwak
- StriveWorks
- Galileo
- Tecton AI
- Modular
- Weights and Biases
Image credits: Photo by Shubham Dhage on Unsplash