Monitor OpenAI API Latency, Tokens, Rate Limits, and More
By Dmitri Melikyan | 2 min read

Learn how to monitor and troubleshoot OpenAI API based applications in production using Graphsignal.

OpenAI APIs in Production Applications

The OpenAI API is very easy to use; however, there are a few challenges when it comes to production applications. The main production concerns are latency, rate limits, and costs, and the use of the API should be designed with these in mind. This includes properly managing rate limits, retrying failed requests, and making sure completions fit within the max_tokens limit. Here are a couple of useful guides directly from OpenAI that explain many of the solutions.
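As an illustration of the retry point, here is a minimal sketch of retrying with exponential backoff and jitter. The `RateLimitError` class below is a local stand-in for whatever exception your API client raises, and `call_with_backoff` with its parameters is a hypothetical helper, not part of the OpenAI or Graphsignal libraries.

```python
import random
import time


class RateLimitError(Exception):
    """Local stand-in for the rate limit exception raised by the API client."""


def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=30.0,
                      retry_on=(RateLimitError,), sleep=time.sleep):
    """Call fn(), retrying with exponential backoff and jitter on retry_on errors."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries, surface the error to the caller
            # The delay doubles on each attempt and is capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries from concurrent workers.
            sleep(delay * random.uniform(0.5, 1.0))
```

In an application, `fn` would be a zero-argument callable wrapping the actual completion request, and `retry_on` would list the client's real exception types.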

Additionally, to keep the use of the API reliable and scalable in a constantly evolving production environment, visibility into API performance, including data statistics, is necessary. A good start would be logging some requests or reporting metrics.
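A hand-rolled version of that starting point might look like the sketch below. It assumes the response is a dict with a `usage` object containing `prompt_tokens`, `completion_tokens`, and `total_tokens`; `timed_call` and the logger name are hypothetical.

```python
import logging
import time

logger = logging.getLogger("openai_metrics")


def timed_call(fn, *args, **kwargs):
    """Call fn, log latency and token usage, and return (response, record)."""
    start = time.perf_counter()
    try:
        response = fn(*args, **kwargs)
    except Exception as exc:
        # Failed requests are logged with their latency, then re-raised.
        logger.warning("request failed after %.3fs: %s",
                       time.perf_counter() - start, type(exc).__name__)
        raise
    usage = response.get("usage", {})  # assumed response shape
    record = {
        "latency_s": time.perf_counter() - start,
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
    }
    logger.info("openai request: %s", record)
    return response, record
```

This covers individual requests only; aggregating the records over time and across workers is left to whatever metrics backend is in use.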

Graphsignal offers a much simpler and, at the same time, more specialized way to achieve visibility and observability for AI applications that use hosted inference APIs or serve models.

Monitoring OpenAI API Calls With Graphsignal

Graphsignal can automatically instrument, trace, and monitor OpenAI API calls. It is only necessary to set up the agent by providing a Graphsignal API key and a deployment name. Sign up for a free account to get an API key.

import graphsignal

# Provide an API key directly or via GRAPHSIGNAL_API_KEY environment variable
graphsignal.configure(api_key='my-api-key', deployment='my-openai-app-prod')

See the Quick Start for complete setup instructions.

To demonstrate, I ran this example app, which makes OpenAI completion requests continuously. While it runs, sample traces and metrics are continuously recorded and available in the dashboard for analysis. Exceptions, such as RateLimitError, and latency outliers are also recorded automatically.

Graphsignal traces dashboard

We can look into a particular trace sample to answer questions about high latency or rate limits, and to see the data statistics of that call and whether they may have been the cause. We may also want to check the token count and whether the completion's finish reason was length instead of stop.
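The same finish reason check can also be made in application code. The sketch below assumes the response is a dict with a `choices` list whose entries carry a `finish_reason` field; `truncated_choices` is a hypothetical helper.

```python
def truncated_choices(response):
    """Return positions of choices that stopped because the token limit was hit."""
    return [
        i for i, choice in enumerate(response.get("choices", []))
        if choice.get("finish_reason") == "length"
    ]
```

A non-empty result suggests the completion was cut off, so max_tokens may need to be raised or the prompt shortened.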

Graphsignal trace dashboard

Additionally, performance metrics, data metrics, and resource utilization are available for every worker, to monitor applications over time and correlate any changes or issues.

Graphsignal metrics dashboard

Give it a try and let us know what you think. Follow us at @GraphsignalAI for updates.

Graphsignal also allows you to easily trace and monitor any function or code segment. See the Quick Start guide for instructions.