Observability

Introduction

Observability is the ability to understand the internal state of a system (e.g., a software application, microservices, or infrastructure) by analyzing its external outputs, such as logs, metrics, and traces. It enables teams to diagnose issues, optimize performance, and answer questions about system behavior without prior knowledge of what might be wrong.

Logs
- Timestamped records of discrete events (e.g., errors, user actions).
- Example: A FastAPI app logging an HTTP request error.
- Describes what the error is.
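
As a rough illustration of a log record, the sketch below uses only Python's standard `logging` module rather than FastAPI itself. The handler function, logger name, request path, and field names are hypothetical and chosen only for this example.

```python
import io
import logging


def build_logger(stream):
    """Return a logger that writes timestamped records to `stream`."""
    logger = logging.getLogger("api.requests")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    # %(asctime)s supplies the timestamp that makes each record a log event.
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def handle_request(logger, method, path, user_id):
    """Simulate a request handler that fails and logs what went wrong."""
    try:
        # Simulated failure; a real handler would call a database here.
        raise TimeoutError("database did not respond within 2s")
    except TimeoutError as exc:
        logger.error(
            "method=%s path=%s user_id=%s status=%d error=%s",
            method, path, user_id, 504, exc,
        )
        return 504


log_output = io.StringIO()
status = handle_request(build_logger(log_output), "GET", "/orders/42", "u-17")
print(log_output.getvalue(), end="")
```

The single record carries the timestamp, the severity, and the error message, which is the "what" of the failure. In a FastAPI application the same `logger.error(...)` call would typically sit inside a route or exception handler.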

Metrics
- Numerical measurements of system performance over time (e.g., CPU usage, request latency).
- Example: Prometheus tracking the number of API requests per second.
- Describes which type of error it is.
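
To make the idea of a metric concrete, here is a minimal sketch in plain Python. It does not use the Prometheus client library; the `RequestCounter` class and `per_second_rate` helper are hypothetical stand-ins that mimic how a monotonically increasing counter, sampled at two points in time, yields a requests-per-second figure.

```python
from collections import Counter


class RequestCounter:
    """Monotonically increasing counter keyed by (method, path, status)."""

    def __init__(self):
        self._counts = Counter()

    def inc(self, method, path, status):
        self._counts[(method, path, status)] += 1

    def total(self):
        return sum(self._counts.values())

    def total_errors(self):
        # Treat 5xx responses as errors for this example.
        return sum(n for (_, _, status), n in self._counts.items() if status >= 500)


def per_second_rate(earlier, later):
    """Rate between two (timestamp_seconds, counter_value) samples."""
    (t0, v0), (t1, v1) = earlier, later
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (v1 - v0) / (t1 - t0)


counter = RequestCounter()
sample_a = (0.0, counter.total())   # sample taken at t = 0 s

for _ in range(45):
    counter.inc("GET", "/orders", 200)
for _ in range(5):
    counter.inc("GET", "/orders", 503)

sample_b = (10.0, counter.total())  # sample taken at t = 10 s (simulated clock)

rate = per_second_rate(sample_a, sample_b)               # 50 requests / 10 s
error_ratio = counter.total_errors() / counter.total()   # 5 errors / 50 requests
print(f"requests/sec: {rate}, error ratio: {error_ratio}")
```

Because the counter is labeled by status code, the same numbers can be broken down by error type, which a single log line cannot show.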

Traces
- Records of the end-to-end journey of a request through distributed systems.
- Example: OpenTelemetry visualizing how a user login flows through authentication, database, and payment services.
- Describes where to find the error.
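
The sketch below illustrates the idea of a trace with a small, hand-rolled span recorder. It is not the OpenTelemetry API; the `Tracer` class, the span fields, and the simulated payment failure are assumptions made for illustration only.

```python
import time
import uuid
from contextlib import contextmanager


class Tracer:
    """Toy tracer: records spans that share one trace_id and link to a parent."""

    def __init__(self):
        self.trace_id = uuid.uuid4().hex
        self.spans = []    # finished spans, in completion order
        self._stack = []   # span_ids of spans that are currently open

    @contextmanager
    def span(self, name):
        span = {
            "trace_id": self.trace_id,
            "span_id": uuid.uuid4().hex[:16],
            "parent_id": self._stack[-1] if self._stack else None,
            "name": name,
            "status": "ok",
        }
        self._stack.append(span["span_id"])
        start = time.perf_counter()
        try:
            yield span
        except Exception:
            span["status"] = "error"
            raise
        finally:
            span["duration_ms"] = (time.perf_counter() - start) * 1000
            self._stack.pop()
            self.spans.append(span)


def login(tracer):
    """Simulated login flow; the payment step fails on purpose."""
    with tracer.span("login"):
        with tracer.span("authentication"):
            pass  # credentials check would happen here
        with tracer.span("database"):
            pass  # user record lookup would happen here
        with tracer.span("payment"):
            raise ConnectionError("payment service unreachable")


tracer = Tracer()
try:
    login(tracer)
except ConnectionError:
    pass  # the failure is already recorded on the spans

failed = [s["name"] for s in tracer.spans if s["status"] == "error"]
print("failed spans, innermost first:", failed)
```

The innermost failed span points at the service where the error originated, while the parent span shows which user-facing operation it broke.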
