BUILDING A PRODUCTION-GRADE MONITORING SYSTEM FOR FLASK WITH PROMETHEUS & GRAFANA

Introduction

In modern DevOps, monitoring is critical. Without visibility into your application’s performance, errors can go unnoticed until users complain.

In this guide, I’ll walk through how I built a Dockerized monitoring system for a Flask app using:

✅ Prometheus (metrics collection)

✅ Grafana (visualization)

✅ Docker Compose (easy deployment)

Step 1: Setting Up the Flask App with Metrics

1.1 Instrumenting Flask with Prometheus

First, I added Prometheus metrics to a simple Flask app:

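import time

from flask import Flask
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Histogram, generate_latest

app = Flask(__name__)

# Define metrics. The 'status' label is what the 5xx queries later in this guide filter on.
REQUEST_COUNT = Counter('http_requests_total', 'Total HTTP Requests', ['method', 'endpoint', 'status'])
RESPONSE_TIME = Histogram('http_response_time_seconds', 'Response time in seconds')

@app.route('/')
def home():
    start_time = time.time()

    # Simulate work
    time.sleep(0.2)

    # Record the request and its response time
    REQUEST_COUNT.labels(method='GET', endpoint='/', status='200').inc()
    RESPONSE_TIME.observe(time.time() - start_time)
    return "Hello, monitored world!"

@app.route('/metrics')
def metrics():
    # Expose all registered metrics in the Prometheus text format
    return generate_latest(), 200, {'Content-Type': CONTENT_TYPE_LATEST}

if __name__ == '__main__':
    # Listen on all interfaces so other containers can reach the app;
    # port 8000 matches the Prometheus scrape target (flask-app:8000)
    app.run(host='0.0.0.0', port=8000)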
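The start_time / observe pattern in home() has to be repeated in every view. One way to factor it out is a small decorator. The sketch below is not part of the original app: the names timed and slow_view are hypothetical, and observe can be any callable that accepts a duration in seconds, such as RESPONSE_TIME.observe. A plain list stands in for the histogram so the example runs without Flask or Prometheus installed.

```python
import functools
import time


def timed(observe):
    """Return a decorator that reports each call's duration (seconds) to `observe`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs even if the view raises, so failed requests are timed too
                observe(time.perf_counter() - start)
        return wrapper
    return decorator


# Stand-in for RESPONSE_TIME.observe: collect durations in a list
durations = []


@timed(durations.append)
def slow_view():
    time.sleep(0.05)  # simulate work
    return "Hello, monitored world!"


result = slow_view()
```

In the Flask app this would read @timed(RESPONSE_TIME.observe), placed under @app.route('/'). prometheus_client also ships a built-in equivalent: @RESPONSE_TIME.time().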

1.2 Dockerizing the Flask App

I containerized the app for reproducibility:

# app/Dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "app.py"]

Step 2: Configuring Prometheus

2.1 Prometheus Configuration

I set up prometheus.yml to scrape Flask metrics:

# prometheus/prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'flask-app'
    metrics_path: '/metrics'
    static_configs:
      - targets: ['flask-app:8000']  # Docker service name
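
Why This Matters:

scrape_interval: 15s means Prometheus pulls metrics from every target every 15 seconds.

The target is listed statically; Prometheus reaches it because Docker Compose's internal DNS resolves the service name flask-app to the container.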
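To make concrete what Prometheus receives on each scrape, here is a rough sketch of a parser for the text format served at /metrics. The sample payload is hand-written for illustration (it is not captured output), parse_metrics is a hypothetical helper, and the parser is deliberately simplified: it skips comment lines and does not unescape label values.

```python
import re

# metric_name{label="value",...} value [timestamp]
SAMPLE_LINE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Hand-written example of a /metrics response (illustrative values)
SAMPLE = """\
# HELP http_requests_total Total HTTP Requests
# TYPE http_requests_total counter
http_requests_total{endpoint="/",method="GET"} 3.0
http_response_time_seconds_bucket{le="0.25"} 2.0
http_response_time_seconds_count 3.0
http_response_time_seconds_sum 0.61
"""


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blanks and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # ignore anything this simplified pattern cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels or ''))
        samples.append((name, labels, float(value)))
    return samples


samples = parse_metrics(SAMPLE)
for name, labels, value in samples:
    print(name, labels, value)
```

Each tuple corresponds to one time series sample that Prometheus stores, with the job and instance labels added on its side at scrape time.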

2.2 Adding System Metrics (Node Exporter)

For server monitoring, I added Node Exporter:

# In docker-compose.yml
services:
  node-exporter:
    image: prom/node-exporter
    ports:
      - "9100:9100"

Then I appended a job to the existing scrape_configs list in prometheus.yml:

scrape_configs:
  # (the flask-app job from step 2.1 stays here)
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

Step 3: Visualizing Data in Grafana

3.1 Pre-Built Dashboards

I imported these dashboards (Grafana IDs):

  1. Node Exporter Full (1860) (CPU/RAM/Disk)

  2. HTTP Metrics (13230) (Flask monitoring)

Steps: Go to Grafana → "+" → "Import" → Enter ID. Select Prometheus as the data source.

3.2 Custom Dashboard for Flask

I created a custom dashboard to track:

  1. Request rate (rate(http_requests_total[5m]))

  2. Error rate (rate(http_requests_total{status=~"5.."}[5m]))

  3. Latency (histogram_quantile(0.95, rate(http_response_time_seconds_bucket[5m])))
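The queries above rely on two PromQL functions, rate() and histogram_quantile(). The sketch below approximates what each one computes, in plain Python. It is a simplification under stated assumptions: simple_rate handles counter resets but skips the window-edge extrapolation that PromQL performs, histogram_quantile assumes observations are spread evenly inside a bucket, and all sample values are made up for illustration.

```python
import math


def simple_rate(samples):
    """Per-second increase over a list of (timestamp, counter_value) samples."""
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        delta = cur - prev
        # A drop means the counter restarted from zero, so count the new value
        increase += cur if delta < 0 else delta
    return increase / (samples[-1][0] - samples[0][0])


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Quantile lies in the +Inf bucket: report the highest finite bound
                return lower_bound
            # Linear interpolation inside the bucket
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan


# Illustrative counter samples, 15s apart, with a restart before t=45
counter_samples = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]
request_rate = simple_rate(counter_samples)

# Illustrative cumulative latency buckets (seconds)
latency_buckets = [(0.1, 10), (0.25, 60), (0.5, 90), (1.0, 100), (math.inf, 100)]
p95 = histogram_quantile(0.95, latency_buckets)  # about 0.75 with these buckets
```

Because the quantile is interpolated, its accuracy depends on bucket boundaries: a p95 that falls in a wide bucket is only a rough estimate.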

Step 4: Setting Up Alerts

4.1 Alert Rules in Grafana

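I configured alerts for:

  1. High error rate (>5% 5xx responses for 5min)

  2. Slow responses (p95 latency >1s)

Example alert rule, written in Prometheus alerting-rule syntax. Both sides are aggregated with sum by (instance); dividing the raw series would match each 5xx series only against itself and always yield 1.

- alert: HighErrorRate
  expr: sum by (instance) (rate(http_requests_total{status=~"5.."}[5m])) / sum by (instance) (rate(http_requests_total[5m])) > 0.05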
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "High error rate on {{ $labels.instance }}"
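The for: 5m clause is what keeps a brief spike from paging anyone: the condition has to hold continuously for five minutes before the alert fires. A minimal sketch of that state machine follows. The function name evaluate_alert, the 60-second evaluation interval, and the error ratios are all assumptions made for illustration.

```python
def evaluate_alert(ratios, threshold=0.05, for_seconds=300, interval=60):
    """Return the alert state after each evaluation.

    ratios: error ratio observed at each evaluation, `interval` seconds apart.
    """
    states = []
    active_since = None  # time at which the condition first became true
    for i, ratio in enumerate(ratios):
        now = i * interval
        if ratio > threshold:
            if active_since is None:
                active_since = now
            held_long_enough = now - active_since >= for_seconds
            states.append("firing" if held_long_enough else "pending")
        else:
            # Any evaluation below the threshold resets the timer
            active_since = None
            states.append("inactive")
    return states


# Illustrative error ratios, one per minute
ratios = [0.01, 0.08, 0.09, 0.10, 0.07, 0.06, 0.08, 0.02]
states = evaluate_alert(ratios)
for minute, state in enumerate(states):
    print(f"t={minute}m ratio={ratios[minute]:.2f} -> {state}")
```

With these numbers the alert stays pending for five evaluations, fires on the sixth consecutive breach, and clears as soon as the ratio drops back under 5%.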

4.2 Slack Notifications

I integrated Slack using Grafana’s webhook:

Created a Slack incoming webhook. Added it in Grafana → "Alerting" → "Notification channels" (named "Contact points" in Grafana versions that use unified alerting).
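Before wiring the webhook into Grafana, it can help to confirm that it works on its own. The sketch below builds the JSON POST that a Slack incoming webhook accepts, using only the standard library. The URL is a placeholder, the helper names are hypothetical, and the actual send is left commented out so nothing is posted by accident.

```python
import json
import urllib.request


def build_slack_request(webhook_url, text):
    """Build the POST request for a Slack incoming webhook."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_message(webhook_url, text, timeout=5):
    """Send the message and return the HTTP status code."""
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Placeholder: substitute the URL Slack generated for your workspace
WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"

request = build_slack_request(WEBHOOK_URL, "Test alert: HighErrorRate on flask-app")
# send_slack_message(WEBHOOK_URL, "Test alert: HighErrorRate on flask-app")
```

If the test message shows up in the channel, the same URL can be pasted into Grafana.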

Step 5: Deployment with Docker Compose

Final docker-compose.yml:

version: '3.8'

services:
  flask-app:
    build: ./app
    ports:
      - "5000:5000"
      - "8000:8000"

  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus:/etc/prometheus
    ports:
      - "9090:9090"

  node-exporter:
    image: prom/node-exporter
    ports:
      - "9100:9100"

  grafana:
    image: grafana/grafana
    volumes:
      - grafana-storage:/var/lib/grafana
    ports:
      - "3000:3000"

volumes:
  grafana-storage:

Deploy with one command:

docker-compose up --build -d
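Once the containers are up, a quick smoke test can confirm that each service answers. The sketch below assumes the host ports from the docker-compose.yml above, that the Flask app listens on port 8000, and that Prometheus and Grafana expose their usual /-/ready and /api/health endpoints. The function names are hypothetical.

```python
import urllib.error
import urllib.request

# Host ports taken from the docker-compose.yml above (an assumption
# if your mappings differ)
ENDPOINTS = {
    "flask-app": "http://localhost:8000/metrics",
    "prometheus": "http://localhost:9090/-/ready",
    "grafana": "http://localhost:3000/api/health",
}


def http_status(url, timeout=3):
    """Return the HTTP status code for url, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, timeout, ...


def check_services(endpoints, fetch=http_status):
    """Map each service name to True if its endpoint returned HTTP 200."""
    return {name: fetch(url) == 200 for name, url in endpoints.items()}


def main():
    for name, healthy in check_services(ENDPOINTS).items():
        print(f"{name}: {'up' if healthy else 'DOWN'}")

# Call main() after `docker-compose up` has finished starting the containers.
```

The fetch parameter exists so the check can be exercised without a running stack, for example by passing a stub that returns fixed status codes.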

Lessons Learned

🔹 Docker networking is crucial – Services communicate via hostnames (flask-app:8000).

🔹 Alert thresholds should be realistic – Avoid alert fatigue.

🔹 Grafana variables make dashboards reusable – Filter by endpoint, status_code, etc.

GitHub Repo HERE
