How to Deploy FastAPI on VPS (2025) — Production Guide

Why FastAPI for Production APIs

FastAPI is a Python web framework for building high-performance APIs. It is built on top of Starlette for the web layer and Pydantic for data validation. FastAPI applications run on the ASGI standard, which enables async request handling without the limitations of synchronous WSGI frameworks.

Performance benchmarks consistently place FastAPI among the fastest Python frameworks, rivaling Node.js and Go frameworks for common API workloads. Its performance comes from leveraging Python's asyncio for concurrent request handling and uvicorn for the ASGI server. In production, Gunicorn manages multiple uvicorn worker processes, providing process management, graceful restarts, and multi-core utilization.

FastAPI's automatic OpenAPI documentation (Swagger UI and ReDoc) is generated from your code at runtime. Every endpoint, request model, and response model is documented without writing separate documentation files. This feature alone saves significant development time for teams building APIs that need to be consumed by frontend applications, mobile clients, or third-party integrations.

Pydantic's type validation catches invalid data at the request boundary before it reaches your business logic. This eliminates an entire class of bugs related to malformed input and provides clear, structured error responses. Combined with Python's type hints, FastAPI provides IDE autocompletion and static type checking that make large codebases maintainable.

Prerequisites

Requirement	Details
Operating System	Ubuntu 22.04 LTS or 24.04 LTS
Root Access	SSH access with root privileges or a sudo user
RAM	Minimum 1GB (2GB+ recommended for multiple Gunicorn workers)
Disk Space	Minimum 5GB free
Domain Name	A domain with A record pointing to your VPS IP (for SSL)
Ports	80 and 443 open

Step 1: Update System and Install Dependencies

Start by updating the system packages and installing the build dependencies required for Python packages that include C extensions (such as uvloop and cryptography).

ssh root@your-vps-ip
apt update && apt upgrade -y
apt install -y python3 python3-venv python3-pip build-essential libssl-dev libffi-dev curl git nginx certbot python3-certbot-nginx

Step 2: Install uv Package Manager

uv is a fast Python package manager written in Rust that can replace pip and virtualenv. Its maintainers report installs 10-100x faster than pip, and it creates and manages virtual environments with a single command.

curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env

Verify the installation:

uv --version

Step 3: Create the Project Structure

A well-organized project structure makes deployment, maintenance, and scaling straightforward. Create the following directory layout:

mkdir -p /opt/fastapi-app/{app,config,logs}
cd /opt/fastapi-app

The final structure looks like this:

/opt/fastapi-app/
├── app/
│   ├── __init__.py
│   ├── main.py          # FastAPI application entry point
│   ├── api/
│   │   ├── __init__.py
│   │   └── routes.py    # API route definitions
│   └── core/
│       ├── __init__.py
│       └── config.py    # Configuration and environment variables
├── config/
│   ├── gunicorn.conf.py # Gunicorn configuration
│   └── .env             # Environment variables
├── logs/
│   ├── app.log
│   └── access.log
├── requirements.txt
└── start.sh

Step 4: Create the FastAPI Application

Create the application code files. Start with the configuration module that handles environment variables.

mkdir -p /opt/fastapi-app/app/{api,core}
touch /opt/fastapi-app/app/__init__.py
touch /opt/fastapi-app/app/api/__init__.py
touch /opt/fastapi-app/app/core/__init__.py

cat > /opt/fastapi-app/app/core/config.py << 'EOF'
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Values from the process environment override values from the .env file.
    model_config = SettingsConfigDict(
        env_file="/opt/fastapi-app/config/.env",
        extra="ignore",  # tolerate .env keys that have no matching field below
    )

    app_name: str = "FastAPI Production App"
    debug: bool = False
    log_level: str = "info"
    workers: int = 4
    host: str = "127.0.0.1"
    port: int = 8000
    allowed_origins: list[str] = ["*"]

settings = Settings()
EOF

cat > /opt/fastapi-app/app/api/routes.py << 'EOF'
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel
from datetime import datetime, timezone

router = APIRouter()

class HealthResponse(BaseModel):
    status: str
    timestamp: datetime
    version: str

class ItemResponse(BaseModel):
    id: int
    name: str
    description: str

# In-memory demo data; each Gunicorn worker process holds its own copy.
ITEMS_DB = [
    {"id": 1, "name": "First Item", "description": "Description for item 1"},
    {"id": 2, "name": "Second Item", "description": "Description for item 2"},
]

@router.get("/health", response_model=HealthResponse)
async def health_check():
    return HealthResponse(
        status="ok",
        # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC time
        timestamp=datetime.now(timezone.utc),
        version="1.0.0"
    )

@router.get("/items", response_model=list[ItemResponse])
async def get_items():
    return ITEMS_DB

@router.get("/items/{item_id}", response_model=ItemResponse)
async def get_item(item_id: int):
    for item in ITEMS_DB:
        if item["id"] == item_id:
            return item
    raise HTTPException(status_code=404, detail="Item not found")

@router.post("/items", response_model=ItemResponse, status_code=201)
async def create_item(item: ItemResponse):
    ITEMS_DB.append(item.model_dump())  # .dict() is deprecated in Pydantic v2
    return item
EOF

cat > /opt/fastapi-app/app/main.py << 'EOF'
import logging
import sys
from fastapi import FastAPI, Request
from fastapi.middleware.cors import CORSMiddleware
from fastapi.middleware.gzip import GZipMiddleware
from fastapi.responses import JSONResponse
from app.core.config import settings
from app.api.routes import router

# Configure logging
logging.basicConfig(
    level=getattr(logging, settings.log_level.upper()),
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
    handlers=[
        logging.StreamHandler(sys.stdout),
        logging.FileHandler("/opt/fastapi-app/logs/app.log"),
    ]
)
logger = logging.getLogger("fastapi-app")

app = FastAPI(
    title=settings.app_name,
    version="1.0.0",
    docs_url="/docs" if settings.debug else None,
    redoc_url="/redoc" if settings.debug else None,
)

# Middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=settings.allowed_origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
app.add_middleware(GZipMiddleware, minimum_size=1000)

# Include routers
app.include_router(router, prefix="/api/v1")

@app.exception_handler(Exception)
async def global_exception_handler(request: Request, exc: Exception):
    logger.error(f"Unhandled exception: {exc}", exc_info=True)
    return JSONResponse(
        status_code=500,
        content={"detail": "Internal server error"}
    )

@app.on_event("startup")
async def startup_event():
    logger.info(f"Starting {settings.app_name}")

@app.on_event("shutdown")
async def shutdown_event():
    logger.info("Shutting down application")
EOF

Step 5: Create Requirements and Install Dependencies

cat > /opt/fastapi-app/requirements.txt << 'EOF'
fastapi==0.115.0
uvicorn[standard]==0.30.0
gunicorn==22.0.0
pydantic-settings==2.3.0
python-multipart==0.0.9
EOF

Create a virtual environment with uv and install the dependencies:

cd /opt/fastapi-app
uv venv
source .venv/bin/activate
uv pip install -r requirements.txt

Step 6: Configure Gunicorn

Gunicorn is a production-grade WSGI server and process manager. It does not speak ASGI itself; for FastAPI it runs uvicorn's worker class (uvicorn.workers.UvicornWorker), and each worker serves the ASGI application. The configuration file controls the number of workers, timeouts, logging, and process management behavior.

cat > /opt/fastapi-app/config/gunicorn.conf.py << 'EOF'
import multiprocessing
import os

# Server socket
bind = "127.0.0.1:8000"
backlog = 2048

# Worker processes
workers = int(os.environ.get("WORKERS", multiprocessing.cpu_count() * 2 + 1))
worker_class = "uvicorn.workers.UvicornWorker"
worker_connections = 1000
timeout = 120
keepalive = 5
max_requests = 5000
max_requests_jitter = 500

# Logging
accesslog = "/opt/fastapi-app/logs/access.log"
errorlog = "/opt/fastapi-app/logs/app.log"
loglevel = "info"
access_log_format = '%(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s" %(D)s'

# Security
limit_request_line = 8190
limit_request_fields = 100
limit_request_field_size = 8190

# Process naming
proc_name = "fastapi-app"

# Preload application (shares memory between workers)
preload_app = True
EOF

Key configuration decisions: The worker count defaults to (CPU cores * 2) + 1, which is the standard recommendation for I/O-bound applications; the WORKERS environment variable overrides it. The max_requests and max_requests_jitter settings enable graceful worker recycling: Gunicorn adds a random offset between 0 and max_requests_jitter to max_requests, so each worker handles between 5000 and 5500 requests before being restarted. This prevents memory leaks from accumulating in long-running workers and staggers the restarts so that workers do not all recycle at once.
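The recycling arithmetic can be sketched in a few lines. This is an illustration of how the two settings combine, not Gunicorn's own code; the function name recycle_threshold and the fixed random seed are assumptions made for the example.

```python
import random


def recycle_threshold(max_requests: int, max_requests_jitter: int, rng: random.Random) -> int:
    """Number of requests a worker serves before it is restarted."""
    # The jitter is a per-worker random offset in [0, max_requests_jitter]
    # that is added on top of max_requests.
    return max_requests + rng.randint(0, max_requests_jitter)


rng = random.Random(0)  # fixed seed only so the illustration is repeatable
thresholds = [recycle_threshold(5000, 500, rng) for _ in range(4)]
print(thresholds)  # four values, each between 5000 and 5500
```

Because every worker draws its own offset, the four workers reach their limits at different times instead of restarting together.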

preload_app = True loads the application code in the master process before forking workers. Workers then share the loaded code pages via copy-on-write, which can reduce overall memory usage. The savings vary, because CPython's reference counting writes to shared pages over time and gradually un-shares them.

Important: If your application uses database connection pools or similar resources, you may need to close and recreate them in the post_fork hook when using preload_app. Without this, all workers share the same connection objects, which can cause protocol errors with PostgreSQL and other databases.

Step 7: Create the Environment File

cat > /opt/fastapi-app/config/.env << 'EOF'
APP_NAME=FastAPI Production App
DEBUG=false
LOG_LEVEL=info
WORKERS=4
HOST=127.0.0.1
PORT=8000
# Single quotes keep the JSON double quotes intact when systemd parses this file
ALLOWED_ORIGINS='["https://yourdomain.com"]'
EOF

Secure the environment file:

chmod 600 /opt/fastapi-app/config/.env

Step 8: Create the Systemd Service File

Systemd manages the FastAPI process lifecycle: it starts the application on boot, restarts it on failure, captures logs, and allows you to manage the service with standard systemctl commands. This is more reliable than running the application in a screen or tmux session, which does not restart the process after a crash or a reboot.

cat > /etc/systemd/system/fastapi-app.service << 'EOF'
[Unit]
Description=FastAPI Application
After=network.target

[Service]
Type=notify
User=root
Group=root
WorkingDirectory=/opt/fastapi-app
Environment="PATH=/opt/fastapi-app/.venv/bin"
EnvironmentFile=/opt/fastapi-app/config/.env
ExecStart=/opt/fastapi-app/.venv/bin/gunicorn -c /opt/fastapi-app/config/gunicorn.conf.py app.main:app
ExecReload=/bin/kill -s HUP $MAINPID
Restart=always
RestartSec=5
TimeoutStopSec=30

# Security hardening
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/opt/fastapi-app/logs
PrivateTmp=true

# Resource limits
LimitNOFILE=65536

# Logging
StandardOutput=append:/opt/fastapi-app/logs/app.log
StandardError=append:/opt/fastapi-app/logs/app.log

[Install]
WantedBy=multi-user.target
EOF

Enable and start the service:

systemctl daemon-reload
systemctl enable fastapi-app
systemctl start fastapi-app

Check the service status:

systemctl status fastapi-app

Test that the application is responding:

curl http://127.0.0.1:8000/api/v1/health

The response should be: {"status":"ok","timestamp":"...","version":"1.0.0"}

Useful Systemd Commands

# View live logs
journalctl -u fastapi-app -f

# Restart the application
systemctl restart fastapi-app

# Reload without downtime (if supported by your app)
systemctl reload fastapi-app

# Stop the application
systemctl stop fastapi-app

# Check recent logs
journalctl -u fastapi-app --no-pager -n 50

Step 9: Configure Nginx Reverse Proxy

Nginx sits in front of Gunicorn and handles TLS termination, static file serving, request buffering, and rate limiting. The Gunicorn port (8000) should never be exposed directly to the internet.

cat > /etc/nginx/sites-available/fastapi-app << 'EOF'
# Rate limiting zone
limit_req_zone $binary_remote_addr zone=api:10m rate=30r/s;

# Upstream Gunicorn server
upstream fastapi_backend {
    server 127.0.0.1:8000;
    keepalive 32;
}

server {
    listen 80;
    server_name yourdomain.com www.yourdomain.com;

    # Certbot challenge path
    location /.well-known/acme-challenge/ {
        root /var/www/html;
    }

    # Redirect all HTTP to HTTPS
    location / {
        return 301 https://$host$request_uri;
    }
}

server {
    listen 443 ssl http2;
    server_name yourdomain.com www.yourdomain.com;

    # SSL certificates (will be configured by certbot)
    ssl_certificate /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;

    # SSL configuration
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384;
    ssl_prefer_server_ciphers off;
    ssl_session_cache shared:SSL:10m;
    ssl_session_timeout 1d;
    ssl_session_tickets off;

    # Security headers
    add_header Strict-Transport-Security "max-age=63072000; includeSubDomains" always;
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-XSS-Protection "1; mode=block" always;
    add_header Referrer-Policy "strict-origin-when-cross-origin" always;

    # Gzip compression
    gzip on;
    gzip_types text/plain application/json application/javascript text/css;
    gzip_min_length 1000;

    # Request size limit
    client_max_body_size 10M;

    # API proxy
    location / {
        limit_req zone=api burst=50 nodelay;

        proxy_pass http://fastapi_backend;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Connection "";

        # Timeouts
        proxy_connect_timeout 10s;
        proxy_read_timeout 120s;
        proxy_send_timeout 120s;

        # Buffering
        proxy_buffering on;
        proxy_buffer_size 4k;
        proxy_buffers 8 16k;
        proxy_busy_buffers_size 32k;
    }

    # Health check endpoint (no rate limiting)
    location /api/v1/health {
        proxy_pass http://fastapi_backend;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header Connection "";
        access_log off;
    }

    # Block access to hidden files
    location ~ /\. {
        deny all;
        access_log off;
        log_not_found off;
    }
}
EOF

Enable the Nginx configuration by creating a symlink and removing the default configuration:

ln -sf /etc/nginx/sites-available/fastapi-app /etc/nginx/sites-enabled/
rm -f /etc/nginx/sites-enabled/default
nginx -t

The nginx -t command validates the configuration syntax. If it reports "syntax is ok" and "test is successful," reload Nginx with systemctl reload nginx. If the SSL certificate files do not exist yet (first deployment), nginx -t fails because the ssl_certificate paths are missing: temporarily comment out the SSL server block, reload Nginx, obtain the certificate in Step 10, then uncomment the block and reload again.
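To build intuition for the limit_req settings in the server block above (rate=30r/s, burst=50, nodelay), here is a simplified Python model of a leaky-bucket limiter. It is a sketch of the general technique under the assumption that each request adds one unit to a bucket that drains at the configured rate; it is not Nginx's implementation, and the class name LeakyBucket is made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeakyBucket:
    """Simplified model of `limit_req zone=api burst=50 nodelay` at rate=30r/s."""

    rate: float = 30.0            # requests drained per second (rate=30r/s)
    burst: int = 50               # extra requests tolerated above the rate
    excess: float = 0.0           # requests currently held in the bucket
    last: Optional[float] = None  # time of the previous accepted request

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` (seconds) passes."""
        if self.last is None:
            self.last = now
            return True
        # Drain the bucket for the elapsed time, then add this request.
        excess = max(self.excess - self.rate * (now - self.last), 0.0) + 1.0
        if excess > self.burst:
            return False  # rejected; Nginx answers 503 by default
        self.excess = excess
        self.last = now
        return True


bucket = LeakyBucket()
accepted = sum(bucket.allow(now=0.0) for _ in range(100))
print(accepted)  # 51: the first request plus a burst of 50
```

In this model a client that sends 100 requests at the same instant gets 51 through, and the bucket then frees capacity at 30 requests per second. With nodelay, accepted burst requests are forwarded immediately rather than being spaced out.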

Step 10: Obtain SSL Certificate

Obtain a free SSL certificate from Let's Encrypt using Certbot. Ensure your domain DNS A record is already pointing to your VPS IP before running this command.

certbot --nginx -d yourdomain.com -d www.yourdomain.com --non-interactive --agree-tos --email your-email@example.com --redirect

Certbot automatically modifies your Nginx configuration to include the SSL certificate paths and sets up HTTP to HTTPS redirection. Verify the SSL certificate is working:

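curl -s -o /dev/null -D - https://yourdomain.com/api/v1/health

The response headers should show HTTP/2 200 and include the Strict-Transport-Security header. Use a GET request as shown: curl -I sends a HEAD request, which FastAPI answers with 405 Method Not Allowed for routes declared with @router.get.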

Set up automatic certificate renewal. The Ubuntu certbot package schedules renewals for you (through a systemd timer named certbot.timer), but verify that renewal works:

certbot renew --dry-run

Step 11: Configure Firewall

Restrict inbound traffic to only SSH, HTTP, and HTTPS. The Gunicorn port (8000) must not be exposed externally — it should only be accessible from localhost through the Nginx reverse proxy.

ufw default deny incoming
ufw default allow outgoing
ufw allow 22/tcp
ufw allow 80/tcp
ufw allow 443/tcp
ufw enable
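To confirm the rules from a machine outside the VPS, a small TCP probe is enough. The sketch below uses only the standard library; the function name is_port_open is an assumption for the example, and yourdomain.com is the same placeholder used elsewhere in this guide.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def firewall_report(host: str) -> dict:
    """Probe the ports this guide cares about and return {port: is_open}."""
    return {port: is_port_open(host, port) for port in (22, 80, 443, 8000)}
```

Running firewall_report("yourdomain.com") from your workstation should report ports 22, 80, and 443 as open and port 8000 as closed. A probe run on the VPS itself is not meaningful, because Gunicorn listens on localhost.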

Environment Variables in Production

Never hardcode configuration values in your source code. Use environment variables for all settings that differ between environments (development, staging, production). The pydantic-settings package reads variables from the .env file and environment, with environment variables taking precedence.

# Set environment variables in the systemd service
# Or add them to /opt/fastapi-app/config/.env

DATABASE_URL=postgresql+asyncpg://user:password@localhost:5432/mydb
REDIS_URL=redis://localhost:6379/0
SECRET_KEY=your-secret-key-here
API_KEY=your-api-key-here
SMTP_HOST=smtp.example.com
SMTP_PORT=587
SMTP_USER=noreply@example.com
SMTP_PASSWORD=your-smtp-password

For secrets, consider using a secrets manager or at minimum, restricting the .env file permissions. Never commit the .env file to version control.
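For values such as SECRET_KEY, generate the secret rather than inventing one by hand. A minimal sketch using Python's standard secrets module (the helper name generate_secret_key is an assumption for the example):

```python
import secrets


def generate_secret_key(num_bytes: int = 32) -> str:
    """Return a URL-safe random string suitable for a SECRET_KEY value."""
    return secrets.token_urlsafe(num_bytes)


# Prints a line that can be pasted into /opt/fastapi-app/config/.env
print(f"SECRET_KEY={generate_secret_key()}")
```

Each run prints a different value; store it only in the .env file or your secrets manager.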

Logging and Monitoring

Application Logging

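The logging configuration in app/main.py writes to stdout and to logs/app.log. Because the systemd unit from Step 8 redirects stdout and stderr to the same app.log, application messages are not stored in the journal, and each one appears twice in the file; drop either the FileHandler or the StandardOutput/StandardError lines if you want a single copy. Use journalctl for service start, stop, and restart events, and the log files for application output and analysis.

# Service lifecycle events (start, stop, restart)
journalctl -u fastapi-app -f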

# Access logs
tail -f /opt/fastapi-app/logs/access.log

# Application error logs
tail -f /opt/fastapi-app/logs/app.log

Log Rotation

Set up logrotate to prevent log files from consuming all disk space:

cat > /etc/logrotate.d/fastapi-app << 'EOF'
/opt/fastapi-app/logs/*.log {
    daily
    missingok
    rotate 30
    compress
    delaycompress
    notifempty
    # systemd, Gunicorn, and the app's FileHandler all keep these files open,
    # so truncate in place instead of renaming the file and signalling the service.
    copytruncate
}
EOF

Monitoring with Prometheus

Add Prometheus metrics to your FastAPI application for production monitoring. Install the required packages and add a metrics endpoint.

source /opt/fastapi-app/.venv/bin/activate
uv pip install prometheus-fastapi-instrumentator

Add the instrumentator to your main.py file:

from prometheus_fastapi_instrumentator import Instrumentator

# After creating the app
Instrumentator().instrument(app).expose(app, endpoint="/metrics")

Configure Prometheus to scrape your application metrics. Add this to your Prometheus scrape configuration:

- job_name: 'fastapi-app'
  scrape_interval: 15s
  static_configs:
    - targets: ['localhost:8000']
  metrics_path: '/metrics'

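Key metrics to monitor: request latency (histogram), request count by endpoint and status code, active connections, worker process count, and memory usage per worker. With several Gunicorn workers, each scrape is answered by whichever worker receives it, so configure prometheus_client's multiprocess mode (the PROMETHEUS_MULTIPROC_DIR environment variable) to get aggregated numbers. Also block /metrics in Nginx or restrict it to the Prometheus host so that it is not publicly readable.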

Docker Alternative Deployment

If you prefer containerized deployment, you can run FastAPI in Docker with Gunicorn. This approach is useful when you want reproducible builds or need to orchestrate the API with other services (databases, caches, message queues).

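cat > /opt/fastapi-app/.dockerignore << 'EOF'
.venv
logs
config/.env
EOF

cat > /opt/fastapi-app/Dockerfile << 'EOF'
FROM python:3.12-slim AS builder
WORKDIR /app
RUN pip install uv
COPY requirements.txt .
RUN uv pip install --system --no-cache-dir -r requirements.txt

FROM python:3.12-slim
# Use the same path as the bare-metal layout: the app and the Gunicorn config
# hard-code /opt/fastapi-app for the log files and the .env file.
WORKDIR /opt/fastapi-app
COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
COPY --from=builder /usr/local/bin/gunicorn /usr/local/bin/gunicorn
COPY --from=builder /usr/local/bin/uvicorn /usr/local/bin/uvicorn
COPY . .
RUN mkdir -p logs

EXPOSE 8000
# Override the 127.0.0.1 bind from gunicorn.conf.py so the published port works.
CMD ["gunicorn", "-c", "config/gunicorn.conf.py", "--bind", "0.0.0.0:8000", "app.main:app"]
EOF

cat > /opt/fastapi-app/docker-compose.yml << 'EOF'
services:
  api:
    build: .
    container_name: fastapi-app
    ports:
      - "127.0.0.1:8000:8000"  # publish on localhost only; Nginx proxies to it
    environment:
      - DEBUG=false
      - LOG_LEVEL=info
      - WORKERS=4
    volumes:
      - ./logs:/opt/fastapi-app/logs
      - ./config/.env:/opt/fastapi-app/config/.env:ro
    restart: unless-stopped
    healthcheck:
      # python:3.12-slim ships without curl, so probe with the standard library
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/api/v1/health', timeout=5)"]
      interval: 30s
      timeout: 10s
      retries: 3
EOF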

Build and run the Docker container:

cd /opt/fastapi-app
docker compose up -d --build

Performance Tuning

Worker Count

The formula workers = (CPU cores * 2) + 1 works well for I/O-bound APIs. For CPU-bound workloads (heavy computation, image processing), use workers = CPU cores + 1 instead. On a 2-core VPS, the I/O-bound formula gives 5 workers, and the CPU-bound formula gives 3 workers.
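The two formulas are easy to encode, for example to pick a value for the WORKERS variable. In the sketch below, the function name recommended_workers and the optional RAM cap are assumptions added for illustration; the 150 MB per-worker figure is hypothetical and should be replaced with a measurement of your own application.

```python
from typing import Optional


def recommended_workers(
    cpu_cores: int,
    io_bound: bool = True,
    ram_mb: Optional[int] = None,
    per_worker_mb: int = 150,  # hypothetical footprint; measure your own app
) -> int:
    """Apply the worker formulas from this section, optionally capped by RAM."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    by_cpu = cpu_cores * 2 + 1 if io_bound else cpu_cores + 1
    if ram_mb is None:
        return by_cpu
    by_ram = max(1, ram_mb // per_worker_mb)
    return min(by_cpu, by_ram)


print(recommended_workers(2))                  # 5 (I/O-bound)
print(recommended_workers(2, io_bound=False))  # 3 (CPU-bound)
print(recommended_workers(2, ram_mb=512))      # 3 (only three 150 MB workers fit)
```

The RAM cap matters on small plans: the formula can ask for more workers than the memory budget allows.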

Connection Pooling

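Use connection pooling for database access. SQLAlchemy's async engine, created with create_async_engine, supports connection pooling natively. Each Gunicorn worker is a separate process with its own pool, so the database can see up to workers * (pool_size + max_overflow) connections in total. Configure the pool size based on your worker count and the database's connection limit:

import os

from sqlalchemy.ext.asyncio import create_async_engine

# Read the connection string from the environment (see the .env example)
DATABASE_URL = os.environ["DATABASE_URL"]

engine = create_async_engine(
    DATABASE_URL,
    pool_size=20,           # Number of persistent connections
    max_overflow=10,        # Additional connections when pool is full
    pool_timeout=30,        # Seconds to wait for a connection
    pool_recycle=1800,      # Recycle connections after 30 minutes
    pool_pre_ping=True      # Verify connections before use
)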
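Since every worker process opens its own pool, it helps to check the total against the database limit before choosing numbers. The sketch below is plain arithmetic; the function names, the reserved count of 5, and the two-thirds split between persistent and overflow connections are assumptions for illustration. PostgreSQL's default max_connections is 100.

```python
def peak_connections(workers: int, pool_size: int, max_overflow: int) -> int:
    """Worst-case number of connections opened by all workers together."""
    return workers * (pool_size + max_overflow)


def per_worker_pool(max_connections: int, workers: int, reserved: int = 5) -> tuple:
    """Split the connection budget evenly; returns (pool_size, max_overflow)."""
    budget = (max_connections - reserved) // workers
    if budget < 2:
        raise ValueError("not enough connections for this many workers")
    pool_size = budget * 2 // 3  # assumption: two thirds persistent, rest overflow
    return pool_size, budget - pool_size


print(peak_connections(workers=5, pool_size=20, max_overflow=10))  # 150
print(per_worker_pool(max_connections=100, workers=5))             # (12, 7)
```

With five workers, the pool settings shown earlier could open up to 150 connections, which exceeds a default PostgreSQL limit of 100; a per-worker pool of 12 plus 7 overflow stays within it.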

Response Caching

Cache responses for frequently accessed, rarely changing data. Use Redis as a cache backend with the fastapi-cache2 package:

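from fastapi_cache import FastAPICache
from fastapi_cache.backends.redis import RedisBackend
from fastapi_cache.decorator import cache
from redis import asyncio as aioredis

# In app/main.py: initialize the cache backend once at startup
@app.on_event("startup")
async def init_cache():
    redis = aioredis.from_url("redis://localhost:6379/0")
    FastAPICache.init(RedisBackend(redis), prefix="fastapi-cache")

# In app/api/routes.py: cache the response for 60 seconds
@router.get("/items")
@cache(expire=60)
async def get_items():
    return ITEMS_DB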

Async Database Queries

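Use async database drivers in async endpoints. For PostgreSQL, use asyncpg rather than psycopg2; for MySQL, use aiomysql. A synchronous driver called inside an async def endpoint blocks the event loop, so every other request on that worker waits until the query returns, which defeats the purpose of async FastAPI.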
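The effect of a blocking call is easy to reproduce with the standard library alone. In this sketch, blocking_query is a made-up stand-in for a synchronous driver call (it just sleeps for 0.1 seconds), and asyncio.to_thread shows one way to keep the event loop free when a synchronous library cannot be avoided.

```python
import asyncio
import time


def blocking_query() -> str:
    """Stand-in for a synchronous driver call that takes 0.1 s."""
    time.sleep(0.1)
    return "row"


async def handler_blocking() -> str:
    return blocking_query()  # runs on the event loop thread and stalls it


async def handler_offloaded() -> str:
    return await asyncio.to_thread(blocking_query)  # runs in a worker thread


async def timed(handler, n: int = 5) -> float:
    """Run n handlers concurrently and return the elapsed wall time."""
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - start


blocking_s = asyncio.run(timed(handler_blocking))
offloaded_s = asyncio.run(timed(handler_offloaded))
print(f"blocking: {blocking_s:.2f}s, offloaded: {offloaded_s:.2f}s")
```

The five blocking handlers run one after another, so their total time is about five times the single call, while the offloaded handlers overlap. A native async driver gives the same overlap without spending a thread per query.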

Troubleshooting Common Issues

1. 502 Bad Gateway from Nginx

Symptom: Nginx returns 502 Bad Gateway errors, but the application works when tested directly with curl on port 8000.

Solution: Nginx cannot reach Gunicorn. Check that Gunicorn is running (systemctl status fastapi-app). Verify the upstream address in the Nginx configuration matches the Gunicorn bind address. Check that SELinux or AppArmor is not blocking the connection. Check Nginx error logs:

tail -20 /var/log/nginx/error.log

2. Workers Timing Out on Long Requests

Symptom: Requests that take longer than a certain time return a 504 Gateway Timeout.

Solution: Increase the Gunicorn timeout in gunicorn.conf.py (timeout = 120 or higher) and the Nginx proxy_read_timeout. Keep the two values coordinated: give Nginx a slightly higher timeout than Gunicorn (for example, timeout = 120 with proxy_read_timeout 130s) so that Nginx does not close the connection before Gunicorn finishes processing.

3. Memory Usage Growing Over Time

Symptom: The application's memory footprint grows steadily until the VPS runs out of RAM.

Solution: Enable worker recycling with max_requests and max_requests_jitter in the Gunicorn configuration. This restarts workers periodically, freeing any accumulated memory. Check for common Python memory leaks: unclosed database connections, growing caches or lists, circular references in global state, or file handles that are never closed.
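When looking for the source of the growth, Python's built-in tracemalloc module can show which lines allocated the memory that is still alive. The sketch below simulates a leak with a module-level list; leaky_cache and handle_request are made-up names for the illustration.

```python
import tracemalloc

leaky_cache = []  # stand-in for a module-level cache that is never trimmed


def handle_request(i: int) -> None:
    """Simulated request handler that leaks roughly 1 KB per call."""
    leaky_cache.append("x" * 1000 + str(i))


tracemalloc.start()
before = tracemalloc.take_snapshot()
for i in range(1000):
    handle_request(i)
after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Differences are sorted with the largest change first.
top = after.compare_to(before, "lineno")[:3]
for stat in top:
    print(stat)
```

The first entry points at the line inside handle_request that builds the strings. In a real service you would take the two snapshots some minutes apart, for example from a debug-only endpoint.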

4. Permission Denied Errors

Symptom: The application fails to start or cannot write to log files.

Solution: Verify that the systemd service user has read access to the application directory and write access to the logs directory. If you changed the service user from root to a dedicated user (recommended for production), ensure that user owns the application files:

chown -R appuser:appuser /opt/fastapi-app

5. SSL Certificate Renewal Fails

Symptom: Certbot renewal fails with a connection or authorization error.

Solution: Verify that port 80 is open and reachable from the internet. Certbot's HTTP-01 challenge requires inbound HTTP traffic. Check that your DNS records are correct. Run certbot renew --dry-run to test the renewal process without actually renewing. Check the Certbot logs at /var/log/letsencrypt/letsencrypt.log for specific error messages.

Frequently Asked Questions

Is FastAPI ready for production?

Yes. FastAPI has been used in production by companies of all sizes since its initial release in 2018. Its async capabilities, automatic validation, and OpenAPI documentation make it well-suited for production API deployments. The key is using a proper ASGI server (Gunicorn with uvicorn workers) rather than the development server.

How many requests per second can FastAPI handle?

As a rough guide, on a 2-core VPS with 4 Gunicorn workers, a simple JSON API endpoint typically handles 1,000-3,000 requests per second. Actual throughput depends on the endpoint complexity, database queries, and payload size. A "Hello World" endpoint can reach 5,000+ RPS. Endpoints with database queries average 500-1,500 RPS depending on query complexity. Treat these figures as ballparks and load-test your own endpoints before sizing a server.

Should I use Docker or bare-metal deployment?

Bare-metal (Systemd + Nginx) has slightly lower overhead and is simpler for a single application. Docker adds containerization benefits: reproducible builds, easier scaling to multiple services, and simplified deployments. Use bare-metal if you are deploying a single API. Use Docker if you are deploying multiple services (API + database + cache + queue) or need CI/CD pipeline integration.

How do I handle database migrations in production?

Use Alembic for database migrations. Create migration scripts locally, test them, and run them on the VPS as an explicit step in your deployment script. Avoid running migrations automatically on every deploy or service start (for example, from a systemd ExecStartPre hook): some migrations are destructive and need review.

Can I deploy FastAPI behind Cloudflare?

Yes. Point your domain DNS to Cloudflare, configure Cloudflare in proxy mode, and set the SSL mode to "Full (Strict)." Cloudflare handles caching, DDoS protection, and CDN distribution. Your Nginx still terminates TLS with the Let's Encrypt certificate. Cloudflare connects to your origin over HTTPS.

How do I scale FastAPI beyond a single VPS?

Run multiple VPS instances behind a load balancer (Nginx, HAProxy, or a cloud load balancer). Each instance runs identical code. Use a shared PostgreSQL instance (managed or self-hosted) and a shared Redis instance for caching and sessions. Store file uploads in object storage (S3, MinIO) rather than on the local filesystem, so that every instance sees the same data.

What is the difference between uvicorn and Gunicorn?

Uvicorn is the ASGI server that actually runs your FastAPI application. Gunicorn is a process manager that forks multiple uvicorn worker processes. In production, you use Gunicorn to manage uvicorn workers because a single uvicorn process can only use one CPU core. Gunicorn distributes requests across multiple workers to utilize all available cores.

How do I update the application code?

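Pull the latest code (git pull or manual upload), install any new dependencies, and restart the service: systemctl restart fastapi-app. A restart stops Gunicorn, which lets in-flight requests finish within its graceful timeout, and then starts it again with the updated code, so expect a brief interruption. systemctl reload fastapi-app (SIGHUP) replaces the workers without closing the listening socket, but with preload_app = True the application code is loaded in the master process and is not re-imported on reload. For zero-downtime deployments, consider a blue-green strategy with Docker, or set preload_app = False and use reload.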
