How to Cache Python Pip Requirements for Reliable Docker Builds

You’re working on a Python project, and every time you run docker compose up --build, the RUN pip install -r requirements.txt step fails halfway because your internet connection is slower than a dial-up modem. When you retry, Docker starts from scratch and downloads every package all over again. Frustrating? Absolutely.

Why does this happen?

  1. Docker’s Layer Caching: If a step (like pip install) fails, Docker invalidates the cache for that layer and everything after it.
  2. No Persisted Pip Cache: By default, pip doesn’t save downloaded packages between builds. Every failure means starting over (you can verify this with the quick check below).
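
To see this for yourself, ask pip where its cache lives. A quick check, assuming the python:3.11 image used throughout this post:

# Where does pip keep its cache? (typically /root/.cache/pip in the python images)
docker run --rm python:3.11 pip cache dir

Each docker build starts from fresh layers, so that directory is empty on every run unless something persists it for you.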

Cache Packages or Go Offline

Here’s how I solved this for good, and how you can too.

Use Docker BuildKit Cache Mounts

Modern Docker builds (BuildKit) can persist pip’s cache across runs.

# syntax=docker/dockerfile:1.4
FROM python:3.11

ENV PYTHONUNBUFFERED=1
WORKDIR /app

# Copy requirements first to leverage Docker layer caching
COPY requirements.txt .
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

COPY . .

# Shell form, so ${CELERY_QUEUE} is expanded at runtime
CMD celery -A myapp worker -l info -Q ${CELERY_QUEUE}

What’s happening here?

  1. --mount=type=cache,target=/root/.cache/pip tells BuildKit to mount a persistent cache directory at pip’s default cache location inside the build container.
  2. Whatever pip downloads lands in that cache and survives across builds, even when the layer itself is rebuilt.
  3. Copying requirements.txt before the rest of the source keeps the install layer cached until your dependencies actually change.

Run it with BuildKit enabled:

DOCKER_BUILDKIT=1 docker compose up --build
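
Docker Compose v2 uses BuildKit by default; if your setup still runs the legacy builder, you can opt in per shell session or enable it permanently in the daemon config. A sketch of both options, assuming a standard Docker Engine install:

# Per session: enable BuildKit (COMPOSE_DOCKER_CLI_BUILD matters for legacy docker-compose)
export DOCKER_BUILDKIT=1
export COMPOSE_DOCKER_CLI_BUILD=1
docker compose up --build

# Or permanently in /etc/docker/daemon.json (restart the daemon afterwards):
# {
#   "features": { "buildkit": true }
# }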

Why this works:

  1. If the install fails halfway, everything pip already downloaded stays in the cache mount, so the retry resumes where it left off instead of starting from zero.
  2. The cache lives outside the image layers, so your final image doesn’t grow.

Offline Installs with Pre-Downloaded Packages

No internet? No problem. Pre-download packages and install them offline.

Download packages locally
On your host machine:

pip download -r requirements.txt -d ./pip_packages

This creates a pip_packages folder with all dependencies.
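
One caveat before baking those packages into an image: pip download fetches wheels for the machine you run it on. If your host doesn’t match the container’s platform (say, you’re on an Apple Silicon Mac building for a Linux/x86_64 image), pin the target platform explicitly. A sketch using pip’s cross-platform flags:

# Fetch wheels for the container's platform, not the host's
pip download -r requirements.txt -d ./pip_packages \
    --platform manylinux2014_x86_64 \
    --python-version 3.11 \
    --only-binary=:all:

Note that --only-binary=:all: is required when pinning a platform, and it means any dependency shipped only as a source distribution will need separate handling.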

Modify the Dockerfile

FROM python:3.11

ENV PYTHONUNBUFFERED=1
WORKDIR /app

# Copy pre-downloaded packages
COPY pip_packages /pip_packages
COPY requirements.txt .

# Install from local directory (no internet!)
RUN pip install --no-index --find-links=/pip_packages -r requirements.txt

COPY . .

CMD celery -A myapp worker -l info -Q ${CELERY_QUEUE}

Key flags:

  1. --no-index: never contact PyPI (or any other index), so the install can’t stall on a dead connection.
  2. --find-links=/pip_packages: resolve every requirement from the local directory you copied into the image.

Perfect for:

  1. Air-gapped or offline build environments.
  2. Flaky or metered internet connections.
  3. CI runners that you don’t want hitting PyPI on every build.
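
You can prove the install really needs no network by cutting build-time networking off entirely. A sketch, assuming the python:3.11 base image has already been pulled:

# Disable network access for RUN steps; only /pip_packages is used
docker build --network=none -t myapp .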

Hybrid Caching for Best Results

Combine BuildKit caching with a local package fallback:

# syntax=docker/dockerfile:1.4
FROM python:3.11

ENV PYTHONUNBUFFERED=1
WORKDIR /app

COPY requirements.txt .

# Try a normal (online) install first, reusing the BuildKit cache
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt || true  # tolerate failure; the offline step below covers it

# Fallback to local packages
COPY pip_packages /pip_packages
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install --no-index --find-links=/pip_packages -r requirements.txt

COPY . .

CMD celery -A myapp worker -l info -Q ${CELERY_QUEUE}

Why this rocks:

  1. When the BuildKit cache is warm, the first install succeeds instantly and the fallback step is a no-op.
  2. When the network (or the cache) lets you down, the local pip_packages directory still guarantees a complete install.
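
The one moving part to remember is keeping pip_packages in sync with requirements.txt. A small wrapper script makes that hard to forget (a sketch; build.sh is my name for it, not something the tooling requires):

#!/usr/bin/env bash
# build.sh -- keep the local package mirror in sync, then build with BuildKit
set -euo pipefail

# Re-download anything new or changed in requirements.txt
pip download -r requirements.txt -d ./pip_packages

# Build with BuildKit so the cache mount is honored
DOCKER_BUILDKIT=1 docker compose up --build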

Final Thoughts

  1. Use BuildKit if you control the build environment. It’s seamless and fast.
  2. Pre-download packages for offline scenarios or flaky networks.
  3. The hybrid approach is gold for mission-critical builds.

Pro Tips:

  1. Pin exact versions in requirements.txt so the cache and the local mirror stay consistent.
  2. Re-run pip download whenever requirements.txt changes, or the offline install will fail on the new packages.
  3. If you ever need a truly fresh install, clear the BuildKit cache with docker builder prune.

By caching pip’s downloads or going fully offline, I turned my Docker builds from a hair-pulling ordeal into a smooth process. Now I can finally focus on coding, not waiting for packages to download.
