Container Deployment Guide¶

This guide gives build and deployment instructions for every Docker image in the VLLM Evaluation system.

Overview¶

The VLLM Evaluation system runs its evaluation services in containers that can be built for more than one platform. All images follow the same naming convention and deployment patterns.

Naming Convention¶

macOS builds: Image name ends with -mac (for example, evalchemy-mac:latest)
Linux/amd64 builds: Image name ends with -linux (for example, evalchemy-linux:latest)
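The convention can be expressed as a small shell helper. This is a minimal sketch: image_ref is a hypothetical function name, not part of the repository, and it assumes every image uses the latest tag as in the build commands in this guide.

```shell
#!/bin/sh
# Hypothetical helper: compose an image reference from a service name and a
# platform suffix, following the naming convention described above.
REGISTRY="ghcr.io/thakicloud"

image_ref() {
  name="$1"      # service name, e.g. evalchemy
  platform="$2"  # platform suffix: mac or linux
  case "$platform" in
    mac|linux)
      printf '%s/%s-%s:latest\n' "$REGISTRY" "$name" "$platform"
      ;;
    *)
      echo "unknown platform: $platform (expected mac or linux)" >&2
      return 1
      ;;
  esac
}

image_ref evalchemy mac
image_ref evalchemy linux
```

Rejecting unknown suffixes keeps a typo such as -linx from silently creating a new repository in the registry.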

Prerequisites¶

Docker 24+ with Buildx enabled (for cross-platform builds)
Access to ghcr.io/thakicloud/ container registry
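The prerequisites can be checked before building. The sketch below is an assumption about how such a check might look: docker_version_ok and check_prereqs are hypothetical names, and the check only inspects the local Docker client and the Buildx plugin.

```shell
#!/bin/sh
# Succeeds when a Docker version string such as "24.0.7" has a major
# version of at least 24.
docker_version_ok() {
  major="${1%%.*}"
  case "$major" in
    ''|*[!0-9]*) return 1 ;;   # empty or non-numeric: treat as unsupported
  esac
  [ "$major" -ge 24 ]
}

# Hypothetical wrapper: checks the local Docker client and the Buildx plugin.
# It is defined here but not called, so sourcing this file has no side effects.
check_prereqs() {
  version="$(docker version --format '{{.Client.Version}}' 2>/dev/null || true)"
  if ! docker_version_ok "$version"; then
    echo "Docker 24+ required, found: ${version:-none}" >&2
    return 1
  fi
  if ! docker buildx version >/dev/null 2>&1; then
    echo "Buildx plugin not available" >&2
    return 1
  fi
  echo "Docker $version with Buildx: OK"
}
```

Run check_prereqs from a shell that has sourced the file. Registry access is separate: docker login ghcr.io prompts for credentials that need push rights to the thakicloud namespace.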

Available Images¶

1. Evalchemy¶

Purpose: Unified benchmark runner based on EleutherAI's lm-evaluation-harness

macOS (local arch)

docker build -f docker/evalchemy.Dockerfile -t ghcr.io/thakicloud/evalchemy-mac:latest .
docker push ghcr.io/thakicloud/evalchemy-mac:latest

Linux/amd64

docker buildx build --platform linux/amd64 \
  -f docker/evalchemy.Dockerfile \
  -t ghcr.io/thakicloud/evalchemy-linux:latest .
docker push ghcr.io/thakicloud/evalchemy-linux:latest

2. Standard Evalchemy¶

Purpose: Standard benchmark evaluation with predefined tasks

macOS (local arch)

docker build -f docker/standard-evalchemy.Dockerfile \
  -t ghcr.io/thakicloud/standard-evalchemy-mac:latest .
docker push ghcr.io/thakicloud/standard-evalchemy-mac:latest

Linux/amd64

docker buildx build --platform linux/amd64 \
  -f docker/standard-evalchemy.Dockerfile \
  -t ghcr.io/thakicloud/standard-evalchemy-linux:latest .
docker push ghcr.io/thakicloud/standard-evalchemy-linux:latest

3. NVIDIA Eval¶

Purpose: NVIDIA-specific evaluation tasks including AIME and LiveCodeBench

macOS (local arch)

docker build -f docker/nvidia-eval.Dockerfile -t ghcr.io/thakicloud/nvidia-eval-mac:latest .
docker push ghcr.io/thakicloud/nvidia-eval-mac:latest

Linux/amd64

docker buildx build --platform linux/amd64 \
  -f docker/nvidia-eval.Dockerfile \
  -t ghcr.io/thakicloud/nvidia-eval-linux:latest .
docker push ghcr.io/thakicloud/nvidia-eval-linux:latest

4. VLLM Benchmark¶

Purpose: VLLM performance benchmarking and evaluation

macOS (local arch)

docker build -f docker/vllm-benchmark.Dockerfile -t ghcr.io/thakicloud/vllm-benchmark-mac:latest .
docker push ghcr.io/thakicloud/vllm-benchmark-mac:latest

Linux/amd64

docker buildx build --platform linux/amd64 \
  -f docker/vllm-benchmark.Dockerfile \
  -t ghcr.io/thakicloud/vllm-benchmark-linux:latest .
docker push ghcr.io/thakicloud/vllm-benchmark-linux:latest

5. VLLM Benchmark Linux¶

Purpose: Linux-optimized VLLM benchmarking

macOS (local arch)

docker build -f docker/vllm-benchmark-linux.Dockerfile -t ghcr.io/thakicloud/vllm-benchmark-linux-mac:latest .
docker push ghcr.io/thakicloud/vllm-benchmark-linux-mac:latest

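Linux/amd64 (this image reference is identical to the Linux/amd64 reference of VLLM Benchmark above, so pushing one overwrites the other)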

docker buildx build --platform linux/amd64 \
  -f docker/vllm-benchmark-linux.Dockerfile \
  -t ghcr.io/thakicloud/vllm-benchmark-linux:latest .
docker push ghcr.io/thakicloud/vllm-benchmark-linux:latest

6. Deepeval¶

Purpose: PyTest-style LLM evaluation framework with custom metrics

macOS (local arch)

docker build -f docker/deepeval.Dockerfile -t ghcr.io/thakicloud/deepeval-mac:latest .
docker push ghcr.io/thakicloud/deepeval-mac:latest

Linux/amd64

docker buildx build --platform linux/amd64 \
  -f docker/deepeval.Dockerfile \
  -t ghcr.io/thakicloud/deepeval-linux:latest .
docker push ghcr.io/thakicloud/deepeval-linux:latest
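Because the Linux/amd64 commands above share one pattern, they can be generated in a loop. The following is a sketch, not the project's build script: linux_build_cmd is a hypothetical name, and the script only prints commands so they can be reviewed before running. vllm-benchmark-linux is left out because its Linux tag is the same as the one used by vllm-benchmark.

```shell
#!/bin/sh
# Print (not run) the Linux/amd64 build and push commands for each image.
REGISTRY="ghcr.io/thakicloud"
IMAGES="evalchemy standard-evalchemy nvidia-eval vllm-benchmark deepeval"

# $1 = image name; the Dockerfile is assumed to live at docker/<name>.Dockerfile,
# as in the commands listed above.
linux_build_cmd() {
  printf 'docker buildx build --platform linux/amd64 -f docker/%s.Dockerfile -t %s/%s-linux:latest .\n' \
    "$1" "$REGISTRY" "$1"
}

linux_push_cmd() {
  printf 'docker push %s/%s-linux:latest\n' "$REGISTRY" "$1"
}

for image in $IMAGES; do
  linux_build_cmd "$image"
  linux_push_cmd "$image"
done
```

Piping the output to sh executes the commands in order. Printing first makes it easy to spot a wrong tag before anything reaches the registry.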

Container Configuration¶

Security Features¶

Images run as the non-root user evaluser where applicable
Minimal base images for reduced attack surface
Proper permission handling for mounted volumes

Standard Directories¶

Results: /app/results - Output directory for evaluation results
Parsed: /app/parsed - Directory for processed evaluation data
Volumes: In Kubernetes deployments, volumes are typically mounted at these paths
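For local testing outside Kubernetes, the same paths can be bind-mounted from the host. This is a minimal sketch: local_run_cmd is a hypothetical helper, the host directory layout (results and parsed under one base directory) is an assumption, and the function only prints the command.

```shell
#!/bin/sh
# Print a `docker run` command that bind-mounts host directories onto the
# standard container paths.
# $1 = image reference, $2 = host base directory
local_run_cmd() {
  printf 'docker run --rm -v "%s/results:/app/results" -v "%s/parsed:/app/parsed" %s\n' \
    "$2" "$2" "$1"
}

local_run_cmd ghcr.io/thakicloud/evalchemy-mac:latest "$PWD"
```

Since the images may run as the non-root user evaluser, the host directories likely need to be writable by that user's UID for results to appear.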

Health Checks¶

Most images include built-in health checks. See individual Dockerfiles for specific implementation details.

Deployment Patterns¶

Kubernetes Integration¶

These images are designed for deployment in Kubernetes environments using:

Argo Workflows: For orchestrating evaluation pipelines
GPU Scheduling: Images support NVIDIA GPU allocation
Resource Limits: Configured for optimal resource utilization

Volume Mounts¶

# Example Kubernetes volume configuration
volumeMounts:
- name: results-volume
  mountPath: /app/results
- name: parsed-volume
  mountPath: /app/parsed

Troubleshooting¶

Common Issues¶

Exec format error

Cause: The image was built for a different platform than the one it runs on
Solution: Use Buildx with --platform linux/amd64 for Linux images
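To confirm a mismatch, compare the host platform with the platform recorded in the image. The sketch below maps uname -m output to a Docker platform string; docker_platform is a hypothetical helper, and only the two most common architectures are handled.

```shell
#!/bin/sh
# Map `uname -m` output to the Docker platform string for Linux containers.
docker_platform() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    arm64|aarch64) echo "linux/arm64" ;;
    *)             echo "unknown"; return 1 ;;
  esac
}

# Platform of the current host (prints "unknown" for other architectures).
docker_platform "$(uname -m)" || true

# Platform recorded in an image, for comparison (requires Docker):
#   docker image inspect --format '{{.Os}}/{{.Architecture}}' IMAGE
```

If the two values differ, rebuilding with the matching --platform value is usually enough.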

Windows line endings

Cause: Scripts were saved with CRLF line endings
Solution: Save scripts with LF line endings, or strip the carriage returns inside the container: sed -i 's/\r$//' /app/*.sh
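The sed command above relies on GNU sed behaviour. Where that is not available, such as on a macOS host, the same cleanup can be sketched with tr and awk. has_crlf and strip_crlf are hypothetical helper names.

```shell
#!/bin/sh
# Succeeds when the file contains at least one carriage return.
has_crlf() {
  [ $(tr -cd '\r' < "$1" | wc -c) -gt 0 ]
}

# Strip a trailing carriage return from every line. The result is written
# back with `cat` so the file keeps its mode (for example the executable bit).
strip_crlf() {
  tmp="$(mktemp)" || return 1
  if awk '{ sub(/\r$/, ""); print }' "$1" > "$tmp"; then
    cat "$tmp" > "$1"
  fi
  rm -f "$tmp"
}

# Demonstration on a throwaway file.
demo="$(mktemp)"
printf '#!/bin/sh\r\necho hello\r\n' > "$demo"
has_crlf "$demo" && echo "CRLF found"
strip_crlf "$demo"
has_crlf "$demo" || echo "clean"
rm -f "$demo"
```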

Permission denied

Cause: Scripts are not marked executable
Solution: Run the container as root to diagnose (for example with docker run --user root), then run chmod +x /app/*.sh
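Before changing permissions, it helps to see which scripts are affected. This is a small sketch with a hypothetical helper name, list_non_executable, which can be run against /app inside the container or against the source directory on the host.

```shell
#!/bin/sh
# Print every *.sh file in a directory that the current user cannot execute.
# $1 = directory to scan
list_non_executable() {
  for f in "$1"/*.sh; do
    [ -e "$f" ] || continue          # no match: the glob stays literal
    [ -x "$f" ] || printf '%s\n' "$f"
  done
}

# Example (path is the one used in this guide):
#   list_non_executable /app
```

A chmod inside a running container does not survive a rebuild, so the lasting fix is probably to set the executable bit in the repository or in the Dockerfile.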

Network issues

Cause: Container networking problems
Solution: Use --network host for local testing or verify connectivity

Build Automation¶

CI/CD Integration¶

These build commands are integrated into the CI/CD pipeline for:

Automatic builds on code changes
Multi-platform image generation
Container registry deployment
Security scanning and validation

Version Management¶

Images are tagged with:

latest for current stable builds
release-{version} for specific releases
Platform-specific suffixes (-mac, -linux)

Related Documentation¶

Evalchemy API - Evalchemy service API reference
Deepeval API - Deepeval service API reference
NVIDIA Eval API - NVIDIA evaluation API reference
VLLM Benchmark API - VLLM benchmarking API reference
