Container Deployment Guide¶
This document provides comprehensive build instructions and deployment guidance for all Docker images in the VLLM Evaluation system.
Overview¶
The VLLM Eval system uses containerized evaluation services with cross-platform build support. All images follow consistent naming conventions and deployment patterns.
Naming Convention¶
- macOS builds: Image name ends with -mac (for example, evalchemy-mac:latest)
- Linux/amd64 builds: Image name ends with -linux (for example, evalchemy-linux:latest)
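The convention can be captured in a small shell helper. This is a sketch rather than part of the repository: the function name image_ref is hypothetical, and it assumes the :latest tag and the ghcr.io/thakicloud/ registry path used in the build commands of this guide.

```shell
# Hypothetical helper: compose an image reference from a base name and a
# platform suffix ("mac" or "linux"), following the naming convention above.
REGISTRY="ghcr.io/thakicloud"

image_ref() {
  name="$1"
  platform="$2"
  case "$platform" in
    mac|linux) printf '%s/%s-%s:latest\n' "$REGISTRY" "$name" "$platform" ;;
    *) echo "unknown platform: $platform" >&2; return 1 ;;
  esac
}
```

For instance, image_ref evalchemy linux prints ghcr.io/thakicloud/evalchemy-linux:latest.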
Prerequisites¶
- Docker 24+ with Buildx enabled (for cross-platform builds)
- Access to the ghcr.io/thakicloud/ container registry
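A quick way to confirm these prerequisites before building is sketched below. The function names docker_major_ok and check_prereqs are hypothetical; the Docker commands they call (docker version --format and docker buildx version) are standard CLI commands, and check_prereqs needs a running Docker daemon.

```shell
# Succeeds when the given Docker version string has a major version of 24 or higher.
docker_major_ok() {
  major="${1%%.*}"
  [ "$major" -ge 24 ]
}

# Hypothetical wrapper: check the daemon version and that Buildx is available.
check_prereqs() {
  docker_major_ok "$(docker version --format '{{.Server.Version}}')" || return 1
  docker buildx version >/dev/null || return 1
}
```

Registry access is normally established separately with docker login ghcr.io, which prompts for credentials.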
Available Images¶
1. Evalchemy¶
Purpose: EleutherAI lm-evaluation-harness based unified benchmark runner
macOS (local arch)
docker build -f docker/evalchemy.Dockerfile -t ghcr.io/thakicloud/evalchemy-mac:latest .
docker push ghcr.io/thakicloud/evalchemy-mac:latest
Linux/amd64
docker buildx build --platform linux/amd64 \
-f docker/evalchemy.Dockerfile \
-t ghcr.io/thakicloud/evalchemy-linux:latest .
docker push ghcr.io/thakicloud/evalchemy-linux:latest
2. Standard Evalchemy¶
Purpose: Standard benchmark evaluation with predefined tasks
macOS (local arch)
docker build -f docker/standard-evalchemy.Dockerfile \
-t ghcr.io/thakicloud/standard-evalchemy-mac:latest .
docker push ghcr.io/thakicloud/standard-evalchemy-mac:latest
Linux/amd64
docker buildx build --platform linux/amd64 \
-f docker/standard-evalchemy.Dockerfile \
-t ghcr.io/thakicloud/standard-evalchemy-linux:latest .
docker push ghcr.io/thakicloud/standard-evalchemy-linux:latest
3. NVIDIA Eval¶
Purpose: NVIDIA-specific evaluation tasks including AIME and LiveCodeBench
macOS (local arch)
docker build -f docker/nvidia-eval.Dockerfile -t ghcr.io/thakicloud/nvidia-eval-mac:latest .
docker push ghcr.io/thakicloud/nvidia-eval-mac:latest
Linux/amd64
docker buildx build --platform linux/amd64 \
-f docker/nvidia-eval.Dockerfile \
-t ghcr.io/thakicloud/nvidia-eval-linux:latest .
docker push ghcr.io/thakicloud/nvidia-eval-linux:latest
4. VLLM Benchmark¶
Purpose: VLLM performance benchmarking and evaluation
macOS (local arch)
docker build -f docker/vllm-benchmark.Dockerfile -t ghcr.io/thakicloud/vllm-benchmark-mac:latest .
docker push ghcr.io/thakicloud/vllm-benchmark-mac:latest
Linux/amd64
docker buildx build --platform linux/amd64 \
-f docker/vllm-benchmark.Dockerfile \
-t ghcr.io/thakicloud/vllm-benchmark-linux:latest .
docker push ghcr.io/thakicloud/vllm-benchmark-linux:latest
5. VLLM Benchmark Linux¶
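Purpose: Linux-optimized VLLM benchmarking
Note: The Linux/amd64 tag used below, ghcr.io/thakicloud/vllm-benchmark-linux:latest, is identical to the Linux/amd64 tag of VLLM Benchmark (section 4), so whichever image is pushed last overwrites the other.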
macOS (local arch)
docker build -f docker/vllm-benchmark-linux.Dockerfile -t ghcr.io/thakicloud/vllm-benchmark-linux-mac:latest .
docker push ghcr.io/thakicloud/vllm-benchmark-linux-mac:latest
Linux/amd64
docker buildx build --platform linux/amd64 \
-f docker/vllm-benchmark-linux.Dockerfile \
-t ghcr.io/thakicloud/vllm-benchmark-linux:latest .
docker push ghcr.io/thakicloud/vllm-benchmark-linux:latest
6. Deepeval¶
Purpose: PyTest-style LLM evaluation framework with custom metrics
macOS (local arch)
docker build -f docker/deepeval.Dockerfile -t ghcr.io/thakicloud/deepeval-mac:latest .
docker push ghcr.io/thakicloud/deepeval-mac:latest
Linux/amd64
docker buildx build --platform linux/amd64 \
-f docker/deepeval.Dockerfile \
-t ghcr.io/thakicloud/deepeval-linux:latest .
docker push ghcr.io/thakicloud/deepeval-linux:latest
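The build commands above differ only in image name and platform, so they can be generated in a loop. The following is a dry-run sketch that prints the commands instead of running them; the names IMAGES and print_build_commands are hypothetical. VLLM Benchmark Linux is left out on purpose, because its documented Linux/amd64 tag does not follow the name-platform pattern of the other images.

```shell
# Images whose tags follow the <name>-<platform>:latest pattern in this guide.
IMAGES="evalchemy standard-evalchemy nvidia-eval vllm-benchmark deepeval"
REGISTRY="ghcr.io/thakicloud"

# Print (do not run) the build and push commands for every image.
print_build_commands() {
  for name in $IMAGES; do
    echo "docker build -f docker/${name}.Dockerfile -t ${REGISTRY}/${name}-mac:latest ."
    echo "docker push ${REGISTRY}/${name}-mac:latest"
    echo "docker buildx build --platform linux/amd64 -f docker/${name}.Dockerfile -t ${REGISTRY}/${name}-linux:latest ."
    echo "docker push ${REGISTRY}/${name}-linux:latest"
  done
}
```

After reviewing the output, the commands can be executed from the repository root with print_build_commands | sh. If the active Buildx builder uses the docker-container driver, the built image is not loaded into the local image store by default, so adding --load to the build (or replacing the separate push with --push) may be needed.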
Container Configuration¶
Security Features¶
- All images use non-root user evaluser where applicable
- Minimal base images for reduced attack surface
- Proper permission handling for mounted volumes
Standard Directories¶
- Results: /app/results - Output directory for evaluation results
- Parsed: /app/parsed - Directory for processed evaluation data
- Volumes: Typically mounted to these paths in Kubernetes deployments
Health Checks¶
Most images include built-in health checks. See individual Dockerfiles for specific implementation details.
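Without opening each Dockerfile, the health check of a built image can be read back with docker inspect. The sketch below uses hypothetical function names; the inspect fields are standard Docker metadata, but which images actually define a health check is not stated in this guide.

```shell
# Print the health check configuration stored in an image
# (typically null when the image defines none).
show_healthcheck() {
  docker inspect --format '{{json .Config.Healthcheck}}' "$1"
}

# Print the current health status of a running container.
# May report a template error if the container has no health check.
show_health_status() {
  docker inspect --format '{{.State.Health.Status}}' "$1"
}
```

A typical call would be show_healthcheck ghcr.io/thakicloud/evalchemy-linux:latest.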
Deployment Patterns¶
Kubernetes Integration¶
These images are designed for deployment in Kubernetes environments using:
- Argo Workflows: For orchestrating evaluation pipelines
- GPU Scheduling: Images support NVIDIA GPU allocation
- Resource Limits: Configured for optimal resource utilization
Volume Mounts¶
# Example Kubernetes volume configuration
volumeMounts:
  - name: results-volume
    mountPath: /app/results
  - name: parsed-volume
    mountPath: /app/parsed
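For local testing outside Kubernetes, the same two paths can be bind-mounted with docker run. This is a sketch under assumptions: the function name run_with_mounts and the local directory names results and parsed are hypothetical, and the image's default command is assumed to write to the standard directories.

```shell
# Run an image with local ./results and ./parsed directories mounted at the
# standard container paths. Extra arguments are passed to the container.
run_with_mounts() {
  image="$1"
  shift
  mkdir -p results parsed
  docker run --rm \
    -v "$PWD/results:/app/results" \
    -v "$PWD/parsed:/app/parsed" \
    "$image" "$@"
}
```

Because the images run as a non-root user where applicable, the local directories may need to be writable by that user's UID for results to appear.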
Troubleshooting¶
Common Issues¶
Exec format error
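- Cause: The image was built for a different platform than the one running it (for example, an arm64 image built on an Apple Silicon Mac and run on an amd64 Linux node)
- Solution: Use Buildx with --platform linux/amd64 for Linux images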
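To confirm a platform mismatch before rebuilding, the platform recorded in the image can be compared with the host architecture. The function names below are hypothetical; docker image inspect and uname -m are standard commands.

```shell
# Print the platform an image was built for, for example linux/amd64.
image_platform() {
  docker image inspect --format '{{.Os}}/{{.Architecture}}' "$1"
}

# Map the host's `uname -m` output to Docker's architecture names.
host_arch() {
  case "$(uname -m)" in
    x86_64) echo amd64 ;;
    arm64|aarch64) echo arm64 ;;
    *) uname -m ;;
  esac
}
```

If the architecture printed by image_platform differs from host_arch, the image was built for another platform.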
Windows line endings
- Cause: CRLF line endings in scripts
- Solution: Ensure scripts use LF or run:
sed -i 's/\r$//' /app/*.sh
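The sed command above uses GNU sed syntax, which suits a Linux container; the BSD sed shipped with macOS expects a backup-suffix argument after -i, so the command may need adjusting when run on the host. The sketch below is a portable alternative using grep and tr, with hypothetical function names. Note that it removes every carriage return in the file, not only those at line ends.

```shell
# List the .sh files in a directory that contain carriage returns.
find_crlf() {
  grep -l "$(printf '\r')" "$1"/*.sh
}

# Remove carriage returns from the given files in place.
# Writing back with cat keeps the original file mode (including the executable bit).
strip_crlf() {
  for f in "$@"; do
    scratch="$f.tmp.$$"
    tr -d '\r' < "$f" > "$scratch" && cat "$scratch" > "$f"
    rm -f "$scratch"
  done
}
```

Running find_crlf on the scripts directory before the image build catches the problem before it reaches the container.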
Permission denied
- Cause: Scripts not executable
- Solution: Run the container as root to diagnose, then run:
chmod +x /app/*.sh
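One way to inspect script permissions as root is to override the user and entrypoint for a single run. The function name list_script_perms is hypothetical, and the sketch assumes the image contains a sh binary; --user and --entrypoint are standard docker run flags.

```shell
# List the scripts under /app as root, bypassing the image's entrypoint.
list_script_perms() {
  docker run --rm --user root --entrypoint sh "$1" -c 'ls -l /app/*.sh'
}
```

A chmod inside a running container does not change the image, so a lasting fix generally belongs in the Dockerfile or in the file modes committed to the repository.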
Network issues
- Cause: Container networking problems
- Solution: Use --network host for local testing or verify connectivity
Build Automation¶
CI/CD Integration¶
These build commands are integrated into the CI/CD pipeline for:
- Automatic builds on code changes
- Multi-platform image generation
- Container registry deployment
- Security scanning and validation
Version Management¶
Images are tagged with:
- latest for current stable builds
- release-{version} for specific releases
- Platform-specific suffixes (-mac, -linux)
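A release tag can be derived from an existing :latest image with docker tag. The sketch below assumes the platform suffix stays in the image name, as in the build commands of this guide, and that the release tag replaces latest; the function names and the version 1.2.0 are hypothetical.

```shell
REGISTRY="ghcr.io/thakicloud"

# Compose the release reference for an image, e.g. evalchemy-linux:release-1.2.0.
release_ref() {
  printf '%s/%s-%s:release-%s\n' "$REGISTRY" "$1" "$2" "$3"
}

# Retag the local :latest image as a release and push it.
tag_release() {
  src="${REGISTRY}/$1-$2:latest"
  dst="$(release_ref "$1" "$2" "$3")"
  docker tag "$src" "$dst" && docker push "$dst"
}
```

For example, tag_release evalchemy linux 1.2.0 retags ghcr.io/thakicloud/evalchemy-linux:latest and pushes the new reference.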
Related Documentation¶
- Evalchemy API - Evalchemy service API reference
- Deepeval API - Deepeval service API reference
- NVIDIA Eval API - NVIDIA evaluation API reference
- VLLM Benchmark API - VLLM benchmarking API reference