# Deploy the DeepSeek-R1 671B parameter model using Tensorfuse
DeepSeek-R1 (671B) posts strong results across knowledge, math, and coding benchmarks:

Benchmark | DeepSeek-R1 (671B) | Remarks |
---|---|---|
MMLU | 90.8% | Near state-of-the-art |
AIME 2024 (Pass@1) | 79.8% | Strong mathematical reasoning |
LiveCodeBench (Pass@1-CoT) | 65.9% | Excels at multi-step reasoning in code |
Before deploying, store your vLLM API authentication token as a Tensorfuse secret; the vLLM server defined in your `deployment.yaml` reads it from the `VLLM_API_KEY` environment variable. Unlike some other models, DeepSeek-R1 671B does not require a separate Hugging Face token, so we can skip that step. For the purpose of this demo, we will use `vllm-key` as your API key. In production, generate a secure token (for example with `openssl rand -base64 32`) and remember to keep it safe, as Tensorfuse secrets are opaque.
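A minimal sketch of creating that secret with the `tensorkube` CLI follows; the secret name `vllm-token` is our choice, and you should verify the exact command syntax against the Tensorfuse docs:

```bash
# Store the vLLM API key as a Tensorfuse secret.
# "vllm-token" is an arbitrary secret name; VLLM_API_KEY is the
# environment variable that vLLM's OpenAI-compatible server reads.
tensorkube secret create vllm-token VLLM_API_KEY=vllm-key
```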
Remember to configure a `readiness` endpoint in your config. Tensorfuse uses this endpoint to ensure that your service is healthy. If no `readiness` endpoint is configured, Tensorfuse tries the `/readiness` path on port 80 by default, which can cause issues if your app is not listening on that path.
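Here is a sketch of what the relevant part of `deployment.yaml` could look like, assuming a Kubernetes-style readiness syntax and a container listening on port 80; the field names are illustrative, so check them against the Tensorfuse configuration reference:

```yaml
# deployment.yaml (sketch): GPU resources and readiness probe for the 671B model.
# Field names are assumptions -- verify against the Tensorfuse docs.
gpus: 8
gpu_type: h100
readiness:
  httpGet:
    path: /health   # vLLM's OpenAI-compatible server exposes /health
    port: 80
```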
Once your service is deployed, replace `YOUR_APP_URL` with the endpoint shown in the output of the above command and run:
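A hedged example of such a request, using vLLM's OpenAI-compatible chat completions route and the `vllm-key` token from earlier (the prompt is a placeholder):

```bash
# Query the deployed model through vLLM's OpenAI-compatible API.
curl -X POST YOUR_APP_URL/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer vllm-key" \
  -d '{
        "model": "deepseek-ai/DeepSeek-R1",
        "messages": [{"role": "user", "content": "Explain chain-of-thought prompting in one paragraph."}],
        "max_tokens": 256
      }'
```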
To deploy a different DeepSeek-R1 variant, update:
• The model name in the `Dockerfile` (the `--model` flag).
• The GPU resources in `deployment.yaml`.
• (Optional) `--tensor-parallel-size`, depending on your hardware.
Below is a table summarizing the key changes for each variant; an example `Dockerfile` change follows the table:
Model Variant | Dockerfile Model Name | GPU Type | Num GPUs / Tensor parallel size |
---|---|---|---|
DeepSeek-R1 1.5B | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | A10G | 1 |
DeepSeek-R1 7B | deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | A10G | 1 |
DeepSeek-R1 8B | deepseek-ai/DeepSeek-R1-Distill-Llama-8B | A10G | 1 |
DeepSeek-R1 14B | deepseek-ai/DeepSeek-R1-Distill-Qwen-14B | L40S | 1 |
DeepSeek-R1 32B | deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | L4 | 4 |
DeepSeek-R1 70B | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | L40S | 4 |
DeepSeek-R1 671B | deepseek-ai/DeepSeek-R1 | H100 | 8 |
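As an illustration, here is a hedged `Dockerfile` entrypoint for the 70B distill on 4 GPUs. The base image is an assumption (any image with vLLM installed works); only `--model` and `--tensor-parallel-size` change between the variants in the table:

```dockerfile
# Sketch: vLLM OpenAI-compatible server for DeepSeek-R1-Distill-Llama-70B.
# Base image is an assumption; swap in whatever vLLM image you use.
FROM vllm/vllm-openai:latest

# Only --model and --tensor-parallel-size change between variants.
ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server", \
            "--model", "deepseek-ai/DeepSeek-R1-Distill-Llama-70B", \
            "--tensor-parallel-size", "4", \
            "--port", "80"]
```

Pair this with the matching GPU settings from the table (here, 4x L40S) in `deployment.yaml` so that `--tensor-parallel-size` matches the number of GPUs allocated to the container.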