Deepseek-R1 is an advanced large language model designed to handle a wide range of conversational and generative tasks. It has proven capabilities across various benchmarks and excels at complex reasoning. In this guide, we will walk you through deploying the Deepseek-R1 671B parameter model on your cloud account using Tensorfuse. We will be using H100 GPUs for this example; however, it is easy to deploy on other GPUs as well (see Tip).
## Why Build with Deepseek-R1
Deepseek-R1 offers:

- High Performance on Evaluations: Achieves strong results on industry-standard benchmarks.
- Advanced Reasoning: Handles multi-step logical reasoning tasks with minimal context.
- Multilingual Support: Pretrained on diverse linguistic data, making it adept at multilingual understanding.
- Scalable Distilled Models: Smaller distilled variants (1.5B, 7B, 8B, 14B, 32B, 70B) offer cheaper deployment options without a steep drop in quality.
| Benchmark | Deepseek-R1 (671B) | Remarks |
|---|---|---|
| MMLU | 90.8% | Near state-of-the-art |
| AIME 2024 (Pass@1) | 79.8% | Mathematical and reasoning abilities |
| LiveCodeBench (Pass@1-COT) | 65.9% | Excels at multi-step reasoning |
## Prerequisites

Before you begin, ensure you have configured Tensorfuse on your AWS account. If you haven't done that yet, follow the Getting Started guide.

## Deploying Deepseek-R1-671B with Tensorfuse
Each Tensorkube deployment requires:

- Your code (in this example, the vLLM API server code from the Docker image).
- Your environment (as a Dockerfile).
- A deployment configuration (deployment.yaml).

We will also store the API authentication token (VLLM_API_KEY) as a Tensorfuse secret. Unlike some other models, Deepseek-R1 671B does not require a separate Hugging Face token, so we can skip that step.
### Step 1: Set your API authentication token

Generate a random string that will be used as your API authentication token, and store it as a secret in Tensorfuse using the command below. For the purpose of this demo, we will be using vllm-key as the API key.

In production, generate a strong random token (for example, with openssl rand -base64 32) and remember to keep it safe, as Tensorfuse secrets are opaque.
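The steps above can be sketched as follows. The openssl command is standard; the `tensorkube secret create` syntax and the secret name `vllm-token` are assumptions based on typical Tensorfuse usage, so verify them against your CLI's help output:

```shell
# Generate a random 32-byte, base64-encoded token (44 characters)
VLLM_API_KEY=$(openssl rand -base64 32)
echo "Generated token of length ${#VLLM_API_KEY}"

# Store it as a Tensorfuse secret (assumed CLI syntax -- for this demo
# the docs use the literal value vllm-key instead of a random token):
# tensorkube secret create vllm-token VLLM_API_KEY=vllm-key
```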
### Step 2: Prepare the Dockerfile

We will use the official vLLM OpenAI-compatible image as our base image. This image comes with all the necessary dependencies to run vLLM and is available on Docker Hub as vllm/vllm-openai.
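A minimal Dockerfile might look like the sketch below. The base image and its OpenAI-compatible API server are real, but treat the exact flag values (port, parallelism) as assumptions to adapt to your setup:

```dockerfile
# Sketch: serve DeepSeek-R1 with the official vLLM OpenAI-compatible image
FROM vllm/vllm-openai:latest

# Launch the API server with DeepSeek-R1.
# --tensor-parallel-size 8 shards the model across the 8 H100s.
ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server", \
            "--model", "deepseek-ai/DeepSeek-R1", \
            "--tensor-parallel-size", "8", \
            "--trust-remote-code", \
            "--port", "80"]
```

vLLM reads the API key from the VLLM_API_KEY environment variable, which Tensorfuse injects from the secret created in Step 1, so no key needs to be baked into the image.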
### Step 3: Deployment config

Although you can deploy Tensorfuse apps from the command line, we recommend keeping a config file (deployment.yaml) so that you can follow a GitOps approach to deployment.

Remember to include a readiness endpoint in your config. Tensorfuse uses this endpoint to ensure that your service is healthy.
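An illustrative deployment.yaml is sketched below. The field names follow Tensorfuse's published examples but are assumptions here; verify them against the current config reference before use:

```yaml
# Illustrative Tensorfuse deployment config (field names assumed)
gpus: 8
gpu_type: h100
secret:
  - vllm-token          # the secret created in Step 1
port: 80
min_scale: 1
max_scale: 3
readiness:
  httpGet:
    path: /health       # vLLM's built-in health endpoint
    port: 80
```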
Now you can deploy your service using the following command:
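Assuming the standard Tensorkube CLI, the deploy command looks roughly like this (the flag name is an assumption; check `tensorkube deploy --help`):

```shell
tensorkube deploy --config-file ./deployment.yaml
```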
### Step 4: Accessing the deployed app

Voila! Your autoscaling production LLM service is ready, and only authenticated requests will be served by your endpoint. Once the deployment is successful, you can check the status of your app with the command below. Remember to configure a TLS endpoint with a custom domain before going to production.
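A command along these lines should list your deployments and their status (the exact subcommand is an assumption; check the CLI help):

```shell
tensorkube deployment list
```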
To test the deployment, replace YOUR_APP_URL with the endpoint shown in the output of the above command and run:
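For example, a completion request against the OpenAI-compatible API (YOUR_APP_URL is a placeholder; the path and payload follow vLLM's OpenAI-compatible server, and vllm-key is the demo token from Step 1):

```shell
curl --request POST \
  --url YOUR_APP_URL/v1/completions \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer vllm-key" \
  --data '{
    "model": "deepseek-ai/DeepSeek-R1",
    "prompt": "Explain why the sky is blue in one sentence.",
    "max_tokens": 200
  }'
```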
## Deploying other versions of Deepseek-R1

Although this guide has focused on Deepseek-R1 671B, there are smaller distilled variants available. Each variant differs primarily in:

- The model name in the Dockerfile (the --model flag).
- The GPU resources in deployment.yaml.
- (Optional) The --tensor-parallel-size, depending on your hardware.
Below is a table summarizing the key changes for each variant:
| Model Variant | Dockerfile Model Name | GPU Type | Num GPUs / Tensor parallel size |
|---|---|---|---|
| DeepSeek-R1 1.5B | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | A10G | 1 |
| DeepSeek-R1 7B | deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | A10G | 1 |
| DeepSeek-R1 8B | deepseek-ai/DeepSeek-R1-Distill-Llama-8B | A10G | 1 |
| DeepSeek-R1 14B | deepseek-ai/DeepSeek-R1-Distill-Qwen-14B | L40S | 1 |
| DeepSeek-R1 32B | deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | L4 | 4 |
| DeepSeek-R1 70B | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | L40S | 4 |
| DeepSeek-R1 671B | deepseek-ai/DeepSeek-R1 | H100 | 8 |

