Deploy Qwen QwQ 32B using Tensorfuse
Qwen QwQ 32B delivers reasoning performance comparable to DeepSeek-R1 (671B) at a fraction of the parameter count. You can change the GPU used for the deployment via the `gpu_type` option below.

| Benchmark | Qwen QwQ (32B) | DeepSeek-R1 (671B) | Remarks |
|---|---|---|---|
| AIME 2024 (Pass@1) | 79.5% | 79.8% | Mathematical and reasoning abilities |
| LiveCodeBench (Pass@1-COT) | 63.4% | 65.9% | Excels at multi-step reasoning |
The service is configured through a `deployment.yaml` file. Before deploying, store your vLLM API key (`VLLM_API_KEY`) as a Tensorfuse secret. Unlike some other models, Qwen QwQ 32B does not require a separate Hugging Face token, so we can skip that step.
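As an illustration, a deployment config for this service might look like the sketch below. The field names (`gpus`, `gpu_type`, `secret`, `readiness`) are assumptions based on this guide's references to the `gpu_type` option and the readiness endpoint; check the current Tensorfuse docs for the authoritative schema.

```yaml
# Sketch of a deployment.yaml for Qwen QwQ 32B (field names assumed)
gpus: 1
gpu_type: l40s          # switch GPUs via this option (e.g. l4, l40s, a10g)
secret:
  - vllm-token          # assumed name of the secret holding VLLM_API_KEY
readiness:
  httpGet:
    path: /health       # vLLM's health endpoint
    port: 80
```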
For this demo we use `vllm-key` as your API key. In production, generate a strong random key, for example with:

```
openssl rand -base64 32
```

and remember to keep it safe, as Tensorfuse secrets are opaque (they cannot be read back once created).
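Creating the secret typically looks like the command below. The `tensorkube secret create` syntax and the secret name `vllm-token` are assumptions drawn from Tensorfuse's usual CLI pattern; verify against the current CLI reference before running it.

```shell
# Store the vLLM API key as a Tensorfuse secret.
# "vllm-token" is an assumed secret name; VLLM_API_KEY is the
# environment variable the vLLM server reads for request auth.
tensorkube secret create vllm-token VLLM_API_KEY=vllm-key
```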
Always add a `readiness` endpoint in your config; Tensorfuse uses this endpoint to ensure that your service is healthy. If no `readiness` endpoint is configured, Tensorfuse tries the `/readiness` path on port 80 by default, which can cause issues if your app is not listening on that path.

Replace `YOUR_APP_URL`
with the endpoint shown in the output of the above command and run:
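For example, since vLLM exposes an OpenAI-compatible API, a quick smoke test could look like this sketch. The `/v1/chat/completions` path follows the standard vLLM OpenAI-compatible server; the model name `Qwen/QwQ-32B` is an assumption about how the model was loaded, and `vllm-key` is the demo API key set earlier.

```shell
curl -X POST YOUR_APP_URL/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer vllm-key" \
  -d '{
        "model": "Qwen/QwQ-32B",
        "messages": [
          {"role": "user", "content": "How many r letters are in the word strawberry?"}
        ],
        "max_tokens": 1024
      }'
```

A healthy deployment returns a JSON chat-completion response; a 401 here usually means the `VLLM_API_KEY` secret and the bearer token do not match.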
| Specification | L4 | L40S | A10G |
|---|---|---|---|
| VRAM | 24 GB | 48 GB | 24 GB |
| Performance Score | 13.44 | 42.25 | - |
| TFLOPS (FP32) | 30.29 | 91.6 | 31.2 |
| Power Consumption | 72 W | 350 W | 150 W |
| Cost-Efficiency | High | Medium | Medium |