Deploy GPT-OSS models from OpenAI in your AWS account using Tensorfuse
We will serve the models with vLLM's OpenAI-compatible server, using the `vllm/vllm-openai:gptoss` image.
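As a sketch, a Dockerfile for the service could build on that image and start the server on port 80. The `vllm serve` invocation below uses standard vLLM CLI options, but treat the exact entrypoint as an assumption to verify against the image:

```dockerfile
# Build on the vLLM OpenAI-compatible image with GPT-OSS support.
FROM vllm/vllm-openai:gptoss

# Start the OpenAI-compatible server on port 80.
# Flags are standard vLLM options; verify against your vLLM version.
ENTRYPOINT ["vllm", "serve", "openai/gpt-oss-20b", "--host", "0.0.0.0", "--port", "80"]
```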
Before we deploy, here’s a quick snapshot of inference benchmark scores for GPT-OSS models:
| Model | GPU Configuration | Context Length | Tokens/sec |
|---|---|---|---|
| gpt-oss-20b | 1xH100 | 130k tokens | 240 |
| gpt-oss-120b | 8xH100 | 130k tokens | 200 |
Next, create a `deployment.yaml` that describes your service.
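A minimal sketch of what the config might look like; the key names (`gpu_type`, `gpus`, `port`, and the scaling fields) are illustrative assumptions, so verify them against the Tensorfuse configuration reference:

```yaml
# Illustrative deployment config; key names are assumptions,
# verify against the Tensorfuse configuration reference.
gpu_type: h100
gpus: 1            # use 8 for gpt-oss-120b, per the benchmark table above
port: 80
min_scale: 1
max_scale: 3
```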
Generate a `READ` token from your Hugging Face profile and store it as a secret in Tensorfuse using the command below.
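A sketch of the command, assuming `tensorkube secret create` takes the secret name followed by a KEY=VALUE pair; check the CLI help for the exact syntax:

```bash
# Store the Hugging Face READ token as a Tensorfuse secret.
# Argument shape is an assumption; verify with `tensorkube secret create --help`.
tensorkube secret create hugging-face-secret \
  HUGGING_FACE_HUB_TOKEN=hf_xxxxxxxxxxxxxxxx
```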
Remember to name the secret key `HUGGING_FACE_HUB_TOKEN`, as vLLM expects the token under that environment variable.
We recommend defining a readiness endpoint in your config. Tensorfuse uses it to ensure your service is healthy before routing traffic to it. If not specified, Tensorfuse defaults to checking `/readiness` on port 80.
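Since vLLM's OpenAI-compatible server exposes a `GET /health` route, pointing the probe there is a reasonable choice. A sketch of the block, assuming the config uses a Kubernetes-style `httpGet` probe shape (an assumption to verify against the Tensorfuse docs):

```yaml
# Readiness probe sketch; the httpGet shape is an assumption
# borrowed from Kubernetes-style probes.
readiness:
  httpGet:
    path: /health   # vLLM's OpenAI-compatible server answers GET /health
    port: 80
```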
Once your deployment is live, replace `YOUR_APP_URL` in the request below with the endpoint from the command output.
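A sketch of a test request, assuming vLLM's standard OpenAI-compatible `/v1/chat/completions` route and `openai/gpt-oss-20b` as the served model name (use whichever model you deployed):

```bash
curl -X POST YOUR_APP_URL/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "openai/gpt-oss-20b",
        "messages": [
          {"role": "user", "content": "Hello! What can you do?"}
        ],
        "max_tokens": 128
      }'
```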