Getting Started
Deploy serverless GPU applications on your AWS account
Built with developer experience in mind, Tensorkube simplifies the process of deploying serverless GPU apps. In this guide, we will walk you through the process of setting up Tensorkube and deploying a FastAPI app on it.
Prerequisites
Before you begin, ensure you have the following:
- AWS credentials set up on your machine
- Python and pip installed on your machine
You can run the following command to set up AWS credentials on your machine:
aws configure
or you can manually export them as environment variables:
export AWS_ACCESS_KEY_ID=your_access_key_id
export AWS_SECRET_ACCESS_KEY=your_secret_access_key
export AWS_DEFAULT_REGION=your_default_region
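If you take the environment-variable route, a quick check that all three variables are visible to your shell can save a failed setup later. The following is a minimal sketch, not part of Tensorkube: the helper name missing_aws_env_vars is hypothetical, and it only inspects the three variables exported above. It does not look at credentials stored by aws configure, which live in files under ~/.aws.

```python
import os

# The three variable names exported in the commands above.
REQUIRED_AWS_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
)


def missing_aws_env_vars(env=None):
    """Return the names of required AWS variables that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_env_vars()
    if missing:
        print("Missing AWS environment variables:", ", ".join(missing))
    else:
        print("All three AWS environment variables are set.")
```

An empty list means the exports took effect in the current shell. It says nothing about whether the credentials themselves are valid.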
Installation
First, install the tensorkube Python package:
pip install tensorkube
Next, run the following command to log in to Tensorfuse and get your token:
tensorkube login
You just have to sign in using your Google Workspace account and Tensorfuse will automatically manage the token for you.
After this, run the following command to set up Tensorkube on your AWS account:
tensorkube configure
This is a one-time setup that needs to be run only once per AWS account. If your team uses Tensorfuse, only one team member needs to run this command.
The command creates a CloudFormation stack, creates a Kubernetes (k8s) cluster, and deploys custom resources to enable serverless GPUs on your AWS account. It can take anywhere from 8 to 15 minutes to complete.
Deploying your first Tensorkube app
Each Tensorkube deployment requires two things: your code and your environment (as a Dockerfile).
Code files
Let’s create a simple FastAPI app and deploy it on Tensorkube. Before deploying your app, ensure you have a /readiness endpoint configured in your FastAPI app.
Tensorkube uses this endpoint to check the health of your deployments. Given below is a simple FastAPI app that you can deploy. Save it as main.py, since the Dockerfile in this guide starts the app as main:app:
from fastapi import FastAPI

app = FastAPI()


# Tensorkube calls this endpoint to check the health of the deployment
@app.get("/readiness")
def readiness():
    return {"status": "ready"}


@app.get("/")
def read_root():
    return {"message": "Hello, World!"}
Environment files
Add your Python dependencies to requirements.txt:
fastapi
uvicorn
pydantic
Next, create a Dockerfile for your FastAPI app. Given below is a simple Dockerfile that you can use:
# Use an official Python runtime as a parent image
FROM python:3.9-slim
# Set the working directory in the container
WORKDIR /app
# Copy the current directory contents into the container at /app
COPY . /app
# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
# Make port 80 available to the world outside this container
EXPOSE 80
# Start the FastAPI app (main:app) with uvicorn when the container launches
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "80"]
Deploying the app
This is the easiest part. Navigate to your project root and run the following command:
tensorkube deploy
Voila! Your first deployment is ready to go. You can access your app at the URL provided in the output.
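Once you have the URL, you may want to confirm that the app is actually serving traffic. The following is a minimal sketch that polls the /readiness endpoint defined earlier using only the Python standard library. The function name wait_until_ready and its retry parameters are hypothetical and not part of Tensorkube. It assumes the deployment URL is reachable from your machine without extra authentication headers, which may not hold for your setup.

```python
import json
import time
import urllib.error
import urllib.request


def wait_until_ready(base_url, attempts=10, delay_seconds=3.0, timeout=5.0):
    """Poll GET <base_url>/readiness until it returns HTTP 200.

    Returns the decoded JSON body on success, or None if every attempt fails.
    Hypothetical helper for illustration only.
    """
    url = base_url.rstrip("/") + "/readiness"
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                if response.status == 200:
                    return json.loads(response.read().decode("utf-8"))
        except (urllib.error.URLError, OSError, ValueError):
            # Connection errors, non-2xx responses, timeouts, or a non-JSON
            # body all count as "not ready yet".
            pass
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return None


# Example usage (replace the placeholder with the URL printed by the deploy command):
# print(wait_until_ready("<your-deployment-url>"))
```

With the sample app from this guide, a successful call returns the readiness payload as a dictionary. A return value of None means none of the attempts got a valid response.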
Deploying with GPUs
If you want to deploy your app with GPUs, you can specify the number and type of GPUs to use in your deployment:
tensorkube deploy --gpus 1 --gpu-type a10g
The --gpu-type argument supports all the GPU types that are available on AWS. You can find the list of supported GPU types here.
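If you trigger deployments from a script, it can help to assemble the command as an argument list instead of a hand-written string. The following is a minimal sketch that uses only the two flags shown in this guide (--gpus and --gpu-type). The helper build_deploy_command is hypothetical, and it makes no assumption about other options the CLI may accept.

```python
import shlex


def build_deploy_command(gpus=0, gpu_type=None):
    """Build the argv list for `tensorkube deploy`, optionally with GPU flags.

    Hypothetical helper; it only knows the flags shown in this guide.
    """
    if gpus < 0:
        raise ValueError("gpus must be zero or positive")
    command = ["tensorkube", "deploy"]
    if gpus > 0:
        command += ["--gpus", str(gpus)]
    if gpu_type:
        command += ["--gpu-type", gpu_type]
    return command


# Show the command line that would be run for one A10G GPU.
print(shlex.join(build_deploy_command(gpus=1, gpu_type="a10g")))

# To actually run it from the project root, something like:
# import subprocess
# subprocess.run(build_deploy_command(gpus=1, gpu_type="a10g"), check=True)
```

Passing a list to subprocess.run avoids shell quoting issues, and the printed line should match the command shown above.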
Check the status of your deployment
You can list all your deployments using the following command:
tensorkube list deployments
You can also query details about a particular deployment, such as its endpoint URL or status:
tensorkube get deployment <deployment_id>
And that’s it. Your automatic GPU serverless deployment is now up and running. Enjoy!