In ML, jobs represent discrete tasks such as model training, inference, or data processing. Because these tasks are often resource-intensive and long-running, managing them efficiently is crucial, especially in shared-resource environments. Job queues help by scheduling work, allocating resources efficiently, and preventing resource contention between jobs.

Getting started with queued jobs

To get started with jobs, you need to have the Tensorfuse CLI installed on your machine. You can install the CLI and log in using the following commands:

pip install --upgrade pip
pip install --upgrade tensorkube
tensorkube login

Configuration for AWS

You can run the following command to set up AWS credentials on your machine:

aws configure

or you can manually export them as environment variables:

export AWS_ACCESS_KEY_ID=your_access_key_id
export AWS_SECRET_ACCESS_KEY=your_secret_access_key
export AWS_DEFAULT_REGION=your_default_region

Deploying and Running Jobs

  1. Deploy a Job

    tensorkube job deploy --name <job-name> --gpus <number-of-gpus> --gpu-type <gpu-type> --max-scale <max-scale> --cpu <cpu-units> --memory <memory-size> --secret <secret-name>
    

    This command deploys a job with the specified parameters.

    If each queued job carries its own payload, see step 3 below for how to access that payload inside your deployment.

    • --name <job-name>: The name of the job.
    • --gpus <number-of-gpus>: The number of GPUs required for the job. [Default 0]
    • --gpu-type <gpu-type>: The type of GPU required.
    • --max-scale <max-scale>: The maximum scale for the job. [Default 3]
    • --cpu <cpu-units>: The amount of CPU required. Used only when --gpus is 0. Specified in milliCPUs. [Default 100]
    • --memory <memory-size>: The amount of memory required. Specified in MB. [Default 200]
    • --secret <secret-name>: The name of a secret required by the job. Can be used multiple times to attach multiple secrets.
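
    For example, relying on the default of 0 GPUs, a small CPU-only job with one secret attached might be deployed like this (the job name, secret name, and resource values below are placeholders):

    tensorkube job deploy --name my-cpu-job --max-scale 5 --cpu 500 --memory 1024 --secret my-api-key
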
  2. Queue a Job

    tensorkube job queue --job-name <job-name> --job-id <job-id> --payload <payload>
    

    This command queues a job by pushing data onto the queue, which triggers execution of the job. Make sure that --job-name matches the name of a job you have already deployed.

    • --job-name <job-name>: The name of the job to be queued.
    • --job-id <job-id>: The unique identifier for the job.
    • --payload <payload>: The parameters or data to be passed to the job. Data Type: String.
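
    For example, the same deployed job can be queued several times, each run with its own unique job ID and payload (the job name and payloads below are illustrative):

    tensorkube job queue --job-name my-cpu-job --job-id 1 --payload '{"text": "first input"}'
    tensorkube job queue --job-name my-cpu-job --job-id 2 --payload '{"text": "second input"}'
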
  3. Accessing your payload

To access your payload string inside the deployment, install the tensorkube package in your Docker image and add the following snippet to your code.

from tensorkube import get_queued_message

message = get_queued_message()

If you are sending a JSON object as a string, remember to parse it back into a Python object, like so:

import json
from tensorkube import get_queued_message

message = json.loads(get_queued_message())
  4. Poll for Job Status

    tensorkube job get --job-name <job-name> --job-id <job-id>
    

    This command returns the status of a particular job.

    • --job-name <job-name>: The name of the job to be polled.
    • --job-id <job-id>: The unique identifier for the job whose status you want to check.
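
Because queued jobs run asynchronously, a common pattern is to poll this command until the run finishes. Below is a minimal sketch that shells out to the CLI from Python; the job name, job ID, and the status substrings it matches on are assumptions for illustration, not a documented output format.

import subprocess
import time


def wait_for_job(job_name: str, job_id: str, interval_seconds: int = 30) -> str:
    """Poll `tensorkube job get` until the printed status looks terminal."""
    while True:
        result = subprocess.run(
            ["tensorkube", "job", "get", "--job-name", job_name, "--job-id", job_id],
            capture_output=True,
            text=True,
        )
        output = result.stdout.strip()
        print(output)
        # NOTE: matching on these substrings is an assumption about the CLI output;
        # adjust them to whatever statuses `tensorkube job get` prints in your setup.
        if any(status in output.lower() for status in ("succeeded", "completed", "failed")):
            return output
        time.sleep(interval_seconds)


# Example usage (placeholder values):
# wait_for_job("inference-job", "1")
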

Example

Let’s say you have an inference job as follows, and your job payload is the prompt for the inference.

job.py
import torch
import json
from transformers import AutoModelForCausalLM, AutoTokenizer
from tensorkube import get_queued_message


# Load the model that download.py stored in the image at build time
model_dir = "models"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    torch_dtype="auto",
).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_dir)


def generate_text(input_text: str):
    if not input_text:
        return {"error": "text field is required"}
    prompt = input_text
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(device)

    generated_ids = model.generate(
        model_inputs.input_ids,
        attention_mask=model_inputs.attention_mask,
        max_new_tokens=512
    )
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]

    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return {"generated_text": response}


if __name__ == "__main__":
    # The queued payload arrives as a JSON string; parse it and run inference on its "text" field
    prompt = json.loads(get_queued_message())
    print(generate_text(prompt['text']))

Create a Dockerfile for this as follows:

Dockerfile
# Use the nvidia cuda base image
FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04

ENV HF_HUB_ENABLE_HF_TRANSFER=1

# Update and install required packages
RUN apt-get update && apt-get install -y \
    python3.10 \
    python3.10-dev \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Set Python 3.10 as the default Python version
RUN ln -s /usr/bin/python3.10 /usr/bin/python

# Upgrade pip
RUN pip3 install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir transformers tensorkube torch "huggingface_hub[hf_transfer]"

# Set working directory
WORKDIR /code

# Copy the code files
COPY download.py /code/download.py

# Run the downloader script to download the model
RUN python download.py

COPY job.py /code/job.py

CMD ["python", "job.py"]

Here, the download.py file downloads the model from Hugging Face and stores it in the image during the build. It will be as follows:

download.py
import os

from huggingface_hub import snapshot_download
from transformers import utils

access_token = "<TOKEN>"  # your Hugging Face access token


if __name__ == '__main__':
    # download the Qwen2-1.5B-Instruct model into the local "models" directory
    os.makedirs('./models', exist_ok=True)
    snapshot_download(repo_id="Qwen/Qwen2-1.5B-Instruct", local_dir="models", ignore_patterns=["*.pt", "*.bin"], token=access_token)
    utils.move_cache()

Deployment

Deploy this job definition:

tensorkube job deploy --name inference-job --gpus 1 --gpu-type a10g

Queue a job run:

tensorkube job queue --job-name inference-job --job-id 1 --payload '{"text": "What is life?"}'

Get its status:

tensorkube job get --job-name inference-job --job-id 1
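
Putting it all together, you can fan out several prompts to the same deployed job from a small driver script. This is only a sketch built on the CLI commands above; the prompts and job IDs are placeholders, and each payload must match what job.py reads from get_queued_message().

import json
import subprocess

prompts = ["What is life?", "Explain job queues in one sentence."]  # placeholder prompts

# Queue one run per prompt, each with a unique job ID and its own payload
for i, prompt in enumerate(prompts, start=1):
    payload = json.dumps({"text": prompt})  # job.py expects a JSON string with a "text" field
    subprocess.run(
        ["tensorkube", "job", "queue",
         "--job-name", "inference-job",
         "--job-id", str(i),
         "--payload", payload],
        check=True,
    )

# Later, check the status of each run
for i in range(1, len(prompts) + 1):
    subprocess.run(
        ["tensorkube", "job", "get", "--job-name", "inference-job", "--job-id", str(i)],
        check=True,
    )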