Changelog
Changes made to the Tensorkube CLI
This changelog provides an overview of user-visible updates, including new features, improvements, bug fixes, and deprecations, for the tensorkube CLI.
As the CLI is still in its alpha development stage, breaking changes may occur. We strive to minimize these disruptions and provide the `tensorkube upgrade` command to bring your runtime to the latest version.
We thank you for your understanding and patience as we progress towards a stable release of the CLI.
0.0.21 (October 30, 2024)
- Added support for Lambda Labs in Dev Containers, expanding the cloud provider options beyond AWS. You can now specify `--cloud lambda` when starting a dev container to utilize Lambda Labs for your development workflows. This enhancement offers more flexibility and potentially cost-effective solutions for GPU-enabled development.
  - Example command with Lambda Labs: `tensorkube dev --cloud lambda start --gpu-type v100 --port 8080`
  - Refer to the Dev Containers documentation for detailed setup and usage instructions.
- Fixed the issue with viewing deployment logs when services are not yet ready. Instead of encountering an error, you will now see the actual status of the service, providing better transparency and debugging capabilities.
- Enhanced support for large `dshm` (dynamic shared memory) sizes in Lorax deployments. This change allows larger models to efficiently share memory space during execution, enhancing performance, especially in memory-intensive applications.
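For background, a larger shared-memory segment in Kubernetes is commonly provided by mounting a memory-backed `emptyDir` volume at `/dev/shm`. The sketch below shows that generic pattern only; it is not Tensorkube's actual manifest, and the image name is a placeholder:

```yaml
# Generic Kubernetes pattern for a large /dev/shm -- illustrative only,
# not Tensorkube's actual deployment manifest.
apiVersion: v1
kind: Pod
metadata:
  name: lorax-example
spec:
  containers:
    - name: lorax
      image: your-lorax-image:latest  # placeholder image name
      volumeMounts:
        - name: dshm
          mountPath: /dev/shm
  volumes:
    - name: dshm
      emptyDir:
        medium: Memory   # back the volume with RAM
        sizeLimit: 16Gi  # allow a large shared-memory segment
```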
0.0.20 (October 25, 2024)
- Added support for L40S GPUs on AWS, enabling more powerful computation options for your Tensorfuse deployments. You can now specify the L40S GPU type when deploying your services to take advantage of the enhanced capabilities provided by this GPU.
- Use the following command to deploy a service with L40S GPUs:
  `tensorkube deploy --gpu-type l40s --gpus 1`
0.0.19 (October 24, 2024)
- Added support for Dev Containers, allowing seamless testing and development with hot-reloading of GPU-enabled containers. This feature significantly improves efficiency, especially for machine learning and large language model (LLM) workloads, by reflecting code changes instantly in running containers without the need for restarts.
- New commands introduced (see the combined example at the end of this entry):
  - `tensorkube dev --cloud <provider> start --gpu-type <type> --port <port>`: Starts a Dev Container instance with hot-reloading enabled.
  - `tensorkube dev --cloud <provider> stop`: Stops the running Dev Container instance while retaining its state.
  - `tensorkube dev --cloud <provider> list`: Lists all running Dev Container instances.
  - `tensorkube dev --cloud <provider> delete`: Completely deletes the Dev Container instance and associated resources.
- For comprehensive documentation, including setup and usage instructions, refer to the Dev Containers guide.
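Taken together, a typical session might look like the sketch below; the `aws` provider value and the `a10g` GPU type are illustrative assumptions based on the syntax above:

```bash
# Illustrative Dev Container session; 'aws' and 'a10g' are assumed values.
tensorkube dev --cloud aws start --gpu-type a10g --port 8080
tensorkube dev --cloud aws list    # confirm the instance is running
tensorkube dev --cloud aws stop    # stop the instance, retaining its state
tensorkube dev --cloud aws delete  # tear down the instance and its resources
```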
0.0.18 (October 18, 2024)
- Added support for running fine-tuning runs using Axolotl, allowing you to perform detailed training processes on your models using specified configurations. This feature leverages Axolotl-style declarative `config.yaml` files.
- Use the `tensorkube train create` command with the `--axolotl` flag to initiate the fine-tuning process (a hedged example follows at the end of this entry).
- Supported options include specifying the number of GPUs, GPU type, environment, job ID, and configuration path.
- Refer to the Axolotl config documentation for a detailed list of parameters.
- Follow our Tensorfuse fine-tuning guide to get started with training models like LoRA adapters for `LLaMA-3.1-8B` on an SQL dataset.
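A minimal invocation might look like the sketch below. Only `tensorkube train create` and the `--axolotl` flag are taken from this release; the remaining flag names are assumptions mirroring the options listed above, so consult the fine-tuning guide for the exact spellings:

```bash
# Hypothetical flag names for GPU count, GPU type, and config path.
tensorkube train create --axolotl \
  --gpus 1 --gpu-type a10g \
  --config-path ./config.yaml
```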
0.0.17 (September 23, 2024)
- Added support for secrets. You can provide sensitive information such as credentials, passwords, and API keys to your containers using secrets. Check out the guide to use secrets in your deployments.
- Also updated our GitHub Actions guide to work with secrets. Modify your workflow files accordingly to use secrets in your deployments.
- Added support for defining your deployments using a config file. As the number of configurable parameters grows, it becomes difficult to manage them all as CLI arguments, so we now also support `config.yaml` files for deployment. You can use `tensorkube deploy --config-file <path-to-config-file>` to deploy from a config file; its use is completely optional (a sketch follows at the end of this entry).
- Modified the appearance of links in the `tensorkube list deployments` command. AWS ELB URLs can be quite long and can cause display issues in smaller terminals. The URLs are now displayed outside the status table so that they can be copied even when wrapped.
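A minimal sketch of such a config file, assuming its keys mirror the CLI flags documented elsewhere in this changelog (the key names are assumptions, not the verified schema):

```yaml
# Illustrative config.yaml; key names are assumed to mirror CLI flags.
gpus: 1
gpu_type: l4
cpu: 500        # CPU millicores
memory: 1024    # RAM for each server
min_scale: 0    # scale to zero when idle
max_scale: 3    # default upper bound
env: production
```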
0.0.13 (August 19, 2024)
- Added support for GitHub Actions. You can configure your CD pipelines using the guide here (a hedged workflow sketch appears at the end of this entry).
- Your container builds will now be interrupted if one of the intermediate commands fails. Earlier, you had to wait for every step to finish before you could retry your builds.
- Added support for multiple concurrent clusters: you can now run Tensorfuse alongside other Kubernetes clusters without changing context.
- Migration from `sslip.io` domains: sslip domains are notorious for misconfigurations and outages, so we have modified our stack to serve your deployments directly via AWS load balancer URLs. Running `tensorkube list deployments` will show the new URLs. The older `sslip.io` host-header URLs will continue to work; to view them, use the `--old` flag: `tensorkube list deployments --old`.
- S3 bug: previously, simultaneous runs could cause redundant files to be available in your containers. This has now been resolved; each deployment run begins from a clean slate and only the latest deployment prevails.
- Required login: we are continuously trying to improve the support we can give our customers, and to further aid this we will be releasing a dashboard in the coming weeks. As a prerequisite, you will have to log in to Tensorfuse once before running your commands. This means running `tensorkube login` just once.
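For orientation, a CD workflow along these lines is plausible; everything here is a standard GitHub Actions pattern except `tensorkube deploy`, and the secret names are placeholders, so follow the official guide for the supported recipe:

```yaml
# Illustrative workflow, not the official recipe; secret names are placeholders.
name: deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - run: pip install tensorkube
      - run: tensorkube deploy
```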
0.0.12 (August 12, 2024)
- Fixed build failures during deployments.
0.0.11 (August 5, 2024)
- Fixed the bug where long environment names prevented proper resource creation.
0.0.10 (July 29, 2024)
- Added support for directly SSH-ing into your deployments using the `tensorkube deployment ssh <deployment-name>` command. This command opens an interactive shell in the pod and allows you to debug issues (see the combined example at the end of this entry).
- Added support for environments so that there is segregation between prod, staging, etc. You can run `tensorkube environment create --env-name <env-name>` to create a new environment and then use the `--env <env-name>` flag with all other commands.
- Added support for streaming logs from your deployments to your terminal with the help of the `tensorkube deployment logs <deployment-name>` command.
- Added support for getting detailed information about your deployments (such as latest version, status, etc.) using the `tensorkube deployment describe <deployment-name>` command.
- A better UI for interacting with the CLI. You can now see the progress of your commands in a more interactive way.
- Added cleanup jobs to remove outdated root filesystems.
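Combining the commands above, a typical debugging flow might look like this; the environment and deployment names are illustrative:

```bash
# 'staging' and 'my-service' are illustrative names.
tensorkube environment create --env-name staging
tensorkube deploy --env staging
tensorkube deployment describe my-service --env staging  # latest version, status, etc.
tensorkube deployment logs my-service --env staging      # stream logs to the terminal
tensorkube deployment ssh my-service --env staging       # open an interactive shell
```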
0.0.9 (July 22, 2024)
- Released the Tensorfuse Container Runtime that enables ~3s cold starts for your containers.
- Made installation instructions independent of the package manager. You no longer have to depend on `snap` to install Tensorfuse prerequisites.
- Added support for CloudWatch logging and Container Insights for your Tensorfuse runtime. This enables you to browse and debug logs from the CloudWatch console and view usage metrics dashboards.
- Fixed the S3 mountpoint error that kept popping up randomly during container starts.
- Modified the runtime to work with Bottlerocket AMIs and the Nvidia container runtime.
0.0.7 (June 26, 2024)
- Added support for T4 and L4 GPU types.
- Added support for ignoring files and directories specified in a `.dockerignore` file during the image build process.
- Added the `tensorkube upgrade` command to bring your Tensorfuse runtime to the latest version.
- Added the `--min-scale` and `--max-scale` flags to control the number of pods running in your Tensorfuse app. By default, the minimum scale is 0 and the maximum scale is 3 (see the example at the end of this entry).
- Non-GPU pods will no longer be scheduled on GPU machines.
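For example, to keep one warm pod while allowing bursts of up to five (the values are illustrative):

```bash
# Override the default scale bounds (min 0, max 3); values here are illustrative.
tensorkube deploy --gpus 1 --gpu-type t4 --min-scale 1 --max-scale 5
```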
0.0.5 (June 24, 2024)
- The CLI now works seamlessly on Linux machines. Earlier, the CLI was supported only on Macs.
- Added support for a Network File System within your Tensorkube cluster. The Network File System enables faster cold starts and reduces image build times as well.
- Upgraded the image build engine to use fewer resources during the image build process. The new build engine consumes less RAM and hence can run on cheaper instances.
- You can now use `tensorkube install-prerequisites` to install and check all the prerequisite packages required to run tensorkube before configuring it with `tensorkube configure` (see the example at the end of this entry).
- `tensorkube configure` now resumes from the last installation checkpoint. You no longer have to remove and reset the cluster in case your configuration runs into errors.
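The full first-time setup therefore reduces to three commands:

```bash
pip install tensorkube            # install the CLI from PyPI
tensorkube install-prerequisites  # install and verify the prerequisite packages
tensorkube configure              # set up the runtime; resumes from the last checkpoint on failure
```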
0.0.4 (June 17, 2024)
- Logging support added for streaming logs during the container image build process and while a service is getting started. Every intermediate step, from building the image to submitting a nodeclaim to actually starting a pod, can now be understood and debugged using logs.
- Support for versioning deployments is now live. Every time you run `tensorkube deploy`, a new version of your service is created and the older version is retired.
- `tensorkube deploy` now accepts the optional `--cpu` and `--memory` parameters, which allow you to specify the number of CPU millicores and the amount of RAM you want your servers to run on (see the example at the end of this entry).
- Your pods now remain active for up to 5 minutes after they receive their last HTTP request.
- Your pods now have a hard upscale limit of 3 pods.
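A minimal invocation; note that the changelog does not state the units for `--memory`, so the value below is an assumption:

```bash
# 500 CPU millicores; the memory unit (assumed MB here) is not stated in the changelog.
tensorkube deploy --cpu 500 --memory 1024
```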
0.0.3 (June 11, 2024)
- The Tensorkube CLI is now available on PyPI and can be installed using `pip install tensorkube`.
- `tensorkube deploy` now supports the `--gpus` and `--gpu-type` parameters. `--gpus` defines the number of GPUs each pod requires and the `--gpu-type` parameter defines the type of GPU you want your system to run on.
- You can now list all your service deployments using the `tensorkube list deployments` command (see the example at the end of this entry).
- Your file uploads now come with progress bars so that you can optimise your images during deployment.
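Putting this release together, a first deployment could look like the following; the GPU type value is illustrative (supported types are documented in later releases):

```bash
pip install tensorkube                    # install the CLI from PyPI
tensorkube deploy --gpus 1 --gpu-type t4  # 't4' is an illustrative GPU type
tensorkube list deployments               # list services and their URLs
```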