Changelog
Changes made to the Tensorkube CLI
This changelog provides an overview of user-visible updates, including new features, improvements, bug fixes, and deprecations, for the tensorkube CLI.
As the CLI is still in its alpha development stage, breaking changes may occur. We strive to minimize these disruptions and provide the `tensorkube upgrade` command to bring your runtime to the latest version.
We thank you for your understanding and patience as we progress towards a stable release of the CLI.
0.0.21 (October 30, 2024)
- Added support for Lambda Labs in Dev Containers, expanding the cloud provider options beyond AWS. You can now specify `--cloud lambda` when starting a dev container to utilize Lambda Labs for your development workflows. This enhancement offers more flexibility and potentially cost-effective solutions for GPU-enabled development.
  - Example command with Lambda Labs: `tensorkube dev --cloud lambda start --gpu-type v100 --port 8080`
  - Refer to the Dev Containers documentation for detailed setup and usage instructions.
- Fixed the issue with viewing deployment logs when services are not yet ready. Instead of encountering an error, you will now see the actual status of the service, providing better transparency and debugging capabilities.
- Enhanced support for large `dshm` (dynamic shared memory) sizes in Lorax deployments. This change allows larger models to efficiently share memory space during execution, enhancing performance, especially in memory-intensive applications.
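For background, a larger shared-memory segment in Kubernetes is commonly provided by mounting a memory-backed `emptyDir` volume at `/dev/shm`. The sketch below shows that generic pattern only; it is not Tensorkube's actual manifest, and the image name is a placeholder:

```yaml
# Generic Kubernetes pattern for a large /dev/shm -- illustrative only,
# not Tensorkube's actual deployment manifest.
apiVersion: v1
kind: Pod
metadata:
  name: lorax-example
spec:
  containers:
    - name: lorax
      image: your-lorax-image:latest  # placeholder image name
      volumeMounts:
        - name: dshm
          mountPath: /dev/shm
  volumes:
    - name: dshm
      emptyDir:
        medium: Memory   # back the volume with RAM
        sizeLimit: 16Gi  # allow a large shared-memory segment
```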
0.0.20 (October 25, 2024)
- Added support for L40S GPUs on AWS, enabling more powerful computation options for your Tensorfuse deployments. You can now specify the L40S GPU type when deploying your services to take advantage of the enhanced capabilities provided by this GPU.
- Use the following command to deploy a service with L40S GPUs:
  `tensorkube deploy --gpu-type l40s --gpus 1`
0.0.19 (October 24, 2024)
- Added support for Dev Containers, allowing seamless testing and development with hot-reloading of GPU-enabled containers. This feature significantly improves efficiency, especially for machine learning and large language model (LLM) workloads, by reflecting code changes instantly in running containers without the need for restarts.
- New commands introduced (see the combined example at the end of this entry):
  - `tensorkube dev --cloud <provider> start --gpu-type <type> --port <port>`: Starts a Dev Container instance with hot-reloading enabled.
  - `tensorkube dev --cloud <provider> stop`: Stops the running Dev Container instance while retaining its state.
  - `tensorkube dev --cloud <provider> list`: Lists all running Dev Container instances.
  - `tensorkube dev --cloud <provider> delete`: Completely deletes the Dev Container instance and associated resources.
- For comprehensive documentation, including setup and usage instructions, refer to the Dev Containers guide.
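Taken together, a typical session might look like the sketch below; the `aws` provider value and the `a10g` GPU type are illustrative assumptions based on the syntax above:

```bash
# Illustrative Dev Container session; 'aws' and 'a10g' are assumed values.
tensorkube dev --cloud aws start --gpu-type a10g --port 8080
tensorkube dev --cloud aws list    # confirm the instance is running
tensorkube dev --cloud aws stop    # stop the instance, retaining its state
tensorkube dev --cloud aws delete  # tear down the instance and its resources
```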
0.0.18 (October 18, 2024)
- Added support for running fine-tuning runs using Axolotl, allowing you to perform detailed training processes on your models using specified configurations. This feature leverages Axolotl-style declarative `config.yaml` files.
- Use the `tensorkube train create` command with the `--axolotl` flag to initiate the fine-tuning process (a hedged example follows at the end of this entry).
- Supported options include specifying the number of GPUs, GPU type, environment, job ID, and configuration path.
- Refer to the Axolotl config documentation for a detailed list of parameters.
- Follow our Tensorfuse fine-tuning guide to get started with training models like LoRA adapters for `LLaMA-3.1-8B` on an SQL dataset.
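A minimal invocation might look like the sketch below. Only `tensorkube train create` and the `--axolotl` flag are taken from this release; the remaining flag names are assumptions mirroring the options listed above, so consult the fine-tuning guide for the exact spellings:

```bash
# Hypothetical flag names for GPU count, GPU type, and config path.
tensorkube train create --axolotl \
  --gpus 1 --gpu-type a10g \
  --config-path ./config.yaml
```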
0.0.17 (September 23, 2024)
- Added support for secrets. You can provide sensitive information such as credentials, passwords, and API keys to your containers using secrets. Check out the guide to use secrets in your deployments.
- Also updated our GitHub Actions guide to work with secrets. Modify your workflow files accordingly to use secrets in your deployments.
- Added support for defining your deployments using a config file. As the number of configurable parameters grows, it becomes difficult to manage them all as CLI arguments, so we now also support `config.yaml` files for deployment. You can use `tensorkube deploy --config-file <path-to-config-file>` to deploy from a config file; its use is completely optional (a sketch follows at the end of this entry).
- Modified the appearance of links in the `tensorkube list deployments` command. AWS ELB URLs can be quite long and can cause display issues in smaller terminals. The URLs are now displayed outside the status table so that they can be copied even when wrapped.
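A minimal sketch of such a config file, assuming its keys mirror the CLI flags documented elsewhere in this changelog (the key names are assumptions, not the verified schema):

```yaml
# Illustrative config.yaml; key names are assumed to mirror CLI flags.
gpus: 1
gpu_type: l4
cpu: 500        # CPU millicores
memory: 1024    # RAM for each server
min_scale: 0    # scale to zero when idle
max_scale: 3    # default upper bound
env: production
```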
0.0.13 (August 19, 2024)
- Added support for GitHub Actions. You can configure your CD pipelines using the guide here (a hedged workflow sketch appears at the end of this entry).
- Your container builds will now be interrupted if one of the intermediate commands fails. Earlier, you had to wait for every step to finish before you could retry your builds.
- Added support for multiple concurrent clusters: you can now run Tensorfuse alongside other Kubernetes clusters without changing context.
- Migration from `sslip.io` domains: sslip domains are notorious for misconfigurations and outages, so we have modified our stack to serve your deployments directly via AWS load balancer URLs. Running `tensorkube list deployments` will show the new URLs. The older `sslip.io` host-header URLs will continue to work; to view them, use the `--old` flag: `tensorkube list deployments --old`.
- S3 bug: previously, simultaneous runs could cause redundant files to be available in your containers. This has now been resolved; each deployment run begins from a clean slate and only the latest deployment prevails.
- Required login: we are continuously trying to improve the support we can give our customers, and to further aid this we will be releasing a dashboard in the coming weeks. As a prerequisite, you will have to log in to Tensorfuse once before running your commands. This means running `tensorkube login` just once.
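For orientation, a CD workflow along these lines is plausible; everything here is a standard GitHub Actions pattern except `tensorkube deploy`, and the secret names are placeholders, so follow the official guide for the supported recipe:

```yaml
# Illustrative workflow, not the official recipe; secret names are placeholders.
name: deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - run: pip install tensorkube
      - run: tensorkube deploy
```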
0.0.12 (August 12, 2024)
- Fixed build failures during deployments.
0.0.11 (August 5, 2024)
- Fixed the bug where long environment names prevented proper resource creation.
0.0.10 (July 29, 2024)
- Added support for directly SSH-ing into your deployments using the `tensorkube deployment ssh <deployment-name>` command. This command opens an interactive shell in the pod and allows you to debug issues (see the combined example at the end of this entry).
- Added support for environments so that there is segregation between prod, staging, etc. You can run `tensorkube environment create --env-name <env-name>` to create a new environment and then use the `--env <env-name>` flag with all other commands.
- Added support for streaming logs from your deployments to your terminal with the help of the `tensorkube deployment logs <deployment-name>` command.
- Added support for getting detailed information about your deployments (such as latest version, status, etc.) using the `tensorkube deployment describe <deployment-name>` command.
- A better UI for interacting with the CLI. You can now see the progress of your commands in a more interactive way.
- Added cleanup jobs to remove outdated root filesystems.
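Combining the commands above, a typical debugging flow might look like this; the environment and deployment names are illustrative:

```bash
# 'staging' and 'my-service' are illustrative names.
tensorkube environment create --env-name staging
tensorkube deploy --env staging
tensorkube deployment describe my-service --env staging  # latest version, status, etc.
tensorkube deployment logs my-service --env staging      # stream logs to the terminal
tensorkube deployment ssh my-service --env staging       # open an interactive shell
```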
0.0.9 (July 22, 2024)
- Released the Tensorfuse Container Runtime that enables ~3s cold starts for your containers.
- Made installation instructions independent of the package manager. You no longer have to depend on `snap` to install Tensorfuse prerequisites.
- Added support for CloudWatch logging and Container Insights for your Tensorfuse runtime. This enables you to browse and debug logs from the CloudWatch console and view usage metrics dashboards.
- Fixed the S3 mountpoint error that kept popping up randomly during container starts.
- Modified the runtime to work with Bottlerocket AMIs and the Nvidia container runtime.
0.0.7 (June 26, 2024)
- Added support for T4 and L4 GPU types.
- Added support for ignoring files and directories specified in a `.dockerignore` file during the image build process.
- Added the `tensorkube upgrade` command to bring your Tensorfuse runtime to the latest version.
- Added the `--min-scale` and `--max-scale` flags to control the number of pods running in your Tensorfuse app. By default, the minimum scale is 0 and the maximum scale is 3 (see the example at the end of this entry).
- Non-GPU pods will no longer be scheduled on GPU machines.
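For example, to keep one warm pod while allowing bursts of up to five (the values are illustrative):

```bash
# Override the default scale bounds (min 0, max 3); values here are illustrative.
tensorkube deploy --gpus 1 --gpu-type t4 --min-scale 1 --max-scale 5
```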
0.0.5 (June 24, 2024)
- The CLI now works seamlessly on Linux machines. Earlier, the CLI was supported only on Macs.
- Added support for a Network File System within your Tensorkube cluster. The Network File System enables faster cold starts and reduces image build times as well.
- Upgraded the image build engine to use fewer resources during the image build process. The new build engine consumes less RAM and hence can run on cheaper instances.
- You can now use `tensorkube install-prerequisites` to install and check all the prerequisite packages required to run tensorkube before configuring it with `tensorkube configure` (see the example at the end of this entry).
- `tensorkube configure` now resumes from the last installation checkpoint. You no longer have to remove and reset the cluster in case your configuration runs into errors.
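The full first-time setup therefore reduces to three commands:

```bash
pip install tensorkube            # install the CLI from PyPI
tensorkube install-prerequisites  # install and verify the prerequisite packages
tensorkube configure              # set up the runtime; resumes from the last checkpoint on failure
```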
0.0.4 (June 17, 2024)
- Logging support added for streaming logs during the container image build process and while a service is getting started. Every intermediate step, from building the image to submitting a nodeclaim to actually starting a pod, can now be understood and debugged using logs.
- Support for versioning deployments is now live. Every time you run `tensorkube deploy`, a new version of your service is created and the older version is retired.
- `tensorkube deploy` now accepts the optional `--cpu` and `--memory` parameters, which allow you to specify the number of CPU millicores and the amount of RAM you want your servers to run on (see the example at the end of this entry).
- Your pods now remain active for up to 5 minutes after they receive their last HTTP request.
- Your pods now have a hard upscale limit of 3 pods.
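A minimal invocation; note that the changelog does not state the units for `--memory`, so the value below is an assumption:

```bash
# 500 CPU millicores; the memory unit (assumed MB here) is not stated in the changelog.
tensorkube deploy --cpu 500 --memory 1024
```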
0.0.3 (June 11, 2024)
- The Tensorkube CLI is now available on PyPI and can be installed using `pip install tensorkube`.
- `tensorkube deploy` now supports the `--gpus` and `--gpu-type` parameters. `--gpus` defines the number of GPUs each pod requires and the `--gpu-type` parameter defines the type of GPU you want your system to run on.
- You can now list all your service deployments using the `tensorkube list deployments` command (see the example at the end of this entry).
- Your file uploads now come with progress bars so that you can optimise your images during deployment.
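Putting this release together, a first deployment could look like the following; the GPU type value is illustrative (supported types are documented in later releases):

```bash
pip install tensorkube                    # install the CLI from PyPI
tensorkube deploy --gpus 1 --gpu-type t4  # 't4' is an illustrative GPU type
tensorkube list deployments               # list services and their URLs
```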