
Release Notes

Version 25.4.0 - April 2025

note

Support is generally provided for both torch.function_name and Tensor.function_name syntaxes. If you find a case where only one is supported, please report it to support@clika.io.
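
For example, in standard PyTorch the namespace call and the equivalent Tensor method are interchangeable, so either spelling may appear in a model passed to the SDK:

```python
import torch

x = torch.randn(4)
y = torch.randn(4)

# torch.function_name syntax
a = torch.add(x, y)

# Tensor.function_name syntax (the equivalent method form)
b = x.add(y)

assert torch.equal(a, b)  # both spellings produce the same result
```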

Overview

This major release introduces significant enhancements to our core model compression and deployment capabilities, streamlining workflows and expanding hardware support.

Key Features

  • Enhanced quantization algorithm: Our core quantization engine has been substantially upgraded. Fine-tuning is now optional: strong accuracy can be achieved without it, which significantly simplifies the optimization process.
    • A new Quantization Sensitivity Threshold option is available in the QuantizationSettings object for more direct control over the quantization process (see the sketch following this list).
  • Initial TensorRT-LLM support: Introduced support for weights-only quantization (WOQ) with TensorRT-LLM, enabling optimized deployment for large language models on compatible NVIDIA hardware. Support for additional TensorRT-LLM features is planned.
  • Automatic CPU offloading: Models that exceed available accelerator memory can now automatically offload parts of the computation to the CPU, ensuring successful execution even with large models.
  • Simplified model export: Easily export optimized models to common deployment formats, including INT4, BFloat16, and Float16.
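
As a rough illustration of the new workflow, the sketch below combines the Quantization Sensitivity Threshold option with the simplified export path. QuantizationSettings, ClikaModule, and the INT4/BFloat16/Float16 targets are named in this release; the import path, the quantization_sensitivity_threshold keyword, the clika_compress entry point, and the export signature are hypothetical placeholders, not the documented API.

```python
import torchvision

# Hypothetical import path -- consult the official SDK docs for the real one.
from clika_compression import QuantizationSettings, clika_compress

model = torchvision.models.resnet18(weights=None)

# New in 25.4.0: a sensitivity threshold gives more direct control over
# quantization. The keyword name and value here are illustrative assumptions.
settings = QuantizationSettings(quantization_sensitivity_threshold=0.05)

# Fine-tuning is now optional, so a single compression call may be enough.
# `clika_compress` is a placeholder entry point, not a documented function.
clika_module = clika_compress(model, settings=settings)

# Export the optimized model to one of the supported deployment precisions
# (INT4, BFloat16, or Float16); the method signature is an assumption.
clika_module.export("resnet18_int4", precision="int4")
```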

Upcoming Features

We are actively working on the following enhancements for future releases:

  • Multi-GPU support: Restore multi-GPU capabilities, which are temporarily unavailable due to internal architectural upgrades.
  • Expanding TensorRT-LLM integration: Further enhancements and broader feature support for TensorRT-LLM deployments.
  • Memory optimization: Continued focus on reducing VRAM and RAM consumption during model optimization and inference.
  • Compile ClikaModule to framework: Run a ClikaModule directly in the original framework via a simple method call.
  • Quantization sensitivity calibration caching: Cache measured quantization sensitivities so they are not re-measured on every run, speeding up development iterations and experimentation.
  • Easier access to quantization sensitivities: Introduce an API for retrieving sensitivity data for deeper analysis and visualization.