
CLONE: Customizing LLMs for Efficient Latency-Aware Inference at the Edge


Introduction

Supported LLMs:

Updates:

  • May 6, 2025: 🎉 Code released!

Contact Us:

Coming soon!

Table of Contents

This repository supplies only the software‑module code; the hardware components are not available for remote testing.

Quick Start

This quick start uses Llama2-7B as an example; for other models, change the base model path accordingly.

1. Installation

pip install -r requirement.txt

2. Generate optimal pruning ratio

cd Generator
bash quick_start.sh

3. Prune and evaluate the model

cd ../Pruner
bash quick_start.sh
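To illustrate what a pruning ratio means here, the toy sketch below (not the repository's actual method, and `prune_by_ratio` is a hypothetical helper) zeroes out the given fraction of smallest-magnitude weights, which is the simplest form of magnitude pruning:

```python
def prune_by_ratio(weights, ratio):
    """Zero out the `ratio` fraction of weights with the smallest magnitude."""
    k = int(len(weights) * ratio)  # number of weights to drop
    # Indices of the k smallest-magnitude weights.
    drop = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k]
    pruned = list(weights)
    for i in drop:
        pruned[i] = 0.0
    return pruned

# With ratio 0.5, the two smallest-magnitude entries are zeroed.
print(prune_by_ratio([0.5, -0.1, 0.9, 0.05], 0.5))  # → [0.5, 0.0, 0.9, 0.0]
```

The actual Generator/Pruner pipeline searches for per-layer ratios rather than applying one global ratio, but the underlying operation it configures is of this kind.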

Finetune

You can run the following command to finetune the model on the Alpaca dataset.

cd ../Pruner
bash fintune.sh

Evaluation

You can run the following command to evaluate Llama2-7B on BBH (zero-shot), MMLU (3-shot), PPL, and Commonsense (zero-shot). First download LLaMA-Factory-main and place it in the Pruner folder.

cd ../Pruner
bash eval.sh
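Of the metrics above, PPL (perplexity) is the exponential of the average negative log-likelihood per token. As a minimal sketch of the definition (the `perplexity` function is illustrative, not part of this repo):

```python
import math

def perplexity(token_probs):
    """PPL = exp of the mean negative log-probability over tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model that assigns uniform probability 1/4 to each token has PPL 4.
print(round(perplexity([0.25, 0.25, 0.25, 0.25]), 6))  # → 4.0
```

Lower perplexity indicates the pruned model still predicts held-out text well; the benchmark suites (BBH, MMLU, Commonsense) measure task accuracy instead.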

Acknowledgement

Citation

Coming soon!
