
CLONE: Customizing LLMs for Efficient Latency-Aware Inference at the Edge


Introduction

Supported LLMs:

Updates:

  • May 6, 2025: 🎉 Code released!

Contact Us:

Coming soon!

Table of Contents

This repository supplies only the software‑module code; the hardware components are not available for remote testing.

Quick Start

This quick start uses Llama2-7B as an example; for other models, change the base model path accordingly.

1. Installation

pip install -r requirement.txt

2. Generate optimal pruning ratio

cd Generator
bash quick_start.sh

3. Prune and evaluate the model

cd ../Pruner
bash quick_start.sh
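To illustrate what a pruning ratio means here, the toy sketch below (not the repository's actual method, and `prune_by_ratio` is a hypothetical helper) zeroes out the given fraction of smallest-magnitude weights, which is the simplest form of magnitude pruning:

```python
def prune_by_ratio(weights, ratio):
    """Zero out the `ratio` fraction of weights with the smallest magnitude."""
    k = int(len(weights) * ratio)  # number of weights to drop
    # Indices of the k smallest-magnitude weights.
    drop = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k]
    pruned = list(weights)
    for i in drop:
        pruned[i] = 0.0
    return pruned

# With ratio 0.5, the two smallest-magnitude entries are zeroed.
print(prune_by_ratio([0.5, -0.1, 0.9, 0.05], 0.5))  # → [0.5, 0.0, 0.9, 0.0]
```

The actual Generator/Pruner pipeline searches for per-layer ratios rather than applying one global ratio, but the underlying operation it configures is of this kind.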

Finetune

You can run the following command to finetune the model on the Alpaca dataset.

cd ../Pruner
bash fintune.sh

Evaluation

You can run the following command to evaluate Llama2-7B on BBH (zero-shot), MMLU (3-shot), PPL, and Commonsense (zero-shot). First download LLaMA-Factory-main and place it in the Pruner folder.

cd ../Pruner
bash eval.sh
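Of the metrics above, PPL (perplexity) is the exponential of the average negative log-likelihood per token. As a minimal sketch of the definition (the `perplexity` function is illustrative, not part of this repo):

```python
import math

def perplexity(token_probs):
    """PPL = exp of the mean negative log-probability over tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model that assigns uniform probability 1/4 to each token has PPL 4.
print(round(perplexity([0.25, 0.25, 0.25, 0.25]), 6))  # → 4.0
```

Lower perplexity indicates the pruned model still predicts held-out text well; the benchmark suites (BBH, MMLU, Commonsense) measure task accuracy instead.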

Acknowledgement

Citation

Coming soon!
