[ECCV 2024] SNP: Structured Neuron-level Pruning to Preserve Attention Scores

Official implementation of the paper "SNP: Structured Neuron-level Pruning to Preserve Attention Scores", accepted at the European Conference on Computer Vision (ECCV) 2024.

Structured Neuron-level Pruning (SNP) prunes neurons with less informative attention scores and eliminates redundancy among heads. The approach accelerates Transformer-based models on both edge devices and server processors. Combined with head pruning, SNP removes 80% of the parameters and computational cost of DeiT-Base, yielding a 4.93× speedup on Jetson Nano and a 3.85× speedup on RTX 3090.

Proposed Method

SNP prunes graphically connected query and key layers having the least informative attention scores while preserving the overall attention scores. Value layers, which can be pruned independently, are pruned to eliminate inter-head redundancy. For more details, please refer to the main paper.
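
As a rough illustration of why query and key neurons have to be pruned as a pair, the sketch below uses plain Python lists to show that removing the same hidden dimension from both the query and the key changes each pre-softmax attention score by exactly one product term. The function names and the magnitude-based ranking are assumptions made for this example; they are not the importance criterion used in the paper.

```python
# Illustrative sketch only: the names and the ranking heuristic below are
# assumptions for demonstration, not the SNP criterion from the paper.

def attention_scores(q, k):
    """Pre-softmax scores S[i][j] = sum over d of q[i][d] * k[j][d]."""
    return [[sum(qd * kd for qd, kd in zip(qi, kj)) for kj in k] for qi in q]

def drop_dim(x, d):
    """Remove hidden dimension d from every token vector in x."""
    return [row[:d] + row[d + 1:] for row in x]

def dim_contribution(q, k, d):
    """Total absolute contribution of dimension d to all scores."""
    return sum(abs(qi[d] * kj[d]) for qi in q for kj in k)

# Two tokens, three query/key dimensions.
q = [[1.0, 0.1, 2.0], [0.5, -0.1, 1.0]]
k = [[2.0, 0.2, 1.0], [1.0, 0.1, -1.0]]

# Rank dimensions by contribution and prune the weakest one from BOTH q and k.
weakest = min(range(3), key=lambda d: dim_contribution(q, k, d))
full_scores = attention_scores(q, k)
pruned_scores = attention_scores(drop_dim(q, weakest), drop_dim(k, weakest))

# Each score changes by exactly the dropped product q[i][d] * k[j][d],
# so the largest change is the largest such product in magnitude.
max_change = max(
    abs(f - p)
    for f_row, p_row in zip(full_scores, pruned_scores)
    for f, p in zip(f_row, p_row)
)
print(weakest, max_change)
```

Pruning a dimension from only one of the two projections would leave mismatched shapes, which is why the two layers are treated as connected.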

Benchmark on ImageNet-1K

Inference latency and Top-1 accuracy of the compressed models across devices. Latency is measured with 200 warm-up runs and then averaged over 1000 runs; all latency values in the table are in milliseconds. The batch size is 1 on every device except the RTX 3090, where it is 64.
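
The measurement protocol described above (warm-up runs followed by an averaged timing loop) can be sketched as follows. This is a minimal, hypothetical helper rather than the benchmarking script used for the reported numbers; `benchmark_ms` and the stand-in workload are assumptions for illustration.

```python
import time

def benchmark_ms(fn, warmup=200, runs=1000):
    """Return the mean latency of fn() in milliseconds after warm-up calls."""
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not timed
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / runs * 1000.0

# Stand-in workload; replace with a model forward pass on a fixed input.
calls = []
latency = benchmark_ms(lambda: calls.append(sum(range(100))))
print(f"{latency:.4f} ms averaged over 1000 runs")
```

On a GPU, kernels are launched asynchronously, so a real measurement would also need to synchronize the device (for example with `torch.cuda.synchronize()`) before each clock read.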

| Model | Top-1 (%) | GFLOPs | Raspberry Pi 4B (.onnx) | Jetson Nano (.trt) | Xeon Silver 4210R (.pt) | RTX 3090 (.pt) |
|---|---|---|---|---|---|---|
| DeiT-Tiny | 72.2 | 1.3 | 139.1 | 41.0 | 34.7 | 18.7 |
| + SNP (Ours) | 70.2 | 0.6 | 81.6 (1.70×) | 26.7 (1.54×) | 25.3 (1.38×) | 17.8 (1.05×) |
| DeiT-Small | 79.8 | 4.6 | 401.3 | 99.3 | 53.4 | 46.1 |
| + SNP (Ours) | 78.5 | 2.0 | 199.2 (2.01×) | 45.5 (2.18×) | 38.6 (1.38×) | 32.9 (1.40×) |
| + SNP (Ours) | 73.3 | 1.3 | 136.7 (2.94×) | 32.0 (3.10×) | 33.5 (1.60×) | 27.0 (1.71×) |
| DeiT-Base | 81.8 | 17.6 | 1377.7 | 293.3 | 122.0 | 151.4 |
| + SNP (Ours) | 79.6 | 6.4 | 565.7 (2.44×) | 132.6 (2.21×) | 64.7 (1.89×) | 73.0 (2.07×) |
| + SNP (Ours) + Head | 79.1 | 3.5 | 307.0 (4.48×) | 59.5 (4.93×) | 46.1 (2.65×) | 39.3 (3.85×) |
| EfficientFormer-L1 | 79.2 | 1.3 | 169.1 | 31.0 | 43.8 | 26.2 |
| + SNP (Ours) | 75.5 | 0.6 | 95.1 (1.78×) | 19.8 (1.56×) | 38.3 (1.14×) | 17.2 (1.52×) |
| + SNP (Ours) | 74.5 | 0.5 | 82.6 (2.05×) | 17.8 (1.74×) | 35.2 (1.24×) | 16.0 (1.64×) |
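
The speedup factors in parentheses are the baseline latency divided by the compressed latency. As a quick check, the snippet below recomputes the headline DeiT-Base figures from the table values; the helper names are only for illustration.

```python
def speedup(baseline_ms, compressed_ms):
    """Speedup factor: how many times faster the compressed model runs."""
    return baseline_ms / compressed_ms

def reduction(baseline, compressed):
    """Fraction of the baseline cost that was removed."""
    return 1.0 - compressed / baseline

# DeiT-Base vs. DeiT-Base + SNP + Head, values taken from the table.
jetson = speedup(293.3, 59.5)       # Jetson Nano latency (ms)
rtx = speedup(151.4, 39.3)          # RTX 3090 latency (ms)
flops_cut = reduction(17.6, 3.5)    # GFLOPs

print(f"{jetson:.2f}x {rtx:.2f}x {flops_cut:.0%}")  # prints "4.93x 3.85x 80%"
```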

Installation

conda create -n snp python=3.8
conda activate snp
git clone https://github.com/Nota-NetsPresso/SNP.git
cd SNP
pip install -r requirements.txt

Getting Started

Sign Up for NetsPresso

To compress the DeiT model using SNP, you need a NetsPresso account. You can create one on the NetsPresso Sign Up page.

Simple Run

The following steps compress the DeiT-T model using SNP and train it for 20 epochs:

Run the main script:
```
bash main.sh
```

When prompted, enter your NetsPresso user information:

Please enter your NetsPresso Email:
Please enter your NetsPresso Password:

Enter the path to your ImageNet-1K dataset:

Please enter the path to your ImageNet dataset:

Reproduce the ImageNet-1K results

Each command below runs in the background and writes its log to ./txt_logs. Before running, set OUTPUT_DIR and IMAGENET_PATH, and create the log directory if it does not already exist (mkdir -p ./txt_logs).

Compressed DeiT-T (0.6 GFLOPs, 70.29% Top-1 accuracy):

    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
        python -m torch.distributed.launch --nproc_per_node 8 --master_addr="127.0.0.1" --master_port=12345 \
            train.py --model "./reported_models/compressed_models/DeiT-T.pt" \
                    --lr 0.00025 \
                    --batch-size 256 \
                    --epochs 300 \
                    --output_dir ${OUTPUT_DIR} \
                    --data-path ${IMAGENET_PATH} \
                    > ./txt_logs/training_deit_t.txt 2>&1 &

Compressed DeiT-S (2.0 GFLOPs, 78.52% Top-1 accuracy):

    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
        python -m torch.distributed.launch --nproc_per_node 8 --master_addr="127.0.0.1" --master_port=12345 \
            train.py --model "./reported_models/compressed_models/DeiT-S_2GFLOPs.pt" \
                    --lr 0.00025 \
                    --batch-size 256 \
                    --epochs 300 \
                    --output_dir ${OUTPUT_DIR} \
                    --data-path ${IMAGENET_PATH} \
                    > ./txt_logs/training_deit_s_2GFLOPs.txt 2>&1 &

Compressed DeiT-S (1.3 GFLOPs, 73.32% Top-1 accuracy):

    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
        python -m torch.distributed.launch --nproc_per_node 8 --master_addr="127.0.0.1" --master_port=12345 \
            train.py --model "./reported_models/compressed_models/DeiT-S_1_3GFLOPs.pt" \
                    --lr 0.00025 \
                    --batch-size 256 \
                    --epochs 300 \
                    --output_dir ${OUTPUT_DIR} \
                    --data-path ${IMAGENET_PATH} \
                    > ./txt_logs/training_deit_s_1_27GFLOPs.txt 2>&1 &

Overall Instructions for SNP

To compress the DeiT model, use the following command:

python compress.py --NetsPresso-Email ${USER_NAME} \
                    --NetsPresso-Pwd ${USER_PWD} \
                    --model deit_tiny_patch16_224 \
                    --data-path ${IMAGENET_PATH} \
                    --output_dir ${OUTPUT_DIR} \
                    --num-imgs-snp-calculation 64

To train the compressed model (saved in the compressed directory within the output directory), use the following command:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
    python -m torch.distributed.launch --nproc_per_node 8 --master_addr="127.0.0.1" --master_port=12345 \
        train.py --model "${OUTPUT_DIR}/compressed/compressed.pt" \
                --batch-size 256 \
                --epochs 300 \
                --output_dir ${OUTPUT_DIR} \
                --data-path ${IMAGENET_PATH} \
                > ./txt_logs/training_test.txt 2>&1 &

Try SNP on Your Own Model

import torch

from netspresso import NetsPresso
from netspresso.enums import CompressionMethod, GroupPolicy, LayerNorm, Policy
from netspresso.clients.compressor.v2.schemas import Options

# Placeholders to define before running:
#   args           - parsed command-line arguments holding the NetsPresso credentials
#   MODEL_PATH     - path to the model file to upload
#   COMPRESS_RATIO - dict mapping each available layer name to its compression ratio
#   SAVE_DIR       - directory where the compressed model is written

# Step 0: Log in to NetsPresso
netspresso = NetsPresso(email=args.NetsPresso_Email, password=args.NetsPresso_Pwd)

# Step 1: Declare the compressor
compressor = netspresso.compressor_v2()

# Step 2: Upload the model
# Provide the path to your model and specify the input shape
model = compressor.upload_model(
    input_model_path=MODEL_PATH,
    input_shapes=[{"batch": 1, "channel": 3, "dimension": [224, 224]}],
)

# Step 3: Select the compression method
# Specify the compression method and options
compression_info = compressor.select_compression_method(
    model_id=model.ai_model_id,
    compression_method=CompressionMethod.PR_SNP,
    options=Options(
        policy=Policy.AVERAGE,
        layer_norm=LayerNorm.TSS_NORM,
        group_policy=GroupPolicy.NONE,
        reshape_channel_axis=-1,
    ),
)

# Step 4: Load the compression ratio for each layer
# Assign the compression ratio for each available layer
for available_layer in compression_info.available_layers:
    available_layer.values = [COMPRESS_RATIO[available_layer.name]]

# Step 5: Compress the model
# Perform the compression and save the compressed model
compressed_model_info = compressor.compress_model(
    compression=compression_info,
    output_dir=SAVE_DIR,
)

# Load the compressed model
compressed_model = torch.load(compressed_model_info.compressed_model_path)

# After compression, train the compressed model to recover the accuracy lost to pruning.
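
Step 4 looks up each layer's ratio by name, so the per-layer ratio mapping (`COMPRESS_RATIO`) has to be built beforehand. The sketch below shows one way to prepare it. The layer names, the `uniform_ratios` helper, and the assumption that ratios are fractions in [0, 1) are hypothetical; in practice the names come from `compression_info.available_layers` after Step 3.

```python
# Hypothetical layer names for illustration only; in practice, read them from
# compression_info.available_layers after Step 3.
layer_names = ["blocks.0.attn.qkv", "blocks.0.mlp.fc1", "blocks.1.attn.qkv"]

def uniform_ratios(names, ratio):
    """Map every layer name to the same ratio (assumed to lie in [0, 1))."""
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    return {name: ratio for name in names}

# Start from a uniform ratio, then override one layer to prune it less.
COMPRESS_RATIO = uniform_ratios(layer_names, 0.5)
COMPRESS_RATIO["blocks.0.attn.qkv"] = 0.3
print(COMPRESS_RATIO)
```

Starting from a uniform mapping and overriding individual entries keeps every available layer covered, so the lookup in Step 4 does not fail on a missing key.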

License

All rights related to this repository and the compressed models are reserved by Nota Inc.
The intended use is strictly limited to research and non-commercial projects.

Citation

@article{shim2024snp,
  title={SNP: Structured Neuron-level Pruning to Preserve Attention Scores},
  author={Shim, Kyunghwan and Yun, Jaewoong and Choi, Shinkook},
  journal={arXiv preprint arXiv:2404.11630},
  year={2024},
  url={https://arxiv.org/abs/2404.11630}
}
