
Structured Neuron Level Pruning to compress Transformer-based models [ECCV'24]


Nota-NetsPresso/SNP


Official implementation of the paper "SNP: Structured Neuron-level Pruning to Preserve Attention Scores", accepted at the European Conference on Computer Vision (ECCV) 2024.


Structured Neuron-level Pruning (SNP) prunes neurons with less informative attention scores and eliminates redundancy among heads. The approach accelerates Transformer-based models on both edge devices and server processors. Combined with head pruning, SNP reduces the parameters and computational cost of DeiT-Base by 80% and achieves a 4.93× speedup on Jetson Nano and a 3.85× speedup on RTX 3090.

Proposed Method

SNP prunes graphically connected query and key layers having the least informative attention scores while preserving the overall attention scores. Value layers, which can be pruned independently, are pruned to eliminate inter-head redundancy. For more details, please refer to the main paper.
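The coupling between query and key layers can be illustrated with a small sketch. It assumes only the standard dot-product form of attention scores; the matrices, the `keep` indices, and the helper names are hypothetical, and the index choice is arbitrary rather than SNP's importance criterion. The point is that each score is a sum of per-neuron products, so removing the same neuron from both the query and the key output removes exactly that neuron's term and leaves the other terms unchanged.

```python
import random

random.seed(0)
tokens, d_head = 3, 6
# Stand-ins for the outputs of the query and key layers (one row per token).
Q = [[random.gauss(0, 1) for _ in range(d_head)] for _ in range(tokens)]
K = [[random.gauss(0, 1) for _ in range(d_head)] for _ in range(tokens)]

def prune(m, keep):
    """Drop every neuron (column) whose index is not in `keep`."""
    return [[row[d] for d in keep] for row in m]

def scores(q, k):
    """Unnormalized attention scores: dot product of each query row with each key row."""
    return [[sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]

def neuron_contribution(q, k, d):
    """Part of every score that comes from neuron d alone."""
    return [[qi[d] * kj[d] for kj in k] for qi in q]

keep = [0, 2, 3, 5]   # hypothetical choice, not SNP's selection rule
dropped = [1, 4]

full = scores(Q, K)
pruned = scores(prune(Q, keep), prune(K, keep))

# The change in each score equals the summed contribution of the dropped neurons.
for i in range(tokens):
    for j in range(tokens):
        lost = sum(neuron_contribution(Q, K, d)[i][j] for d in dropped)
        assert abs(full[i][j] - pruned[i][j] - lost) < 1e-9
```

Under this view, a pruning criterion that picks neurons whose contribution is small keeps the remaining scores close to the original ones; how SNP measures that contribution is described in the paper.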


Benchmark on ImageNet-1K

Inference latency and Top-1 accuracy of the compressed models across different devices. Latency is measured with 200 warmup runs and then averaged over 1000 runs; all latencies in the table are in milliseconds, and the value in parentheses is the speedup over the uncompressed baseline. The batch size is 1 on every device except the RTX 3090, where it is 64.

| Model | Top-1 (%) | GFLOPs | Raspberry Pi 4B (.onnx) | Jetson Nano (.trt) | Xeon Silver 4210R (.pt) | RTX 3090 (.pt) |
|---|---|---|---|---|---|---|
| DeiT-Tiny | 72.2 | 1.3 | 139.1 | 41.0 | 34.7 | 18.7 |
| + SNP (Ours) | 70.2 | 0.6 | 81.6 (1.70×) | 26.7 (1.54×) | 25.3 (1.38×) | 17.8 (1.05×) |
| DeiT-Small | 79.8 | 4.6 | 401.3 | 99.3 | 53.4 | 46.1 |
| + SNP (Ours) | 78.5 | 2.0 | 199.2 (2.01×) | 45.5 (2.18×) | 38.6 (1.38×) | 32.9 (1.40×) |
| + SNP (Ours) | 73.3 | 1.3 | 136.7 (2.94×) | 32.0 (3.10×) | 33.5 (1.60×) | 27.0 (1.71×) |
| DeiT-Base | 81.8 | 17.6 | 1377.7 | 293.3 | 122.0 | 151.4 |
| + SNP (Ours) | 79.6 | 6.4 | 565.7 (2.44×) | 132.6 (2.21×) | 64.7 (1.89×) | 73.0 (2.07×) |
| + SNP (Ours) + Head | 79.1 | 3.5 | 307.0 (4.48×) | 59.5 (4.93×) | 46.1 (2.65×) | 39.3 (3.85×) |
| EfficientFormer-L1 | 79.2 | 1.3 | 169.1 | 31.0 | 43.8 | 26.2 |
| + SNP (Ours) | 75.5 | 0.6 | 95.1 (1.78×) | 19.8 (1.56×) | 38.3 (1.14×) | 17.2 (1.52×) |
| + SNP (Ours) | 74.5 | 0.5 | 82.6 (2.05×) | 17.8 (1.74×) | 35.2 (1.24×) | 16.0 (1.64×) |
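The measurement protocol described above (warmup runs followed by an averaged timing loop) and the speedup figures in parentheses can be sketched as follows. This is a minimal illustration, not the repository's benchmarking script: `benchmark_ms` and `speedup` are hypothetical helper names, and `fn` stands for any callable that runs one inference.

```python
import time

def benchmark_ms(fn, warmup=200, runs=1000):
    """Mean latency of fn() in milliseconds, measured after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) * 1000.0 / runs

def speedup(baseline_ms, compressed_ms):
    """Ratio of baseline latency to compressed latency."""
    return baseline_ms / compressed_ms

# DeiT-Base vs. DeiT-Base + SNP + Head on Jetson Nano, latencies taken from the table.
print(f"{speedup(293.3, 59.5):.2f}x")   # 4.93x
# GFLOPs reduction for the same pair of models (17.6 -> 3.5).
print(f"{1 - 3.5 / 17.6:.0%}")          # 80%
```

On a GPU, CUDA kernels launch asynchronously, so the timed callable would also need to synchronize (for example with `torch.cuda.synchronize()`) before the clock is read; otherwise the loop measures launch time rather than execution time.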

Installation

conda create -n snp python=3.8
conda activate snp
git clone https://github.com/Nota-NetsPresso/SNP.git
cd SNP
pip install -r requirements.txt

Getting Started

Sign Up for NetsPresso

To compress the DeiT model using SNP, you need to sign up for a NetsPresso account. You can sign up here or go directly to the Sign Up page.

Simple Run

The following steps compress the DeiT-T model using SNP and train it for 20 epochs:

  1. Run the main script:
    bash main.sh
    
  2. When prompted, enter your NetsPresso user information:
    Please enter your NetsPresso Email:
    Please enter your NetsPresso Password:
    
  3. Enter the path to your ImageNet-1K dataset:
    Please enter the path to your ImageNet dataset:
    

Reproduce the ImageNet-1K results

Compressed DeiT-T: 0.6 GFLOPs and 70.29% Top-1 Acc.:

    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
        python -m torch.distributed.launch --nproc_per_node 8 --master_addr="127.0.0.1" --master_port=12345 \
            train.py --model "./reported_models/compressed_models/DeiT-T.pt" \
                    --lr 0.00025 \
                    --batch-size 256 \
                    --epochs 300 \
                    --output_dir ${OUTPUT_DIR} \
                    --data-path ${IMAGENET_PATH} \
                    > ./txt_logs/training_deit_t.txt 2>&1 &

Compressed DeiT-S: 2.0 GFLOPs and 78.52% Top-1 Acc.:

    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
        python -m torch.distributed.launch --nproc_per_node 8 --master_addr="127.0.0.1" --master_port=12345 \
            train.py --model "./reported_models/compressed_models/DeiT-S_2GFLOPs.pt" \
                    --lr 0.00025 \
                    --batch-size 256 \
                    --epochs 300 \
                    --output_dir ${OUTPUT_DIR} \
                    --data-path ${IMAGENET_PATH} \
                    > ./txt_logs/training_deit_s_2GFLOPs.txt 2>&1 &

Compressed DeiT-S: 1.3 GFLOPs and 73.32% Top-1 Acc.:

    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
        python -m torch.distributed.launch --nproc_per_node 8 --master_addr="127.0.0.1" --master_port=12345 \
            train.py --model "./reported_models/compressed_models/DeiT-S_1_3GFLOPs.pt" \
                    --lr 0.00025 \
                    --batch-size 256 \
                    --epochs 300 \
                    --output_dir ${OUTPUT_DIR} \
                    --data-path ${IMAGENET_PATH} \
                    > ./txt_logs/training_deit_s_1_27GFLOPs.txt 2>&1 &

Overall Instructions for SNP

  1. To compress the DeiT model, use the following command:

    python compress.py --NetsPresso-Email ${USER_NAME} \
                        --NetsPresso-Pwd ${USER_PWD} \
                        --model deit_tiny_patch16_224 \
                        --data-path ${IMAGENET_PATH} \
                        --output_dir ${OUTPUT_DIR} \
                        --num-imgs-snp-calculation 64

  2. To train the compressed model (saved in the compressed directory within output_dir), use the following command:

    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
        python -m torch.distributed.launch --nproc_per_node 8 --master_addr="127.0.0.1" --master_port=12345 \
            train.py --model "${OUTPUT_DIR}/compressed/compressed.pt" \
                    --batch-size 256 \
                    --epochs 300 \
                    --output_dir ${OUTPUT_DIR} \
                    --data-path ${IMAGENET_PATH} \
                    > ./txt_logs/training_test.txt 2>&1 &

Try SNP on Your Own Model



import torch

from netspresso import NetsPresso
from netspresso.enums import CompressionMethod, GroupPolicy, LayerNorm, Policy
from netspresso.clients.compressor.v2.schemas import Options

# Values you must define before running (placeholder names):
#   EMAIL, PASSWORD : your NetsPresso account credentials
#   MODEL_PATH      : path to the model file to compress
#   SAVE_DIR        : directory in which to save the compressed model
#   COMPRESS_RATIO  : dict mapping each available layer name to its compression ratio

# Step 0: Log in to NetsPresso
netspresso = NetsPresso(email=EMAIL, password=PASSWORD)

# Step 1: Declare the compressor
compressor = netspresso.compressor_v2()

# Step 2: Upload the model
# Provide the path to your model and specify the input shape
model = compressor.upload_model(
    input_model_path=MODEL_PATH,
    input_shapes=[{"batch": 1, "channel": 3, "dimension": [224, 224]}],
)

# Step 3: Select the compression method
# Specify the compression method and options
compression_info = compressor.select_compression_method(
    model_id=model.ai_model_id,
    compression_method=CompressionMethod.PR_SNP,
    options=Options(
        policy=Policy.AVERAGE,
        layer_norm=LayerNorm.TSS_NORM,
        group_policy=GroupPolicy.NONE,
        reshape_channel_axis=-1,
    ),
)

# Step 4: Set the compression ratio for each layer
# Look up the ratio for every available layer by its name
for available_layer in compression_info.available_layers:
    available_layer.values = [COMPRESS_RATIO[available_layer.name]]

# Step 5: Compress the model
# Perform the compression and save the compressed model
compressed_model_info = compressor.compress_model(
    compression=compression_info,
    output_dir=SAVE_DIR,
)

# Load the compressed model
compressed_model = torch.load(compressed_model_info.compressed_model_path)

# Compression lowers accuracy, so train the compressed model afterwards to recover the lost performance.
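
Step 4 looks up a ratio for each available layer by name, so it needs a mapping from layer names to ratios. One simple way to build such a mapping is to give every layer the same ratio, as in the sketch below. `uniform_ratios` is a hypothetical helper, not part of the netspresso package; the layer names in the example are made up; and the accepted range of ratio values is an assumption that should be checked against the NetsPresso documentation.

```python
def uniform_ratios(layer_names, ratio):
    """Map every layer name to the same compression ratio (hypothetical helper)."""
    # Assumption: a ratio is a fraction in [0, 1); verify against the NetsPresso docs.
    if not 0.0 <= ratio < 1.0:
        raise ValueError(f"ratio must be in [0, 1), got {ratio}")
    return {name: ratio for name in layer_names}

# Made-up layer names, for illustration only.
example = uniform_ratios(["layer_a", "layer_b"], 0.3)
print(example)   # {'layer_a': 0.3, 'layer_b': 0.3}

# With a real compression_info object, the names would come from the available layers:
# COMPRESS_RATIO = uniform_ratios(
#     [layer.name for layer in compression_info.available_layers], 0.3
# )
```

A uniform ratio is only a starting point; per-layer ratios can be edited in the resulting dict before Step 4 runs.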

License

  • All rights related to this repository and the compressed models are reserved by Nota Inc.
  • The intended use is strictly limited to research and non-commercial projects.

Citation

@article{shim2024snp,
  title={SNP: Structured Neuron-level Pruning to Preserve Attention Scores},
  author={Shim, Kyunghwan and Yun, Jaewoong and Choi, Shinkook},
  journal={arXiv preprint arXiv:2404.11630},
  year={2024},
  url={https://arxiv.org/abs/2404.11630}
}
