
RepLKNet

Abstract

We revisit large kernel design in modern convolutional neural networks (CNNs). Inspired by recent advances in vision transformers (ViTs), in this paper, we demonstrate that using a few large convolutional kernels instead of a stack of small kernels could be a more powerful paradigm. We suggest five guidelines, e.g., applying re-parameterized large depth-wise convolutions, to design efficient high-performance large-kernel CNNs. Following the guidelines, we propose RepLKNet, a pure CNN architecture whose kernel size is as large as 31×31, in contrast to the commonly used 3×3. RepLKNet greatly closes the performance gap between CNNs and ViTs, e.g., achieving results comparable or superior to Swin Transformer on ImageNet and a few typical downstream tasks, with lower latency. RepLKNet also shows nice scalability to big data and large models, obtaining 87.8% top-1 accuracy on ImageNet and 56.0% mIoU on ADE20K, which is very competitive among state-of-the-art models of similar size. Our study further reveals that, in contrast to small-kernel CNNs, large-kernel CNNs have much larger effective receptive fields and a higher shape bias rather than texture bias.
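The phrase "re-parameterized large depth-wise convolutions" in the abstract refers to training a small depth-wise kernel in parallel with the large one and merging the two into a single kernel for inference. The sketch below illustrates only the merging step; it is not the authors' code, and the kernel sizes, channel count, and the omission of the per-branch batch normalization (which the real model also folds in) are simplifying assumptions.

import torch
import torch.nn.functional as F

# Illustrative sizes: a 31x31 depth-wise branch with a parallel 5x5 branch.
channels, large_k, small_k = 8, 31, 5
large = torch.nn.Conv2d(channels, channels, large_k, padding=large_k // 2,
                        groups=channels, bias=True)
small = torch.nn.Conv2d(channels, channels, small_k, padding=small_k // 2,
                        groups=channels, bias=True)

# Merge: zero-pad the small kernel to 31x31, then sum weights and biases
# into a single large-kernel convolution.
pad = (large_k - small_k) // 2
fused = torch.nn.Conv2d(channels, channels, large_k, padding=large_k // 2,
                        groups=channels, bias=True)
fused.weight.data = large.weight.data + F.pad(small.weight.data, [pad] * 4)
fused.bias.data = large.bias.data + small.bias.data

# The fused convolution is numerically equivalent to the sum of the two branches.
x = torch.randn(1, channels, 56, 56)
assert torch.allclose(large(x) + small(x), fused(x), atol=1e-5)

In the mmpretrain implementation, calling model.backbone.switch_to_deploy() (as in the usage snippet below) performs this kind of fusion on the pretrained weights.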

Usage

from mmpretrain import inference_model, get_model

# Load the converted RepLKNet-31B checkpoint pretrained on ImageNet-1k.
model = get_model('replknet-31B_3rdparty_in1k', pretrained=True)
# Fuse the re-parameterized branches into single large kernels for inference.
model.backbone.switch_to_deploy()
predict = inference_model(model, 'demo/bird.JPEG')
print(predict['pred_class'])
print(predict['pred_score'])
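
If you need backbone features rather than class predictions, the classifier returned by get_model also exposes extract_feat. The snippet below is a hedged sketch; the random 224×224 tensor stands in for a preprocessed image batch.

import torch
from mmpretrain import get_model

model = get_model('replknet-31B_3rdparty_in1k', pretrained=True)  # same model as above
model.backbone.switch_to_deploy()
model.eval()
with torch.no_grad():
    # A random tensor in place of a real, preprocessed image batch.
    feats = model.extract_feat(torch.rand(1, 3, 224, 224))
print([f.shape for f in feats])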

Models and results

Image Classification on ImageNet-1k

| Model | Pretrain | Params (M) | Flops (G) | Top-1 (%) | Top-5 (%) | Config | Download |
| :---- | :------- | ---------: | --------: | --------: | --------: | :----: | :------: |
| replknet-31B_3rdparty_in1k* | From scratch | 79.86 | 15.64 | 83.48 | 96.57 | config | model |
| replknet-31B_3rdparty_in1k-384px* | From scratch | 79.86 | 45.95 | 84.84 | 97.34 | config | model |
| replknet-31B_in21k-pre_3rdparty_in1k* | ImageNet-21k | 79.86 | 15.64 | 85.20 | 97.56 | config | model |
| replknet-31B_in21k-pre_3rdparty_in1k-384px* | ImageNet-21k | 79.86 | 45.95 | 85.99 | 97.75 | config | model |
| replknet-31L_in21k-pre_3rdparty_in1k-384px* | ImageNet-21k | 172.67 | 97.24 | 86.63 | 98.00 | config | model |
| replknet-XL_meg73m-pre_3rdparty_in1k-320px* | MEG73M | 335.44 | 129.57 | 87.57 | 98.39 | config | model |

Models with * are converted from the official repo. The config files of these models are only for inference. We haven't reproduced the training results.
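
Any checkpoint in the table can be loaded by passing its name (without the trailing *) to get_model, just as in the usage snippet above. For example, a hedged sketch for the ImageNet-21k pretrained 384px variant:

from mmpretrain import get_model, inference_model

# Example with the ImageNet-21k pretrained, 384px variant listed in the table.
model = get_model('replknet-31B_in21k-pre_3rdparty_in1k-384px', pretrained=True)
model.backbone.switch_to_deploy()
print(inference_model(model, 'demo/bird.JPEG')['pred_class'])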

Citation

@inproceedings{ding2022scaling,
  title={Scaling up your kernels to 31x31: Revisiting large kernel design in cnns},
  author={Ding, Xiaohan and Zhang, Xiangyu and Han, Jungong and Ding, Guiguang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={11963--11975},
  year={2022}
}