Shortcuts

Conformer

摘要

Within Convolutional Neural Network (CNN), the convolution operations are good at extracting local features but experience difficulty to capture global representations. Within visual transformer, the cascaded self-attention modules can capture long-distance feature dependencies but unfortunately deteriorate local feature details. In this paper, we propose a hybrid network structure, termed Conformer, to take advantage of convolutional operations and self-attention mechanisms for enhanced representation learning. Conformer roots in the Feature Coupling Unit (FCU), which fuses local features and global representations under different resolutions in an interactive fashion. Conformer adopts a concurrent structure so that local features and global representations are retained to the maximum extent. Experiments show that Conformer, under the comparable parameter complexity, outperforms the visual transformer (DeiT-B) by 2.3% on ImageNet. On MSCOCO, it outperforms ResNet-101 by 3.7% and 3.6% mAPs for object detection and instance segmentation, respectively, demonstrating the great potential to be a general backbone network.

使用方式

from mmpretrain import inference_model

predict = inference_model('conformer-tiny-p16_3rdparty_in1k', 'demo/bird.JPEG')
print(predict['pred_class'])
print(predict['pred_score'])

Models and results

Image Classification on ImageNet-1k

模型

预训练

Params (M)

Flops (G)

Top-1 (%)

Top-5 (%)

配置文件

下载

conformer-tiny-p16_3rdparty_in1k*

从头训练

23.52

4.90

81.31

95.60

config

model

conformer-small-p16_3rdparty_in1k*

从头训练

37.67

10.31

83.32

96.46

config

model

conformer-small-p32_8xb128_in1k

从头训练

38.85

7.09

81.96

96.02

config

model

conformer-base-p16_3rdparty_in1k*

从头训练

83.29

22.89

83.82

96.59

config

model

Models with * are converted from the official repo. The config files of these models are only for inference. We haven’t reproduce the training results.

引用

@article{peng2021conformer,
      title={Conformer: Local Features Coupling Global Representations for Visual Recognition},
      author={Zhiliang Peng and Wei Huang and Shanzhi Gu and Lingxi Xie and Yaowei Wang and Jianbin Jiao and Qixiang Ye},
      journal={arXiv preprint arXiv:2105.03889},
      year={2021},
}
Read the Docs v: latest
Versions
latest
stable
mmcls-1.x
mmcls-0.x
dev
Downloads
epub
On Read the Docs
Project Home
Builds

Free document hosting provided by Read the Docs.