Intensive Small Target Detection Network Based on Multi-scale Feature Extraction
YUAN Changan1, WANG Wenji2, HUANG Haojie3, QIN Zhengyou2, ZHANG Jinyong2, LIAO Huixian4, QIN Xiao2, LI Xiaosen5, LI Yongyu2, FU Yunqin2, TAN Sijing2, QIAN Quanmei2, WU Kunsheng6

(1. Guangxi Academy of Sciences, Nanning, Guangxi; 2. Guangxi Key Laboratory of Human-Computer Interaction and Intelligent Decision Making, Nanning Normal University, Nanning, Guangxi; 3. Guangxi Technical Service Company, China Communications Services Corporation Limited; 4. College of Digital Technology, Guangdong Vocational College of Finance and Trade, Qingyuan, Guangdong; 5. School of Artificial Intelligence, Guangxi Minzu University, Nanning; 6. Nanning Arboretum, Guangxi Zhuang Autonomous Region, Nanning, Guangxi)

Abstract:
To address the difficulty that existing anchor-free object detection algorithms have in extracting multi-scale target features in dense scenes, this paper proposes an Intensive Small Target Detection Network Based on Multi-scale Feature Extraction (IMSE). First, a multi-scale feature enhancement module is proposed, consisting of a window attention module and a multi-scale information fusion module. By establishing global contextual connections, it strengthens the network's feature representation in dense scenes and extracts the multi-scale features of detection targets more effectively. Second, a deformable convolutional feature pyramid is proposed to fuse the outputs of the multi-scale feature enhancement module across scales, effectively addressing the missed, false, and duplicate detections that tend to occur when objects are irregularly shaped and irregularly distributed. Finally, the fused multi-scale features are fed into the detection heads for classification and bounding-box regression. The model is validated on the public MS COCO and CARPK datasets and on WOOD, a dataset built from a real production scenario. The experimental results show that IMSE achieves good results on all three datasets, verifying the effectiveness of the model.
Keywords: object detection; self-attention mechanism; FCOS; feature pyramid (FPN); atrous convolution; deformable convolution
Received: 2024-07-22; Revised: 2024-08-21
Funding: Guangxi Science and Technology Major Project (桂科AA22068057, 桂科AB21076021)
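
The abstract names a multi-scale information fusion module and the keywords list atrous convolution, but neither describes the module's internals. The sketch below is therefore only an illustration of the general idea, not the authors' implementation. It assumes a 1-D signal, a shared 3-tap kernel, zero padding, and simple averaging across branches; the function names `dilated_conv1d`, `receptive_field`, and `multi_scale_fuse` are hypothetical. It shows how running the same kernel at several dilation rates covers progressively larger contexts without adding weights.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Same-length 1-D atrous convolution (cross-correlation) with zero padding."""
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Kernel taps are spaced `dilation` samples apart.
            idx = i + (j - center) * dilation
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilation):
    """Number of input positions spanned by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def multi_scale_fuse(signal, kernel, dilations=(1, 2, 4)):
    """Average the outputs of parallel atrous branches (assumed fusion rule)."""
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(vals) / len(branches) for vals in zip(*branches)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    kernel = [1.0, 1.0, 1.0]
    for d in (1, 2, 4):
        print(d, receptive_field(len(kernel), d), dilated_conv1d(impulse, kernel, d))
    print(multi_scale_fuse(impulse, kernel))
```

With a 3-tap kernel, dilations of 1, 2, and 4 span 3, 5, and 9 input positions respectively, so the fused output mixes local and wider context at every position. A real detector would use 2-D convolutions with learned weights and likely a learned fusion rather than a plain average.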