Guangxi Sciences

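Intensive Small Target Detection Network Based on Multi-scale Feature Extraction
YUAN Changan¹, WANG Wenji², HUANG Haojie³, QIN Zhengyou², ZHANG Jinyong², LIAO Huixian⁴, QIN Xiao², LI Xiaosen⁵, LI Yongyu², FU Yunqin², TAN Sijing², QIAN Quanmei², WU Kunsheng⁶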
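(1. Guangxi Academy of Sciences; 2. Guangxi Key Laboratory of Human-Computer Interaction and Intelligent Decision Making, Nanning Normal University; 3. Technical Service Branch, Guangxi Zhuang Autonomous Region Communications Industry Service Co., Ltd.; 4. College of Digital Technology, Guangdong Vocational College of Finance and Trade; 5. School of Artificial Intelligence, Guangxi Minzu University; 6. Nanning Arboretum, Guangxi Zhuang Autonomous Region)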

Abstract:

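To address the difficulty that existing anchor-free object detection algorithms have in effectively extracting multi-scale target features in dense scenes, this study proposes an Intensive Small Target Detection Network Based on Multi-scale Feature Extraction (IMSE). First, a multi-scale feature enhancement module is proposed, comprising a window attention module and a multi-scale information fusion module; by establishing global-level contextual connections, it strengthens the feature representation of IMSE in dense scenes and thus extracts the multi-scale features of detection targets more effectively. Second, a deformable convolutional feature pyramid structure is proposed that introduces dilated convolution for feature enhancement, improving the ability of IMSE to detect irregularly shaped and irregularly distributed objects. Finally, the fused multi-scale features are fed separately into the detection heads for classification and bounding-box regression. IMSE is validated on the public datasets MS COCO and CARPK and on the WOOD dataset, which was built from a real production scenario. The experimental results show that the average precision (AP) of IMSE on the three datasets reaches 49.4%, 75.8%, and 55.0%, respectively, which is 1.8%, 1.4%, and 2.1% higher than the original FCOS method, verifying the effectiveness of the proposed model.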
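The abstract names dilated (atrous) convolution as the feature-enhancement step inside the pyramid but gives no configuration. The following is a minimal, dependency-free sketch of the general operation in one dimension, not the IMSE implementation: the function names, the toy signal, and the kernel are all hypothetical and chosen only to show how dilation widens the receptive field without adding weights.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Input span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D dilated cross-correlation (no padding, stride 1)."""
    span = effective_kernel_size(len(kernel), dilation)
    out_len = len(signal) - span + 1
    if out_len <= 0:
        raise ValueError("signal is shorter than the dilated kernel span")
    # Each output sums kernel weights against inputs spaced `dilation` apart.
    return [
        sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(out_len)
    ]


# Hypothetical toy data: a ramp signal and a 3-tap difference kernel.
signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]

dense = dilated_conv1d(signal, kernel, dilation=1)   # each output spans 3 inputs
sparse = dilated_conv1d(signal, kernel, dilation=2)  # each output spans 5 inputs
print(dense, sparse)
```

With the same three weights, a dilation of 2 makes each output depend on five consecutive input positions instead of three, which is the property a detector can use to gather wider context around small, densely packed objects.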

Keywords: object detection; self-attention mechanism; feature pyramid; dilated convolution; deformable convolution


Received: 2024-07-22; Revised: 2025-02-01

Funding: Guangxi Science and Technology Major Project (Guike AA22068057, Guike AB21076021)

Intensive Small Target Detection Network Based on Multi-scale Feature Extraction

YUAN Changan¹, WANG Wenji², HUANG Haojie³, QIN Zhengyou², ZHANG Jinyong², LIAO Huixian⁴, QIN Xiao², LI Xiaosen⁵, LI Yongyu², FU Yunqin², TAN Sijing², QIAN Quanmei², WU Kunsheng⁶

(1. Guangxi Academy of Sciences, Nanning, Guangxi; 2. Guangxi Key Laboratory of Human-Computer Interaction and Intelligent Decision Making, Nanning Normal University, Nanning, Guangxi; 3. Guangxi Technical Service Company, China Communications Services Corporation Limited; 4. College of Digital Technology, Guangdong Vocational College of Finance and Trade, Qingyuan, Guangdong; 5. School of Artificial Intelligence, Guangxi Minzu University, Nanning; 6. Nanning Arboretum, Guangxi Zhuang Autonomous Region, Nanning, Guangxi)

Abstract:

Existing anchor-free object detection algorithms struggle to extract multi-scale target features effectively in dense scenes. To address this problem, this paper proposes an Intensive Small Target Detection Network Based on Multi-scale Feature Extraction (IMSE). Firstly, a multi-scale feature enhancement module is proposed, which includes a window attention module and a multi-scale information fusion module. By establishing global-level contextual connections, it enhances the feature expression of IMSE in dense scenes and extracts the multi-scale features of detection targets more effectively. Secondly, a deformable convolutional feature pyramid structure is proposed, which introduces dilated convolution for feature enhancement, thereby improving the ability of IMSE to detect irregularly shaped and irregularly distributed objects. Finally, the fused multi-scale features are input into the detection heads for classification and bounding box regression. IMSE was validated on the public datasets MS COCO and CARPK and on the WOOD dataset, which was constructed from an actual production scenario. The experimental results show that the average precision (AP) of IMSE on the three datasets reached 49.4%, 75.8%, and 55.0%, respectively, which is 1.8%, 1.4%, and 2.1% higher than the original FCOS method, verifying the effectiveness of the proposed model.
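The abstract reports IMSE's AP and its margin over FCOS but not the FCOS figures themselves. As a small worked check, the sketch below backs out the baseline values those numbers would imply. It assumes the reported gains are absolute differences in AP percentage points, which the abstract does not state explicitly; the derived baselines are therefore an inference, not figures reported by the authors, and the names `reported` and `implied_baseline` are hypothetical.

```python
# IMSE AP (%) and gain over FCOS, as reported in the abstract.
reported = {
    "MS COCO": (49.4, 1.8),
    "CARPK": (75.8, 1.4),
    "WOOD": (55.0, 2.1),
}


def implied_baseline(ap: float, gain: float) -> float:
    # Assumption: `gain` is an absolute difference in AP percentage points.
    # Rounded to one decimal to match the precision used in the abstract.
    return round(ap - gain, 1)


baselines = {
    name: implied_baseline(ap, gain) for name, (ap, gain) in reported.items()
}

for name, base in baselines.items():
    print(f"{name}: implied FCOS AP = {base}%")
```

If the gains were instead relative improvements, the implied baselines would differ, so these values should be checked against the result tables in the full paper.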

Key words: object detection; self-attention; FPN; atrous convolution; deformable convolution
