Cite this article
  • WU Degang, ZHAO Liping, CHEN Qianhui, ZHANG Yubo. Video Abnormal Behavior Detection with Dual-Scale Serial Network [J]. Guangxi Sciences, 2023, 30(3): 575-586.



DOI: 10.13656/j.cnki.gxkx.20230710.017
Received: 2022-10-31; Revised: 2023-01-09
Funding: This work was supported by the Training Plan for Young Key Teachers in Colleges and Universities of Henan Province (2018GGJS190) and a 2022 Scientific Research Project of Shangqiu Institute of Technology (2022KYXM02).
Video Abnormal Behavior Detection with Dual-Scale Serial Network
WU Degang1, ZHAO Liping2, CHEN Qianhui1, ZHANG Yubo3
(1. College of Mechanical Engineering, Shangqiu Institute of Technology, Shangqiu, Henan, 476000, China; 2. College of Information and Electronic Engineering, Shangqiu Institute of Technology, Shangqiu, Henan, 476000, China; 3. College of Electrical and Information Engineering, Zhengzhou University, Zhengzhou, Henan, 450000, China)
Abstract:
Aiming at the problems of poor performance and large time overhead in traditional video abnormal behavior detection models, a video abnormal behavior detection model based on a dual-scale serial network (Dual-Scale Serial Network, DSS-Net) is constructed from the spatial and temporal dimensions. First, the Vgg-16 network is improved with depthwise separable convolution, and the improved feature extractor is used to extract features from the spatial dimension, so that the time overhead of the model is reduced by cutting the number of computed parameters. Then, an attention mechanism is introduced on this basis to strengthen the expressive ability of the target features. Finally, a Long Short-Term Memory (LSTM) network is used to extract the contextual temporal relationship between the frames of a motion video from the temporal dimension. Tests were carried out on the mainstream UCSD Ped1 and Ped2 datasets and on the more challenging UCF dataset. The results show that the ROC (Receiver Operating Characteristic) Area Under Curve (AUC) values of DSS-Net on the three datasets reach 95.30%, 96.80% and 80.60%, the Equal Error Rates (EER) reach 10.60%, 12.60% and 18.50%, respectively, and the model offers stronger real-time performance. Compared with the classical One-class Neural Network (ONN) and Aggregation of Ensembles (AOE) models, the AUC values of DSS-Net are increased by 0.42% and 0.94% on the Ped1 and Ped2 datasets, respectively. In addition, DSS-Net is also tested for generalization ability and robustness on the UMN, ShanghaiTech and CUHK Avenue datasets, and the results are competitive with current mainstream models.
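
As a reading aid, the following is a minimal PyTorch sketch of the pipeline described above: a VGG-style spatial extractor rebuilt from depthwise separable convolutions, a channel-attention step, and an LSTM over per-frame features that yields a per-frame anomaly score. It is an illustration under assumptions, not the authors' DSS-Net: the framework choice (PyTorch), the module names (DepthwiseSeparableConv, ChannelAttention, DSSNetSketch), the layer widths, the squeeze-and-excitation form of the attention, and the sigmoid score head are all filled in for the example, since the abstract does not specify them.

import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    # 3x3 depthwise convolution followed by a 1x1 pointwise convolution.
    # For 64 input and 64 output channels this needs 3*3*64 + 64*64 = 4672
    # weights versus 3*3*64*64 = 36864 for a standard 3x3 convolution,
    # which is the parameter saving the abstract relies on.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))


class ChannelAttention(nn.Module):
    # Squeeze-and-excitation style channel attention (one possible choice).
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x).view(x.size(0), -1, 1, 1)
        return x * w


class DSSNetSketch(nn.Module):
    # Spatial extractor -> attention -> LSTM -> per-frame anomaly score.
    def __init__(self, hidden=128):
        super().__init__()
        self.spatial = nn.Sequential(
            DepthwiseSeparableConv(3, 32), nn.MaxPool2d(2),
            DepthwiseSeparableConv(32, 64), nn.MaxPool2d(2),
            DepthwiseSeparableConv(64, 128))
        self.attention = ChannelAttention(128)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.lstm = nn.LSTM(input_size=128, hidden_size=hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)

    def forward(self, clip):                        # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        x = clip.flatten(0, 1)                      # frames go through the CNN independently
        x = self.attention(self.spatial(x))         # spatial features, reweighted by attention
        x = self.pool(x).flatten(1).view(b, t, -1)  # one 128-d vector per frame
        x, _ = self.lstm(x)                         # temporal context across the clip
        return torch.sigmoid(self.score(x)).squeeze(-1)  # (B, T) anomaly scores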
Key words:  video abnormal behavior detection  spatial dimension  temporal dimension  depthwise separable convolution  attention mechanism
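
The AUC and EER figures quoted in the abstract are frame-level ROC statistics. The short sketch below shows one standard way to compute them from per-frame anomaly scores with scikit-learn; the function name auc_and_eer and the variable names are hypothetical, and the frame-level protocol follows the common UCSD evaluation convention rather than anything spelled out in the abstract.

import numpy as np
from sklearn.metrics import roc_curve, auc


def auc_and_eer(labels, scores):
    # labels: 0/1 per frame (1 = abnormal); scores: predicted anomaly scores.
    fpr, tpr, _ = roc_curve(labels, scores)
    roc_auc = auc(fpr, tpr)
    # The equal error rate is the ROC operating point where the
    # false-positive rate equals the false-negative rate (1 - TPR).
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fpr - fnr))
    eer = (fpr[idx] + fnr[idx]) / 2.0
    return roc_auc, eer

# Usage: roc_auc, eer = auc_and_eer(gt_frame_labels, model_frame_scores)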
