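Abstract:
In recent years, research on multimodal explicit semantic content has advanced considerably, but deep implicit semantic content, such as internet memes, remains comparatively underexplored. Internet memes typically convey complex implied meaning by combining text and images, and may even carry harmful information. Harmful meme detection has therefore become an important direction in the study of deep implicit semantics. Existing methods, however, fall short in two respects. First, current models rely mainly on surface features of text and images and fail to bridge the semantic gap between image-text representations and deep implicit semantics. Second, existing datasets are small, which limits further gains in model performance. To address these problems, this paper re-examines the definition of a meme from a social science perspective and identifies two core properties: 1) as carriers of cultural transmission, internet memes depend heavily on sociocultural knowledge as a bridge to their interpretation; 2) internet memes tend to spread through imitation, so consulting similar samples can effectively aid understanding of their implicit semantics. On this basis, the paper proposes a multimodal deep implicit semantic detection framework driven by sociocultural knowledge and representative samples. The framework comprises three key modules: (1) a multimodal generative sociocultural knowledge search module, which obtains sociocultural knowledge related to the visual objects in a meme; (2) a representative sample retrieval module, which retrieves similar samples highly relevant to the input meme; and (3) an adaptive feature relation weighting module, which suppresses the noise that sociocultural knowledge and representative samples may introduce while strengthening the role of key information. Experimental results show that the proposed framework performs well on the harmful meme detection task, and that effectively combining sociocultural knowledge with representative samples significantly improves the model's interpretability.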
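Keywords: Multi-modality, Harmful Content Detection, Attention Mechanisms, Generative Search, Deep Implicit Semantics, Information Security
DOI:
Received: 2025-02-25; Revised: 2025-03-06
Funding: National Natural Science Foundation of China (No. 62262011), Guangxi Major Special Project (GuikeAA23062001), Guangxi Key Research and Development Program (GuikeAB23026036)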

Multimodal Deep Implicit Semantic Detection Driven by Sociocultural Knowledge and Representative Samples
Wangxiude1, Xiexiaolan1, Wangxiuxian2
(1. School of Computer Science and Engineering, Guilin University of Technology; 2. School of Electrical Automation and Information Engineering, Tianjin University)
Abstract:
In recent years, significant progress has been made in the study of multimodal explicit semantic content; however, research on deep implicit semantic content, such as internet memes, remains relatively limited. Internet memes typically convey complex implicit semantics through the combination of text and images, and may even carry harmful information. Consequently, harmful meme detection has emerged as an important direction for exploring deep implicit semantics. Existing methods, however, have notable limitations in two respects: first, they rely primarily on surface-level features of text and images, failing to bridge the semantic gap between image-text representations and deep implicit semantics; second, the small scale of existing datasets constrains further improvement in model performance. To address these issues, this study revisits the definition of a meme from a social science perspective and identifies two core characteristics: (1) internet memes, as carriers of cultural transmission, rely heavily on sociocultural knowledge as a bridge for interpretation; and (2) the dissemination of internet memes often depends on imitation, so referring to similar samples can effectively assist in understanding their implicit semantics. Based on these characteristics, this paper proposes a multimodal framework for deep implicit semantic detection driven by sociocultural knowledge and representative samples.
The framework consists of three key modules: (1) a multimodal generative sociocultural knowledge search module, which gathers sociocultural knowledge related to the visual objects in the meme; (2) a representative sample retrieval module, which retrieves similar samples that are highly relevant to the input meme; and (3) an adaptive feature relationship weighting module, which suppresses noise that the added sociocultural knowledge and representative samples may introduce while strengthening the contribution of critical information. Experimental results demonstrate that the proposed framework achieves strong performance on the harmful meme detection task and that the effective integration of sociocultural knowledge and representative samples significantly improves the model's interpretability.
Keywords: Multi-modality, Harmful Content Detection, Attention Mechanisms, Generative Search, Deep Implicit Semantics, Information Security
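
The abstract does not say how representative sample retrieval or adaptive feature relationship weighting are implemented. As a rough illustration of the general idea only, the sketch below assumes that memes and knowledge snippets are already encoded as fixed-length vectors, that retrieval is a cosine-similarity top-k lookup, and that weighting is a temperature-scaled softmax over similarity to the input. The function names (`retrieve_top_k`, `weighted_fusion`), the `temperature` parameter, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_top_k(query, bank, k):
    """Return the indices of the k vectors in `bank` most similar to `query`."""
    ranked = sorted(range(len(bank)),
                    key=lambda i: cosine(query, bank[i]),
                    reverse=True)
    return ranked[:k]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_fusion(query, auxiliary, temperature=0.1):
    """Weight each auxiliary vector by softmax(similarity / temperature) and sum them.

    Auxiliary vectors that disagree with the query receive small weights,
    which is one simple (assumed) way to damp noisy knowledge or samples.
    """
    weights = softmax([cosine(query, a) / temperature for a in auxiliary])
    fused = [sum(w * a[d] for w, a in zip(weights, auxiliary))
             for d in range(len(query))]
    return weights, fused


# Toy data: a 3-dimensional "meme" embedding, a small sample bank,
# and two "knowledge" vectors (one relevant, one off-topic).
meme = [1.0, 0.0, 0.5]
bank = [[0.9, 0.1, 0.4], [-1.0, 0.2, 0.0], [0.8, 0.0, 0.6], [0.0, 1.0, 0.0]]
knowledge = [[0.7, 0.1, 0.5], [0.0, 0.9, -0.3]]

neighbours = retrieve_top_k(meme, bank, k=2)
auxiliary = [bank[i] for i in neighbours] + knowledge
weights, fused = weighted_fusion(meme, auxiliary)
```

In this toy setup the off-topic knowledge vector ends up with the smallest weight, which mirrors the noise-suppression role the abstract attributes to the weighting module. A learned attention layer, as the keywords suggest, would replace the fixed softmax used here.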