
Featured Article

Visuals to Text: A Comprehensive Review on Automatic Image Captioning

Abstract:

Image captioning refers to the automatic generation of descriptive text from the visual content of images. It is a technique that integrates multiple disciplines, including computer vision (CV), natural language processing (NLP), and artificial intelligence. In recent years, substantial research effort has been devoted to generating image captions, with impressive progress. To summarize the recent advances, we present a comprehensive review of image captioning, covering both traditional methods and recent deep learning-based techniques. Specifically, we first briefly review early traditional work based on retrieval and templates. We then focus on deep learning-based image captioning research, which we categorize into the encoder-decoder framework, attention mechanisms, and training strategies, according to model structure and training manner, and introduce each in detail. After that, we summarize the publicly available datasets and evaluation metrics, as well as those proposed for specific requirements, and then compare state-of-the-art methods on the MS COCO dataset. Finally, we discuss open challenges and future research directions.

Keywords:

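Chinese Library Classification (CLC) numbers:

[1] Automation Technology, Computer Technology (TP) / Computing Technology, Computer Technology (TP3) / Computer Applications (TP39) / Information Processing (TP391)

[2] Mathematical Sciences and Chemistry (O) / Mathematics (O1) / Geometry, Topology (O18) / Algebraic Geometry (O187) / Algebraic Curves, Algebraic Surfaces (O187.1)

[3] Medicine, Health (R) / Basic Medicine (R3) / Pathology (R36) / Pathological Processes (R364)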

Authors:

Yue Ming; Nannan Hu; Chunxiao Fan; Fan Feng; Jiangwan Zhou; Hui Yu

Author affiliations:

Beijing University of Posts and Telecommunications, Beijing 100876, China; School of Creative Technologies, University of Portsmouth, Portsmouth PO1 2DJ, UK

Source:

自动化学报（英文版） (Acta Automatica Sinica, English Edition)

Citation format:

[1] Yue Ming; Nannan Hu; Chunxiao Fan; Fan Feng; Jiangwan Zhou; Hui Yu. Visuals to Text: A Comprehensive Review on Automatic Image Captioning[J]. 自动化学报（英文版）, 2022(08): 1339-1365

Class A terms:

Visuals,caption

Class B terms:

Text,Comprehensive,Review,Automatic,Image,Captioning,captioning,refers,automatic,generation,descriptive,texts,according,visual,content,images,It,integrating,multiple,disciplines,including,computer,vision,CV,natural,language,processing,NLP,artificial,intelligence,In,recent,years,substantial,efforts,have,been,devoted,generate,impressive,progress,To,summarize,advances,we,present,comprehensive,review,covering,both,traditional,methods,deep,learning,techniques,Specifically,first,briefly,early,works,retrieval,template,Then,researches,focused,which,categorized,into,encoder,decoder,framework,attention,mechanism,training,strategies,basis,model,structures,manners,detailed,introduction,After,that,publicly,available,datasets,evaluation,metrics,those,proposed,specific,requirements,then,compare,state,COCO,Finally,provide,some,discussions,open,challenges,future,directions

AB value:

0.668913

Similar Articles

Backdoor Attacks on Image Classification Models in Deep Neural Networks

ZHANG Quanxin; MA Wencong; WANG Yajie; ZHANG Yaoyuan; SHI Zhiwei; LI Yuanzhang - Beijing Institute of Technology, Beijing 100036, China; China Information Technology Security Evaluation Center, Beijing 100085, China

Machine Learning for Cataract Classification/Grading on Ophthalmic Imaging Modalities: A Survey

Xiao-Qing Zhang; Yan Hu; Zun-Jie Xiao; Jian-Sheng Fang; Risa Higashita; Jiang Liu - Research Institute of Trustworthy Autonomous Systems, Southern University of Science and Technology, Shenzhen 518055, China; Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China; Tomey Corporation, Nagoya 4510051, Japan; Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo 315300, China; Guangdong Provincial Key Laboratory of Brain-inspired Intelligent Computation, Southern University of Science and Technology, Shenzhen 518055, China

Facial-sketch Synthesis: A New Challenge

Deng-Ping Fan; Ziling Huang; Peng Zheng; Hong Liu; Xuebin Qin; Luc Van Gool - Computer Vision Laboratory, ETH Zurich, Zurich 8092, Switzerland; Information and Communication Engineering, University of Tokyo, Tokyo 113-8654, Japan; Computer Vision, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE; Digital Content and Media Sciences Research Division, National Institute of Informatics, Tokyo 101-8430, Japan

Image De-occlusion via Event-enhanced Multi-modal Fusion Hybrid Network

Si-Qi Li; Yue Gao; Qiong-Hai Dai - Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China; Institute for Brain and Cognitive Sciences, Tsinghua University, Beijing 100084, China; Beijing Laboratory of Brain and Cognitive Intelligence, Beijing Municipal Education Commission, Tsinghua University, Beijing 100084, China; Key Laboratory for Information System Security, School of Software, Tsinghua University, Beijing 100084, China; Department of Automation, Tsinghua University, Beijing 100084, China

A Novel Attention-based Global and Local Information Fusion Neural Network for Group Recommendation