Representative paper
Speech-driven facial animation with spectral gathering and temporal attention
Abstract:
In this paper, we present an efficient algorithm that generates lip-synchronized facial animation from a given vocal audio clip. By combining spectral-dimensional bidirectional long short-term memory and a temporal attention mechanism, we design a lightweight speech encoder that learns useful and robust vocal features from the input audio without resorting to pre-trained speech recognition modules or large training data. To learn subject-independent facial motion, we use deformation gradients as the internal representation, which allows nuanced local motions to be synthesized better than with vertex offsets. Compared with state-of-the-art automatic-speech-recognition-based methods, our model is much smaller but achieves similar robustness and quality most of the time, and noticeably better results in certain challenging cases.
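The abstract names a temporal attention mechanism but does not give its formulation. As a rough illustration only, the sketch below shows generic scaled dot-product attention pooling over a sequence of per-frame audio features. It is not the authors' encoder: the function name `temporal_attention`, the inputs `frames` and `query`, and the toy values are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(frames, query):
    """Pool T per-frame feature vectors into one context vector.

    frames: list of T feature vectors (hypothetical encoder outputs).
    query:  vector of the same dimension (hypothetical learned query).
    Returns (weights, context), where weights sum to 1 over time.
    """
    scale = math.sqrt(len(query))
    # One relevance score per time step: scaled dot product with the query.
    scores = [sum(f * q for f, q in zip(frame, query)) / scale
              for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    # Context vector: attention-weighted average of the frames.
    context = [sum(w * frame[d] for w, frame in zip(weights, frames))
               for d in range(dim)]
    return weights, context


# Toy input: three 2-D frames; the last one aligns best with the query.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]
weights, context = temporal_attention(frames, query)
```

In this toy run the third frame receives the largest weight because it has the largest dot product with the query. A real model would learn the query (or a scoring network) jointly with the rest of the encoder.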
Keywords:
Chinese Library Classification (CLC) number:
Authors:
Yujin CHAI; Yanlin WENG; Lvdi WANG; Kun ZHOU
Affiliations:
State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, China; FaceUnity Technology Inc., Hangzhou 310011, China
Source:
Citation:
[1] Yujin CHAI; Yanlin WENG; Lvdi WANG; Kun ZHOU. Speech-driven facial animation with spectral gathering and temporal attention [J]. Frontiers of Computer Science (计算机科学前沿), 2022(03): 149-158