Representative paper
On the learning dynamics of two-layer quadratic neural networks for understanding deep learning
Abstract:
Deep learning is a powerful paradigm in many real-world applications; however, its mechanism remains much of a mystery. To gain insights about nonlinear hierarchical deep networks, we theoretically describe the coupled nonlinear learning dynamics of the two-layer neural network with quadratic activations, extending existing results from the linear case. The quadratic activation, although rarely used in practice, shares convexity with the widely used ReLU activation, thus producing similar dynamics. In this work, we focus on the case of a canonical regression problem under the standard normal distribution and use a coupled dynamical system to mimic the gradient descent method in the sense of a continuous-time limit, then use the high-order moment tensor of the normal distribution to simplify these ordinary differential equations. The simplified system yields unexpected fixed points. The existence of these non-global-optimal stable points leads to the existence of saddle points in the loss surface of the quadratic networks. Our analysis shows there are conserved quantities during the training of the quadratic networks. Such quantities might result in a failed learning process if the network is initialized improperly. Finally, we compare the numerical learning curves with the theoretical ones, which reveals two alternately appearing stages of the learning process.
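The setting described in the abstract can be sketched numerically. The following is a minimal illustration only, not the authors' code: the parametrization f(x) = sum_j a_j (w_j . x)^2, the squared loss, the single-neuron teacher `w_star`, and all hyperparameters are assumptions, since the abstract does not specify them. Under this assumed parametrization, the identity w_j . dL/dw_j = 2 a_j dL/da_j holds, so ||w_j||^2 - 2 a_j^2 is conserved by gradient flow and drifts only at second order in the step size under discrete gradient descent. Whether this matches the conserved quantities analyzed in the paper is not confirmed by the abstract.

```python
import numpy as np

def forward(W, a, X):
    """Assumed model: f(x) = sum_j a_j * (w_j . x)^2 for each row x of X."""
    return ((X @ W.T) ** 2) @ a

def gradients(W, a, X, y):
    """Gradients of the loss 0.5 * mean((f(x) - y)^2) with respect to W and a."""
    H = X @ W.T                      # pre-activations, shape (n, m)
    r = (H ** 2) @ a - y             # residuals, shape (n,)
    n = X.shape[0]
    grad_a = (H ** 2).T @ r / n      # shape (m,)
    grad_W = 2.0 * (H * r[:, None] * a[None, :]).T @ X / n  # shape (m, d)
    return grad_W, grad_a

def invariant(W, a):
    """Per-neuron quantity ||w_j||^2 - 2 a_j^2 (conserved by gradient flow here)."""
    return np.sum(W ** 2, axis=1) - 2.0 * a ** 2

def train(W, a, X, y, lr=0.005, steps=500):
    """Plain gradient descent; returns final weights and the loss history."""
    W, a = W.copy(), a.copy()
    losses = []
    for _ in range(steps):
        losses.append(0.5 * np.mean((forward(W, a, X) - y) ** 2))
        gW, ga = gradients(W, a, X, y)
        W -= lr * gW
        a -= lr * ga
    return W, a, losses

# Hypothetical experiment: standard normal inputs, one-neuron quadratic teacher.
rng = np.random.default_rng(0)
d, m, n = 4, 3, 2000
X = rng.standard_normal((n, d))
w_star = np.ones(d) / np.sqrt(d)     # assumed teacher direction
y = (X @ w_star) ** 2
W0 = 0.5 * rng.standard_normal((m, d))
a0 = 0.5 * rng.standard_normal(m)

W1, a1, losses = train(W0, a0, X, y)
print("loss: first", losses[0], "last", losses[-1])
print("max invariant drift:", np.abs(invariant(W1, a1) - invariant(W0, a0)).max())
```

Comparing `invariant(W0, a0)` with `invariant(W1, a1)` for different initializations is one way to explore how the starting point constrains where training can end up in this assumed setup.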
Authors:
Zhenghao TAN; Songcan CHEN
Affiliations:
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing 211106, China
Citation:
[1] Zhenghao TAN; Songcan CHEN. On the learning dynamics of two-layer quadratic neural networks for understanding deep learning [J]. 计算机科学前沿 (Frontiers of Computer Science), 2022(03): 75-80