
A question and some thoughts on the Kaiming initialization paper

bstjanced555 · 2022-01-18 14:01:55 +08:00 · 1751 clicks

I've recently been reading two initialization papers closely, Xavier and Kaiming. While reading the Kaiming paper I ran into a question; two days after raising it, the answer suddenly came to me one evening while cooking. This post records both the question and my explanation. Anyone interested is welcome to read along, and please point it out if I've gotten something wrong.

Paper links:

1. Xavier initialization: Understanding the difficulty of training deep feedforward neural networks
2. Kaiming initialization: Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Broadly speaking, Kaiming and Xavier initialization follow the same idea: keep the signal stable in both the forward and the backward pass so that it neither blows up nor shrinks. Both derive the weight scale by requiring the variance to stay roughly constant from layer to layer in the forward and backward passes; they differ only in the activation function they assume.
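As a quick sanity check of that variance argument, here is a minimal NumPy sketch (depth, widths, and the "bad" scale are made up, not from either paper): with Kaiming-scaled weights the activation scale stays roughly constant across many ReLU layers, while an arbitrary small scale makes it vanish:

    import numpy as np

    rng = np.random.default_rng(0)
    fan, depth = 512, 30
    x = rng.standard_normal((1000, fan))

    for name, std in [("kaiming sqrt(2/fan)", np.sqrt(2.0 / fan)),
                      ("too small 0.01", 0.01)]:
        h = x
        for _ in range(depth):
            W = rng.standard_normal((fan, fan)) * std
            h = np.maximum(h @ W, 0.0)   # ReLU keeps the positive half
        print(name, "-> std after", depth, "layers:", h.std())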

Xavier initialization was proposed in 2010 and assumes symmetric activations that are roughly linear around zero; the paper's experiments use tanh. Kaiming initialization was proposed in 2015 for the Rectified Linear Unit (ReLU) family of activations, and on top of ReLU the paper also introduces PReLU, whose negative slope is a parameter the network learns for itself.
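Concretely, the two rules differ only in the scaling constant. A minimal sketch of the standard fan-in/fan-out forms (the function names here are mine, not from the papers):

    import numpy as np

    def xavier_std(fan_in, fan_out):
        # Glorot & Bengio 2010: balances forward and backward variance
        # for tanh-like activations that are ~linear around zero.
        return np.sqrt(2.0 / (fan_in + fan_out))

    def kaiming_std(fan_in):
        # He et al. 2015: the factor of 2 compensates for ReLU keeping,
        # on average, only half of each pre-activation's second moment.
        return np.sqrt(2.0 / fan_in)

    W_xavier = np.random.randn(512, 512) * xavier_std(512, 512)
    W_kaiming = np.random.randn(512, 512) * kaiming_std(512)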

That covers the background on the two papers. My question concerns the Backward Propagation Case on the right side of page 4 of the Kaiming paper, i.e. the backward-pass derivation of Kaiming initialization. It contains the sentence: "In back-propagation we also have Δy_l = f'(y_l) Δx_{l+1}, where f' is the derivative of f. For the ReLU case, f'(y_l) is zero or one, and their probabilities are equal." My question is about the last clause: why are their probabilities equal?
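For reference, this is the step where that assumption enters the derivation (my own restatement of the paper's argument, not a quote): if f'(y_l) equals 0 or 1 with equal probability and is independent of the zero-mean gradient Δx_{l+1}, then

    E[(Δy_l)²] = E[f'(y_l)²] · E[(Δx_{l+1})²] = (1/2) · Var[Δx_{l+1}],

and this factor of 1/2 is exactly what the factor of 2 in the Kaiming weight variance is chosen to cancel in the backward pass.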

The last paragraph of Section 2.1 of the paper points out two interesting phenomena:

Table 1 also shows the learned coefficients of PReLUs for each layer. There are two interesting phenomena in Table 1. First, the first conv layer (conv1) has coefficients (0.681 and 0.596) significantly greater than 0. As the filters of conv1 are mostly Gabor-like filters such as edge or texture detectors, the learned results show that both positive and negative responses of the filters are respected. We believe that this is a more economical way of exploiting low-level information, given the limited number of filters (e.g., 64). Second, for the channel-wise version, the deeper conv layers in general have smaller coefficients. This implies that the activations gradually become "more nonlinear" at increasing depths. In other words, the learned model tends to keep more information in earlier stages and becomes more discriminative in deeper stages.

My reading of this passage: when a deep convolutional network uses PReLU, the authors find that the early layers learn a fairly large negative slope and that the slope shrinks with depth, i.e. the network becomes more nonlinear the deeper you go. Early layers keep more information, with features still mixed together; deeper layers become more discriminative, and the features are gradually distilled. Can this be read as saying that the features extracted in the deeper layers become sparse? If so, the data arriving at a rectifier unit would not split fifty-fifty around 0, which brings me back to my question above: their probabilities should not be equal.

At the time I had nobody to discuss these jumbled thoughts with, so I set the question aside. Two evenings after raising it, though, while stir-frying pork with bamboo shoots, it suddenly clicked. My take: the two interesting phenomena in the Kaiming paper are conclusions drawn from Table 1, and Table 1 describes the final state after training has finished. Initialization only needs to consider the initial state, and at initialization the parameters are generally drawn from a distribution symmetric about zero, so the pre-activations naturally split fifty-fifty. Whatever the split becomes later on is left for the network to adjust on its own.
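A quick numerical check of that explanation (a minimal sketch with made-up sizes, not from the paper): with zero-mean, symmetric weights, a layer's pre-activations are symmetric about 0, so f'(y) = 1 for roughly half of them, even when the layer's inputs are one-sided and sparse (e.g. already ReLU-ed):

    import numpy as np

    rng = np.random.default_rng(0)
    fan_in, fan_out, n = 256, 256, 10000

    # Deliberately non-symmetric, sparse inputs (post-ReLU-like).
    x = np.maximum(rng.standard_normal((n, fan_in)), 0.0)

    # Symmetric zero-mean weights at Kaiming scale; bias initialized to 0.
    W = rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)
    y = x @ W

    print("P(f'(y) = 1) ≈", (y > 0).mean())   # ~0.5 at initialization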

No replies yet.