
[What each Transformer function does, part 2: EncoderLayer] 2021-04-08


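1. The input dimension of the w_qs, w_ks and w_vs projections is the model dimension (256), that is, the dimension produced by the linear transformation in the previous layer. The output dimension is d_k (= d_q) multiplied by the number of heads.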

Here the output dimension is 1024:

64 * 16 = 1024

(layer_stack): ModuleList(
  (0): EncoderLayer(
    (slf_attn): MultiHeadAttention(
      (w_qs): Linear(in_features=256, out_features=1024, bias=True)
      (w_ks): Linear(in_features=256, out_features=1024, bias=True)
      (w_vs): Linear(in_features=256, out_features=1024, bias=True)
      (attention): ScaledDotProductAttention(
        (dropout): Dropout(p=0.1, inplace=False)
        (softmax): Softmax(dim=2)
      )
      (layer_norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (fc): Linear(in_features=1024, out_features=256, bias=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
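
The shapes in this sub-layer can be traced with a minimal sketch. It assumes the 64*16 above means d_k = 64 with 16 heads, and that the value projection uses the same per-head size. The class name `MultiHeadSelfAttentionSketch` is hypothetical, masking and the attention dropout are left out, and heads are kept as a separate tensor dimension, so this is not necessarily how the printed module is implemented.

```python
import torch
import torch.nn as nn


class MultiHeadSelfAttentionSketch(nn.Module):
    """Hypothetical re-creation of the printed slf_attn block (no masking)."""

    def __init__(self, d_model=256, n_head=16, d_k=64, dropout=0.1):
        super().__init__()
        self.n_head, self.d_k = n_head, d_k
        # 256 -> 16 * 64 = 1024, matching w_qs / w_ks / w_vs in the printout
        self.w_qs = nn.Linear(d_model, n_head * d_k)
        self.w_ks = nn.Linear(d_model, n_head * d_k)
        self.w_vs = nn.Linear(d_model, n_head * d_k)
        # 1024 -> 256, matching fc in the printout
        self.fc = nn.Linear(n_head * d_k, d_model)
        self.dropout = nn.Dropout(dropout)
        self.layer_norm = nn.LayerNorm(d_model)

    def forward(self, x):
        batch, length, _ = x.shape
        residual = x

        def split_heads(proj):
            # (batch, length, 1024) -> (batch, n_head, length, d_k)
            return proj(x).view(batch, length, self.n_head, self.d_k).transpose(1, 2)

        q, k, v = split_heads(self.w_qs), split_heads(self.w_ks), split_heads(self.w_vs)
        # Scaled dot-product attention, computed per head
        scores = torch.matmul(q, k.transpose(-2, -1)) / self.d_k ** 0.5
        attn = torch.softmax(scores, dim=-1)       # (batch, n_head, length, length)
        context = torch.matmul(attn, v)            # (batch, n_head, length, d_k)
        # Merge heads back: (batch, length, n_head * d_k)
        context = context.transpose(1, 2).contiguous().view(batch, length, -1)
        out = self.dropout(self.fc(context))
        return self.layer_norm(out + residual), attn


mha = MultiHeadSelfAttentionSketch().eval()
x = torch.randn(2, 10, 256)                        # (batch, length, d_model)
out, attn = mha(x)
print(out.shape, attn.shape)
```

The output keeps the input shape (2, 10, 256), while the attention weights have one (10, 10) map per head.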

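2. 1280 is d_inner. After attention is computed, a fully connected layer (fc) projects the result back to the model dimension of 256.

A feed-forward pass is then applied to the model dimension (the intermediate dimension is again a value you choose yourself). Judging from the function, it appears to use a residual connection: the residual is added to the final output.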

    (pos_ffn): PositionwiseFeedForward(
      (w_1): Linear(in_features=256, out_features=1280, bias=True)
      (w_2): Linear(in_features=1280, out_features=256, bias=True)
      (dropout): Dropout(p=0.1, inplace=False)
      (layer_norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
    )
  )
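
The feed-forward block described in point 2 could look roughly like the sketch below. The layer sizes come from the printout. The ReLU activation and the exact order of dropout, residual addition and layer normalization are assumptions, since the printout shows neither, and the class name `PositionwiseFeedForwardSketch` is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PositionwiseFeedForwardSketch(nn.Module):
    """Hypothetical re-creation of the printed pos_ffn block."""

    def __init__(self, d_model=256, d_inner=1280, dropout=0.1):
        super().__init__()
        self.w_1 = nn.Linear(d_model, d_inner)     # 256 -> 1280
        self.w_2 = nn.Linear(d_inner, d_model)     # 1280 -> 256
        self.dropout = nn.Dropout(dropout)
        self.layer_norm = nn.LayerNorm(d_model)

    def forward(self, x):
        residual = x
        hidden = F.relu(self.w_1(x))               # activation is an assumption
        out = self.dropout(self.w_2(hidden))
        # Residual connection: the block's input is added back before normalizing
        return self.layer_norm(out + residual)


ffn = PositionwiseFeedForwardSketch().eval()
x = torch.randn(2, 10, 256)                        # (batch, length, d_model)
y = ffn(x)
n_params = sum(p.numel() for p in ffn.parameters())
print(y.shape, n_params)
```

Because the same two linear layers are applied at every position, the sequence length never appears in the parameter count: (256 * 1280 + 1280) + (1280 * 256 + 256) + 2 * 256 = 657408.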

3. The layer returns enc_output and enc_slf_attn.
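
To make the return values concrete, here is a rough sketch of how an encoder layer can pass its input through both sub-layers and how a `layer_stack` can be iterated. `TinySelfAttention` and `EncoderLayerSketch` are hypothetical single-head stand-ins written for this note, and the stack depth of 2 is arbitrary. Only the names `slf_attn`, `pos_ffn`, `layer_stack`, `enc_output` and `enc_slf_attn` come from the model above.

```python
import torch
import torch.nn as nn


class TinySelfAttention(nn.Module):
    """Hypothetical single-head stand-in for slf_attn: returns (output, weights)."""

    def __init__(self, d_model=256):
        super().__init__()
        self.scale = d_model ** 0.5

    def forward(self, q, k, v, mask=None):
        scores = torch.matmul(q, k.transpose(-2, -1)) / self.scale
        if mask is not None:
            # mask is a boolean tensor; True marks positions to hide
            scores = scores.masked_fill(mask, float("-inf"))
        attn = torch.softmax(scores, dim=-1)       # (batch, length, length)
        return torch.matmul(attn, v), attn


class EncoderLayerSketch(nn.Module):
    """Self-attention followed by a position-wise feed-forward block."""

    def __init__(self, d_model=256, d_inner=1280):
        super().__init__()
        self.slf_attn = TinySelfAttention(d_model)
        self.pos_ffn = nn.Sequential(
            nn.Linear(d_model, d_inner), nn.ReLU(), nn.Linear(d_inner, d_model)
        )

    def forward(self, enc_input, slf_attn_mask=None):
        # Query, key and value are all the same tensor in self-attention
        enc_output, enc_slf_attn = self.slf_attn(
            enc_input, enc_input, enc_input, mask=slf_attn_mask
        )
        enc_output = self.pos_ffn(enc_output)
        return enc_output, enc_slf_attn


layer_stack = nn.ModuleList([EncoderLayerSketch() for _ in range(2)])
enc_output = torch.randn(2, 10, 256)               # (batch, length, d_model)
attn_per_layer = []
with torch.no_grad():
    for enc_layer in layer_stack:
        # Each layer consumes the previous layer's enc_output
        enc_output, enc_slf_attn = enc_layer(enc_output)
        attn_per_layer.append(enc_slf_attn)
print(enc_output.shape, len(attn_per_layer), attn_per_layer[0].shape)
```

Returning the attention weights next to the output makes it possible to collect one attention map per layer for inspection without changing the forward pass.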
