An Actor-Critic Algorithm for Sequence Prediction

Recurrent neural networks


RNNs for sequence prediction

In our models, the sequence of vectors is produced by either a bidirectional RNN (Schuster and Paliwal, 1997) or a convolutional encoder (Rush et al., 2015).
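As a concrete illustration, here is a minimal bidirectional-RNN encoder in PyTorch that produces one context vector per source position. This is a sketch only: the class name, layer sizes, and the choice of GRU are assumptions of mine, not details taken from the paper.

```python
import torch
import torch.nn as nn

class BiRNNEncoder(nn.Module):
    """Minimal bidirectional-GRU encoder sketch: maps a batch of source
    token ids to a sequence of context vectors that the attention-based
    decoder (the actor) can condition on."""
    def __init__(self, vocab_size=10000, emb_dim=256, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim,
                          bidirectional=True, batch_first=True)

    def forward(self, src_tokens):      # (batch, src_len) token ids
        emb = self.embed(src_tokens)    # (batch, src_len, emb_dim)
        outputs, _ = self.rnn(emb)      # (batch, src_len, 2 * hidden_dim)
        return outputs                  # one vector per source position
```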








3 Actor-Critic for Sequence Prediction

We note that this way of re-writing the gradient of the expected reward is known in RL under the names policy gradient theorem (Sutton et al., 1999) and stochastic actor-critic (Sutton, 1984).
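For reference, the identity in question is the usual policy-gradient decomposition, written here in sequence-prediction notation (a sketch of the standard form, with $\hat{Y}_{1\dots t-1}$ the generated prefix, $\mathcal{A}$ the output vocabulary, and $Q$ the value of emitting token $a$ after that prefix):

$$
\frac{dV}{d\theta}
  = \mathbb{E}_{\hat{Y} \sim p(\hat{Y})}
    \sum_{t=1}^{T} \sum_{a \in \mathcal{A}}
    \frac{d\, p\big(a \mid \hat{Y}_{1\dots t-1}\big)}{d\theta}\,
    Q\big(a;\, \hat{Y}_{1\dots t-1}\big)
$$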




Training the critic


Applying deep RL techniques

Attempts to remove the target network by propagating the gradient through $q_t$ resulted in a lower square error $\big(\hat{Q}(\hat{y}_t;\, \hat{Y}_{1\dots T}) - q_t\big)^2$, but the resulting $\hat{Q}$ values proved very unreliable as training signals for the actor.
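The following sketch shows the detail that matters here: the bootstrapped target $q_t$ comes from a separate target critic and is excluded from the gradient, so the trained critic is only regressed towards it. The function name, tensor shapes, and exact form of the target are assumptions in the spirit of the paper, not its actual code.

```python
import torch

def critic_regression_loss(q_pred_t, reward_t, next_token_probs, q_target_next):
    """One-step critic loss sketch.
    q_pred_t        : scalar, trained critic's value for the sampled token y_t
    reward_t        : scalar, immediate (shaped) task reward r_t
    next_token_probs: (V,) actor probabilities p(a | prefix, y_t)
    q_target_next   : (V,) *target* critic values for the next step
    """
    # Bootstrapped target: expected next-step value under the actor's policy,
    # computed with the target critic and detached from the graph.
    with torch.no_grad():
        q_t = reward_t + (next_token_probs * q_target_next).sum()
    # Squared error from the text above; gradients reach only the trained
    # critic through q_pred_t, never through q_t.
    return (q_pred_t - q_t) ** 2
```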

Sampling (page 5 of the paper)
To compensate for this, we sample predictions from a delayed actor, whose weights are slowly updated to follow the actor that is actually trained. This is inspired by Lillicrap et al. (2015), where a delayed actor is used for a similar purpose.
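Both the delayed actor and the target critic can be maintained with the soft ("Polyak") update from DDPG (Lillicrap et al., 2015). A minimal PyTorch sketch; the mixing rate tau=0.001 follows the DDPG paper, and its use here is an assumption:

```python
import torch

@torch.no_grad()
def soft_update(delayed_net, trained_net, tau=0.001):
    """theta_delayed <- tau * theta_trained + (1 - tau) * theta_delayed,
    applied parameter-by-parameter so the delayed copy slowly tracks
    the network that is actually being trained."""
    for p_delayed, p in zip(delayed_net.parameters(), trained_net.parameters()):
        p_delayed.mul_(1.0 - tau).add_(tau * p)
```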

For an explanation of the target critic network, see:

Continuous Control with Deep Reinforcement Learning (Lillicrap et al., 2015), arXiv:1509.02971