Reading notes on "A C-LSTM Neural Network for Text Classification"

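The paper combines a CNN and an LSTM to learn sentence representations, and reports strong results on both sentiment classification and question classification.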

Traditional sentence modeling uses the bag-of-words model, which often suffers from the curse of dimensionality.

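Earlier approaches fall into two groups. One uses the bag-of-words model, which suffers from the curse of dimensionality. The other uses composition, for example applying algebraic operations to semantic word vectors to produce a semantic sentence vector. Both lose word-order information. More recent approaches divide into sequence-based models and tree-structured models.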

CNN is able to learn local response from temporal or spatial data but lacks the ability of learning sequential correlations; on the other hand, RNN is specialized for sequential modelling but unable to extract features in a parallel way.

Architecture diagram:

1. N-gram Feature Extraction through Convolution

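A sentence is an L x d matrix and a filter is k x d, so each feature map has length L-k+1. At every position in the sentence, the filter is applied to a window of k consecutive words; for position j the window is w_j = [x_j, x_{j+1}, …, x_{j+k-1}].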

Here m denotes the filter.

There are n filters, all of the same length.

Stacking the n feature maps gives a feature matrix of size (L-k+1) x n.
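The shapes above can be checked with a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's code: the function name `ngram_feature_maps`, the random inputs, and the choice of ReLU as the nonlinearity are all assumptions made here.

```python
import numpy as np

def ngram_feature_maps(x, filters, bias):
    """Slide n filters of size k x d over an L x d sentence matrix.

    x:       (L, d) word vectors of one sentence
    filters: (n, k, d) one k x d filter per feature map
    bias:    (n,) one bias per filter
    Returns an (L-k+1, n) feature matrix.
    """
    L, d = x.shape
    n, k, _ = filters.shape
    out = np.empty((L - k + 1, n))
    for j in range(L - k + 1):
        window = x[j:j + k]  # w_j = [x_j, ..., x_{j+k-1}], shape (k, d)
        # Elementwise product of each filter with the window, summed over k and d.
        out[j] = np.tensordot(filters, window, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)  # ReLU, assumed here as the nonlinearity

rng = np.random.default_rng(0)
L, d, k, n = 7, 5, 3, 4
x = rng.normal(size=(L, d))
filters = rng.normal(size=(n, k, d))
bias = np.zeros(n)

features = ngram_feature_maps(x, filters, bias)
print(features.shape)  # (5, 4), i.e. (L-k+1, n)
```

Each row of `features` holds the n filter responses for one window position, so the rows keep the order of the sentence and can be consumed as a sequence.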

2. Text classification

Cross-entropy is used as the loss function.
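As a rough sketch of what a cross-entropy loss computes for classification, the NumPy snippet below turns class scores into probabilities with a softmax and averages the negative log-probability of the true class. The softmax step, the batch averaging, and the helper names `softmax` and `cross_entropy` are assumptions for illustration; the notes only state that cross-entropy is the loss.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    """Mean negative log-likelihood of the true class.

    logits: (batch, num_classes) unnormalized scores
    labels: (batch,) integer class indices
    """
    probs = softmax(logits)
    picked = probs[np.arange(len(labels)), labels]  # probability of the true class
    return float(-np.log(picked).mean())

logits = np.array([[2.0, 0.5, -1.0],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 2])
loss = cross_entropy(logits, labels)
print(loss)
```

With uniform scores over three classes, the loss for that example is log(3), which is a convenient sanity check.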

3. Padding

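maxlen is the length of the longest sentence in the training set. Because the convolutional layer requires fixed-length input, every sentence is padded at the end to length maxlen. For sentences in the test set, those shorter than maxlen are padded, and those longer than maxlen are truncated at the end to length maxlen.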
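The padding rule can be sketched in a few lines of Python. The helper name `pad_or_truncate`, the `<pad>` token, and the toy sentences are hypothetical and chosen only for illustration.

```python
def pad_or_truncate(tokens, maxlen, pad_token="<pad>"):
    """Return a list of exactly maxlen tokens.

    Shorter sentences are padded at the end; longer ones lose their trailing tokens.
    """
    if len(tokens) >= maxlen:
        return tokens[:maxlen]
    return tokens + [pad_token] * (maxlen - len(tokens))

train = [["the", "movie", "was", "great"], ["not", "bad"]]
test = [["what", "a", "long", "and", "boring", "film"], ["fine"]]

# maxlen comes from the training set only.
maxlen = max(len(sentence) for sentence in train)

train_fixed = [pad_or_truncate(s, maxlen) for s in train]
test_fixed = [pad_or_truncate(s, maxlen) for s in test]
print(test_fixed)
```

Computing maxlen from the training set alone means a test sentence can exceed it, which is why the truncation branch is needed.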

4. Experiments

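Filter lengths of 2, 3, and 4 are tried, in two configurations: a single convolutional layer with one filter length, and multiple convolutional layers with different filter lengths applied in parallel.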
