Deep learning (DL) concepts
pooling
addition (sum) pooling, bi-interaction pooling, and global average pooling
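A minimal NumPy sketch of the three pooling operations. It assumes the input to addition and bi-interaction pooling is an (n, k) array of n embedding vectors, and the input to global average pooling is a (C, H, W) feature map. The function names are hypothetical. Bi-interaction pooling follows the form used in Neural Factorization Machines: the element-wise sum of v_i * v_j over all pairs i < j, computed in O(n*k) with the identity 0.5 * ((sum_i v_i)^2 - sum_i v_i^2).

```python
import numpy as np

def sum_pooling(v: np.ndarray) -> np.ndarray:
    """Addition pooling: element-wise sum of the n embeddings (rows of v)."""
    return v.sum(axis=0)

def bi_interaction_pooling(v: np.ndarray) -> np.ndarray:
    """Bi-interaction pooling: sum over pairs i < j of v_i * v_j (element-wise).
    Uses 0.5 * (square of sum - sum of squares) to avoid the O(n^2) pair loop."""
    s = v.sum(axis=0)
    return 0.5 * (s * s - (v * v).sum(axis=0))

def global_average_pooling(x: np.ndarray) -> np.ndarray:
    """Global average pooling: mean over the spatial axes of a (C, H, W) map,
    giving one value per channel."""
    return x.mean(axis=(1, 2))

# Usage: three 2-dimensional embeddings.
v = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(sum_pooling(v))             # [ 9. 12.]
print(bi_interaction_pooling(v))  # [23. 44.]
```

Sum pooling ignores interactions between features, while bi-interaction pooling captures second-order (pairwise) interactions but still returns a single k-dimensional vector.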
loss function
CrossEntropy
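Quantifies the difference between two probability distributions, a target p and a prediction q: H(p, q) = -Σ_x p(x) log q(x). For a fixed p it is smallest when q = p.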
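A minimal NumPy sketch, assuming natural logarithms (result in nats). The function names are hypothetical. `cross_entropy` implements H(p, q) directly; `softmax_cross_entropy` is the usual classification case, where p is one-hot at the true label and q = softmax(logits), computed via log-sum-exp for numerical stability.

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_x p(x) * log q(x). q is clipped to [eps, 1] to avoid log(0)."""
    p = np.asarray(p, dtype=float)
    q = np.clip(np.asarray(q, dtype=float), eps, 1.0)
    return float(-(p * np.log(q)).sum())

def softmax_cross_entropy(logits, label):
    """Cross-entropy between a one-hot target at `label` and softmax(logits).
    Subtracting the max logit keeps exp() from overflowing."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[label])

# A confident correct prediction costs less than a confident wrong one.
print(cross_entropy([1.0, 0.0], [0.9, 0.1]))  # about 0.105
print(cross_entropy([1.0, 0.0], [0.1, 0.9]))  # about 2.303
```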
optimizer
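warmup: start training with a small learning rate and increase it gradually to the target value over the first steps; this is commonly used to stabilize early training.
Read: https://www.zhihu.com/question/338066667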
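A minimal sketch of one common warmup schedule: linear warmup to `base_lr` over `warmup_steps`, then linear decay to 0 at `total_steps`. The decay phase is an assumption for illustration (constant, cosine, or inverse-square-root decay are also used after warmup), and the function name and parameters are hypothetical.

```python
def warmup_linear_decay_lr(step: int, base_lr: float,
                           warmup_steps: int, total_steps: int) -> float:
    """Learning rate at a 0-based `step`.
    Warmup: rises linearly from base_lr / warmup_steps to base_lr.
    Afterwards: falls linearly to 0 at total_steps (and stays at 0)."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    remaining = max(total_steps - step, 0)
    return base_lr * remaining / max(total_steps - warmup_steps, 1)

# Usage: 4 warmup steps out of 10 total.
print([round(warmup_linear_decay_lr(s, 1.0, 4, 10), 3) for s in range(11)])
```

In a training loop the returned value would be written into the optimizer's learning rate before each update step.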
metrics
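perplexity (PPL): the exponential of the average per-token negative log-likelihood, i.e. exp of the cross-entropy; lower is better.
Refer to: https://blog.csdn.net/index20001/article/details/78884646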
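A minimal sketch of the PPL computation, assuming the model's per-token log-probabilities log p(w_i | w_<i) are already available as natural logs; the function name is hypothetical. PPL = exp(-(1/N) * Σ_i log p(w_i | w_<i)).

```python
import math

def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood over N tokens."""
    n = len(token_log_probs)
    if n == 0:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / n)

# A model that is uniform over a 4-word vocabulary has PPL 4.
print(perplexity([math.log(0.25)] * 5))
```

Intuitively, a PPL of k means the model is, on average, as uncertain as if it were choosing uniformly among k tokens.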
train
train from scratch
training without pre-training, i.e. starting from randomly initialized weights rather than a pre-trained checkpoint
Notes on convergence: https://www.zhihu.com/question/64966457