This was an amazing experience, but today I pulled the plug.
Continuing from the previous experiment on eliminating uncommon words: adjusting the dropout rate does have an effect, but the improvement is not significant enough to matter.
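For reference, here is a minimal sketch of inverted dropout in plain Python. The post doesn't show the actual model code, so the function name and rate are assumptions; the point is just what "adjusting the dropout" changes.

```python
import random

def dropout(values, rate, training=True, rng=None):
    """Inverted dropout: zero each value with probability `rate` and
    scale survivors by 1/(1 - rate) so the expected sum is unchanged."""
    if not training or rate == 0.0:
        return list(values)
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

activations = [0.5, 1.0, 1.5, 2.0]
dropped = dropout(activations, rate=0.5, rng=random.Random(0))
```

At inference time (`training=False`) the values pass through unchanged, which is why the survivors are scaled up during training.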
Eliminating sentences containing uncommon words (i.e. words that appear only once in the corpus) from the dataset does not improve training accuracy, for reasons unknown.
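The filter described above can be sketched in a few lines; this is my reconstruction of the idea (whitespace tokenization is an assumption), not the author's actual preprocessing code.

```python
from collections import Counter

def drop_rare_word_sentences(sentences):
    """Remove sentences containing any word that appears only once
    in the whole corpus."""
    counts = Counter(w for s in sentences for w in s.split())
    return [s for s in sentences if all(counts[w] > 1 for w in s.split())]

corpus = ["hello world", "hello there world", "rare zebra here"]
kept = drop_rare_word_sentences(corpus)
```

Note that a sentence is dropped if even one of its words is rare, so this filter can discard a surprisingly large share of the data.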
Depending on the situation, using smaller validation and test sets leaves more data for training and reduces the time spent computing accuracy, without materially changing the accuracy estimate itself.
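A minimal split helper along these lines, assuming 5% validation and 5% test (the post doesn't give the actual fractions):

```python
import random

def split(data, val_frac=0.05, test_frac=0.05, seed=0):
    """Shuffle and split, keeping validation/test small so that most
    examples stay in the training set."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_val = int(len(items) * val_frac)
    n_test = int(len(items) * test_frac)
    return items[n_val + n_test:], items[:n_val], items[n_val:n_val + n_test]

train, val, test = split(range(1000))
```

With 1000 examples this leaves 900 for training versus 800 with a more conventional 10%/10% split.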
Remove non-English text from the dataset.
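The post doesn't say how the non-English text was detected; one crude heuristic is an ASCII-character ratio, sketched below. The 0.9 threshold is an assumption, and a real language-identification library would do better.

```python
def looks_english(text, threshold=0.9):
    """Crude heuristic: treat text as English if at least `threshold`
    of its characters are ASCII."""
    if not text:
        return False
    ascii_chars = sum(1 for c in text if ord(c) < 128)
    return ascii_chars / len(text) >= threshold

samples = ["hello world", "こんにちは世界", "mixed テキスト text"]
english = [s for s in samples if looks_english(s)]
```

This keeps only "hello world" from the samples above; mixed-script sentences fall below the threshold.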
Some batch sizes are too big and some are too small. It’s important to find the one that is just right.
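Finding the "just right" batch size usually comes down to a sweep. The skeleton below is my own sketch, not the author's method; `train_step_fn` is a hypothetical callback that trains briefly at a given batch size and returns a validation score.

```python
def sweep_batch_sizes(train_step_fn, sizes=(16, 32, 64, 128)):
    """Try each candidate batch size and keep the one with the best
    validation score returned by `train_step_fn`."""
    best_size, best_score = None, float("-inf")
    for size in sizes:
        score = train_step_fn(size)
        if score > best_score:
            best_size, best_score = size, score
    return best_size

# Toy stand-in objective that peaks at 64, just to exercise the sweep:
best = sweep_batch_sizes(lambda s: -abs(s - 64))
```

In practice each trial would be a short real training run, so the candidate list is kept small.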
Disappointing that a fancy, modern initialization technique not only didn't help, but made things worse.
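The note doesn't name the initializer that was tried. As one example of such a technique, here is Glorot/Xavier uniform initialization in plain Python, which draws weights from a range sized by the layer's fan-in and fan-out:

```python
import math
import random

def glorot_uniform(fan_in, fan_out, rng):
    """Glorot/Xavier uniform init: sample weights uniformly from
    [-bound, bound] where bound = sqrt(6 / (fan_in + fan_out))."""
    bound = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-bound, bound) for _ in range(fan_out)]
            for _ in range(fan_in)]

w = glorot_uniform(256, 128, random.Random(0))
```

The scaling aims to keep activation variance roughly constant across layers, though as the note observes, a theoretically motivated initializer is no guarantee of better results on a particular task.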