Using Deep Learning to correct spelling mistakes
In January 2017 I began the Udacity Deep Learning Foundation Nanodegree Program and was hooked from the first lecture. I’d heard the term ‘neural network’ plenty of times before and had a general idea of what such networks could accomplish, but I never had a detailed understanding of how they ‘work.’ Since completing the course I haven’t had much opportunity to tinker with the technology, but I’ve continued to contemplate its uses, particularly in information retrieval, the domain I’ve focused on for the last decade.
Unless you’re Google, the typical technique for correcting spelling mistakes is the Levenshtein distance, or its close cousin, the Damerau–Levenshtein distance. Mr. Weiss does a good job of explaining why these do not work particularly well.
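For readers who have not met these metrics, here is a minimal Python sketch of both. The function names `levenshtein` and `osa_distance` are my own, and the second function implements the restricted "optimal string alignment" variant of Damerau–Levenshtein (each substring may be edited at most once), which is an assumption about which variant a given spell checker uses.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def osa_distance(a: str, b: str) -> int:
    """Levenshtein plus transposition of two adjacent characters (restricted variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Adjacent transposition, e.g. "teh" -> "the"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

The swapped letters in "teh" cost two edits under plain Levenshtein but only one once transpositions are allowed, which is why the Damerau variant is often preferred for typing errors. Neither metric knows anything about the surrounding words.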
- Re-implement Mr. Weiss’ Recurrent Neural Network (RNN) using TensorFlow and achieve the same level of accuracy. (He claims 90% after 12 hours of training and 95.5% after 3.5 days of training.)
- Attempt to implement some of the areas for exploration he suggests, as well as others, to see if further improvements can be obtained.
The beginning of the code
The first part of the code, which involves downloading the billion word dataset that Google released and then setting it up for training, is predominantly lifted from Mr. Weiss. The second part of the code, which involves building the graph and training the neural network, is borrowed from the Udacity sequence-to-sequence RNN example. This Udacity example works on a small dataset of 10,000 ‘sentences’ (1- to 8-character words) and trains the network to sort the characters alphabetically. The current project contains code to handle both the large dataset and the small one; the latter is useful for debugging.
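To give a feel for the small debugging dataset, here is a rough sketch of how such source/target pairs could be generated. This is not the Udacity code: the function name `make_sorting_dataset`, the lowercase-only alphabet, and the random generation are all assumptions made for illustration. Only the size (10,000 examples) and the word length (1 to 8 characters) come from the description above.

```python
import random
import string


def make_sorting_dataset(num_examples=10_000, min_len=1, max_len=8, seed=0):
    """Build (source, target) pairs where target is source with its characters sorted."""
    rng = random.Random(seed)  # fixed seed so debugging runs are repeatable
    sources, targets = [], []
    for _ in range(num_examples):
        length = rng.randint(min_len, max_len)  # inclusive on both ends
        # Assumption: words are drawn uniformly from lowercase ASCII letters
        word = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        sources.append(word)
        targets.append("".join(sorted(word)))
    return sources, targets


sources, targets = make_sorting_dataset()
```

Because the correct output for every input is known exactly, a network that fails to learn this task points to a bug in the graph or training loop rather than to noisy data.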