When I began typing the title of this article, “text generation using recurrent n…”, the tool I was typing in, Google Docs, automatically began completing my sentence. In this case, it accurately suggested “recurrent neural networks”! If you’ve ever used Gmail compose or even Google Search, this won’t surprise you: predictive text has been in vogue for a long time and has widespread use across industries.
In this blog post, Springboard machine learning mentor Raghav Bali shares a deep learning project idea and walks you through creating your own predictive text generation model with recurrent neural networks and transformers. Raghav is a senior data scientist at UnitedHealth Group, where he designs and implements machine learning, AI, and deep learning-based solutions for healthcare and insurance. Before UnitedHealth Group, he built enterprise-level intelligent solutions at American Express and Intel.
For further reading, check out this data scientist job description and learn more about data science.
Understanding Language to Build Recurrent Neural Networks
For humans, language is an integral part of existence. We use language to communicate thoughts and ideas every single day, and we instinctively understand what it is. Before getting machines to understand language, let’s first define it in abstract terms: a language is a collection of alphabets, used in specific settings/contexts, to create words/vocabulary, following a set of rules/grammar.
It takes us humans many years of learning to communicate in a language. This is simply because languages are complex and constantly evolving. Rules of spelling and grammar aren’t standard, there are multiple ways to communicate the same thing, and the same word might mean different things in different contexts. For example, take the sentence, “The bowler made the batsman duck.” What does this mean? Does it mean that the bowler made the batsman bend to avoid being hit by the ball? Or that the bowler got the batsman out for zero? Or that he turned him into a quacking duck?
Let’s see another example. “The stolen painting was found by the tree.” Did the tree find the painting? Or was the painting left near the tree?
As people, we understand communication through a complex process that takes into account several tangible and intangible aspects. Sarcasm, lingo, hashtags, and so on are processed easily by the human brain but can be difficult for machines to replicate. Still, we have to try. Today, let’s look at a deep learning project idea on which you can build more complex models to improve existing systems.
Understanding Recurrent Neural Networks
A recurrent neural network (RNN) is a variant of the neural network built for sequential data: connections between nodes form a loop over the sequence, so the output of one step is fed back as an input to the next.
Take the visual below, for instance. Notice that the input for h2 is not just x2 but also y1, the output of the previous step. We use this architecture for natural language processing and text generation because language is inherently sequential. When people speak, words take meaning from previous words, and sentences take meaning from previous sentences. With RNNs, that context carries forward.
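To make the recurrence concrete, here is a minimal sketch (not from Raghav’s session) of a single vanilla RNN step in plain NumPy: the hidden state at each step is computed from both the current input and the previous hidden state, which is how context carries forward. All sizes and weights below are purely illustrative.

```python
import numpy as np

# Illustrative sizes only; a real model learns Wx, Wh and b during training.
input_dim, hidden_dim = 8, 16
Wx = np.random.randn(hidden_dim, input_dim) * 0.1   # input-to-hidden weights
Wh = np.random.randn(hidden_dim, hidden_dim) * 0.1  # hidden-to-hidden weights
b = np.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    """One recurrent step: combine the current input with the previous hidden state."""
    return np.tanh(Wx @ x_t + Wh @ h_prev + b)

# Unroll over a toy sequence of 5 time steps.
h = np.zeros(hidden_dim)
sequence = [np.random.randn(input_dim) for _ in range(5)]
for x_t in sequence:
    h = rnn_step(x_t, h)  # context from earlier steps is carried forward in h
```

GRUs and LSTMs refine this same idea with gating, which is what helps them handle long-range dependencies better than the plain recurrence shown here.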
How to Build a Language Model
We can build language models at many levels: word-level, phrase-level, or what we are going to do today, which is character-level. A character is simply a single letter, digit, punctuation mark, and so on, and the model learns to predict the next character from the ones before it. We are doing this primarily because character-level language modelling gives you a finite and manageable vocabulary. You can go on to build word-level models on the same principles as well.
We will use TensorFlow 2.0 with Keras as the high-level library. We’ll use gated recurrent units (GRUs), which are better than vanilla RNNs at managing long-range dependencies and gradient problems. We are taking the book ‘The Adventures of Sherlock Holmes’ from Project Gutenberg as our input dataset.
Step 1: Pre-processing
- Import the required libraries from TensorFlow.
- Set the data path to the book on Project Gutenberg.
- Download the book using Gutenberg’s standard API.
- Prepare the text by performing a basic clean-up.
- Identify the unique character count/vocabulary size: in this case it is 96, because we’re doing character-level analysis. It would have been significantly higher if we went with a word-level language model.
- Perform character-to-integer mapping: give every unique character a corresponding integer for the machine to work with. You will also need the reverse mapping for decoding the output.
Please note that because we are performing text generation at the character level, we don’t have to perform activities typical of NLP pipelines, such as stop-word removal. A minimal code sketch of these pre-processing steps follows below.
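Here is a hedged sketch of Step 1 in TensorFlow 2.x. The Gutenberg URL and file name are assumptions for illustration (ebook #1661 is a plain-text edition of ‘The Adventures of Sherlock Holmes’); swap in whichever corpus and clean-up rules you prefer.

```python
import numpy as np
import tensorflow as tf

# Assumed location of the plain-text book on Project Gutenberg; adjust as needed.
BOOK_URL = "https://www.gutenberg.org/files/1661/1661-0.txt"
path = tf.keras.utils.get_file("sherlock.txt", BOOK_URL)

# Basic clean-up; you may also want to strip the Gutenberg header/footer.
text = open(path, "rb").read().decode(encoding="utf-8").strip()

# The unique characters form the vocabulary (roughly 96 for this book).
vocab = sorted(set(text))
print(f"Vocabulary size: {len(vocab)}")

# Character-to-integer mapping, plus the reverse mapping for decoding output later.
char2idx = {ch: i for i, ch in enumerate(vocab)}
idx2char = np.array(vocab)
text_as_int = np.array([char2idx[ch] for ch in text])
```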
Step 2: Data preparation
- Create sequences with a max length of 100.
- Create batches of sequences of a fixed size; in this case, we use a batch size of 64.
- Run a quick shuffle. A code sketch of these data-preparation steps follows the list.
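A minimal sketch of Step 2 using the tf.data API, continuing from the snippet above. The shuffle buffer size is an assumption; the sequence length (100) and batch size (64) match the values mentioned in the list.

```python
SEQ_LENGTH = 100
BATCH_SIZE = 64
BUFFER_SIZE = 10000  # assumed shuffle buffer size; tune as you like

char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)
# +1 so each chunk can be split into an input sequence and a target shifted by one character.
sequences = char_dataset.batch(SEQ_LENGTH + 1, drop_remainder=True)

def split_input_target(chunk):
    """Input is the sequence; the target is the same sequence shifted by one character."""
    return chunk[:-1], chunk[1:]

dataset = (sequences
           .map(split_input_target)
           .shuffle(BUFFER_SIZE)
           .batch(BATCH_SIZE, drop_remainder=True))
```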
Step 3: Prepare the model
- Using Keras’ sequential API, prepare a model with one embedding layer. You can increase the number of layers if you wish. However, it will increase the training time.
- Define your vocabulary size, embedding dimensions and RNN units.
- Set up callbacks.
- Train the dragon (in this case, your language model) for 64 epochs. A code sketch of the model setup and training follows the list.
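Below is a sketch of Step 3, assuming one embedding layer followed by a single GRU layer and a dense output over the vocabulary. The embedding dimension and the number of GRU units are illustrative choices, not the exact values used in Raghav’s session.

```python
VOCAB_SIZE = len(vocab)
EMBEDDING_DIM = 256   # assumed embedding dimension
RNN_UNITS = 1024      # assumed number of GRU units

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM),
    tf.keras.layers.GRU(RNN_UNITS, return_sequences=True),
    tf.keras.layers.Dense(VOCAB_SIZE),  # logits over the character vocabulary
])

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# Callback to checkpoint weights so training can be resumed or the best run reloaded.
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(
    filepath="./checkpoints/ckpt_{epoch}", save_weights_only=True)

history = model.fit(dataset, epochs=64, callbacks=[checkpoint_cb])
```

Adding more GRU layers is as simple as stacking them in the Sequential list, but, as noted above, each extra layer increases training time.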
Step 4: Text generation
Try a context input such as ‘Watson you are’ and see what your model predicts. In our case, it says “in the street,” which is a meaningful and grammatically correct prediction. However, as we move further and further away from the context, the predictions lose quality and turn into gibberish. You can continue to optimise the model to improve prediction accuracy.
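Below is a hedged sketch of the generation loop, assuming the non-stateful model above: the seed string is encoded, the model predicts a distribution over the next character, one character is sampled, appended to the context, and the loop repeats. The `num_generate` and `temperature` parameters are illustrative additions, not part of the original walkthrough.

```python
def generate_text(model, start_string, num_generate=200, temperature=1.0):
    """Generate text one character at a time, feeding each prediction back in as context."""
    input_ids = [char2idx[ch] for ch in start_string]
    generated = []
    for _ in range(num_generate):
        logits = model(tf.expand_dims(input_ids, 0))   # shape: (1, len, vocab_size)
        logits = logits[:, -1, :] / temperature        # distribution for the next character
        next_id = int(tf.random.categorical(logits, num_samples=1)[0, 0])
        generated.append(idx2char[next_id])
        input_ids.append(next_id)                      # grow the context with the new character
    return start_string + "".join(generated)

print(generate_text(model, start_string="Watson you are"))
```

Lower temperatures make the sampling safer but more repetitive, while higher temperatures make the output more diverse but more likely to drift into gibberish, which is one knob you can turn while optimising the model.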
To get a hands-on demonstration of how we built this model, watch Raghav’s session on YouTube: Text Generation using RNNs and Transformers in NLP. He also explores decoding strategies like greedy search, beam search, sampling, top-k sampling, top-p/nucleus sampling, the encoder-decoder architecture, and more!
Since you’re here…
Curious about a career in data science? Experiment with our free data science learning path, or join our Data Science Bootcamp, where you’ll get your tuition back if you don’t land a job after graduating. We’re confident because our courses work – check out our student success stories to get inspired.