How Deep Learning Revolutionized NLP

Sakshi Gupta | 4 minute read | January 24, 2021

From rule-based systems to deep learning-powered applications, the field of Natural Language Processing (NLP) has advanced significantly over the last several years. Despite these strides, there is still more work to do: many problems in NLP remain unsolved, and the volume of research in this discipline is growing rapidly.

In this article, we will discuss recent advancements made in the field of NLP and how deep learning architectures played a crucial role in this journey.

What is natural language processing (NLP)?

In general, the term “natural language” refers to the way humans communicate with each other, mainly through speech and text. NLP is the capability of machines or computer systems to understand this natural language, mostly in a text-in, text-out setting. (A commonly confused term, Automatic Speech Recognition, or ASR, refers to the ability of machines or computer systems to recognize speech.)

How is NLP used?

Right now, the best-known application of NLP in action is the virtual assistant, such as Amazon Alexa, Apple Siri, and Google Assistant, though these also rely on carefully crafted rule-based actions. The list below gives some applications of NLP; it is not exhaustive:

  • Search engines
  • Sentiment analysis
  • Machine translation
  • Auto-correct
  • Speech recognition
  • Text summarization
  • Chatbots
  • Text classification
  • Email filtering
  • Text generation

What is the role of deep learning in NLP?

In the last five years, deep learning has revolutionized the field of NLP and improved performance across many NLP tasks. Before deep learning, NLP primarily relied on the Bag of Words approach, with models such as Naive Bayes, Logistic Regression, and Support Vector Machines (SVMs) used to classify text inputs.

These pipelines also depended on language-specific “magic” lists and rules (stop-word lists, lemmatization, stemming, etc.) borrowed from traditional linguistics. The main drawback of these techniques was their disregard for word order and context in a sentence.
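The word-order problem is easy to see in a few lines of Python. The sketch below is only an illustration, not a production tokenizer: the helper name bag_of_words and the two example sentences are made up for this article, and the "tokenizer" is a plain whitespace split.

```python
from collections import Counter


def bag_of_words(sentence):
    """Count the words in a sentence, discarding their order."""
    return Counter(sentence.lower().split())


first = bag_of_words("the dog bit the man")
second = bag_of_words("the man bit the dog")

# The two sentences mean different things, yet their bags are identical,
# so any classifier fed only these counts cannot tell them apart.
print(first)
print(first == second)
```

Because both sentences reduce to the same counts, a model such as Naive Bayes trained on these features has no way to recover who bit whom.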

As these earlier methods reached their limits, the concept of word embedding came into play. Word embeddings represent words as dense vectors, which makes it possible to perform semantic similarity comparisons like: “King – Man + Woman = Queen.” Well-known word embeddings include Tomas Mikolov’s Word2vec, Stanford University’s GloVe, the Allen Institute for AI’s ELMo, and so on.
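To make the analogy concrete, here is a minimal sketch using hand-made two-dimensional vectors. These numbers are invented for illustration (one axis loosely stands for "royalty", the other for "gender"); real embeddings such as Word2vec or GloVe are learned from text and have hundreds of dimensions. The function names cosine and analogy are our own.

```python
import math

# Toy vectors, NOT learned embeddings: [royalty, gender]
embeddings = {
    "king": [0.9, 0.8],
    "queen": [0.9, -0.8],
    "man": [0.1, 0.8],
    "woman": [0.1, -0.8],
    "apple": [-0.7, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c):
    """Return the word whose vector is closest to vec(a) - vec(b) + vec(c)."""
    target = [x - y + z for x, y, z in
              zip(embeddings[a], embeddings[b], embeddings[c])]
    # Exclude the query words themselves, as is common practice.
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))


print(analogy("king", "man", "woman"))  # queen
```

With learned embeddings the result is only approximately equal to the "queen" vector, which is why the nearest neighbour is looked up by cosine similarity rather than tested for exact equality.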

The deep learning revolution in NLP began with the adoption of Recurrent Neural Networks (RNNs), which are explicitly designed to process sequential data. Plain RNNs are good at capturing short-term dependencies but struggle to retain information over long sequences. To resolve this, gated variants such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks are used to capture long-term dependencies.

In an RNN, tokens are fed into the network one at a time, and the hidden state produced at each step is combined with the next input token. This creates a form of memory that retains dependencies between words. Bidirectional processing (left to right and right to left) is also an option with RNNs, and it helps the model use context from both sides of a word.
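The recurrence described above can be sketched with a scalar toy RNN. This is a simplified illustration under our own assumptions: the weights w_x, w_h and bias are fixed, made-up numbers rather than learned parameters, each "token" is a single number, and the function names are hypothetical. A real bidirectional RNN would also learn separate weights for each direction; here the same toy weights are reused for brevity.

```python
import math


def rnn_forward(inputs, w_x=0.5, w_h=0.8, bias=0.0):
    """Run a scalar RNN over the inputs and return the state after each step."""
    h = 0.0  # initial hidden state: no memory yet
    states = []
    for x in inputs:
        # The new state mixes the current input with the previous state.
        h = math.tanh(w_x * x + w_h * h + bias)
        states.append(h)
    return states


def birnn_forward(inputs):
    """Pair each position's left-to-right state with its right-to-left state."""
    left_to_right = rnn_forward(inputs)
    right_to_left = rnn_forward(inputs[::-1])[::-1]
    return list(zip(left_to_right, right_to_left))


sequence = [1.0, 2.0, -1.0]
print(rnn_forward(sequence))        # one hidden state per token
print(rnn_forward(sequence[::-1]))  # a different order gives different states
print(birnn_forward(sequence))      # (forward, backward) state per position
```

Unlike the Bag of Words representation, the final state here depends on the order in which the tokens arrive. The loop also makes the main cost visible: step t cannot start until step t-1 has finished.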

The LSTM has a memory cell that can forget previously stored information and add new information as needed. These architectures were the standard choice until the concept of Attention was introduced.
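The forget-and-add behaviour of the memory cell can be sketched as a single scalar LSTM step. This follows the commonly used LSTM formulation with forget, input, and output gates; the parameter dictionaries keep_params and erase_params are hand-picked values chosen to force the gates open or shut, not trained weights, and all names are our own.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One scalar LSTM step; params maps gate name -> (w_x, w_h, bias)."""
    def gate(name, activation):
        w_x, w_h, bias = params[name]
        return activation(w_x * x + w_h * h_prev + bias)

    f = gate("forget", sigmoid)        # how much of the old memory to keep
    i = gate("input", sigmoid)         # how much new information to write
    g = gate("candidate", math.tanh)   # the new information itself
    o = gate("output", sigmoid)        # how much of the memory to expose
    c = f * c_prev + i * g             # updated cell memory
    h = o * math.tanh(c)               # updated hidden state
    return h, c


# Forget gate forced open, input gate forced shut: the memory is preserved.
keep_params = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "candidate": (1.0, 0.0, 0.0),
    "output": (0.0, 0.0, 10.0),
}
# Same cell, but with the forget gate forced shut: the memory is erased.
erase_params = dict(keep_params, forget=(0.0, 0.0, -10.0))

_, c_kept = lstm_step(1.0, 0.0, 0.7, keep_params)
_, c_erased = lstm_step(1.0, 0.0, 0.7, erase_params)
print(c_kept, c_erased)
```

In a trained LSTM the gates are not pinned like this; they take values between 0 and 1 that depend on the current input and the previous state, so the network learns when to remember and when to forget.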

How transformers changed it all

In 2017, Google researchers published the paper “Attention Is All You Need,” which introduced the Transformer architecture. This began the next revolution in the field of NLP after RNNs.

Transformers are built for parallel computation. Unlike RNNs, they do not use recurrence; instead they use a mechanism called “Attention,” which lets the model ingest the entire sentence at once rather than token by token. In simple terms, the Attention mechanism concentrates on the few elements relevant to the task at hand while ignoring the irrelevant ones. After this, many companies and research groups developed enhanced versions based on the original transformer architecture. The leaderboard can be found here.
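A minimal sketch of the core idea, scaled dot-product attention, is shown below in plain Python. It is deliberately stripped down: the learned query, key, and value projections and the multiple heads of a real Transformer are omitted, the toy token vectors are invented, and the function names are our own.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score this query against every key in the sentence at once.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # The output is a weighted average of the value vectors.
        output = [sum(w * v[j] for w, v in zip(weights, values))
                  for j in range(len(values[0]))]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


# Three toy "token" vectors; tokens 0 and 2 are identical.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = attention(tokens, tokens, tokens)  # self-attention
print(weights[0])  # token 0 attends most to tokens 0 and 2
```

Each query is processed independently of the others, with no state carried from one position to the next. That independence is what allows the whole sentence to be handled in parallel, in contrast to the step-by-step loop of an RNN.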

In 2018 and 2019, OpenAI released its Generative Pre-trained Transformers, GPT and GPT-2. GPT-2 has 1.5 billion parameters, and at the time of its release it set new records on zero-shot tasks.

After the release of GPT in 2018, Google released Bidirectional Encoder Representations from Transformers (BERT). BERT-large has approximately 340 million parameters.

These transformer-based models require a lot of computing power to train, which restricts their development to large tech companies and research institutes. Here is an example:

In June 2020, OpenAI released GPT-3, which has 175 billion parameters. It set new state-of-the-art results. GPT-3 has a wide spectrum of applications, and it is so powerful that its generated text is difficult to distinguish from text written by a human. This has immense benefits but also poses significant risks.

Another major area of focus in the research community is reducing the complexity of large NLP models like transformers, as they are very expensive to train. This expense is also a key reason behind the increasing carbon footprint of deep learning. Moreover, deploying these extremely large models in real time is a massive challenge because of the associated cost and complexity.
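A rough back-of-the-envelope calculation shows why deployment is hard. The sketch below uses the parameter counts quoted in this article and assumes, purely for illustration, that each parameter is stored as a 16-bit float (2 bytes). It counts the weights only and ignores activations, optimizer state, and any compression, so treat the figures as a lower-bound estimate rather than a measurement.

```python
# Parameter counts as quoted in the article.
PARAMETER_COUNTS = {
    "BERT-large": 340e6,
    "GPT-2": 1.5e9,
    "GPT-3": 175e9,
}

BYTES_PER_PARAMETER = 2  # assumption: 16-bit floating point weights


def weights_gigabytes(n_params, bytes_per_param=BYTES_PER_PARAMETER):
    """Approximate storage for the weights alone, in GB (10^9 bytes)."""
    return n_params * bytes_per_param / 1e9


for name, n_params in PARAMETER_COUNTS.items():
    print(f"{name}: ~{weights_gigabytes(n_params):,.2f} GB")
```

Under this assumption the weights of GPT-3 alone come to roughly 350 GB, far more than the memory of a single typical accelerator, which is one reason such models have to be split across many devices to serve requests.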

What does this mean for the future of NLP?

With the rapid progress of Machine Learning, Deep Learning, and Data Science, we might soon see new architectures that surpass transformers (e.g., Performers). Until then, new developments based on transformers, driven by Deep Learning Engineers, Data Scientists, and ML Engineers, will keep pushing the performance of NLP systems to new levels.

Apart from NLP, transformers are successful in other domains as well, such as image completion and music generation. Models will likely continue to make the most of transfer learning and self-supervised learning on the large amount of text data available, and set new benchmarks.

About Sakshi Gupta

Sakshi is a Senior Associate Editor at Springboard. She is a technology enthusiast who loves to read and write about emerging tech. She is a content marketer and has experience working in the Indian and US markets.