How Deep Learning Revolutionized NLP

From rule-based systems to deep learning-powered applications, the field of Natural Language Processing (NLP) has advanced significantly over the last several years. Despite these strides, there is still more work to do: many NLP problems remain unsolved, and research in the discipline continues to accelerate.

In this article, we will discuss recent advancements made in the field of NLP and how deep learning architectures played a crucial role in this journey.

What is natural language processing (NLP)?

In general, the term “natural language” refers to the way humans communicate with each other, mainly through speech and text. NLP refers to the capability of machines or computer systems to understand this natural language, mostly in a text-in, text-out fashion. (A commonly confused term, Automatic Speech Recognition, or ASR, refers to the ability of machines or computer systems to recognize speech.)

How is NLP used?

Right now, the best-known applications of NLP in action are virtual assistants like Amazon Alexa, Apple Siri, and Google Assistant, though they rely on carefully crafted rule-based actions too. The list below, though not exhaustive, gives some common applications of NLP:

  • Search engines
  • Sentiment analysis
  • Machine translation
  • Auto-correct
  • Speech recognition
  • Text summarization 
  • Chatbots 
  • Text classification 
  • Email filtering 
  • Text generation 

What is the role of deep learning in NLP?

In the last five years, deep learning has revolutionized the field of NLP and improved performance across a wide range of NLP tasks. Before deep learning, NLP primarily relied on the Bag of Words approach, using models like Naive Bayes, Logistic Regression, and SVMs to classify text inputs.

These pipelines also depended on hand-crafted linguistic resources inherited from traditional linguistics, such as stop-word lists, stemming, and lemmatization. The main drawback of these techniques was their disregard for word order and context in a sentence.
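
To make the contrast concrete, here is a minimal sketch of that classic pipeline using scikit-learn: a bag-of-words vectorizer feeding a Naive Bayes classifier. The toy sentences and labels are purely illustrative.

```python
# A minimal sketch of the pre-deep-learning pipeline: bag-of-words features
# fed into a Naive Bayes classifier. The toy dataset is illustrative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = [
    "the movie was great and fun",
    "a wonderful, uplifting film",
    "the plot was dull and boring",
    "terrible acting and a weak story",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = make_pipeline(
    CountVectorizer(stop_words="english"),  # bag-of-words counts, plus a hand-crafted stop-word list
    MultinomialNB(),
)
model.fit(train_texts, train_labels)

print(model.predict(["what a fun and wonderful story"]))  # expected: ['positive']
```

Note that the vectorizer only counts words: word order is thrown away entirely, which is exactly the drawback described above.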

Once these methods reached their limits, the concept of word embeddings came into play. Word embeddings make it possible to perform semantic arithmetic such as "King - Man + Woman = Queen." Well-known word embeddings include Tomas Mikolov's Word2vec, Stanford University's GloVe, AllenNLP's ELMo, and so on.
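
As a quick illustration, a pretrained embedding model can reproduce the analogy above. The sketch below uses gensim's downloader with one of its bundled GloVe models; the vectors are downloaded on first use, and the exact similarity scores will vary by model.

```python
# A small sketch of the "King - Man + Woman = Queen" analogy using pretrained
# GloVe vectors via gensim's downloader (downloads the vectors on first run).
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")  # pretrained word embeddings

# most_similar adds the "positive" vectors, subtracts the "negative" ones,
# and returns the nearest words by cosine similarity.
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # typically [('queen', ...)]
```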

The revolution began with the introduction of Recurrent Neural Networks (RNNs). RNNs are explicitly designed to process sequential data. They are good at capturing short-term dependencies but struggle with long-term memorization. To resolve this, Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) cells are used within RNNs to capture long-term dependencies.

In an RNN, tokens are fed into the network sequentially, and the output at each step is combined with the next input token. This creates a form of memory that retains dependencies between words. Bidirectional training (left to right and right to left) is also an option with RNNs, and it helps the model gain a deeper understanding of language flow.

The LSTM has a cell memory unit, which can forget previously learned patterns as well as add new information as needed. These approaches performed well until the concept of Attention was introduced.
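
The sketch below shows, in PyTorch, what this step-by-step processing looks like: each token embedding is fed to an LSTM cell together with the hidden and cell state carried over from the previous step. All layer sizes and token IDs are illustrative.

```python
# A minimal PyTorch sketch of how an RNN-style model consumes a sentence:
# each token is fed in turn, and the hidden/cell state carried between steps
# acts as the network's memory. Sizes and token IDs are toy values.
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 100, 16, 32
embedding = nn.Embedding(vocab_size, embed_dim)
lstm_cell = nn.LSTMCell(embed_dim, hidden_dim)

token_ids = torch.tensor([4, 17, 9, 52])   # a toy tokenized sentence
h = torch.zeros(1, hidden_dim)              # hidden state
c = torch.zeros(1, hidden_dim)              # LSTM cell memory

for token_id in token_ids:
    x = embedding(token_id).unsqueeze(0)    # (1, embed_dim)
    h, c = lstm_cell(x, (h, c))             # state flows on to the next step

print(h.shape)  # final hidden state summarizing the sequence: (1, 32)
```

A bidirectional model simply runs a second pass from right to left and combines the two hidden states (in PyTorch, `nn.LSTM(..., bidirectional=True)` wraps this up).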

How transformers changed it all

In 2017, Google researchers released a paper named “Attention Is All You Need,” which introduced the Transformer architecture. This began the next revolution in the field of NLP after RNNs.

Transformers work by permitting parallel computation. Unlike RNNs, they don’t use recurrence; they use a mechanism called “Attention” that allows the model to ingest the entire sentence at once rather than piece by piece. In simple terms, the Attention mechanism concentrates on the few elements relevant to the task at hand while ignoring the rest. After this, many companies and research groups developed enhanced versions based on the original transformer architecture. The leaderboard can be found here.
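
At its core, the mechanism is scaled dot-product attention: every token's query is compared against every other token's key, and the resulting weights decide how much of each token's value flows into the output. Here is a small NumPy sketch with toy sizes (self-attention only, without masking or multiple heads):

```python
# A toy sketch of scaled dot-product attention: the whole sentence is
# processed at once, and the weights say how much each token attends to
# every other token.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return weights @ V                                          # weighted sum of values

seq_len, d_model = 5, 8                         # toy sizes
x = np.random.randn(seq_len, d_model)           # embeddings for 5 tokens
out = scaled_dot_product_attention(x, x, x)     # self-attention: Q = K = V = x
print(out.shape)                                # (5, 8): one context-aware vector per token
```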

In 2018 and 2019, OpenAI released the Generative Pre-trained Transformers GPT and GPT-2. GPT-2 has 1.5 billion parameters, and at the time it set new records on zero-shot tasks.

After the release of GPT in 2018, Google released Bidirectional Encoder Representations from Transformers (BERT). BERT-large has approximately 340 million parameters.
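
Both model families are easy to try through the Hugging Face transformers library (assumed installed here): GPT-style models generate text left to right, while BERT fills in masked words using context from both directions. Model weights are downloaded on first use, and the generated text will vary from run to run.

```python
# A small sketch using Hugging Face pipelines to try both model families.
from transformers import pipeline

# GPT-2: left-to-right text generation
generator = pipeline("text-generation", model="gpt2")
print(generator("Deep learning has changed NLP because", max_length=30)[0]["generated_text"])

# BERT: predict a masked word from bidirectional context
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("Transformers process the whole [MASK] at once.")[0]["token_str"])
```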

These transformer-based models require a great deal of computational power for training, which restricts their development to large tech companies and research institutes. Here are some examples:

In June 2020, OpenAI released GPT-3, which has 175 billion parameters and set new state-of-the-art benchmarks. GPT-3 has a wide spectrum of applications, and it is so powerful that its generated text can be difficult to distinguish from text written by a human. This brings immense benefits but also poses significant risks.

Another major area of focus in the research community is reducing the complexity of large NLP models like transformers, since they are very expensive to train. This is also a key reason behind the increasing carbon footprint of deep learning. Moreover, deploying these extremely large models in real time is a massive challenge because of the cost and complexity involved.

What does this mean for the future of NLP?

Soon, we might see new architectures that surpass transformers (e.g., Performers). Until then, new developments based on transformers will keep pushing the performance of NLP systems to new levels.

Apart from NLP, transformers have been successful in other domains as well, such as image completion and music generation. Certainly, models will make the most of transfer learning and self-supervised learning on the large amount of text data available, and set new benchmarks.

Is machine learning engineering the right career for you?

Knowing machine learning and deep learning concepts is important—but not enough to get you hired. According to hiring managers, most job seekers lack the engineering skills to perform the job. This is why more than 50% of Springboard's Machine Learning Career Track curriculum is focused on production engineering skills. In this course, you'll design a machine learning/deep learning system, build a prototype, and deploy a running application that can be accessed via API or web service. No other bootcamp does this.

Our machine learning training will teach you linear and logistic regression, anomaly detection, and cleaning and transforming data. We’ll also teach you the most in-demand ML models and algorithms you’ll need to know to succeed. For each model, you will learn how it works conceptually first, then the applied mathematics necessary to implement it, and finally how to test and train it.

Find out if you're eligible for Springboard's Machine Learning Career Track.

Ready to learn more?

Browse our Career Tracks and find the perfect fit