{"id":14624,"date":"2021-01-24T05:54:59","date_gmt":"2021-01-24T13:54:59","guid":{"rendered":"https:\/\/www.springboard.com\/?p=14624"},"modified":"2023-07-04T20:54:24","modified_gmt":"2023-07-05T03:54:24","slug":"nlp-deep-learning","status":"publish","type":"post","link":"https:\/\/www.springboard.com\/blog\/data-science\/nlp-deep-learning\/","title":{"rendered":"How Deep Learning Revolutionized NLP"},"content":{"rendered":"\n<p>From rule-based systems to deep learning-powered applications, the field of Natural Language Processing (NLP) has advanced significantly over the last several years. Despite these strides, there\u2019s still more work to do: the open problems of NLP are far from fully solved, and the amount of research in this discipline is skyrocketing.<\/p>\n\n\n\n<p>In this article, we will discuss recent advances in NLP and the crucial role that deep learning architectures have played in them.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What is natural language processing (NLP)?<\/h2>\n\n\n\n<p>In general, the term \u201cnatural language\u201d refers to the way humans communicate with each other, mainly through speech and text. NLP refers to the ability of machines or computer systems to understand this natural language, mostly in a text-in, text-out setting. (A commonly confused term, Automatic Speech Recognition, or ASR, refers to the ability of machines or computer systems to recognize speech.)<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">How is NLP used?<\/h2>\n\n\n\n<p>Right now, the best-known applications of NLP in action are virtual assistants like Amazon Alexa, Apple Siri, and Google Assistant, though they rely on carefully crafted rule-based actions too. 
The list below gives some applications of NLP, though it is not exhaustive:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Search engines<\/li>\n\n\n\n<li>Sentiment analysis<\/li>\n\n\n\n<li>Machine translation<\/li>\n\n\n\n<li>Auto-correct<\/li>\n\n\n\n<li>Speech recognition<\/li>\n\n\n\n<li>Text summarization<\/li>\n\n\n\n<li>Chatbots<\/li>\n\n\n\n<li>Text classification<\/li>\n\n\n\n<li>Email filtering<\/li>\n\n\n\n<li>Text generation<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">What is the role of deep learning in NLP?<\/h2>\n\n\n\n<p>In the last five years, deep learning has revolutionized the field of NLP and improved performance across many NLP tasks. Before deep learning, NLP primarily relied on the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Bag-of-words_model\" target=\"_blank\" rel=\"noreferrer noopener\">Bag of Words<\/a> approach, using models such as <a href=\"https:\/\/en.wikipedia.org\/wiki\/Naive_Bayes_classifier\" target=\"_blank\" rel=\"noreferrer noopener\">Naive Bayes<\/a>, <a href=\"https:\/\/en.wikipedia.org\/wiki\/Logistic_regression\" target=\"_blank\" rel=\"noreferrer noopener\">Logistic Regression<\/a>, and <a href=\"https:\/\/en.wikipedia.org\/wiki\/Support_vector_machine\" target=\"_blank\" rel=\"noreferrer noopener\">SVM<\/a> to classify text inputs.<\/p>\n\n\n\n<p>These approaches also relied on language-specific &#8220;magic&#8221; lists and preprocessing steps (stop-words, lemmatization, stemming, etc.) borrowed from traditional linguistics. The main drawback of these techniques was their disregard for word order and context in a sentence.<\/p>\n\n\n\n<p>Once these methods had been pushed to their limits, the concept of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Word_embedding\" target=\"_blank\" rel=\"noreferrer noopener\">word embedding<\/a> came into play. 
Word embeddings represent words as dense vectors, which makes it possible to perform semantic comparisons such as &#8220;King &#8211; Man + Woman = Queen.&#8221; Well-known word embeddings include Tomas Mikolov&#8217;s <a href=\"https:\/\/en.wikipedia.org\/wiki\/Word2vec\" target=\"_blank\" rel=\"noreferrer noopener\">Word2vec<\/a>, Stanford University&#8217;s <a href=\"https:\/\/en.wikipedia.org\/wiki\/GloVe_(machine_learning)\" target=\"_blank\" rel=\"noreferrer noopener\">GloVe<\/a>, and the Allen Institute for AI&#8217;s <a href=\"https:\/\/en.wikipedia.org\/wiki\/ELMo\" target=\"_blank\" rel=\"noreferrer noopener\">ELMo<\/a>.<\/p>\n\n\n\n<p>The revolution began with the introduction of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Recurrent_neural_network\" target=\"_blank\" rel=\"noreferrer noopener\">Recurrent Neural Networks (RNNs)<\/a>, which are explicitly designed to process sequential data. They are good at capturing short-term dependencies but struggle to retain long-term ones. To address this, gated variants such as <a href=\"https:\/\/en.wikipedia.org\/wiki\/Long_short-term_memory\" target=\"_blank\" rel=\"noreferrer noopener\">Long Short-Term Memory (LSTM)<\/a> and <a href=\"https:\/\/en.wikipedia.org\/wiki\/Gated_recurrent_unit\" target=\"_blank\" rel=\"noreferrer noopener\">Gated Recurrent Unit (GRU)<\/a> are used to capture long-term dependencies.<\/p>\n\n\n\n<p>In an RNN, tokens are fed into the network one at a time, and the output of each step is combined with the next input token. This creates a form of memory that retains dependencies between words. <a href=\"https:\/\/en.wikipedia.org\/wiki\/Bidirectional_recurrent_neural_networks\" target=\"_blank\" rel=\"noreferrer noopener\">Bidirectional training<\/a> (left to right and right to left) is also an option with RNNs, and it helps the model gain a deeper understanding of language flow.<\/p>\n\n\n\n<p>The LSTM has a memory cell that can forget previously learned patterns and add new information as needed. 
These models all performed well, but they were soon overtaken once the concept of <a href=\"https:\/\/arxiv.org\/pdf\/1902.02181.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Attention<\/a> was introduced.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">How transformers changed it all<\/h2>\n\n\n\n<p>In 2017, Google researchers published a paper titled <a href=\"https:\/\/papers.nips.cc\/paper\/7181-attention-is-all-you-need.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">\u201cAttention is All You Need\u201d<\/a>, which introduced the <a href=\"http:\/\/jalammar.github.io\/illustrated-transformer\/\" target=\"_blank\" rel=\"noreferrer noopener\">Transformer<\/a> architecture. This began the next revolution in the field of NLP after RNNs.<\/p>\n\n\n\n<p>Transformers are built for parallel computation. Unlike RNNs, they don\u2019t use recurrence; instead, they use a mechanism called <a href=\"https:\/\/arxiv.org\/pdf\/1902.02181.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">\u201cAttention\u201d<\/a>, which lets the model ingest an entire sentence at once rather than one token at a time. In simple terms, the &#8220;Attention&#8221; mechanism concentrates on the few elements most relevant to the task at hand and ignores the rest. 
Since then, many companies and research groups have developed enhanced versions of the original transformer architecture. The GLUE benchmark leaderboard, which ranks many of these models, can be found <a href=\"https:\/\/gluebenchmark.com\/leaderboard\/\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>.<\/p>\n\n\n\n<p>In 2018 and 2019, OpenAI released its Generative Pre-trained Transformers, <a href=\"https:\/\/cdn.openai.com\/research-covers\/language-unsupervised\/language_understanding_paper.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">GPT<\/a> and <a href=\"https:\/\/d4mucfpksywv.cloudfront.net\/better-language-models\/language_models_are_unsupervised_multitask_learners.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">GPT-2<\/a>. GPT-2 has 1.5 billion parameters, and at the time it set new records on <a href=\"https:\/\/en.wikipedia.org\/wiki\/Zero-shot_learning#:~:text=Zero%2Dshot%20learning%20(ZSL),language%20processing%20and%20machine%20perception.\" target=\"_blank\" rel=\"noreferrer noopener\">zero-shot<\/a> tasks.<\/p>\n\n\n\n<p>After the release of GPT in 2018, Google released <a href=\"https:\/\/arxiv.org\/abs\/1810.04805\" target=\"_blank\" rel=\"noreferrer noopener\">Bidirectional Encoder Representations from Transformers (BERT)<\/a>. BERT-large has approximately 340 million parameters.<\/p>\n\n\n\n<p>These transformer-based models require a great deal of computing power to train, which largely restricts their development to large tech companies and research institutes. 
Here are some examples:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/arxiv.org\/abs\/1908.04577\" target=\"_blank\" rel=\"noreferrer noopener\">StructBERT<\/a> (Alibaba)<\/li>\n\n\n\n<li><a href=\"https:\/\/arxiv.org\/abs\/1906.08237\" target=\"_blank\" rel=\"noreferrer noopener\">XLNet<\/a> (Carnegie Mellon University and Google)<\/li>\n\n\n\n<li><a href=\"https:\/\/arxiv.org\/abs\/1909.11942\" target=\"_blank\" rel=\"noreferrer noopener\">ALBERT<\/a> (Google)<\/li>\n\n\n\n<li><a href=\"https:\/\/arxiv.org\/abs\/1907.11692\" target=\"_blank\" rel=\"noreferrer noopener\">RoBERTa<\/a> (Facebook AI)<\/li>\n<\/ul>\n\n\n\n<p>In June 2020, OpenAI released <a href=\"https:\/\/arxiv.org\/abs\/2005.14165\" target=\"_blank\" rel=\"noreferrer noopener\">GPT-3<\/a>, which has 175 billion parameters. It set new state-of-the-art results on several benchmarks. GPT-3 has a wide spectrum of applications, and it is so powerful that its generated text is difficult to distinguish from text written by a human. This has immense benefits but also poses significant risks.<\/p>\n\n\n\n<p>Another major focus of the research community is reducing the complexity of large NLP models such as transformers, which are very expensive to train. That training cost is also a key reason behind the growing <a href=\"https:\/\/www.forbes.com\/sites\/robtoews\/2020\/06\/17\/deep-learnings-climate-change-problem\/#35630cef6b43\" target=\"_blank\" rel=\"noreferrer noopener\">carbon footprint<\/a> of deep learning. 
Moreover, deploying these extremely large models in real time is a massive challenge because of the cost and complexity involved.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What does this mean for the future of NLP?<\/h2>\n\n\n\n<p>With the rapid progress of machine learning, deep learning, and <a href=\"https:\/\/www.springboard.com\/blog\/data-science\/data-science-definition\/\" target=\"_blank\" data-type=\"URL\" data-id=\"https:\/\/www.springboard.com\/blog\/data-science\/data-science-definition\/\" rel=\"noreferrer noopener\">data science<\/a>, we might soon see new architectures that surpass transformers (e.g., <a href=\"https:\/\/ai.googleblog.com\/2020\/10\/rethinking-attention-with-performers.html\" target=\"_blank\" rel=\"noreferrer noopener\">Performers<\/a>). Until then, new developments based on transformers, driven by deep learning engineers, <a href=\"https:\/\/www.springboard.com\/blog\/data-science\/what-does-a-data-scientist-do\/\" target=\"_blank\" data-type=\"post\" data-id=\"24427\" rel=\"noreferrer noopener\">data scientists<\/a>, and ML engineers, will keep pushing the performance of NLP systems to new levels.<\/p>\n\n\n\n<p>Beyond NLP, transformers have also been successful in other domains, such as <a href=\"https:\/\/openai.com\/blog\/image-gpt\/\" target=\"_blank\" rel=\"noreferrer noopener\">image completion<\/a> and <a href=\"https:\/\/magenta.tensorflow.org\/music-transformer\" target=\"_blank\" rel=\"noreferrer noopener\">music generation<\/a>. 
Future models will certainly make the most of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Transfer_learning#:~:text=Transfer%20learning%20(TL)%20is%20a,when%20trying%20to%20recognize%20trucks.\" target=\"_blank\" rel=\"noreferrer noopener\">transfer learning<\/a> and <a href=\"https:\/\/lilianweng.github.io\/lil-log\/2019\/11\/10\/self-supervised-learning.html\" target=\"_blank\" rel=\"noreferrer noopener\">self-supervised learning<\/a>, using the large amount of text data available, and set new benchmarks.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>From the rule-based systems to deep learning-powered applications, the field of Natural Language Processing (NLP) has significantly advanced over the last several years. Despite these strides, there\u2019s still more work to do: to date, the issues of NLP are not fully solved and the amount of research in this discipline is skyrocketing. 
In this article, [&hellip;]<\/p>\n","protected":false},"author":100,"featured_media":18993,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_eb_attr":"","_eb_data_table":"","footnotes":""},"categories":[67],"tags":[],"marketing_tags":[],"class_list":{"0":"post-14624","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-data-science"},"acf":[],"_links":{"self":[{"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/posts\/14624"}],"collection":[{"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/users\/100"}],"replies":[{"embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/comments?post=14624"}],"version-history":[{"count":4,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/posts\/14624\/revisions"}],"predecessor-version":[{"id":47204,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/posts\/14624\/revisions\/47204"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/media\/18993"}],"wp:attachment":[{"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/media?parent=14624"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/categories?post=14624"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/tags?post=14624"},{"taxonomy":"marketing_tags","embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/marketing_tags?post=14624"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}