What’s next for NLP?
By Sean Sodha | 4 minute read | December 20, 2019
In my posts over the past few months, I’ve explained what NLP is, why it’s the backbone of AI integration in industries today, and how NLP can be put to work inside organizations. As we close out 2019 and look ahead to 2020, I want to address the future of NLP: where I see it headed, and how these new advancements can help organizations achieve success.
Growth and organization in data pipelines
Before textual data can serve as an input to a machine learning model, it must be refined through preprocessing and grooming. This is where a data pipeline and NLP work in harmony. The concept of a data pipeline is fairly simple to understand, yet it can be very difficult to implement within an organization. A pipeline represents the stages through which data flows; one stage’s output can feed into one or several other stages’ inputs. At each stage, an organization can ingest, clean, groom, extract, or otherwise manipulate data to serve a particular purpose in the process. Several years ago, and even at times today, data pipelines were disorganized, suffered severe latency, drew from messy datasets, and were generally poorly structured. Now that enterprises are cognizant of this hazard, preventive measures are being taken and more robust pipelines are being built to ensure that AI can be used as an effective tool to automate processes, extract key information quickly, and predict previously unknown conditions.
Given that data owners are now more aware than ever of the complexities of big, messy databases, they are looking to solidify and clean up their data pipelines to ensure they can handle most types of data. This is where NLP comes into play most effectively. When data pipelines deliver clean textual data, NLP accuracy increases significantly. With the influx of blog posts, articles, tweets, documents, and more, NLP tools and techniques are increasingly integrated into organizations’ data pipelines to make sense of all this content. With improved data pipelines and mass production of content comes the proliferation of applications for NLP technologies.
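As a rough illustration of the idea, a text-preprocessing pipeline can be modeled as a chain of stages, each consuming the previous stage’s output. This is a minimal sketch, and the stage names and cleaning rules here are my own hypothetical choices, not any particular product’s design:

```python
import re
import string

def ingest(raw_docs):
    """Stage 1: keep only non-empty string records."""
    return [d for d in raw_docs if isinstance(d, str) and d.strip()]

def clean(docs):
    """Stage 2: lowercase, strip punctuation, collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return [re.sub(r"\s+", " ", d.translate(table)).lower().strip()
            for d in docs]

def tokenize(docs):
    """Stage 3: split each cleaned document into word tokens."""
    return [d.split() for d in docs]

def pipeline(data, stages):
    """Feed each stage's output into the next stage's input."""
    for stage in stages:
        data = stage(data)
    return data

tokens = pipeline(["  NLP is the backbone of AI!  ", "", 42],
                  [ingest, clean, tokenize])
print(tokens)  # [['nlp', 'is', 'the', 'backbone', 'of', 'ai']]
```

In a real deployment each stage would typically be a separate service or job, but the principle is the same: each stage has one narrow responsibility, so a messy source can be repaired step by step before it ever reaches a model.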
Improvements in NLP capabilities
Not only is the amount of data increasing, but the accuracy and scalability of NLP models are increasing as well. Over the course of the next few years, I predict the following advancements in NLP:
- Implementation of BERT: Bidirectional Encoder Representations from Transformers (BERT) is an open source pre-training technique for language models. The bidirectional aspect means a BERT model does not read text only left to right, word by word; it ingests all the words in a sequence at once, allowing it to capture the full context of the sentence. BERT is simple to apply, and developers can fine-tune the pre-trained models for a specific task within their organization to get faster and more accurate results. This is similar to how we understand a sentence: we don’t simply process one word at a time, individually; rather, we simultaneously and subconsciously read ahead to better understand the broader context.
- Summarization and inferencing: A common request I get from customers is to build a model that extracts the key information from a passage, paragraph, or document and presents a summary. What’s more, users want to ask a question and have the model infer the answer. Chatbots, powered by NLP, are headed down this path.
- Sentiment & emotion detection accuracy: NLP accuracy for emotion and sentiment detection within a piece of text still needs to improve. Sentiment models are often incorrect when it comes to understanding nuanced text like tweets or op-eds. Further, the range of emotions expressed in a piece of text – like joy, anger, sadness, hesitation, confidence, and indifference – can be extremely hard to identify with a high degree of confidence. Several experts suggest using a custom NLP model to help detect these nuances.
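To make the summarization idea above concrete, here is a deliberately naive sketch of extractive summarization: score each sentence by the average frequency of its words across the document, then keep the top-scoring sentences. This word-frequency heuristic is my own illustration; production summarizers and chatbots use far more sophisticated (typically neural) approaches:

```python
import re
from collections import Counter

def summarize(text, n=1):
    """Return the n highest-scoring sentences as a crude extractive summary."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        # Average document-wide frequency of the sentence's words.
        toks = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in toks) / max(len(toks), 1)

    top = sorted(sentences, key=score, reverse=True)[:n]
    # Return the selected sentences in their original order.
    return [s for s in sentences if s in top]

doc = ("NLP models read text. NLP models summarize text. "
       "The weather was nice today.")
summary = summarize(doc)
```

Because the first two sentences repeat the document’s most frequent words, one of them is selected as the summary while the off-topic weather sentence is dropped. The gap between this toy heuristic and a genuinely useful summary is exactly the gap the advances above need to close.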
Bringing it all together
I predict that speech to text, visual recognition, and text analytics will come together as one collective unit under the NLP umbrella. These three fields of AI are on a massive upswing and will eventually serve as a singular function. With the chatbot industry expected to grow past $10 billion by 2026, I expect NLP to play a major role in the AI landscape. While all of these aspects of AI are currently far from perfect, the models are improving, and we may soon find ourselves working alongside a cognitive, holistic AI platform.
According to a study by Tractica, the market opportunity for NLP is expected to grow to $22.3 billion by the end of 2025. This growth in NLP will be fueled by exceptional computational power, scalability, and the massive shift towards digitization.
NLP has been essential to today’s text analytics platforms, and it will continue to grow as ever more data is created every second. However, NLP’s outcomes will only be as good as the data pipelines built underneath to support the models for greater training, detection, summarization, and accuracy.
Have a happy new year, and talk to you soon.