The HTML5 Web Speech API allows us to incorporate speech recognition and synthesis into our web applications. Currently it is only fully supported by Chrome (Desktop and Android) and, behind a flag, by Firefox, but it works really well. Try it out:
Whether the input comes from voice recognition or written text (like messages for a chat bot), once we have a sentence as a string, how do we get our application to make sense of it? This is where Natural Language Processing comes in:
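A minimal sketch of how the recognition side might be wired up. The helper name `createRecognizer` and its `scope` parameter are made up for this illustration; `scope` would normally be `window`, and passing it in only keeps the snippet loadable outside a browser. The `webkitSpeechRecognition` fallback covers browsers that still ship the prefixed constructor.

```javascript
// Returns a configured recognizer, or null when the API is unavailable.
// `createRecognizer` and `scope` are hypothetical names for this sketch.
function createRecognizer(scope, onSentence) {
  const Recognition = scope.SpeechRecognition || scope.webkitSpeechRecognition;
  if (!Recognition) return null;

  const recognizer = new Recognition();
  recognizer.lang = 'en-US';          // assumed language, adjust as needed
  recognizer.interimResults = false;  // only report final results

  recognizer.onresult = (event) => {
    // The first alternative of the first result is the most likely transcript.
    onSentence(event.results[0][0].transcript);
  };
  return recognizer;
}

// Browser usage (the '#mic' button is an assumption):
// const recognizer = createRecognizer(window, (sentence) => console.log(sentence));
// document.querySelector('#mic').addEventListener('click', () => recognizer && recognizer.start());
```

Starting recognition from a click handler gives the browser a user gesture to attach the microphone permission prompt to.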
Natural language processing (NLP) is a field of computer science, artificial intelligence and computational linguistics concerned with the interactions between computers and human (natural) languages [wiki]
One solution that comes to mind is Regular Expressions, but with something as fluid as spoken language they are brittle and quickly get out of hand. How do we account for slightly different sentences? Different languages? Different word forms? We could also use third-party services like Watson, Wit.ai or API.ai, which somehow make sense of the sentences we send to their APIs.
What we need for this article can be a little more basic than that, though. Brain.js implements a simple feed-forward neural network with an easy-to-use API. You can find the unmaintained original version at harthur/brain and a more active fork at brainjs/brain.js.
The trick of (supervised) machine learning is to turn each piece of training data into an array of numbers between 0 and 1 (called a “feature”), with every array having the same length, and to assign a label to it. The following example shows a Brain.js neural network trained on RGB colours to determine if the best contrast colour is light or dark:
The result might be slightly different every time you run this code example, but it will always gravitate towards a light colour with a very high probability (around 96-99%).
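To make the "numbers between 0 and 1 plus a label" idea concrete, here is a small sketch of how RGB colours could be prepared as training data. The sample colours and their labels are invented for illustration, and `rgbToFeature` is a hypothetical helper. The resulting `{ input, output }` records follow the shape that a Brain.js `NeuralNetwork` takes in `train()`; the network itself is left out so the snippet runs without any dependency.

```javascript
// Scale 8-bit channel values (0-255) into the 0-1 range.
function rgbToFeature([r, g, b]) {
  return [r / 255, g / 255, b / 255];
}

// Hypothetical hand-labelled backgrounds: which text colour contrasts best?
const samples = [
  { rgb: [255, 255, 255], label: 'dark' },  // white background
  { rgb: [255, 230, 0], label: 'dark' },    // yellow background
  { rgb: [0, 0, 0], label: 'light' },       // black background
  { rgb: [20, 30, 90], label: 'light' },    // navy background
];

// One record per sample; the label set to 1 is the expected class.
const trainingData = samples.map(({ rgb, label }) => ({
  input: rgbToFeature(rgb),
  output: { [label]: 1 },
}));

console.log(trainingData[0]);
// → { input: [ 1, 1, 1 ], output: { dark: 1 } }
```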
The idea of natural language classification is that, given a labeled set of training sentences, we want to get the most likely label for a new (unknown) sentence. While the labels can be anything, common examples are sentiment analysis (is this a positive or negative statement?), topic classification (does this article talk about technology or cooking?) or determining actions (e.g. look up the weather or open a website).
To get our sentences into a machine-learning-friendly format (equal-length arrays of numbers between 0 and 1) we have to go through six steps. Each step has a Codepen showing a live example once you run it. You can either type a sentence and confirm by hitting enter, or speak by clicking the microphone.
The first step is to split the sentence into an array of its individual words, lowercasing them along the way. This is where Regular Expressions are perfect. A sentence like “Hello world, how are you” will turn into
['hello', 'world', 'how', 'are', 'you']
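One possible way to write such a tokenizer is sketched below. The function name and the exact pattern are assumptions; in particular, this pattern only keeps ASCII letters, digits and apostrophes, so it would need adjusting for languages with accented or non-Latin characters.

```javascript
// Lowercase the sentence, then keep runs of letters, digits and apostrophes.
// Punctuation and whitespace act as separators and are dropped.
function tokenize(sentence) {
  return sentence.toLowerCase().match(/[a-z0-9']+/g) || [];
}

console.log(tokenize('Hello world, how are you'));
// → [ 'hello', 'world', 'how', 'are', 'you' ]
```

The `|| []` fallback matters because `match` returns `null` when nothing matches, for example on an empty string.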
Once the sentence is tokenized, the next step is to reduce each token to its word stem, the base form of the word. There are various stemming algorithms for different languages. The example below uses the Porter stemming algorithm provided by NaturalNode/natural.
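To show the idea of stemming without pulling in a dependency, here is a deliberately crude suffix stripper. It is not the Porter algorithm and will get many words wrong; the suffix list and the minimum stem length are arbitrary choices made for this sketch. With the natural library installed, the equivalent call would be `natural.PorterStemmer.stem(token)`.

```javascript
// NOT the Porter algorithm: a naive suffix stripper, for illustration only.
const SUFFIXES = ['ing', 'ed', 'es', 's'];

function crudeStem(token) {
  for (const suffix of SUFFIXES) {
    // Only strip when at least three characters would remain.
    if (token.endsWith(suffix) && token.length - suffix.length >= 3) {
      return token.slice(0, -suffix.length);
    }
  }
  return token;
}

console.log(['opening', 'opened', 'opens', 'open'].map(crudeStem));
// → [ 'open', 'open', 'open', 'open' ]
```

The point is that different word forms collapse into a single token, so the network later sees "opening" and "opened" as the same input.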
Another common step in natural language processing at this point is to remove so-called “stop words”. Stop words are commonly used words, such as “is” or “the”, that often don’t matter when classifying or searching text.
Sometimes this is not what we want, though. For example, the sentence “Are you there?” with stop words removed will result in an empty array. The beauty of neural networks is that they are much more flexible in learning dynamically which parts of the input data to pay attention to, so in our case we don’t have to remove stop words.
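The empty-array problem is easy to reproduce. The stop word list below is a tiny made-up sample (real lists are much longer), and `removeStopWords` is a hypothetical helper:

```javascript
// Tiny illustrative stop word list, not taken from any particular library.
const STOP_WORDS = new Set(['a', 'an', 'the', 'is', 'are', 'you', 'there', 'to', 'of']);

function removeStopWords(tokens) {
  return tokens.filter((token) => !STOP_WORDS.has(token));
}

console.log(removeStopWords(['open', 'the', 'weather', 'forecast']));
// → [ 'open', 'weather', 'forecast' ]

console.log(removeStopWords(['are', 'you', 'there']));
// → []
```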
Now that we have the individual word stems, we look at all our input sentences, combine all tokens into a single list and remove any duplicates. Add a new sentence to see how the combined token list changes:
Remember how our goal was to turn our input data into equal-length arrays of numbers between 0 and 1? The combined token list is the key here. For every input sentence we can now go through the combined token list and map each of its tokens to a 1 if the token exists in the sentence or to a 0 if it doesn’t.
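A short sketch of building that combined list, assuming each sentence has already been turned into an array of tokens. The function name `buildVocabulary` and the sample sentences are made up for illustration.

```javascript
// Flatten all token arrays into one list; a Set drops duplicates
// while keeping the order in which tokens were first seen.
function buildVocabulary(tokenizedSentences) {
  return [...new Set(tokenizedSentences.flat())];
}

const vocabulary = buildVocabulary([
  ['how', 'is', 'the', 'weather'],
  ['open', 'the', 'browser'],
]);

console.log(vocabulary);
// → [ 'how', 'is', 'the', 'weather', 'open', 'browser' ]
```

Keeping a stable order matters, because each position in this list later becomes one position in every feature array.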
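This mapping can be sketched in a few lines. The vocabulary below is a made-up example and `toFeature` is a hypothetical helper name.

```javascript
// For each vocabulary entry, emit 1 if the sentence contains it, else 0.
// Tokens that are not part of the vocabulary are simply ignored.
function toFeature(vocabulary, tokens) {
  const present = new Set(tokens);
  return vocabulary.map((token) => (present.has(token) ? 1 : 0));
}

const vocabulary = ['how', 'is', 'the', 'weather', 'open', 'browser'];

console.log(toFeature(vocabulary, ['open', 'the', 'weather']));
// → [ 0, 0, 1, 1, 1, 0 ]
```

Every sentence produces an array exactly as long as the vocabulary, no matter how many words the sentence itself has, which is what gives us equal-length input.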
Now that we have a feature for each sentence, we can assign a label to it and train the neural network.
In order to classify a new (unknown) sentence, we featurize it through the same process and use the resulting feature as the classification input, which will return the probabilities for each label.
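To illustrate the train-then-classify flow without a dependency, the sketch below swaps the neural network for something much simpler: one logistic unit per label, trained with plain gradient descent. This is a stand-in, not what Brain.js does internally, and the function names, learning rate and epoch count are arbitrary assumptions. With Brain.js the two steps would be `net.train(...)` followed by `net.run(feature)`.

```javascript
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// Stand-in for a neural network: one logistic unit per label.
function train(samples, { epochs = 2000, rate = 0.5 } = {}) {
  const labels = [...new Set(samples.map((sample) => sample.label))];
  const size = samples[0].input.length;
  const model = {};
  for (const label of labels) {
    const weights = new Array(size).fill(0);
    let bias = 0;
    for (let epoch = 0; epoch < epochs; epoch++) {
      for (const { input, label: actual } of samples) {
        const z = input.reduce((sum, x, i) => sum + x * weights[i], bias);
        // Target is 1 for the unit's own label and 0 for every other label.
        const error = (actual === label ? 1 : 0) - sigmoid(z);
        input.forEach((x, i) => { weights[i] += rate * error * x; });
        bias += rate * error;
      }
    }
    model[label] = { weights, bias };
  }
  return model;
}

// Returns a score between 0 and 1 for every label.
function classify(model, input) {
  const scores = {};
  for (const [label, { weights, bias }] of Object.entries(model)) {
    scores[label] = sigmoid(input.reduce((sum, x, i) => sum + x * weights[i], bias));
  }
  return scores;
}

// Features over the vocabulary ['how', 'is', 'the', 'weather', 'open', 'browser'].
const model = train([
  { input: [1, 1, 1, 1, 0, 0], label: 'weather' },  // "how is the weather"
  { input: [0, 0, 1, 0, 1, 1], label: 'browser' },  // "open the browser"
]);

// New sentence "is the weather": the 'weather' score should come out higher.
const scores = classify(model, [0, 1, 1, 1, 0, 0]);
console.log(scores);
```

The word "the" appears in both training sentences, so it carries no signal, while "is" and "weather" only ever appeared with the weather label and pull the new sentence towards it.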
Even with just two or three training sentences we already get some interesting results, but the real power of machine learning comes with much larger datasets. Some interesting open datasets can be found at caesar0301/awesome-public-datasets, and we will hopefully see more being shared in the future.