Language is what distinguishes human beings from other species. Though intelligent creatures such as dolphins communicate with sounds, only human beings command the full richness of language. From just a couple dozen letters, tens of thousands of words can be formed, expressing an unlimited number of thoughts.
For a long time, scientists have dreamed of machines that can listen and talk just like humans. Speech recognition technology was developed years ago and now ships with most PCs and mobile search devices. Yet only a small number of people use it, perhaps because most never bother to try it, assuming that computers simply don't understand human language. Speech recognition is a hard problem that has challenged computer scientists, linguists, and mathematicians alike. Let's take a closer look at how voice search works.
Whenever a human being speaks, small packets of sound known as "phones" are generated by the voice, and they correspond to groups of letters in words. For example, speaking the word cats produces the phones "c," "a," "t," and "s." A related concept is the phoneme, the basic building block of sound from which all words are constructed. The difference is that phones are real bits of sound that are actually spoken, while phonemes are idealized bits of sound stored in the mind; as such, they are sound fragments that are never actually spoken.
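One way speech systems represent this idealized layer is a pronouncing dictionary that maps each written word to its phoneme sequence. The sketch below is a toy illustration in that spirit; the symbols follow the ARPAbet convention used by the CMU Pronouncing Dictionary, but the tiny lexicon itself is invented for the example.

```python
# Toy pronouncing dictionary: each word maps to its idealized
# phoneme sequence (ARPAbet-style symbols, illustrative only).
LEXICON = {
    "cats": ["K", "AE", "T", "S"],
    "dog":  ["D", "AO", "G"],
    "the":  ["DH", "AH"],
}

def phonemes_for(word):
    """Look up the phoneme sequence for a word, or None if unknown."""
    return LEXICON.get(word.lower())

print(phonemes_for("cats"))  # ['K', 'AE', 'T', 'S']
```

A real recognizer works in the opposite direction as well: given a sequence of phones detected in the audio, it searches such a lexicon for the words most likely to have produced them.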
When humans listen to speech, the ears catch phones flying through the air and the brain flips them back into words, sentences, thoughts, and ideas. This happens so quickly that listening feels effortless, almost like a magic trick. Computers and mobile search software work with both phonemes and phones, but it is the real bits of sound that are analyzed and processed to recognize a voice. Speech recognition is one of the more complex fields of computer science because it sits at the intersection of computing, mathematics, and linguistics.
There are four main approaches a computer can use to turn a spoken word into a written word during voice search. The first is simple matching: a word is recognized in its entirety by comparing it to similar sounds stored in memory. The second is pattern and feature analysis: each word is broken into pieces that are recognized by key features, such as the vowels it contains. The third is language modeling and statistical analysis, in which knowledge of grammar and of how likely certain sounds or words are to follow one another is used to improve the accuracy and speed of recognition. The fourth approach employs artificial neural networks, brain-like computer models that learn to recognize patterns of sounds and words after extensive training.
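The third approach can be sketched concretely. Suppose the acoustic analysis finds two candidate transcriptions that sound almost identical; a language model scores which word sequence is more probable and breaks the tie. The bigram probabilities below are hand-set purely for illustration, not taken from any real model.

```python
import math

# Toy bigram language model: P(next word | previous word).
# These probabilities are invented for the example.
BIGRAMS = {
    ("recognize", "speech"): 0.01,
    ("wreck", "a"): 0.002,
    ("a", "nice"): 0.005,
    ("nice", "beach"): 0.001,
}

def sentence_log_prob(words, floor=1e-6):
    """Score a word sequence by summing the log-probability of
    each successive word pair (unseen pairs get a small floor)."""
    return sum(math.log(BIGRAMS.get(pair, floor))
               for pair in zip(words, words[1:]))

# Two acoustically similar candidates; the language model
# prefers the sequence whose word pairs are more likely.
c1 = ["recognize", "speech"]
c2 = ["wreck", "a", "nice", "beach"]
best = max([c1, c2], key=sentence_log_prob)
print(" ".join(best))  # recognize speech
```

A production system would combine this language-model score with the acoustic score (and normalize for sentence length), but the principle is the same: likely word sequences win over unlikely ones.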
Today, the Google voice search app listens when a person speaks, figures out what they mean, and then tries to find what they asked for on the internet. The app works by linking speech recognition to a complex natural language processing system. It has to figure out not only what a person said, but what was meant and what the person wants to happen as a result. Mobile search devices now come with the Google voice search app built in, so you can Google without the keyboard.
Back in 2012, voice search took a new turn by adopting Deep Neural Networks (DNNs) as the core technology for modeling the sounds of a language. A DNN assesses the sound a user produces at each instant in time, which significantly improved the accuracy of speech recognition. Current voice recognition uses even better neural network acoustic models that employ sequence-discriminative training techniques and Connectionist Temporal Classification (CTC). These models form an extension of recurrent neural networks that makes them faster and more accurate, even in noisy environments.
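The CTC idea can be illustrated with its decoding rule. A CTC acoustic model emits one label per short audio frame, including a special "blank" symbol; the output text is recovered by merging repeated labels and dropping the blanks. The sketch below shows only that collapsing step, with a made-up frame sequence standing in for real model output.

```python
def ctc_collapse(frames, blank="_"):
    """Collapse a per-frame CTC label sequence into text:
    merge consecutive repeats, then drop the blank symbol."""
    out = []
    prev = None
    for label in frames:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return "".join(out)

# One label per ~10 ms audio frame, as an acoustic model might emit:
print(ctc_collapse(list("_cc_aaa_t_")))  # cat
```

The blank symbol is what lets CTC represent genuinely doubled letters: a blank between two runs of "l" keeps them from being merged, so `ctc_collapse(list("hh_ee_ll_ll_oo"))` yields "hello".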