Speech recognition becomes “almost human”


Microsoft has reached an important milestone in speech recognition, creating a technology that recognizes the words in a conversation with accuracy similar to that of humans. The system, developed by researchers and engineers in Microsoft's Artificial Intelligence and Research group, achieved an error rate of 5.9%, comparable to that of a professional transcriber.
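
To make the 5.9% figure concrete: error rates in speech recognition are usually reported as word error rate, that is, the number of word substitutions, deletions and insertions needed to turn the system's transcript into a reference transcript, divided by the number of words in the reference. The article does not say how Microsoft computed its figure, so the Python sketch below only illustrates that standard metric; the function name `word_error_rate` and the sample sentences are invented for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative sketch only; not Microsoft's evaluation code.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# Hypothetical transcripts: one wrong word out of six.
rate = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"{rate:.1%}")
```

By this measure, an error rate of 5.9% means that roughly 6 words in every 100 are transcribed incorrectly.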

For the Redmond company it is a historic achievement. For the first time, a computer can recognize the words in a conversation as well as a flesh-and-blood person. The first studies on speech recognition technologies date back to the early ’70s, when DARPA started research in this area in the interests of national security. In the following years several companies took up the challenge, and now Microsoft has beaten the competition, building a system that will be used in numerous products and services, including Xbox and Cortana.

This result was achieved through the massive use of convolutional and LSTM (long short-term memory) neural networks. Microsoft relied on its Computational Network Toolkit for deep learning, which the research team has released publicly on GitHub under an open source license. Neural network training was sped up by using a series of computers with dedicated GPUs.

“Human parity” does not mean that the system recognizes every word perfectly. Researchers will in fact go on to test the technology under real conditions, i.e. in the presence of noise. The ultimate goal is a system able to understand the words that people speak. Microsoft emphasizes, however, that true artificial intelligence is still a long way off.

