
Project Sinos: The Self Learning Leveler

This article builds on my earlier series of articles about Project Sinos.

Have you ever thought about how you understand speech? How do you manage to pick out the words from all the noise around you when someone speaks to you? Almost nobody thinks about it. But it becomes essential if you want to recognize speech electronically.

Frameworks like CMU Sphinx can take over the recognition of words and the formation of meaningful sentences. But continuously analyzing every sound in the environment gives miserable results. This is why smart speakers usually need to detect a Hot Word like "Alexa" or "Hey, Google" first. Cloud-based smart speakers would also produce a huge amount of network traffic if they permanently sent every sound to their decoding service. Both points show the need for hot words.

Well, how do you do that exactly?


Hot Word Detection

The simplest part is detecting the Hot Word itself. You just need to check whether the (already decoded) input sentence starts with the Hot Word. But as you can see, the input has to be decoded first. This works if you decode continuously in an environment without any background noise.
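A minimal sketch of the check itself, assuming the decoder has already returned the sentence as a plain string. The hot word "hey sinos" and the helper names are hypothetical; the article does not say which hot word Sinos uses. The comparison is done word by word so that a longer word that merely begins like the hot word does not trigger it.

```python
import re

# Hypothetical hot word for illustration only.
HOT_WORD = "hey sinos"


def normalize(text: str) -> str:
    """Lower-case the text, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def extract_command(decoded_sentence: str, hot_word: str = HOT_WORD):
    """Return the command that follows the hot word, or None if the
    decoded sentence does not start with the hot word."""
    words = normalize(decoded_sentence).split()
    hot_words = normalize(hot_word).split()
    # Compare whole words so "hey sinosaurus" does not match "hey sinos".
    if words[:len(hot_words)] != hot_words:
        return None
    return " ".join(words[len(hot_words):])


print(extract_command("Hey, Sinos! Play some music"))  # play some music
print(extract_command("play some music"))              # None
```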


When To Listen?

Before decoding, we need to detect when there is anything worth decoding. Decoding every filled frame you get from the microphone means constantly starting a blocking task, which prevents you from recording and receiving further sound frames. In the worst case, you only ever get the first full frame of a command. So you need to record into a buffer until there are no "loud" frames any more. And of course, you only start buffering once the frames become "loud".
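The buffering logic can be sketched as a small state machine, assuming each frame is a sequence of 16-bit PCM samples and that RMS (root mean square) is used as the loudness measure. The parameter `max_quiet_frames` is a hypothetical "hangover" so that short pauses between words do not cut a command in half; the actual values Sinos uses are not given in the article.

```python
import math
from typing import Iterable, Iterator, List, Sequence

Frame = Sequence[int]  # one frame of 16-bit PCM samples


def rms(frame: Frame) -> float:
    """Root mean square of a frame, used here as a simple loudness measure."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def segment_utterances(frames: Iterable[Frame], threshold: float,
                       max_quiet_frames: int = 3) -> Iterator[List[Frame]]:
    """Buffer frames from the first loud frame until `max_quiet_frames`
    consecutive quiet frames follow, then yield the whole buffer."""
    buffer: List[Frame] = []
    quiet = 0
    for frame in frames:
        loud = rms(frame) > threshold
        if not buffer:
            if loud:  # only start buffering on a loud frame
                buffer.append(frame)
            continue
        buffer.append(frame)
        quiet = 0 if loud else quiet + 1
        if quiet >= max_quiet_frames:
            yield buffer  # one complete utterance, ready for the decoder
            buffer, quiet = [], 0
    if buffer:  # flush a pending utterance at the end of the stream
        yield buffer


silent = [0] * 160
loud = [1000, -1000] * 80
stream = [silent, loud, loud, silent, silent, silent, silent, loud, silent]
print([len(u) for u in segment_utterances(stream, threshold=100)])  # [5, 2]
```

The decoder is only called once per yielded buffer instead of once per frame. In a live setup you might additionally hand each buffer to a worker thread (for example via `queue.Queue`) so that the recording loop itself never blocks while decoding.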


Handling Background Noise

Another problem is background noise, because you never have an absolutely silent environment. Fortunately, CMU Sphinx can handle quite a bit of background noise itself. But the detection from the last paragraph, which fills the buffer, still cannot tell whether the recorded sound is background noise or the start of a voice command. And worse: how do you handle constant background noise?

Constant background noise can be treated as silence, as long as it is not too loud. If you listen to music at room volume, this is not a big problem for CMU Sphinx. When you start speaking a voice command, the loudness at the recording device increases (non-linearly). So you can simply add a threshold to the noise detection that has to be exceeded.
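One way to express such a threshold is in decibels relative to the measured background, sketched below under a few assumptions: 16-bit PCM frames, a short calibration recording taken while nobody speaks, and a hypothetical margin of 10 dB. Working on a logarithmic scale means the margin is a ratio of loudness rather than an absolute amplitude, so the same value works for a quiet and a moderately noisy room.

```python
import math
import statistics
from typing import Iterable, Sequence

FULL_SCALE = 32768.0  # maximum amplitude of 16-bit PCM
FLOOR_DB = -100.0     # level reported for digital silence (avoids log10(0))


def level_db(frame: Sequence[int]) -> float:
    """Frame loudness in dB relative to full scale (dBFS)."""
    if not frame:
        return FLOOR_DB
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20 * math.log10(rms / FULL_SCALE) if rms > 0 else FLOOR_DB


def calibrate_threshold(background: Iterable[Sequence[int]],
                        margin_db: float = 10.0) -> float:
    """Estimate the constant background level (median over frames recorded
    while nobody speaks) and place the speech threshold `margin_db` above it."""
    return statistics.median(level_db(f) for f in background) + margin_db


def is_speech(frame: Sequence[int], threshold_db: float) -> bool:
    """A frame counts as speech only if it exceeds the calibrated threshold."""
    return level_db(frame) > threshold_db


room = [[100, -100] * 80] * 5            # steady background noise
threshold = calibrate_threshold(room)    # about -40.3 dBFS
print(is_speech([1000, -1000] * 80, threshold))  # True  (about 20 dB above background)
print(is_speech([300, -300] * 80, threshold))    # False (only about 9.5 dB above)
```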

The next problem is rising background noise. Imagine the following scenario: You are sitting in your living room. A clock on the wall ticks, you breathe and move around. All of this causes background noise. You tell the smart speaker to play music, and the ambient loudness increases. So the threshold level for voice detection also needs to be raised.

Okay, so the threshold has been raised. Now you tell the smart speaker to stop the music again. As you can imagine, you then need to lower the threshold again.
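A threshold that follows the room in both directions can be sketched with asymmetric noise-floor tracking. This is a swapped-in, commonly used technique, not necessarily what Sinos does: the estimated floor falls quickly whenever a frame is quieter than it, but rises only slowly when frames are louder. A short voice command therefore barely moves the floor, while music that keeps playing for a few seconds lifts it until the music no longer counts as speech. The class name and the `rise`/`fall` rates are hypothetical; frame levels are assumed to be in dBFS, however you compute them.

```python
class AdaptiveThreshold:
    """Tracks the background level (dB) and keeps the speech threshold
    a fixed margin above it. All parameter values are illustrative."""

    def __init__(self, initial_floor_db: float = -60.0, margin_db: float = 10.0,
                 rise: float = 0.01, fall: float = 0.2) -> None:
        self.floor_db = initial_floor_db
        self.margin_db = margin_db
        self.rise = rise  # slow adaptation when the room gets louder
        self.fall = fall  # fast adaptation when the room gets quieter

    @property
    def threshold_db(self) -> float:
        return self.floor_db + self.margin_db

    def update(self, level_db: float) -> bool:
        """Classify one frame level as speech or not, then adapt the floor."""
        speech = level_db > self.threshold_db
        rate = self.rise if level_db > self.floor_db else self.fall
        # Exponential moving average towards the current level.
        self.floor_db += rate * (level_db - self.floor_db)
        return speech


t = AdaptiveThreshold()
print(t.update(-35.0))   # True: music just started and is still above threshold
for _ in range(500):     # music keeps playing, the floor creeps up
    t.update(-35.0)
print(t.update(-35.0))   # False: music is now treated as background
print(t.update(-20.0))   # True: a voice command over the music
for _ in range(60):      # music stopped, the floor drops back quickly
    t.update(-60.0)
print(t.update(-45.0))   # True: quiet speech is detected again
```

With one frame every few tens of milliseconds, a `rise` of 0.01 lets the threshold catch up with constant music within a few seconds; the right trade-off between reacting to new background noise and ignoring long commands would need tuning on real recordings.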
