Uncertain Substance: The Viterbi Algorithm


When I interviewed Tom Keene in the studios of Resonance FM a couple of months ago, he told me, approximately 44 seconds before the end of the programme, that he was working on a speech recognition algorithm that searches radio waves for conversations about money.

The work, called Uncertain Substance, investigates the Viterbi algorithm, which was devised by Andrew Viterbi in 1966 as an error-correction scheme for noisy digital communication. Its use has since been extended to many digital technologies: speech recognition, satellite, DNA analysis, video encryption, deep space, wireless communications systems, etc. Physical manifestations of this algorithm exist as microchips installed in mobile devices, enabling communications networks to permeate every conceivable space and blurring the distinction between home, work and social environments.
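For readers curious about what the algorithm actually computes, here is a minimal sketch in Python. Viterbi's original setting was decoding convolutional codes; this sketch shows the same dynamic-programming idea in the hidden Markov model form used in speech recognition. Given a sequence of noisy observations, it finds the single most likely sequence of hidden states. The two-state model (`speech` versus `noise`) and all of its probabilities are made up purely for illustration and are not taken from Tom's project.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log probability, most likely hidden-state path).

    Assumes every probability in the model is non-zero (math.log(0) raises).
    """
    # best[t][s]: log-probability of the best path ending in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor that maximises the score of a path into s
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Choose the best final state, then follow back-pointers to recover the path
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path

# Hypothetical toy model: is the radio carrying speech or noise?
STATES = ("speech", "noise")
START = {"speech": 0.6, "noise": 0.4}
TRANS = {"speech": {"speech": 0.8, "noise": 0.2},
         "noise": {"speech": 0.3, "noise": 0.7}}
EMIT = {"speech": {"voiced": 0.7, "hiss": 0.3},
        "noise": {"voiced": 0.2, "hiss": 0.8}}

logp, path = viterbi(["voiced", "voiced", "hiss", "hiss"], STATES, START, TRANS, EMIT)
print(path)  # ['speech', 'speech', 'noise', 'noise']
```

Working in log-probabilities avoids numerical underflow on long sequences, and the back-pointers are what let the algorithm recover the whole best path rather than only its final state.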

Tom’s interest in the algorithm isn’t purely motivated by a passion for programming; his project also looks into the social effects of its application and implementation:

Used to identify patterns and trends of human behaviour, the Viterbi plays a role in automated systems that interpret, record and report on human activity. These systems increasingly make economic decisions, govern responses to crime, disaster and health, and manage the everyday flow of cities. The Viterbi operates at a deep social level as it constructs new sets of social relations and radically shapes the development of our cities.

Uncertain Substance: The Viterbi Algorithm was shown recently at Forking Bits, the graduation show of the MA Interactive Media: Critical Theory and Practice at Goldsmiths in London. I was out of town that week, so I decided to do yet another interview with Tom:

Hi Tom! You developed a speech recognition algorithm that searches radio waves for conversations about money. How does the search manifest itself? What happens? Did you test the system? Where, and what were the results?

I tested two versions of the system: one as an installation in an old porters’ office at Goldsmiths University, the other as a mobile version built into a shopping trolley, which I tested at Moving Forrest at Chelsea College of Art. The porters’ office version displayed two very dull-looking computers. One was a speech recognition server (SRS) built around the open source project CMUsphinx; the other was a software defined radio server (SDRS) built around a hacked £10 USB TV tuner. The SRS listened to the audio output of the SDRS, and if it detected speech it would stay on that radio station in the hope of finding a keyword from a list (Money, Credit, Debt, Thousand, Billion, Trillion, etc.). If it didn’t find any of these words within 20 seconds, it would trigger the SDRS to find another station, where it would begin the process again.
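The listening loop described above can be sketched roughly as follows. This is a hypothetical simulation, not Tom's code: it uses neither CMUsphinx nor a real tuner. Each station is represented by pre-transcribed chunks tagged with the seconds elapsed since tuning in, with `None` standing for audio in which no speech was detected. The function name `scan`, the data layout, the example stations and the choice to retune after the first keyword hit are all assumptions; the interview doesn't say what the system did after finding a match.

```python
import re

# Keyword list and timeout as described in the interview
KEYWORDS = {"money", "credit", "debt", "thousand", "billion", "trillion"}
TIMEOUT_S = 20.0

def scan(stations, timeout=TIMEOUT_S):
    """Hypothetical simulation of the SRS/SDRS loop.

    stations: list of (frequency_mhz, chunks), where chunks is a list of
    (seconds_since_tuned, transcript or None if no speech was detected).
    Returns a list of (frequency_mhz, matched_keywords, transcript).
    """
    found = []
    for freq, chunks in stations:      # each new station = the SDRS retuning
        for elapsed, text in chunks:
            if text is None:
                break                  # no speech detected: move to the next station
            if elapsed > timeout:
                break                  # no keyword within the timeout: retune
            words = set(re.findall(r"[a-z]+", text.lower()))
            hits = KEYWORDS & words
            if hits:
                found.append((freq, sorted(hits), text))
                break                  # assumption: retune after the first hit
    return found

# Made-up example stations
STATIONS = [
    (88.8, [(2.0, None)]),                               # music or noise only
    (93.5, [(3.0, "the weather in london"),
            (12.0, "traffic is heavy"),
            (21.0, "money money")]),                     # keyword arrives too late
    (104.4, [(4.0, "Debt levels hit a trillion, says minister")]),
]

print(scan(STATIONS))
# [(104.4, ['debt', 'trillion'], 'Debt levels hit a trillion, says minister')]
```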

The porters’ office added its own narrative, which I discovered while cleaning it out and getting rid of years of grime and dumped objects. It recorded a pretty depressing history: there were old letters of redundancy, a broken pair of spectacles, betting slips and a small screen marked “payroll”. I incorporated these elements in the space as a subtle way of illustrating the entanglement of algorithms with everyday lives and other media systems, where algorithmic reporting and profiling inform and influence our decision-making processes. Even though these outputs haven’t necessarily been planned or programmed, the technology is exerting its own power, and it’s that mechanism that I want to understand.

Can the person monitoring the algorithm actually understand the conversation?

If by that you mean, did the system do a good job of translation? No – it’s terrible at translating radio! Speech recognition is a very tricky thing to do well and this sort of system is much better suited to recognising a few keywords spoken by a voice it has been specifically trained to listen to. Though in this instance I wasn’t particularly fussed about the quality of the speech recognition, I really enjoyed (along with the audience) watching the weird sentences that were being produced as the result of a mathematical model.

What I really wanted to achieve was for the audience to engage with the operations of the algorithm, the decisions it made and how it entangles itself with other media and social systems. To achieve this I attempted to display its inner workings as much as possible.

The SDRS displayed the radio frequency it was tuned to, and you could hear the audio as it shifted between pure noise, music and speech. On a second screen the SRS displayed a rolling log of transcribed audio, found keywords, how long it had been listening, and feedback indicating whether it was bored or couldn’t understand the conversation. A point-of-sale receipt printer generated a physical paper trail, printing any texts about money as and when they were found. I also managed to rig up a CCTV screen that displayed the current radio frequency and time, and which also announced “found money” in a digital voice, through an on-board speaker, whenever a money conversation had been found. So this relatively simple set-up incorporated multiple media systems (radio, paper, CCTV, work life, finance, computer networks), each touched by the Viterbi algorithm in some way.

Wasn’t the public upset by the idea that an algorithm was looking for their financial conversations?

Not at all! More amusement than anything else. The most common question was: is it going to make you rich? That gave me an opportunity to talk about the wider issues of the project, and the fact that it can be very difficult to understand the effects of the Viterbi algorithm, as it cannot be separated from the complex social (human and technical) layers in which it is embedded. This project has been about building a series of contraptions as a means to reveal the effect and influence of the Viterbi. The speech recognition project has been just one of those exercises; others have looked at its use in mobile technologies and its path-finding capabilities.

If I understood the description of your project correctly, many crucial governmental and economic decisions depend on the algorithm’s interpretation of human activity. One would expect an algorithm to be reliable and rational, so surely we should be reassured that our fate is in such capable ‘hands’, right?

Just to be clear, I’m not saying that the Viterbi is used by Government and that’s a worrying thing; I’m not attempting to make a value judgement here. There are many examples of algorithms being used in governmental and non-governmental decision-making processes, with both positive and negative effects. I am just attempting to illustrate the social effect of an algorithm used in a multitude of systems, where the power of those systems is not held by any single political party or economic system, but is dissipated and exerted by the system itself. The power exerted by these systems has the potential to influence city planning decisions, or to discipline people’s behaviour at a micro level in their day-to-day lives. That social effect doesn’t occur because it has been programmed that way, or because government investment has created an infrastructure that facilitates greater control of the population; rather, new social phenomena are produced in a messy, unstructured, chaotic way outside of human control. As for these algorithms being rational: at the level of mathematics and science they are, but at the level of their actual real-world social effect, they most certainly are not.

Thank you Tom!