Speech Processing Demonstration

Introduction

Speech is produced by a human in continuous time. To record speech on a computer, we need to choose the points in time at which we will measure the amplitude of the signal and decide how much computer memory to allocate for each measurement (called a sample). Using Fourier analysis, you will discover that most of the energy in a speech signal occurs at frequencies below 4000 Hertz (abbreviated Hz). One hertz is one cycle per second (i.e., one period of a sinusoid per second), the same unit of measure used on AM and FM radios: AM stations are given in kHz and FM stations in MHz. A signal must be sampled at a rate of at least twice its highest frequency to be represented faithfully, so by taking samples uniformly spaced every 1/8000 second (a sampling frequency of 8000 samples per second), one captures most of the information available in the speech signal.

How much memory to allocate per sample depends on the available computer resources, the speech quality one wants to maintain, and the resolution of the device that converts the analog signal to a digital signal (called an analog-to-digital converter). The amplitude of each speech sample is often represented with 8, 12, or 16 bits.
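The two choices described above, when to measure the signal and how many bits to spend on each measurement, can be illustrated with a short Python sketch. It samples a pure sinusoid (a stand-in for recorded speech) every 1/8000 second and then uniformly quantizes each sample to a chosen number of bits. The function names, the assumption that amplitudes lie in [-1, 1], and the uniform quantizer are choices made for this example rather than details from the original page.

```python
import math

FS = 8000  # sampling frequency from the text: 8000 samples per second


def sample_sinusoid(freq_hz, duration_s, fs=FS):
    """Measure x(t) = sin(2*pi*f*t) at the uniformly spaced times t = n / fs."""
    n_samples = round(duration_s * fs)
    return [math.sin(2 * math.pi * freq_hz * n / fs) for n in range(n_samples)]


def quantize(samples, bits):
    """Map amplitudes in [-1, 1] to signed integers that fit in `bits` bits.

    Assumes a simple symmetric, uniform mapping; real converters may differ.
    """
    max_code = 2 ** (bits - 1) - 1  # 127 for 8 bits, 2047 for 12, 32767 for 16
    return [round(max(-1.0, min(1.0, s)) * max_code) for s in samples]


def bytes_per_second(bits, fs=FS):
    """Memory needed to store one second of single-channel audio (no padding)."""
    return fs * bits / 8


if __name__ == "__main__":
    tone = sample_sinusoid(440, 0.01)  # 10 ms of a 440 Hz test tone
    print(len(tone))  # 80
    print(quantize(tone, 8)[:5])
    for bits in (8, 12, 16):
        print(bits, "bits:", bytes_per_second(bits), "bytes per second")
```

Under these assumptions, one second of speech takes 8000, 12000, or 16000 bytes at 8, 12, or 16 bits per sample, if samples are packed with no padding. The same code also shows why content above 4000 Hz is lost at this rate: sampling a 5000 Hz tone every 1/8000 second gives the same samples as a 3000 Hz tone up to a sign flip, so the two cannot be told apart.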

Speech Processing

Once speech has been recorded in digital form, computer programs can be used to manipulate the digital data. For example, we can take a sampled speech signal (audio clip: Sampled Speech) and process it so that it sounds as if the speaker were inside a roaring stadium (audio clip: Processed Speech).

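The processing is based on a model of how sound reflects off the walls of a large room. Each reflection reaches the listener later than the direct sound and with less energy, so the processing consists of adding shifted (delayed) and scaled (attenuated) versions of the sampled speech.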
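The model described above can be sketched as a sum of delayed, attenuated copies of the input, y[n] = x[n] + a_1 x[n - d_1] + a_2 x[n - d_2] + ..., where each pair (d_k, a_k) stands for one reflection with a delay of d_k samples and a gain a_k. The Python sketch below is a hypothetical illustration: the function name `add_echoes` and the delays and gains in `reflections` are made-up values, not the parameters used to produce the demonstration.

```python
FS = 8000  # sampling frequency in samples per second, as in the introduction


def add_echoes(x, echoes):
    """Return y[n] = x[n] + sum of gain * x[n - delay] over all (delay, gain) pairs.

    x      -- list of speech samples (floats)
    echoes -- list of (delay_in_samples, gain) pairs; delays must be >= 0
    """
    max_delay = max((delay for delay, _ in echoes), default=0)
    y = list(x) + [0.0] * max_delay  # extra room for the echo tail
    for delay, gain in echoes:
        for n, sample in enumerate(x):
            y[n + delay] += gain * sample  # a shifted, scaled copy of the input
    return y


# Hypothetical reflections from distant walls: (delay in samples, gain).
# round(0.10 * FS) = 800 samples, i.e. a 0.1 s delay at 8000 Hz.
reflections = [(round(t * FS), g) for t, g in [(0.10, 0.6), (0.23, 0.4), (0.37, 0.25)]]

if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 4
    # prints [1.0, 0.0, 0.5, 0.0, 0.25, 0.0, 0.0, 0.0, 0.0]
    print(add_echoes(impulse, [(2, 0.5), (4, 0.25)]))
```

Applying `add_echoes(speech, reflections)` to a list of 8000 Hz speech samples would add three echoes arriving 0.1 s, 0.23 s, and 0.37 s after the direct sound. Because the echoes add energy, the output can exceed the input's amplitude range, so it would need to be rescaled or clipped before being stored again as 8-, 12-, or 16-bit samples. A convincing large-room effect typically uses many more reflections than this.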

Last updated 10/02/95. Send comments to ble@eecs.berkeley.edu