Speech Processing Demonstration
Introduction
Human speech is a continuous-time signal.
To record speech on a computer, we need to select at which points in
time we will measure the amplitude of the signal and determine how much
computer memory we will allocate for each measurement (called a sample).
Fourier analysis shows that most of the energy in speech signals
lies at frequencies below 4000 Hertz (abbreviated Hz).
One Hertz is one cycle per second (i.e., one period of a sinusoid per second);
it is the same unit used for AM and FM radio frequencies.
AM stations are given in kHz and FM stations in MHz.
By taking samples uniformly spaced every 1/8000 of a second
(a sampling frequency of 8000 samples per second, which is twice the
4000 Hz upper frequency), one will capture most
of the information available in the speech signal.
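As a rough illustration of uniform sampling, the sketch below measures a sinusoid every 1/8000 of a second. The function name, the 1000 Hz test tone, and the 10 ms duration are assumptions chosen for the example; they are not part of the original demonstration.

```python
import math

def sample_sinusoid(freq_hz, sample_rate_hz, duration_s, amplitude=1.0):
    """Return samples of amplitude * sin(2*pi*f*t) taken every 1/sample_rate_hz seconds."""
    n_samples = int(round(duration_s * sample_rate_hz))
    return [
        amplitude * math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]

SAMPLE_RATE_HZ = 8000  # sampling frequency given in the text

# Hypothetical test tone: 1000 Hz for 10 ms (8 samples per period).
samples = sample_sinusoid(freq_hz=1000, sample_rate_hz=SAMPLE_RATE_HZ, duration_s=0.01)
print(len(samples))   # number of measurements taken in 10 ms
print(samples[:3])    # first three amplitude measurements
```

A real recording replaces the sinusoid with measurements from an analog-to-digital converter, but the spacing between measurements is the same.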
Choosing how much memory per sample to allocate depends on the available
computer resources, the quality of the speech one wants to maintain, and the
resolution of the device that converts the analog signal to a digital signal
(called an analog-to-digital converter).
The amplitude of speech samples is often represented by 8, 12, or
16 bits.
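To make the memory trade-off concrete, here is a minimal sketch that rounds an amplitude in the range -1.0 to 1.0 to a signed integer code of a given bit width and estimates storage per second at 8000 samples per second. The helper names and the symmetric rounding scheme are assumptions for illustration; actual converters and file formats differ (for example, 12-bit samples are often stored in 16-bit words).

```python
def quantize(sample, bits):
    """Map a sample in [-1.0, 1.0] to a signed integer code using `bits` bits."""
    max_code = 2 ** (bits - 1) - 1       # e.g. 127 for 8 bits
    min_code = -(2 ** (bits - 1))        # e.g. -128 for 8 bits
    clipped = max(-1.0, min(1.0, sample))
    code = int(round(clipped * max_code))
    return max(min_code, min(max_code, code))

def bytes_per_second(sample_rate_hz, bits):
    """Storage needed per second, assuming samples are packed with no padding."""
    return sample_rate_hz * bits / 8

for bits in (8, 12, 16):
    levels = 2 ** bits  # number of distinct amplitude levels
    print(bits, levels, quantize(1.0, bits), bytes_per_second(8000, bits))
```

More bits per sample give finer amplitude resolution at the cost of proportionally more memory.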
Speech Processing
Once speech has been recorded in digital form, computer programs can
be used to manipulate the digital data.
For example, we can take the speech signal (audio link: Sampled Speech)
and process it to make it sound as if the speech were inside a roaring stadium
(audio link: Processed Speech).
The processing is based on a model of how sound reflects and bounces
off the walls of a large room and consists of adding scaled and shifted
versions of the sampled speech.
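The idea of adding scaled and shifted versions of the sampled speech can be sketched as follows. This is only an illustration under assumed values: the function name, the delays (50 ms and 120 ms), and the gains (0.6 and 0.3) are hypothetical and are not the parameters used to produce the processed speech in this demonstration.

```python
def add_echoes(samples, echoes):
    """Return the input plus scaled, delayed copies of itself.

    echoes is a list of (delay_in_samples, gain) pairs; each pair models
    one reflection arriving later and weaker than the direct sound.
    """
    max_delay = max((delay for delay, _ in echoes), default=0)
    out = list(samples) + [0.0] * max_delay   # room for the delayed tail
    for delay, gain in echoes:
        for n, x in enumerate(samples):
            out[n + delay] += gain * x        # shifted by delay, scaled by gain
    return out

SAMPLE_RATE_HZ = 8000

# Hypothetical reflections: (delay in ms, gain), converted to sample counts.
reflections_ms = [(50, 0.6), (120, 0.3)]
echoes = [(ms * SAMPLE_RATE_HZ // 1000, gain) for ms, gain in reflections_ms]

# A single unit impulse stands in for the sampled speech.
impulse = [1.0] + [0.0] * 9
processed = add_echoes(impulse, echoes)
print(len(processed))                               # input length plus longest delay
print(processed[0], processed[400], processed[960]) # direct sound and two echoes
```

Feeding in recorded speech samples instead of the impulse yields the speech overlaid with its delayed, attenuated copies.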
Last updated 10/02/95. Send comments to ble@eecs.berkeley.edu