Background on the Psychoacoustic Model

The psychoacoustic model is based on many studies of human perception. These studies have shown that the average human does not hear all frequencies equally well. Other sounds in the environment and the limitations of the human sensory system make parts of an audio signal inaudible, and that unnecessary data can be cut out of the signal.

The two main properties of the human auditory system that make up the psychoacoustic model are the absolute threshold of hearing and auditory masking.

Each provides a way of determining which portions of a signal are inaudible and indiscernible to the average human, and can thus be removed from a signal.

Absolute Threshold of Hearing

Humans can hear frequencies in the range from 20 Hz to 20,000 Hz. However, this does not mean that all frequencies are heard in the same way. One could make the assumption that a human would hear frequencies that make up speech better than others; this is a good guess. Furthermore, one could also hypothesize that hearing a tone becomes more difficult as its frequency nears either of the extremes. Again, this is true.

One other observation forms the basis for modeling. Because humans hear lower frequencies, like those making up speech, better than high frequencies around 20 kHz, the ear is probably also better at detecting differences in pitch at lower frequencies than at high ones. This, too, is true. For example, a listener has an easier time telling 500 Hz from 600 Hz than telling 17,000 Hz from 18,000 Hz. After many studies, scientists found that the frequency range from 20 Hz to 20,000 Hz can be broken up into critical bandwidths, which are non-uniform, non-linear, and dependent on the heard sound. Signals within one critical bandwidth are hard for a human observer to separate.

A more uniform measure of frequency based on critical bandwidths is the Bark. From the earlier discussed observations, one would expect a Bark bandwidth to be smaller at low frequencies (in Hz) and larger at high ones. Indeed, this is the case.

[Figure: Hz vs. Bark frequencies]

The Bark frequency scale can be approximated by the following equation:

z(f) = 13*arctan(0.00076*f) + 3.5*arctan((f/7500)^2) (Bark), where f is the frequency in Hz
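As a minimal sketch, the approximation above can be evaluated directly. The function name hz_to_bark is hypothetical and chosen only for illustration; the constants are the ones in the equation.

```python
import math

def hz_to_bark(f_hz):
    """Approximate Bark value for a frequency given in Hz (equation above)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

# Print a few conversions across the audible range.
for f in (100, 500, 1000, 5000, 15000):
    print(f"{f:>6} Hz -> {hz_to_bark(f):5.2f} Bark")

# The same 100 Hz step covers far more of the Bark scale at low frequencies
# than a 1,000 Hz step does near the top of the range.
low_step = hz_to_bark(600) - hz_to_bark(500)
high_step = hz_to_bark(18000) - hz_to_bark(17000)
print(low_step > high_step)
```

The last comparison mirrors the 500 Hz vs. 600 Hz and 17,000 Hz vs. 18,000 Hz example from the text.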

To determine the effect of frequency on hearing ability, scientists played a sinusoidal tone at a very low power. The power was slowly raised until the subject could hear the tone. This level was the threshold at which the tone could be heard. The process was repeated for many frequencies in the human auditory range and with many subjects. As a result, the following plot was obtained.

[Figure: ATH in Hz]
[Figure: ATH in Bark]

This experimental data can be modeled by the following equation, where f is frequency in Hertz:

ATH(f) = 3.64*(f/1000)^(-0.8) - 6.5*e^(-0.6*(f/1000 - 3.3)^2) + 10^(-3)*(f/1000)^4 (dB SPL)

Thus, we can make the following jump for the purposes of compression. If a signal has any frequency components with power levels that fall below the absolute threshold of hearing, then these components can be discarded, as the average listener will be unable to hear those frequencies of the signal anyway.
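The threshold equation and the discard rule can be sketched as follows. This is an illustration only: the names ath_db_spl and keep_audible are hypothetical, and the input is assumed to be a list of (frequency in Hz, level in dB SPL) pairs.

```python
import math

def ath_db_spl(f_hz):
    """Absolute threshold of hearing in dB SPL; f_hz must be greater than 0."""
    khz = f_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)

def keep_audible(components):
    """Keep only (f_hz, level_db) pairs whose level reaches the threshold."""
    return [(f, level) for f, level in components if level >= ath_db_spl(f)]

# Assumed example data: three components with made-up levels.
components = [(100.0, 5.0), (1000.0, 10.0), (3300.0, 0.0)]
print(keep_audible(components))
```

With these made-up levels, the 100 Hz component falls below the threshold and is dropped, while the 1,000 Hz and 3,300 Hz components are kept, since the threshold is lowest in the region around 3 to 4 kHz.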

Auditory Masking

Humans do not have the ability to hear minute differences in frequency. For example, it is very difficult to discern a 1,000 Hz signal from one that is 1,001 Hz. This becomes even more difficult if the two signals are playing at the same time. Furthermore, the 1,000 Hz signal would also affect a human's ability to hear a signal that is 1,010 Hz, or 1,100 Hz, or 990 Hz.

This concept is known as masking. If the 1,000 Hz signal is strong, it will mask signals at nearby frequencies, making them inaudible to the listener. For a masked signal to be heard, its power must be raised above a threshold that is determined by the frequency and strength of the masker tone.

[Figure: picture of mask]

It turns out that noise can be a masker as well. If noise is strong enough, it can mask a tone that would be clear otherwise. For example, a jet engine, which is very noisy, can drown out music easily.

In a compression algorithm, therefore, one must determine the tone maskers, the noise maskers, and the masking effect each has on nearby frequencies.

If any frequency components around these maskers fall below the masking threshold, they can be discarded.

Tone Maskers

Determining whether a frequency component is a tone requires knowing whether it has been held constant for a period of time, as well as whether it is a sharp peak in the frequency spectrum, which indicates that it is above the ambient noise of the signal.

For the purposes of this project, only the second criterion is considered. Determining whether a certain frequency is a tone (masker) can be done with the following definition:

A frequency f (with FFT index k) is a tone if its power P[k] is:

  1. greater than P[k-1] and P[k+1], i.e., it is a local maximum
  2. 7 dB greater than the other frequencies in its neighborhood, where the neighborhood is dependent on f:
    • If 0.17 kHz < f < 5.5 kHz, the neighborhood is [k-2…k+2].
    • If 5.5 kHz <= f < 11 kHz, the neighborhood is [k-3…k+3].
    • If 11 kHz <= f < 20 kHz, the neighborhood is [k-6…k+6].
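The definition above can be sketched as a scan over a power spectrum in dB. This is one reading of the rule, not a definitive implementation: the immediate neighbors k-1 and k+1 only have to be lower (criterion 1), and the 7 dB margin is applied to offsets 2 through the neighborhood half-width. That split is an assumption, as are the lower bound of 0.17 kHz, the function names, and the example sample rate and FFT size.

```python
def neighborhood_half_width(f_hz):
    """Half-width of the tone neighborhood in FFT bins, or None outside the ranges."""
    if 170.0 < f_hz < 5500.0:
        return 2
    if 5500.0 <= f_hz < 11000.0:
        return 3
    if 11000.0 <= f_hz < 20000.0:
        return 6
    return None

def find_tones(power_db, sample_rate_hz, fft_size, margin_db=7.0):
    """Return the FFT indices k that satisfy both tone criteria."""
    tones = []
    for k in range(1, len(power_db) - 1):
        width = neighborhood_half_width(k * sample_rate_hz / fft_size)
        if width is None or k - width < 0 or k + width >= len(power_db):
            continue
        p = power_db[k]
        # Criterion 1: local maximum.
        if not (p > power_db[k - 1] and p > power_db[k + 1]):
            continue
        # Criterion 2 (assumed reading): margin over offsets 2..width on both sides.
        if all(p >= power_db[k + d] + margin_db and p >= power_db[k - d] + margin_db
               for d in range(2, width + 1)):
            tones.append(k)
    return tones

# Assumed test spectrum: flat 20 dB floor, one clear peak, one weak bump.
spectrum = [20.0] * 257
spectrum[19], spectrum[20], spectrum[21] = 50.0, 60.0, 50.0  # clear peak at k=20
spectrum[100], spectrum[103] = 40.0, 35.0                    # too close in level
print(find_tones(spectrum, 44100, 512))
```

In this made-up spectrum only k=20 qualifies; the bumps at k=100 and k=103 are local maxima but are not 7 dB above each other, so neither is marked as a tone.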

Noise Maskers

If a signal is not a tone, it must be noise. Thus, one can take all frequency components that are not part of a tone's neighborhood and treat them like noise. Combining such components into maskers, though, takes a little more thought.

Since humans have difficulty discerning signals within a critical band, the noise found within each of the bands can be combined to form one mask. Thus, the idea is to take all frequency components within a critical band that do not fit within tone neighborhoods, add them together, and place them at the geometric mean location within the critical band. Repeat this for all critical bands.
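A minimal sketch of this combining step follows, under several assumptions that are not stated in the text: the critical band of a bin is taken as the integer part of its Bark value from the approximation given earlier, powers are stored in dB and added in the linear power domain, and the geometric mean is taken over all bin indices in the band. The names noise_maskers and tone_bins are hypothetical.

```python
import math

def hz_to_bark(f_hz):
    """Bark approximation used earlier in this section."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def noise_maskers(power_db, sample_rate_hz, fft_size, tone_bins):
    """Return {band: (geometric_mean_bin, combined_power_db)} for each band."""
    bands = {}
    for k in range(1, len(power_db)):  # skip the DC bin
        band = int(hz_to_bark(k * sample_rate_hz / fft_size))
        bands.setdefault(band, []).append(k)

    maskers = {}
    for band, bins in bands.items():
        # Leave out bins that belong to a tone's neighborhood.
        noise_bins = [k for k in bins if k not in tone_bins]
        if not noise_bins:
            continue
        total_power = sum(10.0 ** (power_db[k] / 10.0) for k in noise_bins)
        geometric_mean = math.exp(sum(math.log(k) for k in bins) / len(bins))
        maskers[band] = (geometric_mean, 10.0 * math.log10(total_power))
    return maskers

# Assumed example: a flat 0 dB spectrum with no tones.
maskers = noise_maskers([0.0] * 257, 44100, 512, tone_bins=set())
for band in sorted(maskers):
    location, level = maskers[band]
    print(f"band {band:2d}: bin {location:6.1f}, {level:5.1f} dB")
```

Because the powers are added, a band that contains more noise bins yields a stronger combined masker even when every bin has the same level.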

Masking Effect

The maskers which have been determined affect not only the frequencies within their own critical band, but also those in surrounding bands. Studies show that the spreading of this masking has an approximate slope of +25 dB/Bark before and -10 dB/Bark after the masker.

The spreading can be described as a function that depends on the maskee location i, the masker location j, the power spectrum Ptm at j, and the difference between the masker and maskee locations in Barks (deltaz=z(i)-z(j)):

SF(i,j) = 17*deltaz - 0.4*Ptm(j) + 11                for -3 <= deltaz < -1
          (0.4*Ptm(j) + 6)*deltaz                    for -1 <= deltaz < 0
          -17*deltaz                                 for  0 <= deltaz < 1
          (0.15*Ptm(j) - 17)*deltaz - 0.15*Ptm(j)    for  1 <= deltaz < 8
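The piecewise function translates directly into code. In this sketch the name spreading_db is hypothetical, and returning None outside the range -3 <= deltaz < 8 is an assumption, since the text does not say how maskees outside that range are handled.

```python
def spreading_db(delta_z, masker_power_db):
    """Spreading function SF for a maskee delta_z Barks away from the masker."""
    if -3.0 <= delta_z < -1.0:
        return 17.0 * delta_z - 0.4 * masker_power_db + 11.0
    if -1.0 <= delta_z < 0.0:
        return (0.4 * masker_power_db + 6.0) * delta_z
    if 0.0 <= delta_z < 1.0:
        return -17.0 * delta_z
    if 1.0 <= delta_z < 8.0:
        return (0.15 * masker_power_db - 17.0) * delta_z - 0.15 * masker_power_db
    return None  # assumed: no spreading defined outside -3 <= delta_z < 8

# Tabulate the spread of a 60 dB masker at a few Bark offsets.
for dz in (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 4.0):
    print(f"deltaz = {dz:+.1f}: SF = {spreading_db(dz, 60.0):7.2f} dB")
```

The pieces meet at deltaz = -1 and deltaz = 1, so the spread has no jumps at the segment boundaries.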

There is a slight difference in the resulting mask that depends on whether the mask is a tone or noise. As a result, the masks can be modeled by the following equations, with the same variables as described above:

For tones: Ttm(i,j) = Ptm(j) - 0.275z(j) + SF(i,j) - 6.025 (dB SPL)
For noise: Tnm(i,j) = Pnm(j) - 0.175z(j) + SF(i,j) - 2.025 (dB SPL)
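As a small sketch, the two threshold equations differ only in their constants. The function names are hypothetical, and the spreading value SF(i,j) is passed in as an argument so that the example stays self-contained.

```python
def tone_mask_threshold(p_tm_db, z_masker_bark, sf_db):
    """Ttm(i,j): threshold at maskee i due to a tone masker at j (dB SPL)."""
    return p_tm_db - 0.275 * z_masker_bark + sf_db - 6.025

def noise_mask_threshold(p_nm_db, z_masker_bark, sf_db):
    """Tnm(i,j): threshold at maskee i due to a noise masker at j (dB SPL)."""
    return p_nm_db - 0.175 * z_masker_bark + sf_db - 2.025

# Assumed example: a 60 dB masker at 10 Bark, evaluated at its own
# location, where the spreading function is 0 dB.
print(tone_mask_threshold(60.0, 10.0, 0.0))
print(noise_mask_threshold(60.0, 10.0, 0.0))
```

For this example the tone threshold comes out near 51.2 dB SPL and the noise threshold near 56.2 dB SPL, so at equal power and location the noise masker raises the threshold about 5 dB more than the tone masker.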

The following are plots of various levels of tone and noise maskers.

[Figure: various tone maskers]
[Figure: various noise maskers]

The final plot compares a tone and noise masker at the same frequency and of the same power.

[Figure: mask comparison]

Naturally, if there are multiple noise and tone maskers, the overall effect is a little harder to determine. In this project, the assumption is made that the effects are power additive. This is a reasonable assumption to make, but note that there is definitely an interplay between maskers that can lower or raise the thresholds.
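Power additivity can be sketched as converting each threshold from dB to linear power, summing, and converting back. The name combine_thresholds_db is hypothetical, and which thresholds go into the list (for example, whether the absolute threshold of hearing is included alongside the tone and noise masks) is left to the caller as an assumption.

```python
import math

def combine_thresholds_db(thresholds_db):
    """Power-additive combination of individual masking thresholds given in dB."""
    total_power = sum(10.0 ** (t / 10.0) for t in thresholds_db)
    return 10.0 * math.log10(total_power)

# Two equal thresholds add up to about 3 dB more than either one alone.
print(round(combine_thresholds_db([50.0, 50.0]), 2))

# A much weaker threshold barely changes the result.
print(round(combine_thresholds_db([60.0, 20.0]), 2))
```

Under this assumption the strongest masker dominates the combined threshold, and weaker maskers contribute only a small increase.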

[Alex Chen]   [Nader Shehad]   [Aamir Virani]   [Erik Welsh]