(6-1) Fusion of sight and audition
G. Dodd and Y. Sakurai: "Localization of sound - A New Zealand Revelation",
NZ Acoustics, vol. 16, no. 4, pp. 28-33
Contradiction of the law of the first wave front
1) Introduction
When a movie was projected on a large screen in front of the audience and
stereophonic sound was reproduced from two loudspeakers behind them, the sound
was localized at the screen. All 7 or 8 people who remained after the movie
gathering also answered that they heard the sound from the screen. This
contradicts the law of the first wave front (the Haas effect).
This phenomenon did not happen when slides were projected on the screen and
the sound was given from the same place as when the movie was played. The
sound was heard from the loudspeakers behind.
On a different day, a movie was played on a smaller screen of half the size.
At first the sound was heard from the rear speakers. As time passed, though,
the sound impression gradually moved forward. However, it never reached the
screen, even at the very end.
Interestingly, even after the screen was turned off, the sound was still heard
from the front. This is thought to be caused by learning or plasticity.
To explain this phenomenon further, the interaction between sight and audition
must be introduced. Do we generally tend to be attracted more strongly to
sight? Can we say that sight is predominant? How do the two differ in
gathering information?
If sight influences the localization of sound in this manner, we will have to
take it into account in the sound evaluation of a concert hall. That is, we
cannot neglect the visual design of the stage in relation to the acoustic
design.
In addition, it will be involved when the reflections in a concert hall are
evaluated. When we discussed the acoustical renovation of the Tanglewood Music
Shed, Harold Marshall said that spaciousness can be obtained even if lateral
reflections come from the rear sides, though I did not catch the reason.
2) Time and spatial window
A window is needed to take out the necessary information from a visual or
auditory object that extends in space or continues in time. One is a time
window, which obtains the information in a certain time interval; the other is
a spatial window, which selects a certain necessary range of the area. It can
be said that the spatial window of audition is weighted toward the front, in
relation to the head-related transfer function 1). However, this point needs
to be researched in detail.
Both systems have a time window and a spatial window, which express a strong
will to get information in this manner.
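
The two windows can be illustrated with a minimal Python sketch. Everything in
it is an assumption made for illustration and is not taken from the text: the
function names, the representation of sound events as (level, azimuth) pairs,
and in particular the raised-cosine shape of the frontal weighting, which only
stands in for the real, still unresearched weighting.

```python
import math


def time_window(samples, sample_rate_hz, start_s, length_s):
    """Keep only the samples inside [start_s, start_s + length_s)."""
    first = int(round(start_s * sample_rate_hz))
    last = int(round((start_s + length_s) * sample_rate_hz))
    return samples[first:last]


def frontal_weight(azimuth_deg):
    """Hypothetical spatial window for audition.

    Assumed shape (not from the text): a raised cosine that is 1.0 straight
    ahead (0 degrees) and falls to 0.0 directly behind (180 degrees).
    """
    return 0.5 * (1.0 + math.cos(math.radians(azimuth_deg)))


def windowed_strength(events, sample_rate_hz, start_s, length_s):
    """Sum the frontally weighted levels of the events inside the time window.

    Each event is a (level, azimuth_deg) pair, one event per sample.
    """
    selected = time_window(events, sample_rate_hz, start_s, length_s)
    return sum(level * frontal_weight(azimuth) for level, azimuth in selected)
```

With such a weighting, a sound of the same level counts fully when it arrives
from the front and not at all when it arrives from directly behind; the real
weighting would have to be derived from measurements.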
3) Fusion of sight and audition
The time window may be short for a still picture because the information can
be confirmed quickly. There, the magnitude or intensity of information is
small. For a moving image, the time window is long and the magnitude of
information is large.
This behavior changes depending on the dimensions of the object, the speed of
change, the kind of information, etc. It needs to be discussed in relation to
eye movement.
When a screen is large, eye movement becomes large and rapid, and a large
effort is necessary to gather visual information. The effort is then
concentrated only on sight, and auditory processing is neglected.
Is the following hypothesis possible?
Hypothesis:
We tend to combine two different sensory inputs into one piece of information
if they come from the same source. These two systems are not good at
simultaneous perception of strong stimulation above a certain level, and
perception then follows the sense with the stronger stimulation.
However, if the intensity or magnitude in each system is within a certain
strength, each response becomes independent.
Examples: The movie was a case where the visual stimulation was stronger. It
is difficult to identify what we are looking at when a lot of machines produce
loud noise at the same time. The sight of the performers is not important when
one is deep in listening to wonderful music. When we hear noise above a
certain level, we cannot concentrate. When we work with music, we lose
efficiency. These are cases where audition is occupied more strongly.
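
The hypothesis can be written down as a toy decision rule. This is only a
sketch under stated assumptions: the function name, the arbitrary strength
scale and the `fusion_threshold` value are hypothetical, and the rule does not
model the gradual forward shift observed with the small screen.

```python
def heard_direction(visual_strength, auditory_strength,
                    screen_direction_deg, loudspeaker_direction_deg,
                    fusion_threshold=1.0):
    """Toy rule for where the sound seems to come from.

    Strengths are on an arbitrary, hypothetical scale. Below the threshold
    the two senses respond independently, so the sound is heard where the
    loudspeakers are. Above it, perception follows the stronger sense.
    """
    strongest = max(visual_strength, auditory_strength)
    if strongest <= fusion_threshold:
        # Weak stimulation: independent responses, true source direction.
        return loudspeaker_direction_deg
    if visual_strength > auditory_strength:
        # Strong visual stimulation captures the sound image.
        return screen_direction_deg
    # Strong auditory stimulation: hearing dominates.
    return loudspeaker_direction_deg


# Large moving picture in front (0 deg), loudspeakers behind (180 deg).
print(heard_direction(3.0, 1.5, 0.0, 180.0))
# Still slides: weak visual stimulation, the sound stays behind.
print(heard_direction(0.5, 0.8, 0.0, 180.0))
```

The first call corresponds to the large movie screen and the second to the
slide projection described in the introduction.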
When a movie screen is small, eye movement is small, and the two senses can be
separated. However, although the sound was recognized as coming from behind at
first, the sound image moved forward as time passed, and it was never heard
from the screen. The sound image remained in front even right after the screen
disappeared. This experience is different from that with the still picture. It
could be explained as learning or plasticity.
Now we have to discuss the relationship between the stimulation intensities of
the two sensory systems. The intensity of each sensation would be
multivariable: change in time, the range of the area carrying information,
ease of information acquisition, brightness of the screen, received sound
level, etc.
When a still picture suddenly appears on a large movie screen, do we hear the
sound from the rear? It seems not. Since even with the half-size screen the
sound remained in front after the projection disappeared, we would keep
hearing it from the screen. This might be a running or shifting learning
effect of sight, as mentioned previously.
Then, can this learned attitude be shaken off consciously, so that the sound
is recognized as coming from the rear? Why can it not be done? If it cannot,
wrong recognition might happen wherever sight coexists with audition in this
way.
4) Supplemental discussion
A possible explanation is that, although the levels received at each ear on
the first wave front differ slightly, a stereo effect occurs as if the levels
were the same, and sight draws the sound image to the middle. However, this is
not enough to explain the difference between a still and a moving screen.
There, the strength of information should be discussed.
When a singer has a wireless microphone and the sound is emitted from
loudspeakers on both sides, does the sound image follow the singer's movement?
What about when a movie is on the screen? It seems that the stereo effect
arises in this case and the sound image follows easily. The discussion here,
however, concerns the case where the loudspeakers are placed in the rear.
The diffracted wave chiefly carries the low frequency range of the sound, and
the reflected wave carries the high frequency range, which gives directivity
from the front. This is only an explanation from the acoustical side and does
not explain the different impressions of a moving and a still screen.
5) Experiments in the future
Two dummy loudspeakers are displayed, one at each side of the screen. Stereo
sets of loudspeakers are distributed along the bottom corners of each side
wall, and the subjects are asked where the sound is coming from while a movie
is shown. Eye movement should be measured at the same time.
Sight: still or moving, large or small, speed of change
Sound source: stereo or monaural
Optics: brightness of the screen
A multivariable analysis with the above factors must be planned at the
preparation stage. A correlation function between the two sensory systems
would need to be introduced.
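
As a planning aid, the factors listed above can be expanded into a full list
of test conditions. The sketch below is only an illustration: the factor names
are taken from the list, but the two levels given for speed of change and for
brightness are hypothetical placeholders, since the text does not specify
them.

```python
from itertools import product

# Factors from the list above. Levels marked "hypothetical" are assumptions.
FACTORS = {
    "sight": ["still", "moving"],
    "screen_size": ["large", "small"],
    "speed_of_change": ["slow", "fast"],   # hypothetical levels
    "sound_source": ["stereo", "monaural"],
    "brightness": ["dim", "bright"],       # hypothetical levels
}


def all_conditions(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


conditions = all_conditions(FACTORS)
print(len(conditions), "conditions to present to each subject")
```

A full factorial design grows quickly, so in practice some combinations (for
example speed of change for a still picture) would probably be dropped.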
To focus the discussion, a movie in only one language should be projected.
Do early reflections and/or sound diffusion have an influence? If so,
spaciousness should be discussed in this context.
When we close our eyes at the movie, where does the sound come from?
Discussions
How will it differ when the sound belongs to the screen and when it is not
related to the screen? What about when the sound is music? Does our mind move
to the screen even when the music is not related to it?
When we discuss the learning effect, we have to talk about memory. However,
the phenomenon does not seem to happen centrally but peripherally. Namely, it
is a process that occurs before the cerebrum understands. It also happened
when the subtitles were too quick and hard for me to understand.
References
1) My homepage at Chapter 6 (3-1)
Acoustical evaluation of an auditorium
Peep into visual sensory system
Its nature and behavior
When a movie stops and we see a stopped frame, it loses its clear resolution.
Such discrete frames are connected with our spectrum, but what is actually
happening in our visual system? It must have a property that connects them up.
The visual system also takes the auto-correlation of a signal and leaves the
remaining effect that was measured at (5-1) of this section.
It also estimates what comes next from a moving object. It tends to connect
things in time.
When we see an object with the naked eye, it looks natural. But if we see a
pair of parallax pictures through a stereoscopic viewer, the object looks like
a painting on a sheet of paper, with no depth. Namely, the pair is fixed and
no information is exchanged between the two pictures. The former gains natural
3D from slight eye movements.
The system does not want discontinuity in space, or is unable to accept it. Is
the image formed with the cross correlation between both eyes? It loses some
information because of this process and needs to make up for it with learning
and experience.
It tends to bridge spatial discontinuity through the mutual talk of both eyes.
The visual system looks for correlations spatially and temporally. Information
is not split into pieces; the system tries to connect it smoothly.
For example, at the border between black and white, the luminance on each side
is not discontinuous but smoothly connected, according to the effective
luminance theory. Learning lies behind this.
Visual information arrives so fast, and has to be judged clearly in so short a
time, that processing it requires a lot of concentration and effort.
Accordingly, visual processing takes precedence over the auditory system.
Toward visual information, we basically have an attitude of fixing on a visual
object and searching it.
Formation of cells for particular purposes and the switching system
When a baby is born, he cannot recognize anything. Licking his fingers, he
gets a visual cell to recognize his hand. He does not count the number of
fingers to recognize it. He sees his mother's face always in front of him and
gets a visual cell to recognize a face.
During this period, he grows a visual cell with which he can distinguish front
from behind. Step by step, he keeps learning to establish his visual system.
We cannot work with our hands without knowing boundaries and depth in a
moment. We have to judge front or behind. We establish a quick recognition
system. Particular cells must exist to do this immediately.
The system reacts so fast that it needs a switching mechanism in addition. It
has to react to light, and its transient response must be very fast. It is
never the same as the spectrum, which lasts 400 ms. See section (5-1) of this
chapter.
When we see characters on a sheet of paper, or scenery, the location of each
cell is different, and switching is needed. Our hearing system has one too,
e.g. the cocktail party effect. An established cell to see depth exists, and
the obtained information is exchanged with the other eye. Such exchange, or
mutual talk, must be established too.
Just as we regulate for luminosity, we regulate our visual system to read
characters on a sheet of paper, or to see near, middle or far distances. We
move attention to each area consciously. In addition, we have a switch for
seeing vaguely and widely.
The switching in the regulation for luminosity is not done by our mind. The
mechanism is imprinted. Other switching must be done in the same way. Its
transformation is on a fuzzy system. The concept of effective luminance is a
good example.
Sitting down relaxed and enjoying the surrounding scenery might be a check-up
of the switching system and a normalization of the deformed parts of the
system. Being stress-free must mean this process. When we consider that our
cells have a fast metabolism, it must be related to this check-up, not only to
learning.
Looking up at the blue sky, looking over the wide sea, looking down on a wide
spreading forest, etc.: all of these are needed and important. Otherwise we
cannot have a broad view and mind.
Existence of the crystalline lens
The visual system has only the eyes to get visual information and focus on an
object. The light diffracted in the crystalline lens widens the view angle and
gives three-dimensional data as well. It might start the mutual talk with the
other side.
The retina is a plane screen spread out behind the crystalline lens. The lens
collects three-dimensional data and maps them onto the retina. The retina
starts the mutual talk with the other side.
There exists a function to process perspective. Namely, three-dimensional data
are sent to the retina and processed there to a higher level.
Mutual talk, cross correlation and learning
If we see an object with a single eye, it looks as if painted on a plate.
Distance is felt with two eyes. It is found by both eyes, and not by parallax
alone. The eyes seem to take the cross correlation between them and add
learning. Although they get a distance from the tiny difference of parallax,
that is not enough, and they need the mutual talk. This is the same for the
hearing system.
They take the cross correlation but lose information by doing so. Learning and
experience are needed to supplement it.
If this process cannot be done fast, it is impossible to grab a thing or carry
it from one place to another in daily life.
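
The cross correlation discussed in this section can be sketched for two
sampled signals, for instance the inputs to the left and right ear or eye. The
code is a plain textbook cross correlation and is not claimed to be what the
sensory system actually computes; the function names and the short test
signals are assumptions for illustration.

```python
def cross_correlation(left, right, max_lag):
    """Return {lag: sum of left[n] * right[n + lag]} for -max_lag..max_lag."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for n, value in enumerate(left):
            m = n + lag
            if 0 <= m < len(right):
                total += value * right[m]
        result[lag] = total
    return result


def best_lag(left, right, max_lag):
    """Lag (in samples) at which the two signals match best."""
    corr = cross_correlation(left, right, max_lag)
    return max(corr, key=corr.get)


# The same short pulse, arriving two samples later on the right side.
left = [0, 0, 1, 2, 1, 0, 0, 0]
right = [0, 0, 0, 0, 1, 2, 1, 0]
print(best_lag(left, right, 3))
```

Reducing the two signals to a single best lag keeps the offset between them
but discards the waveforms themselves, which is one way to picture the
information loss mentioned above.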
The mutual talk between the two ears is carried not only by the ears
themselves but also through bone conduction and skin. On the other hand, the
visual system does it only by sight and maps it cleverly.
After all, an object is sensed as in front or behind by the corresponding
cells through one eye. Both eyes exchange information mutually to get the
distance; however, it is not yet certain, and quick eye movements are made to
confirm it.
The visual and hearing systems have these confirming movements in common. A 3D
screen is helped by a moving object, on which the visual system works to
understand. It is not a real simulation.
Fusion of sight and audition and their comparison
Why does a stereo system create a sound image in front? The reason could be
that there are visual things in front. Even if lateral reflections come from
behind, they still give spaciousness. Visual objects are on the stage, and the
same fusion of the two systems happens there.
For the visual system, a moving object gives a lot of information with which
to fix on it. It seems to be easier to do so.
It must be quick, with a direct connection, to judge whether something is
dangerous or not. A function for immediate decision is essential.
What about the mutual talk between the two ears? We do not fix our head at a
certain point but always move it slightly. Both the visual and hearing systems
seem to have this slight movement.
At the mutual talk level, the visual system responds at the speed of light and
the hearing system at the speed of sound. Visual processing is done much
faster than auditory processing.
There was a person at the movie gathering who felt the sound came from the
rear when he moved to the front, close to the screen. There must be a few
conditions for the fusion to occur. In his case, the sound from the rear
loudspeakers reached him directly and was loud.
When we localize a sound source that is out of the focal plane, especially
behind us, we hear it through the ears and bone conduction, but through the
skin as well.
The impulse responses of the hearing system, temperature sensation and the
visual system last 5 ms, 150 min and 400 ms respectively. They are given at
(5-1) of this chapter. The 400 ms is the time for the visual system to give
spectrum.
A little additional comment
If the imprinting process of learning and experience takes place in a concrete
box, it would be unnatural and would not adapt well to nature.
A rock singer was playing an electric guitar on the stage in front of two
loudspeakers. Where is the sound image created: in the middle of the
loudspeakers, or does it move with him? It would be interesting to examine
this with experiments.
It is very difficult to set a microphone at a given spot in an anechoic
chamber, where there is no reference nearby. In such a case, visual intuition
helps a lot.
A few interesting fusions of senses: the sense of touch and the visual system;
smell, look and taste.