(6-1) Fusion of sight and audition









G. Dodd and Y. Sakurai: "Localization of sound - A New Zealand Revelation", NZ Acoustics, vol. 16, no. 4, pp. 28-33

Contradiction of the law of the first wave front

1) Introduction
When a movie was projected on a large screen in front of the audience and stereophonic sound was reproduced from two loudspeakers behind them, the sound was localized at the screen. All 7 or 8 people who remained after the movie gathering answered that they, too, heard the sound from the screen. This contradicts the law of the first wave front (the Haas effect).

This phenomenon did not occur when slides were projected on the screen and the sound was reproduced from the same place as when the movie was played. The sound was heard from the loudspeakers behind.

On a different day, a movie was played on a smaller screen of half the size. At first the sound was heard from the rear loudspeakers. As time passed, though, the sound image gradually moved forward. However, it never reached the screen, even at the very end.

Interestingly, even after the screen was turned off, the sound was still heard from the front. This is thought to be caused by learning or plasticity.

To explain this phenomenon further, the interaction between sight and audition must be introduced. Do we generally tend to be attracted more strongly to sight? Can we say that sight is predominant? How do the two differ in gathering information?

If sight influences the localization of sound in this manner, we will have to take it into account in the acoustic evaluation of a concert hall. That is, we cannot neglect the visual design of the stage in relation to the acoustic design.
In addition, it will matter when the reflections in a concert hall are evaluated. When we discussed the acoustical renovation of the Tanglewood Music Shed, Harold Marshall said that spaciousness can be obtained even if lateral reflections arrive from the rear sides, though I did not catch the reason.

2) Time and spatial window
Sight and audition both deal with objects that extend in space and continue in time, so a window is needed to take out the necessary information. One is a time window, which captures the information of a certain time interval; the other is a spatial window, which selects a certain necessary range within the area. It can be said that the spatial window of audition is weighted toward the front, in relation to the head-related transfer function 1). However, this point needs to be researched in detail.

Both systems thus have a time window and a spatial window, as an expression of a strong will to get information.

3) Fusion of sight and audition
The time window may be short for a still picture because the information can be confirmed quickly. There the magnitude or intensity of information is small. For a moving image, the time window is long and the magnitude of information is large.

This picture changes depending on the dimensions of the object, the speed of change, the kind of information, etc. It needs to be discussed in relation to eye movement.

When a screen is large, eye movements become large and rapid, and a great effort is needed for sight to gather information. The effort is then concentrated only on sight, and auditory processing is neglected.

Can we propose the following hypothesis?

Hypothesis:
We tend to combine two different sensory inputs into one piece of information if they come from the same source. The two systems are not good at simultaneous perception of strong stimulation above a certain level, and perception then follows the sense with the stronger stimulation.
However, when the intensity or magnitude on each system is within a certain strength, each response becomes independent.
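
The hypothesis can be read as a simple decision rule. The sketch below is only one possible reading of it: the function name, the intensity scale and the single shared threshold are assumptions made for illustration, not quantities given in the text.

```python
def dominant_sense(visual, auditory, threshold=1.0):
    """Toy reading of the hypothesis (all numbers are hypothetical).

    visual, auditory: stimulation intensities on an arbitrary common scale.
    threshold: assumed level above which simultaneous perception breaks down.
    """
    # Both stimulations within the threshold: the two senses respond independently.
    if visual <= threshold and auditory <= threshold:
        return "independent"
    # At least one stimulation is strong: perception follows the stronger sense.
    if visual > auditory:
        return "sight"
    if auditory > visual:
        return "audition"
    # The text says nothing about equally strong stimulation.
    return "undecided"


# Large movie screen: strong visual stimulation, moderate sound.
print(dominant_sense(visual=3.0, auditory=0.8))   # sight
# Slides with the same sound: both within the threshold.
print(dominant_sense(visual=0.5, auditory=0.8))   # independent
```

In practice each intensity would itself depend on several variables, so a single number per sense is a strong simplification.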

Examples: At the movie, the visual stimulation was the stronger one. Conversely, it is difficult to make out a view when many machines produce loud noise at the same time. The sight of the performers is not important when one is deep in listening to wonderful music. When we hear noise of a certain level, we cannot concentrate. When we work with music playing, we lose efficiency. These are cases in which audition is occupied more strongly.

When a movie screen is small, eye movement is small, and the two senses can be separated. However, although the sound was at first recognized as coming from behind, the sound image moved forward as time passed, and it was never heard from the screen. The sound image is said to have stayed in front even right after the picture disappeared. This experience differs from that with the still picture. It could be explained as learning or plasticity. Now we have to discuss the relationship between the stimulation intensities of the two sensory systems. The intensity of each sensation would depend on many variables: change in time, the extent of the area carrying information, ease of information acquisition, brightness of the screen, received sound level, etc.

When a still picture suddenly cuts in on a large movie screen, do we hear the sound from the rear? It seems not. Since even with the half-size screen the sound remained in front after the projection disappeared, we would keep hearing it from the screen. This might be a running or shifting learning effect of sight, as mentioned previously.

Then, can this learnt attitude be shaken off consciously, so that the sound is recognized as coming from the rear? Why can it not be done? If it cannot, wrong recognition might happen wherever sight coexists with audition in this way.

4) Supplemental discussion
A possible explanation is this: although the levels received at the two ears on the first wave front differ slightly, they are nearly the same, so a stereo effect occurs and sight draws the image to the middle. However, this is not enough to explain the difference between a still and a moving picture. There the strength of information should be discussed.
When a singer uses a wireless microphone and the sound is reproduced from loudspeakers on both sides, does the sound image follow the singer's movement? What about when a movie is on the screen? It seems that the stereo effect arises in this case and the sound image follows easily. The discussion here, though, concerns the case in which the loudspeakers are placed in the rear.

The diffracted wave chiefly carries the low-frequency sound, and the reflected wave carries the high-frequency sound, which gives directivity from the front. This is only an explanation from the acoustical side and does not explain the different impressions given by a moving and a still picture.
 
5) Experiments in the future
Two dummy loudspeakers are displayed, one at each side of the screen. Stereo sets of loudspeakers are distributed along the bottom corners of each side wall, and the subjects are asked where the sound is coming from while a movie is shown. Eye movement should be measured at the same time.

Sight: still or moving, large or small, speed of change
Sound source: stereo or monaural
Optics: brightness of the screen

A multivariable analysis with the above factors needs to be prepared. This would lead to the introduction of a correlation function between the two sensory systems.
To focus the discussion, a movie in only one language should be shown.
Do early reflections and/or sound diffusion have an influence? If so, spaciousness should be discussed in this connection.
When we close our eyes at the movie, where does the sound come from?
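
As a rough planning aid, the factors listed above can be enumerated into a set of test conditions. The sketch below is only an illustration: the two-level choices for speed of change ("slow"/"fast") and screen brightness ("dim"/"bright"), and all of the names, are assumptions, since the text does not fix the levels.

```python
from itertools import product

# Factor levels; "change_speed" and "brightness" levels are assumed for illustration.
FACTORS = {
    "picture": ["still", "moving"],
    "screen_size": ["large", "small"],
    "change_speed": ["slow", "fast"],
    "sound_source": ["stereo", "monaural"],
    "brightness": ["dim", "bright"],
}


def build_conditions(factors):
    """Enumerate all factor combinations, collapsing speed for still pictures."""
    names = list(factors)
    conditions = []
    for levels in product(*(factors[name] for name in names)):
        condition = dict(zip(names, levels))
        # Speed of change has no meaning for a still picture.
        if condition["picture"] == "still":
            condition["change_speed"] = "n/a"
        if condition not in conditions:
            conditions.append(condition)
    return conditions


conditions = build_conditions(FACTORS)
print(len(conditions))  # 24: 16 moving-picture and 8 still-picture conditions
```

Each condition would then be paired with the subjects' answers on where the sound came from and with the eye-movement record, which is the data a multivariable analysis needs.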

Discussions
How will it differ when the sound belongs to the picture on the screen and when it is unrelated to it? What about when the sound is music? Does our mind move to the screen even when the music is not related to it?

When we discuss the learning effect, we have to talk about memory. However, the phenomenon does not seem to happen centrally but peripherally. Namely, it is a process that takes place before the cerebrum comes to understand. It happened as well when the subtitles were too quick and hard for me to follow.

References
1) My homepage at Chapter 6 (3-1) Acoustical evaluation of an auditorium

Peep into visual sensory system

Its nature and behavior
When a movie stops and we see a single stopped frame, it loses its clear resolution. Such discrete frames are connected through our spectrum, but what is actually happening in our visual system? It must have a property that connects them. The visual system also takes the autocorrelation of a signal and leaves an after-effect, which was measured in (5-1) of this section.
It also predicts the next position of a moving object. It tends to connect things in time.
When we see an object with the naked eye, it looks natural. But if we see a pair of parallax pictures through a stereoscopic viewer, the object looks like a painting on a sheet of paper, with no depth. Namely, the pair is fixed and no information is exchanged between the two. The naked eye obtains natural 3D with slight eye movements.
The system does not want discontinuity in space, or cannot handle it. Is continuity constructed by cross-correlation between the two eyes? Some information is lost in this process and has to be supplied by learning and experience.
It tends to bridge spatial discontinuity through the mutual talk of both eyes. The visual system looks for correlations spatially and temporally. It does not leave information split into pieces but tries to connect it smoothly.
For example, at the border between black and white, the luminance on each side is not discontinuous but smoothly connected, according to the effective luminance theory. Learning lies behind this.
Visual information arrives very fast and has to be judged clearly in a short time, so concentrating on processing it takes a lot of work and effort. Accordingly, visual processing takes precedence over the auditory system. Toward a visual object, we basically have an attitude of fixing on it and searching it.

Formation of cells for particular purpose and switching system
When a baby is born, he cannot recognize anything. Licking his fingers, he acquires a visual cell to recognize his hand. He does not count the number of fingers to recognize it. He sees his mother's face always in front of him and acquires a visual cell to recognize a face.
During this period, he grows a visual cell with which he can distinguish front from behind. Step by step, he keeps learning and establishes his visual system.
We cannot work with our hands without grasping boundaries and depth in a moment. We have to judge front or behind. We establish a quick recognition system. There must be particular cells that do this immediately.
The system reacts very fast and needs a switching mechanism in addition. It has to react to light, and its transient response must be very fast. It is by no means the same as the spectrum, which lasts 400 ms. See section (5-1) of this chapter.
When we look at characters on a sheet of paper or at scenery, different cells are involved, and switching is needed. Our hearing system has switching too, e.g. the cocktail party effect. An established cell for seeing depth exists, and the information obtained is exchanged with the other eye. Such exchange, or mutual talk, must be established too.
Just as we regulate for luminosity, we regulate our visual system to read characters on a sheet of paper or to see at near, middle or far distances. We move our attention to each area consciously. In addition, we have a switch for seeing vaguely and widely.
The switching in regulating for luminosity is not done by our mind. The mechanism is imprinted. Other switching must work the same way. Its transformation is of a fuzzy kind. The concept of effective luminance is a good example.
Sitting down relaxed and enjoying the surrounding scenery might serve as a check-up of the switching system and a normalization of its deformed parts. Being free of stress must mean this process. Considering that our cells have a fast metabolism, it must be related to this check-up, not only to learning.
Looking up at the blue sky, looking out over the wide sea, looking down on a wide spreading forest, etc. are all needed and important. Otherwise we cannot have a broad view and mind.

Existence of crystalline lens
The visual system has only the eyes to get visual information and to focus on an object. The light diffracted in the crystalline lens widens the view angle and gives three-dimensional data as well. The mutual talk with the other side might start here.
The retina is a plane screen spread out on the inner side of the crystalline lens. The lens collects three-dimensional data and maps them onto the retina. The retina starts the mutual talk with the other side.
There is a function for processing perspective. Namely, the lens sends three-dimensional data to the retina, where they are processed to a higher level.

Mutual talk, cross correlation and learning
If we see an object with a single eye, it looks as if painted on a plate. Distance is sensed with two eyes, but not by parallax alone. The eyes seem to take a cross-correlation between them and add learning. Although they obtain a distance from the tiny parallax difference, that is not enough and they need the mutual talk. The same holds for the hearing system.
They take a cross-correlation but lose information in doing so. Learning and experience are needed to supplement it. If this process could not be done fast, it would be impossible to grab a thing or carry it from one place to another in daily life.
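
For readers unfamiliar with the operation, the following minimal sketch shows what taking a cross-correlation between two channels means, using the hearing case: the lag at which the cross-correlation of the two ear signals peaks estimates the arrival-time difference. The signals, the sample delay and the function names are invented for illustration; the sketch shows only the mathematical operation and is not a model of the "mutual talk" itself.

```python
def cross_correlation(left, right, max_lag):
    """Return {lag: sum over n of left[n] * right[n + lag]} for |lag| <= max_lag."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for n, value in enumerate(left):
            m = n + lag
            if 0 <= m < len(right):
                total += value * right[m]
        result[lag] = total
    return result


def best_lag(left, right, max_lag):
    """Lag (in samples) at which the cross-correlation is largest."""
    corr = cross_correlation(left, right, max_lag)
    return max(corr, key=corr.get)


# Hypothetical short pulse reaching the left ear, and the same pulse
# reaching the right ear two samples later.
pulse = [0.0, 1.0, 0.5, -0.3, 0.0, 0.0, 0.0, 0.0]
delay = 2
right = [0.0] * delay + pulse[:-delay]

print(best_lag(pulse, right, max_lag=4))  # 2
```

Note that the single peak lag discards the rest of the waveform, which is in line with the remark that cross-correlation loses information that must be supplied otherwise.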
The mutual talk between the two ears is carried not only by the ears but also through bone conduction and the skin. The visual system, on the other hand, does it only by sight and maps it cleverly.
After all, an object is sensed as in front or behind by the corresponding cells of one eye. Both eyes exchange information to obtain distance; however, the result is not yet certain, and quick eye movements are made to confirm it.
The visual and hearing systems share these confirming movements. A 3D screen is helped by a moving object, on which the visual system works to understand. It is not a real simulation.

Fusion of sight and audition and their comparison
Why does a stereo system create a sound image in front? The reason could be that there are visual things in front. Even if lateral reflections come from behind, they still give spaciousness. Visual objects are on the stage, and the same fusion of the two systems happens there.
For the visual system, a moving object gives a lot of information for fixing on it. It seems to be easier to do so.
There must be a quick, direct connection to judge whether something is dangerous or not. A function for immediate decision is essential.
What about the mutual talk between the two ears? We do not hold our head fixed at a certain point but always move it slightly. Both the visual and hearing systems seem to have this slight movement.
At the level of mutual talk, the visual system responds at the speed of light and the hearing system at the speed of sound. Visual processing is done much faster than auditory processing.
There was a person at the movie gathering who felt the sound came from the rear when he moved forward, close to the screen. There must be some conditions for the fusion to occur. In his case, the sound from the rear loudspeakers reached him directly and was loud.
When we localize a sound source that is out of the focal plane, especially behind us, we hear it through the ears and by bone conduction, but through the skin as well.
The impulse responses of the hearing system, temperature sensation and the visual system last 5 ms, 150 min and 400 ms, respectively. They are given in (5-1) of this chapter. The 400 ms is the time for the visual system to give the spectrum.

A little additional comment
If the imprinting process of learning and experience takes place in a concrete box, it would be unnatural and would not adapt well to nature.
A rock singer was playing an electric guitar on the stage in front of two loudspeakers. Where is the sound image created: in the middle between the loudspeakers, or does it move with him? It would be interesting to find out by experiment.
It is very difficult to set a microphone at a given spot in an anechoic chamber, where there is no reference nearby. In such a case, visual intuition helps a lot.
A few other interesting fusions of the senses: touch and sight; smell, look and taste.