What is CVR -- Collaborative Virtual Environment?
Collaborative virtual environments are a major focus of research here at EVL.
What is an avatar?
Each user in a persistent collaborative virtual environment is represented by an avatar, so that users at other sites in the same virtual environment know where they are and what they are looking at. A collaborative VR application sends the user's tracker information across the network; at the same time, it receives tracker information from the other users in the same virtual environment and displays their avatars at the correct translation and rotation in the environment.
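As a rough sketch of this exchange (the packet layout, socket code, and function names below are my own illustration, not the actual CVR networking code), each site could send its tracker sample every frame and poll for samples from the remote sites:

    #include <sys/socket.h>
    #include <netinet/in.h>

    // Sketch of per-frame avatar-state exchange over UDP.  The packet layout,
    // field names and helper functions are illustrative assumptions only.
    struct AvatarState {
        int   userId;    // which remote participant this sample belongs to
        float pos[3];    // tracked head position (x, y, z) in world coordinates
        float rot[3];    // tracked head orientation (heading, pitch, roll) in degrees
    };

    // Send this site's latest tracker sample to a remote site.
    void sendAvatarState(int sock, const sockaddr_in& remote, const AvatarState& s)
    {
        sendto(sock, &s, sizeof(s), 0,
               reinterpret_cast<const sockaddr*>(&remote), sizeof(remote));
    }

    // Non-blocking poll for a remote user's sample; true if a full sample arrived.
    bool recvAvatarState(int sock, AvatarState& s)
    {
        return recv(sock, &s, sizeof(s), MSG_DONTWAIT) == (ssize_t)sizeof(s);
    }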
Why live video avatars?
During the development of CVR applications at EVL, two types of avatars have been developed and used. The first type is the IV model avatar. These avatars can be designed by the users themselves or by CG artists. In the EVL N.I.C.E. project, a VR learning environment for children, various IV model avatars were designed and used.
The second type is the movie-based avatar: a VR participant is filmed on a turntable to produce a movie. Then, depending on where the current user is and where the avatar is, different frames from the movie are displayed for the left and right eyes to create stereo in VR. The avatar appears as a 3-D wax figure in the environment.
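The frame selection can be illustrated roughly as follows; the number of turntable frames, the stereo eye-offset angle, and all names here are assumptions for the sketch, not values from the actual system:

    #include <cmath>

    // Sketch: pick the turntable-movie frames for the left and right eyes,
    // given where the viewer stands relative to the avatar.
    const int   kNumFrames    = 120;   // assume the turntable was captured every 3 degrees
    const float kEyeOffsetDeg = 3.0f;  // assumed angular separation between eye views

    int frameForAngle(float deg)
    {
        float wrapped = std::fmod(std::fmod(deg, 360.0f) + 360.0f, 360.0f);
        return static_cast<int>(wrapped / 360.0f * kNumFrames) % kNumFrames;
    }

    void stereoFrames(const float viewer[3], const float avatar[3],
                      float avatarHeadingDeg, int& leftFrame, int& rightFrame)
    {
        // Direction from the avatar to the viewer, projected onto the floor plane.
        float dx = viewer[0] - avatar[0];
        float dz = viewer[2] - avatar[2];
        float viewDeg = std::atan2(dx, dz) * 180.0f / float(M_PI) - avatarHeadingDeg;
        leftFrame  = frameForAngle(viewDeg - kEyeOffsetDeg * 0.5f);
        rightFrame = frameForAngle(viewDeg + kEyeOffsetDeg * 0.5f);
    }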
As high-speed networks become more widely available and video compression technology advances, using live video to display remote users becomes the natural next step. This is also my most recent research and my master's project.
How can it be done?
The work involves the following steps: real-time video capture of a remote VR user; compression; sending the compressed video data over a high-speed network; decompression; feeding the decompressed video to the video interface of the supercomputer driving the CAVE or IDesk; and finally having the supercomputer process the video images, together with the tracker information of the remote user, to display the user's image at the correct position and rotation.
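A schematic of the two halves of this pipeline might look like the following; every type and function name is a placeholder standing in for the real capture, JPEG, network and Sirius calls, not an actual API:

    // Placeholder declarations for the sketch; the real implementation would
    // use the O2 video library, a JPEG codec, and the site-to-site network code.
    struct Frame  { /* raw video fields */ };
    struct Packet { /* compressed video fields */ };

    Frame  captureVideoFrame();               // grab a frame from the camera on the O2
    Packet jpegCompress(const Frame&);        // JPEG-compress it
    void   sendOverNetwork(const Packet&);    // ship it to the remote O2
    Packet receiveFromNetwork();
    Frame  jpegDecompress(const Packet&);
    void   feedToVideoInterface(const Frame&);  // Sirius video input on the Onyx

    void senderLoop()   { for (;;) sendOverNetwork(jpegCompress(captureVideoFrame())); }
    void receiverLoop() { for (;;) feedToVideoInterface(jpegDecompress(receiveFromNetwork())); }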
In our preliminary implementation, we used two IDesks, where we have better control over lighting for video capture. A camera is mounted on top of the screen of each IDesk, and its video output is fed to an O2 placed nearby; the audio output from the microphone is fed to the same O2. The video undergoes JPEG compression and, together with uncompressed audio, is sent across a T-100 based high-speed network to another O2 at the other IDesk. The audio is played directly from that O2 for the user at the second IDesk. The video is decompressed and sent to the Sirius video interface of the Onyx driving the IDesk. The VR application uses the video input as a texture and maps it onto a polygon, which is placed at the correct translation and rotation according to the tracker information of the remote user. The same procedure happens in both directions.
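A minimal sketch of that last step, drawing the live-video texture on a polygon positioned by the remote user's tracker data (the quad dimensions and parameter names are my own assumptions):

    #include <GL/gl.h>

    // Sketch: render the remote user's video texture on a quad placed at the
    // tracked position and heading.  videoTex is assumed to already hold the
    // current video frame.
    void drawVideoAvatar(GLuint videoTex, const float pos[3], float headingDeg)
    {
        const float w = 0.6f, h = 1.8f;          // assumed avatar width and height in metres
        glPushMatrix();
        glTranslatef(pos[0], pos[1], pos[2]);    // tracker translation of the remote user
        glRotatef(headingDeg, 0.0f, 1.0f, 0.0f); // tracker rotation about the vertical axis
        glBindTexture(GL_TEXTURE_2D, videoTex);
        glEnable(GL_TEXTURE_2D);
        glBegin(GL_QUADS);                       // one flat polygon carrying the video image
        glTexCoord2f(0, 0); glVertex3f(-w / 2, 0, 0);
        glTexCoord2f(1, 0); glVertex3f( w / 2, 0, 0);
        glTexCoord2f(1, 1); glVertex3f( w / 2, h, 0);
        glTexCoord2f(0, 1); glVertex3f(-w / 2, h, 0);
        glEnd();
        glDisable(GL_TEXTURE_2D);
        glPopMatrix();
    }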
To take full advantage of the hardware chroma-keying capabilities of the Onyx Sirius video, we set up a blue screen behind the user, so that the background of the video images becomes transparent when the chroma-key values are set correctly. Although the video avatar is just a texture on a flat polygon, the background cutout makes the avatar appear convincingly immersed in the 3D environment.
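The keying itself is done by the Sirius hardware; purely as an illustration of the idea, a software version might derive a per-pixel alpha from the distance to the key colour (the thresholds and RGBA pixel layout below are assumptions):

    #include <cmath>

    // Illustration only: what chroma keying does in principle.  Pixels close
    // to the key (blue-screen) colour are made fully transparent.
    void softwareChromaKey(unsigned char* rgba, int numPixels,
                           unsigned char keyR, unsigned char keyG, unsigned char keyB,
                           float threshold)
    {
        for (int i = 0; i < numPixels; ++i) {
            unsigned char* p = rgba + 4 * i;
            float dr = float(p[0]) - keyR;
            float dg = float(p[1]) - keyG;
            float db = float(p[2]) - keyB;
            float dist = std::sqrt(dr * dr + dg * dg + db * db);
            p[3] = (dist < threshold) ? 0 : 255;   // transparent where it matches the key
        }
    }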
2 1/2 D video avatars
The video image is projected onto a static head model using a projective texture matrix, which allows the view from the camera to be mapped back onto the head model.
Loading the video image into texture memory can be done in a Performer draw callback, either with Performer routines or with a direct OpenGL call after the Performer texture has been bound.
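For the direct-OpenGL route, the per-frame update inside the draw callback might look like this sketch, where videoPixels is assumed to point at the latest decompressed frame and the texture has already been created at the video resolution:

    #include <GL/gl.h>

    // Sketch: replace the texels of an existing texture with the newest video
    // frame.  Updating in place with glTexSubImage2D is cheaper than creating
    // a new texture every frame.
    void updateVideoTexture(GLuint texId, const unsigned char* videoPixels,
                            int videoW, int videoH)
    {
        glBindTexture(GL_TEXTURE_2D, texId);   // or let Performer bind it via the pfTexture
        glTexSubImage2D(GL_TEXTURE_2D, 0,
                        0, 0, videoW, videoH,
                        GL_RGBA, GL_UNSIGNED_BYTE,
                        videoPixels);
    }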
Once the image is in texture memory, the idea is to project the texture onto the head model. The first stage of the texture projection is to apply texture coordinate generation for the camera in OBJECT_LINEAR space.
We want to generate texture coordinates for the camera, with r along the camera viewing vector and s and t along the x and y axes in camera space. An easy way to accomplish this is to first generate s, t and r coordinates along the x, y and z OBJECT_LINEAR axes, and then use texture matrix transformations to translate and rotate these coordinates to the camera's position and orientation.
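A sketch of this first stage in plain OpenGL; the texture matrix that moves the generated coordinates into camera space is set up in the following step:

    #include <GL/gl.h>

    // Sketch: generate s, t and r texture coordinates along the object-space
    // x, y and z axes (OBJECT_LINEAR texgen).
    void setupProjectiveTexGen()
    {
        const GLfloat sPlane[4] = { 1, 0, 0, 0 };   // s = object-space x
        const GLfloat tPlane[4] = { 0, 1, 0, 0 };   // t = object-space y
        const GLfloat rPlane[4] = { 0, 0, 1, 0 };   // r = object-space z
        glTexGeni(GL_S, GL_TEXTURE_GEN_MODE, GL_OBJECT_LINEAR);
        glTexGeni(GL_T, GL_TEXTURE_GEN_MODE, GL_OBJECT_LINEAR);
        glTexGeni(GL_R, GL_TEXTURE_GEN_MODE, GL_OBJECT_LINEAR);
        glTexGenfv(GL_S, GL_OBJECT_PLANE, sPlane);
        glTexGenfv(GL_T, GL_OBJECT_PLANE, tPlane);
        glTexGenfv(GL_R, GL_OBJECT_PLANE, rPlane);
        glEnable(GL_TEXTURE_GEN_S);
        glEnable(GL_TEXTURE_GEN_T);
        glEnable(GL_TEXTURE_GEN_R);
    }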
Once this is done, the perspective divide of x and y by z for the camera needs to be applied to the texture coordinates, so a glFrustum call matching the camera's frustum is applied to the texture matrix. This division of s and t by r (the distance from the sensor) generates s and t texture coordinates between -1 and 1 within the viewing frustum of the camera.
The next step is to change this -1 to 1 coordinate mapping to the 0 to 1 mapping of the image stored as an OpenGL texture, which is another simple texture matrix manipulation. The result is that the texture of the camera view appears projected back onto the head model from the sensor's position and orientation.
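Putting the whole texture matrix together might look like the following sketch; the frustum values and camera pose parameters are illustrative only, and in practice the frustum must match the real camera's field of view:

    #include <GL/gl.h>

    // Sketch: build the projective texture matrix.  Since OpenGL applies the
    // calls in reverse order, the generated object-space coordinates are first
    // moved into camera space, then put through the camera frustum, and finally
    // remapped from [-1, 1] to the [0, 1] range of the stored video texture.
    void setupProjectiveTextureMatrix(float camX, float camY, float camZ,
                                      float camHeadingDeg)
    {
        glMatrixMode(GL_TEXTURE);
        glLoadIdentity();
        glTranslatef(0.5f, 0.5f, 0.0f);              // bias:  [-1, 1] -> [0, 1]
        glScalef(0.5f, 0.5f, 1.0f);                  // scale: [-1, 1] -> [0, 1]
        glFrustum(-0.5, 0.5, -0.375, 0.375, 1.0, 100.0); // assumed camera frustum
        glRotatef(-camHeadingDeg, 0.0f, 1.0f, 0.0f); // inverse of the camera orientation
        glTranslatef(-camX, -camY, -camZ);           // inverse of the camera position
        glMatrixMode(GL_MODELVIEW);
    }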