As mentioned in our previous post, Telepresence is all about immersion.
There are several key factors needed in order to create a truly immersive experience:
- Eye-to-eye contact between the speakers – speakers need to feel as if they are talking across a real conference table in a real conference room.
- Video should look fluid and as detailed as possible, hiding the fact that you aren’t looking at a real person but at an electronic representation of one.
- The video background should look like a continuation of the user’s room.
- Color and lighting should be even and seamless, as if all participants were sitting in the same physical room.
- Sound should be directional, so when someone speaks, everyone hears them as if they were really sitting at the corresponding distance and angle.
- Physical stimuli, such as robots or robotic arms, are often used for scientific and medical telepresence. This is a different field, currently absent from the corporate world, but it might find its way there as a solution for stay-at-home workers.

HP Telepresence Room
So how can we create that immersive experience?
The first thing that sets telepresence apart from video conferencing is that a telepresence solution consists of identical user endpoints: you use the same screen, furniture, camera, background objects and lighting as the users sitting at the other end. What you see is what the other users see, and that’s rarely the case with most video conferencing solutions.
The second important point is that telepresence solutions produce life-size images. This is important for the virtual presence experience.
Large telepresence rooms are also isolated, acoustically as well as physically, from other workrooms in the office, which helps reduce any environmental factor that might break the virtual presence experience.
Usually a telepresence system consists of the following hardware and software:
- Large, high-quality HD plasma/LCD screens – needed to show a life-size image of the participants in a fluid HD stream (1080p at 30 fps, or 720p at 60 fps).
- A codec capable of delivering an HD stream at low latency and low bandwidth. For 3-screen setups, usually 3 codecs are used.
- High-definition cameras with good optical and video performance.
- Conferencing furniture – a table (half table) and chairs. All must look the same in each of the conference rooms. Room structure, wallpaper and lighting should be exactly the same. To ensure optimal lighting and color balance, these are dictated by the telepresence supplier.
- Sound equipment, usually consisting of a high-quality noise-canceling microphone array and speakers. Sound is encoded using a high-quality (HD) audio encoding.
- Laptop/accessory inputs.
- Networking equipment.
- Telepresence control software, plus an MCU for large multipoint installations.
On the software and standards side, H.323 and SIP are used as common protocols in many (but not all) solutions. These are the same protocols used by most of the video conferencing equipment on the market, and they are interoperable with many types of telecommunication systems.
Telepresence rooms are constructed to avoid the usual video conferencing problems. The use of large screens with connected HD cameras allows the participants to talk to each other at eye level and look at each other as if they were sitting at the same table. Camera placement and participant distance are crucial to making this work; this is how it looks in a regular video conferencing solution:
As you can see, the user is looking at the screen instead of the camera. Because of that, the person on the other side will see her looking down rather than into his eyes. This is a major flaw in most video conferencing solutions. As you can’t put a camera in the middle of the screen, the best solution is to move the camera and the screen farther away from the user, which shrinks the angle between the camera and the on-screen face and so minimizes the problem. This is one of the major reasons that multi-user telepresence rooms look more natural than personal video conferencing solutions.
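To get a rough feel for why distance helps, the eye-contact error can be treated as simple trigonometry: the angle between looking at the on-screen face and looking into the camera. The sketch below is only an illustration; the function name, the 0.2 m camera offset and the two viewing distances are assumed numbers, not measurements from any real product.

```python
import math


def gaze_offset_degrees(camera_offset_m: float, viewing_distance_m: float) -> float:
    """Angle (in degrees) between the line of sight to the on-screen face
    and the line of sight to the camera, as seen from the user's position."""
    return math.degrees(math.atan2(camera_offset_m, viewing_distance_m))


# Hypothetical setup: the camera sits 0.2 m above the eyes shown on screen.
CAMERA_OFFSET_M = 0.2

# 0.6 m is an assumed desktop distance, 3.0 m an assumed room distance.
for distance_m in (0.6, 3.0):
    angle = gaze_offset_degrees(CAMERA_OFFSET_M, distance_m)
    print(f"viewing distance {distance_m} m -> gaze offset {angle:.1f} degrees")
```

With these assumed numbers the offset drops from roughly 18 degrees at desktop distance to under 4 degrees at room distance, which is why the downward gaze is far less noticeable in a telepresence room.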
Directional speakers and well-placed microphones also help develop eye contact: when a participant talks to you, you hear the voice coming from their relative direction in the virtual space.
The 1080p video stream runs at 30 frames per second. This rate is sufficient for fluid motion; anything less does not look real (although lower rates can be visually pleasing for artistic reasons, and movies are filmed at 24 fps). You can see the difference between 15 fps, 30 fps and 60 fps at this link (http://www.boallen.com/fps-compare.html). For real-life uses (and not a rotating cube), 30 fps is sufficient.
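As a back-of-the-envelope illustration of these frame rates, the sketch below computes how long each frame stays on screen and how many pixels per second the two HD formats mentioned earlier have to deliver. It assumes the standard 1920x1080 and 1280x720 frame sizes; the function names are made up for this example.

```python
def frame_interval_ms(fps: float) -> float:
    """Time each frame is displayed, in milliseconds."""
    return 1000.0 / fps


def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Raw (uncompressed) pixel throughput of a video format."""
    return width * height * fps


for fps in (15, 24, 30, 60):
    print(f"{fps} fps -> {frame_interval_ms(fps):.1f} ms per frame")

# Standard frame sizes assumed for 1080p and 720p.
print("1080p30:", pixels_per_second(1920, 1080, 30), "pixels/s")
print("720p60: ", pixels_per_second(1280, 720, 60), "pixels/s")
```

The two formats come out in the same ballpark of raw pixel throughput, which is one way to see 1080p30 and 720p60 as a trade between detail and motion smoothness.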
All of these requirements dictate a high-bandwidth connection in order to pass all of this information. Typically, 1080p30 quality with legacy encoding requires between 2 and 4 Mbps per screen, for a total of up to 12 Mbps per connection for a 3-screen system. Cisco’s “Home Telepresence” device, “Umi”, which does not support multipoint video conferencing, needs at least 3.5 Mbps for 1080p quality.
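The per-connection figure is just the per-screen range multiplied by the number of screens. A minimal sketch of that arithmetic, using the 2 to 4 Mbps per-screen range quoted above (the function name and default values are assumptions for illustration only):

```python
def connection_bandwidth_mbps(screens: int,
                              per_screen_low: float = 2.0,
                              per_screen_high: float = 4.0) -> tuple[float, float]:
    """Return the (low, high) bandwidth estimate in Mbps for one connection."""
    return screens * per_screen_low, screens * per_screen_high


for screens in (1, 3):
    low, high = connection_bandwidth_mbps(screens)
    print(f"{screens} screen(s): {low:.0f} to {high:.0f} Mbps")
```

For a 3-screen room this gives 6 to 12 Mbps, so the 12 Mbps figure is the upper end of the range. A single-screen device needing 3.5 Mbps sits inside the 2 to 4 Mbps single-screen range.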
A good example of a telepresence point of view is the invitation to the Telepresence Options virtual conference.
In the next article in the series, I’ll examine different telepresence solutions and technologies.




