Kinect for Windows Gets All Up In Your Grill

A recent post from the folks over at Channel 9 about a Kinect for Windows development has caught my attention. The original article is a bit on the technical side and goes into lots of great detail on how to use the API and on various resolution issues, which quite frankly is a bit too technical to get really excited over, especially with this announcement competing with E3 announcements. I’ve decided to take a step back from all the super technical code speak and talk about why this is an important step for NUI (Natural User Interfaces).

Connecting with a person in a face to face environment has many advantages. The ability to hear vocal inflections that provide audible cues on a person’s mood helps in the communication between people. You can tell if a person may be getting upset or excited by the sound of their voice. Something not easily achieved in text, UNLESS I PUT IT ALL IN CAPS! (I was excited, not mad if you couldn’t tell). But other valuable cues to us are those that our face creates. A smile, a wink, a nod, a roll of the eyes… can all say volumes about our level of interest in a conversation.

What the Kinect for Windows "sees" | Channel 9

With regard to face tracking, the article on Channel 9 points out:

The face tracking engine computes 3D positions of semantic facial feature points as well as a 3D head pose. The Face Tracking SDK could be used to drive virtual avatars, recognize facial expressions, Natural User Interfaces and other face related computer vision tasks.

Since I am tackling this subject at the height of E3, let’s look at what this could mean for you PC gamers out there. Imagine a fully immersive experience in a virtual world, such as WoW or SWKOTOR, where you are interacting with another player. Instead of the standard deadpan expression and speech bubbles, you could have real voice chat (other than with your guild on Ventrilo) combined with your character mimicking your own facial expressions… of course, with the voice chat option you won’t be able to play out your fantasy of being a female Night Elf (that is, unless they come up with some really awesome voice changing software)…

As for a more application-based use, say you’re using the new Kinect for Windows to write a blog post. You start with a speech-to-text command: “Computer, take a note.” You’re in the moment, orating a great piece of writing (now there’s an image), and you change your mind about what you wrote. So you pause and direct the computer to delete something. Before deleting the text, the computer highlights it and asks: “Is this what you want me to delete?” At which point you could simply nod yes. All of this increases the fluidity of Natural User Interfaces.
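For the curious, here is roughly what "nod yes" could look like in code. This is only a minimal sketch in Python, not the Face Tracking SDK's API: the function name `is_nod`, the 10 degree threshold, and the convention that a negative pitch angle means the head is tilted down are all my own assumptions. It simply takes a short run of head pitch angles (the kind of 3D head pose data the Channel 9 article describes) and checks for a dip followed by a recovery.

```python
from typing import Sequence


def is_nod(pitch_degrees: Sequence[float], threshold: float = 10.0) -> bool:
    """Hypothetical nod check: the head dips by at least `threshold` degrees
    and then returns close to where it started.

    Assumption: pitch is in degrees and negative values mean "tilted down".
    """
    if len(pitch_degrees) < 3:
        return False  # too few samples to see a dip and a recovery

    start = pitch_degrees[0]

    # Find the sample where the head is tilted furthest down.
    lowest_index = min(range(len(pitch_degrees)), key=lambda i: pitch_degrees[i])
    dipped = start - pitch_degrees[lowest_index] >= threshold

    # After the lowest point, did the head come back to within half the
    # threshold of its starting angle?
    recovered = any(
        start - pitch <= threshold / 2
        for pitch in pitch_degrees[lowest_index + 1:]
    )
    return dipped and recovered


if __name__ == "__main__":
    print(is_nod([0, -5, -12, -6, -1]))   # dip, then recovery
    print(is_nod([0, -1, 0, 1, 0]))       # barely moved
```

A real application would also want a time window and some smoothing, since a single dip could just as easily be you glancing at the keyboard.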

Here are a few technical highlights that I found interesting:

  • The Face Tracking engine tracks faces at the speed of 4-8 ms per frame depending on your PC resources. It does its computations on a CPU (does not use GPU).
  • Light – a face should be well lit without too many harsh shadows on it. Bright backlight or sidelight may make tracking worse.
  • Distance to the Kinect camera – the closer you are to the camera the better it will track. The tracking quality is best when you are closer than 1.5 meters (4.9 feet) to the camera. At closer range Kinect’s depth data is more precise and so the face tracking engine can compute face 3D points more accurately.
  • Occlusions – if you have thick glasses or a Lincoln-like beard, you may have issues with the face tracking. This is still an open area for improvement.
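To put that 4-8 ms figure in perspective, here is a quick back-of-the-envelope calculation in Python. The 30 frames per second camera rate is my assumption (it is not stated in the highlights above), and `frame_budget_share` is just a helper name I made up for the illustration.

```python
KINECT_FPS = 30  # assumption: the camera delivers about 30 frames per second


def frame_budget_share(tracking_ms: float, fps: float = KINECT_FPS) -> float:
    """Fraction of one camera frame's time that face tracking would use."""
    frame_ms = 1000.0 / fps  # time available per frame, in milliseconds
    return tracking_ms / frame_ms


if __name__ == "__main__":
    for ms in (4, 8):
        max_rate = 1000 / ms  # frames per second the tracker alone could handle
        share = frame_budget_share(ms)
        print(f"{ms} ms per frame: up to {max_rate:.0f} frames/s, "
              f"{share:.0%} of a {KINECT_FPS} fps frame")
```

Under that assumption, the tracker would use roughly an eighth to a quarter of the time available per frame, which leaves the rest of the CPU time for whatever the application itself is doing.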


So there you have it: why face tracking will improve how we interface with computers!

-Eric Wilkinson



Nikolai Smolyanskiy works at Microsoft on computer vision projects. Most recently, his team released Face Tracking SDK in Kinect for Windows v1.5. Prior to that, he worked in a variety of areas like Windows, Search, Cloud Services, MS Office, Optical Character Recognition, navigational systems.

Greg Duncan

