Original Link: http://www.anandtech.com/show/4057/microsoft-kinect-the-anandtech-review
Microsoft Kinect: The AnandTech Review
by Brian Klug on December 9, 2010 3:20 PM EST
For better or worse, new user interfaces are all the rage right now in the console gaming scene. Nintendo was first on the block in 2006 with 3D motion-controlled user interfaces, leveraging a unique combination of IR sensors and a 3-axis MEMS accelerometer in a handheld remote. The motion-controlled Wii has enjoyed a nice long run being the sole platform for motion-assisted gaming. Flash forward to late 2010, and Microsoft and Sony have both readied their responses to the Wii - the Microsoft Kinect and Sony Move, respectively.
It’s taken the greater part of four years (and one name change) for the software giant’s answer to make it to market, but Kinect is finally out and ready for mass consumption. We’ve spent nearly a month playing with Kinect and are finally ready to release our impressions.
First off, the Kinect is fundamentally different from Sony and Nintendo’s offerings. Instead of relying on handheld controllers and motion targets, the Kinect uses a purely optical solution which we’ll get to in a bit. The result is that there’s only one thing to purchase to add Kinect to an existing Xbox 360 install - the $149.99 Kinect sensor itself. We purchased a retail kit on launch date, which comes with the sensor itself, cables, some paperwork, and Kinect Adventures.
Packaging for the standalone Kinect package matches the style of the Xbox 360 S packaging - it’s a lot of green and purple. On the box, Microsoft stipulates that you need at least 6’ of free space in front of the sensor to play, which seems a bit optimistic as I’ll show later. There’s an unboxing gallery below in case you want to see for yourself. The Kinect is securely seated in a foam recessed area.
Inside the box is the Kinect sensor itself, Kinect Adventures, and a suite of cables.
First up is an orange-tipped USB-like cable with a special connector for connecting the Kinect to the Xbox 360 S. This cable is keyed differently than a normal USB cable and allows the Kinect to draw power from the console itself instead of requiring a standalone power supply.
If you’ve got an Xbox 360 S, this is the only cable you need to use Kinect. The cable physically looks like USB, but the connector inside is visibly different besides the obvious shape difference.
The rest of the cables are for if you’re connecting the Kinect to an older generation Xbox 360. For that, you get a power supply cable which breaks off into a Y connector - one end is orange tipped and connects to the cable coming out of the Kinect, the other goes into your original-gen Xbox 360.
But wait, what about that odd-looking grey cable?
It’s a WiFi extension cable. Remember that unlike the Xbox 360 S, the older Xbox 360 has just one USB port on the rear, and two in the front. I have an original Xbox 360 Pro from launch date, and also happen to use the Xbox dual band 802.11n adapter to connect wirelessly. If you’re using a setup like this, you’re going to need to run a cable from the wireless adapter - using the extension cable - all the way around to the front of the box and into one of the front USB ports. It’s an unaesthetic solution that’s an unfortunate consequence of the old Xbox 360 simply not being designed for all these accessories. It’s a bit disappointing there isn’t a hub involved somewhere here like what Microsoft did with the ill-fated HD DVD player (which included a notch and USB port on the back for the displaced wireless adapter), but perhaps bandwidth considerations over the USB hub contributed. If you’ve still got a Microsoft HD DVD player kicking around and connected, things could theoretically be getting very crowded with daisy chained USB devices. With an old Xbox 360 and Kinect hooked up, you eat up two power ports, and with a wireless adapter, are left with only one available USB port on the front for connecting controllers and USB storage. If you’re like me and have your profile stored on a USB drive (so you can migrate from box to box, ostensibly) you end up using all those ports.
With the Kinect connected using the power supply, I measured a total power draw of 5 watts, which is pretty respectable. The Prime Sense specification says 2.25 watts, but that’s probably before losses are incurred from the power supply and additional overhead from thermal management.
I used to be attached to my older Xbox 360 purely for aesthetic reasons - it looked cool with a different case, and managed to not sound like a hair dryer after a fan replacement, but my venerable launch console stopped working with the latest dashboard update that brought UI changes and Kinect support (seriously). My original intentions were to try Kinect out on the old Xbox 360 Pro and also the new Xbox 360 S, but the old console alternates between dead and alive for half hour periods so much that it isn’t worth the frustration. One warranty-repair RROD and two x-clamp RROD repairs later, the thing was on its last legs anyway. I did want to illustrate how seriously out of control the cable situation can be - ironically for a new user interface that’s entirely wireless and relies on no controllers at all. Above you can see the cable extend the wireless adapter all the way to the front. Toss in the Y connector, and there can be a heck of a lot of cables running around.
Again, it’s obvious that the least painful way to use Kinect is with the new Xbox 360 S console, thanks in no small part to its higher-power custom USB port. Microsoft will also sell you an Xbox 360 S 4 GB bundled with the Kinect sensor and an extra game for $349.98.
Kinect - The Sensor
The sensor itself matches the style and appearance of - you guessed it - the new Xbox 360 S. That means its every surface is glossy black plastic which looks nice, but immediately shows fingerprints, dust, and scratches. If you’re OCD like me and don’t happen to live in a clean room, this is a constant but minor annoyance. Oh well, at least it matches the console.
The Kinect is noticeably bottom-heavy, with a center of gravity that most likely sits somewhere in the base. This makes sense since the horizontal arm housing the optical system and microphone array pivots vertically - the base is a stage of sorts which lets the top arm sweep through around 30 degrees to adapt itself to placement on top of a TV or below on a table, basically to suit your entertainment setup.
There’s a three-axis MEMS accelerometer onboard ostensibly for determining the tilt of the sensor relative to the base of the device - if you put Kinect on top of a TV for example, there’s no assurance that the surface the Kinect is resting on is normal to the ground.
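To make the tilt idea concrete, here is a minimal sketch of how a static three-axis accelerometer reading can be turned into tilt angles. The axis convention and the function name `tilt_degrees` are assumptions for illustration, not anything taken from Kinect documentation, and the math only holds while the sensor is at rest so that gravity is the only acceleration it sees.

```python
import math

def tilt_degrees(ax, ay, az):
    """Return (pitch, roll) in degrees from a static accelerometer reading.

    Assumed axis convention (hypothetical): x forward, y left, z up.
    A level sensor at rest reads roughly (0, 0, 1) in units of g.
    """
    # Pitch: how far the forward axis is tipped toward gravity.
    pitch = math.degrees(math.atan2(ax, math.hypot(ay, az)))
    # Roll: rotation about the forward axis.
    roll = math.degrees(math.atan2(ay, az))
    return pitch, roll

# A sensor sitting on a surface that leans forward by 30 degrees.
pitch, roll = tilt_degrees(math.sin(math.radians(30)), 0.0, math.cos(math.radians(30)))
```

With a reading like this, the motorized base could in principle compensate for a surface that isn't level, though how the Kinect actually uses the value isn't something I can confirm.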
Down on the bottom you’ll notice an obvious grating, under which hides a four-microphone array for some beamforming goodness. The result is that the Kinect can sense which direction sounds are coming from and isolate a single speaker, something an omnidirectional or even stereo microphone can't do as well. That's critically important for picking out voice commands while there's music and game noise blaring in the same room.
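The direction-finding part of a microphone array boils down to timing: sound from off to one side reaches one microphone slightly before the next. Below is a rough sketch for a single pair of microphones under a far-field assumption. The microphone spacing and the function name are hypothetical, and this is not Microsoft's algorithm - a real beamformer combines all four channels and works per frequency band.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 C

def arrival_angle_degrees(delay_s, mic_spacing_m):
    """Angle of a far-field source relative to broadside for one mic pair.

    delay_s is how much later the sound reaches the second microphone.
    0 degrees means the source is straight ahead of the pair.
    """
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp against measurement noise
    return math.degrees(math.asin(ratio))

# Hypothetical 10 cm spacing and a delay equal to half the maximum possible.
angle = arrival_angle_degrees(0.5 * 0.10 / SPEED_OF_SOUND, 0.10)
```

Once the angle is known, delaying each channel to line up sound from that direction and summing them reinforces the speaker and averages down everything else, which is the basic idea behind beamforming.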
The most prominent physical feature on the Kinect, however, is its relatively sophisticated optical system. From left to right is an IR laser projector (more on this in a second), RGB camera, and IR camera. Between the RGB camera and IR laser is simply a green LED for status - it plays no role in the actual optical system. Also on the bottom of the sensor is a nondescript class-1 laser product notice - yes, the Kinect indeed uses an IR laser, but it’s completely eye safe at class-1.
There’s also a requisite laser warning buried in one of the three booklets that ship inside the Kinect sensor box, but you’ll notice that Microsoft has been careful to avoid drawing attention to the fact that there’s a laser in Kinect - people have a strange aversion to looking at or into perfectly eye-safe lasers. There’s no marked wavelength that I could find, but the Kinect’s IR projector is visible to my naked eye and exhibits trademark laser speckle - my guesses are that the laser is between 750 and 900 nm, with 808 and 880 nm being common commercial “IR” laser diode wavelengths.
From the Xbox’s perspective, there are two separate video streams which come from the Kinect - one 640x480 (VGA) 30FPS stream from the RGB color camera, and one 640x480 11-bit 30FPS image stream which is the output 3D depth image after processing. Kinect implements a subset of Prime Sense’s natural interaction reference design, which originally specified a much higher resolution color sensor and 60FPS depth image. Other subtle differences are that Prime Sense specifies two audio streams, whereas Kinect uses an obvious four, but such tweaks are ultimately the result of Microsoft having to maintain a delicate balance between optical performance and staying within a reasonable price point.
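For a sense of scale, the raw data rate of the depth stream follows directly from those numbers. This is simple arithmetic, assuming the 11-bit samples are sent unpadded and uncompressed, which the source doesn't confirm. The color stream's bits per pixel aren't stated either, so that is left as a parameter.

```python
def raw_bits_per_second(width, height, bits_per_pixel, fps):
    """Uncompressed data rate of a video stream in bits per second."""
    return width * height * bits_per_pixel * fps

# Depth stream as described: 640x480, 11 bits per pixel, 30 frames per second.
depth_bps = raw_bits_per_second(640, 480, 11, 30)  # 101376000 bits/s, about 101.4 Mbit/s
```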
How Kinect senses depth is half of the magic behind how it works - the other half is software. There’s actually not a lot behind how Kinect creates that 11-bit depth image once you understand how it works. Kinect uses a structured light IR projector and sensor system - something widely used in both industrial manufacturing and inspection. The principle of structured light sensing is that given a specific angle between emitter and sensor, depth can be recovered from simple triangulation. Expand this to a predictable structure, and the corresponding image shift directly relates to depth.
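The triangulation step can be sketched in a few lines. For an idealized, rectified projector-camera pair, depth is z = f * b / d, where f is the camera focal length in pixels, b is the baseline between projector and camera, and d is the observed shift of a pattern feature in pixels. The focal length and baseline below are hypothetical placeholders, not measured Kinect values, and the real device likely measures shifts relative to a stored reference image rather than using this textbook form directly.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in meters for an idealized rectified projector-camera pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Hypothetical numbers: 600 px focal length, 7.5 cm baseline.
near = depth_from_disparity(600.0, 0.075, 45.0)   # larger shift, closer surface
far = depth_from_disparity(600.0, 0.075, 22.5)    # smaller shift, farther surface
```

The key takeaway is the inverse relationship: nearby surfaces shift the pattern a lot, distant ones only a little.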
Example structured-light optical system, from "Spacecraft hazard avoidance utilizing structured light"
In Kinect, that system is comprised of an IR laser (which I’ve already touched on), a carefully engineered diffraction grating (in this case, the diffraction grating is actually a computer-generated hologram - CGH - with a specific periodic structure), and a relatively standard CMOS detector with a band-pass filter centered at the IR laser wavelength. Inspecting the sensor with the naked eye, you can easily see the characteristic rainbow-effect from the CGH atop the IR projector, and that shiny layer on the IR-sensitive CMOS is likely a band-pass filter.
That carefully-engineered CGH produces a specific periodic structure of IR light when the laser shines through it. Inside is a computationally-derived periodic structure, undoubtedly of square cells, which diffracts the light into that pattern. The first day I had Kinect, I immediately set out to find what that pattern was, and it’s simple to measure. Although the laser itself is intense enough to see with the naked eye at the source, the projected pattern thankfully isn’t, but the solution is trivial. Stick a Lambertian reflector (read: piece of paper) in front of Kinect, and you can see the pattern with any IR-sensitive device. Though my DSLR has a low-pass IR cut filter like most cameras (since IR is generally undesirable in visible imaging systems), the most immediate device I found which was sensitive enough to photograph the pattern was a smartphone camera - an iPhone 4. The image is obviously false-color - this radiation is actually in the far red/near IR part of the spectrum. That pattern is below:
You can immediately notice some things - first, there are 9 clearly visible repeating blocks with a specific semi-random pattern of points inside. This structure is repeated across the blocks, and it’s obvious this pattern arises from a holographic structure from the bright 0-order point at the center of each block. Note also how the structure is also engineered to be spherical, which gives it that curvy edge shape - the paper is actually being held perpendicular to the projector. This projection grid defines the field of view of the Kinect sensor, and the distance between points inside the grid likewise defines the spatial resolution of the depth sensor. I’m only holding the paper about a foot away from the projector. Close up inside one of those cells you can see the structure which consists of many small points:
The projected image doesn’t change in time - it’s fixed this way. The IR CMOS sensor images this pattern projected onto the room and scene. Because the camera sits a few inches from the projector, the Kinect can back out the corresponding depth image from the displacements in the semi-random projected pattern. That computation is done onboard the Kinect itself, and it’s entirely possible (read: likely) that the IR sensor inside the Kinect has a higher resolution than the 640x480 of the resulting image.
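Measuring those displacements is a matching problem: take a small window of the known pattern and find where it landed in the observed image. The toy example below does this along a single row with a sum-of-absolute-differences search. It is an illustration of the general block-matching idea, not the proprietary algorithm running inside the Kinect, and the row contents, window size, and search range are all made up.

```python
import random

def best_shift(reference, observed, start, window, max_shift):
    """Find how far a window of `reference` moved to the right in `observed`."""
    patch = reference[start:start + window]
    best, best_cost = 0, float("inf")
    for shift in range(max_shift + 1):
        candidate = observed[start + shift:start + shift + window]
        if len(candidate) < window:
            break  # ran off the end of the row
        # Sum of absolute differences: 0 means a perfect match.
        cost = sum(abs(a - b) for a, b in zip(patch, candidate))
        if cost < best_cost:
            best, best_cost = shift, cost
    return best

# A semi-random row of dots (1) and gaps (0), then the same row shifted right.
rng = random.Random(0)
reference_row = [rng.randint(0, 1) for _ in range(200)]
true_shift = 7
observed_row = [0] * true_shift + reference_row[:-true_shift]

recovered = best_shift(reference_row, observed_row, start=50, window=32, max_shift=20)
```

This is also why the pattern is semi-random rather than a regular grid: a window of random dots is very unlikely to match anywhere except at its true position.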
When iFixit tore apart the Kinect, it was immediately apparent that Microsoft had devoted a lot of engineering into the sensor’s cooling solution, which at first thought seems strange - why do two cameras and a bunch of microphones need fans? Cue RROD jokes. Further, in the disassembly photos, I noticed a Peltier cooler on the back of the IR laser diode.
The Peltier cooler is what's being pried off - Courtesy iFixit
The reason for the cooler should now be obvious - diffraction gratings are extremely wavelength-dependent, and the Kinect functions or doesn’t based on its ability to properly detect the projected IR image. The result is that the IR diode likely needs to be kept inside a window of under 10 degrees C of some temperature so the laser’s peak output is at or very close to the wavelength the CGH was designed to work at. The other consideration is that the top of most TVs where you could conceivably place a Kinect can be notably warm. Kinect’s somewhat overengineered thermal design now makes sense - the Peltier cooler likely gets hot on the back side which connects to the metal base plate, the front which touches the diode laser gets cool, and the fan sweeps air through the whole box when required. RROD jokes aside, making sure that the system is carefully thermally regulated is an important part of the optical design.
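The wavelength dependence comes straight from the grating equation, sin(theta) = m * lambda / d, where d is the grating period and m the diffraction order. The sketch below shows how a temperature-driven wavelength drift moves a diffracted spot. Every number in it is an assumption for illustration: 830 nm is just a value inside the range I guessed at earlier, the 5 micron period is invented, and 0.3 nm per degree C is a typical ballpark for laser diodes rather than a measured Kinect figure.

```python
import math

def diffraction_angle_degrees(wavelength_nm, period_nm, order=1):
    """Angle of a diffraction order at normal incidence (grating equation)."""
    s = order * wavelength_nm / period_nm
    if abs(s) > 1:
        raise ValueError("this order does not propagate")
    return math.degrees(math.asin(s))

ASSUMED_WAVELENGTH_NM = 830.0   # hypothetical design wavelength
ASSUMED_PERIOD_NM = 5000.0      # hypothetical grating period
ASSUMED_DRIFT_NM_PER_C = 0.3    # typical laser diode drift, assumed

def angle_shift_degrees(delta_t_c):
    """How far the first order moves when the diode warms by delta_t_c."""
    cold = diffraction_angle_degrees(ASSUMED_WAVELENGTH_NM, ASSUMED_PERIOD_NM)
    warm = diffraction_angle_degrees(
        ASSUMED_WAVELENGTH_NM + ASSUMED_DRIFT_NM_PER_C * delta_t_c,
        ASSUMED_PERIOD_NM,
    )
    return warm - cold
```

Since depth is recovered from small shifts in where the dots land, any shift caused by temperature instead of geometry shows up as depth error, which is a plausible reason to hold the diode temperature steady.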
I decided I was going to see if I could make the Kinect fans turn on, or the device overheat. I left the Kinect sensor turned on for 48+ hours atop my rather-warm LCD TV, and later an even hotter plasma TV and could never once feel it get noticeably warm, or even detect airflow through the Kinect. It’s possible that the fans were spinning, but if that’s the case I couldn’t detect it.
Kinect - Environmental Constraints
There are a lot of environmental constraints that the Kinect imposes, but the system actually works extremely well. Inside the manual and throughout the literature distributed with Kinect are notices about reducing ambient sunlight in the room where you’re using the Kinect. At the same time, you need ambient light for the RGB color camera to work - this seems somewhat confusing until you consider that sunlight also contains spectral components at the same wavelength the Kinect operates at. Lots of ambient sunlight thus decreases the visibility of those spots on the room and players, and could conceivably make depth detection much worse. That sort of reminds me how the Wii sensor bar could be emulated with candles some distance apart atop the television, or how playing with the Wii next to a fireplace could easily make the system glitch out.
In practice, I never once noticed problems with depth detection on Kinect that I’d blame on ambient noise from stray light sources - even in a room with, yes, a huge fireplace (hey, it’s a reasonable test). The more important consideration actually turns out to be the relatively poor sensitivity of the RGB camera, which requires lots of room lighting to get decent framerate or augment facial detection.
The bigger consideration for play space is its size - you need a big room to use Kinect. On the back of the Kinect packaging, Microsoft notes that a distance of 6’ from TV to players is requisite at minimum. After a lot of testing, I’d say 6’ is optimistic, even for one person. In actuality I’d say 9’-12’ is closer to what is optimal, especially if you want to play with a second person, and later inside games you’ll be told as much.
Sensor position itself makes a lot of difference here - Kinect can go either on top of or below the TV, and Microsoft recommends anywhere from 2’ to 6’ off the ground. Ultimately, the bar is that Kinect needs to be able to see your entire person - feet, head, and some distance around you to move, jump, and flail your arms around inside. If you place the Kinect on an entertainment center table below the TV, the hazard is that the depth of the table will block part of the depth image at the bottom. I had much more success with placing the Kinect atop TVs, especially when you encounter LCDs and plasmas of the wall-mounted sort.
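The need to see your whole body translates into a minimum distance through basic trigonometry. Here is a rough sketch that treats the sensor as level and centered on the player, which real placements are not, and uses a vertical field of view of 43 degrees. That number is an assumption on my part (it's a commonly quoted figure, but I haven't measured it).

```python
import math

def visible_extent(distance, fov_degrees):
    """Height (or width) covered by the field of view at a given distance."""
    return 2.0 * distance * math.tan(math.radians(fov_degrees) / 2.0)

def min_distance(required_extent, fov_degrees):
    """Closest distance at which `required_extent` fits in the field of view."""
    return required_extent / (2.0 * math.tan(math.radians(fov_degrees) / 2.0))

ASSUMED_VERTICAL_FOV = 43.0  # degrees, assumed rather than measured

# Roughly 2 m of vertical coverage: a standing adult plus some headroom.
needed_m = min_distance(2.0, ASSUMED_VERTICAL_FOV)
```

Under those assumptions the answer comes out around 2.5 m, a bit over 8’, and raising your arms or jumping pushes it further back.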
The first area I tested was my living room, which is reasonably large (or so I used to think), and I played about 7 feet away from the sensor. I moved a coffee table, but unfortunately moving my couch is prohibitively difficult, and I didn’t have space to get rid of it. I was able to play with two people, but it was a bit cramped, and I felt like I needed more distance from the TV the whole time.
The consideration I didn’t think of is what’s above. Needless to say, you need space to flail your arms around, especially if you’re tall. Above me within arm’s reach is a ceiling fan. That’s a serious problem if you get excited and jump with your arms above your head, as my girlfriend experienced a few minutes into our first session of Kinect Adventures. Ultimately, I can’t blame Kinect for my living room being small, but you need to be aware of the space constraints - Kinect won’t work in dorm rooms or tiny play areas, but does work with at least average sized living rooms. It’s entirely constrained by the field of view of the sensor and CGH, and admittedly it’s hard to make very wide angle CGH systems with good uniformity and repeatability.
I later tried Kinect in a much larger space with more than 12’ of space in front of a wall-mounted plasma, with the Kinect mounted atop the display about 6.5’ above the ground. I had much more success here both with detection of my feet, and two-player space.
If you’re running out of space or are playing with just barely enough, the problems that arise are what you’d expect. Kinect won’t see your feet very well, or will miss your hands, but thankfully most of the time you’re given feedback visually on the display that you need to move back into the optimal position.
The interesting bit of the Kinect space story is that closer to the sensor, you get better depth resolution at the cost of having less space to work in, yet far away you sacrifice depth resolution for more working area. It’s a simple consequence of field of view and the angular separation of those points in the CGH image. Prime Sense advertises that their reference design has a depth resolution of 1 cm at two meters from the sensor. As you move further away, that depth resolution gets worse (larger).
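How quickly resolution degrades can be estimated from the triangulation model. If depth is z = f * b / d, then a fixed one-step error in the measured shift d produces a depth error proportional to z squared. Scaling from the advertised 1 cm at 2 m gives the sketch below. The quadratic law is an assumption that follows from the idealized model, not a published Kinect specification.

```python
def depth_resolution_m(distance_m, ref_resolution_m=0.01, ref_distance_m=2.0):
    """Estimated depth resolution, scaled quadratically from a reference point.

    Defaults use the advertised reference-design figure of 1 cm at 2 m.
    """
    return ref_resolution_m * (distance_m / ref_distance_m) ** 2

at_two_m = depth_resolution_m(2.0)    # the reference point itself
at_four_m = depth_resolution_m(4.0)   # double the distance, four times coarser
```

So backing up from 2 m to 4 m to fit a second player would, under this model, take you from about 1 cm to about 4 cm of depth resolution.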
The final note is that I’ve actually noticed that clothing choice makes a difference. In fact, there are a few warnings in the literature shipped with games that note some clothing choices may make detection hard. I never had serious issues being detected outside of one special combination of cargo shorts and a specific footwork-heavy move in Dance Central, but more on that later. Common sense applies here, and essentially you need to look human and have some depth contrast visible for Kinect to detect your limbs.
Setup and Calibration
The first time you connect and fire up your console (Xbox 360 or Xbox 360 S), you get to go through the setup wizard. If you don’t have Xbox Live, you’re told to sign up for a trial period and that you really should get it so you can download necessary updates. Actually, even before the trial wizard pops up, you’re prompted to install a necessary update if you somehow don’t have the tweaked Xbox 360 dashboard (NXE 2.0?).
After that, you’re told to position the sensor appropriately, check background sound, calibrate the array microphone, and then decide whether you want to use the Kinect microphone for Xbox Live party chat. The first time I ran through this wizard, it complained to me that the room was too loud - in complete silence. I think that me snapping photos was loud enough to trigger it, but they’re not messing around with wanting you to be quiet during the calibration routine. Moreover, this is important so the smart microphone system can build a profile out for the room, and probably cancel the game sounds themselves. On my 5.1 system each channel played a tone twice.
The setting to pay attention to here is whether or not you truly want your Kinect to be the party chat device. Turn it on, and you’ll default to using this microphone array in online matches instead of an earpiece which you can easily mute - in fact, this is really the only inconvenience. If you turn this off, you’ll also find that later on you can’t use Kinect Video chat - more on that later. The setup tutorial also tilts the sensor appropriately depending on whether you’ve put the sensor above or below the display.
After the wizard completes, you’re left thinking that it’s finished. Instead, it starts a training tutorial that tells you to move all your furniture, remove any extraneous friends from the field of view of the Kinect, and walks you through interaction. What’s unnerving about this tutorial is that it requires the controller - that just doesn’t seem right. It makes even less sense given the first setup wizard’s insistence that you put down the controller, even showing a no-controller symbol if you try and mash buttons. But it walks you through the basic interactions which I’ll summarize in a second. If you’re dying to see every step of setup, they’re in the gallery below.
The Kinect tuner essentially lets you run the audio calibration, tilt, and room calibration again. There’s a card which ships with Kinect Adventures (and you can purchase online, seriously) that has a happy face.
The tuner requires you to hold the calibration smiley card in a variety of positions that line it up with on-screen sunglasses. This calibration routine requires lots of ambient light as it seems to use both the depth and color cameras.
The final (optional) Kinect training is auto-login facial recognition. It’s a pretty cool concept - step in front of the Xbox in a Kinect enabled game or situation, and you’re automatically signed in under the appropriate gamertag.
The facial recognition training requires you to stand in a variety of different places throughout the room and match a pose. It’s a bit confusing that facial recognition requires hand gestures, but I’m guessing this is just to keep the ADD sensibilities appeased. You walk around, turn slightly, and stand in the appropriate place until told to move. Kinect basically needs to build a 3D profile of your face since its facial recognition algorithm uses both depth and color cameras to build your profile.
In practice I found auto login to work without fuss almost all of the time. Step into the field of view of the sensor, and you’ll get a recognizing prompt to the left of the depth camera image, and if successful a “welcome back, [gamertag]” message. I only had auto login fail once or twice when the room was very dark, and once when I moved the Kinect to a completely different location before running the tuner again.
The primary interactions with Kinect are pretty simple - there are really only a handful of gestures. To start using Kinect from the normal dashboard, or pretty much anywhere, you wave your hand. That lets Kinect lock onto which hand you’re going to use to gesture with, and it applies almost everywhere - be prepared to do a lot of waving. Waving in the normal dashboard brings up the Kinect dashboard, which is essentially a Kinect-specific ‘lite’ version of the main dashboard. It’s a bit disappointing that Kinect doesn’t nicely bolt onto the main dashboard, but all the core functions like launching games and doing Kinect specific tasks are covered.
Inside the Kinect dashboard, you can navigate around and interact with your hands, or by saying “Xbox” and any of the words on the dashboard. It works pretty well, but honestly I haven’t found myself using voice very much.
Selection is done by holding your hand over an item - a progress circle rings around and chimes, letting you know you’ve made a selection. Moving from page to page on the Kinect dashboard involves hovering over the arrows at left and right and swiping appropriately. It’s probably the only gesture I don’t really think is perfect, but it works.
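Hover-to-select is easy to picture as a small timer that resets whenever your hand moves to a different item. The sketch below is my own guess at the logic, not Microsoft's implementation. The class name, the 1.5 second default dwell time, and the idea of passing in timestamps are all hypothetical.

```python
class DwellSelector:
    """Select an item once the hand has hovered over it long enough."""

    def __init__(self, dwell_seconds=1.5):
        self.dwell_seconds = dwell_seconds
        self.target = None
        self.started_at = None

    def update(self, hovered, now):
        """Feed the hovered item (or None) and a timestamp in seconds.

        Returns (progress, selected): progress in [0, 1] for drawing the
        ring, and the selected item once the ring completes, else None.
        """
        if hovered != self.target:
            # Hand moved to a different item (or off everything): restart.
            self.target = hovered
            self.started_at = now if hovered is not None else None
            return 0.0, None
        if hovered is None:
            return 0.0, None
        progress = min(1.0, (now - self.started_at) / self.dwell_seconds)
        if progress >= 1.0:
            self.started_at = now  # restart so we don't re-select every frame
            return 1.0, hovered
        return progress, None
```

A game loop would call `update` once per frame with whatever tile is under the hand cursor, draw the ring from `progress`, and act when `selected` is not None.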
The next main gesture is universal pause, which involves holding your right arm at your side, and sticking your left arm out at 45 degrees. Holding it there also brings up the progress circle and chime, and then pops up the game menu.
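Given tracked joint positions, a pose like this reduces to checking a couple of angles. The sketch below is purely illustrative. The joint coordinates, the 10 degree tolerance, and my reading of "45 degrees" as the angle between the left arm and straight down are all assumptions, and it only checks angles in a 2D front view without confirming the arm points outward rather than across the body.

```python
import math

def arm_angle_from_vertical(shoulder, hand):
    """Angle in degrees between the shoulder-to-hand vector and straight down.

    Points are (x, y) pairs with y increasing upward (assumed convention).
    """
    dx, dy = hand[0] - shoulder[0], hand[1] - shoulder[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("shoulder and hand coincide")
    cos_angle = -dy / length  # dot product with the unit vector (0, -1)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

def is_pause_pose(left_shoulder, left_hand, right_shoulder, right_hand,
                  tolerance_deg=10.0):
    """True when the left arm is near 45 degrees and the right arm hangs down."""
    left = arm_angle_from_vertical(left_shoulder, left_hand)
    right = arm_angle_from_vertical(right_shoulder, right_hand)
    return abs(left - 45.0) <= tolerance_deg and right <= tolerance_deg

# Hypothetical joint positions in meters: left arm out at 45 degrees, right arm down.
paused = is_pause_pose((-0.2, 1.4), (-0.6, 1.0), (0.2, 1.4), (0.2, 0.8))
```

Combining a check like this with a hold timer would explain why the pose has to be held before the menu appears.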
This is essentially analogous to pressing the center Xbox button on a controller, though Microsoft calls this the Kinect Guide. From here you use the hand gestures and selection to either escape out to the dash, return, view awards, or launch the Kinect tuner. That’s really all there is to it, as further gestures are game and activity specific but always pretty intuitive. I've put together a small video showing off interaction and navigation, and a small tour of some of the Kinect apps.
When I first saw the Kinect voice commands, there was a lot of talk about other players being able to effectively troll Kinect users by yelling “Xbox Pause” or “Xbox Stop.” I would randomly shout those, and found it interesting that there aren’t many voice commands of the abruptly-stop-and-exit-what-I’m-doing sort, and especially not any in games. Most of the time, you have to say yes afterwards, so if you want to troll, say “Xbox Pause Yes.” In fact, outside of the dashboard and a few of the Kinect-specific apps like Zune, Last.fm, and ESPN, there really aren’t a whole lot of voice command areas.
I guess that’s a good enough segue into the apps and games themselves. The first thing you should know is that everything requires an update - that’s not hyperbole, literally everything seems to require a 50 MB update. That’s all the Kinect-specific applications like Zune, Last.fm, videoKinect, and ESPN. Games also all require updates, but they’re smaller. 50 MB is about average for all the other applications, however.
I realize it’s nit-picking to complain about updates, but the whole process would be much more bearable if it were one monolithic update at the beginning instead of the scatter-shot frustration of having to wait every time you try something new. It isn’t PS3 level, where you literally need another console or distraction to occupy yourself with while you wait for device firmware, then game updates to apply, but I’d be lying if I said I didn’t think about how eerily similar the situation is.
So how are the apps? In general there’s nothing to complain about, they just work. ESPN isn’t exactly my cup of tea, but it’s interesting since the video-scrub and select gesture lives in here. If your ISP is a compatible partner, you get access to a variety of live games, event highlights, and other videos. I’m on Cox - note that there’s ISP branding in the top right below the ESPN symbol showing that my ISP qualifies me to use this feature.
It’s actually pretty cool you can pull up a list of live events and stream them whenever you want. They’ll continue streaming for a short time inside the large screen until you scrub to other videos.
Inside of an actual video, the controls are a bit interesting. Moving your hand up to the top brings up a video scrubber bar, where you can control fast forward and rewind speed by moving your hand left or right. Moving your hand down then selects the current frame and starts playing. There’s not really any buffering - you just start with a low quality connection and gradually scale up. HD isn’t bad; there are still some compression artifacts and blocking with fast motion, but it’s close to Netflix quality.
Interestingly enough, when watching live events one of my immediate curiosities was what happens during commercial breaks. Sure enough, a few minutes in and I found out:
Though that’s boring, the fact that ESPN is willing to risk it all and offer streaming live events to Xbox Live subscribers is pretty awesome, and I can’t complain about the Kinect interaction. The only awkward part of this experience is that the hand gestures are best suited to you standing in front of the display - as with all Kinect interactions. It’s a reasonable expectation that you’d want to watch a game or video sitting down, and although you can select things with your hands whilst sitting, it just doesn’t work as well.
The Kinect Zune client is probably the most barebones of the notable preinstalled Kinect applications. Fire it up, and you get a screen with the usual suspects in the same Kinect style - a grid of large tiles.
What I find a bit strange here is that there really aren’t many audio command cues beyond suggesting a movie. In fact, the problem really arises from the fact that Kinect doesn’t understand words, names, and titles that aren’t in its voice recognition corpus.
ESPN gets around this by cleverly having you vocalize which number video you want played - video one through six are your options. You can’t just speak the name of the game or the title. Similarly, you can’t search.
The Kinect-tailored Last.fm experience is similarly laid out. Six tiles you can hover over and make selections with the same way the rest of the Kinect interface Microsoft has put together works.
What’s interesting about both the Last.fm and Zune applications is that hovering over the large back arrow takes you back into the controller-land versions of these applications. You’re transported essentially right back into the vanilla experience that existed before Kinect - this is just another way of getting there. Waving your hands around inside those experiences brings you into the Kinect-ified versions of those programs.
The big question mark in my mind is where the Kinect version of Netflix is. Netflix on Xbox 360 is admittedly rich enough to get by without needing a Kinect environment, but it’d be nice to see Netflix given the Kinect treatment the same way the rest of the core Xbox services have been.
Last but not least is video Kinect. It’s awesome to see Kinect leverage the color camera for videoconferencing - if Microsoft hadn’t included something like this, they’d be missing a huge opportunity. It’s simple too: just launch the application, and you get a view of yourself which is cropped and panned to stay centered on your face, with online friends listed at the left. Your video stream will stay cropped around your face whenever you’re in the field of view.
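The crop-and-pan behavior is straightforward to sketch once a face position is available. The function below centers a fixed-size crop on the face and clamps it so it never leaves the frame. It is a guess at the general approach, not the actual video Kinect logic. The function name and crop size are hypothetical, and the frame size simply reuses the 640x480 color stream.

```python
def face_crop(face_cx, face_cy, crop_w, crop_h, frame_w=640, frame_h=480):
    """Return (left, top, right, bottom) of a crop centered on the face.

    The rectangle is shifted as needed so it stays fully inside the frame.
    """
    if crop_w > frame_w or crop_h > frame_h:
        raise ValueError("crop is larger than the frame")
    left = min(max(face_cx - crop_w // 2, 0), frame_w - crop_w)
    top = min(max(face_cy - crop_h // 2, 0), frame_h - crop_h)
    return left, top, left + crop_w, top + crop_h

centered = face_crop(320, 240, 320, 240)   # (160, 120, 480, 360)
at_corner = face_crop(10, 10, 320, 240)    # (0, 0, 320, 240)
```

A real implementation would also smooth the face position over time so the crop doesn't jitter, and could pick the crop size from the depth of the face to keep it a constant size on screen.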
At first, you’re given the option to video chat with other Xbox Live friends, but you can also sign in with a Live Messenger account and video chat that way as well. I initially set out to try video Kinect just between two Kinects. If you can find a friend online, inviting them to a Kinect video chat looks just like an Xbox Live party or game invite - video Kinect presents itself just like a game you want your friend to join.
I dialed up my friend Brayden, and we were almost successful initially. The problem was that he had no voice. In the bottom of his video window, the speaker was greyed out, and it said Audio Off. This was confusing as there’s no readily apparent way to mute or unmute audio inside of the video Kinect interface. The problem - that party audio chat setting I mentioned earlier.
If you set this to off, you won’t get audio here, and you’ll inevitably spend lots of time scratching your head as to why. It’s confusing too - I don’t want to use the Kinect for party chat, I want to use my wireless headset. At the same time, I want to use Kinect for audio when video chatting.
Regardless, after we got it working the experience was pretty seamless. Video Kinect uses about 600 kilobits/s of bandwidth both ways, which isn’t a lot. There are some compression artifacts in the remote client’s video, but nothing out of the ordinary. My biggest complaint about video chat is that the color camera in Kinect really doesn’t seem impressive here.
There are really only two complaints I have - first is noise and low light sensitivity. If you don’t have lots of ambient light, the camera will expose and integrate for much longer, and tends to smear a lot more than I’ve seen on other cameras unless you have room lights cranked way up. I’d rather get noise from huge gain than become a smear when all the lights aren’t on. The other problem is that the stream itself isn’t very high resolution - it’s just VGA. While the camera sensor might be higher (as has been suggested by developers working with the platform), the Kinect will only expose a VGA stream.
The result is that video is noticeably upscaled. It’s probably the reason you can’t bring the conversation full screen. I guess that’s the other complaint I have - it’d be nice to be able to go full screen with the other party instead of have two equally sized boxes for video.
Between two Kinects, video chat works fine. You can optionally pause your video stream or turn auto zoom and crop off.
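To put the roughly 600 kilobits/s figure in perspective, here’s a quick back-of-the-envelope sketch of how much data an hour of chatting moves. It assumes the rate is constant, that a kilobit is 1,000 bits, and that a megabyte is 1,000,000 bytes; the function name is mine.

```python
def megabytes_transferred(kilobits_per_second, seconds):
    """Data moved at a constant bit rate, in decimal megabytes."""
    bits = kilobits_per_second * 1000 * seconds
    return bits / 8 / 1_000_000


one_way = megabytes_transferred(600, 3600)  # one hour, one direction
print(f"One direction: {one_way:.0f} MB per hour")
print(f"Both directions: {2 * one_way:.0f} MB per hour")
```

That works out to about 270 MB per hour in each direction, or about 540 MB per hour combined.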
So what about between Kinect and Live Messenger clients? This was a bit more frustrating. The first time we tried logging into Live Messenger on the Xbox, we couldn’t see the other party on the desktop. After some troubleshooting and confusion, we decided to power cycle everything and log in again, at which point we could finally see each other. Firing up a video chat like one normally would with a desktop worked fine.
Between a desktop and the Kinect, you can really see how the video stream isn’t of the highest quality. It’s tolerable, but a bit disappointing. I was chatting from a 720p webcam, and my friend noted that my video looked much better on his end than the Kinect video did on mine.
The other interesting note is that when he paused video, I saw nothing but grey. One more pause and unpause, and I was stuck at a grey screen until we terminated chat and started over again. It works, however, minus those small glitches.
I should note also that the Kinect audio quality is actually amazing, no doubt thanks in part to that 4-microphone array and some spatial processing. It really does a great job singling out a single person and gain was kept at a comfortable level the entire time. I have to say I’m impressed with how clear audio was - there was no feedback, echo, or strange artifacts. The one thing I didn’t test was how video chat functioned between Kinect and the older Xbox Live Vision webcam, though I hear it does work and is supported.
Video chat is becoming the rage once more, but the problem is that each video chat platform is isolated to such a small set of protocols. We’ve got FaceTime and iChat if you live in the Apple ecosystem, Qik video and a few smaller ones if you live in Android land, PS3 PlayStation Eye video chat, and now Kinect Video and Live Messenger if you live in this ecosystem. I’m reminded of how SMS used to work before carriers decided that there should be inter-carrier exchanges. I’m sure we’ll get there someday, but for video chat to be more than a quick novelty, it needs to work on a common, simple platform.
It’s hard to really complain about Kinect video chat - it’s there if you want to use it. There are better commercial alternatives that are designed specifically for this purpose, and Kinect won’t replace them, but it does make the occasional video chat possible. I find myself wanting much more resolution however, and the ability to maybe leverage video party chat with more than one other person, or even have it actually integrated into games. No doubt in time we’ll see more of that.
So, now that we’ve talked about just about everything you can do with Kinect except play games, how does it actually fare as an input paradigm for console gaming? Turns out that it isn’t half bad, in fact, on the whole the Kinect launch titles are actually pretty impressive. We’ve been playing with the launch titles for a while now and are ready to talk about impressions.
It probably makes sense to start with what I consider the most impressive Kinect title, which is far and away Dance Central. The rest of the games are entertaining as well, but something about Dance Central has that magical ability to dump you out two hours after you started and make it feel like 15 minutes. Oh, and leave you physically sore and exhausted as well.
Dance Central isn’t a first party Microsoft Kinect title; rather, it’s developed by Harmonix, who unsurprisingly launched Guitar Hero and Rock Band. I’ve never been a big fan of either of those titles, but something about Dance Central appeals. First of all, the menus in Dance Central are actually pretty notable - it’s a different (and in my estimation better) navigation schema than what I’ve seen in the other Kinect titles and dashboard, and it’s shockingly simple. My girlfriend's impression was that the Dance Central menus had a striking similarity to the omnipresent arcade title Dance Dance Revolution.
You hold your hand out and angle it up to scroll up, down to scroll down, and swipe left to select. Swiping right with your left hand goes back. That’s really all there is to it, and it works so well I wish the main Xbox dashboard leveraged these gestures somehow. Supposedly Harmonix invested a lot of time into doing something different with their menu navigation scheme, and it really did pay off here.
There’s a selection of 32 titles that come with Dance Central, and a few more that you can buy for 240 Microsoft Points (which works out to $3) from the Xbox marketplace. The titles seem to be reasonably varied, ranging from some disco hits to Basement Jaxx and Snoop Dogg. I was surprised to actually find more than a few titles I was familiar with.
After you select a song, you can do a few different things - learn the dance, challenge a second person, use it as an exercise and track calories burned, or just dance it. The learning interface inside Dance Central itself is shockingly intuitive. Watch the avatar do a dance move from part of the dance, and then try your best to emulate it. Parts of the dance you’re doing wrong will be highlighted in red on the avatar on the appropriate part of your body. Fail to move your arm right, and it’ll show up in red. A circle under the avatar glows different colors depending on how close to emulating the dance move you come.
If you already know the move, you’ll get a perfect score and move to the next one. Most likely (unless you’re already some sort of dance wizard), you’ll get it wrong a few times. Three correct emulations move you to the next move, and a few successive failures result in the move being skipped. What’s super useful, however, is the ability to “break it down” in slow motion. Swiping right with your left hand instantly slows the move down, and makes the commentator vocalize exactly the moves you should be making with the beat.
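The progression rule described above can be modeled as a tiny state machine. This is only a sketch of the behavior as I observed it, not Harmonix’s actual logic: the class and constant names are made up, the failure limit of three is a guess (the game only ever felt like "a few"), and it ignores the case where a perfect score advances you immediately.

```python
from dataclasses import dataclass

PASSES_TO_ADVANCE = 3  # three correct emulations, as observed in the game
FAILS_TO_SKIP = 3      # hypothetical value: only "a few" is known


@dataclass
class MoveTrainer:
    passes: int = 0
    fail_streak: int = 0

    def record(self, correct: bool) -> str:
        """Record one attempt and return 'advance', 'skip', or 'repeat'."""
        if correct:
            self.passes += 1
            self.fail_streak = 0  # a success breaks the failure streak
            if self.passes >= PASSES_TO_ADVANCE:
                return "advance"
        else:
            self.fail_streak += 1
            if self.fail_streak >= FAILS_TO_SKIP:
                return "skip"
        return "repeat"


trainer = MoveTrainer()
print([trainer.record(ok) for ok in (True, False, True, True)])
```

Treating failures as a streak while counting successes cumulatively is a design assumption; it matches the feel of being let through after three good attempts even with a stumble in between.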
At the very end, you have to perform the song with the appropriate moves repeated and spliced in where they belong. Do well, and you’ll get transported to some kind of nightclub with cheering fans and flashing lights. Fail too many dance moves too hard, and you’ll be stuck on a boardwalk or the lunchroom. The commentator voice doesn’t really pull punches either - if you mess up or score low, you’re going to know about it.
On the whole, Dance Central is surprisingly entertaining and polished. Not only is the menu user interface and gesture choice extremely well done, but move recognition in the dances themselves is very good. I only ran into problems with one particular dance move - the jazz square. This particular move requires moving your feet in a square, and although it doesn’t look particularly hard, Dance Central refused to recognize me doing it. I attributed this originally to a particular pair of cargo shorts (only this pair of shorts gave me problems), but ran into it later again at another location with different clothing. I had my family try the move and they too experienced some problems.
Certain moves are pickier than others, and most of the time my misses were a consequence of not exaggerating my movements enough to really emulate the whole dance. But there are a few moves - particularly ones that involve specific depth-sensitive foot movements - that are a bit finicky. Again, having good depth contrast and making sure you’re in the field of view of the sensor is critical, and you’re luckily provided a small window with the depth image while playing so you can stay inside the optimal region.
The title that ships with Kinect really has to be awesome, and luckily Kinect Adventures is pretty much exactly what I’d expect it to be. It’s sort of a bundle of simple games that lend themselves to full body control, and the whole collection is packaged up almost like a tour of various minigames.
Most of the minigames have already been shown off, but there are a bunch more to talk about. Blocking and bouncing kickballs in a virtual tunnel of sorts, an obstacle course with full body movements, water rafting, and another which involves plugging leaks in a virtual aquarium. They’re bundled together either in an adventure mode or through free play, and can have single player or two-player support.
If Kinect Adventures is analogous to Wii Sports (in that it ships with the platform and establishes the baseline level of expectations for immersion), then the bar is set pretty high. Adventures leverages the depth sensor and full body tracking quite well, and all of the minigames require a lot of movement.
Adventures also leverages the color camera and takes pictures of you while you’re playing. Most of the time the photo events are marked with a camera coming up on screen and are carefully placed to coincide with some jump or large movement, and the results are generally pretty hilarious. Again here the color camera seems a bit noisier and lower resolution than what I’d consider ideal, and if you’re playing in low light you can turn into a smear if Kinect tries to take a photo of you while jumping. Provide lots of ambient light, however, and the results are pretty good. I’m a bit puzzled by why they seem to be smashed into the wrong aspect ratio when played back at the end of the game (sure, they’re supposed to look like Polaroid photos, but aspect-incorrect scaling is annoying), but the photos themselves are fine. You can then upload these to kinectshare.com and from there download directly or post to Facebook if you want to embarrass yourself. There isn’t any auto-upload functionality (thankfully) so you don’t have to really worry about photos of yourself playing in nothing but underwear uploading automatically.
In fact, basically all the media that’s recorded on Kinect (there are also videos recorded in several other applications, and videos encoded from motion capture) is cleared through kinectshare.com. I guess while we’re on the subject of motion-capture and media that’s uploaded we can talk about Kinect Adventures’ trophy concept.
Complete enough adventures, and you’ll get to record a trophy. This is essentially full body motion capture and audio, which some avatar is then set to. There’s a hamster-looking creature for single player, and a shark for two-player - I’m sure there are more beyond those that I haven’t gotten to as well. The concept and result are actually remarkably polished. You can then upload the resulting video to Kinect Share where it remains (like all media) for 14 days for you to download.
If there’s one area where Kinect really shines above the Wii, it’s Kinect Sports. At first I was expecting a relatively token knockoff of Wii Sports so that Kinect can have its own motion-dominated sports title, but I was happily surprised that wasn’t the case.
The Kinect Sports game layout itself is similar to Adventures - you can play by yourself, or with friends, and superficially you can just play single games instead of taking a tour through everything.
The problem with Wii Sports was always that most of the activities relied on just a few different accelerometer inputs. The same accelerometer input as swinging your arm in a complete arc for Wii bowling could be emulated with the flick of a wrist. As soon as you figured out you could create a much larger magnitude acceleration vector by flicking your wrist, all of the immersion was completely destroyed in the game. For me at least, the remainder of Wii titles involved a similar learning curve: search for the optimal flick move that emulated the expected inputs, and repeat. With Kinect, you can’t cheat that way, and you can’t lie to the sensor. Move around, flail your arms, or you’re going to get destroyed.
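To illustrate why a wrist flick can stand in for a full swing, here’s a toy sketch of a detector that only looks at the peak length of the acceleration vector. This is not how Wii Sports actually works (I have no visibility into that); the function names, the 2 g threshold, and both sample traces are made up purely for illustration.

```python
import math


def peak_magnitude(samples):
    """Largest acceleration-vector length (in g) across (x, y, z) samples."""
    return max(math.sqrt(x * x + y * y + z * z) for x, y, z in samples)


def detects_swing(samples, threshold_g=2.0):
    """Naive detector: fires whenever the peak magnitude crosses a threshold."""
    return peak_magnitude(samples) >= threshold_g


# Made-up traces: a long, smooth arm swing and a short, sharp wrist flick.
full_arm_swing = [(0.1, 1.0, 0.2), (0.8, 1.6, 0.5), (1.2, 2.0, 0.6), (0.4, 1.1, 0.3)]
wrist_flick = [(0.0, 1.0, 0.0), (2.5, 1.5, 1.0), (0.1, 1.0, 0.1)]

for name, trace in (("arm swing", full_arm_swing), ("wrist flick", wrist_flick)):
    print(name, round(peak_magnitude(trace), 2), detects_swing(trace))
```

Both traces trip the detector, and the flick actually produces the larger peak. A single accelerometer reading carries no information about where your arm or body is, which is exactly the gap a depth camera closes.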
I think Kinect Sports is probably the best example of how full body motion capture can finally be mapped 1:1 and leveraged in such a unique way. The games all work nearly flawlessly, and playing them I’m reminded of what I wanted Wii Sports to be like.
Track and field, boxing, and table tennis are probably my three favorite titles from the sports catalog. Track and field is set in a stadium which looks curiously similar to the Beijing Bird’s Nest (seriously) and involves Olympic events such as javelin and discus toss, long jump, hurdles, and sprinting. I’m impressed with just how immersive these games are - when sprinting, wave your hands into another runner’s lane and you’ll solicit angry responses. Finish a sprint, and the crowd will cheer if you wave your hands above your head, or go silent if you drop them to your side. It’s the little things like these that really make Kinect engaging. Finish the whole event, and you get a video set to music of highlights taken on the color sensor. It’s surprisingly well put together.
You can set world records inside Kinect Sports, though they’re only locally-set records - unfortunately they aren’t synced up to the cloud in a *real* world record mode, but the music and animations when you set records are just perfect. I really feel like the game designers got everything perfect here.
The gestures and motions inside Kinect Sports are also nearly flawless. I’ve yet to have a motion misinterpreted, and there’s a surprising amount of technique and dynamic range of responses possible.
Where I was really blown away is - of all things - table tennis. I’ve enjoyed playing table tennis in real life, and though I enjoyed the Wii versions, there was always something missing. I’d either miss returns or serves, and again that seemed purely a function of whether the accelerometer data I was giving was what Wii wanted. The Kinect version of Sports seemed much better and way improved in comparison, never missing when it wasn’t my fault.
But what I found extremely interesting was how interaction changed when playing over Xbox Live. Matchmaking took a while, no doubt because of how close to launch it was when I tried it, but after a short wait, I was paired with another two players in a doubles match (I was also playing with a second person on my side, hence the doubles).
What was immersive beyond imagination was how I found myself gesticulating whenever we scored a point, becoming increasingly expressive with each point. Other players see your avatar at the table - they can’t look away - and the result is a whole new world of motion-enabled trash talking. Win an entire game, and the other party is forced to watch you for an opportune five or so seconds where you can literally move your body in just about any expressive manner. It’s a brave new world of motion augmented trash talk, and one thing’s for sure - I’m glad Kinect doesn’t have the resolution to pick out fingers.
Kinectimals isn’t really a title marketed at me, but no doubt will sell Kinect to the younger crowd. The frustration in Kinectimals is really only one thing - there’s a big long cutscene at the beginning you can’t skip, and the disadvantage to not having a controller is that you simply lack controls to mash and skip ahead with. Even after the long cutscene, there's a monologue covering all the backstory. I guess I’m wondering how a child is going to sit through all that if even I found myself wishing I could reach out and hit skip.
There are driving minigames which employ the same two-fists-out steering techniques that Kinect Joy Ride does, and other minigames with underhand and overhand tosses which punctuate long periods of playing with your chosen animal. Actually interacting with your pet is unnervingly well done, but it seems like that portion of the game is frequently punctuated by events and circumstances that yank you away to do other things.
Kinectimals also has the only instance (at least that I’ve seen) of object-identification and profiling. Back in the E3 videos for Kinect, I remember someone scanning an object, then using it inside the game - unfortunately that’s all but removed in the Kinect being sold now, but there’s a tiny bit of it left in Kinectimals. When you select an animal, you’re given the opportunity to associate it with a real world object - a trinket, a picture, something within arm’s reach.
The first thing I grabbed was one of my coasters (which I made from some old 10 cm wafer masks), which basically look like square mirrors or glass depending on what layer they were taken from (solder mask, etc.).
Admittedly, that isn’t exactly the fairest of objects to try with this kind of recognition, but it was honestly what was on my coffee table within arm's length. That didn’t go down so well, so I grabbed something else, which did work. You can then hold this object out and select the animal associated with it.
Kinect Joy Ride
Kinect’s driving game is probably my least favorite title in the Kinect launch roundup. When I first saw the Joy Ride demo videos online, I knew that the title would have to employ a significant amount of assists to actually get you around corners. For one, gas and brakes are completely out of the question, and Joy Ride is at least forthcoming about how the go fast and slow down bits of the game are totally handled for you. However, the game employs a substantial amount of auto-steering to get you around the track - your input essentially trims out the driving direction a bit more.
The steering gesture involves keeping two fists out in front of you, and turning an imaginary wheel in midair. You can lean and pull your hands back to charge a boost, and push forwards to temporarily go faster, but most of the actions that will determine your performance in the race are how well you drive over acceleration strips and do tricks midair from leaning side to side.
Lag and Conclusions
So what about lag on Kinect? It’s definitely there, but it isn’t nearly as big of a problem as it’s been chalked up to be. I decided to test how much that lag is by taking an extremely self-deprecating video of myself flailing my arms about in front of Kinect, and inspect the video to see how much delay there is between me sweeping my arm up, and the TV reflecting that change. I just used the Kinect tuner since it’s full screen and does body tracking, and later counted how many frames it took after my hand reached the top of an arc for the image to also reflect the change.
At the end of the day, I measured between 8-10 frames of input lag, which at 29.96 FPS works out to between 267 ms and 334 ms of input lag. Of course, that number also includes my Onkyo TX-SR608 A/V receiver, which (even in game mode) adds a substantial and perceptible amount of latency to the whole display chain. For the caliber of games currently rolled out which support Kinect, lag honestly isn’t that big of a deal. I found it definitely noticeable in the Kinect Adventures obstacle course, and somewhat noticeable when playing Kinect Sports and running hurdles, but everywhere else, while noticeable, it isn’t a game-killer. Don’t get me wrong, even 267 ms is seriously laggy, but right now it doesn’t matter too much. Maybe when we get FPS titles that’ll change.
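The conversion from counted frames to milliseconds is simple arithmetic, sketched below. The function name is mine, and the frame rate is the one quoted for the recording.

```python
def frames_to_ms(frames, fps):
    """Convert a count of video frames into elapsed milliseconds."""
    return frames / fps * 1000.0


CAMERA_FPS = 29.96  # frame rate of the recording, as quoted above

for frames in (8, 10):
    print(f"{frames} frames -> {frames_to_ms(frames, CAMERA_FPS):.0f} ms")
```

Eight frames comes to about 267 ms and ten frames to about 334 ms. Keep in mind that counting frames has a resolution of one frame, or roughly 33 ms at this frame rate, so the measurement can’t be any finer than that.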
I think it’s fair to say that Kinect thoroughly rains on the Wii’s parade, and enjoys a substantial lead over Sony Move, if for no other reason than lower out-the-door cost. One of the best parts of Kinect is that you really do only need the sensor to play games - there are no sets of controllers, camera, or kit to purchase. If you've got a room that's large enough, Kinect is perfect. On the other hand, there's no possible way that Kinect would ever work in the average dorm room - you really do need 9' - 12' in front of the TV to play with two people.
The rest of what Kinect does is really just mitigate a lot of the motion-cheating I felt was possible with the Wii, some of which is still possible with Move by holding the wand close to the sensor. Adding real depth detection and forcing players to actually move around has done a lot more to make me move instead of wrist-flick than any of the other motion-augmented console addons did.
Does Kinect breathe enough life into the Xbox 360 to make it last another few years? I suppose, but only for as long as Kinect titles can deliver new and more interesting gestures, immersion, and interaction events. For now, however, I’m having enough fun motion trash-talking people in Kinect Sports to keep me entertained at least until the next major console blockbuster title.