Kinect - The Sensor

The sensor itself matches the style and appearance of - you guessed it - the new Xbox 360 S. That means its every surface is glossy black plastic which looks nice, but immediately shows fingerprints, dust, and scratches. If you’re OCD like me and don’t happen to live in a clean room, this is a constant but minor annoyance. Oh well, at least it matches the console.

The Kinect is noticeably bottom-heavy, with its center of gravity most likely somewhere in the base. This makes sense, since the horizontal arm housing the optical system and microphone array pivots vertically - the base acts as a stage that lets the top arm sweep through around 30 degrees, so the sensor can adapt to sitting on top of a TV or below it on a table, whichever suits your entertainment setup.

There’s a three-axis MEMS accelerometer onboard, ostensibly for determining the sensor’s tilt relative to gravity rather than relative to its own base - if you put Kinect on top of a TV, for example, there’s no guarantee that the surface it’s resting on is level.
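To make the tilt idea concrete, here’s a minimal sketch of how pitch and roll can be recovered from a single static accelerometer reading. The axis convention (x forward out of the lenses, y to the left, z up) and the function name are hypothetical, not the Kinect’s actual firmware interface, and the math assumes the sensor is at rest so gravity is the only acceleration it measures.

```python
import math


def tilt_from_accel(ax, ay, az):
    """Estimate (pitch, roll) in degrees from one static accelerometer sample.

    Hypothetical axis convention: x points forward out of the lenses, y to the
    left, z up. The device must be at rest so the reading is gravity alone.
    Only ratios matter, so raw counts or units of g both work.
    """
    # Pitch: how far the forward axis is tipped out of the horizontal plane
    # (positive means the front is tipped upward under this convention).
    pitch = math.degrees(math.atan2(ax, math.hypot(ay, az)))
    # Roll: rotation about the forward axis, from the y and z components.
    roll = math.degrees(math.atan2(ay, az))
    return pitch, roll


# Level surface: gravity reads straight along +z.
print(tilt_from_accel(0.0, 0.0, 1.0))
# A reading whose x component is sin(10 degrees) g corresponds to 10 degrees of pitch.
t = math.radians(10)
print(tilt_from_accel(math.sin(t), 0.0, math.cos(t)))
```

A firmware implementation would also average several samples to reject vibration, since a single reading includes any motion of the sensor.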

Down on the bottom you’ll notice an obvious grating, under which hides a four-microphone array for some beamforming goodness. The result is that Kinect can sense which direction sounds are coming from and isolate a single speaker, something an omnidirectional or even a stereo microphone can’t do nearly as well. That's critically important for picking out voice commands while music and game noise are blaring in the same room.
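Kinect’s actual audio pipeline isn’t documented here, but the basic idea behind sensing direction can be sketched with a single pair of microphones: cross-correlate the two signals to find the time difference of arrival, then convert that delay into an angle. The 10 cm spacing, 16 kHz sample rate, and function names are hypothetical, chosen only for illustration; a real array would combine all four microphones and interpolate between samples.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s in room-temperature air


def estimate_lag(ref, other, max_lag):
    """Return the integer lag (in samples) at which `other` best matches `ref`.

    Brute-force cross-correlation: for each candidate lag, sum ref[n] * other[n + lag]
    over the overlapping samples and keep the lag with the largest sum.
    """
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, x in enumerate(ref):
            m = n + lag
            if 0 <= m < len(other):
                score += x * other[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle(lag, mic_spacing, sample_rate):
    """Convert a lag in samples into an arrival angle, in degrees from broadside."""
    path_difference = SPEED_OF_SOUND * lag / sample_rate  # extra distance to the far mic
    ratio = max(-1.0, min(1.0, path_difference / mic_spacing))  # guard against rounding
    return math.degrees(math.asin(ratio))


# Hypothetical setup: two mics 10 cm apart, sampled at 16 kHz.
fs, spacing = 16000, 0.10
rng = random.Random(0)
voice = [rng.uniform(-1.0, 1.0) for _ in range(2000)]  # noise stands in for speech
mic_a = voice
mic_b = [0.0] * 3 + voice[:-3]  # the same sound reaches mic B three samples later

max_lag = math.ceil(spacing / SPEED_OF_SOUND * fs)  # only physically possible lags
lag = estimate_lag(mic_a, mic_b, max_lag)
angle = arrival_angle(lag, spacing, fs)
print(lag, round(angle, 1))
```

Limiting the search to lags the array geometry allows keeps the correlation cheap and rules out spurious peaks from reflections that arrive much later.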

The most prominent physical feature on the Kinect, however, is its relatively sophisticated optical system. From left to right is an IR laser projector (more on this in a second), RGB camera, and IR camera. Between the RGB camera and IR laser is simply a green LED for status - it plays no role in the actual optical system. Also on the bottom of the sensor is a nondescript class-1 laser product notice - yes, the Kinect indeed uses an IR laser, but it’s completely eye safe at class-1.

There’s also a requisite laser warning buried in one of the three booklets that ship inside the Kinect sensor box, but you’ll notice that Microsoft has been careful to avoid drawing attention to the fact that there’s a laser in Kinect - people have a strange aversion to looking at or into perfectly eye-safe lasers. There’s no marked wavelength that I could find, but the Kinect’s IR projector is visible to my naked eye and exhibits trademark laser speckle - my guess is that the laser is somewhere between 750 and 900 nm, with 808 and 880 nm being common commercial “IR” laser diode wavelengths.

From the Xbox’s perspective, there are two separate video streams coming from the Kinect - one 640x480 (VGA) 30FPS stream from the RGB color camera, and one 640x480 11-bit 30FPS stream carrying the 3D depth image after processing. Kinect implements a subset of Prime Sense’s natural interaction reference design, which originally specified a much higher resolution color sensor and a 60FPS depth image. Another subtle difference is that Prime Sense specifies two audio streams, whereas Kinect uses four, but such tweaks are ultimately the result of Microsoft having to maintain a delicate balance between performance and a reasonable price point.
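The raw data rates behind those streams are easy to work out. This back-of-the-envelope sketch assumes uncompressed pixels: 11 bits per depth sample as stated above, and a hypothetical 24 bits per pixel for the color stream (its actual wire format isn’t given here). It also counts how many distinct values an 11-bit depth sample can take, and estimates what the reference design’s 60FPS depth stream would cost, assuming the same 640x480 resolution.

```python
def raw_bitrate(width, height, bits_per_pixel, fps):
    """Uncompressed bits per second for a video stream."""
    return width * height * bits_per_pixel * fps


depth_bps = raw_bitrate(640, 480, 11, 30)     # 101,376,000 bit/s for Kinect's depth stream
color_bps = raw_bitrate(640, 480, 24, 30)     # 221,184,000 bit/s, assuming 24-bit RGB
depth_60_bps = raw_bitrate(640, 480, 11, 60)  # 60FPS depth, assuming 640x480 as well

depth_levels = 2 ** 11  # distinct values an 11-bit sample can encode

streams = [("depth @30", depth_bps), ("color @30", color_bps), ("depth @60", depth_60_bps)]
for name, bps in streams:
    print(f"{name}: {bps / 1e6:.1f} Mbit/s")
print("depth levels:", depth_levels)
```

Doubling the depth frame rate doubles its raw bandwidth, which is one concrete place where a cost/performance tradeoff shows up.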

Original Prime Sense reference design specification

How Kinect senses depth is half of the magic behind how it works - the other half is software. Once you understand the principle, there isn’t much mystery in how Kinect creates that 11-bit depth image. Kinect uses a structured-light IR projector and sensor system, a technique widely used in industrial manufacturing and inspection. The principle of structured-light sensing is that given a known baseline and angle between emitter and sensor, depth can be recovered by simple triangulation. Extend this from a single projected point to a known projected pattern, and the shift of each part of that pattern in the camera image relates directly to depth.
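For an idealized, rectified emitter/camera pair, that triangulation reduces to one relation: depth Z = f * b / d, where f is the focal length in pixels, b the baseline between projector and camera, and d the observed shift (disparity) in pixels. The sketch uses hypothetical values for f and b, not measured Kinect parameters. It also shows the standard consequence that one pixel of disparity spans a depth interval of roughly Z^2 / (f * b), so depth precision degrades quickly with distance.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Triangulated depth in meters for a rectified projector/camera pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the sensor")
    return focal_px * baseline_m / disparity_px


def depth_step(depth_m, focal_px, baseline_m):
    """Approximate depth change corresponding to one pixel of disparity at depth_m."""
    return depth_m ** 2 / (focal_px * baseline_m)


# Hypothetical optics: 580 px focal length, 7.5 cm baseline ("a few inches").
F, B = 580.0, 0.075
for d in (40, 20, 10):
    z = depth_from_disparity(d, F, B)
    print(f"disparity {d:>2} px -> depth {z:.2f} m, ~{depth_step(z, F, B) * 100:.1f} cm per px")
```

The quadratic growth of the depth step is why structured-light sensors like this one work best at living-room distances rather than across a large hall.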

Example structured-light optical system, from Spacecraft hazard avoidance utilizing structured light

In Kinect, that system consists of an IR laser (which I’ve already touched on), a carefully engineered diffraction grating (in this case, the diffraction grating is actually a computer-generated hologram - CGH - with a specific periodic structure), and a relatively standard CMOS detector with a band-pass filter centered at the IR laser wavelength. Inspecting the sensor with the naked eye, you can easily see the characteristic rainbow effect from the CGH atop the IR projector, and the shiny layer on the IR-sensitive CMOS is likely the band-pass filter.

That carefully engineered CGH produces a specific periodic structure of IR light when the laser shines through it: inside the CGH is a computationally derived periodic structure, almost certainly made of square cells, which diffracts the light into a matching periodic pattern. The first day I had Kinect, I immediately set out to find what that pattern was, and it’s simple to measure. Although the laser itself is intense enough to see with the naked eye at the source, the projected pattern thankfully isn’t, but the solution is trivial. Stick a Lambertian reflector (read: piece of paper) in front of Kinect, and you can see the pattern with any IR-sensitive device. My DSLR, like most cameras, has an IR-cut filter (since IR is generally undesirable in visible imaging systems), so the most convenient device I found that was sensitive enough to photograph the pattern was a smartphone camera - an iPhone 4. The image is obviously false-color - this radiation is actually in the far red/near IR part of the spectrum. That pattern is below:

You can immediately notice a few things. First, there are 9 clearly visible repeating blocks, each with a specific semi-random pattern of points inside. This structure is repeated across the blocks, and the bright zero-order point at the center of each block makes it obvious that the pattern arises from a holographic structure. Note also how the structure is engineered to be spherical, which gives it that curved edge shape - the paper is actually being held perpendicular to the projector. This projection grid defines the field of view of the Kinect sensor, and the distance between points inside the grid likewise defines the spatial resolution of the depth sensor. I’m only holding the paper about a foot away from the projector. Close up, inside one of those cells, you can see the structure, which consists of many small points:
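Because the projected grid fans out from the projector, both the area it covers and the gap between neighboring dots grow linearly with distance, which is why lateral resolution falls off the farther you stand. Here’s a small sketch of that geometry on a flat surface facing the projector. The 57-degree field of view is an assumed value and the dots-across count is purely hypothetical; neither was measured from the pattern photographed here.

```python
import math


def footprint_width(distance_m, fov_deg):
    """Width of the projected pattern on a flat surface facing the projector."""
    return 2.0 * distance_m * math.tan(math.radians(fov_deg) / 2.0)


def dot_spacing(distance_m, fov_deg, dots_across):
    """Approximate gap between neighboring dots, assuming evenly spaced dots."""
    return footprint_width(distance_m, fov_deg) / (dots_across - 1)


FOV_DEG = 57.0      # assumed horizontal field of view
DOTS_ACROSS = 200   # hypothetical number of dots across one row of the pattern

for z in (0.3, 1.0, 2.0, 4.0):  # 0.3 m is roughly the one-foot test distance
    w = footprint_width(z, FOV_DEG)
    s = dot_spacing(z, FOV_DEG, DOTS_ACROSS)
    print(f"{z:.1f} m: pattern {w:.2f} m wide, dots ~{s * 1000:.1f} mm apart")
```

The real pattern is curved at the edges and not evenly spaced, so this only captures the first-order scaling.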

The projected pattern doesn’t change over time - it’s fixed. The IR CMOS sensor images this pattern as it falls on the room and scene, and because the camera sits a few inches from the projector, the Kinect can back out the corresponding depth image from the local displacements of the semi-random projected pattern. That computation is done onboard the Kinect itself, and it’s entirely possible (read: likely) that the IR sensor inside the Kinect has a higher resolution than the 640x480 of the resulting image.
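Prime Sense’s actual matching algorithm isn’t described here, but the general idea of backing out depth from pattern displacement can be sketched with textbook block matching: take a small window of the captured IR row, slide it along the same row of a reference image of the pattern (recorded on a flat surface at a known distance Z0), and keep the offset with the smallest sum of absolute differences. That offset then converts to depth with the reference-plane triangulation relation 1/Z = 1/Z0 + d / (f * b). The 1-D rows, window size, search range, sign convention, and all optical parameters are hypothetical.

```python
import random


def match_offset(reference, captured, x, window, max_shift):
    """Find the shift d such that captured[x : x + window] best matches
    reference[x + d : x + d + window], using sum of absolute differences (SAD)."""
    patch = captured[x:x + window]
    best_d, best_cost = 0, float("inf")
    for d in range(-max_shift, max_shift + 1):
        start = x + d
        if start < 0 or start + window > len(reference):
            continue  # candidate window would fall off the reference row
        cost = sum(abs(p - r) for p, r in zip(patch, reference[start:start + window]))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_shift(d_px, z_ref_m, focal_px, baseline_m):
    """Reference-plane triangulation: 1/Z = 1/Z0 + d / (f * b).

    Sign convention (an assumption of this sketch, set by which side the
    projector sits on): positive d means closer than the reference plane.
    """
    return 1.0 / (1.0 / z_ref_m + d_px / (focal_px * baseline_m))


# Hypothetical reference row: random speckle-like intensities.
rng = random.Random(42)
reference = [rng.random() for _ in range(400)]

# Simulate an object that shifts the local pattern by 6 pixels.
true_shift = 6
captured = reference[true_shift:] + [0.0] * true_shift

d = match_offset(reference, captured, x=150, window=15, max_shift=20)
z = depth_from_shift(d, z_ref_m=2.0, focal_px=580.0, baseline_m=0.075)
print(d, round(z, 3))
```

Repeating this for every pixel gives a dense depth map; the semi-random pattern matters because it makes each window unique within the search range, so the best match is unambiguous.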

When iFixit tore apart the Kinect, it was immediately apparent that Microsoft had devoted a lot of engineering to the sensor’s cooling solution, which at first seems strange - why do two cameras and a bunch of microphones need fans? Cue RROD jokes. Further, in the disassembly photos, I noticed a Peltier cooler on the back of the IR laser diode.

The Peltier cooler is what's being pried off - Courtesy iFixit

The reason for the cooler should now be obvious - diffraction gratings are extremely wavelength-dependent, a laser diode’s output wavelength drifts with its temperature, and whether the Kinect works at all depends on its ability to properly detect the projected IR pattern. The result is that the IR diode likely needs to be held within a window of less than 10 degrees C around some set temperature, so that the laser’s peak output stays at or very close to the wavelength the CGH was designed for. The other consideration is that the top of most TVs, where you could conceivably place a Kinect, can be notably warm. Kinect’s somewhat overengineered thermal design now makes sense - the Peltier cooler likely gets hot on the back side, which connects to the metal base plate, while the front side, which touches the laser diode, gets cool, and the fan sweeps air through the whole box when required. RROD jokes aside, careful thermal regulation is an important part of the optical design.
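The grating equation shows why wavelength stability matters so much. For a grating with period P at normal incidence, the m-th diffraction order leaves at an angle theta with sin(theta) = m * lambda / P, so any drift in lambda moves every projected dot, and a dot that moves looks just like a change in depth. This sketch assumes a hypothetical 808 nm nominal wavelength (one of the guesses above), a hypothetical 20 micron grating period, and a drift of about 0.3 nm per degree C, a figure often quoted for uncooled laser diodes; none of these are measured Kinect values.

```python
import math


def diffraction_angle_deg(wavelength_m, period_m, order):
    """Grating equation at normal incidence: sin(theta) = m * lambda / period."""
    s = order * wavelength_m / period_m
    if abs(s) > 1.0:
        raise ValueError("this diffraction order does not propagate")
    return math.degrees(math.asin(s))


def dot_shift_m(distance_m, wavelength_m, drifted_m, period_m, order):
    """Lateral movement of one diffracted dot on a wall at distance_m when the
    wavelength drifts from wavelength_m to drifted_m."""
    a0 = math.radians(diffraction_angle_deg(wavelength_m, period_m, order))
    a1 = math.radians(diffraction_angle_deg(drifted_m, period_m, order))
    return distance_m * (math.tan(a1) - math.tan(a0))


NOMINAL = 808e-9       # hypothetical nominal wavelength
PERIOD = 20e-6         # hypothetical grating period
DRIFT_PER_C = 0.3e-9   # assumed typical drift for an uncooled laser diode, per degree C

for delta_c in (1, 5, 10):
    drifted = NOMINAL + DRIFT_PER_C * delta_c
    shift = dot_shift_m(2.0, NOMINAL, drifted, PERIOD, order=10)
    print(f"+{delta_c:>2} C: 10th-order dot at 2 m moves {shift * 1000:.1f} mm")
```

Higher orders, which land near the edge of the field of view, move the most, so a temperature swing distorts the outer part of the pattern first.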

I decided to see whether I could make the Kinect’s fans turn on or the device overheat. I left the Kinect sensor turned on for 48+ hours atop my rather warm LCD TV, and later atop an even hotter plasma TV, and never once felt it get noticeably warm or detected any airflow through it. It’s possible that the fans were spinning, but if so, I couldn’t detect it.

