So can anyone use the Kinect for commercial applications now, or is this just about Microsoft helping 10 startups? If it's the latter, it's barely newsworthy, and it shows how hard it is for Microsoft to imagine losing even a bit of control over their products and what you do with them.
The licensing situation with the various SDKs used for the Kinect is complicated. This effectively prevents many kinds of commercial applications at the moment. (I've done a commercial UI prototype using the Kinect, so I've researched this a bit.)
There are two high-level SDKs that you can use to build Kinect applications relatively easily. The first is OpenNI+NITE by PrimeSense, the company that designed the sensor used in the Kinect. The other is Microsoft's own Kinect SDK which is based on C#.
Both of these have very restrictive licenses. If you're not a hobbyist or an academic, there's no clear path to acquiring a license that would allow you to deploy applications. I imagine PrimeSense is happy to license their toolkit, but I have no idea about the price -- I'd imagine it's not cheap. The Microsoft SDK is going to be available for commercial licensing next year, but the terms are not known yet (AFAIK).
There is a third option for building Kinect applications, and this is the one I chose for my software. Instead of using the high-level SDKs, you can access the raw depth camera data from the sensor using a free driver. The data is quite clean because the Kinect does a lot of processing internally, so all you need to do is build the high-level interface -- i.e. interpreting the depth data to detect people and gestures, or whatever it is that your app needs to do. This way, the resulting app is not tied to the PrimeSense or Microsoft SDKs.
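To make that concrete, here's a toy sketch of the kind of interpretation layer I mean. Everything here is made up for illustration (numpy only, fabricated depth frame standing in for what a driver like libfreenect would hand you, arbitrary thresholds): threshold the raw depth values and treat a big-enough nearby blob as a person.

```python
import numpy as np

# Hypothetical stand-in for a driver callback: a real app would get an
# 11-bit 640x480 depth array from the free Kinect driver; here we fake
# one, with 2047 meaning "no reading" and a blob roughly 1.5 m away.
def fake_depth_frame():
    depth = np.full((480, 640), 2047, dtype=np.uint16)
    depth[100:380, 200:400] = 900   # person-shaped region of closer pixels
    return depth

def nearest_blob_mask(depth, max_raw=1024, min_pixels=500):
    """Naive 'person detector': keep pixels closer than a raw-value
    threshold, and accept the mask only if it is big enough to
    plausibly be a person rather than noise."""
    mask = depth < max_raw
    return mask if mask.sum() >= min_pixels else None

depth = fake_depth_frame()
mask = nearest_blob_mask(depth)
if mask is not None:
    ys, xs = np.nonzero(mask)
    # Centroid of the blob: a crude stand-in for "where the user is"
    print("blob centroid:", int(xs.mean()), int(ys.mean()))
```

A real app would of course need connected-component analysis, tracking over time, and gesture logic on top, but this is the general shape of building on the raw data instead of the high-level SDKs.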
FWIW, the Kinect SDK includes both C++ and C#/VB.NET versions, not just C#.
Regarding the upcoming commercial release, the FAQ implies that those commercial restrictions will be lifted, but of course, the exact details aren't yet known: "Under the terms of the [current] SDK license, remember you cannot receive payment for your application, use your application for advertising, use your application to solicit donations, or use your application in your internal business operations. The commercial release, which is coming in early 2012, will remove these constraints."
I spoke with PrimeSense a few months ago and they won't sell the sensor chips unless you can guarantee to move hundreds of thousands or millions of units.
PrimeSense doesn't make devices, they just build the sensor and SDK.
Asus has licensed the sensor for a Kinect-like product for PCs, but it's still not available off-the-shelf. It also won't work with the existing free Kinect drivers, so porting will be needed to switch to the Asus.
Sure, if the budget is big enough, it makes sense to go to PrimeSense directly rather than messing around with Kinects.
Unfortunately the project I made was nothing like that. The client was a major Finnish multinational tech company (one that doesn't make phones), but their R&D is so backwards and completely oriented towards incremental improvements that this project had to be sneaked in through the marketing budget, using whatever crumbs of money they had left.
There are actually companies in Germany using Kinects for machine control in manufacturing. As far as I know, they use exactly that low-level driver (libfreenect).
"and it shows how hard it is for Microsoft to imagine losing even a bit of control over their products and what you do with them."
Historically, Microsoft trying to exert control over how exactly their product is used has not been a problem anyone has had.
If anything, one could argue that their product's quality (perceived and real) has been diminished by giving 3rd party developers too much freedom to run amok.
Well, historically speaking, I can't think of a product that has been quite so community-driven as the Kinect. I don't think anything has "escaped their grasp" in quite the same way.
To give a counter-example, while it's been a while since I was involved in Windows development, I do remember an instance where a developer released a unit testing plugin that happened to be compatible with VS Express, and was forced to withdraw it at Microsoft's insistence. There was no quality issue there (not that I recall, anyway), it was purely a market segmentation and control ploy to ensure that anyone who wanted "serious" functionality had to buy VS Pro.
I think the license now allows commercial development. Additionally, Microsoft has announced a partnership with TechStars, giving your commercial Kinect idea an optional funding route. The six percent goes to TechStars, not Microsoft.
Someone should train a neural network to do what the Kinect does without the Kinect (create a 3D depth map from 2D images). Obviously Microsoft won't do it, since then their hardware wouldn't be needed, but with all the millions of Kinects out there, collecting enough data for this shouldn't be that difficult.
Pose estimation from just the 3D depth data took 24 hours on a 1000-node cluster.
I can't even begin to imagine the massive dataset + computing resources needed to pull this off.
Also (I might be completely off here, since I'm not competent enough with ML): to take a webcam image and then append a Z coordinate to each pixel would require someone to sit and label the dataset before it is useful for training the classifier (and the Z coordinate can be anything between 80 and 300 cm, so more than 200 classes if you quantize per centimeter). This definitely does not look trivial to me.
The pose estimation you're referring to is not the same as a raw depth map. Also, when creating that, they had limited data (they used actors and some graphics techniques). For just the raw depth there is now practically unlimited data available (millions of Kinect boxes everywhere). No hand labelling of data needs to be done; it's what the Kinect outputs 30 or more times per second.
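To sketch the "Kinect as a free labeler" idea (entirely synthetic data, numpy only, nothing like a real pipeline): every depth frame gives you dense per-pixel labels aligned with the camera image, so "training" can be as simple as a least-squares fit from image features to depth values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: `features` plays the role of per-pixel image
# descriptors, `depth_mm` the Kinect depth readings that label them for
# free, frame after frame.  Real data would come from paired RGB/depth
# streams; these numbers are fabricated purely for illustration.
n, d = 5000, 8
features = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
depth_mm = features @ true_w * 100 + 1500 + rng.normal(scale=20, size=n)

# Least-squares "training": because the Kinect supplies the labels,
# there is no hand-labelling step at all.
X = np.hstack([features, np.ones((n, 1))])       # add a bias column
w, *_ = np.linalg.lstsq(X, depth_mm, rcond=None)

pred = X @ w
rmse = np.sqrt(np.mean((pred - depth_mm) ** 2))
print(f"RMSE on training data: {rmse:.1f} mm")
```

A real system would use a far richer model than a linear fit, but the point stands: the supervision signal is free, and depth is naturally a regression target, not 200+ classes.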
Oh no, my 2nd paragraph was about the raw depth data. From what I see, the webcam view doesn't seem to contribute to the depth data (firing up the streams and viewing them shows different perspectives, and there isn't any way to query how far a pixel in the webcam view is from the camera).
I remember reading somewhere that the depth sensor projects something (infra-red?) on a surface and then the transformation of the image is used to build the 3D model. So it might still need some specialized hardware (not necessarily MS hardware since Asus seems to have a similar device out in the market).
EDIT: NVM, I just looked up the video of Andrew Ng's class where they discussed this. So it is possible.
When I say easy, I mean it in the sense that a community of people can create it in an open source way. It doesn't mean that I personally can whip it up on my laptop in a few minutes.
I think the problem you'll face is that there are lots of ambiguous inputs: the same pixels could be an image of something small close up or something large further off, for example. That's likely to confound the training process.
The Kinect dodges this problem by actively gathering extra info. It projects a fixed pattern of infrared speckles out in front of it; from how that pattern appears shifted in its IR camera's view, it can triangulate the distance to the reflective surface at each pixel and infer the depth from that.
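The geometry behind that is just similar triangles: the farther the surface, the smaller the sideways shift (disparity) of each projected dot. A minimal sketch, using rough, made-up calibration numbers rather than the Kinect's actual ones:

```python
# Structured-light triangulation in one line: a projected IR dot shifts
# sideways in the IR camera image, and depth falls out of the pinhole
# model.  focal_px and baseline_m below are illustrative guesses, not
# the Kinect's real calibration constants.
def depth_from_disparity(disparity_px, focal_px=580.0, baseline_m=0.075):
    """depth = focal_length * baseline / disparity"""
    return focal_px * baseline_m / disparity_px

print(round(depth_from_disparity(29.0), 3))   # a dot shifted 29 px -> 1.5 m
```

This is also why the ambiguity above matters for a passive camera: with no projected pattern, there is no disparity to measure from a single image.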
Of course, it will be less accurate than the Kinect itself (but spreading software is cheaper than hardware, so it's worth it). One of Andrew Ng's students trained this kind of system (3D from images, trained using 3D data) a few years ago, and it was accurate enough to automatically drive a toy car. That was using just a limited amount of data. Using data from thousands of Kinect users will give better results than that. It's all a matter of data: the more data you have, the more accurate you can be.
The difference between something in a lab and a commercial product -- or even something that is moderately usable in uncontrolled conditions -- is often vastly underestimated. The effort to make something real is probably several orders of magnitude harder than what's required for "I can hack something up in a couple of weeks."
I believe it has more to do with how much data you can get your hands on. With the Kinect providing widely available depth data, plus cloud computing, I believe a lot of things that seemed difficult are now possible. The open source movement needs to start shifting toward a crowd-sourced data and volunteered compute time movement.
If you like Python, our group just released PyKinect, which, as the name suggests, enables you to write Kinect games/apps using Python (note: CPython, not IronPython). You can check it out at http://pytools.codeplex.com and click on the PyKinect link. (disclaimer - msft guy)
It's a wrapper around the Kinect SDK, so not in its current form. But being OSS & Python, anyone can take it & massage it into something similar for Linux/Mac Kinect libraries.
The Kinect can't possibly be accurate enough to track playing a violin and such things... right? It looks ridiculous. The doctor scenarios looked like a good use, though.
Even if the Kinect could be accurate enough to allow someone to play an 'air violin,' there's the lack of tactile feedback that one would have to adjust to. Even more so than the transition from buttons to a touchscreen, as there isn't even a device at all.
It cannot be accurate enough to allow someone who knows how to play a violin to correctly play a simulated one. It could allow someone who doesn't to pretend to play one.
Completely off topic question: the background song of the video sounds horribly familiar, but I can't put my finger on it. Anyone care to give me pointers?