Not quite a Yegge long.

Noob Adventures in Machine Vision, Part 1

Friday 14 November 2008 - Filed under Code

As with many new-ish machines, my current laptop has a USB webcam built into the lid.

I know what normal people do with that kind of thing – it involves either an IM client, or an upload to YouTube.

These things don’t really seem too exciting. Video calling still sucks, since my uplink isn’t quite wide enough to avoid the video breaking up. I’m also no longer 14, or angsty enough to need to "broadcast myself" whining about my doubtless-pathetic existence.

So, what else to do with a webcam? What about using it to control a game – the idea obviously works for the PlayStation EyeToy people…

I’m going to build some recognition software that can figure out how high my arms are – that should be enough to detect flapping, airplane impersonations, etc.

The catch is that beyond the one AI paper I took (which briefly touched on various kinds of neural nets, suggested "you might use this kind of thing for vision", and basically convinced me that I never want to touch a neural net package again), I actually know nothing about machine vision. So I’m probably going about this all wrong, but if you know where you’re going, it’s not much of a hack.

First, we need to acquire some video data. There are various ways of doing this on Windows, from the extremely sucky to the almost sane.

I’m going to use one of the oldest – Video for Windows – because it’s simple. That and I found the docs for this first.

Warning: This is not the right way to build anything real. It doesn’t deal with window messages properly, since the VFW window has some odd WndProc that does nothing useful for us. But, it does *work*.

So, let’s get ourselves some video data, and put it on the screen in pseudo-realtime:

#include <windows.h>
#include <vfw.h>
#pragma comment( lib, "vfw32.lib" )

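int _stdcall WinMain( HINSTANCE, HINSTANCE,
                      LPSTR, int )
{
    HWND hwnd = capCreateCaptureWindow(
        L"Video Capture Test",
        WS_VISIBLE,  // window style (assumed; the call needs one)
        0, 0, 160, 120, 0, 0 );

    capDriverConnect( hwnd, 0 );  // first capture driver
    capDlgVideoFormat( hwnd );    // let the user pick size and depth

    // grab frames forever; nothing pumps window messages here
    for( ;; )
        capGrabFrameNoStop( hwnd );

    return 0;  // don’t actually get here
}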

Run that, pick 160×120, 24-bit from the dialog, and hit OK. You should get a tiny window in the corner of your screen, displaying frames from the webcam and making your CPU very hot. I did say this was the wrong way to do it, right? ;)

Now let’s attach a frame callback to this, so we can analyze each frame before it is displayed. In our callback, we’re going to look at column 30 of the bitmap, and calculate the luminance of each pixel. Again, this is not quite right, since the green component should be contributing the bulk of the luminance. However, I don’t care, I just want this working fast:

#define COL  30

LRESULT _stdcall FrameCallbackProc( HWND hwnd,
                                LPVIDEOHDR p )
{
    // first pixel of column COL; 3 bytes per pixel
    BYTE * q = p->lpData + COL * 3;

    for( int i = 0; i < 120; i++ )
    {
        float lum = (q[0]+q[1]+q[2])
                    / 768.0f;

        q += 160 * 3;  // next line, same pixel

        // todo: actually *do* something
    }

    return 1;
}

// in WinMain, before entering the loop:

    capSetCallbackOnFrame( hwnd, FrameCallbackProc );

Now you’ve got a working frame callback, which calculates fake luminance for each pixel in column 30, and then, uh, throws it away.
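For the record, the "not quite right" part of that fake luminance is easy enough to fix with a weighted sum. A minimal sketch, assuming the 24-bit frame stores each pixel as blue, green, red; `lumaBGR` is a made-up helper name, not anything from Video for Windows, and the weights are the ITU-R BT.601 luma coefficients:

```cpp
#include <cassert>

// Hypothetical helper (not part of Video for Windows): weighted
// luminance of one 24-bit pixel stored as blue, green, red.
// Returns a value in roughly 0..1. Green carries most of the weight.
inline float lumaBGR( const unsigned char * px )
{
    assert( px != nullptr );
    const float b = px[0], g = px[1], r = px[2];
    return ( 0.114f * b + 0.587f * g + 0.299f * r ) / 255.0f;
}
```

In the callback, `float lum = lumaBGR( q );` would stand in for the plain average. It costs three multiplies per pixel, which hardly matters when only one column is being sampled.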

A better thing to do would be to find the darkest pixel in that column (which is going to be in my arm someplace), and then draw some kind of indicator over the video frame where it thinks my arm is:

// in the frame callback

float m = 1.0f; int j = 0;
for( int i = 0; i < 120; i++ )
{
    float lum = …
    q += 160 * 3;

    if (lum < m)
        { m = lum; j = i; }
}

BYTE red[] = { 0, 0, 255 };  // bytes are blue, green, red
#define WritePixel( x, y )\
    memcpy( p->lpData + ((y) * 160 * 3) \
        + (x) * 3, red, 3 )

// draw a big ugly red mark 10px * 3px
for( int i = max(0,j-5); i < min(120,j+5); i++ )
    for( int k = -1; k < 2; k++ )
        WritePixel( COL+k,i );
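If you want to poke at the darkest-pixel search without a webcam attached, it can be pulled out into a plain function and fed a synthetic frame. This is only a sketch: `makeGrayFrame`, `setPixel` and `darkestRow` are hypothetical names, and the frame is assumed to be tightly packed 24-bit pixels, as in the callback above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a captured frame: width x height pixels,
// 3 bytes per pixel, every byte set to `gray`.
std::vector<unsigned char> makeGrayFrame( int width, int height,
                                          unsigned char gray )
{
    return std::vector<unsigned char>(
        static_cast<std::size_t>( width ) * height * 3, gray );
}

// Set all three channels of pixel (x, y) to `v`.
void setPixel( std::vector<unsigned char> & f, int width,
               int x, int y, unsigned char v )
{
    unsigned char * px =
        &f[ ( static_cast<std::size_t>( y ) * width + x ) * 3 ];
    px[0] = px[1] = px[2] = v;
}

// Row index of the darkest pixel in column `col`, using the same
// unweighted channel sum as the post. Ties go to the first row found.
int darkestRow( const unsigned char * frame, int width, int height,
                int col )
{
    assert( frame && col >= 0 && col < width );
    int best = 0;
    int bestSum = 3 * 255 + 1;  // larger than any real sum
    for( int y = 0; y < height; y++ )
    {
        const unsigned char * px = frame + ( y * width + col ) * 3;
        int sum = px[0] + px[1] + px[2];
        if( sum < bestSum ) { bestSum = sum; best = y; }
    }
    return best;
}
```

Build a frame with `makeGrayFrame( 160, 120, 200 )`, darken one pixel in column 30 with `setPixel`, and `darkestRow( f.data(), 160, 120, 30 )` should hand back that pixel's row.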

This works, but it jumps around too much, because the video is noisy. An even better thing to do would be to average the positions of all the pixels with a low enough luminance. That should consistently lock onto the middle of my arm:

#define MIN_LUM  0.2f

int samples = 0; int total = 0;
for( int i = 0; i < 120; i++ )
{
    float lum = …
    q += 160 * 3;

    if (lum < MIN_LUM)
        { samples++; total += i; }
}

int j = samples ? (total/samples) : 0;

// draw on frame as before…

Because I want to track *both* arms, I also did a similar thing for column 130 – and it works.
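Rather than pasting the loop twice, the per-column work can live in one function that takes the column as a parameter. A rough sketch of the idea, not the code I actually ran: `darkCentroidRow` is a made-up name, and unlike the snippet above it returns -1 when no pixel is dark enough, so the caller can tell "no arm found" apart from "arm at row 0".

```cpp
#include <cassert>

// Hypothetical helper: average row index of all pixels in column `col`
// whose unweighted brightness (0..1) is below `minLum`.
// Returns -1 when no pixel in the column qualifies.
int darkCentroidRow( const unsigned char * frame, int width, int height,
                     int col, float minLum )
{
    assert( frame && col >= 0 && col < width );
    int samples = 0, total = 0;
    for( int y = 0; y < height; y++ )
    {
        // 24-bit pixels, tightly packed, as in the frame callback
        const unsigned char * px = frame + ( y * width + col ) * 3;
        float lum = ( px[0] + px[1] + px[2] ) / 768.0f;
        if( lum < minLum ) { samples++; total += y; }
    }
    return samples ? total / samples : -1;
}
```

In the frame callback this would be called twice, something like `darkCentroidRow( p->lpData, 160, 120, 30, MIN_LUM )` for one arm and the same with column 130 for the other.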

In the second part of this, I’ll include some pictures and show it working, integrated with a toy flight simulation.

2008-11-14  »  admin

Talkback x 3

  1. Phil
    14 November 2008 @ 3:49 pm

    Check out Hornetseye:

    A machine vision framework consisting of C++ and Ruby. The creator of Hornetseye did a great demo of it at OSCON ’08. As part of the demo he aimed his camera at the projection screen his presentation was projected on and he was able to manipulate objects on his laptop’s desktop just by pointing at the projection screen. Very impressive.

  2. Jake Voytko
    16 November 2008 @ 2:12 am

    A friend of mine did this exact project in college. His algorithm was divided into two phases: skin detection, and gesture recognition.

    He found a paper on arbitrary skin recognition, which gives you “Yes/No” values for each pixel’s skin-ness. Since the person is wearing a shirt, this gives you two arms and the head. He then compared the person to each gesture and took the most likely one.

    If you got your system working, good job! It’s important to try problems on your own.

  3. Brian Grey
    17 November 2008 @ 11:05 am

    My view is never to reinvent what already exists, so following that principle, you might want to take a look at Roborealm, or spend some time on my machine vision blog and follow some of the other links I provide.
