Webcam object tracking theory
(By Aprone)

I'm going to make this short and sweet.  This is a basic concept to get you going, and from there you can expand upon the idea in any way you wish.

Start out with 4 variables: x, y, theta, and scale.  (Rename them however makes sense to you.)

For this example, make an array of 54 "points" that each have their own px, py, and pvalue variables.  These will branch out from x and y as 3 rings of dots.  The image shows how the first 6 points space themselves out evenly at a radius of 1/3 of the scale variable.  The next 18 sit at 2/3 of scale, and the remaining 30 sit at the full scale.  The theta variable rotates the points around the central x,y location (spins the whole thing).

If you're not sure how to spiral the points out, this code would work for the first 6 points (a sketch that covers all three rings follows right after it).  tempa and tempb represent the coordinates on the overall image where the dot would land, based on the x,y center.

'360 / 6 spaces the 6 dots evenly, theta spins them, and * 3.141593 / 180 converts degrees to radians for Cos/Sin
For z = 1 To 6
tempa = x + ((scale / 3) * Cos(((360 / 6 * (z - 1)) + theta) * 3.141593 / 180))
tempb = y + ((scale / 3) * Sin(((360 / 6 * (z - 1)) + theta) * 3.141593 / 180))
next z
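
If it helps, here is one way to lay out all 54 points (three rings of 6, 18, and 30) in one go.  This is only a sketch: px, py, ringCount, and ringRadius are arrays you would set up yourself (I'm treating each point's px/py as array entries), and theta is in degrees like everywhere else here.

ringCount(1) = 6: ringRadius(1) = scale / 3
ringCount(2) = 18: ringRadius(2) = (scale * 2) / 3
ringCount(3) = 30: ringRadius(3) = scale

p = 0
For ring = 1 To 3
    For z = 1 To ringCount(ring)
        p = p + 1
        angle = ((360 / ringCount(ring)) * (z - 1)) + theta 'spread the ring evenly, then spin the whole thing by theta
        px(p) = x + (ringRadius(ring) * Cos(angle * 3.141593 / 180))
        py(p) = y + (ringRadius(ring) * Sin(angle * 3.141593 / 180))
    Next z
Next ring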

Grab your first webcam frame.

Loop through each point and record the pixel color information into the pvalue of each point.  In the picture on the right, the points would be filled with a bunch of dark browns and gray from the briefcase's glare.
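
In code, that recording step might look something like this.  GetPixelColor is just a stand-in for however your language reads a pixel out of the current frame (it isn't a real built-in function), and frame is whatever your webcam library handed you:

For p = 1 To 54
    pvalue(p) = GetPixelColor(frame, Int(px(p)), Int(py(p))) 'remember the color sitting under point p
Next p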

Grab the next webcam frame.

Since the stuff on camera isn't flying past at 100mph, you can assume it has only traveled SO far between frames.  (Adjust and improve all this later, remember this is just to get you up and running.)  Run a series of loops that will take your x, y, theta, and scale values through a reasonable amount of change.  Let's say you assume the object can't have traveled farther than 50 pixels in any direction.  We also assume it probably didn't rotate more than 20 degrees and could only have gotten about 50% bigger or smaller in scale.  You might loop it as follows (use separate tx, ty, ttheta, and tscale loop variables so the loops don't overwrite the x, y, theta, and scale you're tracking)...

best_score = 999999 'start higher than any possible score so the first real result always beats it
for tx = x - 50 to x + 50
for ty = y - 50 to y + 50
for ttheta = theta - 20 to theta + 20 step 2
for tscale = scale - 0.5 to scale + 0.5 step 0.1
score = 0
"do stuff"
next tscale
next ttheta
next ty
next tx

Now for the "do stuff" part inside the loop...

We need 6 variables: score, best_score, best_x, best_y, best_theta, and best_scale.

Position, rotate, and scale your points using the variables you are looping (tx, ty, ttheta, and tscale).  Gather color data from the pixels the points hover over.  Rather than storing it in the pvalue variables, we will use it to calculate an overall score.  For each point, we will add to the score depending on how far off the current color is compared to the one stored in the pvalue.  (A full sketch of this "do stuff" block appears after the scoring snippet below.)

If you don't know how to split up the RGB colors, here is how using only math and no language-specific code (to my knowledge).  This assumes the color value is packed as blue*65536 + green*256 + red (red in the low byte), which is how VB-style RGB/COLORREF values are laid out:

temp = pixel color(x,y) 'however your language hands you the packed color value
bb = Int((temp / 256) / 256) 'blue lives in the high byte
gg = Int(temp / 256) - (256 * bb) 'green is the middle byte
rr = temp - (bb * 65536) - (gg * 256) 'red is whatever is left over
i = 0.2989 * rr + 0.587 * gg + 0.114 * bb 'standard grayscale (luma) weights

At this point, rr = red 0-255, gg = green 0-255, bb = blue 0-255, and i = a black and white (grayscale) value 0-255.

Use whatever method for scoring you feel is best, but to get you started it could be as simple as this (assume originalrr, originalgg, and originalbb are the red, green, and blue split out of the color stored in pvalue):

score = score + Abs(rr - originalrr) 'difference in red
score = score + Abs(gg - originalgg) 'difference in green
score = score + Abs(bb - originalbb) 'difference in blue
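
Putting those pieces together, the whole "do stuff" block might look roughly like this.  It's only a sketch: ringCount holds the 6/18/30 ring sizes from the layout sketch earlier, GetPixelColor and frame are the same hypothetical stand-ins as before, and tx, ty, ttheta, and tscale are the loop variables from the nested loops above.

score = 0
p = 0
For ring = 1 To 3
    For z = 1 To ringCount(ring)
        p = p + 1
        'place point p for the candidate position/rotation/scale being tested
        angle = ((360 / ringCount(ring)) * (z - 1)) + ttheta
        tempa = tx + ((tscale * ring / 3) * Cos(angle * 3.141593 / 180))
        tempb = ty + ((tscale * ring / 3) * Sin(angle * 3.141593 / 180))
        'color currently under that spot
        temp = GetPixelColor(frame, Int(tempa), Int(tempb))
        bb = Int((temp / 256) / 256)
        gg = Int(temp / 256) - (256 * bb)
        rr = temp - (bb * 65536) - (gg * 256)
        'color remembered from the last accepted frame
        old = pvalue(p)
        originalbb = Int((old / 256) / 256)
        originalgg = Int(old / 256) - (256 * originalbb)
        originalrr = old - (originalbb * 65536) - (originalgg * 256)
        score = score + Abs(rr - originalrr) + Abs(gg - originalgg) + Abs(bb - originalbb)
    Next z
Next ring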

Once all of the 54 points are done, we check to see if the current score is less than the best_score value.  If not, just ignore it and keep going with the next combination of variables in the loops.  If score is less than best_score, do the following:

best_x = tx
best_y = ty
best_scale = tscale
best_theta = ttheta
best_score = score

That's it for the "do stuff" section.  At the end of the nested loops you end up with all those "best" variables, which represent the location, angle, and scale that best matched the points from the previous camera frame.  If your best_score is low enough, set your main variables to be equal to your "best" versions:

x = best_x
y = best_y
scale = best_scale
theta = best_theta

Decide in advance on a best_score value that is just too high to be acceptable.  If even your best result is worse than that, something probably blocked your target and there was no good match to find.  When that happens, switch over to a search mode and start using past examples to find it again (more on this below).
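
As a quick sketch of that accept/reject decision (the 5000 cutoff is made up; tune it to your own camera, lighting, and number of points):

If best_score < 5000 Then 'good enough match - accept the new position/rotation/scale
    x = best_x: y = best_y: scale = best_scale: theta = best_theta
Else 'even the best match was bad - the target is probably blocked
    'switch to search mode here and try your stored past examples
End If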

Now use these values to run through the 54 points one more time, but this time store the color data back into the pvalue variables.  This new position/rotation/scale is now your new standard to go by.  (When expanding upon this on your own, you'll want to store away this "standard" if it is unique enough from the other examples you've stored.  That lets the object be found again, if it gets lost behind something else, by searching for those past examples of the target.)
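
A sketch of that re-recording step, reusing the layout loop and the hypothetical GetPixelColor helper from earlier to reposition the points at the accepted pose and remember the colors under them:

p = 0
For ring = 1 To 3
    For z = 1 To ringCount(ring)
        p = p + 1
        angle = ((360 / ringCount(ring)) * (z - 1)) + theta
        px(p) = x + ((scale * ring / 3) * Cos(angle * 3.141593 / 180))
        py(p) = y + ((scale * ring / 3) * Sin(angle * 3.141593 / 180))
        pvalue(p) = GetPixelColor(frame, Int(px(p)), Int(py(p))) 'this becomes the new standard to match against
    Next z
Next ring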

This is a good time to draw cool moving lines/dots on the picture so you can see how it tracked the target.  Save the image somewhere so you can later upload your output video with a cool shout-out to your new pal Aprone!  : )

Grab the next webcam frame... and start over with your nested loops again.  You're set!
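
For reference, the frame-to-frame flow ends up looking roughly like this.  It's just an outline: GrabFrame stands in for whatever your webcam library actually gives you, and tracking and acceptableScore are values you'd manage yourself.

'one-time setup: lay out the 54 points around the starting x, y, theta, scale and record their colors
Do While tracking
    frame = GrabFrame()
    best_score = 999999
    '...the nested tx / ty / ttheta / tscale loops with the "do stuff" block go here...
    If best_score < acceptableScore Then
        x = best_x: y = best_y: scale = best_scale: theta = best_theta
        '...reposition the points and re-record pvalue at the new pose...
    Else
        '...no good match - target is probably blocked, switch to search mode...
    End If
    'draw your dots/lines on the frame here if you want the cool visual
Loop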

http://www.youtube.com/watch?v=9Xw4gGjDz14