I have been recently looking at real-time motion detection and various ways of motion extraction and object tracking from a live video stream. Most motion detection algorithms are based on temporal contrast (a complex term maths and image processing guys use for frame subtraction) some sort of low-pass filtering and an ROI (Region Of Interest) and object detection algorithm.
I discovered the OpenCV library and decided to give it a try and do some frame subtraction tests and low-pass filtering. Here are some primitive test results and an engineer's explanation of frame subtraction and simplest possible image low-pass filtering.
Absolute difference
Our first stop is frame subtraction. Subtracted frame-to-frame time sets the potential processing (subtraction) load and motion detection sensitivity. In general, simply $\tau_{s} = \frac{1}{fps} [s]$ which would give us a linearly dependent processing load of $\eta = \frac{fps}{2} [op/sec]$.
On an image sensor one can do frame subtraction in both the analog and digital domains, but generally in both cases some form of single frame memory buffering is needed. You can see an image of an 8-bit grayscale absolute frame subtraction (absdiff) below.
Applying threshold
Normally ROI and object detection algorithms need a basic sensitivity tuning function, so a natural step is to add threshold to the image and extract event-based binary motion information. For this the OpenCV cvThreshold() function comes handy. Below is an image with added binary threshold of 10 steps (out of 255). Adding a small threshold also helps in noise removal. The single-bit binary event representation conversion helps reducing the processing load during further low-pass filtering steps.
Apart from having a static threshold, other truncation/moving threshold techniques exist.
Dilation
Two very basic low-pass filtering operations used in image processing are dilation and erosion. Both methods base on element-by-element comparison with a reference (structuring) element. A one-bit binary image map makes dilation and erosion easier to implement with simple NAND logic gates.
Here is a simple example of a binary morphological dilation. Let's imagine that we have the following binary image map: $$ Img = \begin{matrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{matrix} $$ In order to perform dilation, as like many digital filters we need to set particular filter coefficients. In the current case we can define a reference element, which defines the surrounding pixels of the pixel of interest. We can for example define: $$ RefEl = \begin{matrix} 1 & \fbox{1} & 1 \end{matrix} $$ The dilation function applies the corresponding reference element to the pixels surrounding the pixel of interest and assigns a value to the pixel of interest depending on the value of the neighbouring elements.
In the current case with the example of the binary image, the single 'zeros' would be substituted by 'ones' because the elements defined by the reference element and the neighbouring pixels in the image are ones. Binary dilation appears to be not so computationally expensive as it can be implemented with basic N/AND gates. The filtered image would result in: $$ FiltImg = \begin{matrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & \fbox{1} & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & \fbox{1} & 1 & 1 \\ 1 & 1 & 0 & 0 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{matrix} $$
In general the dilation and erosion filtering can be very non-linear depending on the shape and size of the reference element. Here is an example of a dilated image:
It is noticeable that the random single pixel noise in the image would not necessarily fully disappear with dilation.
Erosion
An eroded image results by subtracting all pixels covered by the reference element if the latter does not completely fit the binary image. An intuitive example:$$ $$ Img = \begin{matrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 & 1 \end{matrix} $$ $$ An erosion with a reference element of: $$ RefEl = \begin{matrix} 1 & 1 & 1 \\ 1 & \fbox{1} & 1 \\ 1 & 1 & 1 \end{matrix} $$ Leads to: $$ FiltImg = \begin{matrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \fbox{1} & \fbox{1} & 0 & 0 \\ 0 & 0 & 0 & 0 & \fbox{1} & 0 & 0 \\ 0 & 0 & 0 & 0 & \fbox{1} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{matrix} $$ It is noticeable that erosion shrinks the details in an image and also removes any random noise pixels scattered over the image. Here is a visual example:
OpenCV is a library for image processing, it was very suitable for my experiments, trying to gather more understanding about image processing in general. I have attached the code I used for my experiments here.
It would be very interesting to have a look and investigate if simple image filtering algorithms can actually be implemented on an image sensor directly in the analog domain, or even further, why not on a pixel level.