**TDI CCD and TDI CMOS Signal-to-Noise Ratio and Dynamic Range**

I was skimming through this paper from IEEE Transactions on VLSI Systems. In Section II A, the authors give a brief introduction to TDI CMOS image sensors and the basic dependence of SNR and DR on the number of TDI stages $N$. What struck me was their claim that the dynamic range of a CMOS TDI sensor decreases with the number of stages. While this holds for a CCD TDI image sensor, things are completely different for a CMOS imager. This incorrect statement is also my main motivation for this post.

**1. SNR and Dynamic Range (DR) for a TDI CCD Image Sensor**

In a CCD TDI imager, the pixel needs a large full well capacity in order to hold the final accumulated output signal. Therefore, the CCD's full well has to be maximized so as to increase the dynamic range and avoid saturation/blooming problems.

Intuitively the signal-to-noise ratio would be: $$SNR = 20\log_{10}\Bigg(\frac{\sqrt{N}(i_{ph}t_{s})}{\sigma_{total}}\Bigg)$$ and the dynamic range of a CCD TDI sensor would be: $$ DR = 20\log_{10}\Bigg(\frac{q_{fwmax}-(N i_{dc}t_{s})}{\sqrt{N}\sigma_{total}}\Bigg) $$ Therefore the DR is limited by the charge handling capacity of the output CCD channel, which means that with a CCD we would like to have as large a full well as possible. One might observe that in the case of CCDs, the dynamic range is indeed reduced as the number of stages increases, because noise is added at each stage while the full well stays fixed. This, however, is not the case with a CMOS image sensor, as we shall see soon.
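A quick sanity check of where the $\sqrt{N}$ comes from, reading $\sigma_{total}$ as the total noise of a single stage and assuming that the noise of the individual stages is uncorrelated (both are my assumptions, the symbols are not defined any further above). The signal adds linearly over the $N$ stages, while the noise adds in quadrature:

$$ S_{N} = N\, i_{ph} t_{s}, \qquad \sigma_{N} = \sqrt{N}\,\sigma_{total}, \qquad \frac{S_{N}}{\sigma_{N}} = \sqrt{N}\,\frac{i_{ph} t_{s}}{\sigma_{total}} $$

Under these assumptions every doubling of $N$ buys $10\log_{10}2 \approx 3$ dB of SNR. In the CCD DR expression, on the other hand, the numerator cannot grow with $N$ (the full well is fixed and the dark charge eats into it), so the DR drops by at least $10\log_{10}N$ dB relative to a single stage.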

**2. SNR and Dynamic Range (DR) for a TDI CMOS Image Sensor**

In a TDI CMOS sensor, unlike in a CCD, the pixel only has to be designed for the full well capacity needed for the expected integration time per line.

As the integration time is short, the FW capacity can be low and the pixel can be built with a high conversion gain. This also reduces noise (maximizes signal swing) and in some sense relaxes the readout noise requirement. The caveat is that TDI in CMOS still imposes tough (very low) noise requirements on the readout, as the readout noise adds with the square root of the number of time delay integration stages.

The signal-to-noise ratio as well as the dynamic range for a CMOS TDI sensor would therefore be: $$ SNR = 20\log_{10}\Bigg(\frac{\sqrt{N}(i_{ph}t_{s})}{\sigma_{total}}\Bigg) $$ $$ DR = 20\log_{10}\Bigg(\frac{N q_{fwmax}-(N i_{dc}t_{s})}{\sqrt{N}\sigma_{total}}\Bigg) $$ The total full well capacity equals the sum of the separate pixel full wells, $FW_{tot} = \sum\limits_{i=1}^{N} FW_{pix_i}$, so the total full well is practically only limited by the accumulators, which can be implemented in either the analog or the digital domain. With digital accumulators the total FW in general has no hard limit.

**MTF and TDI CIS**

I have been recently reading about TDI imagers and their fundamental limitations. A TDI (time delay integration) image sensor effectively performs multiple exposures of the same moving object and accumulates them later on. The aim is to increase the time available for integration of the same object spot and effectively boost the sensor's sensitivity and/or frame rate. Such sensors are typically realized in a large aspect ratio format, normally as line scanners. More about TDIs here.

One very specific and well known issue with TDI imagers is their poor contrast performance. This comes from the fact that when a moving object is captured by static orthogonally placed pixels in e.g. a rolling shutter CMOS image sensor, one cannot capture the same spots of the object with the same pixels. Here is a diagram of a four-line rolling shutter sensor.

In other words, when the rolling shutter is triggered (e.g. left-right), the effective sampling aperture of the sensor depends on the sampling period of the adjacent pixels and the pixel line time. The sampling aperture therefore affects the dynamic modulation transfer function of the image sensor. Lepage et al. have an excellent publication in IEEE Transactions on Electron Devices on this problem.

I played numerically with the formulas to see how the number of stages in a TDI sensor would affect its dynamic MTF. The modulation transfer function can be computed by performing a 1D Fourier transform of the sensor's spread function, in the current case the one due to the finite discrete sampling aperture: $$MTF_{discrete} = \frac{\sin\left(\frac{1}{2}f_{nyq}\pi\frac{t_{int}}{t_{line}}\right)}{\frac{1}{2}f_{nyq}\pi\frac{t_{int}}{t_{line}}}$$ One should also note that the sensor's total MTF is affected by the pixel aperture, crosstalk and alignment as well, and is the product of all these contributions. Below a plot of the dynamic MTF versus the normalised spatial frequency for different effective sampling apertures is shown.

Note the aliasing peaks beyond the Nyquist frequency, which is indicated with a vertical blue line. We can see that at its best (for a standard orthogonal rolling shutter scanner), if we have a single accumulation, the MTF at $f_{n}/2$ is 0.64. As the total MTF of the imager also depends on the pixel's MTF, one can achieve a better total MTF by tweaking the pixel aperture design, for example by adding some light shields. This however degrades the pixel QE and therefore gives a loss of SNR.

**Bode plots with an oscilloscope**

Our group needed a good microphone for our weekly conference calls, as part of the group is in Portugal and the rest in Glasgow. The >100 GBP camera microphone which we had did not provide satisfactory results, none at all. This is the reason why I decided to build a microphone preamplifier myself, which supposedly had to perform better and replace the camera mic. The design and soldering process, however, provoked an idea for a more esoteric measurement setup.

I saw the idea implemented originally by Dave Jones in his EEVBlog. I had a look at his video again but a 25 minute ranting video seemed way too long for me to watch, so I decided to squeeze the general idea into five minutes. Here is a basic explanation and demonstration of how to do a Bode plot with a scope.

The circuit I used for the test is a non-inverting amplifier, realized with an opamp (TL072) with frequency-dependent feedback, effectively forming a high-pass filter. Practically we need some sort of DC reject filter so as to avoid opamp saturation. Here is my circuit:

And two screenshots of a linear and a log sweep:

There is some noise in the system, and averaging to filter it out is not an option here, as one can never achieve the same phase on every sweep/acquisition. In my case, turning on averaging just distorted the picture even more.

It would be fun to find out how to do a phase plot on the scope, so that a full picture of the transfer characteristics of our circuit can be acquired.

**Simple motion detection with OpenCV**

As a continuation of my previous post, here is a simple algorithm for motion detection in a live video stream using OpenCV. It basically follows a few simple steps:

1. Subtract frames and generate a binary image

    cvAbsDiff( frameTime1, frameTime2, frameForeground );
    cvShowImage( "AbsDiff", frameForeground );   //AbsDiff window
    cvThreshold( frameForeground, frameForeground,
                 20,                             //Threshold
                 255,                            //Saturate up to 255
                 CV_THRESH_BINARY );
                 //CV_THRESH_BINARY_INV);
                 //CV_THRESH_TRUNC);
                 //CV_THRESH_TOZERO);
                 //CV_THRESH_TOZERO_INV);
    cvShowImage( "AbsDiffThreshold", frameForeground );   //AbsDiffThreshold window

The threshold, as one may guess, can be used as a primitive noise suppression parameter.

2. Run through the binary image and accumulate events.

    int row, col;
    uchar sig1, sig2;
    unsigned long int rowsum[frameForeground->height], totalsum;

    totalsum = 0;
    for( row = 0; row < frameForeground->height; row++ )
    {
        rowsum[row] = 0;
        for ( col = 0; col < frameForeground->width; col++ )
        {
            // Two bytes are read per column step, which presumes that every
            // row holds at least 2 * width bytes
            sig1 = CV_IMAGE_ELEM( frameForeground, uchar, row, col * 2 );
            sig2 = CV_IMAGE_ELEM( frameForeground, uchar, row, col * 2 + 1 );
            rowsum[row] += (sig1 + sig2);
            //printf("Y: %d X: %d Val: %d \n", row, col, sig1);
            //printf("Y: %d X: %d Val: %d \n", row, col, sig2);
        }
        totalsum += rowsum[row];
    }
    printf("Totalsum: %20lu \n", totalsum);   // %lu matches unsigned long

    if (totalsum >= 80000)   // Motion detection threshold
    {
        // Draw a green rectangle
        cvRectangle(image, cvPoint(10,10), cvPoint(310, 230), cvScalar(0, 255, 0, 0), 1, 8, 0);
    }
    cvShowImage( "Camera", image );   // Display the original image w/wo added green rectangle

After all events have been accumulated, the total is compared with a motion detection coefficient. Once the total event count is larger than the detection coefficient, a green rectangle is drawn on the original frame.
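The two steps above do not strictly need OpenCV. Here is a minimal sketch of the same idea in plain C on raw 8-bit grayscale buffers. The function names are hypothetical, and unlike the snippet above it counts events (pixels over threshold) instead of summing 255 per event, so the detection coefficient has to be scaled accordingly.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Count the pixels whose absolute frame-to-frame difference is larger
 * than `threshold` (same comparison as a binary threshold on an absdiff).
 */
size_t count_motion_events(const unsigned char *prev,
                           const unsigned char *curr,
                           size_t n_pixels,
                           unsigned char threshold)
{
    size_t events = 0;

    for (size_t i = 0; i < n_pixels; i++) {
        int diff = abs((int)curr[i] - (int)prev[i]);
        if (diff > threshold) {
            events++;
        }
    }
    return events;
}

/* Returns 1 once the event count reaches the detection coefficient. */
int motion_detected(const unsigned char *prev,
                    const unsigned char *curr,
                    size_t n_pixels,
                    unsigned char threshold,
                    size_t min_events)
{
    return count_motion_events(prev, curr, n_pixels, threshold) >= min_events;
}
```

Both buffers are assumed to be tightly packed, one byte per pixel, and of equal size.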

Here is my dead simple example, which should work straight out of the box.

**Playing with temporal contrast and low-pass filtering for motion detection with OpenCV**

I have recently been looking at real-time motion detection and various ways of motion extraction and object tracking from a live video stream. Most motion detection algorithms are based on temporal contrast (a complex term maths and image processing guys use for frame subtraction), some sort of low-pass filtering, and an ROI (Region Of Interest) and object detection algorithm.

I discovered the OpenCV library and decided to give it a try and do some frame subtraction tests and low-pass filtering. Here are some primitive test results and an engineer's explanation of frame subtraction and the simplest possible image low-pass filtering.

**Absolute difference**

Our first stop is frame subtraction. The time between the two subtracted frames sets both the processing (subtraction) load and the motion detection sensitivity. In the simplest case it is the frame period, $\tau_{s} = \frac{1}{fps} [s]$, which gives a processing load that scales linearly with the frame rate, $\eta = \frac{fps}{2} [op/sec]$.

On an image sensor one can do frame subtraction in both the analog and digital domains, but generally in both cases some form of single frame memory buffering is needed. You can see an image of an 8-bit grayscale absolute frame subtraction (absdiff) below.

**Applying threshold**

Normally ROI and object detection algorithms need a basic sensitivity tuning function, so a natural step is to apply a threshold to the image and extract event-based binary motion information. For this the OpenCV cvThreshold() function comes in handy. Below is an image with an added binary threshold of 10 steps (out of 255). Adding a small threshold also helps with noise removal. The conversion to a single-bit binary event representation helps reduce the processing load during the further low-pass filtering steps.

Apart from having a static threshold, other truncation/moving threshold techniques exist.

**Dilation**

Two very basic low-pass filtering operations used in image processing are *dilation* and *erosion*. Both methods are based on an element-by-element comparison with a reference (structuring) element. A one-bit binary image map makes dilation and erosion easy to implement with simple NAND logic gates.

Here is a simple example of a binary morphological dilation. Let's imagine that we have the following binary image map: $$ Img = \begin{matrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{matrix} $$ In order to perform dilation, as with many digital filters, we need to set particular filter coefficients. In the current case we can define a reference element, which defines the surrounding pixels of the pixel of interest. We can for example define: $$ RefEl = \begin{matrix} 1 & \fbox{1} & 1 \end{matrix} $$ The dilation function applies the corresponding reference element to the pixels surrounding the pixel of interest and assigns a value to the pixel of interest depending on the value of the neighbouring elements.

In the current case with the example of the binary image, the single 'zeros' are substituted by 'ones' because both neighbouring pixels covered by the reference element are ones; the two adjacent zeros in the sixth row are left untouched under this rule. Binary dilation is not particularly computationally expensive, as it can be implemented with basic N/AND gates. The filtered image would result in: $$ FiltImg = \begin{matrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & \fbox{1} & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & \fbox{1} & 1 & 1 \\ 1 & 1 & 0 & 0 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{matrix} $$
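For comparison, here is a minimal C sketch of the textbook binary dilation with the same horizontal three-pixel reference element, where each output pixel is the OR of the pixel and its left and right neighbours. The function name and the flat row-major buffer layout are my own choices. Note that with this OR definition the two adjacent zeros of the example get filled as well, since each of them has a one next to it, so the result is slightly more aggressive than the matrix shown above.

```c
/*
 * Binary dilation with a 1x3 reference element [1 (1) 1].
 * `in` and `out` are row-major buffers of rows * cols pixels holding
 * 0 or 1; pixels outside the image are treated as 0.
 */
void dilate_1x3(const unsigned char *in, unsigned char *out,
                int rows, int cols)
{
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            unsigned char left  = (c > 0)        ? in[r * cols + c - 1] : 0;
            unsigned char right = (c < cols - 1) ? in[r * cols + c + 1] : 0;

            /* Set the pixel if any pixel under the element is set. */
            out[r * cols + c] =
                (unsigned char)(left | in[r * cols + c] | right);
        }
    }
}
```

`in` and `out` must not point to the same buffer, otherwise already dilated pixels would leak into their neighbours.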

In general the dilation and erosion filtering can be very non-linear depending on the shape and size of the reference element. Here is an example of a dilated image:

It is noticeable that the random single pixel noise in the image would not necessarily fully disappear with dilation.

**Erosion**

An eroded image results from clearing every pixel at which the reference element, centred on that pixel, does not fit completely inside the set region of the binary image. An intuitive example: $$ Img = \begin{matrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 & 1 \end{matrix} $$ An erosion with a reference element of: $$ RefEl = \begin{matrix} 1 & 1 & 1 \\ 1 & \fbox{1} & 1 \\ 1 & 1 & 1 \end{matrix} $$ leads to: $$ FiltImg = \begin{matrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \fbox{1} & \fbox{1} & 0 & 0 \\ 0 & 0 & 0 & 0 & \fbox{1} & 0 & 0 \\ 0 & 0 & 0 & 0 & \fbox{1} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{matrix} $$ It is noticeable that erosion shrinks the details in an image and also removes any random noise pixels scattered over the image. Here is a visual example:
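Alongside the image, the matrix example can be reproduced with a few lines of C. This is a minimal sketch with a made-up function name, operating on a flat row-major buffer and treating everything outside the image as zero; each output pixel is the AND of the 3x3 neighbourhood.

```c
/*
 * Binary erosion with a 3x3 all-ones reference element.
 * `in` and `out` are row-major buffers of rows * cols pixels holding
 * 0 or 1; pixels outside the image are treated as 0.
 */
void erode_3x3(const unsigned char *in, unsigned char *out,
               int rows, int cols)
{
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            unsigned char keep = 1;

            /* The pixel survives only if the element fits completely. */
            for (int dr = -1; dr <= 1; dr++) {
                for (int dc = -1; dc <= 1; dc++) {
                    int rr = r + dr;
                    int cc = c + dc;

                    if (rr < 0 || rr >= rows || cc < 0 || cc >= cols ||
                        !in[rr * cols + cc]) {
                        keep = 0;
                    }
                }
            }
            out[r * cols + c] = keep;
        }
    }
}
```

Fed with the 7x7 example image, it leaves exactly the four boxed pixels of the filtered matrix set.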

OpenCV is a library for image processing, and it was very suitable for my experiments while I was trying to gain more understanding of image processing in general. I have attached the code I used for my experiments here.

It would be very interesting to have a look and investigate if simple image filtering algorithms can actually be implemented on an image sensor directly in the analog domain, or even further, why not on a pixel level.