Did you ever run a resize algorithm against an image and wonder "how is that done"?. Do you have a piece of hardware or code that could benefit from a basic resizer? Do you lack the basic knowledge of resizing algorithms that is assumed by most of the "scholarly" papers about resizing? Does matrix math give you a headache? If you answered 'yes' to any of those questions, you are in the right place.
First of all, lets get a few things out of the way:
- I tend to use the terms 'resize','rescale' and 'resample' interchangeably. Only one of these is truly correct. Which one is correct depends on who you are talking to at the time. IMHO, resampling is probably most correct because it is more descriptive of what is happening. However, I tend to switch around based on mood.
- I'm not going to go into too much detail about the formulas except where necessary. I'll also stick with the extremely simple stuff (like how to calculate a scaled distance) and skip difficult stuff (like Lanczos curve calculations) There are many scholarly papers on this subject available on the web. You really don't need to understand the mathematics at that level to understand the concepts.
- I'm going to assume the simplest possible rescaling environment, which is a single-plane greyscale image. All of these examples can be easily adapted to RGB by replicating the algorithm on each of the color planes (R, G, and B). They can also be adapted to YUV by performing a YUV -> RGB color space conversion. I've also adapted them to run directly on YUV data, but that is a much more difficult problem involving offsets and dynamic range differences. I won't go into any of that here.
- I'm going to assume two things about the you. The first is that you understand the concept of a 2 dimensional image and how it is stored in the system you are using. I'm also going to assume that, given the calculations for a single output pixel, you can figure out how to iterate through the rest of them.
Before any discussion about resizing, lets define the basic problem. Resizing is taking an image of one size and changing it to another size.
Thank you Captain Obvious. Everyone knows that part, but it doesn't tell us much. A much more precise way to state this is: Resizing is taking an image of one size and changing it to another size by recalculating each output pixel value based on it's position in the input image and the value of it's neighbors in the input image.
I like that one better because it describes exactly what is happening.
Our improved definition has a few implications associated with it:
The Basics: Position and Distance Calculations.
Definition: Scale Factor
- We will need to be able to determine the output pixel's position in the input image (obvious).
- We will need to be able to determine the values of the neighboring input pixels (also obvious).
- We will need to determine the output pixel's distance from it's neighbors (less obvious).
- the size of the output image dimension divided by the size of the input image dimension. If you are scaling the image down to 2/3 of its original size in both the x and y dimensions, then your scale factor is 2/3 (or 0.667). I used the word 'dimension' in the definition because the scale factors for width and height aren't always the same. If you wish to maintain the original aspect ratio, then your scale factors for x and y must be the same. If you wish to stretch or compress one dimension more than another, then they will be different.
The first thing necessary in every resampling algorithm is to find the x,y output position of the pixel in the original image. This is a relatively simple operation consisiting of dividing the output position by the scale factor.
For example: if your scale factor is 3/4 in each direction, and you would like to find the input position for the 5,16 output pixel, you would get:
(5 / .75), (16 / .75)
As you can see, the calculations don't end up with nice round integers. This is always a problem with resampling, and the main reason why it is so complex.
The next thing necessary is to determine the closest pixels and their values. This is even easier than the position calculation and only requires us to round the numbers for each dimension up and down, giving us 4 coordinate pairs. To use the previous example, we end up with (6,21), (6,22), (7,21), and (7,22). From this point, determining the pixel values is generally done by looking them up in the image data.
The next step is determining pixel distances. A mathematician would now use a little bit of trigonometry to determine the exact distance from the center of the output pixel to the center of each input pixel... but we'll pretend we aren't mathematicians. The truth is that the actual distance is far less important than the x and y distances of each pixel. Most complex algorithms are implemented in two passes, one for x and one for y. They cannot use actual distance.
So... after all that, the only thing we need to do is know that (6, 21) and (6.667, 21.333) are .667 pixels away in the x dimension, and .333 pixels apart in the y dimension. One thing to note... These are distances, and they cannot be negative. You'll need to use the absolute value of the distance. In the example above, (7,21) is .333 pixels away in the x dimension, and .333 in the y dimension.
Calculating a Pixel Value
Calculating output pixel values varies based on algorithm, and is the heart of every resampler. Once you have a grasp of this operation, the remaining work is a simple matter of iterating through all of the pixels in the image and making sure that they all end up in the right place.
Visiting your Nearest Neighbor
The simplest and fastest algorithm for rescaling is the 'nearest neighbor' algorithm. It requires the smallest number of resources, the fewest calculations, and can be implemented very quickly.
The concept is simple... calculate the output pixel's position in the input image, then find the closest pixel by rounding to the nearest integer in each dimension. The output pixel becomes the value of the pixel that you find there. Done... Simple... Next pixel please.
Example: 3/4 scaling, output pixel (5,16)
Input position: (5/.75, 16/.75) equals (6.667, 21.333)
Rounds to: (7,21)
Output value: Whatever value you find at (7,21) of the input image
Unfortunately, Nearest Neighbor returns the worst image results, especially when downscaling. Unless you happen to be scaling the image up or down by an integer factor (i.e. 1280x960 to 640x480, which is a divide by 2), the output quickly becomes blocky and unusable. In the case of integer scale factors, Nearest Neighbor can easily be replaced by a simple decimation algorithm, so it isn't even useful when it returns good results.
Thinking in the Box
The next simplest algorithm is the Block Averaging, or Box technique. It consists of finding the output pixel's position in the input image, then averaging the 'box' of pixels that are closest to it. In the example we have been using, this is a simple matter of adding up the input pixel values at (6,21), (6,22), (7,21) and (7,22), then dividing by the number of pixels averaged (4).
Though it is better than Nearest Neighbor, the Block Averaging algorithm is still terrible at downscaling. It does, however, introduce a blurring effect that is sometimes desirable when upscaling. I have used it successfully in the past for upscaling (enlarging) video and computer graphics.
The box itself does not have to be a 2x2 block surrounding the pixel. It can be any size that you want. It can even be a non-square region defined as all pixels within a certain distance of the output pixel (of course, you will need to calculate the actual distance of each pixel). The size and shape of the block will determine the extent of the blurring effect.
One of the most popular rescaling algorithms is Linear Interpolation. It is very similar to the box algorithm, except that the values are weighted by their
distance from the output pixel. A single pass Linear Interpolation algorithm is conceptually simple. Each neighboring pixel value is multiplied by a value equal to 1 minus its distance from the output pixel in each dimension. This means that the closer a neighboring pixel is, the more weight its value has in the final calculation.
Recall the previous example where we used a scale factor of 3/4 and found that the position of output pixel (5,16) was (6.667, 21.333). We also found that the nearest pixels were (6,21), (6,22), (7,21) and (7,22). The linear interpolation algorithm applies the formula output += pix_val * x_distance * y_distance
to each of the neighboring pixels.
If we apply our linear interpolation to these pixels, we end up with something like this:
(6,21) * (1 - .333) * (1 - .667) +
(6,22) * (1 - .333) * (1 - .333) +
(7,21) * (1 - .667) * (1 - .667) +
(7,22) * (1 - .667) * (1 - .333)
That's quite a bit of work to do for every pixel. That's 8 subtractions, 8 multiplications and 4 additions for each pixel, and that doesn't include the work done to calculate the position. Of course, Linear Interpolation isn't generally implemented in this way. It's far too inefficient and wasteful. It's only here as an illustration of the concept of the linear weighting of the data.
The first inefficiency is the constant recalculation of weight. Because we are using input values that are one pixel apart, the fractional distances in each direction need to add up to 1. This means that we can simply substitute the (1-distance) with the fractional distance from the other pixel. To make that a little clearer, instead of using (1 - distance(6,21)), we can substitute the distance to (6,22), which will always be equivalent. Weights don't need to be calculated, just swapped in from previous distance calculations.
Up until now, all of the resamplers we have looked at have only been concerned with the pixels in a 2x2 box round the output pixel. 2x2 (more generally called 2 tap) rescalers are the easiest to implement, but suffer from pretty serious losses in the frequency domain. A sudden change from black to white can end up with noticeable artifacts. If you are looking for more clarity or "sharpness" in your image, you will probably need to resort to something more advanced.
Matrix algorithms are very popular for high-end image scaling. They generally consist of a formula broken down into a matrix of calculations performed on at least 16 neighboring pixels. These algorithms tend to be accurate and mathematically elegant, but they consume enormous resources. If you are interested in the math behind it, check out the Wikipedia page on bicubic interpolation
. It will give you a sense of why this doesn't belong in a "common sense" article.
The good news is that most of these very complex calculations can be broken down into a few simple steps. The simplified versions are slightly less perfect (due to occasional rounding errors), but yield equally satisfactory results in most cases. Once they are boiled down to their essence, they can be implemented as an interpolation filter.
An interpolation filter is really just a simple implementation of a fancy FIR filter. The simplicity comes from the fact that all of the calculations from the matrix algorithms (well most of the algorithms... they need to be orthagonal) can be pre-calculated for each distance from the output pixel, then put in a lookup table for use as pixel weights. These pixel weights are then summed up for each input tap (generally 4 to 8) and become the output pixel value.
As a bonus, all of the distance calculations can be slaved off of a simple binary counter containing the scale factor. In the end, a good 4 tap interpolation filter need only perform 8 multiplications and 8 additions. That's even fewer than the inefficient version of the bi-linear algorithm above.
Implementing an interpolation filter in software or hardware is a bit beyond the "basic" nature of this post. Maybe another time?
General references to web sites or pages that would not result in an SEO-friendly link (i.e. www.example.com/index) are allowable. All tags will be automatically stripped