Introduction to Image Processing — Part 1: Fundamentals

Jan 31, 2021

Image processing includes methods used to transform and enhance images depending on the problem to be solved. It is part of a much broader field, computer vision, which deals with processing and analyzing image and video data to mimic the human visual system. We can use image processing to manipulate a single image for a specific use, or as part of data preprocessing in preparation for machine learning (ML) model training.

This series of image processing posts will show the cool stuff we can do with our images. For the vloggers and avid social media users out there, you will get an idea of how Instagram and Snapchat filters are implemented. For the image editors, you will learn the math behind some of the Photoshop functions you commonly use.

Representation of Digital Images

For part 1 of this series, we will discuss some of the fundamentals of image processing, starting with how an image is represented digitally. A digital image is represented using pixels: a grid of small cells, each holding a value. As seen in Figure 1, the pear-shaped blob is represented using different numbers of pixels; notice how its appearance changes as we use more or fewer pixels.
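To make the idea behind Figure 1 concrete, here is a minimal Python sketch that draws the same shape on grids of different sizes. It assumes a circle as a stand-in for the pear-shaped blob, and the function name `rasterize_circle` is hypothetical: a pixel is switched on if its center falls inside the shape.

```python
def rasterize_circle(n, radius=0.4):
    """Return an n x n grid of 0/1 values: 1 where the pixel center lies inside the circle.

    The circle is centered in a unit square; radius is a fraction of the image side.
    """
    grid = []
    for row in range(n):
        y = (row + 0.5) / n  # y coordinate of this pixel's center, in [0, 1]
        line = []
        for col in range(n):
            x = (col + 0.5) / n  # x coordinate of this pixel's center, in [0, 1]
            inside = (x - 0.5) ** 2 + (y - 0.5) ** 2 <= radius ** 2
            line.append(1 if inside else 0)
        grid.append(line)
    return grid


# Draw the same shape with 5x5, 10x10 and 20x20 pixels.
for n in (5, 10, 20):
    print(f"N={n}")
    for line in rasterize_circle(n):
        print("".join("#" if v else "." for v in line))
```

At N=5 the circle is only a blocky cluster of cells, while at N=20 its outline becomes much smoother, which is the same effect Figure 1 illustrates.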

Figure 1. Representing image using pixels

Sampling

Sampling, in the context of image processing, involves taking the value of the image at regular spatial intervals. In layman’s terms, it answers the question, “How many cells (pixels) do we want to use to represent our image?” At this point, let me introduce Lenna, since we will be seeing her a lot.

Different sample sizes were used to represent the image of Lenna in Figure 2, where N is the number of pixels per side of the image (e.g., N=55 means 55x55 pixels). Notice that the more pixels we use, the higher the resolution and the more detail we capture.

It is not always the case, however, that a higher number is better. Comparing N=110 and N=220, for example, there is not much visible difference between the two pictures, yet doubling N quadruples the number of pixels (110x110 = 12,100 versus 220x220 = 48,400). In this case, if we are using the image as input to a machine learning model, we are better off using N=110 to conserve resources.
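Sampling can be sketched in plain Python. This minimal example assumes the image is a 2-D list of grayscale values and uses nearest-neighbour sampling (keeping one existing pixel per output cell). The helper name `sample` and the synthetic gradient image are illustrative stand-ins, not the method used to produce Figure 2.

```python
def sample(image, n):
    """Nearest-neighbour sampling: keep an n x n grid of pixels taken at regular intervals."""
    h, w = len(image), len(image[0])
    return [[image[(i * h) // n][(j * w) // n] for j in range(n)] for i in range(n)]


# Hypothetical 220 x 220 grayscale image (a diagonal gradient) standing in for Lenna.
full = [[(r + c) % 256 for c in range(220)] for r in range(220)]

for n in (55, 110, 220):
    small = sample(full, n)
    print(f"N={n}: {len(small)}x{len(small[0])} = {n * n:,} pixels")
```

Each halving of N cuts the pixel count to a quarter, which is why N=110 is so much cheaper to store and process than N=220.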

If you find Lenna pretty, you can get to know more about her in [1].

Quantization

Quantization involves discretizing the intensity values of the analog image. It determines the amount of information stored per pixel, that is, how precisely each pixel's intensity is represented. Let’s take Lenna again as an example.

As shown in Figure 3, k represents the bit depth: the number of bits used to store each pixel's intensity. In simple terms, it sets the number of distinct colors or intensity levels (2^k) with which we can represent the image, so k=1 gives 2 levels while k=8 gives 256. It is evident that the more levels we use, the better the representation of our image will be.
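Here is a minimal sketch of quantizing an 8-bit intensity down to k bits, assuming values from 0 to 255 and 1 ≤ k ≤ 8. The function `quantize` is hypothetical: it drops each value into one of 2^k equal-width bins, then stretches the bin index back to the 0-255 range so the result can still be displayed.

```python
def quantize(value, k):
    """Reduce an 8-bit intensity (0-255) to one of 2**k levels, rescaled to 0-255."""
    if not 1 <= k <= 8:
        raise ValueError("k must be between 1 and 8")
    levels = 2 ** k
    step = 256 // levels   # width of each intensity bin
    index = value // step  # which of the 2**k bins the value falls into
    return index * 255 // (levels - 1)


ramp = range(256)  # every possible 8-bit intensity
for k in (1, 2, 4, 8):
    distinct = sorted({quantize(v, k) for v in ramp})
    print(f"k={k}: {len(distinct)} levels, first few: {distinct[:4]}")
```

With k=1 every pixel becomes either black or white, while k=8 leaves the values unchanged.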

Just like with sampling, higher does not always mean better; it depends on the context or the problem we are trying to solve. For example, if we only have a black-and-white image (technically just 1s and 0s), one bit per pixel is enough. We don’t need the 8 bits per channel used in RGB images, because the extra bits would only waste resources.
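The resource argument is easy to quantify. Assuming uncompressed storage with no file headers (an idealization, since real image formats add overhead and often compress), this sketch computes how many bytes a 220x220 image needs at different bit depths. The helper `storage_bytes` is a hypothetical name.

```python
def storage_bytes(n, bits_per_pixel):
    """Bytes needed to store an n x n image uncompressed, rounded up to whole bytes."""
    return (n * n * bits_per_pixel + 7) // 8


for label, bits in (("binary", 1), ("grayscale", 8), ("RGB", 24)):
    print(f"{label:>9}: {storage_bytes(220, bits):,} bytes")
```

This gives 6,050 bytes for binary, 48,400 for grayscale, and 145,200 for RGB, so storing a binary image at 8 bits per pixel would use eight times the space it actually needs.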

Image Types

Going back to quantization, we can represent images with different ranges of values per pixel. Figure 4 shows the three image types, with respect to pixel value, that we can use to represent Lenna. Binary images, as mentioned earlier, are represented using 1s and 0s. Grayscale images, on the other hand, use intensity values from 0 to 255 (8 bits per pixel). Lastly, colored (colormap) images in this context also use values from 0 to 255, but across three color channels: Red, Green, and Blue, more commonly known as RGB (see Figure 5).
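The three image types can be related with a short sketch that converts a colored image to grayscale and then to binary. It assumes the common ITU-R BT.601 luma weights (0.299, 0.587, 0.114) for the RGB-to-grayscale step and an arbitrary threshold of 128 for the grayscale-to-binary step; both are illustrative choices, not necessarily how Figure 4 was produced.

```python
def to_grayscale(pixel):
    """Convert an (R, G, B) pixel (0-255 each) to one 0-255 intensity using BT.601 weights."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def to_binary(gray, threshold=128):
    """Map a 0-255 intensity to 1 (white) or 0 (black); the threshold is an assumption."""
    return 1 if gray >= threshold else 0


# A hypothetical 2 x 2 colored image.
rgb_image = [
    [(255, 255, 255), (255, 0, 0)],
    [(0, 0, 0), (200, 200, 50)],
]
gray_image = [[to_grayscale(px) for px in row] for row in rgb_image]
binary_image = [[to_binary(v) for v in row] for row in gray_image]
print(gray_image)
print(binary_image)
```

Each step throws information away: three channels collapse into one intensity, and then 256 intensity levels collapse into two.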

In the next parts of this series, we will see when and why we need to use a particular image type.

Color Spaces

As mentioned in the Image Types section, colored images can be represented using the RGB color channels. Since the three channels are stored separately, we can extract each one and display it as its own image, as illustrated in Figure 6.
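Assuming an RGB image stored as a 2-D list of (R, G, B) tuples, extracting the channels is just a matter of picking one component from every pixel. The function name `split_channels` and the tiny example image are hypothetical.

```python
def split_channels(image):
    """Return the R, G and B channels as three separate 2-D grids."""
    return tuple(
        [[pixel[c] for pixel in row] for row in image]
        for c in range(3)  # 0 = red, 1 = green, 2 = blue
    )


# A hypothetical 2 x 2 RGB image: red, green / blue, yellow.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 0)],
]
red, green, blue = split_channels(image)
print("R:", red)
print("G:", green)
print("B:", blue)
```

Each channel is itself a grayscale image, bright where that color is strong and dark where it is weak. Note that the yellow pixel shows up in both the red and green channels.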

Representing images using different color channels is useful, particularly in segmentation tasks. For example, suppose we want to isolate apples and bananas from a picture of fruits. We could isolate these elements by determining the RGB values for red (apples) and yellow (bananas).
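A minimal sketch of RGB-based segmentation follows. The cut-off values in `is_red` and `is_yellow` are hypothetical, hand-picked for illustration, and would need tuning on real photos. The output is a mask with one label per pixel.

```python
def is_red(pixel):
    r, g, b = pixel
    return r > 150 and g < 100 and b < 100  # strong red, weak green and blue


def is_yellow(pixel):
    r, g, b = pixel
    return r > 150 and g > 150 and b < 100  # yellow = strong red + strong green


def segment(image):
    """Label each pixel as 'apple' (red), 'banana' (yellow) or 'background'."""
    def label(pixel):
        if is_red(pixel):
            return "apple"
        if is_yellow(pixel):
            return "banana"
        return "background"
    return [[label(px) for px in row] for row in image]


# A hypothetical 2 x 2 "fruit" image.
fruit = [
    [(210, 30, 40), (240, 220, 60)],
    [(90, 160, 70), (200, 20, 20)],
]
for row in segment(fruit):
    print(row)
```

Because the rule only looks at raw RGB values, any two red objects receive the same label, which is exactly the limitation raised next.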

But what if we want to segment an apple from a picture of apples and strawberries? This could be a bit tricky, since apples and strawberries are generally both red. This is where the HSV color space comes in handy. HSV stands for Hue, Saturation, and Value, where [2]:

Hue: dominant wavelength or the color itself (e.g., red)

Saturation: brilliance and intensity of a color

Value: lightness or darkness of a color
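Python's standard-library `colorsys` module can convert RGB to HSV. It works on floats between 0 and 1 and returns hue as a fraction of a full turn, so this sketch rescales inputs from 0-255 and reports hue in degrees. The helper name and the three example colors are hypothetical.

```python
import colorsys


def rgb_to_hsv_degrees(r, g, b):
    """Convert 0-255 RGB to (hue in degrees, saturation 0-1, value 0-1)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return h * 360, s, v


for name, rgb in (("pure red", (255, 0, 0)),
                  ("dark red", (128, 0, 0)),
                  ("pale red", (255, 128, 128))):
    h, s, v = rgb_to_hsv_degrees(*rgb)
    print(f"{name:>8}: H={h:5.1f}  S={s:.2f}  V={v:.2f}")
```

All three colors share a hue of 0 degrees; only saturation and value change, which is the extra information HSV exposes compared with looking at RGB values directly.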

Figure 7 shows the picture of Lenna in HSV space.

So even though apples and strawberries are both red, they may have different levels of saturation and value, which we can use to separate them from one another.
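The idea can be sketched as a rule on hue, saturation, and value. Everything specific here is an assumption for illustration: the hue window treated as red, the saturation and value cut-offs, and the sample pixels are hypothetical, and which group corresponds to apples or strawberries would have to be measured from actual photos. Note that red hue wraps around, sitting near both 0 and 360 degrees.

```python
import colorsys


def to_hsv(pixel):
    """0-255 (R, G, B) -> (hue in degrees, saturation 0-1, value 0-1)."""
    h, s, v = colorsys.rgb_to_hsv(*(c / 255 for c in pixel))
    return h * 360, s, v


def classify(pixel, hue_window=20, s_min=0.75, v_min=0.6):
    """Split red pixels into two groups by saturation and value (thresholds are hypothetical)."""
    h, s, v = to_hsv(pixel)
    if not (h <= hue_window or h >= 360 - hue_window):  # red wraps around 0/360 degrees
        return "not red"
    if s >= s_min and v >= v_min:
        return "bright red"
    return "dull red"


for px in [(220, 20, 40), (150, 40, 40), (0, 200, 0)]:
    print(px, classify(px))
```

Both red pixels pass the hue test, so an RGB or hue-only rule would lump them together; the saturation and value thresholds are what split them into two groups.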

Image segmentation will be further discussed in the next parts of this series.

References

[1] “Lenna,” Wikipedia, [Online]. Available: https://en.wikipedia.org/wiki/Lenna.

[2] “Hue, Value, Saturation,” Learn, [Online]. Available: http://learn.leighcotnoir.com/artspeak/elements-color/hue-value-saturation/.