# "Digital Image Processing" learning summary and perception: Chapter 2 Digital Image Fundamentals (5) Math Tools

### I. Introduction

This series of articles records Lao Yuan's insights and summaries from self-studying Gonzalez's "Digital Image Processing", though updates are expected to be slow: daytime is for work, study happens at night, and a summary is written only after a chapter is finished. Readers who want to follow along can download the electronic English original (currently in its fourth edition) and the Chinese translation (currently the third edition). If you want to read the English original side by side with the translation, it is recommended to look for the third edition of the English text as well.

This "Digital Image Processing" deserves its reputation as a classic textbook on digital image processing: broad in scope, detailed in content, and with cases close to practice, which suits Lao Yuan's taste. However, the Chinese translation has two problems:

1. Some translations are not accurate or fluent enough. Where such content appears in Lao Yuan's summaries it is marked in italics, and some key terms are accompanied by the original English words;
2. Many of the image cases in the Chinese translation are much worse than in the original edition, sometimes bad enough to affect understanding of the content. So even if you do not read the original text, it is best to compare the image cases against the original edition.

### II. Knowledge summary: Mathematical Tools Used in Digital Image Processing

This part briefly introduces the mathematical knowledge and tools needed for image processing.

#### 2.1. Array versus Matrix Operations

Digital image pixel values are stored in rows and columns; this storage is called an array (Array). It looks the same as data stored in a matrix (Matrix), and indeed many image operations are performed on a matrix basis, but array-based image operations are sometimes very different from matrix operations.

Unless otherwise specified, the relevant processing is assumed to be an array operation rather than a matrix operation. An array operation applies the operation directly to each element of the array: the addition of two arrays adds the elements at the same positions of the two arrays, and subtraction, multiplication, and division work the same way; exponentiation raises each element of the array to the power. All arithmetic operations on images are array operations.

Taking the multiplication of two 2×2 (4-element) operands as an example, the difference between array multiplication and matrix multiplication is as follows:

Array multiplication: corresponding elements are multiplied one by one, so the (i, j) element of the result is a_ij·b_ij.

Matrix multiplication: the (i, j) element of the result is the dot product of row i of the first operand and column j of the second, i.e. a_i1·b_1j + a_i2·b_2j in the 2×2 case.
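In NumPy terms (NumPy here is just one convenient implementation, not something the book prescribes), the two products can be contrasted directly:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

# Array (element-wise) multiplication: elements at the same positions are multiplied
array_product = a * b        # [[ 5 12] [21 32]]

# Matrix multiplication: rows of a are dotted with columns of b
matrix_product = a @ b       # [[19 22] [43 50]]
```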

#### 2.2. Linear versus Nonlinear Operations

Suppose that the operator H acting on the image f(x, y) is such that:
H[f(x, y)] = g(x, y)
and satisfies:

H[ai·fi(x, y) + aj·fj(x, y)] = ai·H[fi(x, y)] + aj·H[fj(x, y)]

where ai and aj are arbitrary constants and fi and fj are any two images of the same size, then H is called a linear operator.

Nonlinear operations are harder to characterize than linear ones and are used less often, but sometimes their performance is far better than that of linear operations, and certain specific processing tasks require them.
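The distinction can be checked numerically against the definition. A minimal sketch (assuming NumPy; the sample arrays and constants are illustrative): summation satisfies the linearity condition, while taking the maximum does not.

```python
import numpy as np

f1 = np.array([[0, 2], [2, 3]])
f2 = np.array([[6, 5], [4, 7]])
a1, a2 = 1, -1                     # arbitrary constants from the definition

# H = sum: both sides of the linearity condition agree, so sum is linear
lhs_sum = np.sum(a1 * f1 + a2 * f2)
rhs_sum = a1 * np.sum(f1) + a2 * np.sum(f2)

# H = max: the two sides differ, so max is a nonlinear operator
lhs_max = np.max(a1 * f1 + a2 * f2)
rhs_max = a1 * np.max(f1) + a2 * np.max(f2)
```

Here `lhs_sum` equals `rhs_sum` (both −15), while `lhs_max` is −2 but `rhs_max` is −4.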

#### 2.3. Image Arithmetic Operations (Arithmetic Operations)

The arithmetic operations of images are all array operations, and the specific operations are very simple, so I won’t say much here.

Here are some examples of their applications:

• Image addition can be used for noise reduction. This generally applies when the noise at each point is uncorrelated additive noise. For example, in astronomical observation, imaging under very low illuminance causes sensor noise, so a single image cannot be analyzed; noise can be reduced by adding multiple images and then averaging.
• Image subtraction can be used to enhance the difference between images; for example, subtracting the image taken before contrast-agent injection from the angiogram yields the distribution of the contrast agent in the blood vessels.
• Image multiplication can apply a template image (mask) to keep only the region of interest (ROI): the template elements corresponding to the ROI are set to 1 and the other regions to 0, and multiplying the image by the template leaves only the ROI.
• Image multiplication or division can be used for shading correction. Lao Yuan understands shading as caused by uneven illumination or optics; for example, a photograph of white paper is neither white nor uniform in color, and the background carries an unevenly distributed tint. In the figure below, the left is a photo of a tungsten filament placed on a single-color background, the middle is a photo of that background alone (with obvious shading), and the right is the image after shading correction:

After dividing the left image by the middle image and post-processing the quotient, the right image is obtained.
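The noise-reduction-by-averaging idea from the bullet list above can be sketched as follows (assuming NumPy and synthetic Gaussian noise; the image size, gray level, and noise level are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
clean = np.full((64, 64), 100.0)            # idealized noise-free image

# K noisy observations of the same scene: uncorrelated zero-mean additive noise
K = 100
noisy_stack = clean + rng.normal(0.0, 20.0, size=(K, 64, 64))

averaged = noisy_stack.mean(axis=0)         # add the K images and divide by K

single_err = np.abs(noisy_stack[0] - clean).mean()   # error of one frame
avg_err = np.abs(averaged - clean).mean()            # error after averaging
# averaging K frames shrinks the noise standard deviation by about sqrt(K)
```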

Assuming the image resulting from the division is f(x, y), the procedure that guarantees the result finally falls within the range representable by a fixed number of bits is as follows:

```python
fmin = f - f.min()
fmin = fmin + v   # adding v avoids a zero divisor; v is a very small value > 0
                  # chosen by data type (for integer images, v = 1)
fresult = K * (fmin / fmin.max())   # K is the maximum gray value the image can
                                    # represent, e.g. K = 255 for an 8-bit image
```

Lao Yuan believes the above processing is a form of normalization, although the algorithm differs from OpenCV's normalization. For OpenCV normalization, please refer to "OpenCV-Python image multiplication operation cv2.multiply function detailed explanation and pixel value overflow normalization processing".

#### 2.4. Image Set Operations and Logical Operations (Set Operations)

Set operations on images act on one or more images as a whole, and include operations on pixel positions and operations on gray levels.

##### Set operations on pixel positions

Set operations include the union, intersection, complement, and difference of sets. When applied to the pixel positions of an image, they are the ordinary union, intersection, complement, and difference of point sets. Note that the universe of an image is generally defined as the rectangle containing all the pixels of the image, and is generally denoted U.
The difference of two sets A and B equals the intersection of A with Bᶜ (the complement of B), i.e. A − B = A ∩ Bᶜ.

##### Set operations on gray values
• The union of the gray levels of two images takes, at each position, the maximum of the gray levels of A and B at that position as the gray level of the result image there.
• The intersection of the gray levels of two images takes, at each position, the minimum of the gray levels of A and B at that position as the gray level of the result image there.
• The complement is the image obtained by subtracting each pixel's gray value from a constant K, where K = 2ⁿ − 1 and n is the number of bits representing the gray value (the gray-level resolution defined in "Digital Image Processing Learning Perception: Chapter 2 (3) Sampling, Quantization and Interpolation").
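A sketch of these gray-level set operations (assuming NumPy and an 8-bit image, so K = 2⁸ − 1 = 255; the sample values are illustrative):

```python
import numpy as np

A = np.array([[10, 200], [60, 120]], dtype=np.uint8)
B = np.array([[40,  90], [250, 30]], dtype=np.uint8)
K = 2 ** 8 - 1                    # 255 for 8-bit gray values

union = np.maximum(A, B)          # element-wise maximum of the two images
intersection = np.minimum(A, B)   # element-wise minimum of the two images
complement_A = K - A              # subtract each gray value from K
```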

In addition, the concept of fuzzy sets will also be used in the image. I won't expand on this for now.

#### 2.5. Logical Operations on Images (Logical Operations)

Logical operations include AND, OR, NOT, and XOR. The three operations AND, OR, and NOT are functionally complete: any other logical operation can be obtained from these three. A logical operation on two images is performed between corresponding pixel pairs of the two images.
For gray-scale images, the NOT operation corresponds to complementing the gray values.
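A sketch of pixel-wise logical operations on binary images (assuming NumPy boolean masks), including building XOR from the functionally complete set {AND, OR, NOT}:

```python
import numpy as np

A = np.array([[True, True, False, False]])
B = np.array([[True, False, True, False]])

A_and_B = A & B    # pixel-pair AND
A_or_B = A | B     # pixel-pair OR
not_A = ~A         # pixel-wise NOT
# XOR expressed through AND, OR, NOT (functional completeness)
A_xor_B = (A & ~B) | (~A & B)
```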

#### 2.6. Spatial Operations

##### 2.6.1. Definition

Spatial operations refer to operations that act directly on the pixels of a given image, including single-pixel operations, neighborhood operations, and geometric spatial transformations.

• Single-pixel operations: the operation acts directly on the gray value of each pixel of the image, independently of other pixels.
• Neighborhood operations: for each point of the image, the gray value of the output image is determined by a specified operation on the neighboring pixels around that point in the original image.
• Geometric spatial transformations: geometric transformation changes the spatial relationship between pixels in an image. It is also called a rubber-sheet transformation: it can be viewed as printing the image on a rubber sheet and then stretching the sheet according to certain rules to transform the image.

A geometric transformation consists of a spatial transformation of coordinates followed by intensity interpolation, which computes the gray values of the transformed pixels.
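The first two kinds of operation can be sketched as follows (assuming NumPy; the negation constant 255 assumes an 8-bit image, and the 3×3 averaging uses zero padding at the borders as one of several possible border choices):

```python
import numpy as np

def negate(img):
    """Single-pixel operation: each output value depends only on the
    gray value at the same pixel (here, the 8-bit negative 255 - r)."""
    return 255 - img

def neighborhood_average(img):
    """Neighborhood operation: each output value is the mean of the 3x3
    window around the pixel in the original image (zero-padded borders)."""
    padded = np.pad(img.astype(float), 1)
    out = np.zeros(img.shape)
    h, w = img.shape
    for x in range(h):
        for y in range(w):
            out[x, y] = padded[x:x + 3, y:y + 3].mean()
    return out
```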
##### 2.6.2. Spatial operation case: affine transformation

The most commonly used spatial transformation is the affine transform, expressed in row-vector form as:

[x  y  1] = [v  w  1] T

where (v, w) are pixel coordinates in the input image, (x, y) are the transformed coordinates, and T is a 3×3 matrix whose third column is [0 0 1]ᵀ. T is called the affine transformation matrix.

The above formula computes, from each pixel of the input image, the position its gray value maps to in the output image. Done this way, however, multiple input pixels may map to the same output pixel, and some output pixels may never be assigned a value at all. This method of computing output gray values by mapping input pixels forward is called forward mapping.

The other way uses the inverse of the transformation matrix to compute, from each output pixel's position, the corresponding position in the input image, and then determines the output pixel's gray value by interpolation at that input position. This is called inverse mapping. Inverse mapping is more efficient than forward mapping and is used by much commercial software.
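A minimal inverse-mapping sketch (assuming NumPy and the row-vector convention [x y 1] = [v w 1]·T, with nearest-neighbor interpolation; the names `affine_inverse_map` and `T_shift` are illustrative):

```python
import numpy as np

def affine_inverse_map(src, T):
    """Inverse mapping: for each OUTPUT pixel (x, y), find where it came
    from in the input via T^-1 and sample there (nearest neighbor).
    Output pixels that map outside the source image stay 0."""
    h, w = src.shape
    T_inv = np.linalg.inv(T)
    dst = np.zeros_like(src)
    for x in range(h):
        for y in range(w):
            v, wcol, _ = np.array([x, y, 1.0]) @ T_inv
            vi, wi = int(np.round(v)), int(np.round(wcol))
            if 0 <= vi < h and 0 <= wi < w:
                dst[x, y] = src[vi, wi]
    return dst

# Affine matrix for a pure translation: shift 1 row down, 2 columns right
T_shift = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [1.0, 2.0, 1.0]])
```

For serious use, OpenCV's geometric transformation functions offer the same idea with proper interpolation options.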

For more information about affine transformation, please refer to "https://blog.csdn.net/LaoYuanPython/article/details/113832562 OpenCV-Python Image Processing: Detailed Explanation and Cases of Affine Transformation".

##### 2.6.3. Spatial operation case: image registration (Image Registration)

2.6.3.1. Definition

Image registration is used to align two or more images of the same scene. During registration, the input and output images are available, but the transformation function from input to output is unknown. The input image is the image to be transformed, and the output image is the reference image against which the input image is registered.

Note: Lao Yuan believes the "output image" mentioned here is only theoretical. In an actual registration it can only be called a reference image; otherwise, if the output image already existed, why register at all?

Registration is needed because, during imaging, the image suffers geometric distortion due to viewing angle, distance, direction, sensor resolution, target displacement, and other factors. Correcting these distortions requires image registration; after registration, the images can be merged or analyzed quantitatively.

2.6.3.2. Registration method

One of the main methods of image registration uses tie points (also called control points), points whose positions in the input image and in the reference image are both known. Tie points can be selected interactively or automatically. With enough tie points, the transformation function can be derived.

The approximate bilinear transformation determined by 4 tie points is:

x = c1·v + c2·w + c3·v·w + c4
y = c5·v + c6·w + c7·v·w + c8

where v and w are the coordinates of a pixel in the original image, and x and y are the transformed coordinates; the 8 coefficients c1…c8 are determined from the 4 tie points.

If four points cannot calibrate the whole image, the image can be divided into multiple non-overlapping sub-images, each calibrated with its own 4 tie points, so that the entire image is registered piece by piece.

However, when the distortion is severe, it is difficult to achieve a perfect match using tie points, and manually selected tie points generally carry relatively large errors.
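Given 4 tie points, the 8 coefficients of the bilinear model x = c1·v + c2·w + c3·v·w + c4, y = c5·v + c6·w + c7·v·w + c8 can be found by solving two 4×4 linear systems. A sketch (assuming NumPy; the tie-point coordinates are invented purely for illustration):

```python
import numpy as np

# Tie points: (v, w) positions in the input image, (x, y) in the reference image
vw = np.array([[0, 0], [0, 10], [10, 0], [10, 10]], dtype=float)
xy = np.array([[1, 2], [1, 13], [12, 2], [12, 14]], dtype=float)

# One row per tie point for x = c1*v + c2*w + c3*v*w + c4 (same matrix for y)
A = np.column_stack([vw[:, 0], vw[:, 1], vw[:, 0] * vw[:, 1], np.ones(4)])

cx = np.linalg.solve(A, xy[:, 0])   # c1..c4
cy = np.linalg.solve(A, xy[:, 1])   # c5..c8

def map_point(v, w):
    """Apply the fitted bilinear transformation to one input point."""
    basis = np.array([v, w, v * w, 1.0])
    return cx @ basis, cy @ basis
```

By construction, the fitted transformation reproduces the 4 tie points exactly.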

The four images above are numbered a, b, c, d from left to right and top to bottom: a is the reference image, b is the geometrically distorted input image, c is the registered output image, and d is the difference between the registered output image and the reference image.

2.6.3.3. Lao Yuan's understanding:

• Image registration relies on reference points located at the same scene positions in differently distorted images. From the spatial positions of these points in the different images, the transformation function can be derived and then applied to the entire input image, yielding an image that approximates the reference image's imaging conditions (viewing angle, direction, distance, resolution, position, etc.).
• The process is somewhat like solving for the coefficients of a k-th degree equation in n unknowns: before solving, enough known value pairs must be gathered and substituted in so that the coefficients can be computed. The number of tie points and the complexity of the transformation model depend on the degree of distortion of the image to be registered. For example, distortion caused only by translation of the scene may need just a linear model and 1 control point, while distortion involving rotation requires 4 tie points.
• The transformation corresponding to registration is similar to a perspective transformation, except that the transformation matrix is unknown and must be obtained from the tie points (the model fitted from more than 4 tie points is no longer a 3×3 matrix).

#### 2.7. Vector and Matrix Operations

Multispectral image processing is a typical field that uses vector and matrix operations. The pixel of an RGB color image contains 3 components, which can form a three-dimensional column vector.

After pixels are represented as vectors, the processing theories and tools of vectors and matrices can be used. For an introduction to vector knowledge, please refer to "https://blog.csdn.net/LaoYuanPython/article/details/112410587 Artificial Intelligence Mathematical Foundation - Linear Algebra 1: Vector and Vector Addition, Subtraction and Multiplication" and "https://blog.csdn.net/LaoYuanPython/article/details/112411742 Artificial Intelligence Mathematical Foundation - Linear Algebra 2: Vector Dot Product, Inner Product, Quantitative Product, and Outer Product Introduction".

The entire image can be treated as a matrix or vector, such as linear image processing:
`g = Hf + n`
where f represents the input image as an MN×1 vector (the M×N image stacked column by column), n represents the M×N noise pattern likewise as an MN×1 vector, H is a matrix representing a linear process, and g represents the output image, also an MN×1 vector.
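A toy instance of g = Hf + n (assuming NumPy; H here is just a scaling operator, chosen only to keep the sketch easy to check, and the zero noise is likewise illustrative):

```python
import numpy as np

M, N = 4, 4
f = np.arange(M * N, dtype=float).reshape(M * N, 1)   # M x N image stacked as MN x 1

H = 2.0 * np.eye(M * N)        # a linear operator (here: double every gray value)
n = np.zeros((M * N, 1))       # noise pattern as an MN x 1 vector (zero in this sketch)

g = H @ f + n                  # output image, also an MN x 1 vector
image_out = g.reshape(M, N)    # back to M x N form when needed
```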

#### 2.8. Image Transforms

The plane of real coordinates (x, y) on which a digital image f(x, y) is defined is called the spatial domain (spatial domain), and x and y are called spatial variables (spatial variables) or spatial coordinates (spatial coordinates). A transformation that maps pixel coordinates to pixel coordinates is a transformation in the spatial domain.

In some cases, however, an image processing task is best formulated (best formulated) differently: convert the input image into a transform domain (transform domain), perform the specified task there, and then apply the inverse transform (inverse transform) to bring the processed result back to the spatial domain.

For example, a two-dimensional linear transform, denoted T(u, v), has the general form:

T(u, v) = ΣΣ f(x, y)·r(x, y, u, v)    (2.6-30)

with the sums over x = 0…M−1 and y = 0…N−1, where f(x, y) is the input image of M rows and N columns, and r(x, y, u, v) is called the forward transformation kernel (forward transformation kernel). The formula is evaluated for u = 0, 1, …, M−1 and v = 0, 1, …, N−1; u and v are called transform variables (transform variables), and T(u, v) is called the forward transform (forward transform) of f(x, y).

Given T(u, v), f(x, y) can be recovered using the inverse transform:

f(x, y) = ΣΣ T(u, v)·s(x, y, u, v)    (2.6-31)

with the sums over u = 0…M−1 and v = 0…N−1, where s(x, y, u, v) is called the inverse transformation kernel (inverse transformation kernel).

Formulas (2.6-30) and (2.6-31) together are called a transform pair (transform pair). The figure in the book shows the basic steps of performing image processing with the two-dimensional linear transform described above; the corresponding transform domain is the linear transform domain.

If the forward transformation kernel in formula (2.6-30) satisfies `r(x,y,u,v) = r1(x,u)·r2(y,v)`, the kernel is said to be separable. If, in addition, r1 is functionally equal to r2, the kernel is said to be symmetric, and then `r(x,y,u,v) = r1(x,u)·r1(y,v)`. Replacing the forward kernel r with the inverse kernel s, the same definitions apply to the inverse transformation kernel.

Fourier transform is a very common transformation method in image processing, which can transform the image from the spatial domain to the frequency domain to perform some image processing operations. The Fourier transform is separable and symmetric, and the separable and symmetric kernel allows the one-dimensional Fourier transform to be used to calculate the two-dimensional Fourier transform.

The forward transform kernel of the two-dimensional Fourier transform is:

r(x, y, u, v) = e^(−j2π(ux/M + vy/N))

and the inverse transform kernel is:

s(x, y, u, v) = (1/MN)·e^(j2π(ux/M + vy/N))

Substituting them into formulas (2.6-30) and (2.6-31) yields the discrete Fourier transform pair (discrete Fourier transform pair):

T(u, v) = ΣΣ f(x, y)·e^(−j2π(ux/M + vy/N))
f(x, y) = (1/MN)·ΣΣ T(u, v)·e^(j2π(ux/M + vy/N))

with the sums taken over x = 0…M−1, y = 0…N−1 in the first formula and over u = 0…M−1, v = 0…N−1 in the second.

When the forward and inverse kernels of a transform pair are separable and symmetric, and the image f(x, y) is a square image of size M×M, formulas (2.6-30) and (2.6-31) can be expressed in matrix form:

T = A F A    (2.6-38)

where F is an M-order square matrix containing the image elements f(x, y), A is an M-order square matrix with elements a_ij = r1(i, j), and T is the M×M transform result, whose values are T(u, v) for u, v in [0, M−1].

To obtain the inverse transform, pre- and post-multiply formula (2.6-38) by an inverse transformation matrix B:

B T B = B A F A B

If B = A⁻¹, that is, B is the inverse matrix of A, then:

F = B T B

This shows that the image F can be completely recovered from its forward transform.
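The matrix form can be checked numerically against a standard FFT implementation. A sketch (assuming NumPy; since the DFT kernel is separable and symmetric, T = A F A matches the unnormalized 2-D DFT computed by `numpy.fft.fft2`):

```python
import numpy as np

M = 8
rng = np.random.default_rng(1)
F = rng.random((M, M))                       # a random M x M "image"

# a_ij = r1(i, j) = exp(-j*2*pi*i*j/M): the 1-D DFT kernel as a matrix
idx = np.arange(M)
A = np.exp(-2j * np.pi * np.outer(idx, idx) / M)

T = A @ F @ A                                # forward transform in matrix form

B = np.linalg.inv(A)                         # inverse transformation matrix
F_restored = (B @ T @ B).real                # F = B T B recovers the image
```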

##### Transform domain

Lao Yuan understands spatial transformations as those that can be expressed by matrix operations on coordinates (this is unverified, and corrections from experts are very welcome). In some cases, when the input image must be processed by non-spatial operations (such as the Fourier transform), the image first needs to be converted into the corresponding domain. Lao Yuan spent nearly a whole day understanding the transform domain and finally published a separate post: "https://blog.csdn.net/LaoYuanPython/article/details/117389597 Image processing spatial domain, transform domain, time domain and frequency domain: spatial VS transform domain", so no more will be said here.

#### 2.9. Probabilistic Methods

Probability calculations, the mean and variance from probability theory (see "https://blog.csdn.net/LaoYuanPython/article/details/108864527 Artificial Intelligence Mathematical Foundation 4: Deviation, Mean Deviation, Variance, Standard Deviation, Covariance, Pearson Correlation Coefficient"), random variables, the n-th moment, and so on have important applications in image processing, especially in gray-scale processing, so they are not introduced separately here.

### III. Summary and perception

This section briefly introduces the mathematical tools used in image processing, including array and matrix operations, linear and nonlinear operations, arithmetic operations, set and logical operations, spatial operations, vector and matrix operations, image transformation, and probability methods. These mathematical methods will be used more or less in the subsequent image processing. Familiarity with these mathematical methods and tools is very important to understand the methods of image processing.

For more image processing content, please refer to the introductions in the columns "OpenCV-Python Graphics and Image Processing" and "Basics of Image Processing".

For those who lack a Python foundation, you can learn Python from scratch through Lao Yuan's free column "Column: Python Basic Tutorial Directory".

###### Blogging is not easy, please support:

1. The paid column "https://blog.csdn.net/laoyuanpython/category_9607725.html Using PyQt to Develop Graphical Interface Python Applications" introduces basic tutorials on Python-based PyQt graphical interface development; the corresponding article directory is "https://blog.csdn.net/LaoYuanPython/article/details/107580932 Use PyQt to develop a graphical interface Python application column directory";
2. The paid column "https://blog.csdn.net/laoyuanpython/category_10232926.html moviepy audio and video development column" details moviepy's methods for audio/video editing and synthesis and their use in related editing and synthesis scenarios; the corresponding article directory is "https://blog.csdn.net/LaoYuanPython/article/details/107574583 moviepy audio and video development column article directory";
3. The paid column "https://blog.csdn.net/laoyuanpython/category_10581071.html OpenCV-Python Difficult Questions for Beginners" is the companion column to "https://blog.csdn.net/laoyuanpython/category_9979286.html OpenCV-Python graphics and image processing". It integrates the author's personal insights into problems encountered while learning OpenCV-Python graphics and image processing; the material is largely the result of Lao Yuan's repeated research and helps OpenCV-Python beginners understand OpenCV more deeply. The corresponding article directory is "https://blog.csdn.net/LaoYuanPython/article/details/109713407 OpenCV-Python Beginners Difficult Question Collection Column Directory";
4. The paid column "https://blog.csdn.net/laoyuanpython/category_10762553.html Introduction to Python Crawlers" introduces crawler development from the perspective of an Internet front-end developer, covering basic crawler knowledge as well as practical content such as crawling CSDN article information and blogger information, liking articles, and commenting.

The first two columns are suitable for novice readers with some Python foundation but no relevant domain knowledge; the third column should be studied in combination with "https://blog.csdn.net/laoyuanpython/category_9979286.html OpenCV-Python graphics and image processing".

For those who lack a Python foundation, you can learn Python from scratch through Lao Yuan's free column "https://blog.csdn.net/laoyuanpython/category_9831699.html Column: Python basic tutorial directory".

Readers who are interested and willing to support Lao Yuan are welcome to purchase the paid columns.