Graduation season: segmenting graduation photos with OpenCV and K-Means clustering in Python

@Author: Runsen

Image segmentation is the process of dividing an image into multiple regions (or segments). The goal is to turn the image into a representation that is more meaningful and easier to analyze.

In this blog, we will look at one image segmentation method, namely K-Means clustering.

K-Means clustering is an unsupervised machine learning algorithm that aims to divide N observations into K clusters, where each observation belongs to the cluster with the nearest mean. A cluster is a collection of data points that are grouped together due to some similarity. For image segmentation, the clusters here are different image colors.
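
To make the "nearest mean" idea concrete, here is a minimal NumPy sketch of the two alternating steps (assign, then update). The function name kmeans_sketch and the toy pixel values are made up for illustration; this is not how OpenCV implements it internally, only the general idea.

```python
import numpy as np

def kmeans_sketch(points, k, iterations=10, seed=0):
    """Plain K-Means: assign each point to the nearest center, then recompute the means."""
    rng = np.random.default_rng(seed)
    # Start from k distinct points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(np.float64)
    for _ in range(iterations):
        # Assignment step: distance from every point to every center.
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each center to the mean of its points.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels, centers

# Toy data: three reddish and three bluish "pixels" (RGB).
pixels = np.array([[250, 10, 10], [240, 20, 15], [255, 0, 5],
                   [10, 10, 250], [20, 15, 240], [0, 5, 255]], dtype=np.float64)
labels, centers = kmeans_sketch(pixels, k=2)
print(labels)   # the first three pixels share one label, the last three the other
print(centers)  # one mean color per cluster
```

Which cluster ends up with label 0 depends on the random starting points, so only the grouping is meaningful, not the label numbers.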

To set up the environment, run: pip install opencv-python numpy matplotlib

The picture I chose is one of our school's graduation photos. Don't worry, I am not in it; it is just a beautiful picture I found on the school's public account.

Import the required modules:

import cv2
import numpy as np
import matplotlib.pyplot as plt
# read the image
image = cv2.imread("Graduation.jpg")

OpenCV loads images in BGR channel order, while matplotlib expects RGB, so before proceeding with the segmentation, let us convert the image to RGB format:

image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
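
For a 3-channel image, this conversion amounts to swapping the first and last channels. The NumPy-only sketch below illustrates that on a made-up 1x2 image; it is an illustration of the channel order, not a replacement for cv2.cvtColor().

```python
import numpy as np

# A hypothetical 1x2 image in BGR order: one pure-blue pixel and one pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the B and R channels.
rgb = bgr[..., ::-1]

print(rgb[0, 0].tolist())  # [0, 0, 255]: blue now sits in the last (B) slot of RGB
print(rgb[0, 1].tolist())  # [255, 0, 0]: red now sits in the first (R) slot of RGB
```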

We will use the cv2.kmeans() function, which takes a 2D array as input. Since our image is 3D (height, width, and a depth of 3 RGB values), we need to flatten the height and width into a single axis of pixels, each with 3 RGB values:

# Reshape the image into a 2D array of pixels with 3 color values (RGB)
print(image.shape)  # (853, 1280, 3)
pixel_values = image.reshape((-1, 3))
# Convert to float32, as cv2.kmeans() expects
pixel_values = np.float32(pixel_values)
print(pixel_values.shape)  # (1091840, 3)
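
To see what this reshape does to the pixel order, here is a small sketch on a made-up 2x3 image (the array tiny is hypothetical and only stands in for the real photo):

```python
import numpy as np

# Hypothetical 2x3 "image" with 3 channels; the values 0..17 are only for readability.
tiny = np.arange(18, dtype=np.uint8).reshape((2, 3, 3))

flat = tiny.reshape((-1, 3))  # one row per pixel, one column per channel
print(flat.shape)             # (6, 3)

# Pixels are laid out row by row, so flat[4] is the pixel at row 1, column 1.
print(flat[4].tolist(), tiny[1, 1].tolist())  # [12, 13, 14] [12, 13, 14]

# Reshaping back with the original shape restores the image exactly.
restored = flat.reshape(tiny.shape)
print(np.array_equal(restored, tiny))  # True
```

This is why the segmented result can later be reshaped back with image.shape without scrambling the pixels.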

The K-Means implementation in OpenCV is the function cv2.kmeans(). Its signature is: kmeans(data, K, bestLabels, criteria, attempts, flags)

  • data: the data to cluster, with one column per feature. It should be of type np.float32, which is the type cv2.kmeans() expects.
  • K: the number of clusters. OpenCV's kmeans requires this number to be specified in advance.
  • bestLabels: preset classification labels; pass None if there are none.
  • criteria: the condition for stopping the iteration, given as a tuple of three elements (type, max_iter, epsilon), where max_iter is the maximum number of iterations and epsilon is the required accuracy. The type has three options:
      • cv2.TERM_CRITERIA_EPS: stop when the accuracy (error) reaches epsilon.
      • cv2.TERM_CRITERIA_MAX_ITER: stop when the number of iterations exceeds max_iter.
      • cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER: the two combined; stop as soon as either condition is met.
  • attempts: the number of times to repeat the kmeans algorithm; the best result is returned.
  • flags: how the initial cluster centers are chosen. There are two methods: cv2.KMEANS_PP_CENTERS uses the kmeans++ center initialization, and cv2.KMEANS_RANDOM_CENTERS uses random initialization.

Here, we need to define the stopping criteria. We will stop when a certain number of iterations (for example, 500) is exceeded, or when the clusters move by less than a certain epsilon value (let us choose 0.1 here). The following code defines this stopping criterion in OpenCV:

# Define the stopping criteria
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 500, 0.1)

In the graduation photo, five main colors stand out: the sky, the grass, the trees, the white upper bodies of the people, and the black lower bodies.

Therefore, we will use K=5 for this image:

k = 5
_, labels, centers = cv2.kmeans(pixel_values, k, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)

cv2.KMEANS_RANDOM_CENTERS simply tells OpenCV to choose the initial cluster centers at random.

We converted the flattened pixel values to float32 because cv2.kmeans() expects float32 input. The returned centers are float32 as well, so let us convert them back to 8-bit pixel values with np.uint8(centers) and then replace each pixel with the color of its cluster center:

# Convert the centers back to np.uint8
centers = np.uint8(centers)

# Flatten the labels array
labels = labels.flatten()

# Replace each pixel with the color of its cluster center
segmented_image = centers[labels]
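
Indexing the centers array with the labels array is NumPy's integer-array indexing: each label is replaced by the corresponding row of centers. A tiny sketch with made-up values (toy_centers and toy_labels are hypothetical):

```python
import numpy as np

# Two hypothetical cluster centers (red and green) and four pixel labels.
toy_centers = np.array([[255, 0, 0], [0, 255, 0]], dtype=np.uint8)
toy_labels = np.array([0, 1, 1, 0])

# Each label selects the matching row of toy_centers.
toy_segmented = toy_centers[toy_labels]
print(toy_segmented.shape)     # (4, 3)
print(toy_segmented.tolist())  # [[255, 0, 0], [0, 255, 0], [0, 255, 0], [255, 0, 0]]
```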

Convert back to the original image shape and display:

# Reshape back to the original image dimensions
segmented_image = segmented_image.reshape(image.shape)
plt.imshow(segmented_image)
plt.show()


Of course, we can also disable some of the K-Means clusters in the image. For example, let's disable the cluster with label 1 (the second cluster, since labels start at 0) and display the image:

# Disable cluster No. 2, i.e. label 1 (turn its pixels black)
masked_image = np.copy(segmented_image)
# Reshape into a vector of pixel values
masked_image = masked_image.reshape((-1, 3))
cluster1 = 1
masked_image[labels == cluster1] = [0, 0, 0]
# Reshape back to the original shape
masked_image = masked_image.reshape(image.shape)
plt.imshow(masked_image)
plt.show()


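In this run, cluster No. 2 (label 1) corresponds to the trees. Because the initial centers are chosen at random, the label assigned to each region can differ between runs.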
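
One way to find out which label belongs to which region is to print each cluster's pixel count next to its center color. The sketch below uses made-up stand-ins (toy_labels, toy_centers) for the outputs of cv2.kmeans() on a 2x3 image, and also shows the opposite of the masking above: keeping a single cluster and blacking out the rest.

```python
import numpy as np

# Hypothetical stand-ins for the flattened labels and uint8 centers of a 2x3 image.
toy_labels = np.array([0, 1, 1, 2, 1, 0])
toy_centers = np.array([[135, 206, 235], [34, 139, 34], [20, 20, 20]], dtype=np.uint8)

# Number of pixels assigned to each cluster, printed next to its center color.
counts = np.bincount(toy_labels, minlength=len(toy_centers))
for index, (count, color) in enumerate(zip(counts, toy_centers)):
    print(index, int(count), color.tolist())

# Keep only cluster 1 and paint everything else black.
only_cluster = np.zeros((len(toy_labels), 3), dtype=np.uint8)
only_cluster[toy_labels == 1] = toy_centers[1]
only_cluster = only_cluster.reshape((2, 3, 3))
```

With the real photo, the same idea would apply to labels and centers from the code above, with image.shape used for the final reshape.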

Please note that there are other segmentation techniques, such as Hough transform, contour detection, and current state-of-the-art semantic segmentation.

For further reading, I recommend this article:

[Python image processing] 40. The first detailed explanation of Python image segmentation on the whole network (threshold segmentation, edge segmentation, texture segmentation, watershed algorithm, K-Means segmentation, flood fill segmentation, region positioning)