Clustering images using unsupervised Machine Learning

I have a database of images that contains identity cards, bills and passports.
I want to classify these images into different groups (i.e., identity cards, bills and passports).
From what I have read, one way to do this task is clustering (since it would be unsupervised).
The idea for me is like this: the clustering will be based on the similarity between images (i.e., images that have similar features will be grouped together).
I also know that this process can be done using k-means.
So my problem is about which features to use and how to use images with k-means.
If anyone has done this before or has a clue about it, could you recommend some links to start with or suggest any features that might be helpful?



Solution 1:[1]

Label a few examples, and use classification.

Clustering is just as likely to give you clusters like "images with a bluish tint", "grayscale scans" and "warm color temperature". That is a quite reasonable way to cluster such images, just not the one you want.

Furthermore, k-means is very sensitive to outliers. And you probably have some in there.

Since you want your clusters to correspond to certain human concepts, classification is what you need to use.

Solution 2:[2]

The simplest way to get good results is to break the problem down into two parts:

  1. Getting the features from the images: Using the raw pixels as features will give you poor results. Instead, pass the images through a pre-trained CNN (you can get several of those online). Then use the output of the last CNN layer (just before the fully connected layers) as the image features.
  2. Clustering the features: Once you have these rich features for each image, you can cluster them (e.g. with K-means).

I would recommend doing steps 1 and 2 with the existing implementations in Keras and scikit-learn, respectively.
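As a minimal sketch of step 2 with scikit-learn: the random, well-separated vectors below merely stand in for the CNN features that step 1 would produce (for example, a Keras model with its classification head removed yields one such vector per image); the three "document type" groups are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

# In a real pipeline, step 1 would produce one feature vector per image by
# passing it through a pre-trained CNN with the final classification layer
# removed. Synthetic, well-separated vectors stand in for those features here.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(10, 2048)),   # e.g. "identity cards"
    rng.normal(loc=5.0, scale=0.1, size=(10, 2048)),   # e.g. "bills"
    rng.normal(loc=-5.0, scale=0.1, size=(10, 2048)),  # e.g. "passports"
])

# Step 2: cluster the feature vectors with K-means, choosing k as the
# number of expected document types.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(features)
print(kmeans.labels_)  # one cluster label per image
```

With real CNN features the clusters will be far less clean than this toy data, which is why the labeled-classification caveat from Solution 1 still applies.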

Solution 3:[3]

I have implemented Unsupervised Clustering based on Image Similarity using Agglomerative Hierarchical Clustering.

My use case had images of people, so I extracted a face embedding (i.e. feature) vector from each image. I used dlib for the face embeddings, so each feature vector was 128-dimensional.

In general, a feature vector can be extracted from each image. A pre-trained CNN such as VGG, with its final classification layer removed, can be used for feature extraction.

A dictionary with the IMAGE_FILENAME as KEY and the FEATURE_VECTOR as VALUE can be created for all the images in the folder. This makes the correspondence between each filename and its feature vector easy to track.

Then stack the individual feature vectors of each image in the folder/group to be clustered into a single feature matrix, say X.

In my use case, X had the shape (NUMBER OF IMAGES IN THE FOLDER, 128), where 128 is the size of each feature vector. For instance, the shape of X was (50, 128).

This feature matrix can then be used to fit an Agglomerative Hierarchical Clustering model. The distance threshold parameter needs to be tuned empirically.

Finally, we can write code to identify which IMAGE_FILENAME belongs to which cluster.
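The pipeline above can be sketched with scikit-learn as follows; the filenames and the two synthetic "people" are illustrative (in the real pipeline each value would be a 128-d dlib face embedding), and the distance threshold of 3.0 is an assumed value that would need empirical tuning.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical IMAGE_FILENAME -> FEATURE_VECTOR dictionary. Synthetic 128-d
# vectors for two "people" stand in for real dlib face embeddings here.
rng = np.random.default_rng(42)
embeddings = {f"person1_{i}.jpg": rng.normal(0.0, 0.05, 128) for i in range(3)}
embeddings.update(
    {f"person2_{i}.jpg": rng.normal(1.0, 0.05, 128) for i in range(3)}
)

filenames = list(embeddings)
X = np.vstack([embeddings[f] for f in filenames])  # shape: (n_images, 128)

# n_clusters=None lets the distance threshold decide how many clusters form;
# 3.0 is an assumed threshold and must be tuned for real embeddings.
clusterer = AgglomerativeClustering(
    n_clusters=None, distance_threshold=3.0, linkage="average"
)
labels = clusterer.fit_predict(X)

# Map each IMAGE_FILENAME back to its cluster.
for fname, label in zip(filenames, labels):
    print(f"{fname} -> cluster {label}")
```

Average linkage is used here so the threshold compares plain Euclidean distances between groups; with Ward linkage (the scikit-learn default) the threshold has a different scale and would need to be retuned.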

In my case, there were about 50 images per folder, so this was a manageable solution. This approach was able to group the images of a single person into a single cluster. For example, 15 images of PERSON1 belong to CLUSTER 0, 10 images of PERSON2 belong to CLUSTER 2, and so on…

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Has QUIT--Anony-Mousse
Solution 2 Deepak Saini
Solution 3 Vineet Sharma