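What Is Clustering?

Clustering is an unsupervised machine learning technique that groups data points so that points in the same group (cluster) are more similar to each other than to points in other groups.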

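What Is K-Means Clustering?

K-Means is a centroid-based clustering algorithm: it partitions the data into K clusters and assigns each data point to the cluster whose centroid (the mean of that cluster's points) is nearest.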

Algorithm Steps of K-Means

Step-1: Select the value of K, the number of clusters to be formed.

Step-2: Select K random points to act as the initial centroids.

Step-3: Assign each data point to its nearest centroid, based on its distance from each centroid. This forms the K clusters.

Step-4: Compute a new centroid for each cluster as the mean of the data points assigned to it.

Step-5: Repeat Step-3: reassign each data point to the closest of the new centroids.

Step-6: If any reassignment occurred, go to Step-4; otherwise, go to Step-7.

Step-7: Finish. The clusters have converged.
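The steps above can be sketched in plain Python using only the standard library. This is a minimal illustration of the usual Lloyd-style iteration, assuming the data is a list of numeric tuples; the function name `kmeans` and its `max_iter` and `seed` parameters are hypothetical choices for this example, not part of any library.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster numeric tuples into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step-1 is the caller's choice of k.
    # Step-2: pick k data points at random as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step-3 / Step-5: assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step-6 / Step-7: stop once no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step-4: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


data = [(1.0, 1.0), (1.5, 2.0), (1.2, 1.4), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centroids, labels = kmeans(data, k=2)
print(labels)
print(centroids)
```

With two well-separated groups like these, the three points near (1, 1) and the three near (8.5, 8.5) end up in different clusters, although which cluster gets index 0 depends on the random start. Real libraries usually add smarter seeding such as k-means++ on top of this basic loop.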

Applications of K-Means Clustering:

  • Academic Performance:

Based on their scores, students can be grouped into clusters that correspond to grades such as A, B, or C.

  • Diagnostic Systems:

The medical profession uses K-Means to build smarter medical decision support systems, especially for the treatment of liver ailments.

  • Search Engines:

Clustering is a backbone of search engines: when a search is performed, the results need to be grouped, and search engines very often use clustering to do this.

  • Wireless Sensor Networks:

The clustering algorithm is used to find the cluster heads, each of which collects all the data in its respective cluster.
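To make the academic-performance example concrete, here is a minimal one-dimensional K-Means sketch in Python that turns raw scores into grade groups. The scores, the helper names `kmeans_1d` and `scores_to_grades`, the deterministic quantile start (used instead of random points so the demo is reproducible), and the rule that the cluster with the highest mean score becomes grade A are all illustrative assumptions, not a standard grading scheme.

```python
def kmeans_1d(values, k, max_iter=100):
    """K-Means on plain numbers; returns (centroids, labels)."""
    ordered = sorted(values)
    # Assumption for this sketch: start at k evenly spaced quantiles
    # instead of random points, so the result is reproducible.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assign each score to the nearest centroid.
        new_labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        if new_labels == labels:  # no reassignment: converged
            break
        labels = new_labels
        # Move each centroid to the mean of its scores.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return centroids, labels


def scores_to_grades(scores, grades=("A", "B", "C")):
    """Cluster scores into len(grades) groups; higher-mean clusters get earlier grades."""
    centroids, labels = kmeans_1d(scores, len(grades))
    # Rank cluster ids by centroid, highest mean score first.
    ranked = sorted(range(len(grades)), key=lambda j: centroids[j], reverse=True)
    grade_of = {cluster: grades[pos] for pos, cluster in enumerate(ranked)}
    return [grade_of[lab] for lab in labels]


scores = [92, 88, 95, 75, 72, 78, 55, 60, 58]
print(scores_to_grades(scores))  # ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C']
```

Unlike fixed cut-offs (for example, 90 and above is an A), the grade boundaries here come from where the scores naturally bunch together.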

Use Cases in the Security Domain


To overcome this problem, K-Means clustering is useful: it groups all the data into corresponding clusters before a classifier is applied, which keeps the false alarm rate reasonable. This approach has resulted in high accuracy and good detection rates, though with a moderate false alarm rate on novel attacks.


A clustering-based detection model can use K-Means to detect malware behavior in data based on the features of the malware. Unsupervised clustering techniques play an important role in grouping samples with similar malware characteristics by studying the malware's behavior. The resulting model is able to separate normal and suspicious data into two groups with a high detection rate (more than 90 percent accuracy).


K-Means clustering is an effective way of identifying spam. It works by examining the different sections of each email (header, sender, and content) and grouping the emails based on that data. The resulting groups can then be classified to identify which ones are spam. Including clustering in the classification process improves the accuracy of the filter to 97%.
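The attack-detection and spam-filtering approaches above both group the data with K-Means first and then classify the groups. A minimal "cluster-then-label" sketch of that idea in plain Python follows. It uses the simplest possible classifier: each cluster takes the majority class of its training samples, and a new sample inherits the label of its nearest centroid. The two-number feature vectors (imagined here as a content score and a sender flag), the class names, and the helpers `fit_cluster_labeler` and `predict` are hypothetical; the work described above may use different features and a separate, stronger classifier.

```python
import math
import random
from collections import Counter


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-Means on numeric tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


def fit_cluster_labeler(features, classes, k, seed=0):
    """Cluster the training data, then give each cluster its majority class."""
    centroids, labels = kmeans(features, k, seed=seed)
    majority = {}
    for j in range(k):
        members = [c for c, lab in zip(classes, labels) if lab == j]
        # An empty cluster has no training evidence, so it gets no label.
        majority[j] = Counter(members).most_common(1)[0][0] if members else None
    return centroids, majority


def predict(sample, centroids, majority):
    """Classify a new sample by the majority class of its nearest cluster."""
    nearest = min(range(len(centroids)), key=lambda j: math.dist(sample, centroids[j]))
    return majority[nearest]


# Hypothetical training data: (content score, unknown-sender flag).
train_x = [(0.9, 1.0), (0.8, 1.0), (0.85, 0.9), (0.1, 0.0), (0.05, 0.1), (0.2, 0.0)]
train_y = ["spam", "spam", "spam", "ham", "ham", "ham"]
centroids, majority = fit_cluster_labeler(train_x, train_y, k=2)
print(predict((0.75, 1.0), centroids, majority))  # spam
print(predict((0.1, 0.05), centroids, majority))  # ham
```

Swapping the majority vote for a real classifier trained separately inside each cluster is the natural next step, and is closer to the hybrid scheme the paragraphs describe.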

Analyzing Logs from a Proxy Server and Captive Portal Using the K-Means Clustering Algorithm

1. Operational Framework

1.1. Data Collection

1.2. Data Preprocessing

1.3. Data Transformation

1.4. Pattern Discovery
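The four stages of the framework can be sketched end to end in plain Python. This is an illustration under stated assumptions, not the exact procedure behind the heading above: the log format `<timestamp> <user> <domain> <bytes>`, the sample log lines, the per-user features (request count, distinct domains, kilobytes transferred), the min-max scaling, and all function names are hypothetical.

```python
import math
import random
from collections import defaultdict

# 1.1 Data collection (hypothetical format): "<timestamp> <user> <domain> <bytes>".
RAW_LOGS = [
    "1700000001 alice portal.school.example 1200",
    "1700000005 alice portal.school.example 800",
    "1700000009 bob portal.school.example 1000",
    "1700000012 bob mail.school.example 900",
    "1700000015 carol video.example 500000",
    "1700000018 carol files.example 750000",
    "1700000021 carol stream.example 600000",
    "1700000024 dave video.example 900000",
    "1700000027 dave files.example 650000",
    "1700000030 dave stream.example 700000",
    "corrupted line",
]


def preprocess(lines):
    """1.2 Drop malformed lines and keep (user, domain, bytes)."""
    records = []
    for line in lines:
        parts = line.split()
        if len(parts) != 4 or not parts[3].isdigit():
            continue
        _, user, domain, size = parts
        records.append((user, domain, int(size)))
    return records


def transform(records):
    """1.3 Build one min-max scaled feature vector per user."""
    stats = defaultdict(lambda: [0, set(), 0])  # requests, domains, bytes
    for user, domain, size in records:
        stats[user][0] += 1
        stats[user][1].add(domain)
        stats[user][2] += size
    users = sorted(stats)
    vectors = [(stats[u][0], len(stats[u][1]), stats[u][2] / 1024) for u in users]
    lows = [min(col) for col in zip(*vectors)]
    highs = [max(col) for col in zip(*vectors)]
    scaled = [
        tuple((x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(v, lows, highs))
        for v in vectors
    ]
    return users, scaled


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-Means on numeric tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


def discover_patterns(lines, k=2, seed=0):
    """1.4 Pattern discovery: group users with similar browsing behavior."""
    users, vectors = transform(preprocess(lines))
    _, labels = kmeans(vectors, k, seed=seed)
    groups = defaultdict(list)
    for user, lab in zip(users, labels):
        groups[lab].append(user)
    return dict(groups)


print(discover_patterns(RAW_LOGS))
```

Scaling matters here: without it, the kilobyte column (in the thousands) would swamp the request and domain counts in the distance calculation.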


Example: these steps can be applied to test data from the school-management industry.

Cyber Profiling

Profiling gathers information about an individual or a group of individuals; this information is accumulated, stored, and used for various purposes, for example by monitoring their behavior through their internet activity.

The cyber profiling process can be directed toward:

1. Identifying the users of computers that have been used previously.

2. Mapping the subject's family, social life, work, or network-based organizations, including those for whom he/she worked.

3. Providing information about the user's abilities, level of threat, and vulnerability to threats.

4. Identifying the suspected abuser.

The way in which cyber profiling works:

I hope you like it…

Thank you for reading…

Connect with me on LinkedIn | GitHub


