Connexions

clustering

Module by: saptarshi das

Summary: This module introduces clustering, one of the most popular data mining techniques. It first discusses what clustering is and when it is needed. It then covers general topics such as how to calculate distance or dissimilarity functions and what to do with categorical attributes. From there the discussion splits into the two major approaches to clustering, the hierarchical method and the partitioning method, and the different methods under these two categories are discussed in detail. Working code for these methods will be available shortly on the author's personal website, http://emailsaptarshi.googlepages.com . The module ends with a discussion of real-life problems.

Clustering keywords: distance function, dissimilarity matrix, hierarchical clustering, partitioning clustering method, K-Means, PAM, single linkage, average linkage.

Clustering, as the name suggests, is a technique for forming groups, or clusters, from a set of objects. When and where to use clustering is the biggest question for anyone learning the subject for the first time. Before going further, let's look at a scenario where clustering can be used to reach a solution.

An example: a bank, XYZ, wants to study its credit card customers. The bank wants to examine its customers' payment records, and it is also planning to offer some facilities. It is almost impossible for the bank to study every customer individually, so XYZ's managers want to group the customers and study each group or its representatives.
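To make the bank scenario concrete, here is a minimal sketch of how customers might be grouped with K-Means, one of the partitioning methods named in the keywords. Everything specific in it is an assumption made for illustration: the six customers, the two features (average monthly spend and late payments per year), the choice of two clusters, and the use of the first and fourth customers as starting centers. It is not the author's code.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical customers: (average monthly spend, late payments per year).
customers = [
    (200.0, 0.0), (250.0, 1.0), (220.0, 0.0),    # low spend, rarely late
    (900.0, 5.0), (950.0, 6.0), (1000.0, 4.0),   # high spend, often late
]

def k_means(points, centers, max_iter=100):
    """Assign each point to its nearest center, move each center to the
    mean of its assigned points, and repeat until the centers stop moving."""
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the closest center for every point.
        labels = [
            min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Assumed starting centers: one customer from each apparent group.
labels, centers = k_means(customers, centers=[customers[0], customers[3]])
print(labels)   # cluster index for each customer
print(centers)  # one representative (mean) customer per cluster
```

Each returned center plays the role of a group representative that the managers could study instead of every individual customer. Note that the two features are on very different scales here, so spend dominates the distance; in practice the attributes would usually be rescaled first.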
