
Clustering Density

Date post: 04-Jun-2018
Upload: guilherme-andrade
Transcript
  • 8/14/2019 Clustering Density

    1/16

    Clustering

    Lecture 4: Density-based Methods

    Jing Gao

    SUNY Buffalo

    1


    Outline

    Basics

    Motivation, definition, evaluation

    Methods

    Partitional

    Hierarchical

    Density-based

    Mixture model

    Spectral methods

    Advanced topics

    Clustering ensemble

    Clustering in MapReduce

    Semi-supervised clustering, subspace clustering, co-clustering, etc.


    Density-based Clustering

    Basic idea: clusters are dense regions in the data space, separated by regions of lower object density

    A cluster is defined as a maximal set of density-connected points

    Discovers clusters of arbitrary shape

    Method

    DBSCAN


    Density Definition

    ε-Neighborhood: objects within a radius of ε from an object

    High density: the ε-neighborhood of an object contains at least MinPts objects

    [Figure: the ε-neighborhoods of p and q. With MinPts = 4, the density of p is high and the density of q is low.]

    $N_{\varepsilon}(p) = \{\, q \mid d(p, q) \le \varepsilon \,\}$
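    A minimal Python sketch of the ε-neighborhood and the high-density test. It assumes points are coordinate tuples and that d is Euclidean distance (the slide leaves d generic); the function names and sample data are hypothetical.

```python
import math


def eps_neighborhood(points, p, eps):
    """Return N_eps(p) = {q in points | d(p, q) <= eps}, with d assumed Euclidean."""
    return [q for q in points if math.dist(p, q) <= eps]


def is_high_density(points, p, eps, min_pts):
    """High density: the eps-neighborhood of p holds at least min_pts objects.

    If p is itself in `points`, it counts toward its own neighborhood,
    since d(p, p) = 0 <= eps.
    """
    return len(eps_neighborhood(points, p, eps)) >= min_pts


# Hypothetical data: a tight group around (0, 0) and a sparse pair near (3, 3).
points = [(0, 0), (0.5, 0), (0, 0.5), (0.4, 0.4), (3, 3), (3.5, 3)]
print(is_high_density(points, (0, 0), eps=1.0, min_pts=4))  # True
print(is_high_density(points, (3, 3), eps=1.0, min_pts=4))  # False
```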


    Core, Border & Outlier

    Given ε and MinPts, categorize the objects into three exclusive groups.

    [Figure: ε = 1 unit, MinPts = 5; points labeled core, border, and outlier.]

    A point is a core point if it has at least a specified number of points (MinPts) within Eps (ε). These are points in the interior of a cluster.

    A border point has fewer than MinPts points within Eps, but is in the neighborhood of a core point.

    A noise point (outlier) is any point that is neither a core point nor a border point.
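    The three groups follow directly from the definitions. The sketch below assumes Euclidean distance and counts a point in its own ε-neighborhood (as the set definition of N_ε(p) implies, since d(p, p) = 0); `classify_points` and the sample data are hypothetical.

```python
import math


def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise' (outlier)."""
    # Neighbor index lists; each point is in its own neighborhood (d = 0 <= eps).
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    is_core = [len(nbrs) >= min_pts for nbrs in neighbors]
    labels = []
    for i, nbrs in enumerate(neighbors):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in nbrs):
            labels.append("border")  # not core, but within eps of a core point
        else:
            labels.append("noise")
    return labels


# Hypothetical data, using the slide's parameters eps = 1, MinPts = 5.
points = [(0, 0), (0.9, 0), (-0.9, 0), (0, 0.9), (0, -0.9), (1.8, 0), (5, 5)]
print(classify_points(points, eps=1.0, min_pts=5))
# ['core', 'border', 'border', 'border', 'border', 'noise', 'noise']
```

    Note that (1.8, 0) lies within ε of the border point (0.9, 0) but of no core point, so it is noise rather than border.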


    Example

    [Figure: the original points, and the same points labeled by type (core, border, and outliers) for ε = 10, MinPts = 4.]


    Density-reachability

    Directly density-reachable

    An object q is directly density-reachable from object p if p is a core object and q is in p's ε-neighborhood.

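    [Figure (MinPts = 4): q is directly density-reachable from p, but p is not directly density-reachable from q.]

    Direct density-reachability is asymmetric: q lies in the ε-neighborhood of the core object p, but q itself is not a core object, so nothing is directly density-reachable from q.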
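    A small Python check of the asymmetry, assuming Euclidean distance and counting each point in its own ε-neighborhood. The helpers `is_core` and `directly_density_reachable` and the data are hypothetical; the data is arranged so that p is a core object and q is not.

```python
import math


def is_core(points, p, eps, min_pts):
    """p is a core object if its eps-neighborhood (p included) has >= min_pts points."""
    return sum(1 for x in points if math.dist(p, x) <= eps) >= min_pts


def directly_density_reachable(points, q, p, eps, min_pts):
    """True if q is directly density-reachable from p:
    p is a core object and q lies in p's eps-neighborhood."""
    return is_core(points, p, eps, min_pts) and math.dist(p, q) <= eps


# Hypothetical data: p has dense support on its left; q sits alone on its right.
p, q = (0, 0), (0.8, 0)
points = [p, (-0.3, 0), (-0.5, 0), (-0.7, 0), q]
print(directly_density_reachable(points, q, p, eps=1.0, min_pts=4))  # True
print(directly_density_reachable(points, p, q, eps=1.0, min_pts=4))  # False
```

    Because d is symmetric, p is also inside q's ε-neighborhood; the relation fails in the reverse direction only because q is not a core object.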


    DBSCAN Algorithm: Example

    Parameters: ε = 2 cm, MinPts = 3

    for each o ∈ D do
        if o is not yet classified then
            if o is a core-object then
                collect all objects density-reachable from o
                and assign them to a new cluster
            else
                assign o to NOISE
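    One way to turn the pseudocode into runnable Python, offered as a sketch rather than a reference implementation. Assumptions: Euclidean distance, brute-force O(n²) neighborhood queries, integer cluster ids, and -1 for NOISE; all names and the sample data are hypothetical. The expansion step uses a queue to collect every object density-reachable from the starting core object. A point first labeled NOISE can still be absorbed later as a border point of a cluster, a case the pseudocode leaves implicit.

```python
import math
from collections import deque

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points in the eps-neighborhood of points[i] (itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    labels = [None] * len(points)  # None means "not yet classified"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region_query(points, i, eps)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        # points[i] is a core object: start a new cluster and collect
        # every object density-reachable from it.
        labels[i] = cluster_id
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # non-core point reached: border point
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster_id
            neighbors = region_query(points, j, eps)
            if len(neighbors) >= min_pts:
                queue.extend(neighbors)  # j is core too: keep expanding
        cluster_id += 1
    return labels


# Hypothetical 2-D points (units read as cm, matching eps = 2 cm).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=2.0, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```

    The breadth-first queue avoids deep recursion on large clusters; in practice a spatial index would usually replace the brute-force `region_query`.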



    DBSCAN: Sensitive to Parameters
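    As an illustration of this sensitivity, the toy run below uses a compact, hypothetical `dbscan` function (a brute-force sketch of the algorithm above, assuming Euclidean distance and -1 for noise) on six made-up points forming two groups on a line, with MinPts = 3 and three values of ε. Too small a radius labels every point as noise, while too large a radius merges the two groups into one cluster.

```python
import math
from collections import deque

NOISE = -1


def dbscan(points, eps, min_pts):
    """Brute-force DBSCAN sketch; returns one label per point, -1 for noise."""
    def region(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels, cid = [None] * len(points), 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cid
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cid  # border point
            if labels[j] is not None:
                continue
            labels[j] = cid
            nbrs = region(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)
        cid += 1
    return labels


# Hypothetical data: two groups of three points, 2 units apart at the gap.
points = [(0, 0), (1, 0), (2, 0), (4, 0), (5, 0), (6, 0)]
for eps in (0.5, 1.0, 2.0):
    labels = dbscan(points, eps, min_pts=3)
    n_clusters = len(set(labels) - {NOISE})
    print(eps, labels, n_clusters)
# 0.5 [-1, -1, -1, -1, -1, -1] 0
# 1.0 [0, 0, 0, 1, 1, 1] 2
# 2.0 [0, 0, 0, 0, 0, 0] 1
```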


    DBSCAN: Determining EPS and MinPts

    Idea: for points in a cluster, their k-th nearest neighbors are at roughly the same distance

    Noise points have their k-th nearest neighbor at a farther distance

    So, plot the sorted distance of every point to its k-th nearest neighbor
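    A minimal sketch of the sorted k-distance values behind that plot, assuming Euclidean distance and not counting a point as its own neighbor; `sorted_k_distances` and the data are hypothetical. A common heuristic is to take k = MinPts and choose ε near the bend ("knee") where the sorted curve rises sharply; the points beyond the bend are treated as noise.

```python
import math


def sorted_k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor, sorted ascending.

    Assumes Euclidean distance and len(points) > k; a point is not counted
    as its own neighbor.
    """
    k_dists = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        k_dists.append(dists[k - 1])
    return sorted(k_dists)


# Hypothetical data: a unit square of four points plus one far-away point.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
print(sorted_k_distances(points, k=2))
# The first four values are 1.0; the last (the far point) is about 13.45.
```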


    When DBSCAN Works Well

    [Figure: the original points and the clusters found by DBSCAN.]

    Resistant to Noise

    Can handle clusters of different shapes and sizes
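    A toy check of both claims, using a compact, hypothetical `dbscan` function (a brute-force sketch of the algorithm above, assuming Euclidean distance and -1 for noise). The synthetic data is two concentric rings, which are non-convex and differ in size, plus one isolated point. With ε = 0.5 and MinPts = 3 (values chosen for this made-up data only), each ring becomes its own cluster and the isolated point is labeled noise.

```python
import math
from collections import deque

NOISE = -1


def dbscan(points, eps, min_pts):
    """Brute-force DBSCAN sketch; returns one label per point, -1 for noise."""
    def region(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels, cid = [None] * len(points), 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cid
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cid  # border point
            if labels[j] is not None:
                continue
            labels[j] = cid
            nbrs = region(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)
        cid += 1
    return labels


def ring(radius, n):
    """n evenly spaced points on a circle of the given radius (hypothetical helper)."""
    return [(radius * math.cos(2 * math.pi * t / n),
             radius * math.sin(2 * math.pi * t / n)) for t in range(n)]


points = ring(1.0, 30) + ring(3.0, 60) + [(10.0, 10.0)]
labels = dbscan(points, eps=0.5, min_pts=3)
print(labels[:30] == [0] * 30, labels[30:90] == [1] * 60, labels[90])  # True True -1
```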


    Take-away Message

    The basic idea of density-based clustering

    The two important parameters and the definitions of neighborhood and density in DBSCAN

    Core, border and outlier points

    DBSCAN algorithm

    DBSCAN's pros and cons
