Introduction to Hierarchical Clustering Analysis

Presentation Transcript

Slide 1

Introduction to Hierarchical Clustering Analysis. Pengyu Hong, 09/16/2005

Slide 2

Background. Cell/Tissue 1 → Data 1, Cell/Tissue 2 → Data 2, …, Cell/Tissue N → Data N. Put similar samples/experiments together.

Slide 3

Background. Clustering is one of the most important unsupervised learning processes: it organizes objects into groups whose members are similar in some way. Clustering finds structure in a collection of unlabeled data. A cluster is a collection of objects that are similar to one another and dissimilar to the objects belonging to other clusters.

Slide 4

Motivation I: Microarray data quality checking. Do replicates cluster together? Do similar conditions, time points, and tissue types cluster together?

Slide 5

Data: Rat Schizophrenia Data (Allen Fienberg and Mayetri Gupta). Two time points: 35 days (PD35) and 60 days (PD60) after birth. Two brain regions: prefrontal cortex (PFC) and nucleus accumbens (NA). Two replicates (samples are from the same set of tissue split into different tubes, so replicates should be in close agreement). dChip was used to normalize the data and obtain model-based expression values, using the full PM/MM model. How to read this clustering result? (Figure labels: Sample IDs, Gene IDs, Clustering results, Heat map, Branch length.) Problem?

Slide 6

Motivation II: Cluster genes → prediction of the functions of unknown genes from known ones.

Slide 7

Functionally significant gene clusters. Two-way clustering: sample clusters and gene clusters.

Slide 8

Motivation II: Cluster genes → prediction of the functions of unknown genes from known ones. Cluster samples → discover clinical characteristics (e.g. survival, marker status) shared by samples.

Slide 9

Bhattacharjee et al. (2001) Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc. Natl. Acad. Sci. USA, Vol. 98, 13790-13795.

Slide 10

Motivation II: Cluster genes → prediction of the functions of unknown genes from known ones. Cluster samples → discover clinical characteristics (e.g. survival, marker status) shared by samples. Promoter analysis of commonly regulated genes.

Slide 11

Promoter analysis of commonly regulated genes. David J. Lockhart & Elizabeth A. Winzeler, Nature, Vol. 405, 15 June 2000, p. 827.

Slide 12

Clustering Algorithms. Start with a collection of n objects, each represented by a p-dimensional feature vector x_i, i = 1, …, n. The goal is to partition these n objects into k clusters so that objects within a cluster are more "similar" than objects between clusters. k is usually unknown. Popular methods: hierarchical, k-means, SOM, mixture models, etc.
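
To make this setup concrete, here is a minimal sketch (array sizes and random data are purely illustrative) of n objects stored as the rows of an n-by-p feature matrix:

```python
import numpy as np

# Illustrative sizes: n objects, each described by a p-dimensional
# feature vector x_i (one row of X).
n, p = 100, 6
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(n, p))  # X[i] is the profile of object i

# Goal: partition the n rows into k clusters; k is usually unknown.
```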

Slide 13

Hierarchical Clustering. Venn diagram of clustered data; dendrogram. From http://www.stat.unc.edu/postscript/papers/marron/Stat321FDA/RimaIzempresentation.ppt

Slide 14

Hierarchical Clustering (Cont.) Multilevel clustering: level 1 has n clusters → level n has one cluster. Agglomerative HC: starts with singletons and merges clusters. Divisive HC: starts with one cluster (all objects) and splits clusters.

Slide 15

Nearest Neighbor Algorithm. The nearest neighbor algorithm is an agglomerative (bottom-up) approach. It starts with n nodes (n is the size of our sample), merges the two most similar nodes at each step, and stops when the desired number of clusters is reached. From http://www.stat.unc.edu/postscript/papers/marron/Stat321FDA/RimaIzempresentation.ppt
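
The procedure described here (repeatedly merging the two nearest nodes until k clusters remain) corresponds to single-linkage agglomerative clustering. A minimal sketch using SciPy, with a toy data matrix standing in for the slide's sample:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data: 8 nodes, so Level 1 has k = 8 and Level 8 has k = 1.
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(8, 2))

# 'single' linkage merges the two nearest clusters at each step,
# matching the nearest-neighbor behavior described above.
Z = linkage(X, method='single')

# Stop at the desired number of clusters, e.g. k = 7 (Level 2).
labels = fcluster(Z, t=7, criterion='maxclust')
print(labels)  # one cluster label (1..7) per node
```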

Slide 16

Nearest Neighbor, Level 2, k = 7 clusters. From http://www.stat.unc.edu/postscript/papers/marron/Stat321FDA/RimaIzempresentation.ppt

Slide 17

Nearest Neighbor, Level 3, k = 6 clusters.

Slide 18

Nearest Neighbor, Level 4, k = 5 clusters.

Slide 19

Nearest Neighbor, Level 5, k = 4 clusters.

Slide 20

Nearest Neighbor, Level 6, k = 3 clusters.

Slide 21

Nearest Neighbor, Level 7, k = 2 clusters.

Slide 22

Nearest Neighbor, Level 8, k = 1 cluster.

Slide 23

Hierarchical Clustering. Keys: similarity and clustering. Calculate the similarity between every possible pair of profiles. The two most similar clusters are grouped together to form a new cluster. Then calculate the similarity between the new cluster and each remaining cluster, and repeat.
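
A from-scratch sketch of this loop (the helper name, Euclidean distance, and average linkage are illustrative choices, not prescribed by the slide):

```python
import numpy as np

def agglomerate(X, k):
    """Merge the two most similar clusters until only k remain.

    Sketch only: O(n^3), Euclidean distance, average linkage.
    """
    clusters = [[i] for i in range(len(X))]  # one singleton per profile
    while len(clusters) > k:
        # 1. Dissimilarity between every possible pair of clusters
        #    (average of all pairwise profile distances).
        best = (0, 1, np.inf)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = np.mean([np.linalg.norm(X[i] - X[j])
                             for i in clusters[a] for j in clusters[b]])
                if d < best[2]:
                    best = (a, b, d)
        # 2. Merge the two most similar clusters into a new cluster;
        #    the next iteration re-scores it against each remaining one.
        a, b, _ = best
        clusters[a] += clusters[b]
        del clusters[b]  # safe: b > a
    return clusters

rng = np.random.default_rng(seed=0)
X = rng.normal(size=(10, 4))
print(agglomerate(X, k=3))  # three lists of row indices
```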

Slide 24

Similarity Measurements: Pearson Correlation. Two profiles (vectors) x and y. −1 ≤ Pearson Correlation ≤ +1.
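
A minimal sketch of computing the Pearson correlation of two toy profiles:

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation of two profiles; always in [-1, +1]."""
    xc, yc = x - x.mean(), y - y.mean()
    return np.sum(xc * yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])  # same trend, different scale
print(pearson(x, y))                # 1.0: identical trend
print(np.corrcoef(x, y)[0, 1])      # NumPy's built-in agrees
```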

Slide 25

Similarity Measurements Pearson Correlation: Trend Similarity

Slide 26

Similarity Measurements Euclidean Distance
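
A corresponding sketch for Euclidean distance on the same toy profiles:

```python
import numpy as np

def euclidean(x, y):
    """Euclidean distance: square root of summed squared differences."""
    return np.sqrt(np.sum((x - y) ** 2))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])
print(euclidean(x, y))  # > 0: sensitive to absolute differences,
                        # even though the Pearson correlation is 1.0
```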

Slide 27

Similarity Measurements: Euclidean Distance: absolute difference.

Slide 28

Similarity Measurements: Cosine Correlation. −1 ≤ Cosine Correlation ≤ +1.
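
A sketch for cosine correlation; the shifted profile previews the "trend + mean" point on the next slide:

```python
import numpy as np

def cosine(x, y):
    """Cosine correlation: cosine of the angle between profiles, in [-1, +1]."""
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])
print(cosine(x, y))          # 1.0: same direction
print(cosine(x, y + 10.0))   # < 1.0: shifting the mean changes cosine,
                             # while Pearson correlation would stay 1.0
```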

Slide 29

Similarity Measurements Cosine Correlation: Trend + Mean Distance

Slide 30

Similarity Measurements

Slide 31

Similarity Measurements Similar?

Slide 32

Clustering. Merge which pair of clusters? C1, C2, C3.

Slide 33

Clustering: Single Linkage. Dissimilarity between two clusters (C1, C2) = minimum dissimilarity between the members of the two clusters. Tends to generate "long chains."

Slide 34

Clustering: Complete Linkage. Dissimilarity between two clusters (C1, C2) = maximum dissimilarity between the members of the two clusters. Tends to generate compact "clumps."

Slide 35

Clustering: Average Linkage. Dissimilarity between two clusters (C1, C2) = averaged distances of all pairs of objects (one from each cluster).

Slide 36

Clustering: Average Group Linkage. Dissimilarity between two clusters (C1, C2) = distance between the two cluster means.
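
A sketch contrasting the four linkage criteria of Slides 33-36 on two toy clusters (coordinates are illustrative; Euclidean distance assumed):

```python
import numpy as np
from scipy.spatial.distance import cdist

# Two toy clusters of 2-D profiles.
C1 = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
C2 = np.array([[4.0, 4.0], [5.0, 5.0]])

D = cdist(C1, C2)  # distances for all pairs (one point from each cluster)

print('single  :', D.min())   # minimum pairwise distance ("long chains")
print('complete:', D.max())   # maximum pairwise distance ("clumps")
print('average :', D.mean())  # mean over all pairs

# Average group linkage: distance between the two cluster means.
print('group   :', np.linalg.norm(C1.mean(axis=0) - C2.mean(axis=0)))
```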

Slide 37

Considerations. Which genes are used to cluster samples? Expression variation, inherent variation, prior knowledge (irrelevant genes), etc.

Slide 38

Take Home Questions. Which clustering method is better? How should the clustering tree be cut to obtain relatively tight clusters of genes or samples?
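
For the second question, one common approach is to cut the dendrogram at a dissimilarity threshold; a minimal SciPy sketch (data and threshold are illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(seed=0)
X = rng.normal(size=(20, 5))  # toy gene/sample profiles

Z = linkage(X, method='average')

# Cut the tree wherever a merge exceeds the dissimilarity threshold;
# lower thresholds yield more, tighter clusters.
labels = fcluster(Z, t=2.0, criterion='distance')
print(labels)
```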
