Introduction to Hierarchical Clustering Analysis. Pengyu Hong. 09/16/2005
Slide 2: Background. Data 1, Data 2, …, Data N. Cell/Tissue 1, Cell/Tissue 2, …, Cell/Tissue N. Put similar samples/entries together.
Slide 3: Background. Clustering is one of the most important unsupervised learning processes: it organizes objects into groups whose members are similar in some way. Clustering finds structure in a collection of unlabeled data. A cluster is a collection of objects that are similar to one another and dissimilar to the objects belonging to other clusters.
Slide 4: Motivation I. Microarray data quality checking: Do replicates cluster together? Do similar conditions, time points, and tissue types cluster together?
Slide 5: Data: Rat Schizophrenia Data (Allen Fienberg and Mayetri Gupta). Two time points: 35 days (PD 35) and 60 days (PD 60) after birth. Two brain regions: prefrontal cortex (PFC) and nucleus accumbens (NA). Two replicates (samples are from the same set of tissue split into different tubes, so replicates should be in close agreement). dChip was used to normalize the data and obtain model-based expression values, using the full PM/MM model. Sample IDs. How to read this clustering result? Gene IDs. Clustering results. Heat map. Branch length. Problem?
Slide 6: Motivation II. Cluster genes: predict the functions of unknown genes from known ones.
Slide 7: Functionally significant gene clusters. Two-way clustering. Sample clusters. Gene clusters.
Slide 8: Motivation II. Cluster genes: predict the functions of unknown genes from known ones. Cluster samples: discover clinical characteristics (e.g. survival, marker status) shared by samples.
Slide 9: Bhattacharjee et al. (2001) Human lung carcinomas mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc. Natl. Acad. Sci. USA, Vol. 98, 13790-13795.
Slide 10: Motivation II. Cluster genes: predict the functions of unknown genes from known ones. Cluster samples: discover clinical characteristics (e.g. survival, marker status) shared by samples. Promoter analysis of commonly regulated genes.
Slide 11: Promoter analysis of commonly regulated genes. David J. Lockhart & Elizabeth A. Winzeler, NATURE | VOL 405 | 15 JUNE 2000, p827.
Slide 12: Clustering Algorithms. Start with a collection of n objects, each represented by a p-dimensional feature vector x_i, i = 1, …, n. The goal is to divide these n objects into k clusters so that objects within a cluster are more "similar" than objects between clusters. k is usually unknown. Popular methods: hierarchical, k-means, SOM, mixture models, etc.
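As a point of reference for the sketches that follow, here is a minimal setup of the problem just described: n objects, each a p-dimensional feature vector. NumPy is assumed to be available, and the matrix values are invented purely for illustration.

```python
# Minimal sketch of the clustering setup: n objects, each a p-dimensional
# feature vector x_i. The toy values are invented for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 4                      # e.g. 6 genes measured under 4 conditions
X = rng.normal(size=(n, p))      # rows are the p-dimensional profiles x_i

print(X.shape)                   # (6, 4): n objects to divide into k clusters
```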
Slide 13: Hierarchical Clustering. Venn diagram of clustered data. Dendrogram. From http://www.stat.unc.edu/postscript/papers/marron/Stat321FDA/RimaIzempresentation.ppt
Slide 14: Hierarchical Clustering (Cont.). Multilevel clustering: level 1 has n clusters; level n has one cluster. Agglomerative HC: starts with singletons and merges clusters. Divisive HC: starts with one cluster containing everything and splits clusters.
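A minimal sketch of agglomerative HC, assuming SciPy is available; the toy profiles are invented, and average linkage with Euclidean distance is only one possible choice (the linkage options are discussed on later slides).

```python
# Sketch: agglomerative hierarchical clustering with SciPy.
# The linkage matrix encodes every level of the hierarchy, from n
# singleton clusters up to a single cluster containing everything.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))      # toy profiles, invented for illustration

Z = linkage(X, method="average", metric="euclidean")
print(Z)                         # each row: one merge (cluster i, cluster j, distance, size)

# dendrogram(Z) would draw the tree if matplotlib is available.
```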
Slide 15: Nearest Neighbor Algorithm. The nearest neighbor algorithm is an agglomerative (bottom-up) approach. It starts with n nodes (n is the size of our sample), merges the two most similar nodes at each step, and stops when the desired number of clusters is reached. From http://www.stat.unc.edu/postscript/papers/marron/Stat321FDA/RimaIzempresentation.ppt
Slide 16: Nearest Neighbor, Level 2, k = 7 clusters. From http://www.stat.unc.edu/postscript/papers/marron/Stat321FDA/RimaIzempresentation.ppt
Slide 17: Nearest Neighbor, Level 3, k = 6 clusters.
Slide 18: Nearest Neighbor, Level 4, k = 5 clusters.
Slide 19: Nearest Neighbor, Level 5, k = 4 clusters.
Slide 20: Nearest Neighbor, Level 6, k = 3 clusters.
Slide 21: Nearest Neighbor, Level 7, k = 2 clusters.
Slide 22: Nearest Neighbor, Level 8, k = 1 cluster.
Slide 23: Hierarchical Clustering. Keys: similarity and clustering. Calculate the similarity between every possible pair of profiles. The two most similar clusters are merged to form a new cluster. Calculate the similarity between the new cluster and every remaining cluster.
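The loop described on this slide can be sketched directly. The sketch below assumes NumPy, uses invented toy profiles, and picks average pairwise Pearson correlation as one possible cluster-to-cluster similarity; other choices appear on the linkage slides.

```python
# Sketch of the loop above: start from singleton clusters, repeatedly
# merge the two most similar clusters, and recompute the similarity
# between the new cluster and every remaining cluster.
import numpy as np

def pearson(a, b):
    return np.corrcoef(a, b)[0, 1]

def cluster_similarity(c1, c2, X):
    # average pairwise Pearson correlation between members of c1 and c2
    return np.mean([pearson(X[i], X[j]) for i in c1 for j in c2])

def agglomerate(X, k):
    clusters = [[i] for i in range(len(X))]      # start: one profile per cluster
    while len(clusters) > k:
        # find the most similar pair of clusters
        best, best_sim = None, -np.inf
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = cluster_similarity(clusters[a], clusters[b], X)
                if sim > best_sim:
                    best, best_sim = (a, b), sim
        a, b = best
        merged = clusters[a] + clusters[b]        # merge into a new cluster
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)                   # its similarities are recomputed
                                                  # on the next pass of the loop
    return clusters

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))                       # toy profiles
print(agglomerate(X, k=3))
```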
Slide 24: Similarity Measurements. Pearson correlation. For two profiles (vectors) x and y: $r(x, y) = \frac{\sum_{i=1}^{p}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{p}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{p}(y_i - \bar{y})^2}}$, with $-1 \le r(x, y) \le +1$.
Slide 25: Similarity Measurements. Pearson correlation: trend similarity.
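A minimal sketch of the point above, assuming NumPy and using invented numbers: Pearson correlation measures trend similarity, so a shifted and rescaled copy of a profile still correlates perfectly.

```python
# Sketch: Pearson correlation captures trend similarity. Shifting or
# rescaling a profile does not change it. Toy numbers for illustration.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 10.0                       # same trend, different scale and offset

r = np.corrcoef(x, y)[0, 1]
print(r)                                 # 1.0: identical trend
```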
Slide 26: Similarity Measurements. Euclidean distance.
Slide 27: Similarity Measurements. Euclidean distance: absolute difference.
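A minimal sketch of the contrast with Pearson correlation, assuming NumPy and invented numbers: Euclidean distance reacts to the absolute difference between profiles, so the same shifted copy is now far away.

```python
# Sketch: Euclidean distance measures absolute difference,
# d(x, y) = sqrt(sum_i (x_i - y_i)^2). Toy numbers for illustration.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x + 10.0                              # same trend, shifted upward

print(np.linalg.norm(x - y))              # about 22.36, despite the identical trend
```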
Slide 28: Similarity Measurements. Cosine correlation. For two profiles (vectors) x and y: $\cos(x, y) = \frac{x \cdot y}{\|x\|\,\|y\|}$, with $-1 \le \cos(x, y) \le +1$.
Slide 29: Similarity Measurements. Cosine correlation: trend + mean distance.
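A minimal sketch, assuming NumPy and invented numbers: cosine correlation is computed on the raw (not mean-centered) vectors, so it is sensitive to both the trend and the overall level of the profiles.

```python
# Sketch: cosine correlation compares both trend and overall level,
# since the vectors are not mean-centered. Toy numbers for illustration.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x + 10.0

cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(cos)          # < 1: the shift changes the angle between the vectors
```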
Slide 30: Similarity Measurements.
Slide 31: Similarity Measurements. Similar?
Slide 32: Clustering. Merge which pair of clusters? C1, C2, C3.
Slide 33: Clustering: Single Linkage. Dissimilarity between two clusters = minimum dissimilarity between the members of the two clusters (C1, C2). Tends to generate "long chains".
Slide 34: Clustering: Complete Linkage. Dissimilarity between two clusters = maximum dissimilarity between the members of the two clusters (C1, C2). Tends to generate compact "clumps".
Slide 35: Clustering: Average Linkage. Dissimilarity between two clusters = averaged distances over all pairs of objects (one from each cluster, C1 and C2).
Slide 36: Clustering: Average Group Linkage. Dissimilarity between two clusters = distance between the two cluster means (C1, C2).
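A sketch relating the four linkage rules above to SciPy's `method` argument, with "centroid" playing the role of average group linkage; the data are invented, and the printout is only meant to show that the choice of linkage changes the merge heights.

```python
# Sketch: the four linkage rules map onto SciPy's `method` argument.
# single   = minimum pairwise dissimilarity (tends to form long chains)
# complete = maximum pairwise dissimilarity (tends to form compact clumps)
# average  = mean of all pairwise dissimilarities
# centroid = distance between cluster means (average group linkage)
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))                     # toy profiles

for method in ("single", "complete", "average", "centroid"):
    Z = linkage(X, method=method)               # Euclidean distance by default
    print(method, "height of final merge:", Z[-1, 2])
```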
Slide 37: Considerations. Which genes are used to cluster samples? Expression variation. Inherent variation. Prior knowledge (irrelevant genes). Etc.
Slide 38: Take Home Questions. Which clustering method is better? How to cut the clustering tree to get relatively tight clusters of genes or samples?
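For the second take-home question, one common (not the only) way to cut the tree is SciPy's fcluster, either by requesting a fixed number of clusters or by cutting at a distance threshold; the data and threshold below are invented for illustration.

```python
# Sketch: cutting the clustering tree with SciPy's fcluster, either at a
# fixed number of clusters or at a chosen merge-height threshold.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))                          # toy profiles

Z = linkage(X, method="average")

labels_k = fcluster(Z, t=3, criterion="maxclust")    # ask for 3 clusters
labels_h = fcluster(Z, t=2.0, criterion="distance")  # cut the tree at height 2.0
print(labels_k)
print(labels_h)
```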