
Introduction to Hierarchical Clustering Analysis. Pengyu Hong, 09/16/2005

Background: Data 1, Data 2, …, Data N from Cell/Tissue 1, Cell/Tissue 2, …, Cell/Tissue N. Put similar samples/entries together.

Background: Clustering is one of the most important unsupervised learning processes; it organizes objects into groups whose members are similar in some way. Clustering finds structure in a collection of unlabeled data. A cluster is a collection of objects that are similar to one another and dissimilar to the objects belonging to other clusters.

Motivation I: Microarray data quality checking. Do replicates cluster together? Do similar conditions, time points, and tissue types cluster together?

Data: Rat Schizophrenia Data (Allen Fienberg and Mayetri Gupta). Two time points: 35 days (PD35) and 60 days (PD60) after birth. Two brain regions: prefrontal cortex (PFC) and nucleus accumbens (NA). Two replicates (samples are from the same set of tissue split into different tubes, so replicates should be in close agreement). dChip was used to normalize the data and obtain model-based expression values, using the full PM/MM model. Figure labels: sample IDs, gene IDs, clustering results, heat map, length. How to read this clustering result? Problem?

Motivation II: Cluster genes. Predict the functions of unknown genes from known ones.

Functionally significant gene clusters. Two-way clustering: sample clusters and gene clusters.

Motivation II: Cluster genes. Predict the functions of unknown genes from known ones. Cluster samples. Discover clinical characteristics (e.g. survival, marker status) shared by samples.

Bhattacharjee et al. (2001) Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc. Natl. Acad. Sci. USA, Vol. 98, 13790-13795.

Motivation II: Cluster genes. Predict the functions of unknown genes from known ones. Cluster samples. Discover clinical characteristics (e.g. survival, marker status) shared by samples. Promoter analysis of commonly regulated genes.

Promoter analysis of commonly regulated genes. David J. Lockhart & Elizabeth A. Winzeler, NATURE | VOL 405 | 15 JUNE 2000, p827

Clustering Algorithms: Start with a collection of n objects, each represented by a p-dimensional feature vector x_i, i = 1, …, n. The goal is to divide these n objects into k clusters so that objects within a cluster are more "similar" to each other than to objects in other clusters. k is usually unknown. Popular methods: hierarchical, k-means, SOM, mixture models, etc.

Hierarchical Clustering: Venn diagram of clustered data; dendrogram. From http://www.stat.unc.edu/postscript/papers/marron/Stat321FDA/RimaIzempresentation.ppt

Hierarchical Clustering (Cont.): Multilevel clustering: level 1 has n clusters, level n has one cluster. Agglomerative HC: starts with singletons and merges clusters. Divisive HC: starts with one cluster containing all samples and splits clusters.

Nearest Neighbor Algorithm: The Nearest Neighbor Algorithm is an agglomerative (bottom-up) approach. It starts with n nodes (n is the size of our sample), merges the 2 most similar nodes at each step, and stops when the desired number of clusters is reached. From http://www.stat.unc.edu/postscript/papers/marron/Stat321FDA/RimaIzempresentation.ppt

Nearest Neighbor, Level 2, k = 7 clusters. From http://www.stat.unc.edu/postscript/papers/marron/Stat321FDA/RimaIzempresentation.ppt

Nearest Neighbor, Level 3, k = 6 clusters.

Nearest Neighbor, Level 4, k = 5 clusters.

Nearest Neighbor, Level 5, k = 4 clusters.

Nearest Neighbor, Level 6, k = 3 clusters.

Nearest Neighbor, Level 7, k = 2 clusters.

Nearest Neighbor, Level 8, k = 1 cluster.
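
The level-by-level merging shown above can be traced with a small sketch. This is only an illustration under stated assumptions: the points are made up, Euclidean distance stands in for the (unspecified) similarity measure, and the function names are hypothetical rather than taken from the slides.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_pair_distance(c1, c2, points):
    """Distance between the closest members of two clusters (index lists)."""
    return min(euclidean(points[i], points[j]) for i in c1 for j in c2)


def nearest_neighbor_clustering(points, k):
    """Merge the two closest clusters per step until k clusters remain.

    Returns the list of levels; level 1 holds n singleton clusters.
    """
    clusters = [[i] for i in range(len(points))]
    levels = [[list(c) for c in clusters]]
    while len(clusters) > k:
        # Pick the pair of clusters whose nearest members are closest.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: nearest_pair_distance(
                clusters[ab[0]], clusters[ab[1]], points
            ),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
        levels.append([list(c) for c in clusters])
    return levels


# Made-up 2-D points, purely for illustration.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
levels = nearest_neighbor_clustering(points, k=2)
for level_number, clusters in enumerate(levels, start=1):
    print(f"Level {level_number}, k = {len(clusters)}: {clusters}")
```

Each pass reduces the number of clusters by one, which mirrors the sequence of slides from k = 7 down to k = 1 when run with k = 1 on eight points.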

Hierarchical Clustering: Calculate the similarity between all possible combinations of two profiles. Keys: similarity, clustering. The two most similar clusters are grouped together to form a new cluster. Calculate the similarity between the new cluster and all remaining clusters.
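
The three steps above (all pairwise similarities, merge the most similar pair, recompute similarities for the new cluster) might look like the following sketch. It assumes Pearson correlation as the similarity and the average of pairwise similarities as the cluster-to-cluster update; the slide fixes neither choice, and the profiles and function names are invented for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def hierarchical_clustering(profiles):
    """Return the merge history as (cluster_id_a, cluster_id_b, similarity)."""
    clusters = {i: [i] for i in range(len(profiles))}
    # Step 1: similarity between every pair of profiles.
    sim = {
        frozenset((i, j)): pearson(profiles[i], profiles[j])
        for i, j in combinations(clusters, 2)
    }
    merges = []
    next_id = len(profiles)
    while len(clusters) > 1:
        # Step 2: group the two most similar clusters.
        pair = max(sim, key=sim.get)
        a, b = sorted(pair)
        merges.append((a, b, sim[pair]))
        merged = clusters.pop(a) + clusters.pop(b)
        sim = {p: s for p, s in sim.items() if a not in p and b not in p}
        # Step 3: similarity between the new cluster and each remaining one
        # (assumed here: mean of all member-to-member correlations).
        for cid, members in clusters.items():
            total = sum(
                pearson(profiles[i], profiles[j])
                for i in merged
                for j in members
            )
            sim[frozenset((next_id, cid))] = total / (len(merged) * len(members))
        clusters[next_id] = merged
        next_id += 1
    return merges


# Invented expression profiles: two rising, two falling.
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
    [9.0, 6.0, 4.0, 2.0],
]
merges = hierarchical_clustering(profiles)
for a, b, s in merges:
    print(f"merge {a} and {b} at similarity {s:.3f}")
```

The merge history is the information a dendrogram draws: which clusters join, and at what similarity.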

Similarity Measurements: Pearson Correlation. Two profiles (vectors). -1 ≤ Pearson Correlation ≤ +1

Similarity Measurements: Pearson Correlation: trend similarity
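
To make "trend similarity" concrete, here is a minimal sketch of the standard Pearson correlation formula. The profile values and variable names are made up; the slides' own notation for the two vectors did not survive extraction.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


rising = [1.0, 2.0, 3.0, 4.0]
rising_scaled = [10.0, 20.0, 30.0, 40.0]  # same trend, 10x the magnitude
falling = [4.0, 3.0, 2.0, 1.0]            # opposite trend

same_trend = pearson(rising, rising_scaled)
opposite_trend = pearson(rising, falling)
print(same_trend, opposite_trend)
```

Profiles that rise and fall together score near +1 regardless of magnitude, and profiles with opposite trends score near -1.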

Similarity Measurements: Euclidean Distance

Similarity Measurements: Euclidean Distance: absolute difference
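
As a small illustration of "absolute difference", the sketch below computes the Euclidean distance between two made-up profiles that share the same trend but differ by a constant offset. The values and names are assumptions for the example only.

```python
import math


def euclidean(x, y):
    """Square root of the summed squared differences between two profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


profile = [1.0, 2.0, 3.0, 4.0]
shifted = [11.0, 12.0, 13.0, 14.0]  # identical trend, offset by 10

distance_to_self = euclidean(profile, profile)
distance_to_shifted = euclidean(profile, shifted)
print(distance_to_self, distance_to_shifted)
```

Unlike a correlation, the distance grows with the offset even though the two profiles move in lockstep.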

Similarity Measurements: Cosine Correlation. -1 ≤ Cosine Correlation ≤ +1

Similarity Measurements: Cosine Correlation: trend + mean distance
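
The "trend + mean distance" behaviour can be seen in a short sketch of the usual cosine formula (dot product over the product of vector lengths). The profiles are invented so that one differs only in scale and the other only in mean.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two profiles, without mean-centering."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


profile = [-1.0, 0.0, 1.0]
scaled = [-2.0, 0.0, 2.0]    # same trend, same mean, larger amplitude
shifted = [9.0, 10.0, 11.0]  # same trend, mean moved from 0 to 10

cos_scaled = cosine(profile, scaled)
cos_shifted = cosine(profile, shifted)
print(cos_scaled, cos_shifted)
```

Because the means are not subtracted, shifting a profile's mean lowers its cosine score even when the trend is unchanged, whereas Pearson correlation would still report +1.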

Similarity Measurements

Similarity Measurements Similar?

Clustering: C1, C2, C3. Merge which pair of clusters?

Clustering: Single Linkage. Dissimilarity between two clusters = minimum dissimilarity between the members of the two clusters (C1, C2). Tends to generate "long chains".

Clustering: Complete Linkage. Dissimilarity between two clusters = maximum dissimilarity between the members of the two clusters (C1, C2). Tends to generate "clumps".

Clustering: Average Linkage. Dissimilarity between two clusters = averaged distances of all pairs of objects (one from each cluster) (C1, C2).

Clustering: Average Group Linkage. Dissimilarity between two clusters = distance between the two cluster means (C1, C2).
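
The four linkage rules above can be compared side by side in a short sketch. Euclidean distance is assumed as the underlying dissimilarity, and the two clusters are made-up points chosen so the numbers are easy to check by hand.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(c1, c2):
    """Minimum dissimilarity over all cross-cluster pairs."""
    return min(euclidean(a, b) for a in c1 for b in c2)


def complete_linkage(c1, c2):
    """Maximum dissimilarity over all cross-cluster pairs."""
    return max(euclidean(a, b) for a in c1 for b in c2)


def average_linkage(c1, c2):
    """Mean dissimilarity over all cross-cluster pairs."""
    total = sum(euclidean(a, b) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def cluster_mean(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return [sum(coords) / len(cluster) for coords in zip(*cluster)]


def average_group_linkage(c1, c2):
    """Distance between the two cluster means."""
    return euclidean(cluster_mean(c1), cluster_mean(c2))


# Two invented clusters: vertical pairs of points three units apart.
c1 = [(0.0, 0.0), (0.0, 2.0)]
c2 = [(3.0, 0.0), (3.0, 2.0)]

print("single       ", single_linkage(c1, c2))
print("complete     ", complete_linkage(c1, c2))
print("average      ", average_linkage(c1, c2))
print("average group", average_group_linkage(c1, c2))
```

For the same pair of clusters the rules give different values, so the choice of linkage can change which pair is merged next.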

Considerations: What genes are used to cluster samples? Expression variation. Inherent variation. Prior knowledge (irrelevant genes). Etc.

Take Home Questions: Which clustering method is better? How to cut the clustering tree to get relatively tight clusters of genes or samples?
