
Clustering Functional Data: Methods and Applications. Catherine Sugar, University of Southern California, sugar@usc.edu. This is joint work with Gareth James of USC. UCLA, May 1, 2006.

Clustering and Functional Data. Cluster Analysis: The art of finding groups in data. Points in the same cluster should be as similar as possible and points in disjoint clusters should be widely separated. Functional Data: Observations for a subject consist of curves or trajectories rather than finite-dimensional vectors. Examples: growth curves, longitudinal measurements of clinical status, technology evolution, spectra.

Outline. Traditional approaches to clustering curves and problems with sparse data. A new approach using basis functions and a mixture model. Applications of our approach in medicine and business. Tools, extensions, and model selection issues.

Functional Examples: Spinal Bone Mineral Density Data; Technology Evolution Curves

Functional Examples: Membranous Nephropathy Data

Traditional Approaches to Functional Clustering. Regularization: Form a grid of equally spaced time points. Evaluate each curve at the time points, giving a finite representation of each curve. Apply a standard finite-dimensional method, perhaps with a regularization constraint. Filtering: Fit a smooth curve to each subject using a finite set of basis functions. Perform clustering on the basis coefficients.
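The regularization step can be sketched as follows. This is a minimal illustration, not the method used in the talk: it assumes linear interpolation between observations, and the names `regularize` and `even_grid` and all data values are made up for the example. Once every curve lives on the same grid, any finite-dimensional clustering method can be applied to the resulting vectors.

```python
def even_grid(start, stop, n):
    """Return n equally spaced time points from start to stop inclusive."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def regularize(times, values, grid):
    """Linearly interpolate one curve's observations onto a common grid.

    Assumes `times` is strictly increasing and covers every grid point.
    """
    out = []
    for g in grid:
        if g < times[0] or g > times[-1]:
            raise ValueError("grid point outside the observed range")
        for j in range(len(times) - 1):
            t0, t1 = times[j], times[j + 1]
            if t0 <= g <= t1:
                w = (g - t0) / (t1 - t0)
                out.append((1 - w) * values[j] + w * values[j + 1])
                break
    return out


grid = even_grid(0.0, 4.0, 5)
# Two curves observed at different, unevenly spaced times (illustrative values).
curve_a = regularize([0.0, 1.5, 4.0], [0.0, 3.0, 8.0], grid)
curve_b = regularize([0.0, 2.0, 3.0, 4.0], [1.0, 1.0, 2.0, 4.0], grid)
print(curve_a)
print(curve_b)
```

The `ValueError` branch shows where the approach breaks down: a curve whose observations do not span the grid cannot be evaluated there without extrapolating.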

Problems with the Traditional Approaches. Regularization: Cannot be easily applied when curves are measured at different or unevenly spaced time points, or when the data are too sparse. Even when it can be used, the resulting data vectors are high-dimensional and autocorrelated. Filtering: Measurements may be too sparse to fit a curve for each subject. Requires fitting many parameters. If subjects are measured at different time points, the basis coefficients will not have a common covariance.

Our Model. Let $g_i(t)$, $Y_i(t)$ and $\epsilon_i(t)$ respectively be the true value, observed value and error for the $i$th curve at time $t$, i.e. $Y_i(t) = g_i(t) + \epsilon_i(t)$. We represent $g_i(t)$ using a natural cubic spline basis: $g_i(t) = s(t)^T \eta_i$, where $s(t)$ is a spline basis vector and $\eta_i$ is the vector of spline coefficients. The coefficients are treated as random effects with $\eta_i = \mu_{z_i} + \gamma_i$, $\gamma_i \sim N(0, \Gamma)$, where $z_i$ denotes cluster membership.

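Our Model. Our model becomes $Y_i = S_i(\mu_{z_i} + \gamma_i) + \epsilon_i$, where $S_i$ is the spline basis matrix for curve $i$ evaluated at its observed time points, $\mu_{z_i}$ is the mean coefficient vector of its cluster, $\gamma_i$ is a random effect and $\epsilon_i$ is the vector of measurement errors. We fit this model using the observed time points and an EM algorithm.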

Fitting The Model: Bone Density Data In Two Clusters

Fitting The Model: Technology Data In Two Clusters

Model Applications I: Low Dimensional Representations. One can plot functional data, but it is hard to assess relative "distances" between curves. We use the basis coefficients to project the data into a low-dimensional space where they can be plotted as points. Projecting causes no loss of information in terms of cluster assignment. The projections are exact analogs of the discriminants used in LDA.

Model Applications I: Low Dimensional Representations

Model Applications I: Low Dimensional Representations: Bone Data

Model Applications I: Low Dimensional Representations: Technology Data

Model Applications I: Low Dimensional Representations: Nephrology Data

Model Applications II: Dimensions of Separation. It is useful to know which dimensions do the best job of separating the clusters. This depends on a combination of the separation between clusters and the within-cluster covariance, and is equivalent to identifying which dimensions determine cluster assignment. The optimal weights for cluster assignment are given by an extension of the classical discriminant function for a Gaussian mixture.

Model Applications II: Dimensions of Separation. Correlation and Covariance; Discriminating Functions.

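Model Applications III: Prediction and Confidence Intervals. Another advantage of our method is that it gives accurate predictions for missing portions of $g(t)$. Natural estimate: $\hat{g}_i(t) = s(t)^T \hat{\eta}_i$, where $s(t)$ is the spline basis vector and $\hat{\eta}_i$ is a prediction of the spline coefficients. The prediction with minimum mean squared error is the conditional expectation $\hat{\eta}_i = E(\eta_i \mid Y_i)$. CIs and PIs: a two-step procedure. Find the set of clusters most likely to contain $g(t)$ and then create intervals conditional on cluster membership.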

Model Applications III: Prediction on Bone Data

Model Applications III: Prediction on Technology Data. Optical bit density. Magnetic storage bit density.

HDD 3.5 in. storage capacity. Black = Functional Clustering, Red = Linear Gompertz, Green = Mansfield-Blackman, Cyan = Weibull, Orange = Bass, Blue = S-curve.

A Comparison with Standard Approaches. We took the first 10 years as training data and tried to predict the following 5 years using several different approaches. Here we report the MSE on the left-out data as a percentage of that from using a standard S-curve (logistic curve).
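The reporting convention described here, held-out MSE as a percentage of the S-curve's MSE, can be sketched as below. The function names and every number are hypothetical and are not results from the talk. A value under 100 means the candidate method predicted the held-out years better than the baseline.

```python
def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def relative_mse_percent(actual, predicted, baseline_predicted):
    """MSE of a method as a percentage of the baseline method's MSE."""
    return 100.0 * mse(actual, predicted) / mse(actual, baseline_predicted)


# Made-up values for five held-out years.
held_out = [10.0, 12.0, 14.0, 16.0, 18.0]
s_curve_forecast = [9.0, 10.0, 12.0, 13.0, 14.0]    # baseline (logistic) forecast
candidate_forecast = [10.0, 11.0, 14.0, 15.0, 17.0]  # some other method's forecast

ratio = relative_mse_percent(held_out, candidate_forecast, s_curve_forecast)
print(round(ratio, 2))
```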

Advantages of Our Model. Borrows strength from all curve fragments to simultaneously estimate mixture parameters, and requires fitting fewer parameters. Allows one to make more accurate predictions into the future based on only a few observations. Flexible: can be used effectively when data are sparse, irregularly spaced or sampled at different time points. Automatically puts the correct weights on estimated basis coefficients. Can be easily extended to include multiple functional and finite-dimensional covariates.

Extensions I: Multiple Functional Covariates. Just as finite-dimensional clustering algorithms can incorporate multiple covariates, one should be able to use multiple functional variables. We can do this by creating a block diagonal spline basis matrix using the blocks for the p individual curves. More care must be taken with the error structure, but the same basic model and fitting procedure apply.
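A minimal sketch of the block diagonal construction, using plain nested lists. The helper `block_diag` and the two small basis matrices are hypothetical. Here each toy block has columns (1, t) rather than a real spline basis, purely to keep the example short.

```python
def block_diag(blocks):
    """Stack matrices (lists of rows) along the diagonal, zero elsewhere."""
    total_cols = sum(len(b[0]) for b in blocks)
    result = []
    col_offset = 0
    for b in blocks:
        width = len(b[0])
        for row in b:
            left = [0.0] * col_offset
            right = [0.0] * (total_cols - col_offset - width)
            result.append(left + list(row) + right)
        col_offset += width
    return result


# Toy basis matrices for p = 2 curves on one subject:
# rows are observation times, columns are basis functions (assumed: 1 and t).
S_first = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]   # 3 observations
S_second = [[1.0, 0.5], [1.0, 1.5]]              # 2 observations

S_combined = block_diag([S_first, S_second])
for row in S_combined:
    print(row)
```

Each curve keeps its own observation times and its own coefficients, so the two curves need not be measured at the same time points.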

Extensions II: Finite Dimensional Covariates. It is just as easy to add finite-dimensional covariates to the model. Let $X_i$ be the vector of finite-dimensional covariates. We replace the spline basis matrix, $S_i$, by the identity matrix, $I$. The model can be fit just as before. Note that this gives a way of handling high-dimensional standard clustering problems with missing data: simply delete the corresponding rows of the identity matrix.
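The row-deletion idea can be illustrated in a few lines. This is a sketch under assumptions: missing entries are marked with `None`, and the names `observation_matrix` and `mat_vec` are invented for the example. Multiplying the reduced identity by any complete vector returns only the observed coordinates, so the missing slots never enter the fit.

```python
def observation_matrix(x):
    """Identity matrix with the rows for missing (None) entries removed."""
    d = len(x)
    return [
        [1.0 if j == i else 0.0 for j in range(d)]
        for i in range(d)
        if x[i] is not None
    ]


def mat_vec(matrix, vec):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


# A 5-dimensional observation with two missing coordinates (made-up values).
x_i = [2.5, None, 0.7, None, 1.1]
S_i = observation_matrix(x_i)

# Whatever sits in the missing slots is ignored by S_i.
complete = [2.5, 9.9, 0.7, 9.9, 1.1]
observed = mat_vec(S_i, complete)
print(observed)
```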

Extensions III: Dimension Reduction. Reducing dimensions in advance (e.g. by PCA) may be problematic. The example below shows a case where the dimensions that explain most of the variability are not the ones determining cluster separation. Our method (right) does a better job.
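A toy version of this situation, with made-up two-dimensional points rather than the data behind the slide: the first coordinate carries almost all the variance but is identical across clusters, while the second coordinate has little variance and separates the clusters completely. The points are chosen so the two coordinates are uncorrelated, which makes the coordinate axes the principal directions.

```python
def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


# Illustrative clusters: wide spread in coordinate 0, separated in coordinate 1.
cluster_a = [(-10.0, 0.9), (-5.0, 1.1), (5.0, 1.1), (10.0, 0.9)]
cluster_b = [(-10.0, -1.1), (-5.0, -0.9), (5.0, -0.9), (10.0, -1.1)]


def axis_summary(axis):
    """Return (total variance, squared mean gap / within-cluster variance)."""
    a = [p[axis] for p in cluster_a]
    b = [p[axis] for p in cluster_b]
    total = variance(a + b)
    within = (variance(a) + variance(b)) / 2
    gap = (mean(a) - mean(b)) ** 2
    return total, gap / within


total_0, separation_0 = axis_summary(0)
total_1, separation_1 = axis_summary(1)
print(total_0, separation_0)
print(total_1, separation_1)
```

Keeping only the highest-variance direction here would retain coordinate 0 and discard the only coordinate that distinguishes the clusters.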

References: Bachrach, L. et al. (1999). Bone mineral acquisition in healthy Asian, Hispanic, Black, and Caucasian youth: a longitudinal study. Journal of Clinical Endocrinology & Metabolism 84, 4702-4712. Banfield, J. and Raftery, A. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics 49, 803-821. James, G. and Hastie, T. (2001). Functional linear discriminant analysis for irregularly sampled curves. JRSSB 63, 533-550. James, G., Hastie, T., and Sugar, C. (2000). Principal component models for sparse functional data. Biometrika 87, 587-602. James, G. and Sugar, C. (2003). Clustering for sparsely sampled functional data. JASA 98, 397-408. Sugar, C. and James, G. (2003). Finding the number of clusters in a data set: an information-theoretic approach. JASA 98, 750-763.

Model Selection Issues: Choosing the spline basis and the number and placement of knots. Choosing the dimension of the mean space. Choosing the covariance structure for the clusters. Choosing the number of clusters.

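How Many Clusters: Raftery et al. suggest using approximate Bayes factors in the finite-dimensional setting. We propose an approach based on theory from electrical engineering involving distortion. Distortion is the minimum achievable average Mahalanobis distance, per dimension, between each observation and its nearest cluster center: $d_k = \frac{1}{q} \min_{c_1, \dots, c_k} E\left[(X - c_X)^T \Gamma^{-1} (X - c_X)\right]$, where $q$ is the dimension, $c_X$ is the center closest to $X$ and $\Gamma$ is the within-cluster covariance. Plot distortion as a function of $k$, the number of clusters. Rate distortion theory suggests the form of the resulting "distortion curve".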

How Many Clusters: Basic Results. If the data are generated from a single cluster in $q$ dimensions, then asymptotically the distortion curve can be made linear, specifically $d_k^{-q/2} \propto k$. When there is an unknown number, $K$, of clusters, the inverse distortion plot will be linear both before and after $K$, and will experience its maximum jump at $K$, subject to certain conditions.
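The jump idea can be sketched as follows. The assumptions are stated up front: the transformation power is taken to be q/2, the transformed distortion at zero clusters is set to 0, and the distortion values are invented for illustration. The function name `largest_jump` is hypothetical.

```python
def largest_jump(distortions, q):
    """Pick the number of clusters with the largest jump in transformed distortion.

    distortions[k - 1] is the distortion obtained with k clusters.
    Assumes the transformation d ** (-q / 2) and a transformed value of 0 at k = 0.
    """
    power = q / 2.0
    transformed = [0.0] + [d ** (-power) for d in distortions]
    jumps = [transformed[k] - transformed[k - 1] for k in range(1, len(transformed))]
    best_k = max(range(1, len(jumps) + 1), key=lambda k: jumps[k - 1])
    return best_k, jumps


# Made-up distortions in q = 2 dimensions with a sharp drop at k = 3.
best_k, jumps = largest_jump([10.0, 5.0, 0.5, 0.4, 0.33], q=2)
print(best_k)

# Single-cluster shape d_k = c * k ** (-2 / q): the transformed curve is linear,
# so every jump is the same size and no k stands out.
_, flat_jumps = largest_jump([4.0 / k for k in range(1, 6)], q=2)
print(flat_jumps)
```

In practice the distortions would come from running a clustering algorithm such as k-means for each k; that step is left out here.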

How Many Clusters: Examples. The figures below show a transformed distortion curve when there is (a) a single component and (b) six components in the generating mixture distribution.

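A General Functional Model. Let $g(t)$ be the curve for a randomly chosen individual. (We will assume $g(t)$ follows a Gaussian process.) If $g(t)$ is in the $k$th cluster we write $g(t) \mid z_k = 1 \sim GP(\mu_k(t), \omega_k(t, t'))$, where $\mu_k(t)$ and $\omega_k(t, t')$ are the cluster mean and covariance functions. If $Y$ is the vector of observed values at times $t_1, \dots, t_n$ and errors are assumed independent, then $Y \mid z_k = 1 \sim N(\mu_k, \Omega_k + \sigma^2 I)$, where $\mu_k$ and $\Omega_k$ are the mean and covariance functions evaluated at the observed times.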

A General Functional Model. Regularization and filtering can be viewed as approaches to fitting the general functional clustering model. The regularization approach estimates the cluster mean function $\mu_k(t)$ and covariance function $\omega_k(t, t')$ on a fine grid of time points, with simple constraints on $\omega_k(t, t')$. The filtering approach assumes $g(t) = \phi(t)^T \eta$, where $\phi(t)$ is a vector of basis functions and $\eta$ is the vector of coefficients. The $\eta$'s are estimated separately for each individual and then clustered.

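Our Model. It is further convenient to parameterize the cluster mean coefficients as $\mu_k = \lambda_0 + \Lambda \alpha_k$, where $\lambda_0$ and $\alpha_k$ are respectively $q$- and $h$-dimensional vectors and $\Lambda$ is a $q \times h$ matrix. If $h < K - 1$ then we are assuming the cluster mean coefficients lie in a restricted subspace. Our model becomes $Y_i = S_i(\lambda_0 + \Lambda \alpha_{z_i} + \gamma_i) + \epsilon_i$, where $S_i$ is the spline basis matrix for curve $i$, $z_i$ denotes cluster membership, $\gamma_i$ is a random effect and $\epsilon_i$ is the vector of measurement errors.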

Fitting Our Model via EM. Fitting our model involves estimating the model parameters. We can do this by maximizing either the classification likelihood or the clustering likelihood, noting that conditional on class membership $Y_i$ is multivariate normal. We can use an iterative procedure ...
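As a rough illustration of the E-step and M-step pattern only, here is EM for a far simpler model than the one in the talk: a two-component Gaussian mixture on scalar data with a shared variance. It is a stand-in chosen for brevity, not the functional clustering fit. The function name, starting values and data are all assumptions.

```python
import math


def em_two_gaussians(data, iters=50):
    """EM for a two-component 1-D Gaussian mixture with a shared variance."""
    mu = [min(data), max(data)]   # crude starting means (assumed)
    var = 1.0                     # starting shared variance (assumed)
    pi = [0.5, 0.5]               # starting mixing proportions
    for _ in range(iters):
        # E-step: posterior probability of each component for each point.
        # The Gaussian normalizing constant cancels because var is shared.
        resp = []
        for x in data:
            w = [pi[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var)) for k in range(2)]
            total = w[0] + w[1]
            resp.append([w[0] / total, w[1] / total])
        # M-step: weighted updates of proportions, means and variance.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
        var = sum(
            r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data) for k in range(2)
        ) / len(data)
        var = max(var, 1e-6)      # guard against a collapsing variance
    return pi, mu, var


# Made-up scalar data with two well-separated groups.
data = [-0.2, 0.0, 0.1, 0.3, 4.8, 5.0, 5.1, 5.2]
pi_hat, mu_hat, var_hat = em_two_gaussians(data)
print(pi_hat, mu_hat, var_hat)
```

In the functional setting the same alternation applies, but the E-step would also involve the random spline coefficients and the M-step the basis-matrix structure, neither of which is shown here.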
