
Learning and Vision: Discriminative Models Chris Bishop and Paul Viola

Part II: Algorithms and Applications. Outline - Part I: Fundamentals. Part II: Algorithms and Applications: Support Vector Machines; face and pedestrian detection; AdaBoost; faces; building fast classifiers; trading off speed for accuracy; face and object detection; memory-based learning (Simard, Moghaddam).

History Lesson. 1950's: Perceptrons are cool. Very simple learning rule, can learn "complex" concepts. Generalized perceptrons are better - too many weights. 1960's: Perceptrons stink (M+P). Some simple concepts require an exponential # of features. Can't possibly learn that, right? 1980's: MLP's are cool (R+M / PDP). Sort of simple learning rule, can learn anything (?). Create only the features you need. 1990: MLP's stink. Hard to train: slow / local minima. 1996: Perceptrons are cool.

Why did we need multi-layer perceptrons? Problems like this appear to require very complex non-linearities. Minsky and Papert showed that an exponential number of features is necessary to solve generic problems.

14th Order??? 120 Features. Why an exponential number of features? N=21, k=5 -> about 65,000 features (the number of monomials of degree at most k=5 in N=21 variables is C(N+k, k) = C(26, 5) = 65,780).

MLP's vs. Perceptron. MLP's are hard to train... Takes a long time (unpredictably long). Can converge to poor minima. MLP's are hard to understand: what are they really doing? Perceptrons are easy to train... A form of linear programming. Polynomial time. One minimum, which is global. Generalized perceptrons are easier to understand: polynomial functions.

What about linearly inseparable data? Perceptron training is linear programming: polynomial time in the number of variables and in the number of constraints.
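For concreteness, a minimal sketch (my own illustration, not from the slides) of perceptron training posed as a linear program: find w and b satisfying y_i (w . x_i + b) >= 1 for every example, which scipy.optimize.linprog solves in polynomial time. The toy data and variable layout are assumptions made for this example.

```python
# Perceptron training as a feasibility linear program (illustrative sketch).
# Variable vector z = [w1, w2, b]; constraints -y_i * (w . x_i + b) <= -1.
import numpy as np
from scipy.optimize import linprog

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5], [3.0, 3.0]])  # toy inputs
y = np.array([-1, -1, 1, 1])                                    # toy labels

A_ub = -(y[:, None] * np.hstack([X, np.ones((len(X), 1))]))      # one row per example
b_ub = -np.ones(len(X))
res = linprog(c=np.zeros(3), A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * 3, method="highs")         # zero objective: pure feasibility
if res.success:
    w, b = res.x[:2], res.x[2]
    print("separating hyperplane:", w, b)
```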

Support Vector Machines: Rebirth of Perceptrons. How to train effectively? Linear programming (... later quadratic programming), though on-line works great as well. How to get so many features efficiently?!? The Kernel Trick. How to generalize with so many features? VC dimension. (Or is it regularization?)

Lemma 1: Weight vectors are simple. The weight vector lives in a sub-space spanned by the examples... Dimensionality is determined by the number of examples, not the complexity of the space.

Lemma 2: Only need to consider the examples.

Simple Kernels yield Complex Features
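As a small illustration of this point (an assumed example, not one from the slides), the polynomial kernel (1 + x . z)^2 computes the inner product of an explicit six-dimensional quadratic feature map without ever building those features:

```python
# The kernel trick for a 2-D quadratic kernel: k(x, z) = (1 + x.z)^2 equals
# phi(x).phi(z) for an explicit degree-2 feature map phi.
import numpy as np

def phi(x):
    """Explicit feature map matching the kernel (1 + x.z)^2 in two dimensions."""
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 * x1, x2 * x2,
                     np.sqrt(2) * x1 * x2])

x = np.array([0.3, -1.2])
z = np.array([2.0, 0.5])
assert np.isclose((1 + x @ z) ** 2, phi(x) @ phi(z))  # same value, features never materialized
```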

But Kernel Perceptrons Can Generalize Poorly

Perceptron Rebirth: Generalization. Too many features ... Occam is unhappy. Perhaps we should encourage smoothness? Smoother.

The linear program is not unique: it can return any multiple of the correct weight vector... Slack variables & a weight prior force the solution toward zero.

Definition of the Margin. Geometric margin: the gap between negatives and positives, measured perpendicular to a hyperplane. Classifier margin.

Require a non-zero margin. Allows solutions with zero margin. Enforces a non-zero margin between the examples and the decision boundary.

Constrained Optimization. Find the smoothest function that separates the data. Quadratic programming (like linear programming): a single minimum, a polynomial-time algorithm.

Constrained Optimization 2

SVM: examples

SVM: Key Ideas. Augment inputs with a very large feature set: polynomials, etc. Use the Kernel Trick(TM) to do this efficiently. Enforce/encourage smoothness with a weight penalty. Introduce the margin. Find the best solution using quadratic programming.
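A minimal sketch of these ingredients in practice, using scikit-learn's SVC rather than the speakers' own code: a polynomial kernel supplies the large feature set implicitly, the penalty parameter C plays the role of the weight penalty, and fit() solves the quadratic program. The toy dataset is an assumption made for this example.

```python
# Kernel trick + margin penalty + QP, via scikit-learn (illustrative sketch).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)   # a non-linear toy concept

clf = SVC(kernel="poly", degree=4, C=1.0)              # implicit polynomial features, penalty C
clf.fit(X, y)                                          # quadratic program solved inside fit()
print("support vectors:", clf.support_vectors_.shape[0], "of", len(X))
```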

SVM: Zip Code recognition. Data dimension: 256. Feature space: 4th order, roughly 100,000,000 dimensions.

The Classical Face Detection Process: 50,000 locations/scales, scanned from the smallest scale up to larger scales.

Classifier is Learned from Labeled Data. Training data: 5,000 faces (all frontal) and 10^8 non-faces. Faces are normalized for scale and translation. Many variations: across individuals, illumination, pose (rotation both in plane and out).

Key Properties of Face Detection. Each image contains 10-50 thousand locations/scales. Faces are rare: 0-50 per image. 1000 times as many non-faces as faces. Extremely small # of false positives required: 10^-6.

Sung and Poggio

Rowley, Baluja & Kanade First Fast System - Low Res to Hi

Osuna, Freund, and Girosi

Support Vectors

P, O, & G: First Pedestrian Work

On to AdaBoost. Given a set of weak classifiers, none much better than random, iteratively combine them to form a linear combination. Training error converges to 0 quickly. Test error is related to the training margin.

Weak Classifier 1. Weights increased. Weak Classifier 2. Weak Classifier 3. The final classifier is a linear combination of the weak classifiers. Freund & Schapire: AdaBoost.
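A minimal sketch of this loop (my own illustration with decision-stump weak classifiers, not the speakers' code): each round picks the stump with the lowest weighted error, increases the weights of the examples it gets wrong, and the final classifier is a linear combination of the stumps.

```python
# Discrete AdaBoost with decision stumps (illustrative sketch; labels in {-1,+1}).
import numpy as np

def best_stump(X, y, w):
    """Pick the (feature, threshold, sign) stump with lowest weighted error."""
    best = (None, None, None, np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = s * np.where(X[:, j] > t, 1, -1)
                err = np.sum(w * (pred != y))
                if err < best[3]:
                    best = (j, t, s, err)
    return best

def adaboost(X, y, rounds=10):
    n = len(y)
    w = np.full(n, 1.0 / n)                       # example weights
    ensemble = []
    for _ in range(rounds):
        j, t, s, err = best_stump(X, y, w)
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        pred = s * np.where(X[:, j] > t, 1, -1)
        w *= np.exp(-alpha * y * pred)            # up-weight the mistakes
        w /= w.sum()
        ensemble.append((alpha, j, t, s))
    return ensemble

def predict(ensemble, X):
    score = sum(a * s * np.where(X[:, j] > t, 1, -1) for a, j, t, s in ensemble)
    return np.sign(score)                         # linear combination of weak classifiers
```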

AdaBoost Properties

AdaBoost: Super Efficient Feature Selector. Features = weak classifiers. Each round selects the optimal feature given the previously selected features and the exponential loss.

Boosted Face Detection: Image Features. "Rectangle filters": similar to Haar wavelets (Papageorgiou, et al.). Unique binary features.
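Rectangle-filter responses in the Viola-Jones detector are computed in constant time from an integral image; here is a minimal sketch (the helper names and toy image are my own, not from the slides):

```python
# Rectangle sums in O(1) via an integral image (illustrative sketch).
import numpy as np

def integral_image(img):
    """Cumulative sums so any rectangle sum needs only four lookups."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] from integral image ii (r1, c1 exclusive)."""
    total = ii[r1 - 1, c1 - 1]
    if r0 > 0: total -= ii[r0 - 1, c1 - 1]
    if c0 > 0: total -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0: total += ii[r0 - 1, c0 - 1]
    return total

img = np.arange(36, dtype=float).reshape(6, 6)
ii = integral_image(img)
# A two-rectangle (edge) feature: left half minus right half of a window.
feature = rect_sum(ii, 1, 1, 5, 3) - rect_sum(ii, 1, 3, 5, 5)
print(feature, img[1:5, 1:3].sum() - img[1:5, 3:5].sum())  # the two should match
```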

Feature Selection. For each round of boosting: evaluate every rectangle filter on every example; sort the examples by filter value; select the best threshold for each filter (min Z); select the best filter/threshold pair (= feature); reweight the examples. With M filters, T thresholds, N examples and learning time L: O(MT L(MTN)) for the naive wrapper method, O(MN) for the AdaBoost feature selector.
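A minimal sketch of the per-filter threshold search described above (my own bookkeeping, not the speakers' code): sort the examples once by filter value, then sweep the threshold, updating the weighted error incrementally. One polarity only and no tie handling, for brevity.

```python
# Best threshold for a single filter by one sorted sweep (illustrative sketch).
import numpy as np

def best_threshold(values, labels, weights):
    """values: one filter's response per example; labels in {-1,+1}; weights sum to 1."""
    order = np.argsort(values)
    v, y, w = values[order], labels[order], weights[order]
    total_neg = np.sum(w[y == -1])
    pos_below = neg_below = 0.0
    best_err, best_thresh = total_neg, v[0] - 1.0   # threshold below everything: predict all +1
    for i in range(len(v)):
        if y[i] == 1:
            pos_below += w[i]
        else:
            neg_below += w[i]
        # threshold at v[i]: predict +1 for value > v[i], -1 otherwise (ties ignored for brevity)
        err = pos_below + (total_neg - neg_below)
        if err < best_err:
            best_err, best_thresh = err, v[i]
    return best_thresh, best_err
```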

Example Classifier for Face Detection. A classifier with 200 rectangle features was learned using AdaBoost: 95% correct detection on the test set with 1 in 14,084 false positives. Not quite competitive... ROC curve for the 200-feature classifier.

Building Fast Classifiers. Given a fixed set of classifier hypothesis classes: Computational Risk Minimization. (Figure: ROC curve, % Detection (50-100) vs. % False Pos (0-50); cascade diagram: IMAGE SUB-WINDOW -> Classifier 1 -> Classifier 2 -> Classifier 3 -> FACE on the T branches, with each F branch going to NON-FACE.)

Other Fast Classification Work Simard Rowley (Faces) Fleuret & Geman (Faces)

Cascaded Classifier. A 1-feature classifier achieves a 100% detection rate and about a 50% false positive rate. A 5-feature classifier achieves a 100% detection rate and a 40% false positive rate (20% cumulative), using data from the previous stage. A 20-feature classifier achieves a 100% detection rate with a 10% false positive rate (2% cumulative). (Diagram: IMAGE SUB-WINDOW -> 1 Feature -> 5 Features -> 20 Features -> FACE; 50% -> 20% -> 2%; each F branch goes to NON-FACE.)
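A minimal sketch of cascade evaluation (an illustration only, not the speakers' code): each window passes through progressively larger classifiers, and a rejection at any stage stops computation immediately, which is what keeps the average cost per window so low.

```python
# Cascade evaluation: cheap stages reject most non-faces early (illustrative sketch).
def cascade_classify(window, stages):
    """stages: list of (classifier, threshold); each classifier returns a score."""
    for classifier, threshold in stages:
        if classifier(window) < threshold:
            return "NON-FACE"          # rejected early; later stages never run
    return "FACE"                      # survived every stage
```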

Comparison to Other Systems. Detection rate (%) vs. number of false detections (10, 31, 50, 65, 78, 95, 110, 167, 422). Viola-Jones: 78.3, 85.2, 88.8, 90.0, 90.1, 90.8, 91.1, 91.8, 93.7. Rowley-Baluja-Kanade: 83.2, 86.0, 89.2, 90.1, 89.9 (reported at a subset of these operating points). Schneiderman-Kanade: 94.4. Roth-Yang-Ahuja: (94.8).

Output of Face Detector on Test Images

Solving other "Face" Tasks Profile Detection Facial Feature Localization Demographic Analysis

Feature Localization. Surprising properties of our framework: the cost of detection is not a function of image size, just of the number of features, and learning automatically focuses attention on key regions. Conclusion: the "feature" detector can include a large contextual region around the feature.

Feature Localization Features. The learned features reflect the task.

Profile Detection

More Results

Profile Features

Thanks to Andrew Moore. One-Nearest Neighbor... One nearest neighbor for fitting is described shortly... Similar to Join-the-Dots, with two pros and one con. PRO: It is easy to implement with multivariate inputs. CON: It no longer interpolates locally. PRO: An excellent introduction to instance-based learning...

Thanks to Andrew Moore. 1-Nearest Neighbor is an instance of... instance-based learning. Four things make a memory-based learner: 1. a distance metric; 2. how many nearby neighbors to look at; 3. a weighting function (optional); 4. how to fit with the local points. (Stored examples: (x1, y1), (x2, y2), (x3, y3), ..., (xn, yn).) A function approximator that has been around since about 1910. To make a prediction, search the database for similar datapoints and fit with the local points.

Thanks to Andrew Moore. Nearest Neighbor. Four things make a memory-based learner: 1. a distance metric - Euclidean; 2. how many nearby neighbors to look at - one; 3. a weighting function (optional) - unused; 4. how to fit with the local points - just predict the same output as the nearest neighbor.
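A minimal sketch of exactly this learner (my own code, not Andrew Moore's): Euclidean metric, one neighbor, no weighting, and the prediction is the stored output of the closest training point.

```python
# 1-nearest-neighbor prediction (illustrative sketch).
import numpy as np

def one_nn_predict(X_train, y_train, x_query):
    dists = np.linalg.norm(X_train - x_query, axis=1)   # Euclidean distances to all stored points
    return y_train[np.argmin(dists)]                     # copy the nearest neighbor's output

X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y_train = np.array(["a", "b", "c"])
print(one_nn_predict(X_train, y_train, np.array([0.9, 0.2])))  # -> "b"
```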

Thanks to Andrew Moore. Multivariate Distance Metrics. Suppose the input vectors x1, x2, ..., xN are two-dimensional: x1 = (x11, x12), x2 = (x21, x22), ..., xN = (xN1, xN2). One can draw the nearest-neighbor regions in input space. The relative scalings in the distance metric affect the region shapes.

Thanks to Andrew Moore. Euclidean Distance Metric: D(x, x') = sqrt( sum_i sigma_i^2 (x_i - x'_i)^2 ), or equivalently D(x, x') = sqrt( (x - x')^T Sigma (x - x') ), where Sigma = diag(sigma_1^2, ..., sigma_d^2). Other metrics... Mahalanobis, rank-based, correlation-based (Stanfill+Waltz, Maes' Ringo system...).
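A minimal sketch (my own illustration) checking that the per-dimension scaled form and the diagonal-Sigma matrix form above are the same quantity:

```python
# Scaled Euclidean distance: weighted-sum form vs. matrix form (illustrative sketch).
import numpy as np

def scaled_euclidean(x, xp, sigma):
    return np.sqrt(np.sum((sigma ** 2) * (x - xp) ** 2))

def matrix_form(x, xp, Sigma):
    d = x - xp
    return np.sqrt(d @ Sigma @ d)

x, xp = np.array([1.0, 2.0]), np.array([3.0, -1.0])
sigma = np.array([0.5, 2.0])                      # per-dimension scalings
Sigma = np.diag(sigma ** 2)
assert np.isclose(scaled_euclidean(x, xp, sigma), matrix_form(x, xp, Sigma))
```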

Thanks to Andrew Moore Notable Distance Metrics

Simard: Tangent Distance

Simard: Tangent Distance

Thanks to Baback Moghaddam FERET Photobook Moghaddam & Pentland (1995)

Thanks to Baback Moghaddam. Eigenfaces / Normalized Eigenfaces. Moghaddam & Pentland (1995).

Thanks to Baback Moghaddam. Euclidean (Standard) "Eigenfaces": Turk & Pentland (1992), Moghaddam & Pentland (1995). Projects all the training faces onto a universal eigenspace.
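A minimal sketch of the eigenface idea (my own illustration with random stand-in data, not the original FERET/Photobook code): PCA over the training faces gives the eigenspace, every face is projected onto it, and recognition is nearest neighbor with a Euclidean metric in that space.

```python
# Eigenfaces: PCA projection plus Euclidean nearest neighbor (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
faces = rng.normal(size=(100, 32 * 32))            # stand-in for normalized face images
mean_face = faces.mean(axis=0)
U, S, Vt = np.linalg.svd(faces - mean_face, full_matrices=False)
eigenfaces = Vt[:20]                               # top 20 principal directions

def project(img):
    return eigenfaces @ (img - mean_face)          # coordinates in the eigenspace

def nearest_face(query, gallery):
    d = np.linalg.norm(gallery - project(query), axis=1)
    return int(np.argmin(d))                       # index of the closest training face

gallery = (faces - mean_face) @ eigenfaces.T       # all training faces projected
print(nearest_face(faces[3], gallery))             # -> 3 (a face matches itself)
```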
