The Disputed Federalist Papers : SVM Feature Selection by means of Concave Minimization Glenn Fung and Olvi L. Mangasarian CSNA 2002 June 13-16, 2002 Madison, Wisconsin
Slide 2Outline of Talk Support Vector Machines (SVM) Introduction Standard Quadratic Programming Formulation 1-standard Linear SVMs SVM Feature Selection Successive Linearization Algorithm (SLA) The Disputed Federalist Papers Description of the Classification Problem Description of Previous Work Results Separating Hyperplane in Three Dimensions Only Classification Agrees with Previous Results
Slide 3What is a Support Vector Machine? An ideally characterized surface Typically nonlinear in the info space Linear in a higher dimensional space Implicitly characterized by a piece work
Slide 4What are Support Vector Machines Used For? Grouping Regression & Data Fitting Supervised & Unsupervised Learning (Will focus on order)
Slide 5Geometry of the Classification Problem 2-Category Linearly Separable Case A+ A-
Slide 6in class +1 or –1 determined by: Membership of each A m-by-m slanting framework D with +1 & - 1 sections Separate by two bouncing planes, where e is a vector of ones. Variable based math of the Classification Problem 2-Category Linearly Separable Case Given m focuses in n dimensional space Represented by a m-by-n lattice A More concisely:
Slide 7Support vectors Support Vector Machines Maximizing the Margin between Bounding Planes A+ A-
Slide 8min s.t . where is the heaviness of the preparation mistake Maximize the edge by limiting Support Vector Machines: Quadratic Programming Formulation Solve the accompanying quadratic program:
Slide 9min s.t . min s.t . Bolster Vector Machines: Linear Programming Formulation Use the 1-standard rather than the 2-standard: This is equal to the accompanying straight program:
Slide 10Feature Selection and SVMs min s.t . Where: Use the progression capacity to smother parts of the ordinary to the isolating hyperplane:
Slide 11Smooth Approximation of the Step Function
Slide 12SVM Formulation with Feature Selection For , we utilize the guess of the progression vector by the curved exponential: Here is the base of common logarithms. This prompts to: min s.t .
Slide 13Successive Linearization Algorithm (SLA) for Feature Selection Choose . Begin with some . Having , decide the following repeat by unraveling the LP: min s.t . Stop when: Proposition: Algorithm ends in a limited number of steps ( regularly 5 to 7 ) at a stationary point.
Slide 14The Federalist Papers Written in 1787-1788 by Alexander Hamilton , John Jay and James Madison to convince the subjects of New York to confirm the constitution. Papers comprised of short expositions, 900 to 3500 words long. Origin of 12 of those papers have been in question ( Madison or Hamilton ). These papers are alluded to as the debated Federalist papers.
Slide 15Previous Work Mosteller and Wallace (1964) Using factual deduction, decided the creation of the 12 debated papers . Bosch and Smith (1998). Utilizing straight programming systems and the assessment of each conceivable mix of one, two and three elements, acquired an isolating hyperplane utilizing just three words.
Slide 16Description of the information For each paper: Machine comprehensible content was made utilizing a scanner. Processed relative frequencies of 70 words, that Mosteller-Wallace recognized as great contender for creator attribution contemplates. Each report is spoken to as a vector containing the 70 genuine numbers relating to the 70 word frequencies. The dataset comprises of 118 papers: 50 Madison papers 56 Hamilton papers 12 questioned papers
Slide 17Function Words Based on Relative Frequencies
Slide 18The parameter was acquired by a tuning methodology. SLA Feature Selection for Classifying the Disputed Federalist Papers Apply the progressive linearization calculation to: Train on the 106 Federalist papers with known creators Find an arrangement hyperplane that utilizations as few words as conceivable Use the hyperplane to order the 12 debated papers
Slide 19Hyperplane Classifier Using 3 Words A hyperplane relying upon three words was discovered: 0.5368 to +24.6634 upon +2.9532 would =66.6159 All questioned papers wound up on the Madison side of the plane
Slide 20Results: 3d plot of coming about hyperplane
Slide 21Comparison with Previous Work & Conclusion Bosch and Smith (1998) ascertained all the conceivable arrangements of one, two and three words to discover an isolating hyperplane. They tackled 118,895 straight projects. Our SLA calculation for highlight determination required the arrangement of just 6 straight projects. Our characterization of the debated Federalist papers concurs with that of Mosteller-Wallace and Bosch-Smith .
Slide 22More on SVMs: My site page: www.cs.wisc.edu/~gfung Olvi Mangasarian website page: www.cs.wisc.edu/~olvi
SPONSORS
SPONSORS
SPONSORS