CS546: Machine Learning and Natural Language Lecture 7: Introduction to Classification: Linear Learning Algorithms 2


Presentation Transcript

Slide 1

CS546: Machine Learning and Natural Language Lecture 7: Introduction to Classification: Linear Learning Algorithms 2009

Slide 2

Linear Functions
f(x) = 1 if w1·x1 + w2·x2 + ... + wn·xn >= θ, and 0 otherwise.
Disjunctions: y = x1 ∨ x3 ∨ x5, i.e. y = (1·x1 + 1·x3 + 1·x5 >= 1)
At least m of n: y = at least 2 of {x1, x3, x5}, i.e. y = (1·x1 + 1·x3 + 1·x5 >= 2)
Exclusive-OR: y = (x1 ∧ ¬x2) ∨ (¬x1 ∧ x2)
Non-trivial DNF: y = (x1 ∧ x2) ∨ (x3 ∧ x4)
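
Disjunctions and at-least-m-of-n functions can be checked with a single linear threshold unit. A minimal sketch in Python (the weights and thresholds follow the examples above; the code itself is not from the slides):

    def linear_threshold(w, theta, x):
        """Return 1 if w . x >= theta, else 0."""
        return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0

    # y = x1 v x3 v x5 over five Boolean attributes, ordered x1..x5
    w_disj = [1, 0, 1, 0, 1]
    print(linear_threshold(w_disj, 1, [0, 1, 0, 0, 1]))  # 1: x5 is on

    # "at least 2 of {x1, x3, x5}": same weights, threshold 2
    print(linear_threshold(w_disj, 2, [1, 0, 0, 0, 0]))  # 0: only x1 is on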

Slide 3

Linear Functions: w · x = θ, i.e. w · x - θ = 0

Slide 4

Perceptron learning rule: An on-line, mistake-driven algorithm. Rosenblatt (1959) suggested that when a target output value is provided for a single neuron with fixed inputs, it can incrementally change its weights and learn to produce the output using the Perceptron learning rule. Perceptron == Linear Threshold Unit.

Slide 5

Perceptron learning rule: We learn f: X -> {-1,+1}, represented as f = sgn(w·x), where X = {0,1}^n or X = R^n.
Given labeled examples:
1. Initialize w = 0.
2. Cycle through all examples:
a. Predict the label of instance x to be y' = sgn(w·x).
b. If y' ≠ y, update the weight vector: w = w + r y x (r - a constant, the learning rate). Otherwise, if y' = y, leave the weights unchanged.
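
A minimal sketch of the rule above, assuming examples are given as (x, y) pairs with y in {-1, +1} and a fixed learning rate r (function and variable names are mine, not from the slides):

    import numpy as np

    def perceptron(examples, n_features, r=1.0, epochs=10):
        """On-line, mistake-driven training: w <- w + r*y*x on every mistake."""
        w = np.zeros(n_features)
        for _ in range(epochs):                          # cycle through all examples
            for x, y in examples:                        # y in {-1, +1}
                y_pred = 1 if np.dot(w, x) >= 0 else -1  # y' = sgn(w . x)
                if y_pred != y:                          # mistake: update
                    w = w + r * y * np.asarray(x, dtype=float)
                # if y' == y: leave the weights unchanged
        return w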

Slide 6

A Note About the Threshold: On the previous slide the Perceptron had no threshold, but we do not lose generality: a threshold θ can be simulated by adding a constant feature whose learned weight plays the role of -θ.
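
A minimal sketch of that reduction (the helper name is mine, not from the slides):

    import numpy as np

    def augment(x):
        """Append a constant feature x0 = 1; its learned weight acts as the (negated) threshold."""
        return np.concatenate((np.asarray(x, dtype=float), [1.0]))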

Slide 7

Geometric View

Slide 11

Initialize w=0 2. Push through all cases a. Foresee the name of occasion x to be y' = sgn{wx) b. In the event that y'y, redesign the weight vector to w = w + r y x (r - a steady, learning rate) Otherwise, if y'=y, leave weights unaltered. Perceptron learning principle If x is Boolean, just weights of dynamic elements are redesigned.

Slide 12

Perceptron Learnability: Obviously, the Perceptron cannot learn what it cannot represent: only linearly separable functions. Minsky and Papert (1969) wrote an influential book demonstrating the Perceptron's representational limitations: parity functions cannot be learned (XOR); in vision, if patterns are represented with local features, it cannot represent symmetry or connectivity. Research on Neural Networks stopped for years. Rosenblatt himself (1959) asked, "What pattern recognition problems can be transformed so as to become linearly separable?"

Slide 13

(x1 ∧ x2) ∨ (x3 ∧ x4): introduce new features y1 = x1 ∧ x2 and y2 = x3 ∧ x4; over these features the target becomes the disjunction y1 ∨ y2, which is linearly separable.
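
A small sketch of that transformation under the reading above (names are mine, not from the slides):

    def transform(x):
        """Map (x1, x2, x3, x4) to the new features (y1, y2) = (x1 AND x2, x3 AND x4)."""
        x1, x2, x3, x4 = x
        return (x1 & x2, x3 & x4)

    def target(x):
        """(x1 AND x2) OR (x3 AND x4), written as a disjunction over the new features."""
        y1, y2 = transform(x)
        return 1 if 1 * y1 + 1 * y2 >= 1 else 0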

Slide 14

Perceptron Convergence. Perceptron Convergence Theorem: If there exists a set of weights that is consistent with the data (i.e., the data is linearly separable), the perceptron learning algorithm will converge. How long will it take to converge? Perceptron Cycling Theorem: If the training data is not linearly separable, the perceptron learning algorithm will eventually repeat the same set of weights and thereby enter an infinite loop. How can we provide robustness and greater expressivity?

Slide 15

Perceptron: Mistake Bound Theorem. The algorithm maintains a weight vector w ∈ R^N, w_0 = (0, ..., 0). After receiving an example x ∈ R^N, it predicts according to the linear threshold function w·x >= 0. Theorem [Novikoff, 1963]: Let (x_1, y_1), ..., (x_t, y_t) be a sequence of labeled examples with x_i ∈ R^N, ||x_i|| <= R and y_i ∈ {-1, 1} for all i. Let u ∈ R^N, γ > 0 be such that ||u|| = 1 and y_i u·x_i >= γ for all i. Then Perceptron makes at most R²/γ² mistakes on this example sequence. (see additional notes) γ: margin; R: complexity parameter.

Slide 16

Perceptron Mistake Bound, proof sketch: Let v_k be the hypothesis before the k-th mistake, and assume the k-th mistake occurs on the input example (x_i, y_i). Assumptions: v_1 = 0, ||u|| <= 1, y_i u·x_i >= γ. Multiplying the update by u (by the definition of u, then by induction) lower-bounds the projection of v_k onto u; bounding ||v_k|| from above then gives k < R²/γ².
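
A cleaned-up version of the two inductions the slide alludes to (my reconstruction of the standard Novikoff argument, using the quantities from the previous slide):

    % Progress: each mistake moves v_k toward u by at least \gamma
    v_{k+1}\cdot u = (v_k + y_i x_i)\cdot u \ge v_k\cdot u + \gamma
      \;\Rightarrow\; v_{k+1}\cdot u \ge k\gamma \quad (\text{since } v_1 = 0).
    % Norm growth: each mistake increases \|v_k\|^2 by at most R^2
    \|v_{k+1}\|^2 = \|v_k\|^2 + 2 y_i\, v_k\cdot x_i + \|x_i\|^2 \le \|v_k\|^2 + R^2
      \;\Rightarrow\; \|v_{k+1}\|^2 \le k R^2
      \quad (y_i\, v_k\cdot x_i \le 0 \text{ on a mistake}).
    % Combine via the projection, using \|u\| \le 1:
    k\gamma \le v_{k+1}\cdot u \le \|v_{k+1}\| \le \sqrt{k}\,R
      \;\Rightarrow\; k \le R^2/\gamma^2.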

Slide 17

Perceptron for Boolean Functions: How many mistakes will the Perceptron algorithm make when learning a k-disjunction? It can make O(n) mistakes on a k-disjunction over n attributes. Our bound: R²/γ², with u: 1/k^(1/2) for the k relevant components, 0 for the others; γ: examples that differ in only one variable give a margin of 1/k^(1/2); R: n^(1/2). Thus we get nk. Is it possible to do better? This is important if n, the number of features, is large.
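
The arithmetic behind "thus we get nk", using the quantities named above:

    \gamma = \tfrac{1}{\sqrt{k}}, \qquad R = \sqrt{n}
    \;\Longrightarrow\; \frac{R^2}{\gamma^2} = \frac{n}{1/k} = nk.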

Slide 18

Winnow Algorithm: The Winnow algorithm learns Linear Threshold Functions. For the class of disjunctions, instead of demotion we can use elimination.
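
The transcript does not spell out the update rule, so here is a sketch of the standard Winnow setting the later slides are consistent with: threshold θ = n, promotion doubles the weights of active features on a mistaken positive, demotion halves them on a mistaken negative (elimination sets them to 0 instead). Names and constants are assumptions, not taken from the slides:

    import numpy as np

    def winnow(examples, n, epochs=10, eliminate=False):
        """Winnow for monotone disjunctions over n Boolean attributes; labels y in {0, 1}."""
        w, theta = np.ones(n), float(n)              # all weights start at 1, threshold n
        for _ in range(epochs):
            for x, y in examples:
                x = np.asarray(x)
                y_pred = 1 if np.dot(w, x) >= theta else 0
                if y == 1 and y_pred == 0:           # mistake on positive: promotion
                    w[x == 1] *= 2
                elif y == 0 and y_pred == 1:         # mistake on negative: demotion / elimination
                    w[x == 1] = 0 if eliminate else w[x == 1] / 2
        return w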

Slide 19

Winnow - Example

Slide 20

Winnow - Example: Notice that the same algorithm will learn a conjunction over these variables (w = (256, 256, 0, ..., 32, ..., 256, 256)).

Slide 21

Winnow - Mistake Bound. Claim: Winnow makes O(k log n) mistakes on k-disjunctions. u - # of mistakes on positive examples (promotions); v - # of mistakes on negative examples (demotions).

Slide 22

Winnow - Mistake Bound. Claim: Winnow makes O(k log n) mistakes on k-disjunctions. u - # of mistakes on positive examples (promotions); v - # of mistakes on negative examples (demotions). 1. u < k log(2n).

Slide 23

Winnow - Mistake Bound. Claim: Winnow makes O(k log n) mistakes on k-disjunctions. u - # of mistakes on positive examples (promotions); v - # of mistakes on negative examples (demotions). 1. u < k log(2n): a weight that corresponds to a good (relevant) variable is only promoted; once these weights reach n there will be no more mistakes on positives.
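
The counting behind this step, written out (my reconstruction: each mistake on a positive example doubles the weight of at least one good variable active in it, good weights are never demoted, and a weight that starts at 1 passes the threshold n after at most log2(n) + 1 doublings):

    u \;\le\; k \cdot (\text{max promotions per good weight})
      \;\le\; k\,(\log_2 n + 1) \;=\; k\,\log_2(2n).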

Slide 24

Winnow - Mistake Bound. u - # of mistakes on positive examples (promotions); v - # of mistakes on negative examples (demotions). 2. v < 2(u + 1).

Slide 25

Winnow - Mistake Bound. u - # of mistakes on positive examples (promotions); v - # of mistakes on negative examples (demotions). 2. v < 2(u + 1). Total weight: TW = n initially.

Slide 26

Winnow - Mistake Bound. u - # of mistakes on positive examples (promotions); v - # of mistakes on negative examples (demotions). 2. v < 2(u + 1). Total weight: TW = n initially. Mistake on positive: TW(t+1) < TW(t) + n.

Slide 27

Winnow - Mistake Bound. u - # of mistakes on positive examples (promotions); v - # of mistakes on negative examples (demotions). 2. v < 2(u + 1). Total weight: TW = n initially. Mistake on positive: TW(t+1) < TW(t) + n. Mistake on negative: TW(t+1) < TW(t) - n/2.

Slide 28

Winnow - Mistake Bound. u - # of mistakes on positive examples (promotions); v - # of mistakes on negative examples (demotions). 2. v < 2(u + 1). Total weight: TW = n initially. Mistake on positive: TW(t+1) < TW(t) + n. Mistake on negative: TW(t+1) < TW(t) - n/2. So 0 < TW < n + u·n - v·n/2, hence v < 2(u + 1).

Slide 29

Winnow - Mistake Bound. u - # of mistakes on positive examples (promotions); v - # of mistakes on negative examples (demotions). Total # of mistakes: u + v < 3u + 2 = O(k log n).
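
The algebra behind the two steps, collected (my reconstruction from the inequalities on the preceding slides):

    0 < TW(\text{final}) \le n + u\,n - v\,\tfrac{n}{2}
      \;\Longrightarrow\; v < 2(u+1), \qquad
    u + v < u + 2(u+1) = 3u + 2 = O(k \log n)
      \quad\text{using } u < k\log_2(2n).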

Slide 30

Winnow - Extensions: This algorithm learns monotone functions (in the Boolean algebra sense). For the general case: duplicate the variables: for the negation of variable x, introduce a new variable y, and learn monotone functions over the 2n variables. Balanced version: keep two weights for each variable; the effective weight is their difference.
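
A sketch of the balanced version mentioned above (my construction, not from the slides): keep two non-negative weights per variable, predict with their difference, and move the pair in opposite directions on promotions and demotions.

    import numpy as np

    def balanced_winnow(examples, n, epochs=10):
        """Balanced Winnow: the effective weight is w_pos - w_neg, so negative
        effective weights (and hence non-monotone targets) can be represented."""
        w_pos, w_neg, theta = np.ones(n), np.ones(n), float(n)
        for _ in range(epochs):
            for x, y in examples:                 # y in {0, 1}
                x = np.asarray(x)
                y_pred = 1 if np.dot(w_pos - w_neg, x) >= theta else 0
                active = (x == 1)
                if y == 1 and y_pred == 0:        # promotion
                    w_pos[active] *= 2
                    w_neg[active] /= 2
                elif y == 0 and y_pred == 1:      # demotion
                    w_pos[active] /= 2
                    w_neg[active] *= 2
        return w_pos - w_neg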

Slide 31

Winnow - A Robust Variation: Winnow is robust in the presence of various kinds of noise (classification noise, attribute noise). Importance: sometimes we learn under some distribution but test under a slightly different one (e.g., natural language applications).

Slide 32

Winnow - A Robust Variation. Modeling: Adversary's turn: may change the target concept by adding or removing some variable from the target disjunction; the cost of each addition move is 1. Learner's turn: makes a prediction on the examples given, and is then told the correct answer (according to the current target function). Winnow-R: the same as Winnow, except that it does not let weights drop below 1/2. Claim: Winnow-R makes O(c log n) mistakes (c - the cost of the adversary), a generalization of the previous claim.
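
Relative to the Winnow sketch given after slide 18, only the demotion step changes; a minimal sketch of that change (names and the drop-in framing are my assumptions, consistent with the 1/2 floor stated above):

    import numpy as np

    def demote_winnow_r(w, x):
        """Winnow-R demotion step: halve the active weights, but never below 1/2."""
        active = np.asarray(x) == 1
        w[active] = np.maximum(w[active] / 2.0, 0.5)
        return w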

Slide 33

Algorithmic Approaches. Focus: two families of algorithms (their on-line representatives). Additive update algorithms: Perceptron; SVM (not on-line, but a close relative of Perceptron). Multiplicative update algorithms: Winnow; close relatives: Boosting, Max Entropy. Which algorithm should we choose?

Slide 34

Algorithm Descriptions. Additive weight update algorithm (Perceptron, Rosenblatt, 1958; variations exist). Multiplicative weight update algorithm (Winnow, Littlestone, 1988; variations exist).

Slide 35

How to Compare? Generalization (since the representation is the same): how many examples are needed to reach a given level of accuracy? Efficiency: how long does it take to learn a hypothesis and to evaluate it (per example)? Robustness; adaptation to a new domain, ....

Slide 36

Sentence Representation. S = "I don't know whether to laugh or cry". - Define a set of features: features are relations that hold in the sentence. - Map a sentence to its feature-based representation. The feature-based representation will give some of the information in the sentence. - Use this as the example presented to your algorithm.
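
A minimal sketch of such a mapping (the particular features, word presence and adjacent-word pairs, are my illustration and not necessarily the relations used in the lecture):

    def sentence_features(sentence):
        """Map a sentence to the set of its active Boolean features."""
        words = sentence.lower().split()
        feats = {f"word={w}" for w in words}                          # unigram presence
        feats |= {f"pair={a}_{b}" for a, b in zip(words, words[1:])}  # adjacent-word pairs
        return feats

    print(sentence_features("I don't know whether to laugh or cry"))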

Slide 37

Sentence Representation. S = "I don't know whether to laugh or cry" -
