
CS546: Machine Learning and Natural Language Lecture 7: Introduction to Classification: Linear Learning Algorithms 2009

Linear Functions

f(x) = 1 if w1·x1 + w2·x2 + … + wn·xn ≥ θ, and 0 otherwise.

Disjunctions: y = x1 ∨ x3 ∨ x5, i.e., y = (1·x1 + 1·x3 + 1·x5 ≥ 1)
At least m of n: y = at least 2 of {x1, x3, x5}, i.e., y = (1·x1 + 1·x3 + 1·x5 ≥ 2)
Exclusive-OR: y = (x1 ∧ ¬x2) ∨ (¬x1 ∧ x2)
Non-trivial DNF: y = (x1 ∧ x2) ∨ (x3 ∧ x4)
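The disjunction and at-least-m-of-n cases above can be checked directly; a small sketch (the specific weight vector and inputs are illustrative assumptions):

```python
# Boolean functions expressed as linear threshold units, as on the slide.

def ltu(w, theta, x):
    """Linear threshold unit: outputs 1 iff w . x >= theta, else 0."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) >= theta)

w = [1, 0, 1, 0, 1, 0]  # weight 1 on x1, x3, x5 (0-indexed positions 0, 2, 4)

# Disjunction x1 v x3 v x5: threshold 1
assert ltu(w, 1, [1, 0, 0, 0, 0, 0]) == 1   # x1 active -> true
assert ltu(w, 1, [0, 1, 0, 1, 0, 1]) == 0   # only irrelevant variables active

# "At least 2 of {x1, x3, x5}": same weights, threshold 2
assert ltu(w, 2, [1, 0, 1, 0, 0, 0]) == 1   # two relevant variables active
assert ltu(w, 2, [1, 0, 0, 0, 0, 0]) == 0   # only one active
```

No single linear threshold unit computes the exclusive-OR or the non-trivial DNF, which is the point of the contrast on the slide.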

Linear Functions

[Figure: the decision boundary is the hyperplane w · x = θ; equivalently w · x = 0 once the threshold is folded into w.]

Perceptron learning rule

On-line, mistake-driven algorithm. Rosenblatt (1959) suggested that when a target output value is provided for a single neuron with fixed input, it can incrementally change weights and learn to produce the output using the Perceptron learning rule.

Perceptron == Linear Threshold Unit

Perceptron learning rule

We learn f: X → {-1, +1}, represented as f = sgn(w·x), where X = {0,1}ⁿ or X = Rⁿ.

Given labeled examples:
1. Initialize w = 0.
2. Cycle through all examples:
   a. Predict the label of instance x to be y' = sgn(w·x).
   b. If y' ≠ y, update the weight vector: w = w + r·y·x (r is a constant, the learning rate). Otherwise, if y' = y, leave the weights unchanged.
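The rule above can be sketched in a few lines. A minimal sketch: the toy data (an OR function with a folded-in bias feature) and the learning rate r = 1 are illustrative assumptions, not from the slides.

```python
def sgn(z):
    return 1 if z >= 0 else -1

def perceptron(examples, n, r=1.0, max_epochs=100):
    w = [0.0] * n                                  # 1. initialize w = 0
    for _ in range(max_epochs):                    # 2. cycle through all examples
        mistakes = 0
        for x, y in examples:
            if sgn(sum(wi * xi for wi, xi in zip(w, x))) != y:
                mistakes += 1                      # b. on a mistake: w <- w + r*y*x
                w = [wi + r * y * xi for wi, xi in zip(w, x)]
        if mistakes == 0:                          # consistent with the data: stop
            break
    return w

# y = x1 OR x2; the constant feature x0 = 1 plays the role of the threshold
data = [([1, 0, 0], -1), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]
w = perceptron(data, n=3)
assert all(sgn(sum(wi * xi for wi, xi in zip(w, x))) == y for x, y in data)
```

Since OR is linearly separable, the loop is guaranteed to exit with a consistent weight vector.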

A Footnote About the Threshold

On the previous slide, the Perceptron has no threshold. But we do not lose generality:
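The standard trick behind this footnote: fold the threshold θ into the weight vector by adding a constant feature x0 = 1. A small sketch (the particular weights and threshold are illustrative assumptions):

```python
# w . x >= theta  is equivalent to  (-theta, w) . (1, x) >= 0.

w, theta = [2.0, -1.0, 3.0], 1.5
w_aug = [-theta] + w                      # w' = (-theta, w1, ..., wn)

def thresholded(x):
    return sum(wi * xi for wi, xi in zip(w, x)) >= theta

def zero_thresholded(x):
    x_aug = [1.0] + x                     # x' = (1, x1, ..., xn)
    return sum(wi * xi for wi, xi in zip(w_aug, x_aug)) >= 0

# The two forms agree on every input
for x in ([0.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]):
    assert thresholded(x) == zero_thresholded(x)
```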

Geometric View

Perceptron learning rule

1. Initialize w = 0.
2. Cycle through all examples:
   a. Predict the label of instance x to be y' = sgn(w·x).
   b. If y' ≠ y, update the weight vector: w = w + r·y·x (r is a constant, the learning rate). Otherwise, if y' = y, leave the weights unchanged.

If x is Boolean, only the weights of active features are updated.

Perceptron Learnability

Obviously, it can't learn what it can't represent: only linearly separable functions. Minsky and Papert (1969) wrote an influential book demonstrating Perceptron's representational limitations: parity functions cannot be learned (XOR); in vision, if patterns are represented with local features, it cannot represent symmetry or connectivity. Research on Neural Networks stopped for years. Rosenblatt himself (1959) asked, "What pattern recognition problems can be transformed so as to become linearly separable?"

[Figure: (x1 ∧ x2) ∨ (x3 ∧ x4) becomes linearly separable over the new features y1 = x1 ∧ x2 and y2 = x3 ∧ x4.]

Perceptron Convergence

Perceptron Convergence Theorem: if there exists a set of weights that is consistent with the data (i.e., the data is linearly separable), the perceptron learning algorithm will converge.
- How long does it take to converge?

Perceptron Cycling Theorem: if the training data is not linearly separable, the perceptron learning algorithm will eventually repeat the same set of weights and thereby enter an infinite loop.
- How to provide robustness and greater expressivity?

Perceptron: Mistake Bound Theorem

Maintains a weight vector w ∈ Rᴺ, w₀ = (0, …, 0). After receiving an example x ∈ Rᴺ, it predicts according to the linear threshold function w·x ≥ 0.

Theorem [Novikoff, 1963]: Let (x₁, y₁), …, (x_t, y_t) be a sequence of labeled examples with x_i ∈ Rᴺ, ||x_i|| ≤ R and y_i ∈ {-1, 1} for all i. Let u ∈ Rᴺ and γ > 0 be such that ||u|| = 1 and y_i u·x_i ≥ γ for all i. Then Perceptron makes at most R²/γ² mistakes on this example sequence. (γ is the margin; R is the complexity parameter; see the additional notes.)

Perceptron Mistake Bound — Proof

Let v_k be the hypothesis before the k-th mistake, and assume the k-th mistake occurs on the input example (x_i, y_i). Assumptions: v₁ = 0, ||u|| ≤ 1, y_i u·x_i ≥ γ.

The update is v_{k+1} = v_k + y_i x_i. Multiplying by u: v_{k+1}·u = v_k·u + y_i (u·x_i) ≥ v_k·u + γ (by the definition of u), so by induction v_{k+1}·u ≥ kγ.

For the projection: ||v_{k+1}||² = ||v_k||² + 2 y_i (v_k·x_i) + ||x_i||² ≤ ||v_k||² + R², since a mistake means y_i (v_k·x_i) ≤ 0; by induction, ||v_{k+1}||² ≤ kR².

Combining: kγ ≤ v_{k+1}·u ≤ ||v_{k+1}|| ≤ √k · R, hence k < R²/γ².
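The bound can be checked empirically. A sketch: the random 2D dataset, which is separable by the unit vector u = (1, 0) with margin at least 1, is an illustrative assumption.

```python
import math, random

random.seed(0)
data = []
for _ in range(200):
    x1 = random.uniform(1.0, 3.0) * random.choice([-1, 1])   # |x1| >= 1: margin
    x2 = random.uniform(-3.0, 3.0)
    data.append(((x1, x2), 1 if x1 > 0 else -1))

u = (1.0, 0.0)                                       # a unit-norm separator
gamma = min(y * (u[0]*x[0] + u[1]*x[1]) for x, y in data)   # margin of u
R = max(math.hypot(*x) for x, _ in data)             # radius of the data

w, mistakes = [0.0, 0.0], 0
for _ in range(50):                                  # cycle until consistent
    errs = 0
    for x, y in data:
        if (1 if w[0]*x[0] + w[1]*x[1] >= 0 else -1) != y:
            errs += 1
            w = [w[0] + y * x[0], w[1] + y * x[1]]   # Perceptron update
    mistakes += errs
    if errs == 0:
        break

assert mistakes <= R**2 / gamma**2                   # Novikoff's bound holds
```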

Perceptron for Boolean Functions

How many mistakes will the Perceptron algorithm make when learning a k-disjunction over n attributes? It can make O(n·k) mistakes.

Our bound R²/γ²: take u with weight 1/√k on each of the k relevant components and 0 on the others, so ||u|| = 1; the margin is γ = 1/√k (examples on either side of the threshold differ in only one relevant variable); and R = √n. Thus we get n·k.

Is it possible to do better? This is important when n, the number of features, is large.

Winnow Algorithm

The Winnow algorithm learns Linear Threshold Functions. For the class of disjunctions, instead of demotion we can use elimination (set the weight to zero).
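A sketch of Winnow for monotone disjunctions: promotion doubles active weights on a false negative; demotion halves them (or, in the elimination variant mentioned above, zeroes them) on a false positive. The threshold θ = n and update factor 2 are standard for Winnow (Littlestone, 1988); the toy target x1 ∨ x2 is an illustrative assumption.

```python
import itertools

def winnow(examples, n, eliminate=False):
    w, theta, mistakes = [1.0] * n, float(n), 0
    for _ in range(100):                       # cycle until a clean pass
        errs = 0
        for x, y in examples:
            y_pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0
            if y_pred == y:
                continue
            errs += 1
            if y == 1:                         # false negative: promote active weights
                w = [wi * 2 if xi else wi for wi, xi in zip(w, x)]
            else:                              # false positive: demote (or eliminate)
                w = [(0.0 if eliminate else wi / 2) if xi else wi
                     for wi, xi in zip(w, x)]
        mistakes += errs
        if errs == 0:
            break
    return w, mistakes

# Target disjunction x1 v x2 over n = 8 Boolean variables
n = 8
examples = [(x, int(x[0] or x[1])) for x in itertools.product([0, 1], repeat=n)]
w, m = winnow(examples, n)
assert all((1 if sum(wi * xi for wi, xi in zip(w, x)) >= n else 0) == y
           for x, y in examples)
```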

Winnow - Example

Winnow - Example

Notice that the same algorithm will learn a conjunction over these variables ( w = (256, 256, 0, …, 32, …, 256, 256) ).

Winnow - Mistake Bound

Claim: Winnow makes O(k log n) mistakes on k-disjunctions.
Let u be the number of mistakes on positive examples (promotions) and v the number of mistakes on negative examples (demotions).

1. u < k log(2n)
A weight that corresponds to a good (relevant) variable is only promoted. Once these weights reach n, there are no more mistakes on positives.

2. v < 2(u + 1)
Total weight: TW = n initially.
Mistake on a positive example: TW(t+1) < TW(t) + n.
Mistake on a negative example: TW(t+1) < TW(t) - n/2.
Since 0 < TW < n + u·n - v·n/2, it follows that v < 2(u + 1).

Total number of mistakes: u + v < 3u + 2 = O(k log n).
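The O(k log n) claim can be checked empirically. A sketch: the random example stream, activation probability, and the promotion/demotion factor of 2 with threshold θ = n are assumptions matching the analysis above.

```python
import math, random

def winnow_mistakes(n, k, trials=2000, seed=0):
    """Count Winnow's mistakes on a stream labeled by the disjunction x1 v ... v xk."""
    rng = random.Random(seed)
    w, theta, mistakes = [1.0] * n, float(n), 0
    relevant = range(k)
    for _ in range(trials):
        x = [rng.random() < 0.1 for _ in range(n)]       # sparse random example
        y = int(any(x[i] for i in relevant))             # label by the k-disjunction
        y_pred = int(sum(wi for wi, xi in zip(w, x) if xi) >= theta)
        if y_pred != y:
            mistakes += 1
            f = 2.0 if y == 1 else 0.5                   # promote or demote
            w = [wi * f if xi else wi for wi, xi in zip(w, x)]
    return mistakes

for n in (64, 256, 1024):
    m = winnow_mistakes(n, k=3)
    # bound from the slides: u + v < 3u + 2 with u < k*log2(2n)
    assert m < 3 * 3 * math.log2(2 * n) + 2
```

Growing n by a factor of 16 only adds a constant to the bound, which is the key contrast with Perceptron's n·k.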

Winnow - Extensions

This algorithm learns monotone functions (in the Boolean-algebra sense). For the general case:
- Duplicate variables: for the negation of variable x, introduce a new variable y; learn monotone functions over 2n variables.
- Balanced version: keep two weights for each variable; the effective weight is their difference.

Winnow - A Robust Variation

Winnow is robust in the presence of various kinds of noise (classification noise, attribute noise). Importance: sometimes we learn under one distribution but test under a slightly different one (e.g., natural language applications).

Winnow - A Robust Variation

Modeling:
- Adversary's turn: may change the target concept by adding or removing some variable from the target disjunction. The cost of each addition move is 1.
- Learner's turn: makes a prediction on the given example, and is then told the correct answer (according to the current target function).

Winnow-R: same as Winnow, but it does not let weights fall below 1/2.

Claim: Winnow-R makes O(c log n) mistakes (c is the cost of the adversary) — a generalization of the previous claim.

Algorithmic Approaches

Focus: two families of algorithms (their on-line representatives):
- Additive update algorithms: Perceptron; SVM (not on-line, but a close relative of Perceptron).
- Multiplicative update algorithms: Winnow; close relatives: Boosting, Max Entropy.

Which algorithm should we choose?

Algorithm Descriptions

- Additive weight update algorithm (Perceptron; Rosenblatt, 1958; variations exist).
- Multiplicative weight update algorithm (Winnow; Littlestone, 1988; variations exist).

How to Compare?

- Generalization (since the representation is the same): how many examples are needed to reach a given level of accuracy?
- Efficiency: how long does it take to learn a hypothesis and to evaluate it (per example)?
- Robustness; adaptation to a new domain, …

Sentence Representation

S = "I don't know whether to laugh or cry"
- Define a set of features: features are relations that hold in the sentence.
- Map a sentence to its feature-based representation. The feature-based representation will provide some of the information in the sentence.
- Use this as an example for your learning algorithm.
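The mapping from a sentence to a feature-based representation can be sketched as below. The particular feature set (unigrams plus adjacent-word pairs) is an illustrative assumption, not the slide's exact relations.

```python
def features(sentence):
    """Map a sentence to the set of features (relations) that hold in it."""
    words = sentence.lower().split()
    feats = set(words)                                       # unigram features
    feats |= {f"{a}_{b}" for a, b in zip(words, words[1:])}  # adjacency relations
    return feats

S = "I don't know whether to laugh or cry"
f = features(S)
assert "laugh" in f and "laugh_or" in f
# A sparse Boolean vector over a fixed feature vocabulary can then be fed
# to Perceptron or Winnow as one example.
```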

