Decision Tree Learning

Presentation Transcript

Slide 1

Decision Tree Learning Decision Trees (Mitchell 1997, Russell & Norvig 2003) Decision tree induction is a simple yet powerful learning paradigm. In this method a set of training examples is broken down into smaller and smaller subsets while, at the same time, an associated decision tree is incrementally developed. At the end of the learning process, a decision tree covering the training set is returned. The decision tree can be thought of as a set of sentences (in Disjunctive Normal Form) written in propositional logic. Some characteristics of problems that are well suited to Decision Tree Learning are:
- Attribute-value paired elements
- Discrete target function
- Disjunctive descriptions (of the target function)
- Works well with missing or noisy training data

Slide 2

Berkeley Chapter 18, p.13

Slide 3

Berkeley Chapter 18, p.14

Slide 4

Berkeley Chapter 18, p.14

Slide 5

Berkeley Chapter 18, p.15

Slide 6

Berkeley Chapter 18, p.20

Slide 7

Decision Tree Learning (Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak) [See: Tom M. Mitchell, Machine Learning, McGraw-Hill, 1997]

Slide 8

Decision Tree Learning [See: Tom M. Mitchell, Machine Learning, McGraw-Hill, 1997]

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No

Slide 9

Decision Tree Learning Building a Decision Tree:
1. Test all attributes and select the one that would work best as the root;
2. Break the training set into subsets based on the branches of the root node;
3. Test the remaining attributes to see which ones fit best underneath the branches of the root node;
4. Continue this process for all other branches until:
   - all examples of a subset are of one type, or
   - there are no examples left (return the majority classification of the parent), or
   - there are no more attributes left (the default value should be the majority classification)
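The recursive procedure above can be sketched as a small ID3-style learner. This is an illustrative sketch, not Mitchell's exact pseudocode: the tuple-based tree representation and the helper names (`entropy`, `gain`, `id3`) are assumptions of this example.

```python
from collections import Counter
from math import log2

# The PlayTennis training set from the table above.
FIELDS = ["Outlook", "Temperature", "Humidity", "Wind", "label"]
DATA = [dict(zip(FIELDS, row.split())) for row in """\
Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No""".splitlines()]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(rows, attr):
    base = entropy([r["label"] for r in rows])
    for value in {r[attr] for r in rows}:
        subset = [r["label"] for r in rows if r[attr] == value]
        base -= len(subset) / len(rows) * entropy(subset)
    return base

def id3(rows, attrs):
    labels = [r["label"] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        # All examples of one type, or no attributes left: return a leaf.
        return majority
    best = max(attrs, key=lambda a: gain(rows, a))   # best attribute splits here
    branches = {v: id3([r for r in rows if r[best] == v],
                       [a for a in attrs if a != best])
                for v in {r[best] for r in rows}}
    # (attribute, branches, majority): the majority label is the fallback
    # for attribute values with no remaining examples.
    return (best, branches, majority)

tree = id3(DATA, FIELDS[:-1])
```

Running this on the table reproduces the tree derived on the following slides: Outlook at the root, with Humidity under Sunny and Wind under Rain.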

Slide 10

Decision Tree Learning Determining which attribute is best (Entropy & Gain):

Entropy (E) is the minimum number of bits needed in order to classify an arbitrary example as yes or no:

E(S) = Σ(i=1..c) −p_i log2 p_i

where S is a set of training examples, c is the number of classes, and p_i is the proportion of the training set that is of class i. For our entropy equation, 0 log2 0 = 0.

The information gain G(S,A), where A is an attribute, is:

G(S,A) ≡ E(S) − Σ(v ∈ Values(A)) (|S_v|/|S|) * E(S_v)
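These two formulas translate directly into code. A minimal sketch, where the helper names are mine and class counts are passed as plain lists (e.g. `[9, 5]` for [9+,5−]):

```python
from math import log2

def entropy(counts):
    """E(S) = sum_i -p_i log2(p_i) over class counts, taking 0*log2(0) = 0."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c)

def gain(parent_counts, subset_counts):
    """G(S,A) = E(S) - sum_v (|S_v|/|S|) * E(S_v), with one count list per
    value v of attribute A."""
    n = sum(parent_counts)
    return entropy(parent_counts) - sum(
        sum(s) / n * entropy(s) for s in subset_counts)
```

For the PlayTennis set, `entropy([9, 5])` gives about 0.94, and `gain([9, 5], [[2, 3], [4, 0], [3, 2]])` recovers the Outlook gain (≈0.246) computed on the later slides.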

Slide 11

Decision Tree Learning Let's try an example! Let E([X+,Y-]) represent the entropy of a set with X positive training elements and Y negative elements. The entropy of the training data, E(S), can therefore be written as E([9+,5-]): of the 14 training examples, 9 of them are yes and 5 of them are no.

Slide 12

Decision Tree Learning: A Simple Example Let's start off by calculating the entropy of the training set: E(S) = E([9+,5-]) = (-9/14 log2 9/14) + (-5/14 log2 5/14) = 0.94

Slide 13

Decision Tree Learning: A Simple Example Next we calculate the information gain G(S,A) for each attribute A, where A is taken from the set {Outlook, Temperature, Humidity, Wind}.

Slide 14

Decision Tree Learning: A Simple Example The information gain for Outlook is:
G(S,Outlook) = E(S) - [5/14 * E(Outlook=sunny) + 4/14 * E(Outlook=overcast) + 5/14 * E(Outlook=rain)]
G(S,Outlook) = E([9+,5-]) - [5/14 * E([2+,3-]) + 4/14 * E([4+,0-]) + 5/14 * E([3+,2-])]
G(S,Outlook) = 0.94 - [5/14 * 0.971 + 4/14 * 0.0 + 5/14 * 0.971]
G(S,Outlook) = 0.246

Slide 15

Decision Tree Learning: A Simple Example
G(S,Temperature) = 0.94 - [4/14 * E(Temperature=hot) + 6/14 * E(Temperature=mild) + 4/14 * E(Temperature=cool)]
G(S,Temperature) = 0.94 - [4/14 * E([2+,2-]) + 6/14 * E([4+,2-]) + 4/14 * E([3+,1-])]
G(S,Temperature) = 0.94 - [4/14 * 1.0 + 6/14 * 0.918 + 4/14 * 0.811]
G(S,Temperature) = 0.029

Slide 16

Decision Tree Learning: A Simple Example
G(S,Humidity) = 0.94 - [7/14 * E(Humidity=high) + 7/14 * E(Humidity=normal)]
G(S,Humidity) = 0.94 - [7/14 * E([3+,4-]) + 7/14 * E([6+,1-])]
G(S,Humidity) = 0.94 - [7/14 * 0.985 + 7/14 * 0.592]
G(S,Humidity) = 0.1515

Slide 17

Decision Tree Learning: A Simple Example
G(S,Wind) = 0.94 - [8/14 * E(Wind=weak) + 6/14 * E(Wind=strong)]
G(S,Wind) = 0.94 - [8/14 * E([6+,2-]) + 6/14 * E([3+,3-])]
G(S,Wind) = 0.94 - [8/14 * 0.811 + 6/14 * 1.00]
G(S,Wind) = 0.048
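All four gains can be checked with a short script over the full table. This is an illustrative sketch with names of my own choosing; note that the slides round intermediate entropies first, so exact arithmetic gives, e.g., ≈0.152 for Humidity rather than 0.1515.

```python
from collections import Counter
from math import log2

# Columns: Outlook, Temperature, Humidity, Wind, PlayTennis
DATA = [row.split() for row in """\
Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No""".splitlines()]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(rows, col):
    base = entropy([r[-1] for r in rows])
    for value in {r[col] for r in rows}:
        subset = [r[-1] for r in rows if r[col] == value]
        base -= len(subset) / len(rows) * entropy(subset)
    return base

gains = {name: gain(DATA, i)
         for i, name in enumerate(["Outlook", "Temperature", "Humidity", "Wind"])}
```

The resulting dictionary matches the hand calculations on the preceding slides, and Outlook has the largest gain.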

Slide 18

Decision Tree Learning: A Simple Example Outlook is our winner! It has the largest information gain, so it becomes the root of the tree.

Slide 19

Decision Tree Learning: A Simple Example Now that we have found the root of our decision tree, we recursively find the nodes that should go beneath Sunny, Overcast, and Rain.

Slide 20

Decision Tree Learning: A Simple Example
G(Outlook=Rain, Humidity) = 0.971 - [2/5 * E(Outlook=Rain ∧ Humidity=high) + 3/5 * E(Outlook=Rain ∧ Humidity=normal)]
G(Outlook=Rain, Humidity) = 0.02
G(Outlook=Rain, Wind) = 0.971 - [3/5 * 0 + 2/5 * 0]
G(Outlook=Rain, Wind) = 0.971
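The Rain-branch gains above can be verified directly from the [X+,Y−] counts in the table (a small sketch; the helper and variable names are mine):

```python
from math import log2

def entropy(pos, neg):
    """Entropy of a two-class set given positive and negative counts."""
    total = pos + neg
    return -sum(c / total * log2(c / total) for c in (pos, neg) if c)

# The Rain subset (D4, D5, D6, D10, D14) has 3 yes and 2 no:
rain_entropy = entropy(3, 2)

# Humidity splits it into high = [1+,1-] and normal = [2+,1-]:
g_humidity = rain_entropy - (2/5 * entropy(1, 1) + 3/5 * entropy(2, 1))

# Wind splits it into strong = [0+,2-] and weak = [3+,0-], both pure:
g_wind = rain_entropy - (2/5 * entropy(0, 2) + 3/5 * entropy(3, 0))
```

Wind's gain equals the full subset entropy (both branches are pure), so Wind is chosen under Rain.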

Slide 21

Decision Tree Learning: A Simple Example Now our decision tree looks like this:

Slide 22

Decision Trees: Other Issues There are a number of issues related to decision tree learning (Mitchell 1997):
- Overfitting avoidance
- Overfit recovery (post-pruning)
- Working with continuous-valued attributes
- Other methods for attribute selection
- Working with missing values:
  - most common value
  - most common value at node K
  - value based on probability
- Dealing with attributes with different costs

Slide 23

Decision Tree Learning: Other Related Issues Overfitting occurs when our learning algorithm continues to develop hypotheses that reduce training set error at the cost of increased test set error. According to Mitchell, a hypothesis h is said to overfit the training set D when there exists a hypothesis h' that outperforms h on the total distribution of instances of which D is a subset. We can attempt to avoid overfitting by using a validation set: if we see that a subsequent tree reduces training set error but at the cost of increased validation set error, then we know we can stop growing the tree.

Slide 24

Decision Tree Learning: Reduced Error Pruning In Reduced Error Pruning:
Step 1. Grow the decision tree with respect to the training set.
Step 2. Randomly select and remove a node.
Step 3. Replace the node with its majority classification.
Step 4. If the performance of the modified tree is just as good or better on the validation set as the current tree, then set the current tree equal to the modified tree.
While (not done), goto Step 2.
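The loop above can be sketched over a simple tuple-based tree. This is an assumption-laden illustration: the tree encoding, the validation set, and the helper names are all made up for this example, and nodes are tried in a fixed deepest-first order rather than randomly so the sketch stays deterministic.

```python
# A leaf is a plain label string; an internal node is a tuple
# (attribute, {value: subtree}, majority_label).

def classify(tree, example):
    while not isinstance(tree, str):
        attr, branches, majority = tree
        tree = branches.get(example.get(attr), majority)
    return tree

def accuracy(tree, data):
    return sum(classify(tree, ex) == ex["label"] for ex in data) / len(data)

def internal_paths(tree, prefix=()):
    # Enumerate prunable (internal) nodes, deepest first.
    if isinstance(tree, str):
        return
    _, branches, _ = tree
    for value, subtree in branches.items():
        yield from internal_paths(subtree, prefix + (value,))
    yield prefix

def collapse(tree, path):
    # Steps 2-3: copy of `tree` with the node at `path` replaced by
    # its majority classification.
    if not path:
        return tree[2]
    attr, branches, majority = tree
    pruned = dict(branches)
    pruned[path[0]] = collapse(branches[path[0]], path[1:])
    return (attr, pruned, majority)

def reduced_error_prune(tree, validation):
    # Step 4: keep any collapse that does at least as well on the
    # validation set as the current tree; repeat until no change.
    improved = True
    while improved:
        improved = False
        for path in internal_paths(tree):
            candidate = collapse(tree, path)
            if accuracy(candidate, validation) >= accuracy(tree, validation):
                tree = candidate
                improved = True
                break
    return tree

# The PlayTennis tree with stored majority labels, and a toy
# validation set (both invented for illustration):
tree = ("Outlook", {
    "Sunny": ("Humidity", {"High": "No", "Normal": "Yes"}, "No"),
    "Overcast": "Yes",
    "Rain": ("Wind", {"Strong": "No", "Weak": "Yes"}, "Yes"),
}, "Yes")
validation = [
    {"Outlook": "Rain", "Wind": "Strong", "label": "Yes"},
    {"Outlook": "Rain", "Wind": "Weak", "label": "Yes"},
    {"Outlook": "Sunny", "Humidity": "High", "label": "No"},
]
pruned = reduced_error_prune(tree, validation)
```

On this toy validation set the Wind subtree under Rain hurts accuracy, so it is collapsed to the majority label Yes.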

Slide 25

Decision Tree Learning: Other Related Issues However, the method of choice for preventing overfitting is to use post-pruning. In post-pruning, we initially grow the tree based on the training set without concern for overfitting. Once the tree has been generated, we can prune part of it and see how the resulting tree performs on the validation set (made up of about 1/3 of the available training instances). The two types of post-pruning methods are: Reduced Error Pruning, and Rule Post-Pruning.

Slide 26

Decision Tree Learning: Rule Post-Pruning In Rule Post-Pruning:
Step 1. Grow the decision tree with respect to the training set.
Step 2. Convert the tree into a set of rules.
Step 3. Remove antecedents that result in a reduction of the validation set error rate.
Step 4. Sort the resulting list of rules based on their accuracy, and use this sorted list as a sequence for classifying unseen instances.
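Step 2 (tree-to-rules conversion) can be sketched with one recursive function; the rule encoding and names here are my own, not Mitchell's.

```python
def tree_to_rules(tree, conditions=()):
    """Flatten a decision tree into (conditions, label) rules, one per leaf.
    Leaves are label strings; internal nodes are (attribute, {value: subtree})."""
    if isinstance(tree, str):
        return [(conditions, tree)]
    attr, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules += tree_to_rules(subtree, conditions + ((attr, value),))
    return rules

# The PlayTennis tree from the earlier slides:
tree = ("Outlook", {
    "Sunny": ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",
    "Rain": ("Wind", {"Strong": "No", "Weak": "Yes"}),
})
rules = tree_to_rules(tree)
```

This yields the five rules shown on the next slide; Step 3 would then try dropping individual (attribute, value) antecedents from each rule and keep any drop that does not hurt validation accuracy.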

Slide 27

Decision Tree Learning: Rule Post-Pruning Given the decision tree:
Rule 1: If (Outlook = sunny ∧ Humidity = high) Then No
Rule 2: If (Outlook = sunny ∧ Humidity = normal) Then Yes
Rule 3: If (Outlook = overcast) Then Yes
Rule 4: If (Outlook = rain ∧ Wind = strong) Then No
Rule 5: If (Outlook = rain ∧ Wind = weak) Then Yes

Slide 28

Decision Tree Learning: Other Methods for Attribute Selection The information gain equation, G(S,A), presented earlier is biased toward attributes that have a large number of values over attributes that have fewer values. Such 'super attributes' will easily be selected as the root, resulting in a broad tree that classifies the training set perfectly but performs poorly on unseen examples. We can penalize attributes with large numbers of values by using an alternative method for attribute selection, referred to as GainRatio.

Slide 29

Decision Tree Learning: Using GainRatio for Attribute Selection Let

SplitInformation(S,A) = -Σ(i=1..v) (|S_i|/|S|) log2 (|S_i|/|S|)

where v is the number of values of attribute A and S_i is the subset of S for which A takes its i-th value. Then:

GainRatio(S,A) = G(S,A) / SplitInformation(S,A)
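A minimal sketch of both formulas, applied to Outlook (which splits the 14 examples into subsets of size 5, 4, and 5). The function names are mine, and the 'Day' attribute at the end is a hypothetical many-valued attribute used only to show the penalty.

```python
from math import log2

def split_information(subset_sizes):
    """SplitInformation(S,A) = -sum_i (|S_i|/|S|) log2(|S_i|/|S|)."""
    n = sum(subset_sizes)
    return -sum(s / n * log2(s / n) for s in subset_sizes if s)

def gain_ratio(gain, subset_sizes):
    return gain / split_information(subset_sizes)

# Outlook: G(S,Outlook) = 0.246 from the earlier slide, subsets of 5, 4, 5.
ratio = gain_ratio(0.246, [5, 4, 5])

# A hypothetical 'Day' attribute with one example per value gets a large
# denominator, SplitInformation = log2(14) ~ 3.81, shrinking its ratio.
day_penalty = split_information([1] * 14)
```

So even though a one-example-per-value attribute would have maximal information gain, its large SplitInformation sharply reduces its GainRatio.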

Slide 30

Decision Tree Learning: Dealing with Attributes of Different Cost Sometimes the best attribute for splitting the training elements is very costly to measure. In order to make the overall decision process more cost effective, we may wish to penalize the information gain of an attribute by its cost:

G'(S,A) = G(S,A) / Cost(A)
G'(S,A) = G(S,A)^2 / Cost(A) [see Mitchell 1997]
G'(S,A) = (2^G(S,A) - 1) / (Cost(A)+1)^w [see Mitchell 1997]
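The three penalized variants are one-liners. In this sketch the cost value 2.0 is hypothetical, chosen only to show the effect on the Outlook gain from earlier.

```python
def cost_penalized_gains(gain, cost, w=0.5):
    """Three cost-sensitive alternatives to plain information gain:
    simple division, G^2/Cost, and (2^G - 1)/(Cost+1)^w, where
    w in [0, 1] weights the importance of cost (Mitchell 1997)."""
    simple = gain / cost
    squared = gain ** 2 / cost
    exponential = (2 ** gain - 1) / (cost + 1) ** w
    return simple, squared, exponential

# Example: penalizing the Outlook gain (0.246) with a hypothetical cost of 2.
simple, squared, exponential = cost_penalized_gains(0.246, 2.0, w=1.0)
```

All three shrink the gain of costly attributes, so a slightly less informative but cheaper attribute can win the split.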