
5/14/23 7:32 AM

Pecan Analysis data

Got excel file NAmPecanAll.xls

Made into ARFF format

Deleted columns that were categorical:
    SOIL
    SOIL2

Made all cols numeric except a few:
    SEASON 1,2,3,4
    CLASS D,W
    Pecan 0,1

Made pecans.arff
    Header from the “variable names” worksheet
    Exported the data as data.csv
    Combined (in Word) as Pecans.arff
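The header-plus-data assembly could also be scripted rather than pasted together in Word. A minimal sketch, assuming the attribute declarations are already written as ARFF `@attribute` lines; the function name `build_arff` and the two sample attributes are hypothetical stand-ins for the real header:

```python
def build_arff(relation, attribute_lines, csv_rows):
    """Join an ARFF header and comma-separated data rows into one ARFF string."""
    parts = ["@relation " + relation, ""]
    parts.extend(attribute_lines)
    parts.extend(["", "@data"])
    # Skip empty lines so a trailing newline in the CSV does not become a row.
    parts.extend(row.strip() for row in csv_rows if row.strip())
    return "\n".join(parts) + "\n"


# Hypothetical two-attribute example; the real file has about 100 attributes.
header = ["@attribute MWM numeric", "@attribute Pecan {0,1}"]
rows = ["24.28,0", "27.67,1", ""]
arff_text = build_arff("PresenceOfPecans", header, rows)
print(arff_text)
```

Doing it this way means a fix made in data.csv is picked up the next time the file is rebuilt.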

Trouble at line 191
    If we include thru case 4, it’s OK (pecans-small.arff)
    If thru 100, not OK (pecans-small2.arff)
    1-89 (pecans-small3): no good
    1-79 (pecans-small4): OK
    therefore the problem is between 80 & 89

looked in the Excel file
    #83 starts to be a real number, not 1-4
    fixed in data.csv only
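Instead of bisecting the file by hand, a column that is declared with a fixed set of values could be checked directly. A minimal sketch, assuming the CSV has a header row with column names (the notes do not say data.csv has one) and that the offending column is SEASON; the function name and sample rows are hypothetical:

```python
import csv
import io


def find_bad_nominal(csv_text, column, allowed):
    """Return (row_number, value) pairs where `column` holds a value outside `allowed`.

    Row numbers are 1-based and count data rows only (the header row is skipped).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    bad = []
    for row_number, row in enumerate(reader, start=1):
        value = row[column].strip()
        if value not in allowed:
            bad.append((row_number, value))
    return bad


# Hypothetical sample: the third data row has a real number where 1-4 is expected.
sample = "Site,SEASON\n1,2\n2,4\n3,2.75\n"
problems = find_bad_nominal(sample, "SEASON", {"1", "2", "3", "4"})
print(problems)  # [(3, '2.75')]
```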

blew up on line 2898
    fixed RCORR on station 2790
    it was blank; made it 1.0000
    fixed only in pecans.arff

blew up on line 2921 = station 2813
    it was formatted as 1,032.67
    removed the commas in data.csv
    also fixed 2790

re-made pecans.arff
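Both failures (a blank numeric field and a number written with a thousands separator) could be caught before the ARFF file is made. A minimal sketch; the function name `clean_numeric` is hypothetical, and filling a blank with a default mirrors the 1.0000 choice in these notes rather than any general rule:

```python
def clean_numeric(field, blank_default=None):
    """Normalize one CSV field so it can be read as a plain number.

    Removes thousands separators and either fills or rejects blank fields.
    """
    text = field.strip().replace(",", "")
    if text == "":
        if blank_default is None:
            raise ValueError("blank numeric field")
        return blank_default
    float(text)  # raises ValueError if the field is still not numeric
    return text


print(clean_numeric("1,032.67"))    # 1032.67
print(clean_numeric("", "1.0000"))  # 1.0000
```

Note that ARFF also has its own marker for a missing value, a single `?`, which is an alternative to filling in a number.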

ran with Weka Explorer: J48 -C 0.25 -M 2

remember to give extra memory: java -Xmx300m -jar weka.jar

1 of 27

J48 classifier - ran 10-fold cross-validation in about 5 mins

=== Run information ===

Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
Relation:     PresenceOfPecans-weka.filters.unsupervised.attribute.Remove-R1
Instances:    4637
Attributes:   103
              [list of attributes omitted]
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

J48 pruned tree
------------------

MWM <= 24.28: 0 (2358.0/7.0)
MWM > 24.28
| RLOW <= 20.32: 0 (710.0/3.0)
| RLOW > 20.32
| | LPTOAE <= 0.068
| | | TRANGE <= 24.63
| | | | PERWRET <= 0.5714
| | | | | PERWLTG <= 0.375: 0 (14.0)
| | | | | PERWLTG > 0.375
| | | | | | COKLM <= 35.9: 0 (4.0/1.0)
| | | | | | COKLM > 35.9: 1 (12.0)
| | | | PERWRET > 0.5714
| | | | | RLOW <= 89.92
| | | | | | RRCORR3 <= 7.5
| | | | | | | LCRR <= 3.2087: 0 (560.0/1.0)
| | | | | | | LCRR > 3.2087
| | | | | | | | ELEV <= 15
| | | | | | | | | MWM <= 27.67: 0 (5.0)
| | | | | | | | | MWM > 27.67: 1 (6.0/1.0)
| | | | | | | | ELEV > 15: 0 (28.0)
| | | | | | RRCORR3 > 7.5
| | | | | | | SEASON = 1: 0 (0.0)
| | | | | | | SEASON = 2: 0 (0.0)
| | | | | | | SEASON = 3
| | | | | | | | ELEV <= 413: 1 (6.0)
| | | | | | | | ELEV > 413: 0 (5.0)
| | | | | | | SEASON = 4: 0 (24.0/1.0)
| | | | | RLOW > 89.92
| | | | | | RRCORR3 <= 2.5: 0 (28.0)
| | | | | | RRCORR3 > 2.5
| | | | | | | PGROW <= 29
| | | | | | | | LPET <= 3.028
| | | | | | | | | ELEV <= 290
| | | | | | | | | | WSTORAGE <= 120.4052: 1 (10.0)
| | | | | | | | | | WSTORAGE > 120.4052
| | | | | | | | | | | TEMP <= 55.6976: 1 (8.0)
| | | | | | | | | | | TEMP > 55.6976
| | | | | | | | | | | | AVWAT <= 7: 0 (5.0)
| | | | | | | | | | | | AVWAT > 7
| | | | | | | | | | | | | SUCSTAB <= 0.0111: 1 (4.0)
| | | | | | | | | | | | | SUCSTAB > 0.0111: 0 (4.0)
| | | | | | | | | ELEV > 290: 0 (4.0)
| | | | | | | | LPET > 3.028: 0 (6.0)
| | | | | | | PGROW > 29: 0 (6.0)
| | | TRANGE > 24.63

2 of 27

| | | | LPTOWATR <= 1.0465
| | | | | ELEV <= 1051
| | | | | | LEXPREY <= 2.7946
| | | | | | | LCOKLM <= 2.3347: 0 (16.0)
| | | | | | | LCOKLM > 2.3347
| | | | | | | | LCMAT <= 1.6691
| | | | | | | | | LBIO5 <= 4.3056: 1 (3.0)
| | | | | | | | | LBIO5 > 4.3056
| | | | | | | | | | SEASON = 1
| | | | | | | | | | | SUCSTAB <= 0.0118
| | | | | | | | | | | | COKLM <= 336.8: 0 (7.0)
| | | | | | | | | | | | COKLM > 336.8
| | | | | | | | | | | | | ELEV <= 725: 1 (3.0)
| | | | | | | | | | | | | ELEV > 725: 0 (7.0)
| | | | | | | | | | | SUCSTAB > 0.0118: 1 (3.0)
| | | | | | | | | | SEASON = 2: 0 (13.0)
| | | | | | | | | | SEASON = 3: 0 (2.0)
| | | | | | | | | | SEASON = 4: 0 (0.0)
| | | | | | | | LCMAT > 1.6691
| | | | | | | | | WATRGRC <= 4
| | | | | | | | | | WATDGRC <= 2: 0 (4.0)
| | | | | | | | | | WATDGRC > 2
| | | | | | | | | | | REVEN <= 1.3688
| | | | | | | | | | | | EXPREY <= 292.2337: 1 (2.0)
| | | | | | | | | | | | EXPREY > 292.2337: 0 (11.0/1.0)
| | | | | | | | | | | REVEN > 1.3688: 1 (4.0)
| | | | | | | | | WATRGRC > 4
| | | | | | | | | | MCM <= -1.17
| | | | | | | | | | | WATDGRC <= 1
| | | | | | | | | | | | EXPREY <= 464.7716
| | | | | | | | | | | | | TEMP <= 46.4227: 0 (3.0)
| | | | | | | | | | | | | TEMP > 46.4227: 1 (2.0)
| | | | | | | | | | | | EXPREY > 464.7716: 1 (16.0/2.0)
| | | | | | | | | | | WATDGRC > 1
| | | | | | | | | | | | TEMP <= 47.1188: 1 (16.0)
| | | | | | | | | | | | TEMP > 47.1188: 0 (2.0)
| | | | | | | | | | MCM > -1.17
| | | | | | | | | | | MEDSTAB <= 0.054
| | | | | | | | | | | | RUNGRC <= 6
| | | | | | | | | | | | | WATRGRC <= 5: 0 (4.0/1.0)
| | | | | | | | | | | | | WATRGRC > 5
| | | | | | | | | | | | | | RRCORR <= -1.5
| | | | | | | | | | | | | | | WSTORAGE <= 155.083: 1 (2.0)
| | | | | | | | | | | | | | | WSTORAGE > 155.083: 0 (5.0/1.0)
| | | | | | | | | | | | | | RRCORR > -1.5: 1 (3.0)
| | | | | | | | | | | | RUNGRC > 6: 1 (5.0)
| | | | | | | | | | | MEDSTAB > 0.054: 0 (13.0/2.0)
| | | | | | LEXPREY > 2.7946: 0 (18.0)
| | | | | ELEV > 1051: 0 (44.0/1.0)
| | | | LPTOWATR > 1.0465
| | | | | ELEV <= 710: 1 (14.0)
| | | | | ELEV > 710: 0 (3.0/1.0)
| | LPTOAE > 0.068
| | | LCOKLM <= 1.8716: 0 (46.0)
| | | LCOKLM > 1.8716
| | | | ELEV <= 1205
| | | | | CLASS = D
| | | | | | CVRAIN <= 42.1182: 1 (6.0/1.0)

3 of 27

| | | | | | CVRAIN > 42.1182: 0 (16.0/1.0)
| | | | | CLASS = W
| | | | | | AVWAT <= 6
| | | | | | | LCRR <= 2.9411
| | | | | | | | PERWLTG <= 0.5714
| | | | | | | | | PTORUN <= 2.633: 0 (2.0)
| | | | | | | | | PTORUN > 2.633: 1 (12.0)
| | | | | | | | PERWLTG > 0.5714
| | | | | | | | | WATD <= 304.7457: 0 (12.0)
| | | | | | | | | WATD > 304.7457
| | | | | | | | | | RHIGH <= 115.1: 0 (9.0/2.0)
| | | | | | | | | | RHIGH > 115.1: 1 (6.0)
| | | | | | | LCRR > 2.9411
| | | | | | | | LWATD <= 2.2506
| | | | | | | | | WLTGRC <= 0
| | | | | | | | | | MWM <= 25.11: 0 (2.0)
| | | | | | | | | | MWM > 25.11
| | | | | | | | | | | PGROW <= 26: 1 (16.0)
| | | | | | | | | | | PGROW > 26
| | | | | | | | | | | | RLOW <= 64.26
| | | | | | | | | | | | | REVEN <= 1.3928: 0 (4.0)
| | | | | | | | | | | | | REVEN > 1.3928: 1 (5.0/1.0)
| | | | | | | | | | | | RLOW > 64.26: 1 (9.0)
| | | | | | | | | WLTGRC > 0: 0 (2.0)
| | | | | | | | LWATD > 2.2506: 1 (163.0/6.0)
| | | | | | AVWAT > 6
| | | | | | | PERWLTG <= 0.375
| | | | | | | | RRCORR <= -3.5
| | | | | | | | | SNOWAC <= 37.9349
| | | | | | | | | | SUCSTAB <= 0.0099
| | | | | | | | | | | ELEV <= 307: 1 (7.0)
| | | | | | | | | | | ELEV > 307: 0 (8.0/1.0)
| | | | | | | | | | SUCSTAB > 0.0099: 0 (21.0)
| | | | | | | | | SNOWAC > 37.9349: 1 (2.0)
| | | | | | | | RRCORR > -3.5
| | | | | | | | | LATITUDE <= 38.73
| | | | | | | | | | WATDGRC <= 2
| | | | | | | | | | | RHIGH <= 154.43
| | | | | | | | | | | | RHIGH <= 121.41: 0 (2.0)
| | | | | | | | | | | | RHIGH > 121.41: 1 (26.0)
| | | | | | | | | | | RHIGH > 154.43: 0 (4.0)
| | | | | | | | | | WATDGRC > 2
| | | | | | | | | | | LREVEN <= 0.0918: 1 (24.0/1.0)
| | | | | | | | | | | LREVEN > 0.0918
| | | | | | | | | | | | WSTORAGE <= 136.7894: 1 (36.0/7.0)
| | | | | | | | | | | | WSTORAGE > 136.7894
| | | | | | | | | | | | | SEASON = 1
| | | | | | | | | | | | | | MWM <= 27.5: 0 (6.0)
| | | | | | | | | | | | | | MWM > 27.5
| | | | | | | | | | | | | | | LATITUDE <= 36.63: 1 (4.0)
| | | | | | | | | | | | | | | LATITUDE > 36.63: 0 (2.0)
| | | | | | | | | | | | | SEASON = 2: 0 (5.0)
| | | | | | | | | | | | | SEASON = 3
| | | | | | | | | | | | | | ELEV <= 513
| | | | | | | | | | | | | | | WRET <= 107.2064: 1 (7.0)
| | | | | | | | | | | | | | | WRET > 107.2064: 0 (2.0)
| | | | | | | | | | | | | | ELEV > 513: 0 (3.0)

4 of 27

| | | | | | | | | | | | | SEASON = 4
| | | | | | | | | | | | | | REVEN <= 1.307: 1 (9.0/2.0)
| | | | | | | | | | | | | | REVEN > 1.307: 0 (9.0)
| | | | | | | | | LATITUDE > 38.73
| | | | | | | | | | EXPREY <= 366.9686: 1 (3.0)
| | | | | | | | | | EXPREY > 366.9686
| | | | | | | | | | | WATRGRC <= 3
| | | | | | | | | | | | ELEV <= 556: 1 (3.0)
| | | | | | | | | | | | ELEV > 556: 0 (5.0)
| | | | | | | | | | | WATRGRC > 3: 0 (9.0)
| | | | | | | PERWLTG > 0.375: 1 (14.0)
| | | | ELEV > 1205
| | | | | PERWRET <= 0.6364
| | | | | | LET <= 1.1751: 0 (52.0)
| | | | | | LET > 1.1751
| | | | | | | MWM <= 27.83: 0 (9.0)
| | | | | | | MWM > 27.83
| | | | | | | | TRANGE <= 20.88: 1 (4.0)
| | | | | | | | TRANGE > 20.88: 0 (2.0)
| | | | | PERWRET > 0.6364
| | | | | | RHIGH <= 126.24
| | | | | | | WLTGRC <= 0: 0 (8.0/1.0)
| | | | | | | WLTGRC > 0: 1 (2.0)
| | | | | | RHIGH > 126.24: 1 (7.0)

Number of Leaves : 96

Size of the tree : 185

Time taken to build model: 13.38 seconds
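In the J48 printout each leaf ends with the predicted class and a count such as (2358.0/7.0): the number of training instances that reach the leaf and, after the slash, how many of them the leaf gets wrong. A minimal sketch for pulling those counts out of the tree text; the regular expression and the function name are my own, and only a few lines of the tree are used as input:

```python
import re

# Matches the tail of a leaf line, e.g. ": 0 (2358.0/7.0)" or ": 1 (12.0)".
LEAF = re.compile(r": (\S+) \((\d+(?:\.\d+)?)(?:/(\d+(?:\.\d+)?))?\)\s*$")


def parse_leaf(line):
    """Return (class_label, instances_reaching_leaf, misclassified) or None."""
    match = LEAF.search(line)
    if match is None:
        return None  # an internal node, not a leaf
    label, covered, wrong = match.groups()
    return label, float(covered), float(wrong or 0.0)


lines = [
    "MWM <= 24.28: 0 (2358.0/7.0)",
    "MWM > 24.28",
    "| RLOW <= 20.32: 0 (710.0/3.0)",
    "| | | | | PERWLTG <= 0.375: 0 (14.0)",
]
leaves = [leaf for leaf in map(parse_leaf, lines) if leaf is not None]
print(leaves)
```

The first two leaves alone account for 3068 of the 4637 stations, almost all of them without pecans.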

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances        4368               94.1988 %
Incorrectly Classified Instances       269                5.8012 %
Kappa statistic                          0.6889
Mean absolute error                      0.0635
Root mean squared error                  0.2306
Relative absolute error                 33.6934 %
Root relative squared error             75.1416 %
Total Number of Instances             4637

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall  F-Measure   Class
  0.969     0.287      0.966     0.969      0.968    0
  0.713     0.031      0.73      0.713      0.721    1

=== Confusion Matrix ===

    a    b   <-- classified as
 4020  129 |    a = 0
  140  348 |    b = 1
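The per-class figures in the accuracy table follow directly from this confusion matrix (rows are the actual class, columns the predicted class). A minimal sketch that recomputes them; the function name is hypothetical:

```python
def class_metrics(matrix, index):
    """Precision, recall and F-measure for one class of a confusion matrix.

    matrix[i][j] = number of instances of actual class i classified as class j.
    """
    true_positive = matrix[index][index]
    actual_total = sum(matrix[index])
    predicted_total = sum(row[index] for row in matrix)
    recall = true_positive / actual_total
    precision = true_positive / predicted_total
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


confusion = [[4020, 129], [140, 348]]
for label in (0, 1):
    precision, recall, f_measure = class_metrics(confusion, label)
    print(label, round(precision, 3), round(recall, 3), round(f_measure, 3))
```

For class 1 (pecans present) the recall is 348/488, so roughly 29% of the pecan stations are missed even though overall accuracy is 94%.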

Conclusion:

5 of 27


94% accurate!!!
Kappa is low because the pecans are rare in the data set.
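The effect of the rare class on Kappa can be checked by hand from the cross-validation confusion matrix: Kappa compares the observed agreement with the agreement expected by chance from the row and column totals. A minimal sketch using Cohen's kappa formula; the function name is hypothetical:

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows actual, columns predicted)."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix)
        for i in range(len(matrix))
    ) / (total * total)
    return (observed - expected) / (1 - expected), observed, expected


kappa, observed, expected = cohen_kappa([[4020, 129], [140, 348]])
print(round(observed, 4), round(expected, 4), round(kappa, 4))
```

Because about 89% of stations have no pecans, chance agreement is already about 0.81, so a 94% hit rate only recovers about 69% of the remaining gap.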

Should be able to do this on the command line and get the classified instances
(looked in the Weka tutorial)

in the weka directory

java -mx300m weka.classifiers.trees.J48 -C 0.25 -M 2 -t ../PecanData/pecans.arff -d ../PecanData/J48-classifier.model

doesn’t work from command line
can’t find class weka/classifiers/trees/J48

hmmm... try putting weka.jar on the class path (-cp weka.jar)
and also add in -i -k to get more info

java -cp weka.jar -mx300m weka.classifiers.trees.J48 -C 0.25 -M 2 -t ../PecanData/pecans.arff -i -k -d ../PecanData/J48-classifier.model

worked!
Time taken to build model: 12.72 seconds
Time taken to test model on training data: 0.1 seconds

=== Error on training data ===

Correctly Classified Instances        4587               98.9217 %
Incorrectly Classified Instances        50                1.0783 %
Kappa statistic                          0.9427
K&B Relative Info Score             412419.0911 %
K&B Information Score                 2004.0102 bits      0.4322 bits/instance
Class complexity | order 0            2250.7576 bits      0.4854 bits/instance
Class complexity | scheme              277.1448 bits      0.0598 bits/instance
Complexity improvement     (Sf)       1973.6128 bits      0.4256 bits/instance
Mean absolute error                      0.019
Root mean squared error                  0.0974
Relative absolute error                 10.0721 %
Root relative squared error             31.7479 %
Total Number of Instances             4637



6 of 27

=== Confusion Matrix ===


=== Stratified cross-validation ===

Correctly Classified Instances        4373               94.3067 %
Incorrectly Classified Instances       264                5.6933 %
Kappa statistic                          0.6949
K&B Relative Info Score             268786.582  %
K&B Information Score                 1305.6899 bits      0.2816 bits/instance
Class complexity | order 0            2250.7629 bits      0.4854 bits/instance
Class complexity | scheme           131711.8722 bits     28.4045 bits/instance
Complexity improvement     (Sf)    -129461.1092 bits    -27.9192 bits/instance
Mean absolute error                      0.0629
Root mean squared error                  0.2301
Relative absolute error                 33.3937 %
Root relative squared error             74.9854 %
Total Number of Instances             4637





looks good!

now have classifier J48-classifier.model

7 of 27

try to get it to classify the data

labuser% java -cp weka.jar weka.classifiers.trees.J48 -l ../PecanData/J48-classifier.model -T ../PecanData/pecans.arff -p 1

works and gives data lines like

4633 0 0.9970313825275657 0 (4634)

the values are
    1. the instance number (0-indexed)
    2. the predicted value
    3. the confidence in the prediction
    4. the actual value
    5. (the first attribute) - in this case, the station ID

ran to put results into J48-output.txt

opened in Excel and made J48output.xls
need to fix: since the station ID comes in wrapped in parentheses, like (1), Excel enters it as a negative #!

multiplied by -1 and copied values
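The sign problem could be avoided by parsing the prediction lines before they reach Excel. A minimal sketch, assuming every line has the five whitespace-separated fields shown in the sample line in these notes; the function and key names are hypothetical:

```python
def parse_prediction(line):
    """Split one prediction line into its five fields.

    The station ID arrives wrapped in parentheses, e.g. "(4634)"; stripping
    them here avoids a spreadsheet reading the value as a negative number.
    """
    instance, predicted, confidence, actual, first_attribute = line.split()
    return {
        "instance": int(instance),
        "predicted": predicted,
        "confidence": float(confidence),
        "actual": actual,
        "station_id": int(first_attribute.strip("()")),
    }


row = parse_prediction("4633 0 0.9970313825275657 0 (4634)")
print(row["station_id"], row["predicted"], row["actual"])  # 4634 0 0
```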

8 of 27

Tried IB1 - lazy single nearest neighbor - took about 20 mins

=== Run information ===

Scheme:       weka.classifiers.lazy.IB1
Relation:     PresenceOfPecans
Instances:    4637
Attributes:   104
              [list of attributes omitted]
Test mode:    10-fold cross-validation


IB1 classifier








looks a little better

9 of 27

try K-nearest neighbors - K = 3 (3 nearest neighbors)


Scheme:       weka.classifiers.lazy.IBk -K 3 -W 0
Relation:     PresenceOfPecans
Instances:    4637
Attributes:   104
              [list of attributes omitted]
Test mode:    10-fold cross-validation


IB1 instance-based classifier
using 3 nearest neighbour(s) for classification








slightly better still

10 of 27

It might be worth trying a “reduced error pruned tree”
it is supposed to make smaller trees
see if it is better.
runs in less than 10 mins!!


Scheme:       weka.classifiers.trees.J48 -R -N 3 -Q 1 -M 2
Relation:     PresenceOfPecans
Instances:    4637
Attributes:   104
              [list of attributes omitted]
Test mode:    10-fold cross-validation


J48 pruned tree
------------------

11 of 27

MWM <= 24.5: 0 (1662.0/10.0)
MWM > 24.5
| RLOW <= 20.57: 0 (434.0/2.0)
| RLOW > 20.57
| | LPTOAE <= 0.0575
| | | TRANGE <= 26.33: 0 (437.0/27.0)
| | | TRANGE > 26.33
| | | | EXPREY <= 540.4335
| | | | | MCM <= -3.5: 0 (13.0/2.0)
| | | | | MCM > -3.5
| | | | | | BIO5 <= 25632.9578: 1 (25.0/2.0)
| | | | | | BIO5 > 25632.9578: 0 (12.0/3.0)
| | | | EXPREY > 540.4335: 0 (32.0/4.0)
| | LPTOAE > 0.0575
| | | LCOKLM <= 1.9957: 0 (41.0)
| | | LCOKLM > 1.9957
| | | | WATDGRC <= 3
| | | | | LPTOAE <= 0.0751
| | | | | | RRCORR <= -3.5: 0 (32.0/3.0)
| | | | | | RRCORR > -3.5
| | | | | | | ELEV <= 831
| | | | | | | | LRRANGE <= 1.6697: 0 (6.0)
| | | | | | | | LRRANGE > 1.6697
| | | | | | | | | RRCORR3 <= 9
| | | | | | | | | | ELEV <= 413: 1 (23.0)
| | | | | | | | | | ELEV > 413
| | | | | | | | | | | CLIM <= 3
| | | | | | | | | | | | PTORUN <= 1.8019: 0 (18.0/8.0)
| | | | | | | | | | | | PTORUN > 1.8019: 1 (12.0)
| | | | | | | | | | | CLIM > 3: 0 (2.0)
| | | | | | | | | RRCORR3 > 9
| | | | | | | | | | WRET <= 104.2606: 0 (8.0)
| | | | | | | | | | WRET > 104.2606: 1 (4.0)
| | | | | | | ELEV > 831: 0 (11.0)
| | | | | LPTOAE > 0.0751
| | | | | | LCRR <= 2.922
| | | | | | | RRCORR3 <= 3
| | | | | | | | PERWRET <= 0.6364: 0 (17.0/4.0)
| | | | | | | | PERWRET > 0.6364
| | | | | | | | | COKLM <= 693: 1 (4.0)
| | | | | | | | | COKLM > 693: 0 (4.0/2.0)
| | | | | | | RRCORR3 > 3: 0 (3.0)
| | | | | | LCRR > 2.922: 1 (218.0/38.0)
| | | | WATDGRC > 3
| | | | | LET <= 1.1957
| | | | | | PERWDEF <= 0.5: 0 (40.0)
| | | | | | PERWDEF > 0.5
| | | | | | | RLOW <= 26.67: 0 (13.0/1.0)
| | | | | | | RLOW > 26.67: 1 (3.0)
| | | | | LET > 1.1957
| | | | | | AVWAT <= 4: 0 (11.0/4.0)
| | | | | | AVWAT > 4: 1 (7.0)




12 of 27

=== Stratified cross-validation ===
=== Summary ===






not quite as good as the full tree but it is very fast

try other rule-generating things because they give interpretable output

try JRip
ran in 15 mins


Scheme:       weka.classifiers.rules.JRip -F 3 -N 2.0 -O 2 -S 1
Relation:     PresenceOfPecans
Instances:    4637
Attributes:   104
              [list of attributes omitted]
Test mode:    10-fold cross-validation


JRIP rules:
===========

(MWM >= 26.5) and (BAR5 >= 14.6915) and (PTOAE >= 1.1925) and (ELEV <= 300) => Pecan=1 (82.0/4.0)

(AE >= 652.4943) and (PTOAE >= 1.1295) and (WATDGRC <= 3) and (WRET >= 104.8334) and (ELEV <= 625) => Pecan=1 (72.0/4.0)

(MWM >= 24.6) and (CVRAIN <= 44.3185) and (WSTORAGE >= 181.796) and (ELEV <= 1030) => Pecan=1 (165.0/50.0)

(MWM >= 24.3) and (TRANGE >= 24.7) and (RLOW >= 25.91) and (PTOWATR >= 10.8738) and (Site <= 1517) => Pecan=1 (51.0/4.0)

13 of 27

(AE >= 622.0895) and (COKLM >= 506.9) and (EXPREY <= 520.5728) and (PTOWATR >= 8.7045) => Pecan=1 (59.0/13.0)

(MWM >= 24.8) and (TRANGE >= 24.7) and (RLOW >= 25.91) and (RLOW <= 46.74) => Pecan=1 (52.0/24.0)

(MWM >= 27.22) and (RLOW >= 71.88) and (EXPREY <= 439.1472) and (WRET >= 102.7854) and (TEMP <= 56.0959) => Pecan=1 (15.0/1.0)

(MWM >= 27.44) and (CVRAIN <= 34.6388) and (WSTORAGE <= 161.2) => Pecan=1 (77.0/37.0)

 => Pecan=0 (4064.0/52.0)

Number of Rules : 9
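The rule list reads top to bottom: the first rule whose conditions all hold assigns Pecan=1, and a station that matches none falls through to the default Pecan=0. A minimal sketch of that evaluation, using only the first two rules from the listing; the data structure, the function name and the two sample stations are hypothetical:

```python
import operator

OPS = {">=": operator.ge, "<=": operator.le}

# First two rules from the JRip listing; the remaining rules are omitted here.
RULES = [
    [("MWM", ">=", 26.5), ("BAR5", ">=", 14.6915),
     ("PTOAE", ">=", 1.1925), ("ELEV", "<=", 300)],
    [("AE", ">=", 652.4943), ("PTOAE", ">=", 1.1295), ("WATDGRC", "<=", 3),
     ("WRET", ">=", 104.8334), ("ELEV", "<=", 625)],
]


def classify(station, rules=RULES, default=0):
    """Return 1 if any rule fires (checked in order), else the default class."""
    for conditions in rules:
        if all(OPS[op](station[name], value) for name, op, value in conditions):
            return 1
    return default


# Hypothetical stations, not taken from the data set.
warm_lowland = {"MWM": 27.0, "BAR5": 15.0, "PTOAE": 1.2, "ELEV": 250,
                "AE": 600.0, "WATDGRC": 2, "WRET": 100.0}
cool_upland = dict(warm_lowland, MWM=20.0, ELEV=900)
print(classify(warm_lowland), classify(cool_upland))  # 1 0
```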








about as good as the J48.

14 of 27

for comparison purposes, do the “null model” = ZeroR (pick the majority type)


ZeroR predicts class value: 0



Correctly Classified Instances        4149               89.476  %
Incorrectly Classified Instances       488               10.524  %
Kappa statistic                          0
Mean absolute error                      0.1885
Root mean squared error                  0.3069
Relative absolute error                100      %
Root relative squared error            100      %
Total Number of Instances             4637


TP Rate   FP Rate   Precision   Recall  F-Measure   Class
  1         1          0.895     1          0.944    0
  0         0          0         0          0        1



only 89% agreement.
so the others are an improvement
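The null-model figure is just the share of the majority class, which can be checked from the class counts in the ZeroR output (4149 stations without pecans, 488 with). A minimal sketch; the function name is hypothetical:

```python
from collections import Counter


def zero_r(labels):
    """Return the majority label and the accuracy of always predicting it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


labels = [0] * 4149 + [1] * 488
majority, accuracy = zero_r(labels)
print(majority, round(100 * accuracy, 3))  # 0 89.476
```

Any classifier has to beat this 89.5% before its accuracy says anything, which is why Kappa (0 for ZeroR) is the more useful column for comparison.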

Get some scored data sets for mapping

1) “J48 reduced” = the one from page 11 – using “reduced error pruning”

java -cp weka.jar -mx300m weka.classifiers.trees.J48 -R -N 3 -Q 1 -M 2 -t ../PecanData/pecans.arff -i -k -d ../PecanData/J48-reduced-classifier.model

got this result

Options: -R -N 3 -Q 1 -M 2

15 of 27

J48 pruned tree
------------------

MWM <= 24.5: 0 (1662.0/10.0)
MWM > 24.5
| RLOW <= 20.57: 0 (434.0/2.0)
| RLOW > 20.57
| | LPTOAE <= 0.0575
| | | TRANGE <= 26.33: 0 (437.0/27.0)
| | | TRANGE > 26.33
| | | | EXPREY <= 540.4335
| | | | | MCM <= -3.5: 0 (13.0/2.0)
| | | | | MCM > -3.5
| | | | | | BIO5 <= 25632.9578: 1 (25.0/2.0)
| | | | | | BIO5 > 25632.9578: 0 (12.0/3.0)
| | | | EXPREY > 540.4335: 0 (32.0/4.0)
| | LPTOAE > 0.0575
| | | LCOKLM <= 1.9957: 0 (41.0)
| | | LCOKLM > 1.9957
| | | | WATDGRC <= 3
| | | | | LPTOAE <= 0.0751
| | | | | | RRCORR <= -3.5: 0 (32.0/3.0)
| | | | | | RRCORR > -3.5
| | | | | | | ELEV <= 831
| | | | | | | | LRRANGE <= 1.6697: 0 (6.0)
| | | | | | | | LRRANGE > 1.6697
| | | | | | | | | RRCORR3 <= 9
| | | | | | | | | | ELEV <= 413: 1 (23.0)
| | | | | | | | | | ELEV > 413
| | | | | | | | | | | CLIM <= 3
| | | | | | | | | | | | PTORUN <= 1.8019: 0 (18.0/8.0)
| | | | | | | | | | | | PTORUN > 1.8019: 1 (12.0)
| | | | | | | | | | | CLIM > 3: 0 (2.0)
| | | | | | | | | RRCORR3 > 9
| | | | | | | | | | WRET <= 104.2606: 0 (8.0)
| | | | | | | | | | WRET > 104.2606: 1 (4.0)
| | | | | | | ELEV > 831: 0 (11.0)
| | | | | LPTOAE > 0.0751
| | | | | | LCRR <= 2.922
| | | | | | | RRCORR3 <= 3
| | | | | | | | PERWRET <= 0.6364: 0 (17.0/4.0)
| | | | | | | | PERWRET > 0.6364
| | | | | | | | | COKLM <= 693: 1 (4.0)
| | | | | | | | | COKLM > 693: 0 (4.0/2.0)
| | | | | | | RRCORR3 > 3: 0 (3.0)
| | | | | | LCRR > 2.922: 1 (218.0/38.0)
| | | | WATDGRC > 3
| | | | | LET <= 1.1957
| | | | | | PERWDEF <= 0.5: 0 (40.0)
| | | | | | PERWDEF > 0.5
| | | | | | | RLOW <= 26.67: 0 (13.0/1.0)
| | | | | | | RLOW > 26.67: 1 (3.0)
| | | | | LET > 1.1957
| | | | | | AVWAT <= 4: 0 (11.0/4.0)
| | | | | | AVWAT > 4: 1 (7.0)



16 of 27


Time taken to build model: 6.6 seconds
Time taken to test model on training data: 0.11 seconds

=== Error on training data ===

Correctly Classified Instances        4453               96.0319 %
Incorrectly Classified Instances       184                3.9681 %
Kappa statistic                          0.781
K&B Relative Info Score             287357.2097 %
K&B Information Score                 1396.3146 bits      0.3011 bits/instance
Class complexity | order 0            2250.7576 bits      0.4854 bits/instance
Class complexity | scheme            19037.9527 bits      4.1057 bits/instance
Complexity improvement     (Sf)     -16787.1951 bits     -3.6203 bits/instance
Mean absolute error                      0.0644
Root mean squared error                  0.1852
Relative absolute error                 34.1766 %
Root relative squared error             60.3369 %
Total Number of Instances             4637









17 of 27



looks the same as when run from explorer – good!

now, classify the pecan data

java -cp weka.jar weka.classifiers.trees.J48 -l ../PecanData/J48-reduced-classifier.model -T ../PecanData/pecans.arff -p 1 > ../PecanData/J48-reduced-output.txt

open in excel & fix to make J48-reduced-output.xls

2) Do this for the JRip from page 13 as well

java -cp weka.jar -mx300m weka.classifiers.rules.JRip -F 3 -N 2.0 -O 2 -S 1 -t ../PecanData/pecans.arff -i -k -d ../PecanData/JRip-classifier.model

it gave this output:

Options: -F 3 -N 2.0 -O 2 -S 1


(MWM >= 26.5) and (BAR5 >= 14.6915) and (PTOAE >= 1.1925) and (ELEV <= 300) => Pecan=1 (82.0/4.0)

(AE >= 652.4943) and (PTOAE >= 1.1295) and (WATDGRC <= 3) and (WRET >= 104.8334) and (ELEV <= 625) => Pecan=1 (72.0/4.0)

(MWM >= 24.6) and (CVRAIN <= 44.3185) and (WSTORAGE >= 181.796) and (ELEV <= 1030) => Pecan=1 (165.0/50.0)

(MWM >= 24.3) and (TRANGE >= 24.7) and (RLOW >= 25.91) and (PTOWATR >= 10.8738) and (Site <= 1517) => Pecan=1 (51.0/4.0)

(AE >= 622.0895) and (COKLM >= 506.9) and (EXPREY <= 520.5728) and (PTOWATR >= 8.7045) => Pecan=1 (59.0/13.0)

(MWM >= 24.8) and (TRANGE >= 24.7) and (RLOW >= 25.91) and (RLOW <= 46.74) => Pecan=1 (52.0/24.0)

(MWM >= 27.22) and (RLOW >= 71.88) and (EXPREY <= 439.1472) and (WRET >= 102.7854) and (TEMP <= 56.0959) => Pecan=1 (15.0/1.0)

(MWM >= 27.44) and (CVRAIN <= 34.6388) and (WSTORAGE <= 161.2) => Pecan=1 (77.0/37.0)

 => Pecan=0 (4064.0/52.0)

Number of Rules : 9

Time taken to build model: 62.44 seconds
Time taken to test model on training data: 0.19 seconds

18 of 27

=== Error on training data ===

Correctly Classified Instances        4448               95.9241 %
Incorrectly Classified Instances       189                4.0759 %
Kappa statistic                          0.799
K&B Relative Info Score             301337.6929 %
K&B Information Score                 1464.248  bits      0.3158 bits/instance
Class complexity | order 0            2250.7576 bits      0.4854 bits/instance
Class complexity | scheme              791.9992 bits      0.1708 bits/instance
Complexity improvement     (Sf)       1458.7584 bits      0.3146 bits/instance
Mean absolute error                      0.0607
Root mean squared error                  0.1742
Relative absolute error                 32.1921 %
Root relative squared error             56.7583 %
Total Number of Instances             4637









19 of 27



now, classify the pecan data

java -cp weka.jar weka.classifiers.rules.JRip -l ../PecanData/JRip-classifier.model -T ../PecanData/pecans.arff -p 1 > ../PecanData/JRip-output.txt

set up in excel

Now, try doing the J48 with only the “raw variables” - not the derived ones. This is since the tree and rule schemes seem to use derived variables mostly.
It will be interesting to see if and how it works with the raw ones.

The “raw” ones are:
    CMAT
    CRR
    MWM
    MCM
    RHIGH
    RLOW
    ELEV
    WSTORAGE
    COKLM


Scheme:       weka.classifiers.rules.JRip -F 3 -N 2.0 -O 2 -S 1
Relation:     PresenceOfPecans-weka.filters.unsupervised.attribute.Remove-R1-2,9,12-36,38-103
Instances:    4637
Attributes:   10
              ELEV
              COKLM
              MWM
              MCM
              RHIGH
              RLOW
              CMAT
              CRR
              WSTORAGE
              Pecan
Test mode:    10-fold cross-validation



20 of 27

(MWM >= 25.3) and (RLOW >= 25.91) and (COKLM >= 220) and (MWM >= 26.94) and (CRR >= 909.56) and (MWM >= 27.6) => Pecan=1 (157.0/13.0)

(MWM >= 24.6) and (RLOW >= 25.91) and (COKLM >= 352) and (WSTORAGE <= 136.7894) and (ELEV <= 719) and (MCM <= 1.1) => Pecan=1 (73.0/6.0)

(MWM >= 25.9) and (RLOW >= 25.91) and (COKLM >= 340) and (MWM >= 26.94) and (ELEV <= 1030) => Pecan=1 (82.0/18.0)

(MWM >= 26.2) and (RLOW >= 26.42) and (WSTORAGE >= 188.644) and (RLOW >= 44.96) and (RHIGH >= 114.81) => Pecan=1 (27.0/2.0)

(MWM >= 24.6) and (RLOW >= 20.83) and (RLOW <= 45.47) and (MCM <= 3.56) and (RLOW >= 27.69) and (RLOW >= 41.66) => Pecan=1 (21.0/5.0)

(MWM >= 24.3) and (RLOW >= 23.62) and (CRR <= 1130.55) and (RHIGH >= 119.63) and (COKLM >= 113.75) and (ELEV <= 825) and (MCM >= -3) => Pecan=1 (24.0/5.0)

(MWM >= 26.5) and (RLOW >= 71.88) and (MWM >= 27.44) and (COKLM >= 15.9) and (ELEV <= 116) => Pecan=1 (51.0/17.0)

(MWM >= 24.2) and (RLOW >= 20.57) and (CRR <= 1097.05) and (COKLM >= 139.74) and (ELEV <= 549) and (RLOW <= 45.21) => Pecan=1 (15.0/3.0)

(MWM >= 24.1) and (WSTORAGE >= 188.644) and (RLOW >= 26.42) and (COKLM >= 563.3) and (COKLM <= 716.5) and (MCM <= 3.89) and (MWM >= 26.2) => Pecan=1 (14.0/1.0)

 => Pecan=0 (4173.0/94.0)

Number of Rules : 10








almost as good!!

21 of 27

so, set up a classified data set for this:

need to get data set with just raw attributes
    edited data.csv to data-raw.csv
    pasted into pecans.arff to make pecans-raw.arff
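Cutting the CSV down to the raw attributes could be done with a short script rather than by hand editing. A minimal sketch, assuming the CSV carries a header row with the attribute names (the notes do not say whether data.csv has one); the function name and the one-row sample are hypothetical, while the column list follows the attribute listing for the raw run:

```python
import csv
import io

# Site, the nine raw attributes and the class, in the order of the raw-run listing.
RAW_COLUMNS = ["Site", "ELEV", "COKLM", "MWM", "MCM", "RHIGH", "RLOW",
               "CMAT", "CRR", "WSTORAGE", "Pecan"]


def keep_columns(csv_text, wanted):
    """Return CSV text containing only the `wanted` columns, in that order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=wanted, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # columns not in `wanted` are dropped
    return out.getvalue()


# Hypothetical sample with one derived column (LCRR) that should disappear.
sample = "Site,ELEV,LCRR,MWM,Pecan\n1,15,3.2,27.67,1\n"
print(keep_columns(sample, ["Site", "ELEV", "MWM", "Pecan"]))
```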

ran with explorer and JRip as before.

printed output to be sure it’s the same


Scheme:       weka.classifiers.rules.JRip -F 3 -N 2.0 -O 2 -S 1
Relation:     PresenceOfPecans
Instances:    4637
Attributes:   11
              Site
              ELEV
              COKLM
              MWM
              MCM
              RHIGH
              RLOW
              CMAT
              CRR
              WSTORAGE
              Pecan
Test mode:    10-fold cross-validation



(MWM >= 25.3) and (RLOW >= 25.91) and (COKLM >= 272) and (MWM >= 27) and (ELEV <= 660) => Pecan=1 (175.0/23.0)

(MWM >= 25.1) and (RLOW >= 25.91) and (WSTORAGE >= 188.644) and (RLOW >= 40.89) => Pecan=1 (76.0/11.0)

(MWM >= 24.6) and (RLOW >= 28.45) and (COKLM >= 352) and (CRR <= 1263.13) and (ELEV <= 830) and (ELEV <= 605) => Pecan=1 (74.0/12.0)

(MWM >= 26.17) and (RLOW >= 25.91) and (COKLM >= 507) and (Site <= 3496) and (MWM >= 26.94) and (COKLM <= 693) => Pecan=1 (30.0/3.0)

(MWM >= 24.3) and (RLOW >= 20.57) and (MCM <= 0.94) and (RHIGH >= 128.52) and (WSTORAGE <= 136.7894) and (ELEV <= 690) => Pecan=1 (11.0/1.0)

(MWM >= 24.4) and (RLOW >= 20.57) and (CRR <= 1107.68) and (COKLM >= 117.12) and (ELEV <= 1025) and (ELEV <= 507) and (RLOW <= 54.1) => Pecan=1 (12.0/0.0)

(MWM >= 24.3) and (RLOW >= 20.57) and (MWM >= 27.44) and (RLOW >= 71.88) and (RLOW >= 91.95) and (RHIGH <= 177.8) => Pecan=1 (27.0/7.0)

(MWM >= 24.4) and (RLOW >= 20.57) and (CRR <= 1107.44) and (Site <= 3159) and (RHIGH >= 128.02) and (ELEV <= 1050) and (MCM >= -3.3) => Pecan=1 (39.0/13.0)

(MWM >= 24) and (RLOW >= 23.62) and (WSTORAGE <= 161.2) and (RLOW >= 72.9) and (MWM >= 27.44) and (RLOW <= 81.53) => Pecan=1 (20.0/5.0)

 => Pecan=0 (4173.0/99.0)

Number of Rules : 10

Time taken to build model: 22 seconds

22 of 27








looks not exactly the same but OK.

make a classifier

java -cp weka.jar -mx300m weka.classifiers.rules.JRip -F 3 -N 2.0 -O 2 -S 1 -t ../PecanData/pecans-raw.arff -i -k -d ../PecanData/raw-JRip-classifier.model

same output as above

classify the raw data

java -cp weka.jar weka.classifiers.rules.JRip -l ../PecanData/raw-JRip-classifier.model -T ../PecanData/pecans-raw.arff -p 1 > ../PecanData/raw-JRip-output.txt

made raw-JRip-output.xls

to get towards max kappa, try the lazy nearest neighbor one & run it to find best K
use full dataset

in Explorer, IBk:
    crossValidate = true
    KNN = 6

this will evaluate, by cross-validation, 1-6 nearest neighbors
let it run overnight

it came up with K = 3 as the best one.
(same as before)
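The idea behind the search is to score each candidate K on held-out points and keep the best. A minimal sketch of that idea in plain Python, using leave-one-out evaluation on a tiny made-up data set; it is an illustration only, not a reproduction of Weka's implementation, and all names and data values are hypothetical:

```python
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k):
    """Fraction of points predicted correctly when each is held out in turn."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, features, k) == label:
            hits += 1
    return hits / len(data)


def best_k(data, max_k):
    """Score K = 1..max_k and return (best K, all scores); ties go to smaller K."""
    scores = {k: leave_one_out_accuracy(data, k) for k in range(1, max_k + 1)}
    winner = max(scores, key=lambda k: (scores[k], -k))
    return winner, scores


# Made-up one-feature data: two clusters plus one noisy label.
toy = [((1.0,), 0), ((1.2,), 0), ((0.9,), 0), ((1.1,), 0),
       ((3.0,), 1), ((3.2,), 1), ((2.9,), 1), ((1.05,), 1)]
k_star, scores = best_k(toy, 3)
print(k_star, scores)
```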

23 of 27

last data point - do OneR to see what it’d be & the improvement in Kappa

pecans.arff

it used Site...

so exclude that one

=== Run information ===

Scheme:       weka.classifiers.rules.OneR -B 6
Relation:     PresenceOfPecans-weka.filters.unsupervised.attribute.Remove-R1
Instances:    4637
Attributes:   103
              [list of attributes omitted]
Test mode:    10-fold cross-validation


LSSTAB2:
    < -1.71435               -> 0
    < -1.71225               -> 1
    < -1.70995               -> 0
    < -1.7081499999999998    -> 1
    < -1.70755               -> 0
    < -1.7056                -> 1
    < -1.7053                -> 0
    < -1.7036                -> 1
    < -1.7032                -> 0
    < -1.7015                -> 1
    < -1.69895               -> 0
    < -1.6969                -> 1
    < -1.69665               -> 0
    < -1.6948                -> 1
    < -1.69455               -> 0
    < -1.6931500000000002    -> 1
    < -1.69245               -> 0
    < -1.6905000000000001    -> 1
    < -1.68595               -> 0
    < -1.68435               -> 1
    < -0.04535               -> 0
    < 0.10985                -> 1
    < 0.1921                 -> 0
    < 0.20995                -> 1
    < 0.5104                 -> 0
    < 0.52035                -> 1
    >= 0.52035               -> 0

(4227/4637 instances correct)


24 of 27

=== Stratified cross-validation ===
=== Summary ===






not very good.

try OneR on the raw data only


Scheme:       weka.classifiers.rules.OneR -B 6
Relation:     PresenceOfPecans-weka.filters.unsupervised.attribute.Remove-R1
Instances:    4637
Attributes:   10
              ELEV
              COKLM
              MWM
              MCM
              RHIGH
              RLOW
              CMAT
              CRR
              WSTORAGE
              Pecan
Test mode:    10-fold cross-validation

25 of 27

=== Classifier model (full training set) ===

MWM:
    < 27.314999999999998    -> 0
    < 27.395                -> 1
    < 27.53                 -> 0
    < 27.58                 -> 1
    < 27.71                 -> 0
    < 27.79                 -> 1
    < 27.92                 -> 0
    < 27.97                 -> 1
    < 28.105                -> 0
    < 28.185000000000002    -> 1
    < 28.21                 -> 0
    < 28.25                 -> 1
    < 28.395                -> 0
    < 28.47                 -> 1
    < 28.64                 -> 0
    < 28.685000000000002    -> 1
    < 29.314999999999998    -> 0
    < 29.47                 -> 1
    >= 29.47                -> 0

(4202/4637 instances correct)
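The OneR model is a single attribute cut into intervals, each with its own class, so applying it is an interval lookup. A minimal sketch for the MWM rule above using the standard-library `bisect` module; the breakpoints are copied from the listing, and the function name is hypothetical:

```python
from bisect import bisect_right

# Upper bounds of the intervals, copied from the OneR listing for MWM.
THRESHOLDS = [27.314999999999998, 27.395, 27.53, 27.58, 27.71, 27.79,
              27.92, 27.97, 28.105, 28.185000000000002, 28.21, 28.25,
              28.395, 28.47, 28.64, 28.685000000000002,
              29.314999999999998, 29.47]

# The listed classes alternate 0, 1, 0, ... and end with 0 for ">= 29.47".
CLASSES = [i % 2 for i in range(len(THRESHOLDS) + 1)]


def one_r_mwm(mwm):
    """Return the class of the interval that contains `mwm`."""
    return CLASSES[bisect_right(THRESHOLDS, mwm)]


print(one_r_mwm(20.0), one_r_mwm(27.35), one_r_mwm(30.0))  # 0 1 0
```

The many narrow alternating intervals between 27.3 and 29.5 suggest the rule is fitting individual stations rather than a real temperature threshold.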



Correctly Classified Instances        4164               89.7994 %
Incorrectly Classified Instances       473               10.2006 %
Kappa statistic                          0.2739
Mean absolute error                      0.102
Root mean squared error                  0.3194
Relative absolute error                 54.1204 %
Root relative squared error            104.0799 %
Total Number of Instances             4637





it used mean of the warmest month

26 of 27

Summary

                                  Model               10x cross-validation
Method                        % correct   Kappa       % correct   Kappa

ZeroR                                                   89.48     0.000
OneR                                                    90.12     0.233

J48 - small tree                96.03     0.781         93.77     0.652
J48 - large tree                98.92     0.943         94.30     0.694

JRip - raw attributes only                              94.54     0.713
IB1                                                     94.72     0.721
JRip                            95.92     0.799         94.76     0.715
IB3                                                     95.21     0.745

did analysis of world data
put in “World” folder inside “PecanData”

used WrldRunPecan.csv as source file
    did not have a last (pecan) column
    used pecans.arff for header (deleted “@attribute pecan” line)

saved as world.arff

opened in Explorer to check syntax
looks OK

run with JRip classifier from page 20

java -cp weka.jar weka.classifiers.rules.JRip -l ../PecanData/JRip-classifier.model -T ../PecanData/World/world.arff -p 1 > ../PecanData/World/world-JRip-output.txt

did something strange....didn’t make predictions???

try putting in a dummy “Pecans” attribute.
    edit WrldRunPecan.csv
    re-make world.arff

try again
worked AOK
saved as world-JRip-output.xls
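Adding the placeholder class column could be scripted so the world file always matches the header used for training. A minimal sketch, assuming the rows are plain comma-separated lines without a class value; the function name, the sample rows and the choice of "0" as the dummy are hypothetical:

```python
def add_dummy_class(csv_rows, dummy="0"):
    """Append a placeholder class value to every non-empty CSV row."""
    return [row.rstrip("\n") + "," + dummy for row in csv_rows if row.strip()]


# Hypothetical rows standing in for WrldRunPecan.csv.
world_rows = ["1,15,27.67\n", "2,290,24.1\n", "\n"]
print(add_dummy_class(world_rows))  # ['1,15,27.67,0', '2,290,24.1,0']
```

ARFF also allows `?` as the marker for an unknown value, which may be a cleaner placeholder than a made-up 0 or 1, though that was not tried in these notes.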

27 of 27

