5/14/23 7:32 AM
Pecan Analysis data
Got excel file NAmPecanAll.xls
Made into ARFF format
Deleted the columns that were categorical:
    SOIL
    SOIL2
Made all cols numeric except a few:
    SEASON 1,2,3,4
    CLASS D,W
    Pecan 0,1
Made pecans.arff:
    Header from the “variable names” worksheet
    Exported the data as data.csv
    Combined (in Word) as pecans.arff
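The combining step that was done by hand in Word could also be scripted. This is a minimal sketch, assuming the header (the @relation and @attribute lines) is available as plain text and the CSV export has a column-name row. The function name `build_arff` and the three sample attributes are hypothetical; they are not the real pecans.arff header.

```python
def build_arff(header_text, csv_text, csv_has_header_row=True):
    """Return ARFF text: the header lines, an @data marker, then the CSV rows."""
    header = header_text.strip()
    rows = [line.strip() for line in csv_text.splitlines() if line.strip()]
    if csv_has_header_row:
        rows = rows[1:]  # drop the column-name row exported by the spreadsheet
    return header + "\n@data\n" + "\n".join(rows) + "\n"


# Tiny made-up example (not the real pecan attributes)
example_header = """@relation PresenceOfPecans
@attribute Site numeric
@attribute SEASON {1,2,3,4}
@attribute Pecan {0,1}"""
example_csv = "Site,SEASON,Pecan\n1,2,0\n2,4,1\n"
print(build_arff(example_header, example_csv))
```

Writing the result with a plain-text editor or script also avoids any formatting that a word processor might add when saving.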
Trouble at line 191
    If we include thru case 4, it’s OK (pecans-small.arff)
    If thru 100, not OK (pecans-small2.arff)
    1 - 89 (pecans-small3): no good
    1 - 79 (pecans-small4): OK
    therefore the problem is between cases 80 and 89
looked in the Excel file
    at #83 the value starts to be a real number, not 1-4
    fixed in data.csv only
blew up on line 2898
    fixed RCORR on station 2790: it was blank, made it 1.0000
    fixed only in pecans.arff
blew up on line 2921 = station 2813
    it was formatted as 1,032.67
    removed the commas in data.csv
    also fixed station 2790 there
re-made pecans.arff
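The two load failures above (a blank cell and a number formatted as 1,032.67) can be found in one pass before Weka sees the file. This is a minimal sketch, assuming the spreadsheet export put quotes around the comma-formatted number (otherwise the row simply gains an extra column) and that D and W are the only legitimate text codes. The function names and the three sample rows are made up for illustration.

```python
import csv
import io


def find_bad_cells(csv_text, allowed_text=("D", "W")):
    """Yield (line_number, column_number, value) for cells that are blank or
    neither numeric nor one of the allowed text codes (numbers are 1-based)."""
    reader = csv.reader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=1):
        for col_no, cell in enumerate(row, start=1):
            value = cell.strip()
            if value in allowed_text:
                continue
            try:
                float(value)  # fails on "" and on "1,032.67"
            except ValueError:
                yield line_no, col_no, value


def strip_thousands_separator(value):
    """Turn '1,032.67' into '1032.67'; leave everything else unchanged."""
    candidate = value.replace(",", "")
    try:
        float(candidate)
    except ValueError:
        return value
    return candidate


# Made-up rows: one blank cell, one comma-formatted number
sample = '2789,W,0.5\n2790,W,\n2813,D,"1,032.67"\n'
for line_no, col_no, value in find_bad_cells(sample):
    print(line_no, col_no, repr(value))
```

A blank cell still needs a decision by hand (here it was set to 1.0000); the sketch only reports where it is.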
ran with Weka Explorer: J48 -C 0.25 -M 2
remember to give Java extra memory: java -Xmx300m -jar weka.jar
1 of 27
J48 classifier: ran 10-fold cross-validation in about 5 mins

=== Run information ===

Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
Relation:     PresenceOfPecans-weka.filters.unsupervised.attribute.Remove-R1
Instances:    4637
Attributes:   103
              [list of attributes omitted]
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

J48 pruned tree
------------------
MWM <= 24.28: 0 (2358.0/7.0)
MWM > 24.28
| RLOW <= 20.32: 0 (710.0/3.0)
| RLOW > 20.32
| | LPTOAE <= 0.068
| | | TRANGE <= 24.63
| | | | PERWRET <= 0.5714
| | | | | PERWLTG <= 0.375: 0 (14.0)
| | | | | PERWLTG > 0.375
| | | | | | COKLM <= 35.9: 0 (4.0/1.0)
| | | | | | COKLM > 35.9: 1 (12.0)
| | | | PERWRET > 0.5714
| | | | | RLOW <= 89.92
| | | | | | RRCORR3 <= 7.5
| | | | | | | LCRR <= 3.2087: 0 (560.0/1.0)
| | | | | | | LCRR > 3.2087
| | | | | | | | ELEV <= 15
| | | | | | | | | MWM <= 27.67: 0 (5.0)
| | | | | | | | | MWM > 27.67: 1 (6.0/1.0)
| | | | | | | | ELEV > 15: 0 (28.0)
| | | | | | RRCORR3 > 7.5
| | | | | | | SEASON = 1: 0 (0.0)
| | | | | | | SEASON = 2: 0 (0.0)
| | | | | | | SEASON = 3
| | | | | | | | ELEV <= 413: 1 (6.0)
| | | | | | | | ELEV > 413: 0 (5.0)
| | | | | | | SEASON = 4: 0 (24.0/1.0)
| | | | | RLOW > 89.92
| | | | | | RRCORR3 <= 2.5: 0 (28.0)
| | | | | | RRCORR3 > 2.5
| | | | | | | PGROW <= 29
| | | | | | | | LPET <= 3.028
| | | | | | | | | ELEV <= 290
| | | | | | | | | | WSTORAGE <= 120.4052: 1 (10.0)
| | | | | | | | | | WSTORAGE > 120.4052
| | | | | | | | | | | TEMP <= 55.6976: 1 (8.0)
| | | | | | | | | | | TEMP > 55.6976
| | | | | | | | | | | | AVWAT <= 7: 0 (5.0)
| | | | | | | | | | | | AVWAT > 7
| | | | | | | | | | | | | SUCSTAB <= 0.0111: 1 (4.0)
| | | | | | | | | | | | | SUCSTAB > 0.0111: 0 (4.0)
| | | | | | | | | ELEV > 290: 0 (4.0)
| | | | | | | | LPET > 3.028: 0 (6.0)
| | | | | | | PGROW > 29: 0 (6.0)
| | | TRANGE > 24.63
2 of 27
| | | | LPTOWATR <= 1.0465
| | | | | ELEV <= 1051
| | | | | | LEXPREY <= 2.7946
| | | | | | | LCOKLM <= 2.3347: 0 (16.0)
| | | | | | | LCOKLM > 2.3347
| | | | | | | | LCMAT <= 1.6691
| | | | | | | | | LBIO5 <= 4.3056: 1 (3.0)
| | | | | | | | | LBIO5 > 4.3056
| | | | | | | | | | SEASON = 1
| | | | | | | | | | | SUCSTAB <= 0.0118
| | | | | | | | | | | | COKLM <= 336.8: 0 (7.0)
| | | | | | | | | | | | COKLM > 336.8
| | | | | | | | | | | | | ELEV <= 725: 1 (3.0)
| | | | | | | | | | | | | ELEV > 725: 0 (7.0)
| | | | | | | | | | | SUCSTAB > 0.0118: 1 (3.0)
| | | | | | | | | | SEASON = 2: 0 (13.0)
| | | | | | | | | | SEASON = 3: 0 (2.0)
| | | | | | | | | | SEASON = 4: 0 (0.0)
| | | | | | | | LCMAT > 1.6691
| | | | | | | | | WATRGRC <= 4
| | | | | | | | | | WATDGRC <= 2: 0 (4.0)
| | | | | | | | | | WATDGRC > 2
| | | | | | | | | | | REVEN <= 1.3688
| | | | | | | | | | | | EXPREY <= 292.2337: 1 (2.0)
| | | | | | | | | | | | EXPREY > 292.2337: 0 (11.0/1.0)
| | | | | | | | | | | REVEN > 1.3688: 1 (4.0)
| | | | | | | | | WATRGRC > 4
| | | | | | | | | | MCM <= -1.17
| | | | | | | | | | | WATDGRC <= 1
| | | | | | | | | | | | EXPREY <= 464.7716
| | | | | | | | | | | | | TEMP <= 46.4227: 0 (3.0)
| | | | | | | | | | | | | TEMP > 46.4227: 1 (2.0)
| | | | | | | | | | | | EXPREY > 464.7716: 1 (16.0/2.0)
| | | | | | | | | | | WATDGRC > 1
| | | | | | | | | | | | TEMP <= 47.1188: 1 (16.0)
| | | | | | | | | | | | TEMP > 47.1188: 0 (2.0)
| | | | | | | | | | MCM > -1.17
| | | | | | | | | | | MEDSTAB <= 0.054
| | | | | | | | | | | | RUNGRC <= 6
| | | | | | | | | | | | | WATRGRC <= 5: 0 (4.0/1.0)
| | | | | | | | | | | | | WATRGRC > 5
| | | | | | | | | | | | | | RRCORR <= -1.5
| | | | | | | | | | | | | | | WSTORAGE <= 155.083: 1 (2.0)
| | | | | | | | | | | | | | | WSTORAGE > 155.083: 0 (5.0/1.0)
| | | | | | | | | | | | | | RRCORR > -1.5: 1 (3.0)
| | | | | | | | | | | | RUNGRC > 6: 1 (5.0)
| | | | | | | | | | | MEDSTAB > 0.054: 0 (13.0/2.0)
| | | | | | LEXPREY > 2.7946: 0 (18.0)
| | | | | ELEV > 1051: 0 (44.0/1.0)
| | | | LPTOWATR > 1.0465
| | | | | ELEV <= 710: 1 (14.0)
| | | | | ELEV > 710: 0 (3.0/1.0)
| | LPTOAE > 0.068
| | | LCOKLM <= 1.8716: 0 (46.0)
| | | LCOKLM > 1.8716
| | | | ELEV <= 1205
| | | | | CLASS = D
| | | | | | CVRAIN <= 42.1182: 1 (6.0/1.0)
3 of 27
| | | | | | CVRAIN > 42.1182: 0 (16.0/1.0)
| | | | | CLASS = W
| | | | | | AVWAT <= 6
| | | | | | | LCRR <= 2.9411
| | | | | | | | PERWLTG <= 0.5714
| | | | | | | | | PTORUN <= 2.633: 0 (2.0)
| | | | | | | | | PTORUN > 2.633: 1 (12.0)
| | | | | | | | PERWLTG > 0.5714
| | | | | | | | | WATD <= 304.7457: 0 (12.0)
| | | | | | | | | WATD > 304.7457
| | | | | | | | | | RHIGH <= 115.1: 0 (9.0/2.0)
| | | | | | | | | | RHIGH > 115.1: 1 (6.0)
| | | | | | | LCRR > 2.9411
| | | | | | | | LWATD <= 2.2506
| | | | | | | | | WLTGRC <= 0
| | | | | | | | | | MWM <= 25.11: 0 (2.0)
| | | | | | | | | | MWM > 25.11
| | | | | | | | | | | PGROW <= 26: 1 (16.0)
| | | | | | | | | | | PGROW > 26
| | | | | | | | | | | | RLOW <= 64.26
| | | | | | | | | | | | | REVEN <= 1.3928: 0 (4.0)
| | | | | | | | | | | | | REVEN > 1.3928: 1 (5.0/1.0)
| | | | | | | | | | | | RLOW > 64.26: 1 (9.0)
| | | | | | | | | WLTGRC > 0: 0 (2.0)
| | | | | | | | LWATD > 2.2506: 1 (163.0/6.0)
| | | | | | AVWAT > 6
| | | | | | | PERWLTG <= 0.375
| | | | | | | | RRCORR <= -3.5
| | | | | | | | | SNOWAC <= 37.9349
| | | | | | | | | | SUCSTAB <= 0.0099
| | | | | | | | | | | ELEV <= 307: 1 (7.0)
| | | | | | | | | | | ELEV > 307: 0 (8.0/1.0)
| | | | | | | | | | SUCSTAB > 0.0099: 0 (21.0)
| | | | | | | | | SNOWAC > 37.9349: 1 (2.0)
| | | | | | | | RRCORR > -3.5
| | | | | | | | | LATITUDE <= 38.73
| | | | | | | | | | WATDGRC <= 2
| | | | | | | | | | | RHIGH <= 154.43
| | | | | | | | | | | | RHIGH <= 121.41: 0 (2.0)
| | | | | | | | | | | | RHIGH > 121.41: 1 (26.0)
| | | | | | | | | | | RHIGH > 154.43: 0 (4.0)
| | | | | | | | | | WATDGRC > 2
| | | | | | | | | | | LREVEN <= 0.0918: 1 (24.0/1.0)
| | | | | | | | | | | LREVEN > 0.0918
| | | | | | | | | | | | WSTORAGE <= 136.7894: 1 (36.0/7.0)
| | | | | | | | | | | | WSTORAGE > 136.7894
| | | | | | | | | | | | | SEASON = 1
| | | | | | | | | | | | | | MWM <= 27.5: 0 (6.0)
| | | | | | | | | | | | | | MWM > 27.5
| | | | | | | | | | | | | | | LATITUDE <= 36.63: 1 (4.0)
| | | | | | | | | | | | | | | LATITUDE > 36.63: 0 (2.0)
| | | | | | | | | | | | | SEASON = 2: 0 (5.0)
| | | | | | | | | | | | | SEASON = 3
| | | | | | | | | | | | | | ELEV <= 513
| | | | | | | | | | | | | | | WRET <= 107.2064: 1 (7.0)
| | | | | | | | | | | | | | | WRET > 107.2064: 0 (2.0)
| | | | | | | | | | | | | | ELEV > 513: 0 (3.0)
4 of 27
| | | | | | | | | | | | | SEASON = 4
| | | | | | | | | | | | | | REVEN <= 1.307: 1 (9.0/2.0)
| | | | | | | | | | | | | | REVEN > 1.307: 0 (9.0)
| | | | | | | | | LATITUDE > 38.73
| | | | | | | | | | EXPREY <= 366.9686: 1 (3.0)
| | | | | | | | | | EXPREY > 366.9686
| | | | | | | | | | | WATRGRC <= 3
| | | | | | | | | | | | ELEV <= 556: 1 (3.0)
| | | | | | | | | | | | ELEV > 556: 0 (5.0)
| | | | | | | | | | | WATRGRC > 3: 0 (9.0)
| | | | | | | PERWLTG > 0.375: 1 (14.0)
| | | | ELEV > 1205
| | | | | PERWRET <= 0.6364
| | | | | | LET <= 1.1751: 0 (52.0)
| | | | | | LET > 1.1751
| | | | | | | MWM <= 27.83: 0 (9.0)
| | | | | | | MWM > 27.83
| | | | | | | | TRANGE <= 20.88: 1 (4.0)
| | | | | | | | TRANGE > 20.88: 0 (2.0)
| | | | | PERWRET > 0.6364
| | | | | | RHIGH <= 126.24
| | | | | | | WLTGRC <= 0: 0 (8.0/1.0)
| | | | | | | WLTGRC > 0: 1 (2.0)
| | | | | | RHIGH > 126.24: 1 (7.0)
Number of Leaves : 96
Size of the tree : 185
Time taken to build model: 13.38 seconds
=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances        4368               94.1988 %
Incorrectly Classified Instances       269                5.8012 %
Kappa statistic                          0.6889
Mean absolute error                      0.0635
Root mean squared error                  0.2306
Relative absolute error                 33.6934 %
Root relative squared error             75.1416 %
Total Number of Instances             4637

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.969     0.287     0.966       0.969    0.968       0
0.713     0.031     0.73        0.713    0.721       1

=== Confusion Matrix ===

    a    b   <-- classified as
 4020  129 |    a = 0
  140  348 |    b = 1
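The per-class figures in the "Detailed Accuracy By Class" table follow directly from the confusion matrix. A small sketch for the pecan class (class 1), using the four counts from the matrix above; the function name `class_metrics` is just an illustration, not part of Weka.

```python
def class_metrics(tp, fn, fp, tn):
    """Return (recall, false-positive rate, precision, F-measure) for one class."""
    recall = tp / (tp + fn)       # Weka prints this as both "TP Rate" and "Recall"
    fp_rate = fp / (fp + tn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    return recall, fp_rate, precision, f_measure


# Rows are the actual class, columns the predicted class:
#   4020  129 | a = 0
#    140  348 | b = 1
pecan = class_metrics(tp=348, fn=140, fp=129, tn=4020)
print([round(x, 3) for x in pecan])  # [0.713, 0.031, 0.73, 0.721]
```

So the tree finds about 71% of the pecan stations, and about 73% of the stations it labels as pecan really are.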
Conclusion:
5 of 27
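94% accurate!!!
Kappa (0.69) is much lower than the accuracy because pecans are rare in the data set (488 of the 4637 stations): most of the agreement on class 0 would happen by chance, and kappa discounts that.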
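Kappa can be recomputed from the confusion matrix as (observed agreement - chance agreement) / (1 - chance agreement). A short sketch using the standard Cohen's kappa formula and the cross-validation matrix from this run; the function name `cohen_kappa` is illustrative.

```python
def cohen_kappa(matrix):
    """matrix[i][j] = number of instances of actual class i predicted as class j."""
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(n_classes)) / total
    # Chance agreement: product of row total and column total for each class
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix)
        for i in range(n_classes)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


j48_cv = [[4020, 129],
          [140, 348]]
print(round(cohen_kappa(j48_cv), 4))  # 0.6889
```

With 4149 of 4637 stations in class 0, the chance agreement is already about 0.81, which is why 94% accuracy only gives a kappa near 0.69.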
Should be able to do this on the command line and get the classified instances (looked in the Weka tutorial)
in the weka directory
java -mx300m weka.classifiers.trees.J48 -C 0.25 -M 2 -t ../PecanData/pecans.arff -d ../PecanData/J48-classifier.model
doesn’t work from the command line
can’t find class weka/classifiers/trees/J48
hmmm... try putting weka.jar on the classpath (-cp weka.jar)
and also add in -i -k to get more info
java -cp weka.jar -mx300m weka.classifiers.trees.J48 -C 0.25 -M 2 -t ../PecanData/pecans.arff -i -k -d ../PecanData/J48-classifier.model
worked!
Time taken to build model: 12.72 seconds
Time taken to test model on training data: 0.1 seconds
=== Error on training data ===

Correctly Classified Instances        4587               98.9217 %
Incorrectly Classified Instances        50                1.0783 %
Kappa statistic                          0.9427
K&B Relative Info Score             412419.0911 %
K&B Information Score                 2004.0102 bits      0.4322 bits/instance
Class complexity | order 0            2250.7576 bits      0.4854 bits/instance
Class complexity | scheme              277.1448 bits      0.0598 bits/instance
Complexity improvement     (Sf)       1973.6128 bits      0.4256 bits/instance
Mean absolute error                      0.019
Root mean squared error                  0.0974
Relative absolute error                 10.0721 %
Root relative squared error             31.7479 %
Total Number of Instances             4637

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.994     0.051     0.994       0.994    0.994       0
0.949     0.006     0.949       0.949    0.949       1
6 of 27
=== Confusion Matrix ===

    a    b   <-- classified as
 4124   25 |    a = 0
   25  463 |    b = 1

=== Stratified cross-validation ===

Correctly Classified Instances        4373               94.3067 %
Incorrectly Classified Instances       264                5.6933 %
Kappa statistic                          0.6949
K&B Relative Info Score             268786.582  %
K&B Information Score                 1305.6899 bits      0.2816 bits/instance
Class complexity | order 0            2250.7629 bits      0.4854 bits/instance
Class complexity | scheme           131711.8722 bits     28.4045 bits/instance
Complexity improvement     (Sf)    -129461.1092 bits    -27.9192 bits/instance
Mean absolute error                      0.0629
Root mean squared error                  0.2301
Relative absolute error                 33.3937 %
Root relative squared error             74.9854 %
Total Number of Instances             4637

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.969     0.281     0.967       0.969    0.968       0
0.719     0.031     0.734       0.719    0.727       1

=== Confusion Matrix ===

    a    b   <-- classified as
 4022  127 |    a = 0
  137  351 |    b = 1
looks good!
now have classifier J48-classifier.model
7 of 27
try to get it to classify the data
labuser% java -cp weka.jar weka.classifiers.trees.J48 -l ../PecanData/J48-classifier.model -T ../PecanData/pecans.arff -p 1
works and gives data lines like
4633 0 0.9970313825275657 0 (4634)
the values are
    1. the instance number (0-indexed)
    2. the predicted value
    3. the confidence in the prediction
    4. the actual value
    5. (the first attribute): in this case, the station ID
ran to put results into J48-output.txt
opened in Excel and made J48output.xls
needed a fix: since the station ID comes in parentheses, e.g. (1), Excel enters it as a negative number!
multiplied by -1 and copied values
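The negative-number problem can be avoided by stripping the parentheses before the file reaches the spreadsheet. A minimal sketch, assuming every prediction line has exactly the five whitespace-separated fields shown in the sample line above and that the station ID is an integer; `parse_prediction` is a hypothetical helper name.

```python
def parse_prediction(line):
    """Split one line of the '-p 1' output into typed fields."""
    instance, predicted, confidence, actual, station = line.split()
    return {
        "instance": int(instance),
        "predicted": predicted,
        "confidence": float(confidence),
        "actual": actual,
        # The first attribute is printed in parentheses, e.g. "(4634)".
        # Removing them here keeps a spreadsheet from reading it as -4634.
        "station_id": int(station.strip("()")),
    }


row = parse_prediction("4633 0 0.9970313825275657 0 (4634)")
print(row["station_id"], row["predicted"], row["confidence"])
```

Writing the parsed rows back out as CSV would give a file that opens cleanly without the multiply-by-minus-one step.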
8 of 27
Tried IB1 (lazy single nearest neighbor): took about 20 mins

=== Run information ===

Scheme:       weka.classifiers.lazy.IB1
Relation:     PresenceOfPecans
Instances:    4637
Attributes:   104
              [list of attributes omitted]
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

IB1 classifier

Time taken to build model: 0.16 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances        4392               94.7164 %
Incorrectly Classified Instances       245                5.2836 %
Kappa statistic                          0.7212
Mean absolute error                      0.0528
Root mean squared error                  0.2299
Relative absolute error                 28.0327 %
Root relative squared error             74.9065 %
Total Number of Instances             4637

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.97      0.244     0.971       0.97     0.97        0
0.756     0.03      0.745       0.756    0.751       1

=== Confusion Matrix ===

    a    b   <-- classified as
 4023  126 |    a = 0
  119  369 |    b = 1
looks a little better
9 of 27
try K-nearest neighbors: K = 3 (3 nearest neighbors)

=== Run information ===

Scheme:       weka.classifiers.lazy.IBk -K 3 -W 0
Relation:     PresenceOfPecans
Instances:    4637
Attributes:   104
              [list of attributes omitted]
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

IB1 instance-based classifier
using 3 nearest neighbour(s) for classification

Time taken to build model: 0.08 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances        4415               95.2124 %
Incorrectly Classified Instances       222                4.7876 %
Kappa statistic                          0.7449
Mean absolute error                      0.0602
Root mean squared error                  0.1951
Relative absolute error                 31.9603 %
Root relative squared error             63.5862 %
Total Number of Instances             4637

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.974     0.232     0.973       0.974    0.973       0
0.768     0.026     0.775       0.768    0.772       1

=== Confusion Matrix ===

    a    b   <-- classified as
 4040  109 |    a = 0
  113  375 |    b = 1
slightly better still
10 of 27
It might be worth trying a “reduced error pruned tree”
it is supposed to make smaller trees
see if it is better.
runs in less than 10 mins!!

=== Run information ===

Scheme:       weka.classifiers.trees.J48 -R -N 3 -Q 1 -M 2
Relation:     PresenceOfPecans
Instances:    4637
Attributes:   104
              [list of attributes omitted]
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

J48 pruned tree
------------------
11 of 27
MWM <= 24.5: 0 (1662.0/10.0)
MWM > 24.5
| RLOW <= 20.57: 0 (434.0/2.0)
| RLOW > 20.57
| | LPTOAE <= 0.0575
| | | TRANGE <= 26.33: 0 (437.0/27.0)
| | | TRANGE > 26.33
| | | | EXPREY <= 540.4335
| | | | | MCM <= -3.5: 0 (13.0/2.0)
| | | | | MCM > -3.5
| | | | | | BIO5 <= 25632.9578: 1 (25.0/2.0)
| | | | | | BIO5 > 25632.9578: 0 (12.0/3.0)
| | | | EXPREY > 540.4335: 0 (32.0/4.0)
| | LPTOAE > 0.0575
| | | LCOKLM <= 1.9957: 0 (41.0)
| | | LCOKLM > 1.9957
| | | | WATDGRC <= 3
| | | | | LPTOAE <= 0.0751
| | | | | | RRCORR <= -3.5: 0 (32.0/3.0)
| | | | | | RRCORR > -3.5
| | | | | | | ELEV <= 831
| | | | | | | | LRRANGE <= 1.6697: 0 (6.0)
| | | | | | | | LRRANGE > 1.6697
| | | | | | | | | RRCORR3 <= 9
| | | | | | | | | | ELEV <= 413: 1 (23.0)
| | | | | | | | | | ELEV > 413
| | | | | | | | | | | CLIM <= 3
| | | | | | | | | | | | PTORUN <= 1.8019: 0 (18.0/8.0)
| | | | | | | | | | | | PTORUN > 1.8019: 1 (12.0)
| | | | | | | | | | | CLIM > 3: 0 (2.0)
| | | | | | | | | RRCORR3 > 9
| | | | | | | | | | WRET <= 104.2606: 0 (8.0)
| | | | | | | | | | WRET > 104.2606: 1 (4.0)
| | | | | | | ELEV > 831: 0 (11.0)
| | | | | LPTOAE > 0.0751
| | | | | | LCRR <= 2.922
| | | | | | | RRCORR3 <= 3
| | | | | | | | PERWRET <= 0.6364: 0 (17.0/4.0)
| | | | | | | | PERWRET > 0.6364
| | | | | | | | | COKLM <= 693: 1 (4.0)
| | | | | | | | | COKLM > 693: 0 (4.0/2.0)
| | | | | | | RRCORR3 > 3: 0 (3.0)
| | | | | | LCRR > 2.922: 1 (218.0/38.0)
| | | | WATDGRC > 3
| | | | | LET <= 1.1957
| | | | | | PERWDEF <= 0.5: 0 (40.0)
| | | | | | PERWDEF > 0.5
| | | | | | | RLOW <= 26.67: 0 (13.0/1.0)
| | | | | | | RLOW > 26.67: 1 (3.0)
| | | | | LET > 1.1957
| | | | | | AVWAT <= 4: 0 (11.0/4.0)
| | | | | | AVWAT > 4: 1 (7.0)
Number of Leaves : 27
Size of the tree : 53
Time taken to build model: 7.96 seconds
12 of 27
=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances        4348               93.7675 %
Incorrectly Classified Instances       289                6.2325 %
Kappa statistic                          0.6524
Mean absolute error                      0.0729
Root mean squared error                  0.2247
Relative absolute error                 38.6837 %
Root relative squared error             73.2403 %
Total Number of Instances             4637

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.972     0.35      0.959       0.972    0.965       0
0.65      0.028     0.729       0.65     0.687       1

=== Confusion Matrix ===

    a    b   <-- classified as
 4031  118 |    a = 0
  171  317 |    b = 1
not quite as good as the full tree but it is very fast
try other rule-generating things because they give interpretable output
try JRip
ran in 15 mins

=== Run information ===

Scheme:       weka.classifiers.rules.JRip -F 3 -N 2.0 -O 2 -S 1
Relation:     PresenceOfPecans
Instances:    4637
Attributes:   104
              [list of attributes omitted]
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

JRIP rules:
===========
(MWM >= 26.5) and (BAR5 >= 14.6915) and (PTOAE >= 1.1925) and (ELEV <= 300) => Pecan=1 (82.0/4.0)
(AE >= 652.4943) and (PTOAE >= 1.1295) and (WATDGRC <= 3) and (WRET >= 104.8334) and (ELEV <= 625) => Pecan=1 (72.0/4.0)
(MWM >= 24.6) and (CVRAIN <= 44.3185) and (WSTORAGE >= 181.796) and (ELEV <= 1030) => Pecan=1 (165.0/50.0)
(MWM >= 24.3) and (TRANGE >= 24.7) and (RLOW >= 25.91) and (PTOWATR >= 10.8738) and (Site <= 1517) => Pecan=1 (51.0/4.0)
13 of 27
(AE >= 622.0895) and (COKLM >= 506.9) and (EXPREY <= 520.5728) and (PTOWATR >= 8.7045) => Pecan=1 (59.0/13.0)
(MWM >= 24.8) and (TRANGE >= 24.7) and (RLOW >= 25.91) and (RLOW <= 46.74) => Pecan=1 (52.0/24.0)
(MWM >= 27.22) and (RLOW >= 71.88) and (EXPREY <= 439.1472) and (WRET >= 102.7854) and (TEMP <= 56.0959) => Pecan=1 (15.0/1.0)
(MWM >= 27.44) and (CVRAIN <= 34.6388) and (WSTORAGE <= 161.2) => Pecan=1 (77.0/37.0)
 => Pecan=0 (4064.0/52.0)
Number of Rules : 9
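The JRip output is an ordered rule list: a station is labelled Pecan=1 by the first rule whose conditions all hold, and falls through to the default Pecan=0 otherwise. A minimal sketch of that evaluation, with only the first and the eighth rule copied from the list above (the other six Pecan=1 rules are left out to keep it short). The station values in the example are invented.

```python
# Each rule is a list of (attribute, operator, threshold) tests; all must hold.
# Only two of the eight Pecan=1 rules are reproduced here.
RULES = [
    [("MWM", ">=", 26.5), ("BAR5", ">=", 14.6915),
     ("PTOAE", ">=", 1.1925), ("ELEV", "<=", 300)],
    [("MWM", ">=", 27.44), ("CVRAIN", "<=", 34.6388),
     ("WSTORAGE", "<=", 161.2)],
]
DEFAULT_CLASS = 0


def rule_fires(rule, station):
    """True if the station satisfies every test in the rule."""
    for attribute, op, threshold in rule:
        value = station[attribute]
        if op == ">=" and not value >= threshold:
            return False
        if op == "<=" and not value <= threshold:
            return False
    return True


def classify(station, rules=RULES, default=DEFAULT_CLASS):
    """Return 1 if any rule fires (rules are tried in order), else the default."""
    for rule in rules:
        if rule_fires(rule, station):
            return 1
    return default


# Invented station values, for illustration only
warm_low = {"MWM": 27.0, "BAR5": 15.0, "PTOAE": 1.2, "ELEV": 250,
            "CVRAIN": 40.0, "WSTORAGE": 170.0}
cool = dict(warm_low, MWM=22.0)
print(classify(warm_low), classify(cool))
```

The pair of numbers after each rule, such as (82.0/4.0), is the number of training stations the rule covers and how many of those it gets wrong.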
Time taken to build model: 69.58 seconds
=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances        4394               94.7595 %
Incorrectly Classified Instances       243                5.2405 %
Kappa statistic                          0.7153
Mean absolute error                      0.0744
Root mean squared error                  0.2155
Relative absolute error                 39.4775 %
Root relative squared error             70.2129 %
Total Number of Instances             4637

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.974     0.275     0.968       0.974    0.971       0
0.725     0.026     0.765       0.725    0.744       1

=== Confusion Matrix ===

    a    b   <-- classified as
 4040  109 |    a = 0
  134  354 |    b = 1

about as good as the J48.
14 of 27
for comparison purposes, do the “null model” = ZeroR (always pick the majority class)
=== Classifier model (full training set) ===
ZeroR predicts class value: 0
Time taken to build model: 0.02 seconds
=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances        4149               89.476  %
Incorrectly Classified Instances       488               10.524  %
Kappa statistic                          0
Mean absolute error                      0.1885
Root mean squared error                  0.3069
Relative absolute error                100      %
Root relative squared error            100      %
Total Number of Instances             4637

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
1         1         0.895       1        0.944       0
0         0         0           0        0           1

=== Confusion Matrix ===

    a    b   <-- classified as
 4149    0 |    a = 0
  488    0 |    b = 1

only 89% agreement, so the others are an improvement
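The ZeroR figure is just the share of the majority class, so it can be checked from the class counts alone. A small sketch; `zero_r` is an illustrative name and the label list is rebuilt from the counts in the confusion matrix.

```python
from collections import Counter


def zero_r(labels):
    """Return (majority_class, accuracy of always predicting that class)."""
    counts = Counter(labels)
    majority, n_majority = counts.most_common(1)[0]
    return majority, n_majority / len(labels)


# 4149 stations without pecans, 488 with pecans
labels = [0] * 4149 + [1] * 488
majority, accuracy = zero_r(labels)
print(majority, round(100 * accuracy, 3))  # 0 89.476
```

Any classifier has to beat this 89.476% to be doing better than ignoring the attributes entirely, which is why kappa (0 for ZeroR) is the more telling column.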
Get some scored data sets for mapping
1) “J48 reduced” = the one from page 11 – using “reduced error pruning”
java -cp weka.jar -mx300m weka.classifiers.trees.J48 -R -N 3 -Q 1 -M 2 -t ../PecanData/pecans.arff -i -k -d ../PecanData/J48-reduced-classifier.model
got this result
Options: -R -N 3 -Q 1 -M 2
15 of 27
J48 pruned tree
------------------

MWM <= 24.5: 0 (1662.0/10.0)
MWM > 24.5
| RLOW <= 20.57: 0 (434.0/2.0)
| RLOW > 20.57
| | LPTOAE <= 0.0575
| | | TRANGE <= 26.33: 0 (437.0/27.0)
| | | TRANGE > 26.33
| | | | EXPREY <= 540.4335
| | | | | MCM <= -3.5: 0 (13.0/2.0)
| | | | | MCM > -3.5
| | | | | | BIO5 <= 25632.9578: 1 (25.0/2.0)
| | | | | | BIO5 > 25632.9578: 0 (12.0/3.0)
| | | | EXPREY > 540.4335: 0 (32.0/4.0)
| | LPTOAE > 0.0575
| | | LCOKLM <= 1.9957: 0 (41.0)
| | | LCOKLM > 1.9957
| | | | WATDGRC <= 3
| | | | | LPTOAE <= 0.0751
| | | | | | RRCORR <= -3.5: 0 (32.0/3.0)
| | | | | | RRCORR > -3.5
| | | | | | | ELEV <= 831
| | | | | | | | LRRANGE <= 1.6697: 0 (6.0)
| | | | | | | | LRRANGE > 1.6697
| | | | | | | | | RRCORR3 <= 9
| | | | | | | | | | ELEV <= 413: 1 (23.0)
| | | | | | | | | | ELEV > 413
| | | | | | | | | | | CLIM <= 3
| | | | | | | | | | | | PTORUN <= 1.8019: 0 (18.0/8.0)
| | | | | | | | | | | | PTORUN > 1.8019: 1 (12.0)
| | | | | | | | | | | CLIM > 3: 0 (2.0)
| | | | | | | | | RRCORR3 > 9
| | | | | | | | | | WRET <= 104.2606: 0 (8.0)
| | | | | | | | | | WRET > 104.2606: 1 (4.0)
| | | | | | | ELEV > 831: 0 (11.0)
| | | | | LPTOAE > 0.0751
| | | | | | LCRR <= 2.922
| | | | | | | RRCORR3 <= 3
| | | | | | | | PERWRET <= 0.6364: 0 (17.0/4.0)
| | | | | | | | PERWRET > 0.6364
| | | | | | | | | COKLM <= 693: 1 (4.0)
| | | | | | | | | COKLM > 693: 0 (4.0/2.0)
| | | | | | | RRCORR3 > 3: 0 (3.0)
| | | | | | LCRR > 2.922: 1 (218.0/38.0)
| | | | WATDGRC > 3
| | | | | LET <= 1.1957
| | | | | | PERWDEF <= 0.5: 0 (40.0)
| | | | | | PERWDEF > 0.5
| | | | | | | RLOW <= 26.67: 0 (13.0/1.0)
| | | | | | | RLOW > 26.67: 1 (3.0)
| | | | | LET > 1.1957
| | | | | | AVWAT <= 4: 0 (11.0/4.0)
| | | | | | AVWAT > 4: 1 (7.0)
Number of Leaves : 27
Size of the tree : 53
16 of 27
Time taken to build model: 6.6 seconds
Time taken to test model on training data: 0.11 seconds

=== Error on training data ===

Correctly Classified Instances        4453               96.0319 %
Incorrectly Classified Instances       184                3.9681 %
Kappa statistic                          0.781
K&B Relative Info Score             287357.2097 %
K&B Information Score                 1396.3146 bits      0.3011 bits/instance
Class complexity | order 0            2250.7576 bits      0.4854 bits/instance
Class complexity | scheme            19037.9527 bits      4.1057 bits/instance
Complexity improvement     (Sf)     -16787.1951 bits     -3.6203 bits/instance
Mean absolute error                      0.0644
Root mean squared error                  0.1852
Relative absolute error                 34.1766 %
Root relative squared error             60.3369 %
Total Number of Instances             4637

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.983     0.232     0.973       0.983    0.978       0
0.768     0.017     0.841       0.768    0.803       1

=== Confusion Matrix ===

    a    b   <-- classified as
 4078   71 |    a = 0
  113  375 |    b = 1

=== Stratified cross-validation ===

Correctly Classified Instances        4348               93.7675 %
Incorrectly Classified Instances       289                6.2325 %
Kappa statistic                          0.6524
K&B Relative Info Score             233687.7747 %
K&B Information Score                 1135.1897 bits      0.2448 bits/instance
Class complexity | order 0            2250.7629 bits      0.4854 bits/instance
Class complexity | scheme            83502.5671 bits     18.0079 bits/instance
Complexity improvement     (Sf)     -81251.8041 bits    -17.5225 bits/instance
Mean absolute error                      0.0729
Root mean squared error                  0.2247
Relative absolute error                 38.6837 %
Root relative squared error             73.2403 %
Total Number of Instances             4637

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.972     0.35      0.959       0.972    0.965       0
0.65      0.028     0.729       0.65     0.687       1
17 of 27
=== Confusion Matrix ===

    a    b   <-- classified as
 4031  118 |    a = 0
  171  317 |    b = 1
looks the same as when run from explorer – good!
now, classify the pecan data
java -cp weka.jar weka.classifiers.trees.J48 -l ../PecanData/J48-reduced-classifier.model -T ../PecanData/pecans.arff -p 1 > ../PecanData/J48-reduced-output.txt
open in excel & fix to make J48-reduced-output.xls
2) Do this for the JRip from page 13 as well
java -cp weka.jar -mx300m weka.classifiers.rules.JRip -F 3 -N 2.0 -O 2 -S 1 -t ../PecanData/pecans.arff -i -k -d ../PecanData/JRip-classifier.model
it gave this output:
Options: -F 3 -N 2.0 -O 2 -S 1
JRIP rules:
===========

(MWM >= 26.5) and (BAR5 >= 14.6915) and (PTOAE >= 1.1925) and (ELEV <= 300) => Pecan=1 (82.0/4.0)
(AE >= 652.4943) and (PTOAE >= 1.1295) and (WATDGRC <= 3) and (WRET >= 104.8334) and (ELEV <= 625) => Pecan=1 (72.0/4.0)
(MWM >= 24.6) and (CVRAIN <= 44.3185) and (WSTORAGE >= 181.796) and (ELEV <= 1030) => Pecan=1 (165.0/50.0)
(MWM >= 24.3) and (TRANGE >= 24.7) and (RLOW >= 25.91) and (PTOWATR >= 10.8738) and (Site <= 1517) => Pecan=1 (51.0/4.0)
(AE >= 622.0895) and (COKLM >= 506.9) and (EXPREY <= 520.5728) and (PTOWATR >= 8.7045) => Pecan=1 (59.0/13.0)
(MWM >= 24.8) and (TRANGE >= 24.7) and (RLOW >= 25.91) and (RLOW <= 46.74) => Pecan=1 (52.0/24.0)
(MWM >= 27.22) and (RLOW >= 71.88) and (EXPREY <= 439.1472) and (WRET >= 102.7854) and (TEMP <= 56.0959) => Pecan=1 (15.0/1.0)
(MWM >= 27.44) and (CVRAIN <= 34.6388) and (WSTORAGE <= 161.2) => Pecan=1 (77.0/37.0)
 => Pecan=0 (4064.0/52.0)

Number of Rules : 9

Time taken to build model: 62.44 seconds
Time taken to test model on training data: 0.19 seconds
18 of 27
=== Error on training data ===

Correctly Classified Instances        4448               95.9241 %
Incorrectly Classified Instances       189                4.0759 %
Kappa statistic                          0.799
K&B Relative Info Score             301337.6929 %
K&B Information Score                 1464.248  bits      0.3158 bits/instance
Class complexity | order 0            2250.7576 bits      0.4854 bits/instance
Class complexity | scheme              791.9992 bits      0.1708 bits/instance
Complexity improvement     (Sf)       1458.7584 bits      0.3146 bits/instance
Mean absolute error                      0.0607
Root mean squared error                  0.1742
Relative absolute error                 32.1921 %
Root relative squared error             56.7583 %
Total Number of Instances             4637

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.967     0.107     0.987       0.967    0.977       0
0.893     0.033     0.761       0.893    0.822       1

=== Confusion Matrix ===

    a    b   <-- classified as
 4012  137 |    a = 0
   52  436 |    b = 1

=== Stratified cross-validation ===

Correctly Classified Instances        4394               94.7595 %
Incorrectly Classified Instances       243                5.2405 %
Kappa statistic                          0.7153
K&B Relative Info Score             261538.1361 %
K&B Information Score                 1270.479  bits      0.274  bits/instance
Class complexity | order 0            2250.7629 bits      0.4854 bits/instance
Class complexity | scheme             5543.159  bits      1.1954 bits/instance
Complexity improvement     (Sf)      -3292.396  bits     -0.71   bits/instance
Mean absolute error                      0.0744
Root mean squared error                  0.2155
Relative absolute error                 39.4775 %
Root relative squared error             70.2129 %
Total Number of Instances             4637

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.974     0.275     0.968       0.974    0.971       0
0.725     0.026     0.765       0.725    0.744       1
19 of 27
=== Confusion Matrix ===

    a    b   <-- classified as
 4040  109 |    a = 0
  134  354 |    b = 1
now, classify the pecan data
java -cp weka.jar weka.classifiers.rules.JRip -l ../PecanData/JRip-classifier.model -T ../PecanData/pecans.arff -p 1 > ../PecanData/JRip-output.txt
set up in excel
Now, try doing the JRip with only the “raw variables”, not the derived ones.
This is since the tree and rule schemes seem to use derived variables mostly;
it will be interesting to see if and how it works with the raw ones.
The “raw” ones are:
    CMAT
    CRR
    MWM
    MCM
    RHIGH
    RLOW
    ELEV
    WSTORAGE
    COKLM

=== Run information ===

Scheme:       weka.classifiers.rules.JRip -F 3 -N 2.0 -O 2 -S 1
Relation:     PresenceOfPecans-weka.filters.unsupervised.attribute.Remove-R1-2,9,12-36,38-103
Instances:    4637
Attributes:   10
              ELEV
              COKLM
              MWM
              MCM
              RHIGH
              RLOW
              CMAT
              CRR
              WSTORAGE
              Pecan
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

JRIP rules:
===========
20 of 27
(MWM >= 25.3) and (RLOW >= 25.91) and (COKLM >= 220) and (MWM >= 26.94) and (CRR >= 909.56) and (MWM >= 27.6) => Pecan=1 (157.0/13.0)
(MWM >= 24.6) and (RLOW >= 25.91) and (COKLM >= 352) and (WSTORAGE <= 136.7894) and (ELEV <= 719) and (MCM <= 1.1) => Pecan=1 (73.0/6.0)
(MWM >= 25.9) and (RLOW >= 25.91) and (COKLM >= 340) and (MWM >= 26.94) and (ELEV <= 1030) => Pecan=1 (82.0/18.0)
(MWM >= 26.2) and (RLOW >= 26.42) and (WSTORAGE >= 188.644) and (RLOW >= 44.96) and (RHIGH >= 114.81) => Pecan=1 (27.0/2.0)
(MWM >= 24.6) and (RLOW >= 20.83) and (RLOW <= 45.47) and (MCM <= 3.56) and (RLOW >= 27.69) and (RLOW >= 41.66) => Pecan=1 (21.0/5.0)
(MWM >= 24.3) and (RLOW >= 23.62) and (CRR <= 1130.55) and (RHIGH >= 119.63) and (COKLM >= 113.75) and (ELEV <= 825) and (MCM >= -3) => Pecan=1 (24.0/5.0)
(MWM >= 26.5) and (RLOW >= 71.88) and (MWM >= 27.44) and (COKLM >= 15.9) and (ELEV <= 116) => Pecan=1 (51.0/17.0)
(MWM >= 24.2) and (RLOW >= 20.57) and (CRR <= 1097.05) and (COKLM >= 139.74) and (ELEV <= 549) and (RLOW <= 45.21) => Pecan=1 (15.0/3.0)
(MWM >= 24.1) and (WSTORAGE >= 188.644) and (RLOW >= 26.42) and (COKLM >= 563.3) and (COKLM <= 716.5) and (MCM <= 3.89) and (MWM >= 26.2) => Pecan=1 (14.0/1.0)
 => Pecan=0 (4173.0/94.0)
Number of Rules : 10
Time taken to build model: 15.6 seconds
=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances        4366               94.1557 %
Incorrectly Classified Instances       271                5.8443 %
Kappa statistic                          0.6883
Mean absolute error                      0.0815
Root mean squared error                  0.2216
Relative absolute error                 43.2606 %
Root relative squared error             72.2221 %
Total Number of Instances             4637

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.968     0.283     0.967       0.968    0.967       0
0.717     0.032     0.725       0.717    0.721       1

=== Confusion Matrix ===

    a    b   <-- classified as
 4016  133 |    a = 0
  138  350 |    b = 1
almost as good!!
21 of 27
so, set up a classified data set for this:
need to get a data set with just the raw attributes
    edited data.csv to data-raw.csv
    pasted into pecans.arff to make pecans-raw.arff
ran with explorer and JRip as before.
printed output to be sure it’s the same
=== Run information ===
Scheme:       weka.classifiers.rules.JRip -F 3 -N 2.0 -O 2 -S 1
Relation:     PresenceOfPecans
Instances:    4637
Attributes:   11
              Site
              ELEV
              COKLM
              MWM
              MCM
              RHIGH
              RLOW
              CMAT
              CRR
              WSTORAGE
              Pecan
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

JRIP rules:
===========

(MWM >= 25.3) and (RLOW >= 25.91) and (COKLM >= 272) and (MWM >= 27) and (ELEV <= 660) => Pecan=1 (175.0/23.0)
(MWM >= 25.1) and (RLOW >= 25.91) and (WSTORAGE >= 188.644) and (RLOW >= 40.89) => Pecan=1 (76.0/11.0)
(MWM >= 24.6) and (RLOW >= 28.45) and (COKLM >= 352) and (CRR <= 1263.13) and (ELEV <= 830) and (ELEV <= 605) => Pecan=1 (74.0/12.0)
(MWM >= 26.17) and (RLOW >= 25.91) and (COKLM >= 507) and (Site <= 3496) and (MWM >= 26.94) and (COKLM <= 693) => Pecan=1 (30.0/3.0)
(MWM >= 24.3) and (RLOW >= 20.57) and (MCM <= 0.94) and (RHIGH >= 128.52) and (WSTORAGE <= 136.7894) and (ELEV <= 690) => Pecan=1 (11.0/1.0)
(MWM >= 24.4) and (RLOW >= 20.57) and (CRR <= 1107.68) and (COKLM >= 117.12) and (ELEV <= 1025) and (ELEV <= 507) and (RLOW <= 54.1) => Pecan=1 (12.0/0.0)
(MWM >= 24.3) and (RLOW >= 20.57) and (MWM >= 27.44) and (RLOW >= 71.88) and (RLOW >= 91.95) and (RHIGH <= 177.8) => Pecan=1 (27.0/7.0)
(MWM >= 24.4) and (RLOW >= 20.57) and (CRR <= 1107.44) and (Site <= 3159) and (RHIGH >= 128.02) and (ELEV <= 1050) and (MCM >= -3.3) => Pecan=1 (39.0/13.0)
(MWM >= 24) and (RLOW >= 23.62) and (WSTORAGE <= 161.2) and (RLOW >= 72.9) and (MWM >= 27.44) and (RLOW <= 81.53) => Pecan=1 (20.0/5.0)
 => Pecan=0 (4173.0/99.0)
Number of Rules : 10
Time taken to build model: 22 seconds
22 of 27
=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances        4384               94.5439 %
Incorrectly Classified Instances       253                5.4561 %
Kappa statistic                          0.7126
Mean absolute error                      0.0774
Root mean squared error                  0.2146
Relative absolute error                 41.0487 %
Root relative squared error             69.9221 %
Total Number of Instances             4637

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.968     0.25      0.971       0.968    0.969       0
0.75      0.032     0.736       0.75     0.743       1

=== Confusion Matrix ===

    a    b   <-- classified as
 4018  131 |    a = 0
  122  366 |    b = 1

looks not exactly the same but OK.
(this data set still has Site in it, 11 attributes instead of 10, and two of the rules now test Site)
make a classifier
java -cp weka.jar -mx300m weka.classifiers.rules.JRip -F 3 -N 2.0 -O 2 -S 1 -t ../PecanData/pecans-raw.arff -i -k -d ../PecanData/raw-JRip-classifier.model
same output as above
classify the raw data
java -cp weka.jar weka.classifiers.rules.JRip -l ../PecanData/raw-JRip-classifier.model -T ../PecanData/pecans-raw.arff -p 1 > ../PecanData/raw-JRip-output.txt
made raw-JRip-output.xls
to get towards max kappa, try the lazy nearest neighbor one & run it to find the best K
use the full dataset
in Explorer: IBk
    crossValidate = true
    KNN = 6
this will evaluate, by cross-validation, 1 to 6 nearest neighbors
let it run overnight
it came up with K = 3 as the best one (same as before)
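The idea behind that search can be sketched in a few lines: for each K, predict every point from its K nearest neighbours among the other points and keep the K with the highest accuracy. This is a rough illustration using leave-one-out evaluation and plain Euclidean distance on nine made-up 1-D points; it is an assumption about the general approach, not a reproduction of what Weka does internally, and the function names are hypothetical.

```python
from collections import Counter


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_accuracy(points, labels, k):
    """Leave-one-out accuracy of k-nearest-neighbour majority voting."""
    correct = 0
    for i, query in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: squared_distance(query, points[j]))
        votes = Counter(labels[j] for j in others[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(points)


def best_k(points, labels, max_k):
    """Return (k, scores): the k in 1..max_k with the highest leave-one-out
    accuracy (the smallest such k on ties) and the accuracy for every k."""
    scores = {k: loo_accuracy(points, labels, k) for k in range(1, max_k + 1)}
    return max(scores, key=lambda k: (scores[k], -k)), scores


# Made-up data: two clusters plus one class-1 point sitting inside the
# class-0 cluster, which hurts K = 1 more than K = 3.
points = [(0,), (1,), (2,), (3,), (1.5,), (10,), (11,), (12,), (13,)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
k, scores = best_k(points, labels, max_k=3)
print(k, scores)
```

On this toy set a single odd point misleads its two closest neighbours when K = 1, while a vote among three neighbours overrides it, which is the usual reason a small K above 1 wins.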
23 of 27
last data point: do OneR to see what it’d be & the improvement in Kappa
pecans.arff
it used Site...
so exclude that one

=== Run information ===

Scheme:       weka.classifiers.rules.OneR -B 6
Relation:     PresenceOfPecans-weka.filters.unsupervised.attribute.Remove-R1
Instances:    4637
Attributes:   103
              [list of attributes omitted]
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

LSSTAB2:
    < -1.71435 -> 0
    < -1.71225 -> 1
    < -1.70995 -> 0
    < -1.7081499999999998 -> 1
    < -1.70755 -> 0
    < -1.7056 -> 1
    < -1.7053 -> 0
    < -1.7036 -> 1
    < -1.7032 -> 0
    < -1.7015 -> 1
    < -1.69895 -> 0
    < -1.6969 -> 1
    < -1.69665 -> 0
    < -1.6948 -> 1
    < -1.69455 -> 0
    < -1.6931500000000002 -> 1
    < -1.69245 -> 0
    < -1.6905000000000001 -> 1
    < -1.68595 -> 0
    < -1.68435 -> 1
    < -0.04535 -> 0
    < 0.10985 -> 1
    < 0.1921 -> 0
    < 0.20995 -> 1
    < 0.5104 -> 0
    < 0.52035 -> 1
    >= 0.52035 -> 0
(4227/4637 instances correct)
Time taken to build model: 1.4 seconds
24 of 27
=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances        4179               90.1229 %
Incorrectly Classified Instances       458                9.8771 %
Kappa statistic                          0.2328
Mean absolute error                      0.0988
Root mean squared error                  0.3143
Relative absolute error                 52.4041 %
Root relative squared error            102.4163 %
Total Number of Instances             4637

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.987     0.828     0.91        0.987    0.947       0
0.172     0.013     0.609       0.172    0.268       1

=== Confusion Matrix ===

    a    b   <-- classified as
 4095   54 |    a = 0
  404   84 |    b = 1
not very good.
try OneR on the raw data only
=== Run information ===
Scheme:       weka.classifiers.rules.OneR -B 6
Relation:     PresenceOfPecans-weka.filters.unsupervised.attribute.Remove-R1
Instances:    4637
Attributes:   10
              ELEV
              COKLM
              MWM
              MCM
              RHIGH
              RLOW
              CMAT
              CRR
              WSTORAGE
              Pecan
Test mode:    10-fold cross-validation
25 of 27
=== Classifier model (full training set) ===

MWM:
    < 27.314999999999998 -> 0
    < 27.395 -> 1
    < 27.53 -> 0
    < 27.58 -> 1
    < 27.71 -> 0
    < 27.79 -> 1
    < 27.92 -> 0
    < 27.97 -> 1
    < 28.105 -> 0
    < 28.185000000000002 -> 1
    < 28.21 -> 0
    < 28.25 -> 1
    < 28.395 -> 0
    < 28.47 -> 1
    < 28.64 -> 0
    < 28.685000000000002 -> 1
    < 29.314999999999998 -> 0
    < 29.47 -> 1
    >= 29.47 -> 0
(4202/4637 instances correct)
Time taken to build model: 0.13 seconds
=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances        4164               89.7994 %
Incorrectly Classified Instances       473               10.2006 %
Kappa statistic                          0.2739
Mean absolute error                      0.102
Root mean squared error                  0.3194
Relative absolute error                 54.1204 %
Root relative squared error            104.0799 %
Total Number of Instances             4637

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.977     0.773     0.915       0.977    0.945       0
0.227     0.023     0.536       0.227    0.319       1

=== Confusion Matrix ===

    a    b   <-- classified as
 4053   96 |    a = 0
  377  111 |    b = 1
it used mean of the warmest month
26 of 27
Summary

                              Model (training data)     10x cross-validation
Method                        % correct    Kappa        % correct    Kappa

ZeroR                                                     89.48      0.000
OneR                                                      90.12      0.233

J48 - small tree                96.03      0.781          93.77      0.652
J48 - large tree                98.92      0.943          94.31      0.695

JRip - raw attributes only                                94.54      0.713
IB1                                                       94.72      0.721
JRip                            95.92      0.799          94.76      0.715
IB3 (IBk, K = 3)                                          95.21      0.745
did analysis of world data
put in “World” folder inside “PecanData”
used WrldRunPecan.csv as source file
    did not have a last (pecan) column
    used pecans.arff for the header (deleted the “@attribute pecan” line)
saved as world.arff
opened in Explorer to check syntax: looks OK
run with JRip classifier from page 20
java -cp weka.jar weka.classifiers.rules.JRip -l ../PecanData/JRip-classifier.model -T ../PecanData/World/world.arff -p 1 > ../PecanData/World/world-JRip-output.txt
did something strange....didn’t make predictions???
try putting in a dummy “Pecans” attribute
    edit WrldRunPecan.csv
    re-make world.arff
try again: worked A-OK
saved as world-JRip-output.xls
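The dummy column can be added without opening the spreadsheet. A minimal sketch, assuming the CSV has a column-name row and that a constant placeholder value of 0 is acceptable (the saved model ignores it when predicting, but it will show up as the "actual" value in the output). The function name, the default column name and the sample rows are hypothetical.

```python
import csv
import io


def add_dummy_class(csv_text, column_name="Pecan", dummy_value="0"):
    """Append a placeholder class column to every row of a CSV that starts
    with a column-name row."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    is_header = True
    for row in reader:
        if not row:
            continue  # skip blank lines
        writer.writerow(row + [column_name if is_header else dummy_value])
        is_header = False
    return out.getvalue()


# Made-up rows, not the real world data
print(add_dummy_class("Site,ELEV\n1,300\n2,45\n"))
```

ARFF also allows "?" for a missing value, so using "?" as the dummy value may be an alternative; that was not tried in these notes.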
27 of 27