Date post: | 22-Feb-2017 |
Upload: | durgesh-kumar |
Indian Institute of Technology Kanpur
2nd Semester: 2015-2016
REGRESSION ANALYSIS PROJECT
Analysis of Rice production in 2011-12 in India
Submitted to:
Dr. Sharmishtha Mitra
Submitted by:
1) P.CHARAN SANTOSH KUMAR(12462)
2) DURGESH KUMAR(13268)
3) NIHAL KASHYAP(151100)
DATA SOURCES:
The following sources were used to study rice production in India:
1) http://drdpat.bih.nic.in/Downloads/Statewise-APY-of-Rice-2006-07-to-2010-11.pdf (yield of rice per hectare)
2) http://www.imdagrimet.gov.in/temperature_monthly_archive (mean temperature)
3) http://www.rainwaterharvesting.org/urban/rainfall.htm (average rainfall in all states)
4) http://www.merinews.com/campaign/water-scarcity/annual-rainfall.jsp
5) https://www.currentresults.com/Weather/India/average-yearly-precipitation.php (average weather changes in India)
6) http://indianfertilizer.com/statistics (consumption of fertilizers)
7) http://indianfertilizer.com/frontend/statistics/contentFileView?page=statswr
2012-13 (fertilizer production in India)
Acknowledgements:
We would like to thank our course instructor, Dr. Sharmishtha Mitra, for
providing constant guidance and motivation for this project, without which it
would have been an impossible task to accomplish.
Next, we would like to thank the Directorate of Rice Development, Government of
India, for providing the data online.
Last but not least, we appreciate each other's contribution and hard work towards the
completion of the project.
ABSTRACT
In this project we predict the yield of rice (in kg per hectare) in India
from the following conditions: rainfall, fertilizers, soil pH, humidity,
temperature, wind speed, insecticides and seed quality. We use a multiple
linear regression model to predict the yield of rice (in kg per hectare)
in a season of any year.
The columns in the data set have been coded in the following way:
1) yield.per.hect
2) rainfall.mm.
3) fert.kg.per.hect.
4) temp.celcius.
5) windspeed
6) ph level of soil
7) insecticides
8) seed
9) humidity
The analysis was carried out in R.
Soil pH, insecticides and seed quality are coded as dummy variables as
follows:
pH value
1 when 5.5<pH<6.9
0 when pH<5.5
-1 when pH>6.9
Insecticides
1 when only conventional pesticides are used in both the ripening stage and the
initial stage.
0 when conventional pesticides are used in the ripening stage and organic
pesticides in the initial stage.
-1 when only organic pesticides are used.
Rice seed quality
1 when small grain rice.
0 when small sweet grain rice.
-1 when medium grain rice.
-2 when long grain rice.
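The pH coding above can be sketched in R as follows. The helper name `code_ph` and the sample readings are hypothetical, and boundary values (exactly 5.5 or 6.9), which the coding does not assign, are left as NA here. The last lines show an alternative, not what this project did: treating the codes as an unordered factor, so that a model estimates a separate effect for each level instead of assuming equal steps between codes.

```r
# Hypothetical helper reproducing the project's pH coding.
code_ph <- function(ph) {
  out <- rep(NA_real_, length(ph))   # boundary values stay NA
  out[ph > 5.5 & ph < 6.9] <- 1
  out[ph < 5.5] <- 0
  out[ph > 6.9] <- -1
  out
}

ph_raw  <- c(5.0, 6.2, 7.4, 6.8)     # made-up soil pH readings
ph_code <- code_ph(ph_raw)
print(ph_code)

# Alternative (an assumption, not the report's approach): unordered categories.
ph_factor <- factor(ph_code, levels = c(-1, 0, 1),
                    labels = c("pH > 6.9", "pH < 5.5", "5.5 < pH < 6.9"))
print(table(ph_factor))
```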
MLR Model
The dataset used for the MLR model contains only 28 observations, with
8 regressors initially.
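For reference, a multiple linear regression model with 8 regressors can be written as below. The symbols are standard textbook notation rather than the report's own: y_i is the yield per hectare of observation i, x_i1 to x_i8 are the eight regressors, and the normality of the errors is the usual assumption behind the t- and F-tests.

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_8 x_{i8} + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2) \ \text{independently}
```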
Step 1: Fitting the model
> a=lm(yield.per.hect.~rainfall.mm.+fert.kg.per.hect.+temp.celcius.+windspeed+ph+insecticides+seed+humid)
> summary(a)
Call:
lm(formula = yield.per.hect. ~ rainfall.mm. + fert.kg.per.hect. +
    temp.celcius. + windspeed + ph + insecticides + seed + humid)

Residuals:
    Min      1Q  Median      3Q     Max
-522.52 -153.27   23.91  103.82  526.50

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)       -700.00246 1050.02528  -0.667  0.51345
rainfall.mm.        -0.06694    0.07882  -0.849  0.40692
fert.kg.per.hect.    4.88709    1.56946   3.114  0.00599
temp.celcius.      -13.59277   12.08219  -1.125  0.27536
windspeed           24.61640   11.20673   2.197  0.04139
ph                 -48.56163   87.66266  -2.117  0.04847
insecticides        21.25789   87.76761   0.242  0.81136
seed                40.26558   49.81719   0.808  0.42949
humid               28.29617    8.36068   3.384  0.00330
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 297 on 18 degrees of freedom
Multiple R-squared: 0.8501, Adjusted R-squared: 0.7835
F-statistic: 12.76 on 8 and 18 DF, p-value: 5.442e-06
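A minimal sketch of the same kind of fit using a data frame and the `data` argument of `lm()`, which avoids relying on free-standing vectors in the workspace. The data frame `rice` and all its values are simulated purely so that the example runs, and only three of the eight regressors are used for brevity; none of the numbers relate to the project's data.

```r
set.seed(1)
n <- 28
rice <- data.frame(
  rainfall.mm.      = runif(n, 20, 90),
  fert.kg.per.hect. = runif(n, 30, 180),
  humid             = runif(n, 40, 90)
)
# Made-up response, only so that the example runs.
rice$yield.per.hect. <- 500 + 5 * rice$fert.kg.per.hect. +
  20 * rice$humid + rnorm(n, sd = 300)

fit <- lm(yield.per.hect. ~ rainfall.mm. + fert.kg.per.hect. + humid,
          data = rice)

print(coef(summary(fit)))        # estimates, std. errors, t values, p-values
print(summary(fit)$r.squared)    # multiple R-squared
print(df.residual(fit))          # n minus number of estimated coefficients
```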
Step 2: Plotting yield against each regressor
plot(rainfall.mm.,yield.per.hect.,main="Yield(kg/hect) vs Rainfall(mm)",xlab="Rainfall(mm)",ylab="Yield(kg/hect)",xlim=c(0,100),ylim=c(0,3500),col="red",las=1,pch=8)
plot(fert.kg.per.hect.,yield.per.hect.,main="Yield(kg/hect) vs Fertilizer(kg per hect)",xlab="Fertilizer(kg per hect)",ylab="Yield(kg/hect)",xlim=c(0,200),ylim=c(0,3500),col="red",las=1,pch=8)
plot(temp.celcius.,yield.per.hect.,main="Yield(kg/hect) vs Temperature(in celsius)",xlab="Temperature(in celsius)",ylab="Yield(kg/hect)",xlim=c(0,100),ylim=c(0,3500),col="red",las=1,pch=8)
plot(windspeed,yield.per.hect.,main="Yield(kg/hect) vs Windspeed(m/s)",xlab="Windspeed(m/s)",ylab="Yield(kg/hect)",xlim=c(0,100),ylim=c(0,3500),col="red",las=1,pch=8)
plot(ph,yield.per.hect.,main="Yield(kg/hect) vs pH",xlab="pH",ylab="Yield(kg/hect)",xlim=c(-1,1),ylim=c(0,3500),col="red",las=1,pch=8)
plot(insecticides,yield.per.hect.,main="Yield(kg/hect) vs Insecticides",xlab="Insecticides",ylab="Yield(kg/hect)",xlim=c(-1,1),ylim=c(0,3500),col="red",las=1,pch=8)
plot(seed,yield.per.hect.,main="Yield(kg/hect) vs Seed quality",xlab="Seed quality",ylab="Yield(kg/hect)",xlim=c(-2,2),ylim=c(0,3500),col="red",las=1,pch=8)
plot(humid,yield.per.hect.,main="Yield(kg/hect) vs Humidity",xlab="Humidity",ylab="Yield(kg/hect)",xlim=c(0,100),ylim=c(0,3500),col="red",las=1,pch=8)
The plot of residuals against fitted values has an outward-opening funnel
shape: as the fitted value increases, the variance of the residuals
increases, so Var(ε) is not constant. This can be remedied by a
transformation of the response.
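The funnel pattern and the effect of a transformation can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are simulated with multiplicative errors, and the log transformation is one common choice, not necessarily the one appropriate for the rice data.

```r
set.seed(2)
n <- 100
x <- runif(n, 1, 10)
# Simulated response whose spread grows with its mean (multiplicative error).
y <- exp(1 + 0.3 * x + rnorm(n, sd = 0.4))

fit_raw <- lm(y ~ x)        # untransformed response
fit_log <- lm(log(y) ~ x)   # log-transformed response

# Residuals vs fitted values for both fits, side by side.
op <- par(mfrow = c(1, 2))
plot(fitted(fit_raw), resid(fit_raw), main = "Raw response",
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
plot(fitted(fit_log), resid(fit_log), main = "Log response",
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
par(op)
```

In the left panel the residuals should fan out as the fitted values grow, while in the right panel they should form a roughly even band around zero.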
                fert.kg.per.hect.     ph   seed temp.celcius. humidity insecticides rainfall.mm. windspeed Residuals
Sum of Squares            3417528  99961   6745       2734932   161403        58759        78912    910174   3128146
Deg. of Freedom                 1      1      1             1        1            1            1         1        18

Residual standard error: 416.8763
Estimated effects may be unbalanced
> summary(aov(ab))
                  Df  Sum Sq Mean Sq F value   Pr(>F)
fert.kg.per.hect.  1 3417528 3417528  19.665 0.000320
ph                 1   99961   99961   0.575 0.458018
seed               1    6745    6745   0.039 0.846031
temp.celcius.      1 2734932 2734932  15.737 0.000904
humidity           1  161403  161403   0.929 0.347958
insecticides       1   58759   58759   0.338 0.568131
rainfall.mm.       1   78912   78912   0.454 0.508968
windspeed          1  910174  910174   5.237 0.034418
Residuals         18 3128146  173786
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Based on these F-values, we can apply the forward selection method to find the best model, and we can also apply
the partial F-test to check the significance of the regressors added at each step.
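A minimal sketch of both ideas on simulated data (the variables `x1`, `x2`, `x3` and `y` are hypothetical). Note one substitution: `add1()` with `test = "F"` shows the F-value for adding each candidate regressor, whereas `step()` automates forward selection using AIC rather than F-values. The partial F-test is done with `anova()` on two nested models.

```r
set.seed(3)
n <- 28
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 2 + 3 * dat$x1 + rnorm(n)    # only x1 truly matters here

null_fit <- lm(y ~ 1, data = dat)             # intercept-only model
full_fit <- lm(y ~ x1 + x2 + x3, data = dat)  # all candidate regressors

# F-value for adding each single regressor to the intercept-only model.
print(add1(null_fit, scope = ~ x1 + x2 + x3, test = "F"))

# Automated forward selection (AIC-based), starting from the null model.
fwd <- step(null_fit, scope = formula(full_fit),
            direction = "forward", trace = 0)
print(formula(fwd))

# Partial F-test: do x2 and x3 add anything once x1 is in the model?
reduced_fit <- lm(y ~ x1, data = dat)
print(anova(reduced_fit, full_fit))
```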
> confint(ab,level = 0.95)
                          2.5 %       97.5 %
(Intercept)       -3349.2297970 2961.2997874
fert.kg.per.hect.     2.8117459   11.8889648
ph                 -108.8748555  163.1934019
seed                -20.7671813  267.9343453
temp.celcius.       -69.6783441   -9.5517498
humidity             -0.0539360   56.3145370
insecticides       -198.3451448  307.1186867
rainfall.mm.         -0.4434321    0.1114147
windspeed             3.1519321   73.7486348
> nihal=data.frame(fert.kg.per.hect.=50,ph=1,windspeed=50,rainfall.mm.=35,temp.celcius.=40,seed=1,humidity=50,insecticides=1)
> predict(ab,nihal,interval ="confidence" )
       fit    lwr      upr
1 2117.299 1492.6 2741.999
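The interval above, requested with `interval = "confidence"`, is for the mean yield at the given regressor values. For the yield of a single new observation, `predict()` also accepts `interval = "prediction"`, which is always wider. A small sketch on simulated data (the variables `fert` and `yield` and all values are hypothetical):

```r
set.seed(4)
n <- 28
dat <- data.frame(fert = runif(n, 30, 180))
dat$yield <- 800 + 8 * dat$fert + rnorm(n, sd = 300)   # made-up response
fit <- lm(yield ~ fert, data = dat)

new_obs  <- data.frame(fert = 50)
conf_int <- predict(fit, new_obs, interval = "confidence")  # mean yield at fert = 50
pred_int <- predict(fit, new_obs, interval = "prediction")  # one new yield at fert = 50

print(conf_int)
print(pred_int)
```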
Multicollinearity
Each regressor is centred and scaled to unit length, so that X'X is the correlation matrix of the regressors; the diagonal elements of its inverse are the variance inflation factors (VIFs).
> a=rainfall.mm.-mean(rainfall.mm.)
> b=a/sqrt(sum(a*a))
> c=fert.kg.per.hect.-mean(fert.kg.per.hect.)
> d=c/sqrt(sum(c*c))
> e=temp.celcius.-mean(temp.celcius.)
> f=e/sqrt(sum(e*e))
> g=windspeed-mean(windspeed)
> h=g/sqrt(sum(g*g))
> k=ph-mean(ph)
> l=k/sqrt(sum(k*k))
> m=insecticides-mean(insecticides)
> n=m/sqrt(sum(m*m))
> o=seed-mean(seed)
> p=o/sqrt(sum(o*o))
> q=humid-mean(humid)
> r=q/sqrt(sum(q*q))
> x=cbind(r,p,n,l,h,f,d,b)
> y=t(x)%*%x
> solve(y)
The above commands give the inverse of X'X:
1.858 0.313 -0.137 -0.431 -1.258 0.201 -0.065 -0.224
0.313 1.516 0.518 0.460 0.349 -0.061 -0.308 -0.436
-0.137 0.518 1.610 0.196 0.803 0.026 -0.474 -0.470
-0.431 0.460 0.196 1.415 0.867 -0.030 -0.102 0.048
-1.258 0.349 0.803 0.867 2.543 -0.103 0.013 0.385
0.201 -0.061 0.026 -0.030 -0.103 1.061 0.085 -0.111
-0.065 -0.308 -0.474 -0.102 0.013 0.085 1.357 0.665
-0.224 -0.436 -0.470 0.048 0.385 -0.111 0.665 1.576
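Hence the VIFs, which are the diagonal elements of the inverse (in the order humid, seed, insecticides, ph, windspeed, temp.celcius., fert.kg.per.hect., rainfall.mm.), are
1.858, 1.516, 1.610, 1.415, 2.543, 1.061, 1.357, 1.576.
All VIFs are less than 5, so there is no serious multicollinearity in the data.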
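The same computation can be written compactly and cross-checked. This sketch uses simulated regressors (`x1`, `x2`, `x3` are hypothetical, with `x3` deliberately correlated with `x1`) and compares three routes to the VIF: unit-length scaling followed by inversion as above, inverting the correlation matrix directly, and the definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing regressor j on the others.

```r
set.seed(5)
n <- 28
X <- cbind(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
X[, "x3"] <- X[, "x1"] + 0.5 * rnorm(n)   # make x3 correlated with x1

# Unit-length scaling: centre each column, divide by its root sum of squares.
W <- apply(X, 2, function(v) {
  v <- v - mean(v)
  v / sqrt(sum(v^2))
})

vif_manual <- diag(solve(t(W) %*% W))   # diagonal of (W'W)^{-1}
vif_cor    <- diag(solve(cor(X)))       # same result via the correlation matrix

# Cross-check for x1 using VIF_j = 1 / (1 - R_j^2).
r2_x1  <- summary(lm(X[, "x1"] ~ X[, "x2"] + X[, "x3"]))$r.squared
vif_x1 <- 1 / (1 - r2_x1)

print(rbind(vif_manual, vif_cor))
print(vif_x1)
```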
CONCLUSION:
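H0: b1 = b2 = ... = b8 = 0
H1: H0 is not true (at least one bi is not 0)
H0 is rejected: the overall F-statistic is 12.76 on 8 and 18 DF (p-value 5.442e-06), so not all bi's are 0.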
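The overall F-statistic reported by summary(a) can be recovered from the Multiple R-squared, using F = (R^2 / k) / ((1 - R^2) / (n - k - 1)). The short check below uses only the values printed in the summary (R-squared 0.8501, 8 regressors, 18 residual degrees of freedom); the variable names are illustrative.

```r
r2     <- 0.8501   # Multiple R-squared from summary(a)
k      <- 8        # number of regressors
df_res <- 18       # residual degrees of freedom from summary(a)

f_stat  <- (r2 / k) / ((1 - r2) / df_res)
p_value <- pf(f_stat, k, df_res, lower.tail = FALSE)  # upper-tail F probability

print(round(f_stat, 2))   # agrees with the reported 12.76
print(p_value)
```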
From the t values we conclude that, although farmers try to provide
everything a crop requires, yield increases only when fertilizers, wind
speed, insecticides, seeds and humidity are at appropriate levels.
Rainfall, temperature and soil pH cannot be controlled.