
416-project

Date post: 22-Feb-2017
Category:
Upload: durgesh-kumar

Indian Institute of Technology Kanpur

2nd Semester: 2015-2016

REGRESSION ANALYSIS PROJECT

Analysis of Rice production in 2011-12 in India

Submitted to:

Dr. Sharmistha Mitra

Submitted by:

1) P.CHARAN SANTOSH KUMAR(12462)

2) DURGESH KUMAR(13268)

3) NIHAL KASHYAP(151100)

DATA SOURCES:

The following sources were used to compile the data on rice production in India.

1) http://drdpat.bih.nic.in/Downloads/Statewise-APY-of-Rice-2006-07-to-2010-11.pdf (for yield of rice per hectare)

2) http://www.imdagrimet.gov.in/temperature_monthly_archive (for mean temperature)

3) http://www.rainwaterharvesting.org/urban/rainfall.htm (for average rainfall in all states)

4) http://www.merinews.com/campaign/water-scarcity/annual-rainfall.jsp

5) https://www.currentresults.com/Weather/India/average-yearly-precipitation.php (average weather changes in India)

6) http://indianfertilizer.com/statistics (consumption of fertilizers)

7) http://indianfertilizer.com/frontend/statistics/contentFileView?page=statswr 2012-13 (fertilizer production in India)

Acknowledgements:

We would like to thank our course instructor, Dr. Sharmishtha Mitra, for providing constant guidance and motivation for this project, without which it would have been an impossible task to accomplish.

Next, we would like to thank the Directorate of Rice Development, Government of India, for providing the data online.

Last but not least, we appreciate each other's contributions and hard work towards the completion of the project.

ABSTRACT

In this project we predict rice yield (in kg per hectare) in India from the following conditions: rainfall, fertilizers, soil pH, humidity, temperature, wind speed, insecticides and seed quality. For this we use a multiple linear regression model to predict the yield of rice (in kg per hectare) in a season of any year.

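The columns in the data set have been coded in the following way:

1) yield.per.hect. (rice yield, kg per hectare)

2) rainfall.mm. (rainfall, mm)

3) fert.kg.per.hect. (fertilizer, kg per hectare)

4) temp.celcius. (temperature, degrees Celsius)

5) windspeed (wind speed)

6) ph (pH level of soil)

7) insecticides

8) seed (seed quality)

9) humid (humidity; it appears as humidity in some of the later output)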

We have done our project in R software.

The pH level of the soil, insecticides and seed quality are taken as coded (dummy) variables as follows:

pH value

1 when 5.5<pH<6.9

0 when pH<5.5

-1 when pH>6.9

Insecticides

1 when only conventional pesticides are used, in both the ripening stage and the initial stage.

0 when conventional pesticides are used in the ripening stage and organic pesticides are used in the initial stage.

-1 when only organic pesticides are used.

Rice seed quality

1 when small grain rice.

0 when small sweet grain rice.

-1 when medium grain rice.

-2 when long grain rice.
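The pH coding above can be sketched as a small R helper. This is only an illustration: the function name code_ph is hypothetical, and because the coding does not say what happens at exactly pH 5.5 or 6.9, the sketch assumes those boundary values are left uncoded (NA).

```r
# Hypothetical helper: map a measured soil pH to the coded value used in the data set.
# Assumption: the boundary values 5.5 and 6.9 are not covered by the coding, so they give NA.
code_ph <- function(ph) {
  ifelse(ph > 5.5 & ph < 6.9, 1,          # 1 when 5.5 < pH < 6.9
         ifelse(ph < 5.5, 0,              # 0 when pH < 5.5
                ifelse(ph > 6.9, -1, NA)))  # -1 when pH > 6.9
}

coded <- code_ph(c(5.0, 6.2, 7.4))
print(coded)
```

The insecticide and seed codes could be produced the same way from the recorded categories.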

MLR Model

For the MLR model, the actual dataset contained only 28 observations and, initially, 8 regressors.

Step 1: Fitting the model

> a=lm(yield.per.hect.~rainfall.mm.+fert.kg.per.hect.+temp.celcius.+windspeed+ph+insecticides+seed+humid)

> summary(a)

Call:
lm(formula = yield.per.hect. ~ rainfall.mm. + fert.kg.per.hect. +
    temp.celcius. + windspeed + ph + insecticides + seed + humid)

Residuals:
    Min      1Q  Median      3Q     Max
-522.52 -153.27   23.91  103.82  526.50

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)       -700.00246 1050.02528  -0.667  0.51345
rainfall.mm.        -0.06694    0.07882  -0.849  0.40692
fert.kg.per.hect.    4.88709    1.56946   3.114  0.00599 **
temp.celcius.      -13.59277   12.08219  -1.125  0.27536
windspeed           24.61640   11.20673   2.197  0.04139 *
ph                 -48.56163   87.66266  -2.117  0.04847 *
insecticides        21.25789   87.76761   0.242  0.81136
seed                40.26558   49.81719   0.808  0.42949
humid               28.29617    8.36068   3.384  0.00330 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 297 on 18 degrees of freedom
Multiple R-squared: 0.8501, Adjusted R-squared: 0.7835
F-statistic: 12.76 on 8 and 18 DF, p-value: 5.442e-06
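As a rough check on how the quantities in this summary relate to each other, the sketch below recomputes a few of them from the printed numbers alone. It assumes the usual definitions (t value = estimate / standard error, a two-sided p-value from the t distribution on the residual degrees of freedom, and the standard formulas linking R-squared to the F-statistic and to adjusted R-squared); the variable names are illustrative.

```r
# Numbers copied from the summary output (fert.kg.per.hect. row and the footer).
est <- 4.88709      # estimate
se  <- 1.56946      # standard error
df_res <- 18        # residual degrees of freedom
k   <- 8            # number of regressors
r2  <- 0.8501       # multiple R-squared

# t value and two-sided p-value for the fertilizer coefficient
t_val <- est / se
p_val <- 2 * pt(-abs(t_val), df_res)

# Overall F-statistic and adjusted R-squared recovered from R-squared.
# The residual df implies n = df_res + k + 1 observations in the fit.
n_fit  <- df_res + k + 1
f_stat <- (r2 / k) / ((1 - r2) / df_res)
adj_r2 <- 1 - (1 - r2) * (n_fit - 1) / df_res

print(round(c(t = t_val, p = p_val, F = f_stat, adj.R2 = adj_r2), 4))
```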

Step 2: Plotting yield against each regressor

plot(rainfall.mm., yield.per.hect., main="Yield (kg/hect) vs Rainfall (mm)",
     xlab="Rainfall (mm)", ylab="Yield (kg/hect)",
     xlim=c(0,100), ylim=c(0,3500), col="red", las=1, pch=8)

plot(fert.kg.per.hect., yield.per.hect., main="Yield (kg/hect) vs Fertilizer (kg per hect)",
     xlab="Fertilizer (kg per hect)", ylab="Yield (kg/hect)",
     xlim=c(0,200), ylim=c(0,3500), col="red", las=1, pch=8)

plot(temp.celcius., yield.per.hect., main="Yield (kg/hect) vs Temperature (in Celsius)",
     xlab="Temperature (Celsius)", ylab="Yield (kg/hect)",
     xlim=c(0,100), ylim=c(0,3500), col="red", las=1, pch=8)

plot(windspeed, yield.per.hect., main="Yield (kg/hect) vs Windspeed (m/s)",
     xlab="Windspeed (m/s)", ylab="Yield (kg/hect)",
     xlim=c(0,100), ylim=c(0,3500), col="red", las=1, pch=8)

plot(ph, yield.per.hect., main="Yield (kg/hect) vs pH",
     xlab="pH (coded)", ylab="Yield (kg/hect)",
     xlim=c(-1,1), ylim=c(0,3500), col="red", las=1, pch=8)

plot(insecticides, yield.per.hect., main="Yield (kg/hect) vs Insecticides",
     xlab="Insecticides (coded)", ylab="Yield (kg/hect)",
     xlim=c(-1,1), ylim=c(0,3500), col="red", las=1, pch=8)

plot(seed, yield.per.hect., main="Yield (kg/hect) vs Seed quality",
     xlab="Seed quality (coded)", ylab="Yield (kg/hect)",
     xlim=c(-2,2), ylim=c(0,3500), col="red", las=1, pch=8)

plot(humid, yield.per.hect., main="Yield (kg/hect) vs Humidity",
     xlab="Humidity", ylab="Yield (kg/hect)",
     xlim=c(0,100), ylim=c(0,3500), col="red", las=1, pch=8)

Approximately light-tailed distribution.

The graph of residuals against fitted values has an outward-opening funnel shape. As the fitted value increases, the variance of the residuals increases, so Var(ε) is not constant. This can be removed by transformations.

plot(a)
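The effect of a transformation on a funnel-shaped residual pattern can be illustrated on simulated data. The sketch below does not use the rice data: it generates a response with multiplicative errors (an assumption chosen to produce a funnel), fits the model with and without a log transformation of the response, and uses the correlation between absolute residuals and fitted values as a crude measure of the funnel.

```r
set.seed(1)
n <- 500
x <- runif(n, 0, 1.5)

# Simulated response with multiplicative error: the spread grows with the mean.
y <- exp(2 + x + rnorm(n, sd = 0.5))

fit_raw <- lm(y ~ x)        # untransformed response
fit_log <- lm(log(y) ~ x)   # log-transformed response

# Crude funnel measure: do absolute residuals grow with the fitted values?
spread_raw <- cor(abs(resid(fit_raw)), fitted(fit_raw))
spread_log <- cor(abs(resid(fit_log)), fitted(fit_log))

print(c(raw = spread_raw, log = spread_log))
```

Whether a log, square-root or other transformation is appropriate for the rice model would have to be judged from its own residual plots.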

> aov(ab)

Call:
aov(formula = ab)

Terms:
                fert.kg.per.hect.    ph  seed temp.celcius. humidity insecticides rainfall.mm. windspeed Residuals
Sum of Squares            3417528 99961  6745       2734932   161403        58759        78912    910174   3128146
Deg. of Freedom                 1     1     1             1        1            1            1         1        18

Residual standard error: 416.8763
Estimated effects may be unbalanced

> summary(aov(ab))

                  Df  Sum Sq Mean Sq F value   Pr(>F)
fert.kg.per.hect.  1 3417528 3417528  19.665 0.000320 ***
ph                 1   99961   99961   0.575 0.458018
seed               1    6745    6745   0.039 0.846031
temp.celcius.      1 2734932 2734932  15.737 0.000904 ***
humidity           1  161403  161403   0.929 0.347958
insecticides       1   58759   58759   0.338 0.568131
rainfall.mm.       1   78912   78912   0.454 0.508968
windspeed          1  910174  910174   5.237 0.034418 *
Residuals         18 3128146  173786
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

By observing the F-values we can apply the forward selection method to find the best model, and we can also apply the partial F-test to check whether a regressor added to the model contributes significantly.
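The two ideas mentioned here can be sketched in R on simulated data (not the rice data). The partial F-test compares a reduced model with a larger nested model through anova(). For forward selection the sketch uses step(), which by default selects on AIC rather than on F-values, so it is a stand-in for the F-based procedure described above. All variable names are illustrative.

```r
set.seed(42)
n <- 60
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
# By construction, y depends on x1 and x2 but not on x3.
d$y <- 3 + 2 * d$x1 + 1.5 * d$x2 + rnorm(n)

# Partial F-test: does adding x2 to a model that already contains x1 help?
reduced <- lm(y ~ x1, data = d)
full    <- lm(y ~ x1 + x2, data = d)
pf_test <- anova(reduced, full)
p_x2    <- pf_test[["Pr(>F)"]][2]

# Forward selection starting from the intercept-only model (AIC-based).
null_fit <- lm(y ~ 1, data = d)
fwd <- step(null_fit, scope = ~ x1 + x2 + x3, direction = "forward", trace = 0)
selected <- attr(terms(fwd), "term.labels")

print(pf_test)
print(selected)
```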

> confint(ab,level = 0.95)

                          2.5 %       97.5 %
(Intercept)       -3349.2297970 2961.2997874
fert.kg.per.hect.     2.8117459   11.8889648
ph                 -108.8748555  163.1934019
seed                -20.7671813  267.9343453
temp.celcius.       -69.6783441   -9.5517498
humidity             -0.0539360   56.3145370
insecticides       -198.3451448  307.1186867
rainfall.mm.         -0.4434321    0.1114147
windspeed             3.1519321   73.7486348

> nihal=data.frame(fert.kg.per.hect.=50,ph=1,windspeed=50,rainfall.mm.=35,temp.celcius.=40,seed=1,humidity=50,insecticides=1)

> predict(ab,nihal,interval ="confidence" )

       fit    lwr      upr
1 2117.299 1492.6 2741.999

Multicollinearity

Each regressor is centred and scaled to unit length, so that t(x)%*%x is the correlation matrix of the regressors; this matrix is then inverted.

> a=rainfall.mm.-mean(rainfall.mm.)
> b=a/sqrt(sum(a*a))
> c=fert.kg.per.hect.-mean(fert.kg.per.hect.)
> d=c/sqrt(sum(c*c))
> e=temp.celcius.-mean(temp.celcius.)
> f=e/sqrt(sum(e*e))
> g=windspeed-mean(windspeed)
> h=g/sqrt(sum(g*g))
> k=ph-mean(ph)
> l=k/sqrt(sum(k*k))
> m=insecticides-mean(insecticides)
> n=m/sqrt(sum(m*m))
> o=seed-mean(seed)
> p=o/sqrt(sum(o*o))
> q=humid-mean(humid)
> r=q/sqrt(sum(q*q))
> x=cbind(r,p,n,l,h,f,d,b)
> y=t(x)%*%x
> solve(y)

Running the above commands gives the following.

INVERSE

 1.858  0.313 -0.137 -0.431 -1.258  0.201 -0.065 -0.224
 0.313  1.516  0.518  0.460  0.349 -0.061 -0.308 -0.436
-0.137  0.518  1.610  0.196  0.803  0.026 -0.474 -0.470
-0.431  0.460  0.196  1.415  0.867 -0.030 -0.102  0.048
-1.258  0.349  0.803  0.867  2.543 -0.103  0.013  0.385
 0.201 -0.061  0.026 -0.030 -0.103  1.061  0.085 -0.111
-0.065 -0.308 -0.474 -0.102  0.013  0.085  1.357  0.665
-0.224 -0.436 -0.470  0.048  0.385 -0.111  0.665  1.576

Hence the VIFs, which are the diagonal entries of the inverse, are (in the column order of x: humidity, seed, insecticides, pH, wind speed, temperature, fertilizer, rainfall)

1.858, 1.516, 1.610, 1.415, 2.543, 1.061, 1.357, 1.576.

All VIFs are less than 5, so there is no serious multicollinearity in the data.
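The VIF computation used above can be sketched on simulated regressors (not the rice data). The sketch scales each column to unit length, inverts the resulting correlation matrix, and compares the diagonal with the equivalent definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing regressor j on the others. The helper name unit_scale is illustrative.

```r
set.seed(7)
n  <- 100
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)   # deliberately correlated with x1
x3 <- rnorm(n)
X  <- cbind(x1, x2, x3)

# Centre each column and scale it to unit length.
unit_scale <- function(v) {
  v0 <- v - mean(v)
  v0 / sqrt(sum(v0^2))
}
W <- apply(X, 2, unit_scale)

# W'W is the correlation matrix; the diagonal of its inverse holds the VIFs.
vif_inv <- diag(solve(t(W) %*% W))

# Same quantity from the auxiliary regressions.
vif_r2 <- c(
  x1 = 1 / (1 - summary(lm(x1 ~ x2 + x3))$r.squared),
  x2 = 1 / (1 - summary(lm(x2 ~ x1 + x3))$r.squared),
  x3 = 1 / (1 - summary(lm(x3 ~ x1 + x2))$r.squared)
)

print(round(rbind(vif_inv, vif_r2), 3))
```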

CONCLUSION:

H0: b1 = b2 = ... = b8 = 0

H1: H0 is not true (at least one bi is non-zero)

H0 is rejected: the overall F-statistic of 12.76 on 8 and 18 degrees of freedom has a p-value of 5.442e-06, so not all bi's are 0.

From the t-values we conclude that farmers try to provide everything a crop requires, but yield increases only when fertilizers, wind speed, insecticides, seeds and humidity are at appropriate levels. We cannot control rainfall, temperature and the pH of the soil.

