VAT Tax Gap prediction: a 2-steps Gradient Boosting approach

Giovanna Tagliaferri

13 March 2019
Outline

1 Introduction

2 2-Steps Gradient Boosting

3 Application on results from fiscal audits

   - Dataset description

   - Selection bias correction

   - Potential tax base estimate

4 VAT base gap propensity analysis

5 Conclusions

Preamble

• Internship: work carried out at Sogei in collaboration with the Italian Revenue Agency.

• What: produce an estimate of the Italian VAT Tax Gap for the year 2011.

- Major difficulty: selection bias ⇒ taxpayers are not randomly selected.

• How: a fully non-parametric approach in 2 steps, based on Gradient Boosting, able to provide estimates of the potential tax base (BIT) and of its undeclared part (BIND).

BIT = BID + BIND (BID: declared tax base)

Application Context

• The statistical unit is the Individual Firm: an individual who carries out a business activity or is self-employed.

• The available information has been gathered from two sources:

- the register of Irpef, VAT and Irap declarations (available for the entire population);

- the compliance control papers (available only for taxpayers subject to tax assessment).

• Only 2% of taxpayers are generally subject to tax assessment.

2-Steps Gradient Boosting

First step: Selection bias correction

Gradient Boosting classification model, aimed at estimating the probability that unit i belongs to the set S of assessed taxpayers:

$\hat{\pi}_i = P(i \in S \mid X)$.

Target variable: compliance control presence.

Second step: Potential tax base estimate

Gradient Boosting regression model, fitted only on the assessed units, with weights:

$\nu_i \propto \frac{1}{\hat{\pi}_i}$.

Target variable: potential tax base.
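The two steps can be sketched as follows. This is a minimal illustration, not the author's code: it assumes scikit-learn's `GradientBoostingClassifier` and `GradientBoostingRegressor` with default settings (the slides do not name the software used), and the function name `two_step_boosting`, the clipping threshold and the synthetic data are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor


def two_step_boosting(X, assessed, bit_assessed, seed=0):
    """X: covariates of all units; assessed: 0/1 flag per unit;
    bit_assessed: potential tax base of the assessed units only."""
    X = np.asarray(X, dtype=float)
    assessed = np.asarray(assessed).astype(bool)

    # Step 1: classifier for pi_i = P(i in S | X), target = assessed or not.
    clf = GradientBoostingClassifier(random_state=seed)
    clf.fit(X, assessed)
    pi_hat = clf.predict_proba(X)[:, 1]

    # Step 2: regression on the assessed units only, weighted by 1 / pi_hat.
    # The clipping (an assumption) avoids dividing by a near-zero probability.
    weights = 1.0 / np.clip(pi_hat[assessed], 1e-6, None)
    reg = GradientBoostingRegressor(random_state=seed)
    reg.fit(X[assessed], bit_assessed, sample_weight=weights)
    return clf, reg, pi_hat


# Synthetic data, for illustration only (not the audit data).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
assessed = rng.random(400) < 1.0 / (1.0 + np.exp(-X[:, 0]))
bit = 10.0 + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=400)

clf, reg, pi_hat = two_step_boosting(X, assessed, bit[assessed])
bit_hat = reg.predict(X)  # predictions for assessed and not assessed units
```

Units that were unlikely to be assessed get a larger weight, so the regression is pulled towards the part of the population that audits under-represent.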

Data

Matrix with approximately 2.3 million taxpayers and 160 variables.

Problem: hardware limits

Solution: subsampling

                   Total population            Sample
Control type     Frequency   Percentage   Frequency   Percentage
Not assessed     2,275,219       99.18%      45,489       70.85%
Assessed            18,718        0.82%      18,718       29.15%
Total            2,293,937         100%      64,207         100%
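The counts in the table show that every assessed taxpayer is kept while the not assessed ones are reduced to roughly 2% of their number. The sketch below reproduces the percentages and illustrates one way such a subsample could be drawn; simple random sampling of the not assessed units is an assumption (the slide does not say how they were chosen), and `subsample` and `shares` are hypothetical helper names.

```python
import random


def subsample(units, n_not_assessed, seed=0):
    """Keep every assessed unit plus a simple random sample of the others."""
    assessed = [u for u in units if u["assessed"]]
    others = [u for u in units if not u["assessed"]]
    rng = random.Random(seed)
    return assessed + rng.sample(others, n_not_assessed)


def shares(n_not_assessed, n_assessed):
    """Percentage of not assessed and assessed units, rounded to 2 decimals."""
    total = n_not_assessed + n_assessed
    return (round(100 * n_not_assessed / total, 2),
            round(100 * n_assessed / total, 2))


# Counts taken from the table above.
population_shares = shares(2_275_219, 18_718)
sample_shares = shares(45_489, 18_718)
sampling_fraction = 45_489 / 2_275_219  # share of not assessed units retained

# Toy population (hypothetical): 10 assessed units out of 1,000.
toy = [{"id": i, "assessed": i < 10} for i in range(1000)]
toy_sample = subsample(toy, n_not_assessed=30)
```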

Selection bias correction

Hyperparameters were tuned via cross-validation, with the sample split into train (70%) and test (30%) sets. The optimal choice was:

$\{\lambda_{opt} = 0.1,\ n.iter_{opt} = 998\} \rightarrow AUC = 0.79$

The most discriminating variables:

- region in which the firm operates

- branch

- number of employees

- revenues
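For readers unfamiliar with the AUC reported above, it can be read as the probability that a randomly chosen assessed taxpayer receives a higher score than a randomly chosen not assessed one. A minimal pure-Python sketch of that pairwise definition follows; the labels and scores are made up for illustration and the function name `auc` is hypothetical.

```python
def auc(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = assessed, 0 = not assessed.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.3, 0.4, 0.6, 0.1]
example_auc = auc(labels, scores)  # 5 of 6 pairs are ordered correctly
```

The double loop is quadratic in the sample size, so it is meant for explanation only; on tens of thousands of units a rank-based formula would be the practical choice.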

Potential tax base estimate

The regression model was estimated only on the 18,718 taxpayers subject to tax assessment. Cross-validation was performed here as well.

$\{\lambda_{opt} = 0.1,\ n.iter_{opt} = 38\} \rightarrow R^2_{BIT} = 0.83$

Most important variables:

- region to which the firm belongs

- taxable amount of other purchases and imports

- total of the operations that generate VAT

- operating costs and fiscal value added
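As a reminder of what the $R^2$ figure measures, here is a small sketch of the usual unweighted definition together with the adjusted version that appears in the results table. Whether the slides' metric used the selection weights is not stated, so the unweighted form is an assumption; the data and the values of `n` and `p` are made up.

```python
def r_squared(y_true, y_pred):
    """1 - residual sum of squares / total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Penalise R^2 for using p predictors on n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Made-up values, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(y_true, y_pred)
r2_adj = adjusted_r_squared(r2, n=4, p=1)
```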

Results

The Heckman model was estimated on the same sample for comparative purposes. Amounts are in billions (mld).

                        Gradient Boosting       Heckman
                        Train       Test        Test
BIND_TOT                0.727       0.314       0.314
Estimated BIND_TOT      0.693       0.292       0.290
BIT_TOT                 3.194       1.316       1.316
Estimated BIT_TOT       3.159       1.292       1.231
R²_BIT                  0.836       0.828       0.657
Adjusted R²_BIT         0.834       0.826       0.652
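To make the comparison easier to read, the totals in the table can be turned into relative errors of the estimated totals with respect to the actual ones. The sketch below only re-uses the figures from the table (in billions, "mld"); the dictionary layout and the helper name `relative_error` are illustrative choices.

```python
def relative_error(actual, estimate):
    """Signed relative error of an estimated total."""
    return (estimate - actual) / actual


# (actual, estimated) totals in billions, copied from the results table.
totals = {
    ("Gradient Boosting", "Train"): {"BIT": (3.194, 3.159), "BIND": (0.727, 0.693)},
    ("Gradient Boosting", "Test"): {"BIT": (1.316, 1.292), "BIND": (0.314, 0.292)},
    ("Heckman", "Test"): {"BIT": (1.316, 1.231), "BIND": (0.314, 0.290)},
}

# Relative errors in percent, rounded to 2 decimals.
errors = {
    key: {name: round(100 * relative_error(a, e), 2) for name, (a, e) in vals.items()}
    for key, vals in totals.items()
}

for (model, split), err in errors.items():
    print(f"{model:17s} {split:5s} BIT {err['BIT']:6.2f}%  BIND {err['BIND']:6.2f}%")
```

On the test set both models underestimate the totals; the gap is larger for Heckman on BIT (about 6.5% against about 1.8% for Gradient Boosting), which is in line with the difference in $R^2$.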

Tax evasion propensity analysis

The estimated model was used to obtain predictions for the taxpayers who were not assessed.

The tax evasion propensity was then studied over the whole sample:

$Prop = \frac{\sum_{i=1}^{N} \widehat{BIND}_i}{\sum_{i=1}^{N} \widehat{BIT}_i}$

The lower the ratio, the better the compliance.

              Gradient Boosting   Heckman
Propensity    30.40%              29.77%
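The propensity is a ratio of sums over taxpayers, not an average of individual ratios. A short sketch of the computation, with made-up per-taxpayer predictions and a hypothetical function name `propensity`, shows the difference.

```python
def propensity(bind_hat, bit_hat):
    """Sum of predicted undeclared bases over sum of predicted potential bases."""
    total_bit = sum(bit_hat)
    if total_bit == 0:
        raise ValueError("total predicted tax base is zero")
    return sum(bind_hat) / total_bit


# Made-up predictions for three taxpayers (illustration only).
bit_hat = [100.0, 250.0, 50.0]
bind_hat = [30.0, 50.0, 20.0]

prop = propensity(bind_hat, bit_hat)  # 100 / 400
mean_of_ratios = sum(b / t for b, t in zip(bind_hat, bit_hat)) / len(bit_hat)
```

Here the ratio of sums is 0.25 while the mean of the individual ratios is 0.30: in the ratio of sums, taxpayers with a large tax base carry more weight.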

Propensity by sex and age

Amounts are in billions (mld).

                         Gradient Boosting             Heckman
Sex            n       BIT     BIND    Prop        BIT     BIND    Prop
Female       16,053    2.34    0.81    34.74%      2.22    0.70    31.39%
Male         48,154    8.72    2.55    29.25%      8.73    2.56    29.35%
Total        64,207   11.05    3.36    30.40%     10.95    3.26    29.77%

                         Gradient Boosting             Heckman
Age            n       BIT     BIND    Prop        BIT     BIND    Prop
[18, 25)        976    0.13    0.05    39.07%      0.13    0.04    36.71%
[25, 45)     28,250    4.26    1.45    34.10%      4.11    1.30    31.61%
[45, 65)     30,496    5.64    1.60    28.45%      5.64    1.60    28.37%
over 65       4,485    1.02    0.25    24.65%      1.08    0.32    29.28%
Total        64,207   11.05    3.36    30.40%     10.95    3.26    29.77%
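As a consistency check, the propensities by sex for Gradient Boosting can be recomputed from the BIT and BIND columns. Because the table reports amounts rounded to two decimals, the recomputed ratios are expected to differ slightly from the printed ones; the sketch below measures that gap. The data structure is an illustrative choice.

```python
# (BIT, BIND, reported propensity in %) for Gradient Boosting, from the table.
rows = {
    "Female": (2.34, 0.81, 34.74),
    "Male": (8.72, 2.55, 29.25),
    "Total": (11.05, 3.36, 30.40),
}

# Propensity recomputed from the rounded amounts, in percent.
recomputed = {
    group: round(100 * bind / bit, 2) for group, (bit, bind, _) in rows.items()
}

# Absolute gap, in percentage points, between recomputed and reported values.
gaps = {
    group: abs(recomputed[group] - reported)
    for group, (_, _, reported) in rows.items()
}

for group in rows:
    print(f"{group:7s} recomputed {recomputed[group]:5.2f}%  gap {gaps[group]:.2f} pp")
```

The gaps stay below 0.2 percentage points, which is compatible with rounding of the inputs rather than with an inconsistency in the table.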

Propensity by geographic area

[Figure: propensity by geographic area; panels: a) Heckman, b) Gradient Boosting]

Conclusions

• Advantages:

- all data are processed

- distribution-free approach

- no variable transformation is required

- no problems with multicollinearity

• Further developments:

- extension of the analysis to the entire population

- robustification via an ensemble with other models (XGBoost and Neural Network)


Thanks for your attention!
