Date post: | 16-Aug-2015 |
Upload: | talumbau |
Building TaxBrain: Numba-Enabled Financial Computing on the Web
T.J. Alumbaugh
July 25, 2015
© 2015 Continuum Analytics-
Agenda
1. Background
2. Tax Calculator
3. Numba to the rescue
4. Webapp demo
5. Deployment
6. Lessons Learned
7. Future Work
8. Acknowledgements
BACKGROUND
Open Source Policy Center
A community with the goal of making policy analysis more trustworthy, accessible, and innovative by harnessing open-source methods to build cutting edge economic models.
Open Source Policy Center - Motivation
• Computational economic models massively influence which policy ideas become law.
• What happens to the government’s budget, who will gain/lose, how will people’s behavior change, what happens to the economy?
• Existing models are proprietary software and most policymakers and the public don’t have access.
- Limited access inhibits creative policy solutions or wide participation in policy debates -> Bad for democracy
- Limited transparency inhibits external review and stifles innovation -> Bad for economy
TAX CALCULATOR
Tax-Calculator Python package: taxcalc
• Implementation of the Federal income tax code for 2013 – 2024ish.
• Performs a “microsimulation” calculation
– What is the effect on revenue if we raise/lower the capital gains tax? Raise/lower the maximum taxable income for SS? Raise/lower the highest income tax rate? Increase/decrease the Earned Income Tax Credit? Do all at the same time?
taxcalc computes revenue estimates
Sample tax returns + Tax code parameters -> taxcalc -> Revenue projection
Sample Tax Returns: the Public Use File
• Public Use File (PUF) is a licensed dataset made available by the Statistics of Income (SOI) branch of the IRS (your tax dollars at work!)
• ~150,000 sample tax returns, with privacy-enhancing modifications, weighted to be statistically similar to the ~120,000,000 tax returns filed every year
Your policy reform in action!
PUF + Default Tax code parameters -> taxcalc -> Status quo Revenue projection
PUF + User-specified Tax code parameters -> taxcalc -> Revenue projection for user-defined policy
Δ = difference between the two revenue projections = your policy effect!
(Actually, there are ten deltas, Δ1, …, Δ10, because we do this for 10 budget years.)
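The status quo vs. reform comparison can be sketched in a few lines of plain Python. This is a toy illustration under loud assumptions, not taxcalc's API: the flat-rate `tax_liability` function, the `rate` parameter, and the three weighted sample returns are all made up for the example.

```python
# Toy microsimulation: every name and number here is hypothetical.
# Each sample return carries a weight saying how many real returns it represents.
SAMPLE_RETURNS = [
    {"income": 30_000.0, "weight": 1_000.0},
    {"income": 80_000.0, "weight": 500.0},
    {"income": 250_000.0, "weight": 50.0},
]


def tax_liability(income, rate):
    """Flat tax used only for illustration (real tax logic is far richer)."""
    return income * rate


def revenue(returns, rate):
    """Weighted sum of liabilities over all sample returns."""
    return sum(r["weight"] * tax_liability(r["income"], rate) for r in returns)


def policy_deltas(returns, baseline_rates, reform_rates):
    """One delta per budget year: reform revenue minus status quo revenue."""
    return [
        revenue(returns, reform) - revenue(returns, base)
        for base, reform in zip(baseline_rates, reform_rates)
    ]


baseline = [0.20] * 10            # status quo parameters for 10 budget years
reform = [0.20] * 5 + [0.25] * 5  # hypothetical reform that starts in year 6
deltas = policy_deltas(SAMPLE_RETURNS, baseline, reform)
```

Years in which the reform leaves the parameters unchanged produce a delta of zero, so the list shows when the policy effect begins as well as how large it is.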
NUMBA TO THE RESCUE
Numba helps humans read fast Python code
Turning this….
Into this…
But wait – there’s more!
Take advantage of common patterns
Nearly every function in taxcalc operates on columns of a DataFrame. There are ~150 different columns; most functions take 10-30 arguments and return 5-15 values. That’s a lot of typing.
Columns A, B, C, …:
for i in range(x):
    … expressions with A[i], B[i], C[i], etc. All the tax logic goes here!
Custom decorator: @iterate_jit
• We handle the boilerplate by making custom wrapping functions at import time (and jitting the result)
• Caller calls function like this:
SSBenefits(params, records)
Function definition looks like this:
Custom decorator: @iterate_jit
• Creates/applies a wrapper ‘for’ loop function to the given function
• Jits the resulting function
• Shuffles the right arguments in and out of the DataFrames
• ~ SAS-like programming interface to leverage experience of tax modeling community
Act now while supplies last!!
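A minimal sketch of the wrapping idea, under stated assumptions: this is not taxcalc's `@iterate_jit` implementation. The decorator name `iterate_columns`, its explicit `outputs` argument, the `toy_tax` rule, and the use of a dict of lists in place of DataFrame columns are all hypothetical, and the Numba compilation step is left out so the example runs with the standard library alone.

```python
import inspect


def iterate_columns(outputs):
    """Hypothetical, un-jitted stand-in for the @iterate_jit idea."""

    def decorator(func):
        arg_names = list(inspect.signature(func).parameters)

        def wrapper(params, records):
            n_rows = len(next(iter(records.values())))
            # Create any output columns that do not exist yet.
            for name in outputs:
                records.setdefault(name, [0.0] * n_rows)
            # The boilerplate 'for' loop the tax author no longer has to write.
            # (A real version would compile this loop with Numba.)
            for i in range(n_rows):
                # Per-record values come from records, policy values from params.
                args = [
                    records[name][i] if name in records else params[name]
                    for name in arg_names
                ]
                results = func(*args)
                if len(outputs) == 1:
                    results = (results,)
                for name, value in zip(outputs, results):
                    records[name][i] = value
            return records

        return wrapper

    return decorator


@iterate_columns(outputs=["toy_tax"])
def toy_tax(income, exemption, rate):
    """Scalar logic for ONE return; a made-up rule, not real tax law."""
    return max(0.0, income - exemption) * rate


params = {"exemption": 20_000.0, "rate": 0.5}
records = {"income": [10_000.0, 50_000.0]}
toy_tax(params, records)  # caller passes the two containers, nothing else
```

The function author writes scalar expressions for a single return, while the caller still uses the two-argument `(params, records)` form; the wrapper matches argument names to columns and parameters.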
How can we get non-coders to use taxcalc and do their own policy microsimulation?
www.ospc.org
TAXBRAIN DEMO
The TaxBrain architecture
• Django
• Celery: One budget year “delta” is an asynchronously executed ‘task’
• Redis for message brokering
Option 1: Use Heroku for everything
- Add-ons: RedisGreen for redis
- Additional dynos for computational work
Web dyno: Gunicorn serving Django app
Worker dyno: Celery worker node
Redis
Result: we found that only PX dynos can handle heavy workloads with a high memory watermark. They are also expensive and didn’t perform up to our expectations.
Option 2: Heroku + AWS
• Heroku for web, AWS for workers
Gunicorn serving Django app -> (HTTP) -> AWS nodes, one per budget year (year 0 … year n)
Split the calculation over budget years and recombine when work is done
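The split-and-recombine step can be sketched as follows. This is an assumption-laden stand-in: a local thread pool plays the role of the remote AWS nodes, and `compute_year_delta`, `run_reform`, and the `rate_change` field are hypothetical names with made-up arithmetic in place of a taxcalc run.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_year_delta(year, reform):
    """Hypothetical stand-in for one worker's job: one budget year's delta."""
    # Made-up arithmetic so the sketch runs; a real worker would run taxcalc.
    return {"year": year, "delta": reform["rate_change"] * 1000.0 * (year + 1)}


def run_reform(reform, n_years=10, max_workers=4):
    """Fan out one task per budget year, then recombine in year order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(compute_year_delta, year, reform)
            for year in range(n_years)
        ]
        results = [future.result() for future in futures]
    return sorted(results, key=lambda r: r["year"])


table = run_reform({"rate_change": 0.5})
```

Each task depends only on its own arguments, so any node can take any year and a failed year can simply be resubmitted.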
AWS Worker nodes
• Flask + Redis + Celery + taxcalc
• State-less API to do one year’s budget calculation
– Flask endpoints for:
• Start work + get ticket: POST “START_JOB”
• Is this ticket done yet? GET “QUERY_RESULT”
• Provide the answer for this ticket: GET “GET_RESULT”
• Deployed with salt, services running with systemd
• Cheap enough to have surplus workers so we can provide graceful degradation of service
• TIP: For numba-ized work, use a threaded worker pool for celery, not the default pool of worker processes
celery -A webapp.apps.taxbrain.tasks worker -P eventlet -l info
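The ticket protocol behind those three endpoints can be sketched without any web framework. This is only an illustration of the start/query/get pattern, assuming a hypothetical `TicketWorker` class and `one_year_calculation` job; the deployed workers use Flask and Celery, which are not reproduced here.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class TicketWorker:
    """Framework-free sketch of a start/query/get ticket protocol."""

    def __init__(self, job_func, max_workers=2):
        self._job_func = job_func
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}  # ticket -> Future

    def start_job(self, payload):
        """Like POST START_JOB: queue the work, return a ticket immediately."""
        ticket = uuid.uuid4().hex
        self._jobs[ticket] = self._pool.submit(self._job_func, payload)
        return ticket

    def query_result(self, ticket):
        """Like GET QUERY_RESULT: is this ticket done yet?"""
        return self._jobs[ticket].done()

    def get_result(self, ticket):
        """Like GET GET_RESULT: return the answer (blocks until it is ready)."""
        return self._jobs.pop(ticket).result()

    def shutdown(self):
        self._pool.shutdown(wait=True)


def one_year_calculation(payload):
    """Hypothetical job body; a real worker would call taxcalc here."""
    return {"year": payload["year"], "revenue": payload["rate"] * 100.0}


worker = TicketWorker(one_year_calculation)
ticket = worker.start_job({"year": 2015, "rate": 0.25})
answer = worker.get_result(ticket)
worker.shutdown()
```

The payload carries everything the calculation needs, so the job itself keeps no state between requests; the worker only remembers which ticket maps to which pending result.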
All of the code that runs ospc.org is now public!
• js, css, Django templates
• Distributed task execution
http://www.github.com/OpenSourcePolicyCenter/webapp_public
DEPLOYMENT
We use Heroku for deployment
• It’s hard to beat:
git push heroku master
• TIP: Custom Heroku buildpacks that use conda: https://github.com/kennethreitz/conda-buildpack/
Deploying TaxBrain
• taxcalc changes rapidly
• Policy analysts want anyone to be able to reproduce their results
• Whatever you produce on TaxBrain should be easily reproducible on a local machine on any platform
Deploying TaxBrain
1. git tag + git archive -> updates package __version__ through versioneer
2. conda build and upload packages to anaconda.org
3. Deploy with Heroku (git push heroku master)
(latest version automatically used at deployment with conda install)
conda install -c ospc taxcalc
LESSONS LEARNED
Lessons Learned
• Heroku for ‘compute’ is expensive for memory/compute intensive applications
• Git tag + versioneer + conda build + anaconda.org = transparent cross-platform deployment, reproducible results and public history of changes
Lessons Learned
• Formulate your work as state-less operations
– BAD (form Ax=b, apply pre-conditioner, solve with GMRES, return x, use x)
– GOOD (partition problem into N smaller Ax=b problems, give those to a pool of workers, assemble answer after all work is done)
– This may not be the least amount of computational work
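The “GOOD” formulation can be sketched for the simplest case. The example assumes the system is block diagonal, so the small Ax=b problems really are independent; the 2x2 block size, the helper names `solve_2x2` and `solve_partitioned`, and the numbers are all made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def solve_2x2(block):
    """Solve one independent 2x2 system A x = b by Cramer's rule."""
    (a, b), (c, d) = block["A"]
    r0, r1 = block["b"]
    det = a * d - b * c
    return [(r0 * d - b * r1) / det, (a * r1 - c * r0) / det]


def solve_partitioned(blocks, max_workers=4):
    """Stateless fan-out: each block carries everything its worker needs."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pieces = list(pool.map(solve_2x2, blocks))  # map preserves input order
    # Assemble the full solution vector after all work is done.
    return [value for piece in pieces for value in piece]


blocks = [
    {"A": [[2.0, 0.0], [0.0, 4.0]], "b": [2.0, 8.0]},
    {"A": [[1.0, 1.0], [1.0, -1.0]], "b": [3.0, 1.0]},
]
x = solve_partitioned(blocks)
```

No worker needs to know about any other worker's block, which is what lets the pool grow, shrink, or retry a piece without coordination.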
FUTURE WORK/ACKNOWLEDGEMENTS
Future Work
• Dynamic scoring macroeconomic model
• Healthcare models, Social Security, etc.
• Visualization of TaxBrain results (embedded Bokeh plots, D3, etc.)
• Lots of improvements to OSPC.org & taxcalc. Open issues on Github!
You!
Acknowledgements
• Matt Jensen, Managing Director OSPC
• Zach Risher, Web Dev
• Fellow Continuum developers:
– Jake Lyons, Theo Lekkas, Andrew Farrell, Kevin Colton
THANKS FOR LISTENING!
git clone http://www.github.com/OpenSourcePolicyCenter/Tax-Calculator
git clone http://www.github.com/OpenSourcePolicyCenter/webapp_public
Email: [email protected]
@talumbau