Saturday Morning Keynote
Wes McKinney @wesmckinn
PyCon APAC 2016 (Seoul)
Me
DataPad
Apache Arrow
Feather
Ibis
In process: Python for Data Analysis: 2nd Edition
Coming 2017 (in English ☺)
Q: What brings you here?
Our shared values
Pride in software craftsmanship
My story
• Accidental software developer
• 2007: My first job (financial research analyst)
• I started writing Python libraries to do my own work better
• Soon I was helping my colleagues work better, too
Tools
Empathy: the feeling that you understand and share another person's experiences and emotions: the ability to share someone else's feelings
Source: Merriam-Webster's Learner's Dictionary
Open source is wonderful… but it can also be frustrating
Sustainable open source
• How to keep contributors from drowning / burning out?
• How to fund the work?
• How to protect and serve the community?
The Grind
“The grind is an endless stream of bug reports, requests, demands, questions, and occasional inquisitions.”
DHH, Creator of Ruby on Rails
pandas, the open source project
• Parts of code date back to April 2008
• Over 600 unique contributors on GitHub
• Active project maintainers range from 4-7 people
• > 6900 Closed Issues
• > 5100 Pull Requests
pandas at end of 2012
April7,2014
"Some might argue that [Heartbleed] is the worst vulnerability found (at least in terms of its potential impact) since commercial traffic began to flow on the Internet."
Joseph Steinberg, Forbes cybersecurity columnist
“There should be at least … [6] full time OpenSSL team members, not just one, able to concentrate … without having to hustle commercial work. If you’re a … in a position to do something about it, give it some thought. Please. I’m getting old and weary and I’d like to retire someday.”
Steve Marquess, OpenSSL team
By Nadia Eghbal, supported by the Ford Foundation
For more on this
“The Cathedral and the Bazaar”
Python’s normalization in industry
• Python has become a leading language instead of something “experimental” or “risky”
• Many businesses founded on the growth of the Python user base
• See Paul Graham’s 2004 essay “The Python Paradox”: how things have changed!
Governance: “the processes of interaction and decision-making among the actors involved in a collective problem…”
M. Hufty (via Wikipedia)
Openness and Transparency
Consensus
Some example governance documents
• NumPy (see the docs)
• IPython / Jupyter governance
– github.com/jupyter/governance
• pandas
– github.com/pydata/pandas-governance
– Modeled after Jupyter governance
http://numfocus.org
http://apache.org
conda-forge
• Community-curated conda package channel (hosted on anaconda.org)
• Reproducible build infrastructure (Docker + CircleCI + Travis CI + Appveyor)
• Automated GitHub helper tools
conda config --add channels conda-forge
What is next for pandas?
• pandas 1.0
– A stable, maintenance-only release
• Beginning “pandas 2.0”
– Planning significant refactoring on the internals of Series, DataFrame
Why pandas 2.0?
• Some changes difficult / impossible to do in an incremental way
• pandas’s relationship with the ecosystem has evolved over the last 5 years
• Make pandas
– Faster and use less memory
– Fix long-standing limitations / inconsistencies
– Easier interoperability / extensibility
Apache Arrow
http://arrow.apache.org
High Performance Sharing & Interchange
Today
• Each system has its own internal memory format
• 70-80% CPU wasted on serialization and deserialization
• Similar functionality implemented in multiple projects
With Arrow
• All systems utilize the same memory format
• No overhead for cross-system communication
• Projects can share functionality (e.g., Parquet-to-Arrow reader)
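The contrast between the two columns can be illustrated with a small standard-library sketch. This is only an analogy, not Arrow itself: `pickle` stands in for serialization between systems, and a `memoryview` over an `array` stands in for two systems agreeing on one memory layout. The names `column`, `copied`, and `shared` are made up for the illustration.

```python
import pickle
from array import array

# Hypothetical column of 64-bit floats owned by "system A".
column = array("d", [1.5, 2.5, 3.5, 4.5])

# "Today": hand the data to "system B" by serializing and deserializing.
# Every value is encoded, then decoded into a brand-new copy.
payload = pickle.dumps(column)
copied = pickle.loads(payload)

# "With a shared format": system B reads the same bytes in place.
# memoryview exposes the array's buffer without copying it.
shared = memoryview(column)

# A change made by system A is visible through the shared view,
# but not in the deserialized copy.
column[0] = 99.0
shared_first = shared[0]
copied_first = copied[0]
```

In the shared case there is no encode or decode step at all, which is the kind of overhead the "With Arrow" column says goes away.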
Feather File Format for Python and R
• Problem: fast, language-agnostic binary data frame file format
• By Wes McKinney (Python) and Hadley Wickham (R)
• Read speeds close to disk IO performance
• Leverages Apache Arrow
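One way to see why a binary file format can read at close to disk speed is to compare it with a text format. The sketch below uses only the standard library and is not Feather's actual on-disk layout: it writes one hypothetical column of floats as raw bytes and as CSV, then reads both back. The file names and the `values` column are assumptions made for the example.

```python
import csv
import os
import tempfile
from array import array

# Hypothetical column of 1000 64-bit floats.
values = array("d", [0.5 * i for i in range(1000)])

with tempfile.TemporaryDirectory() as tmp:
    bin_path = os.path.join(tmp, "column.bin")
    csv_path = os.path.join(tmp, "column.csv")

    # Binary: the file holds the same bytes the array holds in memory,
    # so reading is a plain copy from disk with no parsing.
    with open(bin_path, "wb") as f:
        values.tofile(f)
    from_bin = array("d")
    with open(bin_path, "rb") as f:
        from_bin.fromfile(f, len(values))

    # Text: every value is formatted on write and parsed again on read.
    with open(csv_path, "w", newline="") as f:
        csv.writer(f).writerows([v] for v in values)
    with open(csv_path, newline="") as f:
        from_csv = array("d", (float(row[0]) for row in csv.reader(f)))

    # Size of the binary file in bytes (one fixed-width slot per value).
    bin_size = os.path.getsize(bin_path)
```

Both paths recover the same values, but only the text path spends CPU time converting between strings and numbers.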
Thank you
@wesmckinn
http://wesmckinney.com
pandas sprint on Monday!