Healthcare Costs

Date post: 16-Aug-2015
Upload: ricardy-ricot
Description:
data analysis on healthcare costs
8/14/2015 healthcarecosts http://localhost:8888/nbconvert/html/healthcarecosts.ipynb?download=false

Healthcare Costs

I stumbled upon some data on healthcare costs for various medical procedures and conditions in the USA, so I am going to analyze it and see what we can find out.

First, let me set up this IPython notebook with the requirements for our work. I am also going to open the file containing the data (the csv file) and show the first 3 rows, to get an idea of what we are working with. Some computer code will be featured, but if you are not a computer programmer, don't worry about it. You will not need that skill to understand this work. Here we go.

In [5]:
    # importing the various modules we will be using
    import pandas as pd
    import numpy as np
    # note: the "display.mpl_style" option was removed in later pandas releases; skip this line if it raises an error
    pd.set_option("display.mpl_style", "default")
    # reading the data file and showing the first 3 rows
    healthdata = pd.read_csv("healthcarecosts.csv")
    healthdata[:3]

Out[5]:
      | DRG Definition | Provider Id | Provider Name | Provider Street Address | Provider City | Provider State | Provider Zip Code | Hospital Referral Region Description
    0 | 039 EXTRACRANIAL PROCEDURES W/O CC/MCC | 10001 | SOUTHEAST ALABAMA MEDICAL CENTER | 1108 ROSS CLARK CIRCLE | DOTHAN | AL | 36301 | AL Dothan
    1 | 039 EXTRACRANIAL PROCEDURES W/O CC/MCC | 10005 | MARSHALL MEDICAL CENTER SOUTH | 2505 U S HIGHWAY 431 NORTH | BOAZ | AL | 35957 | AL Birmingham
    2 | 039 EXTRACRANIAL PROCEDURES W/O CC/MCC | 10006 | ELIZA COFFEE MEMORIAL HOSPITAL | 205 MARENGO STREET | FLORENCE | AL | 35631 | AL Birmingham

OK, now we can see what kind of data we have available. We have the definition of the various medical procedures and conditions ("DRG Definition"). We have the provider id, the provider name, their address and their state. We also have the Average Covered Charges. The original file describes the Average Covered Charges as the total amount that the provider charges, so we will use it as the cost of the various procedures.

We can also find out how many rows of data we have with the code below. It turns out that we have 163065 rows of data and 12 columns.

In [9]:
    # finding number of rows and columns
    healthdata.shape

Out[9]:
    (163065, 12)

But we don't need all those different kinds of data for our work. We are only interested in "DRG Definition", "Provider State" and "Average Covered Charges". So, let's manipulate the data in order to show only what we need.

Also, for those inclined towards computer programming, you may have noted that the values in "Average Covered Charges" are strings. If you have noticed, don't worry about it. I will also transform those values into numbers (floats) so we can do calculations with them. When I am done, you will no longer see the $ in front of the numbers.
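For readers who want to try the selection and conversion described above without the full csv file, here is a minimal sketch on a small made-up frame. The sample rows, the "$32963.07" string format, and the names `sample` and `subset` are assumptions for illustration only. The `.copy()` call is an addition that is not part of the original notebook; it makes the subset an independent frame, so assigning to one of its columns does not trigger pandas' SettingWithCopyWarning.

```python
import pandas as pd

# Hypothetical three-row stand-in for healthcarecosts.csv (values are illustrative)
sample = pd.DataFrame({
    "DRG Definition": ["039 EXTRACRANIAL PROCEDURES W/O CC/MCC"] * 3,
    "Provider Id": [10001, 10005, 10006],
    "Provider State": ["AL", "AL", "AL"],
    "Average Covered Charges": ["$32963.07", "$15131.85", "$37560.37"],
})

# keep only the three columns of interest; .copy() returns an independent frame
columns = ["DRG Definition", "Provider State", "Average Covered Charges"]
subset = sample[columns].copy()

# strip the leading "$" from each string, then convert the column to floats
subset["Average Covered Charges"] = (
    subset["Average Covered Charges"].str.lstrip("$").astype(float)
)

print(subset.dtypes)
```

If the real file also contained thousands separators or stray spaces in the charge strings, they would have to be removed before `astype(float)`; the sketch assumes they are absent.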
In [11]:
    # combining only "DRG Definition", "Provider State" and "Average Covered Charges" data
    healthdata2 = healthdata[["DRG Definition", "Provider State", "Average Covered Charges"]]
    # dropping the leading $ from the strings in "Average Covered Charges", then converting them to floats
    healthdata2["Average Covered Charges"] = healthdata2["Average Covered Charges"].str[1:].astype(float)
    # showing first 10 rows of the new data
    healthdata2[:10]

C:\Users\Ricardy\Anaconda\lib\site-packages\IPython\kernel\__main__.py:5: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

Out[11]:
      | DRG Definition | Provider State | Average Covered Charges
    0 | 039 EXTRACRANIAL PROCEDURES W/O CC/MCC | AL | 32963.07
    1 | 039 EXTRACRANIAL PROCEDURES W/O CC/MCC | AL | 15131.85
    2 | 039 EXTRACRANIAL PROCEDURES W/O CC/MCC | AL | 37560.37
    3 | 039 EXTRACRANIAL PROCEDURES W/O CC/MCC | AL | 13998.28
    4 | 039 EXTRACRANIAL PROCEDURES W/O CC/MCC | AL | 31633.27
    5 | 039 EXTRACRANIAL PROCEDURES W/O CC/MCC | AL | 16920.79
    6 | 039 EXTRACRANIAL PROCEDURES W/O CC/MCC | AL | 11977.13
    7 | 039 EXTRACRANIAL PROCEDURES W/O CC/MCC | AL | 35841.09
    8 | 039 EXTRACRANIAL PROCEDURES W/O CC/MCC | AL | 28523.39
    9 | 039 EXTRACRANIAL PROCEDURES W/O CC/MCC | AL | 75233.38

Ignore the warning. Everything is alright: the table above shows that the values were converted.

Now, before we go further, I am interested in finding out what unique values/names we have for the medical procedures. So, let's create a list that shows only the unique values. See below.

In [13]:
    # converting the values from "DRG Definition" into a list. But this will give us several instances of the same values
    newlist = healthdata2["DRG Definition"].tolist()
    # retrieving the unique values by converting the previous list into a set
    dataset = set(newlist)
    # then converting the set back into a list again, but with unique values this time, for ease of operation. And sorting the list
    # then showing the list with its unique values
    datalist = [a for a in dataset]
    datalist = sorted(datalist)
    datalist

Out[13]:
    ['039 EXTRACRANIAL PROCEDURES W/O CC/MCC',
     '057 DEGENERATIVE NERVOUS SYSTEM DISORDERS W/O MCC',
     '064 INTRACRANIAL HEMORRHAGE OR CEREBRAL INFARCTION W MCC',
     '065 INTRACRANIAL HEMORRHAGE OR CEREBRAL INFARCTION W CC',
     '066 INTRACRANIAL HEMORRHAGE OR CEREBRAL INFARCTION W/O CC/MCC',
     '069 TRANSIENT ISCHEMIA',
     '074 CRANIAL & PERIPHERAL NERVE DISORDERS W/O MCC',
     '101 SEIZURES W/O MCC',
     '149 DYSEQUILIBRIUM',
     '176 PULMONARY EMBOLISM W/O MCC',
     '177 RESPIRATORY INFECTIONS & INFLAMMATIONS W MCC',
     '178 RESPIRATORY INFECTIONS & INFLAMMATIONS W CC',
     '189 PULMONARY EDEMA & RESPIRATORY FAILURE',
     '190 CHRONIC OBSTRUCTIVE PULMONARY DISEASE W MCC',
     '191 CHRONIC OBSTRUCTIVE PULMONARY DISEASE W CC',
     '192 CHRONIC OBSTRUCTIVE PULMONARY DISEASE W/O CC/MCC',
     '193 SIMPLE PNEUMONIA & PLEURISY W MCC',
     '194 SIMPLE PNEUMONIA & PLEURISY W CC',
     '195 SIMPLE PNEUMONIA & PLEURISY W/O CC/MCC',
     '202 BRONCHITIS & ASTHMA W CC/MCC',
     '203 BRONCHITIS & ASTHMA W/O CC/MCC',
     '207 RESPIRATORY SYSTEM DIAGNOSIS W VENTILATOR SUPPORT 96+ HOURS',
     '208 RESPIRATORY SYSTEM DIAGNOSIS W VENTILATOR SUPPORT
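The list, set, and sorted list steps used in In [13] to collect the unique procedure names can also be written with pandas' own `Series.unique()`. The sketch below is an alternative under stated assumptions, not the notebook's method: the five-entry series `drg` is a made-up stand-in for the "DRG Definition" column, reusing a few names from the output above.

```python
import pandas as pd

# Hypothetical stand-in for healthdata2["DRG Definition"], with repeated entries
drg = pd.Series([
    "069 TRANSIENT ISCHEMIA",
    "039 EXTRACRANIAL PROCEDURES W/O CC/MCC",
    "069 TRANSIENT ISCHEMIA",
    "149 DYSEQUILIBRIUM",
    "039 EXTRACRANIAL PROCEDURES W/O CC/MCC",
])

# unique() drops the repeats; sorted() returns them as an ordered Python list
unique_sorted = sorted(drg.unique())

# value_counts() additionally reports how many rows each procedure has
rows_per_procedure = drg.value_counts()

for name in unique_sorted:
    print(name, rows_per_procedure[name])
```

Because every name starts with its three-digit DRG code, sorting the strings also orders the procedures by code.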

