
But how do I GET the data? Transparency Camp 2014

Date post: 30-Nov-2014
Upload: jeffrey-quigley
Description:
This high-level presentation covers some of the most common ways to collect data in the context of the standard analytical process. It includes scraping, using APIs, and how the frequency of updates can influence which method you choose.
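The link between update frequency and choice of method can be sketched with a simple average-cost model: each method has a one-time setup cost and a cost per collection, and the average cost per collection falls as the setup cost is spread over more runs. The figures below are invented for illustration (analyst-hours, not taken from the presentation), and the function and dictionary names are hypothetical.

```python
def average_cost(setup_cost, cost_per_run, runs):
    """Average cost per collection once the data has been collected `runs` times."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    return (setup_cost + cost_per_run * runs) / runs


# Hypothetical costs in analyst-hours: (one-time setup, cost per collection).
# These numbers are assumptions chosen only to illustrate the shape of the curves.
METHODS = {
    "manual": (0.0, 2.0),     # nothing to build, but slow every time
    "scraping": (8.0, 0.25),  # write a scraper once, then mostly automated
    "api": (16.0, 0.05),      # more upfront integration work, cheapest per run
}


def cheapest_method(runs):
    """Return the method with the lowest average cost for a given number of runs."""
    return min(METHODS, key=lambda name: average_cost(*METHODS[name], runs))


for runs in (1, 5, 50):
    print(runs, cheapest_method(runs))
```

Under these assumed numbers, manual collection wins for a one-off pull, while automated methods win as the number of collections grows. Real setup and per-run costs depend on the source and the team.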
Transcript
  • 1. But how do I GET the data? Transparency Camp 2014
  • 2. Shooju is a Web-Based Data Platform
      • Consolidate your internal and external data sources
      • Make all data searchable from one place
      • Provide continuous updating
      • Seamlessly integrate with tools and applications
      • Share data across your entire organization
      • Save time and energy while reducing errors and problems with version control
      • Shooju saves time, improves data quality and enhances data sharing across your entire organization
  • 3. The Analytical Process [diagram labels: Data (many sources)]
  • 4. The Analytical Process [diagram labels: Data (many sources); some place]
  • 5. The Analytical Process [diagram labels: Data (many sources); some place; your tool of choice]
  • 6. The Analytical Process [diagram labels: Data (many sources); some place; your tool of choice; your product]
  • 7. The Analytical Process [diagram labels: Data (many sources); some place; your tool of choice; your product; The Fun Part]
  • 8. The Analytical Process [diagram labels: Data (many sources); some place; your tool of choice; your product; The Not Fun Part]
  • 9. Big data vs. small data
  • 10. A boring 2 x 2
  • 11. The harsh 80/20 reality: Most organizations spend more time collecting, cleaning, downloading, managing and wrangling data than they do conducting analysis
  • 12. Three ways to get data: API, Scraping, Manual [slide labels: Good, Bad]. Defined as an ETL (Extract, Transform, Load) process
  • 13. Method comparison [chart: Manual, Scraping and API plotted by technical expertise required and time (and annoyance)]
  • 14. Average cost curve of data collection [chart: average cost vs. number of times data is collected; curve shown: Manual Collection]
  • 15. Average cost curve of data collection [chart: average cost vs. number of times data is collected; curves shown: Manual Collection, Scraping]
  • 16. Average cost curve of data collection [chart: average cost vs. number of times data is collected; curves shown: Manual Collection, Scraping, API]
  • 17. How do you get your data? What do you like? What don't you like?
  • 18. Once the data is scraped, where can it go? CSV, XLS, DBF, SQL, NoSQL, many others
  • 19. Where does your data go when you collect it?
  • 20. Appendix
  • 21. Shooju Value Added
      • Cost Savings: By saving analyst time and energy, Shooju allows analysts to do more with less, reducing data management costs and putting more focus on high-value analysis.
      • Added Quality: Automating data processes internally will ensure that your data is accurate, up-to-date and consistent across your entire organization.
      • Enhanced Decision Making: Having more accurate data available faster with more analyst time left for analysis leads to enhanced decision making.
  • 22. Sample Cost Savings
      • Components: Shooju Sources; Excel Add-In & Other Tools; Custom BI Apps; Web Search; Auto-Import Drivers
      • Retrieval: # of analysts retrieving, time saved in retrieval, # of sources, frequency of retrieval
      • Refresh: # of analysts refreshing, time saved in tool refresh, # of sources, frequency of refresh
      • Integration and contribution: time to integrate data, analysts contributing data, # of tools created, analyst upload time
      • Search: # of analysts searching, time saved in search, # of sources, frequency of search
      • Sample values, in slide order: 5 analysts, 65 min / source, 22 sources, 18 times / year; 11 analysts, 74 min / source, 22 sources, 14 times / year; 9 min / source, 22 sources, 32 times / year; 13 analysts, 14 wk of dev. saved, 8 analysts contributing, 2 apps created; 40 min, 10 times / year
      • Cost savings in USD (%), in slide order: $97k (14%), $73k (10%), $248k (35%), $284k (41%). Total: $702k
      • $410k savings equivalent to 10% of HR spend*
      • Shooju speeds up custom BI application development by making all data natively accessible and continuously updated in any BI tool or custom app.
      • * Based on real 40-person organization. Assumed annual wages vary between $30k and $140k.
  • 23. Added Quality: The Three Cs
      • Consistency: Shooju ensures that all analysts are using the same data across all their tools and applications. By allowing analysts to upload their own data to the platform, internal data as well as external data now flows seamlessly - without messy spreadsheet links.
      • Currency: By automatically pulling in the latest source data through the Shooju importer layer, Shooju ensures that all of your spreadsheets and models are populated with the latest data. Our native plugins for Excel, Access and all your other tools allow data to flow through directly without any need for the analyst to download or copy and paste.
      • Correctness: The more data is touched by human hands, the more prone it is to errors. By streamlining workflows and automating work processes, Shooju eliminates most of these errors, saving time and ensuring that the data you rely on is more accurate.
  • 24. We support any data source. Ask us about non-mainstream data sources that traditional data providers don't carry.
  • 25. Shooju Data Process
  • 26. Shooju vs. Custom Data Warehouse
      Attribute | Custom Data Warehouse | Shooju
      Design | Custom | Plug-and-play
      Cost | 7+ digits | 5-6 digits
      Rollout timeline | Months / Years | Hours
      Scalability | Minimal | Infinite
      Flexibility | Low | High
      Maintenance | High | Low
      Stakeholders | IT controlled | Analyst run / IT maintained
      Tool and app support | Clunky, requiring IT | Native tool support
      Data warehouse projects are costly, time consuming and result in inflexible systems with low adoption rates
  • 27. Shooju vs. Off-the-shelf Data Management*
      Attribute | Off-the-shelf Data Management* | Shooju
      Service focus | Data provision/management | Process improvement
      Prepackaged data feeds | Many | None
      Custom data feeds | None (not natively supported) | Included (all feeds are custom)
      Internal data integration | Weeks (high consulting fees) | Days (included in service)
      Process flexibility | Low | High
      Analyst learning curve | Weeks | Hours
      Ease of migrating off | Very difficult/impossible | Easy
      Annual fee | 6-7 digits | 5-6 digits
      Data management* solutions focus on generic data provision rather than process improvement and limit analysts to a closed and inflexible data ecosystem.
      * Top-ranked providers in the EnergyRisk Data Management category include: Morningstar, ZE Power Group, SunGard, Allegro, Pioneer Solutions, SAS, and InteractiveData. See http://www.slideshare.net/Allegrodev/energy-risk-magazines-etrm-software-rankings-2013
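
The ETL (Extract, Transform, Load) process named on slide 12, and the CSV and SQL destinations listed on slide 18, can be illustrated with a minimal sketch using only the Python standard library. The sample data, table name and function names here are hypothetical. In practice the extract step would read from a scraped page or an API response rather than an inline string.

```python
import csv
import io
import sqlite3

# Hypothetical raw data, standing in for a scraped page or API response.
RAW = """date,value
2014-05-30, 10.5
2014-05-31,
2014-06-01, 11.0
"""


def extract(text):
    """Extract: parse the raw text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, drop rows with no value, convert to float."""
    cleaned = []
    for row in rows:
        value = row["value"].strip()
        if value:
            cleaned.append((row["date"].strip(), float(value)))
    return cleaned


def load_csv(rows):
    """Load (option 1): serialize the cleaned rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["date", "value"])
    writer.writerows(rows)
    return buffer.getvalue()


def load_sql(rows, conn):
    """Load (option 2): upsert the cleaned rows into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observations (date TEXT PRIMARY KEY, value REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO observations VALUES (?, ?)", rows)
    conn.commit()


rows = transform(extract(RAW))
conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
load_sql(rows, conn)
```

Keying the table on the date means re-running the load after a later collection overwrites existing rows instead of duplicating them, which matters when the same source is collected repeatedly.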
