Opower @ Hadoop Summit North America Use Dependency Injection to get Hadoop out of your application code June 27, 2013 Eric Chang Technology Lead, Data Services Opower
Transcript
1. Opower @ Hadoop Summit North America Use Dependency
Injection to get Hadoop out of your application code June 27, 2013
Eric Chang Technology Lead, Data Services Opower
2. OPOWER CONFIDENTIAL: DO NOT DISTRIBUTE
Agenda
  1. Problem Statement
  2. Solution
  3. Example
  4. Opower case study
  5. Wrap up
3. Problem Statement
"Hadoop is hard, let's go shopping!", or Effective Separation of Concerns in Hadoop
4. Problem Statement
Why Separation of Concerns?
  - Integration/migration of existing code
  - Allows for code re-use
  - Allows for different levels of expertise
  - Greater testability
Hadoop doesn't do Separation of Concerns
  - Serialization, input/output formats, and partitioning are not portable
  - Provides little guidance or out-of-the-box functionality for integrating code components (existing or new)
6. Solution
Dependency Injection, or "Don't call us, we'll call you"
7. Solution: DI, illustrated
[Diagram: real-time path]
  aRealtimeCallFromTheWeb() { businessService.run() }
  IoC container: BizServiceImpl, RealtimeReadDAO, RealtimeWriteDAO
  Backing store: Realtime DataStore
8. Solution: DI, illustrated
[Diagram: MapReduce path]
  reduce(key, values, context) { businessService.run() }
  IoC container: BizServiceImpl, ValuesBackedReadDAO, ContextBackedWriteDAO
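The two diagrams above can be sketched in plain Java. This is a minimal illustration, not the code from the talk: the slides name BizServiceImpl, ReadDAO, and WriteDAO but do not show their methods, so `read()`, `write()`, the summing logic in `run()`, and the `DiSketch` helper are all assumptions. A `List` stands in for the MapReduce context so the sketch runs without Hadoop, and the wiring is done by hand rather than by an IoC container.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical contracts: the slides name these types but not their methods.
interface ReadDAO {
    Iterable<Integer> read();
}

interface WriteDAO {
    void write(int result);
}

// Business logic depends only on the two interfaces, never on Hadoop types.
class BizServiceImpl {
    private final ReadDAO readDAO;
    private final WriteDAO writeDAO;

    BizServiceImpl(ReadDAO readDAO, WriteDAO writeDAO) {
        this.readDAO = readDAO;
        this.writeDAO = writeDAO;
    }

    // Placeholder logic (an assumption): sum the inputs and write the total.
    void run() {
        int total = 0;
        for (int value : readDAO.read()) {
            total += value;
        }
        writeDAO.write(total);
    }
}

// Reads from the values handed to a reduce call.
class ValuesBackedReadDAO implements ReadDAO {
    private final Iterable<Integer> values;

    ValuesBackedReadDAO(Iterable<Integer> values) {
        this.values = values;
    }

    @Override
    public Iterable<Integer> read() {
        return values;
    }
}

// Writes to a context-like sink; the List is a stand-in for a MapReduce Context.
class ContextBackedWriteDAO implements WriteDAO {
    private final List<Integer> context;

    ContextBackedWriteDAO(List<Integer> context) {
        this.context = context;
    }

    @Override
    public void write(int result) {
        context.add(result);
    }
}

class DiSketch {
    // Mimics the body of reduce(key, values, context): wire the DAOs, run the service.
    static List<Integer> reduceLike(Iterable<Integer> values) {
        List<Integer> context = new ArrayList<>();
        BizServiceImpl businessService = new BizServiceImpl(
            new ValuesBackedReadDAO(values), new ContextBackedWriteDAO(context));
        businessService.run();
        return context;
    }
}
```

A real-time caller would construct the same BizServiceImpl with DAOs backed by a data store; the service class itself would not change.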
10. Example
Small-batch, Artisanal WordCount -> Petabyte-scale WordCount*
*healthy suspension of disbelief required
refs: http://wiki.apache.org/hadoop/WordCount
11. Example: Artisanal WordCount
  - You live in a borough of NYC and have a beard
  - You've built a great business around counting words, one at a time, in small, handcrafted batches in linear O(n) time
  - You receive files from customers and run your simple but effective code
  - You had the foresight to know that some day you would need to scale up, so you created a properly componentized architecture:
      - Domain objects
      - Data access layer
      - Service layer (application logic)
12. Example: Artisanal WordCount
[Class diagram]
  WordCountDTO
      word : String
      count : int
  WordCountServiceImpl
      countWord(word : String)
  ArtisanalWordCountDAO
      getWords() : Iterable
      writeWordCount(count : WordCountDTO)
Flow:
  1. Retrieve words
  2. Count words
  3. Write count
13. Example: Artisanal WordCount
Core business logic: WordCountServiceImpl.countWord()

    public void countWord(String word) {
        int wordCount = 0;
        for (String nextWord : wordCountDAO.getWords()) {
            if (nextWord.equals(word)) ++wordCount;
        }
        WordCountDTO wordCountDTO = new WordCountDTO(word, wordCount);
        wordCountDAO.writeWordCount(wordCountDTO);
    }
14. Example: Artisanal WordCount
IoC configuration (Google Guice)

    public class WordCountGuiceModule extends AbstractModule {
        ...
        @Override
        protected void configure() {
            bind(WordCountService.class).to(WordCountServiceImpl.class);
            bind(WordCountDAO.class).toInstance(this.wordCountDAO);
        }
    }
15. Example: Artisanal WordCount
Artisanal WordCount wiring and execution

    WordCountDAO wordCountDAO = new ArtisanalWordCountDAO(inFile, outFile);
    WordCountService wordCountService = Guice.createInjector(
        new WordCountGuiceModule(wordCountDAO)
    ).getInstance(WordCountService.class);
    for (String word : getWordsToCount()) {
        wordCountService.countWord(word);
    }
16. Example: Artisanal WordCount
[Diagram: artisanal path]
  artisanalWordCount() { wordCountService.countWord("hat") }
  IoC container: WordCountServiceImpl, ArtisanalWordCountDAO
  Input words: bat cat hat mat hat sat rat
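The artisanal flow can be run end to end without files or Guice. The sketch below reuses the countWord() loop from slide 13 and the word list from the slide 16 diagram. Everything else is an assumption made for illustration: the field and constructor shapes of WordCountDTO, the `InMemoryWordCountDAO` stand-in for ArtisanalWordCountDAO, the `ArtisanalSketch` helper, and the hand-written constructor injection that replaces the Guice module.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Domain object: shape follows the slide 12 class diagram.
class WordCountDTO {
    final String word;
    final int count;

    WordCountDTO(String word, int count) {
        this.word = word;
        this.count = count;
    }
}

// Data access contract used by the service.
interface WordCountDAO {
    Iterable<String> getWords();
    void writeWordCount(WordCountDTO count);
}

interface WordCountService {
    void countWord(String word);
}

// Same loop as slide 13; the DAO arrives through the constructor.
class WordCountServiceImpl implements WordCountService {
    private final WordCountDAO wordCountDAO;

    WordCountServiceImpl(WordCountDAO wordCountDAO) {
        this.wordCountDAO = wordCountDAO;
    }

    @Override
    public void countWord(String word) {
        int wordCount = 0;
        for (String nextWord : wordCountDAO.getWords()) {
            if (nextWord.equals(word)) ++wordCount;
        }
        wordCountDAO.writeWordCount(new WordCountDTO(word, wordCount));
    }
}

// Hypothetical in-memory DAO standing in for the file-backed ArtisanalWordCountDAO.
class InMemoryWordCountDAO implements WordCountDAO {
    private final List<String> words;
    final List<WordCountDTO> written = new ArrayList<>();

    InMemoryWordCountDAO(List<String> words) {
        this.words = words;
    }

    @Override
    public Iterable<String> getWords() {
        return words;
    }

    @Override
    public void writeWordCount(WordCountDTO count) {
        written.add(count);
    }
}

class ArtisanalSketch {
    // Wires the service by hand (in place of the Guice injector) and counts one word.
    static int count(String word, String... words) {
        InMemoryWordCountDAO dao = new InMemoryWordCountDAO(Arrays.asList(words));
        WordCountService service = new WordCountServiceImpl(dao);
        service.countWord(word);
        return dao.written.get(0).count;
    }
}
```

For the diagram's input, `ArtisanalSketch.count("hat", "bat", "cat", "hat", "mat", "hat", "sat", "rat")` returns 2. The in-memory DAO also doubles as a test stub, which is one way the separation pays off in testability.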
17. Example: Artisanal WordCount
18. Example: Petabyte WordCount
  - Indie days are over: petabytes of words!
  - O(n) won't cut it
  - Hadoop to the rescue. You partition by word in your map phase.
  - Your reduce method looks like:
        public void reduce(Text key, Iterable values, Context context)
  - MapReduceWordCountDAO fulfills the WordCountDAO contract (more on this later)
  - WordCountDTOs are written to an MR context and collected
19. Example: Petabyte WordCount
[Diagram: MapReduce path]
  reduce(key, values, context) { wordCountService.countWord(key.toString()) }
  IoC container: WordCountServiceImpl, MapReduceWordCountDAO
  Input words: bat cat hat mat hat sat cat
  Output keys: bat: cat: hat:
20. Example: Petabyte WordCount
Petabyte WordCount wiring and execution

    public void reduce(Text key, Iterable values, Context ctx) {
        MapReduceWordCountDAO wordCountDAO = new MapReduceWordCountDAO(key, values, ctx);
        WordCountService wordCountService = Guice.createInjector(
            new WordCountGuiceModule(wordCountDAO)
        ).getInstance(WordCountService.class);
        wordCountService.countWord(key.toString());
    }
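The slides do not show how MapReduceWordCountDAO fulfills the WordCountDAO contract, so the following is only one possible shape, written with stand-in types so it runs without Hadoop: `String` replaces Text, `Iterable<Integer>` replaces the reducer's values, and a `Map` replaces the Context. The class names `MapReduceWordCountDAOSketch` and `ReduceSketch` are hypothetical, as is the assumption that each value in a reduce call represents one occurrence of the key.

```java
import java.util.Iterator;
import java.util.Map;

class WordCountDTO {
    final String word;
    final int count;

    WordCountDTO(String word, int count) {
        this.word = word;
        this.count = count;
    }
}

interface WordCountDAO {
    Iterable<String> getWords();
    void writeWordCount(WordCountDTO count);
}

// Sketch only: String, Iterable<Integer>, and Map stand in for Hadoop's
// Text, reducer values, and Context.
class MapReduceWordCountDAOSketch implements WordCountDAO {
    private final String key;
    private final Iterable<Integer> values;
    private final Map<String, Integer> context;

    MapReduceWordCountDAOSketch(String key, Iterable<Integer> values,
                                Map<String, Integer> context) {
        this.key = key;
        this.values = values;
        this.context = context;
    }

    // Assumption: every value in this reduce call is one occurrence of the key,
    // so the DAO lazily yields the key once per value without buffering.
    @Override
    public Iterable<String> getWords() {
        return () -> {
            final Iterator<Integer> it = values.iterator();
            return new Iterator<String>() {
                @Override public boolean hasNext() { return it.hasNext(); }
                @Override public String next() { it.next(); return key; }
            };
        };
    }

    // Writes the result to the context stand-in.
    @Override
    public void writeWordCount(WordCountDTO count) {
        context.put(count.word, count.count);
    }
}

class ReduceSketch {
    // Same counting loop as WordCountServiceImpl.countWord(), inlined here
    // so the block stays self-contained.
    static void reduce(String key, Iterable<Integer> values, Map<String, Integer> context) {
        WordCountDAO dao = new MapReduceWordCountDAOSketch(key, values, context);
        int wordCount = 0;
        for (String nextWord : dao.getWords()) {
            if (nextWord.equals(key)) ++wordCount;
        }
        dao.writeWordCount(new WordCountDTO(key, wordCount));
    }
}
```

The point of the design is that the counting loop never learns where its words come from: swapping the file-backed DAO for a values-backed one is the only change between the artisanal and petabyte versions.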
21. Example: Petabyte WordCount
23. Opower case study: Bill Projection
Opower in 5 bullet points: We help people like you & me reduce your energy usage by working with utility companies to analyze energy usage and provide actionable insights. One of the ways we do this is via Bill Projection.
24. Opower case study: Bill Projection
How it works
  - Retrieve energy usage (kWh, therms)
  - Forecast usage
  - Apply rates to project costs
[Diagram labels: Rate Engine, rates, $30]
25. Opower case study: Bill Projection
DI used to employ the same code components for batch and in-process, synchronous (real-time) calculations
[Diagram labels: Batch M/R calculations; In-process calculations; web, email, sms, ivr; Bill Projection code components; Curated data inputs; Results validation]
26. Opower case study: Bill Projection
[Diagram: batch path]
  HBase -> map() -> reduce(key, values, context) { billForecastService.forecast() }
  Spring IoC container: BillForecastServiceImpl, RateEngineImpl, MapReduceDAO, MRUsageDAO
27. Opower case study: Bill Projection
[Diagram: real-time path]
  calculateBillProjection() { billForecastService.forecast() }
  Spring IoC container: BillForecastServiceImpl, RateEngineImpl, MapReduceDAO, HBaseUsageDAO
  Backing store: HBase
28. Opower case study: Bill Projection
Benefits of DI solution
  - We're able to use the pre-Hadoop Rate Engine code component
  - Calculations can be applied in batch and/or in real-time
  - Good test coverage
30. Wrap up
Dependency Injection + Hadoop gives you
  - Separation of Concerns
  - Batch and real-time calculations using the same code
Some limitations
  - Requires that code is sufficiently componentized
  - Assumes domain classes can survive MR partitioning
  - Somebody still has to know MR
Opower employs DI + Hadoop to serve up Bill Projections using a mixed batch + real-time workflow
31. Wrap up
Questions?
Eric Chang, Technical Lead, Data Services, Opower
[email protected]
http://www.linkedin.com/in/ericgchang
Artisanal WordCount example: https://github.com/opower/artisanal-word-count