Big-Data Analytics Architecture for Businesses:
Open-source Perspective
Mert Onuralp Gökalp, Kerem Kayabay, Mohamed Zaki
Cambridge Service Alliance (CSA), Institute for Manufacturing, University of Cambridge
January 2018

Why study open-source tools for Big Data?
• Open-source tools have become the standard Big Data processing platforms*
• The gap: Study the open-source tools considering both managerial and technical perspectives
• Some questions
  • Why do people prefer commercial solutions rather than open-source?
  • Do we need commercial Big Data solutions?

The Big Tools Era
• Many tools continue to emerge at a fast pace to deal with big data
  • Characteristics: Volume, speed, diversity
  • Problems: Processing, storage, manipulation, aggregation, visualization
• Some tools only aim to analyse data in a certain domain
  • Internet of Things, Edge Computing
• Just by reviewing open-source tools, we have come across 6500 such tools and filtered them down to 241

Open-source big data tools
• Typically supported by companies that provide services over the Internet
  • Google, Yahoo, Twitter, LinkedIn
• To provide better services to their users and third-party customers
• The tools are made available to the IT industry as open-source tools
• They have become the standard big data processing platforms

Some example tools

Big Data Processing   Big Data Characteristic   Tools and Technologies
Batch processing      Volume                    Hadoop, Spark, Flink
Stream processing     Velocity                  Storm, Samza, S4, Spark Streaming, Flink Streaming
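
The batch and stream rows above differ mainly in when results become available. A minimal sketch in plain Python, not tied to any of the listed tools: the function names and the sample events are hypothetical and only illustrate the two processing styles on a word count.

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_word_count(lines: list[str]) -> Counter:
    """Batch style: the whole data set is available before processing starts."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts  # one result, produced after all input has been read


def stream_word_count(lines: Iterable[str]) -> Iterator[Counter]:
    """Stream style: keep running counts and emit a result per arriving record."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
        yield Counter(counts)  # snapshot of the state after this record


# Hypothetical sample events, used only for illustration.
events = ["spark flink", "storm spark", "flink spark"]

batch_result = batch_word_count(events)
stream_results = list(stream_word_count(iter(events)))
```

In this sketch the last streamed snapshot equals the batch result; the difference is that the stream version has a usable answer after every record, which is what a tight timing requirement calls for.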

Big Data Storage      Big Data Characteristic   Tools and Technologies
NoSQL                 Variety                   MongoDB, Cassandra, HBase, Redis
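
The NoSQL tools in this row follow different data models, which is one way they address variety. The sketch below mimics three of those models with plain Python dictionaries; it does not use any real database client, and the customer record is made-up illustration data.

```python
import json

# Key-value model (Redis-style): an opaque value looked up by a single key.
key_value_store = {"customer:42": '{"name": "Ada", "city": "Cambridge"}'}

# Document model (MongoDB-style): nested documents with queryable fields.
document_store = [
    {"_id": 42, "name": "Ada", "address": {"city": "Cambridge"}, "tags": ["iot"]},
]

# Wide-column model (Cassandra/HBase-style): row key -> column family -> columns.
wide_column_store = {
    "42": {"profile": {"name": "Ada"}, "location": {"city": "Cambridge"}},
}

# The same question ("which city?") is answered differently under each model.
city_kv = json.loads(key_value_store["customer:42"])["city"]
city_doc = next(d for d in document_store if d["_id"] == 42)["address"]["city"]
city_wc = wide_column_store["42"]["location"]["city"]
```

The key-value store has to decode the whole value before reading one field, while the document and wide-column layouts expose the field directly; trade-offs of this kind are why the storage model matters when choosing a tool.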

Choosing the right toolset
• Choices depend on the characteristics of the data and the domain of operation
• Businesses incur costs trying to adopt new technologies: technical debt
  • Training the workforce
  • Changing existing source code to run on newer versions
  • Changing the underlying toolset

Why do firms outsource?
• Tracking the developments in this domain is hard
• Most of the tools are unknown to the business world
• Not to lag behind the hype
• BUT commercial solution providers
  • Rely on a subset of available open-source tools
  • Do not have the domain-specific expertise
  • Do not solve technical and soft challenges

Aim of this study
• Systematically review the open-source tools in the big data domain
• Establish a method for tracking the developments for the open-source tools
• Build a reference open-source big data analytics architecture
• Analyse firms to give directions to businesses using the proposed architecture

Tool selection process

Some figures

Open-source Architecture

Distribution of Open-Source Tools

How to choose a big data tool?
• We need to come up with criteria
• We can look at 113 real-world use cases, solution briefs, white papers & blog posts from a wide range of industries
  • Telecommunication, healthcare, banking & finance, manufacturing, transportation, energy
• Secondary data sets support the proposed big data reference architecture

Secondary use-case company distribution

How to choose a big data tool?
• Timing requirement: Batch vs stream processing
• Data size: In-memory vs on-disk processing
• Platform independency: Interoperability of a big data tool
• Data storage model: Graph-based, key-value-based, document-based, time-series-based
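
The criteria above can be read as a small decision procedure. The following is a minimal sketch under stated assumptions: the `Requirements` class, the `shortlist` function, and the field names are hypothetical, the candidate lists are simply the processing tools named earlier in these slides, and platform independency is left out because it depends on the firm's existing systems.

```python
from dataclasses import dataclass

# Storage models named in the criteria above.
VALID_STORAGE_MODELS = {"graph", "key-value", "document", "time-series"}

# Candidate processing tools, taken from the example table in these slides.
PROCESSING_CANDIDATES = {
    "batch": ["Hadoop", "Spark", "Flink"],
    "stream": ["Storm", "Samza", "S4", "Spark Streaming", "Flink Streaming"],
}


@dataclass(frozen=True)
class Requirements:
    needs_low_latency: bool  # timing requirement: stream vs batch
    fits_in_memory: bool     # data size: in-memory vs on-disk
    storage_model: str       # one of VALID_STORAGE_MODELS


def shortlist(req: Requirements) -> dict:
    """Translate requirements into a processing profile and candidate tools."""
    if req.storage_model not in VALID_STORAGE_MODELS:
        raise ValueError(f"unknown storage model: {req.storage_model!r}")
    mode = "stream" if req.needs_low_latency else "batch"
    return {
        "processing": mode,
        "execution": "in-memory" if req.fits_in_memory else "on-disk",
        "storage_model": req.storage_model,
        "candidate_tools": PROCESSING_CANDIDATES[mode],
    }


# Hypothetical example: sensor data that must be processed as it arrives.
profile = shortlist(Requirements(needs_low_latency=True,
                                 fits_in_memory=False,
                                 storage_model="time-series"))
```

The sketch only narrows the field; as the following slides point out, there is no single best tool, so the shortlist still has to be weighed against maturity and domain-specific needs.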

Problems of architecture development in big data
• Choosing the best tool
  • Abundance of tools
  • No single best tool
  • Maturity of a tool
• Domain-specific challenges
  • The gap between domain-specific knowledge and data science
• Firm-specific soft challenges

Problems of architecture development in big data
• Firm-specific soft challenges
  • Managerial skills deep-rooted in an organization
  • Lack of data-driven organizational culture
  • Customers may not be able to perceive the value of big data

To sum up
• Newer tools never cease to emerge in this domain
• We can foresee where the industry will focus research efforts
• Organizations should try to build their own big data architecture
  • Rely on open-source tools instead of imposed commercial solutions

To sum up
• Organizations should try to build their own big data architecture
  • It is rewarding
  • Capture domain-specific knowledge
  • The process would build a data-driven culture and develop the right managerial skills
  • Better decision-making

Thank you!
• Q&A

Forthcoming Webinars

Date (14:30 hr BST)   Topic                                              Invited speaker
2018 Jan 15th         Big Data Analytics Architecture for Business       Mert/Kerem/Mohamed
Feb 12th              Digital Business Transformation and Strategy:      Mariam Helmy Ismail Abdelaal
                      What do we know so far?
Mar 12th              Does buyers’ dependence translate into financial   Ornella Benedettini
                      performance? An empirical analysis of
                      manufacturer-service provider relationships