Pentaho 8.0 and Beyond
Matt Howard
Pentaho Sr. Director of Product Management, Hitachi Vantara
Safe Harbor Statement
The forward-looking statements contained in this document represent an outline of our current intended product direction. It is provided for information purposes only and is not a commitment to deliver any new or enhanced product or functionality, or that we will pursue the product direction described. Facts and circumstances may occur which may impact current plans, resulting in changes to the information in this presentation. This information is current only as of the date it is made and should not be relied upon in making purchasing decisions. The development, release (if at all), and timing of any features or functionality described for the Pentaho products remains at the sole discretion of Pentaho.
Pentaho 8.0 and Beyond
1 Product Vision
2 Pentaho 8.0
3 Product Roadmap
Product Vision
The Power of Three
HITACHI DATA SYSTEMS > Content platform > Storage solutions
PENTAHO > Data Integration > Business Analytics
HITACHI INSIGHT GROUP > Lumada IoT
Operational Data | Big Data | Data Stream | Public/Private Clouds
Consumer | Business Analyst | Data Analyst / Data Scientist | Data Engineer
Custom and Self-Service Dashboards
Interactive Query and Analysis
Production Reporting
Pentaho Data Integration: Data Preparation | Integrated Machine Learning
OPEN AND EMBEDDABLE
Operational Data | Big Data | Data Stream | Public/Private Clouds
Consumer | Business Analyst | Data Analyst / Data Scientist | Data Engineer
Custom and Self-Service Dashboards
Interactive Query and Analysis
Production Reporting
Pentaho Data Integration: Data Preparation | Integrated Machine Learning
Pentaho Business Analytics Platform
OPEN AND EMBEDDABLE
Future Vision: A Single Consistent Experience
Data Prep | Data Engineering | Analytics
Ingestion | Processing | Blending | Data Delivery | Data Discovery/Analysis
Analysis & Dashboards
Administration | Security | Lifecycle Management
Data Provenance
Dynamic Data Pipeline | Monitoring | Automation
Pentaho 8.0
Introducing Pentaho 8.0
Challenge #1: Data volumes and velocity are growing exponentially
Challenge #2: Processing and storage resources are constrained
Challenge #3: Shortage of Big Data talent and lack of productivity
Pentaho 8.0 Broadens connectivity to streaming data sources
• Connect to Kafka streams
• Stream processing with Spark
• Big data security with Knox
Pentaho 8.0 Optimizes processing resources
• Enhanced Adaptive Execution (AEL)
• Native Avro and Parquet handling
• Worker nodes for “Scale-out”
Pentaho 8.0 Boosts team productivity across the pipeline
• Data explorer filters
• Improved repository UX
• Extended operations mart
Streaming for Time Sensitive Insight
Enable use cases that require real-time processing, monitoring and aggregation
• Real-time device monitoring
• Log-file aggregation
• Notifications
• And more…
NEW in Pentaho 8.0
✓ Kafka Producer Step
✓ Kafka Consumer Step
✓ Get records from stream Step
✓ Spark streaming via AEL
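The log-file aggregation use case can be illustrated with a minimal sketch in plain Python. It is not Pentaho or Kafka code: the function name `tumbling_window_counts`, the event format and the 60-second window are all assumptions chosen for illustration. It shows one common streaming pattern, counting events per fixed (tumbling) time window.

```python
from collections import Counter, defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count log levels per fixed-size (tumbling) time window.

    events: iterable of (timestamp_seconds, level) tuples, in any order.
    Returns a dict mapping window start time to a Counter of levels.
    """
    windows = defaultdict(Counter)
    for timestamp, level in events:
        # Align each event to the start of the window that contains it.
        window_start = (timestamp // window_seconds) * window_seconds
        windows[window_start][level] += 1
    return dict(windows)


# Hypothetical log events: (seconds since start, log level).
sample_events = [
    (0, "INFO"), (12, "ERROR"), (59, "INFO"),
    (60, "WARN"), (61, "ERROR"), (125, "ERROR"),
]

counts = tumbling_window_counts(sample_events, window_seconds=60)
for start in sorted(counts):
    print(start, dict(counts[start]))
```

In a real pipeline the events would arrive continuously from a stream source rather than from a list, and each window would be emitted once it closes.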
Pentaho 7.1 – Adaptive Execution for Spark
✓ No Coding
✓ Build Once
✓ Execute on Any* Engine
PDI
Pentaho Kettle
*Currently Available Engines
Enhanced Adaptive Execution
Simplified setup
• Eliminated “Zookeeper” component
• Reduced number of setup steps
Hardened deployment
• Fail-over at the edge
• Kerberos impersonation for client
More flexible
• Support multiple run configurations
• Customize cluster settings per job type
[Diagram: PDI Client; AEL-Spark Daemon on Edge Nodes; AEL-Spark Engine (Spark Driver); Spark Executors; Spark/Hadoop Processing Nodes; HADOOP CLUSTER; Hadoop/Spark Compatible Storage Cluster (HDFS, Azure Storage, Amazon S3, Etc…)]
Worker Nodes for Scaling Out
Scale work items across multiple nodes (containers)
• Easily add and remove resources as required
• Monitor and balance changing workloads
• Deploy on premise, cloud and hybrid
Worker Node (a)
Worker Node (b)
Worker Node (c…)
Distribute and Scale
NEW in Pentaho 8.0
✓ Container framework
✓ Orchestration framework
✓ Node monitoring
✓ Enhanced HA implementation
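The idea of distributing work items and balancing load across nodes can be sketched generically. The example below is not the Pentaho orchestration framework, whose scheduling logic the slides do not describe. It is a simple greedy "least-loaded node first" heuristic, and the function `assign_work`, the node names, the job file names and the cost figures are all hypothetical.

```python
import heapq


def assign_work(items, node_names):
    """Greedily assign each (name, cost) work item to the least-loaded node."""
    # Heap of (current_load, node_name); the smallest load is always on top.
    heap = [(0, name) for name in node_names]
    heapq.heapify(heap)
    assignment = {name: [] for name in node_names}
    # Placing the most expensive items first tends to give a more even spread.
    for item, cost in sorted(items, key=lambda pair: pair[1], reverse=True):
        load, node = heapq.heappop(heap)
        assignment[node].append(item)
        heapq.heappush(heap, (load + cost, node))
    return assignment


# Hypothetical work items with estimated costs (arbitrary units).
work_items = [
    ("nightly_load.kjb", 8),
    ("clean_orders.ktr", 5),
    ("blend_sales.ktr", 4),
    ("export.ktr", 3),
]

assignment = assign_work(work_items, ["worker-a", "worker-b"])
for node, jobs in assignment.items():
    print(node, jobs)
```

Adding a node only means passing a longer list of node names, which mirrors the "add and remove resources as required" point in spirit.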
Worker Nodes Architecture
[Diagram: Pentaho Clients; Pentaho Server; Pentaho Repository; WORKER NODES: Orchestration Framework (Scheduler, monitoring, security, etc.), Controller (HA), Master (Working), Master (Standby), Master (Standby); Container Framework: WN1 e.g. KJB, WN2 e.g. KTR, WN…n “Executor”; Powered by…]
Pentaho 7.0 – Data Explorer
Access visualizations during data prep for inspection and prototyping
Data Explorer Filters
Enhanced data inspection in PDI
• Identify data to be cleaned or removed
• Deliver data to the business more quickly
ENHANCED in Pentaho 8.0
✓ Numeric filters
✓ String filters
✓ Include/Exclude data points
Pentaho 8.0 – Complete
Data Integration
• Filters in Data Explorer for enhanced data inspection during prep
• New PDI Repository Dialogs for better usability
• Run Configurations for Jobs for seamless user experience
Big Data
• Stream Data Processing to simplify near real time integration with Kafka
• Enhanced AEL for reliability, performance, and security
• Big Data File Formats to support crucial Hadoop use cases
• Big Data Security with HDP Knox Gateway
• VFS Improvements for named Hadoop clusters
Enterprise Platform
• Worker Nodes Scale-Out to drive superior agility and TCO for enterprises
• Ruby Theme – new platform branding
Additional Items
• Ops Mart for Oracle, MySQL, SQL Server
• Big Data Sandbox VM updates
• Platform password security improvements
• PDI Mavenization for infra alignment
• Documentation improvements on help.pentaho.com
Product Roadmap
PENTAHO FOUNDATIONAL INVESTMENT AREAS
Scale-out Deployment
Metadata Management
Operations Management
Cloud Deployment
Adaptive Execution
Spark Execution
Stream Processing
Machine Learning
Data Exploration
Visual Data Prep
Embedded Analytics
Data Catalog
Enterprise Platform
Big Data Processing
Visual Data Experience
EMERGING TRENDS AND TECHNOLOGY: Advanced Analytics | Real-time
Roadmap Initiatives
Strengthening the Bridge Between Data and Insight
DATA EXPLORER
✓ Visual data inspection
✓ Intuitive data prep
✓ Advanced visualization
CATALOG
✓ Governed access
✓ Searchable metadata
✓ Collaboration
Source 1 | Source 2 | Source 3 | Source 4 | Source 5
Inline Data Prep – Vision
Intuitive, Excel-like transformation design
Field Statistics
Field Type: Integer
Records: 10,000
Cardinality: 273
Min <count>: 1
Max <count>: 23
Bin Size (%): Quintile
Integrated Profiling
Inline Model
Merge Fields
Inline Transformation
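The field statistics shown on this slide can be approximated with a short Python sketch. It is an illustration only, not Pentaho code: the function `field_statistics` and the sample values are hypothetical, and it assumes that "Min <count>" and "Max <count>" mean the lowest and highest number of occurrences of any distinct value, which the slide does not state.

```python
from collections import Counter
from statistics import quantiles


def field_statistics(values):
    """Profile a numeric field: size, distinct values, frequencies, quintiles."""
    counts = Counter(values)
    return {
        "records": len(values),
        "cardinality": len(counts),           # number of distinct values
        "min_count": min(counts.values()),    # rarest value's frequency (assumed meaning)
        "max_count": max(counts.values()),    # most common value's frequency (assumed meaning)
        # n=5 yields the four cut points that split the data into quintiles.
        "quintile_cuts": quantiles(values, n=5),
    }


# Hypothetical integer field.
sample = [1, 1, 2, 3, 3, 3, 4, 5, 8, 13]
stats = field_statistics(sample)
for name, value in stats.items():
    print(name, value)
```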
Pentaho Machine Learning Orchestration
Data Explorer
Notebook Integrations
Native Algorithms
Catalog
Adaptive Execution
Roadmap projects that serve emerging needs of data scientists.
Pentaho Roadmap
Features and dates are subject to change.
Nov 2017 | 1H18 (8.1) | Future
VISUAL DATA EXPERIENCE
• Data Explorer Filters • Catalog I • Visual Profiling
• Catalog Search • Data Prep from DET • Layout Manager
• New User Console • Data Science Viz • Real-time Viz
(BIG) DATA PROCESSING
• Kafka Interface • Spark Streaming • Parquet and Avro • Enhanced AEL
• Streaming II • Enhanced JSON/XML/ORC • AEL – extend distros
• Advanced Profiling • Rules Validator • Native ML algorithms • AEL – Flink
• Thin Kettle (Composer) • Web Designer • Data Operations Mgr. • AEL – Next
ENTERPRISE PLATFORM
• Scale-out Framework • Foundry Integration
• Unified Monitoring • Harden Metadata Bridges • Vantara Integrations
• Enhanced Upgrade • Enhanced Security • New Content Lifecycle • Vantara Integrations
• Metadata Manager • Business Glossary • Multi-tenancy • Vantara Integrations
ECOSYSTEM
• AEL HDP, MapR • Google Cloud Platform • Cassandra/NoSQL Update
• Multi-cloud Orchestration • Cloud App Connectors
• Mainframe • Enhanced SAP and SFDC
Hitachi Vantara Portfolio
Software Platform
Edge Processing
Asset Management
• Asset registry
• Data catalog
• Metadata management
• Modeling and lineage
• Governance
Data Integration
• Data connectors
• Transformation engines
• Profiling and quality
• Data blending
• Data preparation
Analytics
• Business analytics
• Content analytics
• Artificial intelligence
• Batch and stream
Foundry Service Platform: Workflow | Scheduling | Security | Clustering | Monitoring | Repository | Search
Application Framework
Application Studio: Dashboards | Visualization | Notifications | App Development
Storage: Converged Infrastructure | Automated Management | Data Protection | Flash Storage
IoT Solutions – from Edge to Outcomes
[Diagram: Lumada IoT Data Pipeline. Sensors, Things, People; Edge / Fog Layer (Telemetry, Filtering, Asset Registry, Stream Queues); Core; IoT Analytic Processor (Ingest, Process, Visualize, Model, Predict, Notify); Insights; Outcomes]
SMART CITY
SMART BUSINESS
SMART DATA CENTER
SMART INDUSTRY
Unlock the Business Value in YOUR Data
YOUR DATA: Video, Image and Audio | Email and Documents | Transactional Data | IT, Sensor and Machine Logs | Social Media
Hitachi Content Platform
YOUR STRATEGY: Need for Better Insights To Achieve Better Outcomes
YOUR INSIGHTS: Big Data Analytics | Content Exploration | Pentaho | Hitachi Content Intelligence
Summary
What we covered today:
• Product Vision
• Pentaho 8.0 Release
• Product Roadmap
Next Steps
Want to learn more about Pentaho 8.0 and the product roadmap?
• Other recommended breakout sessions:
– Processing Big Data with Pentaho: Rakesh Saha
– Operating Pentaho at Scale: Jens Bleul
• Solution Expo
– Pentaho 8.0 and Beyond
– Lumada IoT Platform
– Hitachi Content Platform
– Spark Processing
– And more…
Pentaho 8.1 – Preview
Some Candidate Projects
• Enhanced Streaming
• Enhanced Profiling
• Google Cloud Platform
• Unified Monitoring and Logging
• Enhanced Metadata Handling
Pentaho 8.1 Expected Availability
Q2 2018