Welcome
Data Analytics at Logitech
Snowflake + Tableau = #Winning
#TC18
Avinash Deshpande
Chief Software Architect
I am a futurist, scientist, engineer, designer, and data evangelist at heart …
Find me at www.linkedin.com/in/avinashpd1
Logitech Data Use Cases
[Chart: use cases plotted along two axes: data structure (Structured / Semi-Structured / Unstructured) and data velocity (Batch Data to Real-Time)]
Social Media
Sentiment
Predictive Analytics
Demand Forecasting
Price violations on retail sites
Data Warehousing Text Mining
Security Video
Analysis
Retail data scraping
Machine Learning
IoT
Multi-site ERP
Marketing Funnel
Sales Channel Mgmt
Smart Home
Natural Language Processing (NLP)
VR Gaming
Device Events
Analytics at Scale: Supporting Our Growing Business
Real-Time on Demand Delivery to Your Phone, Desktop, and Dashboard
Executive summaries
Customer by product
Product by customer
Demand/Supply updates
Market analytics/Market share
Marketing reports
Competitive analysis
Sentiment
Consumer persona generation
Granular consumer segmentation
Marketing spend optimization
Consumer value management
Consumer lifetime value analysis
Context-based marketing
Cloud Empowers IT Organizations to Redefine the Way Data Services are Produced and Delivered
Elastic infrastructure: simple, secure, and robust (Scalable)
Pay as you use (Efficient)
Managed services (Reliable)
Transparency on usage patterns (Governed)
Breadth of services
Need for Data Virtualization
Abstract access to disparate data sources
A single semantic repository
Optimized data availability in real-time to consumers
Centralized, governed and secured data layer
Improve the User Experience
User Pain: Report is always slower when I want to use it (peak business hours)
Snowflake is able to flex-up compute power in seconds.
Business users can have their own isolated, right-sized compute instance, so performance is always consistent for the work they do and is not impacted by what others are doing.
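A minimal sketch of the per-team isolation described above: each consumer group gets its own warehouse that suspends when idle and can flex up in seconds. Warehouse names, sizes, and suspend timeouts here are illustrative assumptions, not Logitech's actual configuration; the snippet only builds the DDL strings rather than executing them against a live account.

```python
# Hypothetical sketch: isolated, right-sized Snowflake warehouses per team,
# so one team's workload never contends with another's.

def warehouse_ddl(name: str, size: str = "XSMALL",
                  auto_suspend_secs: int = 300) -> str:
    """Build DDL for an isolated warehouse that suspends when idle
    and resumes automatically when a query arrives."""
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {name} "
        f"WAREHOUSE_SIZE = '{size}' "
        f"AUTO_SUSPEND = {auto_suspend_secs} "
        f"AUTO_RESUME = TRUE"
    )

def resize_ddl(name: str, size: str) -> str:
    """Flex compute up (or down) without moving any data."""
    return f"ALTER WAREHOUSE {name} SET WAREHOUSE_SIZE = '{size}'"

# One warehouse per consumer group keeps performance consistent:
for team, size in [("FINANCE_WH", "SMALL"), ("MARKETING_WH", "MEDIUM")]:
    print(warehouse_ddl(team, size))
print(resize_ddl("FINANCE_WH", "LARGE"))  # flex up for peak business hours
```

Because storage is decoupled from compute, resizing or adding a warehouse does not require redistributing data, which is why the flex-up takes seconds rather than hours.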
Improve the User Experience
User Pain: I want access to more historical data than I have today
Snowflake’s low cost, fast, infinitely scalable storage layer removes the limitations of adding and keeping more historical data than typical data warehouse solutions allow.
Improve the User Experience
User Pain: Commonly used reports always seem to be slow
Snowflake has the unique ability to globally cache commonly used queries that are sent via Tableau. This means that commonly used workbooks are almost always cached and end users experience extremely fast performance regardless of how many people are running the same workbook.
Improve the User Experience
User Pain: I want to explore non-traditional data sets that aren’t currently available
Unlike other traditional DW solutions, Snowflake treats non-traditional data types like JSON/AVRO/XML as first class citizens (direct SQL access and fast performance). This allows the data to be immediately available without complex ETL.
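To make the "first class citizens" point concrete, here is a hedged sketch of Snowflake's schema-on-read JSON access: raw documents land in a single VARIANT column and are queried directly with SQL path syntax, with no ETL first. The table and field names are hypothetical; the snippet builds the query text rather than running it.

```python
# Hypothetical sketch: querying raw JSON in Snowflake with SQL.
# "raw" is a VARIANT column; v:path notation reaches into the document,
# and LATERAL FLATTEN un-nests arrays into rows.

def json_query(table: str, variant_col: str = "raw") -> str:
    return (
        f"SELECT {variant_col}:device.id::STRING AS device_id,\n"
        f"       e.value:type::STRING          AS event_type\n"
        f"FROM {table},\n"
        f"     LATERAL FLATTEN(input => {variant_col}:events) e"
    )

print(json_query("IOT_EVENTS_RAW"))
```

The same query can join the flattened JSON against relational tables, which is what lets semi-structured data be "immediately available" to Tableau users.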
Improve the User Experience
User Pain: I’m tired of waiting for new data to be loaded into the system.
Snowflake’s unique architecture allows customers to implement new data ingestion processes such as 24/7 loading. This lets end users “see” their data in near real-time versus the traditional nightly batch. Use a Tableau live connection rather than an extract.
eDW Solution Architecture
[Diagram: Data Producer (eBS on Exadata) → Business Layer (AWS) → Reporting / Advanced Analytics Layer (Reports) → Data Consumer]
IoT Solution Architecture
[Diagram: Edge Compute → Kafka → Snowflake → Business Layer → Reporting / Advanced Analytics Layer → Data Consumer]
Edge Compute options:
➢ Use Snowpipe to enable real-time ingestion
➢ Keep raw data in semi-structured JSON format
➢ Create structured objects with cleaned and/or aggregated data
Business Layer:
➢ Denodo views
➢ Create business-specific views for reporting
Reporting / Advanced Analytics Layer:
➢ Reports
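The Snowpipe bullet above can be sketched as a pipe definition that auto-ingests JSON files as they land in an external stage. The pipe, table, and stage names are assumptions for illustration, not the actual pipeline objects; the snippet only assembles the DDL text.

```python
# Hypothetical sketch of the Snowpipe setup behind "real-time ingestion":
# an AUTO_INGEST pipe copies new JSON files from a stage into a raw table
# as soon as the cloud storage event notification fires.

def snowpipe_ddl(pipe: str, table: str, stage: str) -> str:
    return (
        f"CREATE PIPE IF NOT EXISTS {pipe} AUTO_INGEST = TRUE AS\n"
        f"COPY INTO {table} (raw)\n"
        f"FROM @{stage}\n"
        f"FILE_FORMAT = (TYPE = 'JSON')"
    )

print(snowpipe_ddl("IOT_PIPE", "IOT_EVENTS_RAW", "IOT_STAGE"))
```

Keeping the landing table as a single raw JSON column matches the "keep raw data in semi-structured JSON format" step; the cleaned and aggregated structured objects are then derived downstream.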
SNOWFLAKE BENCHMARK
Other Popular Columnar DB
Architecture/Storage: Traditional shared-nothing architecture. Data lives on EC2 nodes, requiring costly 24/7 uptime, even when not in use.
Data Types: Requires use of additional tools (Hadoop, Mongo, etc.) to ingest and make semi-structured data available.
Scalability: Extended process to resize compute resources to accommodate additional demand.
Concurrency: Published limits of 50 concurrent users/queries, but generally slows down around 15.
Administration/Design: Need to continually manage: vacuuming, distribution/sort keys, compression, metadata, indexing, backups, etc. Need to understand data model in advance.
Snowflake
Architecture/Storage: Multi-cluster shared data architecture. Data stored in S3, allowing multiple EC2 compute clusters to access simultaneously without contention.
Data Types: Ability to ingest and query raw JSON, XML, Avro, Parquet without prior transformation.
Scalability: Data not coupled to compute, allowing the ability to resize instantly and shut down when not in use.
Concurrency: Ability to isolate users on separate compute resources to avoid contention. Auto-scale feature scales compute resources horizontally to support concurrent workloads.
Administration/Design: ZERO; free up your DBA team for other tasks. Load data in real time without need for model.
ATHENA
Difficult to set up and tune performance
Does not provide any options for the end user to influence performance
Difficult to manage usage: resource usage over time, queries, and data retrieved
Cost associated with increasing capacity and support
Need to add partitions
By default, concurrency limits allow you to submit twenty concurrent DDL queries and twenty concurrent SELECT queries at a time, and query timeout is 30 minutes
Schema needed ahead of time
For performance, data needs to be converted to columnar format
SNOWFLAKE
Performance out of the box; advanced tuning with auto-clustering
Allows you to reserve various compute configurations as needed
Usage can be segregated at the compute level
Horizontal and vertical scaling without downtime
Cost is consistent
No need to add partitions
Default concurrency is 300 (15x) and can be raised if necessary
Schema on read
Columnar format by default
Spark on Snowflake
• It's easier to manage data in tables than in files on S3.
• If you ever need to dedupe, update, or delete data, you can do that with standard SQL in Snowflake, but you need to write a program to do it on S3.
• To get good performance, you have to optimize file formats, partition sizes, etc. when working on files in S3.
• If you want to join the data with any other data in Snowflake, you can do it easily.
• It's easier to manage security in a database using RBAC than on files in S3 using policy documents.
• Performance will be better running on top of Snowflake with the custom Spark connector's pushdown capability. That feature pushes part or all of the Spark plan into Snowflake, including filters, projections, joins, and aggregates. This minimizes the amount of data the Spark cluster needs to pull into memory and the work it has to do to process that data.
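The dedupe point above can be illustrated with a single SQL statement of the kind Snowflake supports directly, where the same task against raw S3 files would need a custom Spark or ETL job. Table and column names are hypothetical; the snippet builds the statement text only.

```python
# Sketch of "dedupe with standard SQL": delete every row that has a newer
# duplicate with the same key, keeping only the latest version. On raw S3
# files this rewrite would require a bespoke program.

def dedupe_sql(table: str, key: str, order_col: str) -> str:
    return (
        f"DELETE FROM {table} t\n"
        f"USING (\n"
        f"  SELECT {key}, MAX({order_col}) AS latest\n"
        f"  FROM {table} GROUP BY {key}\n"
        f") d\n"
        f"WHERE t.{key} = d.{key} AND t.{order_col} < d.latest"
    )

print(dedupe_sql("IOT_EVENTS", "event_id", "loaded_at"))
```

The same pattern extends to updates and deletes driven by business rules, which is the operational gap between a managed table and a pile of immutable files.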
Unique Snowflake Features
JSON: ingest raw JSON without transformation. Query JSON with SQL and correlate against relational data.
Cloning: instant dev/test environments or point-in-time snapshots.
Time Travel: query data as of any point in time within the past 90 days.
Query Caching: instant results for executive dashboards and commonly run reports.
Backups: automatic cross-data-center replication.
Data Sharing: publish or consume data sets to or from external clients without direct system access.
Auto-Scaling: dynamic horizontal scaling for concurrency to deliver consistent SLAs.
Central Data Store: get everyone on one platform.
Upgrades: weekly system updates with zero downtime.
Security: encryption by default.
Charge Back: monitor business usage to understand how much each user costs you.
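Two of the features listed above, Time Travel and zero-copy Cloning, can be sketched as the SQL they boil down to. Object names here are hypothetical, and the snippet builds query strings rather than executing them.

```python
# Illustrative sketch: Time Travel queries a table as of a past moment;
# Cloning creates an instant dev/test copy that shares the source's
# storage until either side changes data.

def time_travel_sql(table: str, minutes_ago: int) -> str:
    # AT(OFFSET => ...) takes seconds relative to now; negative = past.
    return (f"SELECT * FROM {table} "
            f"AT(OFFSET => -{minutes_ago * 60})")

def clone_sql(source_db: str, target_db: str) -> str:
    return f"CREATE DATABASE {target_db} CLONE {source_db}"

print(time_travel_sql("SALES", 90))     # SALES as of 90 minutes ago
print(clone_sql("PROD_DW", "DEV_DW"))   # instant dev environment
```

Because a clone is metadata-only at creation, spinning up a full dev/test environment costs neither time nor duplicated storage, which is what makes the "instant" claim possible.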
Big Data Fabric
AWS S3, Snowflake, Facebook,
Zendesk, PayPal, ShipStation,
Google Analytics, Adobe Analytics, Amazon Marketing,
NLP, Shopify
Data Virtualization
Humanizing Data Insights
Although big data and analytics have made data more accessible to business users, extracting insights still requires human effort. Our automation enables a business user (e.g., a sales rep) to post a question in conversational language (e.g., “What are the Q3 sales trends for Product A in North America?”) to a chatbot and receive an answer with data insights that are completely humanized (e.g., “The total Q3 sales for Product A in North America totaled $200.4M, a 15% increase from Q3 last year, but only a 5% increase from last quarter.”).
[Screenshot: enter a question, click Send, and wait about 15 seconds for the result; callouts mark the Send button, the question asked, and result statistics]
ANUVAAD
Provides quick answers to your supply chain queries asked in English
Insights
Operations
Retail Pricing
POS
Sentiment Analysis
Video Analysis
Text Analysis
IoT
Please complete the
session survey from the
Session Details screen
in your TC18 app
Thank you!
#TC18