
InsideBIGDATA Guide to Retail

by Daniel D. Gutierrez

BROUGHT TO YOU BY

Page 2: InsideBIGDATA Guide to Retail - EM360...Guide to Retail 2 Few industries have greater access to data around consumers, products, and channels than the retail industry. Data coupled

www.insidebigdata.com | 508-259-8570 | [email protected]

Guide to Retail

Big Data for Retail – An Overview

Few industries have greater access to data around consumers, products, and channels than the retail industry. Data coupled with insights are at the heart of what drives this business. It’s a logical consequence, then, that retail adopted big data and technologies like Hadoop earlier than many other industries. Retail started with diverse transactional data but is now much more sophisticated in the way technology is being applied toward gaining competitive advantage. Big data refers to huge data sets characterized by larger volumes (by orders of magnitude) and greater variety and complexity, generated at a higher velocity than companies have faced before. The use of big data is all about deriving real meaning from the increasing amount of data everywhere: providing richer insights into business patterns and trends, driving operational efficiencies, and improving advantage in a competitive marketplace.

So what is big data in terms of its relevance to the retail industry? In the simplest terms, big data offers a means to understand shoppers via a myriad of digital touch points: from their online purchases, to their presence on social networks, to their visits to brick and mortar stores (using the shopper’s cell phone Wi-Fi to track physical movements). But while big data might hold the key to newfound consumer insights, retailers are grappling with how to meaningfully process, and ultimately monetize, huge amounts of unstructured data in the form of consumer tweets, images, video, and more. In addition, machine-generated structured data adds yet another dimension, e.g. sensor data such as RFID tag and GPS data, web log data and point-of-sale data.

Retailers that understand how to analyze the multitude of digital bread crumbs left behind by unknowing browsers and buyers are finding themselves at a major advantage over competitors that rely solely on intuition. Big data enables companies to create comprehensive customer profiles and precise product recommendations. Today, customers are finding the products they want quicker than ever before, and in many cases the items are finding them, thanks to targeted ads.

Contents
Big Data for Retail – An Overview
How Big Data is Being Used in Retail
  Customer Relationship
  Marketing and Promotional
  Operational Examples
360-Degree View of the Customer
Retail Gains with Distributed Systems
  Apache Hadoop
  Apache Spark
Big Data Solutions for Retail
  Hadoop Solutions
  ETL Offload Reference Architecture
The Future of Big Data and Retail
Case Studies: Dell Focused Customer Use Cases
  Large Retailer
  Staples
  MetaScale
Summary


By placing buying habits under the microscope, companies are perfecting the science of impulse buying. Amazon and Alibaba are renowned for their ability to suggest products based on who you are, what you look at and what you’ve bought. It was reported that in the first year the big data technology stack was deployed on Walmart.com, the number of shoppers completing a purchase after searching for a product increased by 20% because they were able to quickly find what they were looking for (Bloomberg Business, April 2014). Physical retailers are now using big data, storing massive amounts of information on servers and using software to search for trends, to drive more people into their stores.

Whether the targeted ads are online or off, the challenge for retailers is to avoid seeming invasive. We’ve all heard the anecdote of the retailer who knew a female shopper was pregnant before her family did, based on her purchase patterns. Seemingly invasive or not, stockpiling information isn’t effective if you don’t have enough of it or the means to properly sift through it.

One important aspect of big data in retail is that the more data you collect and act on, the greater the benefit. Because retailers are on the front end of the customer relationship, it’s important for them to understand everything about a customer, engage them in every way possible, and continuously build on that relationship. This is how big data benefits the retailer: customer intelligence that extends customer lifetime value (LTV).
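The link between deeper engagement and customer lifetime value can be made concrete. A common simplified estimate multiplies average order value, purchase frequency, expected relationship length and gross margin. The sketch below uses that simplification, which is an assumption and not a formula from this guide, with hypothetical figures. It deliberately omits the churn probabilities and discounting that fuller LTV models include.

```python
def simple_ltv(avg_order_value: float,
               purchases_per_year: float,
               expected_years: float,
               gross_margin: float = 1.0) -> float:
    """Estimate customer lifetime value with a simplified, undiscounted formula.

    Assumption: LTV = average order value x purchases per year
    x expected years as a customer x gross margin.
    """
    if min(avg_order_value, purchases_per_year, expected_years) < 0:
        raise ValueError("inputs must be non-negative")
    if not 0.0 <= gross_margin <= 1.0:
        raise ValueError("gross_margin must be between 0 and 1")
    return avg_order_value * purchases_per_year * expected_years * gross_margin


# Hypothetical shopper: $50 orders, 6 orders a year, 4-year relationship, 30% margin.
baseline = simple_ltv(50.0, 6, 4, gross_margin=0.30)
# If better engagement lifts frequency to 8 orders a year, LTV rises proportionally.
engaged = simple_ltv(50.0, 8, 4, gross_margin=0.30)
print(round(baseline, 2), round(engaged, 2))
```

Because the formula is a plain product, any lift in purchase frequency or retention carries straight through to LTV. That proportionality is the intuition behind acting on more customer data.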

Retail generates a flood of complex structured and unstructured data. The sources of this data are numerous, but a short list includes the following:

• Point-of-sale (POS) devices
• Websites
• Mobile commerce solutions
• Social media sites like Twitter and Facebook
• Customer loyalty programs
• Video surveillance systems with video analytics that record store traffic patterns, employee-customer interactions, and customer-merchandise interactions (such as the dwell time around an end cap)
• UPC and RFID readers
• Employee devices, including PCs, smartphones and other handheld devices
• Sensors: Near Field Communication (NFC) and Bluetooth Low-Energy (BLE) on customer smartphones, real-time location systems (RTLS), Wi-Fi and GPS; electronic asset protection at the point of sale (POS) and point of exit, bar codes and application-specific sensors

Connecting these individual pieces of data is what the future of retail looks like. In fact, online companies like Amazon are already heavily invested in finding the connecting points—the company’s product recommendations are a result of big data analysis, and Walmart and other large retailers are busy putting in place similar big data tools. But big data depends on the collection and transfer of data, including initial analysis or analytics, and then the ability to deploy solutions based on big data business intelligence.

This guide is directed toward line of business leaders in conjunction with enterprise technologists with a focus on the above opportunities for retailers and how Dell can help them get started. The guide also will serve as a resource for retailers that are farther along the big data path and have more advanced technology requirements.

How Big Data is Being Used in Retail

Today, the customer expects a retailer to know their full history: purchases, preferences and promotions, regardless of channel or device. Harnessing data at a very large scale, i.e. every interaction rather than each transaction, enables the creation of a robust, data-driven customer profile that reveals their preferences and aspirations. Big data enables all of these customer touch points to be assembled into a consolidated profile, a single, comprehensive view of the customer, which may then be used to identify sales opportunities that historically have been managed through sales associate relationships. Many high-end retailers maintain strong personal relationships with customers, but that model does not scale.

The appeal of big data in the retail industry is clear. Organizations like the National Retail Federation (NRF), the industry’s largest advocacy organization, embraced big data early on, with conference keynote addresses, breakout sessions and roundtable discussions all taking a data-centric view of the future of the industry.

Here are a few common use case examples related to the retail industry where the benefits of big data technology can be realized:

Customer Relationship
• Enablement of a 360-degree customer view
• Recommendation systems to predict customer preferences
• Point-of-sale transaction analysis in order to target retail promotions designed to make customers buy
• Personalization: evolving the purchase funnel, including personalized pricing and offers

Marketing and Promotional
• Improved A/B testing
• KPIs such as loyalty metrics, reach, engagement, sentiment, customer service
• Event correlation to store traffic
• Twitter “BUY” button, extending e-commerce to social media for a new genre of retailers
• Ad targeting to determine how to increase the efficiency of your ad campaigns
• Comparison shopping services such as Connexity (previously Shopzilla), a marketing platform offering search, display and insight solutions based on its unique retail data and advanced analysis

Operational Examples
• Competitive pricing, instant price adjustments
• Risk modeling to better understand your customers and markets
• Merchandising and supply chain optimization
• Threat analysis to detect threats and fraudulent activity
• Additional use cases: distribution control, rate of sale, distribution of inventory, predictive analytics using historical data
• Churn detection to identify customers who are most likely to defect to a competitor
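Recommendation systems appear above as a customer relationship use case, and they come up again in the case studies. The Python below is a minimal sketch of one simple approach, item-to-item co-occurrence ("customers who bought X also bought Y"), which counts how often products share a basket. It is not the method used by Amazon, Walmart or Dell. The basket data and function names are hypothetical, and production recommenders add normalization, recency weighting and distributed computation.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(baskets):
    """Count how often each pair of products appears in the same basket."""
    co_counts = defaultdict(Counter)
    for basket in baskets:
        items = sorted(set(basket))  # ignore repeated items within one basket
        for a, b in combinations(items, 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return co_counts


def recommend(co_counts, product, k=3):
    """Return up to k products most often bought with `product` (ties broken by name)."""
    ranked = sorted(co_counts.get(product, Counter()).items(),
                    key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


# Hypothetical transaction baskets.
baskets = [
    ["tent", "sleeping_bag", "lantern"],
    ["tent", "sleeping_bag"],
    ["tent", "stove"],
    ["lantern", "batteries"],
    ["sleeping_bag", "pillow"],
]
co = build_cooccurrence(baskets)
print(recommend(co, "tent"))
```

At retail scale, the pair counting is the expensive part. It parallelizes naturally, because each basket can be processed independently and the partial counts summed afterward.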

Figure: Hadoop-based solution architecture designed for providing churn analytics

Figure: Key components of the Dell Big Data Marketing Churn Analytics Solution


The 360-degree customer view demonstrates that retailers can get a complete view of customers by aggregating data from the various touch points at which consumers interact with them. One way to think of this customer view is as the “what,” “how” and “why” that make up this perspective. To gain a 360-degree view, the retailer now needs to sift through and analyze mountains of structured, unstructured, and semi-structured data from on-premises and off-premises sources (the “what” part of the view) to understand customer behavior and patterns. Much of that unstructured data helps retailers get to know their customers: not only their name, address, zip code and age, but also their buying habits, sentiments toward companies, search history, product preferences, emotional responses and more, based on social media data, sensor data, etc. Additional insights can be gleaned from this unstructured data when you begin to recognize and track the importance of the nuances and the details.
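Aggregating touch points into a single view can be sketched as folding channel events into one record per customer. The Python below is a minimal illustration that assumes events from POS, web and social channels already share a customer identifier. In practice, matching identities across channels is a hard problem in its own right. The event fields and the `CustomerProfile` class are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerProfile:
    customer_id: str
    channels: set = field(default_factory=set)      # every channel the customer used
    purchases: list = field(default_factory=list)   # SKUs bought, across channels
    total_spend: float = 0.0
    mentions: list = field(default_factory=list)    # social posts, kept as raw text


def build_profiles(events):
    """Fold touch-point events from many channels into one profile per customer."""
    profiles = {}
    for event in events:
        cid = event["customer_id"]
        profile = profiles.setdefault(cid, CustomerProfile(cid))
        profile.channels.add(event["channel"])
        if event["type"] == "purchase":
            profile.purchases.append(event["sku"])
            profile.total_spend += event["amount"]
        elif event["type"] == "social_post":
            profile.mentions.append(event["text"])
    return profiles


# Hypothetical events from in-store, online and social touch points.
events = [
    {"customer_id": "c1", "channel": "pos", "type": "purchase", "sku": "tent", "amount": 20.0},
    {"customer_id": "c1", "channel": "web", "type": "purchase", "sku": "lantern", "amount": 35.5},
    {"customer_id": "c1", "channel": "twitter", "type": "social_post", "text": "Great camping gear!"},
    {"customer_id": "c2", "channel": "web", "type": "page_view"},
]
profiles = build_profiles(events)
print(sorted(profiles["c1"].channels), profiles["c1"].total_spend)
```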

As a quick aside, semi-structured data is a form of structured data that does not conform to the formal structure of data models associated with relational databases or other forms of data tables, but nonetheless contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data. Semi-structured data can be thought of as a hybrid form: it can be transformed into relationally structured data, but it can equally be loaded directly into Hadoop HDFS, where it can be processed in raw form.
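JSON is a familiar example of semi-structured data. Its keys act as the tags that separate semantic elements, and nesting expresses the hierarchy of records and fields. The sketch below flattens one hypothetical order document into relational rows, one per line item. The field names are assumptions for illustration.

```python
import json

# A hypothetical nested order record, as it might arrive from an e-commerce system.
raw = '''{"order_id": "A100",
          "customer": {"id": "c1", "zip": "02134"},
          "items": [{"sku": "tent", "qty": 1, "price": 199.0},
                    {"sku": "lantern", "qty": 2, "price": 15.5}]}'''


def flatten_order(doc):
    """Turn one nested order document into flat rows, one per line item."""
    rows = []
    for item in doc.get("items", []):
        rows.append({
            "order_id": doc["order_id"],
            "customer_id": doc["customer"]["id"],
            "zip": doc["customer"].get("zip"),   # optional field: None if absent
            "sku": item["sku"],
            "qty": item["qty"],
            "line_total": item["qty"] * item["price"],
        })
    return rows


rows = flatten_order(json.loads(raw))
for row in rows:
    print(row)
```

The same document could also be stored as-is in HDFS and parsed only when a job needs it. That is the "process in raw form" option the paragraph above describes.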

For the “how” and “why” of the equation, retailers are uniquely positioned to clearly define the use cases that enable them to take advantage of data—click stream data, internal structured customer databases that can be used for modeling, combined with like datasets that are available to the public and that can enable data-driven decisions for staying relevant in front of their customers. Retailers can use their 360° view of the customer to help drive bottom-line results, e.g. build opportunities for cross-sell/up-sell, deliver point-of-sale coupons so that customers don’t abandon carts, focus on the importance of personalized communications and more.

Some retail devices collect structured data that retailers may or may not be using. Sales and inventory data are naturally always tracked, but it can be surprising how many retailers don’t use the data they collect from loyalty programs for a lack of a way to correlate it to anything. Fortunately, there have been some exemplary success stories like the case where the use of loyalty cards potentially saved lives—a product recall triggered phone call and e-mail alerts directed to the purchasers of the item.
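The loyalty-card recall story above is essentially a join between a recall notice and loyalty purchase history. The following is a minimal sketch that assumes hypothetical record layouts for purchases and member contact details.

```python
def customers_to_notify(recalled_sku, loyalty_purchases, contacts):
    """Return contact records for loyalty members who bought the recalled product."""
    buyers = {p["member_id"] for p in loyalty_purchases if p["sku"] == recalled_sku}
    return [
        {"member_id": member_id, **contacts[member_id]}
        for member_id in sorted(buyers)
        if member_id in contacts  # members with no contact details on file are skipped
    ]


# Hypothetical loyalty purchase history and contact records.
loyalty_purchases = [
    {"member_id": "m1", "sku": "sku-1234"},
    {"member_id": "m2", "sku": "sku-9999"},
    {"member_id": "m3", "sku": "sku-1234"},
    {"member_id": "m1", "sku": "sku-1234"},  # repeat purchase: member is notified once
]
contacts = {
    "m1": {"email": "m1@example.com", "phone": "555-0101"},
    "m3": {"email": "m3@example.com", "phone": "555-0103"},
}
alerts = customers_to_notify("sku-1234", loyalty_purchases, contacts)
for alert in alerts:
    print(alert)
```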

Big data also includes unstructured data that is variable in nature and comes in many formats, including text, document, image, video, and more. This unstructured data is growing faster than structured data. According to a 2011 IDC study, it will account for 90 percent of all data created in the next decade. As a new, relatively untapped source of insight, unstructured data analytics can reveal important interrelationships that were previously difficult or impossible to determine. In retail this could be a chance to see why a sale didn’t occur—whether it was product selection, pricing, store display, or ineffective promotional material. It could also point to new ways to attract and keep customers, as well as move product.

Retailers have been watching their transactional data for many years; the next step is incorporating unstructured data, such as social media data, along with sentiment analysis. This has been a natural progression for retail. Retail companies have been collecting data in a structured manner (transactional data), but now they’re building systems that use disparate data sources, including other types of structured data such as sensor data. This is where sentiment analysis comes into play, a benefit afforded by unstructured data that doesn’t fit into a traditional relational database. As one example, retailers are employing conversion rate optimization (CRO) technology that uses social intelligence, tracking conversions at scale via social media (unstructured data sets).
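In its simplest form, the sentiment analysis described above scores text against word lists. The sketch below is a minimal lexicon-based scorer built on a tiny hypothetical lexicon. Production systems use far larger lexicons or trained NLP models, and the sample posts are invented.

```python
import re

# Tiny illustrative lexicon (hypothetical); real systems use much larger ones or trained models.
POSITIVE = {"love", "great", "fast", "perfect", "happy"}
NEGATIVE = {"hate", "broken", "slow", "late", "refund"}


def sentiment_score(text):
    """Return (#positive - #negative) / #matched words, in [-1, 1]; 0.0 if nothing matches."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    matched = pos + neg
    return 0.0 if matched == 0 else (pos - neg) / matched


posts = [
    "Love the new store layout, checkout was fast!",
    "Order arrived late and the box was broken. Want a refund.",
    "Picked up milk today.",
]
for post in posts:
    print(f"{sentiment_score(post):+.2f}  {post}")
```

Scores like these become far more useful once they are joined to the structured data discussed above, for example averaged per store, product or campaign.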

360-Degree View of the Customer


To compete in the age of the Internet storefront, large retailers are working to extend their early lead in adopting big data technology by utilizing scalable data management systems that integrate online and offline data so they can better understand their customers and improve the efficiency of their operations. In particular, retailers need to connect and process data in many formats from disparate systems and sources, including the social media sites that consumers interact with. Many retailers are exploring the opportunities surrounding the dominant distributed processing architectures, Apache Hadoop and Apache Spark, and are looking for ways to get started with them. Even though Hadoop has been on the market since 2011, it is still a relatively new technology. Spark, on the other hand, can be considered the future of distributed processing architectures. Its adoption is needs-based: when a retailer has a need that Spark addresses, it can come up to speed very quickly to take advantage of the technology.

Apache Hadoop

The Hadoop data storage and processing system offers compelling benefits for retail organizations that want to extract value out of huge amounts of structured, unstructured and semi-structured data. With Hadoop, you can use and store any kind of data, from any source, in native format, and perform a wide variety of analyses and transformations on that data.

The seeds of Hadoop began with Doug Cutting and Mike Cafarella in 2002 with their Nutch project. In 2006, Cutting went to work at Yahoo and morphed the storage and processing parts of Nutch into Hadoop (named after Cutting’s son’s stuffed elephant) to capture and analyze the massive amounts of data generated by the company. Today, retail organizations can leverage the experience of these digital leaders by deploying the same platform in the retail environment. Even better, Hadoop allows you to start small and scale your solution to terabytes of data, or even petabytes, inexpensively.

Big data technologies such as Hadoop (an open-source programming framework that supports the processing of large data sets in a distributed computing environment) are ideally suited to collecting and analyzing the unstructured data types common in retail, including web logs that show the movements of every customer through an online storefront. This data can then be combined with existing business intelligence and sales data to provide new insights.
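Counting page views per URL from web logs is the classic MapReduce pattern. The single-process sketch below mimics the map, shuffle and reduce phases in plain Python. In Hadoop, the map and reduce tasks run in parallel across the cluster and the framework performs the shuffle. The log format and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical web log lines: timestamp, customer id, HTTP method, path.
log_lines = [
    "2015-06-01T10:00:01 c1 GET /product/tent",
    "2015-06-01T10:00:05 c2 GET /product/lantern",
    "2015-06-01T10:00:09 c1 GET /cart",
    "2015-06-01T10:00:12 c3 GET /product/tent",
]


def map_phase(line):
    """Emit a (page, 1) pair for each log line."""
    _timestamp, _customer, _method, path = line.split()
    yield path, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts for one page."""
    return key, sum(values)


mapped = (pair for line in log_lines for pair in map_phase(line))
page_views = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(page_views)
```

Because each map call sees only one line and each reduce call sees only one key, both phases can be spread across as many nodes as the data requires.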

Hadoop allows your organization to store petabytes, and even exabytes, of data cost-effectively. As the amount of data in a cluster grows, you can add new servers with local storage incrementally and inexpensively. Thanks to the way Hadoop takes advantage of the parallel processing power of the servers in the cluster, a 100-node Hadoop instance can answer questions on 100 terabytes of data just as quickly as a 10-node instance can answer questions on 10 terabytes.

Both robust and reliable, Hadoop handles hardware and system failures automatically, without losing data or interrupting data analyses. Better still, Hadoop runs on clusters of industry-standard servers. Each has local CPU and storage resources, and each has the flexibility to be configured with the proper balance of CPU, memory, and drive capacity to meet your specific performance needs.
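HDFS achieves the fault tolerance described above chiefly by storing each block on several DataNodes (the default replication factor is 3) and re-replicating blocks when a node fails. The toy simulation below illustrates that idea in plain Python. Node names and block counts are hypothetical. Real HDFS placement is rack-aware and coordinated by the NameNode rather than random.

```python
import random


def place_blocks(num_blocks, nodes, replication=3, seed=0):
    """Assign each block to `replication` distinct nodes (simplified, not rack-aware)."""
    rng = random.Random(seed)
    return {block: set(rng.sample(nodes, replication)) for block in range(num_blocks)}


def fail_node(placement, failed):
    """Drop a failed node from every block's replica set."""
    return {block: replicas - {failed} for block, replicas in placement.items()}


def re_replicate(placement, live_nodes, replication=3, seed=1):
    """Copy under-replicated blocks to other live nodes until the target is met."""
    rng = random.Random(seed)
    healed = {}
    for block, replicas in placement.items():
        replicas = set(replicas)
        candidates = [n for n in live_nodes if n not in replicas]
        rng.shuffle(candidates)
        while len(replicas) < replication and candidates:
            replicas.add(candidates.pop())
        healed[block] = replicas
    return healed


nodes = [f"node{i}" for i in range(10)]
placement = place_blocks(100, nodes)
after_failure = fail_node(placement, "node3")
assert all(after_failure[b] for b in after_failure)  # every block still has a live copy
live = [n for n in nodes if n != "node3"]
healed = re_replicate(after_failure, live)
print(min(len(r) for r in healed.values()))
```

With three replicas, losing any single server leaves at least two copies of every block. Analyses can therefore keep running while the cluster restores full replication in the background.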

Ultimately, Hadoop makes it possible to conduct the types of analysis that would be impractical or even impossible using virtually any other database or data warehouse. Along the way, Hadoop helps your retail organization reduce costs and extract more value from your data.


Retail Gains with Distributed Systems


Apache Spark

Spark is the latest distributed compute framework to emerge to serve the big data community. Spark complements Hadoop on many levels, but they do not perform exactly the same tasks. Most environments will support both, and they are often used together. Spark is reported to process data up to 100 times faster than Hadoop in certain circumstances, but it does not provide its own distributed storage system.

Many big data projects involve installing Spark on top of Hadoop, where Spark’s advanced analytics applications can make use of data stored using the Hadoop Distributed File System (HDFS).

Distributed storage like HDFS is fundamental to big data deployments as it allows vast multi-petabyte data sets to be stored across many servers with direct attached storage, rather than involving costly custom hardware which would hold it all on one device. These systems are scalable, meaning that more drives can be added to the network as the dataset grows in size.

Spark has the edge over Hadoop in terms of speed and usability. Spark handles most of its computation “in-memory,” copying data from distributed physical storage into far faster RAM, where it can also execute complex multi-step workflows. This reduces the time spent writing to and reading from slow, high-latency mechanical hard drives, which Hadoop’s MapReduce system requires.

MapReduce writes all of the data back to the physical storage medium after each operation. This was originally done to ensure a full recovery could be made if something went wrong, since data held electronically in RAM is more volatile than data stored magnetically on disks. Spark, however, arranges data in what are known as Resilient Distributed Datasets (RDDs), which can be recovered following a failure.
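Spark’s recovery relies on lineage. Each RDD records the transformations that produced it, so a lost in-memory partition can be recomputed from its parent instead of being reloaded from a disk checkpoint. The class below is a toy model of that idea, not the Spark API. `ToyRDD` and its methods are hypothetical names, and the source partitions stand in for data held on durable storage such as HDFS.

```python
class ToyRDD:
    """Toy model of lineage-based recovery (hypothetical class, not pyspark.RDD)."""

    def __init__(self, partitions=None, parent=None, fn=None):
        self.parent = parent   # upstream dataset in the lineage chain
        self.fn = fn           # transformation applied to each parent record
        self.source = [list(p) for p in partitions] if partitions is not None else None
        self.cache = {}        # in-memory partitions that a failure can lose

    def map(self, fn):
        # Record the transformation lazily; nothing is computed yet.
        return ToyRDD(parent=self, fn=fn)

    def num_partitions(self):
        return len(self.source) if self.parent is None else self.parent.num_partitions()

    def partition(self, i):
        if self.parent is None:
            return self.source[i]  # stands in for durable storage such as HDFS
        if i not in self.cache:
            # Never computed or lost: rebuild it from the parent via the recorded lineage.
            self.cache[i] = [self.fn(x) for x in self.parent.partition(i)]
        return self.cache[i]

    def collect(self):
        return [x for i in range(self.num_partitions()) for x in self.partition(i)]

    def lose_partition(self, i):
        """Simulate an executor failure that drops one in-memory partition."""
        self.cache.pop(i, None)


sales = ToyRDD([[10.0, 20.0], [5.0], [40.0, 1.0]])   # three partitions of sale amounts
points = sales.map(lambda amount: amount * 2)        # hypothetical 2x loyalty points
before = points.collect()
points.lose_partition(1)
after = points.collect()                             # partition 1 is recomputed
print(before == after, after)
```

The trade-off is that recovery costs recomputation rather than routine disk writes. That is why Spark can keep intermediate results in memory between steps, while MapReduce writes them back to disk after each operation.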

Spark’s functionality for handling advanced data processing tasks such as near real-time stream processing and machine learning is way ahead of what is possible with Hadoop alone. Spark Streaming is probably the hottest thing in the big data world right now because of the demand for “near real-time” analytics. With Spark-based near real-time analytics, retailers are looking for immediate responses to customer needs through efficient recommendation engine processing. Real-time processing means that data can be fed into an analytical application the moment it is captured, and insights immediately fed back to the user through a dashboard so that action can be taken. This allows customers to process and analyze data in a short window of time on a continuous basis.
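The continuous, short-window analysis described above can be sketched in a single process. The class below keeps per-product view counts over a sliding time window, the kind of aggregation that Spark Streaming distributes across a cluster. It is not Spark code. The class name and event format are hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Keep per-product event counts over the last `window_seconds` of a stream."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()   # (timestamp, product) in arrival order
        self.counts = Counter()

    def add(self, timestamp, product):
        self.events.append((timestamp, product))
        self.counts[product] += 1
        self._evict(timestamp)

    def _evict(self, now):
        # Drop events at or before now - window, keeping the window (now - window, now].
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old = self.events.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def top(self, k=3):
        return self.counts.most_common(k)


counter = SlidingWindowCounter(window_seconds=60)
for ts, product in [(0, "tent"), (10, "lantern"), (20, "tent"), (65, "stove"), (70, "tent")]:
    counter.add(ts, product)
print(counter.top(2))
```

Each new event updates the counts incrementally instead of rescanning history. This is what makes it practical to refresh a dashboard or a recommendation the moment the data arrives.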

Figure: From a 2015 Apache Spark survey by Typesafe.

Big Data Solutions for Retail

This section highlights a number of important technologies that enable big data solutions for the retail industry. Specifically, Dell, together with strategic partners including Cloudera, Intel and others, offers a focused big data and analytics portfolio to help you at every step of your journey. This end-to-end portfolio includes an unprecedented lineup of solutions and tools for advanced analytics, data integration, and data management, bringing together all the technology components your retail organization needs to gain a 360-degree view of customers.

Hadoop Solutions

Hadoop addresses many challenges associated with storing, managing and processing large amounts of data in diverse formats: structured, unstructured and semi-structured. A short list of the more common operational uses of the Apache Hadoop platform for companies in the retail industry includes ETL offload, active archive, log aggregation, price optimization and agile data mining.

Retail organizations that are gathering insights from vast data volumes and varied data types find that managing large volumes of unstructured data exceeds the capacity and capabilities of traditional data intelligence systems. These systems were specifically designed for structured data types from sources such as relational databases. Gathering data intelligence and developing perspectives from extremely large amounts of data requires a scalable system that can process multi-structured data volumes quickly and responsively and that can easily scale to manage growing data volumes.

To answer these needs and to address the explosive growth in data volumes and complexity, organizations of all sizes are turning to the open source Hadoop platform to store, process and generate value from their data stores. Hadoop solutions are not just about being able to capture data but also about being able to work with the many new and different varieties of unstructured data: social media data, sensor data, machine-generated data and more. There are many advantages to using Hadoop, particularly in scalability, flexibility and economics. But as with any open source technology, it presents a unique set of challenges when deployed into production.

Cloudera created its enterprise distribution of Hadoop (CDH) for this very purpose: to remove the uncertainty and barriers that may dissuade an organization from deploying open source Hadoop into production processes.

In 2011, Dell, together with Cloudera and Intel, delivered their first tested and validated Hadoop Reference Architecture, the Dell | Cloudera Apache Hadoop Solution, accelerated by Intel. This end-to-end package delivers the core elements of Hadoop, including scalable systems and distributed computing, within a turnkey solution based on Cloudera Enterprise software and Dell hardware with Intel Xeon processors.

To help retailers realize the benefits of the Hadoop architecture, Dell offers QuickStart for Cloudera Hadoop, an all-in-one system designed to reduce the complexity of deploying, configuring, and managing Hadoop systems. The solution includes the hardware, software and services needed to deliver a Hadoop cluster that starts organizations on a proof of concept to begin working with big data.


Dell QuickStart for Cloudera Hadoop enables organizations to quickly engage in Hadoop testing, development and proof of concept work. Through the combination of Intel-based Dell PowerEdge servers, Cloudera Enterprise Basic Edition, Dell Networking and Dell Services, organizations can quickly deploy Hadoop and enable development and application teams to test business processes, data analysis methodologies and operational needs against a fully functioning Hadoop cluster. With the added flexibility of Dell Professional Services, you can choose the combination of training, installation and application development that is right for your organization.

Dell QuickStart for Cloudera Hadoop is deployed as a packaged and supported solution, with options to explore Hadoop software via a Dell Solution Center and to work on-premises with a fully functioning Hadoop environment via the Dell Hadoop Pod Loaner Program.

To enable fast analytics and stream processing, another big data solution—the Dell In-Memory Appliance for Cloudera Enterprise—is bundled with Cloudera Enterprise, which includes Apache Spark, an open source parallel data processing framework.

With its appliance-based approach, the Dell solution simplifies and accelerates the otherwise complex process of creating large cluster deployments. Rather than focusing on building and deploying an analytics platform, your IT team can now spend more time helping the business gain fast, critical insights from huge amounts of data.

ETL Offload Reference Architecture

The Dell | Cloudera | Syncsort Data Warehouse Optimization—ETL Offload Reference Architecture (RA), accelerated by Intel, serves to augment your enterprise data warehouse (EDW) by providing the means to run ETL jobs in Cloudera Enterprise with Syncsort DMX-h software. The solution makes it easy to build and deploy ETL jobs in Hadoop. Dell’s value has been validated by Principled Technologies, whose study highlights how customers can save $425,972 over 3 years, run ETL jobs 60% faster, and give the business back 4 days, all with entry-level Hadoop expertise.

Figure: High-level view of the Dell Cloudera Apache Hadoop solution


The ETL process can create bottlenecks in EDWs. A few heavy jobs can bog down an enterprise data warehouse, and more processing means less query capacity. This processing work can be offloaded to Hadoop to reduce CPU utilization for heavy jobs and to accelerate complex ETL processes. The goal here isn’t to replace your EDW but rather to augment it by moving certain data, workloads and processes from your existing systems into Hadoop to gain new capabilities and cost economies.
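The offload pattern above moves heavy transformation work out of the EDW, so that only cleaned, compact results are loaded into it. The plain-Python sketch below shows the shape of such a job on a tiny hypothetical point-of-sale export. It extracts raw rows, validates and aggregates them, then hands a small summary to the load step. It does not use Syncsort DMX-h or Hadoop. In an offload deployment, the transform step would run as a distributed job on the cluster.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw POS export; the last row has a missing quantity.
raw_pos_export = """store_id,date,sku,qty,unit_price
S1,2015-06-01,tent,1,199.00
S1,2015-06-01,lantern,2,15.50
S2,2015-06-01,tent,1,199.00
S2,2015-06-01,stove,,49.00
"""


def extract(text):
    """Read raw rows as dictionaries keyed by the header line."""
    return csv.DictReader(io.StringIO(text))


def transform(rows):
    """Validate rows and aggregate revenue per (store, date); bad rows are counted, not loaded."""
    totals = defaultdict(float)
    rejected = 0
    for row in rows:
        try:
            qty = int(row["qty"])
            price = float(row["unit_price"])
        except (TypeError, ValueError):
            rejected += 1
            continue
        totals[(row["store_id"], row["date"])] += qty * price
    return dict(totals), rejected


def load(totals):
    """Stand-in for loading the compact summary into the warehouse."""
    return sorted((store, day, round(revenue, 2)) for (store, day), revenue in totals.items())


summary, rejected = transform(extract(raw_pos_export))
print(load(summary), "rejected:", rejected)
```

Only the per-store summary reaches the warehouse. Row-level parsing and validation, usually the costliest part, is what the offload takes off the EDW.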

Syncsort’s high-performance ETL software enables your users to maximize the benefits of MapReduce. Syncsort software enables faster time to value by reducing the need to develop expertise on Pig, Hive and Sqoop, or other technologies that are essential for creating ETL jobs in MapReduce.

Dell Statistica’s analytics platform extends the Statistica portfolio as a content mining and analytics solution that can transform complex and time-consuming manipulation of web-scale data resources into a fast and intuitive process. Features include advanced natural language processing (NLP), entity extraction, interactive visualizations and dashboards, and the capability to create advanced analytic models and distribute them across Hadoop, databases and database appliances.

Statistica provides the ability to harvest sentiments from unstructured data such as Twitter feeds, blogs, news reports, CRM systems, and other sources, and combine them with additional data, including demographic and regional data, to better understand market traction and opportunities in the retail space.

Developing Statistica models and deploying them to Hadoop “data lakes” yields valuable insights by bringing advanced analytics to full-volume data where it is stored.

This big data analytics solution also provides excellent performance and scalability by leveraging next-generation technologies like Hadoop, Lucene/SOLR search, Mahout machine learning and interactive visualization. As one use case example, Statistica is used by Dell Global Analytics (DGA) to help over 500 internal clients improve customer acquisition and retention, identify up-sell and cross-sell opportunities, increase revenue and more.

For the data experts, this solution provides:

• Out-of-the-box structured and unstructured analytics

• Drag-and-drop creation of analytic workflows

• Hadoop-enabled for big data scalability

• Functional widgets that can be configured for individual analytic needs

• Open-source search indexing of complex, faceted metadata

Statistica Big Data Analytics enables enterprises of all types to more efficiently and effectively process all data.



Case Studies: Dell Focused Customer Use Cases

Case Study: Large Retailer

Some early Dell customers started using big data technology back in 2011, when they came looking for recommendation engines like the kind of system pioneered by Amazon. The recommendation engine was a seamless way for retailers to start seeing the benefits of big data. Once the platform was in place and they saw the value and ease of use of Hadoop, their use of the big data technology stack grew from there.

One example is a large retailer that came to Dell for help with a recommendation engine project. The retailer collaborated closely with the Dell Big Data Specialists in the Dell Solution Center on its proof of concept, then built an 8-node cluster that grew to more than 300 nodes over a three-year period. Once it got its feet wet and saw the results, it moved on to other solutions, such as an ETL offload that replaced mainframe processing. The company’s next steps with Hadoop were:

• Using all of its data sources for analytics

• Supply chain analysis

• Central data repository

• Price setting

• Logistics planning

Hadoop Use Case: Large Retailer

Dell Big Data in Action: Recommendation Engine

The solution went from a coordinated proof of concept on an 8-node Hadoop cluster to more than 300 nodes and robust solutions for saving time and money.

Challenges

Needed much more cost-effective ways to derive intelligence from enormous data stores while radically reducing costs.

Implementations

• Recommendation engine

• ETL mainframe replacement

• Supply chain analysis

• Central data repository

• Price setting

• Logistics planning

Results from Hadoop Solution

• Reduced cost per million instructions per second (MIPS) by up to 160 times

• Cut batch processing time from over 20 hours to less than an hour

• Improved decision support by providing better data faster

• Offers rapid and cost-effective scalability

The Future of Big Data and Retail

The future of big data in the retail industry is very promising, with technology taking a strategic lead in maximizing competitive advantage. Let’s consider a couple of the chief inflection points the future might hold. First, the Internet of Things (IoT) will play an important role, as sensor data writes the next chapter in the retail and big data story. For example, say you’re walking down the aisle of a retail store: an embedded sensor detects that the retailer’s app is installed on your phone and offers you the immediate gratification of a coupon.

Next, consider the increasing importance of real-time analytics through the use of Spark Streaming. This is new technology and still a bit cumbersome to deploy, so it’s not being adopted as widely as many might think, but it is coming on strong. Real-time analytics represents a tremendous opportunity for retailers who are building their business and retaining their customers.

Spark Streaming enables scalable, high-throughput, fault-tolerant stream processing of live data streams.
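Spark Streaming processes a live stream as a series of small batches and offers windowed operations (for example, `reduceByKeyAndWindow` in its DStream API) for aggregating over a sliding time window. The plain-Python sketch below imitates only that windowing idea: it counts product views per SKU over the most recent 60 seconds so a trending item could be acted on quickly. It does not use Spark, and the event format, window length and class name are illustrative assumptions.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Count events per key over the most recent `window_seconds` of event time.

    Assumes events arrive in non-decreasing timestamp order (a simplification;
    streaming engines handle late-arriving data more carefully).
    """

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self.events: deque[tuple[float, str]] = deque()
        self.counts: Counter[str] = Counter()

    def add(self, timestamp: float, key: str) -> None:
        self.events.append((timestamp, key))
        self.counts[key] += 1
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Drop events at or before (now - window); the window is (now - window, now].
        while self.events and self.events[0][0] <= now - self.window:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def top(self, n: int = 3) -> list[tuple[str, int]]:
        """Return the n most-viewed keys currently inside the window."""
        return self.counts.most_common(n)


if __name__ == "__main__":
    counter = SlidingWindowCounter(window_seconds=60)
    for ts, sku in [(0, "A"), (10, "B"), (20, "A"), (70, "C"), (75, "A")]:
        counter.add(ts, sku)
    print(counter.top())
```

By the time the event at t=70 arrives, the views at t=0 and t=10 have aged out of the window, so only recent activity drives the ranking.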


Case Study: MetaScale

MetaScale provides Hadoop big data solutions, training and support, partnering with Dell to help clients speed processing, improve decision support and realize more cost-effective ways to derive intelligence from enormous data stores while radically reducing costs.

MetaScale’s first client was one of North America’s largest retail groups, with many thousands of stores and tens of billions of dollars in annual sales. Its daily transactional and operational data volume is several terabytes, generated by millions of shoppers as well as many supply chains. In all, its current data volume exceeds 3 petabytes. The high-level goal of the project was to keep the client’s data management, analysis and reporting capabilities from falling behind the rapid growth of its data.

As one example of where the big data project succeeded, the client’s pricing business unit needed daily summary reports, using data from multiple platforms, to measure the effectiveness of its in-store pricing. It was generating these reports from its data warehouse and predictive analytics tools, both hosted on mainframes. But the mainframe batch processing required for these reports took 10 to 15 hours, so the company could manage only weekly data warehouse loads. As a result, the pricing business unit wasn’t getting the decision support it needed to fine-tune in-store pricing.

MetaScale’s approach to solving its client’s big data problems was to deploy a Hadoop solution powered by MetaScale big data appliances based on Dell PowerEdge servers. MetaScale worked closely with Dell to develop its bundled Hadoop solutions to meet the client’s growing needs for performance, scalability and support. MetaScale’s big data appliances arrange the server nodes into logical clusters for handling large-scale data sets cost-effectively. The retail client’s Hadoop solution had a cluster with more than 500 server nodes, not counting its backup cluster.

In this implementation, MetaScale helped the client achieve ROI in just three months after going live, with a cluster that, at the time, used 50 Dell server nodes to manage its growing data.

Case Study: Staples

Turning unfiltered feedback into actionable insight

Fortune 500 office and school supplies retailer Staples increased its brand recognition and boosted staff efficiency by improving social media listening and analysis, and reducing “noise” (i.e., irrelevant data) by 75 percent. Staples needed to reduce the noise it collected from social media channels to understand customers’ likes and dislikes more quickly, and improve its offerings. As more people engage on social media channels such as Twitter, Facebook, LinkedIn and others, Staples realized that it needed a better solution to sift through the increasing volume of public data to find tactical information.

The company engaged Dell Services to deploy and manage a cloud-based social media listening and analysis service centered on big data technology. The solution served to pinpoint relevant insights in unstructured social media data faster, increase customer communication, amplify customers’ voice in day-to-day decisions, and improve corporate agility and offerings.

The results of the project were widespread. Marketing managers could quickly gauge whether media campaigns were effective and worth the investment. The website team could learn whether customers were able to easily find the products and information they were looking for on Staples.com. And executives could easily see which store policies and processes were working, and which ones needed improvement.
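One simple way to picture “noise” reduction is a relevance filter that keeps only posts that mention the brand together with a topic the business tracks. The sketch below is purely illustrative: the keyword sets, the matching rule and the sample posts are assumptions, and the case study does not describe how the Dell Services solution actually filtered data.

```python
import re

# Hypothetical keyword sets; a real listening service would use trained models.
BRAND_TERMS = {"staples", "staples.com"}
TOPIC_TERMS = {"price", "delivery", "store", "website", "return", "checkout"}

TOKEN = re.compile(r"[a-z0-9.]+")


def tokens(text: str) -> set[str]:
    """Lowercase word tokens with trailing periods removed (keeps 'staples.com')."""
    return {t.rstrip(".") for t in TOKEN.findall(text.lower())}


def is_relevant(text: str) -> bool:
    """Keep a post only if it names the brand AND touches a tracked topic."""
    words = tokens(text)
    return bool(words & BRAND_TERMS) and bool(words & TOPIC_TERMS)


def filter_posts(posts: list[str]) -> tuple[list[str], float]:
    """Return the relevant posts and the fraction of posts discarded as noise."""
    kept = [p for p in posts if is_relevant(p)]
    noise = 1 - len(kept) / len(posts) if posts else 0.0
    return kept, noise


if __name__ == "__main__":
    posts = [
        "Staples delivery was a day early, nice.",
        "Anyone watching the game tonight? #staples",
        "Couldn't find toner on Staples.com website search",
        "Need staples for my stapler",
    ]
    kept, noise = filter_posts(posts)
    print(kept, f"{noise:.0%} filtered")
```

Requiring both a brand term and a topic term is what discards the last post, where “staples” is used in its generic sense.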


In conclusion, the retail experience has changed dramatically in recent years as power has shifted to consumers. Shoppers can easily find and compare products from an array of devices, even while walking through a store. They can share their opinions about retailers and products through social media and influence other prospective customers.

As we’ve seen in this guide, retailers have to adopt new and innovative strategies to attract and retain customers if they are to compete in this new multi-channel environment. Big data technologies, specifically Hadoop, enable retailers to connect with customers through multiple channels at an entirely new level by harnessing the vast volumes of new data available today. Hadoop helps retailers store, transform, integrate and analyze a wide variety of online and offline customer data (POS transactions, e-commerce transactions, clickstream data, email, social media, sensor data and call center records), all in one central repository.

Retailers can analyze this data to generate insights about individual consumer preferences and behaviors, and offer personalized recommendations in near real-time. Key to this is the ability to optimize merchandise selections and pricing tailored to each consumer’s likes and dislikes.
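Item-to-item co-occurrence (“customers who bought this also bought”) is one common starting point for the kind of recommendation engine discussed in this guide. The sketch below counts how often pairs of products share a basket and recommends the most frequent companions of what a customer already bought. The basket data and function names are hypothetical; production engines typically normalize these counts (for example with cosine similarity) and run the counting as a distributed job.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(baskets):
    """Count, for every product, how often each other product shares a basket."""
    co = defaultdict(Counter)
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(co, purchased, n=2):
    """Score candidates by summed co-occurrence with the customer's purchases."""
    scores = Counter()
    for item in purchased:
        scores.update(co.get(item, {}))
    for item in purchased:
        scores.pop(item, None)  # never recommend what the customer already has
    return [item for item, _ in scores.most_common(n)]


if __name__ == "__main__":
    baskets = [
        ["printer", "ink", "paper"],
        ["printer", "ink"],
        ["paper", "pens"],
        ["ink", "paper"],
    ]
    co = build_cooccurrence(baskets)
    print(recommend(co, ["printer"]))
    print(recommend(co, ["printer", "ink"]))
```

Summing scores across everything the customer bought lets the same table serve both single-item (“also bought”) and whole-basket recommendations.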

Retailers can improve sales opportunities as well as customer satisfaction by integrating all relevant customer data across all data sources into a single view. Hadoop can be used to combine these various sources of data into one repository and obtain that single, 360-degree view. Customers use email, chat and social media to communicate every day, so getting a handle on not just their prior transactions but also their likes and dislikes about their experiences is critical. Customer retention is a key metric for retailers, given the cost of acquiring new customers.
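At its simplest, a single customer view means joining records from each channel on a shared customer identifier. The sketch below merges hypothetical POS, e-commerce and call-center records into one profile per customer. The field names are illustrative, and it assumes every source already carries the same customer ID; in practice, resolving identities across channels is often the hardest part.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerProfile:
    customer_id: str
    store_spend: float = 0.0
    online_spend: float = 0.0
    support_contacts: int = 0
    channels: set[str] = field(default_factory=set)

    @property
    def total_spend(self) -> float:
        return round(self.store_spend + self.online_spend, 2)


def build_profiles(pos, ecommerce, call_center):
    """pos/ecommerce: iterables of (customer_id, amount); call_center: customer IDs."""
    profiles: dict[str, CustomerProfile] = {}

    def get(cid: str) -> CustomerProfile:
        # Create the profile the first time any channel mentions this customer.
        if cid not in profiles:
            profiles[cid] = CustomerProfile(cid)
        return profiles[cid]

    for cid, amount in pos:
        p = get(cid)
        p.store_spend += amount
        p.channels.add("store")
    for cid, amount in ecommerce:
        p = get(cid)
        p.online_spend += amount
        p.channels.add("online")
    for cid in call_center:
        p = get(cid)
        p.support_contacts += 1
        p.channels.add("call_center")
    return profiles


if __name__ == "__main__":
    profiles = build_profiles(
        pos=[("C1", 20.0), ("C2", 15.5), ("C1", 5.0)],
        ecommerce=[("C1", 40.0)],
        call_center=["C2"],
    )
    for profile in profiles.values():
        print(profile.customer_id, profile.total_spend, sorted(profile.channels))
```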

According to CMAC, McKinsey & Company’s Consumer Marketing Analytics Center, big data holds the promise of big benefits for retailers. While data-driven retailing was long a matter of reaching as many shoppers as possible, today’s richer insights allow for activities that are reliably relevant to the target group and trigger quantifiable responses. As a result, it is estimated that big data will have a number of significant benefits moving forward:

• Big data has the potential to increase retailer margins by 60%

• Big data is expected to drive up to 1% annual productivity growth in U.S. retail

• 50% of social media users indicated shopping-related information is especially important to them

• Social and mobile are just the most recent additions to a deep and growing pool of diverse data sources

• Real-time updates generate terabytes of data, calling for real-time actions

• Global data generation in retail is expected to grow at a rate of 60% annually
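To put the 60% annual growth figure in perspective, a quick compounding calculation shows how fast retail data volume would double, under the simplifying assumption that the rate holds constant (the estimate itself does not guarantee this):

```latex
V(t) = V_0\,(1 + r)^t, \qquad r = 0.6
\quad\Longrightarrow\quad
t_{2\times} = \frac{\ln 2}{\ln 1.6} \approx \frac{0.693}{0.470} \approx 1.47\ \text{years},
\qquad
\frac{V(5)}{V_0} = 1.6^{5} \approx 10.5
```

In other words, at that rate data volume would roughly double every year and a half and grow about tenfold over five years.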

Summary

Hadoop helps retailers store, transform, integrate and analyze a wide variety of online and offline customer data—POS transactions, e-commerce transactions, click-stream data, email, social media, sensor data and call center records—all in one central repository.

© 2015 Dell Inc. All rights reserved. Dell, the DELL logo, the DELL badge and PowerEdge are trademarks of Dell Inc. Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Dell disclaims proprietary interest in the marks and names of others. This document is for informational purposes only. Dell reserves the right to make changes without further notice to the products herein. The content provided is as-is and without expressed or implied warranties of any kind.

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

