7/27/2019 Big Data Analysis Guide
Big Data Analysis
Big Data Overview (1/2)
Big data refers to datasets whose sizes are beyond
the ability of typical database software tools to
capture, store, manage and analyze.
A primary goal for looking at big data is to discover
repeatable business patterns.
It has many additional uses, including real-time fraud
detection, web display advertising and competitive
analysis, call center optimization, social media and
sentiment analysis, intelligent traffic management,
and smart power grids.
Big data analytics is often associated with cloud
computing because analyzing large datasets in
real time requires a framework
like MapReduce to distribute the work among tens,
hundreds or even thousands of computers.
As technology advances over time, the size of
datasets that qualify as big data will also increase.
Big data is expected to play a significant economic
role, benefiting not only private commerce but also
national economies and their citizens.
Big data involves more than simply the ability to
handle large volumes of data. Instead, it represents a
wide range of new analytical technologies and
business possibilities.
Big data is a general term used to describe the voluminous amount of unstructured and semi-structured data a company creates.
It's the data that would take too much time and cost too much money to load into a relational database for analysis.
Source:http://searchcloudcomputing.techtarget.com/definition/big-data-Big-Data, McKinsey Big Data Report, BI Research Using Big Data for Smarter Decision Making
Figure: Big Data Can Generate Significant Financial Value Across Sectors
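The division of work that MapReduce performs can be illustrated with the classic word-count example. Below is a minimal single-process sketch in Python: a map step emits (word, 1) pairs for each input chunk, a shuffle step groups the pairs by word, and a reduce step sums each group. The function names are hypothetical; a framework such as Apache Hadoop would run the map and reduce tasks in parallel on many machines and perform the shuffle over the network.

```python
from collections import defaultdict
from typing import Iterable


def map_phase(chunk: str) -> list[tuple[str, int]]:
    # Emit a (word, 1) pair for every word in one input chunk.
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group intermediate values by key, as the framework does between phases.
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Combine all values for one key into a single result.
    return key, sum(values)


def word_count(chunks: list[str]) -> dict[str, int]:
    # In a cluster each chunk would be mapped on a separate worker;
    # here the chunks are simply processed one after another.
    intermediate = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    print(word_count(["big data big value", "data in real time"]))
```

For example, `word_count(["big data big value", "data in real time"])` counts `big` and `data` twice each and every other word once.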
Big Data Overview (2/2)
Three Vs of Big Data
The three Vs of big data (volume, variety and velocity) constitute a comprehensive definition.
Each of the three Vs has its own ramifications for
analytics.
Data volume is the primary attribute of big data
Big data can also be quantified by counting records, transactions,
tables or files. Some organizations find it more useful to quantify big
data in terms of time. For example, due to the seven-year statute of
limitations in the U.S., many firms prefer to keep seven years of data
available for risk, compliance and legal analysis.
The scope of big data affects its quantification, too. For example, in
many organizations, the data collected for general data warehousing
differs from data collected specifically for analytics.
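As a minimal sketch of quantifying big data in terms of time, the Python functions below keep only the records that fall inside a seven-year retention window. The record layout (a dict with a hypothetical `date` field) and the function names are assumptions for illustration, not part of any particular system.

```python
from datetime import date


def retention_cutoff(as_of: date, years: int = 7) -> date:
    # Same calendar day `years` earlier; February 29 falls back to February 28.
    try:
        return as_of.replace(year=as_of.year - years)
    except ValueError:
        return as_of.replace(year=as_of.year - years, day=28)


def within_retention(records: list[dict], as_of: date, years: int = 7) -> list[dict]:
    # Keep records dated on or after the cutoff (hypothetical "date" field).
    cutoff = retention_cutoff(as_of, years)
    return [r for r in records if r["date"] >= cutoff]
```

With an as-of date of 14 April 2018, the cutoff is 14 April 2011, so a record dated 2010 falls outside the window while one dated 2016 is kept.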
Data variety comes from a greater variety of sources
Big data comes from a variety of sources, including logs, click streams,
social media, radio-frequency identification (RFID) data from supply
chain applications, text data from call center applications, semi-
structured data from various business-to-business processes, and
geospatial data in logistics.
The recent tapping of these sources for analytics means that so-called
structured data is now joined by unstructured data (text and human
language) and semi-structured data (XML, RSS feeds).
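To illustrate how semi-structured data such as an RSS feed can be brought alongside structured data, here is a sketch that flattens RSS `<item>` elements into uniform rows using Python's standard `xml.etree.ElementTree` module. The sample feed and the `rss_to_rows` helper are hypothetical; real feeds carry many more optional elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed: the second item has no <pubDate>.
RSS_SAMPLE = """<rss version="2.0"><channel>
  <item><title>Order shipped</title><pubDate>Mon, 02 Apr 2012 10:00:00 GMT</pubDate></item>
  <item><title>Order delayed</title></item>
</channel></rss>"""


def rss_to_rows(xml_text: str) -> list[dict]:
    # Flatten each <item> into a row with fixed columns. findtext returns
    # None for a missing element, which is typical of semi-structured data
    # where fields are optional.
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.iter("item"):
        rows.append({
            "title": item.findtext("title"),
            "pub_date": item.findtext("pubDate"),
        })
    return rows
```

The resulting rows share the same columns and can be loaded into a table next to conventional structured data, with missing values left as `None`.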
Data feed velocity as a defining attribute of big data
The collection of big data in real time isn't new; many firms have been
collecting click stream data from the web for years, using streaming
data to make purchase recommendations to web visitors.
Even more challenging, the analytics that go with streaming data have
to make sense of the data and possibly take action, all in real time.
Source:TDWI Research report on Big Data Analytics
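Taking action on streaming data in real time usually means processing each event as it arrives while keeping only a bounded amount of state. The sketch below flags a transaction whose amount exceeds a multiple of the mean of the previous few events; the window size, the threshold factor and the `flag_spikes` name are illustrative assumptions, and production fraud detection would rely on far richer models.

```python
from collections import deque
from typing import Iterable, Iterator


def flag_spikes(
    amounts: Iterable[float], window: int = 5, factor: float = 3.0
) -> Iterator[tuple[int, float]]:
    # Process events one at a time, keeping only the last `window` amounts.
    recent: deque = deque(maxlen=window)
    for i, amount in enumerate(amounts):
        # Once the window is full, compare the new event with its mean.
        if len(recent) == window and amount > factor * (sum(recent) / window):
            yield i, amount  # a real system would raise an alert here
        recent.append(amount)
```

Because `flag_spikes` is a generator, it can consume an unbounded stream and report each suspicious event as soon as it is seen, rather than after a batch job finishes.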
Big Data Future
International Data Corporation (IDC) released a worldwide big data technology and services forecast report based on a survey in
March 2012. As per the survey:
The big data market is expected to grow from $3.2 billion in 2010 to $16.9 billion in 2015. This represents a compound annual
growth rate (CAGR) of 40% or about seven times that of the overall information and communications technology (ICT) market.
The big data market is expanding rapidly and for technology buyers, opportunities exist to use big data technology to improve
operational efficiency and to drive innovation.
There are also big data opportunities for both large IT vendors and startups. Major IT vendors are offering both database
solutions and configurations supporting big data by evolving their own products as well as by acquisition. At the same time, more
than half a billion dollars in venture capital has been invested in new big data technology.
While the five-year CAGR for the worldwide market is expected to be nearly 40%, the growth of individual segments varies from 27.3% for servers and 34.2% for software to 61.4% for storage.
The growth in appliances, cloud, and outsourcing deals for big data technology will likely mean that over time, end users will pay
increasingly less attention to technology capabilities and will focus instead on the business value arguments. System
performance, availability, security and manageability will all matter greatly; however, how they are achieved will be less of a point
for differentiation among vendors.
There is a shortage of trained big data technology experts, in addition to a shortage of analytics experts. This labor supply
constraint will act as an inhibitor of adoption and use of big data technologies, and it will also encourage vendors to deliver big
data technologies as cloud-based solutions.
While software and services make up the bulk of the market opportunity, through 2015, infrastructure technology for big data
deployments is expected to grow slightly faster at 44% CAGR. Storage, in particular, shows the strongest growth opportunity,
growing at 61.4% CAGR through 2015.
Source:http://www.idc.com/getdoc.jsp?containerId=prUS23355112
IDC defines big data technologies as a new generation of technologies and architectures designed to extract value economically
from very large volumes of a wide variety of data by enabling high-velocity capture, discovery and/or analysis.
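The growth figures in the IDC survey are compound annual growth rates. As a quick check of the headline number, the sketch below applies the standard CAGR formula, (end / start)^(1 / years) - 1, to the $3.2 billion (2010) and $16.9 billion (2015) market sizes quoted above; the `cagr` helper name is ours.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    # Compound annual growth rate: (end / start) ** (1 / years) - 1
    return (end_value / start_value) ** (1 / years) - 1


if __name__ == "__main__":
    rate = cagr(3.2, 16.9, 2015 - 2010)  # IDC figures in US$ billions
    print(f"{rate:.1%}")
```

This yields roughly 39.5% per year, consistent with the survey's rounded figure of 40%.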
Big Data Risks/Challenges
Big data is complex because of the variety of data it encompasses, from structured data, such as transactions one makes or
measurements one calculates and stores, to unstructured data such as text conversations, multimedia presentations and video streams.
Big data presents a number of challenges relating to its complexity:
One challenge is how one can understand and use big data when it comes in an unstructured format, such as text or video.
Another challenge is how one can capture the most important data as it happens and deliver that to the right people in real-
time.
A third challenge is how one can store, analyze and understand the data given its size and the available computational capacity.
Big data also poses security and privacy risks, because large amounts of data are stored in data warehouses and centralized in a single repository.
Big data and extreme workloads require optimized hardware and software. The main challenges of big data and extreme
workloads are data variety and volume, and analytical workload complexity and agility.
Many organizations are struggling to deal with increasing data volumes, and big data simply makes the problem worse. To solve
this problem, organizations need to reduce the amount of data being stored and exploit new storage technologies that improve
performance and storage utilization.
Big data's increasing economic importance also raises a number of legal issues, especially when coupled with the fact that data is
fundamentally different from many other assets. For example, one piece of data can be copied perfectly and easily combined
with other data. The same piece of data can be used simultaneously by more than one person.
Sectors with a relative lack of competitive intensity and performance transparency, along with industries where profit pools are
highly concentrated, are likely to be slow to fully leverage the benefits of big data.
Source:BI Research Using Big Data for Smarter Decision Making, http://spotfireblog.tibco.com/?p=6793,
https://www.privacyassociation.org/publications/2012_03_23_big_data_it_risks_and_privacy_meet_in_the_boardroom
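One common way to reduce the amount of data being stored is content-based deduplication: identical payloads are stored once and referenced by a hash. The sketch below uses SHA-256 from Python's standard `hashlib` module; the `deduplicate` helper is hypothetical, and real storage systems typically deduplicate at the block or chunk level rather than whole objects.

```python
import hashlib


def deduplicate(blobs: list[bytes]) -> tuple[dict[str, bytes], list[str]]:
    # Store each distinct payload once, keyed by its SHA-256 digest.
    # The returned list of digests preserves the original order, so every
    # input can still be reconstructed from the smaller store.
    store: dict[str, bytes] = {}
    refs: list[str] = []
    for blob in blobs:
        digest = hashlib.sha256(blob).hexdigest()
        store.setdefault(digest, blob)
        refs.append(digest)
    return store, refs
```

For three inputs where two are identical, the store holds two payloads while the reference list still has three entries.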
Big Data Importance
Creating transparency
Making big data more easily accessible to relevant stakeholders in a timely manner can create tremendous
value. In the public sector, making relevant data more readily accessible across otherwise separated
departments can sharply reduce search and processing time.
Enabling experimentation to discover needs
As more transactional data is created and stored in digital form, organizations can collect more accurate and
detailed performance data on everything from product inventories to personnel sick days. Organizations can use this data to
analyze variability in performance, whether it occurs naturally or is generated by controlled experiments.
Segmenting populations to customize actions
Big data allows organizations to create highly specific segmentations and to tailor products and services
precisely to the needs of each segment. This approach is well known in marketing and risk management but can be
revolutionary elsewhere.
Replacing human decision making with automated algorithms
Sophisticated analytics can substantially improve decision making, minimize risks and unearth valuable
insights that would otherwise remain hidden. Such analytics have applications for many kinds of organizations; for example, tax
agencies can use automated risk engines to flag candidates for further examination.
Innovating new business models, products and services
Big data enables companies to create new products and services, enhance existing ones, and invent entirely
new business models. Manufacturers are using data obtained from the use of actual products to improve the
development of the next generation of products and to create innovative after-sales service offerings.
Source:McKinsey Big Data Report
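An automated risk engine of the kind mentioned for tax agencies can be as simple as a set of weighted rules whose scores are summed per record. The Python sketch below flags records whose total score reaches a threshold; the rules, weights, field names and threshold are all hypothetical, and real risk engines typically combine such rules with statistical models.

```python
from typing import Callable

# Hypothetical rules: (description, predicate on a record, weight).
Rule = tuple[str, Callable[[dict], bool], int]

RULES: list[Rule] = [
    ("deductions exceed 50% of income", lambda r: r["deductions"] > 0.5 * r["income"], 3),
    ("large cash transactions", lambda r: r["cash_transactions"] > 10_000, 2),
    ("late filing", lambda r: r["days_late"] > 0, 1),
]


def flag_for_review(records: list[dict], threshold: int = 3) -> list[str]:
    # Score each record by summing the weights of the rules it triggers,
    # then return the IDs whose score reaches the review threshold.
    flagged = []
    for record in records:
        score = sum(weight for _, test, weight in RULES if test(record))
        if score >= threshold:
            flagged.append(record["id"])
    return flagged
```

Keeping each rule as data, with its own description, makes it easy for a reviewer to see why a record was flagged and to adjust weights without changing the scoring logic.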
Big Data Vendors
Figure: 2012 Big Data Pure-Play Vendors, Yearly Big Data Revenue (US$ million)
In the current market, big data pure-play vendors account for $300 million in big data-related revenue. Despite their relatively
small percentage of current overall revenue (approximately 5%), big data pure-play vendors (such as Vertica, Splunk and
Cloudera) are responsible for the vast majority of new innovations and modern approaches to data management and analytics
that have emerged over the last several years and made big data the hottest sector in IT.
Source:http://www.forbes.com/sites/siliconangle/2012/02/17/big-data-is-big-market-big-business/
Big Data Trends
The McKinsey Global Institute estimated that enterprises globally stored
more than seven exabytes of new data on disk drives in 2010, while
consumers stored more than six exabytes of new data on devices such as
PCs and notebooks.
Big data has now reached every sector in the global economy. European
organizations have a combined storage capacity of almost 11 exabytes,
about 70% of that of the entire United States.
The possibilities of big data continue to evolve rapidly, driven by innovation
in the underlying technologies, platforms and analytic capabilities for
handling data, as well as the evolution of behavior among its users as more
and more individuals live digital lives.
The use of big data is becoming a key way for leading companies to
outperform their peers. McKinsey estimated that a retailer embracing big
data has the potential to increase its operating margin by more than 60%.
The increasing use of multimedia in sectors, including health care and
consumer-facing industries, has contributed significantly to the growth of
big data and will continue to do so.
The surge in the use of social media is producing its own stream of new
data. While social networks dominate the communications portfolios of
younger users, older users are adopting them at an even more rapid pace.
Source:McKinsey Big Data Report
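Two of the figures above are easy to misread, so a short worked calculation may help. If almost 11 exabytes of European capacity is about 70% of the US total, the implied US capacity is roughly 11 / 0.70, or about 16 exabytes. And a 60% increase in operating margin is a relative change, not 60 percentage points: a hypothetical margin rising from 5% to 8% is already a 60% increase. The helper names below are ours.

```python
def implied_total(part: float, share: float) -> float:
    # If `part` is `share` of some total, the total is part / share.
    return part / share


def relative_increase(old: float, new: float) -> float:
    # Relative change: 0.05 -> 0.08 is +60%, though only 3 percentage points.
    return (new - old) / old


if __name__ == "__main__":
    print(f"Implied US storage: {implied_total(11, 0.70):.1f} EB")
    print(f"Margin change 5% -> 8%: {relative_increase(0.05, 0.08):.0%}")
```

The first line prints about 15.7 EB and the second 60%.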
Big Data Examples
Big data includes web logs, RFID, sensor networks, social networks, social data, Internet text and documents, Internet search
indexing, call detail records, complex and/or interdisciplinary scientific research, military surveillance, medical records, photography
archives, video archives, and large-scale e-commerce.
Examples of Companies Using Big Data:
IBM has formed a partnership with the Netherlands Institute for Radio Astronomy (ASTRON) for the DOME Project, which
provided support in developing the tools needed to crunch the data for the ambitious international Square Kilometre Array (SKA)
radio telescope.
San Francisco-based SeeChange Company offered a better way of designing health insurance plans with what it calls value-based
benefits. The company used a substantial amount of data gleaned from personal health records, claims databases, lab
feeds and pharmacy data to identify patients with chronic illnesses who would benefit from a customized compliance program.
Boston-based company Humedica combined its data analytics with a real-time clinical surveillance and decision support system.
The company also sells its detailed clinical spending data to life sciences companies, with the idea that customers will use it to
quantify patient populations, market share and market opportunities.
Castlight Health aimed to push transparency in healthcare pricing by offering consumers a search engine to find prices of
healthcare services. Castlight's technology allowed consumers to run side-by-side comparisons of out-of-pocket medical
expenses. The idea was that, armed with prices, consumers would shop for bargains, limiting the growth of healthcare costs.
Cleveland-based Explorys has started a Google-like service that helps clinicians analyze real-time information culled from troves of electronic medical records (EMRs), financial records and other data. The idea is that medical researchers can mine the vast
amounts of data to learn how variations in treatment can affect outcomes, uncovering best practices to enhance patient care and
lower costs.
Apixio's technology brings together data from structured sources like EMRs with unstructured data, such as a physician's patient
encounter notes. The company's software uses natural language processing technology to interpret clinicians' free-text searches
and return the most relevant results.
Source:http://www.forbes.com/sites/alexknapp/2012/04/09/ibm-is-using-big-data-to-crunch-the-big-bang/, http://www.medcitynews.com/2011/11/5-companies-using-big-data-to-solve-healthcare-problems/
Role of Internal Audit in Managing Big Data: Case Study
Check the extent of data assets and examine in depth what is available. Data that is redundant or unimportant can be
identified and reduced.
To manage data holdings effectively, an organization must first be aware of the location, condition and value of its research assets.
Conducting a data audit provides this information, raising awareness of collection strengths and identifying weaknesses in data
policies and management procedures.
The benefits of conducting an audit for managing big data effectively are:
Monitor holdings and avoid big data leaks. Data hacking, social engineering and data leaks are all threats that plague
companies; an audit can help a company identify areas where leakage is possible.
Manage risks associated with big data loss and irretrievability. Unstructured data that lies untouched
may never be retrieved; an audit can help identify such cases.
Develop a big data strategy and implement robust big data policies. Big data requires robust management and proper
structuring.
Improve workflows and benefit from efficiency savings. Check where there are complex and time-consuming workflows and where there is scope for improving efficiency.
Realize the value of big data through improved access and reuse. Check whether there are data areas that have not been used in
a while.
Source:http://www.data-audit.eu/docs/DAF_briefing_paper.pdf
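A data audit starts from an inventory of data holdings and their location, condition and value. The sketch below models a minimal, hypothetical asset register in Python and flags two kinds of finding related to the benefits listed above: assets that have not been accessed within a chosen period, and assets with no recorded owner. The `DataAsset` fields, the 365-day staleness threshold and the finding names are assumptions for illustration, not taken from the cited briefing paper.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DataAsset:
    name: str
    location: str
    owner: Optional[str]  # None means no accountable owner is recorded
    last_accessed: date


def audit_register(
    assets: list[DataAsset], as_of: date, stale_days: int = 365
) -> dict[str, list[str]]:
    # Group asset names by the audit finding they raise.
    findings: dict[str, list[str]] = {"stale": [], "no_owner": []}
    for asset in assets:
        if (as_of - asset.last_accessed).days > stale_days:
            findings["stale"].append(asset.name)
        if asset.owner is None:
            findings["no_owner"].append(asset.name)
    return findings
```

Stale, unowned assets are natural candidates for review: they may be redundant and removable, or they may hold value that is not being reused.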
Figure: Complex Big Data (Big Data Security, Big Data Accessibility, Big Data Quality, Big Data Understanding)
Managing Big Data Through Internal Audit
Most companies collect large volumes of data, but they don't have comprehensive approaches for
centralizing the information. Internal audit can help companies manage big data by streamlining and
collating data effectively.
The following are big data issues that internal audit can help mitigate:
Maintaining effective data security is increasingly recognized as a critical risk area for organizations. Loss of
control over data security can have severe ramifications for an organization, including regulatory penalties,
loss of reputation, and damage to business operations and profitability. Auditing can help organizations
secure and control data collected.
Giving access to big data to the right person at the right time is another challenge organizations face.
Segregation of duties (SoD) is an important aspect that internal audit (IA) can check.
The more data one accumulates, the harder it is to keep everything consistent and correct. Internal audit
can check the quality of big data.
Understanding and interpreting big data remains one of the primary concerns for many organizations.
Auditors can help simplify an organization's data so that it is easier to understand and interpret.
Source:http://www.acl.com/pdfs/wp_AA_Best_Practices.pdf, http://smartdatacollective.com/brett-stupakevich/48184/4-biggest-problems-big-data
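A basic segregation-of-duties test compares each user's assigned roles against a list of role combinations that one person should not hold. The sketch below shows that check in Python; the role names and conflict pairs are hypothetical placeholders, since real conflict matrices are specific to each organization and its systems.

```python
# Hypothetical pairs of roles that one person should not hold together.
CONFLICTING_ROLES = {
    frozenset({"create_vendor", "approve_payment"}),
    frozenset({"modify_data", "approve_data_change"}),
}


def sod_conflicts(user_roles: dict[str, set[str]]) -> dict[str, list[tuple[str, str]]]:
    # For each user, list every conflicting role pair fully contained
    # in that user's role set; users without conflicts are omitted.
    report = {}
    for user, roles in user_roles.items():
        hits = [tuple(sorted(pair)) for pair in CONFLICTING_ROLES if pair <= roles]
        if hits:
            report[user] = sorted(hits)
    return report
```

Run against an export of user-role assignments, the report lists only the users an auditor needs to follow up on, together with the specific role pairs involved.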