Date post: | 26-Dec-2015 |
Upload: | martin-armstrong |
Understanding Big Data
By: Paul Kenosky
What will be covered!
Big Data
Define Big Data
Challenges
Increase in Technology
Characteristics of Big Data
Fraud Detection
Social Media
Hadoop
BigInsights
Define
Understanding Big Data: Big data applies to information that can't be processed or analyzed using traditional processes or tools.
Wikipedia: Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
Challenges
Businesses face big data challenges more and more in today's world
They are overloaded with information that can be beneficial to the organization
However, they do not know how to make use of the raw and unstructured data
Big Data Technology
Interconnectivity: More and more systems, people, and technology are becoming interconnected
Inexpensive: Integrated circuits are continually becoming cheaper to produce and buy
This allows intelligence to be added to many devices that once seemed too costly
Big Data on the rise!
Example: Railway cars have hundreds of sensors. Sensors can track things such as conditions experienced by the rail car, the state of individual parts, and GPS-based data for shipment
With the rise of technology these rail cars are becoming more advanced, and sensors are added to collect data on parts that are prone to wear so they can be replaced before they fail
Data is also stored about the rails, railroad crossing sensors, weather patterns that cause rail movements, cargo location, and cargo arrival and departure times
Processing all this data using a traditional relational system would be impractical if not impossible
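To make the rail car example concrete, here is a minimal Python sketch of the "replace before it fails" idea. Everything in it is an assumption made for illustration: the SensorReading fields, the 0.0 to 1.0 wear scale, and the 0.8 threshold do not come from the slides, and a real system would process far more data than fits in one list.

```python
from dataclasses import dataclass


@dataclass
class SensorReading:
    car_id: str   # hypothetical rail car identifier
    part: str     # hypothetical part name, e.g. "wheel_bearing"
    wear: float   # hypothetical wear level: 0.0 (new) to 1.0 (failed)


def parts_to_replace(readings, threshold=0.8):
    """Return (car_id, part) pairs whose latest wear reading meets the threshold."""
    latest = {}
    for r in readings:
        # Later readings for the same part overwrite earlier ones.
        latest[(r.car_id, r.part)] = r.wear
    return sorted(key for key, wear in latest.items() if wear >= threshold)


readings = [
    SensorReading("car-17", "wheel_bearing", 0.55),
    SensorReading("car-17", "brake_pad", 0.83),
    SensorReading("car-17", "wheel_bearing", 0.91),
    SensorReading("car-42", "coupler", 0.20),
]
print(parts_to_replace(readings))
```

With the sample readings, both parts on car-17 are flagged because their most recent wear values are at or above the assumed threshold.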
Characteristics of Big data
Volume: The amount of data being stored today is increasing at an overwhelming rate
Booking a flight, posting to Facebook, sending a text, and more
Variety: Represents all types of data
Velocity: How quickly data is arriving, stored, and analyzed
Fraud Detection Background
Transactions
Online auctions, insurance claims
A big data platform can present opportunities to increase detection success
Patterns of fraud can come and go in hours, days, or weeks.
If fraud detection has high latency, the damage is already done by the time the fraud is discovered
Fraud Detection Questions
An estimated 20% of the available information that could be useful for fraud detection is being used
Why not load the other 80 percent of data into the traditional analytic warehouse?
Too expensive
Would it not pay for itself?
How can we be sure this new information will be valuable before making a costly business decision?
Use BigInsights to provide an elastic and cost-effective repository to establish which of the remaining 80 percent of the information is useful for fraud modeling.
Fraud Detection
IBM teamed up with a large credit card issuer to improve their fraud detection model.
They discovered they could improve the speed of detection and have more accurate results using the new model
A process that once took three weeks was improved to just a few hours.
They also found that about half of the unused 80% was actually beneficial information that could be used
The Social Media Pattern
Organizations can use the Big Data social media usage pattern to find out what is being said about the company and its competitors
This information can be used to significantly improve decision making
IBM has built a solution called Cognos Consumer Insights (CCI) to accelerate an organization's use of this pattern
CCI allows an organization to see what people are saying, how topics are trending in social media, and all sorts of things that affect the organization
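As a rough illustration of seeing how topics are trending, the Python sketch below counts how many posts mention each term on a watch list. It is a toy under stated assumptions: the function name trending_terms, the sample posts, and the watch terms are all hypothetical, and it says nothing about how CCI itself works.

```python
import re
from collections import Counter


def trending_terms(posts, watch_terms):
    """Count how many posts mention each watched term, most mentioned first."""
    counts = Counter()
    for post in posts:
        # Lowercase and split into words so "Packaging," matches "packaging".
        words = set(re.findall(r"[a-z']+", post.lower()))
        for term in watch_terms:
            if term in words:
                counts[term] += 1
    return counts.most_common()


# Hypothetical sample posts and watch list.
posts = [
    "The new packaging is awful",
    "Love the price, hate the packaging",
    "Delivery was late again",
]
watch_terms = ["packaging", "price", "delivery"]
print(trending_terms(posts, watch_terms))
```

Counting mentions only shows what is being said and how often, not the reasons behind it.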
Why are they unhappy with my company?
Although you can find out what people are saying, a more important question is why they are saying it and behaving this way
An organization needs to look beyond the social media data to answer that question
Sales, promotions, loyalty programs, merchandising mix, competitor actions, and even weather can come into play.
Example
A company introduced a different kind of packaging for one of its products.
Customers were giving negative feedback on the new packaging
Months later the company discovered the problem and switched the packaging to an eco-friendly package.
This in turn increased sales and customer happiness
Example 2
One of the authors of the book is a prolific Facebook poster
Traveling on airlines is essential to his job, and after a number of flight delays he posted his frustration with the airline on his Facebook wall
The airline found these posts on his Facebook wall and contacted him
Although the book doesn't mention what the airline did to compensate or fix the problem, it does show one thing: the company was listening
Hadoop
Hadoop is a top-level Apache project and is open source
It is designed to scan through large data sets and produce its results through a highly scalable, distributed batch processing system
Data is redundantly stored in multiple places across clusters
The programming model is built to expect failures, and it will automatically resolve them by running portions of the program on various servers.
Hardware components might fail, but due to the redundancy Hadoop can provide fault tolerance
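Hadoop's batch programming model is MapReduce, which the slides do not name. The Python sketch below imitates its three steps (map, shuffle, reduce) with a word count, all inside a single process. It is an illustration only and does not use any Hadoop API: the function names and input chunks are hypothetical, and a real cluster would run the map and reduce steps on different servers and rerun any portion that fails.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key to get the final count per word."""
    return {key: sum(values) for key, values in grouped.items()}


# Each string stands in for a block of data stored on a different server.
chunks = ["big data big insights", "data sets so large", "big clusters"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle(mapped))
print(word_counts["big"], word_counts["data"])
```

Because each chunk is mapped independently, a failed map task can be rerun on another copy of the same data without affecting the rest of the job.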
InfoSphere BigInsights
Hadoop can be complex to install, configure, and administer
IBM takes this complexity away with the BigInsights installer
BigInsights makes it simpler for people to use Hadoop and build big data applications.
It enhances this open source technology to withstand the demands of your enterprise, adding administrative, discovery, development, provisioning, and security features, along with best-in-class analytical capabilities from IBM Research.
The result is that you get a more developed and user-friendly solution for complex, large scale analytics.
Special Thanks to
http://www-01.ibm.com/software/data/infosphere/biginsights/index.html
http://en.wikipedia.org/wiki/Big_data
http://www.decalsplanet.com/item-10485-black-pot-of-gold.html
http://drshocker.blogspot.com/2007_03_01_archive.html
http://www.mytinyphone.com/wallpaper/31448/
https://www.facepunch.com/showthread.php?t=1332655
What Big Data Says About You
Short YouTube video that explains Big Data
Some interesting stories the speaker went over
Extra Story
Bats flying around airports
Noise was produced and airports filtered this noise out
Weather patterns
Airplane movement
15 years later scientists got together
Collecting data on bat migration
Throwing this data away
One man's garbage is another man's treasure
Extra Story
Gates Foundation
Eradicate polio in Nigeria
Satellite maps
Found villages no one knew of
Government did not know these people were there
No maps showed these villages
Gates gave out GPS phones to polio eradication workers
Combining satellites, vaccines, and cell phones is not something that comes to mind when thinking of big data
Problems are caused by misinformation or by getting the information too late
Special Thanks
http://motherboard.vice.com/blog/big-data-explained-brilliantly-in-one-short-video
http://www.netanimations.net/Moving-vampire-bat-and-Dracula-blood-sucking-animations.htm
http://www.nbcnews.com/id/37086846#.Uxd7-YXpbYg
Any Questions?