Write Up Raka Bigdata Analytic Telco

Date post: 14-Apr-2018
    Warming Up

This document is an attempt at summarizing the technologies that play a part in the (big data) analytics ecosystem. I put the words big data between parentheses because I will be focusing on the analytics aspect.

Big data is a relatively new (industry) term, but not necessarily a new concept. So if we limit our understanding of big data only to the volume, maybe we're missing the bigger picture (because big is relative; our civilization has been dealing with an increasingly bigger volume of data. What was big 10 years ago is quite likely not so big today). Just a heads up: from this point on, the words big data sometimes refer to techniques & tools for dealing with big data.

People coined these Vs (Volume, Velocity, Variety) as a way to characterize big data. I think talking about Variety and Velocity is a good / easier way to start a discussion about big data (especially with people coming from a strong database administration / data warehousing background). At times I need to convince (or doubt) myself about anything, including big data. I found it easier to convince myself about big data if I start from Variety and Velocity.

Why Variety? Because people (as users of technologies) can easily appreciate the fact that data are coming from more and more varied sources, thanks to cheaper sensors, mobility, and hyperconnectivity. Everybody with a cellphone generates data now. Every single activity they do, on various online services, through their cellphone generates the so-called data exhaust. Consequently, we are dealing with a variety of formats of data, from structured to unstructured1.

At this point I'm skeptical, or wondering: how does big data play a unique role in this situation? We already have and perform Extract Transform Load (ETL; common to data warehousing practitioners) to address that challenge. Right, so maybe we should look at the other aspect to support this argument: velocity. Why? Because an ETL process implies non-realtime analytics; the data is not processed just-in-time; it has to go through the transformation stage before it is loaded to the data warehouse, where eventually the data is picked up to be analyzed.

Actually there are several articles about data warehousing in the advent of big data. I was trying to understand whether they are competing, or complementary to each other. I admit I haven't read through those articles, so I'll just put their links here for now; one from O'Reilly2, and the other one from Teradata3.

People are talking about soft real-time analysis of data (or events). There's a phrase for that: Complex Event Processing (CEP) platform, which basically is a platform that enables us to observe a moving window of events (taken from a continuous stream of data), and do time-series analysis such as pattern-matching on that window4. One open-source product for CEP that I know is JBoss Drools Fusion5.
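To make the moving-window idea concrete, here is a minimal sketch (in Python, not in a CEP product) of one such rule: keep a sliding time window over a stream of timestamped events and fire when a pattern is matched. The event shape, the "three dropped calls from one subscriber within 60 seconds" rule, and the thresholds are my own illustration, not taken from Drools Fusion.

```python
from collections import deque

def detect_bursts(events, window_seconds=60, threshold=3):
    """Scan a time-ordered stream of (timestamp, subscriber, kind) events
    and yield (timestamp, subscriber) whenever one subscriber produces
    `threshold` events of kind 'dropped_call' inside the sliding window."""
    window = deque()  # dropped-call events still inside the time window
    for ts, subscriber, kind in events:
        if kind != "dropped_call":
            continue
        window.append((ts, subscriber))
        # evict events that have fallen out of the window
        while window and ts - window[0][0] > window_seconds:
            window.popleft()
        hits = sum(1 for _, s in window if s == subscriber)
        if hits >= threshold:
            yield ts, subscriber

stream = [
    (0, "A", "dropped_call"),
    (10, "B", "sms"),
    (20, "A", "dropped_call"),
    (45, "A", "dropped_call"),   # third drop for A within 60 s -> alert
    (200, "A", "dropped_call"),  # window has moved on, no alert
]
alerts = list(detect_bursts(stream))
```

A real CEP engine adds a rule language, many concurrent windows, and out-of-order event handling on top of exactly this core loop.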

There's a strong link between Velocity and Variety, provided by Value6. Fusing data from a broad range of sources (variety) opens up the risk of having low-value data, data with a low signal-to-noise ratio.

1 http://www.finextra.com/community/FullBlog.aspx?blogid=6129
2 http://strata.oreilly.com/2011/01/data-warehouse-big-data.html
3 http://www.teradata.com/white-papers/Hadoop-and-the-Data-Warehouse-When-to-Use-Which/?type=WP
4 http://bit.ly/xrLyV1
5 http://www.jboss.org/drools/drools-fusion.html
6 http://www.finextra.com/community/fullblog.aspx?blogid=6222


Example: facebook / twitter. The need for speed here is to find high-value data among piles of low-value data.

At this point maybe I should stop for a while trying to convince myself about big data. The practical insights I gather along the way will clear things up, or provide me with better / bigger / more fundamental questions. Either of them is beneficial. For now I'll just take it as a fact that big data is important, it's here, and it's part of a continuum of tools we need to employ to bring up intelligence.

I'll switch back to analytics, but before that I would like to do a round-up of the big data tools I've found so far. The following is a summary of tools I'm currently learning to use. They're described in my own words, based on my current understanding, so it's not comprehensive and may contain inaccuracies.

Hadoop — Provides a programming framework that implements MapReduce, and a platform for its execution.

It also provides a distributed filesystem smart enough to ensure minimum data motion7 during the parallel processing of a (big) data set spread across a cluster of machines.

It splits the dataset such that the task assigned a subset of data executes on the machine where that subset is located (locally or nearby); the Map phase. Then it coordinates the aggregation of computation results collected from the task nodes (during the Reduce phase).

It is still not clear to me how the process affinity is achieved8, but it looks to me like it's through configuration of our Hadoop cluster, specifically optimized for the application we're working on (knowing the nature of data distribution, the pipeline of data processing, etc.).
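The Map/Reduce split described above can be sketched in a few lines of plain Python (no Hadoop involved): a map function emits key/value pairs from each input record, a shuffle groups them by key, and a reduce function aggregates each group. Word count is the canonical example; what Hadoop adds around this shape is distribution, data locality, and fault tolerance.

```python
from collections import defaultdict

def map_phase(record):
    # emit (word, 1) for every word in one input line
    for word in record.split():
        yield word.lower(), 1

def shuffle(pairs):
    # group values by key, as the framework does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # aggregate one key's values into a single result
    return key, sum(values)

lines = ["Big data is big", "data exhaust"]
pairs = [kv for line in lines for kv in map_phase(line)]
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
```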

Apache Pig — While Hadoop provides the programming library for implementing the map and reduce tasks, the Apache Pig project raises the level of abstraction, allowing people to specify those operations in an SQL-like syntax. It brings productivity / efficiency to the project. Yahoo is said to have 40% to 60% of its Hadoop workloads implemented in Pig scripts9.
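To see what that abstraction buys, consider computing an average of some measure per group — in Pig Latin this is roughly a GROUP ... BY followed by a FOREACH ... GENERATE with AVG. Here is the same aggregation hand-rolled in Python; the relation and field names (subscriber, call duration) are invented for illustration:

```python
from collections import defaultdict

# (subscriber, call_duration_seconds) records, as a Pig relation would hold
calls = [("A", 120), ("B", 30), ("A", 60), ("B", 90), ("B", 60)]

# Pig-Latin-style intent:
#   grouped = GROUP calls BY subscriber;
#   result  = FOREACH grouped GENERATE group, AVG(calls.duration);
totals = defaultdict(list)
for subscriber, duration in calls:
    totals[subscriber].append(duration)
avg_duration = {s: sum(ds) / len(ds) for s, ds in totals.items()}
```

The point of Pig is that the two declarative lines in the comment replace this hand-written grouping loop, and compile down to MapReduce jobs.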

Apache Mahout — A collection of machine-learning algorithms (e.g.: clustering, naive bayes, covering, linear modeling, etc.), ready to use, for execution over a Hadoop cluster.

A big relief; thanks to Mahout we can save project time otherwise spent making those standard machine-learning algorithms parallelizable, using the MapReduce approach, specifically for execution in a Hadoop cluster.
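As an illustration of the kind of algorithm Mahout ships pre-parallelized, here is a deliberately tiny, single-machine k-means (the clustering mentioned above) on one-dimensional data. Mahout's value is running this at scale over a Hadoop cluster, not the algorithm itself; the data and starting centers below are made up.

```python
def kmeans_1d(points, centers, iterations=10):
    """Toy k-means on 1-D data: assign each point to the nearest
    center, then move each center to the mean of its points."""
    for _ in range(iterations):
        clusters = {c: [] for c in centers}
        for p in points:
            nearest = min(centers, key=lambda c: abs(c - p))
            clusters[nearest].append(p)
        # a center with no points keeps its position
        centers = [sum(ps) / len(ps) if ps else c
                   for c, ps in clusters.items()]
    return sorted(centers)

# two obvious groups, around 2 and around 11
centers = kmeans_1d([1, 2, 3, 10, 11, 12], centers=[0.0, 6.0])
```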

Drools Fusion — A platform & programming libraries for complex event processing. Basically we define some rules where we specify the way we correlate an event with past event(s), and draw a conclusion based on that. Based on that conclusion, an action is taken (which we also have to specify / program). Obviously the rules will have to be discovered beforehand (e.g.: by applying machine-learning techniques over historical records).

7 http://hortonworks.com/blog/apache-hadoop-yarn-background-and-an-overview/
8 http://www.plexxi.com/wp-content/uploads/2013/03/Plexxi-Use-Case-Hadoop-March-2013.pdf
9 http://www.ibm.com/developerworks/library/l-apachepigdataquery/

MongoDB — One of many NoSQL databases available in the market. NoSQL is a catch-all term for non-relational databases (which do not use SQL as the query language). My understanding at this moment is that NoSQL databases excel at scalability, by sacrificing the ACID (Atomicity, Consistency, Isolation, and Durability) properties we can expect from an RDBMS.

MongoDB is a document-oriented database10. Meaning (according to Wikipedia): designed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. So, I guess if what we're capturing & storing looks like documents (e.g.: forms, receipts, articles, etc.), probably we should consider MongoDB.

The articles / book I've read on MongoDB gave me the impression that data modeling for MongoDB does not put much emphasis on normalizing our data (unlike data modeling for an RDBMS, where normalization is a cornerstone of achieving consistency). Document-oriented databases seem to focus on speed of retrieval, and horizontal scaling (a denormalized database is easier to spread across machines).
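A quick sketch of what "not normalizing" looks like in practice. In an RDBMS we would split a customer and her phone lines into two tables joined by a key; in a document store the natural shape is one denormalized document with the lines embedded. Plain Python dicts stand in for BSON documents here (the customer and line fields are invented); with the real MongoDB driver you would hand such a dict to an insert call.

```python
# Relational-style: two "tables", joined on customer_id at query time
customers = [{"customer_id": 1, "name": "Ana"}]
lines = [
    {"customer_id": 1, "msisdn": "555-0101", "plan": "prepaid"},
    {"customer_id": 1, "msisdn": "555-0102", "plan": "3G unlimited"},
]

# Document-style: one self-contained, denormalized document —
# the shape you would store in a MongoDB collection.
customer_doc = {
    "customer_id": 1,
    "name": "Ana",
    "lines": [
        {"msisdn": "555-0101", "plan": "prepaid"},
        {"msisdn": "555-0102", "plan": "3G unlimited"},
    ],
}

# Retrieval needs no join: everything arrives in one read.
plans = [line["plan"] for line in customer_doc["lines"]]
```

The trade-off is duplication: if a plan name changes, every embedding document must be updated, which is exactly the consistency work normalization would have avoided.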

Others in the NoSQL landscape — Existing approaches in the NoSQL landscape can be categorized as:

1. Key/Value stores
2. Column-oriented stores
3. Document-oriented stores
4. Object databases
5. Graph databases

I believe our choice should be based on the structure (or lack of it) of the data we're dealing with, the kind of questions we want answered, and obviously the application requirements.

A common cliche: NoSQL databases are not the right choice for financial systems. It is a sweeping statement, as if a financial system were one big single application. Maybe it should be rephrased as: NoSQL, for its lack of support for transactions, should not be used in applications where we have to support use-cases like transferring money between accounts. For other applications, such as capturing tick data (deals, transfer events, stock prices, etc.), NoSQL databases have their place in financial systems11.

Finally, I'd like to refer to a book I might recommend later, one that can help us better understand the nature of several popular NoSQL databases:

    10 http://en.wikipedia.org/wiki/Document-oriented_database

    11 http://www.10gen.com/post/45116404296/how-banks-use-mongodb-as-a-tick-database


    Seven Databases in Seven Weeks: A Guide to Modern Databases and

    the NoSQL Movement12.

I think that's quite a solid toolset for big data analytics. I wouldn't spend much more time on philosophising; not until I get more insight from using those tools to crack some problems. What problems? Well, one good place to start is Kaggle.com, where we can get some problems (accompanied by datasets), and compete.

12 http://www.amazon.com/Seven-Databases-Weeks-Movement-ebook/dp/B00AYQNR50/ref=tmm_kin_title_0?ie=UTF8&qid=1366862920&sr=8-1


    Wandering

Among the many challenges that CSPs (Communications Service Providers) are facing these days, we will have a look at two of them, namely customer retention and dynamic pricing. How did they end up there? Several factors:

    1. Saturated market.

    2. Number portability.

    3. Inability of CSPs to come up with killer-app(s).

Factors number 1 and 2 lead to churn prevention. Factor number 3 leads to dynamic pricing. It's only logical: with not much room left for expansion, maintaining the customer base becomes a basic necessity for survival. With number portability, the situation is even worse for the CSP; a subscriber can switch to another CSP without worrying about losing her current number.

On to point #3, no killer app. CSPs have been complaining about decreasing ARPU (Average Revenue Per User). They can't just hike the price now (after they sacrificed it for the sake of expansion). So they look for revenue generation from VAS (Value-Added Services). The following screenshots show the VAS offered by Telcel.


I personally don't find those services valuable enough for me to pay for; I never use them. Take, in the last picture, Identificador de Llamadas (description: conoce el número de quien te llama y/o el nombre si está en la agenda de tu equipo / know the number of the calling person, and / or the name if that person is registered in the contact list of your device). I'm not sure how that is a VAS. Are they charging me for having that service? I certainly hope not! I think I can accept Banca Móvil as a valuable service. But then again, banks (such as Bancomer) already offer an app for Android & iOS to do just that, and I'm pretty sure it has nothing to do with Telcel, because the app uses internet technologies all the way (meaning: no income for Telcel from usage of the Bancomer-provided app).

I guess the contribution of VAS to the revenue of CSPs is not significant enough to compensate for the decreasing ARPU. Here is another common complaint by CSPs: they act only as a pipe, without benefiting enough from their investment in building the infrastructure. The one that benefits from the traffic generated by video browsing on YouTube is... Google, mainly. For photo uploads on Instagram... Instagram, mainly. The same goes for Facebook, Twitter, etc.13

Therefore, they have started experimenting with online charging schemes that take traffic into account. Something nicer than a simple bytes-to-cents function, in order to avoid backlash from subscribers. To me, being nice here means: at least there would be a notification when certain browsing activity can incur additional charges, offering the user a decision on the next action (proceed with extra payment, or cancel); minimizing surprises (and complaints). More on that after the box below.

Clarification about the contribution and trend of VAS in the revenue generation of CSPs: this article is not scientific; it's only a summary of what I gathered from various sources. I tried to find something to back up my hunch that VAS will die out. It turns out the situation differs from market to market, and even from one segment to another in the same market.

In Telcel's case for example, I don't know how they label me in their system :) but certainly not as low-end. I have a smartphone with an unlimited 3G plan. Obviously I use the internet for everything. I have no idea what percentage of Telcel subscribers still use low-end cellphones, and what percentage of them actually use the VAS displayed above.

So I googled for VAS revenue decline and VAS revenue increase (and similar queries). I even googled for Telcel Earning Report (found nothing). I found conflicting results. In India, more than 33% of the total revenue of Tata Docomo comes from VAS14. I'd like to think this result draws from the skills and experience of NTT in VAS (43% of whose revenue reportedly came from VAS in 201015). So, NTT knows how to do it, very well. Part of the strategy, I guess, is providing incentives for innovation16.

13 Although that can change if CSPs open their platform, giving access to software developers to build / enhance applications using telecommunication services provided by the CSP. For example, by providing an API like Twilio (http://www.twilio.com) or BlueVia (http://www.bluevia.com)

14 http://www.wirelessduniya.com/2011/11/11/tata-docomo-vas-revenue-surpasses-33-of-its-entire-revenue/
15 http://www.voicendata.com/voice-data/news/166226/data-services-revenue-exceed-usd330-bn-2013

16 http://www.thehindubusinessline.com/todays-paper/tp-info-tech/ntt-docomo-laments-limited-valueadd-in-india/article1060116.ece


    It's very similar to revenue share developers get from Android Store & Apple App Store17.

    Back on Track

    Now I will drive the essay back to (big data) analytics. To recap, the situation provides motives for the

    following use-cases:

    1. Churn prevention.

    2. Dynamic pricing.

    They are only two of a dozen other use-cases, listed in the following diagram18.

The line of thinking I used in the essay is the usual start-from-goal. Specifically, the approach will be in this order:

1. What questions need / want to be answered? (one end)
2. What data do we have? (the other end)
3. Techniques? (fill in the blank in between)

In the case of churn modeling, what are the questions that are normally / possibly asked by CSPs (the ones that lead them to building their churn model)? My lame attempt at mimicking Shakespeare: What are the questions? That is the question. I googled around, found nothing concrete enough to my liking, so I have to resort to writing something that sounds logical to me. So..., basically, based (mainly) on historical data we want to find out how certain actions, events, and / or conditions lead up to customer churn. Actions can be a price change or a marketing campaign, events can be dropped calls or congestion, and conditions are basically attributes related to the customers.

17 http://www.techrepublic.com/blog/app-builder/app-store-fees-percentages-and-payouts-what-developers-need-to-know/1205
18 http://www.intracom-svyaz.com/download/eng_pdf/bigdata/BigStreamer.pdf
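To make that concrete, here is a small sketch of turning raw event records (the kind of thing OSS/BSS systems log) into a per-customer feature row that a churn model could consume. The event kinds and feature names are illustrative, not taken from any specific system.

```python
from collections import Counter

def build_features(events):
    """events: list of (customer_id, event_kind) records from OSS/BSS logs.
    Returns one feature dict per customer: counts of churn-relevant events."""
    kinds = ("dropped_call", "complaint", "price_change")
    counts = Counter(events)  # (customer, kind) -> number of occurrences
    customers = {c for c, _ in events}
    return {
        c: {kind: counts[(c, kind)] for kind in kinds}
        for c in sorted(customers)
    }

log = [
    ("cust1", "dropped_call"),
    ("cust1", "dropped_call"),
    ("cust1", "complaint"),
    ("cust2", "price_change"),
]
features = build_features(log)
```

In a real pipeline these counts would be computed per time window (e.g. "in the last 3 months", as in the OCDM attributes discussed below) and joined with customer attributes from the BSS.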

All that information is gathered from sources like the OSS (Operational Support System) and the BSS (Business Support System).

    The diagram is copied from http://ossline.typepad.com/.a/6a0105359f53d8970c0147e06562af970b-pi

From the OSS we get network-related information (things like dropped calls are gathered there). From the BSS we get business-related information (customer details, billed amounts, and contracts are all in that area). For more information about OSS/BSS, this page is a good start:

http://www.quora.com/How-would-you-explain-OSS-and-BSS-to-a-layman

Now, let's get really practical. Oracle, for example, has an offering called Oracle Communications Data Model19 (part of its BSS offerings). As the name implies, it models the entities needed for the business operation of a CSP. One of the components of OCDM is a data-mining model. The following screenshot of Oracle's online documentation can give you an idea of what it is:

19 http://www.oracle.com/us/products/applications/communications/industry-analytics/data-model/overview/index.html


From that we can glimpse features relevant for churn modeling20. This general idea can be transferred to situations where we don't use Oracle's product, and have to build it by hand. Here are some of them:

Customer id, Target column of churn model, Number of future contract count in last 3 months, Subscription count in last 3 months, Suspension count in last 3 months, Contract count in last 3 months, Complaint count in last 3 months, Complaint call count to call center in last 3 months, Complaint call count to call center in the life time, Contract left days in last 3 months, Account left value in last 3 months, Remaining contract sum in last 3 months, Debt total in last 3 months, Loyalty program balance in last 3 months, Total payment revenue in last 3 months, Monthly revenue (ARPU) in last 3 months, Contract ARPU amount in last 3 months, Party type code (individual or organizational), Business legal status, Marital status for individual user, Household size, Job code, Nationality code, ..., For how long the billing address has been in effect, in days, ...

Of course those attributes in OCDM are tied to the algorithms used in the product for generating prediction models. Some of the dimensions might not be relevant for our specific case. As in any data analysis activity, we have to apply some algorithm to select the relevant dimensions to base our analysis on (http://en.wikipedia.org/wiki/Feature_selection). But at least all those features in OCDM can give us a starting point.
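As a toy example of such feature selection, here is a simple filter method: score each candidate feature by the absolute correlation of its values with the churn label, and keep the top-k. Real pipelines would use something more robust (chi-squared, mutual information, wrapper methods), and the feature names and data below are made up.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(rows, labels, k=1):
    """rows: list of dicts (feature name -> numeric value); labels: 1 = churned.
    Keep the k features most correlated (in absolute value) with churn."""
    scores = {
        name: abs(pearson([r[name] for r in rows], labels))
        for name in rows[0]
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

rows = [
    {"complaint_count": 5, "household_size": 3},
    {"complaint_count": 4, "household_size": 1},
    {"complaint_count": 0, "household_size": 2},
    {"complaint_count": 1, "household_size": 4},
]
labels = [1, 1, 0, 0]  # first two customers churned
best = select_features(rows, labels, k=1)
```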

From there we can work backward to the parts of the system where that data can be obtained (e.g.: the billing and charging system, CRM, etc.). Let's see another product, BSCS iX (a billing and charging solution now offered by Ericsson), just to get a glimpse of what is out there. The screenshot shows a table where invoices are stored.

    20 http://docs.oracle.com/cd/E11882_01/doc.112/e15886/data_mining_cdm.htm#autoId4


Churn modeling itself is quite a feat. There's a good book on the matter, by Rob Mattison, given away for free, and also available in Google Books: http://books.google.com.mx/books/about/The_Telco_Churn_Management_Handbook.html?id=M_uuQx7vMngC&redir_esc=y . And of course, exercise.

Teradata, in collaboration with Duke University and several CSPs, held a contest in 2003 on churn modeling. From the tournament's site at http://www.fuqua.duke.edu/centers/ccrm/ , we can download the problem description and dataset, among other materials.

Wrapping up, a few words about dynamic pricing. I was wondering how it is related to big data analytics. It turns out dynamic pricing is a matter of yield management21, which can benefit from analytics, specifically forecasting techniques.

    21 http://en.wikipedia.org/wiki/Yield_management


Here are some scenarios of dynamic pricing, which is basically an attempt at offering bandwidth to the customer at an attractive price, especially when the network conditions allow doing so. This has something to do with giving flexibility to the subscriber; not forcing them to pick between subscribing to unlimited capacity (at a higher cost) or staying with limited bandwidth (barring them from using more services when needed). All in all, this can draw more people into subscribing to / staying with the CSP. These diagrams are copied from a whitepaper by Tango Telecom, Beyond Policy, a New Era in Real-Time Charging22.

This is an example of the case for real-time analytics over a stream of data, coming in from the network elements (OSS), fed into a forecasting model (built from historical data), to figure out, for example, whether the traffic is low (so that offering a turbo boost to a customer wouldn't affect the rest), etc. For more use-cases in this area, the following whitepaper from Telcordia is a good source: Applying Yield Management in the Mobile Broadband Market23.
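A minimal sketch of that decision loop: a naive moving-average forecast of cell load, and a rule that triggers a discounted turbo-boost offer only when the forecast load is below a threshold. The threshold, window size, and prices are invented for illustration; a real system would use a proper forecasting model and apply the price through the online charging system.

```python
def forecast_load(history, window=3):
    """Naive forecast: moving average of the last `window` load samples
    (fraction of cell capacity in use, 0.0 - 1.0)."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def turbo_boost_offer(history, threshold=0.5,
                      normal_price=2.0, discount_price=0.8):
    """Offer the boost cheaply when spare capacity is forecast,
    otherwise quote the normal price."""
    if forecast_load(history) < threshold:
        return {"offer": "turbo_boost", "price": discount_price}
    return {"offer": "turbo_boost", "price": normal_price}

quiet_night = [0.35, 0.30, 0.25]   # low, falling cell load
busy_evening = [0.70, 0.85, 0.90]  # congested cell

offer_a = turbo_boost_offer(quiet_night)
offer_b = turbo_boost_offer(busy_evening)
```

The yield-management flavor comes from the pricing rule reacting to forecast spare capacity, exactly as airlines price unsold seats.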

    22 http://www.tango.ie/dynamic-marketplace.html

    23 http://bit.ly/ZXdXPV (www.telecomtv.com/DocSend.aspx?fileid...9f55...yield-management...)
