Committed to Deliver…
We are leaders in the Hadoop ecosystem. We support, maintain, monitor and provide services over
Hadoop, whether you run Apache Hadoop, the Facebook distribution or the Cloudera distribution in your own data center, or on a cluster of machines on Amazon EC2, Rackspace, etc.
We provide a scalable end-to-end solution: a solution that can scale to large data sets (terabytes or petabytes).
Low-cost solution: based on the open-source framework currently used by Google, Yahoo and Facebook.
Solutions optimized for minimum SLA and maximum performance.
– Project Initiation: project planning, requirement collection, POC using Hadoop technology
– Team Building: highly skilled Hadoop experts; a dedicated team for the project
– Agile Methodology: small iterations; easy to implement changing requirements
– Support: long-term relationship to support the developed product; scope to change based on business/technical needs
Our combined experience has led to the adoption of a unique methodology that ensures quality work. We:
– Evaluate the available hardware and understand the client's requirements.
– Peek through the data: analyze it, prototype with M/R code, and show the results to our clients. Iterative, continuous improvement develops a better understanding of the data.
– Develop the various tasks in parallel:
◦ Data collection
◦ Data storage in HDFS
◦ M/R analytics jobs
◦ A scheduler to run the M/R jobs and coordinate them
◦ Transforming the output into OLAP cubes (dimension and fact tables)
◦ A custom interface to retrieve the M/R output
We are experts in time-series data; in other words, we receive time-stamped data.
We have ample experience writing efficient, fast and robust Map/Reduce code that implements ETL functions.
We have hardened Hadoop to enterprise standards, providing features such as high availability, data collection and data merging.
Writing Map/Reduce is not enough: we wrote layers on top of Hadoop that use Hive and Pig to transform data into OLAP cubes for easy UI consumption.
Below is a brief overview of our clients.
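The cube-building layer described above can be sketched locally. This is a hedged illustration, not project code: flat Map/Reduce output rows are folded into a tiny dimension table plus a fact table, and the field names (day, site, amount) are assumptions for the example.

```python
from collections import defaultdict

def build_cube(rows):
    """Illustrative sketch: rows are (day, site, amount) tuples from an
    M/R job; build a site dimension table and a (day, site) fact table."""
    # Dimension table: assign a surrogate key to each distinct site.
    site_dim = {}
    for _, site, _ in rows:
        site_dim.setdefault(site, len(site_dim) + 1)

    # Fact table: aggregate the amount per (day, site-key) cell.
    facts = defaultdict(float)
    for day, site, amount in rows:
        facts[(day, site_dim[site])] += amount
    return site_dim, dict(facts)

rows = [
    ("2011-01-01", "web", 100.0),
    ("2011-01-01", "mobile", 40.0),
    ("2011-01-02", "web", 60.0),
    ("2011-01-01", "web", 25.0),
]
site_dim, facts = build_cube(rows)
```

In the real pipeline this aggregation runs as Hive or Pig queries over the M/R output; the shape of the result (dimension keys plus aggregated facts) is the same.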
[Diagram: an external news collector pumps data into the Hadoop cluster; Map/Reduce jobs perform filtering and term-frequency collection, build the training set (training data), and run categorization and indexing; results reach the Web UI display via a Thrift service, a DFS client and a Hive interface.]
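The filtering and term-frequency-collection stage of this pipeline can be simulated outside Hadoop. The sketch below is illustrative: `map_doc` and `reduce_counts` mirror a Hadoop mapper and reducer, and the function names and stop-word list are assumptions, not project code.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list for the filtering step.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in"}

def map_doc(doc_id, text):
    """Mapper: emit ((term, doc_id), 1) for each non-stop-word term."""
    for term in re.findall(r"[a-z]+", text.lower()):
        if term not in STOP_WORDS:
            yield (term, doc_id), 1

def reduce_counts(pairs):
    """Reducer: sum the counts per (term, doc_id) key."""
    freq = defaultdict(int)
    for key, count in pairs:
        freq[key] += count
    return dict(freq)

docs = {"n1": "The price of oil rose", "n2": "Oil price falls in Asia"}
pairs = [kv for doc_id, text in docs.items() for kv in map_doc(doc_id, text)]
term_freq = reduce_counts(pairs)
```

On the cluster the shuffle phase groups the keys between mapper and reducer; here the flat list of pairs stands in for that step.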
We were asked to analyze their sales data and extract valuable information from it.
The data was in a 9-tuple format: <OrderID, EmailID, MobileNum, ProductID, PayableAmount, DeliveryCharges, ModeofPayment, OrderStatus, OrderSite>
We were asked to provide information such as the unique-subscriber count (by email address) and the per-day transaction amount.
We deployed the Hadoop cluster on three machines:
◦ Deployed our collector to pump data from the DB into HDFS
◦ Wrote M/R jobs to generate OLAP cubes
◦ Provided a Hive interface to extract the results and show them in the UI
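The two reports requested here can be sketched as plain aggregations, simulated outside Hadoop. Only the fields the example needs are shown (order date, EmailID, PayableAmount, following the tuple above); everything else, including the function name, is illustrative.

```python
from collections import defaultdict

def daily_report(orders):
    """orders: iterable of (order_date, email_id, payable_amount).
    Returns per-day unique-subscriber counts and per-day totals."""
    subscribers = defaultdict(set)   # day -> set of unique email IDs
    amounts = defaultdict(float)     # day -> total transaction amount
    for order_date, email_id, payable_amount in orders:
        subscribers[order_date].add(email_id)
        amounts[order_date] += payable_amount
    return (
        {day: len(emails) for day, emails in subscribers.items()},
        dict(amounts),
    )

orders = [
    ("2011-03-01", "a@x.com", 10.0),
    ("2011-03-01", "b@x.com", 5.0),
    ("2011-03-01", "a@x.com", 7.5),   # repeat subscriber, counted once
    ("2011-03-02", "c@x.com", 20.0),
]
unique_subscribers, daily_amounts = daily_report(orders)
```

As M/R jobs, the order date becomes the shuffle key; one reducer deduplicates email IDs per day while another sums the payable amounts.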
[Diagram: input record fields — OrderID, EmailID, Mobile Num, Payable Amount, Delivery Charges, Mode of Payment, Order Status, Order Site; day-granularity output cube — Actual Number of Customers, Forecast Number of Customers, Total Aggregated Amount, Forecast Aggregated Amount; per-subscriber output — Email ID, Payable Amount.]
We delivered an end-to-end reporting solution to Guavus.
The data was provided by the Sprint network (a Tier 1 company); we had to develop a reporting engine to analyze it and generate OLAP cubes.
We were asked to evaluate petabytes of data and provide an ETL solution.
We deployed the Hadoop cluster on 10 Linux machines.
We wrote a collector that read binary data and pushed it into the Hadoop cluster.
We wrote M/R jobs (which run for 4 hours) every day; the idea was to provide analytics on stream data.
We generate OLAP cubes and store the results in InfinityDB (a column DB) and Hive.
[Diagram: multiple data collectors feed the distributed storage framework (Hadoop/HDFS); report-generation tasks run as Map/Reduce jobs on the Hadoop infrastructure under a monitor/overall scheduler; results pass through InfinityDB / Hive / Pig and the Rubix framework to the query engine, the reporting UI / web interface, and the UI display. Hadoop configuration sits alongside the cluster.]
For HT we are developing a news-syndication clustering algorithm.
We had a large number of old news documents and were asked to cluster them; clustering them manually was nearly impossible.
We implemented a clustering Map/Reduce algorithm using cosine similarity and clustered the documents.
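The cosine-similarity step can be sketched as follows, assuming each document has already been reduced to an integer term-frequency vector (as in the pipeline below). Finding the "minimum-distance" pair is finding the pair with the highest cosine similarity; the vector values here are illustrative.

```python
import math
from itertools import combinations

def cosine_similarity(u, v):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def closest_pair(vectors):
    """vectors: dict of doc_id -> vector; return the most similar pair."""
    return max(
        combinations(vectors, 2),
        key=lambda pair: cosine_similarity(vectors[pair[0]], vectors[pair[1]]),
    )

vectors = {
    "v1": [2, 0, 1],
    "v2": [2, 0, 2],   # nearly parallel to v1, so the most similar
    "v3": [0, 3, 0],   # orthogonal to both
}
pair = closest_pair(vectors)
```

On Hadoop, the pairwise comparisons are sharded across mappers and the reducer keeps the global maximum; the agglomerative step then merges the closest pair and repeats.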
[Diagram: a list of XML news files is transformed into integer vectors (one XML news file maps to one vector: V1, V2, …, VN); cosine similarity is applied between vectors; the minimum-distance pair of vectors is selected; and a list of closely related stories is created from the news files.]
[Diagram: on the Hadoop platform, the Map functionality runs the cluster algorithm and the Reduce functionality runs C-Bayes classification to categorize the documents.]
Office Locations:
India: A-82, Sector 57, Noida, UP, 201301
Japan: 2-8-6-405, Higashi Tabata, Kita-ku, Tokyo, Japan
General Inquiries: [email protected]
Sales Inquiries: [email protected]