From Trust Flows Understanding: The Deloitte Fast 50 Big Data Company You've Never Heard Of… Until Now.
2. @tryMajestic Some Stuff You'll Learn: how we built a search engine without $30 billion, and how you can use it to make lots of predictions, insights, money, and data stories.
3. Reaching for the Stars
4. An Inspiration of a Search Engine
5. Majestic Is a Specialist Search Engine: Digital Knowledge on a Grand Scale. Dixon Jones
6. The BIG Specialist Search Engine: Twitter carries 500,000,000 tweets per day on average. In the same day, Majestic crawls well over 2,000,000,000 NEW URLs (and sees 7 billion).
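The gap between URLs seen (7 billion) and URLs that are new (2 billion) implies a fast "have we seen this before?" check. The talk does not say how Majestic does this; a common technique at this scale is a Bloom filter, sketched below as an assumption. `BloomFilter` and `split_new_urls` are hypothetical names, and a production filter would be sized for billions of entries.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: may give false positives, never false negatives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def split_new_urls(urls, seen: BloomFilter):
    """Return URLs that were (probably) never seen before, and mark them as seen."""
    new = []
    for url in urls:
        if url not in seen:
            new.append(url)
            seen.add(url)
    return new
```

A Bloom filter trades a small false-positive rate (occasionally treating a new URL as already seen) for memory far below that of storing every URL string.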
7. How Do They Do That? Information retrieval in the Zetta age comes down to four steps: (1) data collection, (2) data grouping, (3) data indexing, and (4) data matching.
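The four steps can be illustrated with a toy pipeline. This is a minimal sketch, not Majestic's system: the `PAGES` data stands in for crawled pages (step 1), topics are pre-assigned labels, the index is a plain inverted index, and matching is a boolean AND over query words. All names here are hypothetical.

```python
from collections import defaultdict

# Step 1 (assumed done): hypothetical collected pages, URL -> (topic, text).
PAGES = {
    "http://a.example/": ("sports", "football results and league tables"),
    "http://b.example/": ("science", "football shaped molecules called fullerenes"),
    "http://c.example/": ("sports", "tennis results from the weekend"),
}


def group_by_topic(pages):
    """Step 2: group page URLs by their topic label."""
    groups = defaultdict(list)
    for url, (topic, _text) in pages.items():
        groups[topic].append(url)
    return dict(groups)


def build_index(pages):
    """Step 3: inverted index mapping each lowercase word to the URLs containing it."""
    index = defaultdict(set)
    for url, (_topic, text) in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def match(index, query):
    """Step 4: return the URLs that contain every word in the query."""
    word_sets = [index.get(word, set()) for word in query.lower().split()]
    if not word_sets:
        return set()
    return set.intersection(*word_sets)
```

For example, `match(build_index(PAGES), "football results")` returns only `http://a.example/`, because the science page mentions football but not results.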
8. How to Collect 7 Billion URLs a Day?
9. How to Analyze 200 Billion URLs a Day?
10. Groups Make Search Much Better: find a fact, find a friend, find a customer, find anything. (Image: Library of Congress, circa 1940.) Research at: info.majestic.com/groupresearch
11. We Group AND ANALYSE Pages: Topical Trust Flow, Using a Decay Algorithm ???
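The slide names a decay algorithm but does not reveal it. As an illustration only, the sketch below swaps in a TrustRank-style propagation: trust starts at hand-picked seed pages and weakens by a `decay` factor on every hop along a link, with one run per topic. The function names, the `decay=0.85` default, and the seed sets are assumptions, not Majestic's Topical Trust Flow.

```python
def propagate_trust(links, seeds, decay=0.85, iterations=20):
    """TrustRank-style propagation (illustrative stand-in, not Majestic's method).

    links: dict mapping each page to the list of pages it links to.
    seeds: set of trusted seed pages.
    Each iteration, a page passes `decay` times its trust, split evenly
    across its outlinks; seed pages are also re-injected with (1 - decay)
    of the seed mass. Pages with no outlinks simply keep nothing flowing out.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    seed_share = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_share)
    for _ in range(iterations):
        nxt = {p: (1 - decay) * seed_share[p] for p in pages}
        for src, targets in links.items():
            if targets:
                share = decay * trust[src] / len(targets)
                for dst in targets:
                    nxt[dst] += share
        trust = nxt
    return trust


def topical_trust(links, seeds_by_topic, **kwargs):
    """Run one propagation per topic, each starting from that topic's seeds."""
    return {
        topic: propagate_trust(links, seeds, **kwargs)
        for topic, seeds in seeds_by_topic.items()
    }
```

Because trust shrinks on each hop, a page one link from a trusted seed scores higher than a page two links away, and a page scores zero in a topic that no seed path reaches.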
12. The Index: For every page, we know its influence as a simple score, its context, its context by keyword, and its influence in context, all in a series of simple 0-100 scores.
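The slide does not say how raw signals become 0-100 scores. One plausible scheme, shown here purely as a hypothetical sketch, is a logarithmic scale relative to the largest observed value, since link counts are heavy-tailed. `to_score` and its parameters are illustrative names.

```python
import math


def to_score(raw: float, max_raw: float) -> int:
    """Map a non-negative raw value onto 0-100 on a log scale (hypothetical scheme).

    log1p compresses heavy-tailed counts, so equal score steps correspond
    roughly to equal multiplicative jumps in the raw value. Values above
    max_raw are clamped to 100.
    """
    if raw <= 0 or max_raw <= 0:
        return 0
    return min(100, round(100 * math.log1p(raw) / math.log1p(max_raw)))
```

With a maximum of 1,000, `to_score(31, 1000)` returns 50: a page with about 3% of the top raw value lands mid-scale rather than near zero, which is the point of using a log scale.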
13. Works Best with a Universal Data Set: Every signal is small and, individually, prone to error or opinion. At scale, the error decreases and confidence increases. http://info.majestic.com/universal
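The claim that error falls at scale follows from averaging: if n signals are independent and unbiased with standard deviation σ, their mean has standard error σ/√n. The simulation below illustrates this under those assumptions (Gaussian noise, made-up `true_value` and `noise_sd`); systematic bias shared by all signals would not average away.

```python
import random
import statistics


def estimate_error(n_signals, true_value=50.0, noise_sd=20.0, trials=2000, seed=0):
    """Average absolute error of the mean of n independent noisy signals."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        signals = [rng.gauss(true_value, noise_sd) for _ in range(n_signals)]
        errors.append(abs(statistics.fmean(signals) - true_value))
    return statistics.fmean(errors)
```

Going from 1 signal to 100 signals should cut the typical error by roughly a factor of 10 (√100), matching σ/√n.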
14. Data Matching
15. Our Data Stack (For the Techies): crawler in C# (.NET / Mono); NoSQL read-only file system; Java for interrogation; dynamic front end in Perl/Ruby, etc.; Hadoop coming soon.
16. So We Built It. Now Imagine: What COULD You Do with It?
17. 1: Compare Competitor Backlinks
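Comparing competitor backlinks is, at its core, set arithmetic over referring domains. The sketch below assumes you already have each site's referring domains and, optionally, a Trust Flow score per domain; `compare_backlinks`, `top_gaps`, and the example domains are hypothetical.

```python
def compare_backlinks(ours, theirs):
    """Split two collections of referring domains into shared and unique sets."""
    ours, theirs = set(ours), set(theirs)
    return {
        "shared": sorted(ours & theirs),
        "only_ours": sorted(ours - theirs),
        "only_theirs": sorted(theirs - ours),  # candidate link-building targets
    }


def top_gaps(comparison, trust_flow, n=10):
    """Rank the competitor-only domains by a 0-100 score, highest first.

    trust_flow: dict of domain -> score; missing domains count as 0.
    """
    gaps = comparison["only_theirs"]
    return sorted(gaps, key=lambda d: trust_flow.get(d, 0), reverse=True)[:n]
```

Ranking the competitor-only domains by score puts the most influential missing links first, which is usually where outreach effort pays off.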
18. 2: Finding Influencers. Who is more popular on Twitter: Lady Gaga or Barack Obama? Trust Flow 74 vs. Trust Flow 70.
19. 3: Predicting Elections: Boris v Ken, Obama v Romney
20. 4: Lobbying Senators
21. 5: Data Art (Profiling Companies)
22. What If We Pivot? Hadoop. Imagine your OWN version of our web index: a subset of the data, prepopulated for your needs, updated daily, weekly, or monthly, and stored in open-source Hadoop instances ready for easy interrogation. What could you do then?
23. Data Store Examples
27. Ways You Could Segment the Web:
All domains hosted in [choose country or city here]
Most influential sites about [insert 800 topics here]
Best web pages for [choose 50 million phrases here]
Spammiest pages about [insert 800 topics here]
Most influential pages on [choose any set of sites]
Create a set of pages with [choose properties here]
Got a plan? We have the starting point for web data.
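Each segment listed above is a filter plus a ranking over per-site records. The sketch below assumes a hypothetical `SiteRecord` with hosting country, primary topic, and a 0-100 Trust Flow score; the field names and `segment` function are illustrative, not Majestic's data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteRecord:
    domain: str
    country: str     # hosting country code (hypothetical field)
    topic: str       # primary topical category (hypothetical field)
    trust_flow: int  # 0-100 influence score


def segment(records, country=None, topic=None, min_trust=0, limit=None):
    """Filter records by optional country/topic/score, most influential first."""
    hits = [
        r for r in records
        if (country is None or r.country == country)
        and (topic is None or r.topic == topic)
        and r.trust_flow >= min_trust
    ]
    hits.sort(key=lambda r: r.trust_flow, reverse=True)
    return hits if limit is None else hits[:limit]
```

For instance, `segment(records, country="GB", topic="sports", limit=50)` would give "the most influential UK-hosted sports sites"; sorting ascending instead would surface the weakest pages in a topic.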
28. Some Takeaways: how we built a search engine without $30 billion, and how you can use it to make lots of predictions, insights, money, and data stories.
29. Out of Trust Flows Understanding: real insight into the World Wide Web from Majestic, the specialist search engine.