Data Visualizations of HYIP Dataset

Post on 03-Feb-2022


Jie Han

Quantifying the World

April 23, 2012

Financial Cryptography 2012

http://fc12.ifca.ai/pre-proceedings/paper_27.pdf

This could be you!!!

Overview

1. What's an HYIP?
2. Dataset
3. Processes
4. R graph examples
5. Google Chart examples
6. Some helpful hints

High Yield Investment Programs (HYIPs)

● Also known as a Ponzi or pyramid scheme
● Promise high returns on investment
● Pay existing investors with revenue from new investors
● Unsustainable in the long run
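To make the "unsustainable" point concrete, here is a minimal sketch of a scheme's cash balance over time. The model and all numbers (daily rate, deposit size, recruitment pattern) are made-up assumptions for illustration, not figures from the dataset.

```python
def simulate_hyip(daily_rate, deposit, new_investors_per_day, max_days=365):
    """Return the first day on which promised payouts exceed available cash.

    Hypothetical model: every investor deposits `deposit` once and is owed
    `daily_rate * deposit` on each following day, paid from a common pot.
    Returns None if the scheme survives `max_days`.
    """
    cash = 0.0
    investors = 0
    for day in range(1, max_days + 1):
        # Interest owed today to everyone who joined on earlier days.
        owed = investors * deposit * daily_rate
        # New deposits are the only source of revenue.
        recruits = new_investors_per_day(day)
        cash += recruits * deposit
        if owed > cash:
            return day
        cash -= owed
        investors += recruits
    return None


# Assumed recruitment pattern: 10 new investors per day for 30 days, then none.
collapse_day = simulate_hyip(
    daily_rate=0.02,
    deposit=100.0,
    new_investors_per_day=lambda day: 10 if day <= 30 else 0,
)
```

Under these assumed inputs the pot runs dry on day 66, a little over a month after new deposits stop. With a zero payout rate the function returns None, since nothing is ever owed.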

Why are HYIPs a problem?

● Advertised as legitimate investments

● Sophisticated online ecosystem in support of the schemes

HYIP Website

HYIP Aggregator Websites

HYIP Variables

HYIP Lifetime

Typical life cycle of an HYIP:

About the Data

● Since 11/17/2010, still running
● Collected data from nine "aggregator" websites
● Total observations: 141k+
● Total HYIPs observed: 1,576+

Process

Data collection (Python, crontab, mongoDB)

Preliminary analysis (Python, R)

Continue data collection, work on parsing all aggregators (Python)

Look at what we have, decide on what we want (R)

Difficulties in analyzing data -> create interactive data visualizations (Python, Google Charts, JS, HTML)

Use new tools to look for patterns (browser & eyes)

How an R Chart Gets Generated

Data Collection (Python)

Parse data & insert into db (Python, mongoDB)

Fetch & manipulate data (Python, mongoDB, R)

Output a .pdf image to server

New user input (HTML forms)

Front End

Back End

User interacts with data in browser

Background scripts

How Can We Trust Aggregator Data?

CDF of Standard Deviations of HYIP Lifetimes

● Aggregators agree 80% of the time
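A minimal sketch of how such a CDF could be computed, assuming each HYIP has one lifetime (in days) reported per aggregator. The sample data and function names are hypothetical, and the slides do not say which standard deviation formula was used. The population form is an assumption here.

```python
from statistics import pstdev


def lifetime_stdevs(lifetimes_by_hyip):
    """Standard deviation of reported lifetimes for each HYIP.

    `lifetimes_by_hyip` maps an HYIP name to the lifetimes (in days)
    reported by the different aggregators. HYIPs seen by fewer than two
    aggregators are skipped, since there is nothing to compare.
    """
    return {
        name: pstdev(days)
        for name, days in lifetimes_by_hyip.items()
        if len(days) >= 2
    }


def empirical_cdf(values):
    """Return sorted (value, fraction of observations <= value) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    # Later duplicates overwrite earlier ones, so ties share a single step.
    steps = {v: (i + 1) / n for i, v in enumerate(ordered)}
    return sorted(steps.items())


# Made-up example: four HYIPs, lifetimes as reported by three aggregators.
reported = {
    "hyip-a": [30, 30, 30],
    "hyip-b": [12, 12, 12],
    "hyip-c": [45, 45, 48],
    "hyip-d": [7, 21, 60],
}
cdf = empirical_cdf(lifetime_stdevs(reported).values())
```

In this toy input, half of the HYIPs have a standard deviation of zero, meaning every aggregator reported the same lifetime. Reading the CDF at zero (or at a small tolerance) gives the share of HYIPs on which aggregators agree.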

How Long Do HYIPs Last Before Collapsing?

Survival function of HYIP Lifetimes

● Most HYIPs collapse within a few weeks
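As a rough illustration of what a survival function measures, the sketch below computes the fraction of HYIPs still alive after t days. It assumes every HYIP in the sample has already collapsed. Because data collection is still running, some HYIPs in the real dataset are still alive, and an estimator that handles censoring (such as Kaplan-Meier) would be needed there. The lifetimes are made up.

```python
def survival_function(lifetimes):
    """Empirical survival function S(t) = fraction of HYIPs with lifetime > t.

    Assumes every HYIP in `lifetimes` has already collapsed (no censoring).
    Returns sorted (t, S(t)) pairs, one per distinct observed lifetime.
    """
    n = len(lifetimes)
    points = []
    for t in sorted(set(lifetimes)):
        still_alive = sum(1 for life in lifetimes if life > t)
        points.append((t, still_alive / n))
    return points


# Made-up lifetimes in days, for illustration only.
sample = [5, 12, 12, 20, 33, 47, 90, 150]
curve = survival_function(sample)
```

Plotting `curve` as a step function gives the familiar downward staircase. A steep early drop corresponds to most schemes collapsing quickly.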

What Factors Lead to Collapse?

Factors that lead to shorter HYIP lifespans:

● Higher advertised rates of return
● Shorter mandatory investment terms

R vs. Google Charts

R

● Useful if familiar with the dataset
● Good at presenting aggregate summaries
● Large learning curve, especially when you want to do something specific
● More customizable
● Most analysis techniques are available

Google Charts

● Anyone can view & interact with the data
● See a complete data distribution
● Learning curve isn't bad
● Not as customizable
● Have to wait for updates for more functionality, or write your own

How a Google Chart Gets Generated

Data Collection (Python)

Parse data & insert into db (Python, mongoDB)

Fetch & manipulate data (Python, mongoDB, R)

Write JS & HTML page (Python, JS, HTML, CSS)

New user input (HTML forms)

User interacts with data in browser

Background scripts

Back End

Front End
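The "Write JS & HTML page" step can be sketched as a Python function that fills a page template with chart data. This is only an illustration: the template, placeholder names, and sample rows are hypothetical, and it uses the current Google Charts `loader.js` entry point, which may differ from what the original scripts used.

```python
import json

# Page skeleton; __ROWS__ and __TITLE__ are placeholders filled in below.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
  <script src="https://www.gstatic.com/charts/loader.js"></script>
  <script>
    google.charts.load('current', {packages: ['corechart']});
    google.charts.setOnLoadCallback(drawChart);
    function drawChart() {
      var data = google.visualization.arrayToDataTable(__ROWS__);
      var options = {title: __TITLE__};
      var chart = new google.visualization.LineChart(
          document.getElementById('chart'));
      chart.draw(data, options);
    }
  </script>
</head>
<body><div id="chart" style="width: 800px; height: 400px"></div></body>
</html>
"""


def build_chart_page(title, header, rows):
    """Return an HTML page that draws `rows` as a Google line chart."""
    # arrayToDataTable expects the column labels as the first row.
    table = [list(header)] + [list(row) for row in rows]
    return (PAGE_TEMPLATE
            .replace("__ROWS__", json.dumps(table))
            .replace("__TITLE__", json.dumps(title)))


# Made-up ratings for a hypothetical HYIP, one row per observation day.
page = build_chart_page(
    title="Aggregator rating over time",
    header=["Day", "Rating"],
    rows=[[1, 4.5], [2, 4.4], [3, 3.9], [4, 2.1]],
)
```

Writing `page` to a file on the web server gives a chart anyone can open in a browser. Serializing the data with `json.dumps` keeps the Python side free of hand-built JavaScript literals.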

Variable Changes Over Time

cherryshares.com, aggregator rating

General Programming Tips

● Spend time on data quality
● Organize your code, variable names, and files
● Keep records of working examples
● Plan out your code to maximize pattern capture
● Error-catching, browser consoles, and regexes are friends
● Test out chunks of code before putting them together
● Google Tables take a while to load for large datasets
● Google Charts Playground allows you to test code in their environment
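As an illustration of the regex and error-catching tips, here is a small sketch that parses listing lines and collects failures instead of stopping at the first bad record. The line format, field names, and sample input are hypothetical. Real aggregator pages differ from site to site.

```python
import re

# Hypothetical listing format: "<site> | <rate>% daily | <status>".
LISTING_RE = re.compile(
    r"(?P<name>[\w.-]+)\s*\|\s*(?P<rate>\d+(?:\.\d+)?)%\s*daily\s*\|\s*(?P<status>\w+)"
)


def parse_listing(line):
    """Parse one listing line into a dict, or raise ValueError."""
    match = LISTING_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized listing: {line!r}")
    return {
        "name": match.group("name"),
        "daily_rate": float(match.group("rate")) / 100,
        "status": match.group("status").lower(),
    }


def parse_all(lines):
    """Parse every line, keeping failures for later inspection."""
    records, failures = [], []
    for line in lines:
        try:
            records.append(parse_listing(line))
        except ValueError as err:
            # Record the problem and keep going so one bad line
            # does not abort the whole collection run.
            failures.append(str(err))
    return records, failures


records, failures = parse_all([
    "example-hyip.com | 2.5% daily | Paying",
    "this line does not match the pattern",
])
```

Keeping the failed lines around makes it easier to spot when a site changes its layout, which ties back to spending time on data quality.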

Future Work

● Create an interactive web-based visualization for our dataset - some examples I made
● Link scams together
● Explore larger dataset

Thanks!