Improving Online Social Network collection and processing mechanisms
D. Tsagkarakis, A. Vavakos, V. Stavrou, M. Kandias
Athens University of Economics and Business
Problems
Time-consuming data management due to conventional relational databases.
Delays in the data mining mechanisms due to lack of parallel processing.
Need to upgrade existing mechanisms in order to make use of the latest API versions.
Dimitris Tsagkarakis, Alexandros Vavakos, Vasilis Stavrou, Miltiadis Kandias
{d.tsagkarakis, alexandros.vavakos, stavrouv, kandias}@aueb.gr
Information Security and Critical Infrastructure Protection Laboratory
Dept. of Informatics, Athens University of Economics & Business (AUEB)
Introduction
Rapid explosion of Online Social Networks.
Users transfer their offline behavior to the online world.
Extraction of information from social networks contributes to the profiling of users.
Open Source INTelligence (OSINT) to mitigate the insider threat.
Use of a distributed cluster of machines to store and manage large amounts of data.
Need for parallelized data collection due to the constantly increasing amounts of data that social networks process.
Ability to connect to a social network using accounts from different networks.
Ability to simultaneously collect a user's data from all the social networks in which they use the same account.
Proactive critical infrastructure protection capability.
Ability to enhance organizational monitoring systems to mitigate the insider threat.
Hadoop Ecosystem
Figure 3: Hadoop ecosystem
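As a rough illustration of the processing model that Hadoop's MapReduce component is built around, the following single-machine sketch counts hashtags in a handful of posts. It is not the authors' pipeline: the function names, the sample posts, and the choice of hashtag counting are assumptions made for illustration, and a real cluster would distribute the map and reduce steps across machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(post):
    """Emit a (hashtag, 1) pair for every hashtag found in one post."""
    return [(word.lower(), 1) for word in post.split() if word.startswith("#")]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single count."""
    return key, sum(values)


def hashtag_counts(posts):
    """Run map, shuffle, and reduce in sequence on a list of posts."""
    mapped = chain.from_iterable(map_phase(post) for post in posts)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


# Hypothetical sample data, not taken from any collected dataset.
posts = ["Reading about #OSINT and #privacy", "New #osint tools", "no tags here"]
counts = hashtag_counts(posts)
```

Because each call to `map_phase` depends only on its own post, the map step can be spread over many workers, which is the property that makes this model suitable for large social media datasets.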
OLTP vs. OLAP
                     OLTP System                        OLAP System
Inserts and Updates  Short and fast inserts and         Periodic long-running batch jobs
                     updates initiated by end users     refresh the data
Queries              Relatively standardized and        Often complex queries involving
                     simple queries that return         aggregations
                     relatively few records
Processing Speed     Typically very fast                Depends on the amount of data involved
Space Requirements   Relatively small                   Relatively large
Database Design      Highly normalized with many        Typically denormalized with fewer
                     tables                             tables; use of star and snowflake
                                                        schemas
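The contrast in the comparison above can be made concrete with a small sketch using Python's built-in `sqlite3` module. The table names (`dim_user`, `fact_post`), the columns, and the sample rows are hypothetical and only mimic a tiny star schema. The first function stands for an OLTP-style short insert, the second for an OLAP-style aggregation over a fact table joined to a dimension table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_user (user_id INTEGER PRIMARY KEY, network TEXT);
CREATE TABLE fact_post (
    post_id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES dim_user(user_id),
    likes   INTEGER
);
""")


def record_post(conn, post_id, user_id, likes):
    """OLTP-style operation: one short insert committed immediately."""
    with conn:
        conn.execute(
            "INSERT INTO fact_post VALUES (?, ?, ?)", (post_id, user_id, likes)
        )


def likes_per_network(conn):
    """OLAP-style operation: aggregate the fact table grouped by a dimension."""
    return conn.execute("""
        SELECT d.network, COUNT(*), SUM(f.likes)
        FROM fact_post AS f
        JOIN dim_user AS d ON d.user_id = f.user_id
        GROUP BY d.network
        ORDER BY d.network
    """).fetchall()


# Hypothetical sample rows.
conn.executemany("INSERT INTO dim_user VALUES (?, ?)", [(1, "twitter"), (2, "youtube")])
record_post(conn, 10, 1, 3)
record_post(conn, 11, 1, 5)
record_post(conn, 12, 2, 7)
summary = likes_per_network(conn)
```

Here `summary` holds one row per network with the post count and total likes. On a real data warehouse the same kind of query would scan far more rows, which is why its speed depends on the amount of data involved.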
Final Twitter Crawler
Figure 2: Social media connectivity
Figure 5: Twitter Crawler configuration window
User Privacy:
  Third parties can identify a user from a comment or image.
  Option to display the geographical location from which a comment or image was posted.
  Users' personal information is used to associate specific advertisements with them.
Improvements:
  Parallelization using multithreading.
  Design of a Graphical User Interface.
  Crawler update to gather users sequentially from a file.
  Crawler update to allow the tool's configuration to be modified from within the application.
  Crawler update to store incidents in a log file for later analysis or debugging.
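A minimal sketch of how three of these improvements could fit together is given below: users are read from a file, collected in parallel by a thread pool, and every incident is written to a log file. The poster does not say what language or API calls the crawler uses, so everything here is an assumption. In particular, `fetch_profile` is a hypothetical stand-in that performs no network access, and the file names are illustrative.

```python
import logging
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def read_users(path):
    """Return the non-empty lines of a text file, one username per line."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def fetch_profile(username):
    """Hypothetical stand-in for an API request; a real crawler would call the network's API."""
    if username.startswith("!"):
        raise ValueError(f"invalid username: {username}")
    return {"user": username, "posts": len(username)}


def crawl(users, log_path, workers=4):
    """Collect all users in parallel and record successes and failures in a log file."""
    logger = logging.getLogger("crawler_sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep incidents in the log file only
    handler = logging.FileHandler(log_path, encoding="utf-8")
    logger.addHandler(handler)
    results = {}
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = {pool.submit(fetch_profile, user): user for user in users}
            for future in as_completed(futures):
                user = futures[future]
                try:
                    results[user] = future.result()
                    logger.info("collected %s", user)
                except Exception as exc:
                    logger.error("failed %s: %s", user, exc)
    finally:
        logger.removeHandler(handler)
        handler.close()
    return results


with tempfile.TemporaryDirectory() as tmp:
    users_file = Path(tmp) / "users.txt"
    users_file.write_text("alice\nbob\n!broken\n", encoding="utf-8")
    log_file = Path(tmp) / "crawler.log"
    results = crawl(read_users(users_file), log_file)
    log_text = log_file.read_text(encoding="utf-8")
```

Threads suit this kind of task because most of a crawler's time is spent waiting for API responses. A failure for one user is logged and does not stop the collection of the others.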
YouTube
User Privacy:
  A user's activity can be displayed to third parties.
  A video's information (view count, likes, etc.) can be displayed.
  Connection with Google accounts.
  Shared accounts with Facebook and Twitter.
Improvements:
  Updates and improvements to the handling of YouTube's API responses.
  Parallelization using multithreading.
  Changes to the data stored in the data warehouse.
Figure 1: OLTP vs OLAP Systems
Conclusions
Figure 4: Twitter Crawler root window