My Dissertation

During my MSc in Web Technology I decided to write my dissertation on Big Data and Machine Learning. I titled it "Applying Machine Learning to Big Data using Social Media Analysis to identify people with High Intelligence". The main reason behind my choice of topic was to learn how to create and set up a Big Data environment, and to understand and apply the concepts behind Machine Learning. I decided on Hadoop as my Big Data framework. Cloudera offers a free single-node Hadoop cluster VM. This was my starting point, and it took all the hassle out of setting up my environment. I chose NLTK, a Python framework, as my machine learning framework. There are many machine learning frameworks available, but NLTK ticked the boxes for natural language processing.

Executive Summary

Since the rise of social media platforms such as Facebook and Twitter, companies and organisations have performed social media analysis or data mining to better understand their existing customers and to seek out potential new ones. This research set out to use this technique, coupled with machine learning algorithms, to explore whether highly intelligent people can be identified based solely on their social media data. Given that there are millions of people worldwide sharing and collaborating online, the abundance of available data is potentially unlimited. The data collected as part of this research was stored in a Big Data framework to ensure this work could cope with the vast amounts of data available. It was concluded that it is not possible to distinguish highly intelligent people solely by analysing their social media data. The observations from analysing over 1 million tweets show that social media users regularly boycott the use of correct grammar and punctuation. The findings also suggest that Twitter’s character restriction has a large influence on the quality of content posted.

Further Reading

Please feel free to download a copy and have a read if you're interested. Download here. If you have any questions or would like a deeper explanation, feel free to get in touch here.