Finding Local Events Using Twitter Data

Using an existing dataset, we divided tweets into location buckets, used DBSCAN to define significant events, and selected a headline tweet that best represents each event. We were able to identify events such as a concert at the O2 arena in London and the Queen’s birthday.

Throughout this process we learned a lot, the most crucial lesson being that we needed more data. With more data, we could reduce noise and hopefully find more events, reducing the misclassification rate. We also assumed that tweets would, more often than not, be used to report on some type of event. However, this was not the case, as more people use the platform for simple, personal updates. From a machine learning perspective, we found evaluating performance on unsupervised problems challenging, since there are often no quantitative measures to validate against.

Future Work

In the future, we would start by gathering more data. Ideally, we would want to gather data for denser areas and for different locations.

There are also ways for us to improve our method for identifying anomalies. We could look at the time-based density of tweets to identify events of different lengths, which would let us potentially distinguish between hourly and day-long events. We could also try varying the location cluster size, which could highlight events happening across a varying geographic area.

Additionally, to report a better summary tweet, we could figure out how to generate new headlines by piecing together important aspects of all tweets associated with an event, rather than selecting the best one from existing tweets.

Finally, processing tweets and defining events in real time would be the end goal, so that this project has meaningful business impact.
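For readers who want something concrete, the three steps summarized in this post (location buckets, DBSCAN, headline tweet) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code used for the project: the grid cell size, the choice to run DBSCAN over tweet timestamps within each bucket, the word-overlap headline score, and every function name and parameter value are hypothetical. The small DBSCAN is written out by hand so the block has no dependencies; in practice a library implementation would be the more likely choice.

```python
from collections import Counter, defaultdict
from math import floor


def bucket_key(lat, lon, cell_deg=0.05):
    """Map a coordinate to a grid cell. cell_deg is an assumed cell size."""
    return (floor(lat / cell_deg), floor(lon / cell_deg))


def dbscan_1d(values, eps, min_samples):
    """Minimal DBSCAN over 1-D values (here: tweet times in minutes).

    Returns one label per value; -1 marks noise. A point's neighborhood
    includes the point itself.
    """
    n = len(values)
    labels = [None] * n

    def neighbors(i):
        return [j for j in range(n) if abs(values[i] - values[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_samples:
                queue.extend(j_nbrs)  # j is a core point, keep expanding
    return labels


def headline(texts):
    """Pick the tweet whose words are, on average, most common in the cluster."""
    tokenized = [set(t.lower().split()) for t in texts]
    freq = Counter(w for toks in tokenized for w in toks)
    scores = [
        sum(freq[w] for w in toks) / len(toks) if toks else 0.0
        for toks in tokenized
    ]
    return texts[scores.index(max(scores))]


def find_events(tweets, cell_deg=0.05, eps=15, min_samples=3):
    """tweets: iterable of (lat, lon, minute, text). Returns (cell, headline) pairs."""
    buckets = defaultdict(list)
    for lat, lon, minute, text in tweets:
        buckets[bucket_key(lat, lon, cell_deg)].append((minute, text))

    events = []
    for key, items in buckets.items():
        labels = dbscan_1d([m for m, _ in items], eps, min_samples)
        clusters = defaultdict(list)
        for label, (_, text) in zip(labels, items):
            if label != -1:
                clusters[label].append(text)
        for texts in clusters.values():
            events.append((key, headline(texts)))
    return events


# Made-up sample data, for illustration only.
sample_tweets = [
    (51.503, 0.003, 600, "concert at the O2 tonight"),
    (51.504, 0.004, 605, "amazing concert at the O2 arena"),
    (51.502, 0.002, 610, "queue for the O2 concert is huge"),
    (51.503, 0.003, 1200, "having lunch"),
    (48.857, 2.352, 615, "coffee time"),
]
events = find_events(sample_tweets)
print(events)
```

On the made-up sample, the three tweets posted within minutes of each other in the same cell form one event, while the isolated "having lunch" and "coffee time" tweets are treated as noise. Changing `eps` is one simple way to experiment with the hourly versus day-long distinction mentioned under Future Work, and changing `cell_deg` corresponds to varying the location cluster size.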
