Understanding Citizen Participation in the 2019 Nigerian Elections
Applying data analytics, machine learning, and visualization techniques to understand key events associated with the 2019 Nigerian elections, citizen engagement, candidate use of social media, and geopolitical structures.

Note: This project is still in development. Expected completion: ~06/01/2022.
This project began on the premise that the proliferation of social media interactions provides an interesting lens through which to study human behavior, ask important questions about election discourse in Nigeria, and interrogate social and demographic questions. It is based on data collected from Twitter between September 2018 and March 2019 (tweets geotagged to Nigeria and tweets containing election-related keywords). Overall, the dataset contains 25.2 million tweets and retweets, 12.6 million original tweets, 8.6 million geotagged tweets, and 3.6 million tweets labelled (using an ML model) as political.

Presidential Election Timeline

This section provides an overview of the frequency of the data collected (all tweets and retweets, original tweets, and tweets labelled as political) and an extracted timeline of notable events.
25.2M
Tweets and retweets
12.6M
Original tweets
3.6M
Political tweets
Events Timeline (wordcloud)
The section below illustrates word clouds created from the content of tweets for "eventful" days in the dataset. Visualizing these word clouds provides additional context on the key topics that drove discussions. The timeline runs from September 2018 to March 2019.
Method
How is the data derived and analyzed?
Peak finding
The events in the timeline are detected and labelled using a three-step approach. First, we apply a peak-finding algorithm to tweet frequency data to identify the days with the most discussion.
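The text does not name the peak-finding algorithm used. As a minimal sketch of the idea, assuming daily tweet counts are available as (day, count) pairs, the following flags days that are local maxima and sit well above the mean. The function name find_peak_days, the min_z threshold, and the sample counts are illustrative assumptions, not the project's code or data.

```python
from statistics import mean, pstdev


def find_peak_days(daily_counts, min_z=1.5):
    """Return the days that are local maxima and unusually high.

    daily_counts: list of (day_label, tweet_count) pairs in date order.
    min_z: assumed threshold, in standard deviations above the mean.
    """
    counts = [count for _, count in daily_counts]
    mu, sigma = mean(counts), pstdev(counts)
    peaks = []
    # Skip the first and last day: they have only one neighbour.
    for i in range(1, len(counts) - 1):
        is_local_max = counts[i] > counts[i - 1] and counts[i] >= counts[i + 1]
        is_high = sigma > 0 and (counts[i] - mu) / sigma >= min_z
        if is_local_max and is_high:
            peaks.append(daily_counts[i][0])
    return peaks


# Made-up counts for illustration only (not taken from the dataset).
sample = [
    ("day-01", 100), ("day-02", 110), ("day-03", 120), ("day-04", 400),
    ("day-05", 150), ("day-06", 105), ("day-07", 95),
]
peak_days = find_peak_days(sample)
print(peak_days)
```

On real data, a library routine such as scipy.signal.find_peaks with a prominence threshold would be a reasonable alternative to this hand-rolled check.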
Qualitative Analysis
Next, we extract the tweets for each peak day and apply qualitative techniques to derive a generic title and description of the key events for that day. This includes computing word frequencies and word clouds, inspecting a random subset of tweets (~400 tweets per day), and constructing search queries to verify insights.
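The word-frequency and random-sampling steps described above could look roughly like the sketch below. It assumes the tweets for one day are held as a list of strings; the tokenizer, the tiny stop-word list, the helper names, and the example tweets are all hypothetical and chosen only for illustration.

```python
import random
import re
from collections import Counter

# Small illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "is", "and", "for", "on"}


def word_frequencies(tweets, top_n=10):
    """Count lowercase word tokens across tweets, ignoring stop words."""
    counter = Counter()
    for text in tweets:
        tokens = re.findall(r"[a-z0-9#@']+", text.lower())
        counter.update(t for t in tokens if t not in STOP_WORDS)
    return counter.most_common(top_n)


def sample_for_review(tweets, k=400, seed=0):
    """Draw up to k tweets at random for manual inspection."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(tweets, min(k, len(tweets)))


# Invented example tweets, not taken from the dataset.
day_tweets = [
    "Election postponed again",
    "Why was the election postponed",
    "New election date announced",
]
top_words = word_frequencies(day_tweets, top_n=2)
review_set = sample_for_review(day_tweets, k=400)
print(top_words)
print(len(review_set))
```

The resulting counts can feed a word cloud directly, while the sampled subset is what a reviewer would read by hand before writing the event title.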
Insights
What are some insights on the types of events that drove discourse?
Most discussed event
The event that drew the most discussion was the postponement of the elections by exactly one week from the scheduled date.
Reactions to decisions by the incumbent government
At least two spikes in tweet frequency were related to decisions by the incumbent government that could impact election outcomes: the appointment of a new elections collation head and the replacement of the chief justice.

Candidate Use of Social Media

This section explores how each candidate used social media to engage with citizens, as captured by our dataset. This includes how often they shared messages (tweets they authored), how often others replied to their tweets, and how often they replied to others.
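As a rough sketch of how these three counts might be tallied, the code below assumes each tweet is a dict with an "author" field and an optional "in_reply_to" field holding the account being replied to. These field names, the account names, and the choice to count replies as authored tweets are assumptions made for illustration; the source does not describe its schema.

```python
def candidate_activity(tweets, candidates):
    """Tally authored tweets, replies received, and replies sent per candidate."""
    stats = {
        c: {"authored": 0, "replies_received": 0, "replies_sent": 0}
        for c in candidates
    }
    for tweet in tweets:
        author = tweet["author"]
        target = tweet.get("in_reply_to")  # None when the tweet is not a reply
        if author in stats:
            # Assumption: every tweet by a candidate counts as authored.
            stats[author]["authored"] += 1
            if target is not None and target != author:
                stats[author]["replies_sent"] += 1
        if target in stats and author != target:
            stats[target]["replies_received"] += 1
    return stats


# Invented records with placeholder account names.
records = [
    {"author": "candidate_a", "in_reply_to": None},
    {"author": "candidate_a", "in_reply_to": "voter_1"},
    {"author": "voter_1", "in_reply_to": "candidate_a"},
    {"author": "voter_2", "in_reply_to": "candidate_a"},
    {"author": "candidate_b", "in_reply_to": None},
    {"author": "voter_1", "in_reply_to": "candidate_b"},
]
activity = candidate_activity(records, ["candidate_a", "candidate_b"])

# Order candidates by how often they tweeted, most active first.
ranked = sorted(activity, key=lambda c: activity[c]["authored"], reverse=True)
print(ranked)
```

Self-replies (threads) are excluded from both reply counts here so that a candidate's own threads do not inflate their engagement figures; a different analysis might reasonably keep them.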

Geo Tagged Data

The collected dataset contains 8.6 million geotagged tweets. We can visualize these locations (weighted by frequency) to understand clusters and sources of discourse.
8.6M
Geo tagged tweets
8.0M
Geo tagged to Nigeria
Top 20 Cities by # tweets
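The frequency weighting and the city ranking in this section can both be expressed as simple counts. The sketch below assumes each geotagged tweet carries a latitude, a longitude, and a resolved "city" label; those field names, the grid precision, and the sample records are hypothetical and not taken from the dataset.

```python
from collections import Counter


def location_weights(tweets, precision=1):
    """Bin coordinates onto a grid and count tweets per cell (map weights)."""
    return Counter(
        (round(t["lat"], precision), round(t["lon"], precision)) for t in tweets
    )


def top_cities(tweets, n=20):
    """Return the n most frequent city labels with their tweet counts."""
    return Counter(t["city"] for t in tweets if t.get("city")).most_common(n)


# Invented records with placeholder city names.
geo_tweets = [
    {"lat": 6.52, "lon": 3.38, "city": "city_a"},
    {"lat": 6.48, "lon": 3.41, "city": "city_a"},
    {"lat": 9.06, "lon": 7.49, "city": "city_b"},
]
weights = location_weights(geo_tweets)
ranking = top_cities(geo_tweets, n=20)
print(weights)
print(ranking)
```

Rounding to one decimal place groups points into cells of roughly 11 km in latitude; a finer or coarser grid only changes the precision argument.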
Acknowledgement
This work began in 2018 and would not have been possible without the generous support of GCP credits from the Google Developer Expert program. These credits were used to run VMs for ingesting tweets, for storage and processing in BigQuery, and for compute VMs used to train ML models.
Note: this work constitutes academic research and is not intended to be the sole source of information for decision making.