A Twitter Analysis of the “100 Most Influential People in Crypto”

By Ash
Published in codeburst
4 min read · Jan 29, 2018

I recently ran across an article titled “The 100 Most Influential People in Crypto” and decided to test my chops by analysing the Twitter accounts of the people on the list to see how they felt about Bitcoin Cash. I’ll be using Python 3 for this project, and all the code will be available here.

Data Preprocessing

Hint: If you find preprocessing boring (can’t blame you), skip to the Data Analysis section below.

Scraping Tweets

First, I need to get a list of all the Twitter handles listed on the website. Using the urllib and BeautifulSoup libraries I can write a function to download/parse the page’s HTML, then extract the Twitter handles:
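As a rough illustration of the extraction step, here is a minimal sketch. It is not the original function: it swaps BeautifulSoup for the standard library’s html.parser so that it runs without extra dependencies, and it assumes each handle appears on the page as a plain link to twitter.com/<handle>. The names HandleParser, extract_handles and download_html are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Assumption: handles appear as links such as https://twitter.com/aantonop
HANDLE_RE = re.compile(
    r"^https?://(?:www\.)?twitter\.com/@?(\w{1,15})/?$", re.IGNORECASE
)


class HandleParser(HTMLParser):
    """Collects Twitter handles from the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.handles = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        match = HANDLE_RE.match(href.strip())
        # Keep the first occurrence of each handle, preserving page order
        if match and match.group(1) not in self.handles:
            self.handles.append(match.group(1))


def extract_handles(html):
    """Return the Twitter handles linked from an HTML string."""
    parser = HandleParser()
    parser.feed(html)
    return parser.handles


def download_html(url):
    """Download a page; some sites reject requests without a User-Agent."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")
```

With these two helpers, extract_handles(download_html(url)) would return the list of handles for whatever article URL is passed in.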

This may look like a ton of code just to get 100 Twitter handles, but the majority of it is boilerplate. In fact, you can use my web scraping boilerplate to kickstart any project that requires web scraping (if you have no experience with web scraping, see this video by Data Science Dojo).

Now that we have all 100 Twitter handles, it’s time to get the Tweets associated with each handle. I’ve found the Twitter API to be the most reliable way of accessing a user’s Tweets, even though it only returns roughly the last 3,000 Tweets from each user. In the past I’ve tried web scraping Twitter to get more than 3,000 Tweets from an account, but my results were spotty at best.

After some Googling, I found this GitHub Gist: a Python script that downloads a user’s Tweets and saves them in a CSV file. After modifying it to work with Python 3 and scrape 100 accounts in one go, I’m left with this:
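The core of such a script is a paging loop plus a CSV writer. The sketch below shows only that pattern and is not the modified Gist itself: fetch_page is a hypothetical stand-in for the authenticated Twitter API call, and the <handle>_tweets.csv file name is an assumption.

```python
import csv
import os


def fetch_all_tweets(handle, fetch_page):
    """Page backwards through a timeline until nothing more is returned.

    fetch_page(handle, max_id) is a hypothetical stand-in for the real
    Twitter API call. It should return a list of dicts with the keys
    "id", "created_at" and "text", and accept max_id=None for the
    first page.
    """
    tweets = []
    max_id = None
    while True:
        page = fetch_page(handle, max_id)
        if not page:
            break
        tweets.extend(page)
        # Next, ask only for Tweets strictly older than the oldest one seen
        max_id = min(tweet["id"] for tweet in page) - 1
    return tweets


def save_tweets(handle, tweets, directory="."):
    """Write one CSV per account with the columns id, created_at, text."""
    # Assumption: files are named <handle>_tweets.csv
    path = os.path.join(directory, f"{handle}_tweets.csv")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f, fieldnames=["id", "created_at", "text"], extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(tweets)
    return path
```

Calling both functions once per handle inside a loop would then produce one CSV file per account.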

Running the script on all 100 handles results in an error. It turns out the article used Peter Todd’s old Twitter handle (@petertoddbtc), which is set to private, causing the Twitter API to throw the error. So let’s change the outdated handle to Peter’s current handle (@peterktodd), and now we’re off to the races.

After running for an hour, the script produces 100 CSV files (40 MB in total) containing ~300,000 Tweets:

Tweets saved in CSV format

Each of the 100 CSV files has the headings “id”, “created_at”, and “text”:

Andreas Antonopoulos’s (@aantonop) Twitter feed in CSV format

In this project, we’re mainly concerned with the “text” column since it contains the actual Tweets.

Grouping Scraped Tweets

Now that we’ve downloaded all the Tweets, we need an efficient way of referencing them. Let’s do this by creating a DataFrame using the Pandas library:
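A minimal sketch of this grouping step might look like the following. The layout is an assumption, not necessarily the one used in the original: one row per account, with a “Handle” column and a “Tweets” column holding the list of Tweet texts, read from files named <handle>_tweets.csv. The function name build_accounts_frame is hypothetical.

```python
import glob
import os

import pandas as pd


def build_accounts_frame(csv_dir):
    """Build one row per account: its handle and a list of Tweet texts.

    Assumes files are named <handle>_tweets.csv and have a "text" column.
    """
    suffix = "_tweets.csv"
    rows = []
    for path in sorted(glob.glob(os.path.join(csv_dir, "*" + suffix))):
        # Recover the handle by stripping the suffix from the file name
        handle = os.path.basename(path)[: -len(suffix)]
        texts = pd.read_csv(path, usecols=["text"])["text"].dropna().astype(str)
        rows.append({"Handle": handle, "Tweets": texts.tolist()})
    return pd.DataFrame(rows, columns=["Handle", "Tweets"])
```

Storing each account’s Tweets as a single list keeps the frame at 100 rows, which makes per-person calculations a simple row-wise operation.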

This results in the following DataFrame:

The first 11 entries in the newly created DataFrame

We now have a convenient representation of all 100 persons of interest, as well as their corresponding Tweets. That does it for preprocessing. Now it’s time for data analysis (the fun part).

Data Analysis

Background Information

Bitcoin Cash is a hard fork of Bitcoin with certain (controversial) changes meant to reduce transaction fees. Some in the Bitcoin community find Bitcoin Cash so abhorrent that they’ve taken to calling it “bcash” to remove “Bitcoin” from its name entirely.

Methodology

I’m assuming that the more often someone refers to Bitcoin Cash as “bcash”, the more animosity they feel towards it. Therefore, using the previously created DataFrame, I can write a function that calculates the proportion of Bitcoin Cash mentions that are “bcash” for each of the 100 Twitter accounts:
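One way to sketch such a function is shown below. The set of spellings counted as a Bitcoin Cash mention (“bcash”, “Bitcoin Cash”, “bitcoincash”, “BCH”) is an assumption, as is the choice to count every occurrence rather than every Tweet. The name bch_animosity is hypothetical.

```python
import re

# Assumption: these spellings cover references to Bitcoin Cash
BCASH_RE = re.compile(r"\bbcash\b", re.IGNORECASE)
OTHER_RE = re.compile(r"\b(?:bitcoin\s?cash|bch)\b", re.IGNORECASE)


def bch_animosity(tweets):
    """Return (score, mentions) for a collection of Tweet texts.

    score is the share of Bitcoin Cash mentions written as "bcash".
    An account that never mentions Bitcoin Cash gets a score of 0.0,
    since it never used the term "bcash".
    """
    tweets = list(tweets)
    bcash = sum(len(BCASH_RE.findall(text)) for text in tweets)
    other = sum(len(OTHER_RE.findall(text)) for text in tweets)
    mentions = bcash + other
    score = bcash / mentions if mentions else 0.0
    return score, mentions
```

The word boundaries keep “bch” from matching inside longer words, while still catching forms such as “#bcash” and “$BCH”.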

This function gives each individual a BCH Animosity score between 0 and 1. A score of 1 indicates that all mentions of Bitcoin Cash were “bcash” whereas a score of 0 indicates that the individual never used the term “bcash”. Thus, the closer an individual’s score is to 1, the more animosity they feel towards Bitcoin Cash. Now let’s actually call this function and add the resulting data points to our DataFrame:
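A minimal sketch of that step, assuming the DataFrame has a “Tweets” column of lists and that the scoring function returns a (score, mentions) tuple for each list. The helper name add_bch_columns and the score_fn parameter are hypothetical.

```python
import pandas as pd


def add_bch_columns(df, score_fn):
    """Return a copy of df with "BCH Animosity" and "BCH Mentions" added.

    score_fn is assumed to map a list of Tweet texts to a
    (score, mentions) tuple.
    """
    results = [score_fn(tweets) for tweets in df["Tweets"]]
    out = df.copy()
    out["BCH Animosity"] = [score for score, _ in results]
    out["BCH Mentions"] = [mentions for _, mentions in results]
    return out
```

Returning a copy leaves the original DataFrame untouched, so the scoring can be re-run with a different function without rebuilding the data.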

The DataFrame now looks like this:

We now have two new columns: “BCH Animosity”, a score between 0 and 1, and “BCH Mentions”, the number of times an individual referenced Bitcoin Cash.

After removing individuals who made fewer than 30 mentions of Bitcoin Cash, we can rank the individuals with the highest BCH Animosity Scores:
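The filter and ranking can be sketched as follows, assuming the “BCH Animosity” and “BCH Mentions” columns described above. The function name rank_by_animosity and its parameters are hypothetical.

```python
import pandas as pd


def rank_by_animosity(df, min_mentions=30, top_n=10):
    """Drop accounts with fewer than min_mentions, then rank by score."""
    frequent = df[df["BCH Mentions"] >= min_mentions]
    ranked = frequent.sort_values("BCH Animosity", ascending=False)
    return ranked.head(top_n).reset_index(drop=True)
```

The mention threshold matters because a score built on a handful of mentions is noisy: a single “bcash” in an account’s only mention would otherwise yield a perfect score of 1.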

This results in the following:

Results

Thus, the individuals with the greatest animosity towards Bitcoin Cash are:

  1. Ansel Lindner (@AnselLindner)
  2. Francis Pouliot (@francispouliot_)
  3. Peter Todd (@peterktodd)
  4. Josh Olszewicz (@CarpeNoctom)
  5. Alan Silbert (@alansilbert)
  6. Pierre Rochard (@pierre_rochard)
  7. Tone Vays (@ToneVays)
  8. Jameson Lopp (@lopp)
  9. Adam Back (@adam3us)
  10. Charlie Lee (@SatoshiLite)

Contact Me

Feel free to DM/Tweet at me with any ideas/tips/feedback.
