codeburst

Bursts of code to power through your day. Web Development articles, tutorials, and news.

Follow publication

Exploring Facebook data at the command-line (Part II)

--

Image modified from the freepik.com

In this project, we’re going to mine a data set generated by using a Facebook scraper on a particular Facebook page (undisclosed).

We use a data set generated by using a simple Facebook scraper. Image via unsplash.com

The goal of this experiment is to find the most vibrant status message on that page, with just one Bash command. You should download the data from below. Let’s save the data as: facebookdata.csv.

wget https://www.scientificprogramming.io/datasets/facebookdata.csv

In the previous (part I) lesson, we have learned how the csvcut command can help us to cut a given set of columns (e.g., 1,4,7–11). Note that we have not previewed the column numbers 2 and 3, which are wider columns and wouldn’t fit properly into our preview-screen above!

Count the no. of uniqe status types @ Bash Shell

Let us now calculate how many entries for each of the status types in the dataset. Status types are in the column 4. To extract column #4 from our file, we can make use of the csvcut again as follows:

$ csvcut -c 4 facebookdata.csv | sort | uniq -c

Here, the command-line option -c specifies which column to extract (or cut out). Note that, despite its name, the cut command does not modify the original file it acts on. Now, we would like to count how many types of entry came from each status type. Here we need the command uniq -c to count (hence the -c ) how many unique appearances of each status type. However, uniq -c requires the input to be ‘sorted’, so the first step was to sort the list of status types.

Find the most popular status entry!

To do this analysis efficiently, we’ll use the command line language called awk , a tool that allows you to filter, extract and transform data files. awk is a very useful tool to put in your bag of tricks. To start, let’s look at a very simple awk program to output every line of our facebook.csv file, where we specify the delimiter of the file (comma) using the -F option.

Since the data set has quoted (”text”) cells we will use csvcut to extract the required columns, e.g., we want to extract the column 1,8–15 into a file called fbreactions.csv first.

csvcut -c 2,8-15 facebookdata.csv > fbreactons.csv

The idea is to sum-up all the reactions: like, share, love, etc. (columns 8 + … + 15) on each FB status and then find the status which had the maximum number of reactions. To calculate the total number of reactions on each entry (status), all we need to do is horizontally add up all the numbers from the columns 8–15 and we do this easily with awk, as follows:

$ awk -F "," '{ total = total + $2 + $3 + $4 + \  
$5 + $6 +$7 +$8 +$9; print $1"," \
total; total=0 }' fbreactons.csv |head

Let’s pay attention to the awk statetment, which not only sums up the columns side by side, but also on each line prints two output (status id and total number of reaction on that row). Finally, at the end of each iteration, it nulls the total=0.

Find the Status with #MAX reactions

To get the status with maximum reactions, next, we sort the status ids, based on the number of reactions (column 2) using the sort -n -r -t”,” -k 2 function, which tells the system to sort out the piped (|) output numerically (-n), on the column 2 (-k 2) which is delimited by a comma (,):

$ awk -F "," '{ total = total \
+ $2 + $3 + $4 + $5 + $6 +$7 +$8 +$9; \
print $1"," total; total=0 }' \
fbreactons.csv | sort -n -r -t"," -k 2 \
| head -n 1

The final output, tells us that the status id: 7331091005.. had the maximum number of reaction of total 668121. If we now use grep, we can easily find the message which had the largest number of reactions.

cat facebookdata.csv | grep 7331091005_10154089857531006

This way find the most popular message on that page which was:

LeBron and the Cavs are tired of being bullied

How interesting it is?

[This project is a part of the ‘Learn of Analyze Data in Bash Shell and Linux’ course.]

Related works

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

--

--

Published in codeburst

Bursts of code to power through your day. Web Development articles, tutorials, and news.

No responses yet

Write a response