How to explore Facebook data at the command-line? (Part I — Data preview)

Now that Facebook.com has more than a billion active users, it has become a hub for personal, product and corporate branding. Companies want to understand what people think about topics related to their business, so that they can make their products and marketing more relevant to their customers. One way to do this is to analyse a company’s Facebook pages and use the findings to make marketing content more relevant.
Before we go any further, let’s set up our working environment by creating a folder on the Desktop. Assuming a Linux-based OS (e.g., Ubuntu), open a terminal, then create the analysis folder and move into it:
cd ~/Desktop
mkdir FBdata
cd FBdata
This will create a folder FBdata on your Desktop. Next, we download the data. In this project, we’re going to mine a data set generated by running a Facebook scraper on a particular (undisclosed) Facebook page.

The goal of this experiment is to find the most vibrant status message on that page, with just one Bash command. Download the data with the command below, which saves it as facebookdata.csv.
wget https://www.scientificprogramming.io/datasets/facebookdata.csv
Learning objectives
By completing this, you will learn to use the following Bash commands:
head – output the first part of files
tail – opposite to head
cat – concatenate and print files
sort – sort file contents
grep – search the input files for lines containing a match to a given pattern list
uniq – remove duplicate entries
awk – programming language
Bash functions
Preview

This dataset is small (a toy example), and we could in principle open it in a text editor or in Excel. However, real-world datasets are often larger and cumbersome to open in their entirety. Instead, let’s get a sneak peek at the data using the command csvstat from the csvkit tool (pip install csvkit).
Stats
Listing the column names and their indices:
$ csvstat -n facebookdata.csv
Output:
1: status_id
2: status_message
3: link_name
4: status_type
5: status_link
6: status_published
7: num_reactions
8: num_comments
9: num_shares
10: num_likes
11: num_loves
12: num_wows
13: num_hahas
14: num_sads
15: num_angrys
Counting the rows:
$ csvstat --count facebookdata.csv
Output:
Row count: 3222
So the dataset has a total of 15 columns and 3222 rows.
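For comparison, the same two counts can be approximated with the core tools from the learning objectives. This is only a sketch: it builds a tiny stand-in file (sample.csv, a hypothetical name with made-up values) so that it runs on its own, and it assumes that no field contains a comma or a line break. That assumption may not hold for free-text columns such as status_message, which is why csvstat is the safer choice on the real file.

```shell
# Build a tiny stand-in file so the sketch runs on its own (made-up data).
cat > sample.csv <<'EOF'
status_id,status_type,num_reactions
1,photo,10
2,video,25
3,link,7
EOF

# Column count: split the header line on commas and count the fields.
ncols=$(head -n 1 sample.csv | awk -F',' '{ print NF }')

# Row count: count every line after the header.
# tr strips the padding that some wc implementations add.
nrows=$(tail -n +2 sample.csv | wc -l | tr -d ' ')

echo "columns: $ncols, rows: $nrows"
```

To try it on the real data, replace sample.csv with facebookdata.csv and compare the result with the csvstat output; a mismatch would suggest quoted fields that this naive split does not handle.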
Data Preview
This is often the first thing to do when you get your hands on new data; previewing it is important to get a sense of what it contains, how it is organized, and whether the data makes sense in the first place. To preview the data, we can use the commands head, csvlook and csvcut:
$ csvcut -c 1,4,7-11 facebookdata.csv | csvlook | head -n 50
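If csvkit is not installed, a rough equivalent can be sketched with cut and head alone. The file sample.csv and its values are hypothetical stand-ins so that the snippet runs on its own. Unlike the CSV-aware command above, cut splits on every comma, so this is only safe when no field contains a quoted comma.

```shell
# Build a tiny stand-in file so the sketch runs on its own (made-up data).
cat > sample.csv <<'EOF'
status_id,status_message,link_name,status_type,num_reactions
1,hello,none,photo,10
2,world,none,video,25
EOF

# Keep columns 1, 4 and 5 (skipping the wide text columns),
# then show only the first two lines of the result.
preview=$(cut -d',' -f1,4,5 sample.csv | head -n 2)
echo "$preview"
```

The -f list accepts ranges as well, so -f1,4,7-11 would select the same columns as the command above.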

The csvcut command cuts (extracts) a given set of columns (here 1,4,7-11). Note that we have not previewed columns 2 and 3 (status_message, link_name), which are wide and wouldn’t fit properly into the preview above. See you soon in Part II ⏰!