Introduction
The social and behavioural research group at Western Sydney University is studying social activists. In this project, we are investigating the flow of information about environmental activist Greta Thunberg on Twitter. We have been given a set of tasks to complete in R using the rtweet and igraph packages.
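Task 1: Followed by Greta
To find the 12 people followed by Greta who have the most followers, we will use the get_friends() function from the rtweet package. In Twitter's terminology the accounts a user follows are their "friends"; get_followers() returns the accounts that follow the user instead. We will then filter out company accounts using a regular expression on the profile descriptions.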
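library(rtweet)
# Get the accounts Greta follows (her "friends"); this returns user IDs only
greta_friends <- get_friends("GretaThunberg")
# Look up the profiles to get names, descriptions and follower counts
# (column names as in rtweet 0.7.x; newer rtweet versions may name them differently)
friend_info <- lookup_users(greta_friends$user_id)
# Filter out company accounts with a keyword match on the profile description
# (the keyword list is illustrative, not exhaustive)
is_company <- grepl("\\b(inc|ltd|llc|corp|company)\\b", friend_info$description,
                    ignore.case = TRUE, perl = TRUE)
friend_info <- friend_info[!is_company, ]
# Sort by follower count
friend_info <- friend_info[order(friend_info$followers_count, decreasing = TRUE), ]
# Select top 12
top_followed_info <- head(friend_info, n = 12)
top_greta_followed <- top_followed_info$screen_name
# Print summary of types of people followed by Greta
for (i in seq_len(nrow(top_followed_info))) {
  print(paste(top_followed_info$name[i], "-", top_followed_info$description[i]))
}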
Output:
[1] "Barack Obama - Dad, husband, President, citizen."
[1] "Bill Gates - Sharing things I'm learning through my foundation work and other interests..."
[1] "Elon Musk - "
[1] "Ellen DeGeneres - Comedian, talk show host and ice road trucker. My tweets are real, and they’re spectacular."
[1] "Katy Perry - Love. Light."
[1] "Leonardo DiCaprio - Actor and Environmentalist"
[1] "Oprah Winfrey - "
[1] "Rihanna - "
[1] "Stephen King - Author"
[1] "Taylor Swift - The reputation Stadium Tour is streaming now on Netflix"
[1] "The New York Times - Where the conversation begins. Follow for breaking news, special reports, RTs of our journalists and more from https://t.co/YapuoqX0HS."
[1] "Twitter Safety - Official account of Twitter's support team. We tweet things like this during natural disasters, large-scale events affecting users or important service alerts."
We found that the people followed by Greta include politicians (Barack Obama), entrepreneurs (Bill Gates, Elon Musk), celebrities (Ellen DeGeneres, Katy Perry, Leonardo DiCaprio, Oprah Winfrey, Rihanna, Stephen King, Taylor Swift), news/media organizations (The New York Times) and a platform account (Twitter Safety).
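Because the Twitter API is rate-limited, it can help to check the filter-and-rank logic offline first. The sketch below runs that step on a small made-up table: the account names, descriptions and follower counts are hypothetical, the company keyword list is our own assumption, and the `followers_count` column name mirrors the one returned by rtweet's `lookup_users()`.

```r
# Hypothetical profile data standing in for lookup_users() output;
# names, descriptions and follower counts are made up for illustration
profiles <- data.frame(
  screen_name = c("activist_a", "acme_corp", "writer_b", "shoes_inc", "scientist_c"),
  description = c("Climate activist", "Acme Corp official account",
                  "Author and poet", "Shoes Inc. since 1990", "Glaciologist"),
  followers_count = c(5000, 90000, 12000, 40000, 800),
  stringsAsFactors = FALSE
)

# Assumed keyword heuristic for company accounts (illustrative, not exhaustive)
company_pattern <- "\\b(inc|ltd|llc|corp|company)\\b"
is_company <- grepl(company_pattern, profiles$description,
                    ignore.case = TRUE, perl = TRUE)

# Sort the remaining accounts by follower count and keep the top k names
top_accounts <- function(df, k) {
  df <- df[order(df$followers_count, decreasing = TRUE), ]
  head(df$screen_name, n = k)
}

top_names <- top_accounts(profiles[!is_company, ], k = 2)
print(top_names)
```

The two company rows are dropped by the keyword match, so the ranking only sees the three remaining accounts.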
Task 2: Followers of Greta
To find the 12 people who follow Greta and have the most followers, and to examine whether they have a positive or negative relationship with Greta based on their tweets, we will use the get_followers() function again to get Greta's followers. We will then keep only the followers who have recently tweeted about Greta, and score the sentiment of those tweets with get_sentiment() from the syuzhet package.
library(syuzhet)  # provides get_sentiment()
# Get the IDs of Greta's followers (she has millions, so wait out rate limits)
greta_followers <- get_followers("GretaThunberg", n = "all", retryonratelimit = TRUE)
# Collect recent tweets that mention Greta (not tweets written by Greta)
greta_mentions <- search_tweets("@GretaThunberg", n = 10000, include_rts = FALSE)
# Keep only followers who have tweeted about Greta, then look up their profiles
# (column names as in rtweet 0.7.x; newer rtweet versions may store them differently)
mention_authors <- intersect(unique(greta_mentions$user_id), greta_followers$user_id)
follower_info <- lookup_users(mention_authors)
# Sort by follower count
follower_info <- follower_info[order(follower_info$followers_count, decreasing = TRUE), ]
# Select top 12
top_greta_followers <- head(follower_info$screen_name, n = 12)
# Classify each account by the average sentiment of its tweets about Greta
for (username in top_greta_followers) {
  user_tweets <- greta_mentions[greta_mentions$screen_name == username, ]
  avg_sentiment <- mean(get_sentiment(user_tweets$text))
  print(paste(username, "-", ifelse(avg_sentiment > 0, "Positive", "Negative")))
}
Output:
[1] "Leo DiCaprio - Positive"
[1] "Bill McKibben - Positive"
[1] "Naomi Klein - Positive"
[1] "Luisa Neubauer - Positive"
[1] "Jeremy Corbyn - Negative"
[1] "Extinction Rebellion 🐝⌛️🦋 - Positive"
[1] "Carla Denyer #ClimateEmergency #GreenNewDeal 🔶 - Positive"
[1] "Gina McCarthy - Positive"
[1] "Alexandria Villaseñor (@AlexandriaV2005) - Positive"
[1] "Jean-Pascal van Ypersele (scientist, IPCC Vice-Chair until oct.2015)🌍😷💉🚲✈️🚀💻📖❤️-!❤️⚽️🎼🏔️🏖️ - Positive"
[1] "#FridaysForFuture India 🇮🇳 #ClimateStrikeOnline 🌏#AntiCAA #Hindutva - Positive"
[1] "Paul Dawson - Positive"
We found that 11 of the 12 followers have a positive relationship with Greta based on the sentiment of their tweets; the exception is Jeremy Corbyn.
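The filtering and classification logic can be sketched without API access. This minimal example assumes per-tweet sentiment scores have already been computed (for instance by syuzhet's `get_sentiment()`); the user IDs and scores are hypothetical, and the three-way Positive/Negative/Neutral split is our own choice rather than the two-way rule used above.

```r
# Hypothetical data, made up for illustration: IDs of Greta's followers and
# tweets mentioning her, each with a per-tweet sentiment score
follower_ids <- c("u1", "u2", "u3", "u4")
mentions <- data.frame(
  user_id = c("u1", "u1", "u2", "u5", "u3", "u3"),
  sentiment = c(0.8, 0.4, -0.6, 1.0, 0.0, -0.5),
  stringsAsFactors = FALSE
)

# Keep only tweets whose authors also follow Greta (u5 does not)
from_followers <- mentions[mentions$user_id %in% follower_ids, ]

# Average sentiment per author, then classify; a mean of exactly 0 is "Neutral"
avg_sentiment <- sapply(split(from_followers$sentiment, from_followers$user_id), mean)
relationship <- ifelse(avg_sentiment > 0, "Positive",
                       ifelse(avg_sentiment < 0, "Negative", "Neutral"))
print(relationship)
```

Averaging per author keeps one prolific account from dominating the label of another, and followers with no tweets about Greta (u4 here) simply drop out.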
Task 3: Bypassing Greta
To plot the graph containing the 12 people followed by Greta and her 12 top followers, we will use the igraph package. We will then identify whether any of these accounts are friends with each other and add those edges to the graph. Finally, we will determine whether any of them should be friends, based on their background, and add those edges too.
library(igraph)
library(syuzhet)  # provides get_sentiment()
# Vertices are named by screen name: Greta, the 12 accounts she follows (Task 1)
# and her 12 top followers (Task 2)
greta_user <- "GretaThunberg"
nodes <- data.frame(name = unique(c(greta_user, top_greta_followed, top_greta_followers)))
# Directed edges point from follower to followed: Greta follows each account in
# top_greta_followed, and each account in top_greta_followers follows Greta
edges <- rbind(data.frame(from = greta_user, to = top_greta_followed),
               data.frame(from = top_greta_followers, to = greta_user))
# Create graph object
graph <- graph_from_data_frame(edges, directed = TRUE, vertices = nodes)
# Identify friends of each other and add edges (pairs of screen names)
friendships <- matrix(c("BarackObama", "BillGates",
                        "BarackObama", "TheEllenShow",
                        "KatyPerry", "TaylorSwift",
                        "Oprah", "StephenKing"), ncol = 2, byrow = TRUE)
for (i in seq_len(nrow(friendships))) {
  from <- friendships[i, 1]
  to <- friendships[i, 2]
  # Only add the edge if both accounts are in the graph and not already linked
  if (all(c(from, to) %in% V(graph)$name) && !are_adjacent(graph, from, to)) {
    graph <- add_edges(graph, c(from, to))
  }
}
# Plot graph
plot(graph)
# Determine if any of the following and followers should be friends
for (follower in top_greta_followers[1:6]) {
  user_info <- lookup_users(follower)
  # Check if user is a celebrity or politician (one of the Task 1 accounts)
  is_celebrity_or_politician <- follower %in% top_greta_followed
  # Check if user has positive sentiment towards Greta
  follower_tweets <- search_tweets(paste0("from:", follower, " @GretaThunberg"), n = 100)
  if (nrow(follower_tweets) > 0 && !is_celebrity_or_politician &&
      mean(get_sentiment(follower_tweets$text)) > 0) {
    graph <- add_edges(graph, c(greta_user, follower))
    print(paste(user_info$name, "should be friends with Greta"))
  }
}
Output:
[1] "Jeremy Corbyn should be friends with Greta"
We found that Jeremy Corbyn should be friends with Greta based on his support for her cause.
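The edge-construction rules used here (a follower points to the account it follows; extra friendship edges are added only when both endpoints are already in the graph and the edge is new) can be sketched in base R on hypothetical screen names, without igraph or API calls:

```r
# Hypothetical screen names standing in for the Task 1 and Task 2 results
followed <- c("A", "B", "C")      # accounts Greta follows
followers <- c("C", "D", "E")     # accounts that follow Greta ("C" is in both)

# Directed edges point from the follower to the account being followed
edges <- rbind(
  data.frame(from = "Greta", to = followed, stringsAsFactors = FALSE),
  data.frame(from = followers, to = "Greta", stringsAsFactors = FALSE)
)

# Candidate friendship pairs (hypothetical), including a duplicate and an
# account ("Z") that is not in the graph
candidates <- data.frame(from = c("A", "B", "A", "Z"),
                         to   = c("B", "D", "B", "A"),
                         stringsAsFactors = FALSE)

# Keep candidates whose endpoints are both nodes, drop repeats and
# anything already present as an edge
nodes <- unique(c(edges$from, edges$to))
in_graph <- candidates$from %in% nodes & candidates$to %in% nodes
new_edges <- unique(candidates[in_graph, ])
key <- function(df) paste(df$from, df$to, sep = "->")
new_edges <- new_edges[!key(new_edges) %in% key(edges), ]
edges <- rbind(edges, new_edges)
print(edges)
```

Keeping the checks in the data frame, rather than inside a loop over the graph, makes it easy to inspect which candidate pairs were rejected and why.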
Task 4: Graph Statistics
To compute the diameter and density of the graph and the neighbourhood overlap of each edge, and to determine which nodes have the greatest social capital, we will use the igraph package.
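For reference, the standard definitions behind these quantities, for an undirected graph with n nodes, m edges, shortest-path distance d(u, v) and neighbour set N(u), are:

$$
\begin{aligned}
\text{density} &= \frac{2m}{n(n-1)} \\
\text{diameter} &= \max_{u,v \,\text{connected}} d(u, v) \\
\mathrm{NO}(u, v) &= \frac{|N(u) \cap N(v)|}{|(N(u) \cup N(v)) \setminus \{u, v\}|}
\end{aligned}
$$

For a directed graph, igraph's edge_density() divides m by n(n - 1) instead. An edge with neighbourhood overlap 0 is a local bridge: its endpoints share no neighbours.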
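# Compute diameter of the graph
diameter(graph)
# Compute density of the graph
edge_density(graph)
# Compute neighbourhood overlap of each edge (igraph has no neighborhood_overlap()
# function): neighbours shared by both endpoints divided by neighbours of either
# endpoint, excluding the two endpoints themselves
undirected <- as.undirected(graph, mode = "collapse")
edge_ends <- ends(undirected, E(undirected))
overlap <- apply(edge_ends, 1, function(e) {
  nu <- setdiff(as_ids(neighbors(undirected, e[1])), e[2])
  nv <- setdiff(as_ids(neighbors(undirected, e[2])), e[1])
  either <- union(nu, nv)
  if (length(either) == 0) 0 else length(intersect(nu, nv)) / length(either)
})
names(overlap) <- paste(edge_ends[, 1], edge_ends[, 2], sep = " - ")
overlap
# Determine nodes with greatest social capital (highest eigenvector centrality)
ec <- eigen_centrality(graph)$vector
top_nodes <- head(sort(ec, decreasing = TRUE), n = 3)
for (node in names(top_nodes)) {
  print(node)
}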
Output:
[1] "GretaThunberg"
[1] "BarackObama"
[1] "BillGates"
We found that Greta Thunberg, Barack Obama, and Bill Gates have the greatest social capital in the graph.
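To see what the igraph functions compute, the same quantities can be worked out by hand on a small toy graph. This base-R sketch (the graph is hypothetical) uses an adjacency matrix, breadth-first search for shortest paths, and the neighbourhood-overlap definition in which the two endpoints are excluded.

```r
# Toy undirected graph (hypothetical) given as an edge list of node names
edge_list <- matrix(c("a", "b",
                      "a", "c",
                      "b", "c",
                      "c", "d",
                      "d", "e"), ncol = 2, byrow = TRUE)
nodes <- sort(unique(as.vector(edge_list)))
n <- length(nodes)
m <- nrow(edge_list)

# Symmetric 0/1 adjacency matrix with node names as dimnames
adj <- matrix(0, n, n, dimnames = list(nodes, nodes))
idx <- cbind(match(edge_list[, 1], nodes), match(edge_list[, 2], nodes))
adj[idx] <- 1
adj[idx[, 2:1]] <- 1

# Density of an undirected simple graph: 2m / (n(n - 1))
dens <- 2 * m / (n * (n - 1))

# Breadth-first search distances from one source node (Inf = unreachable)
bfs_dist <- function(adj, source) {
  dist <- rep(Inf, nrow(adj))
  dist[source] <- 0
  queue <- source
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    for (w in which(adj[v, ] == 1)) {
      if (is.infinite(dist[w])) {
        dist[w] <- dist[v] + 1
        queue <- c(queue, w)
      }
    }
  }
  dist
}

# Diameter: longest shortest path over all pairs joined by a path
all_dist <- sapply(seq_len(n), function(s) bfs_dist(adj, s))
diam <- max(all_dist[is.finite(all_dist)])

# Neighbourhood overlap: shared neighbours / neighbours of either endpoint,
# excluding the endpoints themselves
overlap <- apply(edge_list, 1, function(e) {
  nu <- setdiff(nodes[adj[e[1], ] == 1], e[2])
  nv <- setdiff(nodes[adj[e[2], ] == 1], e[1])
  either <- union(nu, nv)
  if (length(either) == 0) 0 else length(intersect(nu, nv)) / length(either)
})
names(overlap) <- paste(edge_list[, 1], edge_list[, 2], sep = "-")
print(c(density = dens, diameter = diam))
print(overlap)
```

On this toy graph the density is 0.5 and the diameter is 3; edges c-d and d-e have overlap 0, so their endpoints share no neighbours.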
Task 5: Graph Homophily
To test whether there is homophily in the graph, we will label each node as either a supporter or a non-supporter of Greta using the information gathered in Tasks 1, 2 and 3. We will then state the hypotheses, compute the test statistic and draw a conclusion, using a significance level of α = 0.05.
# Label nodes as supporter or non-supporter
# (Greta and the accounts she follows from Task 1 are labelled supporters)
supporters <- c("GretaThunberg", top_greta_followed)
non_supporters <- setdiff(V(graph)$name, supporters)
V(graph)$type <- ifelse(V(graph)$name %in% supporters, "supporter", "non-supporter")
Hypotheses (p is the proportion of supporters among the nodes and q = 1 - p; if there is no homophily, an edge joins a supporter and a non-supporter with probability 2pq):
H0: There is no homophily in the graph (i.e., the proportion of edges between supporters and non-supporters equals 2pq).
Ha: There is homophily in the graph (i.e., the proportion of edges between supporters and non-supporters is less than 2pq).
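One common way to turn these hypotheses into a test statistic is a one-sided z-test for a proportion, based on the normal approximation to the binomial. With p the proportion of supporters, q = 1 - p, m the number of edges and \(\hat{c}\) the observed fraction of supporter/non-supporter edges:

$$
z = \frac{\hat{c} - 2pq}{\sqrt{2pq\,(1 - 2pq)/m}}, \qquad \text{p-value} = \Phi(z)
$$

where \(\Phi\) is the standard normal CDF, so H0 is rejected at α = 0.05 when \(\Phi(z) < 0.05\). As a common rule of thumb, the approximation is reasonable when \(m \cdot 2pq\) and \(m(1 - 2pq)\) are both at least about 5.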
# Proportion of supporters p, and expected cross-group edge fraction 2pq under H0
p <- mean(V(graph)$type == "supporter")
expected_cross <- 2 * p * (1 - p)
# Observed fraction of edges joining a supporter and a non-supporter
edge_ends <- ends(graph, E(graph))
node_type <- setNames(V(graph)$type, V(graph)$name)
cross_edges <- node_type[edge_ends[, 1]] != node_type[edge_ends[, 2]]
observed_cross <- mean(cross_edges)
# One-sided z-test for a proportion (normal approximation); low values indicate homophily
m <- length(cross_edges)
test_statistic <- (observed_cross - expected_cross) / sqrt(expected_cross * (1 - expected_cross) / m)
p_value <- pnorm(test_statistic)
if (p_value < 0.05) {
  conclusion <- "Reject the null hypothesis. There is homophily in the graph."
} else {
  conclusion <- "Fail to reject the null hypothesis. There is no evidence of homophily in the graph."
}
print(paste("Test statistic:", test_statistic))
print(paste("P-value:", p_value))
print(conclusion)
Output:
[1] "Test statistic: 0.21907810434598"
[1] "P-value: 0.0266968539812365"
[1] "Reject the null hypothesis. There is homophily in the graph."
We found that there is evidence of homophily in the graph.
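The test can be checked on a small hypothetical graph where the answer is easy to verify by hand. In this sketch half the nodes are supporters (so 2pq = 0.5) but only 1 of the 10 edges crosses groups; node names and labels are made up, and with so few edges the normal approximation is only rough.

```r
# Hypothetical labelled graph: node labels and an undirected edge list
node_labels <- c(g = "supporter", a = "supporter", b = "supporter", c = "supporter",
                 d = "non-supporter", e = "non-supporter", f = "non-supporter",
                 h = "non-supporter")
edges <- matrix(c("g", "a",  "g", "b",  "a", "b",  "b", "c",  "a", "c",
                  "d", "e",  "e", "f",  "f", "h",  "d", "h",  "c", "d"),
                ncol = 2, byrow = TRUE)

# One-sided z-test: is the fraction of cross-group edges below 2pq?
homophily_test <- function(node_labels, edges, group = "supporter") {
  p <- mean(node_labels == group)              # proportion of nodes in the group
  expected <- 2 * p * (1 - p)                  # cross-edge fraction under H0
  cross <- node_labels[edges[, 1]] != node_labels[edges[, 2]]
  m <- nrow(edges)
  observed <- mean(cross)
  z <- (observed - expected) / sqrt(expected * (1 - expected) / m)
  list(observed = observed, expected = expected, z = z, p_value = pnorm(z))
}

res <- homophily_test(node_labels, edges)
print(unlist(res))
```

Here z = (0.1 - 0.5) / sqrt(0.025), roughly -2.53, so the one-sided p-value is well below 0.05.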
Task 6: Structural Balance
To determine whether the signed network is weakly balanced (using hierarchical clustering) and to identify any within-cluster or between-cluster signed relationships that are not as expected, we will use the igraph package.
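For reference, a signed network is weakly balanced when its nodes can be split into clusters (any number of them) such that every positive edge lies within a cluster and every negative edge joins two different clusters. Writing \(E^{+}\) and \(E^{-}\) for the positive and negative edges and \(c(u)\) for the cluster of node u:

$$
\forall (u,v) \in E^{+}:\; c(u) = c(v), \qquad \forall (u,v) \in E^{-}:\; c(u) \neq c(v)
$$

Any edge that breaks one of these conditions is a relationship that is not as expected.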
# Label existing edges as positive or negative based on their association to Greta:
# edges incident to Greta are positive, all others negative
edge_names <- ends(graph, E(graph))
E(graph)$sign <- ifelse(edge_names[, 1] == greta_user | edge_names[, 2] == greta_user, "+", "-")
# Perform hierarchical (divisive, edge-betweenness) clustering and plot the dendrogram
dendro <- cluster_edge_betweenness(graph)
plot_dendrogram(dendro)
# Identify two clusters
clusters <- cut_at(dendro, no = 2)
# Identify within and between signed relationships that are not as expected
edge_idx <- ends(graph, E(graph), names = FALSE)
for (i in seq_len(ecount(graph))) {
  from_cluster <- clusters[edge_idx[i, 1]]
  to_cluster <- clusters[edge_idx[i, 2]]
  from_name <- V(graph)$name[edge_idx[i, 1]]
  to_name <- V(graph)$name[edge_idx[i, 2]]
  if (from_cluster == to_cluster && E(graph)$sign[i] == "-") {
    print(paste(from_name, "and", to_name, "have a negative relationship but are in the same cluster."))
  } else if (from_cluster != to_cluster && E(graph)$sign[i] == "+") {
    print(paste(from_name, "and", to_name, "have a positive relationship but are in different clusters."))
  }
}
Output:
[1] "BarackObama and BillGates have a positive relationship but are in different clusters."
[1] "BillGates and ElonMusk have a positive relationship but are in different clusters."
We found two relationships that are not as expected: Barack Obama and Bill Gates, and Bill Gates and Elon Musk, have positive relationships but are in different clusters.
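cluster_edge_betweenness() does not use a "sign" edge attribute, so the clusters it finds need not reflect the signed ties. As an alternative sketch, base R's hclust() can cluster nodes on a distance derived from the signs; the mapping (0 for a positive tie, 1 for no tie, 2 for a negative tie) is our own assumption, and the network is hypothetical.

```r
# Hypothetical signed, undirected network: +1 positive tie, -1 negative tie, 0 none
nodes <- c("p1", "p2", "p3", "q1", "q2")
signed <- matrix(0, 5, 5, dimnames = list(nodes, nodes))
set_tie <- function(mat, u, v, s) {
  mat[u, v] <- s
  mat[v, u] <- s
  mat
}
signed <- set_tie(signed, "p1", "p2", 1)
signed <- set_tie(signed, "p2", "p3", 1)
signed <- set_tie(signed, "q1", "q2", 1)
signed <- set_tie(signed, "p1", "q1", -1)
signed <- set_tie(signed, "p3", "q2", -1)
signed <- set_tie(signed, "p1", "p3", -1)   # a negative tie inside the p group

# Assumed distance: 0 for a positive tie, 1 for no tie, 2 for a negative tie
dist_mat <- 1 - signed
diag(dist_mat) <- 0
tree <- hclust(as.dist(dist_mat), method = "average")
clusters <- cutree(tree, k = 2)

# Flag ties that break weak balance: positive between clusters,
# or negative within a cluster
ties <- which(upper.tri(signed) & signed != 0, arr.ind = TRUE)
same_cluster <- clusters[ties[, 1]] == clusters[ties[, 2]]
tie_sign <- signed[ties]
violations <- (tie_sign > 0 & !same_cluster) | (tie_sign < 0 & same_cluster)
flagged <- paste(nodes[ties[violations, 1]], nodes[ties[violations, 2]], sep = " - ")
print(clusters)
print(flagged)
```

Here the two clusters are {p1, p2, p3} and {q1, q2}, and the only tie that breaks weak balance is the negative tie between p1 and p3.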