Empirical example
Make sure that you have loaded the required libraries.
library(sna)
library(network)
Download and extract data
We are looking at the Lazega and others (2001) lawyers dataset, which is further described here: https://www.stats.ox.ac.uk/~snijders/siena/Lazega_lawyers_data.htm
This dataset is available online in zipped format.
temp <- tempfile()
download.file("https://www.stats.ox.ac.uk/~snijders/siena/LazegaLawyers.zip",temp)
advice.adj <- read.table(unz(temp, "ELadv.dat"))
attributes <- read.table(unz(temp, "ELattr.dat"))
unlink(temp)
The network advice.adj
is the \(71 \times 71\) adjacency matrix of a directed advice-seeking network. It was collected using the roster method, where each member of the law firm is presented with a list of all members
“Here is the list of all the members of your Firm.”
and then is asked
“Think back over the past year, consider all the lawyers in your Firm. To whom did you go for basic professional advice? For instance, you want to make sure that you are handling a case right, making a proper decision, and you want to consult someone whose professional opinions are in general of great value to you. By advice I do not mean simply technical advice.”
After reading in the matrix using read.table
it is formatted as a
class(advice.adj)
## [1] "data.frame"
and we want to transform it into a matrix
advice.adj <- as.matrix(advice.adj)
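Before going further, it can be worth checking that the object really looks like an adjacency matrix. The sketch below uses a small made-up matrix toy.adj and a hypothetical helper check.adjacency (neither is part of sna or network). It assumes a binary directed network without self-ties, which is an assumption about the data rather than something verified here.

```r
# Hypothetical helper: basic sanity checks for a binary adjacency matrix
check.adjacency <- function(adj) {
  c(square      = nrow(adj) == ncol(adj),   # same number of rows and columns
    binary      = all(adj %in% c(0, 1)),    # only 0/1 entries
    no.selfties = all(diag(adj) == 0))      # empty diagonal
}

# Toy 3 x 3 directed network (made-up data)
toy.adj <- matrix(c(0, 1, 0,
                    0, 0, 1,
                    1, 1, 0), nrow = 3, byrow = TRUE)
check.adjacency(toy.adj)
```

The same call, check.adjacency(advice.adj), could then be applied to the lawyer data.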
The attribute matrix attributes
is a \(71 \times 8\) case by variable data.frame
head(attributes)
## V1 V2 V3 V4 V5 V6 V7 V8
## 1 1 1 1 1 31 64 1 1
## 2 2 1 1 1 32 62 2 1
## 3 3 1 1 2 13 67 1 1
## 4 4 1 1 1 31 59 2 3
## 5 5 1 1 2 31 59 1 2
## 6 6 1 1 2 29 55 1 1
From the description (https://www.stats.ox.ac.uk/~snijders/siena/Lazega_lawyers_data.htm) of the dataset we know that the variables are the following
- seniority
- status (1=partner; 2=associate)
- gender (1=man; 2=woman)
- office (1=Boston; 2=Hartford; 3=Providence)
- years with the firm
- age
- practice (1=litigation; 2=corporate)
- law school (1=Harvard, Yale; 2=UConn; 3=other)
Add variable names to the data frame
names( attributes ) <- c('seniority','status','sex','office','tenure','age','practice','school')
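The categorical variables are stored as numeric codes. As a sketch, labelled factor versions can make tables and plots easier to read. The data frame toy.attr and the column names status.f and practice.f are made up for illustration; the labels follow the coding listed above.

```r
# Toy data frame using the same coding as the description (made-up rows)
toy.attr <- data.frame(status = c(1, 2, 2), practice = c(1, 2, 1))

# Labelled copies; the original numeric columns are left untouched
toy.attr$status.f   <- factor(toy.attr$status,   levels = 1:2,
                              labels = c("partner", "associate"))
toy.attr$practice.f <- factor(toy.attr$practice, levels = 1:2,
                              labels = c("litigation", "corporate"))
table(toy.attr$status.f)
```

Keeping the numeric columns alongside the factors means that code indexing columns by position, such as attributes[,7], would continue to work.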
Plot network
Plot the network and use the type of practice (litigation/corporate) to give nodes different colours.
plot( as.network(advice.adj), vertex.col = 1 + attributes[,7], vertex.cex = 2*degree(advice.adj)/max(degree(advice.adj)), vertex.border = NA )
There seems to be clear clustering by the type of law that the lawyers practise. Let’s count the number of ties among litigation lawyers, between litigation and corporate lawyers, and among corporate lawyers
sum(advice.adj[ attributes[,7] ==1 , attributes[,7] ==1 ])# lit to lit
## [1] 420
sum(advice.adj[ attributes[,7] ==1 , attributes[,7] ==2 ])# lit to corp
## [1] 106
sum(advice.adj[ attributes[,7] ==2 , attributes[,7] ==1 ])# corp to lit
## [1] 125
sum(advice.adj[ attributes[,7] ==2 , attributes[,7] ==2 ])# corp to corp
## [1] 241
Is there any evidence of homophily (McPherson, Smith-Lovin, and Cook 2001)?
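Raw tie counts depend on group sizes, so one way to approach this question is to divide each count by the number of possible ties in that block. The sketch below does this for a made-up four-node network; block.density is a hypothetical helper, not a function from sna, and it assumes a binary matrix with no self-ties.

```r
# Hypothetical helper: density of ties within and between groups
block.density <- function(adj, group) {
  groups <- sort(unique(group))
  labs <- as.character(groups)
  dens <- matrix(NA, length(groups), length(groups),
                 dimnames = list(from = labs, to = labs))
  for (g in groups) {
    for (h in groups) {
      ties <- sum(adj[group == g, group == h])
      n.g <- sum(group == g)
      n.h <- sum(group == h)
      # Self-ties are excluded, so a group of size n has n * (n - 1) possible ties
      possible <- if (g == h) n.g * (n.g - 1) else n.g * n.h
      dens[as.character(g), as.character(h)] <- ties / possible
    }
  }
  dens
}

# Toy network: nodes 1-2 in group 1, nodes 3-4 in group 2 (made-up data)
toy.adj <- matrix(c(0, 1, 0, 0,
                    1, 0, 0, 0,
                    0, 0, 0, 1,
                    1, 0, 1, 0), nrow = 4, byrow = TRUE)
toy.group <- c(1, 1, 2, 2)
block.density(toy.adj, toy.group)
```

For the lawyer data the corresponding call would be block.density(advice.adj, attributes[,7]). Within-group densities that are clearly higher than between-group densities would be consistent with homophily.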
Explaining popularity
par( mfrow = c(1,3))
plot(table(degree( advice.adj, cmode='indegree')),main='indegree')
plot(table(degree( advice.adj, cmode='outdegree')),main='outdegree')
hist(betweenness( advice.adj), main = 'betweenness')
Are there differences in betweenness across the categorical variables?
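To make the two degree measures concrete, the sketch below computes them in base R for a made-up matrix. It assumes the usual convention that entry [i, j] equals 1 when i sends a tie to j, so that outdegree is a row sum and indegree is a column sum.

```r
# Toy directed network (made-up data): 1 -> 2, 1 -> 3, 2 -> 3
toy.adj <- matrix(c(0, 1, 1,
                    0, 0, 1,
                    0, 0, 0), nrow = 3, byrow = TRUE)

outdeg <- rowSums(toy.adj)  # ties sent by each node
indeg  <- colSums(toy.adj)  # ties received by each node
rbind(outdeg, indeg)
```

Under that convention, a high indegree in an advice network means that many colleagues report seeking advice from that person.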
par( mfrow = c(2,3))
boxplot(betweenness( advice.adj)~attributes$status,main='Status')
boxplot(betweenness( advice.adj)~attributes$sex,main='Sex')
boxplot(betweenness( advice.adj)~attributes$office,main='Office')
boxplot(betweenness( advice.adj)~attributes$practice,main='Practice')
boxplot(betweenness( advice.adj)~attributes$school,main='School')
Are there differences in betweenness across the numerical variables?
par( mfrow = c(1,3))
plot(betweenness( advice.adj)~attributes$seniority,main='seniority')
plot(betweenness( advice.adj)~attributes$tenure,main='tenure')
plot(betweenness( advice.adj)~attributes$age,main='age')
We can regress betweenness centrality on relevant attributes using ordinary least squares (OLS) regression
ans <- lm(betweenness( advice.adj ) ~attributes$seniority+attributes$tenure+attributes$age+attributes$sex+attributes$practice)
summary(ans)
##
## Call:
## lm(formula = betweenness(advice.adj) ~ attributes$seniority +
## attributes$tenure + attributes$age + attributes$sex + attributes$practice)
##
## Residuals:
## Min 1Q Median 3Q Max
## -103.68 -60.36 -32.08 25.10 290.97
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 441.571 124.427 3.549 0.000725 ***
## attributes$seniority -3.697 1.383 -2.672 0.009508 **
## attributes$tenure -1.361 2.632 -0.517 0.606791
## attributes$age -5.048 2.072 -2.437 0.017573 *
## attributes$sex 3.267 29.991 0.109 0.913588
## attributes$practice -4.907 23.299 -0.211 0.833845
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 96.32 on 65 degrees of freedom
## Multiple R-squared: 0.1444, Adjusted R-squared: 0.07857
## F-statistic: 2.194 on 5 and 65 DF, p-value: 0.06553
Interpretation
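The higher your value on the seniority variable, the less central you are in the network (significant at the 1% level). There is weaker evidence (significant only at the 5% level) that older people are less central than younger people, holding the other variables constant. Note that the overall F-test is not significant at the 5% level (p = 0.066), so these results should be read with some caution.
We can also do a simple two-group t-test to see if there is a difference in centrality between senior and junior lawyers. Divide people into two groups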
is.senior <- attributes$seniority>=mean(attributes$seniority)
table( is.senior )
## is.senior
## FALSE TRUE
## 35 36
Test the difference between the groups (assuming equal variances)
t.test(betweenness( advice.adj )~ is.senior, var.equal = TRUE)
##
## Two Sample t-test
##
## data: betweenness(advice.adj) by is.senior
## t = 2.4289, df = 69, p-value = 0.01775
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 9.993601 101.868486
## sample estimates:
## mean in group FALSE mean in group TRUE
## 108.33123 52.40019
For a simple guide for performing standard statistical tests, check https://www.statmethods.net/stats/index.html.
Note that both OLS and the t-test assume that observations are independent. Can you think of reasons why this might not be plausible in this case?
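One commonly suggested alternative for node-level comparisons is a permutation test, which builds the reference distribution by shuffling the group labels while keeping the centrality scores fixed. The sketch below uses made-up values; toy.score and toy.group stand in for betweenness(advice.adj) and is.senior, and mean.diff is a hypothetical helper. It avoids relying on the t-distribution, but whether it adequately handles dependence between actors remains a modelling judgement.

```r
set.seed(1)  # for reproducibility of the random permutations

# Made-up scores and grouping; stand-ins for betweenness and is.senior
toy.score <- c(12, 30, 7, 45, 3, 22, 9, 15)
toy.group <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE)

# Hypothetical helper: difference in mean score between the two groups
mean.diff <- function(score, group) mean(score[group]) - mean(score[!group])
obs <- mean.diff(toy.score, toy.group)

# Reference distribution: shuffle group labels, keep the scores fixed
perm <- replicate(2000, mean.diff(toy.score, sample(toy.group)))

# Two-sided p-value: share of permutations at least as extreme as observed
p.perm <- mean(abs(perm) >= abs(obs))
p.perm
```

The number of permutations (2000) is an arbitrary choice for illustration; more permutations give a more stable p-value.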
References
Lazega, Emmanuel, and others. 2001. The Collegial Phenomenon: The Social Mechanisms of Cooperation Among Peers in a Corporate Law Partnership. Oxford University Press.
McPherson, Miller, Lynn Smith-Lovin, and James M Cook. 2001. “Birds of a Feather: Homophily in Social Networks.” Annual Review of Sociology 27 (1): 415–44.