Empirical example

Make sure that you have loaded the required libraries.

library(sna)
library(network)

Download and extract data

We are looking at the Lazega and others (2001) lawyers dataset, which is further described here: https://www.stats.ox.ac.uk/~snijders/siena/Lazega_lawyers_data.htm

This dataset is available online in zipped format.

temp <- tempfile()
download.file("https://www.stats.ox.ac.uk/~snijders/siena/LazegaLawyers.zip", temp, mode = "wb") # binary mode keeps the zip intact
advice.adj <- read.table(unz(temp, "ELadv.dat"))
attributes <- read.table(unz(temp, "ELattr.dat"))
unlink(temp)

The network advice.adj is the \(71 \times 71\) adjacency matrix of a directed advice-seeking network. It was collected using the roster method, where each member of the law firm is presented with a list of all members:

“Here is the list of all the members of your Firm.”

and is then asked:

“Think back over the past year, consider all the lawyers in your Firm. To whom did you go for basic professional advice? For instance, you want to make sure that you are handling a case right, making a proper decision, and you want to consult someone whose professional opinions are in general of great value to you. By advice I do not mean simply technical advice.”

Because the matrix was read in using read.table, it is stored as a data frame:

class(advice.adj)

## [1] "data.frame"

and we want to transform it into a matrix

advice.adj <- as.matrix(advice.adj)
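Before analysing the matrix, it can help to confirm that it looks like a directed adjacency matrix. The following is a minimal sketch on a made-up three-node stand-in called toy.adj (an assumption for illustration, not the lawyers data); with the real data the same checks would be run on advice.adj.

```r
# Made-up stand-in for advice.adj: a 3-node directed network read as a data.frame
toy.adj <- as.matrix(data.frame(V1 = c(0, 1, 0),
                                V2 = c(1, 0, 1),
                                V3 = c(0, 0, 0)))

is.square    <- nrow(toy.adj) == ncol(toy.adj)   # one row and one column per node
is.binary    <- all(toy.adj %in% c(0, 1))        # ties are either absent or present
no.self.ties <- all(diag(toy.adj) == 0)          # nobody nominates themselves
is.symmetric <- isSymmetric(unname(toy.adj))     # FALSE suggests a directed network

print(c(square = is.square, binary = is.binary,
        no.self.ties = no.self.ties, symmetric = is.symmetric))
```

For advice.adj, dim(advice.adj) should return 71 71 if the file was read correctly.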

The object attributes is a \(71 \times 8\) case-by-variable data.frame

head(attributes)

##   V1 V2 V3 V4 V5 V6 V7 V8
## 1  1  1  1  1 31 64  1  1
## 2  2  1  1  1 32 62  2  1
## 3  3  1  1  2 13 67  1  1
## 4  4  1  1  1 31 59  2  3
## 5  5  1  1  2 31 59  1  2
## 6  6  1  1  2 29 55  1  1

From the description (https://www.stats.ox.ac.uk/~snijders/siena/Lazega_lawyers_data.htm) of the dataset we know that the variables are the following

seniority
status (1=partner; 2=associate)
gender (1=man; 2=woman)
office (1=Boston; 2=Hartford; 3=Providence)
years with the firm
age
practice (1=litigation; 2=corporate)
law school (1=Harvard or Yale; 2=UConn; 3=other)

Add variable names to the data frame

names( attributes ) <- c('seniority','status','sex','office','tenure','age','practice','school')
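The categorical variables are stored as numeric codes. If labelled categories are preferred in tables and plots, they can be recoded as factors using the coding listed above. This is a minimal sketch on a made-up data frame toy.attr (hypothetical, not the lawyers data); the new column names status.f and practice.f are also just illustrative choices.

```r
# Made-up attribute data using the same codes as the lawyers dataset
toy.attr <- data.frame(status = c(1, 1, 2), practice = c(1, 2, 2))

# Attach labels to the numeric codes, leaving the original columns untouched
toy.attr$status.f <- factor(toy.attr$status, levels = c(1, 2),
                            labels = c("partner", "associate"))
toy.attr$practice.f <- factor(toy.attr$practice, levels = c(1, 2),
                              labels = c("litigation", "corporate"))

# Cross-tabulation now shows labels instead of codes
print(table(toy.attr$status.f, toy.attr$practice.f))
```

Keeping the numeric columns alongside the factors means that code relying on the original coding, such as comparisons with == 1, continues to work.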

Plot network

Plot the network and use the type of practice (litigation/corporate) to give nodes different colours.

plot( as.network(advice.adj),
      vertex.col = 1 + attributes[,7],
      vertex.cex = 2*degree(advice.adj)/max(degree(advice.adj)),
      vertex.border = NA )

There seems to be a clear clustering on the type of law that the lawyers practice. Let’s count the number of ties between litigation lawyers, between litigation and corporate lawyers, and between corporate lawyers:

sum(advice.adj[ attributes[,7] ==1  , attributes[,7] ==1 ])# lit to lit

## [1] 420

sum(advice.adj[ attributes[,7] ==1  , attributes[,7] ==2 ])# lit to corp

## [1] 106

sum(advice.adj[ attributes[,7] ==2  , attributes[,7] ==1 ])# corp to lit

## [1] 125

sum(advice.adj[ attributes[,7] ==2  , attributes[,7] ==2 ])# corp to corp

## [1] 241

Is there any evidence of homophily (McPherson, Smith-Lovin, and Cook 2001)?
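Raw tie counts depend on how many lawyers there are in each group, so one way to approach the homophily question is to compare densities, that is, the number of ties divided by the number of possible ties in each block. The sketch below uses a hypothetical helper block_density and a made-up four-node network (neither is part of sna or of the lawyers data).

```r
# Hypothetical helper: density of ties from each group to each group
block_density <- function(adj, group) {
  groups <- sort(unique(group))
  labels <- as.character(groups)
  dens <- matrix(NA_real_, length(groups), length(groups),
                 dimnames = list(from = labels, to = labels))
  for (g in groups) {
    for (h in groups) {
      rows <- which(group == g)
      cols <- which(group == h)
      ties <- sum(adj[rows, cols])
      # Within a group, self-ties are not possible, so drop the diagonal cells
      self <- if (g == h) length(rows) else 0
      possible <- length(rows) * length(cols) - self
      dens[as.character(g), as.character(h)] <- ties / possible
    }
  }
  dens
}

# Made-up directed network: nodes 1-2 in group 1, nodes 3-4 in group 2
toy.adj <- matrix(c(0, 1, 0, 0,
                    1, 0, 0, 0,
                    0, 0, 0, 1,
                    1, 0, 1, 0), nrow = 4, byrow = TRUE)
toy.group <- c(1, 1, 2, 2)

toy.dens <- block_density(toy.adj, toy.group)
print(toy.dens)  # within-group densities are 1; between-group densities are 0 and 0.25
```

For the lawyers data the call would be block_density(advice.adj, attributes[,7]). Within-group densities that are clearly higher than between-group densities would be consistent with homophily on practice, although this comparison is descriptive and not a formal test.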

Explaining popularity

par( mfrow = c(1,3))
plot(table(degree( advice.adj, cmode='indegree')),main='indegree')
plot(table(degree( advice.adj, cmode='outdegree')),main='outdegree')
hist(betweenness( advice.adj), main = 'betweenness')
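To see what the indegree and outdegree measure, they can be computed directly from the adjacency matrix. The sketch below assumes the usual convention that rows are senders and columns are receivers, so that a 1 in row i, column j means that i goes to j for advice; toy.adj is a made-up four-node network, not the lawyers data.

```r
# Made-up directed network: entry [i, j] = 1 means i asks j for advice
toy.adj <- matrix(c(0, 1, 0, 0,
                    1, 0, 0, 0,
                    0, 0, 0, 1,
                    1, 0, 1, 0), nrow = 4, byrow = TRUE)

indeg  <- colSums(toy.adj)  # number of people who ask you for advice
outdeg <- rowSums(toy.adj)  # number of people you ask for advice
total  <- indeg + outdeg    # total degree, ignoring direction

print(rbind(indeg, outdeg, total))
# indeg  is 2 1 1 1
# outdeg is 1 1 1 2
```

Under this convention, indegree is the natural measure of popularity as an adviser, while degree() without a cmode argument returns the total degree in sna.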

Are there differences in betweenness across the categorical variables?

par( mfrow = c(2,3))
boxplot(betweenness( advice.adj)~attributes$status,main='Status')
boxplot(betweenness( advice.adj)~attributes$sex,main='Sex')
boxplot(betweenness( advice.adj)~attributes$office,main='Office')
boxplot(betweenness( advice.adj)~attributes$practice,main='Practice')
boxplot(betweenness( advice.adj)~attributes$school,main='School')

Are there differences in betweenness across the numerical variables?

par( mfrow = c(1,3))
plot(betweenness( advice.adj)~attributes$seniority,main='seniority')
plot(betweenness( advice.adj)~attributes$tenure,main='tenure')
plot(betweenness( advice.adj)~attributes$age,main='age')

We can regress betweenness centrality on relevant attributes using ordinary least squares (OLS) regression

ans <- lm(betweenness( advice.adj ) ~attributes$seniority+attributes$tenure+attributes$age+attributes$sex+attributes$practice)
summary(ans)

## 
## Call:
## lm(formula = betweenness(advice.adj) ~ attributes$seniority + 
##     attributes$tenure + attributes$age + attributes$sex + attributes$practice)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -103.68  -60.36  -32.08   25.10  290.97 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           441.571    124.427   3.549 0.000725 ***
## attributes$seniority   -3.697      1.383  -2.672 0.009508 ** 
## attributes$tenure      -1.361      2.632  -0.517 0.606791    
## attributes$age         -5.048      2.072  -2.437 0.017573 *  
## attributes$sex          3.267     29.991   0.109 0.913588    
## attributes$practice    -4.907     23.299  -0.211 0.833845    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 96.32 on 65 degrees of freedom
## Multiple R-squared:  0.1444, Adjusted R-squared:  0.07857 
## F-statistic: 2.194 on 5 and 65 DF,  p-value: 0.06553

Interpretation

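The higher the value of seniority, the lower the betweenness centrality, an effect that is significant at the 1% level. The evidence that older people are less central than younger people is weaker (significant at the 5% level but not at the 1% level). Note that in the rows printed above, low seniority values go together with partner status and long tenure, so check the coding in the dataset description before reading a high value as more senior.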

We can also do a simple two-group t-test to see if there is a difference in centrality between senior and junior lawyers. Divide people into two groups

is.senior <- attributes$seniority>=mean(attributes$seniority)
table( is.senior  )

## is.senior
## FALSE  TRUE 
##    35    36

Test difference between the groups (assuming equal variance)

t.test(betweenness( advice.adj )~ is.senior, var.equal = TRUE)

## 
##  Two Sample t-test
## 
## data:  betweenness(advice.adj) by is.senior
## t = 2.4289, df = 69, p-value = 0.01775
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##    9.993601 101.868486
## sample estimates:
## mean in group FALSE  mean in group TRUE 
##           108.33123            52.40019
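To make explicit what the equal-variance t-test computes, the statistic can be reproduced by hand from the group means and the pooled variance. This is a minimal sketch on simulated values (toy.y and toy.g are made up and have nothing to do with the lawyers data).

```r
# Simulated outcome and a logical grouping variable (illustration only)
set.seed(1)
toy.y <- c(rnorm(10, mean = 100, sd = 20), rnorm(12, mean = 60, sd = 20))
toy.g <- rep(c(FALSE, TRUE), times = c(10, 12))

n0 <- sum(!toy.g);        n1 <- sum(toy.g)
m0 <- mean(toy.y[!toy.g]); m1 <- mean(toy.y[toy.g])

# Pooled variance: weighted average of the two group variances
sp2 <- ((n0 - 1) * var(toy.y[!toy.g]) + (n1 - 1) * var(toy.y[toy.g])) / (n0 + n1 - 2)

# t statistic: difference in means (FALSE minus TRUE) over its standard error
t.manual  <- (m0 - m1) / sqrt(sp2 * (1 / n0 + 1 / n1))
t.builtin <- unname(t.test(toy.y ~ toy.g, var.equal = TRUE)$statistic)

print(c(manual = t.manual, builtin = t.builtin))
```

The degrees of freedom are n0 + n1 - 2, which matches df = 69 for the 35 and 36 lawyers in the two groups above.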

For a simple guide to performing standard statistical tests, see https://www.statmethods.net/stats/index.html.

Note that both OLS and the t-test assume that observations are independent. Can you think of reasons why this might not be plausible in this case?

References

Lazega, Emmanuel, and others. 2001. The Collegial Phenomenon: The Social Mechanisms of Cooperation Among Peers in a Corporate Law Partnership. Oxford University Press.

McPherson, Miller, Lynn Smith-Lovin, and James M Cook. 2001. “Birds of a Feather: Homophily in Social Networks.” Annual Review of Sociology 27 (1): 415–44.