The National Longitudinal Study of Adolescent to Adult Health (Add Health) is a longitudinal study of a nationally representative sample of adolescents in grades 7-12 in the United States during the 1994-95 school year. The Add Health cohort has been followed into young adulthood with four in-home interviews, the most recent in 2008, when the sample was aged 24-32. Add Health is re-interviewing cohort members in a Wave V follow-up from 2016-2018 to collect social, environmental, behavioral, and biological data with which to track the emergence of chronic disease as the cohort moves through their fourth decade of life.

More info at: http://www.cpc.unc.edu/projects/addhealth

Import

This is a CSV file, so we could read it in using read.csv, but I prefer the functionality of read_csv from the readr package.

library(readr)
rawdata <- read_csv(file="AddHealth_Wave_IV.csv")
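As a quick illustration of what read_csv offers over read.csv, here is a minimal sketch on a made-up three-row file. The temporary file and the toy values are assumptions for demonstration only, not the real Add Health data; the column names simply mirror variables used later in this document.

```r
library(readr)

# Hypothetical toy file standing in for AddHealth_Wave_IV.csv
tmp <- tempfile(fileext = ".csv")
writeLines(c("BIO_SEX,H4GH1,H4TR6",
             "1,2,0",
             "2,3,996",
             "6,1,4"), tmp)

# Declaring column types up front avoids relying on type guessing
toy <- read_csv(tmp, col_types = cols(.default = col_integer()))

dim(toy)        # number of rows and columns read
problems(toy)   # parsing problems, if any were recorded
```

Checking dim() and problems() right after the import is a cheap way to catch a truncated or mis-parsed file before any recoding starts.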

Trim down variables

Goof-ups are bound to happen. Let’s copy rawdata to mydata, and do all our data cleaning on mydata. That way, if/when we goof up, we just need to rerun the code chunk below to reset mydata to its pre-recode state, without having to read the entire CSV data set from the hard drive all over again.

mydata <- rawdata 

We could pipe rawdata into dplyr's select() in the code chunk above to keep only the variables we want, but we can also “clean up” the data set at the end, before saving the clean file to disk.
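For reference, trimming with select() might look like the sketch below. It runs on a toy data frame rather than the real file, and EXTRA_VAR is a made-up column that exists only to show something being dropped.

```r
library(dplyr)

# Toy stand-in for rawdata; EXTRA_VAR is a hypothetical column we do not need
rawdata_toy <- data.frame(
  BIO_SEX   = c(1, 2, 6),
  H4GH1     = c(2, 3, 1),
  H4TR6     = c(0, 996, 4),
  EXTRA_VAR = c("a", "b", "c")
)

# Keep only the variables used in the recodes
mydata_toy <- rawdata_toy %>% select(BIO_SEX, H4GH1, H4TR6)
names(mydata_toy)
```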

Recode variables

Gender

BIO_SEX is coded as 1=male, 2=female, 6=missing. I want to recode this into an indicator of being female.

mydata$female <- mydata$BIO_SEX-1
mydata$female[mydata$BIO_SEX==6] <- NA
table(mydata$BIO_SEX, mydata$female, useNA="always")
##       
##           0    1 <NA>
##   1    3147    0    0
##   2       0 3356    0
##   6       0    0    1
##   <NA>    0    0    0
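The same recode can be written in one step with ifelse(). This is only an alternative sketch on a toy vector (the values in bio_sex are invented), not a replacement for the code above.

```r
# Toy BIO_SEX values: 1 = male, 2 = female, 6 = missing
bio_sex <- c(1, 2, 6, 2, 1)

# Missing code becomes NA; otherwise subtract 1 so that male = 0, female = 1
female <- ifelse(bio_sex == 6, NA, bio_sex - 1)

# Cross-tabulate old against new to confirm the recode did what we intended
table(bio_sex, female, useNA = "always")
```

Either way, the cross-tabulation is the important part: every original code should land in exactly one cell of the new variable.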

For plotting purposes, I would also like to have a categorical version of gender.

mydata$female_c <- factor(mydata$female, labels=c("Male", "Female"))
table(mydata$female, mydata$female_c, useNA="always")
##       
##        Male Female <NA>
##   0    3147      0    0
##   1       0   3356    0
##   <NA>    0      0    1

General Health

table(mydata$H4GH1)
## 
##    1    2    3    4    5 
##  979 1963 1683  434   55

No missing values, but I want to apply labels and convert to a factor variable.

mydata$genhealth <- factor(mydata$H4GH1, 
                              labels = c("Excellent", "Very good", "Good", "Fair", "Poor")) 
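One thing to watch with factor(): when only labels are supplied, they are paired with the distinct values that happen to be present. The sketch below uses an invented toy vector in which nobody answered 5; calling factor() with five labels and no levels on it would raise an error, because the number of labels no longer matches the number of distinct values. Spelling out levels avoids that.

```r
# Toy responses; note that no one answered 5 ("Poor")
h4gh1 <- c(1, 2, 2, 3, 4, 1)

health_labels <- c("Excellent", "Very good", "Good", "Fair", "Poor")

# Explicit levels tie each code to its label, even for codes absent from the data
genhealth <- factor(h4gh1, levels = 1:5, labels = health_labels)

# Same style of check as for gender: original codes against the new factor
table(h4gh1, genhealth, useNA = "always")
```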

Relationships

With how many people have you had a romantic or sexual relationship that lasted less than 6 months since 2001?

table(mydata$H4TR6)
## 
##    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14 
## 3049  515  424  261  176  137   74   41   46   18   95    6   23    9    6 
##   15   16   17   18   20   21   22   23   24   25   26   28   30   37   38 
##   28    1    2    3   25    2    1    1    1   11    3    4   10    1    1 
##   40   45   50   60   65   75   90   95  996  998 
##    3    1    8    4    1    1    1    7   40   74
# Values above 990 (996 and 998 here) are not real counts, so set them to missing
mydata$H4TR6[mydata$H4TR6 > 990] <- NA
boxplot(mydata$H4TR6)
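A slightly more explicit way to write this kind of recode is to name the special codes and match on them with %in%. The sketch below uses an invented toy vector and assumes that 996 and 998 are the only special codes, as in the table above.

```r
# Toy relationship counts, including the two special codes and a true NA
h4tr6 <- c(0, 1, 3, 10, 95, 996, 998, NA)

# Assumed list of special codes, taken from the frequency table
missing_codes <- c(996, 998)

# %in% returns FALSE (not NA) for existing NAs, so the subscript is always clean
h4tr6[h4tr6 %in% missing_codes] <- NA

summary(h4tr6)
```

Listing the codes by name makes the intent visible to the next reader, and it will not silently swallow a legitimate large value the way a threshold could.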