#### Revision 

This is an old revision of ResearchGraphsForGrafton made by ChrisMarcum on 2014-08-22 11:34:16.

# Graphs for Grafton

Here are some graphs for my son, Grafton.

## When is the Baby Due?

8-22-14
As of today, 8-22-14, my son is two days past his due date. So many people have a narrow view on the question of when the baby is due. They typically mean what day is the birth going to happen on. When I ask, when is the baby due I mean what time is the baby going to arrive on the day that it does arrive. To address this, I acquired some data from the CDC on every single birth reported by hospitals in 2012 (minus about 16% that had missing data). Here are some results: First, we consider the effect of parity on the moment of arrival. Along the x-axis, we have each minute of the day starting at 0 (i.e., 12:00 am) and ending at 1440 (i.e., 11:59pm). Inter-minute intervals are collapsed into the next earliest minute for consistency. The mid-point represents noon (i.e., minute 700). Along the y-axis we have the number of births associated with each minute. The different factors here represent the number of viable lifetime births the mother experienced right up to moment after her latest baby was born. A value of 1 means that this baby is her first baby. It's clear that first time mothers have a greater likelihood of laboring into the night than mothers with greater birth parity. Second, we consider the mode of delivery. The orientation to the graph is the same. Spontaneous deliveries mean that a cephalic birth without any of the other listed interventions occurred. The effect of scheduled cesarean sections is clearly evident; spontaneous labors tend to result in births later in the day. Next, we consider a measure of the relative expected arrival date. The orientation to the graph is the same. The arrival date is a categorical variable that was estimated in the following way: first, we draw a proposal date of birth (dobs are not included in the public files, just the day of week and month and time of birth) with a probability proportional to the frequency of having a baby on a particular day of the week in a particular month. Within a month, we then sample uniformly from the distribution of proposal dates. This date is then subtracted from the number of weeks in gestation (a variable contained in the data) to obtain the expected days of gestation. An on-time baby is classified if it falls within plus or minus 1 standard deviations (roughly 17 days) of the mean of this number (roughly 280) and early and late are respectively outside that interval. Finally, based on a recommendation by my friend and colleague Beck Tippet, we consider the cumulative probability of being born up to a certain point in the day (i.e., the survival of the gestation). On the y-axis, we have the probability of being born up to a particular moment (on the x-axis). When y is 0 at time 0, then 0 births have occurred and when y is 1 then all births have occurred by time 1440. The vertical dotted lines represent the fraction of births up to that time in the day. That is, vertical lines represent the 25, 50, and 75th percentiles. Interestingly, the uniform prior would have given 12 noon as the 50th pctile, but the effect of business hours, diurnal cycles, and planned births, push that up such that 50% of babies are born almost an hour later, by 12:54pm.

##### R Code to Generate These Plots
```#Graphs for Grafton
#Created by Chris Marcum <chris.marcum@nih.gov>
system("wget ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/DVS/natality/Nat2012us.zip")
system("gunzip Nat2012us.zip")
system("mv Nat2012PublicUS.r20131217 nat2012")
system("cut -c 15-18,19-20,25-28,29,89-90,139-140,217,393,436,438-439,442-445,451-452 --output-delimiter=, nat2012 > nat2012sub.csv")

library(lubridate)
EoD2011<-c(wday(paste("2012-01",1:31,sep="-")),wday(paste("2012-02",1:28,sep="-")),wday(paste("2012-03",1:31,sep="-")),wday(paste("2012-04",1:30,sep="-")),wday(paste("2012-05",1:31,sep="-")),wday(paste("2012-06",1:30,sep="-")),wday(paste("2012-07",1:31,sep="-")),wday(paste("2012-08",1:31,sep="-")),wday(paste("2012-09",1:30,sep="-")),wday(paste("2012-10",1:31,sep="-")),wday(paste("2012-11",1:30,sep="-")),wday(paste("2012-12",1:31,sep="-")))

EoD2012<-list(wday(paste("2012-01",1:31,sep="-")),wday(paste("2012-02",1:29,sep="-")),wday(paste("2012-03",1:31,sep="-")),wday(paste("2012-04",1:30,sep="-")),wday(paste("2012-05",1:31,sep="-")),wday(paste("2012-06",1:30,sep="-")),wday(paste("2012-07",1:31,sep="-")),wday(paste("2012-08",1:31,sep="-")),wday(paste("2012-09",1:30,sep="-")),wday(paste("2012-10",1:31,sep="-")),wday(paste("2012-11",1:30,sep="-")),wday(paste("2012-12",1:31,sep="-")))

colnames(natdata)<-c("Year","Month","Time","WD","MAge","MRace","Parity","Del","BSex","MensM","MensY","Gestation")
natdata\$MRace<-as.numeric(natdata[,"MRace"]==1)
natdata\$MensY[which(natdata\$MensY==9999)]<-NA
natdata\$MensM[which(natdata\$MensM==99)]<-NA
natdata\$Gestation[which(natdata\$Gestation==99)]<-NA
natdata\$Time[which(natdata\$Time==9999)]<-NA
length(na.action(na.omit(natdata)))/nrow(natdata)
natdata<-na.omit(natdata)
natdata<-natdata[order(natdata\$Month,natdata\$WD,natdata\$Mod),]

#Convert time to ordered minutes of the day
# and estimate new gestation times
b1<-as.character(natdata\$Time)
b1.nc<-nchar(b1)
b1.st<-strsplit(b1,"")
ptab<-prop.table(table(natdata\$Month,natdata\$WD),1)
samp.unif.day<-function(x) {
sample(1:length(EoD2012[[x]]),1,prob=ptab[x,EoD2012[[x]]])
}

natdata\$DOM<-sapply(natdata\$Month,samp.unif.day)
natdata\$SDOB<-as.Date(paste(natdata\$Year,natdata\$Month,natdata\$DOM,sep="-"))
natdata\$NewGest<-difftime(as.Date(natdata\$SDOB),as.Date(natdata\$SDOB-weeks(natdata\$Gestation)))
natdata\$SDOC<-natdata\$SDOB-weeks(natdata\$Gestation)
nht<-lapply(b1.st,function(x){ if(length(x)==4){return(paste(x,x,sep=""))};if(length(x)==3){return(x)};if(length(x)<3){return(0)}})
nhm<-lapply(b1.st,function(x) paste(rev(na.omit(rev(x)[1:2])),collapse=""))
nht<-as.numeric(unlist(nht))
nhm<-as.numeric(unlist(nhm))
natdata\$Mod<-((nht*60)+nhm)
natdata\$Arrival<-ifelse(as.numeric(natdata\$NewGest)>mean(as.numeric(natdata\$NewGest))+sd(as.numeric(natdata\$NewGest)),"Late",ifelse(as.numeric(natdata\$NewGest)<mean(as.numeric(natdata\$NewGest))-sd(as.numeric(natdata\$NewGest)),"Early","On-Time"))
augpar<-natdata[which(natdata\$Month==8 & natdata\$Parity==1),]
png("ModXPar.png",800,800)
matplot(table(natdata\$Mod,natdata\$Parity),col=c(rainbow(length(unique(natdata\$Parity)))),pch=19,cex=.5,ylab="f(x)",xlab="Minute of the Day",main=paste("Aggregate Timing of Births in a Day \n by  Parity, 2012 (n=",nrow(natdata),")",sep=""))
legend("topright",legend=sort(unique(natdata\$Parity)),text.col=c(rainbow(length(unique(natdata\$Parity)))),pch=19,col=c(rainbow(length(unique(natdata\$Parity)))),title="Birth #")
dev.off()
png("ModXDel.png",800,800)
matplot(table(natdata\$Mod,natdata\$Del),col=c(rainbow(length(unique(natdata\$Del)))),pch=19,cex=.5,ylab="f(x)",xlab="Minute of the Day",main=paste("Aggregate Timing of Births in a Day \n by Mode of Delivery, 2012 (n=",nrow(natdata),")",sep=""))
legend("topright",legend=c("Spontaneous","Forceps","Vacuum","Cesarean","DK"),text.col=c(rainbow(length(unique(natdata\$Del)))),pch=19,col=c(rainbow(length(unique(natdata\$Del)))),title="Mode")
dev.off()
png("ModXArr.png",800,800)
matplot(table(natdata\$Mod,natdata\$Arrival),col=c(rainbow(length(unique(natdata\$Arrival)))),pch=19,cex=.5,ylab="f(x)",xlab="Minute of the Day",main=paste("Aggregate Timing of Births in a Day \n by Estimated Due Date, 2012 (n=",nrow(natdata),")",sep=""))
legend("topright",legend=sort(unique(natdata\$Arrival)),text.col=c(rainbow(length(unique(natdata\$Arrival)))),pch=19,col=c(rainbow(length(unique(natdata\$Arrival)))),title="Born")
dev.off()
#Cum.Freq. as recommended by B. Tippet
clp<-apply(table(natdata\$Mod,natdata\$Arrival),2,cumsum)
clp.m<-mean(apply(apply(clp,2,function(x) x/x),2,function(x) min(which(x>=.5))))
clp.l<-mean(apply(apply(clp,2,function(x) x/x),2,function(x) min(which(x>=.25))))
clp.u<-mean(apply(apply(clp,2,function(x) x/x),2,function(x) min(which(x>=.75))))
png("CFXArr.png",800,800)
matplot(apply(clp,2,function(x) x/x),col=c(rainbow(length(unique(natdata\$Arrival)))),type="l",lty=1,lwd=2,ylab="S(x)",xlab="Minute of the Day",main=paste("Cumulative `Survival` of Gestation \n by Estimated Due Date, 2012 (n=",nrow(natdata),")",sep=""))
legend("topleft",legend=sort(unique(natdata\$Arrival)),text.col=c(rainbow(length(unique(natdata\$Arrival)))),title.col="black",col=c(rainbow(length(unique(natdata\$Arrival)))),title="Born")
text(y=c(.25,.5,.75),x=c(clp.l,clp.m,clp.u),label=c("8:05am","12:54pm","5:45pm"),col="black",pos=2)
segments(y0=c(0,0,0),y1=c(.25,.5,.75),x0=c(clp.l,clp.m,clp.u),x1=c(clp.l,clp.m,clp.u),lty="dotted",col="black")
dev.off()
pm1<-lm(Mod~as.factor(Del)+I(Parity==1)+as.factor(MRace)+BSex+MAge+relevel(as.factor(Arrival),ref="On-Time")+as.factor(Month),data=natdata)
save.image("NatData.Rdata")```