12  Merging Data

flowchart LR
SM[Merge One Record/Subject Data Table With Longitudinal Data]
HM[Hmisc::Merge<br>Monitors Merging Process]
tlu[Merging for Table Look-up]

Consider a baseline dataset b and a longitudinal dataset L, with subject ID of id.

For more information see this, this, this, this and this. To merge any number of datasets at once and obtain a printed report of how the merge went, use the Hmisc Merge function.
require(Hmisc)
require(data.table)
b <- data.table(id=1:4, age=c(21, 28, 32, 23), key='id')
L <- data.table(id  = c(2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5, 5),
                day = c(1, 2, 3, 1, 2, 3, 4, 1, 2, 1, 2, 3),
                y    =  1 : 12, key='id')
b
   id age
1:  1  21
2:  2  28
3:  3  32
4:  4  23
L
    id day  y
 1:  2   1  1
 2:  2   2  2
 3:  2   3  3
 4:  3   1  4
 5:  3   2  5
 6:  3   3  6
 7:  3   4  7
 8:  4   1  8
 9:  4   2  9
10:  5   1 10
11:  5   2 11
12:  5   3 12
# Merge b and L to look up baseline age and associate it with all follow-ups
b[L, on=.(id)]   # Keeps all ids in L (left inner join)
    id age day  y
 1:  2  28   1  1
 2:  2  28   2  2
 3:  2  28   3  3
 4:  3  32   1  4
 5:  3  32   2  5
 6:  3  32   3  6
 7:  3  32   4  7
 8:  4  23   1  8
 9:  4  23   2  9
10:  5  NA   1 10
11:  5  NA   2 11
12:  5  NA   3 12
L[b, on=.(id)]   # Keeps all ids in b (right inner join)
    id day  y age
 1:  1  NA NA  21
 2:  2   1  1  28
 3:  2   2  2  28
 4:  2   3  3  28
 5:  3   1  4  32
 6:  3   2  5  32
 7:  3   3  6  32
 8:  3   4  7  32
 9:  4   1  8  23
10:  4   2  9  23
L[b, on=.(id), nomatch=NULL]  # Keeps only ids in both b and L (right outer join)
   id day y age
1:  2   1 1  28
2:  2   2 2  28
3:  2   3 3  28
4:  3   1 4  32
5:  3   2 5  32
6:  3   3 6  32
7:  3   4 7  32
8:  4   1 8  23
9:  4   2 9  23
uid <- unique(c(b[, id], L[, id]))
L[b[.(uid), on=.(id)]]         # Keeps ids in either b or c (full outer join)
    id day  y age
 1:  1  NA NA  21
 2:  2   1  1  28
 3:  2   2  2  28
 4:  2   3  3  28
 5:  3   1  4  32
 6:  3   2  5  32
 7:  3   3  6  32
 8:  3   4  7  32
 9:  4   1  8  23
10:  4   2  9  23
11:  5   1 10  NA
12:  5   2 11  NA
13:  5   3 12  NA
merge(b, L, by='id', all=TRUE) # also full outer join; calls merge.data.table
    id age day  y
 1:  1  21  NA NA
 2:  2  28   1  1
 3:  2  28   2  2
 4:  2  28   3  3
 5:  3  32   1  4
 6:  3  32   2  5
 7:  3  32   3  6
 8:  3  32   4  7
 9:  4  23   1  8
10:  4  23   2  9
11:  5  NA   1 10
12:  5  NA   2 11
13:  5  NA   3 12
# Use the Hmisc Merge function which reports on how the merge went
# Merge operates on both data.tables and data.frames and can handle
# any number of them.  In details printed by Merge below, #1
# refers to the first data table (b).
Merge(b, L, id= ~ id)
       Vars Obs Unique IDs IDs in #1 IDs not in #1
b         2   4          4        NA            NA
L         3  12          4         3             1
Merged    4  13          5         4             1

Number of unique IDs in any data frame : 5 
Number of unique IDs in all data frames: 3 
    id age day  y
 1:  1  21  NA NA
 2:  2  28   1  1
 3:  2  28   2  2
 4:  2  28   3  3
 5:  3  32   1  4
 6:  3  32   2  5
 7:  3  32   3  6
 8:  3  32   4  7
 9:  4  23   1  8
10:  4  23   2  9
11:  5  NA   1 10
12:  5  NA   2 11
13:  5  NA   3 12

For very large data tables, giving the data tables keys will speed execution, e.g.:

setkey(d, id)
setkey(d, state, city)

Join/merge can be used for data lookups:

s <- data.table(st=.q(AL, AK, AZ, CA, OK), y=5:1)
stateAbbrevs <- data.table(state=state.abb, State=state.name)
s[stateAbbrevs, , on=.(st=state), nomatch=NULL]
   st y      State
1: AL 5    Alabama
2: AK 4     Alaska
3: AZ 3    Arizona
4: CA 2 California
5: OK 1   Oklahoma