r - How to join (merge) data frames (inner, outer, left, right)? -
given 2 data frames:
df1 = data.frame(customerid = c(1:6), product = c(rep("toaster", 3), rep("radio", 3))) df2 = data.frame(customerid = c(2, 4, 6), state = c(rep("alabama", 2), rep("ohio", 1))) df1 # customerid product # 1 toaster # 2 toaster # 3 toaster # 4 radio # 5 radio # 6 radio df2 # customerid state # 2 alabama # 4 alabama # 6 ohio
how can database style, i.e., sql style, joins? is, how get:
- an inner join of
df1
,df2
:
return rows in left table have matching keys in right table. - an outer join of
df1
,df2
:
returns rows both tables, join records left have matching keys in right table. - a left outer join (or left join) of
df1
,df2
return rows left table, , rows matching keys right table. - a right outer join of
df1
,df2
return rows right table, , rows matching keys left table.
extra credit:
how can sql style select statement?
by using merge
function , optional parameters:
inner join: merge(df1, df2)
work these examples because r automatically joins frames common variable names, want specify merge(df1, df2, = "customerid")
make sure matching on fields desired. can use by.x
, by.y
parameters if matching variables have different names in different data frames.
outer join: merge(x = df1, y = df2, = "customerid", = true)
left outer: merge(x = df1, y = df2, = "customerid", all.x = true)
right outer: merge(x = df1, y = df2, = "customerid", all.y = true)
cross join: merge(x = df1, y = df2, = null)
just inner join, want explicitly pass "customerid" r matching variable. think it's best explicitly state identifiers on want merge; it's safer if input data.frames change unexpectedly , easier read later on.
Comments
Post a Comment