Mutating joins behave like dplyr joins, except that the history graphs of the two sides of the join are merged, resulting in a tracked dataframe that carries the history of both input dataframes. See dplyr::left_join() for more details on the underlying function.
p_left_join(
  x,
  y,
  ...,
  .messages = c("{.count.lhs} on LHS", "{.count.rhs} on RHS",
    "{.count.out} in linked set"),
  .headline = "Left join by {.keys}"
)
x, y
A pair of data frames, data frame extensions (e.g. a tibble), or lazy data frames (e.g. from dbplyr or dtplyr). See Methods, below, for more details.
...
Arguments passed on to dplyr::left_join
x,y
A pair of data frames, data frame extensions (e.g. a tibble), or lazy data frames (e.g. from dbplyr or dtplyr). See Methods, below, for more details.
by
A join specification created with join_by(), or a character vector of variables to join by.
If NULL, the default, *_join() will perform a natural join, using all variables in common across x and y. A message lists the variables so that you can check they're correct; suppress the message by supplying by explicitly.
To join on different variables between x and y, use a join_by() specification. For example, join_by(a == b) will match x$a to y$b.
To join by multiple variables, use a join_by() specification with multiple expressions. For example, join_by(a == b, c == d) will match x$a to y$b and x$c to y$d. If the column names are the same between x and y, you can shorten this by listing only the variable names, like join_by(a, c).
join_by() can also be used to perform inequality, rolling, and overlap joins. See the documentation at ?join_by for details on these types of joins.
For simple equality joins, you can alternatively specify a character vector of variable names to join by. For example, by = c("a", "b") joins x$a to y$a and x$b to y$b. If variable names differ between x and y, use a named character vector like by = c("x_a" = "y_a", "x_b" = "y_b").
To perform a cross-join, generating all combinations of x and y, see cross_join().
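The two ways of specifying differing key names described above can be compared side by side. The following is a minimal sketch using plain dplyr (version 1.1.0 or later is assumed for join_by()); the tables x and y and their columns are hypothetical toy data, not part of the package examples.

```r
library(dplyr)

# Hypothetical toy tables for illustration only
x <- tibble(a = c(1, 2, 3), c = c("p", "q", "r"), val_x = c(10, 20, 30))
y <- tibble(b = c(1, 2, 4), d = c("p", "q", "s"), val_y = c(100, 200, 400))

# join_by() specification: matches x$a to y$b and x$c to y$d
res_join_by <- left_join(x, y, by = join_by(a == b, c == d))

# Named character vector expressing the same equality join
res_named <- left_join(x, y, by = c("a" = "b", "c" = "d"))

# Both forms should give the same rows; the third row of x has no match in y
all.equal(res_join_by, res_named)
res_join_by
```

The same by argument should be accepted when the inputs are tracked dataframes, since it is passed on to dplyr::left_join.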
copy
If x and y are not from the same data source, and copy is TRUE, then y will be copied into the same src as x. This allows you to join tables across srcs, but it is a potentially expensive operation so you must opt into it.
suffix
If there are non-joined duplicate variables in x and y, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2.
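As a brief illustration of suffix, the sketch below joins two hypothetical tables that share a non-key column called value; the suffix strings "_lhs" and "_rhs" are arbitrary choices for this example.

```r
library(dplyr)

# Hypothetical toy tables: both carry a non-key column named "value"
x <- tibble(id = 1:2, value = c("a", "b"))
y <- tibble(id = 1:2, value = c("c", "d"))

# The clashing non-key columns are disambiguated with the supplied suffixes
res <- left_join(x, y, by = "id", suffix = c("_lhs", "_rhs"))
names(res)
#> [1] "id"        "value_lhs" "value_rhs"
```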
keep
Should the join keys from both x and y be preserved in the output?
If NULL, the default, joins on equality retain only the keys from x, while joins on inequality retain the keys from both inputs.
If TRUE, all keys from both inputs are retained.
If FALSE, only keys from x are retained. For right and full joins, the data in key columns corresponding to rows that only exist in y are merged into the key columns from x. Can't be used when joining on inequality conditions.
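The effect of keep on an equality join can be seen with a small sketch. The tables below are hypothetical, and join_by() assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical toy tables with differently named key columns
x <- tibble(a = c(1, 2), val_x = c("p", "q"))
y <- tibble(b = c(1, 3), val_y = c("r", "s"))

# Default (keep = NULL): an equality join retains only the key from x
res_default <- left_join(x, y, by = join_by(a == b))

# keep = TRUE: the key columns from both inputs are retained
res_keep <- left_join(x, y, by = join_by(a == b), keep = TRUE)

names(res_default)
names(res_keep)

# In the keep = TRUE result, y's key is NA for the row of x without a match
res_keep
```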
.messages
A set of glue specs. The glue code can use any global variable, {.keys} for the joining columns, and {.count.lhs}, {.count.rhs} and {.count.out} for the sizes of the left-hand input, right-hand input and output dataframes respectively.
.headline
A glue spec. The glue code can use any global variable, {.keys} for the joining columns, and {.count.lhs}, {.count.rhs} and {.count.out} for the sizes of the left-hand input, right-hand input and output dataframes respectively.
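To show how .messages and .headline might be customised, here is a minimal sketch that calls p_left_join() directly with the signature given above. The input tables and the wording of the glue strings are hypothetical; only the placeholders {.keys}, {.count.lhs} and {.count.out} are taken from the argument descriptions.

```r
library(dplyr)
library(dtrackr)

# Hypothetical toy tables, each wrapped in a tracked dataframe
lhs <- tibble(id = 1:3, x = c("a", "b", "c")) %>% track()
rhs <- tibble(id = c(1L, 2L, 2L), y = c("d", "e", "f")) %>% track()

# Custom glue specs for the flowchart stage created by the join;
# multiple = "all" is passed on to dplyr::left_join, as in the example below
joined <- p_left_join(
  lhs, rhs,
  by = "id",
  multiple = "all",
  .messages = c("{.count.lhs} rows on the left", "{.count.out} rows after joining"),
  .headline = "Left join on {.keys}"
)

# id 2 matches two rows of rhs and id 3 matches none, so 4 rows are expected
nrow(joined)

# Inspect the merged history
joined %>% history()
```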
The join of the two dataframes, with the history graph updated.
dplyr::left_join()
library(dplyr)
library(dtrackr)
# Joins across data sets
# example data uses the dplyr starwars data
people = starwars %>% select(-films, -vehicles, -starships)
films = starwars %>% select(name,films) %>% tidyr::unnest(cols = c(films))
lhs = people %>% track() %>% comment("People df {.total}")
rhs = films %>% track() %>% comment("Films df {.total}") %>%
  comment("a test comment")
# Left join
join = lhs %>% left_join(rhs, by="name", multiple = "all") %>% comment("joined {.total}")
# See what the history of the graph is:
join %>% history()
#> dtrackr history:
#> number of flowchart steps: 5 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> └ "joined 173"
nrow(join)
#> [1] 173
# Display the tracked graph (not run in examples)
# join %>% flowchart()