See dplyr::mutate(), dplyr::add_count(), dplyr::add_tally(), dplyr::transmute(), dplyr::select(), dplyr::relocate(), dplyr::rename(), dplyr::rename_with(), and dplyr::arrange() for more details on the underlying functions. dtrackr provides equivalent functions for mutating, selecting and renaming a data set, which act in the same way as their dplyr counterparts. mutate / select / rename generally add nothing to the provenance of the data, so by default they are left out of the dtrackr history. This can be overridden with the .messages or .headline values, in which case they behave just like a comment().
Arguments
- .data
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.
- ...
<data-masking> Name-value pairs. The name gives the name of the column in the output. The value can be:
  - A vector of length 1, which will be recycled to the correct length.
  - A vector the same length as the current group (or the whole data frame if ungrouped).
  - NULL, to remove the column.
  - A data frame or tibble, to create multiple columns in the output.
Named arguments passed on to dplyr::mutate:
  - .by
    <tidy-select> Optionally, a selection of columns to group by for just this operation, functioning as an alternative to group_by(). For details and examples, see ?dplyr_by.
  - .keep
    Control which columns from .data are retained in the output. Grouping columns and columns created by ... are always kept.
    - "all" retains all columns from .data. This is the default.
    - "used" retains only the columns used in ... to create new columns. This is useful for checking your work, as it displays inputs and outputs side-by-side.
    - "unused" retains only the columns not used in ... to create new columns. This is useful if you generate new columns, but no longer need the columns used to generate them.
    - "none" doesn't retain any extra columns from .data. Only the grouping variables and columns created by ... are kept.
  - .before, .after
    <tidy-select> Optionally, control where new columns should appear (the default is to add to the right hand side). See relocate() for more details.
- .messages
a set of glue specs. The glue code can use any global variable, grouping variable, {.new_cols} or {.dropped_cols} for changes to columns, {.cols} for the output column names, or {.strata}. Defaults to nothing.
- .headline
a headline glue spec. The glue code can use any global variable, grouping variable, {.new_cols}, {.dropped_cols}, {.cols} or {.strata}. Defaults to nothing.
- .tag
if you want the summary data from this step in the future then give it a name with .tag.
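The arguments passed on to dplyr::mutate can be illustrated with a minimal sketch. It uses plain dplyr on a small, untracked data frame; the data frame df and its columns g, x and y are made up for illustration, and the .by argument is assumed to be available (dplyr 1.1.0 or later).

```r
library(dplyr)

# Hypothetical toy data, not part of dtrackr or dplyr
df <- data.frame(g = c("a", "a", "b"), x = c(1, 2, 3), y = c(10, 20, 30))

# .keep = "used": keep only the inputs to the new column, plus the new column
used <- df %>% mutate(z = x * 2, .keep = "used")
names(used)

# .keep = "none": keep only the newly created column (there are no groups here)
none <- df %>% mutate(z = x * 2, .keep = "none")
names(none)

# .before: place the new column immediately before x
placed <- df %>% mutate(z = x * 2, .before = x)
names(placed)

# .by: group by g for this one operation only; the result is ungrouped
by_group <- df %>% mutate(mean_x = mean(x), .by = g)
by_group$mean_x
```

Because .by applies to a single call, the result is not a grouped data frame, unlike the result of a preceding group_by().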
Value
The .data data frame after being modified by the equivalent dplyr function, but with the history graph updated with a new stage if the .messages or .headline parameter is not empty.
Examples
library(dplyr)
library(dtrackr)
# mutate and other functions are unitary operations that generally change
# the structure but not the size of a dataframe. In dtrackr these are ignored
# by default, but we can change that so that their behaviour is obvious.
# mutate
# In this example we compare the column names of the input and the
# output to identify the new columns created by the mutate operation as
# the `.new_cols` variable
iris %>%
  track() %>%
  mutate(extra_col = NA_real_,
         .messages="{.new_cols}",
         .headline="Extra columns from mutate:") %>%
  history()
#> dtrackr history:
#> number of flowchart steps: 2 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> └ "Extra columns from mutate:", "extra_col"
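
The same approach should carry over to select. The sketch below assumes that dtrackr's select() accepts .messages and .headline in the way described under Arguments, and uses the {.cols} variable to report the output column names. The choice of columns is arbitrary, and no output is shown because it has not been run here.

```r
library(dplyr)
library(dtrackr)

# select
# Report the columns that remain after the select operation via `.cols`
selected <- iris %>%
  track() %>%
  select(Sepal.Length, Sepal.Width,
         .messages = "{.cols}",
         .headline = "Output columns from select:")

# Inspect the recorded history for the selected data
selected %>% history()
```

Without .messages or .headline, the select step would be left out of the history, as described above.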