See dplyr::mutate(), dplyr::add_count(), dplyr::add_tally(),
dplyr::transmute(), dplyr::select(), dplyr::relocate(),
dplyr::rename(), dplyr::rename_with(), dplyr::arrange() for more details
on underlying functions. dtrackr provides equivalent functions for
mutating, selecting and renaming a data set which act in the same way as
dplyr. mutate / select / rename generally add nothing to the provenance
of the data, so by default they are left out of the
dtrackr history. This can be overridden with the .messages or
.headline values, in which case they behave just like a comment().
Arguments
- x
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr).
- ...
<data-masking> Variables to group by. Named arguments passed on to dplyr::add_count:
  - wt
  <data-masking> Frequency weights. Can be NULL or a variable. If NULL (the default), counts the number of rows in each group. If a variable, computes sum(wt) for each group.
  - sort
  If TRUE, will show the largest groups at the top.
  - name
  The name of the new column in the output. If omitted, it will default to n. If there's already a column called n, it will use nn. If there's a column called n and nn, it'll use nnn, and so on, adding ns until it gets a new name.
  - .drop
  Handling of factor levels that don't appear in the data, passed on to group_by(). For count(): if FALSE will include counts for empty groups (i.e. for levels of factors that don't exist in the data). For add_count(): deprecated since it can't actually affect the output.
- .messages
a set of glue specs. The glue code can use any global variable, grouping variable, {.new_cols} or {.dropped_cols} for changes to columns, {.cols} for the output column names, or {.strata}. Defaults to nothing.
- .headline
a headline glue spec. The glue code can use any global variable, grouping variable, {.new_cols}, {.dropped_cols}, {.cols} or {.strata}. Defaults to nothing.
- .tag
if you want the summary data from this step in the future then give it a name with .tag.
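As a small illustration of the wt and name arguments that are passed on to dplyr::add_count, the sketch below calls dplyr directly on a made-up tibble. The data frame df and its columns g, w and n are invented for this example and are not part of dtrackr.

```r
library(dplyr)

# Hypothetical data: a grouping column, a weight column, and an existing `n`
df <- tibble(
  g = c("a", "a", "b"),
  w = c(1, 2, 5),
  n = c(10, 20, 30)
)

# No `name` given: `n` is already taken, so the count is stored in `nn`
counted <- df %>% add_count(g)

# With `wt`, the new column holds sum(w) per group rather than a row count
weighted <- df %>% add_count(g, wt = w, name = "w_total")
```

Here counted gains an nn column of row counts per group, and weighted gains a w_total column holding the per-group sum of w.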
Value
the .data dataframe after being modified by the dplyr equivalent
function, but with the history graph updated with a new stage if the
.messages or .headline parameter is not empty.
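To make the condition on .messages and .headline concrete, here is a minimal sketch that runs the same add_count step twice, once with the defaults and once with a message. It assumes the defaults described above (both empty); the column name n_species and the message text are invented for illustration.

```r
library(dplyr)
library(dtrackr)

# Defaults: .messages and .headline are empty, so per the description above
# this step should not add a new stage to the history
quiet <- iris %>%
  track() %>%
  add_count(Species, name = "n_species")

# A non-empty .messages asks dtrackr to record the step
noisy <- iris %>%
  track() %>%
  add_count(Species, name = "n_species",
            .messages = "added {.new_cols}")

# Inspect both histories to compare what was recorded
history(quiet)
history(noisy)
```

Both results carry the same data columns; only the recorded history is expected to differ.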
Examples
library(dplyr)
library(dtrackr)
# mutate and other functions are unitary operations that generally change
# the structure but not size of a dataframe. In dtrackr these are ignored
# by default but we can change that so that their behaviour is obvious.
# add_count
# adding in a count or tally column as a new column
iris %>%
  track() %>%
  add_count(Species, name="new_count_total",
            .messages="{.new_cols}",
            # .messages="{.cols}",
            .headline="New columns from add_count:") %>%
  history()
#> dtrackr history:
#> number of flowchart steps: 2 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> └ "New columns from add_count:", "new_count_total"
# add_tally
iris %>%
  track() %>%
  group_by(Species) %>%
  dtrackr::add_tally(wt=Petal.Length, name="new_tally_total",
                     .messages="{.new_cols}",
                     .headline="New columns from add_tally:") %>%
  history()
#> dtrackr history:
#> number of flowchart steps: 3 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> ├ [setosa]: "New columns from add_tally:", "new_tally_total"
#> ├ [versicolor]: "New columns from add_tally:", "new_tally_total"
#> └ [virginica]: "New columns from add_tally:", "new_tally_total"
