dplyr modifying operations

See dplyr::mutate(), dplyr::add_count(), dplyr::add_tally(), dplyr::transmute(), dplyr::select(), dplyr::relocate(), dplyr::rename() dplyr::rename_with(), dplyr::arrange() for more details on underlying functions. dtrackr provides equivalent functions for mutating, selecting and renaming a data set which act in the same way as dplyr. mutate / select / rename generally don't add anything in terms of provenance of data so the default behaviour is to miss these out of the dtrackr history. This can be overridden with the .messages, or .headline values in which case they behave just like a comment().

Usage

# S3 method for class 'trackr_df'
add_count(x, ..., .messages = "", .headline = "", .tag = NULL)

# S3 method for class 'trackr_df'
add_count(x, ..., .messages = "", .headline = "", .tag = NULL)

Arguments

x

A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr).

...

<data-masking> Variables to group by. Named arguments passed on to dplyr::add_count

wt

<data-masking> Frequency weights. Can be NULL or a variable:

If NULL (the default), counts the number of rows in each group.
If a variable, computes sum(wt) for each group.

sort

If TRUE, will show the largest groups at the top.

name

The name of the new column in the output.

If omitted, it will default to n. If there's already a column called n, it will use nn. If there's a column called n and nn, it'll use nnn, and so on, adding ns until it gets a new name.

.drop

Handling of factor levels that don't appear in the data, passed on to group_by().

For count(): if FALSE will include counts for empty groups (i.e. for levels of factors that don't exist in the data).

For add_count(): deprecated since it can't actually affect the output.

Named arguments passed on to tidyr::unnest

data

A data frame.

cols

<tidy-select> List-columns to unnest.

When selecting multiple columns, values from the same row will be recycled to their common size.

keep_empty

By default, you get one row of output for each element of the list that you are unchopping/unnesting. This means that if there's a size-0 element (like NULL or an empty data frame or vector), then that entire row will be dropped from the output. If you want to preserve all rows, use keep_empty = TRUE to replace size-0 elements with a single row of missing values.

ptype

Optionally, a named list of column name-prototype pairs to coerce cols to, overriding the default that will be guessed from combining the individual values. Alternatively, a single empty ptype can be supplied, which will be applied to all cols.

names_sep

If NULL, the default, the outer names will come from the inner names. If a string, the outer names will be formed by pasting together the outer and the inner column names, separated by names_sep.

names_repair

Used to check that output data frame has valid names. Must be one of the following options:

"minimal": no name repair or checks, beyond basic existence,
"unique": make sure names are unique and not empty,
"check_unique": (the default), no name repair, but check they are unique,
"universal": make the names unique and syntactic
a function: apply custom name repair.
tidyr_legacy: use the name repair from tidyr 0.8.
a formula: a purrr-style anonymous function (see rlang::as_function())

See vctrs::vec_as_names() for more details on these terms and the strategies used to enforce them.

.drop,.preserve

: all list-columns are now preserved; If there are any that you don't want in the output use select() to remove them prior to unnesting.

.id

: convert df %>% unnest(x, .id = "id") to df %>% mutate(id = names(x)) %>% unnest(x)).

.sep

: use names_sep instead.

.messages

a set of glue specs. The glue code can use any global variable, grouping variable, {.new_cols} or {.dropped_cols} for changes to columns, {.cols} for the output column names, or {.strata}. Defaults to nothing.

.headline

a headline glue spec. The glue code can use any global variable, grouping variable, {.new_cols}, {.dropped_cols}, {.cols} or {.strata}. Defaults to nothing.

.tag

if you want the summary data from this step in the future then give it a name with .tag.

Value

the .data dataframe after being modified by the dplyr equivalent function, but with the history graph updated with a new stage if the .messages or .headline parameter is not empty.

Examples

library(dplyr)
library(dtrackr)

# mutate and other functions are unitary operations that generally change
# the structure but not size of a dataframe. In dtrackr these are by ignored
# by default but we can change that so that their behaviour is obvious.

# add_count
# adding in a count or tally column as a new column
iris %>%
  track() %>%
  add_count(Species, name="new_count_total",
            .messages="{.new_cols}",
            # .messages="{.cols}",
            .headline="New columns from add_count:") %>%
  history()
#> dtrackr history:
#> number of flowchart steps: 2 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> └ "New columns from add_count:", "new_count_total"

# add_tally
iris %>%
  track() %>%
  group_by(Species) %>%
  dtrackr::add_tally(wt=Petal.Length, name="new_tally_total",
            .messages="{.new_cols}",
            .headline="New columns from add_tally:") %>%
  history()
#> dtrackr history:
#> number of flowchart steps: 3 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> ├ [setosa]: "New columns from add_tally:", "new_tally_total"
#> ├ [versicolor]: "New columns from add_tally:", "new_tally_total"
#> └ [virginica]: "New columns from add_tally:", "new_tally_total"

Usage

Arguments

Value

See also

Examples