dplyr modifying operations

See dplyr::mutate(), dplyr::add_count(), dplyr::add_tally(), dplyr::transmute(), dplyr::select(), dplyr::relocate(), dplyr::rename() dplyr::rename_with(), dplyr::arrange() for more details on underlying functions. dtrackr provides equivalent functions for mutating, selecting and renaming a data set which act in the same way as dplyr. mutate / select / rename generally don't add anything in terms of provenance of data so the default behaviour is to miss these out of the dtrackr history. This can be overridden with the .messages, or .headline values in which case they behave just like a comment().

Usage

p_arrange(.data, ..., .messages = "", .headline = "", .tag = NULL)

Arguments

.data

A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.

...

<data-masking> Name-value pairs. The name gives the name of the column in the output.

The value can be:

A vector of length 1, which will be recycled to the correct length.
A vector the same length as the current group (or the whole data frame if ungrouped).
NULL, to remove the column.
A data frame or tibble, to create multiple columns in the output.

Named arguments passed on to dplyr::arrange

.by_group

If TRUE, will sort first by grouping variable. Applies to grouped data frames only.

.locale

The locale to sort character vectors in.

If NULL, the default, uses the "C" locale unless the dplyr.legacy_locale global option escape hatch is active. See the dplyr-locale help page for more details.
If a single string from stringi::stri_locale_list() is supplied, then this will be used as the locale to sort with. For example, "en" will sort with the American English locale. This requires the stringi package.
If "C" is supplied, then character vectors will always be sorted in the C locale. This does not require stringi and is often much faster than supplying a locale identifier.

The C locale is not the same as English locales, such as "en", particularly when it comes to data containing a mix of upper and lower case letters. This is explained in more detail on the locale help page under the Default locale section.

.messages

a set of glue specs. The glue code can use any global variable, grouping variable, {.new_cols} or {.dropped_cols} for changes to columns, {.cols} for the output column names, or {.strata}. Defaults to nothing.

.headline

a headline glue spec. The glue code can use any global variable, grouping variable, {.new_cols}, {.dropped_cols}, {.cols} or {.strata}. Defaults to nothing.

.tag

if you want the summary data from this step in the future then give it a name with .tag.

Value

the .data dataframe after being modified by the dplyr equivalent function, but with the history graph updated with a new stage if the .messages or .headline parameter is not empty.

Examples

library(dplyr)
library(dtrackr)

# mutate and other functions are unitary operations that generally change
# the structure but not size of a dataframe. In dtrackr these are by ignored
# by default but we can change that so that their behaviour is obvious.

# arrange
# In this case we sort the data descending and show the first value
# is the same as the maximum value.
iris %>%
  track() %>%
  arrange(
    desc(Petal.Width),
    .messages="{.count} items, columns: {.cols}",
    .headline="Reordered dataframe:") %>%
  history()
#> dtrackr history:
#> number of flowchart steps: 2 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> └ "Reordered dataframe:", "150 items, columns: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species"

Usage

Arguments

Value

See also

Examples