Summarise a data set

Summarising a data set works in the usual dplyr manner, collapsing each group to a single row. Any columns produced by the summary can be added to the history graph. In the history, this step also joins any stratified branches and lets you generate summary statistics about the un-grouped data. See dplyr::summarise().

Usage

p_summarise(.data, ..., .messages = "", .headline = "", .tag = NULL)

Arguments

.data

A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.

...

<data-masking> Name-value pairs of summary functions. The name will be the name of the variable in the result.

The value can be:

A vector of length 1, e.g. min(x), n(), or sum(is.na(y)).
A data frame, to add multiple columns from a single expression.
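
The two kinds of value listed above can be sketched with plain dplyr::summarise(). This is an illustrative example only; the column names n, min_len and max_len are made up for the sketch and are not part of dtrackr.

```r
library(dplyr)

res <- iris %>%
  group_by(Species) %>%
  summarise(
    # a vector of length 1 per group
    n = n(),
    # an unnamed data frame: its columns are unpacked into the result
    tibble(min_len = min(Petal.Length), max_len = max(Petal.Length))
  )

names(res)
```

Each group still collapses to one row; the data frame expression simply contributes two columns at once.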

Returning values with size 0 or >1 was deprecated as of 1.1.0. Please use reframe() for this instead.

Named arguments passed on to dplyr::summarise():

.by

<tidy-select> Optionally, a selection of columns to group by for just this operation, functioning as an alternative to group_by(). For details and examples, see ?dplyr_by.
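
As a rough illustration of per-operation grouping, the sketch below compares .by with an explicit group_by() using plain dplyr::summarise(). It assumes dplyr 1.1.0 or later, where .by is available.

```r
library(dplyr)

# group only for this one operation; the result is not grouped
by_res <- iris %>%
  summarise(avg = mean(Petal.Length), .by = Species)

# the equivalent explicit grouping
grouped_res <- iris %>%
  group_by(Species) %>%
  summarise(avg = mean(Petal.Length))

is_grouped_df(by_res)
```

Both forms give the same per-species averages; .by simply avoids leaving a grouping on the data.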

.groups

Grouping structure of the result.

"drop_last": The last level of grouping is dropped. This was the only supported option before version 1.0.0.
"drop": All levels of grouping are dropped.
"keep": Same grouping structure as .data.
"rowwise": Each row is its own group.

When .groups is not specified, it is chosen based on the number of rows of the results:

If all the results have 1 row, you get "drop_last".
If the number of rows varies, you get "keep" (note that returning a variable number of rows was deprecated in favor of reframe(), which also unconditionally drops all levels of grouping).

In addition, a message informs you of that choice, unless the result is ungrouped, the option "dplyr.summarise.inform" is set to FALSE, or when summarise() is called from a function in a package.
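
The effect of each .groups option can be checked by inspecting the grouping of the result. The sketch below uses plain dplyr::summarise() on mtcars grouped by two columns; the variable names are illustrative only.

```r
library(dplyr)

grouped <- mtcars %>% group_by(cyl, gear)

dl <- grouped %>% summarise(n = n(), .groups = "drop_last")
dr <- grouped %>% summarise(n = n(), .groups = "drop")
kp <- grouped %>% summarise(n = n(), .groups = "keep")
rw <- grouped %>% summarise(n = n(), .groups = "rowwise")

group_vars(dl)  # "cyl": only the last level (gear) was dropped
group_vars(dr)  # character(0): no grouping left
group_vars(kp)  # "cyl" "gear": same grouping as the input
class(rw)       # includes "rowwise_df"
```

Setting .groups explicitly also suppresses the informational message described above.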

.messages

A set of glue specs. The glue code can use any summary variable defined in the ... parameter, any global variable, or {.strata}.

.headline

A headline glue spec. The glue code can use any summary variable defined in the ... parameter, any global variable, or {.strata}.

.tag

If you want to retrieve the summary data from this step later, give it a name with .tag.
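
A minimal sketch of how .headline, .messages and .tag might be combined in one step. The headline text, message text and tag name "petal_summary" are made up for illustration, and the sketch assumes that the installed dtrackr version provides tagged() for retrieving tagged data; check the package reference if in doubt.

```r
library(dplyr)
library(dtrackr)

out <- iris %>%
  group_by(Species) %>%
  track() %>%
  summarise(
    avg = mean(Petal.Length),
    # headline and messages are glue specs; avg is the summary variable above
    .headline = "Petal length by species",
    .messages = "mean: {round(avg, 2)}",
    # hypothetical tag name, used to look this step up later
    .tag = "petal_summary"
  )

# inspect the recorded history
history(out)

# assumption: tagged() returns the data stored under the given tag
tagged(out, .tag = "petal_summary")
```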

Value

The .data data frame, summarised, with the history graph updated to show the summarise operation as a new stage.

Examples

library(dplyr)
library(dtrackr)

tmp = iris %>% group_by(Species) %>% track()
tmp %>% summarise(avg = mean(Petal.Length), .messages="{avg} length") %>% history()
#> dtrackr history:
#> number of flowchart steps: 2 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> ├ [Species:setosa]: "1.462 length"
#> ├ [Species:versicolor]: "4.26 length"
#> └ [Species:virginica]: "5.552 length"
