Grouping a tracked data frame behaves as it does in dplyr. However, a group_by() operation sometimes creates a large number of groups, for example in a group_by(), summarise() step that aggregates data on a fine scale, such as by day in a time series. Tracking such a step is rarely useful because the resulting flowchart has so many branches that it is illegible. dtrackr detects this situation and pauses tracking of the data frame with a warning. It is then up to the user to resume() tracking once the large number of groups has been resolved, e.g. using dplyr::ungroup(). The limit is configurable with options("dtrackr.max_supported_groupings"=XX); the default is 16. See dplyr::group_by().
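The pause-and-resume behaviour described above can be sketched as follows. This is a minimal illustration, not a recommended workflow: the limit is lowered to 2 through the .maxgroups argument purely so that the three species in iris exceed it, and the variable names tmp and resumed are arbitrary. It assumes that calling resume() after ungroup() is harmless even if a given dtrackr version resumes tracking on its own.

```r
library(dplyr)
library(dtrackr)

# Three species exceed the artificially low limit of 2 groups, so
# group_by() is expected to warn and pause tracking.
tmp = iris %>%
  track() %>%
  group_by(Species, .maxgroups = 2)

# The data is still grouped as usual; only the tracking is paused.
n_groups(tmp)

# Once the grouping has been removed, tracking is resumed explicitly.
resumed = tmp %>%
  ungroup() %>%
  resume()
```

Setting options("dtrackr.max_supported_groupings"=XX) instead changes the limit for every subsequent group_by() in the session rather than for a single call.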

# S3 method for trackr_df
group_by(
  .data,
  ...,
  .messages = "stratify by {.cols}",
  .headline = NULL,
  .tag = NULL,
  .maxgroups = .defaultMaxSupportedGroupings()
)

Arguments

.data

A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See dplyr::group_by() for more details.

...

In group_by(), variables or computations to group by. Computations are always done on the ungrouped data frame. To perform computations on the grouped data, you need to use a separate mutate() step before the group_by(). Computations are not allowed in nest_by(). In ungroup(), variables to remove from the grouping.

.messages

a set of glue specs. The glue code can use any global variable, or {.cols}, which gives the columns being grouped by.

.headline

a headline glue spec. The glue code can use any global variable, or {.cols}.

.tag

if you want to retrieve the summary data from this step later, give it a name with .tag.

.maxgroups

the maximum number of subgroups allowed before tracking is paused. The default is taken from the dtrackr.max_supported_groupings option (16 unless changed).
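As an illustration of the .messages and .headline arguments, the sketch below supplies custom glue specs that use {.cols}. The wording of the two specs and the variable name grouped are arbitrary choices for this example; the output of history() is not reproduced here because its exact layout may differ between versions.

```r
library(dplyr)
library(dtrackr)

# Custom glue specs: {.cols} is filled in with the grouping columns.
grouped = iris %>%
  track() %>%
  group_by(
    Species,
    .headline = "Stratification",
    .messages = "grouping columns: {.cols}"
  )

# Inspect the steps recorded so far.
grouped %>% history()
```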

Value

the .data, grouped by the given columns.

See also

dplyr::group_by()

Examples

library(dplyr)
library(dtrackr)

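# Start tracking, then group by Species; the message records the grouping columns.
tmp = iris %>% track() %>% group_by(Species, .messages="stratify by {.cols}")
# Add a comment showing the stratum of each group, then print the history.
tmp %>% comment("{.strata}") %>% history()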
#> dtrackr history:
#> number of flowchart steps: 3 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> ├ [Species:setosa]: "Species:setosa", "Species:setosa"
#> ├ [Species:versicolor]: "Species:versicolor", "Species:versicolor"
#> └ [Species:virginica]: "Species:virginica", "Species:virginica"