Grouping a data set acts in the normal way. When tracking a dataframe, a group_by() operation will sometimes create a lot of groups. This happens, for example, if you are doing a group_by(), summarise() step that aggregates data on a fine scale, e.g. by day in a time series. This is generally a bad idea when tracking a dataframe, as the resulting flowchart will have a great many branches and be illegible. dtrackr will detect this issue and pause tracking the dataframe with a warning. It is up to the user to resume() tracking once the large number of groups has been resolved, e.g. using dplyr::ungroup(). This limit is configurable with options("dtrackr.max_supported_groupings"=XX). The default is 16. See dplyr::group_by().
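The pause-and-resume behaviour described above can be sketched as follows. This is a minimal illustration rather than a definitive recipe: the limit of 2 is an arbitrary value chosen so that the three Species groups in iris exceed it, and the variable names (old_opts, paused, resumed) are hypothetical.

```r
library(dplyr)
library(dtrackr)

# Lower the limit for this session only so the small iris data set exceeds it
# (2 is an arbitrary value for illustration; the documented default is 16).
old_opts <- options("dtrackr.max_supported_groupings" = 2)

# Species has 3 levels, which is more than the limit of 2, so tracking is
# expected to pause here with a warning.
paused <- iris %>%
  track() %>%
  group_by(Species)

# Resolve the large number of groups, then switch tracking back on.
resumed <- paused %>%
  ungroup() %>%
  resume()

# Restore the previous option value.
options(old_opts)
```

The same limit can also be set for a single call through the .maxgroups argument listed under Usage, which avoids changing the session-wide option.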
Usage
p_group_by(
  .data,
  ...,
  .messages = "stratify by {.cols}",
  .headline = NULL,
  .tag = NULL,
  .maxgroups = .defaultMaxSupportedGroupings()
)
Arguments
- .data
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.
- ...
In group_by(), variables or computations to group by. Computations are always done on the ungrouped data frame. To perform computations on the grouped data, you need to use a separate mutate() step before the group_by(). Computations are not allowed in nest_by(). In ungroup(), variables to remove from the grouping. Named arguments passed on to dplyr::group_by:
  - .add
  When FALSE, the default, group_by() will override existing groups. To add to the existing groups, use .add = TRUE. This argument was previously called add, but that prevented creating a new grouping variable called add, and conflicts with our naming conventions.
  - .drop
  Drop groups formed by factor levels that don't appear in the data? The default is TRUE except when .data has been previously grouped with .drop = FALSE. See group_by_drop_default() for details.
  - x
  A tbl()
- .messages
a set of glue specs. The glue code can use any global variable, or {.cols} which is the columns that are being grouped by.
- .headline
a headline glue spec. The glue code can use any global variable, or {.cols}.
- .tag
if you want the summary data from this step in the future then give it a name with .tag.
- .maxgroups
the maximum number of subgroups allowed before the tracking is paused.
Examples
library(dplyr)
library(dtrackr)
tmp = iris %>% track() %>% group_by(Species, .messages="stratify by {.cols}")
tmp %>% comment("{.strata}") %>% history()
#> dtrackr history:
#> number of flowchart steps: 3 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> ├ [Species:setosa]: "Species:setosa", "Species:setosa"
#> ├ [Species:versicolor]: "Species:versicolor", "Species:versicolor"
#> └ [Species:virginica]: "Species:virginica", "Species:virginica"