Grouping a data set acts in the normal way. When tracking a dataframe, a group_by() operation will sometimes create a lot of groups. This happens, for example, when a group_by(), summarise() step aggregates data on a fine scale, e.g. by day in a time series. This is generally a bad idea when tracking a dataframe, as the resulting flowchart will have so many branches that it is illegible. dtrackr will detect this issue and pause tracking the dataframe with a warning. It is up to the user to resume() tracking once the large number of groups has been resolved, e.g. using dplyr::ungroup(). This limit is configurable with options("dtrackr.max_supported_groupings"=XX). The default is 16. See dplyr::group_by().
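The pause and resume workflow described above might look like the following minimal sketch. The `daily` data set and its column names are hypothetical, made up for illustration; the behaviour relied on (a warning and a pause once the default limit of 16 groups is exceeded) is as described in the text above.

```r
library(dplyr)
library(dtrackr)

# Hypothetical daily series: 30 days with 3 readings per day.
daily <- tibble(
  day = rep(1:30, each = 3),
  value = seq_len(90)
)

# Grouping by day creates 30 groups, which is above the default limit of 16,
# so group_by() is expected to warn and pause tracking. The warning is
# suppressed here only to keep the example output short.
paused <- suppressWarnings(
  daily %>% track() %>% group_by(day)
)

# Aggregate while tracking is paused, drop any remaining grouping,
# then resume tracking and record what was done.
resumed <- paused %>%
  summarise(mean_value = mean(value)) %>%
  ungroup() %>%
  resume() %>%
  comment("daily means calculated")

resumed %>% history()
```

If fine-grained grouping is a routine part of an analysis, raising the limit with options("dtrackr.max_supported_groupings"=XX) is an alternative, at the cost of a much wider flowchart.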
# S3 method for trackr_df
group_by(
  .data,
  ...,
  .messages = "stratify by {.cols}",
  .headline = NULL,
  .tag = NULL,
  .maxgroups = .defaultMaxSupportedGroupings()
)
.data: A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.
...: In group_by(), variables or computations to group by. Computations are always done on the ungrouped data frame. To perform computations on the grouped data, you need to use a separate mutate() step before the group_by(). Computations are not allowed in nest_by(). In ungroup(), variables to remove from the grouping.
.messages: a set of glue specs. The glue code can use any global variable, or {.cols}, which is the columns that are being grouped by.
.headline: a headline glue spec. The glue code can use any global variable, or {.cols}.
.tag: if you want the summary data from this step in the future, then give it a name with .tag.
.maxgroups: the maximum number of subgroups allowed before the tracking is paused.
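As a sketch of how these arguments combine in a single call, the example below raises the group limit for one step only and labels the step. The `daily` data set, the headline text, and the tag name "by_day" are hypothetical values chosen for illustration.

```r
library(dplyr)
library(dtrackr)

# Hypothetical daily series: 30 days with 3 readings per day.
daily <- tibble(
  day = rep(1:30, each = 3),
  value = seq_len(90)
)

# Allow up to 50 groups for this call only, so that 30 groups
# stay under the limit and tracking is not paused.
grouped <- daily %>%
  track() %>%
  group_by(
    day,
    .messages = "stratify by {.cols}",  # {.cols} expands to the grouping columns
    .headline = "Daily grouping",       # plain text is also a valid glue spec
    .tag = "by_day",                    # name under which this step's summary is kept
    .maxgroups = 50
  )

grouped %>% history()
```

Setting .maxgroups on the call leaves the session-wide option untouched, which keeps the higher limit confined to the one step that needs it.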
Value: the .data, but grouped.
See also: dplyr::group_by()
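library(dplyr)
library(dtrackr)
# Track iris and group by Species; {.cols} expands to the grouping columns.
tmp = iris %>% track() %>% group_by(Species, .messages="stratify by {.cols}")
# Add one comment per group ({.strata} describes the group) and print the history.
tmp %>% comment("{.strata}") %>% history()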
#> dtrackr history:
#> number of flowchart steps: 3 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> ├ [Species:setosa]: "Species:setosa", "Species:setosa"
#> ├ [Species:versicolor]: "Species:versicolor", "Species:versicolor"
#> └ [Species:virginica]: "Species:virginica", "Species:virginica"