In the middle of a pipeline you may wish to document something about the data
that is more complex than simple counts. status is essentially a dplyr
summarisation step whose result is formatted with a glue specification and
recorded in the data frame history. This means you can perform an arbitrary
interim summarisation and put the result into the flowchart without disrupting
the pipeline flow.
status(
  .data,
  ...,
  .messages = .defaultMessage(),
  .headline = .defaultHeadline(),
  .type = "info",
  .asOffshoot = FALSE,
  .tag = NULL
)
.data: a dataframe, which may be grouped
...: any normal dplyr::summarise specification, e.g. count=n() or av=mean(x), etc.
.messages: a character vector of glue specifications. A glue specification can refer to the summary outputs, any grouping variables of .data, the {.strata}, or any variables defined in the calling environment
.headline: a glue specification which can refer to grouping variables of .data, or any variables defined in the calling environment
.type: one of "info", "exclusion"; used to define formatting
.asOffshoot: whether this comment should be an offshoot of the main flow (default = FALSE)
.tag: if you want to use the summary data from this step in the future, give it a name with .tag
the same .data dataframe, with its history metadata updated so that the status is inserted as a new stage
Because of the ... argument, the summary specification parameters MUST BE NAMED.
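As a minimal sketch of the naming rule, every argument in the call below is given explicitly by name. The summary names `n_rows` and `mean_petal`, the headline text, and the variable `out` are chosen purely for illustration; the glue specification in `.messages` then refers to the summary outputs by those names.

```r
library(dplyr)
library(dtrackr)

# Each summary expression passed through ... is named, so the glue
# specification in .messages can refer to it by that name.
out = iris %>%
  track() %>%
  status(
    n_rows = n(),
    mean_petal = round(mean(Petal.Length), 2),
    .messages = "{n_rows} rows; mean petal length {mean_petal}",
    .headline = "Petal summary"
  )

# status() returns the input data; only the history has changed
out %>% history()
```

The data itself passes through unchanged, so the pipeline can carry on from `out` as if the status step were not there.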
library(dplyr)
library(dtrackr)

# track the iris data and group it by species
tmp = iris %>% track() %>% group_by(Species)

# summarise each group and record the result in the history
tmp %>% status(
  long = p_count_if(Petal.Length > 5),
  short = p_count_if(Petal.Length < 2),
  .messages = "{Species}: {long} long ones & {short} short ones"
) %>% history()
#> dtrackr history:
#> number of flowchart steps: 3 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> ├ [Species:setosa]: "Species:setosa", "setosa: 0 long ones & 50 short ones"
#> ├ [Species:versicolor]: "Species:versicolor", "versicolor: 1 long ones & 0 short ones"
#> └ [Species:virginica]: "Species:virginica", "virginica: 41 long ones & 0 short ones"
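The remaining parameters can be combined in the same way. The following is a hedged sketch rather than a recommended pattern: the summary name `wide`, the threshold of 3.5, the tag name "wide_sepals" and the variable `flagged` are all hypothetical. It records a per-species status formatted as an exclusion, placed as an offshoot of the main flow, and tagged so that the summary data has a name.

```r
library(dplyr)
library(dtrackr)

flagged = iris %>%
  track() %>%
  group_by(Species) %>%
  status(
    # named summary expression, evaluated once per group
    wide = sum(Sepal.Width > 3.5),
    .messages = "{wide} flowers with sepal width above 3.5",
    # the headline may refer to grouping variables of .data
    .headline = "{Species}",
    .type = "exclusion",      # formatting only; no rows are removed
    .asOffshoot = TRUE,       # draw this stage off the main flow
    .tag = "wide_sepals"      # hypothetical name for this summary
  )

flagged %>% history()
```

Note that `.type = "exclusion"` only changes how the stage is formatted; `status` itself does not filter the data.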