In the middle of a pipeline you may wish to document something about the data
that is more complex than simple counts. status is essentially a dplyr
summarisation step connected to a glue specification, the output of which is
recorded in the data frame history. This means you can do an arbitrary interim
summarisation and put the result into the flowchart without disrupting the
pipeline flow.
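As a minimal sketch of an interim summary that leaves the pipeline intact, the following records two summary values on an ungrouped iris data frame and then carries on filtering. The summary names mean_petal and n_wide and the variable result are illustrative only, and the filter step is assumed to use the package's default tracking messages.

```r
library(dplyr)
library(dtrackr)

result <- iris %>%
  track() %>%
  # interim summary: values are computed as in dplyr::summarise and
  # substituted into the glue message; the data itself is not summarised
  status(
    mean_petal = round(mean(Petal.Length), 2),
    n_wide = sum(Sepal.Width > 3),
    .messages = "mean petal length: {mean_petal}; {n_wide} wide sepals"
  ) %>%
  # the pipeline continues on the full, unsummarised data
  filter(Species != "setosa")

# inspect the recorded stages
history(result)
```

Because status returns the same .data, the filter step still sees every original column and row.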
Usage
p_status(
  .data,
  ...,
  .messages = .defaultMessage(),
  .headline = .defaultHeadline(),
  .type = "info",
  .asOffshoot = FALSE,
  .tag = NULL
)
Arguments
- .data
a dataframe which may be grouped
- ...
any normal dplyr::summarise specification, e.g. count=n() or av=mean(x), etcetera.
- .messages
a character vector of glue specifications. A glue specification can refer to the summary outputs, any grouping variables of .data, the {.strata}, or any variables defined in the calling environment
- .headline
a glue specification which can refer to grouping variables of .data, or any variables defined in the calling environment
- .type
one of "info" or "exclusion": used to define formatting
- .asOffshoot
whether this comment should be an offshoot of the main flow (default = FALSE).
- .tag
if you want the summary data from this step in the future then give it a name with .tag.
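The arguments above can be combined as in the following sketch, which names the step with .tag so that its summary data can be looked up later. It assumes that dtrackr's tagged() function retrieves data stored under a tag, as described in the package documentation; the tag name "petal_summary" and the variable tracked are illustrative only.

```r
library(dplyr)
library(dtrackr)

tracked <- iris %>%
  track() %>%
  group_by(Species) %>%
  status(
    long = p_count_if(Petal.Length > 5),
    # the message may refer to summary outputs and grouping variables
    .messages = "{Species}: {long} long petals",
    .type = "info",
    # illustrative tag name, used to retrieve this step's data later
    .tag = "petal_summary"
  )

# assumption: tagged() returns the data recorded under the given tag
tagged(tracked, .tag = "petal_summary")
```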
Value
the same .data dataframe with the history metadata updated with the status inserted as a new stage
Examples
library(dplyr)
library(dtrackr)
tmp = iris %>% track() %>% group_by(Species)
tmp %>% status(
  long = p_count_if(Petal.Length>5),
  short = p_count_if(Petal.Length<2),
  .messages="{Species}: {long} long ones & {short} short ones"
) %>% history()
#> dtrackr history:
#> number of flowchart steps: 3 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> ├ [Species:setosa]: "Species:setosa", "setosa: 0 long ones & 50 short ones"
#> ├ [Species:versicolor]: "Species:versicolor", "versicolor: 1 long ones & 0 short ones"
#> └ [Species:virginica]: "Species:virginica", "virginica: 41 long ones & 0 short ones"