Apply a set of filters and summarise the actions of the filters in the dtrackr
history graph. Because of the ... filter specification, all other parameters
(such as .headline or na.rm) MUST BE NAMED. The filters work in a combinatorial
manner, i.e. the results EXCLUDE ALL rows that match any of the criteria. If
na.rm = TRUE, they also remove any rows that cannot be evaluated by the criteria.
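The effect of na.rm on rows where a predicate evaluates to NA can be sketched with plain dplyr. This is only an analogue of the behaviour described above, not dtrackr's implementation; the data frame df and the kept_default / kept_na_rm names are made up for illustration.

```r
library(dplyr)

# Hypothetical data: the predicate a > 9 cannot be evaluated for the NA row
df <- data.frame(a = c(1, 5, NA, 12))

# Analogue of na.rm = FALSE: drop only rows that definitely match,
# keeping the row where the predicate is NA
kept_default <- df %>% filter(is.na(a > 9) | !(a > 9))

# Analogue of na.rm = TRUE: also drop the row where the predicate is NA
# (dplyr::filter discards rows whose condition evaluates to NA)
kept_na_rm <- df %>% filter(!(a > 9))

nrow(kept_default)
#> [1] 3
nrow(kept_na_rm)
#> [1] 2
```

In this sketch one row matches the predicate and one row is missing, so the default keeps three rows and the na.rm = TRUE analogue keeps two.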
Usage
p_exclude_all(
.data,
...,
.headline = .defaultHeadline(),
na.rm = FALSE,
.type = "exclusion",
.asOffshoot = TRUE,
.stage = (if (is.null(.tag)) "" else .tag),
.tag = NULL
)
Arguments
- .data
a dataframe which may be grouped
- ...
a dplyr filter specification as a set of formulae. The LHS of each formula is a predicate to test the data set against; items that match any of the predicates will be excluded. The RHS is a glue specification defining the message to be entered in the history graph for each predicate. This can refer to grouping variables, variables from the environment, and {.excluded}, {.matched} or {.missing} (excluded = matched + missing), as well as {.count} and {.total} (group and overall counts respectively), e.g. "excluding {.matched} items and {.missing} with missing values".
- .headline
a glue specification which can refer to grouping variables of .data, or any variables defined in the calling environment
- na.rm
(default FALSE) if the filter cannot be evaluated for a row, count that row as missing and either exclude it (TRUE) or keep it (FALSE)
- .type
default "exclusion": used to define formatting
- .asOffshoot
do you want this comment to be an offshoot of the main flow (default = TRUE).
- .stage
a name for this step in the pathway
- .tag
if you want the summary data from this step in the future then give it a name with .tag.
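How the RHS message template in ... is filled in can be illustrated with the glue package directly. This is a minimal sketch, assuming made-up counts for a single predicate; the counts list is hypothetical and only stands in for the values dtrackr supplies, and the code is not taken from dtrackr itself.

```r
library(glue)

# Hypothetical counts for one predicate: excluded = matched + missing
counts <- list(.matched = 3, .missing = 1, .excluded = 4)

# glue_data() looks up the {placeholders} in the supplied list
msg <- glue_data(
  counts,
  "excluding {.matched} items and {.missing} with missing values"
)

msg
#> excluding 3 items and 1 with missing values
```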
Value
the filtered .data dataframe with the history graph updated with the summary of excluded items as a new offshoot stage
Examples
library(dplyr)
library(dtrackr)
iris %>% track() %>% capture_exclusions() %>% exclude_all(
Petal.Length > 5 ~ "{.excluded} long ones",
Petal.Length < 2 ~ "{.excluded} short ones"
) %>% history()
#> dtrackr history:
#> number of flowchart steps: 1 (approx)
#> tags defined: <none>
#> items excluded so far: 92
#> last entry / entries:
#> └ "150 items"
# simultaneous evaluation of criteria:
data.frame(a = 1:10) %>%
track() %>%
exclude_all(
# These two criteria identify the same value and one item is excluded
a > 9 ~ "{.excluded} value > 9",
a == max(a) ~ "{.excluded} max value",
) %>%
status() %>%
history()
#> dtrackr history:
#> number of flowchart steps: 2 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> └ "9 items"
# the behaviour is equivalent to the inverse of dplyr's filter function:
data.frame(a=1:10) %>%
dplyr::filter(a <= 9, a != max(a)) %>%
nrow()
#> [1] 9
# step-wise evaluation of criteria results in a different output
data.frame(a = 1:10) %>%
track() %>%
# Performing the same exclusion sequentially results in 2 items
# being excluded as the criteria no longer identify the same
# item.
exclude_all(a > 9 ~ "{.excluded} value > 9") %>%
exclude_all(a == max(a) ~ "{.excluded} max value") %>%
status() %>%
history()
#> dtrackr history:
#> number of flowchart steps: 2 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> └ "8 items"
# the step-wise behaviour is equivalent to chaining the inverse dplyr filters:
data.frame(a=1:10) %>%
dplyr::filter(a <= 9) %>%
dplyr::filter(a != max(a)) %>%
nrow()
#> [1] 8