Apply a set of filters and record a summary of their effect in the dtrackr history graph. Because the filters are specified through ..., all other parameters MUST BE NAMED. The filters work in a combinatorial manner, i.e. the result EXCLUDES ALL rows that match any of the criteria. If na.rm = TRUE, rows for which a criterion cannot be evaluated are also removed.
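The any-match rule described above can be sketched in base R, without dtrackr. This only emulates the documented behaviour and is not the package's implementation; the data frame `df` and the names `criteria`, `exclude` and `kept` are hypothetical.

```r
# Hypothetical data; both criteria flag the same row (a == 10).
df <- data.frame(a = 1:10)
criteria <- list(
  df$a > 9,
  df$a == max(df$a)
)

# A row is excluded if ANY criterion is TRUE for it.
exclude <- Reduce(`|`, criteria)

# Rows that survive the exclusion.
kept <- df[!exclude, , drop = FALSE]
nrow(kept)
```

Because the criteria are combined with a logical OR before any row is dropped, a row flagged by several criteria is still removed only once.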

Usage

p_exclude_all(
  .data,
  ...,
  .headline = .defaultHeadline(),
  na.rm = FALSE,
  .type = "exclusion",
  .asOffshoot = TRUE,
  .stage = (if (is.null(.tag)) "" else .tag),
  .tag = NULL
)

Arguments

.data

a dataframe which may be grouped

...

a dplyr filter specification as a set of formulae where the LHS are predicates to test the data set against; items that match any of the predicates will be excluded. The RHS is a glue specification defining the message to be entered in the history graph for each predicate. This can refer to grouping variables, variables from the environment, and {.excluded}, {.matched} or {.missing} (excluded = matched + missing), as well as {.count} and {.total} (group and overall counts respectively), e.g. "excluding {.matched} items and {.missing} with missing values".

.headline

a glue specification which can refer to grouping variables of .data, or any variables defined in the calling environment

na.rm

(default FALSE) if the filter cannot be evaluated for a row, count that row as missing and either exclude it (TRUE) or keep it (FALSE)

.type

default "exclusion": used to define formatting

.asOffshoot

whether this comment should be an offshoot of the main flow (default = TRUE).

.stage

a name for this step in the pathway

.tag

if you want the summary data from this step in the future then give it a name with .tag.
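As a rough illustration of how the na.rm argument and the matched / missing distinction fit together, the following base R sketch emulates the documented behaviour for a single predicate. It does not use dtrackr and is not the package's implementation; `df`, `pred`, `matched`, `missing`, `kept_default` and `kept_na_rm` are hypothetical names.

```r
# Hypothetical data with one value that cannot be evaluated.
df <- data.frame(a = c(1, 2, NA, 8, 9, 10))
pred <- df$a > 8                     # FALSE FALSE NA FALSE TRUE TRUE

matched <- !is.na(pred) & pred       # rows the predicate definitely matches
missing <- is.na(pred)               # rows the predicate cannot evaluate

# Emulates na.rm = FALSE: only matched rows are dropped.
kept_default <- df[!matched, , drop = FALSE]

# Emulates na.rm = TRUE: matched and missing rows are both dropped.
kept_na_rm <- df[!(matched | missing), , drop = FALSE]

c(matched = sum(matched), missing = sum(missing),
  kept_default = nrow(kept_default), kept_na_rm = nrow(kept_na_rm))
```

Here two rows match, one row is missing, four rows remain under the na.rm = FALSE emulation and three under the na.rm = TRUE emulation.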

Value

the filtered .data dataframe with the history graph updated with the summary of excluded items as a new offshoot stage

Examples

library(dplyr)
library(dtrackr)

iris %>%
  track() %>%
  capture_exclusions() %>%
  exclude_all(
    Petal.Length > 5 ~ "{.excluded} long ones",
    Petal.Length < 2 ~ "{.excluded} short ones"
  ) %>%
  history()
#> dtrackr history:
#> number of flowchart steps: 1 (approx)
#> tags defined: <none>
#> items excluded so far: 92
#> last entry / entries:
#> └ "150 items"


# simultaneous evaluation of criteria:
data.frame(a = 1:10) %>%
  track() %>%
  exclude_all(
    # These two criteria identify the same value and one item is excluded
    a > 9 ~ "{.excluded} value > 9",
    a == max(a) ~ "{.excluded} max value"
  ) %>%
  status() %>%
  history()
#> dtrackr history:
#> number of flowchart steps: 2 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> └ "9 items"

# the behaviour is equivalent to dplyr's filter function with each criterion negated:
data.frame(a=1:10) %>%
  dplyr::filter(a <= 9, a != max(a)) %>%
  nrow()
#> [1] 9

# step-wise evaluation of criteria results in a different output
data.frame(a = 1:10) %>%
  track() %>%
  # Performing the same exclusion sequentially results in 2 items
  # being excluded as the criteria no longer identify the same
  # item.
  exclude_all(a > 9 ~ "{.excluded} value > 9") %>%
  exclude_all(a == max(a) ~ "{.excluded} max value") %>%
  status() %>%
  history()
#> dtrackr history:
#> number of flowchart steps: 2 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> └ "8 items"

# the step-wise behaviour is equivalent to chaining dplyr filters with each criterion negated:
data.frame(a=1:10) %>%
  dplyr::filter(a <= 9) %>%
  dplyr::filter(a != max(a)) %>%
  nrow()
#> [1] 8
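The examples above do not exercise na.rm. A minimal sketch of how it might be used, assuming the behaviour described under Arguments, is shown here; the data frame `df` and the result name `kept` are hypothetical, and no history output is shown because it has not been verified.

```r
library(dplyr)
library(dtrackr)

# Hypothetical data: one NA that the predicate cannot evaluate.
df <- data.frame(a = c(1, 2, NA, 8, 9, 10))

kept <- df %>%
  track() %>%
  exclude_all(
    # The message reports matched and missing rows separately.
    a > 8 ~ "excluding {.matched} items and {.missing} with missing values",
    # Named, as required: rows that cannot be evaluated are also excluded.
    na.rm = TRUE
  )

nrow(kept)
```

With na.rm = TRUE the row with the missing value should be dropped along with the two rows greater than 8; with the default na.rm = FALSE it should be retained.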