p_distinct() behaves in the same way as dplyr::distinct(). Before the operation, the size of each group is calculated as {.count.in}; after the operation, the output size is calculated as {.count.out}. If the data is grouped, the group {.strata} is also available for reporting. See dplyr::distinct().

Usage

p_distinct(
  .data,
  ...,
  .messages = "removing {.count.in-.count.out} duplicates",
  .headline = .defaultHeadline(),
  .tag = NULL
)

Arguments

.data

A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.

...

<data-masking> Optional variables to use when determining uniqueness. If there are multiple rows for a given combination of inputs, only the first row will be preserved. If omitted, all variables in the data frame are used. Named arguments are passed on to dplyr::distinct().

.keep_all

If TRUE, keep all variables in .data. If a combination of ... is not distinct, this keeps the first row of values.

.messages

A set of glue specs. The glue code can use any global variable, or {.strata}, {.count.in}, and {.count.out}.

.headline

A headline glue spec. The glue code can use any global variable, or {.strata}, {.count.in}, and {.count.out}.

.tag

If you want to retrieve the summary data from this step later, give it a name with .tag.
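
As a rough illustration of how .messages and .tag might be combined, the sketch below reuses the duplicated iris data from the Examples section. The message text and the tag name "dedup" are arbitrary choices made for this illustration, and the call to tagged() assumes that function retrieves the summary stored under a tag; check the dtrackr reference before relying on it.

```r
library(dplyr)
library(dtrackr)

# Same duplicated data as in the Examples section of this page.
tmp <- bind_rows(iris %>% track(), iris %>% track() %>% filter(Petal.Length > 5))

res <- tmp %>%
  group_by(Species) %>%
  p_distinct(
    # Custom glue spec built only from the documented variables.
    .messages = "{.count.in} rows in, {.count.out} rows out",
    # "dedup" is an arbitrary, illustrative tag name.
    .tag = "dedup"
  )

# Print the history to see the custom message for each stratum.
res %>% history()

# Assumption: tagged() returns the summary data stored under the tag.
res %>% tagged(.tag = "dedup")
```

Because {.strata} is only meaningful for grouped data, a message that relies on it is best reserved for pipelines that call group_by() first.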

Value

The .data data frame with duplicate rows removed and the history graph updated.

See also

dplyr::distinct()

Examples

library(dplyr)
library(dtrackr)

tmp = bind_rows(iris %>% track(), iris %>% track() %>% filter(Petal.Length > 5))
tmp %>% group_by(Species) %>% distinct() %>% history()
#> dtrackr history:
#> number of flowchart steps: 3 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> ├ [Species:setosa]: "Species:setosa", "removing 0 duplicates"
#> ├ [Species:versicolor]: "Species:versicolor", "removing 1 duplicates"
#> └ [Species:virginica]: "Species:virginica", "removing 42 duplicates"
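
The per-group figures in the output above can be cross-checked by hand. The following sketch uses plain dplyr only, with no tracking, to compute the group size before and after distinct() and take the difference. The column names count_in, count_out and removed are made up for this illustration and are not part of dtrackr.

```r
library(dplyr)

# Untracked copy of the example data: iris plus a repeat of the long-petal rows.
dup <- bind_rows(iris, filter(iris, Petal.Length > 5))

# Group sizes before de-duplication (the analogue of {.count.in}).
counts_in <- dup %>% count(Species, name = "count_in")

# Group sizes after de-duplication (the analogue of {.count.out}).
counts_out <- dup %>%
  group_by(Species) %>%
  distinct() %>%
  ungroup() %>%
  count(Species, name = "count_out")

# The difference corresponds to {.count.in-.count.out} in the default message.
res <- inner_join(counts_in, counts_out, by = "Species") %>%
  mutate(removed = count_in - count_out)

res
```

The removed column should match the "removing ... duplicates" messages reported for each species in the history.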