Distinct acts in the same way as dplyr::distinct(). Prior to the operation, the size of the group is calculated as {.count.in}; after the operation, the output size is calculated as {.count.out}. The group {.strata} is also available (if grouped) for reporting. See dplyr::distinct().
Usage
# S3 method for class 'trackr_df'
distinct(
  .data,
  ...,
  .messages = "removing {.count.in-.count.out} duplicates",
  .headline = .defaultHeadline(),
  .tag = NULL
)
Arguments
- .data
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.
- ...
<data-masking> Optional variables to use when determining uniqueness. If there are multiple rows for a given combination of inputs, only the first row will be preserved. If omitted, will use all variables in the data frame. Named arguments are passed on to dplyr::distinct():
  - .keep_all
    If TRUE, keep all variables in .data. If a combination of ... is not distinct, this keeps the first row of values.
- .messages
A set of glue specs. The glue code can use any global variable, or {.strata}, {.count.in}, and {.count.out}.
- .headline
A headline glue spec. The glue code can use any global variable, or {.strata}, {.count.in}, and {.count.out}.
- .tag
If you want to retrieve the summary data from this step later, give it a name with .tag.
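To make the glue specs above more concrete, the following is a minimal sketch of how the default .messages string would render. It assumes standard glue interpolation and does not use dtrackr itself: the variables .count.in and .count.out are defined here by hand as stand-ins for the values described above, computed with plain dplyr.

```r
library(dplyr)
library(glue)

# Hand-made stand-ins (an assumption for illustration) for the values
# that are made available to the glue spec.
.count.in <- nrow(iris)             # rows before removing duplicates
.count.out <- nrow(distinct(iris))  # rows after removing duplicates

# The default .messages spec; the expression inside the braces is
# evaluated as ordinary R code.
msg <- glue("removing {.count.in-.count.out} duplicates")
print(msg)
```

Because anything inside the braces is evaluated as R, a spec can also report the two counts separately, for example "{.count.in} rows in, {.count.out} rows out".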
Examples
library(dplyr)
library(dtrackr)
tmp = bind_rows(iris %>% track(), iris %>% track() %>% filter(Petal.Length > 5))
tmp %>% group_by(Species) %>% distinct() %>% history()
#> dtrackr history:
#> number of flowchart steps: 3 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> ├ [Species:setosa]: "Species:setosa", "removing 0 duplicates"
#> ├ [Species:versicolor]: "Species:versicolor", "removing 1 duplicates"
#> └ [Species:virginica]: "Species:virginica", "removing 42 duplicates"
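As a further sketch, the same step can be run on ungrouped data with a custom message and a tag. It uses only the arguments documented above; the message wording and the tag name "dedup" are illustrative choices rather than package defaults, and the output is not shown.

```r
library(dplyr)
library(dtrackr)

deduped <- iris %>%
  track() %>%
  distinct(
    # custom glue spec using the documented count variables
    .messages = "{.count.in} rows in, {.count.out} rows out",
    # illustrative tag name (an assumption) for this step's summary data
    .tag = "dedup"
  )

# Inspect the recorded steps, as in the example above
deduped %>% history()
```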