Slice operations behave as in dplyr, except that for a tracked dataframe the history graph is updated with the sizes of the dataframe before and after the operation.
See dplyr::slice(), dplyr::slice_head(), dplyr::slice_tail(), dplyr::slice_min(), dplyr::slice_max() and dplyr::slice_sample() for more details on the underlying functions.
p_slice_max(
.data,
...,
.messages = c("{.count.in} before", "{.count.out} after"),
.headline = "slice data"
)
.data
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.
...
Arguments passed on to dplyr::slice_max
n,prop
Provide either n, the number of rows, or prop, the proportion of rows to select. If neither is supplied, n = 1 will be used. If n is greater than the number of rows in the group (or prop > 1), the result will be silently truncated to the group size. prop will be rounded towards zero to generate an integer number of rows.
A negative value of n or prop will be subtracted from the group size. For example, n = -2 with a group of 5 rows will select 5 - 2 = 3 rows; prop = -0.25 with 8 rows will select 8 * (1 - 0.25) = 6 rows.
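The row-count rules above can be checked with a small base R sketch. `slice_rows()` is a hypothetical helper written only to illustrate the documented arithmetic; it is not part of dplyr or dtrackr, and dplyr's internal implementation (in particular how it rounds fractional results for negative values) may differ.

```r
# Hypothetical helper (not a dplyr/dtrackr function): number of rows a
# slice_*() call would keep from a group of `size` rows, following the
# documented n/prop rules.
slice_rows <- function(size, n = NULL, prop = NULL) {
  if (is.null(n) && is.null(prop)) n <- 1   # default when neither is supplied
  if (!is.null(n)) {
    k <- if (n >= 0) n else size + n        # negative n is subtracted from size
  } else {
    k <- if (prop >= 0) size * prop else size * (1 + prop)
  }
  k <- trunc(k)                             # round towards zero (assumed also for negatives)
  max(0, min(k, size))                      # silently truncate to the group size
}

slice_rows(5, n = -2)        # 3
slice_rows(8, prop = -0.25)  # 6
slice_rows(50, prop = 0.5)   # 25
slice_rows(3, n = 10)        # 3
```

In the iris examples on this page, prop = 0.5 of a 50-row group gives 25 rows before ties are considered, so the larger counts reported there (31 and 29) come from the default with_ties = TRUE keeping tied rows.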
order_by
<data-masking> Variable or function of variables to order by. To order by multiple variables, wrap them in a data frame or tibble.
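A minimal sketch of ordering by more than one variable, calling dplyr::slice_max() directly on a toy data frame (the data are made up for illustration; this assumes dplyr 1.1.0 or later, where data frame ordering keys are supported):

```r
library(dplyr)

# Toy data: `a` is the primary sort key, `b` breaks ties in `a`.
df <- data.frame(id = 1:4, a = c(2, 2, 1, 2), b = c(1, 3, 5, 3))

# Wrapping both columns in data.frame() orders rows by `a`, then by `b`.
# Rows 2 and 4 share the maximum (a = 2, b = 3), so with the default
# with_ties = TRUE both are returned even though n = 1.
top <- slice_max(df, order_by = data.frame(a, b), n = 1)
sort(top$id)
```

On a tracked dataframe the same order_by expression is passed through unchanged to dplyr::slice_max.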
with_ties
Should ties be kept together? The default, TRUE, may return more rows than you request. Use FALSE to ignore ties and return the first n rows.
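A short sketch of the difference, using dplyr::slice_max() directly on a made-up data frame with two rows tied for the maximum:

```r
library(dplyr)

df <- data.frame(id = 1:4, x = c(3, 5, 5, 1))

# Two rows tie for the maximum value x = 5.
kept  <- slice_max(df, order_by = x, n = 1)                     # ties kept together
exact <- slice_max(df, order_by = x, n = 1, with_ties = FALSE)  # exactly n rows
nrow(kept)   # 2
nrow(exact)  # 1
```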
na_rm
Should missing values in order_by be removed from the result? If FALSE, NA values are sorted to the end (like in arrange()), so they will only be included if there are insufficient non-missing values to reach n/prop.
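A minimal sketch of the na_rm behaviour on a made-up three-row data frame, calling dplyr::slice_max() directly (this assumes dplyr 1.1.0 or later, where na_rm is available):

```r
library(dplyr)

df <- data.frame(x = c(1, NA, 3))

# Only two non-missing values exist, so with na_rm = FALSE (the default)
# the NA row is used to make up n = 3.
keep_na <- slice_max(df, order_by = x, n = 3)

# With na_rm = TRUE, missing values are dropped and fewer than n rows return.
drop_na <- slice_max(df, order_by = x, n = 3, na_rm = TRUE)

nrow(keep_na)  # 3
nrow(drop_na)  # 2
```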
.messages
A set of glue specs. The glue code can use any global variable, {.count.in} and {.count.out} for the input and output dataframe sizes respectively, and {.excluded} for the difference between them.
.headline
A glue spec. The glue code can use any global variable, and {.count.in} and {.count.out} for the input and output dataframe sizes respectively.
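A sketch of using all three count variables in .messages on an ungrouped tracked dataframe. The message wording and headline are illustrative only; with_ties = FALSE is passed through to dplyr::slice_max so that exactly 5 rows are kept.

```r
library(dplyr)
library(dtrackr)

# Keep the 5 rows with the widest sepals and record the input size, the
# output size and the number of rows removed in the history graph.
widest <- iris %>%
  track() %>%
  slice_max(order_by = Sepal.Width, n = 5, with_ties = FALSE,
    .messages = c("{.count.in} before", "{.count.out} after",
                  "{.excluded} excluded"),
    .headline = "Widest 5 sepals")

widest %>% history()
```

Assuming {.excluded} is the difference .count.in - .count.out, as the argument description states, it corresponds to 150 - 5 = 145 rows here.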
the sliced dataframe with the history graph updated.
dplyr::slice_max()
library(dplyr)
library(dtrackr)
# Subset the data by the maximum of a given value
iris %>% track() %>% group_by(Species) %>%
slice_max(prop=0.5, order_by = Sepal.Width,
.messages="{.count.out} / {.count.in} = {prop} (with ties)",
.headline="Widest 50% Sepals") %>%
history()
#> dtrackr history:
#> number of flowchart steps: 3 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> ├ [Species:setosa]: "Widest 50% Sepals", "31 / 50 = 0.5 (with ties)"
#> ├ [Species:versicolor]: "Widest 50% Sepals", "29 / 50 = 0.5 (with ties)"
#> └ [Species:virginica]: "Widest 50% Sepals", "29 / 50 = 0.5 (with ties)"
# The narrowest 25% of each Species group can be selected with
# slice_min(). Recording this in the history only requires tracking the
# data and supplying glue specs.
iris %>%
track() %>%
group_by(Species) %>%
slice_min(prop=0.25, order_by = Sepal.Width,
.messages="{.count.out} / {.count.in} (with ties)",
.headline="narrowest {sprintf('%1.0f',prop*100)}% {Species}") %>%
history()
#> dtrackr history:
#> number of flowchart steps: 3 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> ├ [Species:setosa]: "narrowest 25% setosa", "12 / 50 (with ties)"
#> ├ [Species:versicolor]: "narrowest 25% versicolor", "13 / 50 (with ties)"
#> └ [Species:virginica]: "narrowest 25% virginica", "19 / 50 (with ties)"