A drop-in replacement for tidyr::unnest() which optionally takes a message and headline to store in the history graph. Older versions of tidyr::unnest() can throw an error if .messages is more than 1 item long; in that case, use the dtrackr-specific p_unnest(), which will work instead.
Usage
# S3 method for class 'trackr_df'
unnest(
data,
cols,
...,
keep_empty = FALSE,
ptype = NULL,
names_sep = NULL,
names_repair = "check_unique",
.drop = deprecated(),
.id = deprecated(),
.sep = deprecated(),
.preserve = deprecated(),
.messages = "",
.headline = ""
)
Arguments
- data
A data frame.
- cols
<tidy-select> List-columns to unnest. When selecting multiple columns, values from the same row will be recycled to their common size.
- ...
Deprecated: previously you could write df %>% unnest(x, y, z). Convert to df %>% unnest(c(x, y, z)). If you previously created a new variable in unnest() you'll now need to do it explicitly with mutate(). Convert df %>% unnest(y = fun(x, y, z)) to df %>% mutate(y = fun(x, y, z)) %>% unnest(y).
- keep_empty
By default, you get one row of output for each element of the list that you are unchopping/unnesting. This means that if there's a size-0 element (like NULL or an empty data frame or vector), then that entire row will be dropped from the output. If you want to preserve all rows, use keep_empty = TRUE to replace size-0 elements with a single row of missing values.
- ptype
Optionally, a named list of column name-prototype pairs to coerce cols to, overriding the default that will be guessed from combining the individual values. Alternatively, a single empty ptype can be supplied, which will be applied to all cols.
- names_sep
If NULL, the default, the outer names will come from the inner names. If a string, the outer names will be formed by pasting together the outer and the inner column names, separated by names_sep.
- names_repair
Used to check that output data frame has valid names. Must be one of the following options:
"minimal": no name repair or checks, beyond basic existence,
"unique": make sure names are unique and not empty,
"check_unique": (the default), no name repair, but check they are unique,
"universal": make the names unique and syntactic
a function: apply custom name repair.
tidyr_legacy: use the name repair from tidyr 0.8.
a formula: a purrr-style anonymous function (see rlang::as_function())
See vctrs::vec_as_names() for more details on these terms and the strategies used to enforce them.
- .drop, .preserve
Deprecated: all list-columns are now preserved. If there are any that you don't want in the output, use select() to remove them prior to unnesting.
- .id
Deprecated: convert df %>% unnest(x, .id = "id") to df %>% mutate(id = names(x)) %>% unnest(x).
- .sep
- .messages
a set of glue specs. The glue code can use any global variable, grouping variable, {.count.in}, {.count.out} or {.strata}. Defaults to nothing. Older versions of tidyr::unnest can throw an error if this is more than 1 item long; in that case, use the dtrackr-specific p_unnest, which will work instead.
- .headline
a headline glue spec. The glue code can use any global variable, grouping variable, or {.strata}. Defaults to nothing.
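The effect of keep_empty and names_sep described above can be illustrated with a minimal sketch. It uses plain tibbles and tidyr::unnest() directly, on the assumption that the tracked method passes these arguments through unchanged; the data frames df and df2 are hypothetical toy data, not part of dtrackr.

```r
library(tidyr)
library(tibble)

# Hypothetical toy data: the second row holds a size-0 element (NULL)
df <- tibble(
  id = 1:3,
  vals = list(1:2, NULL, 3L)
)

# Default behaviour: the row with the NULL element is dropped
dropped <- unnest(df, vals)
nrow(dropped)
#> [1] 3

# keep_empty = TRUE: the NULL element becomes a single row of NA
kept <- unnest(df, vals, keep_empty = TRUE)
nrow(kept)
#> [1] 4

# Hypothetical toy data with a list-column of data frames
df2 <- tibble(
  g = c("a", "b"),
  inner = list(tibble(x = 1:2), tibble(x = 3L))
)

# names_sep pastes the outer and inner column names together
wide <- unnest(df2, inner, names_sep = "_")
names(wide)
#> [1] "g"       "inner_x"
```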
Value
The result of tidyr::unnest(), but with the history graph updated.
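As a rough sketch of what this means in practice, the snippet below unnests a tracked data frame with a single-item .messages and a .headline, then inspects the history. The message and headline wording is made up for illustration, and a single message is used on the assumption that this avoids the error older versions of tidyr can raise for longer .messages vectors.

```r
library(dplyr)
library(dtrackr)

# Track the data, then unnest the films list-column.
# The glue strings here are illustrative only.
tracked <- starwars %>%
  track() %>%
  tidyr::unnest(
    films,
    .messages = "{.count.out} appearances",
    .headline = "Unnest films"
  )

# The returned object is still a data frame; the unnest step
# should now appear as a stage in the history graph
tracked %>% history()
```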
Examples
library(dplyr)
library(dtrackr)
starwars %>%
track() %>%
tidyr::unnest(starships, keep_empty = TRUE) %>%
tidyr::nest(world_data = c(-homeworld)) %>%
history()
#> dtrackr history:
#> number of flowchart steps: 2 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> └ "13 items"
# There is a problem with `tidyr::unnest` that means if you want to override the
# `.messages` option at the moment it will most likely fail. Forcing the use of
# the specific `dtrackr::p_unnest` version solves this problem, until hopefully it is
# resolved in `tidyr`:
starwars %>%
track() %>%
p_unnest(
films,
.messages = c("{.count.in} characters", "{.count.out} appearances")
) %>%
dplyr::group_by(gender) %>%
tidyr::nest(
people = c(-gender, -species, -homeworld),
.messages = c("{.count.in} appearances", "{.count.out} planets")
) %>%
status() %>%
history()
#> dtrackr history:
#> number of flowchart steps: 5 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> ├ [NA]: "3 items"
#> ├ [feminine]: "feminine", "13 items"
#> └ [masculine]: "masculine", "49 items"
# This example includes pivoting and nesting. The CMS patient care data
# has multiple tests per institution in a long format, and observed /
# denominator types. Firstly we pivot the data to allow us to easily calculate
# a total percentage for each institution. This is duplicated for every test
# so we nest the tests to get to one row per institution. Those institutions
# with invalid scores are excluded.
cms_history = tidyr::cms_patient_care %>%
track() %>%
tidyr::pivot_wider(names_from = type, values_from = score) %>%
dplyr::mutate(
percentage = sum(observed) / sum(denominator) * 100,
.by = c(ccn, facility_name)
) %>%
tidyr::nest(
results = c(measure_abbr, observed, denominator),
.messages = c("{.count.in} test results", "{.count.out} facilities")
) %>%
exclude_all(
percentage > 100 ~ "{.excluded} facilities with anomalous percentages",
na.rm = TRUE
)
print(cms_history %>% dtrackr::history())
#> dtrackr history:
#> number of flowchart steps: 3 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> └ "126 test results", "14 facilities"
# not run in examples:
if (interactive()) {
cms_history %>% flowchart()
}