Skip to contents

A drop in replacement for tidyr::unnest() which optionally takes a message and headline to store in the history graph. Older versions of tidyr::unnest can throw an error if .messages is more than 1 item long and in that case use the dtrackr specific p_unnest will work instead.

Usage

# S3 method for class 'trackr_df'
unnest(
  data,
  cols,
  ...,
  keep_empty = FALSE,
  ptype = NULL,
  names_sep = NULL,
  names_repair = "check_unique",
  .drop = deprecated(),
  .id = deprecated(),
  .sep = deprecated(),
  .preserve = deprecated(),
  .messages = "",
  .headline = ""
)

Arguments

data

A data frame.

cols

<tidy-select> List-columns to unnest.

When selecting multiple columns, values from the same row will be recycled to their common size.

...

[Deprecated]: previously you could write df %>% unnest(x, y, z). Convert to df %>% unnest(c(x, y, z)). If you previously created a new variable in unnest() you'll now need to do it explicitly with mutate(). Convert df %>% unnest(y = fun(x, y, z)) to df %>% mutate(y = fun(x, y, z)) %>% unnest(y).

keep_empty

By default, you get one row of output for each element of the list that you are unchopping/unnesting. This means that if there's a size-0 element (like NULL or an empty data frame or vector), then that entire row will be dropped from the output. If you want to preserve all rows, use keep_empty = TRUE to replace size-0 elements with a single row of missing values.

ptype

Optionally, a named list of column name-prototype pairs to coerce cols to, overriding the default that will be guessed from combining the individual values. Alternatively, a single empty ptype can be supplied, which will be applied to all cols.

names_sep

If NULL, the default, the outer names will come from the inner names. If a string, the outer names will be formed by pasting together the outer and the inner column names, separated by names_sep.

names_repair

Used to check that output data frame has valid names. Must be one of the following options:

  • "minimal": no name repair or checks, beyond basic existence,

  • "unique": make sure names are unique and not empty,

  • "check_unique": (the default), no name repair, but check they are unique,

  • "universal": make the names unique and syntactic

  • a function: apply custom name repair.

  • tidyr_legacy: use the name repair from tidyr 0.8.

  • a formula: a purrr-style anonymous function (see rlang::as_function())

See vctrs::vec_as_names() for more details on these terms and the strategies used to enforce them.

.drop, .preserve

[Deprecated]: all list-columns are now preserved; If there are any that you don't want in the output use select() to remove them prior to unnesting.

.id

[Deprecated]: convert df %>% unnest(x, .id = "id") to df %>% mutate(id = names(x)) %>% unnest(x)).

.sep

[Deprecated]: use names_sep instead.

.messages

a set of glue specs. The glue code can use any global variable, grouping variable, {.count.in}, {.count.out} or {.strata}. Defaults to nothing. Older versions of tidyr::unnest can throw an error if this is more than 1 item long and and in that case use the dtrackr specific p_nest will work instead.

.headline

a headline glue spec. The glue code can use any global variable, grouping variable, or {.strata}. Defaults to nothing.

Value

the result of the tidyr::unnest but with a history graph updated.

See also

Examples

library(dplyr)
library(dtrackr)

starwars %>%
  track() %>%
  tidyr::unnest(starships, keep_empty = TRUE) %>%
  tidyr::nest(world_data = c(-homeworld)) %>%
  history()
#> dtrackr history:
#> number of flowchart steps: 2 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> └ "13 items"

# There is a problem with `tidyr::unnest` that means if you want to override the
# `.messages` option at the moment it will most likely fail. Forcing the use of
# the specific `dtrackr::p_unnest` version solves this problem, until hopefully it is
# resolved in `tidyr`:
starwars %>%
  track() %>%
  p_unnest(
    films,
    .messages = c("{.count.in} characters", "{.count.out} appearances")
  ) %>%
  dplyr::group_by(gender) %>%
  tidyr::nest(
    people = c(-gender, -species, -homeworld),
    .messages = c("{.count.in} appearances", "{.count.out} planets")
  ) %>%
  status() %>%
  history()
#> dtrackr history:
#> number of flowchart steps: 5 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> ├ [NA]: "3 items"
#> ├ [feminine]: "feminine", "13 items"
#> └ [masculine]: "masculine", "49 items"

# This example includes pivoting and nesting. The CMS patient care data
# has multiple tests per institution in a long format, and observed /
# denominator types. Firstly we pivot the data to allow us to easily calculate
# a total percentage for each institution. This is duplicated for every test
# so we nest the tests to get to one row per institution. Those institutions
# with invalid scores are excluded.
cms_history = tidyr::cms_patient_care %>%
  track() %>%
  tidyr::pivot_wider(names_from = type, values_from = score) %>%
  dplyr::mutate(
    percentage = sum(observed) / sum(denominator) * 100,
    .by = c(ccn, facility_name)
  ) %>%
  tidyr::nest(
    results = c(measure_abbr, observed, denominator),
    .messages = c("{.count.in} test results", "{.count.out} facilities")
  ) %>%
  exclude_all(
    percentage > 100 ~ "{.excluded} facilities with anomalous percentages",
    na.rm = TRUE
  )

print(cms_history %>% dtrackr::history())
#> dtrackr history:
#> number of flowchart steps: 3 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> └ "126 test results", "14 facilities"

# not run in examples:
if (interactive()) {
  cms_history %>% flowchart()
}