A drop-in replacement for tidyr::pivot_wider() which optionally takes a
message and headline to store in the history graph.
Usage
# S3 method for class 'trackr_df'
pivot_wider(
data,
...,
id_cols = NULL,
id_expand = FALSE,
names_from = name,
names_prefix = "",
names_sep = "_",
names_glue = NULL,
names_sort = FALSE,
names_vary = "fastest",
names_expand = FALSE,
names_repair = "check_unique",
values_from = value,
values_fill = NULL,
values_fn = NULL,
unused_fn = NULL,
.messages = c("wide format", "{.count.in} before", "{.count.out} after"),
.headline = .defaultHeadline()
)
Arguments
- data
A data frame to pivot.
- ...
Additional arguments passed on to methods.
- id_cols
<tidy-select> A set of columns that uniquely identify each observation. Typically used when you have redundant variables, i.e. variables whose values are perfectly correlated with existing variables.
Defaults to all columns in data except for the columns specified through names_from and values_from. If a tidyselect expression is supplied, it will be evaluated on data after removing the columns specified through names_from and values_from.
- id_expand
Should the values in the id_cols columns be expanded by expand() before pivoting? This results in more rows; the output will contain a complete expansion of all possible values in id_cols. Implicit factor levels that aren't represented in the data will become explicit. Additionally, the row values corresponding to the expanded id_cols will be sorted.
- names_from, values_from
<tidy-select> A pair of arguments describing which column (or columns) to get the name of the output column (names_from), and which column (or columns) to get the cell values from (values_from).
If values_from contains multiple columns, the name of each value column will be added to the front of the output column name.
- names_prefix
String added to the start of every variable name. This is particularly useful if names_from is a numeric vector and you want to create syntactic variable names.
- names_sep
If names_from or values_from contains multiple variables, this will be used to join their values together into a single string to use as a column name.
- names_glue
Instead of names_sep and names_prefix, you can supply a glue specification that uses the names_from columns (and the special .value) to create custom column names.
- names_sort
Should the column names be sorted? If FALSE, the default, column names are ordered by first appearance.
- names_vary
When names_from identifies a column (or columns) with multiple unique values, and multiple values_from columns are provided, in what order should the resulting column names be combined?
"fastest" varies names_from values fastest, resulting in a column naming scheme of the form: value1_name1, value1_name2, value2_name1, value2_name2. This is the default.
"slowest" varies names_from values slowest, resulting in a column naming scheme of the form: value1_name1, value2_name1, value1_name2, value2_name2.
- names_expand
Should the values in the names_from columns be expanded by expand() before pivoting? This results in more columns; the output will contain column names corresponding to a complete expansion of all possible values in names_from. Implicit factor levels that aren't represented in the data will become explicit. Additionally, the column names will be sorted, identical to what names_sort would produce.
- names_repair
What happens if the output has invalid column names? The default, "check_unique", is to error if the columns are duplicated. Use "minimal" to allow duplicates in the output, or "unique" to de-duplicate by adding numeric suffixes. See vctrs::vec_as_names() for more options.
- values_fill
Optionally, a (scalar) value that specifies what each value should be filled in with when missing.
This can be a named list if you want to apply different fill values to different value columns.
- values_fn
Optionally, a function applied to the value in each cell in the output. You will typically use this when the combination of id_cols and names_from columns does not uniquely identify an observation.
This can be a named list if you want to apply different aggregations to different values_from columns.
- unused_fn
Optionally, a function applied to summarize the values from the unused columns (i.e. columns not identified by id_cols, names_from, or values_from).
The default drops all unused columns from the result.
This can be a named list if you want to apply different aggregations to different unused columns.
id_cols must be supplied for unused_fn to be useful, since otherwise all unspecified columns will be considered id_cols.
This is similar to grouping by the id_cols then summarizing the unused columns using unused_fn.
- .messages
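A set of glue specs. The glue code can use any global variable, grouping variable, {.strata}, or the {.count.in} and {.count.out} placeholders used in the default. Defaults to c("wide format", "{.count.in} before", "{.count.out} after"), as shown in Usage.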
- .headline
A headline glue spec. The glue code can use any global variable, grouping variable, or {.strata}. Defaults to .defaultHeadline(), as shown in Usage.
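The naming arguments described above (names_vary and names_glue) are easiest to compare side by side. The following is a minimal sketch using plain tidyr on a small, made-up data frame (the `long` table and its column names are hypothetical and exist only for illustration). Because the tracked method is described as a drop-in replacement, the column naming should be the same on a tracked data frame, although that is not demonstrated here. It assumes tidyr 1.2.0 or later, where names_vary is available.

```r
library(tidyr)

# Hypothetical toy data: two value columns measured per site and year
long <- tibble::tibble(
  site  = c("a", "a", "b", "b"),
  year  = c(2020, 2021, 2020, 2021),
  cases = c(10, 12, 7, 9),
  tests = c(100, 110, 80, 95)
)

# names_vary = "fastest" (the default): the year varies fastest,
# so all years for `cases` come before all years for `tests`
fastest <- pivot_wider(long, names_from = year, values_from = c(cases, tests))
names(fastest)
#> "site" "cases_2020" "cases_2021" "tests_2020" "tests_2021"

# names_vary = "slowest": the year varies slowest,
# so the columns for each year are kept next to each other
slowest <- pivot_wider(
  long,
  names_from = year,
  values_from = c(cases, tests),
  names_vary = "slowest"
)
names(slowest)
#> "site" "cases_2020" "tests_2020" "cases_2021" "tests_2021"

# names_glue: build custom names from the names_from column and `.value`
glued <- pivot_wider(
  long,
  names_from = year,
  values_from = c(cases, tests),
  names_glue = "y{year}_{.value}"
)
names(glued)
#> "site" "y2020_cases" "y2021_cases" "y2020_tests" "y2021_tests"
```

With several values_from columns, "slowest" tends to be the more readable layout when each names_from value represents a group of related measurements.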
Value
The data frame resulting from tidyr::pivot_wider(), with its history graph
updated.
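As a rough illustration of the return value, the sketch below pivots a small tracked data frame and overrides the default .messages. The `long` table is made up for this example, and the sketch assumes that .messages is forwarded through the tidyr::pivot_wider() generic to the trackr_df method when dtrackr is loaded; if that assumption does not hold in your installed versions, omitting .messages falls back to the defaults shown in Usage.

```r
library(dplyr)
library(dtrackr)

# Hypothetical long-format data: one row per id and measurement name
long <- tibble::tibble(
  id    = c(1, 1, 2, 2),
  name  = c("height", "weight", "height", "weight"),
  value = c(1.7, 65, 1.8, 80)
)

wide <- long %>%
  track() %>%
  tidyr::pivot_wider(
    names_from = name,
    values_from = value,
    # Assumption: these glue specs reach the tracked method unchanged
    .messages = c(
      "pivoted to one row per id",
      "{.count.in} rows in",
      "{.count.out} rows out"
    )
  )

# The pivoted result keeps its tracking, so the history can still be inspected
wide %>% history()
```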
Examples
library(dplyr)
library(dtrackr)
starwars %>%
track() %>%
tidyr::unnest(starships, keep_empty = TRUE) %>%
tidyr::nest(world_data = c(-homeworld)) %>%
history()
#> dtrackr history:
#> number of flowchart steps: 2 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> └ "13 items"
# There is a problem with `tidyr::unnest` that means if you want to override the
# `.messages` option at the moment it will most likely fail. Forcing the use of
# the specific `dtrackr::p_unnest` version solves this problem, until hopefully it is
# resolved in `tidyr`:
starwars %>%
track() %>%
p_unnest(
films,
.messages = c("{.count.in} characters", "{.count.out} appearances")
) %>%
dplyr::group_by(gender) %>%
tidyr::nest(
people = c(-gender, -species, -homeworld),
.messages = c("{.count.in} appearances", "{.count.out} planets")
) %>%
status() %>%
history()
#> dtrackr history:
#> number of flowchart steps: 5 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> ├ [NA]: "3 items"
#> ├ [feminine]: "feminine", "13 items"
#> └ [masculine]: "masculine", "49 items"
# This example includes pivoting and nesting. The CMS patient care data
# has multiple tests per institution in a long format, and observed /
# denominator types. Firstly we pivot the data to allow us to easily calculate
# a total percentage for each institution. This is duplicated for every test
# so we nest the tests to get to one row per institution. Those institutions
# with invalid scores are excluded.
cms_history = tidyr::cms_patient_care %>%
track() %>%
tidyr::pivot_wider(names_from = type, values_from = score) %>%
dplyr::mutate(
percentage = sum(observed) / sum(denominator) * 100,
.by = c(ccn, facility_name)
) %>%
tidyr::nest(
results = c(measure_abbr, observed, denominator),
.messages = c("{.count.in} test results", "{.count.out} facilities")
) %>%
exclude_all(
percentage > 100 ~ "{.excluded} facilities with anomalous percentages",
na.rm = TRUE
)
print(cms_history %>% dtrackr::history())
#> dtrackr history:
#> number of flowchart steps: 3 (approx)
#> tags defined: <none>
#> items excluded so far: <not capturing exclusions>
#> last entry / entries:
#> └ "126 test results", "14 facilities"
# not run in examples:
if (interactive()) {
cms_history %>% flowchart()
}
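The values_fn and values_fill arguments matter when the id and names_from columns do not uniquely identify a row, or when some combinations are missing. This is a minimal sketch with plain tidyr on invented data (the `readings` table is hypothetical); the tracked method is expected to produce the same values, since it is described as a drop-in replacement.

```r
library(tidyr)

# Hypothetical data: site "a" has two readings for week w1,
# and site "b" has no reading at all for week w2
readings <- tibble::tibble(
  site  = c("a", "a", "a", "b"),
  week  = c("w1", "w1", "w2", "w1"),
  value = c(1, 3, 5, 2)
)

wide <- pivot_wider(
  readings,
  names_from = week,
  values_from = value,
  values_fn = sum,   # collapse duplicate site/week cells by summing them
  values_fill = 0    # fill site/week combinations that never occurred
)

wide
# site "a": w1 = 4 (1 + 3), w2 = 5
# site "b": w1 = 2,         w2 = 0 (filled)
```

Without values_fn, the duplicated site "a" / week "w1" cell would be returned as a list-column with a warning, so supplying an explicit aggregation makes the intent clear.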
