Using roogledocs for documents

remove.packages("roogledocs")
devtools::install_local("~/Git/roogledocs",upgrade = FALSE)

Initialising the library

Prior to doing any analysis we may have some form of template. This might be a document skeleton or report template. It can contain empty tables, place-holder images, and double-brace tags, all of which can be replaced by calculated content from R. To do this we initialise the roogledocs library.


# There is a global flag to disable `roogledocs` in case you want to develop and
# test offline.
options('roogledocs.disabled'=FALSE)

# roogledocs stores an authentication token on your local hard drive.
options("roogledocs.tokenDirectory"="~/.roogledocs-test")

J = roogledocs::JavaApi$get(logLevel = "WARN")
x = J$RoogleDocs$new()
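If you want your own analysis scripts to skip document updates while working offline, the same option can be read back with base R. This is a minimal sketch: `roogledocs_enabled` is a hypothetical helper for user code, not a roogledocs function, and it makes no claim about how the package itself checks the flag internally.

```r
# Hypothetical helper for your own scripts (not part of roogledocs):
# returns TRUE unless the 'roogledocs.disabled' option is set to TRUE.
roogledocs_enabled = function() {
  !isTRUE(getOption("roogledocs.disabled", default = FALSE))
}

# Simulate working offline, then switch back to the setting used above.
options("roogledocs.disabled" = TRUE)
enabled_when_disabled = roogledocs_enabled()

options("roogledocs.disabled" = FALSE)
enabled_when_enabled = roogledocs_enabled()
```

A guard such as `if (roogledocs_enabled()) { ... }` around upload steps would then let the rest of a script run without touching Google Drive.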

Most of the time you will be creating or updating a single document. For this vignette's sake it is useful to be able to delete previous versions. The point of roogledocs, though, is to work with a continuously updated document, so deleting documents is usually not what you want to do. Likewise, for this vignette it is useful to get a copy of the Google doc as a PDF from R, although this may not be that useful in real life. The main function here is the findOrCloneTemplate() method, which lets you find a Google doc by name, or clone a template document if it cannot be found. There are also equivalent methods to find or create blank documents, or to find existing Google docs by name or sharing URL.

roogledocs::delete_document("roogledocs-demo",areYouSure = TRUE)

Sometimes (particularly if the roogledocs library has been updated) we get a TokenResponseException, saying the token has expired or been revoked. In this event the library can be explicitly re-authenticated through a call to roogledocs::reauth().

roogledocs::reauth()

Once authentication is working we get a new document based on a template I created and shared:

doc = roogledocs::doc_from_template(
  "roogledocs-demo",
  "https://docs.Google.com/document/d/1XnrBgBJFz7jEMYtw3o3YKbOuMdWvUkzIul4hb2B-SC4/edit?usp=sharing"
)
fs::dir_create(here::here("docs/articles/web-only"))
doc$saveAsPdf(here::here("docs/articles/web-only/example-template-doc.pdf"))

Running the chunk above should authenticate you and grab a publicly shared template I created, and make a copy of it in your Google drive under the name “roogledocs-demo”. The document template can be seen here, or as the original Google doc.

Tabular data

Inserting tables in this document is done by index. There is already a blank table 1 in the document. At the moment we support only huxtable tables and plain data-frames. The following chunk creates a sample huxtable from the diamonds data set, applies some formatting, and replaces the content of table 1 in the template with this data. The formatting is more or less preserved. There is only support for basic text formatting, borders (black solid only at present), background colour, and alignment. The table respects column widths as a relative measure, and the command takes an overall table width parameter. Layout will then depend on the content. Custom row heights are not supported.

library(dplyr)    # mutate(), group_by(), summarise() and the %>% pipe
library(ggplot2)  # provides the diamonds data set
hux = diamonds %>%
  mutate(colorCat = ifelse(color <= "G", "D-G", "G-J")) %>%
  group_by(cut, colorCat) %>%
  summarise(
    `Size (mean + sd)` = sprintf("%1.2f \u00B1 %1.2f", mean(carat), sd(carat)),
    `Cost (mean + sd)` = sprintf("%1.0f \u00B1 %1.0f", mean(price), sd(price))
  ) %>%
  huxtable::as_hux() %>%
  huxtable::theme_article() %>%
  huxtable::set_all_padding(value = 0) %>%
  huxtable::merge_repeated_rows()
## `summarise()` has grouped output by 'cut'. You can override using the `.groups`
## argument.

table_1 = hux %>% roogledocs::as.long_format_table() 
doc$updateTaggedTable(table_1, tableWidthInches = 4)

hux

cut	colorCat	Size (mean + sd)	Cost (mean + sd)
Fair	D-G	0.93 ± 0.43	3997 ± 3312
Fair	G-J	1.24 ± 0.58	4972 ± 3873
Good	D-G	0.78 ± 0.39	3620 ± 3380
Good	G-J	1.00 ± 0.54	4610 ± 4194
Very Good	D-G	0.72 ± 0.39	3587 ± 3666
Very Good	G-J	1.00 ± 0.54	4873 ± 4358
Premium	D-G	0.79 ± 0.44	4060 ± 4044
Premium	G-J	1.11 ± 0.59	5633 ± 4732
Ideal	D-G	0.63 ± 0.36	3151 ± 3562
Ideal	G-J	0.88 ± 0.53	4233 ± 4273
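The interaction between relative column widths and an overall table width can be illustrated with a little arithmetic. The sketch below is an assumption about the general idea only (relative widths scaled so that they sum to the overall width); `scale_col_widths` is a hypothetical helper, not a roogledocs function, and the library's actual layout may differ.

```r
# Hypothetical helper (not part of roogledocs): scale relative column widths
# so that they sum to an overall table width given in inches.
scale_col_widths = function(relative_widths, table_width_inches) {
  stopifnot(all(relative_widths > 0), table_width_inches > 0)
  relative_widths / sum(relative_widths) * table_width_inches
}

# Four columns, the last two twice as wide as the first two, in a 4 inch table.
widths = scale_col_widths(c(1, 1, 2, 2), table_width_inches = 4)
widths
```

Under this assumption only the ratios between column widths matter, so `c(1, 1, 2, 2)` and `c(10, 10, 20, 20)` would give the same layout.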

Updating figures

A similar process exists for figures. The figure must exist as a PNG image on the local computer, produced for example from a ggplot. Once a local PNG is available, it is temporarily uploaded to your Google Drive and added to the document, and the temporary Drive file is then deleted. In this example we update figure 1, replacing the {{figure_1}} tag in the original Google doc with the image:

g = ggplot(diamonds, aes(x=carat,y=price, colour=color))+geom_point()
figure_1 = roogledocs::ggplot_to_png(g, width=6, height=3)

# If the first parameter is passed as a variable and no tag is given, as in
# this example, the variable name is used as the tag and the image is
# inserted into the document at the location of the tag:

doc$updateTaggedImage(figure_1)
## Figure figure_1 updated

# This is equivalent to:
# doc$updateTaggedImage(figure_1, tagName = "figure_1")

g

Updating a second figure can happen in the same way, but in this case we use the alternative of specifying the index of insertion. The dimensions of the image in the Google doc should exactly match the dimensions of the PNG file saved from R. This means that if you change the size of an image in R it will be changed in the document, so image dimensions are important to decide on in R. If the figure or table had not already existed in the target Google doc (e.g. because you started with a blank document) it would simply have been uploaded and added at the end of the document as a sequentially numbered item. If you rearrange the order of figures in the Google doc it is up to you to fix the indexes in your code. Captions are not handled here at all, as it is assumed that the captions will be maintained in the Google doc and not in R (see the “Updating tagged text” section later).

g2 = ggplot(diamonds, aes(x=cut,y=price, fill=cut))+
  geom_violin(draw_quantiles = c(0.95,0.5,0.05))+
  scale_fill_brewer()+
  theme(axis.text.x = element_text(angle = 15, vjust=1,hjust=1))
filename = roogledocs::ggplot_to_png(g2, width=4, height=3)

# The figure index has to be calculated with respect to any changes that have 
# already been made in the document. In this case inserting figure 1 before
# means this is inserted in the right place, but it is up to the user to make 
# sure this is right.
filename %>% doc$updateFigure(figureIndex = 2)
## Figure 2 updated
g2

Along with updating a figure within a document as a PNG, it is useful to be able to keep a copy of the figure with the document in a publication-ready format such as PDF. We might also be generating supplementary material, data, or tables in a separate Word document. In my workflow all of these are generated by R scripts. Automatically uploading these documents to a folder in Google Drive makes managing the output of an analysis fairly straightforward. Because Google Drive can hold multiple files of the same name in the same directory, the behaviour when the file already exists must be specified.

doc$uploadSupplementaryFiles(absoluteFilePath = figure_1,overwrite = TRUE)

Updating tagged text

If you want to update small textual results - e.g. results in the abstract of a paper (similar to RMarkdown in-line chunks) you can place a double-brace tag into the Google doc and replace this with text generated in R. The result is inserted in the Google doc as a URL link so that further changes or updates in code can find the tagged text. Links like this can be moved around the document, or copied and pasted without losing the tag. You can get a list of the tags present in a document like this:

doc$tagsDefined()

tag	count
table_1_update_date	1
table_1	1
cite:challen2019	1
cite:challen2021	1
references	1
cite:r6gen	1
figure_1	1
cite:roogledocs	1
diamonds_mean_sd	2

Of the tags listed, two are plain text tags that we update here: table_1_update_date and diamonds_mean_sd. They can be set to specific content like this:

format(Sys.Date(),"%d/%m/%Y") %>% doc$updateTaggedText(tagName = "table_1_update_date")
## Text table_1_update_date updated
diamonds_mean_sd = sprintf("%1.1f \u00B1 %1.1f",mean(diamonds$price),sd(diamonds$price)) 

# if we don't give a specific tag name then the variable name is used:
doc$updateTaggedText(diamonds_mean_sd)
## Text diamonds_mean_sd updated

If the tags have become broken, you may need to revert all the tags in a document so you can easily see where the good tags are (or to identify whether some tags have been lost by copy and paste). This can be done with the doc$revertTags() function, which puts the double-brace tags back in the document and removes the auto-text. It is possible to use this kind of approach to automatically add in captions for figures or tables.
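The double-brace tag convention used in this section can be explored offline with base R. The sketch below counts tags in some example text and returns a tag and count table of the same shape as the listing above. It is an illustration only: `count_tags` and `template_text` are hypothetical, and this is not how doc$tagsDefined() is implemented.

```r
# Offline illustration only: count double-brace tags in a character vector.
# `count_tags` is a hypothetical helper, not a roogledocs function.
count_tags = function(text) {
  # Match "{{", then one or more non-brace characters, then "}}"
  matches = regmatches(text, gregexpr("\\{\\{[^{}]+\\}\\}", text, perl = TRUE))
  tags = unlist(matches)
  # Strip the surrounding braces to leave the tag name
  tags = substr(tags, 3, nchar(tags) - 2)
  counts = table(tags)
  data.frame(
    tag = names(counts),
    count = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Made-up template text re-using two tag names from the listing above
template_text = c(
  "Mean price was {{diamonds_mean_sd}}.",
  "Table 1 was last updated on {{table_1_update_date}}.",
  "As stated in the abstract, the price was {{diamonds_mean_sd}}."
)
tag_counts = count_tags(template_text)
tag_counts
```

A tag that appears more than once, like diamonds_mean_sd here, gets a count above one, which matches the idea that a single update fills every occurrence of the tag.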

New content

Appending new content is also possible, either as a simple styled text string with consistent formatting, or as a continuous block (or blocks) with different styles, as specified in a data-frame. At the moment this is only possible at the end of the document and is really designed for the case where a document is being generated completely from scratch. A minimal subset of text formatting is supported, although by using named styles you can do more. It would not be that hard to support more styles, but at the moment this is not the primary use case for roogledocs.


doc$appendText("\nAdding new content\n","HEADING_1")

content = tibble::tribble(
  ~label, ~link, ~fontName, ~fontFace,
  "Roogledocs", "https://terminological.github.io/roogledocs/r-library/docs/", "Courier New", "plain",
  " is also able to add text at the end of the document with complex formatting. ", NA, NA, "plain",
  "Supporting fonts and font face formatting such as ",  NA, NA, "plain",
  "bold, ", NA, NA, "bold",
  "italic ", NA, NA, "italic",
  "and underlined", NA, NA, "underlined",
  " amongst other things.\n\n", NA, NA, "plain"
  )

doc$appendFormattedParagraph(content)

content

label	link	fontName	fontFace
Roogledocs	https://terminological.github.io/roogledocs/r-library/docs/	Courier New	plain
is also able to add text at the end of the document with complex formatting.			plain
Supporting fonts and font face formatting such as			plain
bold,			bold
italic			italic
and underlined			underlined
amongst other things.			plain

It should be possible to combine writing new content and updating tagged text in the same script to programmatically generate replacement content. Likewise this could be used for captions of tables and figures when they are added. When you write new content you can include double-brace tags, and these can then be updated at a later stage.

Citations are referenced as a tag of the format {{cite:XXXX;YYYY}} where XXXX and YYYY are BibTeX ids. References will be updated if present or if not


doc$updateCitations(here::here("vignettes/web-only/test.bib"), citationStyle = "journal-of-infection")
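To make the citation tag format concrete, the following base R sketch pulls the BibTeX ids out of {{cite:...}} tags in a piece of text. It is an offline illustration under stated assumptions: `citation_ids` is a hypothetical helper, the example sentence is made up (re-using ids from the tag listing earlier), and nothing here reflects how doc$updateCitations() works internally.

```r
# Hypothetical helper, for illustration only: extract BibTeX ids from
# citation tags of the form {{cite:id1;id2}}.
citation_ids = function(text) {
  pattern = "\\{\\{cite:[^{}]+\\}\\}"
  tags = unlist(regmatches(text, gregexpr(pattern, text, perl = TRUE)))
  # Drop the "{{cite:" prefix and the "}}" suffix
  ids = sub("^\\{\\{cite:", "", tags, perl = TRUE)
  ids = sub("\\}\\}$", "", ids, perl = TRUE)
  # Each tag may hold several ids separated by semicolons
  strsplit(ids, ";", fixed = TRUE)
}

example_text = paste(
  "Prior work {{cite:challen2019;challen2021}}",
  "and the software {{cite:roogledocs}}."
)
ids = citation_ids(example_text)

# All ids cited in the text, each of which would need an entry in the .bib file
unique(unlist(ids))
```

Checking the extracted ids against the keys in your .bib file before running an update is one way to catch misspelled citation tags early.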

Finally we can write out the new document to a PDF, mostly so we can see what we have done. When we write out a document any roogledocs links are removed:

doc$saveAsPdf(here::here("docs/articles/web-only/example-after-update.pdf"))

After the analysis has run we have a new version of the Google document which should look like this.

Limitations and further options

There are a lot of possible ways to extend roogledocs. The current implementation is evolving to meet my own needs.

For example, additional formatting options such as text colour are not implemented but would be relatively straightforward.
Currently there is no support for lists, which should be fairly simple to add, but I have not needed them yet.
Absolutely positioned images are ignored completely. This is probably a good thing, as it lets you have a logo within the document, for example, without messing up the dynamic images from R.
Building a Google Docs based drop-in replacement for the officer library for MS Word could be good if it means we can leverage that interface for other uses (e.g. knitr support).

Rob Challen