Synchronous vs. asynchronous usage

By default, when you call methods from a Chromote or ChromoteSession object, it operates in synchronous mode. For example, when you call a command function (like b$Page$navigate()), a command message is sent to the headless browser, the headless browser executes that command, and it sends a response message back. When the R process receives the response, it converts it from JSON to an R object and the function returns that value. During this time, the R process is blocked; no other R code can execute.

The methods in Chromote/ChromoteSession objects can also be called in asynchronous mode. In async mode, a command function fires off a message to the browser, and then the R process continues running other code; when the response comes back at some time in the future, the R process calls another function and passes the response value to it.

There are two different ways of using async with Chromote. The first is with promises (note that these are not the regular R-language promises; these are similar to JavaScript promises for async programming.) The second way is with callbacks: you call methods with a callback_ argument. Although callbacks are initially easier to use than promises, once you start writing more complex code, managing callbacks becomes very difficult, especially when error handling is involved. For this reason, this document will focus mostly on promises instead of callback-style programming.

When Chromote methods are called in synchronous mode, under the hood, they are implemented with asynchronous functions, and then waiting for the asynchronous functions to resolve.

Technical note: About the event loop

When methods are called asynchronously, the R process will run callbacks and promises using an event loop provided by the later package. This event loop is very similar to the one used in JavaScript, which is explained in depth by Philip Roberts in this video. One important difference between JavaScript’s event loop and the one provided by later’s is that in JavaScript, the event loop only runs when the call stack is empty (essentially, when the JS runtime is idle); with later the event loop similarly runs when the call stack is empty (when the R console is idle), but it can also be run at any point by calling later::run_now().

There is another important difference between the JS event loop and the one used by Chromote: Chromote uses private event loops provided by later. Running the private event loop with run_now() will not interfere with the global event loop. This is crucial for being able to run asynchronous code in a way that appears synchronous.

Why use async?

The synchronous API is easier to use than the asynchronous one. So why would you want to use the async API? Here are some reasons:

  • The async API allows you to send commands to the browser that may take some time for the browser to complete, and they will not block the R process from doing other work while the browser executes the command.
  • The async API lets you send commands to multiple browser “tabs” and let them work in parallel.

On the other hand, async programming can make it difficult to write code that proceeds in a straightforward, linear manner. Async programming may be difficult to use in, say, an analysis script.

When using Chromote interactively at the R console, it’s usually best to just call methods synchronously. This fits well with a iterative, interactive data analysis workflow.

When you are programming with Chromote instead of using it interactively, it is in many cases better to call the methods asynchronously, because it allows for better performance. In a later section, we’ll see how to write asynchronous code with Chromote that can be run either synchronously or asynchronously. This provides the best of both worlds.

To see this in action, we’ll first start a new chromote session:

library(chromote)
b <- ChromoteSession$new()

Async commands

When a method is called in synchronous mode, it blocks until the browser sends back a response, and then it returns the value, converted from JSON to an R object. For example:

# Synchronous
str(b$Browser$getVersion())
#> List of 5
#>  $ protocolVersion: chr "1.3"
#>  $ product        : chr "HeadlessChrome/98.0.4758.102"
#>  $ revision       : chr "@273bf7ac8c909cde36982d27f66f3c70846a3718"
#>  $ userAgent      : chr "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/98.0.4758.102 Safari/537.36"
#>  $ jsVersion      : chr "9.8.177.11"

In async mode, there are two ways to use the value that the browser sends to the R process. One is to use the callback_ argument with wait_=FALSE. The wait_=FALSE tells it to run the command in async mode; instead of returning the value from the browser, it returns a promise. For example:

# Async with callback
b$Browser$getVersion(wait_ = FALSE, callback_ = str)
#> <Promise [pending]>
#> List of 5
#>  $ protocolVersion: chr "1.3"
#>  $ product        : chr "HeadlessChrome/98.0.4758.102"
#>  $ revision       : chr "@273bf7ac8c909cde36982d27f66f3c70846a3718"
#>  $ userAgent      : chr "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/98.0.4758.102 Safari/537.36"
#>  $ jsVersion      : chr "9.8.177.11"

Notice that the function returned <Promise [pending]>, and then it printed out the data. We’ll come back to the promise part.

Technical note

When you pass a function as callback_, that function is used as the first step in the promise chain that is returned.

If you run the command in a code block (or a function), the entire code block will finish executing before the callback can be executed. For example:

{
  b$Browser$getVersion(wait_ = FALSE, callback_ = str)
  1+1
}
#> [1] 2
#> List of 5
#>  $ protocolVersion: chr "1.3"
#>  $ product        : chr "HeadlessChrome/98.0.4758.102"
#>  $ revision       : chr "@273bf7ac8c909cde36982d27f66f3c70846a3718"
#>  $ userAgent      : chr "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/98.0.4758.102 Safari/537.36"
#>  $ jsVersion      : chr "9.8.177.11"

In the code above, it executes the 1+1 and returns the value before the str callback can be executed on the message from the browser.

If you want to store the value from the browser, you can write a callback that stores the value like so:

# This will extract the product field
product <- NULL
b$Browser$getVersion(wait_ = FALSE, callback_ = function(msg) {
  product <<- msg$product
})
#> <Promise [pending]>
# Wait for a moment, then run:
product
#> [1] "HeadlessChrome/98.0.4758.102"

But to get the value, you need to wait for the callback to execute before you can use the value. Waiting for a value is simple when running R interactively – you can just add a message("message arrived") call in the callback and wait for it before running the next line of code – but waiting for the value is not easy to do using ordinary straight-line coding. Fortunately, Chromote has a way to wait for async operations, which we’ll see later.

The other way of using the value is to use promises. If wait_=FALSE and no callback_ is passed to the command, then it will simply return a promise. Promises have many advantages over plain old callbacks: they are easier to chain, and they provide better error-handling capabilities. You can chain more steps to the promise: when the promise resolves – that is, when the message is received from the browser – it will run the next step in the promise chain.

Here’s an example that uses promises to print out the version information. Note that the surrounding curly braces are there to indicate that this whole thing must be run as a block without any idle time in between the function calls – if you were to run the code in the R console line-by-line, the browser would send back the message and the promise would resolve before you called p$then(), which is where you tell the promise what to do with the return value. (The curly braces aren’t strictly necessary – you could run the code inside the braces in a single paste operation and have the same effect.)

{
  p <- b$Browser$getVersion(wait_ = FALSE)
  p$then(function(value) {
    print(value$product)
  })
}
# Wait for a moment, then prints:
#> [1] "HeadlessChrome/98.0.4758.102"

Here are some progressively more concise ways of achieving the same thing. As you work with promises, you will see these various forms of promise chaining. For more information, see the promises documentation.

library(promises)

# Chained method call to $then()  
b$Browser$getVersion(wait_ = FALSE)$then(function(value) {  
  print(value$product)  
})  

# Regular function pipe to promises::then() function 
b$Browser$getVersion(wait_ = FALSE) |> then(function(value) {  
  print(value$product)  
})  


# Promise-pipe to anonymous function, which must be wrapped in parens
b$Browser$getVersion(wait_ = FALSE) %...>% (function(value) {
  print(value$product)
})

# Promise-pipe to an expression (which gets converted to a function with the first argument `.`)
b$Browser$getVersion(wait_ = FALSE) %...>% { print(.$product) }

# Promise-pipe to a named function, with parentheses
print_product <- function(msg) print(msg$product)
b$Browser$getVersion(wait_ = FALSE) %...>% print_product()

# Promise-pipe to a named function, without parentheses
b$Browser$getVersion(wait_ = FALSE) %...>% print_product

The earlier example where we found the dimensions of a DOM element using CSS selectors was done with the synchronous API and %>% pipes. The same can be done in async mode by switching from the regular pipe to the promise-pipe, and calling all the methods with wait_=FALSE:

b$DOM$getDocument(wait_ = FALSE) %...>%
  { b$DOM$querySelector(.$root$nodeId, ".sidebar", wait_ = FALSE) } %...>%
  { b$DOM$getBoxModel(.$nodeId, wait_ = FALSE) } %...>%
  str()


# Or, more verbosely:
b$DOM$getDocument(wait_ = FALSE)$
  then(function(value) {
    b$DOM$querySelector(value$root$nodeId, ".sidebar", wait_ = FALSE)
  })$
  then(function(value) {
    b$DOM$getBoxModel(value$nodeId, wait_ = FALSE)
  })$
  then(function(value) {
    str(value)
  })

Each step in the promise chain uses the value from the previous step, via . or value. Note that not all asynchronous code works in such a linear, straightforward way. Sometimes it is necessary to save data from intermediate steps in a broader-scoped variable, if it is to be used in a later step in the promise chain.

Turning asynchronous code into synchronous code

There may be times, especially when programming with Chromote, where you want to wait for a promise to resolve before continuing. To do this, you can use the Chromote or ChromoteSession’s wait_for() method.

# A promise chain
p <- b$DOM$getDocument(wait_ = FALSE) %...>%
  { b$DOM$querySelector(.$root$nodeId, ".sidebar", wait_ = FALSE) } %...>%
  { b$DOM$getBoxModel(.$nodeId, wait_ = FALSE) } %...>%
  str()

b$wait_for(p)
#> List of 1
#>  $ model:List of 6
#>   ..$ content:List of 8
#>   .. ..$ : num 128
#>   .. ..$ : int 28
#>   .. ..$ : num 292
#>   .. ..$ : int 28
#>   .. ..$ : num 292
#>   .. ..$ : num 988
#>   .. ..$ : num 128
#>   .. ..$ : num 988
#>   ..$ padding:List of 8
#>   .. ..$ : num 112
#>   .. ..$ : int 28
#>   .. ..$ : num 308
#>   .. ..$ : int 28
#>   .. ..$ : num 308
#>   .. ..$ : num 988
#>   .. ..$ : num 112
#>   .. ..$ : num 988
#>   ..$ border :List of 8
#>   .. ..$ : num 112
#>   .. ..$ : int 28
#>   .. ..$ : num 308
#>   .. ..$ : int 28
#>   .. ..$ : num 308
#>   .. ..$ : num 988
#>   .. ..$ : num 112
#>   .. ..$ : num 988
#>   ..$ margin :List of 8
#>   .. ..$ : int 15
#>   .. ..$ : int 28
#>   .. ..$ : num 308
#>   .. ..$ : int 28
#>   .. ..$ : num 308
#>   .. ..$ : num 1030
#>   .. ..$ : int 15
#>   .. ..$ : num 1030
#>   ..$ width  : int 195
#>   ..$ height : int 960

This documentation will refer to this technique as synchronizing asynchronous code. The way that wait_for() works is that it runs the Chromote object’s private event loop until the promise has resolved. Because the event loop is private, running it will not interfere with the global event loop, which, for example, may used by Shiny to serve a web application.

The $wait_for() method will return the value from the promise, so instead of putting the str() in the chain, you call str() on the value returned by $wait_for():

p <- b$DOM$getDocument(wait_ = FALSE) %...>%
  { b$DOM$querySelector(.$root$nodeId, ".sidebar", wait_ = FALSE) } %...>%
  { b$DOM$getBoxModel(.$nodeId, wait_ = FALSE) }

x <- b$wait_for(p)
str(x)
#> List of 1
#>  $ model:List of 6
#>   ..$ content:List of 8
#>   .. ..$ : num 128
#>   .. ..$ : int 28
#>   .. ..$ : num 292
#>   .. ..$ : int 28
#>   .. ..$ : num 292
#>   .. ..$ : num 988
#>   .. ..$ : num 128
#>   .. ..$ : num 988
#>   ..$ padding:List of 8
#>   .. ..$ : num 112
#>   .. ..$ : int 28
#>   .. ..$ : num 308
#>   .. ..$ : int 28
#>   .. ..$ : num 308
#>   .. ..$ : num 988
#>   .. ..$ : num 112
#>   .. ..$ : num 988
#>   ..$ border :List of 8
#>   .. ..$ : num 112
#>   .. ..$ : int 28
#>   .. ..$ : num 308
#>   .. ..$ : int 28
#>   .. ..$ : num 308
#>   .. ..$ : num 988
#>   .. ..$ : num 112
#>   .. ..$ : num 988
#>   ..$ margin :List of 8
#>   .. ..$ : int 15
#>   .. ..$ : int 28
#>   .. ..$ : num 308
#>   .. ..$ : int 28
#>   .. ..$ : num 308
#>   .. ..$ : num 1030
#>   .. ..$ : int 15
#>   .. ..$ : num 1030
#>   ..$ width  : int 195
#>   ..$ height : int 960

There are some methods in Chromote and ChromoteSession objects which are written using asynchronous method calls, but conditionally use wait_for() so that they can be called either synchronously or asynchronously. The $screenshot() method works this way, for example. You can call b$screenshot(wait_=TRUE) (which is the default) for synchronous behavior, or b$screenshot(wait_=FALSE) for async behavior.

If you want to write a function that can be called in either sync or async mode, you can use this basic structure: First, construct a promise chain by calling the CDP methods with wait_=FALSE. Then, at the end, if the user used wait_=TRUE, wait for the promise to resolve; otherwise, simply return the promise.

getBoxModel <- function(b, selector = "html", wait_ = TRUE) {
  p <- b$DOM$getDocument(wait_ = FALSE) %...>%
    { b$DOM$querySelector(.$root$nodeId, selector, wait_ = FALSE) } %...>%
    { b$DOM$getBoxModel(.$nodeId, wait_ = FALSE) }

  if (wait_) {
    b$wait_for(p)
  } else {
    p
  }
}

# Synchronous call
str(getBoxModel(b, ".sidebar"))

# Asynchronous call
getBoxModel(b, ".sidebar", wait_ = FALSE) %...>%
  str()

But, you might be wondering, if we want a synchronous API, why not simply write the synchronous code by calling the individual methods synchronously, and using a normal pipe to connect them, as in:

b$DOM$getDocument() %>%
  { b$DOM$querySelector(.$root$nodeId, ".sidebar") } %>%
  { b$DOM$getBoxModel(.$nodeId) } %>%
  str()

There are two reasons for this. The first is that this would require a duplication of all the code for the sync and async code paths. Another reason is that the internal async code can be written to send multiple independent command chains to the ChromoteSession (or multiple ChromoteSessions), and they will be executed concurrently. If there are multiple promise chains, you can do something like the following to wait for all of them to resolve:

# Starting with promises p1, p2, and p3, create a promise that resolves only
# after they have all been resolved.
p <- promise_all(p1, p2, p3)
b$wait_for(p)

Async events

In addition to commands The Chrome DevTools Protocol also has events. These are messages that are sent from the browser to the R process when various browser events happen.

As an example, it can be a bit tricky to find out when to take a screenshot. When you send the browser a command to navigate to a page, it sends a response immediately, but it may take several more seconds for it to actually finish loading that page. When it does, the Page.loadEventFired event will be fired.

b <- ChromoteSession$new()

# Navigate and wait for Page.loadEventFired.
# Note: these lines are put in a single code block to ensure that there is no
# idle time in between.
{
  b$Page$navigate("https://www.r-project.org/")
  str(b$Page$loadEventFired())
}
#> List of 1
#>  $ timestamp: num 683

With the synchronous API, the call to b$Page$loadEventFired() will block until Chromote receives a Page.loadEventFired message from the browser. However, with the async promise API, you would write it like this:

b$Page$navigate("https://www.r-project.org/", wait_ = FALSE) %...>%
  { b$Page$loadEventFired(wait_ = FALSE) } %...>%
  { str(.) }

# Or, more verbosely:
b$Page$navigate("https://www.r-project.org/", wait_ = FALSE)$
  then(function(value) {
    b$Page$loadEventFired(wait_ = FALSE)
  })$
  then(function(value) {
    str(value)
  })

There will be a short delay after running the code before the value is printed.

However, even this is not perfectly reliable, because in some cases, the browser will navigate to the page before it receives the loadEventFired command from Chromote. If that happens, the load even will have already happened before the browser starts waiting for it, and it will hang. The correct way to deal with this is to issue the loadEventFired command before navigating to the page, and then wait for the loadEventFired promise to resolve.

# This is the correct way to wait for a page to load with async and then chain more commands
p <- b$Page$loadEventFired(wait_ = FALSE)
b$Page$navigate("https://www.r-project.org/", wait_ = FALSE)

# A promise chain of more commands after the page has loaded
p$then(function(value) {
  str(value)
})

If you want to block until the page has loaded, you can once again use wait_for(). For example:

p <- b$Page$loadEventFired(wait_ = FALSE)
b$Page$navigate("https://www.r-project.org/", wait_ = FALSE)

# wait_for returns the last value in the chain, so we can call str() on it
str(b$wait_for(p))
#> List of 1
#>  $ timestamp: num 683

Technical note

The Chrome DevTools Protocol itself does not automatically enable event notifications. Normally, you would have to call the Page.enable method to turn on event notifications for the Page domain. However, Chromote saves you from needing to do this step by keeping track of how many callbacks there are for each domain. When the number of event callbacks for a domain goes from 0 to 1, Chromote automatically calls $enable() for that domain, and when it goes from 1 to 0, it it calls $disable(). See vignette("commands-and-events") for more details.

In addition to async events with promises, they can also be used with regular callbacks. For example:

b$Page$loadEventFired(callback_ = str)

This tells Chromote to call str() (which prints to the console) on the message value every single time that a Page.loadEventFired event message is received. It will continue doing this indefinitely. (Calling an event method this way also increments the event callback counter.)

When an event method is called with a callback, the return value is a function which will cancel the callback, so that it will no longer fire. (The canceller function also decrements the event callback counter. If you lose the canceller function, there is no way to decrement the callback counter back to 0.)

cancel_load_event_callback <- b$Page$loadEventFired(callback_ = str)

# Each of these will cause the callback to fire.
n1 <- b$Page$navigate("https://www.r-project.org/")
n2 <- b$Page$navigate("https://cran.r-project.org/")

cancel_load_event_callback()

# No longer causes the callback to fire.
n3 <- b$Page$navigate("https://www.rstudio.com/")