By default, when you call methods from a Chromote or ChromoteSession object, they operate in synchronous mode. For example, when you call a command function (like b$Page$navigate()), a command message is sent to the headless browser, the headless browser executes that command, and it sends a response message back. When the R process receives the response, it converts it from JSON to an R object and the function returns that value. During this time, the R process is blocked; no other R code can execute.
The methods in Chromote/ChromoteSession objects can also be called in asynchronous mode. In async mode, a command function fires off a message to the browser, and then the R process continues running other code; when the response comes back at some time in the future, the R process calls another function and passes the response value to it.
There are two different ways of using async with Chromote. The first is with promises. (Note that these are not the regular R-language promises; they are similar to JavaScript promises for async programming.) The second way is with callbacks: you call methods with a callback_ argument.
Although callbacks are initially easier to use than promises, once you
start writing more complex code, managing callbacks becomes very
difficult, especially when error handling is involved. For this reason,
this document will focus mostly on promises instead of callback-style
programming.
When Chromote methods are called in synchronous mode, under the hood they are implemented with asynchronous functions; the synchronous call simply waits for those asynchronous functions to resolve.
Technical note: About the event loop
When methods are called asynchronously, the R process will run callbacks and promises using an event loop provided by the later package. This event loop is very similar to the one used in JavaScript, which is explained in depth by Philip Roberts in this video. One important difference between JavaScript’s event loop and the one provided by later is that in JavaScript, the event loop only runs when the call stack is empty (essentially, when the JS runtime is idle); with later, the event loop similarly runs when the call stack is empty (when the R console is idle), but it can also be run at any point by calling later::run_now().
There is another important difference between the JS event loop and the one used by Chromote: Chromote uses private event loops provided by later. Running the private event loop with run_now() will not interfere with the global event loop. This is crucial for being able to run asynchronous code in a way that appears synchronous.
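The isolation between a private loop and the global loop can be sketched with the later package alone. This is a minimal illustration, not Chromote's own code: the variable names are hypothetical, and parent = NULL is chosen here as an assumption so that the private loop only ever runs when run_now() is called on it explicitly (Chromote's own loop configuration may differ).

```r
library(later)

# Assumption for illustration: a private loop with no parent, so it is
# never run implicitly by the global loop.
private_loop <- create_loop(parent = NULL)

fired <- FALSE
later(function() fired <<- TRUE, delay = 0, loop = private_loop)

# Running the global loop does not touch the private loop's callbacks.
run_now(loop = global_loop())
fired_after_global <- fired

# Running the private loop executes the callback scheduled on it.
run_now(loop = private_loop)
fired_after_private <- fired
```

Here fired_after_global records the state after the global loop has run, and fired_after_private records the state after the private loop has run.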
The synchronous API is easier to use than the asynchronous one. So why would you want to use the async API? Here are some reasons:
On the other hand, async programming can make it difficult to write code that proceeds in a straightforward, linear manner. Async programming may be difficult to use in, say, an analysis script.
When using Chromote interactively at the R console, it’s usually best to just call methods synchronously. This fits well with an iterative, interactive data analysis workflow.
When you are programming with Chromote instead of using it interactively, it is in many cases better to call the methods asynchronously, because it allows for better performance. In a later section, we’ll see how to write asynchronous code with Chromote that can be run either synchronously or asynchronously. This provides the best of both worlds.
To see this in action, we’ll first start a new chromote session:
When a method is called in synchronous mode, it blocks until the browser sends back a response, and then it returns the value, converted from JSON to an R object. For example:
# Synchronous
str(b$Browser$getVersion())
#> List of 5
#> $ protocolVersion: chr "1.3"
#> $ product : chr "HeadlessChrome/98.0.4758.102"
#> $ revision : chr "@273bf7ac8c909cde36982d27f66f3c70846a3718"
#> $ userAgent : chr "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/98.0.4758.102 Safari/537.36"
#> $ jsVersion : chr "9.8.177.11"
In async mode, there are two ways to use the value that the browser sends to the R process. One is to use the callback_ argument with wait_=FALSE. The wait_=FALSE argument tells the method to run the command in async mode; instead of returning the value from the browser, it returns a promise. For example:
# Async with callback
b$Browser$getVersion(wait_ = FALSE, callback_ = str)
#> <Promise [pending]>
#> List of 5
#> $ protocolVersion: chr "1.3"
#> $ product : chr "HeadlessChrome/98.0.4758.102"
#> $ revision : chr "@273bf7ac8c909cde36982d27f66f3c70846a3718"
#> $ userAgent : chr "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/98.0.4758.102 Safari/537.36"
#> $ jsVersion : chr "9.8.177.11"
Notice that the function returned <Promise [pending]>, and then it printed out the data. We’ll come back to the promise part.
Technical note
When you pass a function as callback_, that function is used as the first step in the promise chain that is returned.
If you run the command in a code block (or a function), the entire code block will finish executing before the callback can be executed. For example:
{
  b$Browser$getVersion(wait_ = FALSE, callback_ = str)
  1+1
}
#> [1] 2
#> List of 5
#> $ protocolVersion: chr "1.3"
#> $ product : chr "HeadlessChrome/98.0.4758.102"
#> $ revision : chr "@273bf7ac8c909cde36982d27f66f3c70846a3718"
#> $ userAgent : chr "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/98.0.4758.102 Safari/537.36"
#> $ jsVersion : chr "9.8.177.11"
In the code above, R executes the 1+1 and returns the value before the str callback can be executed on the message from the browser.
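The same ordering can be reproduced without a browser. The following is a minimal sketch that uses an already-resolved promise from the promises package as a stand-in for the browser's response; the events vector is a hypothetical name used only to record the order in which things happen.

```r
library(promises)

events <- character()
{
  # Stand-in for an async command whose response has already arrived
  promise_resolve("response")$then(function(value) {
    events <<- c(events, paste("callback got", value))
  })
  events <- c(events, "block finished")
}

# Drain the event loop, which an idle R console would do implicitly
while (!later::loop_empty()) later::run_now()
events
```

Even though the promise is already resolved when the callback is attached, the callback is only scheduled on the event loop, so the rest of the block runs first.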
If you want to store the value from the browser, you can write a callback that stores the value like so:
# This will extract the product field
product <- NULL
b$Browser$getVersion(wait_ = FALSE, callback_ = function(msg) {
  product <<- msg$product
})
#> <Promise [pending]>
# Wait for a moment, then run:
product
#> [1] "HeadlessChrome/98.0.4758.102"
But to get the value, you need to wait for the callback to execute before you can use the value. Waiting for a value is simple when running R interactively – you can just add a message("message arrived") call in the callback and wait for it before running the next line of code – but waiting for the value is not easy to do using ordinary straight-line coding. Fortunately, Chromote has a way to wait for async operations, which we’ll see later.
The other way of using the value is to use promises. If wait_=FALSE and no callback_ is passed to the command, then it will simply return a promise. Promises have many advantages over plain old callbacks: they are easier to chain, and they provide better error-handling capabilities. You can chain more steps to the promise: when the promise resolves – that is, when the message is received from the browser – it will run the next step in the promise chain.
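To illustrate the chaining and error-handling point, here is a minimal sketch using only the promises package. The resolved promise stands in for a browser response, and the missing revision field is a made-up failure chosen for illustration; none of this is Chromote's own code.

```r
library(promises)

result <- NULL
promise_resolve(list(product = "HeadlessChrome/98.0.4758.102"))$
  then(function(value) {
    # Made-up failure: this stand-in response has no revision field
    if (is.null(value$revision)) stop("revision field is missing")
    value$revision
  })$
  then(function(revision) {
    # Skipped, because the previous step raised an error
    result <<- revision
  })$
  catch(function(err) {
    # One handler at the end of the chain deals with errors from any step
    result <<- paste("handled:", conditionMessage(err))
  })

# Drain the event loop so that the chain runs to completion
while (!later::loop_empty()) later::run_now()
result
```

With plain callbacks, each step would need its own error handling; with a promise chain, an error in any step skips the remaining steps and is delivered to the catch handler.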
Here’s an example that uses promises to print out the version information. Note that the surrounding curly braces are there to indicate that this whole thing must be run as a block without any idle time in between the function calls – if you were to run the code in the R console line-by-line, the browser would send back the message and the promise would resolve before you called p$then(), which is where you tell the promise what to do with the return value. (The curly braces aren’t strictly necessary – you could run the code inside the braces in a single paste operation and have the same effect.)
{
  p <- b$Browser$getVersion(wait_ = FALSE)
  p$then(function(value) {
    print(value$product)
  })
}
# Wait for a moment, then prints:
#> [1] "HeadlessChrome/98.0.4758.102"
Here are some progressively more concise ways of achieving the same thing. As you work with promises, you will see these various forms of promise chaining. For more information, see the promises documentation.
library(promises)
# Chained method call to $then()
b$Browser$getVersion(wait_ = FALSE)$then(function(value) {
  print(value$product)
})
# Regular function pipe to promises::then() function
b$Browser$getVersion(wait_ = FALSE) |> then(function(value) {
  print(value$product)
})
# Promise-pipe to anonymous function, which must be wrapped in parens
b$Browser$getVersion(wait_ = FALSE) %...>% (function(value) {
  print(value$product)
})
# Promise-pipe to an expression (which gets converted to a function with the first argument `.`)
b$Browser$getVersion(wait_ = FALSE) %...>% { print(.$product) }
# Promise-pipe to a named function, with parentheses
print_product <- function(msg) print(msg$product)
b$Browser$getVersion(wait_ = FALSE) %...>% print_product()
# Promise-pipe to a named function, without parentheses
b$Browser$getVersion(wait_ = FALSE) %...>% print_product
The earlier example where we found the dimensions of a DOM element using CSS selectors was done with the synchronous API and %>% pipes. The same can be done in async mode by switching from the regular pipe to the promise-pipe, and calling all the methods with wait_=FALSE:
b$DOM$getDocument(wait_ = FALSE) %...>%
  { b$DOM$querySelector(.$root$nodeId, ".sidebar", wait_ = FALSE) } %...>%
  { b$DOM$getBoxModel(.$nodeId, wait_ = FALSE) } %...>%
  str()
# Or, more verbosely:
b$DOM$getDocument(wait_ = FALSE)$
  then(function(value) {
    b$DOM$querySelector(value$root$nodeId, ".sidebar", wait_ = FALSE)
  })$
  then(function(value) {
    b$DOM$getBoxModel(value$nodeId, wait_ = FALSE)
  })$
  then(function(value) {
    str(value)
  })
Each step in the promise chain uses the value from the previous step, via . or value. Note that not all asynchronous code works in such a linear, straightforward way. Sometimes it is necessary to save data from intermediate steps in a broader-scoped variable, if it is to be used in a later step in the promise chain.
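Saving an intermediate value in a broader-scoped variable might look like the following minimal sketch. The functions get_document() and query_selector() are hypothetical stand-ins that return already-resolved promises; they are not Chromote methods, and the node IDs are made up.

```r
library(promises)

# Hypothetical stand-ins for two async commands
get_document <- function() promise_resolve(list(root = list(nodeId = 1L)))
query_selector <- function(node_id) promise_resolve(list(nodeId = node_id + 10L))

root_id <- NULL  # broader-scoped variable shared across steps
ids <- NULL

get_document()$
  then(function(value) {
    # Save the intermediate value so a later step can use it
    root_id <<- value$root$nodeId
    query_selector(root_id)
  })$
  then(function(value) {
    # This step only receives the previous step's value, but it can
    # still read root_id from the enclosing scope
    ids <<- list(root = root_id, match = value$nodeId)
  })

# Drain the event loop so that the chain runs to completion
while (!later::loop_empty()) later::run_now()
str(ids)
```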
There may be times, especially when programming with Chromote, where you want to wait for a promise to resolve before continuing. To do this, you can use the Chromote or ChromoteSession’s wait_for() method.
# A promise chain
p <- b$DOM$getDocument(wait_ = FALSE) %...>%
  { b$DOM$querySelector(.$root$nodeId, ".sidebar", wait_ = FALSE) } %...>%
  { b$DOM$getBoxModel(.$nodeId, wait_ = FALSE) } %...>%
  str()
b$wait_for(p)
#> List of 1
#> $ model:List of 6
#> ..$ content:List of 8
#> .. ..$ : num 128
#> .. ..$ : int 28
#> .. ..$ : num 292
#> .. ..$ : int 28
#> .. ..$ : num 292
#> .. ..$ : num 988
#> .. ..$ : num 128
#> .. ..$ : num 988
#> ..$ padding:List of 8
#> .. ..$ : num 112
#> .. ..$ : int 28
#> .. ..$ : num 308
#> .. ..$ : int 28
#> .. ..$ : num 308
#> .. ..$ : num 988
#> .. ..$ : num 112
#> .. ..$ : num 988
#> ..$ border :List of 8
#> .. ..$ : num 112
#> .. ..$ : int 28
#> .. ..$ : num 308
#> .. ..$ : int 28
#> .. ..$ : num 308
#> .. ..$ : num 988
#> .. ..$ : num 112
#> .. ..$ : num 988
#> ..$ margin :List of 8
#> .. ..$ : int 15
#> .. ..$ : int 28
#> .. ..$ : num 308
#> .. ..$ : int 28
#> .. ..$ : num 308
#> .. ..$ : num 1030
#> .. ..$ : int 15
#> .. ..$ : num 1030
#> ..$ width : int 195
#> ..$ height : int 960
This documentation will refer to this technique as synchronizing asynchronous code. The way that wait_for() works is that it runs the Chromote object’s private event loop until the promise has resolved. Because the event loop is private, running it will not interfere with the global event loop, which, for example, may be used by Shiny to serve a web application.
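The idea of running an event loop until a promise settles can be sketched as follows. This is a simplified illustration and not Chromote's implementation: wait_for_sketch() is a hypothetical helper, and for simplicity it runs the current later loop rather than a private one.

```r
library(promises)

# Hypothetical helper: run an event loop until the promise settles
wait_for_sketch <- function(p, loop = later::current_loop()) {
  state <- new.env()
  state$status <- "pending"
  p$then(
    onFulfilled = function(value) {
      state$value <- value
      state$status <- "fulfilled"
    },
    onRejected = function(err) {
      state$error <- err
      state$status <- "rejected"
    }
  )
  # Keep running the loop; each call waits up to 1 second for callbacks
  while (state$status == "pending") {
    later::run_now(timeoutSecs = 1, loop = loop)
  }
  if (state$status == "rejected") stop(state$error)
  state$value
}

# A promise that resolves shortly after being created
p <- promise(function(resolve, reject) {
  later::later(function() resolve("page loaded"), delay = 0.1)
})
value <- wait_for_sketch(p)
value
```

The call blocks like an ordinary synchronous function, even though the value is produced by asynchronous code.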
The $wait_for() method will return the value from the promise, so instead of putting the str() in the chain, you call str() on the value returned by $wait_for():
p <- b$DOM$getDocument(wait_ = FALSE) %...>%
  { b$DOM$querySelector(.$root$nodeId, ".sidebar", wait_ = FALSE) } %...>%
  { b$DOM$getBoxModel(.$nodeId, wait_ = FALSE) }
x <- b$wait_for(p)
str(x)
#> List of 1
#> $ model:List of 6
#> ..$ content:List of 8
#> .. ..$ : num 128
#> .. ..$ : int 28
#> .. ..$ : num 292
#> .. ..$ : int 28
#> .. ..$ : num 292
#> .. ..$ : num 988
#> .. ..$ : num 128
#> .. ..$ : num 988
#> ..$ padding:List of 8
#> .. ..$ : num 112
#> .. ..$ : int 28
#> .. ..$ : num 308
#> .. ..$ : int 28
#> .. ..$ : num 308
#> .. ..$ : num 988
#> .. ..$ : num 112
#> .. ..$ : num 988
#> ..$ border :List of 8
#> .. ..$ : num 112
#> .. ..$ : int 28
#> .. ..$ : num 308
#> .. ..$ : int 28
#> .. ..$ : num 308
#> .. ..$ : num 988
#> .. ..$ : num 112
#> .. ..$ : num 988
#> ..$ margin :List of 8
#> .. ..$ : int 15
#> .. ..$ : int 28
#> .. ..$ : num 308
#> .. ..$ : int 28
#> .. ..$ : num 308
#> .. ..$ : num 1030
#> .. ..$ : int 15
#> .. ..$ : num 1030
#> ..$ width : int 195
#> ..$ height : int 960
There are some methods in Chromote and ChromoteSession objects which are written using asynchronous method calls, but conditionally use wait_for() so that they can be called either synchronously or asynchronously. The $screenshot() method works this way, for example. You can call b$screenshot(wait_=TRUE) (which is the default) for synchronous behavior, or b$screenshot(wait_=FALSE) for async behavior.
If you want to write a function that can be called in either sync or async mode, you can use this basic structure: First, construct a promise chain by calling the CDP methods with wait_=FALSE. Then, at the end, if the user used wait_=TRUE, wait for the promise to resolve; otherwise, simply return the promise.
getBoxModel <- function(b, selector = "html", wait_ = TRUE) {
  p <- b$DOM$getDocument(wait_ = FALSE) %...>%
    { b$DOM$querySelector(.$root$nodeId, selector, wait_ = FALSE) } %...>%
    { b$DOM$getBoxModel(.$nodeId, wait_ = FALSE) }
  if (wait_) {
    b$wait_for(p)
  } else {
    p
  }
}
# Synchronous call
str(getBoxModel(b, ".sidebar"))
# Asynchronous call
getBoxModel(b, ".sidebar", wait_ = FALSE) %...>%
str()
But, you might be wondering, if we want a synchronous API, why not simply write the synchronous code by calling the individual methods synchronously, and using a normal pipe to connect them, as in:
b$DOM$getDocument() %>%
{ b$DOM$querySelector(.$root$nodeId, ".sidebar") } %>%
{ b$DOM$getBoxModel(.$nodeId) } %>%
str()
There are two reasons for this. The first is that this would require a duplication of all the code for the sync and async code paths. Another reason is that the internal async code can be written to send multiple independent command chains to the ChromoteSession (or multiple ChromoteSessions), and they will be executed concurrently. If there are multiple promise chains, you can do something like the following to wait for all of them to resolve:
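One way to wait on several chains at once is promises::promise_all(), which returns a single promise that resolves once all of its inputs have resolved. The following is a minimal sketch: p1, p2, and p3 are hypothetical stand-ins for three independent promise chains, and the event loop is run by hand so that the example works without a browser.

```r
library(promises)

# Hypothetical stand-ins for three independent promise chains
p1 <- promise_resolve("first")
p2 <- promise(function(resolve, reject) {
  later::later(function() resolve("second"), delay = 0.1)
})
p3 <- promise_resolve("third")

# A promise that resolves only after p1, p2, and p3 have all resolved
p <- promise_all(p1, p2, p3)

results <- NULL
p$then(function(values) results <<- values)

# Run the event loop until the combined promise has delivered its values
while (is.null(results)) later::run_now(timeoutSecs = 1)
str(results)
```

With a ChromoteSession b, the combined promise could instead be passed to b$wait_for(p), which blocks until every chain has finished.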
In addition to commands, the Chrome DevTools Protocol also has events. These are messages that are sent from the browser to the R process when various browser events happen.
As an example, it can be a bit tricky to find out when to take a screenshot. When you send the browser a command to navigate to a page, it sends a response immediately, but it may take several more seconds for it to actually finish loading that page. When it does, the Page.loadEventFired event will be fired.
b <- ChromoteSession$new()
# Navigate and wait for Page.loadEventFired.
# Note: these lines are put in a single code block to ensure that there is no
# idle time in between.
{
  b$Page$navigate("https://www.r-project.org/")
  str(b$Page$loadEventFired())
}
#> List of 1
#> $ timestamp: num 683
With the synchronous API, the call to b$Page$loadEventFired() will block until Chromote receives a Page.loadEventFired message from the browser. However, with the async promise API, you would write it like this:
b$Page$navigate("https://www.r-project.org/", wait_ = FALSE) %...>%
  { b$Page$loadEventFired(wait_ = FALSE) } %...>%
  { str(.) }
# Or, more verbosely:
b$Page$navigate("https://www.r-project.org/", wait_ = FALSE)$
  then(function(value) {
    b$Page$loadEventFired(wait_ = FALSE)
  })$
  then(function(value) {
    str(value)
  })
There will be a short delay after running the code before the value is printed.
However, even this is not perfectly reliable, because in some cases, the browser will navigate to the page before it receives the loadEventFired command from Chromote. If that happens, the load event will have already happened before the browser starts waiting for it, and the call will hang. The correct way to deal with this is to issue the loadEventFired command before navigating to the page, and then wait for the loadEventFired promise to resolve.
# This is the correct way to wait for a page to load with async and then chain more commands
p <- b$Page$loadEventFired(wait_ = FALSE)
b$Page$navigate("https://www.r-project.org/", wait_ = FALSE)
# A promise chain of more commands after the page has loaded
p$then(function(value) {
  str(value)
})
If you want to block until the page has loaded, you can once again use wait_for(). For example:
p <- b$Page$loadEventFired(wait_ = FALSE)
b$Page$navigate("https://www.r-project.org/", wait_ = FALSE)
# wait_for returns the last value in the chain, so we can call str() on it
str(b$wait_for(p))
#> List of 1
#> $ timestamp: num 683
Technical note
The Chrome DevTools Protocol itself does not automatically enable event notifications. Normally, you would have to call the Page.enable method to turn on event notifications for the Page domain. However, Chromote saves you from needing to do this step by keeping track of how many callbacks there are for each domain. When the number of event callbacks for a domain goes from 0 to 1, Chromote automatically calls $enable() for that domain, and when it goes from 1 to 0, it calls $disable(). See vignette("commands-and-events") for more details.
In addition to being used with promises, events can also be used with regular callbacks. For example:
This tells Chromote to call str() (which prints to the console) on the message value every single time that a Page.loadEventFired event message is received. It will continue doing this indefinitely. (Calling an event method this way also increments the event callback counter.)
When an event method is called with a callback, the return value is a function which will cancel the callback, so that it will no longer fire. (The canceller function also decrements the event callback counter. If you lose the canceller function, there is no way to decrement the callback counter back to 0.)
cancel_load_event_callback <- b$Page$loadEventFired(callback_ = str)
# Each of these will cause the callback to fire.
n1 <- b$Page$navigate("https://www.r-project.org/")
n2 <- b$Page$navigate("https://cran.r-project.org/")
cancel_load_event_callback()
# No longer causes the callback to fire.
n3 <- b$Page$navigate("https://www.rstudio.com/")
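The relationship between the callback counter and the canceller function can be sketched in plain R. This is a simplified illustration of the counting idea only, not Chromote's internals: make_registry(), its enable and disable arguments, and the calls vector are all hypothetical names.

```r
# Hypothetical registry illustrating the counting idea
make_registry <- function(enable, disable) {
  callbacks <- list()
  next_id <- 0L
  add <- function(fn) {
    next_id <<- next_id + 1L
    id <- as.character(next_id)
    if (length(callbacks) == 0) enable()  # count goes from 0 to 1
    callbacks[[id]] <<- fn
    # Return a canceller that removes this callback at most once
    function() {
      if (!is.null(callbacks[[id]])) {
        callbacks[[id]] <<- NULL
        if (length(callbacks) == 0) disable()  # count goes from 1 to 0
      }
      invisible(NULL)
    }
  }
  list(add = add, count = function() length(callbacks))
}

calls <- character()
reg <- make_registry(
  enable = function() calls <<- c(calls, "enable"),
  disable = function() calls <<- c(calls, "disable")
)

cancel1 <- reg$add(print)  # first callback: triggers enable
cancel2 <- reg$add(str)    # second callback: no further enable
cancel1()                  # one callback remains: no disable yet
cancel2()                  # last callback removed: triggers disable
calls
```

In this sketch, the canceller is the only handle on its callback, which mirrors why losing the canceller leaves the counter stuck above zero.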