browserless is an efficient way to interact with a headless browser, built on top of puppeteer.
- Compatible with Puppeteer API (text, screenshot, html, pdf).
- Built-in adblocker for canceling unnecessary requests.
- Shell interaction via Browserless CLI.
- Easy Google Lighthouse integration.
- Automatic retry & error handling.
- Sensible defaults.
You can install it via npm:
$ npm install browserless puppeteer --save
browserless is backed by puppeteer, so you need to install it as well.
You can use it alongside puppeteer, puppeteer-core, or puppeteer-firefox interchangeably.
This is a full example showcasing the main browserless capabilities:
const createBrowser = require('browserless')
const termImg = require('term-img')
// First, create a browserless factory
// that will keep a singleton process running
const browser = createBrowser()
// After that, you can create as many browser contexts
// as you need. The browser contexts won't share cookies/cache
// with other browser contexts.
const browserless = await browser.createContext()
// Perform the action you want, e.g., taking a screenshot
const buffer = await browserless.screenshot('http://example.com', {
device: 'iPhone 6'
})
console.log(termImg(buffer))
// After your task is done, destroy your browser context
await browserless.destroyContext()
// At the end, gracefully shutdown the browser process
await browser.close()
As you can see, browserless uses a single browser process and creates/destroys specific browser contexts on top of it.
If you're already using puppeteer, you can switch to browserless with almost no effort.
Additionally, you can use specific browserless packages in your codebase and interact with them from puppeteer.
With the command-line interface (CLI) you can interact with browserless methods using a terminal, or through an automated system:
Just install @browserless/cli globally in your system using your favorite package manager:
npm install -g @browserless/cli
The main browserless method creates a headless browser.
const createBrowser = require('browserless')
const browser = createBrowser({
timeout: 25000,
lossyDeviceName: true,
ignoreHTTPSErrors: true
})
Once the browser is initialized, some high-level browser methods are available:
// Now, just call `createContext` to create a browser tab
const browserless = await browser.createContext({ retry: 2 })
const buffer = await browserless.screenshot('https://example.com')
// Call `destroyContext` to close the browser tab.
await browserless.destroyContext()
The browser keeps running until you explicitly close it:
// At the end, gracefully shutdown the browser process
await browser.close()
You can pass any puppeteer.launch#options.
Additionally, you can set up:
type: string
default: 'Macbook Pro 13'
Sets a consistent device viewport for each page.
type: boolean
default: false
It enables fuzzy matching of the device descriptor name.
const browserless = require('browserless')({ lossyDeviceName: true })
browserless.getDevice({ device: 'macbook pro 13' })
browserless.getDevice({ device: 'MACBOOK PRO 13' })
browserless.getDevice({ device: 'macbook pro' })
browserless.getDevice({ device: 'macboo pro' })
This setting is intended to find the device even if the given name doesn't exactly match the device descriptor name.
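To illustrate the idea behind lossy matching, here is a minimal sketch that normalizes case and whitespace and then picks the known device name with the smallest Levenshtein edit distance. The `findDevice` and `levenshtein` helpers, the short device list, and the `maxDistance` cutoff are all hypothetical; browserless may use a different fuzzy-search strategy internally.

```javascript
// Hypothetical list of known device names (browserless ships many more).
const KNOWN_DEVICES = ['Macbook Pro 13', 'Macbook Pro 15', 'iPhone 6', 'iPad']

// Classic dynamic-programming Levenshtein distance between two strings.
function levenshtein (a, b) {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i])
  for (let j = 1; j <= b.length; j++) dp[0][j] = j
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1, // deletion
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j - 1] + cost // substitution
      )
    }
  }
  return dp[a.length][b.length]
}

// Lowercase and collapse whitespace so 'MACBOOK  PRO 13' ~ 'macbook pro 13'.
const normalize = name => name.toLowerCase().replace(/\s+/g, ' ').trim()

// Hypothetical lossy lookup: closest known name, or undefined if too far off.
function findDevice (input, devices = KNOWN_DEVICES, maxDistance = 5) {
  const query = normalize(input)
  let best
  let bestDistance = Infinity
  for (const name of devices) {
    const distance = levenshtein(query, normalize(name))
    // Strict "<" keeps the first candidate when two names tie.
    if (distance < bestDistance) {
      best = name
      bestDistance = distance
    }
  }
  return bestDistance <= maxDistance ? best : undefined
}
```

With this rule, 'macboo pro' is equally close to both MacBook entries, so list order decides; a real implementation might break ties differently.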
type: string
default: launch
values: 'launch' | 'connect'
It defines whether to spawn a new browser with puppeteer.launch or to connect to an existing one with puppeteer.connect.
type: number
default: 30000
This setting will change the default maximum navigation time.
type: Puppeteer
default: puppeteer | puppeteer-core | puppeteer-firefox
It's automatically detected from your dependencies; the supported packages are puppeteer, puppeteer-core, and puppeteer-firefox.
After initializing the browser, you can create a browser context, which is equivalent to opening a tab:
const browserless = await browser.createContext({
retry: 2
})
Every browser context is isolated: contexts won't share cookies/cache with each other. Each context can also receive its own options.
Any browser.createBrowserContext#options can be passed.
Additionally, you can set up:
type: number
default: 2
The number of retries that can be performed before considering a navigation as failed.
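As a rough illustration of what a retry budget means, the sketch below reruns a failing async operation up to `retries` extra times before rethrowing the last error. The `withRetry` helper is hypothetical and is not the browserless implementation; in this sketch, `retries = 2` means at most three attempts.

```javascript
// Hypothetical helper: run `fn` and retry it up to `retries` extra times.
async function withRetry (fn, retries = 2) {
  let lastError
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      // Return as soon as one attempt succeeds.
      return await fn(attempt)
    } catch (error) {
      lastError = error // remember the failure and try again
    }
  }
  throw lastError // every attempt failed
}
```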
It returns the internal Browser instance.
const headlessBrowser = await browser.browser()
console.log('My headless browser PID is', headlessBrowser.process().pid)
It will respawn the internal browser.
const getPID = async promise => (await promise).process().pid
console.log('Process PID:', await getPID(browser.browser()))
await browser.respawn()
console.log('Process PID:', await getPID(browser.browser()))
This method is an implementation detail; normally, you don't need to call it.
It will close the internal browser.
const { onExit } = require('signal-exit')
// automatically teardown resources after
// `process.exit` is called
onExit(browser.close)
It serializes the content from the target URL into HTML.
const html = await browserless.html('https://example.com')
console.log(html)
// => "<!DOCTYPE html><html><head>…"
See browserless.goto for all the supported options and values.
It serializes the content from the target URL into plain text.
const text = await browserless.text('https://example.com')
console.log(text)
// => "Example Domain\nThis domain is for use in illustrative…"
See browserless.goto for all the supported options and values.
It generates a PDF version of the website at the target URL.
const buffer = await browserless.pdf('https://example.com')
console.log(`PDF generated in ${buffer.byteLength} bytes`)
This method uses the following options by default:
{
margin: '0.35cm',
printBackground: true,
scale: 0.65
}
See browserless.goto for all the supported options and values.
Also, any page.pdf option is supported.
Additionally, you can set up:
type: string | string[]
default: '0.35cm'
It sets paper margins. All possible units are:
- px for pixels.
- in for inches.
- cm for centimeters.
- mm for millimeters.
You can pass an object specifying each side of the paper:
const buffer = await browserless.pdf(url.toString(), {
margin: {
top: '0.35cm',
bottom: '0.35cm',
left: '0.35cm',
right: '0.35cm'
}
})
Or, if you pass a string, it will be used for all sides:
const buffer = await browserless.pdf(url.toString(), {
margin: '0.35cm'
})
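The string-versus-object behavior can be sketched as a small normalization step. The `normalizeMargin` helper below is hypothetical: it expands a single string to all four sides and checks that every string value uses one of the units listed above. It does not reflect how browserless or puppeteer validate margins internally.

```javascript
const SIDES = ['top', 'bottom', 'left', 'right']
// A positive number followed by one of the units listed above.
const UNIT_RE = /^\d+(\.\d+)?(px|in|cm|mm)$/

// Hypothetical helper: turn the `margin` option into a per-side object.
function normalizeMargin (margin) {
  // A single string applies to every side of the paper.
  const result = typeof margin === 'string'
    ? Object.fromEntries(SIDES.map(side => [side, margin]))
    : { ...margin }
  // Reject values without a supported unit (string values only in this sketch).
  for (const side of SIDES) {
    const value = result[side]
    if (value !== undefined && !UNIT_RE.test(value)) {
      throw new TypeError(`Invalid margin for "${side}": ${value}`)
    }
  }
  return result
}
```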
It takes a screenshot of the target URL.
const buffer = await browserless.screenshot('https://example.com')
console.log(`Screenshot taken in ${buffer.byteLength} bytes`)
This method uses the following options by default:
{
device: 'macbook pro 13'
}
See browserless.goto for all the supported options and values.
Also, any page.screenshot option is supported.
Additionally, you can set up:
type: string
default: 'atom-dark'
When this value is present and the response 'Content-Type' header is 'json', it beautifies HTML markup using Prism.
The syntax highlighting theme can be customized by setting either:
- A prism-themes identifier (e.g., 'dracula').
- A remote URL (e.g., 'https://unpkg.com/prism-theme-night-owl').
type: string
Capture the DOM element matching the given CSS selector. It will wait for the element to appear in the page and to be visible.
type: object
After the screenshot has been taken, this option allows you to place the screenshot into a fancy overlay.
You can configure the overlay by specifying:
- browser: It sets the browser image overlay to use; supported values are light and dark.
- background: It sets the background to use. You can pass:
  - A hexadecimal/rgb/rgba color code, e.g., #c1c1c1.
  - A CSS gradient, e.g., linear-gradient(225deg, #FF057C 0%, #8D0B93 50%, #321575 100%).
  - An image URL, e.g., https://source.unsplash.com/random/1920x1080.
const buffer = await browserless.screenshot(url.toString(), {
styles: ['.crisp-client, #cookies-policy { display: none; }'],
overlay: {
browser: 'dark',
background:
'linear-gradient(45deg, rgba(255,18,223,1) 0%, rgba(69,59,128,1) 66%, rgba(69,59,128,1) 100%)'
}
})
It will destroy the current browser context.
const browserless = await browser.createContext({ retry: 0 })
const content = await browserless.html('https://example.com')
await browserless.destroyContext()
type: string
default: 'force'
When force is passed, it avoids recreating the context in case a browser action is being executed.
It returns the device settings (userAgent and viewport) associated with a device descriptor name, so you can emulate specific devices.
browserless.getDevice({ device: 'Macbook Pro 15' })
// => {
// userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36',
// viewport: {
// width: 1440,
// height: 900,
// deviceScaleFactor: 2,
// isMobile: false,
// hasTouch: false,
// isLandscape: false
// }
// }
It extends puppeteer.KnownDevices, adding some devices missing there.
type: string
The device descriptor name. It's used to look up the rest of the presets associated with it.
When lossyDeviceName is enabled, a fuzzy search rather than a strict search will be performed in order to maximize getting a result back.
type: object
Extra viewport settings that will be merged with the device presets.
browserless.getDevice({
device: 'iPad',
viewport: {
isLandscape: true
}
})
type: object
Extra headers that will be merged with the device presets.
browserless.getDevice({
device: 'iPad',
headers: {
'user-agent': 'googlebot'
}
})
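Conceptually, both options layer caller-provided values on top of the device preset. The sketch below shows that shallow merge with a hypothetical `mergeDevice` helper and a made-up iPad preset; it is not the browserless source, and the real return shape of getDevice may differ.

```javascript
// Made-up preset, loosely shaped like the getDevice output shown above.
const IPAD_PRESET = {
  userAgent: 'example-ipad-user-agent',
  viewport: { width: 768, height: 1024, isMobile: true, isLandscape: false }
}

// Hypothetical helper: shallow-merge caller overrides into a device preset.
function mergeDevice (preset, { viewport = {}, headers = {} } = {}) {
  return {
    userAgent: preset.userAgent,
    // Caller values win over preset values for the same key;
    // the preset object itself is never mutated.
    viewport: { ...preset.viewport, ...viewport },
    headers: { ...(preset.headers || {}), ...headers }
  }
}
```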
It exposes an interface for creating your own evaluate function, passing you the page and response.
The fn will receive page and response as arguments:
const ping = browserless.evaluate((page, response) => ({
statusCode: response.status(),
url: response.url(),
redirectUrls: response.request().redirectChain()
}))
await ping('https://example.com')
// {
// "statusCode": 200,
// "url": "https://example.com/",
// "redirectUrls": []
// }
You don't need to close the page; it will be closed automatically.
Internally, the method performs a browserless.goto, making it possible to pass extra arguments as a second parameter:
const serialize = browserless.evaluate(page => page.evaluate(() => document.body.innerText), {
waitUntil: 'domcontentloaded'
})
await serialize('https://example.com')
// => 'Example Domain…'
It performs a page.goto with a lot of extra capabilities:
const page = await browserless.page()
const { response, device } = await browserless.goto(page, { url: 'http://example.com' })
Any option passed here will be forwarded to page.goto.
Additionally, you can set up:
type: array
default: []
It aborts requests whose ResourceType matches any of the given values.
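In puppeteer terms, aborting by resource type usually means enabling request interception and calling `request.abort()` or `request.continue()` for each request. The sketch below isolates that decision with a hypothetical `createRequestFilter` and a stub request object, so it runs without a browser. With a real page, you would register the handler via `page.on('request', handler)` after `await page.setRequestInterception(true)`; whether browserless wires it exactly this way is an assumption.

```javascript
// Hypothetical helper: returns a 'request' handler that aborts matching types.
function createRequestFilter (abortTypes = []) {
  const blocked = new Set(abortTypes)
  return request => {
    if (blocked.has(request.resourceType())) return request.abort()
    return request.continue()
  }
}

// Minimal stand-in for puppeteer's HTTPRequest, used only for this sketch.
function stubRequest (type) {
  return {
    outcome: null,
    resourceType: () => type,
    abort () { this.outcome = 'aborted' },
    continue () { this.outcome = 'continued' }
  }
}
```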
type: boolean
default: true
It enables the built-in adblocker by Cliqz (https://www.npmjs.com/package/@cliqz/adblocker), which aborts unnecessary third-party requests associated with ad services.
type: boolean
default: false
It disables CSS animations and transitions, and sets prefers-reduced-motion accordingly.
type: string | string[]
Click the DOM element matching the given CSS selector.
type: string
default: 'no-preference'
It sets the prefers-color-scheme CSS media feature, used to detect whether the user has requested a 'light' or 'dark' color theme.
type: string
default: 'macbook pro 13'
It specifies the device descriptor used to retrieve userAgent and viewport.
type: object
An object containing additional HTTP headers to be sent with every request.
const createBrowser = require('browserless')
const browser = createBrowser()
const browserless = await browser.createContext()
const page = await browserless.page()
await browserless.goto(page, {
url: 'http://example.com',
headers: {
'user-agent': 'googlebot',
cookie: 'foo=bar; hello=world'
}
})
This sets visibility: hidden on the matched elements.
type: string
If you provide HTML markup, it is loaded via page.setContent instead of fetching the content from the target URL.
type: boolean
default: true
When it's false, it disables JavaScript on the current page.
type: string
default: 'screen'
Changes the CSS media type of the page using page.emulateMediaType.
type: string | string[]
Injects <script type="module"> into the browser page.
It can accept:
- Absolute URLs (e.g., 'https://cdn.jsdelivr.net/npm/@microlink/mql@0.3.12/src/browser.js').
- Local files (e.g., 'local-file.js').
- Inline code (e.g., "document.body.style.backgroundColor = 'red'").
const buffer = await browserless.screenshot(url.toString(), {
modules: [
'https://cdn.jsdelivr.net/npm/@microlink/mql@0.3.12/src/browser.js',
'local-file.js',
"document.body.style.backgroundColor = 'red'"
]
})
type: function
It associates a handler with every request in the page.
type: string | string[]
Injects <script> into the browser page.
It can accept:
- Absolute URLs (e.g., 'https://cdn.jsdelivr.net/npm/@microlink/mql@0.3.12/src/browser.js').
- Local files (e.g., 'local-file.js').
- Inline code (e.g., "document.body.style.backgroundColor = 'red'").
const buffer = await browserless.screenshot(url.toString(), {
scripts: [
'https://cdn.jsdelivr.net/npm/jquery@3.4.1/dist/jquery.min.js',
'local-file.js',
"document.body.style.backgroundColor = 'red'"
]
})
Prefer to use modules whenever possible.
type: string
Scroll to the DOM element matching the given CSS selector.
type: string | string[]
Injects <style> into the browser page.
It can accept:
- Absolute URLs (e.g., 'https://cdn.jsdelivr.net/npm/hack@0.8.1/dist/dark.css').
- Local files (e.g., 'local-file.css').
- Inline code (e.g., "body { background: red; }").
const buffer = await browserless.screenshot(url.toString(), {
styles: [
'https://cdn.jsdelivr.net/npm/hack@0.8.1/dist/dark.css',
'local-file.css',
'body { background: red; }'
]
})
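The three accepted forms for modules, scripts, and styles can be told apart with a simple heuristic. The `classifyInjection` helper below is hypothetical and only illustrates one plausible rule set: an absolute http(s) URL first, then a single token ending in the expected extension as a local file, and everything else as inline code. Its output maps onto the `url`, `path`, and `content` options accepted by puppeteer's `page.addScriptTag` and `page.addStyleTag`; browserless may decide differently.

```javascript
// Hypothetical: map an injection value to addScriptTag/addStyleTag options.
function classifyInjection (value, extension) {
  // Absolute URLs are loaded remotely.
  if (/^https?:\/\//i.test(value)) return { url: value }
  // A single token (no whitespace) ending in the expected extension is a local file.
  if (!/\s/.test(value) && value.endsWith(extension)) return { path: value }
  // Anything else is injected as inline code.
  return { content: value }
}
```

For modules, the resulting script options would additionally carry `type: 'module'`, which `page.addScriptTag` supports.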
type: string
It changes the timezone of the page.
type: string
The target URL.
It sets up a custom viewport using the page.setViewport method.
type: string
It waits for the DOM element matching the given CSS selector to appear, using page.waitForSelector.
type: number
It waits for the given amount of time, in milliseconds.
type: string | string[]
default: 'auto'
values: 'auto' | 'load' | 'domcontentloaded' | 'networkidle0' | 'networkidle2'
When to consider navigation successful.
If you provide an array of event strings, navigation is considered to be successful after all events have been fired.
Events can be either:
- 'auto': A combination of 'load' and 'networkidle2' that waits, in a smart way, the minimum time necessary.
- 'load': Consider navigation to be finished when the load event is fired.
- 'domcontentloaded': Consider navigation to be finished when the DOMContentLoaded event is fired.
- 'networkidle0': Consider navigation to be finished when there are no more than 0 network connections for at least 500 ms.
- 'networkidle2': Consider navigation to be finished when there are no more than 2 network connections for at least 500 ms.
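To make the networkidle0 / networkidle2 rules concrete, the sketch below computes, from a recorded list of request start/end timestamps, the earliest moment at which no more than `maxConnections` requests had been in flight for a full 500 ms window. The `idleTime` function and the event format are hypothetical; real idle detection happens live while the page loads, not over a recorded timeline.

```javascript
// The networkidle rules above use a 500 ms quiet window.
const IDLE_WINDOW_MS = 500

// Hypothetical: given recorded request events ({ time, type: 'start' | 'end' }),
// return the earliest time (ms) at which no more than `maxConnections`
// requests had been in flight for a full window, or null if that never happens.
function idleTime (events, maxConnections, windowMs = IDLE_WINDOW_MS) {
  const sorted = [...events].sort((a, b) => a.time - b.time)
  let inFlight = 0
  let idleSince = 0 // navigation starts at t = 0 with nothing in flight
  for (const { time, type } of sorted) {
    // The quiet window already elapsed before this event happened.
    if (idleSince !== null && time >= idleSince + windowMs) {
      return idleSince + windowMs
    }
    inFlight += type === 'start' ? 1 : -1
    if (inFlight > maxConnections) {
      idleSince = null // too busy: the window restarts later
    } else if (idleSince === null) {
      idleSince = time // just became quiet enough
    }
  }
  return idleSince === null ? null : idleSince + windowMs
}

const timeline = [
  { time: 0, type: 'start' }, // main document
  { time: 100, type: 'start' },
  { time: 120, type: 'start' },
  { time: 200, type: 'end' },
  { time: 900, type: 'end' },
  { time: 1000, type: 'end' }
]
// idleTime(timeline, 0) → 1500 (networkidle0)
// idleTime(timeline, 2) → 700 (networkidle2)
```

The timeline shows why networkidle2 usually resolves earlier: two long-running requests no longer block it.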
It returns the BrowserContext associated with your instance.
const browserContext = await browserless.context()
console.log({ isIncognito: browserContext.isIncognito() })
// => { isIncognito: true }
It returns a higher-order function as a convenient way to interact with a page:
const getTitle = browserless.withPage((page, goto) => async opts => {
await goto(page, opts)
return page.title()
})
The function will be invoked in the following way:
const title = await getTitle({ url: 'https://example.com' })
type: function
The function to be executed. It receives page and goto as arguments.
type: number
default: browserless.timeout
This setting will change the default maximum navigation time.
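The higher-order shape of withPage can be mimicked without a browser. In the sketch below, `withPageSketch`, the fake page, and `fakeGoto` are all hypothetical stand-ins: the outer function is called with `(page, goto)` and returns the function you actually invoke with your options. The real method manages a browser page and navigation timeouts for you; this sketch only reproduces the call shape.

```javascript
// Hypothetical stand-in for browserless.withPage: same call shape, no browser.
function withPageSketch (fn, { createPage, goto }) {
  return async opts => {
    const page = await createPage()
    // fn(page, goto) returns the function that receives the caller's options.
    return fn(page, goto)(opts)
  }
}

// Fake page and goto used only to exercise the sketch.
function createFakePage () {
  let currentUrl = 'about:blank'
  return {
    navigate: url => { currentUrl = url },
    title: async () => `Title of ${currentUrl}`
  }
}
async function fakeGoto (page, { url }) {
  page.navigate(url)
  return { response: null }
}

const getTitle = withPageSketch(
  (page, goto) => async opts => {
    await goto(page, opts)
    return page.title()
  },
  { createPage: async () => createFakePage(), goto: fakeGoto }
)
```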
It returns a standalone Page associated with the current browser context.
const page = await browserless.page()
await page.content()
// => '<html><head></head><body></body></html>'
The @browserless/function package provides an isolated VM scope to run arbitrary JavaScript code with runtime access to a browser page:
const createFunction = require('@browserless/function')
const code = async ({ page }) => page.evaluate('jQuery.fn.jquery')
const version = createFunction(code)
const { isFulfilled, isRejected, value } = await version('https://jquery.com')
// => {
// isFulfilled: true,
// isRejected: false,
// value: '1.13.1'
// }
Besides the following properties, any other argument provided will be available during the code execution.
The hosted code also runs inside a secure sandbox created via vm2.
Any goto#options can be passed for tuning the internal URL resolution.
The @browserless/lighthouse package provides you the setup for running Lighthouse reports backed by browserless.
const createLighthouse = require('@browserless/lighthouse')
const createBrowser = require('browserless')
const { writeFile } = require('fs/promises')
const { onExit } = require('signal-exit')
const browser = createBrowser()
onExit(browser.close)
const lighthouse = createLighthouse(async teardown => {
const browserless = await browser.createContext()
teardown(() => browserless.destroyContext())
return browserless
})
const report = await lighthouse('https://microlink.io')
await writeFile('report.json', JSON.stringify(report, null, 2))
The report will be generated for the provided URL. This extends the lighthouse:default settings, which are similar to the Google Chrome Audits reports in Developer Tools.
The Lighthouse configuration that will extend 'lighthouse:default' settings:
const report = await lighthouse(url, {
onlyAudits: ['accessibility']
})
Also, you can extend from a different preset of settings:
const report = await lighthouse(url, {
preset: 'desktop',
onlyAudits: ['accessibility']
})
Additionally, you can set up:
The Lighthouse execution runs in a worker thread, so any worker#options are supported.
type: string
default: 'error'
values: 'silent' | 'error' | 'info' | 'verbose'
The level of logging to enable.
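Log levels like these are commonly ordered by verbosity, so enabling one level also enables everything less verbose. The sketch below shows that common convention with a hypothetical `shouldLog` helper; that Lighthouse filters messages exactly this way is an assumption, not something stated here.

```javascript
// Ordered from least to most verbose (the values listed above).
const LEVELS = ['silent', 'error', 'info', 'verbose']

// Hypothetical helper: should a message of `messageLevel` be printed
// when `logLevel` is configured?
function shouldLog (logLevel, messageLevel) {
  const configured = LEVELS.indexOf(logLevel)
  const message = LEVELS.indexOf(messageLevel)
  if (configured === -1 || message === -1) throw new RangeError('Unknown log level')
  // 'silent' is never a message level; anything else prints when it is
  // no more verbose than the configured level.
  return message > 0 && message <= configured
}
```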
type: string | string[]
default: 'json'
values: 'json' | 'csv' | 'html'
The type(s) of report output to be produced.
type: number
default: browserless.timeout
This setting will change the default maximum navigation time.
The @browserless/screencast package allows you to capture each frame of a browser navigation using puppeteer.
This API is similar to screenshots, but it gives you more granular control over the frames and the output:
const createScreencast = require('@browserless/screencast')
const createBrowser = require('browserless')
const browser = createBrowser()
const browserless = await browser.createContext()
const page = await browserless.page()
const screencast = createScreencast(page, {
maxWidth: 1280,
maxHeight: 800
})
const frames = []
screencast.onFrame(data => frames.push(data))
screencast.start()
await browserless.goto(page, { url, waitForTimeout: 300 })
await screencast.stop()
console.log(frames)
Check out a full example that generates a GIF as output.
type: object
The Page object.
See Page.startScreencast to know all the options and values supported.
browserless is internally divided into multiple packages; this way, you only use the code you need.
Q: Why use browserless over puppeteer?
browserless does not replace puppeteer; it complements it. It's a layer of syntactic sugar over the official Headless Chrome, oriented to production scenarios.
Q: Why do you block ads scripts by default?
Headless navigation is expensive compared to just fetching the content from a website.
To speed up the process, we block ad scripts by default because most of them are resource-intensive.
Q: My output is different from what I expected
Probably browserless was too smart and it blocked a request that you need.
You can activate debug mode using the DEBUG=browserless environment variable to see what is happening behind the scenes.
Consider opening an issue with the debug trace.
Q: I want to use browserless with my AWS Lambda-like project
Yes, check chrome-aws-lambda to set up AWS Lambda with a compatible binary.
browserless © Microlink, released under the MIT License.
Authored and maintained by Microlink with help from contributors.
The logo has been designed by xinh studio.
microlink.io · GitHub microlinkhq · Twitter @microlinkhq