
Add parser rules for bots #227

Closed
oyeanuj opened this issue Mar 8, 2017 · 16 comments


@oyeanuj

oyeanuj commented Mar 8, 2017

Hi @faisalman! Thank you for putting out this very useful library! I'm wondering if you'd consider adding rules for bots as well, given that they are useful to know with server-rendering, etc.

Here is the latest from Google and Bing for their bots, if that helps:

Google: https://support.google.com/webmasters/answer/1061943?hl=en
Bing: https://www.bing.com/webmaster/help/which-crawlers-does-bing-use-8c184ec0

Thank you!

cc: @rossnoble if it would then make sense to add to your helper library!

@rossnoble
Contributor

Haha, didn't think anyone was using my helper lib. Thanks for the heads up though.

@sashakru

+1

@ebbmo

ebbmo commented Jun 20, 2017

In general I think this would be an awesome addition to the library, since we currently handle bots using a "sorted" list (at the moment around 2,160 entries) maintained alongside the UA-Parser lib.

At the same time, though, I think we should NOT add the (vast number of) bots to the parser, since I usually take this route: if ua-parser can identify it, it's (probably) a human; if not, it's a bot.
=> Yes, anyone can fake user agents, I know... but that's not my point here ;)

Therefore I would refrain from adding it to the lib.

Any other thoughts from you guys? I can imagine the speed of UA recognition going downhill, but that's just an assumption without real data to back it up (e.g. extending a forked ua-parser with bots to measure how fast it recognizes bots vs. non-bots).

@brianchirls

You make a great point @ebbmo. We don't want to bloat the size or the speed of the library with information that not everybody is going to use.

I think a good compromise would be to create a set of bot rules that could optionally be added as an extension. It might make sense in its own repo or as a source file in this one that's only included optionally. However, you'd want the extension to be added at the end of the list, not the beginning. That way, in most cases you'd have a browser that would match earlier, so you'd only have to go through the longer list of bots in those rare cases with no browser match.

This would require a change to the library to allow optionally adding extensions to the end of the regex list.

@ebbmo

ebbmo commented Jun 20, 2017

Very good idea @brianchirls.
So we have potentially 2 options:

  1. A lib like ua-parser-with-bots that extends the current ua-parser without touching any of the existing source code
  2. Including "isBot" logic (with corresponding fields) in the ua-parser library itself, enabling bot recognition on demand, for example: var parser = new UAParser({withBots: true});

Any other options?
@faisalman What do you think?
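Option (2) could be sketched roughly as below. This is only an illustration of the proposed API shape; the `{withBots: true}` flag, `BOT_REGEXES`, and `detectBot` are hypothetical names, not part of ua-parser-js:

```js
// Hypothetical sketch of option (2): check a small bot list before (or
// instead of) the normal browser lookup. Names here are illustrative only.
var BOT_REGEXES = [
  /(googlebot)\/([\w.]+)/i,
  /(bingbot)\/([\w.]+)/i
];

function detectBot(ua) {
  for (var i = 0; i < BOT_REGEXES.length; i++) {
    var m = BOT_REGEXES[i].exec(ua);
    if (m) return { name: m[1], version: m[2], type: 'bot' };
  }
  return null; // not a known bot; fall through to normal parsing
}

console.log(detectBot('Googlebot/2.1 (+http://www.google.com/bot.html)'));
// { name: 'Googlebot', version: '2.1', type: 'bot' }
```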

@faisalman
Owner

I'm still considering how to include other non-browser agents (such as bots, apps, media players, libraries, CLIs, etc.) but could still offer them as optional, maybe using something like option (2).

To create an extension for option (1) without touching the existing code, you can already define your own regexes, which will be appended to the end of the selected list, and pass them when instantiating a new parser. Please refer to this example:

```js
var NAME = UAParser.BROWSER.NAME;
var VERSION = UAParser.BROWSER.VERSION;
var TYPE_BOT = ['type', 'bot'];
var botsRegExt = [
  // google, bing, msn
  [/((?:google|bing|msn)bot(?:\-[imagevdo]{5})?)\/([\w\.]+)/i], [NAME, VERSION, TYPE_BOT],
  // bing preview
  [/(bingpreview)\/([\w\.]+)/i], [NAME, VERSION, TYPE_BOT]
];

var agent1 = 'Googlebot-Video/1.0';
var agent2 = 'msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)';
var agent3 = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b';
var agent4 = 'Opera/8.5 (Macintosh; PPC Mac OS X; U; en)';

// try agent1
var parser = new UAParser(agent1, { browser: botsRegExt });
console.log(parser.getBrowser());   // {name: "Googlebot-Video", version: "1.0", type: "bot"}

// try agent2
parser.setUA(agent2);
console.log(parser.getBrowser());   // {name: "msnbot-media", version: "1.1", type: "bot"}

// try agent3
parser.setUA(agent3);
console.log(parser.getBrowser());   // {name: "BingPreview", version: "1.0b", type: "bot"}

// try agent4
parser.setUA(agent4);
console.log(parser.getBrowser());   // {name: "Opera", version: "8.5"}
```
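With an extension like the one above, a caller could classify results by checking the custom "type" field. A minimal sketch (the `isBot` helper is illustrative, not part of ua-parser-js; the `type` key only exists because the extension above defines it):

```js
// Sketch: classify a getBrowser() result produced with the bot extension.
// The "type" field is user-defined by the extension, not a built-in.
function isBot(browser) {
  return !!browser && browser.type === 'bot';
}

console.log(isBot({ name: 'Googlebot-Video', version: '1.0', type: 'bot' })); // true
console.log(isBot({ name: 'Opera', version: '8.5' }));                        // false
```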

@brianchirls

@faisalman Can you clarify please - do the extension regexes get added to the end or the beginning of the list? It could make a big difference for performance. Thanks.

@faisalman
Owner

At this moment, you can only add new regexes to the end of the list (see util.extend).
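The append-only merge described here can be sketched as a simple concatenation (an assumption based on this comment, not the actual util.extend source): built-in browser regexes are tried first, so bot patterns only run as a fallback when nothing else matches.

```js
// Sketch of an append-only extend: custom regexes go after the built-ins,
// so common browsers still match early and bot patterns act as a fallback.
function extend(defaultRegexes, extensionRegexes) {
  return defaultRegexes.concat(extensionRegexes);
}

var merged = extend(['builtin-1', 'builtin-2'], ['bot-1']);
console.log(merged); // [ 'builtin-1', 'builtin-2', 'bot-1' ]
```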

@faisalman faisalman reopened this Jul 1, 2017
@faisalman
Owner

faisalman commented Jul 1, 2017

Sorry for the misclick; reopening this issue.

@extensionsapp

+1

@Eliaxs1900

Eliaxs1900 commented Mar 26, 2019

I think this would be very useful for detecting bot browsers.
There is a library from biggora called express-useragent (the link goes to the npm repository).
I think it will help you with bot detection.
I tested it and it works very well with curl 👍
PS: this is the user agent: curl/7.55.1
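The curl user agent mentioned here has a very regular shape, so a pattern in the same style as the bot extension above would match it (an illustrative sketch, not a rule from express-useragent or ua-parser-js):

```js
// Sketch: matching the "curl/7.55.1" style of user agent string.
var CURL_RE = /(curl)\/([\w.]+)/i;

var m = CURL_RE.exec('curl/7.55.1');
console.log(m[1], m[2]); // curl 7.55.1
```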

@jimblue

jimblue commented Jun 22, 2019

Friendly ping 😄

@andrei-svistunou

Any updates?

@felixmeziere

felixmeziere commented Jul 27, 2021

Wish this existed in the library! :)

@everdrone

Another very friendly ping! Chiming in with curl, wget, requests, and scrapy.

@jaketrimble

FacebookBot

Mozilla/5.0 (compatible; FacebookBot/1.0; +https://developers.facebook.com/docs/sharing/webmasters/facebookbot/)
  • browser: FacebookBot 1.0
  • browser.name: FacebookBot
  • device: Desktop
  • device.family: Spider

faisalman added a commit that referenced this issue Aug 15, 2023
Axios: `axios/VERSION`
https://www.zenrows.com/blog/axios-user-agent#what-is-axios-user-agent

JSDOM: `Mozilla/5.0 (${process.platform || "unknown OS"}) AppleWebKit/537.36 (KHTML, like Gecko) jsdom/${jsdomVersion}`
https://github.com/jsdom/jsdom

Scrapy: `Scrapy/VERSION (+https://scrapy.org)`
https://docs.scrapy.org/en/master/topics/settings.html
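The user agents listed above could be covered with extension regexes in the same style as the earlier example. This is an illustrative sketch; these patterns are not the ones from the referenced commit:

```js
// Sketch: regexes for the library/CLI user agents listed above.
var CLIENT_REGEXES = [
  /(axios)\/([\w.]+)/i,   // axios/VERSION
  /(jsdom)\/([\w.]+)/i,   // ... AppleWebKit/537.36 (KHTML, like Gecko) jsdom/VERSION
  /(scrapy)\/([\w.]+)/i   // Scrapy/VERSION (+https://scrapy.org)
];

function matchClient(ua) {
  for (var i = 0; i < CLIENT_REGEXES.length; i++) {
    var m = CLIENT_REGEXES[i].exec(ua);
    if (m) return { name: m[1], version: m[2] };
  }
  return null;
}

console.log(matchClient('Scrapy/2.9.0 (+https://scrapy.org)'));
// { name: 'Scrapy', version: '2.9.0' }
```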