
Add parser rules for bots #227

Closed
oyeanuj opened this issue Mar 8, 2017 · 16 comments


@oyeanuj

oyeanuj commented Mar 8, 2017

Hi @faisalman! Thank you for putting out this very useful library! I'm wondering if you'd consider adding rules for bots as well, given that they are useful to know with server-rendering, etc.

Here is the latest from Google and Bing for their bots, if that helps:

Google: https://support.google.com/webmasters/answer/1061943?hl=en
Bing: https://www.bing.com/webmaster/help/which-crawlers-does-bing-use-8c184ec0

Thank you!

cc: @rossnoble if it would then make sense to add to your helper library!

@rossnoble
Contributor

Haha, didn't think anyone was using my helper lib. Thanks for the heads up though.

@sashakru

+1

@ebbmo

ebbmo commented Jun 20, 2017

In general I think this would be an awesome addition to the library, since we currently handle bots using a "sorted" list (at the moment around 2,160 entries) maintained alongside the UA-Parser lib.

At the same time, though, I think we should NOT add the (vast number of) bots to the parser, since I usually take this route: if ua-parser can identify it, it's (probably) a human; if not, it's a bot.
=> Yes, anyone can fake user agents, I know... but that's not my point here ;)

Therefore I would refrain from adding it to the lib.

Any other thoughts from you guys? I can imagine the speed of UA recognition going downhill, but that's just an assumption without real data to back it up (e.g. extending a forked ua-parser with bots to measure how fast it recognizes bots vs. non-bots).

@brianchirls

You make a great point @ebbmo. We don't want to bloat the size or the speed of the library with information that not everybody is going to use.

I think a good compromise would be to create a set of bot rules that could optionally be added as an extension. It might make sense in its own repo or as a source file in this one that's only included optionally. However, you'd want the extension to be added at the end of the list, not the beginning. That way, in most cases you'd have a browser that would match earlier, so you'd only have to go through the longer list of bots in those rare cases with no browser match.

This would require a change to the library to allow optionally adding extensions to the end of the regex list.

@ebbmo

ebbmo commented Jun 20, 2017

Very good idea @brianchirls.
So we have potentially 2 options:

  1. A lib like ua-parser-with-bots that extends the current ua-parser without touching any of the existing source code
  2. Including "isBot" logic (with corresponding fields) in the ua-parser library itself, enabling bot recognition on demand, for example: var parser = new UAParser({withBots: true});

Any other options?
@faisalman What do you think?
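Option (2) could be sketched roughly as below. This is only an illustration of the proposed API shape; the `{withBots: true}` flag, `BOT_REGEXES`, and `detectBot` are hypothetical names, not part of ua-parser-js:

```js
// Hypothetical sketch of option (2): check a small bot list before (or
// instead of) the normal browser lookup. Names here are illustrative only.
var BOT_REGEXES = [
  /(googlebot)\/([\w.]+)/i,
  /(bingbot)\/([\w.]+)/i
];

function detectBot(ua) {
  for (var i = 0; i < BOT_REGEXES.length; i++) {
    var m = BOT_REGEXES[i].exec(ua);
    if (m) return { name: m[1], version: m[2], type: 'bot' };
  }
  return null; // not a known bot; fall through to normal parsing
}

console.log(detectBot('Googlebot/2.1 (+http://www.google.com/bot.html)'));
// { name: 'Googlebot', version: '2.1', type: 'bot' }
```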

@faisalman
Owner

I'm still considering how to include other non-browser agents (such as bots, apps, media players, libraries, CLIs, etc.) but could still offer them as optional, maybe using something like option (2).

To create an extension for option (1) without touching the existing code, you can already define your own regexes, which will be appended to the end of the selected list, and pass them when instantiating a new parser. Please refer to this example:

```js
var NAME = UAParser.BROWSER.NAME;
var VERSION = UAParser.BROWSER.VERSION;
var TYPE_BOT = ['type', 'bot'];
var botsRegExt = [
  // google, bing, msn
  [/((?:google|bing|msn)bot(?:\-[imagevdo]{5})?)\/([\w\.]+)/i], [NAME, VERSION, TYPE_BOT],
  // bing preview
  [/(bingpreview)\/([\w\.]+)/i], [NAME, VERSION, TYPE_BOT]
];

var agent1 = 'Googlebot-Video/1.0';
var agent2 = 'msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)';
var agent3 = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b';
var agent4 = 'Opera/8.5 (Macintosh; PPC Mac OS X; U; en)';

// try agent1
var parser = new UAParser(agent1, { browser: botsRegExt });
console.log(parser.getBrowser());   // {name: "Googlebot-Video", version: "1.0", type: "bot"}

// try agent2
parser.setUA(agent2);
console.log(parser.getBrowser());   // {name: "msnbot-media", version: "1.1", type: "bot"}

// try agent3
parser.setUA(agent3);
console.log(parser.getBrowser());   // {name: "BingPreview", version: "1.0b", type: "bot"}

// try agent4
parser.setUA(agent4);
console.log(parser.getBrowser());   // {name: "Opera", version: "8.5"}
```
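With an extension like the one above, a caller could classify results by checking the custom "type" field. A minimal sketch (the `isBot` helper is illustrative, not part of ua-parser-js; the `type` key only exists because the extension above defines it):

```js
// Sketch: classify a getBrowser() result produced with the bot extension.
// The "type" field is user-defined by the extension, not a built-in.
function isBot(browser) {
  return !!browser && browser.type === 'bot';
}

console.log(isBot({ name: 'Googlebot-Video', version: '1.0', type: 'bot' })); // true
console.log(isBot({ name: 'Opera', version: '8.5' }));                        // false
```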

@brianchirls

@faisalman Can you clarify please - do the extension regexes get added to the end or the beginning of the list? It could make a big difference for performance. Thanks.

@faisalman
Owner

At this moment, you can only add new regexes to the end of the list (see util.extend).
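The append-only merge described here can be sketched as a simple concatenation (an assumption based on this comment, not the actual util.extend source): built-in browser regexes are tried first, so bot patterns only run as a fallback when nothing else matches.

```js
// Sketch of an append-only extend: custom regexes go after the built-ins,
// so common browsers still match early and bot patterns act as a fallback.
function extend(defaultRegexes, extensionRegexes) {
  return defaultRegexes.concat(extensionRegexes);
}

var merged = extend(['builtin-1', 'builtin-2'], ['bot-1']);
console.log(merged); // [ 'builtin-1', 'builtin-2', 'bot-1' ]
```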

@faisalman faisalman reopened this Jul 1, 2017
@faisalman
Owner

faisalman commented Jul 1, 2017

Sorry for the misclick; reopening this issue.

@extensionsapp

+1

@Eliaxs1900

Eliaxs1900 commented Mar 26, 2019

I think this would be very useful for detecting bot browsers.
There is a library from biggora called express-useragent (the link goes to the npm repository).
I think it will help you with bot detection.
I tested it and it works very well with curl 👍
PS: this is the user agent: curl/7.55.1
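The curl user agent mentioned here has a very regular shape, so a pattern in the same style as the bot extension above would match it (an illustrative sketch, not a rule from express-useragent or ua-parser-js):

```js
// Sketch: matching the "curl/7.55.1" style of user agent string.
var CURL_RE = /(curl)\/([\w.]+)/i;

var m = CURL_RE.exec('curl/7.55.1');
console.log(m[1], m[2]); // curl 7.55.1
```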

@jimblue

jimblue commented Jun 22, 2019

Friendly ping 😄

@andrei-svistunou

Any updates?

@felixmeziere

felixmeziere commented Jul 27, 2021

Wish this existed in the library! :)

@everdrone

Another very friendly ping! Chiming in with curl, wget, requests, and scrapy.

@jaketrimble

FacebookBot

Mozilla/5.0 (compatible; FacebookBot/1.0; +https://developers.facebook.com/docs/sharing/webmasters/facebookbot/)
  • browser: FacebookBot 1.0
  • browser.name: FacebookBot
  • device: Desktop
  • device.family: Spider

faisalman added a commit that referenced this issue Aug 15, 2023
Axios: `axios/VERSION`
https://www.zenrows.com/blog/axios-user-agent#what-is-axios-user-agent

JSDOM: `Mozilla/5.0 (${process.platform || "unknown OS"}) AppleWebKit/537.36 (KHTML, like Gecko) jsdom/${jsdomVersion}`
https://github.com/jsdom/jsdom

Scrapy: `Scrapy/VERSION (+https://scrapy.org)`
https://docs.scrapy.org/en/master/topics/settings.html
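The user agents listed above could be covered with extension regexes in the same style as the earlier example. This is an illustrative sketch; these patterns are not the ones from the referenced commit:

```js
// Sketch: regexes for the library/CLI user agents listed above.
var CLIENT_REGEXES = [
  /(axios)\/([\w.]+)/i,   // axios/VERSION
  /(jsdom)\/([\w.]+)/i,   // ... AppleWebKit/537.36 (KHTML, like Gecko) jsdom/VERSION
  /(scrapy)\/([\w.]+)/i   // Scrapy/VERSION (+https://scrapy.org)
];

function matchClient(ua) {
  for (var i = 0; i < CLIENT_REGEXES.length; i++) {
    var m = CLIENT_REGEXES[i].exec(ua);
    if (m) return { name: m[1], version: m[2] };
  }
  return null;
}

console.log(matchClient('Scrapy/2.9.0 (+https://scrapy.org)'));
// { name: 'Scrapy', version: '2.9.0' }
```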