Add parser rules for bots #227
Haha, didn't think anyone was using my helper lib. Thanks for the heads-up, though.
+1
In general I think this would be an awesome addition to the library, since we currently handle bots using a sorted list (at the moment going up to 2160 entries) outside of / next to the ua-parser lib.

At the same time, though, I think we should NOT add the (vast amount of) bots to the parser itself, since I usually go this route: if ua-parser can identify it, it's (probably) a human; if not, it's a bot. Therefore I would refrain from adding them to the lib. Any other thoughts from you guys?

I can imagine the speed of UA recognition going downhill, but that's just an assumption without real data to work with (e.g. extending a forked ua-parser with bots to see how fast it recognizes bots / non-bots).
You make a great point @ebbmo. We don't want to bloat the size or the speed of the library with information that not everybody is going to use.

I think a good compromise would be to create a set of bot rules that could optionally be added as an extension. It might make sense in its own repo, or as a source file in this one that's only included optionally. However, you'd want the extension to be added at the end of the list, not the beginning. That way, in most cases you'd have a browser that would match earlier, so you'd only have to go through the longer list of bots in those rare cases with no browser match. This would require a change to the library to allow optionally adding extensions to the end of the regex list.
Very good idea @brianchirls.
Any other options?
I'm still considering how to include other non-browser agents (such as bots, apps, media players, libraries, CLIs, etc.), but they could still be offered as optional, maybe using something like option (2). For extensions along the lines of option (1), you can already do this without touching the existing code: define your own regexes, which will be appended to the end of the selected list, and pass them when instantiating a new parser. Please refer to this example:

```javascript
var NAME = UAParser.BROWSER.NAME;
var VERSION = UAParser.BROWSER.VERSION;
var TYPE_BOT = ['type', 'bot'];

var botsRegExt = [
  // google, bing, msn
  [/((?:google|bing|msn)bot(?:\-[imagevdo]{5})?)\/([\w\.]+)/i], [NAME, VERSION, TYPE_BOT],
  // bing preview
  [/(bingpreview)\/([\w\.]+)/i], [NAME, VERSION, TYPE_BOT]
];

var agent1 = 'Googlebot-Video/1.0';
var agent2 = 'msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)';
var agent3 = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b';
var agent4 = 'Opera/8.5 (Macintosh; PPC Mac OS X; U; en)';

// try agent1
var parser = new UAParser(agent1, { browser: botsRegExt });
console.log(parser.getBrowser()); // {name: "Googlebot-Video", version: "1.0", type: "bot"}

// try agent2
parser.setUA(agent2);
console.log(parser.getBrowser()); // {name: "msnbot-media", version: "1.1", type: "bot"}

// try agent3
parser.setUA(agent3);
console.log(parser.getBrowser()); // {name: "BingPreview", version: "1.0b", type: "bot"}

// try agent4
parser.setUA(agent4);
console.log(parser.getBrowser()); // {name: "Opera", version: "8.5"}
```
@faisalman Can you clarify, please: do the extension regexes get added to the end or the beginning of the list? It could make a big difference for performance. Thanks.
At the moment, you can only add new regexes to the end of the list (see
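To illustrate why end-appending matters for performance (as raised above): the default rules are tried first, so a common browser UA short-circuits before any bot rules are reached. A minimal first-match-wins sketch — the pattern lists here are illustrative stand-ins, not ua-parser-js internals:

```javascript
// Illustrative only: first-match-wins over a combined regex list.
var defaults = [/(opera)[\/ ]([\w.]+)/i];     // stand-in for the built-in browser rules
var extensions = [/(googlebot)\/([\w.]+)/i];  // stand-in for user-supplied bot rules

// Extensions go at the END, so defaults are always tried first.
var combined = defaults.concat(extensions);

function firstMatch(ua) {
  for (var i = 0; i < combined.length; i++) {
    var m = combined[i].exec(ua);
    if (m) return { name: m[1], version: m[2], matchedAt: i };
  }
  return null;
}

// A browser UA matches a default rule and never reaches the bot rules:
console.log(firstMatch('Opera/8.5 (Macintosh; PPC Mac OS X; U; en)').matchedAt); // 0
// A bot UA falls through to the appended rules:
console.log(firstMatch('Googlebot/2.1').matchedAt); // 1
```

With the extensions appended, a long bot list only costs extra scanning time for UAs that no browser rule matched.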
Sorry for the misclick, reopening this issue again |
+1 |
I think this would be very useful for detecting bot browsers.
Friendly ping 😄 |
Any updates? |
Wish this existed in the library! :)
Another very friendly ping! Chiming in with curl, wget, requests, and Scrapy.

Some known user-agent formats:

Axios: `axios/VERSION` (https://www.zenrows.com/blog/axios-user-agent#what-is-axios-user-agent)

JSDOM: `Mozilla/5.0 (${process.platform || "unknown OS"}) AppleWebKit/537.36 (KHTML, like Gecko) jsdom/${jsdomVersion}` (https://github.com/jsdom/jsdom)

Scrapy: `Scrapy/VERSION (+https://scrapy.org)` (https://docs.scrapy.org/en/master/topics/settings.html)
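For the agents listed above, a plain-regex sketch of how bot rules could look (this is not ua-parser-js code; the patterns and the `detectBot` helper are hypothetical, written in the same name/version capture style as the extension example earlier in the thread):

```javascript
// Hypothetical bot patterns for the HTTP clients mentioned above.
// Each regex captures (name)/(version), mirroring the extension example.
var botPatterns = [
  // curl/8.4.0, Wget/1.21.3
  /^(curl|wget)\/([\w.]+)/i,
  // axios/1.6.2
  /^(axios)\/([\w.]+)/i,
  // Scrapy/2.11.0 (+https://scrapy.org)
  /^(scrapy)\/([\w.]+)/i,
  // jsdom appends "jsdom/22.1.0" to a Mozilla-style UA, so no ^ anchor
  /(jsdom)\/([\w.]+)/i
];

function detectBot(ua) {
  for (var i = 0; i < botPatterns.length; i++) {
    var m = botPatterns[i].exec(ua);
    if (m) return { name: m[1], version: m[2], type: 'bot' };
  }
  return null; // no bot pattern matched
}

console.log(detectBot('curl/8.4.0'));
// { name: 'curl', version: '8.4.0', type: 'bot' }
console.log(detectBot('Scrapy/2.11.0 (+https://scrapy.org)'));
// { name: 'Scrapy', version: '2.11.0', type: 'bot' }
console.log(detectBot('Mozilla/5.0 (Windows NT 10.0) Chrome/120.0'));
// null
```

Packaged as `[regex], [NAME, VERSION, TYPE]` pairs, the same patterns could be passed as an extension list when instantiating the parser.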
Hi @faisalman! Thank you for putting out this very useful library! I'm wondering if you'd consider adding rules for bots as well, given that knowing about them is useful for server-side rendering, etc.
Here is the latest from Google and Bing for their bots, if that helps:
Google: https://support.google.com/webmasters/answer/1061943?hl=en
Bing: https://www.bing.com/webmaster/help/which-crawlers-does-bing-use-8c184ec0
Thank you!
cc: @rossnoble if it would then make sense to add to your helper library!