Merge branch 'master' into feature/AG-26623-4
scripthunter7 committed Oct 30, 2023
2 parents b3ebae5 + 4b1a799 commit a7202d7
Showing 52 changed files with 1,711 additions and 644 deletions.
18 changes: 18 additions & 0 deletions packages/css-tokenizer/CHANGELOG.md
@@ -0,0 +1,18 @@
# CSS Tokenizer Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog][keepachangelog], and this project adheres to [Semantic Versioning][semver].

## [0.0.1] - 2023-10-30

### Added

- Initial release.

[0.0.1]: https://github.com/AdguardTeam/tsurlfilter/releases/tag/css-tokenizer-v0.0.1
<!-- TODO: Link tag diff later -->
<!-- [0.0.2]: https://github.com/AdguardTeam/tsurlfilter/compare/css-tokenizer-v0.0.1...v0.0.2 -->

[keepachangelog]: https://keepachangelog.com/en/1.0.0/
[semver]: https://semver.org/spec/v2.0.0.html
83 changes: 65 additions & 18 deletions packages/css-tokenizer/README.md
@@ -1,3 +1,4 @@
<!-- omit in toc -->
# CSS / Extended CSS Tokenizer

[![npm-badge]][npm-url] [![install-size-badge]][install-size-url] [![license-badge]][license-url]
@@ -9,6 +10,28 @@ This library provides two distinct CSS tokenizers:
1. **Extended CSS Tokenizer**: Designed to extend the capabilities of the standard tokenizer, this component introduces
support for special pseudo-classes like `:contains()` and `:xpath()`.

Table of contents:

- [Installation](#installation)
- [Motivation](#motivation)
- [What is Extended CSS?](#what-is-extended-css)
- [Why do we need a custom tokenizer?](#why-do-we-need-a-custom-tokenizer)
- [The solution: Custom function handlers](#the-solution-custom-function-handlers)
- [No new token types](#no-new-token-types)
- [Example usage](#example-usage)
- [API](#api)
- [Benchmark results](#benchmark-results)
- [Ideas \& Questions](#ideas--questions)
- [License](#license)

## Installation

You can install the library using one of the following package managers:

- [Yarn][yarn-pkg-manager-url] (recommended): `yarn add @adguard/css-tokenizer`
- [NPM][npm-pkg-manager-url]: `npm install @adguard/css-tokenizer`
- [PNPM][pnpm-pkg-manager-url]: `pnpm add @adguard/css-tokenizer`

## Motivation

To appreciate the necessity for a custom tokenizer, it's essential to understand the concept of Extended CSS, recognize
@@ -17,21 +40,22 @@ the challenges it poses, and discover how we can effectively address these issue
### What is Extended CSS?

Extended CSS is a superset of CSS used by adblockers to provide more robust filtering capabilities. In practical terms,
Extended CSS introduces additional pseudo-classes that are not defined in the CSS specification. Notable examples
include `:contains()` and `:xpath()`:
Extended CSS introduces additional pseudo-classes that are not defined in the CSS specification. For more information,
please refer to the following resources:

- `:contains()`: Empowers the selection of elements based on specific text within their `innerText` property.
- `:xpath()`: Enables selection based on an [XPath expression][xpath-mdn].
<!--markdownlint-disable MD013-->
- <img src="https://cdn.adguard.com/website/github.com/AGLint/adg_logo.svg" width="14px"> [AdGuard: *Extended CSS capabilities*][adg-ext-css]
- <img src="https://cdn.adguard.com/website/github.com/AGLint/ubo_logo.svg" width="14px"> [uBlock Origin: *Procedural cosmetic filters*][ubo-procedural]
- <img src="https://cdn.adguard.com/website/github.com/AGLint/abp_logo.svg" width="14px"> [Adblock Plus: *Extended CSS selectors*][abp-ext-css]
<!--markdownlint-enable MD013-->

### Why do we need a custom tokenizer?

The standard CSS tokenizer cannot consistently handle these special pseudo-classes. Therefore, a custom tokenizer is
required to manage them correctly.

For example, the `:contains()` pseudo-class can have the following syntax:
The standard CSS tokenizer cannot handle Extended CSS's pseudo-classes *in every case*. For example, the `:contains()`
pseudo-class can have the following syntax:

```css
div:contains(aaa'bbb)
div:contains(i'm a parameter)
```
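
To make the hazard concrete, here is a minimal sketch of a toy scanner that applies two standard CSS rules: a quote opens a string, and `/*` opens a comment. It is only an illustration, not this library's tokenizer or a spec-complete implementation (it ignores escapes, for instance), and the name `findNaiveHazards` is hypothetical. It reports where a quote inside `:contains()` or a `/*` inside `:xpath()` would send a standard tokenizer off course:

```javascript
// Toy scanner (hypothetical helper, NOT part of @adguard/css-tokenizer).
// It mimics two standard CSS tokenization rules in a simplified way:
//   1. a quote starts a string that must be closed by the same quote
//      before the end of the line;
//   2. `/*` starts a comment that must be closed by `*/`.
// Escape sequences are deliberately ignored to keep the sketch short.
function findNaiveHazards(source) {
    const issues = [];
    let i = 0;

    while (i < source.length) {
        const ch = source[i];

        if (ch === "'" || ch === '"') {
            // Look for the matching closing quote on the same line
            let j = i + 1;
            while (j < source.length && source[j] !== ch && source[j] !== '\n') {
                j += 1;
            }
            if (j >= source.length || source[j] === '\n') {
                issues.push({ kind: 'unterminated-string', start: i });
                i = j;
            } else {
                i = j + 1;
            }
            continue;
        }

        if (ch === '/' && source[i + 1] === '*') {
            // Look for the end of the comment
            const close = source.indexOf('*/', i + 2);
            if (close === -1) {
                issues.push({ kind: 'unterminated-comment', start: i });
                i = source.length;
            } else {
                i = close + 2;
            }
            continue;
        }

        i += 1;
    }

    return issues;
}

// The quote in the `:contains()` argument is read as the start of a string
console.log(findNaiveHazards("div:contains(i'm a parameter)"));

// The `/*` in the `:xpath()` argument is read as the start of a comment
console.log(findNaiveHazards('div:xpath(//*...)'));
```

Ordinary CSS such as `div[title='ok'] { color: red; }` yields no issues in this sketch, which is why the problem only surfaces with certain Extended CSS arguments.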

A standard CSS tokenizer interprets the single quotation mark (`'`) as a string delimiter, causing an error due to the
@@ -43,7 +67,8 @@ The `:xpath()` pseudo-class poses a similar challenge for a standard CSS tokeniz
div:xpath(//*...)
```
A standard tokenizer mistakenly identifies the `/*` sequence as the start of a comment, leading to incorrect parsing.
A standard tokenizer mistakenly identifies the `/*` sequence as the start of a comment, leading to incorrect parsing,
even though the `/*` sequence is part of the [XPath expression][xpath-mdn].
## The solution: Custom function handlers
@@ -71,7 +96,7 @@ library maintains compatibility and consistency with CSS-related tools and workf
By preserving the standard CSS token types, we aim to provide users with a reliable and seamless experience while
working with CSS, upholding the integrity of the language as defined by the W3C.
## Example
## Example usage
Here's a straightforward example of how to use the library:
@@ -81,19 +106,30 @@ Here's a straightforward example of how to use the library:
const { tokenize, tokenizeExtended, getFormattedTokenName } = require('@adguard/css-tokenizer');
// Input to tokenize
const css = `div:contains(aa'bb) { display: none !important; }`;
const CSS_SOURCE = `div:contains(aa'bb) { display: none !important; }`;
const COLUMNS = Object.freeze({
    TOKEN: 'Token',
    START: 'Start',
    END: 'End',
    FRAGMENT: 'Fragment'
});
// Prepare table
const rows = [];
rows.push(['Token', 'Start', 'End', 'Fragment']);
// Prepare the data array
const data = [];
// Tokenize the input - feel free to try `tokenize` and `tokenizeExtended`
tokenizeExtended(css, (token, start, end) => {
    rows.push([getFormattedTokenName(token), start, end, css.substring(start, end)]);
tokenizeExtended(CSS_SOURCE, (token, start, end) => {
    data.push({
        [COLUMNS.TOKEN]: getFormattedTokenName(token),
        [COLUMNS.START]: start,
        [COLUMNS.END]: end,
        [COLUMNS.FRAGMENT]: CSS_SOURCE.substring(start, end),
    });
});
// Print the tokenization result as a table
console.table(rows);
console.table(data, Object.values(COLUMNS));
```
## API
@@ -122,6 +158,10 @@ manage pseudo-classes and have only one argument: the shared tokenizer context o
> plan to integrate this library into CSSTree via our [ECSSTree library][ecss-tree-repo], see
> [this issue][css-tree-issue] for more details.
## Benchmark results
You can find the benchmark results in the [benchmark/RESULTS.md][benchmark-results] file.
## Ideas & Questions
If you have any questions or ideas for new features, please [open an issue][new-issue-url] or a
@@ -131,6 +171,9 @@ If you have any questions or ideas for new features, please [open an issue][new-
This project is licensed under the MIT license. See the [LICENSE][license-url] file for details.
[abp-ext-css]: https://help.eyeo.com/adblockplus/how-to-write-filters#elemhide-emulation
[adg-ext-css]: https://github.com/AdguardTeam/ExtendedCss/blob/master/README.md
[benchmark-results]: https://github.com/AdguardTeam/tsurlfilter/blob/master/packages/css-tokenizer/benchmark/RESULTS.md
[css-syntax]: https://www.w3.org/TR/css-syntax-3/
[css-tree-issue]: https://github.com/csstree/csstree/issues/253
[css-tree-repo]: https://github.com/csstree/csstree
@@ -142,5 +185,9 @@ This project is licensed under the MIT license. See the [LICENSE][license-url] f
[license-url]: https://github.com/AdguardTeam/tsurlfilter/blob/master/packages/css-tokenizer/LICENSE
[new-issue-url]: https://github.com/AdguardTeam/tsurlfilter/issues/new
[npm-badge]: https://img.shields.io/npm/v/@adguard/css-tokenizer
[npm-pkg-manager-url]: https://www.npmjs.com/get-npm
[npm-url]: https://www.npmjs.com/package/@adguard/css-tokenizer
[pnpm-pkg-manager-url]: https://pnpm.js.org/
[ubo-procedural]: https://github.com/gorhill/uBlock/wiki/Procedural-cosmetic-filters
[xpath-mdn]: https://developer.mozilla.org/en-US/docs/Web/XPath
[yarn-pkg-manager-url]: https://yarnpkg.com/en/docs/install
10 changes: 10 additions & 0 deletions packages/css-tokenizer/benchmark/.eslintrc.js
@@ -0,0 +1,10 @@
/**
 * @file ESLint configuration for the benchmark folder.
 */

module.exports = {
    rules: {
        'import/no-extraneous-dependencies': 'off',
        '@typescript-eslint/no-loop-func': 'off',
    },
};
34 changes: 31 additions & 3 deletions packages/css-tokenizer/benchmark/README.md
@@ -1,13 +1,41 @@
# CSS Tokenizer benchmark

This benchmark is used to compare the performance of the CSS Tokenizers.
This benchmark serves as a tool for evaluating the performance of CSS Tokenizers.

The benchmark results can be found in [`benchmark/RESULTS.md`][results].

## Usage

Simply run the following command to run the benchmark:
To run the benchmark, simply execute the following command:

```sh
yarn benchmark
```

This will run the build for the library and then run the benchmark.
This command will build the library and initiate the benchmark. The results will be displayed on the console and saved
in [`benchmark/RESULTS.md`][results].

> [!NOTE]
> Please be aware that the benchmark may take several minutes to complete.
## Supported tokenizers

You can find the list of supported tokenizers in [`config/tokenizers.ts`][tokenizers-config].

We exclusively support tokenizers that adhere to the [CSS Syntax specification][css-specs]. For example, PostCSS is not
included in this benchmark because it utilizes a custom token set, making it difficult to perform a fair comparison with
other tokenizers.

## Adding a new tokenizer / resource

To incorporate a new tokenizer or resource, follow these steps:

1. Open the appropriate configuration file:
   - To add a new tokenizer, edit [`config/tokenizers.ts`][tokenizers-config].
   - To add a new resource, edit [`config/resources.ts`][resources-config].
2. Create a new entry, ensuring that it follows the same format as existing entries in the respective file.

[css-specs]: https://www.w3.org/TR/css-syntax-3/
[resources-config]: ./config/resources.ts
[results]: ./RESULTS.md
[tokenizers-config]: ./config/tokenizers.ts