Update Rules file

Kevin R
2020-08-20 22:58:51 +00:00
parent 5be86c5d12
commit ab46b23422

@@ -1,9 +1,9 @@
At the moment ClearURLs knows three types of rules files. The first and the oldest one is the **[data.json](#datajson)** file. The second one is the **[data.min.json](#dataminjson)** file and the third one is the **[data.minify.json](#dataminifyjson)**. The differences are described below.
At the moment ClearURLs know three types of rules files. The first and the oldest one is the **[data.json](#datajson)** file. The second one is the **[data.min.json](#dataminjson)** file and the third one is the **[data.minify.json](#dataminifyjson)**. The differences are described below.
*Note: If you want to implement one of these files into your project and always retrieve the current files from the repository, please use the GitLab Pages URL to access the files to preserve the GitLab infrastructure. The files are available at https://kevinroebert.gitlab.io/ClearUrls/data/$fileName$.*
# data.json
The **data.json** file saves all rules, exceptions and redirections, that are maintained by the ClearURLs developers and the community. This file is hosted at the GitLab repository of ClearURLs and is saved with an additional hash file, named **[rules.hash](#ruleshash)**, to ensure, that the rules are not faulty after download.
The **data.json** file saves all rules, exceptions, and redirections, that are maintained by the ClearURLs developers and the community. This file is hosted at the GitLab repository of ClearURLs and is saved with an additional hash file, named **[rules.hash](#ruleshash)**, to ensure, that the rules are not faulty after download.
### Structure
@@ -12,16 +12,16 @@ Example of a **data.json** file:
{
"providers": {
"providers name": {
"urlPattern": "(https:\\/\\/|http:\\/\\/)([a-zA-Z0-9-]*\\.)?(domain name)(\\.[a-zA-Z]{2,})(.*\\?.*)",
"urlPattern": "^https?://(?:[a-z0-9-]+\\.)*?domainName\\.com",
"completeProvider": false,
"rules": [
"trackingField=[^\\/|\\?|&]*(\\/|&(amp;)?)?",
],
"exceptions": [
".*(domain name\\.).*(\\/re).*\\/redirector.html\\/.*"
"^https?://(?:[a-z0-9-]+\\.)*?domainName\\.com/re.*/redirector.html/"
],
"redirections": [
".*domain name\\..*\\/.*url\\?.*url=((https%3A%2F%2F|http%3A%2F%2F)[^&]*)"
"^https?://(?:[a-z0-9-]+\\.)*?domainName\\.com.*url\\?.*url=([^&]+)"
]
}
}
@@ -31,57 +31,57 @@ Example of a **data.json** file:
The **data.json** file is a typical JSON file. It is structured in **providers**. Providers are companies or website names, e.g. Google or Reddit. Every provider has the parent element **providers**, it holds the reference to all providers.
Every provider has the following five properties: **urlPattern**, **completeProvider**, **rules**, **exceptions** and **redirections**. The properties **urlPattern** and **completeProvider** must be set. The other properties are optional.
Every provider has the following five properties: **urlPattern**, **completeProvider**, **rules**, **exceptions**, and **redirections**. The properties of **urlPattern** and **completeProvider** must be set. The other properties are optional.
The **urlPattern** is a regular expression, that must match every URL that should be effected by the specified rules, exceptions or redirections of the provider. The typical structure is `<protocols><any subdomains><domain name><TLDs><any directories><?><any fields>`. The question mark expression (`<?>`) is useful, because typically we only want to work with URLs, that have fields.
In most cases the following expression covers all the needs, you only have to substitute the remaining `<>` fields: `(https:\\/\\/|http:\\/\\/)([a-zA-Z0-9-]*\\.)?(<domain name>)(\\.<TLD>)(.*\\?.*)`. If you want to match with every TLD, you can substitute the TLD field with `[a-zA-Z]{2,}`.
The **urlPattern** is a regular expression, that must match every URL that should be affected by the specified rules, exceptions, or redirections of the provider. The typical structure is `<protocols><any subdomains><domain name><TLDs><any directories>?<any fields>`.
In most cases the following expression covers all the needs, you only have to substitute the remaining `<>` fields: `^https?://(?:[a-z0-9-]+\\.)*?<domain name>\\.<TLD>`. If you want to match with every TLD, you can substitute the TLD field with `(?:[a-z]{2,}){1,}`.
The **completeProvider** is a boolean, that determines if every url, that matched the **urlPattern** will blocked. If you want to specify rules, exceptions and/or redirections, the value of **completeProvider** must be `false`.
The **completeProvider** is a boolean, that determines if every url, that matched the **urlPattern** will be blocked. If you want to specify rules, exceptions, and/or redirections, the value of **completeProvider** must be `false`.
The **rules** propertie is a JSON-Array. Ever element in this array is a regular expression, that matched a field. If ClearURLs found a field, that matched a rule, in a given URL, the field will deleted. Only URLs that matches the **url Pattern** will be checked for matching fields.
The **rules** property is a JSON-Array. Every element in this array is a regular expression, that matched a field. If ClearURLs found a field, that matched a rule, in a given URL, the field will be deleted. Only URLs that match the **url Pattern** will be checked for matching fields.
The **exceptions** propertie is also a JSON-Array. Every element in this array is a regular expression, that matched an URL. If ClearURLs found an URL, that matched an exception, then no futher processing on this URL is done.
The **exceptions** property is also a JSON-Array. Every element in this array is a regular expression, that matched an URL. If ClearURLs found an URL, that matched an exception, then no further processing on this URL is done.
The **redirections** propertie is also a JSON-Array. Every element in this array is a regular expression, that matched an URL.
But the regular expression must follow a specification. The first regular expression group (https://www.regular-expressions.info/brackets.html) must be the URL that shall be redirected. If ClearURLs find an URL that matched the redirection, than ClearURLs will `decodeURIComponent()` the first matching group and call this URL.
The **redirections** property is also a JSON-Array. Every element in this array is a regular expression, that matched an URL.
But the regular expression must follow a specification. The first regular expression group (https://www.regular-expressions.info/brackets.html) must be the URL that shall be redirected. If ClearURLs find an URL that matched the redirection than ClearURLs will `decodeURIComponent()` the first matching group and call this URL.
### rules.hash
The rules.hash file saves a SHA256 hash of the **data.json** file. It is used to ensure, that the rules are not faulty after download. **Note that the hash must be in lowercase.**
# data.min.json
The data.min.json file is mostly equals to the data.json file. But with the ClearURLs Version 1.5a, every rule is automatically expanded by the ClearURLs core. So it is no longer nessesary, to write the stuff, that is on every rule equals, into the data.min.json file.
The data.min.json file mostly equals to the data.json file. But with the ClearURLs Version 1.5a, every rule is automatically expanded by the ClearURLs core. So it is no longer necessary, to write the stuff, that is on every rule equals, into the data.min.json file.
From the version 1.5a, every rule has the following structure: `([\\/|\\?]|(&|&amp;))(<field name>=[^\\/|\\?|&]*)`. The only difference between the rules, are the **field names**. The **field names** can still be regular expressions, but they don't have to handle the "well formed field check".
From the version 1.5a, every rule has the following structure: `(?:&amp;|[/?#&])(?:<field name>=[^&]*)`. The only differences between the rules are the **field names**. The **field names** can still be regular expressions, but they don't have to handle the "well-formed field check".
**rawRules**: Because there are other elements in a URL that should be filtered, besides "well formed fields", there is also the **rawRules** property. Regular expressions formulated in this property can refer to the entire URL and not just to individual fields. They are therefore applied directly to the URL.
**rawRules**: Because other elements in a URL that should be filtered, besides "well-formed fields", there is also the **rawRules** property. Regular expressions formulated in this property can refer to the entire URL and not just to individual fields. They are therefore applied directly to the URL.
**referralMarketing**: Since version 1.9.0 you can allow **referral marketing** in ClearURLs. Previously these fields were always removed. Since some people want to support other people through referral marketing, referral marketing can be allowed in the settings. If referral marketing is allowed, the regular expressions in this array are no longer filtered. **By default referral marketing is not allowed**.
**forceRedirection**: Some websites, such as Google, have manipulated URLs (`<a>` tags) in such a way that the URL is no longer called normally, but via a built-in script. The result is that a simple redirect of the URL does no longer works. So that the URL can still be forwarded by ClearURLs, there is the property **forceRedirection**, which writes the URL into the main_frame object of the browser.
Every other propertie is equals to the **data.json** file.
Every other property equals to the **data.json** file.
The example from the **data.json** file looks like the following as a **data.min.json** file:
```json
{
"providers": {
"providers name": {
"urlPattern": "(https:\\/\\/|http:\\/\\/)([a-zA-Z0-9-]*\\.)?(domain name)(\\.[a-zA-Z]{2,})(.*\\?.*)",
"urlPattern": "^https?://(?:[a-z0-9-]+\\.)*?domainName\\.com",
"completeProvider": false,
"rules": [
"trackingField",
],
"rawRules": [
"\\/ref=[^\\/|\\?]*"
"/ref=[^/|?]*"
],
"referralMarketing": [
"([\\/|\\?|#]|(&|&amp;))+(tag=[^\\/|\\?|&]*)"
"(?:&amp;|[/?#&])(?:tag=[^/|?|&]*)"
],
"exceptions": [
".*(domain name\\.).*(\\/re).*\\/redirector.html\\/.*"
"^https?://(?:[a-z0-9-]+\\.)*?domainName\\.com/re.*/redirector.html/"
],
"redirections": [
".*domain name\\..*\\/.*url\\?.*url=((https%3A%2F%2F|http%3A%2F%2F)[^&]*)"
"^https?://(?:[a-z0-9-]+\\.)*?domainName\\.com.*url\\?.*url=([^&]+)"
],
"forceRedirection": false
}
@@ -89,10 +89,10 @@ The example from the **data.json** file looks like the following as a **data.min
}
```
It is **highly recommended** to use the **data.min.json** file, because this file will be used by all new versions of ClearURLs.
It is **highly recommended** to use the **data.min.json** file because this file will be used by all new versions of ClearURLs.
### rules.min.hash
For the **data.min.json** file, the GitLab CI-Runner automatically generates the nessesary [**rules.min.hash** file](https://gitlab.com/KevinRoebert/ClearUrls/-/jobs/artifacts/master/raw/rules.min.hash?job=hash%20rules). **Note that the hash must be in lowercase.**
For the **data.min.json** file, the GitLab CI-Runner automatically generates the necessary [**rules.min.hash** file](https://gitlab.com/KevinRoebert/ClearUrls/-/jobs/artifacts/master/raw/rules.min.hash?job=hash%20rules). **Note that the hash must be in lowercase.**
# data.minify.json
The [**data.minify.json**](https://gitlab.com/KevinRoebert/ClearUrls/-/jobs/artifacts/master/raw/data.minify.json?job=hash%20rules) file is automatically generated by the GitLab CI-Runner from the **data.min.json** file and also the nessesary [**rules.minify.hash** file](https://gitlab.com/KevinRoebert/ClearUrls/-/jobs/artifacts/master/raw/rules.minify.hash?job=hash%20rules). The **data.minify.json** is a minimal version of the **data.min.json** file where all line breaks and spaces, as well as default values and empty lists are removed, to save bandwidth.
The [**data.minify.json**](https://gitlab.com/KevinRoebert/ClearUrls/-/jobs/artifacts/master/raw/data.minify.json?job=hash%20rules) file is automatically generated by the GitLab CI-Runner from the **data.min.json** file and also the necessary [**rules.minify.hash** file](https://gitlab.com/KevinRoebert/ClearUrls/-/jobs/artifacts/master/raw/rules.minify.hash?job=hash%20rules). The **data.minify.json** is a minimal version of the **data.min.json** file where all line breaks and spaces, as well as default values and empty lists, are removed, to save bandwidth.