Allow filtering by remote addresses (#52)

* Added the possibility to define rules for remote addresses

* Added change in changelog

* Added check for X-Real-Ip and X-Forwarded-For when checking for remote address filtering

* cmd/anubis: refine IP filtering logic

* Optimize the configuration so that the IP trie is created once at
  application start instead of dynamically being created every request.
* Document the changes in the changelog and docs site.
* Allow pure IP range filtering.
* Allow user agent based IP range filtering.
* Allow path based IP range filtering.
* Create --debug-x-real-ip-default flag for testing Anubis locally
  without a HTTP load balancer.

---------

Co-authored-by: Xe Iaso <me@xeiaso.net>
This commit is contained in:
Remilia Da Costa Faro 2025-03-21 20:39:34 +01:00 committed by GitHub
parent e7b9b17b92
commit d6d879133e
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
11 changed files with 554 additions and 27 deletions

View file

@ -40,6 +40,26 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [KagiBot](https://kagi.com/bot) is allowed through the filter [#44](https://github.com/TecharoHQ/anubis/pull/44)
- Fixed hang when navigator.hardwareConcurrency is undefined
- Support Unix domain sockets [#45](https://github.com/TecharoHQ/anubis/pull/45)
- Allow filtering by remote addresses:
```json
{
"name": "qwantbot",
"user_agent_regex": "\\+https\\:\\/\\/help\\.qwant\\.com/bot/",
"action": "ALLOW",
"remote_addresses": ["91.242.162.0/24"]
}
```
This also works at an IP range level:
```json
{
"name": "internal-network",
"action": "ALLOW",
"remote_addresses": ["100.64.0.0/10"]
}
```
## 1.13.0

View file

@ -68,6 +68,8 @@ There are three actions that can be returned from a rule:
Name your rules in lower case using kebab-case. Rule names will be exposed in Prometheus metrics.
### Challenge configuration
Rules can also have their own challenge settings. These are customized using the `"challenge"` key. For example, here is a rule that makes challenges artificially hard for connections with the substring "bot" in their user agent:
```json
@ -91,6 +93,33 @@ Challenges can be configured with these settings:
| `report_as` | `4` | What difficulty the UI should report to the user. Useful for messing with industrial-scale scraping efforts. |
| `algorithm` | `"fast"` | The algorithm used on the client to run proof-of-work calculations. This must be set to `"fast"` or `"slow"`. See [Proof-of-Work Algorithm Selection](./algorithm-selection) for more details. |
### Remote IP based filtering
The `remote_addresses` field of a Bot rule allows you to set the IP range that this ruleset applies to.
For example, you can allow a search engine to connect if and only if its IP address matches the ones they published:
```json
{
"name": "qwantbot",
"user_agent_regex": "\\+https\\:\\/\\/help\\.qwant\\.com/bot/",
"action": "ALLOW",
"remote_addresses": ["91.242.162.0/24"]
}
```
This also works at an IP range level without any other checks:
```json
{
"name": "internal-network",
"action": "ALLOW",
"remote_addresses": ["100.64.0.0/10"]
}
```
## Risk calculation for downstream services
In case your service needs it for risk calculation reasons, Anubis exposes information about the rules that any requests match using a few headers:
| Header | Explanation | Example |