add docs site based on Docusaurus (#35)
* add docs site based on docusarus (closes #2)
* docs: deploy to aeacus
* ready for merge
* docs: fix anubis port

Signed-off-by: Xe Iaso <me@xeiaso.net>
Parent: 240159e921
Commit: c47347ff76
46 changed files with 20879 additions and 284 deletions
docs/docs/CHANGELOG.md (new file, 31 lines)

---
sidebar_position: 999
---

# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

- Documentation has been moved to https://anubis.techaro.lol/ with sources in docs/

## 1.13.0

- Proof-of-work challenges are drastically sped up [#19](https://github.com/TecharoHQ/anubis/pull/19)
- Docker images are now built with the timestamp set to the commit timestamp
- The README now points to TecharoHQ/anubis instead of Xe/x
- Images are built using ko instead of `docker buildx build` [#13](https://github.com/TecharoHQ/anubis/pull/13)

## 1.12.1

- Phrasing in the `<noscript>` warning was replaced from its original placeholder text to something more suitable for general consumption ([fd6903a](https://github.com/TecharoHQ/anubis/commit/fd6903aeed315b8fddee32890d7458a9271e4798)).
- Footer links on the check page now point to Techaro's brand ([4ebccb1](https://github.com/TecharoHQ/anubis/commit/4ebccb197ec20d024328d7f92cad39bbbe4d6359)).
- Anubis was imported from [Xe/x](https://github.com/Xe/x).
docs/docs/admin/_category_.json (new file, 8 lines)

```json
{
  "label": "Administrative guides",
  "position": 40,
  "link": {
    "type": "generated-index",
    "description": "Tradeoffs and considerations you may want to keep in mind when using Anubis."
  }
}
```
docs/docs/admin/caveats-gitea-forgejo.mdx (new file, 34 lines)

---
title: When using Caddy with Gitea/Forgejo
---

Gitea/Forgejo rely on the reverse proxy setting the `X-Real-Ip` header. Caddy does not do this out of the box. Modify your Caddyfile like this:

```python
ellenjoe.int.within.lgbt {
    # ...
    # diff-remove
    reverse_proxy http://localhost:3000
    # diff-add
    reverse_proxy http://localhost:3000 {
        # diff-add
        header_up X-Real-Ip {remote_host}
    # diff-add
    }
    # ...
}
```

Ensure that Gitea/Forgejo have `[security].REVERSE_PROXY_TRUSTED_PROXIES` set to the IP ranges that Anubis will appear from. Typically this is sufficient:

```ini
[security]
REVERSE_PROXY_TRUSTED_PROXIES = 127.0.0.0/8,::1/128
```

However, if you are running Anubis in a separate Pod/Deployment in Kubernetes, you may have to adjust this to the IP range of the Pod space in your Container Networking Interface plugin:

```ini
[security]
REVERSE_PROXY_TRUSTED_PROXIES = 10.192.0.0/12
```
docs/docs/design/_category_.json (new file, 8 lines)

```json
{
  "label": "Design",
  "position": 10,
  "link": {
    "type": "generated-index",
    "description": "How Anubis is designed and the tradeoffs it makes."
  }
}
```
docs/docs/design/how-anubis-works.mdx (new file, 121 lines)

---
sidebar_position: 1
title: How Anubis works
---

Anubis uses a proof-of-work challenge to ensure that clients are using a modern browser and are able to calculate SHA-256 checksums. The difficulty of this proof-of-work challenge is customizable, but it defaults to 5 leading zeroes.

```mermaid
---
title: Challenge generation and validation
---

flowchart TD
    Backend("Backend")
    Fail("Fail")

    style PresentChallenge color:#FFFFFF, fill:#AA00FF, stroke:#AA00FF
    style ValidateChallenge color:#FFFFFF, fill:#AA00FF, stroke:#AA00FF
    style Backend color:#FFFFFF, stroke:#00C853, fill:#00C853
    style Fail color:#FFFFFF, stroke:#FF2962, fill:#FF2962

    subgraph Server
        PresentChallenge("Present Challenge")
        ValidateChallenge("Validate Challenge")
    end

    subgraph Client
        Main("main.mjs")
        Worker("Worker")
    end

    Main -- Request challenge --> PresentChallenge
    PresentChallenge -- Return challenge & difficulty --> Main
    Main -- Spawn worker --> Worker
    Worker -- Successful challenge --> Main
    Main -- Validate challenge --> ValidateChallenge
    ValidateChallenge -- Return cookie --> Backend
    ValidateChallenge -- If anything is wrong --> Fail
```

### Challenge presentation

Anubis decides to present a challenge using this logic:

- The User-Agent header contains `"Mozilla"`
- The request path is not in `/.well-known`, `/robots.txt`, or `/favicon.ico`
- The request path is not obviously an RSS feed (ends with `.rss`, `.xml`, or `.atom`)

This should ensure that git clients, RSS readers, and other low-harm clients can get through without issue, while high-risk clients such as browsers and AI scraper bots have to pass the challenge first.

```mermaid
---
title: Challenge presentation logic
---

flowchart LR
    Request("Request")
    Backend("Backend")
    %%Fail("Fail")
    PresentChallenge("Present challenge")
    HasMozilla{"Is browser or scraper?"}
    HasCookie{"Has cookie?"}
    HasExpired{"Cookie expired?"}
    HasSignature{"Has valid signature?"}
    RandomJitter{"Secondary screening?"}
    POWPass{"Proof of work valid?"}

    style PresentChallenge color:#FFFFFF, fill:#AA00FF, stroke:#AA00FF
    style Backend color:#FFFFFF, stroke:#00C853, fill:#00C853
    %%style Fail color:#FFFFFF, stroke:#FF2962, fill:#FF2962

    Request --> HasMozilla
    HasMozilla -- Yes --> HasCookie
    HasMozilla -- No --> Backend
    HasCookie -- Yes --> HasExpired
    HasCookie -- No --> PresentChallenge
    HasExpired -- Yes --> PresentChallenge
    HasExpired -- No --> HasSignature
    HasSignature -- Yes --> RandomJitter
    HasSignature -- No --> PresentChallenge
    RandomJitter -- Yes --> POWPass
    RandomJitter -- No --> Backend
    POWPass -- Yes --> Backend
    POWPass -- No --> PresentChallenge
    PresentChallenge -- Back again for another cycle --> Request
```

### Proof of passing challenges

When a client passes a challenge, Anubis sets an HTTP cookie named `"within.website-x-cmd-anubis-auth"` containing a signed [JWT](https://jwt.io/) (JSON Web Token). This JWT contains the following claims:

- `challenge`: The challenge string derived from user request metadata
- `nonce`: The nonce / iteration number used to generate the passing response
- `response`: The hash that passed Anubis' checks
- `iat`: When the token was issued
- `nbf`: One minute prior to when the token was issued
- `exp`: The token's expiry, one week after the token was issued

This gives the token enough metadata to prove that it is valid (due to the token's signature), and lets the server independently verify that validity. This cookie is allowed to be set without triggering an EU cookie banner notification; but depending on facts and circumstances, you may wish to disclose this to your users.

### Challenge format

Challenges are formed by taking some user request metadata and using it to generate a SHA-256 checksum. The following inputs are used:

- `Accept-Encoding`: The content encodings that the requestor supports, such as gzip.
- `Accept-Language`: The language that the requestor would prefer the server respond in, such as English.
- `X-Real-Ip`: The IP address of the requestor, as set by a reverse proxy server.
- `User-Agent`: The user agent string of the requestor.
- The current time in UTC, rounded to the nearest week.
- The fingerprint (checksum) of Anubis' private ed25519 key.

This forms a fingerprint of the requestor using metadata that any requestor is already sending. It also uses time as an input, which is known to both the server and the requestor due to the nature of linear timelines. Depending on facts and circumstances, you may wish to disclose this to your users.
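A sketch of this derivation in Go follows. The concatenation order, the week rounding, and the function name here are assumptions for illustration; the authoritative derivation lives in Anubis' source.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"
)

// challengeFor hashes request metadata plus the current week into a
// challenge string. Stable for a given client within a week, then rotates.
func challengeFor(acceptEncoding, acceptLanguage, realIP, userAgent, keyFingerprint string) string {
	// Round time down to a whole week so the challenge rotates weekly.
	week := time.Now().UTC().Truncate(7 * 24 * time.Hour).Format("2006-01-02")
	data := acceptEncoding + acceptLanguage + realIP + userAgent + week + keyFingerprint
	sum := sha256.Sum256([]byte(data))
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(challengeFor("gzip", "en-US", "203.0.113.7", "Mozilla/5.0", "deadbeef"))
}
```

Because every input is either already in the request or known to both sides (the time), the server can recompute the challenge later without storing any per-client state.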

### JWT signing

Anubis uses an ed25519 keypair to sign the JWTs issued when challenges are passed. Anubis generates a new ed25519 keypair every time it starts. At this time, there is no way to share this keypair between instances of Anubis, but that will be addressed in future versions.

docs/docs/index.mdx (new file, 28 lines)

---
sidebar_position: 1
title: Anubis
---

<img
  width={256}
  src="/img/happy.webp"
  alt="A smiling chibi dark-skinned anthro jackal with brown hair and tall ears looking victorious with a thumbs-up"
/>






Anubis [weighs the soul of your connection](https://en.wikipedia.org/wiki/Weighing_of_souls) using a SHA-256 proof-of-work challenge in order to protect upstream resources from scraper bots.

This program is designed to help protect the small internet from the endless storm of requests that flood in from AI companies. Anubis is as lightweight as possible to ensure that everyone can afford to protect the communities closest to them.

Anubis is a bit of a nuclear response: it will block your website from smaller scrapers and may inhibit "good bots" like the Internet Archive. You can configure [bot policy definitions](./policies) to explicitly allowlist them, and we are working on a curated set of "known good" bots to allow for a compromise between discoverability and uptime.

## Support

If you run into any issues running Anubis, please [open an issue](https://github.com/TecharoHQ/anubis/issues/new?template=Blank+issue) and tag it with the Anubis tag. Please include all the information I would need to diagnose your issue.

For live chat, please join the [Patreon](https://patreon.com/cadey) and ask in the Patron Discord in the channel `#anubis`.
docs/docs/installation.mdx (new file, 135 lines)

---
sidebar_position: 20
title: Setting up Anubis
---

Anubis is meant to sit between your reverse proxy (such as Nginx or Caddy) and your target service. One instance of Anubis must be used per service you are protecting.

Anubis is shipped in the Docker repo [`ghcr.io/techarohq/anubis`](https://github.com/TecharoHQ/anubis/pkgs/container/anubis). The following tags exist for your convenience:

| Tag                 | Meaning                                                                                                                             |
| :------------------ | :---------------------------------------------------------------------------------------------------------------------------------- |
| `latest`            | The latest [tagged release](https://github.com/TecharoHQ/anubis/releases). If you are in doubt, start here.                          |
| `v<version number>` | The Anubis image for [any given tagged release](https://github.com/TecharoHQ/anubis/tags).                                           |
| `main`              | The current build on the `main` branch. Only use this if you need the latest and greatest features as they are merged into `main`.   |
| `pr-<number>`       | The build associated with PR `#<number>`. Only use this for debugging issues fixed by a PR.                                           |

Other methods to install Anubis may exist, but the Docker image is currently the only supported method.

The Docker image runs Anubis as user ID 1000 and group ID 1000. If you are mounting external volumes into Anubis' container, please be sure they are owned by or writable to this user/group.

Anubis has very minimal system requirements. I suspect that 128Mi of RAM may be sufficient for a large number of concurrent clients. Anubis may be a poor fit for apps that use WebSockets and maintain open connections, but I don't have enough real-world experience to know one way or another.

Anubis uses these environment variables for configuration:

| Environment Variable | Default value              | Explanation                                                                                                                                                                                                                                                                               |
| :------------------- | :------------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `BIND`               | `:8923`                    | The TCP port that Anubis listens on.                                                                                                                                                                                                                                                       |
| `DIFFICULTY`         | `5`                        | The difficulty of the challenge, i.e. the number of leading zeroes that must be in successful responses.                                                                                                                                                                                    |
| `METRICS_BIND`       | `:9090`                    | The TCP port that Anubis serves Prometheus metrics on.                                                                                                                                                                                                                                     |
| `POLICY_FNAME`       | `/data/cfg/botPolicy.json` | The file containing [bot policy configuration](./policies.md). See the bot policy documentation for more details.                                                                                                                                                                          |
| `SERVE_ROBOTS_TXT`   | `false`                    | If set to `true`, Anubis will serve a default `robots.txt` file that disallows all known AI scrapers by name and then additionally disallows every scraper. This is useful if facts and circumstances make it difficult to change the underlying service to serve such a `robots.txt` file. |
| `TARGET`             | `http://localhost:3923`    | The URL of the service that Anubis should forward valid requests to.                                                                                                                                                                                                                       |

## Docker compose

Add Anubis to your compose file pointed at your service:

```yaml
services:
  anubis-nginx:
    image: ghcr.io/techarohq/anubis:latest
    environment:
      BIND: ":8080"
      DIFFICULTY: "5"
      METRICS_BIND: ":9090"
      SERVE_ROBOTS_TXT: "true"
      TARGET: "http://nginx"
    ports:
      - 8080:8080
    volumes:
      # Anubis reads its bot policy from POLICY_FNAME
      # (default /data/cfg/botPolicy.json)
      - "./botPolicy.json:/data/cfg/botPolicy.json"
  nginx:
    image: nginx
    volumes:
      - "./www:/usr/share/nginx/html"
```

## Kubernetes

This example makes the following assumptions:

- Your target service is listening on TCP port `5000`.
- Anubis will be listening on port `8080`.

Attach Anubis to your Deployment:

```yaml
containers:
  # ...
  - name: anubis
    image: ghcr.io/techarohq/anubis:latest
    imagePullPolicy: Always
    env:
      - name: "BIND"
        value: ":8080"
      - name: "DIFFICULTY"
        value: "5"
      - name: "METRICS_BIND"
        value: ":9090"
      - name: "SERVE_ROBOTS_TXT"
        value: "true"
      - name: "TARGET"
        value: "http://localhost:5000"
    resources:
      limits:
        cpu: 500m
        memory: 128Mi
      requests:
        cpu: 250m
        memory: 128Mi
    securityContext:
      runAsUser: 1000
      runAsGroup: 1000
      runAsNonRoot: true
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
      seccompProfile:
        type: RuntimeDefault
```

Then add a Service entry for Anubis:

```yaml
# ...
spec:
  ports:
    # diff-add
    - protocol: TCP
      # diff-add
      port: 8080
      # diff-add
      targetPort: 8080
      # diff-add
      name: anubis
```

Then point your Ingress to the Anubis port:

```yaml
rules:
  - host: git.xeserv.us
    http:
      paths:
        - pathType: Prefix
          path: "/"
          backend:
            service:
              name: git
              port:
                # diff-remove
                name: http
                # diff-add
                name: anubis
```
docs/docs/policies.md (new file, 81 lines)

---
sidebar_position: 30
---

# Policy Definitions

Out of the box, Anubis is pretty heavy-handed. It will aggressively challenge everything that might be a browser (usually indicated by having `Mozilla` in its user agent). However, some bots are smart enough to get past the challenge. Some things that look like bots may actually be fine (e.g. RSS readers). Some resources need to be visible no matter what. Some resources and remotes are fine to begin with.

Bot policies let you customize the rules that Anubis uses to allow, deny, or challenge incoming requests. Currently you can match requests on the following:

- Request path
- User agent string

Here's an example rule that denies [Amazonbot](https://developer.amazon.com/en/amazonbot):

```json
{
  "name": "amazonbot",
  "user_agent_regex": "Amazonbot",
  "action": "DENY"
}
```

When this rule is evaluated, Anubis checks the `User-Agent` string of the request. If it contains `Amazonbot`, Anubis sends the user an error page saying that access is denied, but in such a way that makes scrapers think they have correctly loaded the webpage.

Right now the only kind of policy you can write is a bot policy. Other forms of policies will be added in the future.

Here is a minimal policy file that will protect against most scraper bots:

```json
{
  "bots": [
    {
      "name": "well-known",
      "path_regex": "^/.well-known/.*$",
      "action": "ALLOW"
    },
    {
      "name": "favicon",
      "path_regex": "^/favicon.ico$",
      "action": "ALLOW"
    },
    {
      "name": "robots-txt",
      "path_regex": "^/robots.txt$",
      "action": "ALLOW"
    },
    {
      "name": "generic-browser",
      "user_agent_regex": "Mozilla",
      "action": "CHALLENGE"
    }
  ]
}
```

This allows requests to [`/.well-known`](https://en.wikipedia.org/wiki/Well-known_URI), `/favicon.ico`, and `/robots.txt`, and challenges any request that has the word `Mozilla` in its User-Agent string. The [default policy file](https://github.com/TecharoHQ/anubis/blob/main/cmd/anubis/botPolicies.json) is a bit more thorough, but this should be more than enough for most users.

If no rules match the request, it is allowed through.
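The first-match-wins evaluation described above can be sketched in Go. The `Rule` struct and `checkRequest` function here are illustrative stand-ins, not Anubis' internal types:

```go
package main

import (
	"fmt"
	"regexp"
)

// Rule mirrors the JSON fields in the policy file above.
type Rule struct {
	Name           string
	UserAgentRegex string
	PathRegex      string
	Action         string // ALLOW, DENY, or CHALLENGE
}

// checkRequest applies rules in order and returns the first match's
// action, defaulting to ALLOW when nothing matches.
func checkRequest(rules []Rule, userAgent, path string) string {
	for _, r := range rules {
		if r.UserAgentRegex != "" && !regexp.MustCompile(r.UserAgentRegex).MatchString(userAgent) {
			continue
		}
		if r.PathRegex != "" && !regexp.MustCompile(r.PathRegex).MatchString(path) {
			continue
		}
		return r.Action
	}
	return "ALLOW" // no rule matched
}

func main() {
	rules := []Rule{
		{Name: "robots-txt", PathRegex: "^/robots.txt$", Action: "ALLOW"},
		{Name: "generic-browser", UserAgentRegex: "Mozilla", Action: "CHALLENGE"},
	}
	// The robots-txt rule comes first, so it wins even for Mozilla agents.
	fmt.Println(checkRequest(rules, "Mozilla/5.0", "/robots.txt")) // prints ALLOW
}
```

Rule order therefore matters: put narrow `ALLOW` rules before broad `CHALLENGE` or `DENY` rules.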

## Writing your own rules

There are three actions that can be returned from a rule:

| Action      | Effects                                                                            |
| :---------- | :--------------------------------------------------------------------------------- |
| `ALLOW`     | Bypass all further checks and send the request to the backend.                     |
| `DENY`      | Deny the request and send back an error message that scrapers think is a success.  |
| `CHALLENGE` | Show a challenge page and/or validate that clients have passed a challenge.        |

Name your rules in lowercase kebab-case. Rule names are exposed in Prometheus metrics.

In case your service needs it for risk calculation reasons, Anubis exposes information about the rule that a request matched using a few headers:

| Header            | Explanation                                           | Example          |
| :---------------- | :---------------------------------------------------- | :--------------- |
| `X-Anubis-Rule`   | The name of the rule that was matched                 | `bot/lightpanda` |
| `X-Anubis-Action` | The action that Anubis took in response to that rule  | `CHALLENGE`      |
| `X-Anubis-Status` | The status and how strict Anubis was in its checks    | `PASS-FULL`      |

Policy rules are matched using [Go's standard library regular expressions package](https://pkg.go.dev/regexp). You can mess around with the syntax at [regex101.com](https://regex101.com); make sure to select the Golang option.