Deploying

TheTechRobo edited this page Jan 1, 2026 · 2 revisions

Self-hosting mnbot isn't too difficult.

NOTE! These instructions will change a lot soon, as a rewrite to use PostgreSQL is in progress. Buyer beware!

Basic instructions

Debugging

RethinkDB inspection

Each docker-compose.yml has its own RethinkDB instance. For debugging purposes, these can be accessed at:

  • Client: localhost:4186
  • Target: localhost:4184
  • Tracker: localhost:4183
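To confirm that the three instances are actually listening, you can probe the ports listed above. The following is a minimal sketch using only the Python standard library. The `port_is_open` helper and the `INSTANCES` mapping are illustrative names, not part of mnbot, and the check only tests TCP reachability, not that RethinkDB itself is healthy.

```python
import socket

# Ports from the list above; the mapping name is illustrative, not part of mnbot.
INSTANCES = {"client": 4186, "target": 4184, "tracker": 4183}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, port in INSTANCES.items():
        state = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"{name}: localhost:{port} is {state}")
```

If a port shows as not reachable, check that the corresponding docker-compose stack is up before digging further.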

Be careful with the client's RethinkDB instance, as it hosts warcprox's dedup DB. Deleting records is fine (mnbot does it after 7 days), but if you modify them to point to the wrong record, you'll make warcprox write incorrect revisit records. That's bad.

Outside of Docker, it is OK for all three components to share the same RethinkDB instance.

Tracker, IRC bot, dashboard

Outside of docker (recommended for dev)

Requirements:

  • Python 3.11+
  • RethinkDB
  • rethinkdb Python module (use the latest version, or else you will run into Python compatibility issues)
  • Latest version of https://github.com/TheTechRobo/rue (it is very much in flux right now)
    • You must initialise the database using util/create.py mnbot <MAX_TRIES>
      • Set MAX_TRIES to 1 during debugging for quicker feedback on failures
    • Then create a server_secrets table in the mnbot database
  • https://github.com/TheTechRobo/bot2h (ideally latest version)
  • validators module

Required environment variables:

  • H2IBOT_GET_URL: The http2irc URL to use for receiving messages.
  • H2IBOT_POST_URL: The http2irc URL to use for sending messages.
  • TRACKER_BASE_URL: Base URL of the dashboard, so it can be referred to in IRC
  • DOCUMENTATION_URL: URL to the documentation, so it can be referred to in IRC
  • INFO_URL: info URL; will be added to the user agent in all jobs that do not use --stealth-ua
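Before starting the tracker or IRC bot outside of Docker, it can help to check that all of these are set. This is a small sketch, not something shipped with mnbot. The `missing_vars` helper is a hypothetical name, and treating an empty value as missing is an assumption.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = (
    "H2IBOT_GET_URL",
    "H2IBOT_POST_URL",
    "TRACKER_BASE_URL",
    "DOCUMENTATION_URL",
    "INFO_URL",
)


def missing_vars(env=None):
    """Return the required variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```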

In Docker (recommended for prod)

The docker-compose.yml is in tracker_dc. Copy .env-example to .env and make changes as necessary. It will initialise the database for you.

Before your first client

Go to the RethinkDB web UI for the tracker, click Data Explorer, then run:

r.db("mnbot").tableCreate("server_secrets")

Now generate a secret in some way. It should be fixed-length and hard to brute-force. A hash or UUID of some sort is recommended.
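For example, either of the following standard-library calls produces a fixed-length value that is hard to brute-force. This is just one way to do it; mnbot does not mandate a particular format.

```python
import secrets
import uuid

# Option 1: 32 random bytes rendered as 64 hexadecimal characters.
hex_secret = secrets.token_hex(32)

# Option 2: a random (version 4) UUID, 36 characters including hyphens.
uuid_secret = str(uuid.uuid4())

print(hex_secret)
print(uuid_secret)
```

Pick one of the two printed values and use it as the val for the new client.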

Then run:

r.db("mnbot").table("server_secrets").insert({"id": "an_interesting_pipeline_id", "val": "your_hash_or_uuid"})

IMPORTANT: It is imperative that each client (EVEN ONES ON THE SAME SERVER) use a different id. The server and IRC bot both assume that this is the case.

Scaling up: In future, it will be possible to make one client run multiple jobs in parallel. The code is there, but it has not yet been audited for race conditions.

Client

Because of all the dependencies, it is recommended to run this in Docker, even during development. This means you will have to run docker compose build && docker compose up whenever you make a change.

Copy .env-example to .env and make changes as necessary. You should not need to modify docker-compose.yml; it is self-contained.

(Note: the tracker URL needs basic authentication corresponding to the server_secrets table. This can be done like https://username:password@hostname. Username is the id, password is the val.)
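If the id or val contains characters such as `:`, `@` or `/`, pasting them into the URL by hand can produce something that no longer parses. The sketch below builds the URL with percent-encoding. The `tracker_url_with_auth` helper and the example hostname are hypothetical, and it assumes the client decodes percent-encoded credentials in the usual way.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def tracker_url_with_auth(base_url: str, pipeline_id: str, secret: str) -> str:
    """Embed the id (username) and val (password) in base_url as basic auth."""
    parts = urlsplit(base_url)
    # Percent-encode both fields so reserved characters cannot break the URL.
    user = quote(pipeline_id, safe="")
    password = quote(secret, safe="")
    netloc = f"{user}:{password}@{parts.netloc}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    # Hypothetical hostname and credentials, for illustration only.
    print(tracker_url_with_auth(
        "https://tracker.example", "an_interesting_pipeline_id", "your_hash_or_uuid"
    ))
```

With a hex or UUID secret and a simple id, nothing needs encoding and the result is the plain https://username:password@hostname form described above.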

Before spinning up

Try docker compose run chrome_test /bin/chromium.

If Chrome spits out a bunch of dbus warnings, but otherwise starts up, check the VNC output to make sure it really did. If it did, you can skip the next section.

If Chrome fails to start with Operation not permitted or Permission denied, check dmesg for conflicts with AppArmor. If AppArmor is complaining, especially if it's about the usage of userns_create, awesome! Read on.
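Scanning dmesg by eye can be tedious, so here is a rough sketch of a filter. It assumes that AppArmor denials appear in the kernel log with the marker `apparmor="DENIED"`, which is the usual audit format but may differ on your system. The function names are illustrative, and reading the kernel log may require root.

```python
import subprocess


def read_kernel_log() -> str:
    """Return the output of dmesg as text (may need root on some systems)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    return result.stdout


def apparmor_denials(log_text: str) -> list[str]:
    """Return the log lines that look like AppArmor denials."""
    return [line for line in log_text.splitlines() if 'apparmor="DENIED"' in line]


def mentions_userns(lines: list[str]) -> bool:
    """Return True if any of the given lines refers to userns_create."""
    return any("userns_create" in line for line in lines)
```

Calling `apparmor_denials(read_kernel_log())` and passing the result to `mentions_userns` tells you whether the userns_create case described above applies.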

Fixing conflicts with AppArmor

AppArmor is sometimes configured to restrict the creation of user namespaces. Chromium relies on namespacing for sandboxing. (If you're worried about disabling AppArmor protections, remember that the whole reason we're doing this is so the untrusted code can be sandboxed away.)

Many distros offer an existing configuration for Chromium. On my system, these are stored in /etc/apparmor.d, and the one we want is called chromium. In case this isn't on your system, either find an existing profile or create a new one similar to this one stored on my system:

abi <abi/4.0>,
include <tunables/global>

profile chromium flags=(unconfined) {
  userns,

  # Site-specific additions and overrides. See local/README for details.
  include if exists <local/chromium>
}

...and either globally install it in your AppArmor directory or load it temporarily with apparmor_parser -r -W /path/to/your_profile. (If globally installing it, make sure to reload the configuration; on my system sudo systemctl reload apparmor.service works.)

For systems that don't support ABI 4.0, like Debian bookworm, here is one that worked for me:

abi <abi/3.0>,
include <tunables/global>

profile chromium flags=(unconfined) {
}

Then change the value of APPARMOR_PROFILE in .env. (Changing chromium to something else to avoid conflicting with another profile is fine as long as whatever is in .env reflects the correct name.)

Target

The target is in target. Copy .env-example to .env and make changes as necessary. Build times are slow because Rust, so for debugging, you'll probably want to do it without the container. If any Rust magicians know how to improve this please let me know.
