@7layermagik
Summary

  • server-setup.sh: Automated Ubuntu 24.04 installation for bare metal servers (tested on Hetzner AX102)
    • install mode: Fresh OS installation from rescue/recovery environment
    • harden mode: User creation, SSH key setup, security hardening
    • status mode: Audit current security configuration
  • disk-setup.sh: NVMe storage configuration for Mithril
    • --setup: Interactive wizard for formatting/mounting drives (offers benchmarks)
    • --benchmark: Standalone 4K random IOPS testing
    • --info: Display current disk layout
    • Supports multi-disk configurations (separate AccountsDB and ledger drives)
  • performance-tune.sh: Kernel and I/O performance optimizations
    • CPU governor tuning
    • Memory and swap optimizations
    • Block device I/O scheduling
  • README.md: Comprehensive documentation with Quick Start guide

Test plan

  • Tested server-setup.sh install mode on Hetzner AX102 rescue environment
  • Tested server-setup.sh harden mode on fresh Ubuntu installation
  • Tested disk-setup.sh --benchmark on real NVMe drives (Micron 2TB, Samsung 512GB)
  • Tested disk-setup.sh --setup wizard flow
  • Verified passwordless sudo works after install
  • Verified SSH key authentication works

🤖 Generated with Claude Code

7layermagik and others added 30 commits December 26, 2025 20:02
Add comprehensive beginner-friendly scripts for Mithril node operators:

- server-setup.sh: Fresh Ubuntu 24.04 install (rescue mode) or security
  hardening for existing systems. Configures SSH keys, fail2ban, UFW,
  unattended-upgrades, chrony, haveged, journald limits.

- disk-setup.sh: NVMe benchmarking (fio random 4K IOPS), drive formatting
  with optimal settings, directory structure creation, and data reset
  commands.

- performance-tune.sh: Kernel parameter tuning, I/O scheduler optimization
  (none for NVMe), CPU performance mode, huge pages, and Go runtime tips.

- scripts/README.md: Comprehensive documentation explaining storage concepts,
  partition layouts, over-provisioning, and step-by-step usage guides.

Also updates main README.md with hardware requirements, getting started
guide, configuration examples, and troubleshooting tips.

Removes deprecated lvm_setup scripts in favor of the new unified approach.
- Fix part_path() to detect NVMe by name pattern instead of checking if
  block device exists (fails during partitioning when device not yet created)
- Change /mnt/mithril-data to /mnt/mithril (aligns with scratch_directory concept)
- Add RAM-based swap size recommendations
- Simplify prompts with clear defaults and '[press Enter for...]' guidance
- Remove IPv6 DHCP prompt (always enabled, harmless if unsupported)
- Clarify UFW allows all outgoing connections (RPC, Overcast, snapshots)
After running server-setup.sh install in rescue mode and rebooting, users
need clear guidance on how to continue. Added a "What To Do Next" section
explaining:
- SSH in as the admin user (not root)
- Clone the mithril repo
- Run disk-setup.sh and performance-tune.sh with sudo
- Why sudo is needed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When selecting the OS disk, remind users with multiple drives that Ubuntu
can go on a slower disk - save the fastest NVMe for AccountsDB which will
be configured later in disk-setup.sh.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When user enters an invalid disk path or a disk with mounted partitions,
show the error and re-prompt instead of exiting the script entirely.
Also add "(Press Ctrl+C to exit)" hint so users know how to quit.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Change "Hostname for this server [mithril-node]:" to
"Hostname for this server [e.g. mithril-node]:" to make it clearer
that the value in brackets is a suggestion/example.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add explanation that the SSH key allows remote login from another computer
(like a personal laptop), and rephrase the instruction to be clearer about
where to run the command to get the public key.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Show users how to check for an existing key (id_ed25519.pub or id_rsa.pub),
and if they don't have one, provide the ssh-keygen command to create one.
Also mention they can optionally set a passphrase to protect the key.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Debian's debootstrap package doesn't include Ubuntu 24.04 (noble) scripts.
This adds a check that downloads the script or creates a symlink fallback.
…storage

Everyone runs disk-setup.sh after install regardless of drive count.
This removes the confusing prompt about single-drive vs multi-drive setup.
…recated haveged

- Add universe repository to sources.list before apt-get install
  (debootstrap only sets up main by default, but fail2ban is in universe)
- Remove haveged package (deprecated in Ubuntu 24.04 - kernel has built-in jitterentropy)
- Fix disk-setup.sh get_root_disk() for rescue mode (overlay/tmpfs detection)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Show installed SSH key before reboot so users can verify
- Add simple 'reboot' command (Hetzner rescue is one-boot-only)
- Remove stale haveged reference from completion output

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add /dev/pts mount for proper PTY support in chroot
- Make apt-get output quieter with -qq flag
- Add DISK LAYOUT section showing lsblk output at completion
- Fix color code rendering with echo -e
- Streamline SSH key verification display
- Update NEXT STEPS to match README Quick Start format
The script unmounts /mnt after completion, so the manual cat command
wouldn't work. Now shows the remount step in both the script output
and README.
When users SSH into rescue mode before installing Ubuntu, their
~/.ssh/known_hosts remembers the rescue system's host key. After
installing Ubuntu with new host keys, SSH shows a scary "REMOTE HOST
IDENTIFICATION HAS CHANGED" warning.

Add ssh-keygen -R step to script output and README to help users
clear the old key before connecting.
Since the user is created with --disabled-password (SSH key only),
adding them to the sudo group isn't enough - sudo still prompts for
a password that doesn't exist. Add /etc/sudoers.d/<user> with NOPASSWD.
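
The NOPASSWD drop-in described above can be sketched roughly as follows. This is an illustration, not the code from server-setup.sh: the helper name `write_nopasswd_dropin` and its optional directory argument are hypothetical, and the directory argument exists only so the sketch can be tried against a scratch directory instead of /etc/sudoers.d.

```shell
# Hypothetical helper: write a passwordless-sudo drop-in for one user.
# $1 = user name, $2 = target directory (defaults to /etc/sudoers.d).
write_nopasswd_dropin() {
    local user="$1" dir="${2:-/etc/sudoers.d}" file
    file="$dir/$user"
    printf '%s ALL=(ALL) NOPASSWD:ALL\n' "$user" > "$file" || return 1
    chmod 0440 "$file"   # the mode conventionally used for sudoers files
    # Syntax-check the drop-in when visudo is available; remove it if invalid.
    if command -v visudo >/dev/null 2>&1; then
        visudo -cf "$file" >/dev/null 2>&1 || { rm -f "$file"; return 1; }
    fi
}
```

On a real system this would be run as root, for example `write_nopasswd_dropin admin`, where `admin` stands in for whatever user the script created.
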
The --setup wizard already offers to run benchmarks, so there's no need
to run --benchmark as a separate step. This simplifies the Quick Start
instructions and reduces user friction.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Changed secondary mount point from /mnt/blockstore to /mnt/ledger
  (contains both snapshots and blockstore subdirectories)
- Added path_on_root_disk() safety function to prevent accidental
  deletion of data on the OS disk
- Fixed color escape codes in summary display (use echo -e)
- Updated "Data:" label to "Ledger:" in summary output

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Follow Agave's directory naming pattern for consistency:
- /mnt/mithril-accounts (AccountsDB on fast drive)
- /mnt/mithril-ledger (blockstore + snapshots on secondary drive)

Updated all scripts, documentation, and config examples to use
the new naming convention.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Added check_disk_deps() to verify parted, mkfs.ext4, mkfs.xfs are installed
- Offers to install missing tools automatically via apt
- Improved ask_filesystem() to accept direct input (ext4/xfs) instead of select
- Now accepts both "ext4"/"xfs" and numbers "1"/"2"
- Invalid input shows helpful error instead of silently returning empty
- Defaults to ext4 if user just presses Enter

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
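
The input handling listed above (names or numbers, ext4 on an empty answer, an error on anything else) might look something like this. The function name `parse_filesystem_choice` is hypothetical; the real ask_filesystem() presumably wraps similar logic in a prompt loop.

```shell
# Hypothetical helper: normalise a filesystem answer.
# Prints "ext4" or "xfs" and returns 0, or prints an error to stderr and returns 1.
parse_filesystem_choice() {
    case "$1" in
        ""|1|ext4) echo ext4 ;;   # empty answer (just Enter) defaults to ext4
        2|xfs)     echo xfs ;;
        *)
            echo "Invalid choice '$1': enter ext4, xfs, 1 or 2" >&2
            return 1
            ;;
    esac
}

parse_filesystem_choice ""    # prints ext4
parse_filesystem_choice 2     # prints xfs
```

Returning a non-zero status on bad input, rather than printing nothing, lets the caller re-prompt instead of continuing with an empty filesystem name.
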
When recommending a drive for AccountsDB, now considers the minimum
size requirement (700GB) in addition to speed. If the fastest drive
is too small, recommends the fastest drive that meets the size
requirement and explains why.

This prevents recommending a slightly faster but too-small drive
(e.g., 476GB Samsung vs 1.9TB Micron with <1% IOPS difference).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
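
A minimal sketch of the selection rule described above: filter by minimum size first, then take the highest IOPS among what remains. The input format ("device size_gb iops" per line), the function name, and the IOPS figures in the usage line are assumptions made for illustration; only the 700GB threshold and the drive sizes come from the commit message.

```shell
MIN_ACCOUNTS_GB=700   # minimum AccountsDB drive size quoted in the commit message

# Reads "device size_gb iops" lines on stdin and prints the device with the
# highest IOPS among those meeting the minimum size. Returns 1 if none qualifies.
recommend_accounts_drive() {
    awk -v min="$MIN_ACCOUNTS_GB" '
        ($2 + 0) >= (min + 0) && (best == "" || ($3 + 0) > best_iops) {
            best = $1
            best_iops = $3 + 0
        }
        END { if (best == "") exit 1; print best }
    '
}

# Made-up benchmark numbers: the smaller drive is slightly faster but too small.
printf '%s\n' "/dev/nvme0n1 476 1000000" "/dev/nvme1n1 1900 995000" \
    | recommend_accounts_drive   # prints /dev/nvme1n1
```
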
Add a separate interrupt handler that explicitly exits with code 130
(128 + 2, the conventional exit status for SIGINT). Previously the cleanup
trap did not exit, so the script could continue after Ctrl+C.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
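
One way to arrange the two traps described above is sketched here. The function names and the empty cleanup body are placeholders, not the script's actual code.

```shell
# Placeholder for whatever the real cleanup does (unmounting, temp files, ...).
cleanup() {
    :
}

# Dedicated SIGINT handler: report and exit, so execution cannot continue.
on_interrupt() {
    echo "Interrupted, exiting." >&2
    exit 130   # 128 + 2 (SIGINT); the EXIT trap below still runs cleanup
}

trap cleanup EXIT
trap on_interrupt INT
```

Keeping cleanup on the EXIT trap and making the INT handler only call `exit` means cleanup runs exactly once on either path.
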
7layermagik and others added 5 commits December 26, 2025 20:02
Before: /mnt/mithril-accounts/accountsdb/mithril_db...
After:  /mnt/mithril-accounts/mithril_db...

The mount name already indicates it's for accounts, no need for
a redundant subdirectory. This also updates:

- disk-setup.sh to not create the accountsdb subdir
- clean_accountsdb() to find artifacts at the mount root
- clean_all() to handle the new structure
- Config hints at end of setup to show correct paths

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Simplifies terminology for better user understanding:
- CLI flag: --delete-accountsdb -> --delete-accounts
- Function: clean_accountsdb() -> clean_accounts()
- Messages updated to use "accounts" instead of "AccountsDB"

Technical comments explaining the AccountsDB concept are preserved.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- STEP 1 (AccountsDB): Now shows OS drive option if >= 100GB free
  (was 700GB, matching single-drive mode threshold)
- STEP 2 (Snapshots/Blockstore): Added OS drive option when not
  used for AccountsDB
- OS drive creates partition instead of wiping (safe)
- Summary correctly shows partition vs erase warning per drive
- Confirmation handles mixed scenarios (partition + erase)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When configuring noatime, now displays detected Mithril mount points
(/mnt/mithril-accounts, /mnt/mithril-ledger) as recommended options.
Makes it easier for users to know which mounts to configure.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The part_path function was checking if the partition block device
existed to determine the naming format. When creating a new partition,
it doesn't exist yet, so it would incorrectly fall back to the
non-NVMe format (e.g., /dev/nvme1n11 instead of /dev/nvme1n1p1).

Now correctly determines format based on device name pattern:
- NVMe (nvme*): /dev/nvme0n1p1
- MMC/SD (mmcblk*): /dev/mmcblk0p1
- Loop (loop*): /dev/loop0p1
- Traditional (sd*): /dev/sda1

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
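
A minimal sketch of the name-pattern rule described above, assuming a two-argument helper (disk path, partition number). The body is illustrative rather than the code from disk-setup.sh; only the name part_path and the four naming formats come from the commit message.

```shell
# Derive the partition path from the disk name alone, without checking
# whether the partition's block device exists yet.
part_path() {
    local disk="$1" num="$2"
    case "$(basename "$disk")" in
        # Names that end in a digit need a "p" separator before the number.
        nvme*|mmcblk*|loop*) printf '%sp%s\n' "$disk" "$num" ;;
        # Traditional names (sd*, vd*, ...) append the number directly.
        *)                   printf '%s%s\n' "$disk" "$num" ;;
    esac
}

part_path /dev/nvme1n1 1   # prints /dev/nvme1n1p1
part_path /dev/sda 1       # prints /dev/sda1
```
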
7layermagik force-pushed the setup-scripts-and-docs branch from 7f2e818 to 288ef0b on December 27, 2025 02:02
7layermagik and others added 23 commits December 26, 2025 20:07
The format_disk function was not waiting for the kernel to recognize
the new partition before attempting to format it. This could cause
mkfs to fail silently, resulting in mount failures.

Added partprobe and sleep (matching create_partition_on_disk), plus
verification that the partition block device exists before formatting.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
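
The wait-and-verify step described above could be sketched like this. The helper name, the retry count, and the one-second polling interval are assumptions; the commit only states that partprobe, a sleep, and an existence check were added.

```shell
# Hypothetical helper: ask the kernel to re-read the partition table, then
# poll until the partition's block device node exists, or give up.
# $1 = disk, $2 = expected partition path, $3 = max tries (default 10).
wait_for_partition() {
    local disk="$1" part="$2" tries="${3:-10}"
    if command -v partprobe >/dev/null 2>&1; then
        partprobe "$disk" >/dev/null 2>&1 || true
    fi
    while [ "$tries" -gt 0 ]; do
        [ -b "$part" ] && return 0   # -b: path exists and is a block device
        sleep 1
        tries=$((tries - 1))
    done
    echo "Partition $part did not appear; refusing to format" >&2
    return 1
}
```

A caller would run mkfs only after `wait_for_partition "$disk" "$part"` succeeds, so a missing device node produces a clear error instead of a silent mkfs failure.
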
The format_disk and create_partition_on_disk functions are called via
command substitution to capture the partition path. Any stdout output
(from info(), echo, mkfs, parted) was being captured along with the
path, causing mount to fail with "bad option" errors.

Now all status messages, command output, and formatting output go to
stderr, leaving only the partition path on stdout.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The spinner message with disk model names can be 55+ characters,
but the clearing only wrote 40 characters. This left partial text
visible on the previous line.

Increased clear width to 80 characters.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Automatically installs fio when running --benchmark if not present,
rather than falling back to less accurate hdparm. Only falls back
to hdparm if fio installation fails.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The previous per-line >&2 redirects weren't fully preventing stdout
pollution. Using command groups { ... } >&2 ensures ALL output
from partition creation goes to stderr, so only the partition path
is captured when called via command substitution.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The stdout pollution issue was causing mount failures because garbage
was getting captured into the partition path variable.

Instead of returning the partition path via stdout (which required
complex stderr redirection), format_disk and create_partition_on_disk
now set a global variable FORMATTED_PARTITION that callers read directly.

This completely avoids the command substitution problem.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
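
The difference between the two calling conventions can be seen in a small stand-in. `format_disk_sketch` below is hypothetical and formats nothing; it only prints a status line and records a path in FORMATTED_PARTITION, the global named in the commit message.

```shell
FORMATTED_PARTITION=""

# Stand-in for format_disk: prints progress freely and reports its result
# through a global variable instead of stdout.
format_disk_sketch() {
    local disk="$1"
    echo "Formatting $disk ..."
    FORMATTED_PARTITION="${disk}p1"   # assumes an NVMe-style name for brevity
}

# Old pattern: command substitution captures the status line as if it were
# the path, and runs the function in a subshell, so the global is not set.
captured=$(format_disk_sketch /dev/nvme1n1)

# New pattern: call the function directly, then read the global.
format_disk_sketch /dev/nvme1n1 >/dev/null
echo "$FORMATTED_PARTITION"   # prints /dev/nvme1n1p1
```

The function must be called directly for this to work: an assignment made inside `$( ... )` happens in a subshell and never reaches the caller.
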
Run systemctl daemon-reload after adding fstab entries so systemd
immediately picks up the new mount configuration.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Instead of asking for individual mountpoints, detect all Mithril mounts
and apply noatime to all of them with a single Y/n confirmation.
This is cleaner since noatime is a safe, low-risk optimization.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add optional default parameter to yesno function
- All BASIC and ADVANCED optimizations now default to Yes
- RISKY ext4 options (barrier=0, data=writeback) still default to No
- Makes it easier to accept all recommended settings by pressing Enter

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
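
A yes/no prompt with an optional default, as described above, might be written as follows. The signature (prompt, then default) and the accepted spellings are assumptions; only the function name yesno and the default-on-Enter behaviour come from the commit message.

```shell
# yesno PROMPT [DEFAULT]  -> returns 0 for yes, 1 for no.
# DEFAULT is "y" or "n" (anything else is treated as "n") and is used when
# the answer is empty, i.e. the user just presses Enter.
yesno() {
    local prompt="$1" default="${2:-n}" hint answer
    case "$default" in
        y|Y) default=y; hint="[Y/n]" ;;
        *)   default=n; hint="[y/N]" ;;
    esac
    while true; do
        printf '%s %s ' "$prompt" "$hint" >&2
        read -r answer || true   # on EOF the answer is empty, so the default applies
        case "${answer:-$default}" in
            y|Y|yes|YES) return 0 ;;
            n|N|no|NO)   return 1 ;;
            *) echo "Please answer y or n." >&2 ;;
        esac
    done
}
```

With this shape, a recommended step is called as `yesno "Enable TRIM?" y` and a risky one as `yesno "Disable ext4 barriers?" n`, so pressing Enter throughout accepts only the safe settings.
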
The "none" in the hint looked like it meant "n" for No.
The actual scheduler choice happens in a numbered menu inside the function.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Helps users identify which disk is AccountsDB vs snapshots/blockstore.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When read is called inside a 'while read <<< $var' loop, stdin is the
here-string, not the terminal, so the inner read consumes loop input instead
of the user's answer. The inner read must read explicitly from /dev/tty.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
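
The stdin conflict described above can be reproduced and avoided with a loop like the one below. Everything here is illustrative: the function name, the prompt text, and the PROMPT_SOURCE override (added only so the sketch can run without a terminal) are not from the script. A here-document is used in place of the here-string for portability; the loop's stdin is redirected in the same way.

```shell
# Where interactive answers are read from. /dev/tty is the controlling terminal.
PROMPT_SOURCE="${PROMPT_SOURCE:-/dev/tty}"

# Prints each item from the newline-separated list in $1 that the user accepts.
confirm_each() {
    local items="$1" item answer
    while read -r item; do
        printf 'Configure %s? [Y/n] ' "$item" >&2
        # Without this redirect, read would take the next line of $items
        # as the answer, because the loop's stdin is the list itself.
        read -r answer < "$PROMPT_SOURCE" || return 1
        case "$answer" in
            n|N|no) ;;              # skipped
            *) echo "$item" ;;      # accepted (Enter counts as yes)
        esac
    done <<EOF
$items
EOF
}
```
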
The scheduler is typically already set to 'none' (optimal).
Users can skip by pressing Enter, or say 'y' to configure.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- I/O scheduler choice now loops until valid input (1/2/3)
- Read-ahead per-device choice now loops until valid (1-6)
- Read-ahead all-devices choice now loops until valid (1-5)
- THP now prompts for choice instead of auto-applying madvise
- Advanced mount options choice now loops until valid (1-4)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Go runtime tuning is safe and recommended, so it should appear
before the risky experimental ext4 options. New order:
- BASIC: TRIM, sysctl, CPU, noatime
- ADVANCED: I/O scheduler, read-ahead, THP, Go tuning
- EXPERIMENTAL: ext4 barrier/writeback (last, with warning)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Merged BASIC and ADVANCED into single RECOMMENDED section since
they're all standard safe optimizations. Only truly risky options
(ext4 barrier=0, data=writeback) remain in EXPERIMENTAL.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
THP options are all variations of "disable or limit" - not a positive
optimization. Modern kernels default to "madvise" which is reasonable.
The --hugepages flag still works for those who want to explicitly
configure it, but it's no longer part of interactive mode or --all.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Go build-time optimizations (GOAMD64) are foundational and should
be shown early. New order groups CPU-related items together:
1. TRIM, sysctl, CPU perf, Go tuning
2. noatime, I/O scheduler, read-ahead
3. EXPERIMENTAL: ext4 options

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use case statement for cleaner arch detection
- Set GO_VERSION variable at top for easy updates
- Restore Go 1.25 references (green tea GC improvements)
- Fix issue where GOARCH was not set before the wget download

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
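
The case-based detection and the ordering fix described above might look roughly like this. The function names and the exact patch version are assumptions (the commit only says "Go 1.25"), and the set of supported architectures is a guess.

```shell
GO_VERSION="1.25.0"   # assumed patch release; kept at the top for easy updates

# Map `uname -m` output (or an explicit argument) to a GOARCH name.
detect_goarch() {
    local machine="${1:-$(uname -m)}"
    case "$machine" in
        x86_64|amd64)  echo amd64 ;;
        aarch64|arm64) echo arm64 ;;
        *)
            echo "Unsupported architecture: $machine" >&2
            return 1
            ;;
    esac
}

# Resolve the architecture first, then build the download URL from it,
# so the URL is never assembled with an empty GOARCH.
go_tarball_url() {
    local arch
    arch="$(detect_goarch "$@")" || return 1
    echo "https://go.dev/dl/go${GO_VERSION}.linux-${arch}.tar.gz"
}

go_tarball_url x86_64   # prints https://go.dev/dl/go1.25.0.linux-amd64.tar.gz
```
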
- Add "Mithril's Simple RPC Server" section with supported methods
- Add "Updating Mithril" section noting no resume support yet
- Move directory structure to Step 2 (shows what scripts create)
- Fix config field name: endpoints -> rpc
- Add remote RPC access example with IP placeholder
- Add upcoming RPC methods: simulation, send tx, leader schedule
- Remove GOAMD64 section (handled by performance-tune.sh)
- Add common utilities (vim, htop, etc) to server-setup.sh
- Update Discord invite link
- Update stage2 defaults for home internet (3s warmup/measure)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add mithril.starter.toml with paths pre-configured for setup scripts
- Update README to mention starter config option
- Keep example config as detailed reference

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Mithril requires CGO for the DataDog/zstd package, which needs a C compiler.
Added Step 3 to install build-essential before Go installation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The disk-setup script runs as root and creates directories owned by root,
but Mithril runs as a regular user. Added instruction to chown the
directories after running disk-setup.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
7layermagik merged commit 9d42cbb into dev on Dec 27, 2025
1 check passed

2 participants