forked from firedancer-io/radiance
Add server setup and disk management scripts #153
Merged
+5,636
−796
Add comprehensive beginner-friendly scripts for Mithril node operators:
- server-setup.sh: Fresh Ubuntu 24.04 install (rescue mode) or security hardening for existing systems. Configures SSH keys, fail2ban, UFW, unattended-upgrades, chrony, haveged, journald limits.
- disk-setup.sh: NVMe benchmarking (fio random 4K IOPS), drive formatting with optimal settings, directory structure creation, and data reset commands.
- performance-tune.sh: Kernel parameter tuning, I/O scheduler optimization (none for NVMe), CPU performance mode, huge pages, and Go runtime tips.
- scripts/README.md: Comprehensive documentation explaining storage concepts, partition layouts, over-provisioning, and step-by-step usage guides.

Also updates main README.md with hardware requirements, getting started guide, configuration examples, and troubleshooting tips. Removes deprecated lvm_setup scripts in favor of the new unified approach.
- Fix part_path() to detect NVMe by name pattern instead of checking if block device exists (fails during partitioning when device not yet created)
- Change /mnt/mithril-data to /mnt/mithril (aligns with scratch_directory concept)
- Add RAM-based swap size recommendations
- Simplify prompts with clear defaults and '[press Enter for...]' guidance
- Remove IPv6 DHCP prompt (always enabled, harmless if unsupported)
- Clarify UFW allows all outgoing connections (RPC, Overcast, snapshots)
After running server-setup.sh install in rescue mode and rebooting, users need clear guidance on how to continue. Added a "What To Do Next" section explaining:
- SSH in as the admin user (not root)
- Clone the mithril repo
- Run disk-setup.sh and performance-tune.sh with sudo
- Why sudo is needed

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When selecting the OS disk, remind users with multiple drives that Ubuntu can go on a slower disk - save the fastest NVMe for AccountsDB which will be configured later in disk-setup.sh. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When user enters an invalid disk path or a disk with mounted partitions, show the error and re-prompt instead of exiting the script entirely. Also add "(Press Ctrl+C to exit)" hint so users know how to quit. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Change "Hostname for this server [mithril-node]:" to "Hostname for this server [e.g. mithril-node]:" to make it clearer that the value in brackets is a suggestion/example. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add explanation that the SSH key allows remote login from another computer (like a personal laptop), and rephrase the instruction to be clearer about where to run the command to get the public key. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Show users how to check for an existing key (id_ed25519.pub or id_rsa.pub), and if they don't have one, provide the ssh-keygen command to create one. Also mention they can optionally set a passphrase to protect the key. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Debian's debootstrap package doesn't include Ubuntu 24.04 (noble) scripts. This adds a check that downloads the script or creates a symlink fallback.
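The symlink fallback mentioned above could look roughly like the sketch below. It covers only the symlink path, not the download. It assumes the common debootstrap layout in which Ubuntu suite scripts are symlinks to a shared `gutsy` script. The function name and the overridable directory argument are hypothetical, added so the logic can be tried outside `/usr/share`.

```bash
# Sketch only: make sure debootstrap has a script for the requested suite.
# Assumption: Ubuntu suites share the "gutsy" script, so a symlink is enough.
ensure_debootstrap_script() {
    local suite="$1" dir="${2:-/usr/share/debootstrap/scripts}"

    # Nothing to do if the suite script is already present.
    if [ -e "${dir}/${suite}" ]; then
        return 0
    fi

    # The fallback only works if the shared script exists.
    if [ ! -e "${dir}/gutsy" ]; then
        echo "debootstrap: no gutsy script in ${dir}, cannot create fallback" >&2
        return 1
    fi

    ln -s gutsy "${dir}/${suite}"
}

# Example (needs root on a real rescue system):
#   ensure_debootstrap_script noble
```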
…storage Everyone runs disk-setup.sh after install regardless of drive count. This removes the confusing prompt about single-drive vs multi-drive setup.
…recated haveged - Add universe repository to sources.list before apt-get install (debootstrap only sets up main by default, but fail2ban is in universe) - Remove haveged package (deprecated in Ubuntu 24.04 - kernel has built-in jitterentropy) - Fix disk-setup.sh get_root_disk() for rescue mode (overlay/tmpfs detection) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Show installed SSH key before reboot so users can verify - Add simple 'reboot' command (Hetzner rescue is one-boot-only) - Remove stale haveged reference from completion output 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add /dev/pts mount for proper PTY support in chroot
- Make apt-get output quieter with -qq flag
- Add DISK LAYOUT section showing lsblk output at completion
- Fix color code rendering with echo -e
- Streamline SSH key verification display
- Update NEXT STEPS to match README Quick Start format
The script unmounts /mnt after completion, so the manual cat command wouldn't work. Now shows the remount step in both the script output and README.
When users SSH into rescue mode before installing Ubuntu, their ~/.ssh/known_hosts remembers the rescue system's host key. After installing Ubuntu with new host keys, SSH shows a scary "REMOTE HOST IDENTIFICATION HAS CHANGED" warning. Add ssh-keygen -R step to script output and README to help users clear the old key before connecting.
Since the user is created with --disabled-password (SSH key only), adding them to the sudo group isn't enough - sudo still prompts for a password that doesn't exist. Add /etc/sudoers.d/<user> with NOPASSWD.
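A minimal sketch of the sudoers drop-in described above. The function name and the optional directory argument are hypothetical (the directory is overridable only so the sketch can be tried without touching `/etc`). The actual script may write the file differently.

```bash
# Sketch only: give a key-only user passwordless sudo via a drop-in file.
write_sudoers_dropin() {
    local user="$1" dir="${2:-/etc/sudoers.d}" file

    # Note: sudo skips drop-in files whose names contain a "." or end in "~".
    file="${dir}/${user}"

    printf '%s ALL=(ALL) NOPASSWD:ALL\n' "$user" > "$file"

    # sudo expects restrictive permissions on sudoers files.
    chmod 0440 "$file"
}

# Example (as root):
#   write_sudoers_dropin mithril
#   visudo -cf /etc/sudoers.d/mithril   # optional syntax check
```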
The --setup wizard already offers to run benchmarks, so there's no need to run --benchmark as a separate step. This simplifies the Quick Start instructions and reduces user friction. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Changed secondary mount point from /mnt/blockstore to /mnt/ledger (contains both snapshots and blockstore subdirectories)
- Added path_on_root_disk() safety function to prevent accidental deletion of data on the OS disk
- Fixed color escape codes in summary display (use echo -e)
- Updated "Data:" label to "Ledger:" in summary output

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Follow Agave's directory naming pattern for consistency: - /mnt/mithril-accounts (AccountsDB on fast drive) - /mnt/mithril-ledger (blockstore + snapshots on secondary drive) Updated all scripts, documentation, and config examples to use the new naming convention. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This reverts commit 1f8a3d8.
- Added check_disk_deps() to verify parted, mkfs.ext4, mkfs.xfs are installed
- Offers to install missing tools automatically via apt
- Improved ask_filesystem() to accept direct input (ext4/xfs) instead of select
- Now accepts both "ext4"/"xfs" and numbers "1"/"2"
- Invalid input shows helpful error instead of silently returning empty
- Defaults to ext4 if user just presses Enter

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
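The input handling described for ask_filesystem() can be sketched as follows. This is an illustration of the behavior listed above, not the script's actual code. The prompt wording is an assumption, and the prompt is sent to stderr so that only the chosen filesystem appears on stdout.

```bash
# Sketch only: accept "ext4"/"xfs" or "1"/"2", default to ext4 on Enter,
# and re-prompt with an error on anything else.
ask_filesystem() {
    local choice
    while :; do
        printf 'Filesystem: 1) ext4  2) xfs  [press Enter for ext4]: ' >&2
        # Treat end-of-input like an empty answer so the loop cannot spin forever.
        IFS= read -r choice || choice=""
        case "$choice" in
            ""|1|ext4) echo "ext4"; return 0 ;;
            2|xfs)     echo "xfs";  return 0 ;;
            *) echo "Invalid choice '${choice}'. Enter ext4, xfs, 1, or 2." >&2 ;;
        esac
    done
}

# Example:
#   fs="$(ask_filesystem)"
```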
When recommending a drive for AccountsDB, now considers the minimum size requirement (700GB) in addition to speed. If the fastest drive is too small, recommends the fastest drive that meets the size requirement and explains why. This prevents recommending a slightly faster but too-small drive (e.g., 476GB Samsung vs 1.9TB Micron with <1% IOPS difference). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
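One way to express the "fastest drive that is also big enough" rule is sketched below. The input format (device, size in GB, IOPS per line) and the function name are assumptions made for illustration. The real script gathers these values from its own benchmark results.

```bash
# Sketch only: pick the highest-IOPS drive that meets a minimum size.
# stdin: one drive per line as "<device> <size_gb> <iops>"
# $1:    minimum size in GB (700 in the commit message above)
recommend_accounts_drive() {
    awk -v min="${1:-700}" '
        # Track the fastest drive overall, for the explanatory note.
        $3 > best_any_iops { best_any_iops = $3; best_any = $1 }
        # Track the fastest drive that is large enough.
        $2 >= min && $3 > best_ok_iops { best_ok_iops = $3; best_ok = $1 }
        END {
            if (best_ok == "") { print "none"; exit 1 }
            print best_ok
            if (best_any != best_ok)
                printf "note: %s is faster but smaller than %d GB\n", best_any, min > "/dev/stderr"
        }'
}

# Example with made-up IOPS figures:
#   printf 'nvme0n1 476 501000\nnvme1n1 1900 498000\n' | recommend_accounts_drive 700
```

With the made-up figures in the comment, the smaller drive is marginally faster, so the sketch prints the larger drive on stdout and the explanation on stderr.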
Separate interrupt handler that explicitly exits with code 130 (standard for SIGINT). Previously the cleanup trap didn't exit, so the script could continue after Ctrl+C. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
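The separation of cleanup and interrupt handling can be sketched like this. The handler names and the `TMP_WORK_DIR` variable are hypothetical. The point is only that the INT handler exits explicitly with 130 (128 + signal 2), while cleanup stays on the EXIT trap.

```bash
# Sketch only: cleanup runs on every exit; Ctrl+C additionally forces exit 130.
cleanup() {
    # Hypothetical cleanup: remove a scratch directory if one was created.
    if [ -n "${TMP_WORK_DIR:-}" ]; then
        rm -rf "$TMP_WORK_DIR"
    fi
    return 0
}

on_interrupt() {
    echo "Interrupted, exiting." >&2
    # Without an explicit exit, the script would resume after the handler returns.
    exit 130
}

trap cleanup EXIT
trap on_interrupt INT
```

Because `exit` inside the INT handler triggers the EXIT trap, the cleanup still runs once after Ctrl+C.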
Before: /mnt/mithril-accounts/accountsdb/mithril_db...
After: /mnt/mithril-accounts/mithril_db...

The mount name already indicates it's for accounts, no need for a redundant subdirectory. This also updates:
- disk-setup.sh to not create the accountsdb subdir
- clean_accountsdb() to find artifacts at the mount root
- clean_all() to handle the new structure
- Config hints at end of setup to show correct paths

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Simplifies terminology for better user understanding: - CLI flag: --delete-accountsdb -> --delete-accounts - Function: clean_accountsdb() -> clean_accounts() - Messages updated to use "accounts" instead of "AccountsDB" Technical comments explaining the AccountsDB concept are preserved. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- STEP 1 (AccountsDB): Now shows OS drive option if >= 100GB free (was 700GB, matching single-drive mode threshold)
- STEP 2 (Snapshots/Blockstore): Added OS drive option when not used for AccountsDB
- OS drive creates partition instead of wiping (safe)
- Summary correctly shows partition vs erase warning per drive
- Confirmation handles mixed scenarios (partition + erase)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When configuring noatime, now displays detected Mithril mount points (/mnt/mithril-accounts, /mnt/mithril-ledger) as recommended options. Makes it easier for users to know which mounts to configure. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The part_path function was checking if the partition block device existed to determine the naming format. When creating a new partition, it doesn't exist yet, so it would incorrectly fall back to the non-NVMe format (e.g., /dev/nvme1n11 instead of /dev/nvme1n1p1). Now correctly determines format based on device name pattern:
- NVMe (nvme*): /dev/nvme0n1p1
- MMC/SD (mmcblk*): /dev/mmcblk0p1
- Loop (loop*): /dev/loop0p1
- Traditional (sd*): /dev/sda1

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
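A name-based part_path along the lines described above might look like the following sketch. The argument order (disk, then partition number) is an assumption. The key property is that nothing on disk is inspected, so the result is correct before the partition exists.

```bash
# Sketch only: build a partition path from the disk name pattern alone.
part_path() {
    local disk="$1" num="$2"
    case "${disk##*/}" in
        # Device names ending in a digit use a "p" separator before the number.
        nvme*|mmcblk*|loop*) printf '%s\n' "${disk}p${num}" ;;
        # Traditional names (sda, vda, ...) append the number directly.
        *) printf '%s\n' "${disk}${num}" ;;
    esac
}

part_path /dev/nvme1n1 1   # prints /dev/nvme1n1p1
part_path /dev/sda 1       # prints /dev/sda1
```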
Force-pushed: 7f2e818 to 288ef0b
The format_disk function was not waiting for the kernel to recognize the new partition before attempting to format it. This could cause mkfs to fail silently, resulting in mount failures. Added partprobe and sleep (matching create_partition_on_disk), plus verification that the partition block device exists before formatting. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
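The "wait, then verify" step can be sketched as a small polling helper. The helper name and retry count are hypothetical. It checks with `-e` so it can be tried on any path. A real script would more likely use `-b` to require a block device, and would call `partprobe` on the disk first.

```bash
# Sketch only: poll until a path appears, giving up after N one-second tries.
wait_for_path() {
    local path="$1" tries="${2:-10}" i=0
    while [ "$i" -lt "$tries" ]; do
        if [ -e "$path" ]; then
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    return 1
}

# Example flow (as root, with real device names):
#   partprobe "$disk"
#   wait_for_path "$part" || { echo "partition $part never appeared" >&2; exit 1; }
#   mkfs.ext4 "$part"
```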
The format_disk and create_partition_on_disk functions are called via command substitution to capture the partition path. Any stdout output (from info(), echo, mkfs, parted) was being captured along with the path, causing mount to fail with "bad option" errors. Now all status messages, command output, and formatting output go to stderr, leaving only the partition path on stdout. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The spinner message with disk model names can be 55+ characters, but the clearing only wrote 40 characters. This left partial text visible on the previous line. Increased clear width to 80 characters. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
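Clearing a spinner line comes down to overwriting it with enough spaces. A minimal sketch, with a hypothetical helper name and the 80-column width from the commit message as the default:

```bash
# Sketch only: blank out the current terminal line and return the cursor
# to column 0. The width must be at least as long as the longest message.
clear_line() {
    local width="${1:-80}"
    printf "\r%${width}s\r" ''
}

# Example:
#   printf '\r| Benchmarking drive...'
#   clear_line
```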
Automatically installs fio when running --benchmark if not present, rather than falling back to less accurate hdparm. Only falls back to hdparm if fio installation fails. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The previous per-line >&2 redirects weren't fully preventing stdout
pollution. Using command groups { ... } >&2 ensures ALL output
from partition creation goes to stderr, so only the partition path
is captured when called via command substitution.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
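The command-group redirect can be illustrated with a stand-in function. The name `create_partition_sketch` and the simulated status lines are hypothetical. The real function runs parted and mkfs inside the group.

```bash
# Sketch only: everything inside { ... } >&2 goes to stderr, so the single
# printf after the group is the only thing command substitution captures.
create_partition_sketch() {
    local disk="$1"
    {
        echo "Creating partition on ${disk}..."
        echo "Formatting..."   # parted/mkfs output would land here too
    } >&2
    printf '%s\n' "${disk}1"
}

part="$(create_partition_sketch /dev/sdX)"
echo "captured: ${part}"   # prints "captured: /dev/sdX1"
```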
The stdout pollution issue was causing mount failures because garbage was getting captured into the partition path variable. Instead of returning the partition path via stdout (which required complex stderr redirection), format_disk and create_partition_on_disk now set a global variable FORMATTED_PARTITION that callers read directly. This completely avoids the command substitution problem. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
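The global-variable approach can be sketched as follows. `FORMATTED_PARTITION` is the name given in the commit message. The function body is a stand-in that only simulates formatting.

```bash
# Sketch only: return the result through a global instead of stdout.
FORMATTED_PARTITION=""

format_disk_sketch() {
    local disk="$1"
    # Status output can go anywhere now; nothing is captured.
    echo "Formatting ${disk} (simulated)"
    FORMATTED_PARTITION="${disk}1"
}

# Must be called directly: inside $(...) the assignment would happen in a
# subshell and be lost.
format_disk_sketch /dev/sdX
echo "Would mount: ${FORMATTED_PARTITION}"
```

The trade-off is that callers have to read the variable immediately after the call, before another call overwrites it.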
Run systemctl daemon-reload after adding fstab entries so systemd immediately picks up the new mount configuration. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Instead of asking for individual mountpoints, detect all Mithril mounts and apply noatime to all of them with a single Y/n confirmation. This is cleaner since noatime is a safe, low-risk optimization. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
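Applying noatime to every Mithril mount at once can be sketched as an fstab filter. The function name is hypothetical, and matching on a `/mnt/mithril` mount-point prefix is an assumption based on the paths named elsewhere in this PR. The sketch prints the result rather than editing the file in place.

```bash
# Sketch only: print an fstab with noatime added to Mithril mounts.
# Comment lines and entries that already have noatime are left untouched.
add_noatime() {
    awk '
        $1 !~ /^#/ && $2 ~ /^\/mnt\/mithril/ && $4 !~ /(^|,)noatime(,|$)/ {
            $4 = $4 ",noatime"
        }
        { print }
    ' "$1"
}

# Example:
#   add_noatime /etc/fstab > /tmp/fstab.new   # review before replacing
```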
- Add optional default parameter to yesno function
- All BASIC and ADVANCED optimizations now default to Yes
- RISKY ext4 options (barrier=0, data=writeback) still default to No
- Makes it easier to accept all recommended settings by pressing Enter

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
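A yesno helper with an optional default could be written along these lines. The signature (prompt, then default) and the accepted spellings are assumptions, since the PR text only says that a default parameter was added.

```bash
# Sketch only: ask a yes/no question; an empty answer takes the default.
# Returns 0 for yes, 1 for no.
yesno() {
    local prompt="$1" default="${2:-n}" hint answer
    case "$default" in
        y|Y) hint="[Y/n]" ;;
        *)   hint="[y/N]" ;;
    esac
    while :; do
        printf '%s %s ' "$prompt" "$hint" >&2
        # End-of-input counts as pressing Enter.
        IFS= read -r answer || answer=""
        if [ -z "$answer" ]; then
            answer="$default"
        fi
        case "$answer" in
            y|Y|yes|YES) return 0 ;;
            n|N|no|NO)   return 1 ;;
        esac
        echo "Please answer y or n." >&2
    done
}

# Example:
#   yesno "Enable TRIM?" y && echo "enabling TRIM"
#   yesno "Disable ext4 barriers (risky)?" n || echo "skipped"
```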
The "none" in the hint looked like it meant "n" for No. The actual scheduler choice happens in a numbered menu inside the function. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Helps users identify which disk is AccountsDB vs snapshots/blockstore. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When read is inside a 'while read <<< $var' loop, stdin is the here-string, not the terminal. Must explicitly read from /dev/tty. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
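The stdin clash can be sketched as below. The loop consumes its list on stdin, so the prompt reads from a separate file descriptor that is opened on `/dev/tty` by default. The function name and the optional second argument (an alternative answer source, useful for trying the function without a terminal) are hypothetical. A here-document stands in for the here-string, but the problem is the same for any redirected loop.

```bash
# Sketch only: prompt once per list item while the loop itself reads the
# list from stdin. Answers come from fd 3 (the terminal unless overridden).
confirm_each() {
    local list="$1" answers="${2:-/dev/tty}" item answer
    while IFS= read -r item; do
        [ -n "$item" ] || continue
        printf 'Apply to %s? [y/N] ' "$item" >&2
        # A plain "read" here would swallow the next list item instead.
        IFS= read -r answer <&3 || answer=""
        case "$answer" in
            y|Y) printf '%s\n' "$item" ;;
        esac
    done 3< "$answers" <<EOF
$list
EOF
}

# Example (interactive):
#   confirm_each "$(printf '/mnt/mithril-accounts\n/mnt/mithril-ledger')"
```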
The scheduler is typically already set to 'none' (optimal). Users can skip by pressing Enter, or say 'y' to configure. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- I/O scheduler choice now loops until valid input (1/2/3)
- Read-ahead per-device choice now loops until valid (1-6)
- Read-ahead all-devices choice now loops until valid (1-5)
- THP now prompts for choice instead of auto-applying madvise
- Advanced mount options now loops until valid (1-4)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Go runtime tuning is safe and recommended, so it should appear before the risky experimental ext4 options. New order:
- BASIC: TRIM, sysctl, CPU, noatime
- ADVANCED: I/O scheduler, read-ahead, THP, Go tuning
- EXPERIMENTAL: ext4 barrier/writeback (last, with warning)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Merged BASIC and ADVANCED into single RECOMMENDED section since they're all standard safe optimizations. Only truly risky options (ext4 barrier=0, data=writeback) remain in EXPERIMENTAL. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
THP options are all variations of "disable or limit" - not a positive optimization. Modern kernels default to "madvise" which is reasonable. The --hugepages flag still works for those who want to explicitly configure it, but it's no longer part of interactive mode or --all. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Go build-time optimizations (GOAMD64) are foundational and should be shown early. New order groups CPU-related items together:
1. TRIM, sysctl, CPU perf, Go tuning
2. noatime, I/O scheduler, read-ahead
3. EXPERIMENTAL: ext4 options

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use case statement for cleaner arch detection - Set GO_VERSION variable at top for easy updates - Restore Go 1.25 references (green tea GC improvements) - Fixed issue where GOARCH wasn't being set before wget 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
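The case-based architecture detection might be sketched as follows. `GO_VERSION` is the variable named in the commit message. The exact patch version, the helper name, and the set of supported architectures are assumptions. The URL follows the usual go.dev download naming.

```bash
# Sketch only: map `uname -m` output to a GOARCH value before building
# the download URL, so GOARCH is always set by the time it is used.
GO_VERSION="1.25.0"   # assumption: the commit only says "Go 1.25"

go_arch_for() {
    case "$1" in
        x86_64|amd64)  echo "amd64" ;;
        aarch64|arm64) echo "arm64" ;;
        *) echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

if GOARCH="$(go_arch_for "$(uname -m)")"; then
    echo "https://go.dev/dl/go${GO_VERSION}.linux-${GOARCH}.tar.gz"
fi
```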
- Add "Mithril's Simple RPC Server" section with supported methods
- Add "Updating Mithril" section noting no resume support yet
- Move directory structure to Step 2 (shows what scripts create)
- Fix config field name: endpoints -> rpc
- Add remote RPC access example with IP placeholder
- Add upcoming RPC methods: simulation, send tx, leader schedule
- Remove GOAMD64 section (handled by performance-tune.sh)
- Add common utilities (vim, htop, etc) to server-setup.sh
- Update Discord invite link
- Update stage2 defaults for home internet (3s warmup/measure)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add mithril.starter.toml with paths pre-configured for setup scripts - Update README to mention starter config option - Keep example config as detailed reference 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Mithril requires CGO for the DataDog/zstd package, which needs a C compiler. Added Step 3 to install build-essential before Go installation. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The disk-setup script runs as root and creates directories owned by root, but Mithril runs as a regular user. Added instruction to chown the directories after running disk-setup. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
- server-setup.sh: Automated Ubuntu 24.04 installation for bare metal servers (tested on Hetzner AX102)
  - install mode: Fresh OS installation from rescue/recovery environment
  - harden mode: User creation, SSH key setup, security hardening
  - status mode: Audit current security configuration
- disk-setup.sh: NVMe storage configuration for Mithril
  - --setup: Interactive wizard for formatting/mounting drives (offers benchmarks)
  - --benchmark: Standalone 4K random IOPS testing
  - --info: Display current disk layout
- performance-tune.sh: Kernel and I/O performance optimizations
- README.md: Comprehensive documentation with Quick Start guide
Test plan
- install mode on Hetzner AX102 rescue environment
- harden mode on fresh Ubuntu installation
- --benchmark on real NVMe drives (Micron 2TB, Samsung 512GB)
- --setup wizard flow

🤖 Generated with Claude Code