Remove advertisement segments from episodes of The Rest Is History podcast, then serve an RSS feed and the ad-free episodes using Caddy. This project started as more of an exploration than anything else, but it works well.
The methods employed are a fast Fourier transform (using scipy) and gap analysis (using librosa). The implementation is tailored to this podcast and is not particularly generalizable to other broadcasts.
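To illustrate the FFT step, here is a minimal sketch of FFT-based cross-correlation that finds where a known jingle occurs in an episode. It assumes both signals are mono float arrays at the same sample rate, and that a jingle clip (as in the --jingles directory) marks an ad boundary; that role is an assumption. The function name `locate_jingle` is hypothetical, and the sketch uses `numpy.fft` directly, whereas the project uses scipy (`scipy.signal.correlate(..., method="fft")` computes the same correlation).

```python
import numpy as np


def locate_jingle(episode: np.ndarray, jingle: np.ndarray, sr: int) -> tuple[float, float]:
    """Return (offset_seconds, normalized_score) of the best match of `jingle` in `episode`.

    Hypothetical helper: cross-correlation computed as IFFT(FFT(episode) * conj(FFT(jingle))).
    """
    if len(jingle) > len(episode):
        raise ValueError("jingle must not be longer than the episode")
    # Zero-pad to at least len(episode) + len(jingle) - 1 samples so the
    # circular correlation produced by the FFT does not wrap around.
    n = len(episode) + len(jingle) - 1
    n_fft = 1 << (n - 1).bit_length()  # next power of two
    spec_episode = np.fft.rfft(episode, n_fft)
    spec_jingle = np.fft.rfft(jingle, n_fft)
    corr = np.fft.irfft(spec_episode * np.conj(spec_jingle), n_fft)
    # Keep only non-negative lags where the jingle fits entirely inside the episode.
    valid = corr[: len(episode) - len(jingle) + 1]
    lag = int(np.argmax(valid))
    # Normalize the peak by the energy of the jingle and of the matched window,
    # so a value near 1.0 means a near-exact match.
    window = episode[lag : lag + len(jingle)]
    denom = np.linalg.norm(window) * np.linalg.norm(jingle)
    score = float(valid[lag] / denom) if denom > 0 else 0.0
    return lag / sr, score
```

Picking the raw correlation peak can favor loud passages; a fuller implementation might normalize every lag (normalized cross-correlation) and apply a score threshold before cutting anything.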
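For the gap-analysis step, the sketch below finds stretches of near-silence, which can help snap a cut to a natural pause. It reproduces the idea behind librosa's `top_db` threshold (frames quieter than `top_db` below the loudest frame count as silent) using plain numpy, so it is not the project's librosa code; `librosa.effects.split` offers a similar split into non-silent intervals. The function `find_gaps` and its defaults, including `min_gap_s`, are hypothetical.

```python
import numpy as np


def find_gaps(y: np.ndarray, sr: int, frame_length: int = 2048, hop_length: int = 512,
              top_db: float = 40.0, min_gap_s: float = 0.5) -> list[tuple[float, float]]:
    """Return (start_s, end_s) spans whose frame RMS is `top_db` below the loudest frame."""
    if len(y) < frame_length:
        return []
    # Slice the signal into overlapping frames and compute per-frame RMS energy.
    frames = np.lib.stride_tricks.sliding_window_view(y, frame_length)[::hop_length]
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    ref = rms.max()
    if ref == 0:
        return [(0.0, len(y) / sr)]  # the whole signal is silent
    # Express each frame's loudness in dB relative to the loudest frame.
    db = 20 * np.log10(np.maximum(rms, 1e-10) / ref)
    silent = db < -top_db
    # Collect runs of consecutive silent frames as [start_frame, end_frame).
    runs, start = [], None
    for i, is_silent in enumerate(silent):
        if is_silent and start is None:
            start = i
        elif not is_silent and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(silent)))
    # Convert frame indices to seconds and drop gaps too short to matter.
    gaps = []
    for s, e in runs:
        t0 = s * hop_length / sr
        t1 = ((e - 1) * hop_length + frame_length) / sr
        if t1 - t0 >= min_gap_s:
            gaps.append((t0, t1))
    return gaps
```

For example, two seconds of tone, one second of silence, and two more seconds of tone at `sr=1000` (with `frame_length=100`, `hop_length=50`) yield a single gap from 2.0 s to 3.0 s.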
This code is intended for personal use only and is not meant to harm any creators in any way.
Because my favorite podcast app does not support authentication, the following measures make the files extremely difficult to find for anyone but the intended audience.
Note that ${DL_PATH_PARENT} is effectively a password and should be treated as one.
- The Caddy file server serves a root directory that contains only one subdirectory.
- That subdirectory is where the episodes are stored, and it must be named after the ${DL_PATH_PARENT} environment variable.
- The name of the episode directory is practically impossible to guess (e.g., a UUID).
- Because the directory name is obscure, the file names can be ordinary.
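One way to set up that layout is sketched below: it creates the single, unguessable episode directory under the served root and returns its name, which becomes the value of ${DL_PATH_PARENT}. The helper name `create_episode_dir` is hypothetical; the project may expect you to create the directory by hand.

```python
import uuid
from pathlib import Path


def create_episode_dir(root: Path) -> str:
    """Create one hard-to-guess subdirectory under `root` and return its name."""
    # uuid4() carries 122 random bits, so the name works as a shared secret.
    name = str(uuid.uuid4())
    # exist_ok=False guards against silently reusing an existing directory.
    (root / name).mkdir(parents=True, exist_ok=False)
    return name
```

Calling `create_episode_dir(Path("/srv/podcast"))` (an example path) and exporting the result as DL_PATH_PARENT keeps the served root down to exactly one subdirectory.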
For example:
/                           (root served by Caddy)
└── bingo-bango-bongo-uuid
    ├── feed.xml
    └── episode_1.mp3
The resulting feed URL is http(s)://${DOMAIN}/${DL_PATH_PARENT}/feed.xml
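The URL pattern above can be expressed as a small function. This sketch mirrors the --https, --domain, and --dl-path-parent options; the helper names `public_url`, `feed_url`, and `episode_url` are hypothetical, not the project's API.

```python
from urllib.parse import quote


def public_url(domain: str, dl_path_parent: str, filename: str, https: bool = False) -> str:
    """Build http(s)://${DOMAIN}/${DL_PATH_PARENT}/<filename>."""
    scheme = "https" if https else "http"  # http unless --https is given
    # --domain must not include a path; strip stray slashes defensively.
    return f"{scheme}://{domain.strip('/')}/{quote(dl_path_parent.strip('/'))}/{quote(filename)}"


def feed_url(domain: str, dl_path_parent: str, https: bool = False) -> str:
    """URL to paste into the podcast app."""
    return public_url(domain, dl_path_parent, "feed.xml", https)


def episode_url(domain: str, dl_path_parent: str, filename: str, https: bool = False) -> str:
    """URL of one processed episode in the same secret directory."""
    return public_url(domain, dl_path_parent, filename, https)
```

For example, `feed_url("example.com", "bingo-bango-bongo-uuid", https=True)` gives `https://example.com/bingo-bango-bongo-uuid/feed.xml`.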
Do not enable the browse option of the file_server directive in the Caddyfile. If you do, the super-secret directory name will be visible to all prying eyes 👀.
A command-line interface is provided to run the script, and each argument has a corresponding environment variable. See the docker-compose file for an example of how to set these variables.
Usage: python -m rm_ads [OPTIONS]

  Download and process the latest episodes from the podcast RSS feed.

Options:
  --log-level [DEBUG|INFO|WARNING|ERROR|CRITICAL]
                                  Set log level.
  --log-path PATH                 File path to write logs to. If a directory
                                  is provided, rotating log files will be
                                  created there.
  --processed DIRECTORY           Directory where episodes are saved.
                                  [required]
  --jingles DIRECTORY             [required]
  --max-episodes INTEGER          The maximum number of episodes to process
                                  right now. Episodes are processed in order
                                  of publish date (newest first).
  --max-on-disk INTEGER           The maximum number of episodes to have saved
                                  in the processed directory. Episodes are
                                  prioritized by publish date (newest first).
  --feed-url TEXT                 URL of the podcast RSS feed. [required]
  --https                         Whether to use https for the replaced links.
                                  If not provided, http will be used.
  --domain TEXT                   Domain where you will be hosting the files.
                                  Do not include the path. [required]
  --dl-path-parent TEXT           Parent directory of the download directory
                                  to use in feed urls. This is the super-
                                  secret path.
  --run-interval INTEGER          How often to run the script in minutes. If
                                  not provided or value <0, the script will
                                  run once.
  --version                       Show the version and exit.
  --help                          Show this message and exit.
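The pairing of each option with an environment variable can be sketched with the standard library's argparse. The help text above resembles click's output, and click supports this directly through an option's `envvar` setting, so treat this as an illustration of the idea rather than the project's code. Only DOMAIN and DL_PATH_PARENT appear in this README; the other names follow an assumed "upper-case, dashes to underscores" convention, and only a few options are shown.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser where each option falls back to an environment variable."""
    env = os.environ.get
    parser = argparse.ArgumentParser(prog="python -m rm_ads")
    # Assumed convention: --dl-path-parent <-> DL_PATH_PARENT, --max-on-disk <-> MAX_ON_DISK.
    parser.add_argument("--domain", default=env("DOMAIN"))
    parser.add_argument("--dl-path-parent", default=env("DL_PATH_PARENT"))
    # argparse applies `type` to string defaults, so "5" from the environment becomes 5.
    parser.add_argument("--max-on-disk", type=int, default=env("MAX_ON_DISK"))
    parser.add_argument("--run-interval", type=int, default=env("RUN_INTERVAL", -1))
    parser.add_argument("--https", action="store_true",
                        default=env("HTTPS", "").lower() in {"1", "true", "yes"})
    return parser
```

An explicit command-line flag always overrides the environment, which matches the usual expectation for this pattern.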
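The --max-on-disk rule (keep the newest episodes by publish date) can be sketched as follows. It assumes the publish date of each processed file is already known, for example from the source feed; the function `prune_episodes` is hypothetical.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def prune_episodes(episodes: dict[Path, datetime], max_on_disk: Optional[int]) -> list[Path]:
    """Delete all but the `max_on_disk` newest episodes and return the deleted paths."""
    if max_on_disk is None:
        return []  # option not given: keep everything
    # Newest first, matching "prioritized by publish date (newest first)".
    ranked = sorted(episodes, key=episodes.__getitem__, reverse=True)
    to_delete = ranked[max(max_on_disk, 0):]
    for path in to_delete:
        path.unlink(missing_ok=True)
    return to_delete
```

After pruning, the feed.xml in the same directory would also need to drop the deleted items so the podcast app does not list missing files.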
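The --run-interval behavior can be sketched as a simple loop: a missing or negative value runs the job once, and otherwise the job repeats every N minutes. The function `run` and its `max_runs` and `sleep` parameters are hypothetical hooks added so the loop can be exercised without waiting.

```python
import time
from typing import Callable, Optional


def run(job: Callable[[], None], run_interval: Optional[int],
        max_runs: Optional[int] = None, sleep: Callable[[float], None] = time.sleep) -> int:
    """Run `job` once, or every `run_interval` minutes when run_interval >= 0.

    Returns the number of times `job` ran.
    """
    runs = 0
    while True:
        job()
        runs += 1
        # Per the help text: not provided or value < 0 means run once.
        if run_interval is None or run_interval < 0:
            return runs
        if max_runs is not None and runs >= max_runs:
            return runs
        sleep(run_interval * 60)  # minutes to seconds
```

Sleeping a fixed interval after each run lets the schedule drift by the job's duration; that is usually acceptable for polling a podcast feed.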