Docker

Jailcrawl runs in a Docker container. To run it, first install Docker, then follow the steps below:

Create a file named conf.env in your project root directory and define two variables in it, for example:

BUCKET=jailcrawlsample
FAILURE_SNS_TOPIC=arn:aws:sns:us-east-1:153598194566:jailcrawl_errors

The BUCKET variable names the Amazon S3 bucket where the scraped documents will be stored. The FAILURE_SNS_TOPIC variable is the ARN of the Amazon Simple Notification Service (SNS) topic to which error messages are sent for email delivery.
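As a minimal sketch of how a scraper might read these two settings, assuming conf.env is passed to the container so that both variables appear in the process environment (the README does not say how the file is loaded, and load_settings is a hypothetical helper, not part of jailcrawl):

```python
import os


def load_settings(environ=None):
    """Return the two conf.env settings, failing early if one is missing."""
    environ = os.environ if environ is None else environ
    required = ("BUCKET", "FAILURE_SNS_TOPIC")
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise KeyError("missing settings: " + ", ".join(missing))
    return {name: environ[name] for name in required}


# Example values copied from the conf.env shown above.
example = {
    "BUCKET": "jailcrawlsample",
    "FAILURE_SNS_TOPIC": "arn:aws:sns:us-east-1:153598194566:jailcrawl_errors",
}
settings = load_settings(example)
print(settings["BUCKET"])
```

Checking for both variables up front means a misconfigured container stops immediately rather than partway through a crawl.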

Create a file named credentials in your project root with the following contents, substituting your own AWS credentials:

[default]
aws_access_key_id=AKXXXXX
aws_secret_access_key=dxsQXXXXXX
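The credentials file uses INI syntax, so it can be sanity-checked with Python's standard configparser before building the image. The following is only a sketch; read_default_profile is a hypothetical helper, and the sample text reuses the placeholder values above:

```python
import configparser


def read_default_profile(text):
    """Parse credentials-file text and return the [default] key pair."""
    # Interpolation is disabled so that a "%" in a secret is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    profile = parser["default"]  # raises KeyError if the section is absent
    return profile["aws_access_key_id"], profile["aws_secret_access_key"]


sample = """[default]
aws_access_key_id=AKXXXXX
aws_secret_access_key=dxsQXXXXXX
"""
key_id, secret = read_default_profile(sample)
print(key_id)
```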

Build the Docker image:

docker build -t jailcrawl .

Run the container. There are two modes. To run a single scraper, pass run_one followed by the path to the scraper file:

docker run -i -t jailcrawl run_one ./Arkansas_marion.py

To run all scrapers, pass run_all:

docker run -i -t jailcrawl run_all
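If you want to launch either mode from a script, the two command lines above can be assembled programmatically. This is a sketch only: docker_run_args is a hypothetical helper that builds the argument list and does not invoke Docker itself.

```python
def docker_run_args(mode, scraper=None, image="jailcrawl"):
    """Build the docker argument list for the run_one or run_all mode."""
    base = ["docker", "run", "-i", "-t", image]
    if mode == "run_one":
        if scraper is None:
            raise ValueError("run_one needs the path to a scraper file")
        return base + ["run_one", scraper]
    if mode == "run_all":
        return base + ["run_all"]
    raise ValueError("unknown mode: " + mode)


one = docker_run_args("run_one", "./Arkansas_marion.py")
everything = docker_run_args("run_all")
print(" ".join(one))
print(" ".join(everything))
```

The resulting list can be handed to subprocess.run; note that the -t flag expects a terminal, so it may need to be dropped when running from a non-interactive job.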

Code format

Common files and functions are in jailcrawl/common.py. These include utilities for saving to S3, logging to AWS CloudWatch and persisting errors.
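To illustrate the kind of helpers described here, the sketch below shows one way to save a document to S3 and publish an error to SNS. It is not the actual contents of common.py: save_document, report_failure, the object key, and RecordingClient are all hypothetical. The client arguments are assumed to be boto3 clients (boto3.client("s3") and boto3.client("sns")), whose put_object and publish methods accept the keyword arguments used below; a small stand-in client keeps the example runnable without AWS access.

```python
def save_document(s3_client, bucket, key, body):
    """Upload one scraped document to the configured bucket."""
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)


def report_failure(sns_client, topic_arn, scraper, error):
    """Publish a scraper failure to the SNS topic named in conf.env."""
    sns_client.publish(
        TopicArn=topic_arn,
        Subject="jailcrawl failure: " + scraper,
        Message=str(error),
    )


class RecordingClient:
    """Stand-in for a boto3 client that only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def put_object(self, **kwargs):
        self.calls.append(("put_object", kwargs))

    def publish(self, **kwargs):
        self.calls.append(("publish", kwargs))


client = RecordingClient()
save_document(client, "jailcrawlsample", "Arkansas_marion/page.html", b"<html></html>")
report_failure(
    client,
    "arn:aws:sns:us-east-1:153598194566:jailcrawl_errors",
    "Arkansas_marion.py",
    RuntimeError("timeout"),
)
print([name for name, _ in client.calls])
```

Passing the client in as an argument, rather than creating it inside each function, makes the helpers easy to exercise in tests without touching AWS.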

Working Scrapers

All working scrapers live in the project directory ./working_scrapers/; these are deployed and will be run. For instance, you can run the scraper at ./working_scrapers/Arkansas_greene.py by running docker run -i -t jailcrawl run_one ./Arkansas_greene.py. Scrapers that are not working are in the ./to_fix/ directory.
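The README does not describe how run_all finds the scrapers to execute. As a rough sketch under the assumption that every .py file directly inside ./working_scrapers/ is a scraper, a listing could look like this (list_scrapers is a hypothetical helper, demonstrated on a temporary directory):

```python
import tempfile
from pathlib import Path


def list_scrapers(directory):
    """Return the sorted names of the .py files directly inside directory."""
    return sorted(path.name for path in Path(directory).glob("*.py"))


# Demonstrate on a throwaway directory instead of the real project tree.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("Arkansas_marion.py", "Arkansas_greene.py", "notes.txt"):
        (Path(tmp) / name).touch()
    found = list_scrapers(tmp)

print(found)
```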
