A data journalism project documenting the US war on bitcoin, Operation Chokepoint 2.0, and the campaign of crypto-related 18 USC 1960 prosecutions in the USA.
A comprehensive system for scraping, analyzing, and exploring Department of Justice press releases, with a focus on 18 USC 1960 (money transmission) and cryptocurrency-related cases.
This project is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0). See the LICENSE file for details.
You are free to share and adapt the material, under the terms of Attribution and ShareAlike.
1. Clone the repository:
```bash
git clone <repository-url>
cd ocp2-project
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Set up environment variables:
```bash
cp env.example .env
```
4. Edit `.env` and add your Venice AI API key:

```bash
VENICE_API_KEY=your_actual_api_key_here
```
The following environment variables can be configured in `.env`:

- `VENICE_API_KEY`: Your Venice AI API key (required)
- `DATABASE_NAME`: Database filename (default: `doj_cases.db`)
- `FLASK_DEBUG`: Enable Flask debug mode (default: `False`)
- `FLASK_HOST`: Flask server host (default: `0.0.0.0`)
- `FLASK_PORT`: Flask server port (default: `5000`)
- `FILE_SERVER_PORT`: File server port (default: `8000`)
- `FILE_SERVER_DIRECTORY`: Directory to serve files from (default: `.`)
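For reference, a fully specified `.env` using the documented defaults would look like this (the API key value is a placeholder):

```bash
VENICE_API_KEY=your_actual_api_key_here
DATABASE_NAME=doj_cases.db
FLASK_DEBUG=False
FLASK_HOST=0.0.0.0
FLASK_PORT=5000
FILE_SERVER_PORT=8000
FILE_SERVER_DIRECTORY=.
```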
First, populate the database with cases that mention our keywords of interest.
```bash
python scraper.py
```
The scraper fetches DOJ press releases that match the project's keywords of interest and stores them in the `cases` table in the database.

Next, use the AI-powered enrichment script to perform detailed data extraction on the scraped cases. This script populates a series of relational tables with structured data pulled from the press release text.
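Before running enrichment, you can sanity-check that the scrape populated the database. A minimal check with the standard `sqlite3` CLI, assuming the default database filename:

```bash
# Count the scraped press releases in the cases table
sqlite3 doj_cases.db "SELECT COUNT(*) FROM cases;"
```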
You must specify which table you want to populate using the `--table` argument.
```bash
# Example: Enrich data for the 'case_metadata' table
python enrich_cases.py --table case_metadata

# Example: Enrich data for the 'participants' table for up to 10 cases
python enrich_cases.py --table participants --limit 10

# Example: Run with verbose logging to see detailed processing
python enrich_cases.py --table case_metadata --limit 5 --verbose

# Example: Set up database tables only (no processing)
python enrich_cases.py --setup-only
```
Available tables for enrichment are:

- `case_metadata`
- `participants`
- `case_agencies`
- `charges`
- `financial_actions`
- `victims`
- `quotes`
- `themes`
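To run enrichment across every table in one pass, a simple shell loop over the table names works. This is a sketch; the `--limit` value here is arbitrary:

```bash
# Enrich each satellite table in sequence
for table in case_metadata participants case_agencies charges \
             financial_actions victims quotes themes; do
    python enrich_cases.py --table "$table" --limit 10
done
```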
For better maintainability and testing, use the modular enrichment script:
```bash
# Example: Enrich data for the 'case_metadata' table using modular architecture
python enrich_cases_modular.py --table case_metadata

# Example: Enrich data for the 'participants' table for up to 10 cases
python enrich_cases_modular.py --table participants --limit 10

# Example: Run with verbose logging to see detailed processing
python enrich_cases_modular.py --table case_metadata --limit 5 --verbose

# Example: Set up database tables only (no processing)
python enrich_cases_modular.py --setup-only
```
The verification script checks if cases are related to 18 USC 1960 violations:
```bash
# Example: Verify up to 10 cases
python 1960-verify.py --limit 10

# Example: Run with verbose logging
python 1960-verify.py --limit 5 --verbose

# Example: Dry run to test without making changes
python 1960-verify.py --dry-run --limit 3
```
For better maintainability, use the modular verification script:
```bash
# Example: Verify up to 10 cases using modular architecture
python 1960-verify_modular.py --limit 10

# Example: Run with verbose logging
python 1960-verify_modular.py --limit 5 --verbose

# Example: Dry run to test without making changes
python 1960-verify_modular.py --dry-run --limit 3
```
To launch the Flask web application:

```bash
python app.py
```

Visit http://localhost:5000 to access the web interface.
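Because the host and port come from the configuration variables above, you can override them at launch without editing `.env` (a sketch, assuming values already set in the shell take precedence):

```bash
# Run the web interface on a different interface and port
FLASK_HOST=127.0.0.1 FLASK_PORT=8080 python app.py
```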
The web interface includes an enrichment dashboard.
To start the simple file server:

```bash
python file_server.py
```

This serves files from the current directory on port 8000.
```
ocp2-project/
├── scraper.py                  # DOJ press release scraper
├── enrich_cases.py             # AI-powered data extraction and enrichment (legacy)
├── enrich_cases_modular.py     # AI-powered data extraction (modular architecture)
├── run_enrichment.py           # Batch enrichment runner with verbose support
├── 1960-verify.py              # Legacy AI verification script
├── 1960-verify_modular.py      # AI verification script (modular architecture)
├── app.py                      # Flask web application
├── file_server.py              # Simple file server
├── doj_cases.db                # SQLite database
├── requirements.txt            # Python dependencies
├── .env                        # Environment variables (create from env.example)
├── .gitignore                  # Git ignore rules
├── env.example                 # Environment template
├── LICENSE                     # CC BY-SA 4.0 license
├── CHANGELOG.md                # Development history
├── templates/                  # Flask templates
│   ├── base.html
│   ├── index.html
│   ├── cases.html
│   ├── case_detail.html
│   └── enrichment.html
├── utils/                      # Core utilities (modular architecture)
│   ├── __init__.py
│   ├── config.py               # Centralized configuration management
│   ├── database.py             # Database connection and schema management
│   ├── api_client.py           # Venice API client abstraction
│   ├── json_parser.py          # JSON parsing utilities
│   └── logging_config.py       # Logging setup and configuration
├── modules/                    # Domain-specific modules (modular architecture)
│   ├── __init__.py
│   ├── enrichment/             # Enrichment domain logic
│   │   ├── __init__.py
│   │   ├── prompts.py          # AI prompt templates
│   │   ├── schemas.py          # Database schema definitions
│   │   └── storage.py          # Data storage operations
│   └── verification/           # Verification domain logic
│       ├── __init__.py
│       └── classifier.py       # 1960 verification logic
├── orchestrators/              # Process orchestration (modular architecture)
│   ├── __init__.py
│   ├── enrichment_orchestrator.py    # Enrichment process coordination
│   └── verification_orchestrator.py  # Verification process coordination
├── docs/                       # Documentation
│   ├── modularization-plan.md  # Modularization architecture plan
│   ├── modularization-todos.md # Implementation TODOs and constraints
│   └── project-plan.md         # Original project planning document
└── README.md                   # This file
```
The project has been refactored into a clean, modular architecture with the following layers:

- `utils/`: Core utilities (configuration, database access, Venice API client, JSON parsing, logging)
- `modules/`: Domain-specific logic for enrichment and verification
- `orchestrators/`: Process coordination for enrichment and verification
- Entry points: `enrich_cases_modular.py` and `1960-verify_modular.py`
The project uses a relational database schema to store extracted data. The main `cases` table holds the raw press release content, and several satellite tables hold the structured data extracted by the AI.
```
cases (Primary Table)
│
├─ case_metadata            (1-to-1: Core details like district, judge, case number)
├─ participants             (1-to-many: Defendants, prosecutors, agents, etc.)
├─ case_agencies            (1-to-many: Investigating agencies like FBI, IRS-CI)
├─ charges                  (1-to-many: Specific legal charges and statutes)
├─ financial_actions        (1-to-many: Forfeitures, fines, restitution amounts)
├─ victims                  (1-to-many: Details about victims mentioned)
├─ quotes                   (1-to-many: Pull-quotes from officials)
├─ themes                   (1-to-many: Thematic tags like 'romance_scam', 'darknet')
└─ enrichment_activity_log  (Audit trail of enrichment operations)
```
- `cases`: Stores the original press release data (title, date, body, URL).
- Satellite tables: Each links back to the `cases` table via a `case_id` and contains specific, structured fields extracted by the AI from the press release text. This relational model allows for complex queries and detailed analysis.
- `enrichment_activity_log`: Tracks all enrichment operations with timestamps, status, and notes for debugging and monitoring.
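Because every satellite table carries a `case_id` back-reference, the enriched data can be queried directly with the `sqlite3` CLI. A minimal sketch; the `district` column name is an assumption based on the schema notes above:

```bash
# Count enriched cases per district (column name is illustrative)
sqlite3 doj_cases.db "SELECT district, COUNT(*) AS case_count
                      FROM case_metadata
                      GROUP BY district
                      ORDER BY case_count DESC;"
```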