A robust, headless browser-based CLI tool to convert HTML files, online URLs, and entire directories of HTML documents into high-quality PDFs.
Built with Playwright to ensure that complex CSS, background colors, and asynchronous JavaScript content are fully rendered before PDF generation.
- Single URL Conversion: Convert any live web page to a PDF.
- Local File Conversion: Convert locally saved HTML files with full asset rendering.
- Batch Processing: Point the tool at a directory, and it will recursively find and convert all `.html` and `.htm` files.
- High Fidelity: Retains background colors and images, and waits for network requests to finish (`networkidle`) to prevent missing assets.
- Memory Efficient: Creates fresh page contexts for batch processing to prevent memory leaks during heavy workloads.
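The batch discovery step in the feature list can be sketched with `pathlib`. This is a minimal illustration, not HTML2PDF's actual code. The helper names `find_html_files` and `pdf_path_for` are hypothetical. Two behaviors are also assumptions: matching extensions case-insensitively, and mirroring input subfolders in the output directory.

```python
from __future__ import annotations

from pathlib import Path

# Extensions batch mode picks up; case-insensitive matching is an assumption.
HTML_SUFFIXES = {".html", ".htm"}


def find_html_files(root: str | Path) -> list[Path]:
    """Hypothetical helper: recursively collect .html/.htm files under root.

    Results are sorted so batch runs process files in a stable order.
    """
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in HTML_SUFFIXES
    )


def pdf_path_for(html_file: Path, input_root: Path, output_root: Path) -> Path:
    """Hypothetical helper: map input_root/a/b.html to output_root/a/b.pdf.

    Mirroring subfolders is an assumption about the layout of batch output.
    """
    target = output_root / html_file.relative_to(input_root).with_suffix(".pdf")
    # Create the output directory (and any subfolders) if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    return target
```

With this mapping, `page.html` and `page.htm` in the same folder would both target `page.pdf`. A real implementation would need a rule for that collision.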
```bash
git clone https://github.com/yourusername/HTML2PDF.git
cd HTML2PDF
```

It is recommended to use a virtual environment.

```bash
pip install -r requirements.txt
```

HTML2PDF requires the Chromium browser binary to render pages.

```bash
playwright install chromium
```

HTML2PDF is operated via a Command Line Interface (CLI).
To convert a live web page, use the -u (or --url) flag to specify the web address and -o (or --output) for the output PDF path:

```bash
python HTML2PDF.py -u "https://github.com" -o "github_homepage.pdf"
```

To convert a local HTML file, use the -f (or --file) flag to specify its path:

```bash
python HTML2PDF.py -f "./sample.html" -o "local_output.pdf"
```

To convert a directory of HTML files, use the -b (or --batch) flag to specify the input directory. In batch mode, -o specifies the output directory rather than a single PDF file:

```bash
python HTML2PDF.py -b "./html_inputs" -o "./pdf_outputs"
```

Note: If the output directory does not exist, the script creates it automatically.
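The three input modes and the shared -o flag can be modeled with `argparse`. The following is only a sketch of the documented interface, not the script's real parser. The function name `build_parser` and the help texts are hypothetical. Two behaviors are also assumptions: that exactly one input mode is required, and that -o is mandatory.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical sketch of the documented CLI flags."""
    parser = argparse.ArgumentParser(
        prog="HTML2PDF.py",
        description="Convert URLs, local HTML files, or directories of HTML files to PDF.",
    )
    # Exactly one input mode (assumption): a live URL, a local file, or a batch directory.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-u", "--url", help="web address to convert")
    source.add_argument("-f", "--file", help="path to a local HTML file")
    source.add_argument("-b", "--batch", help="directory searched recursively for .html/.htm files")
    # A PDF file path for -u/-f, or an output directory for -b.
    parser.add_argument("-o", "--output", required=True,
                        help="output PDF path, or output directory with -b")
    return parser
```

For example, `build_parser().parse_args(["-b", "./html_inputs", "-o", "./pdf_outputs"])` sets `args.batch` and leaves `args.url` and `args.file` as `None`. Passing both -u and -f makes `argparse` exit with a usage error.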
Unlike traditional converters (e.g., wkhtmltopdf), HTML2PDF spins up a headless Chromium instance. It waits for the networkidle event, ensuring all external CSS, CDN images, and asynchronous JavaScript payloads are fully loaded before capturing the layout as a vector PDF.
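The rendering approach described above can be sketched with Playwright's synchronous Python API. This is a sketch under stated assumptions, not HTML2PDF's implementation, and the names `to_navigable_url` and `render_to_pdf` are hypothetical. The sketch makes the following choices:

* It launches one browser and opens a fresh browser context per document.
* It loads local files through `file://` URIs.
* It uses A4 paper, which is an assumption. Playwright's own default is Letter.

```python
from __future__ import annotations

from pathlib import Path


def to_navigable_url(target: str) -> str:
    """Hypothetical helper: pass URLs through, turn local paths into file:// URIs.

    Loading via file:// lets relative CSS and images resolve against the file's folder.
    """
    if target.startswith(("http://", "https://", "file://")):
        return target
    return Path(target).resolve().as_uri()


def render_to_pdf(targets: list[tuple[str, str]]) -> None:
    """Hypothetical sketch: render each (source, pdf_path) pair with headless Chromium.

    Requires `pip install playwright` and `playwright install chromium`.
    """
    # Imported here so the URL helper above can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            for source, pdf_path in targets:
                # A fresh, isolated context per document keeps state from piling up in batches.
                context = browser.new_context()
                try:
                    page = context.new_page()
                    # "networkidle": no network connections for at least 500 ms.
                    page.goto(to_navigable_url(source), wait_until="networkidle")
                    # print_background keeps CSS background colors and images in the PDF.
                    page.pdf(path=pdf_path, format="A4", print_background=True)
                finally:
                    context.close()
        finally:
            browser.close()
```

A call such as `render_to_pdf([("https://github.com", "github_homepage.pdf"), ("./sample.html", "local_output.pdf")])` would mirror the two single-file examples above. Playwright's `page.pdf()` is supported only in headless Chromium, which is consistent with the README's requirement to install that browser.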
This project is open-sourced under the MIT License.