SebastianZzzz/HTML2PDF

HTML2PDF 📄

A robust, headless browser-based CLI tool to convert HTML files, online URLs, and entire directories of HTML documents into high-quality PDFs.

Built with Playwright to ensure that complex CSS, background colors, and asynchronous JavaScript content are fully rendered before PDF generation.

✨ Features

  • Single URL Conversion: Convert any live web page to a PDF.
  • Local File Conversion: Convert locally saved HTML files with full asset rendering.
  • Batch Processing: Point the tool at a directory, and it will recursively find and convert all .html and .htm files.
  • High Fidelity: Retains background colors and images, and waits for network activity to settle (Playwright's networkidle state) to reduce the risk of missing assets.
  • Memory Efficient: Creates fresh page contexts for batch processing to prevent memory leaks during heavy workloads.

🚀 Installation

1. Clone the repository

git clone https://github.com/SebastianZzzz/HTML2PDF.git
cd HTML2PDF

2. Install Python Dependencies

It is recommended to use a virtual environment.

pip install -r requirements.txt

3. Install Playwright Browsers

HTML2PDF requires the Chromium browser binary to render pages.

playwright install chromium

💻 Usage

HTML2PDF is operated via a Command Line Interface (CLI).
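As a rough orientation, the flags used in this section (-u, -f, -b, -o) could be wired up with argparse along the following lines. This is a minimal sketch, not the actual contents of HTML2PDF.py: the function name build_parser, the help strings, and the choices to make the three input flags mutually exclusive and -o required are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        prog="HTML2PDF.py",
        description="Convert a URL, an HTML file, or a directory of HTML files to PDF.",
    )
    # Assumption: exactly one input source may be given per run.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-u", "--url", help="web address to convert")
    source.add_argument("-f", "--file", help="local HTML file to convert")
    source.add_argument("-b", "--batch", help="directory containing HTML files")
    # Assumption: -o is mandatory; it is a file path, or a directory with --batch.
    parser.add_argument("-o", "--output", required=True, help="output PDF path or directory")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["-u", "https://github.com", "-o", "github_homepage.pdf"])
print(args.url, args.output)  # prints: https://github.com github_homepage.pdf
```

With this layout, passing two input flags at once (for example -u together with -f) makes argparse exit with a usage error rather than silently picking one.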

1. Convert an Online URL

Use the -u (or --url) flag to specify the web address, and -o (or --output) for the output PDF path.

python HTML2PDF.py -u "https://github.com" -o "github_homepage.pdf"

2. Convert a Single Local HTML File

Use the -f (or --file) flag to specify the local file path.

python HTML2PDF.py -f "./sample.html" -o "local_output.pdf"
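One common way to get relative stylesheets and images to load for a local file is to hand the browser a file:// URL rather than the raw HTML string, so that relative links resolve against the file's own directory. The helper below is a sketch of that step only; the name local_html_to_url is hypothetical, and whether HTML2PDF.py takes this route is an assumption.

```python
from pathlib import Path


def local_html_to_url(path: str) -> str:
    """Return a file:// URL for a local HTML file (hypothetical helper)."""
    html_path = Path(path).expanduser().resolve()
    if not html_path.is_file():
        raise FileNotFoundError(f"HTML file not found: {html_path}")
    # as_uri() needs an absolute path, which resolve() guarantees.
    return html_path.as_uri()
```

The returned URL can then be loaded exactly like an online address, which keeps the URL and local-file code paths identical after this point.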

3. Batch Convert a Directory

Use the -b (or --batch) flag to specify the input directory containing HTML files. In this mode, -o specifies the output directory rather than a single PDF path.

python HTML2PDF.py -b "./html_inputs" -o "./pdf_outputs"

Note: If the output directory does not exist, the script will create it automatically.
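The batch behavior described above (recursive discovery of .html and .htm files, plus automatic creation of the output directory) can be sketched with pathlib. The function name plan_batch and the decision to mirror the input sub-directory layout under the output directory are assumptions made for illustration; the real script may name or place its PDFs differently.

```python
from pathlib import Path

HTML_SUFFIXES = {".html", ".htm"}


def plan_batch(input_dir: str, output_dir: str) -> list[tuple[Path, Path]]:
    """Pair every HTML file under input_dir with a target PDF path (hypothetical sketch)."""
    src_root = Path(input_dir)
    dst_root = Path(output_dir)
    # Create the output directory if it does not exist yet.
    dst_root.mkdir(parents=True, exist_ok=True)

    jobs = []
    for src in sorted(src_root.rglob("*")):
        if src.is_file() and src.suffix.lower() in HTML_SUFFIXES:
            # Assumption: keep the relative layout, swapping the extension for .pdf.
            dst = dst_root / src.relative_to(src_root).with_suffix(".pdf")
            dst.parent.mkdir(parents=True, exist_ok=True)
            jobs.append((src, dst))
    return jobs
```

Each (source, target) pair returned here would then be handed to the rendering step. Note that under this naming scheme page.html and page.htm in the same folder would map to the same PDF, so a real implementation may need a tie-breaking rule.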

⚙️ How it Works

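Unlike traditional converters (e.g., wkhtmltopdf), HTML2PDF spins up a headless Chromium instance through Playwright. It loads each page and waits for the networkidle load state, which Playwright reaches once there have been no network connections for at least 500 ms, so external CSS, CDN images, and asynchronous JavaScript payloads have normally finished loading before the layout is captured as a vector PDF.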
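The rendering flow described above can be approximated with Playwright's synchronous Python API. This is a sketch under stated assumptions, not the project's actual code: the names render_jobs and PDF_OPTIONS, the A4 page format, and the 30-second timeout are illustrative choices. It reuses one browser and opens a fresh page per document, which is one plausible reading of the "fresh page contexts" feature.

```python
from pathlib import Path

# Options forwarded to Playwright's page.pdf(); print_background keeps CSS backgrounds.
# Assumption: A4 output. The real tool may use a different page format.
PDF_OPTIONS = {"format": "A4", "print_background": True}


def render_jobs(jobs, timeout_ms: int = 30_000) -> None:
    """Render (url, output_path) pairs to PDF with headless Chromium (hypothetical sketch)."""
    # Imported here so the module can be loaded without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            for url, output_path in jobs:
                Path(output_path).parent.mkdir(parents=True, exist_ok=True)
                page = browser.new_page()  # fresh page per document
                try:
                    # Wait until the network has been quiet before printing.
                    page.goto(url, wait_until="networkidle", timeout=timeout_ms)
                    page.pdf(path=str(output_path), **PDF_OPTIONS)
                finally:
                    page.close()  # release the page even if rendering fails
        finally:
            browser.close()
```

A single conversion would be a one-element list, for example render_jobs([("https://github.com", "github_homepage.pdf")]). Pages that keep a connection open indefinitely (long polling, streaming) may never reach networkidle, in which case the goto call raises a timeout error after timeout_ms.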

📄 License

This project is released under the MIT License.
