We often encounter the need to create PDFs based on content. While there is no right or wrong way to generate PDFs, some approaches are more efficient and quicker to build than others.
Previously, we had to write all the boilerplate code to generate PDFs in our applications.
However, now we have many great libraries and tools that can help us quickly implement this feature.
The most important part of generating PDFs is the input data. The most common and useful approach is to generate PDFs from HTML content or based on a website URL.
In this article, we will look at some approaches we can take to generate PDFs from HTML.
Before we move on to the libraries, first let’s see why we prefer HTML as input data for generating PDFs. Some of the reasons are as follows
In summary, generating PDFs from HTML combines the best of both worlds: the flexibility, accessibility, and interactivity of HTML with the portability and standardization of PDFs.
There are many Python libraries that can generate PDFs from HTML content. Some of them are explained below.
When converting HTML to PDF in Python, we need libraries that do not compromise the formatting of the output. The open source libraries below all aim to preserve formatting when generating PDFs from HTML, although they differ in how much CSS and JavaScript they support.
Pyppeteer is a Python port of the Node library Puppeteer, which provides a high-level API over the Chrome DevTools Protocol. It is like running a browser inside your code that can do most of the things your regular browser can do. Pyppeteer can be used to scrape data from websites, take screenshots of a website, and much more. Let’s see how we can use Pyppeteer to generate PDFs from HTML.
First, we need to install pyppeteer with the following command:
pip install pyppeteer
Generate PDF from a website URL
import asyncio
from pyppeteer import launch


async def generate_pdf(url, pdf_path):
    browser = await launch()
    page = await browser.newPage()
    await page.goto(url)
    # Pass the output path so the PDF is written to disk
    await page.pdf({'path': pdf_path})
    await browser.close()


# Run the function
asyncio.get_event_loop().run_until_complete(generate_pdf('https://example.com', 'example.pdf'))
In the above code, the generate_pdf method launches a headless browser, opens a new page, navigates to the given URL, renders the page as a PDF, and finally closes the browser.
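The page.pdf() call accepts an options dictionary that mirrors Puppeteer's PDF options. The sketch below is one possible arrangement, not the only way to do it: it builds such a dictionary and waits for network activity to settle before printing. The helper names build_pdf_options and url_to_pdf, and the margin values, are assumptions made for this illustration.

```python
import asyncio


def build_pdf_options(pdf_path, landscape=False):
    """Return an options dict for pyppeteer's page.pdf()."""
    return {
        "path": pdf_path,          # where the PDF is written
        "format": "A4",            # paper size
        "landscape": landscape,    # page orientation
        "printBackground": True,   # keep CSS background colors and images
        "margin": {"top": "20mm", "bottom": "20mm",
                   "left": "15mm", "right": "15mm"},
    }


async def url_to_pdf(url, pdf_path):
    # Imported here so build_pdf_options can be used without pyppeteer installed.
    from pyppeteer import launch

    browser = await launch()
    try:
        page = await browser.newPage()
        # Wait until the network is idle so late-loading content gets rendered.
        await page.goto(url, {"waitUntil": "networkidle0"})
        await page.pdf(build_pdf_options(pdf_path))
    finally:
        # Close the browser even if navigation or printing fails.
        await browser.close()


# Example usage (requires pyppeteer and a downloadable Chromium):
# asyncio.run(url_to_pdf("https://example.com", "example.pdf"))
```

Closing the browser in a finally block avoids leaving a stray Chromium process behind when a page fails to load.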
Generate PDF from Custom HTML content
import asyncio
from pyppeteer import launch


async def generate_pdf_from_html(html_content, pdf_path):
    browser = await launch()
    page = await browser.newPage()
    await page.setContent(html_content)
    # Pass the output path so the PDF is written to disk
    await page.pdf({'path': pdf_path})
    await browser.close()


# HTML content
html_content = '''
<!DOCTYPE html>
<html>
<head>
    <title>PDF Example</title>
</head>
<body>
    <h1>Hello, world!</h1>
</body>
</html>
'''

# Run the function
asyncio.get_event_loop().run_until_complete(generate_pdf_from_html(html_content, 'from_html.pdf'))
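Above is another Pyppeteer example, this time using our own custom HTML content to generate a PDF. The generate_pdf_from_html method launches a browser, opens a new page, loads the HTML string with setContent instead of navigating to a URL, renders the page as a PDF, and closes the browser.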
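In practice, custom HTML is usually built from application data rather than written by hand. As a minimal sketch using only the standard library, the snippet below fills a small template and escapes the values before the result is handed to a PDF generator. The template, the field names, and render_invoice_html are hypothetical and exist only for this illustration.

```python
from html import escape
from string import Template

# Hypothetical template; the $-placeholders are filled in by render_invoice_html.
INVOICE_TEMPLATE = Template("""<!DOCTYPE html>
<html>
<head><title>Invoice $number</title></head>
<body>
  <h1>Invoice $number</h1>
  <p>Billed to: $customer</p>
</body>
</html>""")


def render_invoice_html(number, customer):
    """Fill the template, escaping values so they cannot inject markup."""
    return INVOICE_TEMPLATE.substitute(
        number=escape(str(number)),
        customer=escape(customer),
    )


html_content = render_invoice_html(1001, "Ada & Co <Ltd>")
print(html_content)
```

The resulting html_content string can be passed to generate_pdf_from_html, or to any of the other libraries covered in this article.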
xhtml2pdf is another Python library that lets you generate PDFs from HTML content. Let’s see xhtml2pdf in action.
Install xhtml2pdf, along with requests, which we will use to fetch web pages, with the following command:
pip install xhtml2pdf requests
To generate PDF from a website URL
Note that xhtml2pdf does not have an in-built feature to parse the URL, but we can use requests in Python to get the content from a URL.
from xhtml2pdf import pisa
import requests


def convert_url_to_pdf(url, pdf_path):
    # Fetch the HTML content from the URL
    response = requests.get(url)
    if response.status_code != 200:
        print(f"Failed to fetch URL: {url}")
        return False

    html_content = response.text

    # Generate PDF
    with open(pdf_path, "wb") as pdf_file:
        pisa_status = pisa.CreatePDF(html_content, dest=pdf_file)

    return not pisa_status.err


# URL to fetch
url_to_fetch = "https://google.com"

# PDF path to save
pdf_path = "google.pdf"

# Generate PDF
if convert_url_to_pdf(url_to_fetch, pdf_path):
    print(f"PDF generated and saved at {pdf_path}")
else:
    print("PDF generation failed")
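In the above code, the convert_url_to_pdf method fetches the page’s HTML with requests and returns False if the request fails. Otherwise, it passes the HTML to pisa.CreatePDF, which writes the PDF to the given path, and returns True when xhtml2pdf reports no errors.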
Generating PDF from custom HTML content
from xhtml2pdf import pisa


def convert_html_to_pdf(html_string, pdf_path):
    with open(pdf_path, "wb") as pdf_file:
        pisa_status = pisa.CreatePDF(html_string, dest=pdf_file)

    return not pisa_status.err


# HTML content
html_content = '''
<!DOCTYPE html>
<html>
<head>
    <title>PDF Example</title>
</head>
<body>
    <h1>Hello, world!</h1>
</body>
</html>
'''

# Generate PDF
pdf_path = "example.pdf"
if convert_html_to_pdf(html_content, pdf_path):
    print(f"PDF generated and saved at {pdf_path}")
else:
    print("PDF generation failed")
Generating a PDF from custom HTML content is similar to what we did for the URL. The only change is that we pass the HTML string directly to pisa.CreatePDF instead of fetching it first, so xhtml2pdf generates the PDF from our custom HTML content.
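Page size and margins in xhtml2pdf are controlled with a CSS @page rule inside the HTML itself. The following sketch wraps an HTML fragment in a document carrying such a rule and converts it in memory rather than writing straight to a file. The helper names with_page_rule and html_to_pdf_bytes are assumptions made for this example, as are the default size and margin.

```python
from io import BytesIO


def with_page_rule(body_html, size="a4", orientation="portrait", margin="2cm"):
    """Wrap an HTML fragment in a document that carries an @page rule."""
    return (
        "<html><head><style>"
        f"@page {{ size: {size} {orientation}; margin: {margin}; }}"
        "</style></head>"
        f"<body>{body_html}</body></html>"
    )


def html_to_pdf_bytes(html_string):
    """Convert HTML to a PDF in memory; returns bytes, or None on error."""
    # Imported here so with_page_rule works without xhtml2pdf installed.
    from xhtml2pdf import pisa

    buffer = BytesIO()
    status = pisa.CreatePDF(html_string, dest=buffer)
    return None if status.err else buffer.getvalue()


document = with_page_rule("<h1>Hello, world!</h1>", orientation="landscape")
print(document)
```

Returning bytes instead of writing a file can be convenient when the PDF is sent in an HTTP response or stored somewhere other than the local disk.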
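python-pdfkit is a Python wrapper for the wkhtmltopdf utility, which converts HTML to PDF using WebKit.
First, we need to install python-pdfkit with pip. Because pdfkit only wraps wkhtmltopdf, the wkhtmltopdf binary must also be installed on the system: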
pip install pdfkit
To generate PDF from website URL
import pdfkit


def convert_url_to_pdf(url, pdf_path):
    try:
        pdfkit.from_url(url, pdf_path)
        print(f"PDF generated and saved at {pdf_path}")
    except Exception as e:
        print(f"PDF generation failed: {e}")


# URL to fetch
url_to_fetch = 'https://example.com'

# PDF path to save
pdf_path = 'example_from_url.pdf'

# Generate PDF
convert_url_to_pdf(url_to_fetch, pdf_path)
pdfkit supports generating PDFs from website URLs out of the box, just like Pyppeteer.
In the above code, pdfkit generates the PDF with a single line of code: pdfkit.from_url is all you need.
Generating PDF from custom HTML content
import pdfkit


def convert_html_to_pdf(html_content, pdf_path):
    try:
        pdfkit.from_string(html_content, pdf_path)
        print(f"PDF generated and saved at {pdf_path}")
    except Exception as e:
        print(f"PDF generation failed: {e}")


# HTML content
html_content = '''
<!DOCTYPE html>
<html>
<head>
    <title>PDF Example</title>
</head>
<body>
    <h1>Hello, world!</h1>
</body>
</html>
'''

# PDF path to save
pdf_path = 'example_from_html.pdf'

# Generate PDF
convert_html_to_pdf(html_content, pdf_path)
To generate a PDF from custom HTML content, we only need to call pdfkit.from_string with the HTML content and a PDF file path.
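Both pdfkit.from_url and pdfkit.from_string accept an options dictionary, which is passed on to wkhtmltopdf as command-line flags, and a configuration object that tells pdfkit where the binary lives. The sketch below shows one way to combine the two and to fail early when wkhtmltopdf is missing. The names find_wkhtmltopdf, PDF_OPTIONS, and html_to_pdf, and the chosen option values, are assumptions for this illustration.

```python
import shutil


def find_wkhtmltopdf():
    """Return the path of the wkhtmltopdf binary, or None if it is not on PATH."""
    return shutil.which("wkhtmltopdf")


# Passed through to wkhtmltopdf as command-line flags.
PDF_OPTIONS = {
    "page-size": "A4",
    "margin-top": "15mm",
    "margin-bottom": "15mm",
    "encoding": "UTF-8",
}


def html_to_pdf(html_content, pdf_path):
    # Imported here so the helpers above work without pdfkit installed.
    import pdfkit

    binary = find_wkhtmltopdf()
    if binary is None:
        raise RuntimeError("wkhtmltopdf is not installed or not on PATH")

    # Point pdfkit at the binary explicitly instead of relying on its lookup.
    config = pdfkit.configuration(wkhtmltopdf=binary)
    pdfkit.from_string(html_content, pdf_path,
                       options=PDF_OPTIONS, configuration=config)
```

Checking for the binary up front tends to give a clearer error message than the one raised when pdfkit cannot find wkhtmltopdf on its own.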
Playwright is a modern and lightweight library for driving a headless browser from your application. It is primarily used for automation testing, thanks to its strong integration with modern browsers. Currently, Playwright supports Chromium (including Chrome and Edge), Firefox, and WebKit, the engine behind Safari. Playwright is cross-platform, cross-browser, and cross-language.
In this article, we will look into how we can convert HTML to PDF in Python using Playwright without losing formatting and quality.
The first step is to install the Playwright library:
pip install playwright
playwright install
The playwright install command downloads the browser binaries that Playwright runs in headless mode to convert HTML to PDF in Python.
Generate PDF from website URL
import asyncio
from playwright.async_api import async_playwright


async def url_to_pdf(url, output_path):
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto(url)
        await page.pdf(path=output_path)
        await browser.close()


# Example usage
url = 'https://google.com'
output_path = 'html-to-pdf-output.pdf'
asyncio.run(url_to_pdf(url, output_path))
In the above code, the url_to_pdf coroutine takes the URL of the website and the output path as input parameters and drives a headless browser asynchronously.
We use chromium.launch() to launch a new instance of the browser and create a new page with browser.new_page().
Once the new page is ready, we load the URL, which is the HTML source in this case, and call page.pdf() to write the PDF to the output path. When everything is done, we call browser.close() to close the browser instance.
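Playwright also ships a synchronous API, and page.pdf() takes keyword arguments for paper format, margins, and backgrounds. The sketch below is one possible variation on the example above; build_pdf_kwargs and url_to_pdf_sync are names invented here, and the margin values are arbitrary. Note that Playwright generates PDFs only with headless Chromium.

```python
def build_pdf_kwargs(output_path, landscape=False):
    """Keyword arguments for Playwright's page.pdf()."""
    return {
        "path": output_path,
        "format": "A4",
        "landscape": landscape,
        "print_background": True,   # keep CSS backgrounds
        "margin": {"top": "20mm", "bottom": "20mm"},
    }


def url_to_pdf_sync(url, output_path):
    # Imported here so build_pdf_kwargs works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Wait for network activity to settle before printing.
        page.goto(url, wait_until="networkidle")
        # Use screen styles instead of the page's print stylesheet.
        page.emulate_media(media="screen")
        page.pdf(**build_pdf_kwargs(output_path))
        browser.close()


# Example usage (requires `playwright install` to have been run):
# url_to_pdf_sync("https://example.com", "example.pdf")
```

The synchronous API can be simpler to call from scripts and from web frameworks that do not run an event loop.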
Generate PDF from custom HTML content
Besides loading HTML from a website URL, Playwright can also generate a PDF from our own custom HTML. To do so, we use the following code:
import asyncio
from playwright.async_api import async_playwright


async def html_to_pdf(html_content, output_path):
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.set_content(html_content)
        await page.pdf(path=output_path)
        await browser.close()


html_content = '''
<!DOCTYPE html>
<html>
<head>
    <title>Sample HTML</title>
</head>
<body>
    <h1>Hello, World!</h1>
    <p>This is a sample HTML content to be converted to PDF.</p>
</body>
</html>
'''

output_path = 'custom-html-to-pdf-output.pdf'
asyncio.run(html_to_pdf(html_content, output_path))
In the above code snippet, we pass our custom HTML content to the html_to_pdf method. Playwright loads that HTML into a new page of the headless browser with set_content and then generates the PDF from it.
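Because the PDF is printed by Chromium, page.pdf() can also add a header and footer to every page. Chromium fills in elements whose class is pageNumber or totalPages. The following is a rough sketch of that idea; FOOTER_TEMPLATE and html_to_pdf_with_footer are names made up for this example, and the styling is arbitrary.

```python
import asyncio

# Chromium replaces the contents of these classes when printing.
FOOTER_TEMPLATE = (
    '<div style="font-size:9px; width:100%; text-align:center;">'
    'Page <span class="pageNumber"></span> of <span class="totalPages"></span>'
    "</div>"
)


async def html_to_pdf_with_footer(html_content, output_path):
    # Imported here so the template above can be reused without Playwright.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.set_content(html_content)
        await page.pdf(
            path=output_path,
            display_header_footer=True,
            header_template="<span></span>",   # empty header
            footer_template=FOOTER_TEMPLATE,
            # Leave room at the bottom so the footer is not clipped.
            margin={"top": "15mm", "bottom": "20mm"},
        )
        await browser.close()


# Example usage:
# asyncio.run(html_to_pdf_with_footer("<h1>Hello, World!</h1>", "with-footer.pdf"))
```

The bottom margin matters here: with no margin, the footer has no space to be drawn in.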
WeasyPrint is a visual rendering engine that converts HTML and CSS into PDFs, focusing on adhering to web standards for printing. It is freely available under a BSD license.
Unlike some other libraries, which rely on browsers such as Chrome or Firefox, WeasyPrint is built on several libraries and does not use a full browser rendering engine.
In this section, we will look into how we can use WeasyPrint to convert HTML to PDF using Python.
pip install WeasyPrint
Generate PDF from website URL
To generate a PDF from a website URL, we will use the HTML class provided by the WeasyPrint library.
from weasyprint import HTML


def url_to_pdf(url, output_path):
    HTML(url).write_pdf(output_path)


# Example usage
url = 'https://google.com'
output_path = 'output_url.pdf'
url_to_pdf(url, output_path)
In the above code snippet, we pass the URL to HTML() and call the .write_pdf() method, which converts the web page to a PDF in a single step.
With WeasyPrint, no browser setup is needed: we just call the method and get a PDF directly from the website URL.
Generate PDF from custom HTML content
from weasyprint import HTML


def html_to_pdf(html_content, output_path):
    HTML(string=html_content).write_pdf(output_path)


html_content = '''
<!DOCTYPE html>
<html>
<head>
    <title>Sample HTML</title>
</head>
<body>
    <h1>Hello, World!</h1>
    <p>This is a sample HTML content to be converted to PDF.</p>
</body>
</html>
'''

output_path = 'output_html.pdf'
html_to_pdf(html_content, output_path)
As with website URLs, we use HTML(), this time with the string argument, to load the custom HTML, and then call .write_pdf() to create the PDF.
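WeasyPrint focuses on print-oriented CSS, so page size and margins can be supplied as an extra stylesheet. When write_pdf() is called without a target, it returns the PDF as bytes. The sketch below combines both ideas; page_css and render_pdf_bytes are hypothetical helper names, and the default size and margin are arbitrary choices.

```python
def page_css(size="A4", margin="2cm"):
    """Build an @page rule for the given paper size and margin."""
    return f"@page {{ size: {size}; margin: {margin}; }}"


def render_pdf_bytes(html_content, base_url=None):
    """Render HTML to PDF bytes with an extra page stylesheet."""
    # Imported here so page_css works without WeasyPrint installed.
    from weasyprint import CSS, HTML

    # base_url lets relative links to images and stylesheets resolve.
    document = HTML(string=html_content, base_url=base_url)
    # With no target, write_pdf() returns the PDF as bytes.
    return document.write_pdf(stylesheets=[CSS(string=page_css())])


print(page_css("Letter", "1in"))
```

Keeping the page rule in a separate stylesheet means the same HTML can be printed at different sizes without editing its markup.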
While all of these tools serve the primary purpose of HTML to PDF conversion, each brings its unique features and approaches to the table.
Below is a detailed tabular comparison to help you choose the right tool for your needs.
Feature/Aspect | Pyppeteer | xhtml2pdf | python-pdfkit | Playwright | WeasyPrint |
---|---|---|---|---|---|
Nature | Browser automation library | HTML/CSS to PDF converter | Wrapper around wkhtmltopdf | Lightweight and Modern Browser automation library | Document factory to create PDF documents easily |
Based On | Puppeteer (Chromium headless) | ReportLab & html5lib | wkhtmltopdf | Playwright (supports Firefox, Chromium, Edge & Safari) | Written in Python and based on various libraries and not a full rendering engine |
Dependencies | Requires Chrome/Chromium | Python libraries | Requires wkhtmltopdf | Cross-browser support | Python libraries |
Language | Python | Python | Python | Python | Python |
JavaScript Support | Yes (full browser environment) | No | Yes (limited, via wkhtmltopdf) | Yes (full browser environment) | No |
CSS Support | Full (as in Chrome) | Limited | Good (via wkhtmltopdf) | Full (as in browser) | Yes |
Performance | May be slower (full browser) | Moderate | Fast (native conversion) | Fast | Fast |
Ease of Setup | Moderate (needs Chromium) | Easy (pure Python) | Moderate (requires wkhtmltopdf) | Easy to set up | Easy to set up |
API Flexibility | High (full browser automation) | Moderate (focused on PDF) | Moderate (wrapper around tool) | One API for cross-browser support | Simple Python API |
Usage | Good for complex web content, SPAs, dynamic JS content | Good for simpler HTML/CSS docs | Common for various HTML to PDF tasks | Good for complex automation testing with support for all modern browsers | Good for creating PDF documents with rich styling |
In conclusion, the choice between Pyppeteer, Playwright, xhtml2pdf, WeasyPrint, and python-pdfkit hinges on the specific needs of a project.
While Pyppeteer excels at handling dynamic content due to its full browser automation capabilities, xhtml2pdf offers a straightforward, Python-centric solution for basic conversions.
Playwright is a serious alternative to Pyppeteer, as it addresses problems that developers face while using Puppeteer. With Playwright, you get a lightweight and fast library that supports all the modern browsers.
WeasyPrint is a developer-friendly Python library that does not make you sweat over configuration and setup. One thing to note is that WeasyPrint is based on various libraries and not a full rendering engine like WebKit or Gecko, so it doesn’t support JavaScript.
Python-pdfkit, wrapping around wkhtmltopdf, stands as a versatile middle ground. Developers should weigh the features, setup complexities, and performance of each library against their project’s demands to determine the best fit.
In summary, if you need to render complex HTML, CSS, and JavaScript with full browser compatibility, consider using Pyppeteer, Playwright, or python-pdfkit. For simpler tasks, xhtml2pdf and WeasyPrint are more suitable options.
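The decision rule above can be written down as a small function. This is only a rough sketch of the guidance in this article, not an authoritative selector: choose_library and its parameters are invented for illustration, and real projects will weigh other factors such as deployment constraints.

```python
def choose_library(needs_javascript=False, rich_css=False, browser_allowed=True):
    """Suggest candidate libraries, following the comparison in this article."""
    if needs_javascript:
        if browser_allowed:
            # Full browser environments run JavaScript like a real browser.
            return ["Playwright", "Pyppeteer"]
        # wkhtmltopdf offers limited JavaScript support without a browser.
        return ["python-pdfkit"]
    if rich_css:
        return ["WeasyPrint", "python-pdfkit"]
    # Simple HTML/CSS documents.
    return ["xhtml2pdf", "WeasyPrint"]


print(choose_library(needs_javascript=True))
print(choose_library(rich_css=True))
print(choose_library())
```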
The examples above show how we can use libraries to convert HTML and web pages to PDF. But when it comes to generating PDFs from templates or keeping track of generated PDFs, we need to do a lot of extra work.
We need to build our own tracker for the generated files. And if we want to use custom templates, such as for an invoice generator, we need to create and manage those templates ourselves.
APITemplate.io is an API-based platform for PDF generation, perfect for the use cases mentioned. Our PDF generation API uses a Chromium-based rendering engine, which fully supports JavaScript, CSS, and HTML.
Let’s see how we can use APITemplate.io to handle PDF generation.
APITemplate.io allows you to manage your templates. Go to Manage Templates from the dashboard.
From Manage Templates, you can create your own templates. The following is a sample invoice template. There are many templates available that you can choose from and customize based on your requirements.
To start using the APITemplate.io APIs, you need an API key, which you can get from the API Integration tab.
Now that you have your APITemplate account ready, let’s put it into action and integrate it with our application. We will be using the template to generate PDFs.
import requests
import json

# Initialize HTTP client
client = requests.Session()

# API URL
url = "https://rest.apitemplate.io/v2/create-pdf?template_id=YOUR_TEMPLATE_ID"

# Payload data
payload = {
    "date": "15/05/2022",
    "invoice_no": "435568799",
    "sender_address1": "3244 Jurong Drive",
    "sender_address2": "Falmouth Maine 1703",
    "sender_phone": "255-781-6789",
    "sender_email": "[email protected]",
    "rece_addess1": "2354 Lakeside Drive",
    "rece_addess2": "New York 234562",
    "rece_phone": "34333-84-223",
    "rece_email": "[email protected]",
    # One dictionary per invoice line item, with the fields your template expects
    "items": [],
    "total": "total",
    "footer_email": "[email protected]",
}

# Serialize payload to JSON
json_payload = json.dumps(payload)

# Set headers
headers = {
    "X-API-KEY": "YOUR_API_KEY",
    "Content-Type": "application/json",
}

# Make the POST request
response = client.post(url, data=json_payload, headers=headers)

# Read the response
response_string = response.text

# Print the response
print(response_string)
If we check response_string, we see the JSON response returned by the API.
As the code above shows, converting HTML to PDF with APITemplate is easy because we don’t need to install any PDF library. We just call one API with our data as the request body, and that’s it!
You can use the download_url from the response to download or distribute the generated PDF.
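As a minimal sketch of that last step, the snippet below reads download_url from the JSON response and saves the file locally. Only the download_url field is taken from the text above. The helper names, the timeout value, and the sample response body are assumptions for illustration.

```python
import json


def extract_download_url(response_text):
    """Return the download_url field from the API's JSON response."""
    body = json.loads(response_text)
    url = body.get("download_url")
    if not url:
        raise ValueError(f"No download_url in response: {body}")
    return url


def download_pdf(download_url, pdf_path):
    """Fetch the generated PDF and write it to pdf_path."""
    # Imported here so extract_download_url works without requests installed.
    import requests

    response = requests.get(download_url, timeout=30)
    response.raise_for_status()
    with open(pdf_path, "wb") as pdf_file:
        pdf_file.write(response.content)


# Hypothetical response body, for illustration only.
sample = '{"download_url": "https://example.com/file.pdf"}'
print(extract_download_url(sample))
```

With the earlier example, this would be called as download_pdf(extract_download_url(response_string), "invoice.pdf").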
APITemplate also supports generating PDFs from website URLs.
import requests, json


def main():
    api_key = "YOUR_API_KEY"
    template_id = "YOUR_TEMPLATE_ID"

    data = {
        "url": "https://en.wikipedia.org/wiki/Sceloporus_malachiticus",
        "settings": {
            "paper_size": "A4",
            "orientation": "1",
            "header_font_size": "9px",
            "margin_top": "40",
            "margin_right": "10",
            "margin_bottom": "40",
            "margin_left": "10",
            "print_background": "1",
            "displayHeaderFooter": True,
            # Replace the placeholders with your own header and footer HTML
            "custom_header": "YOUR_HEADER_HTML",
            "custom_footer": "YOUR_FOOTER_HTML"
        }
    }

    response = requests.post(
        "https://rest.apitemplate.io/v2/create-pdf-from-url",
        headers={"X-API-KEY": api_key},
        json=data
    )


if __name__ == "__main__":
    main()
In the above code, we can provide the URL in the request body along with the settings for the PDF. APITemplate will use this request body to generate a PDF and will return a download URL for your PDF.
If you want to generate PDFs using your own custom HTML content, APITemplate supports that as well.
import requests, json def main(): api_key = "YOUR_API_KEY" template_id = "YOUR_TEMPLATE_ID" data = < "body": "hello world >
", "css": "", "data": < "name": "This is a title" >, "settings": < "paper_size": "A4", "orientation": "1", "header_font_size": "9px", "margin_top": "40", "margin_right": "10", "margin_bottom": "40", "margin_left": "10", "print_background": "1", "displayHeaderFooter": true, "custom_header": "