HTML to PDF conversion using Node.js runtime as AWS Lambda Functions

Keyur Bhole
Mar 4, 2020

In this post, I will cover how to generate a PDF from HTML and CSS using the Node.js runtime on AWS Lambda.

Generating PDFs on the server side can be hard, since most libraries require low-level executables, fonts and other dependencies. This is especially true in a serverless architecture, where you cannot install low-level executables and dependencies at runtime. I spent two days finding a clean solution, so let’s get started.

We will use the html-pdf library to convert HTML to PDF. html-pdf uses PhantomJS internally, so this executable must be installed on your system. Follow the steps at https://phantomjs.org/download.html to install PhantomJS.

Prerequisite:

  1. Node.js
  2. Install Serverless globally
npm i -g serverless

Creating Lambda Layers:

AWS Lambda manages memory, CPU, network and other resources for you. This comes in exchange for flexibility: you cannot log in to compute instances or customize the operating system on the provided runtimes. In short, you cannot install PhantomJS at runtime.

To solve this problem, we will create a Lambda layer containing the necessary low-level binaries, fonts and the compiled PhantomJS executable.

It is best to keep the layer and the function in separate stacks, so that each can be deployed independently.

Steps to create Layer:

mkdir layers
cd layers/
mkdir executables/

Download all the executables and copy the files into the executables folder. Then create a serverless.yml file in the layers folder that defines the layer. You can find more information about configuring layers in the Serverless Layer Guide.

touch serverless.yml

Then deploy the layer

sls deploy --stage dev

After a successful deployment, your layer is ready to use with the function.

Lambda Function

We will create a simple function that takes plain HTML, or a file from any template engine, as input and converts it to PDF. In this example we use the HBS template engine and upload the generated PDF file to S3. The html-pdf library handles the HTML to PDF conversion; for more information about its configuration, see https://www.npmjs.com/package/html-pdf.

Let’s get started with the function.

mkdir htmlToPdf
cd htmlToPdf
touch handler.js

Write the following code in the handler.js file
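As a rough sketch of the flow described above (template to HTML, HTML to PDF, PDF to S3), a handler could look something like the following. This is not the article's original listing. It assumes the `handlebars` package for HBS templates and the v2 `aws-sdk` client for S3; the inline template, the `PDF_BUCKET` variable, the `my-pdf-bucket` name, the `pdfs/` key prefix and the `createHandler`/`deps` hook are all hypothetical names introduced for illustration. The libraries are loaded lazily so that stubs can be passed in when testing.

```javascript
// handler.js (illustrative sketch, not the article's original listing)

// Point the runtime at the layer contents, which Lambda extracts to /opt.
process.env.PATH = `${process.env.PATH}:/opt`;
process.env.FONTCONFIG_PATH = '/opt';
process.env.LD_LIBRARY_PATH = '/opt';

// Hypothetical template and bucket name, used only for illustration.
const TEMPLATE = '<html><body><h1>Hello {{name}}</h1></body></html>';
const BUCKET = process.env.PDF_BUCKET || 'my-pdf-bucket';

// Wrap html-pdf's callback-based toBuffer in a Promise.
// `pdf` defaults to the real library; the parameter exists so a stub can be passed in.
function exportHtmlToPdf(html, pdf = require('html-pdf')) {
  return new Promise((resolve, reject) => {
    pdf.create(html, {
      format: 'Letter',
      orientation: 'portrait',
      phantomPath: '/opt/phantomjs_linux-x86_64',
    }).toBuffer((err, buffer) => (err ? reject(err) : resolve(buffer)));
  });
}

// Build the Lambda handler. `deps` is a hypothetical hook for swapping in stubs;
// the real libraries are required only when nothing is injected.
function createHandler(deps = {}) {
  return async function htmlToPdf() {
    const handlebars = deps.handlebars || require('handlebars');
    const s3 = deps.s3 || new (require('aws-sdk').S3)();

    // 1. Template + data -> HTML string.
    const html = handlebars.compile(TEMPLATE)({ name: 'World' });
    // 2. HTML -> PDF buffer, rendered by PhantomJS from the layer.
    const buffer = await exportHtmlToPdf(html, deps.pdf);
    // 3. PDF buffer -> S3 object.
    const key = `pdfs/${Date.now()}.pdf`;
    await s3.putObject({
      Bucket: BUCKET,
      Key: key,
      Body: buffer,
      ContentType: 'application/pdf',
    }).promise();

    return { bucket: BUCKET, key };
  };
}

module.exports = { htmlToPdf: createHandler(), exportHtmlToPdf, createHandler };
```

With this shape, `handler.htmlToPdf` in serverless.yml resolves to the function returned by `createHandler()`, which loads the real libraries on first invocation.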

In the code, you can see that we set some environment variables before the function. These environment variables must be set for the conversion to work properly. If they are not set, the PDF will either not be generated or will contain black dots. You can find more information about environment variables in the AWS documentation.

Layers are extracted to the /opt directory in the function execution environment. Each runtime looks for libraries in a different location under /opt, depending on the language, so structure your layer so that function code can access libraries without additional configuration. Since we created the layer earlier, all the executables deployed with it are accessible in the function under /opt, for example /opt/phantomjs_linux-x86_64. That is why we update the PATH and related environment variables before the function:

process.env.PATH = `${process.env.PATH}:/opt`
process.env.FONTCONFIG_PATH = '/opt'
process.env.LD_LIBRARY_PATH = '/opt'
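When debugging, it can help to confirm what the layer actually put into /opt. The helper below is a small sketch for that purpose; `listLayerFiles` is a hypothetical name, and it relies only on Node's built-in `fs` module.

```javascript
const fs = require('fs');

// Return the sorted entries of the layer directory, or [] when it does not exist.
function listLayerFiles(layerDir = '/opt') {
  try {
    return fs.readdirSync(layerDir).sort();
  } catch (err) {
    // A missing directory is reported as "no files"; other errors are rethrown.
    if (err.code === 'ENOENT') return [];
    throw err;
  }
}
```

Calling `console.log(listLayerFiles())` outside the handler logs the contents once per cold start. If phantomjs_linux-x86_64 is missing from the output, the layer is probably not attached to the function.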

Another important setting, inside the exportHtmlToPdf function, is phantomPath, which is set to /opt/phantomjs_linux-x86_64. Without this path you will get an error saying PhantomJS was not found.

Our function is now ready. Let us now set up the serverless.yml file:

touch serverless.yml

Use the following code in serverless.yml

In this file, under the function definition, you can see the layer associated with the function. This is necessary; otherwise the /opt directory will be empty.

htmlToPdf:
  handler: handler.htmlToPdf
  layers:
    - ${cf:executables-layer-${self:provider.stage}.HtmlToPdfLayerExport}
  events:
    - http:
        path: api/htmltopdf
        method: get
        cors: true
        integration: lambda

A function can use at most five layers.

We are now ready for deployment. Deploy the function with the following command

sls deploy --stage dev

The function is now ready to use. After deployment, you will get an endpoint like https://xxxxxxxx.execute-api.ap-south-1.amazonaws.com/dev/api/htmltopdf; the region may be different for you. Invoke the API with any tool you like.
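If you prefer to call the endpoint from Node.js itself, a minimal sketch using only the built-in `https` module might look like this. `buildEndpoint` and `invoke` are hypothetical helper names, and the API ID, region and stage must be taken from your own `sls deploy` output.

```javascript
const https = require('https');

// Assemble the invoke URL from the values printed by `sls deploy`.
function buildEndpoint(apiId, region, stage, path = 'api/htmltopdf') {
  return `https://${apiId}.execute-api.${region}.amazonaws.com/${stage}/${path}`;
}

// Send a GET request and resolve with the status code and the raw response body.
function invoke(url) {
  return new Promise((resolve, reject) => {
    https.get(url, (res) => {
      const chunks = [];
      res.on('data', (chunk) => chunks.push(chunk));
      res.on('end', () => resolve({
        status: res.statusCode,
        body: Buffer.concat(chunks).toString('utf8'),
      }));
    }).on('error', reject);
  });
}
```

For example, `invoke(buildEndpoint(apiId, 'ap-south-1', 'dev')).then(console.log)` prints the status code and whatever the function returns.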

Local Development

To test the API locally, a few changes are needed. Comment out phantomPath in the exportHtmlToPdf function:

htmlToPdf/handler.js

pdf.create(html, {
  format: "Letter",
  orientation: "portrait",
  // phantomPath: '/opt/phantomjs_linux-x86_64'
}).toBuffer((err, buffer) => {
  if (err) {
    reject(err)
  } else {
    resolve(buffer)
  }
});
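Rather than commenting the line out by hand, one option is to set phantomPath only when the layer binary is actually present. The sketch below assumes that html-pdf falls back to its own default PhantomJS lookup when phantomPath is omitted; `buildPdfOptions` and `LAYER_PHANTOM_PATH` are hypothetical names.

```javascript
const fs = require('fs');

// Location of the PhantomJS binary shipped in the layer.
const LAYER_PHANTOM_PATH = '/opt/phantomjs_linux-x86_64';

// Build the html-pdf options, adding phantomPath only when the binary exists.
function buildPdfOptions(phantomPath = LAYER_PHANTOM_PATH) {
  const options = { format: 'Letter', orientation: 'portrait' };
  if (fs.existsSync(phantomPath)) {
    // Running with the layer attached: use the layer's binary.
    options.phantomPath = phantomPath;
  }
  return options;
}
```

The call then becomes `pdf.create(html, buildPdfOptions())`, so the same handler code can run both on Lambda and on a local machine.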

Also, comment out the layer in serverless.yml:

htmlToPdf/serverless.yml

htmlToPdf:
  handler: handler.htmlToPdf
  # layers:
  #   - ${cf:executables-layer-${self:provider.stage}.HtmlToPdfLayerExport}
  events:
    - http:
        path: api/htmltopdf
        method: get
        cors: true
        integration: lambda

Now start the offline server (the sls offline command is provided by the serverless-offline plugin):

sls offline start

You can now test the API on http://localhost:3000.

References

In the spirit of open source, feel free to use my code from the GitHub account mentioned below. I hope it saves you some time.

https://www.youtube.com/watch?v=lNU5L96E8tc
