HTML to PDF conversion using the Node.js runtime on AWS Lambda
In this post, I will cover how to generate a PDF from HTML and CSS using the Node.js runtime on AWS Lambda.
Generating PDFs on the server side can be pretty hard, since most libraries require low-level executables, fonts and other dependencies. It is harder still on a serverless architecture, where you cannot install low-level executables and dependencies at runtime. I spent two days looking for a clean solution, so let’s get started.
We will use the html-pdf library to convert HTML to PDF. html-pdf uses PhantomJS internally, so this executable must be installed on your system. Follow the steps at https://phantomjs.org/download.html to install PhantomJS.
Prerequisites:
- Node.js
- Serverless installed globally
npm i -g serverless
Creating Lambda Layers:
AWS Lambda manages memory, CPU, network and other resources for you. This comes at the cost of flexibility: you cannot log in to compute instances or customize the operating system of the provided runtimes. In short, you cannot install PhantomJS at runtime.
To solve this problem, we will create a Lambda layer containing the necessary low-level binaries, fonts and the compiled PhantomJS executable.
It is best to keep the layer and the function in separate stacks, so that each can be deployed independently.
Steps to create Layer:
mkdir layers
cd layers/
mkdir executables/
Download all the executables here and copy all the files into the executables folder. Then create a serverless.yml file in the layers folder with the following content. You can find more information about configuring layers in the Serverless Layer Guide.
touch serverless.yml
Then deploy the layer
sls deploy --stage dev
After a successful deployment, your layer is ready to use with the function.
Lambda Function
We will create a simple function that takes plain HTML, or a file for any template engine, as input and converts it to PDF. In this example, we use the HBS (Handlebars) template engine, convert the rendered HTML to PDF with the html-pdf library, and upload the generated PDF file to S3. For more information about configuring html-pdf, see https://www.npmjs.com/package/html-pdf.
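The upload step can be sketched roughly as follows. This is only a minimal sketch under my own assumptions, not the code from the repository: the helper name buildPdfUploadParams, the bucket name and the pdfs/ key prefix are hypothetical. Bucket, Key, Body and ContentType are the standard S3 PutObject parameters.

```javascript
// Hypothetical helper: builds the parameter object for an S3 PutObject call.
// The bucket name and key prefix are placeholders, not values from this post.
function buildPdfUploadParams(bucket, fileName, pdfBuffer) {
  if (!Buffer.isBuffer(pdfBuffer)) {
    throw new TypeError('pdfBuffer must be a Buffer');
  }
  return {
    Bucket: bucket,
    Key: `pdfs/${fileName}.pdf`,
    Body: pdfBuffer,
    ContentType: 'application/pdf', // stored as object metadata by S3
  };
}

const params = buildPdfUploadParams(
  'my-example-bucket',
  'invoice-001',
  Buffer.from('%PDF-1.4')
);
console.log(params.Key); // prints pdfs/invoice-001.pdf
```

With the AWS SDK for JavaScript v2, an object like this could be passed to s3.putObject(params).promise().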
Let’s get started with the function.
mkdir htmlToPdf
cd htmlToPdf
touch handler.js
Write the following code in the handler.js file
In the code, you can see that we set some environment variables before the function. Setting these environment variables is essential: without them, the PDF is either not generated at all or contains black dots. See the AWS documentation on environment variables for more information.
Layers are extracted to the /opt directory in the function execution environment. Each runtime looks for libraries in a different location under /opt, depending on the language. Structure your layer so that function code can access libraries without additional configuration. Since we created the layer earlier, all the executables we deployed are accessible to the function at paths such as /opt/phantomjs_linux-x86_64. That is why we update the PATH environment variable before the function:
process.env.PATH = `${process.env.PATH}:/opt`
process.env.FONTCONFIG_PATH = '/opt'
process.env.LD_LIBRARY_PATH = '/opt'
Another important setting, inside the exportHtmlToPdf function, is phantomPath, which is set to /opt/phantomjs_linux-x86_64. Without this path you will get an error saying PhantomJS was not found.
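For illustration, here is a minimal sketch of what a function like exportHtmlToPdf could look like. It is not necessarily identical to the code in the repository. One assumption of mine: the html-pdf module is passed in as the first argument (in handler.js you would pass require('html-pdf')), so the sketch can be run on a machine without PhantomJS. The fakePdf object is a made-up stand-in used only for the demonstration. The three environment variables shown above still need to be set at the top of handler.js.

```javascript
// Path where the layer places the PhantomJS binary (taken from this post).
const PHANTOM_PATH = '/opt/phantomjs_linux-x86_64';

// `pdf` is the html-pdf module; injecting it is an assumption of this sketch.
function exportHtmlToPdf(pdf, html) {
  return new Promise((resolve, reject) => {
    pdf
      .create(html, {
        format: 'Letter',
        orientation: 'portrait',
        phantomPath: PHANTOM_PATH,
      })
      .toBuffer((err, buffer) => {
        if (err) {
          reject(err);
        } else {
          resolve(buffer);
        }
      });
  });
}

// Hypothetical stand-in for html-pdf that records the options it receives
// and "renders" the HTML by returning it unchanged as a Buffer.
const fakePdf = {
  lastOptions: null,
  create(html, options) {
    this.lastOptions = options;
    return { toBuffer: (callback) => callback(null, Buffer.from(html)) };
  },
};

exportHtmlToPdf(fakePdf, '<h1>Hello</h1>').then((buffer) => {
  console.log('bytes:', buffer.length);
  console.log('phantomPath:', fakePdf.lastOptions.phantomPath);
});
```

Wrapping the callback in a Promise lets the Lambda handler simply await the buffer before uploading it.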
Our function is now ready. Let us now set up the serverless.yml file
touch serverless.yml
Use the following code in serverless.yml
In this file, under the function, you can see the layer associated with it. This is necessary; otherwise the /opt directory will be empty.
htmlToPdf:
  handler: handler.htmlToPdf
  layers:
    - ${cf:executables-layer-${self:provider.stage}.HtmlToPdfLayerExport}
  events:
    - http:
        path: api/htmltopdf
        method: get
        cors: true
        integration: lambda
A function can use at most five layers.
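If you are unsure whether the layer is actually attached, one option is to log the contents of /opt from inside the function. The helper below is a small debugging sketch of my own (the name listLayerFiles is hypothetical). It returns an empty list when the directory does not exist, which is what you would typically see on a local machine.

```javascript
const fs = require('fs');

// Returns the sorted entries of the layer directory, or an empty list
// when the directory does not exist.
function listLayerFiles(layerDir = '/opt') {
  try {
    return fs.readdirSync(layerDir).sort();
  } catch (err) {
    if (err.code === 'ENOENT') {
      return [];
    }
    throw err; // anything else (for example permissions) is a real problem
  }
}

console.log('Layer contents:', listLayerFiles());
```

On Lambda, the output should include phantomjs_linux-x86_64 once the layer is wired up correctly.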
We are now ready for deployment. Deploy the function with the following command
sls deploy --stage dev
The function is now ready to use. After deployment, you will get an endpoint like https://xxxxxxxx.execute-api.ap-south-1.amazonaws.com/dev/api/htmltopdf. The region may be different for you. Invoke the API with any tool you like.
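The endpoint is made up of the API id, the region and the stage. As a small illustration, the sketch below rebuilds it from those parts. The helper name buildInvokeUrl is hypothetical, and the API id is the same placeholder as above.

```javascript
// Rebuilds the invoke URL from its parts. The API id is a placeholder.
function buildInvokeUrl(apiId, region, stage, path = 'api/htmltopdf') {
  return `https://${apiId}.execute-api.${region}.amazonaws.com/${stage}/${path}`;
}

const invokeUrl = buildInvokeUrl('xxxxxxxx', 'ap-south-1', 'dev');
console.log(invokeUrl);
// prints https://xxxxxxxx.execute-api.ap-south-1.amazonaws.com/dev/api/htmltopdf
```

On Node.js 18 or later, the built-in fetch could then call it, for example fetch(invokeUrl).then((res) => console.log(res.status)).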
Local Development
To test the API locally, a few changes are needed. Comment out phantomPath in the exportHtmlToPdf function:
htmlToPdf/handler.js
pdf.create(html, {
  format: "Letter",
  orientation: "portrait",
  // phantomPath: '/opt/phantomjs_linux-x86_64'
}).toBuffer((err, buffer) => {
  if (err) {
    reject(err)
  } else {
    resolve(buffer)
  }
});
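Instead of commenting the line in and out by hand, the options could be chosen at runtime. The sketch below is my own variation, not part of the original code. It assumes you run locally with the serverless-offline plugin, which sets the IS_OFFLINE environment variable. The helper name buildPdfOptions is hypothetical.

```javascript
// Builds the html-pdf options. The layer path is only added when the code
// is not running under serverless-offline (assumption: IS_OFFLINE is set
// by the plugin during local runs).
function buildPdfOptions(env = process.env) {
  const options = { format: 'Letter', orientation: 'portrait' };
  if (!env.IS_OFFLINE) {
    options.phantomPath = '/opt/phantomjs_linux-x86_64';
  }
  return options;
}

console.log(buildPdfOptions({ IS_OFFLINE: 'true' })); // no phantomPath
console.log(buildPdfOptions({})); // includes phantomPath
```

When phantomPath is omitted, html-pdf falls back to the PhantomJS installed on your machine, which is what we want locally.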
Also, comment out the layer in serverless.yml:
htmlToPdf/serverless.yml
htmlToPdf:
  handler: handler.htmlToPdf
  # layers:
  #   - ${cf:executables-layer-${self:provider.stage}.HtmlToPdfLayerExport}
  events:
    - http:
        path: api/htmltopdf
        method: get
        cors: true
        integration: lambda
Now start the offline server (the sls offline command is provided by the serverless-offline plugin):
sls offline start
You can now test the API on http://localhost:3000.
References
In the spirit of open source, feel free to use my code from the GitHub account mentioned below. I hope it saves you some time.