Need to merge PDF files stored in SharePoint? Look no further!
In this article, I will show you how to create an Azure Function to merge PDF files stored in SharePoint. The Function will be a generic service that receives a list of file paths to merge. This means that you can trigger a request from SPFx, Power Automate, Logic Apps… or anything else, really. We are going to use the PDFsharp library, so our code will be super simple!
Update 01-02-2022
As I was getting some requests to publish the full solution, I asked my client and I am happy to announce that they have agreed to let me publish the source code!
Any feedback is welcome, and please let me know if you find issues.
The inconvenience of Power Automate
If you are familiar with Power Automate, you may already know that you can use third-party actions to merge PDF files. But they come with some significant disadvantages that may prevent you from using them:
- License costs – third-party providers will typically charge you per user or per number of executions
- Data transfer – when using a remote service to merge your files, you are sending your data to that service. While this may not be a problem for files without commercial value, the same is not true for confidential information. The service provider may have strict security arrangements in place, but sometimes, the risk may be just too high.
Function advantages
Using an Azure Function ultimately means that you are in full control. Compared with the drawbacks of third-party solutions in Power Automate:
- Cost – the merging process is fast, so you can run the function on a Consumption plan (billed per execution and execution time, with a monthly free grant) if you really want to. Yes, it’s almost FREE!
- Data transfer – the information only flows between your Office 365 tenant and your Azure subscription. And everything can be done with memory streams, so there are no temporary files to store and clean up afterwards.
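Keeping everything in memory has one classic pitfall: when a library writes into a MemoryStream, the stream's Position is left at the end, so a consumer that reads from the current position sees no data. The minimal, BCL-only sketch below shows that behaviour. `InMemoryPipeline`, `WriteContent` and `CountReadableBytes` are hypothetical stand-ins for a PDF writer's save call and an upload API; they are not part of PDFsharp or the SharePoint client libraries.

```csharp
using System;
using System.IO;
using System.Text;

public static class InMemoryPipeline
{
    // Hypothetical stand-in for a library "Save(Stream)" call:
    // it writes bytes and leaves the stream positioned at the end.
    public static void WriteContent(Stream target, string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        target.Write(bytes, 0, bytes.Length);
    }

    // Hypothetical stand-in for an upload API that reads from the current position.
    public static int CountReadableBytes(Stream source)
    {
        var buffer = new byte[4096];
        int total = 0;
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            total += read;
        }
        return total;
    }

    // Writes the text, then reads it back without and with rewinding.
    public static (int withoutRewind, int withRewind) Demo(string text)
    {
        using var stream = new MemoryStream();
        WriteContent(stream, text);
        int withoutRewind = CountReadableBytes(stream); // Position is at the end: nothing to read
        stream.Position = 0;                            // rewind before handing the stream on
        int withRewind = CountReadableBytes(stream);    // now every written byte is readable
        return (withoutRewind, withRewind);
    }
}
```

Rewinding with `Position = 0` costs nothing, so it is a sensible habit before passing a freshly written stream to any upload call, whether or not that particular API would rewind it for you.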
Merge PDF files
NuGet packages
- PDFsharp
- SharePointPnPCoreOnline
Code
The code to merge files is actually very simple. First, we create a class that represents a request to the function. In my case, I used an Azure Storage queue as the entry point for my function, and the queue messages had to follow this structure:
internal class QueueItem
{
    public string SiteUrl { get; set; }
    public string FolderPath { get; set; }
    public string FileName { get; set; }
    public string[] FilesPathArray { get; set; }
}
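Since the function relies entirely on the shape of the incoming message, it can help to reject malformed requests before touching SharePoint. The sketch below uses a hypothetical `QueueItemValidator` helper and System.Text.Json (rather than the Json.NET `JsonConvert` used elsewhere in this article), and repeats the `QueueItem` class as public so it compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Copy of the request class so this sketch is self-contained.
public class QueueItem
{
    public string SiteUrl { get; set; }
    public string FolderPath { get; set; }
    public string FileName { get; set; }
    public string[] FilesPathArray { get; set; }
}

public static class QueueItemValidator
{
    // Hypothetical helper: returns the problems found (an empty list means the message looks usable).
    public static List<string> Validate(string json)
    {
        var errors = new List<string>();
        if (string.IsNullOrWhiteSpace(json))
        {
            errors.Add("Message is empty.");
            return errors;
        }

        QueueItem item;
        try
        {
            item = JsonSerializer.Deserialize<QueueItem>(json);
        }
        catch (JsonException ex)
        {
            errors.Add($"Invalid JSON: {ex.Message}");
            return errors;
        }

        if (item == null)
        {
            errors.Add("Message is empty.");
            return errors;
        }
        if (string.IsNullOrWhiteSpace(item.SiteUrl)) errors.Add("SiteUrl is required.");
        if (string.IsNullOrWhiteSpace(item.FolderPath)) errors.Add("FolderPath is required.");
        if (string.IsNullOrWhiteSpace(item.FileName) ||
            !item.FileName.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase))
            errors.Add("FileName must end with .pdf.");

        // An empty list leaves nothing to merge.
        if (item.FilesPathArray == null || item.FilesPathArray.Length == 0)
            errors.Add("FilesPathArray must contain at least one file.");
        else if (item.FilesPathArray.Any(p => string.IsNullOrEmpty(p) ||
                 !p.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase)))
            errors.Add("Every entry in FilesPathArray must point to a .pdf file.");

        return errors;
    }
}
```

In the queue-triggered function, a non-empty result could simply be logged and the message dropped, instead of letting the merge fail halfway through.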
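And I have created a method that does all the work. Note that it returns a Task rather than being async void, so the caller can await it:
using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs.Host;  // TraceWriter
using Microsoft.SharePoint.Client;   // CSOM types and PnP extension methods
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

internal static async Task MergePDFs(ClientContext ctx, QueueItem queueItem, TraceWriter log)
{
    log.Info("Creating blank PDF file...");
    // instantiate the target document that will receive all pages
    using (PdfDocument targetDoc = new PdfDocument())
    {
        // parse all files in the array, in order
        log.Info($"Parsing {queueItem.FilesPathArray.Length} PDF files");
        foreach (string filePath in queueItem.FilesPathArray)
        {
            log.Info($"Parsing PDF file: {filePath}");
            // get the file from SharePoint as a stream
            Microsoft.SharePoint.Client.File file = ctx.Web.GetFileByUrl(filePath);
            ClientResult<Stream> fileStream = file.OpenBinaryStream();
            ctx.Load(file);
            await ctx.ExecuteQueryRetryAsync();
            // open the file in Import mode and copy its pages into the target document
            using (Stream sourceStream = fileStream.Value)
            using (PdfDocument pdfDoc = PdfReader.Open(sourceStream, PdfDocumentOpenMode.Import))
            {
                for (int i = 0; i < pdfDoc.PageCount; i++)
                {
                    targetDoc.AddPage(pdfDoc.Pages[i]);
                }
            }
        }
        log.Info("PDF files parsed successfully");
        // create the result file in memory
        using (Stream newFileStream = new MemoryStream())
        {
            targetDoc.Save(newFileStream);
            // rewind so the upload reads the merged file from the beginning
            newFileStream.Position = 0;
            // upload to SharePoint (FolderPath must be a server-relative path)
            var destinationFolder = ctx.Web.GetFolderByServerRelativeUrl(queueItem.FolderPath);
            ctx.Load(destinationFolder);
            await ctx.ExecuteQueryRetryAsync();
            destinationFolder.UploadFile(queueItem.FileName, newFileStream, true);
            await ctx.ExecuteQueryRetryAsync();
            log.Info($"Final PDF file added to SharePoint: {queueItem.FolderPath}/{queueItem.FileName}");
        }
    }
}
Now, in your main Function file, simply:
– deserialize the queue message,
– instantiate the SharePoint context (the PnP Core package makes app-only authentication really simple),
– call the MergePDFs method and await it.
using Newtonsoft.Json;       // JsonConvert
using OfficeDevPnP.Core;     // AuthenticationManager

// inside the Function's Run method, which must itself be declared async Task
QueueItem queueItem = JsonConvert.DeserializeObject<QueueItem>(myQueueItem);
// authId and authSecret are the app-only client ID and secret, e.g. read from the Function's application settings
using (ClientContext ctx = new AuthenticationManager().GetAppOnlyAuthenticatedContext(queueItem.SiteUrl, authId, authSecret))
{
    // await, so the context is not disposed before the merge completes
    await MergePDFs(ctx, queueItem, log);
}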
It’s that simple! Now, to use the service, just send it a message that matches the following format:
{
  "SiteUrl": "https://contoso.sharepoint.com/sites/testsite",
  "FolderPath": "/sites/testsite/Shared Documents/Test",
  "FileName": "MergeResult.pdf",
  "FilesPathArray": [
    "https://contoso.sharepoint.com/sites/testsite/Shared%20Documents/Test/file1.pdf",
    "https://contoso.sharepoint.com/sites/testsite/Shared%20Documents/Test/file2.pdf"
  ]
}
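`FolderPath` ends up in `GetFolderByServerRelativeUrl`, which expects a server-relative path (such as "/sites/testsite/Shared Documents/Test") rather than a full URL. If the caller only has absolute URLs, a small helper can convert them while building the message. This is a sketch with a hypothetical `MergeRequestBuilder` class using System.Text.Json; putting the resulting string on the storage queue is left to whichever client you use.

```csharp
using System;
using System.Text.Json;

public static class MergeRequestBuilder
{
    // Turns an absolute SharePoint URL into the unescaped, server-relative
    // path that GetFolderByServerRelativeUrl expects.
    public static string ToServerRelative(string absoluteUrl)
    {
        var uri = new Uri(absoluteUrl, UriKind.Absolute);
        string path = Uri.UnescapeDataString(uri.AbsolutePath).TrimEnd('/');
        return path.Length == 0 ? "/" : path;
    }

    // Builds the JSON message; property names must match the QueueItem class.
    public static string Build(string siteUrl, string folderUrl, string fileName, params string[] fileUrls)
    {
        var message = new
        {
            SiteUrl = siteUrl,
            FolderPath = ToServerRelative(folderUrl),
            FileName = fileName,
            FilesPathArray = fileUrls
        };
        return JsonSerializer.Serialize(message);
    }
}
```

For example, `MergeRequestBuilder.ToServerRelative("https://contoso.sharepoint.com/sites/testsite/Shared%20Documents/Test")` yields "/sites/testsite/Shared Documents/Test", while the file URLs can stay absolute because `GetFileByUrl` accepts them.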