How to extract images from PDF using iText7 in [C#] .Net

No Comment

In today’s digital world, PDF (Portable Document Format) files have become a common medium for sharing and storing documents. Often, these PDF files contain important images that we may need to extract for various purposes. In this article, we will explore how to extract images from PDF files using iText7, a powerful and popular PDF library, in C# .NET.

Install iText7 package in .Net Application

Step 1: Setting up the Project:

Create a new C# .NET project in your preferred development environment. Ensure that you have added the iText7 library to your project.

Extract images from PDF using iText7 in C# code

or use the following command line in Console to install the package

Install-Package itext7 -Version 7.2.5

iText7 library in C# – code Implementation

Step 2: Implement IEventListener

Use ImageRenderInfo to capture images in IEventListener class.
Create a new class called “ImageEventListener” and implement the IEventListener event as shown in the below code. While processing PDF pages, this event will get fired if the EventData is an Image type. Implement logic as per your requirement. Sample code shows saving all images in an output folder.

using iText.Kernel.Pdf.Canvas.Parser.Data;
using iText.Kernel.Pdf.Canvas.Parser.Listener;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Xobject;

public class ImageEventListener : IEventListener
{
	public void EventOccurred(IEventData eventData, EventType type)
	{
		if (eventData is ImageRenderInfo imageRenderInfo)
		{
			try
			{
				string extractImagToDir = "C:\\Users\\Desktop\\OutputFiles\\";

				if (imageRenderInfo.GetImage() != null)
				{
					PdfImageXObject imageXObject = imageRenderInfo.GetImage();
					
					File.WriteAllBytes(extractImagToDir + DateTime.Now.ToString("yyyyMMddHHmmssfff") + ".jpg", imageXObject.GetImageBytes());
				}
			}
			catch (Exception ex)
			{
				//LogError(ex.Message);
			}
		}
	}

	public ICollection<EventType> GetSupportedEvents()
	{
		return null;
	}
}

How to invoke IEventListener in C# code

Step 3: Load the PDF Document & Extracting Images via Call IEventListener.

The below code shows how to open & read a PDF document and process each page’s contents using PdfCanvasProcessor which will trigger ImageEventListener class.

using iText.Kernel.Geom;
using iText.Kernel.Pdf.Xobject;
using iText.Kernel.Pdf;
using iText.Layout.Element;
using iText.Kernel.Pdf.Canvas.Parser; 

public static void ExtractAllImagesFromPDF(string filePath)
{
	//filePath= "C:\\Users\\Desktop\\TestFile.PDF";

	using (FileStream fs = File.Open(filePath, FileMode.Open))
	{
		PdfReader pdfReader = new PdfReader(fs);
		PdfDocument pdfDocument = new PdfDocument(pdfReader);

		var eventListener = new ImageEventListener();
		PdfCanvasProcessor canvasProcessor = new PdfCanvasProcessor(eventListener);

		for (int pageNumber = 1; pageNumber <= pdfDocument.GetNumberOfPages(); pageNumber++)
		{
			// this will invoke ImageEventListener
			canvasProcessor.ProcessPageContent(pdfDocument.GetPage(pageNumber));
		}
	}
}

Code Explanation:

Open a PDF file using FileStream
Create an instance of PdfReader and PdfDocument class to read a file.
Create an EventListener and pass it as a parameter in PdfCanvasProcessor class instance.
go through each page via for loop and process the page’s content which will invoke the EventListener event.
In ImageEventListener class, check if EventData is an Image type
If it is ImageRenderInfo type then get the image using PdfImageXObject
get all image bytes by GetImageBytes() method and save it in the output directory.

iText7 uses an AGPL license which is a free/open source software (F/OSS). See also GNU General Public License (GNU GPL) Supported by the Free Software Foundation. However, a commercial license is also available if you want to commercialize your project.

read also:

– What is Dapper micro-ORM in C#?

– How to Integrate MailChimp v3.0 API in DotNet Core C#

– Export DataSet To Excel in C# .Net Core – OpenXml

– Dynamically generate anchor tag in div HTML javascript

– Easy Integration of WordPress API in C# .Net

– Check SiteMaps XML URL Nodes for broken links C#

– Insert tab spaces characters in text – HTML CSS

– Display an Image tag after countdown timer finishes JavaScript

– How To Use Dynamic Parameters In Dapper 2.0 In C# .Net Core?

– What is an API – Application Programming Interface

– SonarLint Code Clean-up for MS Visual Studio & Cyclomatic Complexity Explained