
How to extract images from PDF using iText7 in [C#] .Net


In today’s digital world, PDF (Portable Document Format) files have become a common medium for sharing and storing documents. Often, these PDF files contain important images that we may need to extract for various purposes. In this article, we will explore how to extract images from PDF files using iText7, a powerful and popular PDF library, in C# .NET.




Install iText7 package in .Net Application


Step 1: Setting up the Project:


Create a new C# .NET project in your preferred development environment, then add the iText7 library to it. You can install the package through the NuGet Package Manager, or run the following command in the Package Manager Console:

Install-Package itext7 -Version 7.2.5
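
If you work with the .NET CLI instead of the Package Manager Console, the equivalent command should look like the one below. This is a sketch that assumes the dotnet CLI is installed and that you run it from the folder containing your project file; the version number is simply the one used in this article.

```shell
# Adds the itext7 NuGet package (version 7.2.5) to the project in the current folder
dotnet add package itext7 --version 7.2.5
```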



iText7 library in C# – code Implementation


Step 2: Implement IEventListener


Use ImageRenderInfo to capture images in an IEventListener class.
Create a new class called “ImageEventListener” that implements the IEventListener interface, as shown in the code below. While a PDF page is being processed, the EventOccurred method is called for each rendering event, and the event data is an ImageRenderInfo whenever an image is encountered. Implement the logic your application requires; the sample code saves every image to an output folder.

using System;
using System.Collections.Generic;
using System.IO;
using iText.Kernel.Pdf.Canvas.Parser.Data;
using iText.Kernel.Pdf.Canvas.Parser.Listener;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Xobject;

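public class ImageEventListener : IEventListener
{
	// Folder that receives the extracted images; it must already exist.
	private const string ExtractImageToDir = "C:\\Users\\Desktop\\OutputFiles\\";

	// Running counter so that images saved within the same millisecond get distinct file names.
	private int imageCounter;

	public void EventOccurred(IEventData eventData, EventType type)
	{
		if (eventData is ImageRenderInfo imageRenderInfo)
		{
			try
			{
				PdfImageXObject imageXObject = imageRenderInfo.GetImage();

				if (imageXObject != null)
				{
					// Use the extension that matches the image data instead of always assuming ".jpg".
					string extension = imageXObject.IdentifyImageFileExtension();
					string fileName = DateTime.Now.ToString("yyyyMMddHHmmssfff") + "_" + (++imageCounter) + "." + extension;

					File.WriteAllBytes(Path.Combine(ExtractImageToDir, fileName), imageXObject.GetImageBytes());
				}
			}
			catch (Exception ex)
			{
				// Replace with your own logging; one failed image should not stop the extraction.
				Console.Error.WriteLine(ex.Message);
			}
		}
	}

	public ICollection<EventType> GetSupportedEvents()
	{
		// Only image rendering events are of interest to this listener.
		return new HashSet<EventType> { EventType.RENDER_IMAGE };
	}
}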


How to invoke IEventListener in C# code


Step 3: Load the PDF Document and Extract Images via the IEventListener


The code below shows how to open and read a PDF document and process each page’s content with PdfCanvasProcessor, which in turn calls the ImageEventListener class.

using System.IO;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;

public static void ExtractAllImagesFromPDF(string filePath)
{
	//filePath= "C:\\Users\\Desktop\\TestFile.PDF";

	// Open the file read-only and dispose the document when done (this also closes the reader).
	using (FileStream fs = File.OpenRead(filePath))
	using (PdfDocument pdfDocument = new PdfDocument(new PdfReader(fs)))
	{
		var eventListener = new ImageEventListener();
		PdfCanvasProcessor canvasProcessor = new PdfCanvasProcessor(eventListener);

		// Page numbers in iText7 start at 1.
		for (int pageNumber = 1; pageNumber <= pdfDocument.GetNumberOfPages(); pageNumber++)
		{
			// this will invoke ImageEventListener
			canvasProcessor.ProcessPageContent(pdfDocument.GetPage(pageNumber));
		}
	}
}

Code Explanation:

  • Open the PDF file using a FileStream.
  • Create instances of the PdfReader and PdfDocument classes to read the file.
  • Create an ImageEventListener and pass it as a parameter to the PdfCanvasProcessor constructor.
  • Go through each page in a for loop and process the page’s content, which calls the listener’s EventOccurred method.
  • In the ImageEventListener class, check whether the event data is an image.
  • If it is of type ImageRenderInfo, get the image as a PdfImageXObject.
  • Get the image bytes with the GetImageBytes() method and save them to the output directory.
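
The steps above can also be combined into a single console program. The following is a minimal sketch rather than a definitive implementation: the class name CollectingImageListener, the default paths "TestFile.pdf" and "OutputFiles", and the "page{n}_image{m}" naming scheme are assumptions made for illustration and are not part of iText7. Instead of writing files inside the listener, it collects the images of each page in memory so that the page number can become part of the file name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Data;
using iText.Kernel.Pdf.Canvas.Parser.Listener;
using iText.Kernel.Pdf.Xobject;

// Hypothetical listener: collects images in memory instead of writing them to disk.
public class CollectingImageListener : IEventListener
{
	public List<PdfImageXObject> Images { get; } = new List<PdfImageXObject>();

	public void EventOccurred(IEventData eventData, EventType type)
	{
		if (eventData is ImageRenderInfo info && info.GetImage() != null)
		{
			Images.Add(info.GetImage());
		}
	}

	public ICollection<EventType> GetSupportedEvents()
	{
		// Ask the processor for image events only.
		return new HashSet<EventType> { EventType.RENDER_IMAGE };
	}
}

public static class Program
{
	public static void Main(string[] args)
	{
		// Hypothetical defaults; pass your own paths as command-line arguments.
		string pdfPath = args.Length > 0 ? args[0] : "TestFile.pdf";
		string outputDir = args.Length > 1 ? args[1] : "OutputFiles";
		Directory.CreateDirectory(outputDir); // does nothing if the folder already exists

		using (PdfDocument pdf = new PdfDocument(new PdfReader(pdfPath)))
		{
			for (int page = 1; page <= pdf.GetNumberOfPages(); page++)
			{
				// A fresh listener per page keeps the images grouped by page.
				var listener = new CollectingImageListener();
				new PdfCanvasProcessor(listener).ProcessPageContent(pdf.GetPage(page));

				for (int i = 0; i < listener.Images.Count; i++)
				{
					PdfImageXObject image = listener.Images[i];
					string name = $"page{page}_image{i + 1}.{image.IdentifyImageFileExtension()}";
					File.WriteAllBytes(Path.Combine(outputDir, name), image.GetImageBytes());
				}
			}
		}
	}
}
```

Collecting the images per page keeps file naming out of the listener and avoids name collisions between images. Note that this sketch has no error handling: an image in a format iText7 cannot identify or decode would raise an exception, so you may want to wrap the inner loop body in a try/catch as in the listener shown earlier.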

iText7 is released under the AGPL, a free/open source software (F/OSS) license. See also the GNU General Public License (GNU GPL), supported by the Free Software Foundation. A commercial license is also available if you want to commercialize your project.



