Check SiteMaps XML URL Nodes for broken links C#


How to check for broken links on a website using C#

Sometimes you need to confirm that every link (URL) listed in a website's sitemap is valid, that is, that the page actually exists. The basic idea is to find broken links on your website using C# .NET code. The sample code below checks the response for a URL and returns a boolean value indicating whether the URL exists.

Basic Steps:
Add the System.Net namespace
Send the request using the HEAD method
Check the response for the URL
If the status is OK, the URL is valid; otherwise, treat it as broken

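public bool IsURLExists(string url)
{
	try
	{
		// Create the request inside the try block so a malformed URL also returns false
		System.Net.WebRequest webRequest = System.Net.WebRequest.Create(url);
		// HEAD asks for the headers only, so the page body is not downloaded
		webRequest.Method = "HEAD";

		using (System.Net.HttpWebResponse response = (System.Net.HttpWebResponse)webRequest.GetResponse())
		{
			// Compare the status enum directly instead of its string form
			return response.StatusCode == System.Net.HttpStatusCode.OK;
		}
	}
	catch
	{
		// Covers 4xx/5xx responses (WebException), network errors, and invalid URLs
		return false;
	}
}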
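
On .NET 6 and later, WebRequest is marked obsolete in favor of HttpClient. As a minimal sketch, the same HEAD check could look like the following with HttpClient. The names LinkChecker and UrlExistsAsync are illustrative and not part of the original code, and the broad catch mirrors the original method's behavior of treating any failure as a broken link.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class LinkChecker
{
    // Sends a HEAD request and reports whether the server answered 200 OK.
    // The HttpClient is passed in so one instance can be reused for many URLs.
    public static async Task<bool> UrlExistsAsync(HttpClient client, string url)
    {
        try
        {
            using (var request = new HttpRequestMessage(HttpMethod.Head, url))
            using (HttpResponseMessage response = await client.SendAsync(request))
            {
                return response.StatusCode == HttpStatusCode.OK;
            }
        }
        catch (Exception)
        {
            // Network failure, timeout, or unusable URL: treat it as a broken link.
            return false;
        }
    }
}
```

A caller would typically create one HttpClient for the whole run and await LinkChecker.UrlExistsAsync(client, someUrl) for each link. Some servers reject HEAD requests (for example with 405 Method Not Allowed), so a false result is worth rechecking with a GET before reporting a link as broken.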


Read XML Sitemap and validate URLs using C#

A sitemap <url> node can contain up to four child nodes: <loc>, which is required, and the optional <lastmod>, <changefreq>, and <priority>.
So, read each sitemap's XML and collect the URLs found in the <loc> nodes. Then check the response for each URL in code and generate a report (list) of broken links.

The steps below explain how to read a sitemap and validate each URL in its XML from a .NET application using C#.
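
To make the structure concrete, here is a small sketch that pulls the <loc> values out of a sitemap string using LINQ to XML. The sample XML, the example.com URLs, and the names SitemapParser, SampleXml, and ExtractLocs are made up for illustration; the namespace is the one defined by the sitemaps.org protocol.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SitemapParser
{
    // Namespace declared on the <urlset> root element by the sitemaps.org protocol.
    private static readonly XNamespace Ns = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Hypothetical sample: the second <url> has only the required <loc> node.
    public const string SampleXml =
@"<?xml version=""1.0"" encoding=""UTF-8""?>
<urlset xmlns=""http://www.sitemaps.org/schemas/sitemap/0.9"">
  <url>
    <loc>https://example.com/first-post/</loc>
    <lastmod>2020-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc> https://example.com/second-post/ </loc>
  </url>
</urlset>";

    // Returns the trimmed text of every <url>/<loc> element in the sitemap XML.
    public static List<string> ExtractLocs(string sitemapXml)
    {
        XDocument doc = XDocument.Parse(sitemapXml);
        return doc.Descendants(Ns + "url")
                  .Select(url => url.Element(Ns + "loc"))
                  .Where(loc => loc != null)
                  .Select(loc => loc.Value.Trim())
                  .ToList();
    }
}
```

Calling SitemapParser.ExtractLocs(SitemapParser.SampleXml) returns the two URLs with surrounding whitespace removed. The element names must be qualified with the namespace here, because LINQ to XML matches on namespace plus local name.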

Step 1: Create Property Class

Use this property class to store all URLs and their status as a list.


public class SiteMapURLReport
{
	public string SiteMapURL { get; set; }
	public string PostURL { get; set; } 
	public bool IsURLExist { get; set; }
} 

Step 2: Get SiteMaps XML List

Websites usually have one or more sitemaps, often named with an incrementing number at the end.
Examples:
https://yourdomain.com/post-sitemap1.xml
https://yourdomain.com/post-sitemap2.xml
and so on.
The code below uses a for loop to read each sitemap file, incrementing the number at the end each time, and stores each response as an XML string until a request throws an exception.
If your site uses a different naming convention, change the sitemap URL in the code to match it.

// Requires: using System.Collections.Generic; using System.Net;
private Dictionary<string, string> SitemapXMLList()
{
	Dictionary<string, string> sitemapLists = new Dictionary<string, string>();
	int count = 1;
	for (int i = 1; i <= count; i++)
	{
		try
		{
			string sitemapURL = "https://YourDomainName.com/post-sitemap" + i + ".xml";

			// Create a WebClient and dispose it once the download finishes
			using (WebClient wc = new WebClient())
			{
				// Set the encoding on the WebClient
				wc.Encoding = System.Text.Encoding.UTF8;

				// Download the document as a string
				sitemapLists.Add(sitemapURL, wc.DownloadString(sitemapURL));
			}

			// The download succeeded, so extend the loop to try the next sitemap number
			count += 1;
		}
		catch
		{
			// The download failed (for example, 404): count is not incremented, so the loop ends
		}
	}
	return sitemapLists;
}
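
For a site with a different naming convention, one option is to pass the pattern in rather than hard-coding it. The sketch below assumes a format string with a {0} placeholder for the sitemap number; SitemapUrlBuilder and Build are hypothetical names, not part of the original code.

```csharp
using System;
using System.Globalization;

public static class SitemapUrlBuilder
{
    // Builds the URL of the n-th sitemap from a pattern such as
    // "https://yourdomain.com/post-sitemap{0}.xml".
    public static string Build(string pattern, int index)
    {
        if (string.IsNullOrWhiteSpace(pattern) || !pattern.Contains("{0}"))
        {
            throw new ArgumentException("Pattern must contain a {0} placeholder.", nameof(pattern));
        }

        // Invariant culture keeps the number free of culture-specific formatting.
        return string.Format(CultureInfo.InvariantCulture, pattern, index);
    }
}
```

Inside the loop, the hard-coded string concatenation could then be swapped for SitemapUrlBuilder.Build(pattern, i), with the pattern supplied by the caller.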

Step 3: Read and validate URLs in each XML

Go through each XML string (parent loop)
Go through each URL node in the XML (nested child loop)
Check whether each link is valid (exists)
Store every URL's status in a list and display the output

// Requires: using System; using System.Collections.Generic; using System.Xml;
public List<SiteMapURLReport> CheckWPSiteMapURLExists()
{
	List<SiteMapURLReport> urlReport = new List<SiteMapURLReport>();

	// Get SiteMapXML list
	var sitemapList = SitemapXMLList();

	// loop through the SiteMapXML list
	foreach (var sitemap in sitemapList)
	{
		try
		{
			// Create a new XML document
			XmlDocument xmldoc = new XmlDocument();
			// Load the downloaded string as XML
			xmldoc.LoadXml(sitemap.Value);
			// Create a list of XML nodes from the <url> nodes in the sitemap
			XmlNodeList xmlSitemapList = xmldoc.GetElementsByTagName("url");

			// Loop through the node list and store each URL's status
			foreach (XmlNode node in xmlSitemapList)
			{
				if (node["loc"] != null)
				{
					// Get the URL, trimming any whitespace around it
					string currURL = node["loc"].InnerText.Trim();

					// Add result with url status in the list 
					urlReport.Add(new SiteMapURLReport
					{
						SiteMapURL = sitemap.Key,
						PostURL = currURL,
						IsURLExist = IsURLExists(currURL)
					});
				}
			}
		}
		catch { /* Skip any sitemap that cannot be parsed as XML */ }
	}

	return urlReport;
}

Code to check whether a web page link exists:

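using System.Net;

public bool IsURLExists(string url)
{
	try
	{
		// Create the request inside the try block so a malformed URL also returns false
		WebRequest webRequest = WebRequest.Create(url);
		// HEAD asks for the headers only, so the page body is not downloaded
		webRequest.Method = "HEAD";

		using (HttpWebResponse response = (HttpWebResponse)webRequest.GetResponse())
		{
			// Compare the status enum directly instead of its string form
			return response.StatusCode == HttpStatusCode.OK;
		}
	}
	catch
	{
		// Covers 4xx/5xx responses (WebException), network errors, and invalid URLs
		return false;
	}
}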
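
Once the list is built, the broken links can be pulled out for the report. The following is a minimal sketch that groups them by the sitemap they came from. It repeats the SiteMapURLReport class from Step 1 so the snippet is self-contained; ReportHelper and BrokenLinksBySitemap are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public class SiteMapURLReport
{
    public string SiteMapURL { get; set; }
    public string PostURL { get; set; }
    public bool IsURLExist { get; set; }
}

public static class ReportHelper
{
    // Groups the broken links (IsURLExist == false) by their source sitemap.
    public static Dictionary<string, List<string>> BrokenLinksBySitemap(
        IEnumerable<SiteMapURLReport> report)
    {
        return report
            .Where(r => !r.IsURLExist)
            .GroupBy(r => r.SiteMapURL)
            .ToDictionary(g => g.Key, g => g.Select(r => r.PostURL).ToList());
    }
}
```

Passing the result of CheckWPSiteMapURLExists() to ReportHelper.BrokenLinksBySitemap gives one entry per sitemap that contains at least one broken link; sitemaps with no broken links do not appear in the dictionary.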

Final Output:
[Screenshot: sitemap-url-check-is-exists]