Check Sitemap XML URL Nodes for Broken Links in C#
How to check a website for broken links using C#
Sometimes you want to confirm that every link (URL) listed in your website's sitemap is valid, that is, that the page still exists. The idea is to find any broken links on your website using C# .NET code. Below is sample code that checks the response of a URL and returns a boolean: true if the URL exists, false otherwise.
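– Send a HEAD request (the server returns only the status and headers, not the page body)
– Check the status code of the response
– If the status is 200 OK, the URL is valid; otherwise it is treated as broken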
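using System;
using System.Net;

public bool IsURLExists(string url)
{
    try
    {
        // Create the request inside the try block, because an invalid URL throws here
        WebRequest webRequest = WebRequest.Create(url);
        // HEAD asks the server for the status and headers only, not the page body
        webRequest.Method = "HEAD";
        using (HttpWebResponse response = (HttpWebResponse)webRequest.GetResponse())
        {
            // Only 200 OK counts as a valid (existing) URL
            return response.StatusCode == HttpStatusCode.OK;
        }
    }
    catch (Exception)
    {
        // 404 and other error statuses (thrown as WebException), DNS failures,
        // timeouts and malformed URLs are all reported as broken
        return false;
    }
}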
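WebRequest, HttpWebRequest and WebClient are marked obsolete starting with .NET 6 (compiler warning SYSLIB0014), and Microsoft recommends HttpClient instead. Below is a minimal sketch of the same HEAD check using HttpClient. The UrlChecker class and IsUrlExistsAsync method are hypothetical names, and two behaviours differ from the method above by design: any 2xx status counts as valid, and if a server rejects HEAD with 405 Method Not Allowed the check is retried with a GET that stops after reading the headers.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper class; the name and structure are assumptions, not part of the original code
public class UrlChecker
{
    private readonly HttpClient _client;

    // Reuse one HttpClient for all checks instead of creating one per URL
    public UrlChecker(HttpClient client)
    {
        _client = client;
    }

    public async Task<bool> IsUrlExistsAsync(string url)
    {
        // Reject malformed URLs and non-HTTP(S) schemes without sending a request
        if (!Uri.TryCreate(url, UriKind.Absolute, out Uri uri) ||
            (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps))
        {
            return false;
        }

        try
        {
            // HEAD returns only the status line and headers, not the page body
            using (var head = new HttpRequestMessage(HttpMethod.Head, uri))
            using (HttpResponseMessage response = await _client.SendAsync(head))
            {
                if (response.StatusCode != HttpStatusCode.MethodNotAllowed)
                {
                    // Any 2xx status (after redirects are followed) counts as "exists"
                    return response.IsSuccessStatusCode;
                }
            }

            // Some servers reject HEAD with 405; retry with GET but stop after the headers
            using (HttpResponseMessage response =
                   await _client.GetAsync(uri, HttpCompletionOption.ResponseHeadersRead))
            {
                return response.IsSuccessStatusCode;
            }
        }
        catch (HttpRequestException)
        {
            // DNS failure, refused connection, TLS error, etc.
            return false;
        }
        catch (TaskCanceledException)
        {
            // The request exceeded HttpClient.Timeout
            return false;
        }
    }
}
```

Because the method is asynchronous, the Step 3 loop would either await it or, in a simple console tool, block on it with IsUrlExistsAsync(currURL).Result.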
Read XML Sitemap and validate URLs using C#
A sitemap <url> node usually contains up to four child nodes: <loc> (the page URL, and the only required one), <lastmod>, <changefreq>, and <priority>.
So, read each sitemap XML file and collect the URLs found in the <loc> nodes. Then check the response of each URL in code and generate a report, or list, of broken links.
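As a sketch of the reading step using LINQ to XML (System.Xml.Linq) instead of the XmlDocument API used in Step 3, the code below collects all four child nodes of every <url> entry. The SitemapEntry and SitemapParser names are hypothetical, and the code assumes the standard sitemaps.org namespace (http://www.sitemaps.org/schemas/sitemap/0.9); a sitemap without that namespace would return no entries.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical type holding the four <url> child nodes; only Loc is required by the protocol
public class SitemapEntry
{
    public string Loc { get; set; }
    public string LastMod { get; set; }
    public string ChangeFreq { get; set; }
    public string Priority { get; set; }
}

public static class SitemapParser
{
    // Namespace used by standard sitemaps (assumed; adjust if your sitemap differs)
    private static readonly XNamespace Ns = "http://www.sitemaps.org/schemas/sitemap/0.9";

    public static List<SitemapEntry> Parse(string sitemapXml)
    {
        XDocument doc = XDocument.Parse(sitemapXml);
        return doc.Descendants(Ns + "url")
            .Select(u => new SitemapEntry
            {
                // The explicit (string) cast yields null when an optional node is missing
                Loc = ((string)u.Element(Ns + "loc"))?.Trim(),
                LastMod = (string)u.Element(Ns + "lastmod"),
                ChangeFreq = (string)u.Element(Ns + "changefreq"),
                Priority = (string)u.Element(Ns + "priority")
            })
            // Skip malformed entries that have no <loc>
            .Where(e => !string.IsNullOrEmpty(e.Loc))
            .ToList();
    }
}
```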
The steps below explain how to read the sitemaps and validate each URL they contain in a .NET application using C#.
Step 1: Create Property Class
This class holds one row of the report: the sitemap the URL was found in, the URL itself, and whether it exists. A list of these objects makes up the full report.
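public class SiteMapURLReport
{
    public string SiteMapURL { get; set; }   // sitemap file the URL was listed in
    public string PostURL { get; set; }      // URL taken from the <loc> node
    public bool IsURLExist { get; set; }     // result of IsURLExists for that URL
}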
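Step 2: Get the list of sitemap XML files
This step assumes the sitemaps follow a numbered naming convention, for example:
– https://yourdomain.com/post-sitemap2.xml
and so on. The code below requests post-sitemap1.xml, post-sitemap2.xml, and so on in turn, and stops at the first file that cannot be downloaded.
If your sitemaps use a different naming convention, build the list of sitemap URLs manually instead.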
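If your site publishes a sitemap index file (the sitemaps.org format in which a <sitemapindex> root lists each child sitemap in a <sitemap><loc> entry), you can read the sitemap URLs from it instead of guessing file names. A minimal sketch, assuming the standard sitemaps.org namespace; SitemapIndexReader and GetChildSitemapUrls are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SitemapIndexReader
{
    // Namespace used by standard sitemap index files (assumed)
    private static readonly XNamespace Ns = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Returns the <loc> of every <sitemap> entry in a sitemap index document
    public static List<string> GetChildSitemapUrls(string indexXml)
    {
        XDocument doc = XDocument.Parse(indexXml);
        return doc.Descendants(Ns + "sitemap")
                  .Select(s => (string)s.Element(Ns + "loc"))  // null if <loc> is missing
                  .Where(loc => !string.IsNullOrWhiteSpace(loc))
                  .Select(loc => loc.Trim())
                  .ToList();
    }
}
```

Each URL returned can then be downloaded and added to the dictionary in place of the numbered loop.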
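using System.Collections.Generic;
using System.Net;
using System.Text;

private Dictionary<string, string> SitemapXMLList()
{
    // Key: sitemap URL, Value: the downloaded XML content
    Dictionary<string, string> sitemapLists = new Dictionary<string, string>();
    // One WebClient is reused for every download and disposed at the end
    using (WebClient wc = new WebClient())
    {
        // Read the response as UTF-8 text
        wc.Encoding = Encoding.UTF8;
        // Try post-sitemap1.xml, post-sitemap2.xml, ... until a download fails.
        // Note: a server that answers every URL with 200 would keep this loop running.
        for (int i = 1; ; i++)
        {
            string sitemapURL = "https://YourDomainName.com/post-sitemap" + i + ".xml";
            try
            {
                // Download the document as a string
                sitemapLists.Add(sitemapURL, wc.DownloadString(sitemapURL));
            }
            catch (WebException)
            {
                // 404 (or any other download error) means there are no more sitemaps
                break;
            }
        }
    }
    return sitemapLists;
}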
Step 3: Read and validate the URLs in each sitemap XML
– Loop through every <url> node in each sitemap XML (a nested loop)
– Check whether each link is valid (exists)
– Store the status of every URL in the list and display the output
using System;
using System.Collections.Generic;
using System.Xml;

public List<SiteMapURLReport> CheckWPSiteMapURLExists()
{
    List<SiteMapURLReport> urlReport = new List<SiteMapURLReport>();
    // Get the sitemap XML list (Step 2): key = sitemap URL, value = XML content
    var sitemapList = SitemapXMLList();
    // Loop through each sitemap
    foreach (var sitemap in sitemapList)
    {
        try
        {
            // Load the downloaded string as an XML document
            XmlDocument xmldoc = new XmlDocument();
            xmldoc.LoadXml(sitemap.Value);
            // Get all <url> nodes in the sitemap
            XmlNodeList xmlSitemapList = xmldoc.GetElementsByTagName("url");
            // Loop through the <url> nodes and store the status of each URL
            foreach (XmlNode node in xmlSitemapList)
            {
                XmlElement locNode = node["loc"];
                if (locNode != null)
                {
                    // Get the URL from the <loc> node
                    string currURL = locNode.InnerText.Trim();
                    // Add the URL and its status to the report
                    urlReport.Add(new SiteMapURLReport
                    {
                        SiteMapURL = sitemap.Key,
                        PostURL = currURL,
                        IsURLExist = IsURLExists(currURL)
                    });
                }
            }
        }
        catch (Exception)
        {
            // Skip a sitemap that cannot be processed (for example, content that is not valid XML)
        }
    }
    return urlReport;
}
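Step 3 only builds the report; to display the output you can filter it down to the broken links. The sketch below uses LINQ to group the broken URLs by the sitemap they were listed in. BrokenLinkReport, GetBrokenLinks and Print are hypothetical names, and the SiteMapURLReport class from Step 1 is repeated so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the Step 1 class
public class SiteMapURLReport
{
    public string SiteMapURL { get; set; }
    public string PostURL { get; set; }
    public bool IsURLExist { get; set; }
}

public static class BrokenLinkReport
{
    // Maps each sitemap URL to the list of broken URLs found in it
    public static Dictionary<string, List<string>> GetBrokenLinks(IEnumerable<SiteMapURLReport> report)
    {
        return report
            .Where(r => !r.IsURLExist)
            .GroupBy(r => r.SiteMapURL)
            .ToDictionary(g => g.Key, g => g.Select(r => r.PostURL).ToList());
    }

    // Writes each sitemap that has broken links, followed by those links
    public static void Print(IEnumerable<SiteMapURLReport> report)
    {
        foreach (var entry in GetBrokenLinks(report))
        {
            Console.WriteLine(entry.Key);
            foreach (string url in entry.Value)
            {
                Console.WriteLine("  broken: " + url);
            }
        }
    }
}
```

For example, calling BrokenLinkReport.Print(CheckWPSiteMapURLExists()) from the class that contains CheckWPSiteMapURLExists prints only the sitemaps that contain broken links, each followed by its broken URLs.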
