How to Find (And Fix) Orphan Pages
What is an Orphan Page?
An orphan page is a page on a website that is not linked to by any other page on the site. Think of a website as a perfectly built spider web, each strand connected to another. Now imagine, a couple of feet away from the web, a strand of silk hanging in mid-air, all by itself. It’s still a piece of web, and it would be helpful to a spider if the spider could reach it, but this spider can’t jump, so the strand of silk is useless. This strand of silk is an orphan page.
Orphan pages are rarely stumbled upon by users. To reach one, a user would have to type the page’s URL directly or follow it from the sitemap, and neither tends to happen.
Some orphan pages are orphaned intentionally. These are private pages used by webmasters that aren’t intended for users to stumble upon. But we won’t worry about these pages in this post.
Why Should I Care?
At Mockingbird, checking for orphan pages is part of our technical audit. It’s one of the many indicators we use at the very beginning of an engagement to assess a client’s website health. Lots of orphan pages = website health could be improved. Why is this the case?
- You might have valuable pages orphaned. Sometimes this happens accidentally. You could have great content on your site, but because nothing links to it, a user will never find it naturally. That’s bad for the user, and it also means you’re missing out on the online credibility your valuable content could earn. People don’t link to pages that they can’t find. Search engines won’t have the opportunity to recognize you as an online authority on any subject if your best pages aren’t getting seen, linked to externally, or talked about.
- Orphan pages might bring penalties. This is a debated point among SEOs. Some speculate that, upon discovering orphan pages on a site, search engines will treat them as doorway pages (unnatural pages intended to rank artificially high for certain search terms to bring in users) and penalize the site. Most disagree, but in this case it’s worthwhile to err on the side of caution.
How Do I Identify Orphan Pages?
There are plenty of ways to identify orphan pages on your site, but no matter how you get there, all you need is:
1. A complete list of every page on your site.
2. A complete list of every crawlable page on your site.
For (1.) I use the XML sitemap*. If the sitemap is working correctly, it should update automatically each time a page is added to your site, regardless of whether or not that page is orphaned.
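If you would rather pull the URL list out of the sitemap programmatically than copy it by hand, something like the sketch below could work. It assumes the sitemap follows the standard layout, where each entry holds a `<loc>` child. The function name `extract_locs` and the sample XML are made up for illustration; in practice you would feed in the XML you downloaded from your own site.

```python
import xml.etree.ElementTree as ET


def extract_locs(xml_text: str) -> list[str]:
    """Return the <loc> value of every top-level entry in a sitemap.

    Works for both a <urlset> (entries are pages) and a <sitemapindex>
    (entries are child sitemaps), since both store the address in <loc>.
    """
    root = ET.fromstring(xml_text)
    locs = []
    for entry in root:  # each <url> or <sitemap> element
        for child in entry:
            # Tags look like "{namespace}loc"; compare only the local name.
            if child.tag.rsplit("}", 1)[-1] == "loc" and child.text:
                locs.append(child.text.strip())
    return locs


# Hypothetical sample data standing in for a downloaded sitemap.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/engine-parts/</loc></url>
  <url><loc>https://example.com/pistons/</loc></url>
</urlset>"""

sitemap_urls = extract_locs(SAMPLE_SITEMAP)
print(sitemap_urls)
```

If your site uses a sitemap index, the same function returns the addresses of the child sitemaps, and you would run it again on each of those to collect the actual page URLs.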
For (2.) I use Screaming Frog. Screaming Frog crawls the site the way Googlebot or Bingbot would: it starts at the homepage and works down, following each link it encounters along the way. Because it only discovers pages by following links, it never reaches pages that no other page links to. You guessed it: orphan pages.
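To see why a link-following crawl misses orphan pages, here is a toy model of the idea. This is not how Screaming Frog is actually implemented; it is a simple breadth-first walk over a hand-written map of internal links, and the page paths and the name `crawl_reachable` are invented for the example.

```python
from collections import deque


def crawl_reachable(start: str, links: dict[str, list[str]]) -> set[str]:
    """Return every page reachable from `start` by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: each key is a page, each value lists the pages it links to.
SITE_LINKS = {
    "/": ["/engine-parts/", "/about/"],
    "/engine-parts/": ["/", "/spark-plugs/"],
    "/about/": ["/"],
    "/spark-plugs/": ["/engine-parts/"],
    "/pistons/": ["/engine-parts/"],  # links out, but nothing links in
}

crawled = crawl_reachable("/", SITE_LINKS)
print(sorted(crawled))
```

The “/pistons/” page exists and even links to other pages, but because no page links to it, the walk from the homepage never finds it.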
Now that you have both a list of every page on your site and a list of every crawlable page on your site, it’s time to compare. Bring both lists into an Excel spreadsheet and run a duplicate check. Any URL that doesn’t appear in your spreadsheet twice (these should be the pages that appear in your sitemap, but not in Screaming Frog) is an orphan page.
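The same comparison can be done outside a spreadsheet. The sketch below treats the two lists as sets and keeps whatever is in the sitemap but missing from the crawl. The `normalize` helper is an assumption on my part: it strips whitespace and a trailing slash so that the same page written two ways is not flagged by mistake. Adjust it to match how your own URLs are written. The URLs are placeholders.

```python
def normalize(url: str) -> str:
    """Trim whitespace and a trailing slash so equivalent URLs compare equal."""
    return url.strip().rstrip("/")


def find_orphans(sitemap_urls: list[str], crawled_urls: list[str]) -> list[str]:
    """Return URLs that are in the sitemap but were never reached by the crawl."""
    in_sitemap = {normalize(u) for u in sitemap_urls}
    in_crawl = {normalize(u) for u in crawled_urls}
    return sorted(in_sitemap - in_crawl)


# Hypothetical inputs: one list from the sitemap, one exported from the crawler.
from_sitemap = [
    "https://example.com/",
    "https://example.com/engine-parts/",
    "https://example.com/pistons/",
]
from_crawl = [
    "https://example.com",
    "https://example.com/engine-parts",
]

orphans = find_orphans(from_sitemap, from_crawl)
print(orphans)
```

Using a set difference instead of a “appears only once” check also avoids flagging pages that show up in the crawl but are missing from the sitemap, which are a sitemap problem rather than orphans.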
What Do I Do Once I Find Them?
This is the easy part. If you’ve found unintentionally orphaned pages on your site, assess their value. If an orphaned page has thin content, duplicate content, or is outdated, you’re better off without it. Noindex these pages. For valuable, relevant orphaned pages, link to them from a page where the link fits naturally. Put yourself in the user’s shoes and imagine where your orphaned page would be the most helpful. If you discover an orphan page on your auto website called “Everything You Need to Know About Pistons”, your “Engine Parts” page would be a great candidate to link from.
*To access this, just tack “/sitemap_index.xml/” onto the end of your homepage URL.