Bit of background: we manage a network of 25 different WP sites that all run off the same codebase. Recently an SEO analyst joined and noticed that a couple of the sites had weird 404 issues for URLs like this:
**/category/featured-article/ryan-mcnamara-new-different/news/page/2/**
So I disabled all plugins and hooks, tried a fresh install, and this is still happening. It turns out it only happens when the site’s permalink structure ends in .html. So I delved into the rewrite code, and it turns out this is what's happening for the URL: **/category/featured-article/ryan-mcnamara-new-different/news**
- If the permalink structure is **/%category%/%postname%/**, then of the available rewrite rules ($wp_rewrite->rewrite_rules()) this rule is matched: **(.+?)/([^/]+)(/[0-9]+)?/?$**, causing a 404 as expected.
- If the permalink structure is **/%category%/%postname%.html**, then this rule is matched: **(.+?)/?$**, which maps to **index.php?category_name=$matches[1]**, which is why the category is rendered.

When users/bots find these category pages and click the pagination links, they get taken to **/category/featured-article/ryan-mcnamara-new-different/news/page/2/**, which causes a 404.
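
To make the difference between the two cases easier to see, here is a minimal Python sketch (not WordPress code) of first-match-wins rule matching. Only the two regexes quoted above come from the real rule list. The `.html` variant of the post rule, the "post lookup" label, and the helper name `first_match` are assumptions made for illustration, and a real site has many more rules than these.

```python
import re

# The two regexes quoted in the post.
POST_RULE = r"(.+?)/([^/]+)(/[0-9]+)?/?$"
CATEGORY_RULE = r"(.+?)/?$"
# Assumption: rough shape of the post rule when the structure ends in .html.
HTML_POST_RULE = r"(.+?)/([^/]+)\.html(/[0-9]+)?/?$"

# Ordered (regex, target) pairs; the target strings are labels only.
RULES_SLASH = [
    (POST_RULE, "post lookup (assumed label)"),
    (CATEGORY_RULE, "index.php?category_name=$matches[1]"),
]
RULES_HTML = [
    (HTML_POST_RULE, "post lookup (assumed label)"),
    (CATEGORY_RULE, "index.php?category_name=$matches[1]"),
]


def first_match(rules, path):
    """Return (pattern, target, groups) for the first rule matching path."""
    for pattern, target in rules:
        # re.match anchors at the start of the string, like a leading ^.
        m = re.match(pattern, path)
        if m:
            return pattern, target, m.groups()
    return None


if __name__ == "__main__":
    path = "category/featured-article/ryan-mcnamara-new-different/news"
    for name, rules in (("trailing slash", RULES_SLASH), (".html", RULES_HTML)):
        pattern, target, groups = first_match(rules, path)
        print(name, "->", pattern, "->", target, groups)
```

In this sketch the trailing-slash rule set stops at the post rule, with "news" captured as the post name. With the `.html` rule set the path has no `.html` suffix, so the post rule is skipped and the catch-all category rule captures the whole path.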
The first question is how people are finding these pages in the first place, which is an issue I can deal with. The question for this forum is: is this a bug in WP's default rewrite rules, or should the paginate_links function be smarter about creating pagination URLs? Has anybody seen this problem before?
Caveats: No, I can’t force all sites to drop the .html, and no, I don’t have the ability to change core WP code for this problem.