404 regular-expression advice needed

I have been working hard to understand regular expressions so I can use them on my website with the Redirection plugin. I have changed my permalinks a few times in the past year because I did not know better or had been given incorrect advice. The upshot is that Google Webmaster Tools shows hundreds of 404s.

I decided to tackle the 404s with Redirection, but I know this is past my experience. This website is the best I have come across in the past few days for knowledgeable and helpful experts on the subject.


I’d like to fix this myself, because I know the permalink structures behind the 404 errors from the Webmaster Tools download and am comfortable using regular expressions in the Redirection plugin. But if this is beyond the scope of DIY, I’d be very interested in paying someone to fix it.

Here are the various former permalink formats:

String = /2010/10/02/abcd-abcd-124/
String = /2010/10/02/abcd-abcd-124
String = /2010/10/02/abcd-abcd-abcd-2-1234/
String = /2010/10/abcd-abcd-abcd-2-1234/
String = /2010/10/02/abcd-abcd-abcd-2-123
String = /2010/10/02/abcd-abcd-abcd-2-123/
String = /2010/10/02/abcd-abcd/
String = /2010/10/02/abcd-abcd-2/
String = /2010/10/02/abcd-abcd/feed
String = abcd
String = /websitename/abcd  (recent, web crawl)

The bulk of the 404 errors follow this pattern:
year/month/day/postname-postID/ (with and without the trailing slash)

What I want to do is strip the post ID from these permalinks and redirect to a new permalink structure, %year%%postname%.

I have had some success with this regex:
string: /2010/01/02/green-reach-22-9137
regex: /2010/(\d+)/(\d+)/((.*\W+))
result: /2010/01/02/green-reach-22-

However, I cannot figure out how to strip the dash before the post ID.
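To see where the stray dash comes from, here is a small sketch that runs the pattern above (backslashes restored) through Python's re module. Using re as a stand-in for the plugin's regex engine is an assumption, but these constructs should behave the same. Because the last group ends in \W+, backtracking leaves the final non-word character, the dash, inside the capture. The second pattern is a hypothetical tweak, not from the thread, that moves "-digits" outside the captured group.

```python
import re

url = "/2010/01/02/green-reach-22-9137"

# The question's pattern: group 3 ends with \W+, so the engine backtracks
# until the last non-word character (the dash before the ID) is captured.
asked = re.match(r"/2010/(\d+)/(\d+)/((.*\W+))", url)
print(asked.group(3))  # green-reach-22-

# Hypothetical tweak: match the dash and the trailing digits outside the
# group, so the captured slug stops just before "-<digits>" at the end.
tweak = re.match(r"/2010/(\d+)/(\d+)/(.*)-\d+/?$", url)
print(tweak.group(3))  # green-reach-22
```

Note that the tweak would also strip a single-digit slug suffix, turning abcd-abcd-2 into abcd-abcd. That is the ambiguity the first reply works around.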

I would be very grateful for any advice. I have tried for hours with regex tools to find an expression that works, but I’m stumped. Thanks!


  1. It seems you can have both post IDs and numbers in the page slug. So my-post-2 could be my-post with post ID 2, or my-post-2 with no post ID. In theory there is no way to solve this. In practice, though, we can probably assume that you won’t have ten similar slugs, so the slug number is at most one digit and the post ID is at least two digits. The few posts with IDs 1-9 can then be handled with individual, non-regex redirects.

    This regex should match anything of the form /[year]/[month]/[optional-day]/[post-slug-with-optional single digit]-[optional post id][optional /]:

    ^/(\d{4})/\d{2}/(\d{2}/)?([^/]+?)(-\d{2,})?/?$

    ^/                                              Start of the string, one slash
      (\d{4})                                       Four digits, saved for later
             /                                      One slash
              \d{2}                                 Two digits, not saved
                   /                                One slash
                    (\d{2}/)?                       Two digits and a slash, optional
                             ([^/]+?)               Everything that is not a slash, but not greedy
                                     (-\d{2,})?     A dash and two or more digits, optional
                                               /?$  Optional slash and end of the string
    

    The year is in the first captured group, the post slug in the third.

    The “not greedy” is important because it makes sure that part of the regex doesn’t “eat up” the post-ID part that is matched after it. It is created by appending the ? to the +. I believe this does not work under Apache 1.3, so WordPress strips it out before it creates the mod_rewrite rules. So if it does not work when you put it in the Apache part of Redirection, try it in the WordPress part.

  2. Expanding on Jan’s answer:

    RewriteEngine On
    RewriteBase /
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(\d{4})/(\d{2})/(\d{2}/)?(.+?)(-\d{2,})?(/.+)?/?$ /$1/$4$6 [L,R=301]

    The $6 part is to catch an occasional trailing /page/2 or /feed. $5 holds the post ID, which is the part being dropped. A QSA flag (i.e. [L,QSA,R=301]) is only needed if the target itself adds a query string. As written, mod_rewrite passes any existing query string through unchanged.

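    Be sure to place the above rule before the block WordPress inserts (between # BEGIN WordPress and # END WordPress). Otherwise WordPress’s catch-all rule handles the request first, and this rule will be ignored altogether.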
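Before editing .htaccess, both replies can be checked against the sample URLs from the question. The sketch below uses Python’s re module as a stand-in for Apache’s regex engine. That is an assumption, but lazy quantifiers, optional groups and \d should behave the same way for these patterns. The helpers jan_target and htaccess_target are hypothetical names for illustration. htaccess_target imitates the substitution /$1/$4$6, taking group 4 as the slug, group 5 as the post ID (dropped) and group 6 as an optional trailing /feed or /page/2. A greedy variant of Jan’s pattern is included to show why the lazy +? matters.

```python
import re

# Pattern from the first reply; it sees the full path, leading slash included.
jan = re.compile(r"^/(\d{4})/\d{2}/(\d{2}/)?([^/]+?)(-\d{2,})?/?$")

# Same pattern with a greedy [^/]+ instead of the lazy [^/]+?, for comparison.
greedy = re.compile(r"^/(\d{4})/\d{2}/(\d{2}/)?([^/]+)(-\d{2,})?/?$")

# Pattern from the second reply. In .htaccess, mod_rewrite matches the path
# without its leading slash, so htaccess_target() strips it before matching.
htaccess = re.compile(r"^(\d{4})/(\d{2})/(\d{2}/)?(.+?)(-\d{2,})?(/.+)?/?$")


def jan_target(path, pattern=jan):
    """Hypothetical helper: build /year/slug from groups 1 and 3, or None."""
    m = pattern.match(path)
    return f"/{m.group(1)}/{m.group(3)}" if m else None


def htaccess_target(path):
    """Hypothetical helper mimicking /$1/$4$6 (an unmatched group becomes '')."""
    m = htaccess.match(path.lstrip("/"))
    if not m:
        return None
    return f"/{m.group(1)}/{m.group(4)}{m.group(6) or ''}"


samples = [
    "/2010/10/02/abcd-abcd-124/",
    "/2010/10/02/abcd-abcd-124",
    "/2010/10/02/abcd-abcd-abcd-2-1234/",
    "/2010/10/abcd-abcd-abcd-2-1234/",
    "/2010/10/02/abcd-abcd-abcd-2-123",
    "/2010/10/02/abcd-abcd/",
    "/2010/10/02/abcd-abcd-2/",
    "/2010/10/02/abcd-abcd/feed",
]

for path in samples:
    print(path, "->", jan_target(path), "|", htaccess_target(path))

# The greedy variant swallows the post ID into the slug.
print(jan_target("/2010/10/02/abcd-abcd-124/", greedy))  # /2010/abcd-abcd-124
```

Jan’s pattern returns None for the /feed URL because [^/] cannot cross a slash. That is the case the (/.+)? group in the second reply covers.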