Improve Regex To Identify WordPress Site

So I am writing a little snippet to identify a WordPress site, first by regex and then by trying to access the login page, etc.

Could this be optimized any further? Should I account for blank spaces between attributes?

Regex wordPressPattern = new Regex("(<meta name=\"generator\" content=\"WordPress)| (Powered by <a href=\"http://www.wordpress.org\")+", RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Singleline);
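
To see why the spacing question matters, here is a small C# sketch that runs the question's pattern (with its inner double quotes escaped so it compiles) against a few variants of the same markup. The helper name MatchesQuestionPattern and the sample HTML strings are made up for illustration, and the snippet uses top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Text.RegularExpressions;

// The question's pattern, with the inner double quotes escaped.
// Note the space after "|": the second alternative only matches when
// "Powered by" is preceded by a literal space.
var questionPattern = new Regex(
    "(<meta name=\"generator\" content=\"WordPress)| (Powered by <a href=\"http://www.wordpress.org\")+",
    RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Singleline);

// Hypothetical helper, used only for this illustration.
bool MatchesQuestionPattern(string html) => questionPattern.IsMatch(html);

// Exact markup the pattern was written for.
Console.WriteLine(MatchesQuestionPattern("<meta name=\"generator\" content=\"WordPress 6.4\" />")); // True
// Single-quoted attributes (valid HTML).
Console.WriteLine(MatchesQuestionPattern("<meta name='generator' content='WordPress 6.4' />")); // False
// Two spaces between attributes.
Console.WriteLine(MatchesQuestionPattern("<meta name=\"generator\"  content=\"WordPress 6.4\" />")); // False
// https link without a leading space before "Powered".
Console.WriteLine(MatchesQuestionPattern("Powered by <a href=\"https://wordpress.org/\">WordPress</a>")); // False
```

So yes: any change in quoting, whitespace or URL scheme is enough to slip past the literal pattern.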

1 comment

  1. A few improvements:

    • Account for whitespace (\s+ between words, \s* around the =)
    • Remove the + at the end; it only repeats the second group, and a single match is enough
    • Make www. optional
    • Make the s in https optional
    • What if there is a single quote instead of a double quote (which is valid HTML)? We will use ("|')
    • I don't think name="generator" is relevant, so we'll use .*? instead and add [^>]*> at the end

    To sum it up:

    (<meta.*?content\s*=\s*("|')WordPress[^>]*>)|(Powered\s+by\s+<a\s+href\s*=\s*("|')http(s)?://(www\.)?wordpress\.org("|'))
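
A minimal C# sketch of plugging the summarized pattern back into the question's code, written as a verbatim string (inside @"...", a double quote is written as ""). The helper name LooksLikeWordPress and the sample strings are hypothetical, and top-level statements assume C# 9 or later.

```csharp
using System;
using System.Text.RegularExpressions;

// The summarized pattern as a C# verbatim string (inner double quotes doubled).
var wordPressPattern = new Regex(
    @"(<meta.*?content\s*=\s*(""|')WordPress[^>]*>)|(Powered\s+by\s+<a\s+href\s*=\s*(""|')http(s)?://(www\.)?wordpress\.org(""|'))",
    RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Singleline);

// Hypothetical helper name, used only for illustration.
bool LooksLikeWordPress(string html) => wordPressPattern.IsMatch(html);

string[] samples =
{
    "<meta name='generator' content='WordPress 6.4' />",          // single quotes
    "<meta name=\"generator\"  content = \"WordPress 6.4\">",      // extra whitespace
    "Powered by <a href=\"https://wordpress.org\">WordPress</a>",  // https, no www
    "<meta name=\"generator\" content=\"Joomla!\">",               // not WordPress
};

// Print whether each sample is detected.
foreach (var html in samples)
    Console.WriteLine($"{LooksLikeWordPress(html)}: {html}");
```

Two things to keep in mind: .*? can run past the closing > of the <meta> tag (and, with RegexOptions.Singleline, across line breaks), so [^>]*? would keep that part of the match inside a single tag; and the second alternative needs a quote right after wordpress.org, so a link like href="https://wordpress.org/" with a trailing slash is not matched.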