Warning: preg_replace(): Unknown modifier

I have the following error:

Warning: preg_replace(): Unknown modifier ‘]’ in xxx.php on line 38

This is the code on line 38:

<?php echo str_replace("</ul></div>", "", preg_replace("<div[^>]*><ul[^>]*>", "", wp_nav_menu(array('theme_location' => 'nav', 'echo' => false)) )); ?>

How can I fix this problem?

  1. Why the error occurs

    In PHP, a regular expression needs to be enclosed within a pair of delimiters. A delimiter can be any non-alphanumeric, non-backslash, non-whitespace character; /, #, ~ are the most commonly used ones. Note that it is also possible to use bracket style delimiters where the opening and closing brackets are the starting and ending delimiter, i.e. <pattern_goes_here>, [pattern_goes_here] etc. are all valid.

    The “Unknown modifier X” error usually occurs in the following two cases:

    • When your regular expression is missing delimiters.

    • When you use the delimiter inside the pattern without escaping it.

    In this case, the regular expression is <div[^>]*><ul[^>]*>. The regex engine considers everything from < to > as the regex pattern, and everything afterwards as modifiers.

    Regex: <div[^>]*><ul[^>]*>
           │     ││          │
           └──┬──┘└────┬─────┘
           pattern  modifiers
    

    Here ] is an unknown modifier, because it appears after the closing > delimiter, which is why PHP throws that error.

    Depending on the pattern, the unknown modifier complaint could just as well have been about *, +, p, / or ) or almost any other letter/symbol. Only imsxeADSUXJu are valid PCRE modifiers (e was removed in PHP 7.0, and PHP 8.2 added n).
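    To reproduce the warning in isolation, here is a small sketch. The $menu string is a made-up stand-in for what wp_nav_menu() returns; the @ only keeps the demo quiet, and error_get_last() still records the warning. Since preg_replace() returns null for an invalid pattern, the str_replace() in the question ends up working on nothing.

```php
<?php
// Hypothetical stand-in for the HTML returned by wp_nav_menu(array(..., 'echo' => false)).
$menu = '<div class="menu"><ul id="nav"><li>Home</li></ul></div>';

// The pattern from the question, without delimiters: '<' and '>' are taken as the delimiters.
$result = @preg_replace('<div[^>]*><ul[^>]*>', '', $menu);
$error  = error_get_last(); // still filled in, even though @ suppressed the output

var_dump($result);               // NULL: preg_replace() gives up on the invalid pattern
echo $error['message'], PHP_EOL; // the message mentions: Unknown modifier ']'
```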

    How to fix it

    The fix is easy. Just wrap your regex pattern with any valid delimiters. In this case, you could choose ~ and get the following:

    ~<div[^>]*><ul[^>]*>~
    │                   │
    │                   └─ ending delimiter
    └───────────────────── starting delimiter
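    Applied to line 38 from the question, a sketch could look like this. The only real change is the ~ around the pattern; the $menu string is a hypothetical stand-in so the snippet runs outside WordPress.

```php
<?php
// Hypothetical stand-in for the wp_nav_menu() output; in WordPress you would keep
// wp_nav_menu(array('theme_location' => 'nav', 'echo' => false)) here instead.
$menu = '<div class="menu"><ul id="nav"><li>Home</li><li>About</li></ul></div>';

// Same expression as in the question, with the pattern wrapped in ~ delimiters.
$items = str_replace('</ul></div>', '', preg_replace('~<div[^>]*><ul[^>]*>~', '', $menu));

echo $items, PHP_EOL; // <li>Home</li><li>About</li>
```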
    

    If you’re receiving this error despite having used a delimiter, it might be because the pattern itself contains unescaped occurrences of that delimiter.

    Or escape delimiters

    /foo[^/]+bar/i would certainly throw an error, because the / inside the character class already ends the pattern. So escape the delimiter with a backslash wherever it appears within the regex:

    /foo[^\/]+bar/i
    │      │     │
    └──────┼─────┴─ actual delimiters
           └─────── escaped slash(/) character
    

    This gets tedious if your regex pattern contains many occurrences of the delimiter character.

    The cleaner way, of course, is to use a different delimiter altogether, ideally a character that does not appear anywhere inside the regex pattern, say #foo[^/]+bar#i.
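    A quick way to compare the three variants (the subject string is made up): the escaped and the re-delimited pattern behave the same, while the unescaped one fails to compile and preg_match() returns false.

```php
<?php
$subject = 'FOO-baz-BAR';

// Delimiter escaped inside the pattern: \/ is a literal slash, not the end of the regex.
$escaped = preg_match('/foo[^\/]+bar/i', $subject);

// A delimiter that does not occur in the pattern: no escaping needed.
$otherDelim = preg_match('#foo[^/]+bar#i', $subject);

// Unescaped slash: PHP stops at the second / and treats "]+bar/i" as modifiers.
$broken = @preg_match('/foo[^/]+bar/i', $subject);

var_dump($escaped, $otherDelim, $broken); // prints int(1), int(1), bool(false)
```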

  2. Other examples

    The reference answer already explains the reason for “Unknown modifier” warnings. This is just a comparison of other typical variants.

    • When forgetting to add regex /delimiters/, the first non-letter symbol will be assumed to be one. Therefore the warning is often about what follows a grouping (…), […] meta symbol:

      preg_match("[a-zA-Z]+:\s*.$"
                  ↑      ↑⬆
      
    • Sometimes your regex already uses a custom delimiter (: here), but still contains the same character as an unescaped literal. It’s then mistaken for a premature delimiter, which is why the very next symbol receives the “Unknown modifier ❌” trophy:

      preg_match(":[[\d:/]+]:"
                  ↑     ⬆   ↑
      
    • When using the classic / delimiter, take care to not have it within the regex literally. This most frequently happens when trying to match unescaped filenames:

      preg_match("/pathname/filename/i"
                  ↑        ⬆         ↑
      

      Or when matching angle/square bracket style tags:

      preg_match("/<%tmpl:id>(.*)</%tmpl:id>/Ui"
                  ↑               ⬆         ↑
      
    • Templating-style (Smarty or BBCode) regex patterns often require {…} or […] brackets. Both should usually be escaped (an outermost {} pair used as delimiters being the exception).

      They also get misinterpreted as paired delimiters when no actual delimiter is used. If they’re then also used as literal characters within the pattern, that’s, of course … an error.

      preg_match("{bold[^}]+}"
                  ↑      ⬆  ↑
      
    • Whenever the warning says “Delimiter must not be alphanumeric or backslash” then you also entirely forgot delimiters:

      preg_match("ab?c*"
                  ↑
      
    • “Unknown modifier ‘g’” often indicates a regex that was copied verbatim from JavaScript or Perl.

      preg_match("/abc+/g"
                        ⬆
      

      PHP doesn’t use the /g global flag. Instead, the preg_replace function works on all occurrences, and preg_match_all is the “global” searching counterpart to the single-match preg_match.

      So, just remove the /g flag.

      See also:
      · Warning: preg_replace(): Unknown modifier ‘g’
      · preg_replace: bad regex == ‘Unknown Modifier’?

    • A more peculiar case pertains to the PCRE_EXTENDED /x flag. This is often used (or should be) to make regexps more spacious and readable.

      This allows inline # comments. PHP implements the regex delimiters atop PCRE, but it doesn’t treat # in any special way. That is how a literal delimiter in a # comment can become an error:

      preg_match("/
         ab?c+  # Comment with / slash in between
      /x"
      

      (It’s also worth noting that using # as the delimiter, as in #abc+#x, can be doubly inadvisable.)

    • Interpolating variables into a regex requires them to be pre-escaped, or be valid regexps themselves. You can’t tell beforehand if this is gonna work:

       preg_match("/id=$var;/"
                   ↑    ↺   ↑
      

      It’s best to apply $var = preg_quote($var, "/") in such cases.

      See also:
      · Unknown modifier ‘/’ in …? what is it?

      Another alternative is using \Q…\E escapes for unquoted literal strings:

       preg_match("/id=\Q{$var}\E;/mix");
      

      Note that this is merely a convenience shortcut for meta symbols, not dependable or safe. It would fall apart if $var contained a literal \E itself (however unlikely). And it does not mask the delimiter itself.

    • Deprecated modifier /e is an entirely different problem. This has nothing to do with delimiters, but the implicit expression interpretation mode being phased out. See also: Replace deprecated preg_replace /e with preg_replace_callback
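    Two of the cases above sketched in code: replacing /g with preg_match_all(), and escaping an interpolated value with preg_quote($var, "/"). The subject strings and the value of $var are made up for illustration.

```php
<?php
$subject = 'abc abcc abccc';

// There is no /g in PHP: preg_match() stops after the first match ...
$first = preg_match('/abc+/', $subject, $one);    // 1, $one[0] is 'abc'
// ... and preg_match_all() is the "global" variant.
$all = preg_match_all('/abc+/', $subject, $many); // 3

// Interpolating a value that contains the delimiter breaks the pattern:
$var = 'a/b.c'; // hypothetical value with a slash and a regex meta character
$broken = @preg_match("/id=$var;/", 'x id=a/b.c; y'); // false: Unknown modifier 'b'

// preg_quote() with the delimiter as second argument escapes both the / and the dot.
$quoted = preg_quote($var, '/');                      // a\/b\.c
$hit  = preg_match("/id=$quoted;/", 'x id=a/b.c; y'); // 1
$miss = preg_match("/id=$quoted;/", 'x id=a/bXc; y'); // 0, the dot is now literal
```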

    Alternative regex delimiters

    As mentioned already, the quickest solution to this error is just picking a distinct delimiter. Any non-alphanumeric, non-backslash, non-whitespace symbol can be used. Visually distinctive ones are often preferred:

    • ~abc+~
    • !abc+!
    • @abc+@
    • #abc+#
    • =abc+=
    • %abc+%

    Technically you could use $abc$ or |abc| for delimiters. However, it’s best to avoid symbols that serve as regex meta characters themselves.

    The hash # as delimiter is rather popular too. But care should be taken in combination with the x/PCRE_EXTENDED readability modifier. You can’t use # inline or (?#…) comments then, because those would be mistaken for delimiters.
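    A sketch to check both points: the same pattern compiles with each of the listed delimiters, and an x-mode comment works with ~ but breaks with #. The subject strings and the comment text are made up.

```php
<?php
$results = [];
foreach (['~', '!', '@', '#', '=', '%'] as $d) {
    // Same pattern, different delimiter: each one compiles and matches.
    $results[$d] = preg_match($d . 'abc+' . $d, 'xx abccc');
}

// x (extended) mode with an inline # comment.
$pattern = "
    abc+   # one 'ab' followed by one or more 'c'
";
$tilde = preg_match("~$pattern~x", 'abccc');  // 1: PCRE treats the # comment as a comment
$hash  = @preg_match("#$pattern#x", 'abccc'); // false: the comment's # ends the pattern early
```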

    Quote-only delimiters

    Occasionally you see " and ' used as regex delimiters paired with their counterpart as PHP string enclosure:

      preg_match("'abc+'"
      preg_match('"abc+"'
    

    Which is perfectly valid as far as PHP is concerned. It’s sometimes convenient and unobtrusive, but not always legible in IDEs and editors.

    Paired delimiters

    An interesting variation are paired delimiters. Instead of using the same symbol on both ends of a regex, you can use any <...> (...) [...] {...} bracket/braces combination.

      preg_match("(abc+)"   # just delimiters here, not a capture group
    

    While most of them also serve as regex meta characters, you can often use them without further effort. As long as those specific braces/parens within the regex are paired or escaped correctly, these variants are quite readable.
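    A small sketch of the difference between delimiter parentheses and capture-group parentheses (subject string made up): the outer pair produces no capture, while a balanced inner pair does.

```php
<?php
// The outer ( ) pair is only the delimiter: there is no capture group.
$outer = preg_match('(abc+)', 'xx abccc', $m1);   // 1, $m1 is ['abccc']

// Balanced inner parentheses belong to the pattern and capture as usual.
$nested = preg_match('(a(b)c+)', 'xx abccc', $m2); // 1, $m2 is ['abccc', 'b']
```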

    Fancy regex delimiters

    A somewhat lazy trick (which is not endorsed hereby) is using non-printable ASCII characters as delimiters. This works easily in PHP by using double quotes for the regex string, and octal escapes for delimiters:

     preg_match("\001 abc+ \001mix"
    

    The \001 is just the control character ␁ (SOH), which is not usually needed. Therefore it’s highly unlikely to appear within most regex patterns, which makes it suitable here, even though not very legible.
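    A sketch of both sides (the URL is made up): with \001 as delimiter, / and # need no escaping, whereas a multibyte glyph fails because only its first byte becomes the delimiter and its remaining bytes are then parsed as unknown modifiers.

```php
<?php
// "\001" in a double-quoted string is the single byte chr(1), used here as delimiter.
$pattern = "\001^https?://[^/#]+/\001i";
$ok = preg_match($pattern, 'HTTPS://example.com/page', $m); // 1, $m[0] is 'HTTPS://example.com/'

// A multibyte glyph does not work as delimiter.
$glyph = @preg_match("❚abc+❚", 'abccc'); // false
```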

    Sadly you can’t use Unicode glyphs like ❚ as delimiters. PHP only allows single-byte characters. And why is that? Well, glad you asked:

    PHPs delimiters atop PCRE

    The preg_* functions utilize the PCRE regex engine, which itself doesn’t care about or provide for delimiters. For resemblance with Perl, the preg_* functions implement them. Which is also why you can use modifier letters /ism instead of just constants as parameter.

    See ext/pcre/php_pcre.c on how the regex string is preprocessed:

    • First all leading whitespace is ignored.

    • Any non-alphanumeric symbol is taken as presumed delimiter. Note that PHP only honors single-byte characters:

      delimiter = *p++;
      if (isalnum((int)*(unsigned char *)&delimiter) || delimiter == '\\') {
              php_error_docref(NULL,E_WARNING, "Delimiter must not…");
              return NULL;
      }
      
    • The rest of the regex string is traversed left-to-right. Only backslash \-escaped symbols are ignored. \Q and \E escaping is not honored.

    • Should the delimiter be found again, the remainder is verified to only contain modifier letters.

    • If the delimiter is one of the pairable braces/brackets ( [ { <, then the processing logic is more elaborate, and the matching ) ] } > is expected as the end delimiter:

      int brackets = 1;   /* brackets nesting level */
      while (*pp != 0) {
              if (*pp == '\\' && pp[1] != 0) pp++;
              else if (*pp == end_delimiter && --brackets <= 0)
                      break;
              else if (*pp == start_delimiter)
                      brackets++;
              pp++;
      }
      

      It looks for correctly paired left and right delimiter, but ignores other braces/bracket types when counting.

    • The raw regex string is passed to the PCRE backend only after delimiter and modifier flags have been cut out.
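    The steps above can be modelled in a few lines of PHP. This is a hypothetical, simplified sketch (the function name split_regex is made up): it is not the real php_pcre.c code, it does not validate modifier letters, and it returns null wherever PHP would emit a warning. Fed the pattern from the question, it shows exactly which part ends up as "modifiers".

```php
<?php
// Hypothetical, simplified model of PHP's delimiter handling; not the real php_pcre.c.
function split_regex(string $regex): ?array
{
    // Leading whitespace is skipped (the characters C's isspace() accepts).
    $regex = ltrim($regex, " \t\n\r\v\f");
    if ($regex === '') {
        return null; // PHP warns: empty regular expression
    }

    // Only a single byte is taken as the delimiter.
    $start = $regex[0];
    if (ctype_alnum($start) || $start === '\\') {
        return null; // PHP warns: delimiter must not be alphanumeric or backslash
    }

    // Bracket-style delimiters close with their partner; others close with themselves.
    $pairs = ['(' => ')', '[' => ']', '{' => '}', '<' => '>'];
    $end = $pairs[$start] ?? $start;

    $depth = 1; // nesting level, only relevant for bracket-style delimiters
    $len = strlen($regex);
    for ($i = 1; $i < $len; $i++) {
        $c = $regex[$i];
        if ($c === '\\' && $i + 1 < $len) {
            $i++; // a backslash-escaped byte is skipped, whatever it is
        } elseif ($c === $end && --$depth <= 0) {
            break; // closing delimiter found
        } elseif ($c === $start && $end !== $start) {
            $depth++; // another opening bracket of the same type
        }
    }
    if ($i >= $len) {
        return null; // PHP warns: no ending delimiter found
    }

    return [
        'pattern'   => substr($regex, 1, $i - 1),
        'modifiers' => substr($regex, $i + 1),
    ];
}

// The pattern from the question: pattern "div[^", modifiers "]*><ul[^>]*>".
var_dump(split_regex('<div[^>]*><ul[^>]*>'));
```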

    Now this is all somewhat irrelevant, but it explains where the delimiter warnings come from. And this whole procedure exists to provide a minimum of Perl compatibility. There are a few minor deviations of course, like the […] character class context not receiving special treatment in PHP.

  3. If you would like to get an exception (MalformedPatternException) instead of warnings, or instead of checking preg_last_error(), consider using the T-Regx library:

    <?php
    try 
    {
        return pattern('invalid] pattern')->match($s)->all();
    }
    catch (MalformedPatternException $e) 
    {
        // your pattern was invalid
    }
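    If adding a library is not an option, a rough plain-PHP sketch can get a similar effect by temporarily turning the warning into an exception with set_error_handler(). The helper name preg_match_or_throw is hypothetical, PHP 8 is assumed (for preg_last_error_msg() and str_contains()), and this is not how T-Regx works internally.

```php
<?php
// Hypothetical helper (PHP 8+): preg_match() that throws instead of emitting a warning.
function preg_match_or_throw(string $pattern, string $subject, ?array &$matches = null): int
{
    // Convert any warning raised while compiling/running the pattern into an exception.
    set_error_handler(static function (int $errno, string $errstr): bool {
        throw new InvalidArgumentException($errstr);
    });
    try {
        $result = preg_match($pattern, $subject, $matches);
    } finally {
        restore_error_handler(); // always put the previous handler back
    }
    if ($result === false) {
        // Failures that come without a warning, e.g. hitting the backtrack limit.
        throw new RuntimeException(preg_last_error_msg());
    }
    return $result;
}

$message = null;
try {
    preg_match_or_throw('<div[^>]*><ul[^>]*>', '<div class="menu"><ul>');
} catch (InvalidArgumentException $e) {
    $message = $e->getMessage(); // mentions: Unknown modifier ']'
}
```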