PHP regular expression excluding <pre> tags

I am using a WordPress plugin named Acronyms (https://wordpress.org/plugins/acronyms/). This plugin replaces acronyms with their description. It uses PHP's preg_replace function.

The issue is that it also replaces acronyms inside <pre> tags, which I use to present source code.

Could you modify this expression so that it won’t replace acronyms contained inside <pre> tags (not only as direct children, but at any nesting depth)? Is it possible?

The PHP code is:

$text = preg_replace(
    "|(?!<[^<>]*?)(?<![?.&])\b$acronym\b(?!:)(?![^<>]*?>)|msU"
  , "<acronym title=\"$fulltext\">$acronym</acronym>"
  , $text
);

2 comments

  1. You can use the PCRE (*SKIP)(*F) verb trick (PHP's preg_* functions use PCRE) to make the regex engine match something only when it is not inside given delimiters:

    (?s)<pre[^<]*>.*?</pre>(*SKIP)(*F)|\b$acronym\b
    

    This means: skip all substrings starting with <pre> and ending with </pre>, and only then match $acronym as a whole word.

    See demo on regex101.com

    Here is a sample PHP demo:

    <?php
    $acronym = "ASCII";
    $fulltext = "American Standard Code for Information Interchange";
    // Skip whole <pre>...</pre> blocks, then match the acronym as a whole word.
    $re = "/(?s)<pre[^<]*>.*?<\/pre>(*SKIP)(*F)|\b$acronym\b/";
    $str = "<pre>ASCII\nSometext\nMoretext</pre>More text \nASCII\nMore text<pre>More\nlines\nASCII\nlines</pre>";
    $subst = "<acronym title=\"$fulltext\">$acronym</acronym>";
    $result = preg_replace($re, $subst, $str);
    echo $result;
    

    Output:

    <pre>ASCII
    Sometext
    Moretext</pre>More text 
    <acronym title="American Standard Code for Information Interchange">ASCII</acronym>
    More text<pre>More
    lines
    ASCII
    lines</pre>
    
  2. Alternatively, split the text on HTML tags with preg_split, keeping the tags as captured delimiters. Then apply the replacement only to text segments that are outside <pre> and <code> blocks, and join everything back into one string:

    function replace($s) {
        return str_replace('"', '&quot;', $s); // do something with `$s`
    }
    
    $text = 'Your text goes here...';
    // Split on tags; PREG_SPLIT_DELIM_CAPTURE keeps each tag as its own array element.
    $parts = preg_split('#(</?[-:\w]+(?:\s[^<>]+?)?>)#', $text, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
    $text = "";
    $x = 0; // 1 while inside a <code> or <pre> block
    foreach ($parts as $v) {
        if (trim($v) === "") {
            $text .= $v;
            continue;
        }
        if ($v[0] === '<' && substr($v, -1) === '>') {
            if (preg_match('#^<(/)?(?:code|pre)(?:\s[^<>]+?)?>$#', $v, $m)) {
                $x = isset($m[1]) && $m[1] === '/' ? 0 : 1;
            }
            $text .= $v; // this is an HTML tag…
        } else {
            $text .= !$x ? replace($v) : $v; // process or skip…
        }
    }
    
    return $text;
    

    Taken from here.
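
The SKIP/FAIL pattern from the first comment could be wrapped in a small helper, roughly as sketched below. This is only an illustration under some assumptions: the function name wrap_acronym and its parameters are hypothetical and not part of the Acronyms plugin. It adds a second skipped branch so that text inside any tag (for example a title attribute) is left alone, which is what the lookaheads in the original plugin regex appear to be for. It also runs the acronym through preg_quote and builds the replacement in a callback.

```php
<?php
// Hypothetical helper for illustration; not the plugin's actual code.
function wrap_acronym(string $text, string $acronym, string $fulltext): string
{
    // Escape regex metacharacters in the acronym, including the "/" delimiter.
    $word = preg_quote($acronym, '/');

    // Branch 1 consumes a whole <pre>...</pre> block and branch 2 consumes any
    // single tag; (*SKIP)(*F) discards both. Branch 3 matches the acronym as
    // a whole word in whatever text remains.
    $re = '/<pre\b[^>]*>.*?<\/pre>(*SKIP)(*F)|<[^<>]*>(*SKIP)(*F)|\b' . $word . '\b/s';

    // Escape the description for use inside an HTML attribute.
    $title = htmlspecialchars($fulltext, ENT_QUOTES);

    // A callback avoids "$" or "\" in the description being read as a
    // backreference; fall back to the input if the regex engine errors out.
    return preg_replace_callback(
        $re,
        function (array $m) use ($title): string {
            return '<acronym title="' . $title . '">' . $m[0] . '</acronym>';
        },
        $text
    ) ?? $text;
}

$html = "<pre class=\"code\">ASCII table</pre><p title=\"ASCII\">ASCII art</p>";
echo wrap_acronym($html, 'ASCII', 'American Standard Code for Information Interchange');
```

In this sample only the ASCII in the paragraph text is wrapped; the occurrence inside the <pre> block and the one inside the title attribute are left untouched. Like the original pattern, this is still regex-based and assumes reasonably well-formed markup with properly closed <pre> blocks.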