Find all <pre> tags in PHP (with attributes)

I was following this question on how to retrieve all tags in PHP.

Specifically (under WordPress), I’d like to find all <pre> tags, with all the available information (attributes and text). However, it seems that I’m not that skilled with preg_match, so I’m turning to you.

My text does contain various <pre> tags, some with attributes, some with just text. My function is this:

function getPreTags($string) {
    $pattern = "/<pre\s?(.*)>(.*)<\/pre>/";
    preg_match($pattern, $string, $matches);
    return $matches[1];
}

I’ve reduced it to a test with just one <pre> tag, but I get count(getPreTags(myHTMLbody)) = 0, and I don’t know why. This is the test string:

<pre class="wp-code-highlight prettyprint prettyprinted" style=""><span class="com">Whatever &lt;</span> I've written &gt;&gt; here <span class="something">should be taken care of</span></pre>

Any hint?

Cheers!

3 comments

  1. As ever, parsing HTML with regex is never going to cut it. There are so many things to take into account (tag soup, spacing: <pre>==< pre >==<\n\t\sPrE\n\n>…) that any regex will fail you at some point. That’s why there are such things as parsers, readily available.

    That said: I have no idea why the other answers go through the trouble of using an instance of DOMXPath, when you need all pre tags, including those without attributes.
    I’d go for something simpler, like:

    $dom = new DOMDocument;
    $dom->loadHTML($htmlString);
    $preTags = $dom->getElementsByTagName('pre');
    foreach($preTags as $pre)
    {
        echo $pre->nodeValue, PHP_EOL;
        if ($pre->hasAttributes())
        {//if there are attributes
            foreach($pre->attributes as $attribute)
            {
                //do something with attribute
                echo 'Attribute: ', $attribute->name, ' = ', $attribute->value, PHP_EOL;
            }
        }
    }
    

    The methods and properties available to you are documented in the PHP manual pages for the classes used above (DOMDocument, DOMNodeList, DOMElement and DOMAttr).
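
    If you want the data back instead of echoing it, the loop above can be wrapped in a function. This is only a minimal sketch: the name collectPreTags and the shape of the returned array are my own assumptions, not something from the question, and the type hints assume PHP 7 or later. The input is the test string from the question.

    ```php
    <?php
    // Hypothetical helper (name and return shape are assumptions):
    // returns one entry per <pre> element with its attributes and text.
    function collectPreTags(string $html): array
    {
        $dom = new DOMDocument;
        // Real-world HTML is rarely perfectly valid, so collect libxml
        // warnings internally instead of printing them.
        $previous = libxml_use_internal_errors(true);
        $dom->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $result = [];
        foreach ($dom->getElementsByTagName('pre') as $pre) {
            $attributes = [];
            foreach ($pre->attributes as $attribute) {
                $attributes[$attribute->name] = $attribute->value;
            }
            $result[] = [
                'attributes' => $attributes,
                'text'       => $pre->nodeValue, // entities are already decoded
            ];
        }
        return $result;
    }

    // The test string from the question.
    $html = '<pre class="wp-code-highlight prettyprint prettyprinted" style="">'
          . '<span class="com">Whatever &lt;</span> I\'ve written &gt;&gt; here '
          . '<span class="something">should be taken care of</span></pre>';

    $tags = collectPreTags($html);
    echo count($tags), PHP_EOL;
    print_r($tags[0]['attributes']);
    echo $tags[0]['text'], PHP_EOL;
    ```

    Note that nodeValue gives you the text only; the nested <span> markup is dropped and &lt; / &gt; come back as plain < and >.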

  2. You are better off using a DOM parser to parse HTML. Consider this code:

    $html = <<< EOF
    <a href="http://example.com/foo.htm" class="curPage">Click link1</a> morestuff
    <pre>A    B    C</pre>
    <a href="http://notexample.com/foo/bar">notexample.com</a> morestuff
    <pre id="pre1">X    Y    Z</pre>
    <a href="http://example.com/foo.htm">Click link1</a>
    <pre id="pre2">1    2    3</pre>
    EOF;
    
    // create a new DOM object
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html); // loads your html
    $xpath = new DOMXPath($doc);
    
    // select all pre tags with attributes
    $nodelist = $xpath->query("//pre[@*]");
    
    // iterate through selected nodes and print them
    for($i=0; $i < $nodelist->length; $i++) {
        $node = $nodelist->item($i);
        var_dump($node->nodeValue);
    }
    

    OUTPUT:

    string(11) "X    Y    Z"
    string(11) "1    2    3"
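
    The query above skips the first <pre> because it has no attributes. If you need every <pre>, with or without attributes, dropping the [@*] predicate should be enough. A short sketch on a trimmed-down version of the same input (the variable names $withAttributes and $all are just for illustration):

    ```php
    <?php
    $html = '<pre>A    B    C</pre>'
          . '<pre id="pre1">X    Y    Z</pre>'
          . '<pre id="pre2">1    2    3</pre>';

    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // only <pre> elements that carry at least one attribute
    $withAttributes = $xpath->query('//pre[@*]');
    // every <pre> element, attributes or not
    $all = $xpath->query('//pre');

    foreach ($all as $node) {
        $id = $node->hasAttribute('id') ? $node->getAttribute('id') : '(no id)';
        echo $id, ': ', $node->nodeValue, PHP_EOL;
    }
    ```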
    
  3. If the data is well-formed XML, you could use an XPath expression.

    Just a very quick one:

    <?xml version="1.0" encoding="UTF-8"?>
    <html>
      <head>
        <title>Test</title>
      </head>
      <body>
        <pre>1</pre>
        <pre>2</pre>
        <pre>3</pre>
      </body>
    </html>
    

    And then a PHP script like this:

    <?php
    $xmldoc = new DOMDocument();
    $xmldoc->load('test.xml');
    
    $xpathvar = new DOMXPath($xmldoc);
    
    // count every <pre> element below the root element
    echo $xpathvar->evaluate('count(*//pre)');
    ?>
    

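    This should also work with HTML/XML snippets, as long as they are well-formed XML. DOMDocument::load() uses the XML parser, so for real-world HTML use loadHTML() instead.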
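
    For an HTML snippet that is not well-formed XML, the same count can be done with loadHTML(). A minimal sketch; the $snippet string is made up for illustration:

    ```php
    <?php
    // An HTML fragment with no root element and an unclosed <br>,
    // which the XML parser behind load() would reject.
    $snippet = '<pre>1</pre><br><pre class="x">2</pre>';

    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc->loadHTML($snippet);

    $xpath = new DOMXPath($doc);
    // evaluate() returns the XPath number as a float
    $count = $xpath->evaluate('count(//pre)');
    echo $count, PHP_EOL;
    ```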