Find all <pre> tags in PHP (with attributes)

March 16, 2023

I was following this question on how to retrieve all tags in PHP.

Specifically (under WordPress), I’d like to find all <pre> tags, along with all the available information (attributes and text). However, I’m not that skilled with preg_match, so I’m turning to you.

My text does contain various <pre> tags, some with attributes, some with just text. My function is this:

function getPreTags($string) {
    $pattern = "/<pre\s?(.*)>(.*)<\/pre>/";
    preg_match($pattern, $string, $matches);
    return $matches[1];
}

I’ve reduced to a test with just one <pre> tag, but I get count(getPreTags(myHTMLbody)) = 0, and I don’t know why. This is the test string:

<pre class="wp-code-highlight prettyprint prettyprinted" style=""><span class="com">Whatever &lt;</span> I've written &gt;&gt; here <span class="something">should be taken care of</span></pre>

Any hint?

Cheers!

3 comments

Anonymous says:

March 16, 2023 at 11:45 pm
As ever, parsing HTML with regex is never going to cut it. There are so many things to take into account (tag soup, spacing: <pre>==< pre >==<\n\t\sPrE\n\n>…) that any regex will fail you at some point. That’s why there are such things as parsers, readily available.

That said: I have no idea why the other answers go through the trouble of using an instance of DOMXPath, when you need all pre tags, including those without attributes.
I’d go for something simpler, like:
```
$dom = new DOMDocument;
$dom->loadHTML($htmlString);
$preTags = $dom->getElementsByTagName('pre');
foreach($preTags as $pre)
{
    echo $pre->nodeValue, PHP_EOL;
    if ($pre->hasAttributes())
    {//if there are attributes
        foreach($pre->attributes as $attribute)
        {
            //do something with attribute
            echo 'Attribute: ', $attribute->name, ' = ', $attribute->value, PHP_EOL;
        }
    }
}
```
What methods and properties are available to you can be found easily on these pages:
- Attributes: DOMAttr class docs
- Nodes: DOMNode class docs
- Document: DOMDocument class docs
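
To tie this back to the question (all <pre> tags, with attributes and text), the loop above could be wrapped in a function that returns the data instead of echoing it. This is only a sketch: the name `collectPreTags`, the shape of the returned array, and the sample string are assumptions made for illustration, not something from the original post.

```php
<?php
// Hypothetical helper (name and return shape are illustrative only):
// collects every <pre> element with its attributes, text, and inner HTML.
function collectPreTags(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of printing them;
    // real-world markup is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($dom->getElementsByTagName('pre') as $pre) {
        // name => value map; stays empty for a bare <pre>
        $attributes = [];
        foreach ($pre->attributes as $attribute) {
            $attributes[$attribute->name] = $attribute->value;
        }

        // Inner HTML: serialize each child node back to markup.
        $innerHtml = '';
        foreach ($pre->childNodes as $child) {
            $innerHtml .= $dom->saveHTML($child);
        }

        $result[] = [
            'attributes' => $attributes,
            'text'       => $pre->textContent,
            'html'       => $innerHtml,
        ];
    }
    return $result;
}

// Made-up sample input, loosely based on the test string in the question.
$sample = '<pre class="prettyprint" style=""><span class="com">Whatever &lt;</span> here</pre><pre>plain</pre>';
$tags = collectPreTags($sample);
```

Here `$tags` should hold one entry per <pre> element, so a tag without attributes still shows up, just with an empty `attributes` array.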

You’re better off using a DOM parser to parse HTML. Consider this code:

$html = <<< EOF
<a href="http://example.com/foo.htm" class="curPage">Click link1</a> morestuff
<pre>A    B    C</pre>
<a href="http://notexample.com/foo/bar">notexample.com</a> morestuff
<pre id="pre1">X    Y    Z</pre>
<a href="http://example.com/foo.htm">Click link1</a>
<pre id="pre2">1    2    3</pre>
EOF;

// create a new DOM object
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$xpath = new DOMXPath($doc);

// select only the pre tags that have at least one attribute
$nodelist = $xpath->query("//pre[@*]");

// iterate through selected nodes and print them
for($i=0; $i < $nodelist->length; $i++) {
    $node = $nodelist->item($i);
    var_dump($node->nodeValue);
}

OUTPUT:

string(11) "X    Y    Z"
string(11) "1    2    3"
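
Note that `//pre[@*]` skips the first <pre>, because it has no attributes. Since the question asks for every <pre> tag, a variant with the plain `//pre` expression may be closer to what is needed. The sketch below reuses a shortened version of the sample markup; the variable names `$withAttributes`, `$allPre`, and `$ids` are made up for the example.

```php
<?php
$html = <<<EOF
<pre>A    B    C</pre>
<pre id="pre1">X    Y    Z</pre>
<pre id="pre2">1    2    3</pre>
EOF;

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Only <pre> elements carrying at least one attribute
$withAttributes = $xpath->query('//pre[@*]');

// Every <pre> element, with or without attributes
$allPre = $xpath->query('//pre');

// getAttribute() returns an empty string when the attribute is missing
$ids = [];
foreach ($allPre as $node) {
    $ids[] = $node->getAttribute('id');
}
```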

If the data is well-formed XML, you could use an XPath expression.

Just a very quick one:

<?xml version="1.0" encoding="UTF-8"?>
<html>
  <head>
    <title>Test</title>
  </head>
  <body>
    <pre>1</pre>
    <pre>2</pre>
    <pre>3</pre>
  </body>
</html>

And then a PHP like this:

<?php
// Load the XML file shown above
$xmldoc = new DOMDocument();
$xmldoc->load('test.xml');

$xpathvar = new DOMXPath($xmldoc);

// Count every <pre> element in the document
echo $xpathvar->evaluate('count(//pre)');
?>

This should also work with HTML/XML snippets.
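
For a snippet held in a string rather than a file, `loadXML()` can take the place of `load()`. A minimal sketch, assuming the snippet is well-formed XML with a single root element (the `$snippet` content is invented for the example):

```php
<?php
// Invented sample snippet; it must be well-formed XML with one root element.
$snippet = '<div><pre>1</pre><pre class="x">2</pre></div>';

$xmldoc = new DOMDocument();
$xmldoc->loadXML($snippet);

$xpathvar = new DOMXPath($xmldoc);

// XPath count() yields a number, which PHP hands back as a float
$count = $xpathvar->evaluate('count(//pre)');
```

For markup that is not well-formed XML, `loadHTML()` is the more forgiving choice.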
