I was following this question on how to retrieve all tags in PHP.
Specifically (under WordPress), I’d like to find all <pre> tags, with all the available information (attributes and text). However, it seems that I’m not that skilled in preg_match, so I’m turning to you.
My text does contain various <pre> tags, some with attributes, some with just text. My function is this:
function getPreTags($string) {
    $pattern = "/<pre\s?(.*)>(.*)<\/pre>/";
    preg_match($pattern, $string, $matches);
    return $matches[1];
}
I’ve reduced it to a test with just one <pre> tag, but I get count(getPreTags(myHTMLbody)) = 0, and I don’t know why. This is the test string:
<pre class="wp-code-highlight prettyprint prettyprinted" style=""><span class="com">Whatever <</span> I've written >> here <span class="something">should be taken care of</span></pre>
Any hint?
Cheers!
As ever, parsing HTML with regex is never going to cut it. There are so many things to take into account (tag soup, spacing: <pre> == < pre > == <\n\t\sPrE\n\n> …) that any regex will fail you at some point. That’s why there are such things as parsers, readily available.
That said: I have no idea why the other answers go through the trouble of using an instance of DOMXPath when you need all pre tags, including those without attributes. I’d go for something simpler, like:
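A minimal sketch of that simpler approach, assuming DOMDocument::getElementsByTagName is all that is needed. The function name getPreTags is borrowed from the question, and the shape of the returned array is an assumption made for illustration:

```php
<?php
// Sketch: collect every <pre> element with its attributes and text content.
// The return structure (attributes + text) is a hypothetical choice.
function getPreTags(string $html): array
{
    $dom = new DOMDocument();
    // Real-world HTML is rarely strict; keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($dom->getElementsByTagName('pre') as $pre) {
        $attributes = [];
        foreach ($pre->attributes as $attr) { // each $attr is a DOMAttr
            $attributes[$attr->name] = $attr->value;
        }
        $result[] = [
            'attributes' => $attributes,
            'text'       => $pre->textContent, // text of the tag and its children
        ];
    }
    return $result;
}

$html = '<pre class="a" style="">Some <span class="com">text</span></pre><pre>plain</pre>';
foreach (getPreTags($html) as $tag) {
    var_dump($tag);
}
```

Tags without attributes simply come back with an empty attributes array, so no XPath is needed to catch them.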
The methods and properties available to you are easy to find on these pages:
DOMAttr class docs
DOMNode class docs
DOMDocument class docs

You are better off using a DOM parser to parse HTML. Consider this code:
OUTPUT:
If the data is well-formed XML, you could use an XPath expression.
Just a very quick one:
And then some PHP like this:
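A minimal sketch of the XPath route, assuming the expression is simply //pre. The helper name findPreNodes and the returned fields are hypothetical, chosen only for illustration:

```php
<?php
// Sketch: select <pre> elements with an XPath expression via DOMXPath.
// findPreNodes() and the default '//pre' expression are assumptions.
function findPreNodes(string $markup, string $expression = '//pre'): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($markup); // for strict XML input, loadXML() could be used instead
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $found = [];
    foreach ($xpath->query($expression) as $node) {
        $found[] = [
            'class' => $node->getAttribute('class'), // '' when the attribute is absent
            'text'  => $node->textContent,
        ];
    }
    return $found;
}

// Narrow the match to <pre> elements that carry a class attribute.
print_r(findPreNodes('<div><pre class="x">one</pre><pre>two</pre></div>', '//pre[@class]'));
```

The expression can be tightened (for example //pre[@class]) when only some of the tags are wanted, which is where XPath earns its keep over a plain tag-name lookup.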
This should also work with HTML/XML snippets.