I'm in the process of making a WordPress post parser for my personal website, but I'm hitting some behaviour I can't explain.
Here's the code:
// WordPress uses <p></p> sections for new lines
$sections = $doc->getElementsByTagName('p');
foreach ($sections as $section)
{
    $hasChildren = $section->hasChildNodes();
    $contents = $section->nodeValue;
    // If we have text, assume we are a paragraph (for the time being)
    if (!empty($contents))
    {
        $section->setAttribute('class', 'post-inner-content-paragraph');
    }
    elseif ($hasChildren)
    {
        $section->setAttribute('class', 'post-inner-content-media');
        $section = change_tag_name($section, 'div');
        $imgs = $section->getElementsByTagName('img');
        foreach ($imgs as $img)
        {
            $img->removeAttribute('class');
        }
    }
    else
    {
        $section->setAttribute('class', 'post-inner-content-empty');
    }
}
change_tag_name:
function change_tag_name($node, $name)
{
    $doc = $node->ownerDocument;
    $newnode = $doc->createElement($name);
    // Copy the children into the new element
    foreach ($node->childNodes as $child)
    {
        $child = $doc->importNode($child, true);
        $newnode->appendChild($child);
    }
    // Copy the attributes (separate variable names so $name is not overwritten)
    if ($node->hasAttributes())
    {
        foreach ($node->attributes as $attr)
        {
            $attrName = $attr->nodeName;
            $attrValue = $attr->nodeValue;
            $newnode->setAttribute($attrName, $attrValue);
        }
    }
    // Swap the old node for the new one in the document
    $node->parentNode->replaceChild($newnode, $node);
    return $newnode;
}
There's no way for a <p> block to be passed as a section and NOT get an attribute assigned to it. However, the highlighted <p> block doesn't have a class!
Here's the HTML loaded into the DOMDocument $dom: http://pastebin.com/biVSyWn9
Here’s the HTML leaving my parse function: http://pastebin.com/RhzgeWAS
I can't detect any reason why this particular <p> block isn't being given a class.
I ran this using DOMDocument (assuming that you’re using it for parsing). I also commented out your change_tag_name function since the source code for that was not posted.
It works. I got class attributes added to all the <p> tags.
Now, as to why it doesn't work for you, I can think of only two reasons:
1. The <p> just before the one that doesn't work is not recognized for some reason. Because of this, the parser reads the next <p> tag as part of the previous <p>.
2. The change_tag_name function may be doing something it is not intended to do (highly unlikely, but that is something you may want to rule out).
Solution
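You have to traverse the node list backwards in order to make the kind of changes I wanted to make. crnix's answer helped identify that the problem occurred with replaceChild within the change_tag_name function. The DOMNodeList returned by getElementsByTagName is live, so replacing a <p> inside the loop shifts the remaining nodes and the next one gets skipped. Changing my foreach loop to the following fixed my issue: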
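A minimal sketch of what such a backward loop can look like, not necessarily the exact code used here. It assumes a small hypothetical sample document (the real HTML is only linked above) and a simplified change_tag_name that moves the children instead of copying them. Walking the list from the last index down to 0 means that a replaced <p> only affects positions that have already been visited.

```php
<?php
// Hypothetical sample markup: a text paragraph, a media-only paragraph,
// and another text paragraph directly after it.
$html = '<p>First paragraph</p><p><img class="wp-image" src="a.png"></p><p>Second paragraph</p>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// Simplified variant of change_tag_name (an assumption for this sketch):
// children are moved into the new element, attributes are copied.
function change_tag_name(DOMElement $node, string $name): DOMElement
{
    $newnode = $node->ownerDocument->createElement($name);
    while ($node->firstChild) {
        // appendChild moves the child out of $node and into $newnode
        $newnode->appendChild($node->firstChild);
    }
    foreach ($node->attributes as $attr) {
        $newnode->setAttribute($attr->nodeName, $attr->nodeValue);
    }
    $node->parentNode->replaceChild($newnode, $node);
    return $newnode;
}

$sections = $doc->getElementsByTagName('p');

// Iterate backwards by index: replacing item $i shrinks the live list,
// but only the already-processed items after $i change position.
for ($i = $sections->length - 1; $i >= 0; $i--) {
    $section = $sections->item($i);
    $contents = $section->nodeValue;

    if (!empty($contents)) {
        $section->setAttribute('class', 'post-inner-content-paragraph');
    } elseif ($section->hasChildNodes()) {
        $section->setAttribute('class', 'post-inner-content-media');
        $section = change_tag_name($section, 'div');
        foreach ($section->getElementsByTagName('img') as $img) {
            // Attribute changes do not alter the list, so foreach is safe here
            $img->removeAttribute('class');
        }
    } else {
        $section->setAttribute('class', 'post-inner-content-empty');
    }
}

echo $doc->saveHTML();
```

Another option that should work just as well is to copy the list into a plain array first, for example with iterator_to_array($sections), and run the original foreach over that snapshot.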