DOM manipulation

I'm trying to use the DOM in PHP to do a fairly specific job and I've had no luck so far. The objective is to take a string of HTML from a WordPress blog post (read from the DB; this is a WordPress plugin) and, within that HTML, replace <div id="do_not_edit">old content</div> with <div id="do_not_edit">new content</div>, preserving everything above and below that div.

Then I want to save the HTML back into the DB. It should be simple, really. I have read that a regex wouldn't be the right way to go here, so I've turned to the DOM instead.

The problem is I just can't get it to work; I can't even extract the div.

Help me!!

UPDATE

The HTML coming out of the WordPress table looks like:

Congratulations on finding us here on the world wide web, we are on a  mission to create a website that will show off your culinary skills  better than any other website does.

<div id="do_not_edit">blah blah</div>
We want this website to be fun and  easy to use, we strive for simple elegance and incredible functionality.We aim to provide a 'complete package'. By this we want to create a  website where people can meet, share ideas and help each other out.

After several different (incorrect) attempts, all I've got is the code below:

$content = ($wpdb->get_var( "SELECT `post_content` FROM $wpdb->posts WHERE ID = {$article[post_id]}" ));        

$doc = new DOMDocument();
$doc->validateOnParse = true; 
$doc->loadHTMLFile($content);
$element = $doc->getElementById('do_not_edit');
echo $element;

2 comments

  1. If you are sure that the HTML from WordPress contains only one div, the following should work:

    $doc = new DOMDocument();
    $doc->validateOnParse = false; 
    $doc->loadHTML($content);
    $divs = $doc->getElementsByTagName('div');
    echo $divs->item(0)->textContent;
    

    If not, try:

    $doc = new DOMDocument();
    $doc->validateOnParse = false; 
    $doc->loadHTML($content);
    $divs = $doc->getElementsByTagName('div');
    
    for($i=0; $i<$divs->length; $i++)
    {
      $id = $divs->item($i)->attributes->getNamedItem('id');
      if($id && $id->value == 'do_not_edit')
      {
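        // Found the target div: remove every existing child first,
        // so this also works when the div holds more than one node.
        $node = $divs->item($i);
        while ($node->firstChild) {
          $node->removeChild($node->firstChild);
        }

        // Then add the replacement text.
        $newText = new DOMText("This is some new content");
        $node->appendChild($newText);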
        break;
      }
    }
    
    $html = $doc->saveHTML();
    
  2. Your HTML is not a complete HTML document, which is what DOMDocument expects. One option would be to wrap your HTML so it’s a complete document:

    $content = ($wpdb->get_var( "SELECT `post_content` FROM $wpdb->posts WHERE ID = {$article['post_id']}" ));
    
    $content = '<html><head><title></title></head><body>'.$content.'</body></html>';
    
    $doc = new DOMDocument();
    $doc->validateOnParse = false; 
    $doc->loadHTML($content);
    $element = $doc->getElementById('do_not_edit');
    // A DOMElement cannot be echoed directly; serialise it instead.
    echo $doc->saveHTML($element);
    

    It’s a bit hacky, but might easily solve the problem.
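
Putting the two suggestions above together, here is a minimal sketch of one way to do the whole round trip: wrap the fragment, replace the div's contents, and return only the original fragment (without the wrapper) so it can be written back to the DB. The function name `replace_div_content` is hypothetical and not part of WordPress or the DOM extension, and the `<meta>` charset line assumes the post content is UTF-8.

```php
<?php
// Hypothetical helper (not from the posts above): replaces the contents of
// the element with the given id and returns the fragment without the wrapper.
function replace_div_content(string $html, string $id, string $newText): string
{
    $doc = new DOMDocument();

    // Wrap the fragment so the parser sees a complete document.
    // Assumption: the stored post content is UTF-8.
    $wrapped = '<html><head>'
        . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '</head><body>' . $html . '</body></html>';

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($wrapped);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = $doc->getElementById($id);
    if ($node === null) {
        return $html; // nothing to replace, leave the content untouched
    }

    // Remove every existing child, then add the new text.
    while ($node->firstChild !== null) {
        $node->removeChild($node->firstChild);
    }
    $node->appendChild($doc->createTextNode($newText));

    // Serialise only the children of <body>, which drops the wrapper.
    $body = $doc->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

With the `$content` variable from the question, this could be called as `$content = replace_div_content($content, 'do_not_edit', 'new content');` before writing the result back. Whitespace and entity encoding in the output may differ slightly from the input, since the fragment is re-serialised by the parser.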