Change html structure of all img tags in WordPress

I have a WordPress blog and am trying to implement the foresight.js image script. In short, I need to target all post images, swap out the src, width, & height attributes with data-src, data-width, & data-height attributes. I then need to duplicate the image line and wrap it in <noscript> tags. This is the structure I’m trying to have WordPress generate/create:

<img data-src="wordpress/image/url/pic.jpg" data-width="{get width of image with PHP & pass-in that value here}" data-height="{get height of image with PHP and pass-in that value here}" class="fs-img">
<noscript>
    <img src="wordpress/image/url/pic.jpg">
</noscript>

I have searched the WordPress Codex, and the best route I can find is to use filters (i.e. ‘get_image_tag’ & ‘image_tag’) to modify the HTML that WordPress outputs for each image. I’m thinking that one of these options should work, or that I can do some pattern matching with regex (I know, not ideal), throw in a preg_replace, and then inject the result back through the the_content filter.

I have tried some of these options but cannot get any to work. Could someone please offer some help? Found one suggestion here, but can’t even get it to work!

‘get_image_tag’ attempt:

Found this particular one on the web, but it would need to be modified to fit my logic (see above structure)…I can’t make sense of what the preg_replace array is doing on my own.

<?php function image_tag($html, $id, $alt, $title) {
    return preg_replace(array(
        '/'.str_replace('//','\/\/',get_bloginfo('url')).'/i',
        '/\s+width="\d+"/i',
        '/\s+height="\d+"/i',
        '/alt=""/i'
    ),
    array(
        '',
        '',
        '',
        'alt="' . $title . '"'
    ),
    $html);
}
add_filter('get_image_tag', 'image_tag', 0, 4);
?>

Another ‘get_image_tag’ attempt:

<?php function get_image_tag($id, $alt, $title, $align, $size='full') {
    list($width, $height, $type, $attr) = getimagesize($img_src);
    $hwstring = image_hwstring($width, $height);

    $class = 'align' . esc_attr($align) . ' size-' . esc_attr($size) . ' wp-image-' . $id;
    $class = apply_filters('get_image_tag_class', $class, $id, $align, $size);

    $html = '<img src="' . esc_attr($img_src) . '" alt="' . esc_attr($alt) . '" title="' . esc_attr($title).'" data-width="' . $width . '" data-height="' . $height . '" class="' . $class . ' fs-img" />';
    $html = apply_filters( 'get_image_tag', $html, $id, $alt, $title, $align, $size);

    return $html;
}
?>

Pattern-matching attempt:

Tried creating my own regex on this one, but not sure if it’s correct.

<?php function restructure_imgs($content) {
    global $post;
    $pattern = "/<img(.*?)src=('|\")(.*?)\.(bmp|gif|jpeg|jpg|png)(|\")(.*?)>/i";

    list($width, $height, $type, $attr) = getimagesize($2$3.$4$5);
    $hwstring = image_hwstring($width, $height);

    $replacement = '<img$1data-src=$2$3.$4$5 title="'.$post->post_title.'" data-width="'.$width.'" data-height="'.$height.'" class="fs-img"$6>';
    $content = preg_replace($pattern, $replacement, $content);
    return $content;
}
add_filter('the_content', 'restructure_imgs');
?>

Unfortunately can’t get any of these examples to work. Any help or sharing your pre-written scripts/functions would be much appreciated! Thanks for helping a student learn!!


2 comments

  1. The filters you are trying to use run when an image is inserted into a post, so they cannot change the images already present in your posts. They should work, however, for images you insert from now on.

    The filter the_content, however, is applied to the post after it is retrieved from the database and before displaying it to screen. I believe that, in order to make a change to your existing posts without reinserting the images, you could use this filter.

    You can parse the_content using the PHP DOMDocument class. When it comes to HTML parsing in PHP, do not use regex.

    I wrote a sample function for what you want to do; it’s a bit verbose in order to explain each step. Feel free to tweak it at will.

    <?php
    function foresight_hires_img_replace($the_content) {
        // Create a new instance of DOMDocument
        $post = new DOMDocument();
        // Load $the_content as HTML
        $post->loadHTML($the_content);
        // Look up all the <img> tags.
        $imgs = $post->getElementsByTagName('img');
    
        // Iteration time
        foreach( $imgs as $img ) {
            // Let's make sure the img has not already been manipulated by us
            // by checking if it has a data-src attribute (we could also check
            // if it has the fs-img class, or whatever check you feel is
            // the most appropriate).
            if( $img->hasAttribute('data-src') ) continue;
    
            // Also, let's check that the <img> we found is not child of a <noscript>
            // tag, we want to leave those alone as well.
            if( $img->parentNode->tagName == 'noscript' ) continue;
    
            // Let's clone the node for later usage.
            $clone = $img->cloneNode();
    
            // Get the src attribute, remove it from the element, swap it with
            // data-src
            $src = $img->getAttribute('src');
            $img->removeAttribute('src');   
            $img->setAttribute('data-src', $src);
    
            // Same goes for width...
            $width = $img->getAttribute('width');
            $img->removeAttribute('width');
            $img->setAttribute('data-width', $width);
    
            // And height... (and whatever other attributes your JS may need)
            $height = $img->getAttribute('height');
            $img->removeAttribute('height');
            $img->setAttribute('data-height', $height);
    
            // Get the class and add fs-img to the existing classes
            $imgClass = $img->getAttribute('class');
            $img->setAttribute('class', $imgClass . ' fs-img');
    
            // Let's create the <noscript> element and append our original
            // tag, which we cloned earlier, as its child. Then, let's insert
            // it before our manipulated element
            $no_script = $post->createElement('noscript');
            $no_script->appendChild($clone);
            $img->parentNode->insertBefore($no_script, $img);
        }
    
        return $post->saveHTML();
    }
    
    add_filter('the_content', 'foresight_hires_img_replace');
    ?>
    

    I didn’t test it specifically with WordPress, but I tested it with a sample post output and it should work.

  2. This code works very well for me, but I ran into some issues before arriving at the final version.

    Warning

    The first is that the server started showing warnings such as Warning: DOMDocument::loadHTML(): Unexpected end tag. This question shows more details of the error and how to solve it; in short, calling libxml_use_internal_errors(true); in the main function before loadHTML fixes the problem.
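
    As a minimal sketch of that fix (the sample markup and the $previous and $errors variables are illustrative assumptions, not part of the original function), the libxml errors can be collected around the loadHTML call and then cleared, restoring the earlier setting afterwards:

```php
<?php
// Hypothetical sample input with a stray closing tag, the kind of markup
// that triggers "Unexpected end tag" warnings.
$the_content = '<p>Some text</div></p>';

// Collect parse errors internally instead of emitting PHP warnings,
// remembering the previous setting so it can be restored later.
$previous = libxml_use_internal_errors(true);

$post = new DOMDocument();
$post->loadHTML($the_content);

// The collected errors can be inspected or logged if needed...
$errors = libxml_get_errors();
// ...and should then be cleared so the error buffer does not keep growing.
libxml_clear_errors();

// Put the error handling mode back the way we found it.
libxml_use_internal_errors($previous);
```

    Restoring the previous setting is optional, but it avoids silencing libxml errors for other code that runs later in the same request.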

    The second group of problems was with the DOMDocument::loadHTML method.

    Character encoding

    The documentation has a very important comment from Shane Harte about the character encoding of UTF-8 documents. So, before loadHTML I had to use mb_convert_encoding with the 'HTML-ENTITIES' and 'UTF-8' parameters.

    HTML wrapper

    The second problem with the method was that the output always contained doctype, html, and body tags. In this case that is a huge problem, because you are dealing with only a fragment of the document (the_content()) and not the whole page.

    The simplest way to fix this is to pass libxml constants as options to the loadHTML method:

    LIBXML_HTML_NOIMPLIED turns off the automatic adding of implied html/body elements.
    LIBXML_HTML_NODEFDTD prevents a default doctype being added when one is not found.

    Something like $post->loadHTML(mb_convert_encoding($the_content, 'HTML-ENTITIES', 'UTF-8'), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
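
    To see what the two flags change, here is a small comparison sketch. The fragment and the variable names are made up for illustration, and the fragment deliberately has a single root element:

```php
<?php
// Hypothetical fragment with a single root element.
$fragment = '<div><p>Hello</p><img src="pic.jpg" width="300" height="200"></div>';

// Default behaviour: libxml adds a doctype plus <html> and <body> elements.
$with_wrapper = new DOMDocument();
$with_wrapper->loadHTML($fragment);
$wrapped = $with_wrapper->saveHTML();

// With the two flags, only the fragment itself is serialized.
$bare = new DOMDocument();
$bare->loadHTML($fragment, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$unwrapped = $bare->saveHTML();
```

    Here $wrapped carries the doctype and the html/body wrapper, while $unwrapped contains only the div and its children.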

    Unclosed first tag of the_content()

    Another problem I started to see was that the first element of the_content was not closed properly.

    For example, if the first element was a <p>, the entire content of the_content ended up wrapped in that first <p>. In many cases the content started with an <h2>, which caused the same issue.

    After a lot of research, I found this comment by Nicholas Shanks that cleared things up:

    LibXML requires a root node and is treating the first element it finds as the root node, deleting the (incorrectly located) closing tag it finds half-way through and then outputting the closing tag of the first element is found at the end of the document.

    So, the first part of my code looks like this:

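    // Collect libxml parse errors internally instead of emitting PHP warnings
    libxml_use_internal_errors(true);
    // Convert UTF-8 characters to HTML entities so loadHTML reads them correctly
    $encode_content = mb_convert_encoding($the_content, 'HTML-ENTITIES', 'UTF-8');
    $post = new DOMDocument();
    // Wrap the fragment in a single root element, since libxml expects one
    $workaround = '<section class="sanitized-content">' . $encode_content . '</section>';
    $post->loadHTML( $workaround, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );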
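
    For completeness, here is a rough sketch of how the whole function might look with these fixes folded into the first answer's code. Several things in it are my own assumptions rather than anything confirmed above: the function name foresight_img_replace_sketch is hypothetical, mb_encode_numericentity is swapped in for the 'HTML-ENTITIES' conversion (which is deprecated as of PHP 8.2), the <noscript> is placed after the image to match the structure in the question, and only the children of the wrapper are returned so the <section> itself never reaches the page.

```php
<?php
// Sketch only: hypothetical function name, not tested inside WordPress.
function foresight_img_replace_sketch($the_content) {
    if (trim($the_content) === '') {
        return $the_content;
    }

    // Collect libxml parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);

    // Assumption: numeric entities stand in for the 'HTML-ENTITIES'
    // conversion; skipped when the mbstring extension is not loaded.
    if (function_exists('mb_encode_numericentity')) {
        $the_content = mb_encode_numericentity(
            $the_content, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8'
        );
    }

    // Wrap the fragment so libxml has a single root element.
    $post = new DOMDocument();
    $post->loadHTML(
        '<section class="sanitized-content">' . $the_content . '</section>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list so inserting nodes does not affect the loop.
    foreach (iterator_to_array($post->getElementsByTagName('img')) as $img) {
        // Skip images already processed or already inside a <noscript>.
        if ($img->hasAttribute('data-src')
            || $img->parentNode->nodeName === 'noscript') {
            continue;
        }

        // Keep an untouched copy for the <noscript> fallback.
        $clone = $img->cloneNode();

        // Move src/width/height to their data-* counterparts when present.
        foreach (['src', 'width', 'height'] as $name) {
            if ($img->hasAttribute($name)) {
                $img->setAttribute('data-' . $name, $img->getAttribute($name));
                $img->removeAttribute($name);
            }
        }
        $img->setAttribute('class', trim($img->getAttribute('class') . ' fs-img'));

        // Insert <noscript><img ...></noscript> right after the image.
        $no_script = $post->createElement('noscript');
        $no_script->appendChild($clone);
        $img->parentNode->insertBefore($no_script, $img->nextSibling);
    }

    // Serialize only the wrapper's children, dropping the <section> itself.
    $output = '';
    foreach ($post->documentElement->childNodes as $child) {
        $output .= $post->saveHTML($child);
    }
    return $output;
}

// Register the filter only when running inside WordPress.
if (function_exists('add_filter')) {
    add_filter('the_content', 'foresight_img_replace_sketch');
}
```

    Outside WordPress the function can be called directly on an HTML string, which makes it easy to try on a sample post before hooking it into the_content.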
    

    ✌️