XSLT to format WordPress WXR XML for import into Drupal via Feeds

I’m trying to format a WordPress WXR file using XSLT so I can import it into Drupal.

I’m aware of Drupal modules that will import WXR files, but I need the flexibility that the Feeds module gives: the data will be imported into different content types, and I’ll be pulling images and other attachments into the newly created Drupal pages. With this in mind, the standard WordPress Migrate module just won’t cut it.

So, the WXR format stores WordPress posts and attachments as separate items within the feed and links the posts and attachments using an id. Attachments can be images or files (pdf, doc etc.); the relevant metadata is found at the XPath wp:postmeta/wp:meta_key, with values of _thumbnail_id and _wp_attached_file.

What I’d like to do is take various nodes from items of type attachment and put them within the corresponding post item, where the id links them together.

A fragment of the XML to be transformed… The first item is a post, the second is an attachment.

<item>
    <title>Some groovy title</title>
    <link>http://example.com/groovy-example</link>
    <wp:post_id>2050</wp:post_id>
    <wp:post_type>page</wp:post_type>
    ...
    ...
    ...
    <wp:postmeta>
        <wp:meta_key>_thumbnail_id</wp:meta_key>
        <wp:meta_value>566</wp:meta_value>
    </wp:postmeta>
</item>
...
...
...
<item>
    <title>My fantastic attachment</title>
    <link>http://www.example.com/fantastic-attachment</link>
    <wp:post_id>566</wp:post_id>
    <wp:post_type>attachment</wp:post_type>
    ...
    ...
    ...
    <wp:attachment_url>http://www.example.com/wp-content/uploads/2012/12/fantastic.jpg</wp:attachment_url>
    <wp:postmeta>
        <wp:meta_key>_wp_attached_file</wp:meta_key>
        <wp:meta_value>2012/12/fantastic.jpg</wp:meta_value>
    </wp:postmeta>
</item>

After the transform I would like

<item>
    <title>Some groovy title</title>
    <link>http://example.com/groovy-example</link>
    <wp:post_id>2050</wp:post_id>
    <wp:post_type>page</wp:post_type>
    ...
    ...
    ...
    <wp:postmeta>
        <wp:meta_key>_thumbnail_id</wp:meta_key>
        <wp:meta_value>566</wp:meta_value>
        <wp:meta_url>http://www.example.com/wp-content/uploads/2012/12/fantastic.jpg</wp:meta_url>
    </wp:postmeta>


</item>

Maybe there is a better approach? Maybe merge the post and attachment where the id creates a link between the nodes?

I’m new to XSLT and have read a few posts on identity transforms. I think that’s the correct direction, but I just don’t have the experience to pull off what I need. Assistance would be appreciated.
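
For the output described above, an identity transform plus a single key may be enough. The following is a minimal sketch, not a tested solution: it assumes the WXR 1.2 namespace URI, takes the wp:meta_url element name from the desired output above, and uses a key name (attachment-by-id) that is purely illustrative.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:wp="http://wordpress.org/export/1.2/">

    <xsl:output method="xml" indent="yes"/>

    <!-- Index attachment items by their own wp:post_id
         (the key name is a hypothetical choice) -->
    <xsl:key name="attachment-by-id"
             match="item[wp:post_type = 'attachment']"
             use="wp:post_id"/>

    <!-- Identity transform: copy everything unchanged by default -->
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <!-- For a _thumbnail_id postmeta, copy its children, then append the
         URL of the attachment whose wp:post_id equals wp:meta_value -->
    <xsl:template match="wp:postmeta[wp:meta_key = '_thumbnail_id']">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
            <xsl:for-each select="key('attachment-by-id', wp:meta_value)">
                <wp:meta_url>
                    <xsl:value-of select="wp:attachment_url"/>
                </wp:meta_url>
            </xsl:for-each>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

This sketch leaves the attachment items in the feed. If they should be dropped after merging, an empty template matching item[wp:post_type = 'attachment'] would remove them from the output.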

1 comment

  1. It looks like I’ve managed to sort out a solution.

    I used a number of indexes to organise the attachments. My requirements changed a little on further inspection of the XML, as there was

    I changed my resulting output to be in the format of…

    <item>
        <title>Some groovy title</title>
        <link>http://example.com/groovy-example</link>
        <wp:post_id>2050</wp:post_id>
        <wp:post_type>page</wp:post_type>
        ...
        ...
        ...
        <thumbnail>
            <title>Spanner</title>
            <url>http://www.example.com/wp-content/uploads/2012/03/spanner.jpg</url>
        </thumbnail>
        <attachments>
            <file>
                <title>Fixing your widgets: An idiots guide</title>
                <url>http://www.example.com/wp-content/uploads/2012/12/fixiing-widgets.pdf</url>
            </file>
            <file>
                <title>Do It Yourself Trepanning</title>
                <url>http://www.example.com/wp-content/uploads/2013/04/trepanning.pdf</url>
            </file>
        </attachments>
    </item>
    

    So using the following xsl gave me the desired result. The conditions on the indexes ensured I was selecting the correct files.

    <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:wp="http://wordpress.org/export/1.2/">
    
        <xsl:output indent="yes" cdata-section-elements="content"/>
    
        <!-- Setup indexes -->
    
        <!-- Index all main posts -->
        <xsl:key 
            name="mainposts" 
            match="*/item[wp:post_type[text()='post']]" 
            use="wp:post_id" />
    
        <!-- Index all sub posts (posts within posts)-->
        <xsl:key 
            name="subposts" 
            match="*/item[wp:post_type[text()='post'] and category[@nicename = 'documents']]" 
            use="category[@domain = 'post_tag']" />
    
        <!-- Index all image thumbs -->
        <xsl:key 
            name="images" 
            match="*/item[wp:post_type[text()='attachment'] and wp:postmeta/wp:meta_key[text()='_wp_attachment_metadata']]" 
            use="wp:post_parent" />
    
        <!-- Index all files (unable to sort members file at the moment)-->
        <xsl:key 
            name="attachments" 
            match="*/item[wp:post_type[text()='attachment'] and not(wp:postmeta/wp:meta_key = '_wp_attachment_metadata')]"
            use="wp:post_parent" />
    
        <!-- Index all attachments by their own id (used to resolve _thumbnail_id) -->
        <xsl:key 
            name="thumbnails" 
            match="*/item[wp:post_type[text()='attachment']]" 
            use="wp:post_id" />
    
        <!-- Identity transform: copy every node and attribute unchanged by default -->
        <xsl:template match="node()|@*">
            <xsl:copy>
                <xsl:apply-templates select="node()|@*"/>
            </xsl:copy>
        </xsl:template>
    
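        <!-- For top-level items (wp:post_parent = 0): re-emit wp:post_parent,
             then append the thumbnail and attachments looked up via the indexes -->
        <xsl:template match="*/item/wp:post_parent[text()= 0]">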
            <wp:post_parent>
                <xsl:value-of select="." />
            </wp:post_parent>
    
            <xsl:for-each select="key('thumbnails', ../wp:postmeta[wp:meta_key[text()='_thumbnail_id']]/wp:meta_value)">
                <thumbnail>
                    <title><xsl:value-of select="title" /></title>
                    <url><xsl:value-of select="wp:attachment_url" /></url>
                </thumbnail>
            </xsl:for-each>
    
            <xsl:for-each select="key('subposts', ../category[@domain = 'post_tag'])">
                <attachments>
    
                    <xsl:for-each select="key('images', wp:post_id)">
                        <file>
                            <title><xsl:value-of select="title" /></title>
                            <url><xsl:value-of select="wp:attachment_url" /></url>
                        </file>
                    </xsl:for-each>
    
                    <xsl:for-each select="key('attachments', wp:post_id)">
                        <file>
                            <title><xsl:value-of select="title" /></title>
                            <url><xsl:value-of select="wp:attachment_url" /></url>
                        </file>
                    </xsl:for-each>
    
                </attachments>
            </xsl:for-each>
    
        </xsl:template>