Import WordPress XML File from Within Functions.php

I am developing a theme that uses a different method of adding content, so a default install of WordPress won’t show any content. Is it possible to automatically import an XML file via an internal function and/or hooks after the theme has been activated?

User installs theme > User activates theme > Code behind the scenes
loads up an XML file and performs a silent import of its contents


Currently, to import an XML file you have to install the WordPress Importer plugin and then manually import the file, select a user to associate the imported content with, and decide whether you want to import media attachments. For the type of clients I am targeting this step is too confusing, and I would like to eliminate the need for it.

I did some digging into the WordPress importer script and there are a lot of function calls. What would I have to do to strip out the parts where user input is required and import a file using the class and its methods directly? I’m not sure where to begin really.

My clients are tradesmen, so even something as simple as importing an XML file stumps them, and they don’t have the time to do it. That leaves room for error, especially if they try to import more than once and end up with duplicate pages.

Thank you in advance.

Edit/Clarification

There seems to be a lot of confusion here. I am not asking how to check if a theme has been activated, I’ve got that part sorted. I am asking how I would go about parsing an XML import file and automatically importing it without user effort. I essentially want to automate, from within my functions.php, what the WordPress import plugin already lets you do manually: import the XML file, choose an author, and choose whether to download and import attachments.

This would replace the plugin, which my clients have neither the computer knowledge to use nor the wish to learn.


4 comments

  1. Your question is a bit specific if you “only” want to automatically import some posts/pages. There are other ways to do this than using an XML export file.

    If you have text-only posts, then you should use LOAD DATA INFILE. First you have to export your posts.

    global $wpdb, $wp_filesystem;
    
    $tables = array(
            'posts'    => array( 'posts', 'postmeta' ),
            'comments' => array( 'comments', 'commentmeta' ),
            'terms'    => array( 'terms', 'term_taxonomy', 'term_relationships' ),
            'users'    => array( 'users', 'usermeta' ),
            'links'    => array( 'links' ),
            'options'  => array( 'options' ),
            'other'    => array(),
            // for multisite
            'multisite' => array( 'blogs', 'signups', 'site', 'sitemeta', 'sitecategories', 'registration_log', 'blog_versions' )
    
    );
    
    $exports = array( 'posts', 'comments', 'users' );
    
    $exportdir = TEMPLATEPATH . '/export';
    
    if ( ! is_dir( $exportdir ) ) {
        $mkdir = wp_mkdir_p( $exportdir );
        if ( false == $mkdir || ! is_dir( $exportdir ) )
            throw new Exception( 'Cannot create export directory. Aborting.' );
    }
    
    // empty the export dir else MySQL throws errors
    $files = glob( $exportdir . '/*' );
    if ( ! empty( $files ) ) {
        foreach( $files as $file )
            unlink( $file );
    }
    
    foreach ( $exports as $export ) {
    
        if ( ! isset( $tables[$export] ) )
            continue;
    
        if ( ! empty( $tables[$export] ) ) {
            foreach ( $tables[$export] as $table ) {
    
                $outfile =  sprintf( '%s/%s_dump.sql', $exportdir, $table );
                $sql = "SELECT * FROM {$wpdb->$table} INTO OUTFILE '%s'";
                $res = $wpdb->query( $wpdb->prepare( $sql, $outfile ) );
    
                // $wpdb->query() returns false on failure, not a WP_Error
                if ( false === $res )
                    echo "<p>Cannot export {$table} into {$outfile}</p>";
            }
        }
    }
    

    This will create a directory in your theme folder (be sure it is writable!) and export the posts and comments (with their meta) into dump files. Use the array $exports to define what you want to export. I grouped most things more or less logically (if you want to export the posts, then you should also export postmeta and so on).

    The benefit of this solution is that you can narrow down the export with the SELECT statement (e.g. only posts from a particular category, only pages, or only trashed posts).
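
    As a rough illustration of that point, the sketch below builds an export query limited to published pages. The helper name build_export_query(), the table name wp_posts and the output path are assumptions made for this example; inside WordPress you would take the table name from $wpdb->posts and pass the finished string to $wpdb->query().

    ```php
    <?php
    /**
     * Build a SELECT ... INTO OUTFILE statement restricted by simple equality conditions.
     * Hypothetical helper for illustration only; it is not part of WordPress.
     *
     * @param string $table      Table name, e.g. the value of $wpdb->posts.
     * @param string $outfile    Absolute path of the dump file to create.
     * @param array  $conditions Column => value pairs, combined with AND.
     */
    function build_export_query( $table, $outfile, array $conditions = array() ) {
        $where = array();
        foreach ( $conditions as $column => $value ) {
            // Column names cannot be escaped like values, so allow only safe characters.
            if ( ! preg_match( '/^[A-Za-z0-9_]+$/', $column ) ) {
                continue;
            }
            // addslashes() stands in for $wpdb->prepare() in this standalone sketch.
            $where[] = sprintf( "`%s` = '%s'", $column, addslashes( $value ) );
        }

        $sql = sprintf( 'SELECT * FROM `%s`', str_replace( '`', '', $table ) );
        if ( ! empty( $where ) ) {
            $sql .= ' WHERE ' . implode( ' AND ', $where );
        }

        return $sql . sprintf( " INTO OUTFILE '%s'", addslashes( $outfile ) );
    }

    // Export only published pages.
    $sql = build_export_query(
        'wp_posts',
        '/tmp/export/pages_dump.sql',
        array( 'post_type' => 'page', 'post_status' => 'publish' )
    );
    echo $sql, PHP_EOL;
    ```

    Restricting by category would need a JOIN on the term tables, which this small helper does not cover.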

    Now you want to import this stuff in a new blog

    global $wpdb;
    
    $exportdir = TEMPLATEPATH . '/export';
    
    $files = glob( $exportdir . '/*_dump.sql' );
    
    foreach ( $files as $file ) {
    
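        // file names look like "<table>_dump.sql"
        preg_match( '#/([^/]+)_dump\.sql$#i', $file, $match );
    
        if ( ! isset( $match[1] ) )
            continue;
    
        // braces are required: PHP 7+ reads $wpdb->$match[1] as ($wpdb->$match)[1]
        $table = $wpdb->{$match[1]};
        $sql   = "LOAD DATA LOCAL INFILE '%s' INTO TABLE {$table};";
    
        $res = $wpdb->query( $wpdb->prepare( $sql, $file ) );
    
        // $wpdb->query() returns false on failure, not a WP_Error
        if ( false === $res )
            echo "<p>Cannot import data from file {$file} into table {$table}</p>";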
    }
    

    This solution is good if the posts do not contain any attachments like images. Another problem is that no users and no categories will be imported. Be sure both are created before the import starts (or include users and categories in your export). It is a very rough method of importing things; it will override existing content!

    If you want to export the attachments also, you have to do a bit more work.

    (Sidenote: Please read the complete answer and the Very Last Words at the end! This topic is not for beginners and I do not write a warning at every risky line of code)

    The WordPress Importer plugin seems to be a good way to import everything and automatically import/download the attachments. So let’s have a look at what this plugin does.

    First the plugin asks for an XML file to upload. Then it parses the XML file and asks for an author mapping and whether the attachments should be downloaded or not.

    To run the plugin automatically we need to change some things. First we have to skip the upload process. That’s quite easy because you can bundle the XML file with the theme, so you know where the XML file is. Then we have to skip the questions that appear after uploading the XML file. We can predefine our own values and pass them to the import process.

    Start with a copy of the plugin. Create a directory in your theme, e.g. autoimport, and copy the files wordpress-importer.php and parsers.php to it. It is a good idea to rename the file wordpress-importer.php to something like autoimporter.php. In your theme’s functions.php add a function call to trigger the automated import:

    /**
     * Auto import a XML file
     */
    add_action( 'after_setup_theme', 'autoimport' );
    
    function autoimport() {
        // get the file
        require_once TEMPLATEPATH . '/autoimport/autoimporter.php';
    
        if ( ! class_exists( 'Auto_Importer' ) )
            die( 'Auto_Importer not found' );
    
        // call the function
        $args = array(
            'file'        => TEMPLATEPATH . '/autoimport/import.xml',
            'map_user_id' => 1
        );
    
        auto_import( $args );
    
    }
    

    First we set up some arguments. The first one is the complete path to the XML file. The second one is the ID of an existing user. We need this user for the author mapping; this is the user all posts will be mapped to when no new authors should be created.

    Now we have to understand how the plugin works. Open your renamed plugin file and scroll down to the end. There is a function wordpress_importer_init() and an action call. Remove both, they are no longer needed. Now go to the top of the file and remove the plugin header (the comment at the beginning of the file). After that, rename the class WP_Import to something like Auto_Importer; do not forget to adjust the function_exists statement and the first method WP_Import (this is the constructor in PHP4 style).

    Later we will pass the XML file directly to the class constructor, so modify the first method to this:

    var $xmlfile = '';
    var $map_user_id = 0;
    
    function Auto_Importer( $args ) {
    
        if ( file_exists( $args['file'] ) ) {
    
            // for windows systems
            $file = str_replace( '\\', '/', $args['file'] );
    
            $this->xmlfile = $file;
        }
    
        if ( isset( $args['map_user_id'] ) )
            $this->map_user_id = $args['map_user_id'];
    
    }
    

    Now we have to remove and modify some methods inside the class. The first one is the dispatch() method. This method shows you how the class works. It performs three steps: first it uploads the XML file, then it processes it, and finally it imports the data.

    Case 0 is the first step, the greeting. This is what you see when you open the importer for the first time; it asks for a file to upload. Case 1 handles the upload and displays a form for the import options. Case 2 finally does the import. In other words: the first two steps only ask for data we can provide ourselves. We only need step three (case 2) and have to provide the data asked for in steps one and two.

    In step two you see a function call to wp_import_handle_upload(). This function sets up some information about the XML file. We cannot use this function anymore because we haven’t uploaded a file. So we have to copy and modify the function. Create a new method within the class:

    function import_handle_upload() {
    
        $url = get_template_directory_uri() . str_replace( TEMPLATEPATH, '', $this->xmlfile );
        $type = 'application/xml'; // we know the mime type of our file
        $file = $this->xmlfile;
        $filename = basename( $this->xmlfile );
    
        // Construct the object array
        $object = array( 'post_title' => $filename,
                'post_content' => $url,
                'post_mime_type' => $type,
                'guid' => $url,
                'context' => 'import',
                'post_status' => 'private'
        );
    
        // Save the data
        $id = wp_insert_attachment( $object, $file );
    
        // schedule a cleanup for one day from now in case of failed import or missing wp_import_cleanup() call
        wp_schedule_single_event( time() + DAY_IN_SECONDS, 'importer_scheduled_cleanup', array( $id ) );
    
        return array( 'file' => $file, 'id' => $id );
    }
    

    Then, in the method handle_upload(), replace the function call $file = wp_import_handle_upload(); with our new method: $file = $this->import_handle_upload();

    We have now replaced the upload process with our own file (which should already exist). Go on and remove more unneeded methods. The methods greet(), header() and footer() are no longer needed (header and footer only print some text) and can be removed from the class. In the dispatch() method remove the calls to these methods ($this->header() and $this->footer()).

    The first step is done. Now we have to take care of the second step, the import options. The import options ask whether downloading the attachments should be allowed and how the authors should be mapped.

    The first part is easy. Set it to true if the attachments should be downloaded, or to false if not. The author mapping is a bit more complicated. If creating new users (the authors from the import file) is allowed, create them. If not, assign the posts to an existing user. This is done in the method get_author_mapping(). We have to replace the $_POST data with existing data. Here we need a simple solution, so we simply map all new authors to an existing one if creating new users is not allowed. Otherwise we simply create all new users. In the second case, be sure all new users are dummy users. If not, every time you import them they get an email with login and password for the new blog! I do not explain every line of code; here is the complete rewritten method:

    function get_author_mapping( $map_users_id ) {
        if ( empty( $this->authors ) )
            return;
    
        $create_users = $this->allow_create_users();
    
        foreach ( (array) $this->authors as $i => $data ) {
    
            $old_login = $data['author_login'];
    
            // Multisite adds strtolower to sanitize_user. Need to sanitize here to stop breakage in process_posts.
            $santized_old_login = sanitize_user( $old_login, true );
            $old_id = isset( $this->authors[$old_login]['author_id'] ) ? intval($this->authors[$old_login]['author_id']) : false;
    
            if ( ! $create_users ) {
                $user = get_userdata( intval($map_users_id) );
                if ( isset( $user->ID ) ) {
                    if ( $old_id )
                        $this->processed_authors[$old_id] = $user->ID;
                    $this->author_mapping[$santized_old_login] = $user->ID;
                }
            } else if ( $create_users ) {
                if ( ! empty($this->authors[$i]) ) {
                    $user_id = wp_create_user( $this->authors[$i]['author_login'], wp_generate_password() );
                } else if ( $this->version != '1.0' ) {
                    $user_data = array(
                        'user_login' => $old_login,
                        'user_pass' => wp_generate_password(),
                        'user_email' => isset( $this->authors[$old_login]['author_email'] ) ? $this->authors[$old_login]['author_email'] : '',
                        'display_name' => $this->authors[$old_login]['author_display_name'],
                        'first_name' => isset( $this->authors[$old_login]['author_first_name'] ) ? $this->authors[$old_login]['author_first_name'] : '',
                        'last_name' => isset( $this->authors[$old_login]['author_last_name'] ) ? $this->authors[$old_login]['author_last_name'] : '',
                    );
                    $user_id = wp_insert_user( $user_data );
                }
    
                if ( ! is_wp_error( $user_id ) ) {
                    if ( $old_id )
                        $this->processed_authors[$old_id] = $user_id;
                    $this->author_mapping[$santized_old_login] = $user_id;
                } else {
                    printf( __( 'Failed to create new user for %s. Their posts will be attributed to the current user.', 'wordpress-importer' ), esc_html($this->authors[$old_login]['author_display_name']) );
                    if ( defined('IMPORT_DEBUG') && IMPORT_DEBUG )
                        echo ' ' . $user_id->get_error_message();
                    echo '<br />';
                }
            }
    
            // failsafe: if the user_id was invalid, default to the current user
            if ( ! isset( $this->author_mapping[$santized_old_login] ) ) {
                if ( $old_id )
                    $this->processed_authors[$old_id] = (int) get_current_user_id();
                $this->author_mapping[$santized_old_login] = (int) get_current_user_id();
            }
        }
    }
    

    There is some work left to do. First add a function auto_import():

    function auto_import( $args ) {
    
        $defaults = array( 'file' => '', 'map_user_id' => 0);
        $args = wp_parse_args( $args, $defaults );
    
        $autoimport = new Auto_Importer( $args );
        $autoimport->do_import();
    
    }
    

    Place this function after the class. This function is missing some error handling and checking (e.g. for an empty file argument).
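
    The missing checks could look roughly like the sketch below. autoimport_check_args() is a hypothetical helper, not part of the importer plugin; it only verifies the two arguments used above and returns a list of problems instead of starting the import. It cannot tell whether the user ID really exists, which would need a WordPress call such as get_userdata().

    ```php
    <?php
    /**
     * Validate the arguments intended for auto_import().
     * Hypothetical helper; returns an array of error messages (empty if all is fine).
     */
    function autoimport_check_args( array $args ) {
        $errors = array();

        // The import file must be given and must be a readable file.
        $file = isset( $args['file'] ) ? (string) $args['file'] : '';
        if ( '' === $file ) {
            $errors[] = 'No import file given.';
        } elseif ( ! is_file( $file ) || ! is_readable( $file ) ) {
            $errors[] = 'Import file is missing or not readable: ' . $file;
        }

        // User IDs start at 1, so 0 (the default) means "not set".
        $user_id = isset( $args['map_user_id'] ) ? (int) $args['map_user_id'] : 0;
        if ( $user_id < 1 ) {
            $errors[] = 'map_user_id must be a positive user ID.';
        }

        return $errors;
    }

    // Usage: an empty file argument and the default user ID are both reported.
    $errors = autoimport_check_args( array( 'file' => '', 'map_user_id' => 0 ) );
    foreach ( $errors as $error ) {
        echo $error, PHP_EOL;
    }
    ```

    Inside auto_import() you could call it before creating the Auto_Importer object and return early if the list is not empty.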

    If you run the class now, you get a lot of error messages. The first one is that the class is missing. This is because there is an if statement at the beginning:

    if ( ! defined( 'WP_LOAD_IMPORTERS' ) )
        return;
    

    We have to remove it, otherwise the file would not be parsed completely. Then there are some functions that are not loaded at this point, so we have to include some files:

    $required = array(
        'post_exists'                     => ABSPATH . 'wp-admin/includes/post.php',
        'wp_generate_attachment_metadata' => ABSPATH . 'wp-admin/includes/image.php',
        'comment_exists'                  => ABSPATH . 'wp-admin/includes/comment.php'
    );
    
    foreach ( $required as $func => $req_file ) {
        if ( ! function_exists( $func ) )
            require_once $req_file;
    }
    

    Basically that’s all. I tested this on a local installation with the test data XML from WordPress. It works for me, but it is not a perfect solution for production!

    And some last words on setting up some options. There are two options that can be modified by a filter:

    add_filter( 'import_allow_create_users', function() { return false; } );
    add_filter( 'import_allow_fetch_attachments', '__return_false' );
    

    I think I do not have to explain them. Put these filters in your functions.php and return true or false as needed (the first one is PHP 5.3 style, the second is WP style).

    Very Last Words

    I put it all together in this gist. Use it at your own risk! I’m not responsible for anything! Please have a look at the files in the gist, I did not explain every little step here.

    Things I haven’t done: set a value, e.g. in the (theme) options, after importing. Otherwise the import starts every time the theme is activated.
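
    A minimal sketch of such a flag, assuming it is stored with the regular options API (get_option() and update_option()). The option name mytheme_autoimport_done and the wrapper mytheme_maybe_autoimport() are made up for this example; auto_import() is the function defined above.

    ```php
    <?php
    /**
     * Run the automated import only once per site.
     * Hypothetical wrapper; it must run inside WordPress, where get_option(),
     * update_option() and the auto_import() function from above are available.
     */
    function mytheme_maybe_autoimport( array $args ) {
        // The flag is set after the first run, so later calls return immediately.
        if ( get_option( 'mytheme_autoimport_done' ) ) {
            return false;
        }

        auto_import( $args );

        // Remember that the import has run, so it is not repeated.
        update_option( 'mytheme_autoimport_done', 1 );

        return true;
    }
    ```

    In the autoimport() callback shown earlier, you would then call mytheme_maybe_autoimport( $args ) instead of auto_import( $args ).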

    Maybe I will work on it in the future, clean up some things and run more tests on it.

  2. Allow me to re-introduce 2 things here:

    (a) “I am not asking how to… I’ve got that part sorted…”

    »» I’ve learnt over time to be OK with the fact that the approach to issues/fixes doesn’t necessarily require some ‘visible association’ with the issue at hand.

    (b) “…would I have to do to strip out the parts…”
    “…clients are tradesmen, so even something as simple as…”

    »» Why make it easier for the client at the cost of making it difficult for yourself? I certainly could offer ‘services’ after the deliverables and establish a remote connection to do it for them [chargeable], instead of “…hacking the import plugin…”. I mean, ask yourself if it’s really worth it in your current scheme of things.
    However IF you’re willing to put in the effort then give a shot at the code below.
    If you can, then:

    I concur with both chrisguitarguy and amolv above.

    As chris pointed out the number of ways to achieve an output is many. This is just one. Though it has the potential to get laboriously lengthy, do refer to the last couple of lines before anything else.

    <?php 
    /* I usually dump ONE line in functions.php  */
    require_once (TEMPLATEPATH . '/includes/whatever.php');
    
    /* and then in that loc CHECK FIRST*/
    if ((is_admin() && isset($_GET['activated']) && $pagenow == 'themes.php')||(is_admin() && isset($_GET['upgrade']) && $pagenow == 'admin.php' && $_GET['page'] == 'admin-options.php')) 
    {
    
    global $wpdb, $wp_rewrite, $hey;
    
    // create tables
    your_tables();
    
    // insert value defaults
    your_values();
    
    // insert link defaults
    your_links();
    
    // pages and tpl
    your_pages();
    
    // create category or categories
    // wp_create_categories     $categories, $post_id = ''
    // wp_create_category   $cat_name, $parent
    
    //flush rewrite
    $wp_rewrite->flush_rules();
    
    }
    
    // create them db tables
    function your_tables() {
    global $wpdb, $hey;
    
    $collate = '';
    if($wpdb->supports_collation()) {
    if(!empty($wpdb->charset)) $collate = "DEFAULT CHARACTER SET $wpdb->charset";
    if(!empty($wpdb->collate)) $collate .= " COLLATE $wpdb->collate";
    }
    
    $sql = "CREATE TABLE IF NOT EXISTS ". $wpdb->prefix . "table1_name" ." (
    `id` INT(10) NOT NULL auto_increment,
    `some_name1` VARCHAR(255) NOT NULL,
    `some_name2` VARCHAR(255) NOT NULL,
    `some_name3` LONGTEXT,
    `some_name4` LONGTEXT NOT NULL,
    `some_name5` VARCHAR(255) DEFAULT NULL,
    `some_name6` VARCHAR(255) DEFAULT NULL,
    `some_name7` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
    `some_name8` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
    PRIMARY KEY id  (`id`)) $collate;";
    
    $wpdb->query($sql);
    
    
    $sql = "CREATE TABLE IF NOT EXISTS ". $wpdb->prefix . "table2_name" ." (
    `meta_id` INT(10) NOT NULL AUTO_INCREMENT,
    `some_name1` INT(10) NOT NULL,
    `some_name2` INT(10) NOT NULL,
    `some_name3` VARCHAR(255) NOT NULL,
    `some_name4` INT(10) NOT NULL,
    PRIMARY KEY id  (`meta_id`)) $collate;";
    
    $wpdb->query($sql);
    
    // and so on and so forth
    
    /* Insert default/ALL data into tables */
    // BUT CHECK FIRST IF DATA EXISTS. IF = YES DONT PUSH IN ANYTHING
    
    $sql = "SELECT id " . "FROM " . $wpdb->prefix . "table1_name LIMIT 1";
    
    $wpdb->get_results($sql);
    
    if($wpdb->num_rows == 0) {
    
    // more code will follow
    // i have to get going now
    
    }
    
    } // end of your_tables()
    
    ?>
    

    NOTE

    • If you’ve been with WP for a while it’s needless to mention: BACK UP YOUR DB FIRST.

    • phpMyAdmin has raw power and makes it quite easy to carefully screw things up.

    • Though the effort required could seem daunting initially, if done right you could make it function like clockwork.

    Finally

    How to push 2000 lines of data in 20 sec into those last 2 lines within those 2 braces?

    phpMyAdmin » Select DB on left »» Select All TABLES on right »» Export ▼

    ➝ Custom: display all options
    ➝ View output as text = ON
    ➝ Save output to a file = OFF
    ➝ Compression = NONE
    ➝ Format = SQL
    ➝ Dump Table = STRUCTURE & DATA
    ➝ Add DROP TABLE... = OFF (Important!)
    ➝ Syntax to use = "both of the above"
    
    »» GO!
    
    • From the next screen I could copy the ‘STRUCTURE’ part into the $sql = “….” for your_tables() and the ‘DATA’ portion into $sql for your_values()

    • For the rest of WP defaults I use update_option(...) & update_post_meta(...)

  3. There is no theme equivalent of register_activation_hook for plugins; there are only a few hacks. Why? Because a theme is a skin. Only functionality specifically related to the display of content should go in a theme, not content itself.

    As far as how: use the example above to run a callback function one time. The WordPress importer works on XML files, and there are many different ways to parse XML in PHP. Take your pick, parse the file, do what you want with it.
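
    As one possible starting point, the sketch below reads a WordPress export (WXR) document with SimpleXML, which is bundled with PHP. The function name read_wxr_items() and the tiny sample document are invented for this example, and only three fields per item are read; a real export file contains many more.

    ```php
    <?php
    /**
     * Read the items of a WordPress export (WXR) document with SimpleXML.
     * Hypothetical helper; returns title, post type and content of each item.
     */
    function read_wxr_items( $xml_string ) {
        // LIBXML_NOCDATA turns CDATA sections into plain text nodes.
        $xml = simplexml_load_string( $xml_string, 'SimpleXMLElement', LIBXML_NOCDATA );
        if ( false === $xml ) {
            return array();
        }

        // Use the namespace URIs declared on the root element of the document.
        $ns = $xml->getDocNamespaces();
        if ( ! isset( $ns['wp'], $ns['content'] ) ) {
            return array();
        }

        $items = array();
        foreach ( $xml->channel->item as $item ) {
            $wp      = $item->children( $ns['wp'] );
            $content = $item->children( $ns['content'] );

            $items[] = array(
                'title'     => (string) $item->title,
                'post_type' => (string) $wp->post_type,
                'content'   => (string) $content->encoded,
            );
        }

        return $items;
    }

    // Tiny sample document, invented for this example.
    $sample = implode( "\n", array(
        '<rss version="2.0"',
        '  xmlns:content="http://purl.org/rss/1.0/modules/content/"',
        '  xmlns:wp="http://wordpress.org/export/1.2/">',
        '<channel>',
        '<item>',
        '<title>About us</title>',
        '<content:encoded><![CDATA[<p>Hello</p>]]></content:encoded>',
        '<wp:post_type>page</wp:post_type>',
        '</item>',
        '</channel>',
        '</rss>',
    ) );

    $items = read_wxr_items( $sample );
    foreach ( $items as $item ) {
        echo $item['post_type'], ': ', $item['title'], PHP_EOL;
    }
    ```

    From there each item could be handed to wp_insert_post(), but attachments, terms and comments would still need their own handling, which is what the importer plugin takes care of.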

  4. In functions.php the condition can be checked like this:

    if ( isset( $_GET['activated'] ) && 'themes.php' == $GLOBALS['pagenow'] )
    { 
      // check duplicates 
       // call import class 
       //xml import code 
       // do whatever you want to 
    }
    

    As soon as your theme is activated, this will automatically import the data.