Clone entire WordPress site without server access using Python

I’m trying to save the contents of an entire WordPress site using Python, without FTP or server access. In other words, I want to save a complete copy (or the closest possible) of the WordPress site to disk, and I can’t download everything over FTP or from the server.

I’ve found some options to iterate through the various pages that make up the site, but nothing that will “save the site as a whole.”

3 comments

  1. If you really want to use Python and nothing else, you could use wpull, which is a wget clone written in Python. They have an example for archiving/downloading an entire website in their docs.

    wpull billy.blogsite.example --warc-file blogsite-billy \
    --no-check-certificate \
    --no-robots --user-agent "InconspiuousWebBrowser/1.0" \
    --wait 0.5 --random-wait --waitretry 600 \
    --page-requisites --recursive --level inf \
    --span-hosts --domains blogsitecdn.example,cloudspeeder.example \
    --hostnames billy.blogsite.example \
    --reject-regex "/login\.php" \
    --tries inf --retry-connrefused --retry-dns-error \
    --delete-after --database blogsite-billy.db \
    --quiet --output-file blogsite-billy.log
    
  2. This doesn’t use Python (although I’m sure you could hack something up, or possibly find something on PyPI), but why not just use wget? Something like:

    wget -rkp -l3 -np -nH --cut-dirs=1 http://example.com
    

    Of course, if you REALLY want to do it in Python, you could:

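    import subprocess

    # -r recurse, -k convert links for local viewing, -p fetch page requisites, -l3 depth 3, -np never ascend to the parent
    subprocess.run(['wget', '-rkp', '-l3', '-np', '-nH', '--cut-dirs=1', 'http://example.com'])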
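
For what it's worth, a wget-free version of the same idea can be sketched with only the standard library. This is a rough sketch, not a replacement for wget: the names `extract_links`, `local_path` and `mirror`, the `mirror` output folder and the `max_pages` cap are all made up for illustration. It stays on one host, ignores query strings and robots.txt, does not rewrite links for offline viewing, and assumes WordPress-style URLs that end in a slash.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect href/src attribute values from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def extract_links(html, base_url):
    """Return absolute, fragment-free URLs on the same host as base_url."""
    parser = LinkParser()
    parser.feed(html)
    host = urlparse(base_url).netloc
    found = []
    for raw in parser.links:
        absolute = urldefrag(urljoin(base_url, raw)).url
        if urlparse(absolute).netloc == host and absolute not in found:
            found.append(absolute)
    return found


def local_path(url, out_dir):
    """Map a URL to a file under out_dir; URLs ending in '/' get index.html."""
    path = urlparse(url).path
    if path == "" or path.endswith("/"):
        path += "index.html"
    return os.path.join(out_dir, *path.strip("/").split("/"))


def mirror(start_url, out_dir="mirror", max_pages=200):
    """Breadth-first crawl from start_url, saving each response to disk."""
    queue, seen, saved = [start_url], {start_url}, 0
    while queue and saved < max_pages:
        url = queue.pop(0)
        try:
            with urlopen(url, timeout=30) as response:
                body = response.read()
                content_type = response.headers.get_content_type()
            target = local_path(url, out_dir)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "wb") as fh:
                fh.write(body)
        except OSError as exc:  # covers URLError/HTTPError and file errors
            print(f"skipped {url}: {exc}")
            continue
        saved += 1
        # Only HTML pages are scanned for further links.
        if content_type == "text/html":
            for link in extract_links(body.decode("utf-8", "replace"), url):
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
```

Calling `mirror("http://example.com/")` would then write whatever it can reach into a local `mirror` folder. For anything beyond a small site, wget or wpull will handle retries, rate limiting and link conversion far better than this.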
    
  3. If you can (and you should) keep all of the WordPress data tables, you might want to use a WordPress feature (also available as a plugin) called Migrate… You might already have it: if you can get to your administration panel (aka /wp-admin), you can log in and use
    http://yourdomain.com/wp-admin/export.php
    This way, you will get an XML file that you can import into your Python project. There are also some plugins that export a full .sql file.

    Remember, basically all of the content lives in the MySQL tables, so that is all you need there.
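
To give an idea of what importing that XML into a Python project might look like: the export is an RSS-based file with one `<item>` per post or page, so the standard library can read it. This is a minimal sketch. The function name `read_wxr_items` and the file name in the usage note are made up, and it only pulls the title, link and body, assuming the body sits in the usual RSS `content:encoded` element.

```python
import xml.etree.ElementTree as ET

# RSS "content" module namespace, assumed to be the one the export declares
# for full post bodies.
NS = {"content": "http://purl.org/rss/1.0/modules/content/"}


def read_wxr_items(xml_data):
    """Return a list of dicts (title, link, body), one per <item> in the export."""
    root = ET.fromstring(xml_data)  # accepts str or bytes
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "body": item.findtext("content:encoded", default="", namespaces=NS),
        })
    return items
```

With a downloaded export you could read the file in binary mode, e.g. `read_wxr_items(open("export.xml", "rb").read())`, and then write each body wherever your project needs it. Media files are only referenced by URL in the export, so they would still have to be downloaded separately.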