Security of unzipping user submitted files

Not so much of a coding problem here, but a general question relating to security.
I’m currently working on a project that allows user submitted content.
A key part of this content is a zip file that the user uploads.
The zip file should contain only mp3 files.

I then unzip those files to a directory on the server, so that we can stream the audio on the website for users to listen to.


My concern is that this opens us up for some potentially damaging zip files.
I’ve read about ‘zipbombs’ in the past, and obviously don’t want a malicious zip file causing damage.

So, is there a safe way of doing this?
Can I scan the zip file without unzipping it first and, if it contains anything other than MP3s, delete it or flag a warning to the admin?

If it makes a difference, I’m developing the site on WordPress.
I currently use the built-in upload features of WordPress to let the user upload the zip file to our server (I’m not sure whether WordPress already has any form of security to scan the zip file).

3 comments

  1. Code that extracts only the MP3s from the zip and ignores everything else:

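    $zip = new ZipArchive();
    $filename = 'newzip.zip';

    if ($zip->open($filename) !== true) {
        exit("cannot open <$filename>\n");
    }

    for ($i = 0; $i < $zip->numFiles; $i++) {
        $info = $zip->statIndex($i);
        // Ask for the extension directly: pathinfo() omits the
        // 'extension' key when the entry name has no extension.
        $ext = pathinfo($info['name'], PATHINFO_EXTENSION);
        if (strtolower($ext) === 'mp3') {
            // basename() drops any directory part of the entry name,
            // so the file is always written to the current directory.
            file_put_contents(basename($info['name']), $zip->getFromIndex($i));
        }
    }
    $zip->close();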
    

    I would also use something like id3_get_version (http://www.php.net/manual/en/function.id3-get-version.php) to check that the contents of each file really are MP3 too.
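
    If the id3 extension is not installed, a rough content check can be sketched in plain PHP. This is only a heuristic under stated assumptions: the function name looks_like_mp3 is made up for illustration, and it only inspects the first bytes (an ID3v2 tag or an MPEG Layer III frame header), so a valid MP3 with leading junk would be rejected and a crafted file could still pass.

```php
<?php
// Hypothetical helper: heuristic check of the first bytes of a file.
function looks_like_mp3(string $bytes): bool
{
    if (strlen($bytes) < 3) {
        return false;
    }
    // ID3v2 tags start with the ASCII bytes "ID3".
    if (substr($bytes, 0, 3) === 'ID3') {
        return true;
    }
    // An MPEG audio frame starts with 11 set sync bits:
    // 0xFF followed by a byte whose top three bits are set.
    $b0 = ord($bytes[0]);
    $b1 = ord($bytes[1]);
    if ($b0 !== 0xFF || ($b1 & 0xE0) !== 0xE0) {
        return false;
    }
    // Bits 2-1 of the second byte hold the layer; 01 means Layer III.
    return (($b1 >> 1) & 0x03) === 0x01;
}
```

    In the loop above, the result of $zip->getFromIndex($i) could be passed to this function before calling file_put_contents().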

  2. Is there a reason they need to ZIP the MP3s? Unless there are a lot of text frames in the ID3v2 info in the MP3s, the file size will actually increase with the ZIP due to storage of the dictionary.

    As far as I know, there isn’t any way to scan a ZIP without actually parsing it. The data are opaque until you run each bit through the Huffman dictionary. And how would you determine what file is an MP3? By file extension? By frames? MP3 encoders have a loose standard (decoders have a more stringent spec) which makes it difficult to scan the file structure without false negatives.

    Here are some ZIP security risks:

    1. Comment data that causes buffer overflows. Solution: remove comment data.
    2. ZIPs that are small in compressed size but inflate to fill the filesystem (classic ZIP bomb). Solution: check inflated size before inflating; check dictionary to ensure it has many entries, and that the compressed data isn’t all 1’s.
    3. Nested ZIPs (related to #2). Solution: stop when an entry in the ZIP archive is itself ZIP data. You can determine this by checking for the central directory’s marker, the number 0x02014b50 (hex, always little-endian in ZIP – http://en.wikipedia.org/wiki/Zip_%28file_format%29#Structure).
    4. Nested directory structures, intended to exceed the filesystem’s limit and hang the inflating process. Solution: don’t unzip directories.

    So, either do a lot of scrubbing and integrity checks, or at the very least use PHP to scan the archive: check each file for its MP3-ness (however you do that; extension and the presence of MP3 headers? You can’t rely on them being at byte 0, though: http://en.wikipedia.org/wiki/MP3#File_structure) and its inflated file size (http://www.php.net/manual/en/function.zip-entry-filesize.php). Bail out if an inflated file is too big, or if there are any non-MP3s present.
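
    A minimal sketch of that "scan first, bail out early" idea, using ZipArchive::statIndex() rather than the zip_entry_* functions. The function names and both size limits are assumptions made up for illustration. Note that the sizes come from the archive's own directory, so they are what the archive declares, not a guarantee.

```php
<?php
// Hypothetical limits; tune them for your own site.
const MAX_ENTRY_BYTES = 50 * 1024 * 1024;   // largest single inflated file
const MAX_TOTAL_BYTES = 500 * 1024 * 1024;  // largest total inflated size

// Returns a list of problems; an empty list means the entries passed.
// Each entry needs 'name' and 'size' keys, as in ZipArchive::statIndex().
function find_zip_problems(array $entries): array
{
    $problems = [];
    $total = 0;
    foreach ($entries as $entry) {
        $name = $entry['name'];
        // Reject directories and anything stored inside one.
        if (strpos($name, '/') !== false || strpos($name, '\\') !== false) {
            $problems[] = "directory or nested path: $name";
            continue;
        }
        // This also rejects nested archives such as inner.zip.
        if (strtolower(pathinfo($name, PATHINFO_EXTENSION)) !== 'mp3') {
            $problems[] = "not an .mp3 file: $name";
        }
        if ($entry['size'] > MAX_ENTRY_BYTES) {
            $problems[] = "inflated size too large: $name";
        }
        $total += $entry['size'];
    }
    if ($total > MAX_TOTAL_BYTES) {
        $problems[] = 'total inflated size too large';
    }
    return $problems;
}

// Reads only the archive's directory; nothing is inflated here.
function scan_zip(string $path): array
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return ["cannot open archive: $path"];
    }
    $entries = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $stat = $zip->statIndex($i);
        if ($stat === false) {
            $zip->close();
            return ["cannot read entry $i"];
        }
        $entries[] = $stat;
    }
    $zip->close();
    return find_zip_problems($entries);
}
```

    If scan_zip() returns a non-empty list, the upload can be deleted or flagged without extracting anything.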

  3. Use the following code to check the file names inside a .zip archive:

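    // Note: the zip_* functions are deprecated as of PHP 8.0;
    // ZipArchive is the replacement.
    $zip = zip_open('test.zip');

    // zip_open() returns an integer error code on failure.
    if (!is_resource($zip)) {
        exit("cannot open test.zip\n");
    }

    while ($entry = zip_read($zip)) {
        $file_name = zip_entry_name($entry);
        $ext = pathinfo($file_name, PATHINFO_EXTENSION);
        if (strtoupper($ext) !== 'MP3') {
            notify_admin($file_name); // your own notification function
        }
    }
    zip_close($zip);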
    

    Note that the code above only looks at the extension, meaning that a user can upload anything that has an MP3 extension. To really check whether a file is an MP3 you’ll have to unpack it. I would advise you to do that in a temporary directory.

    After the file is unpacked you can analyze it using, for example, ffmpeg or a similar tool. Having detailed data about bitrate, track length, etc. will be interesting in any case.

    If the analysis fails you can flag the file.
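
    A rough sketch of such an analysis, assuming the ffprobe tool that ships with ffmpeg is installed and on the PATH of a Unix-like server. The function names probe_audio and probe_is_mp3 are made up for illustration.

```php
<?php
// Hypothetical helper: runs ffprobe on an unpacked file and returns the
// decoded "format" section of its JSON output, or null on any failure.
function probe_audio(string $path): ?array
{
    $cmd = 'ffprobe -v error'
         . ' -show_entries format=format_name,duration,bit_rate'
         . ' -of json ' . escapeshellarg($path) . ' 2>/dev/null';
    $json = shell_exec($cmd);
    if (!is_string($json)) {
        return null;
    }
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['format']) || !is_array($data['format'])) {
        return null;
    }
    return $data['format'];
}

// True only when ffprobe identified the container format as "mp3".
function probe_is_mp3(?array $format): bool
{
    return $format !== null && ($format['format_name'] ?? '') === 'mp3';
}
```

    A file for which probe_is_mp3(probe_audio($path)) is false would be flagged. The duration and bit_rate values in the returned array can be stored for display.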