Get RSS via cURL, fine in browser but 404 error in terminal

A client wants us to deliver content via an RSS feed. They use cURL to fetch the feed contents, but they say they get a 404 error instead. I tried this command in the terminal: $ curl -g --compressed http://mediosymedia.com/wp-content/plugins/nextgen-gallery/xml/media-rss.php > temp.xml and, as the client says, I get the 404 page instead of the feed. When I enter the URI in the browser, it shows the feed without any problem.

I cannot change anything in the client app, so, how can I ensure that they get the feed instead of the 404 error?


Thanks!


2 comments

  1. Indeed, the curl returns a 404 status page:

    $ curl -g --compressed http://mediosymedia.com/wp-content/plugins/nextgen-gallery/xml/media-rss.php -s -o /dev/null -D-
    HTTP/1.1 404 Not Found
    Date: Tue, 04 Mar 2014 08:12:27 GMT
    Server: Apache
    X-Pingback: http://mediosymedia.com/xmlrpc.php
    Expires: Wed, 11 Jan 1984 05:00:00 GMT
    Cache-Control: no-cache, must-revalidate, max-age=0
    Pragma: no-cache
    Transfer-Encoding: chunked
    Content-Type: text/html; charset=UTF-8 
    

    Many web servers are suspicious of requests that lack a browser User-Agent, because they assume curl is being used for scraping. It is not a particularly effective defence, since simply spoofing the User-Agent gets around it:

    $ curl -g --compressed http://mediosymedia.com/wp-content/plugins/nextgen-gallery/xml/media-rss.php -s -o /dev/null -D- -H'User-Agent:  Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:27.0) Gecko/20100101 Firefox/27.0'
    HTTP/1.1 200 OK
    Date: Tue, 04 Mar 2014 08:13:46 GMT
    Server: Apache
    Expires: Wed, 11 Jan 1984 05:00:00 GMT
    Cache-Control: no-cache, must-revalidate, max-age=0
    Pragma: no-cache
    Transfer-Encoding: chunked
    Content-Type: text/xml;charset=utf-8
    

    So, in practice, make sure your requests send a User-Agent other than curl's default.

  2. My initial thought was that this might be related to cookies (see this question), but it may be a localized issue. This works fine from my machine:

    [root@devtest tmp]# curl -g --compressed http://mediosymedia.com/wp-content/plugins/nextgen-gallery/xml/media-rss.php > temp.xml
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100 27926    0 27926    0     0  54564      0 --:--:-- --:--:-- --:--:-- 69815
    

    CORRECTION:

    Thanks to Julien for pointing out that the downloaded file actually contained the custom 404 page. As he mentions, you need to set a user agent on your curl requests, for example with the -A flag (which takes the user agent string itself, without a "User-Agent:" prefix):

    # curl -A "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12" -g --compressed http://mediosymedia.com/wp-content/plugins/nextgen-gallery/xml/media-rss.php > temp.xml
    

    I would just delete my answer, but it’s worth leaving up as a warning to others who might be experiencing this issue – make sure you validate the response!
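Both answers come down to two checks: does the status code depend on the User-Agent, and is the saved file really a feed rather than an error page? The following is only a rough sketch of how those checks might be scripted. The function names (`status_for`, `looks_like_feed`, `fetch_feed`) are made up for illustration and are not part of curl; the only curl features assumed are the `-A`, `-o`, and `-w` options with the `%{http_code}` and `%{content_type}` variables.

```shell
#!/bin/sh
# Hypothetical helpers; the function names are illustrative, not part of curl.

# Print only the HTTP status code for a URL.
# $1 = URL, $2 = optional User-Agent string.
status_for() {
    if [ -n "${2-}" ]; then
        curl -g --compressed -s -o /dev/null -w '%{http_code}' -A "$2" "$1"
    else
        curl -g --compressed -s -o /dev/null -w '%{http_code}' "$1"
    fi
}

# Succeed only when the status is 200 and the Content-Type contains "xml".
# $1 = status code, $2 = Content-Type header value.
looks_like_feed() {
    [ "$1" = "200" ] || return 1
    case $2 in
        *xml*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Download a feed to a file, refusing to keep an error page.
# $1 = URL, $2 = output file, $3 = User-Agent string.
fetch_feed() {
    meta=$(curl -g --compressed -s -A "$3" -o "$2" \
        -w '%{http_code} %{content_type}' "$1") || return 1
    status=${meta%% *}   # text before the first space
    ctype=${meta#* }     # text after the first space
    if looks_like_feed "$status" "$ctype"; then
        return 0
    fi
    echo "not a feed: HTTP $status, Content-Type: $ctype" >&2
    rm -f "$2"
    return 1
}

# Example usage (uncomment to run against the feed from the question):
# url='http://mediosymedia.com/wp-content/plugins/nextgen-gallery/xml/media-rss.php'
# echo "default UA: $(status_for "$url")"
# echo "browser UA: $(status_for "$url" 'Mozilla/5.0')"
# fetch_feed "$url" temp.xml 'Mozilla/5.0' && echo "saved feed to temp.xml"
```

If only the status matters, curl's own `-f`/`--fail` option is a simpler alternative: it makes curl exit with a non-zero status on HTTP error responses instead of quietly saving the error page, though it does not check the Content-Type.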