MySQL Order before Group by

I need to find the latest post for each author and then group the results so I only a single latest post for each author.

SELECT wp_posts.* FROM wp_posts
        WHERE wp_posts.post_status='publish'
        AND wp_posts.post_type='post'
        GROUP BY wp_posts.post_author           
        ORDER BY wp_posts.post_date DESC

This is correctly grouping the output so I only get one post per author, but it is ordering the results after they have been grouped and not before they have been selected.

Related posts

Leave a Reply

10 comments

  1. select wp_posts.* from wp_posts

    where wp_posts.post_status='publish'and wp_posts.post_type='post'

    group by wp_posts.post_author

    having wp_posts.post_date = MAX(wp_posts.post_date) /* ONLY THE LAST POST FOR EACH AUTHOR */

    order by wp_posts.post_date desc


    EDIT:

    After some comments I have decided to add some additional informations.

    The company I am working at also uses Postgres and especially SQL Server. This databases don’t allow such queries. So I know that there is a other way to do this (I write a solution below). You shoud also have to know what you do if you don’t group by all columns treated in the projection or use aggregate functions. Otherwise let it be!

    I chose the solution above, because it’s a specific question. Tom want to get the recent post for each author in a wordpress site. In my mind it is negligible for the analysis if a author do more than one post per second. WordPress should even forbid it by its spam-double-post detection. I know from personal experience that there is a really significant benefit in performance doing a such dirty group by with MySQL. But if you know what you do, then you can do it! I have such dirty groups in apps where I’m professionally accountable for. Here I have tables with some mio rows which need 5-15s instead of 100++ seconds.

    May be useful about some pros and cons: http://ftp.nchu.edu.tw/MySQL/tech-resources/articles/debunking-group-by-myths.html


    SELECT
        wp_posts.*
    FROM 
        wp_posts
        JOIN 
        (
            SELECT
                g.post_author
                MAX(g.post_date) AS post_date
            FROM wp_posts as g
            WHERE
                g.post_status='publish'
                AND g.post_type='post'
            GROUP BY g.post_author
        ) as t 
        ON wp_posts.post_author = t.post_author AND wp_posts.post_date = t.post_date
    
    ORDER BY wp_posts.post_date
    

    But if here is more then one post per second for a author you will get more then one row and not the only last one.

    Now you can spin the wheel again and get the post with the highest Id. Even here it is at least not guaranteed that you really get the last one.

  2. Not sure if I understand your requirement correct but following inner statement gets the list of the latest post_date for each author and joins these back with the wp_posts table to get a complete record.

    SELECT  *
    FROM    wp_posts wp
            INNER JOIN (
              SELECT  post_author
                      , MAX(post_date) AS post_date
              FROM    wp_posts
              WHERE   post_status = 'publish'
                      AND post_type = 'post'
              GROUP BY
                      post.author
            ) wpmax ON wpmax.post_author = wp.post_author
                       AND wpmax.post_date = wp.post_date
    ORDER BY
            wp.post_date DESC
    
  3. I think that @edze response is wrong.

    In the MySQL manual you can read:

    MySQL extends the use of GROUP BY so that the select list can refer to
    nonaggregated columns not named in the GROUP BY clause. You can use
    this feature to get better performance by avoiding unnecessary column
    sorting and grouping. However, this is useful primarily when all
    values in each nonaggregated column not named in the GROUP BY are the
    same for each group. The server is free to choose any value from each
    group, so unless they are the same, the values chosen are
    indeterminate. Furthermore, the selection of values from each group
    cannot be influenced by adding an ORDER BY clause. Sorting of the
    result set occurs after values have been chosen, and ORDER BY does
    not affect which values the server chooses.

    Two great references:

    Sorry, but I can not comment the @edze response because of my reputation, so I have written a new answer.

  4. Do a GROUP BY after the ORDER BY by wrapping your query with the GROUP BY like this:

    SELECT t.* FROM (SELECT * FROM table ORDER BY time DESC) t GROUP BY t.author
    
  5. it doesn’t matter if you order before or after the group-statement, because order means only that 213 goes to 123 or 321 and not more. group by takes only SOME entry per column, not only the latest. I consider you working with subselects here like

    SELECT wp_posts.* FROM wp_posts
            WHERE wp_posts.post_status='publish'
            AND wp_posts.post_type='post'
            AND wp_posts.post_date = (Select max(post_date) from wp_posts where author = ... )
    
  6. What do you think about this?? Seems to work for me

    SELECT wp_posts.post_author, MAX(wp_posts.post_date), wp_posts.status, wp_posts.post_type
    FROM wp_posts
        WHERE wp_posts.post_status='publish'
        AND wp_posts.post_type='post'
        GROUP BY wp_posts.post_author
    

    It brings me all the Authors with the most updated post_date … Do you identify a problem there?? I don’t

  7.     SELECT wp_posts.*,max(wp_posts.post_date) FROM wp_posts
        WHERE wp_posts.post_status='publish'
        AND wp_posts.post_type='post'
        GROUP BY wp_posts.post_author 
    
  8. When our table became large, performance need to checked also.
    I checked all the options in the questions here, with a PM system with a 136K messages and link table with 83K rows.

    When you need only count, or only IDs – Alex’s solution is the best.

    SELECT wp_posts.post_author, MAX(wp_posts.post_date), wp_posts.status, wp_posts.post_type
    FROM wp_posts
        WHERE wp_posts.post_status='publish'
        AND wp_posts.post_type='post'
        GROUP BY wp_posts.post_author
    

    When you need other fields, I need to modify Husky110 solution (to my table design – here it is only example – not checked), that in my tables 10x faster than the subquery option:

    SELECT wp_posts.* FROM wp_posts,
        (Select post_id as pid,  max(post_date) maxdate from wp_posts where author = ... group by author order by maxdate  desc limit 4) t
        WHERE wp_posts.post_status='publish'
        AND wp_posts.post_type='post'
        AND wp_posts.post_id = pid
    

    This change can select more than one post (one for user, for example), and can be modified to other solutions.

    Moshe.

  9. Use the below code…

    <?php
    //get all users, iterate through users, query for one post for the user,
    //if there is a post then display the post title, author, content info
    $blogusers = get_users_of_blog();
    if ($blogusers) {
      foreach ($blogusers as $bloguser) {
        $args = array(
        'author' => $bloguser->user_id,
          'showposts' => 1,
          'caller_get_posts' => 1
        );
        $my_query = new WP_Query($args);
        if( $my_query->have_posts() ) {
          // $user = get_userdata($bloguser->user_id);
          // echo 'This is one post for author with User ID: ' . $user->ID . ' ' . $user-    >user_firstname . ' ' . $user->user_lastname;
          while ($my_query->have_posts()) : $my_query->the_post(); ?>
            <a href="<?php the_permalink() ?>" rel="bookmark" title="Permanent Link to <?    php the_title_attribute(); ?>"><?php the_title(); ?></a>
    
            <small><?php the_time('F jS, Y') ?> by <?php the_author_posts_link() ?>     </small><?php
            the_content();
          endwhile;
        }
      }
    }
    ?>