A LEFT JOIN’s WHERE conditions aren’t turning up proper results

I’m writing a special plug-in for WordPress for a client and have a working keyword / custom-field search feature that works. You enter your keyword or phrase and it searches multiple fields for the keywords, returning only distinct results.

If I search for a newspaper titled “The Herald of Freedom & Light” I get 5 article results.

Read More
SELECT
  SQL_CALC_FOUND_ROWS
  DISTINCT
  wp_posts.* FROM wp_posts
LEFT JOIN `wp_postmeta` ON `wp_posts`.`ID` = `wp_postmeta`.`post_id`
WHERE
  1=1
  AND `post_type` = 'post'
  AND `post_status` = 'publish'
  AND (`wp_postmeta`.`meta_key` = 'newspaper_title' AND `wp_postmeta`.`meta_value` = 'The Herald of Freedom & Torch Light')
GROUP BY wp_posts.ID
ORDER BY wp_posts.post_date DESC
LIMIT 0, 10

If I attempt to search for the same newspaper with a subject of “Politics” I get 0 results (when it should be no less than 3).

SELECT
  SQL_CALC_FOUND_ROWS
  DISTINCT
  wp_posts.* FROM wp_posts
LEFT JOIN `wp_postmeta` ON `wp_posts`.`ID` = `wp_postmeta`.`post_id`
WHERE
  1=1
  AND `post_type` = 'post'
  AND `post_status` = 'publish'
  AND (`wp_postmeta`.`meta_key` = 'newspaper_title' AND `wp_postmeta`.`meta_value` = 'The Herald of Freedom & Torch Light')
  AND (`wp_postmeta`.`meta_key` = 'article_subject' AND `wp_postmeta`.`meta_value` = 'Politics')
GROUP BY wp_posts.ID
ORDER BY wp_posts.post_date DESC
LIMIT 0, 10

I’ve searched around, and most answers to questions about JOINS with multiple conditions say “move the conditions into the join”. Well, I’ve done that with the following query and it turns up 16 results (including post revisions I’ve filtered out!) without properly searching through the keywords I gave. Not only that, but by moving the conditions a 1-condition search like just for the newspaper_title will turn up the same 16 results!

SELECT
  SQL_CALC_FOUND_ROWS
  DISTINCT
  wp_posts.* FROM wp_posts
LEFT JOIN `wp_postmeta` ON
  `wp_posts`.`ID` = `wp_postmeta`.`post_id`
  AND (`wp_postmeta`.`meta_key` = 'newspaper_title' AND `wp_postmeta`.`meta_value` = 'The Herald of Freedom & Torch Light')
  AND (`wp_postmeta`.`meta_key` = 'article_subject' AND `wp_postmeta`.`meta_value` = 'Politics')
WHERE
  1=1
  AND `post_type` = 'post'
  AND `post_status` = 'publish'
GROUP BY wp_posts.ID
ORDER BY wp_posts.post_date DESC
LIMIT 0, 10

How must I rewrite my SQL for these multiple conditions to play nicely together? I’ve got 7 other fields to work into this search functionality with similar conditions.

Related posts

Leave a Reply

4 comments

  1. Your problem is that you’re trying to logically compare values that are contained in different rows. A single row can’t have a meta_key that is both “newspaper_title” and “article_subject”. If you change the AND to an OR, then you’ll receive records that are either or and not both.

    I think the solution here is to use a pivot table for the meta values. The idea here is to aggregate the information contained in multiple rows into a single row per post_id then in the where clause target where all the columns have a value of 1. I’ve put together a script as an example based on what information you’ve provided:

    Please ensure this script is run on a test environment and does not conflict with your existing data

    create table wp.posts (post_id int, description varchar(25), post_date date);
    create table wp.meta (post_id int, meta_key varchar(15), meta_value varchar(25));
    
    -- Setup post records
    insert into wp.posts values
        (1, 'Post #1', MAKEDATE(2011, 5)),  (2, 'Post #2', MAKEDATE(2011, 8)),
        (3, 'Post #3', MAKEDATE(2011, 30)), (4, 'Post #4', MAKEDATE(2011, 5)),
        (5, 'Post #5', MAKEDATE(2011, 7)),  (6, 'Post #6', MAKEDATE(2011, 2));
    
    -- Setup meta data for post records                           
    insert into wp.meta values
    (1, 'newspaper_title', 'NY Post'),    (2, 'newspaper_title', 'NY Post'),          
    (1, 'day', 'Monday'),                 (2, 'day', 'Wednesday'),
    (1, 'article_subject', 'Local'),      (2, 'article_subject', 'Politics'),
    
    (3, 'newspaper_title', 'The Times'),  (4, 'newspaper_title', 'The Times'),     
    (3, 'day', 'Friday'),                 (4, 'day', 'Tuesday'),   
    (3, 'article_subject', 'Politics'),   (4, 'article_subject', 'Politics'),
    
    (5, 'newspaper_title', 'The Herald'), (6, 'newspaper_title', 'Daily Tribune'),    
    (5, 'day', 'Sunday'),                 (6, 'day', 'Wednesday'), 
    (5, 'article_subject', 'Arts'),       (6, 'article_subject', 'Local');
    
    -- Show all the data
    SELECT p.description, p.post_date, meta_key, meta_value
    FROM wp.posts p JOIN wp.meta m ON (p.post_id = m.post_id)
    ORDER BY p.post_id;
    
    -- Search based on newspaper_title = 'The Times' AND article_subject = 'Politics'    
    SELECT p.*
    FROM wp.posts p
    JOIN
      (
        SELECT post_id,
               max(CASE WHEN (meta_key = 'newspaper_title' AND meta_value = 'The Times')  
                   THEN 1 ELSE 0 END) targetNewspaper,
               max(CASE WHEN (meta_key = 'article_subject' AND meta_value = 'Politics') 
                   THEN 1 ELSE 0 END) targetSubject
        FROM wp.meta
        GROUP BY post_id
      ) m
    ON (p.post_id = m.post_id)
    WHERE targetNewspaper = 1 AND targetSubject = 1
    ORDER BY p.post_date;
    

    The final query in the script is the one you’re after. With the test dataset it returns:

    post_id     description               post_date                 
    ----------- ------------------------- ------------------------- 
    4           Post #4                   2011-01-05                
    3           Post #3                   2011-01-30  
    

    For each attribute you need to check, you would add an additional case statement as shown above in the meta query and add to the where clause the condition to check whether it was found. (i.e. newTargetedValue = 1)

    Update based on OP comment:

    In my opinion, the score or count method isn’t as flexible as using a pivot table. The inner/pivot table is essentially setting flags for the attributes that have matched based on the cases you’ve provided. (The value will be 1 or 0) In your current example, you’re just ANDing all those together so everything has to be set so a score or count could be used. If you later were required to logically compare those attributes to accommodate a more advanced search, the count/score no longer works. I’ll try to explain with an example.

    Say I asked you to add to the search results you’ve already provided in the question, where I want all posts that had a meta value of ‘day’ = ‘Sunday’ regardless of the paper. So in short I want:

    • All “political” columns from “The Times”.
    • Along with all posts that occurred on a “Sunday” (regardless of the newspaper it was in)

    That wouldn’t work with a count/score because matching rows can return 1, 2, or 3 rows depending on how many attributes match.

    • Count = 1 (i.e Sunday post articles attribute only)
    • Count = 2 Any 2 attributes matched (i.e Sunday post and an article about politics)
    • Count = 3 Matched all criteria (i.e. Politics article in the Sunday edition of ‘The Times’)

    With a pivot table, you can still use logical expressions: (Including the meta flags for clarity)

    SELECT p.*, m.targetNewspaper, targetSubject, targetDay
    FROM wp.posts p
    JOIN
      (
        SELECT post_id,
               max(CASE WHEN (meta_key = 'newspaper_title' AND meta_value = 'The Times')  
                   THEN 1 ELSE 0 END) targetNewspaper,
               max(CASE WHEN (meta_key = 'article_subject' AND meta_value = 'Politics') 
                   THEN 1 ELSE 0 END) targetSubject,
               max(CASE WHEN (meta_key = 'day' AND meta_value = 'Sunday')               
                   THEN 1 ELSE 0 END) targetDay
        FROM wp.meta
        GROUP BY post_id
      ) m
    ON (p.post_id = m.post_id)
    WHERE (targetNewspaper = 1 AND targetSubject = 1) OR targetDay = 1
    ORDER BY p.post_date;
    

    Here are the results:

    post_id  description   post_date   targetNewspaper   targetSubject   targetDay            
    -------- ------------- ----------- ----------------- --------------- ----------- 
    4        Post #4       2011-01-05  1                 1               0                    
    5        Post #5       2011-01-07  0                 0               1                    
    3        Post #3       2011-01-30  1                 1               0                    
    

    Yes, it looks somewhat complex, but once you have the initial idea it’s pretty straight-forward as to how you go about adding more search targets and how to logically compare them to get the records you’re after.

    Hope that explanation made things a little more digestible.

  2. Your code as it is tries to find rows where the meta_key is both “newspaper_title” and “article_subject” at the same time. That is of course impossible. What you really want to ask is, “which rows in wp_posts have a row in wp_postmeta with “newspaper_title” and another row with “article_subject”.

    You can do that with multiple JOINs, but they will explode pretty quickly as you have more and more criteria to check. Another way is to check that the number of rows that match any of the criteria matches the number of criteria:

    SELECT
        PT.parent_id
    FROM
        Parent_Table PT
    LEFT OUTER JOIN Child_Table CT ON
        CT.parent_id = PT.parent_id AND
        CT.tag IN (@tag1, @tag2)
    GROUP BY
        PT.parent_id
    HAVING
        COUNT(DISTINCT CT.tag) = 2
    

    You can change the above query to use a subquery instead, like 2 = (SELECT COUNT(*)...)

    Using your specific situation (please excuse any minor syntax problems since I don’t normally work with MySQL):

    SELECT
        wp_posts.*
    FROM
        wp_posts wp
    INNER JOIN (
        SELECT
            wp2.id,
            COUNT(*) AS cnt
        FROM
            wp_posts wp2
        INNER JOIN wp_postmeta wpm ON
            wpm.post_id = wp2.id AND
            (wpm.meta_key = 'newspaper_title' AND wpm.meta_value = 'The Herald of Freedom & Torch Light') OR
            (wpm.meta_key = 'article_subject' AND wpm.meta_value = 'Politics')
        GROUP BY
            wp2.id
        ) AS SQ ON SQ.id = wp.id AND SQ.cnt = 2
    
  3. A LEFT JOIN is an OUTER join, which means you will get all rows from table on the left, with matching rows or null values if match is not found from table on the right.

    Switch to INNER JOIN to limit number of rows returned to only these, that have matches in table on the right.

  4. I figured I would share what I’ve found with your help.

    1.) This is an schema issue.

    We left WordPress’ post/post-meta format in place and used a plugin to store our record data within the post-meta tables. This created a major issue: instead of having a definitive table designed to contain our data, we had a post (with just a title) and a slew of associated post-meta rows (sorta key-value style).

    Therefore searching with 1 query is impractical.

    2.) The client demanded search across multiple fields (late).

    I took our existing keyword search that operated across all relevant post-meta rows and attempted to expand it. This originally worked, at the time, because it grabbed the first result and returned it… and it didn’t care about multiple matches. As soon as I tried to have multiple conditions that applied only to the post-meta table I pushed the limit of what SQL should do.

    We really should have made our own table.

    3.) We really should’ve made our own table from the beginning.

    I knew it was possible that the client would want to search in very specific ways. And, I should have foreseen the difficulties of spreading the actual column data over multiple rows in a second table. Except for an edge-case this is bad design.

    Conclusion

    When you have need of searching through nearly every attribute of a post simultaneously don’t rely upon WordPress’ Custom Fields. They are very powerful for everything except searching. Build your own table, link it to posts, search in your custom table and join in the valid posts—not the other way around!