Unable to figure out a SQL query involving 4 tables in MySQL

There is a query in SQL where I am badly stuck, I have tried every possible way I could but couldn’t get the solution. I have 4 tables named: user, item, buys, rates.

CREATE TABLE User (
    id integer,
    name varchar(30),
    Primary Key (id)
)

INSERT INTO User(id, name)
VALUES
('1', 'Lorren'),
('2', 'Smith'),
('3', 'Stephen'),
('4', 'David'),
('5', 'Sophie'),
('6', 'Alex'),
('7', 'Henry'),
('8', 'Jasmine'),
('9', 'Anderson'),
('10', 'Bilal')

CREATE TABLE Item (
    id integer,
    description varchar(50),
    category varchar(30),
    price integer,
    Primary Key (id)
)

INSERT INTO Item(id, description, category, price)
VALUES
('50', 'Princess Diary', 'Book', '8'),
('51', 'Frozen', 'Book', '4'),
('52', 'Tangled', 'Book', '3'),
('53', 'Oak Table', 'Furniture', '370'),
('54', 'Doble Bed', 'Furniture', '450'),
('55', 'Metal Cupboard', 'Furniture', '700'),
('56', 'Levi 501', 'Clothes', '90'),
('57', 'Corduroy Coat', 'Clothes', '230'),
('58', 'Straight Trousers', 'Clothes', '45'),
('59', 'Black Sequin Top', 'Clothes', '85')


CREATE TABLE Buys (
    user integer,
    item integer,
    price integer,
    Primary Key (user, item),
    Foreign key (user) REFERENCES User(id),
    Foreign Key (item) REFERENCES Item(id)
)

INSERT INTO Buys
VALUES ('1', '52', '3'),
('1', '56', '90'),
('2','56','100'),
('2', '54', '450'),
('5', '53', '400'),
('5', '55', '700'),
('5', '59', '90'),
('6', '57', '230'),
('10', '58', '50'),
('8', '50', '8')


CREATE TABLE Rates (
    user integer,
    item integer,
    rating integer CHECK (0<=rating<=5),
    Primary Key (user, item),
    Foreign key (user) REFERENCES User(id),
    Foreign Key (item) REFERENCES Item(id)
)

INSERT INTO Rates
VALUES
('1', '52', '5'),
('1', '56', '3'),
('2', '54', '5'),
('2', '55', '4'),
('2', '56', '2'),
('5', '53', '5'),
('5', '55', '5'),
('8', '50', '1'),
('8', '55', '3'),
('9', '55', '4')

I have to find against each user all the items not bought by him, but display only those item/items among them which has/have the highest average rating. So the result should display only those item/items which were not bought by him and have the highest average rating. Rating is 1-5, and each item may have different ratings so avg rating can be calculated for each, but I am not able to find out items with highest avg rating gaainst each user which are not bought by him. I am working in MYSQL, I am stuck in here for 6 days and even my friends tried nobody could solve it. Can anybody help?

Read More

Expected Output considering the current tables should be like this:

User    Items With Highest Average      
Lorren     53
Lorren     54
Smith      52
Smith      53
Stephen    52
Stephen    53
Stephen    54
David      52
David      53
David      54
Sophie     52
Sophie     54
Alex       52
Alex       53
Alex       54
Henry      52
Henry      53
Henry      54
Jasmine    52
Jasmine    53
Jasmine    54
Anderson   52
Anderson   53
Anderson   54
Bilal      52
Bilal      53
Bilal      54

Related posts

Leave a Reply

2 comments

  1. Ok, definitely not my prettiest work, especially since I don’t usually work in MySQL (EDIT: SQLFiddle is back up. Fixed an inner group, now this works):

    SELECT topItemsAllUsers.* FROM
      (SELECT 
        u.id AS userId, 
        u.name, 
        topItems.itemId
      FROM
        (SELECT 
          iwa.id AS itemId
        FROM
          (SELECT 
            MAX(AverageRating) AS MaxRating
          FROM
            (SELECT 
              i.id, 
              AVG(COALESCE(r.rating, 0)) AS AverageRating
            FROM Item i
            LEFT JOIN Rates r ON r.item = i.id
            GROUP BY i.id
            ) AS averages
          ) AS MaxOuterRating
          INNER JOIN 
          (SELECT 
            i.id, 
            AVG(COALESCE(r.rating, 0)) AS AverageRating
          FROM Item i
          LEFT JOIN Rates r ON r.item = i.id
          GROUP BY i.id
          ) as iwa ON iwa.AverageRating = MaxOuterRating.MaxRating
        ) as topItems
      CROSS JOIN
      User u
      ) as topItemsAllUsers
    LEFT JOIN Buys b ON topItemsAllUsers.userId = b.user AND topItemsAllUsers.itemId = b.item
    WHERE b.user IS NULL
    

    Here is the SQLFiddle

    In TSQL, I would at least use a CTE for that table of average ratings. This was a lot tougher than it looked initially!

    EDIT: Some explanation follows. The first thing to get is each item’s average rating, using 0 for items with no rating (hence the COALESCE() statement):

    (SELECT 
      i.id, 
      AVG(COALESCE(r.rating, 0)) AS AverageRating
    FROM Item i
    LEFT JOIN Rates r ON r.item = i.id
    GROUP BY i.id)
    

    This will list each item id once with its average rating. I named this averages, and I actually used it query twice (naming it iwa the second time. I don’t remember what “iwa” was supposed to mean any more…), once to get the actual highest rating:

    SELECT
      MAX(AverageRating) AS MaxRating
    FROM averages
    

    and named that MaxOuterRating, then INNER JOINed that result back onto iwa, on AverageRating = MaxRating , to only get item(s) with that highest rating:

    SELECT
      iwa.itemId
    FROM
    MaxOuterRating
    INNER JOIN iwa ON iwa.AverageRating = MaxOuterRating.MaxRating
    

    This result is contained in the topItems alias.

    Now that we have only the item(s) with the top rating, CROSS JOIN with User to get a table with every top item for every user:

    SELECT 
      ... 
    FROM
    topItems
    CROSS JOIN
    Users
    

    This result is in topItemsAllUsers.

    Finally, do LEFT JOIN with Buys on both user id and item id, then restrict results to only those rows where there are no Buys records associated (This is usually called an exclude join):

    SELECT 
      topItemsAllUsers.*
    FROM
    topItemsAllUsers
    LEFT JOIN Buys b ON topItemsAllUsers.userId = b.user AND topItemsAllUsers.itemId = b.item
    WHERE b.user IS NULL
    

    Et viola. None of the operations is particularly difficult, but they are nested up so badly, it was difficult to see how to attack. I do not doubt this could be improved greatly, but this does return expected results.

  2. So just for starters, the list of items not bought by respective users is as follows, right?

        SELECT u.*
             , i.* 
          FROM user u 
          JOIN item i 
          LEFT 
          JOIN buys b 
            ON b.user = u.id 
           AND b.item = i.id 
         WHERE b.item IS NULL;
    

    … in which case …

    SELECT x.* FROM
    (
    SELECT u.id user_id
         , u.name
         , i.id item_id
         , i.description
         , i.category
         , i.price
         , r.rating
      FROM user u 
      JOIN item i 
      LEFT 
      JOIN buys b 
        ON b.user = u.id AND b.item = i.id
      JOIN rates r
        ON r.item = i.id
     WHERE b.item IS NULL
    ) x
    JOIN
    (
    SELECT u.id,r.rating 
      FROM user u 
      JOIN item i 
      LEFT 
      JOIN buys b 
        ON b.user = u.id AND b.item = i.id
      JOIN rates r
        ON r.item = i.id
      JOIN (SELECT AVG(rating) max_avg FROM rates GROUP BY item ORDER BY AVG(rating) DESC LIMIT 1) n
        ON n.max_avg = r.rating
     WHERE b.item IS NULL
     GROUP 
        BY u.id
     ) y
     ON y.id = x.user_id
    AND y.rating = x.rating
    ORDER 
      BY user_id,item_id;
    

    … should produce the desired result

    Edited to incorporate Paul Griffin’s observation, although in doing so, I’ve perhaps made the query more convoluted than it needs to be.