What is the optimized/best way to retrieve data from two tables?

I have two tables:

post table:

|post_id | post_title |
+--------+------------+
| 1      | Post 1     |
| 2      | Post 2     |
| 3      | Post 3     |

post_creator table:

|post_id | creator |
+--------+---------+
| 1      | John    | 
| 1      | Smith   | 
| 1      | Mike    |
| 2      | Bob     |
| 3      | Peter   |
| 3      | Brad    |

When I join these tables it looks like this.

SELECT * 
FROM post p
JOIN post_creator c ON p.post_id = c.post_id


|post_id | post_title | post_id | creator|
+----------------------------------------+
| 1      | Post 1     | 1       | John   |
| 1      | Post 1     | 1       | Smith  |
| 1      | Post 1     | 1       | Mike   |
| 2      | Post 2     | 2       | Bob    |
| 3      | Post 3     | 3       | Peter  |
| 3      | Post 3     | 3       | Brad   |

I want to grab each post with its creators, but the joined result repeats the same post once for every creator.

What I did was fetch all the data from the post table first, then loop over that result and fetch the creators of each post inside the loop. But this runs a separate query for every post just to get its creators.

$sql = "SELECT * FROM post";
$stmt = $conn->prepare($sql);
$stmt->execute();
$res = $stmt->fetchAll(PDO::FETCH_OBJ);

$dataArr = [];

foreach($res as $post){
  $sql = "SELECT creator FROM post_creator WHERE post_id=$post->post_id";
  $stmt = $conn->prepare($sql);
  $stmt->execute();
  $creators = $stmt->fetchAll(PDO::FETCH_OBJ);

  // create a fresh object per post; one shared object would make
  // every array entry point at the last post
  $dataObj = new stdClass;
  $dataObj->post_id = $post->post_id;
  $dataObj->post_title = $post->post_title;
  $dataObj->creators = $creators;

  array_push($dataArr, $dataObj);
}

So finally my dataArr has this kind of a structure.

[
  {
    post_id: 1, 
    post_title: Post 1, 
    creators:[John, Smith, Mike] 
  },
  
  {
    post_id: 2, 
    post_title: Post 2, 
    creators:[Bob] 
  },

  {
    post_id: 3, 
    post_title: Post 3, 
    creators:[Peter, Brad] 
  },
]

This is what I wanted. Now I can loop this and render to a view.

Is there a more optimized way to get this result without looping and querying again and again?



Solution 1:[1]

I think you need to use group_concat to group your creators.

SELECT p.post_id, post_title, group_concat(creator) 
FROM post p
JOIN post_creator using(post_id) 
group by p.post_id
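
group_concat returns all creators of a post as one comma-separated string, so the PHP side still has to split it. The following is only a minimal sketch of that step: the helper name splitCreators is hypothetical, and it assumes the aggregate column has been aliased as creators (for example `group_concat(creator) AS creators`), which the query above does not do. Sample rows stand in for the result of `$stmt->fetchAll(PDO::FETCH_ASSOC)`.

```php
<?php
// Hypothetical helper: expects rows shaped like the GROUP_CONCAT query's
// output, with the aggregate aliased as `creators` (an assumed alias).
function splitCreators(array $rows, string $separator = ','): array
{
    $posts = [];
    foreach ($rows as $row) {
        $posts[] = [
            'post_id'    => (int) $row['post_id'],
            'post_title' => $row['post_title'],
            // NULL or '' means the post has no creators at all
            'creators'   => ($row['creators'] === null || $row['creators'] === '')
                ? []
                : explode($separator, $row['creators']),
        ];
    }
    return $posts;
}

// Sample rows standing in for $stmt->fetchAll(PDO::FETCH_ASSOC)
$rows = [
    ['post_id' => '1', 'post_title' => 'Post 1', 'creators' => 'John,Smith,Mike'],
    ['post_id' => '2', 'post_title' => 'Post 2', 'creators' => 'Bob'],
    ['post_id' => '3', 'post_title' => 'Post 3', 'creators' => 'Peter,Brad'],
];
$posts = splitCreators($rows);
```

Two caveats with this approach: a creator name that itself contains a comma would be split incorrectly unless a different SEPARATOR is chosen in the query, and MySQL truncates the concatenated string at group_concat_max_len (1024 bytes by default).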

Additionally, this:

$sql = "SELECT creator FROM post_creator WHERE post_id=$post->post_id";
$stmt = $conn->prepare($sql);
$stmt->execute();

is improper usage of a prepared statement. It should be written as:

$sql = "SELECT creator FROM post_creator WHERE post_id=?";
$stmt = $conn->prepare($sql);
$stmt->execute(array($post->post_id));

if it were needed, but with the query above it is not. Always bind values; never put them directly into the SQL string.

Solution 2:[2]

I'd say there are 3 different roads you could follow, each of which has some benefit or another.

Option 1. Simple SELECT query with JOIN (and overlapping rows)

This is more or less what you've already tried with the first query you listed, which resulted in duplicate rows.

It's fairly trivial to modify your application code to deal with the dupes, and simply fold the creators into the same array/object. The overhead is almost nil as well. From a relational database design point-of-view, this method is still the best practice.

   SELECT p.post_id
        , p.post_title
        , c.creator
     FROM post         p
LEFT JOIN post_creator c 
       ON p.post_id = c.post_id
 ORDER BY p.post_id ASC


/* $rows = ...query...; */
$posts = [];
foreach ($rows as $row) {
    $id = $row['post_id'];
    if (!isset($posts[$id])) {
        // first row for this post_id: create the post entry
        $posts[$id] = [
            'id'       => $id,
            'title'    => $row['post_title'],
            'creators' => [],
        ];
    }
    // the LEFT JOIN yields a NULL creator for posts without any creators
    if ($row['creator'] !== null) {
        $posts[$id]['creators'][] = $row['creator'];
    }
}

Option 2. Multivalue columns (arrays or json)

A slightly more pragmatic solution for non-purists can be to have your query produce output columns which contain more than one value. This generally means either a JSON or an ARRAY column. The exact details depend on your choice of database system.

In either case, you'd combine it with the SQL GROUP BY feature. Let's assume you use MySQL and prefer the JSON type; you'd then go with a query such as:

    SELECT p.post_id
         , p.post_title
         , JSON_ARRAYAGG(c.creator) AS creators
      FROM post         p
 LEFT JOIN post_creator c 
        ON p.post_id = c.post_id
  GROUP BY p.post_id
  ORDER BY p.post_id ASC

This way, you'll only receive one record per post, and you'll get a value such as ["Mike", "Paul", "Susan"], which json_decode() can turn into a proper PHP array.
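
As a rough illustration of the decoding step, the sketch below turns the creators column of each fetched row into a PHP array. The helper name decodeCreators is hypothetical and the sample rows stand in for a real result set. It also assumes that, with the LEFT JOIN above, a post without creators comes back as [null], which the helper filters out.

```php
<?php
// Hypothetical helper: decodes the JSON `creators` column of each row.
function decodeCreators(array $rows): array
{
    foreach ($rows as &$row) {
        $creators = json_decode($row['creators'] ?? '[]', true);
        // drop the null entry expected for posts without any creators
        $row['creators'] = array_values(array_filter(
            is_array($creators) ? $creators : [],
            function ($creator) {
                return $creator !== null;
            }
        ));
    }
    unset($row); // break the reference left over from the loop
    return $rows;
}

// Sample rows standing in for $stmt->fetchAll(PDO::FETCH_ASSOC)
$decoded = decodeCreators([
    ['post_id' => 1, 'post_title' => 'Post 1', 'creators' => '["John", "Smith", "Mike"]'],
    ['post_id' => 4, 'post_title' => 'Post 4', 'creators' => '[null]'],
]);
```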

Option 3. Full-blown documents

Another alternative that kind of builds upon option #2 is to go entirely with JSON, and abandon the relational recordset altogether.

Most modern DBMSs have plenty of JSON functionality, and the format you yourself listed as dataArr could be produced entirely by the database in response to a single SELECT query.

This way, the query would always result in just 1 row with 1 single column, which holds the entire dataArr combining all those posts (which again, can be turned into a native PHP array or object tree with json_decode, just like before).

While the result of this method can be very neat (depending on the way your application is written), some may wonder why you're using an RDBMS and not something like MongoDB.
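
To make Option 3 a little more concrete, here is an untested sketch of what such a query might look like, assuming MySQL 5.7.22 or later (where JSON_ARRAYAGG and JSON_OBJECT are available). The alias names t and data are arbitrary choices, and the hand-written JSON string only stands in for what the database would return. Aggregate functions cannot be nested directly, so the per-post aggregation sits in a derived table.

```php
<?php
// Untested sketch, assuming MySQL 5.7.22+ (JSON_ARRAYAGG / JSON_OBJECT).
// The inner query builds one creators array per post; the outer query
// wraps every post into a single JSON array.
$sql = <<<'SQL'
SELECT JSON_ARRAYAGG(
         JSON_OBJECT(
           'post_id',    t.post_id,
           'post_title', t.post_title,
           'creators',   t.creators
         )
       ) AS data
  FROM (
        SELECT p.post_id
             , p.post_title
             , JSON_ARRAYAGG(c.creator) AS creators
          FROM post         p
     LEFT JOIN post_creator c
            ON p.post_id = c.post_id
      GROUP BY p.post_id, p.post_title
       ) AS t
SQL;

// With a PDO connection this would be: $json = $conn->query($sql)->fetchColumn();
// A hand-written sample of the expected shape stands in for it here.
$json = '[{"post_id": 1, "post_title": "Post 1", "creators": ["John", "Smith", "Mike"]},'
      . ' {"post_id": 2, "post_title": "Post 2", "creators": ["Bob"]}]';

// One call rebuilds the whole structure as nested PHP arrays
$dataArr = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
```

Note that the order of elements produced by JSON_ARRAYAGG is not guaranteed, so any required ordering of posts or creators may have to be applied in PHP afterwards.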


Overall, I'd recommend Option 1.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 user3783243
Solution 2 Raxi