'SQL query returns result multiple times

I'm pretty new to SQL and am trying to join some tables in SQL. I'm using SQLite3 and Pandas and have the following table structure:

    User
      |
Measurement - Environment - meas_device - Device
      |                          | 
    Data                 Unit_of_Measurement

Why do I get the result of the following SQL-query multiple times (4x)?

query = """
SELECT User.name, Measurement.id, Data.set_id, Data.subset_id, Data.data 
FROM Measurement
JOIN Data ON Measurement.id = Data.measurement_id
JOIN User ON Measurement.user_id = user.id
JOIN Environment ON Measurement.Environment_id = Environment.id
JOIN meas_device ON Environment.meas_dev_ids = meas_device.id
JOIN Device ON meas_device.device_id = Device.id
JOIN Unit_of_Measurement ON meas_device.Unit_id = Unit_of_Measurement.id
WHERE User.name = 'nicola'
"""

pd.read_sql_query(query, conn)

My guess is that I did something wrong with the joining, but I can not see what. I hoped to be able to save a JOIN statement somewhere that works for every possible query, that's why more tables are joined than necessary for this query.

Update I think the problem lies within the Environment table. Whenever I join this table the results get multiplied. As the Environment is a collection of meas_devices, there are multiple entries with the same Environment id. (I could save the Environment table with the corresponding meas_device_id's as lists, but then I see no possibility to link the Environment table with the meas_device table.)

id | meas_device_id
1  |      1
1  |      2
1  |      5
2  |      3
2  |      4

Up until now i created the tables with pandas DataFrame.to_sql() therefore the id is not marked as primary key or something like that. Could this be the reason for my problem

Update 2 I found the problem. I don't think that actually helps somebody in the future. But for the sake of completeness, here the explanation. It was not really a question of how to link the tables but I neglected a crucial link. Because the Environment has multiple indices with the same value it created "open ends" that resulted in a multiplication of the results. I needed to add a cross-check between Environment.subset_id and Data.subset_id. The following query works fine:

query = f""" SELECT {SELECT}
          FROM Data
          JOIN Measurement ON Data.measurement_id = Measurement.id
          JOIN User ON Measurement.user_id = User.id
          JOIN Environment ON Measurement.Environment_id = Environment.id
          JOIN meas_device ON Environment.meas_dev_ids = meas_device.id
          JOIN Device ON meas_device.Device_id = Device.id
          JOIN Unit_of_Measurement ON meas_device.Unit_id = Unit_of_Measurement.id
          WHERE {WHERE} AND Environment.subset_id = Data.subset_id
          """


Solution 1:[1]

If you need to filter on tables that produce additional rows in the result the when they are joined, don't join them and instead include them in a sub-query in the WHERE clause.

E.g.

SELECT User.name, Measurement.id, Data.set_id, Data.subset_id, Data.data
FROM
    Measurement
    JOIN Data ON Measurement.id = Data.measurement_id
    JOIN User ON Measurement.user_id = user.id
WHERE
    Measurement.Environment_id IN (
        SELECT Environment.id
        FROM
            Environment
            JOIN meas_device ON Environment.meas_dev_ids = meas_device.id
            JOIN Device ON meas_device.device_id = Device.id
            JOIN Unit_of_Measurement ON meas_device.Unit_id = Unit_of_Measurement.id
        WHERE Device.name = 'xy'
    )

In this subquery you can join many tables without generating additional records.

If this is not an option because you want to select entries from other tables as well, you can simply add a DISTINCT to you original query.

SELECT DISTINCT
    User.name, Measurement.id, Data.set_id, Data.subset_id, Data.data 
FROM
   Measurement
   JOIN Data ON Measurement.id = Data.measurement_id
   JOIN User ON Measurement.user_id = user.id
   JOIN Environment ON Measurement.Environment_id = Environment.id
   JOIN meas_device ON Environment.meas_dev_ids = meas_device.id
   JOIN Device ON meas_device.device_id = Device.id
   JOIN Unit_of_Measurement ON meas_device.Unit_id = Unit_of_Measurement.id
WHERE
   User.name = 'nicola'

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1