'How Can I Retrieve CLOBs for Unique Records in Group By

I have a table with one CLOB field (s.File) and I need to create a join on this table to get related data. However, in Oracle, CLOBs are not allowed in group by queries. What is the alternative? I have the following query for which I need to retrieve and serialize the CLOB, but I need the related information as well. See below:

select s.FileName, s.File, c.Name as CompanyName
from Source s inner join Company c on s.id = c.id 
group by s.FileName, s.File, c.Name

I tried rewriting the query using row_number and partition by, but couldn't get that solution to work either. Any ideas would be greatly appreciated. The expected results would look something like

  • List item

:

FileName                File        CompanyName
interestincome.xlsx     CLOB        Woodson Company, Inc.
projectdetails.pdf      CLOB        Johnson Products, LLC
expenses2021.xlsx       CLOB        Marshall Consultants, LLP 


Solution 1:[1]

If you want to get the unique CompanyName/Filename combinations and get a single FILE (rather than including the files in the GROUP BY clause) then:

SELECT filename,
       (SELECT "FILE" FROM Source f WHERE f.ROWID = min_rowid) AS "FILE",
       companyname
FROM   (
  select s.FileName,
         c.Name as CompanyName,
         MIN(s.ROWID) AS min_rowid
  from   Source s
         inner join Company c
         on s.id = c.id 
  group by
         s.FileName, c.Name
)

If you want to check that the files are identical then you can use:

SELECT filename,
       (SELECT "FILE" FROM Source f WHERE f.ROWID = min_rowid) AS "FILE",
       companyname
FROM   (
  select s.FileName,
         c.Name as CompanyName,
         MIN(s.ROWID) AS min_rowid
  from   Source s
         inner join Company c
         on s.id = c.id 
  group by
         s.FileName,
         c.Name,
         DBMS_CRYPTO.HASH(s."FILE", DBMS_CRYPTO.HASHSH256)
)

If you want to also consider the unlikely event of hash collisions then you can also compare the start of the files (since any hash collisions are likely to be between files that are not alike):

SELECT filename,
       (SELECT "FILE" FROM Source f WHERE f.ROWID = min_rowid) AS "FILE",
       companyname
FROM   (
  select s.FileName,
         c.Name as CompanyName,
         MIN(s.ROWID) AS min_rowid
  from   Source s
         inner join Company c
         on s.id = c.id 
  group by
         s.FileName,
         c.Name,
         DBMS_LOB.SUBSTR(s."FILE", 1, 4000),
         DBMS_CRYPTO.HASH(s."FILE", DBMS_CRYPTO.HASHSH256)
)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 MT0