'width_bucket not returning buckets of equal width

I'm using Postgres version 9.6.9 and attempting to use width_bucket() to generate a histogram with buckets consisting of equal widths. However, the query I'm using is not returning buckets of equal widths.

As you can see in the example below, the values in the bucket have varying widths. e.g. bucket 1 has a min of 7 and a max of 18 - a width of 11. bucket 3 has a min of 52 and a max of 55 - a width of 3.

How can I adjust my query to ensure that each bucket has the same width?

Here's what the data looks like:

 value
-------
 7
 7
15
17
18
22
23
25
29
42
52
52
55
60
74
85
90
90
92
95

(20 rows)

Here's the query and resulting histogram:

WITH min_max AS (
    SELECT 
        min(value) AS min_val,
        max(value) AS max_val
    FROM table
) 
SELECT 
    min(value),
    max(value),
    count(*),
    width_bucket(value, min_val, max_val, 5) AS bucket
FROM table, min_max  
GROUP BY bucket 
ORDER BY bucket;

 min | max | count | bucket
-----+-----+-------+--------
   7 |  23 |     7 |      1
  25 |  42 |     3 |      2
  52 |  55 |     3 |      3
  60 |  74 |     2 |      4
  85 |  92 |     4 |      5
  95 |  95 |     1 |      6
 ( 6 rows )


Solution 1:[1]

From https://prestodb.io/docs/current/functions/window.html

Have a look at ntile():

ntile(n) ? bigint

Divides the rows for each window partition into n buckets ranging from 1 to at most n. Bucket values will differ by at most 1. If the number of rows in the partition does not divide evenly into the number of buckets, then the remainder values are distributed one per bucket, starting with the first bucket.

For example, with 6 rows and 4 buckets, the bucket values would be as follows: 1 1 2 2 3 4

Or say to rank each runner's 100m race times to find their personal best out of their 10 races:

SELECT
NTILE(10) over (PARTITION BY runners ORDER BY racetimes)
FROM
table

Solution 2:[2]

Your buckets are the same size. You just don't have data that accurately represents the end-points.

For instance, would 24 be in the first or second bucket? This is more notable for the ranges without any data, such as 75-83.

Solution 3:[3]

From https://www.oreilly.com/library/view/sql-in-a/9780596155322/re91.html

WIDTH_BUCKET( expression, min, max, buckets)

The buckets argument specifies the number of buckets to create over the range defined by min through max. min is inclusive, whereas max is not.

Maximum is not included. so set

    WIDTH_BUCKET( expression, min, max + 1, buckets)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 bguiz
Solution 2 Gordon Linoff
Solution 3