'SQL rank counting occurences
In my table I have data from employee contracts like this:
And I want to get job position in time periods, something like this
login | ValidFrom | ValidTo | JobPosition |
---|---|---|---|
bcde | 2019-07-01 | 2019-09-30 | Project Manager |
bcde | 2020-01-09 | 2020-06-16 | Head of Center of Excellence |
bcde | 2020-06-17 | 2021-07-31 | Team Leader |
bcde | 2021-08-01 | 2099-12-31 | Head of Center of Excellence |
So I write query:
select DimEmployeeId, JobPosition, login, min(ValidFrom), max(ValidTo)
from employeeContracts
group by DimEmployeeId, JobPosition, login
But It doesn't work in this case (if someone has the same JobPosition), so I decided to use dense_rank, like this:
select login, ValidFrom, ValidTo, JobPosition
,dense_rank() OVER (Partition BY JobPosition ORDER BY login, ValidFrom, ValidTo, JobPosition) as no
from employeeContracts
and after that
select DimEmployeeId, JobPosition, login, min(ValidFrom), max(ValidTo)
from employeeContracts
group by DimEmployeeId, JobPosition, no, login
but problem is dense_rank doesn't work as I need ;) I want to get something like this:
login | ValidFrom | ValidTo | JobPosition | no |
---|---|---|---|---|
bcde | 2019-07-01 | 2019-09-30 | Project Manager | 1 |
bcde | 2020-01-09 | 2020-06-16 | Head of Center of Excellence | 2 |
bcde | 2020-06-17 | 2020-07-31 | Team Leader | 3 |
bcde | 2020-08-01 | 2021-03-31 | Team Leader | 3 |
bcde | 2021-04-01 | 2021-06-30 | Team Leader | 3 |
bcde | 2021-07-01 | 2021-07-31 | Team Leader | 3 |
bcde | 2021-08-01 | 2021-12-31 | Head of Center of Excellence | 4 |
bcde | 2022-01-01 | 2022-05-09 | Head of Center of Excellence | 4 |
bcde | 2022-02-01 | 2022-05-09 | Head of Center of Excellence | 4 |
bcde | 2022-05-09 | 2099-12-31 | Head of Center of Excellence | 4 |
and after that use query to get final result:
select DimEmployeeId, JobPosition, login, min(ValidFrom), max(ValidTo)
from employeeContracts
group by DimEmployeeId, JobPosition, no, login
Example Data Scheme
CREATE TABLE employeeContracts (
login text,
ValidFrom datetime,
ValidTo datetime,
JobPosition text
);
INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition)
VALUES ('bcde', '2019-07-01', '2019-09-30', 'Project Manager');
INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition)
VALUES ('bcde', '2020-01-09', '2020-06-16', 'Head of Center of Excellence');
INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition)
VALUES ('bcde', '2020-06-17', '2020-07-31', 'Team Leader');
INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition)
VALUES ('bcde', '2020-08-01', '2021-03-31', ' Team Leader');
INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition)
VALUES ('bcde', '2021-04-01', '2021-06-30', ' Team Leader');
INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition)
VALUES ('bcde', '2021-07-01', '2021-07-31', ' Team Leader');
INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition)
VALUES ('bcde', '2021-08-01', '2021-12-31', ' Head of Center of Excellence');
INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition)
VALUES ('bcde', '2022-01-01', ' 2022-05-09', 'Head of Center of Excellence');
INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition)
VALUES ('bcde', '2022-02-01', '2022-05-09', 'Head of Center of Excellence');
INSERT INTO employeeContracts (login, ValidFrom, ValidTo, JobPosition)
VALUES ('bcde', '2022-05-09', '2099-12-31', 'Head of Center of Excellence');
This can be tested Here
Solution 1:[1]
This sounds like a "Gaps and Islands" problem. One way of approaching it is to identify and assign rankings to each "island" (or date ranges for each JobPosition).
login | ValidFrom | ValidTo | JobPosition | PrevPosition | JobPositionGroup |
---|---|---|---|---|---|
bcde | 2019-07-01 00:00:00 | 2019-09-30 00:00:00 | Project Manager | null | 1 |
bcde | 2020-01-09 00:00:00 | 2020-06-16 00:00:00 | Head of Center of Excellence | Project Manager | 2 |
bcde | 2020-06-17 00:00:00 | 2020-07-31 00:00:00 | Team Leader | Head of Center of Excellence | 3 |
bcde | 2020-08-01 00:00:00 | 2021-03-31 00:00:00 | Team Leader | Team Leader | 3 |
bcde | 2021-04-01 00:00:00 | 2021-06-30 00:00:00 | Team Leader | Team Leader | 3 |
bcde | 2021-07-01 00:00:00 | 2021-07-31 00:00:00 | Team Leader | Team Leader | 3 |
bcde | 2021-08-01 00:00:00 | 2021-12-31 00:00:00 | Head of Center of Excellence | Team Leader | 4 |
bcde | 2022-01-01 00:00:00 | 2022-05-09 00:00:00 | Head of Center of Excellence | Head of Center of Excellence | 4 |
bcde | 2022-02-01 00:00:00 | 2022-05-09 00:00:00 | Head of Center of Excellence | Head of Center of Excellence | 4 |
bcde | 2022-05-09 00:00:00 | 2099-12-31 00:00:00 | Head of Center of Excellence | Head of Center of Excellence | 4 |
Then apply your min/max logic, grouping by the ranking number:
SELECT grp.JobPositionGroup
, grp.JobPosition
, grp.Login
, MIN(grp.ValidFrom) AS ValidFrom
, MAX(grp.ValidTo) AS ValidTo
FROM (
SELECT cron.*
, SUM( IF( JobPosition = PrevPosition, 0, 1) ) OVER(
ORDER BY ValidFrom, ValidTo
) AS JobPositionGroup
FROM (
SELECT ec.*
, LAG(JobPosition, 1) OVER (
ORDER BY ValidFrom, ValidTo
) AS PrevPosition
FROM employeeContracts ec
)
cron
) grp
GROUP BY grp.JobPositionGroup
, grp.JobPosition
, grp.Login
Results:
JobPositionGroup | JobPosition | Login | ValidFrom | ValidTo |
---|---|---|---|---|
1 | Project Manager | bcde | 2019-07-01 00:00:00 | 2019-09-30 00:00:00 |
2 | Head of Center of Excellence | bcde | 2020-01-09 00:00:00 | 2020-06-16 00:00:00 |
3 | Team Leader | bcde | 2020-06-17 00:00:00 | 2021-07-31 00:00:00 |
4 | Head of Center of Excellence | bcde | 2021-08-01 00:00:00 | 2099-12-31 00:00:00 |
db<>fiddle here
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | SOS |