'Classify if a customer converted on initial email or reminder email using SQL

I have been stuck to get around this in Oracle SQL. I have a table that looks like following.

cust_id event event_date camp_id email_ind(calculated column)
1 OPENED 06/09/2021 80
1 SENT 06/09/2021 80 initial
1 SENT 14/09/2021 80 reminder
2 CLICK 15/04/2021 80
2 OPENED 15/04/2021 80
2 SENT 15/04/2021 80 initial
2 CONVERTED 15/04/2021 80
2 WEBSITE_VISIT 15/04/2021 80
3 OPENED 01/06/2021 80
3 SENT 01/06/2021 80 initial
3 OPENED 09/06/2021 80
3 SENT 09/06/2021 80 reminder
3 CLICK 10/06/2021 80
3 OPENED 10/06/2021 80
3 SENT 10/06/2021 80
3 CONVERTED 10/06/2021 80
3 WEBSITE_VISIT 10/06/2021 80
4 SENT 06/09/2021 80 initial
4 SENT 14/09/2021 80 reminder
5 OPENED 20/09/2021 80
5 SENT 20/09/2021 80 initial
5 SENT 28/09/2021 80 reminder
5 CLICK 03/10/2021 80
5 OPENED 03/10/2021 80
5 SENT 03/10/2021 80 not a reinder email
5 CONVERTED 03/10/2021 80
5 WEBSITE_VISIT 03/10/2021 80
5 OPENED 05/11/2021 80
6 OPENED 01-Jun-21 80
6 SENT 01-Jun-21 80 initial
6 OPENED 09-Jun-21 80
6 SENT 09-Jun-21 80 reminder
7 OPENED 26-Jul-21 80
7 SENT 26-Jul-21 80 initial
7 CLICK 03-Aug-21 80
7 OPENED 03-Aug-21 80
7 SENT 03-Aug-21 80 not a reminder email
7 CONVERTED 03-Aug-21 80
7 WEBSITE_VISIT 03-Aug-21 80
7 CLICK 04-Aug-21 80
7 OPENED 04-Aug-21 80
7 SENT 04-Aug-21 80
8 OPENED 12-Jul-21 80
8 SENT 12-Jul-21 80 initial
8 OPENED 20-Jul-21 80
8 SENT 20-Jul-21 80 reminder
9 SENT 29-Apr-21 80 initial
9 SENT 07-May-21 80 reminder

cust_id is customer_id ,event is the type of event (SENT - email was sent, OPENED - email was opened, CLICK- email was clicked, CONVERTED - customer accepted the offer) ,event_date is the timestamp of the event ,camp_id is the campaign_id.

Please note email_ind is not in the dataset. I basically want to know if the customer CONVERTED on the initial email or the reminder email that was SENT or there was no CONVERSION at all?

I want to create a new column called email_ind that tells if it is a initial email or a reminder email. Suppose, for cust_id=1, the initial email was sent on 06-Sept-21 and reminder email was sent 14-Sept-21 For cust_id=2, there was only initial email on 15-Apr-21 and then the cust_id CONVERTED on the same day. That means the customer accepted the offer on the initial email.

Also, when there is event='CONVERTED' there is automatically an event='SENT' is generated which should not be considered as a reminder email. For ex for cusT_id='3', the event 'CONVERTED' occured on 10-JUN-21 and automatically an event='SENT' is generated for 10-JUN-21 which should not be considered as reminder email.

Also, once the customer is CONVERTED, I do not want to consider the events that follow after conversion. For ex cust_id=7, it got converted on 03-Aug-21, so the rest of the vents on 04-Aug-21 can be ignored There are multiple camp_ids.

I tried self join and use windows function like over() partition by() but it doesn't seem to work. I must be missing something. It would be great if anyone could help here?



Solution 1:[1]

From Oracle 12, you can use MATCH_RECOGNIZE to do row-by-row processing:

SELECT cust_id,
       event,
       event_date,
       camp_id,
       CASE email_ind
       WHEN 'CONVERTED' THEN 'CONVERTED'
       WHEN 'INITIAL'   THEN 'INITIAL'
       WHEN 'REMINDER'  THEN 'REMINDER'
       END AS email_ind,
       CASE
       WHEN email_ind = 'CONVERTED'
       AND  has_reminder = 1
       THEN 'REMINDER'
       WHEN email_ind = 'CONVERTED'
       THEN 'INITIAL'
       END AS conversion_type
FROM   (SELECT t.*, ROWNUM AS rn FROM table_name t)
MATCH_RECOGNIZE (
  PARTITION BY cust_id
  ORDER     BY rn
  MEASURES
    CLASSIFIER() AS email_ind,
    COUNT(reminder.cust_id)  AS has_reminder
  ALL ROWS PER MATCH
  PATTERN ( ("INITIAL" not_sent*)? (reminder not_sent*)? not_reminder converted | other )
  DEFINE
    "INITIAL"    AS event = 'SENT',
    reminder     AS event = 'SENT',
    not_reminder AS event = 'SENT',
    not_sent     AS event <> 'SENT',
    converted    AS event = 'CONVERTED'
)

db<>fiddle here

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1