'Get the rate of change by finding the change in price
UPDATE: I'm getting a strange result in the outcome. Occasionally, the earliest date of the result show after 2 or 3 etc times for example
Item | Kg | Date_1 | Price_1 | change_1 | Date_2 | Price_2 | change_2 |
---|---|---|---|---|---|---|---|
Apples | 1 | 2022-02-01 | 1 | NA | 2022-02-16 | 2 | 1 |
Meat | NA | NA | NA | NA | 2022-02-03 | 1 | NA |
As you can see, meat is showing no change at first, but the result is showing in the second one. This occurs throughout the program. Any idea why?
I am fairly new to programming. I am working on my portfolio, and am looking at a dataset regarding the price of food from distribution centers to a grocery store. What I am looking at is a set of data with the price, item, and date of transaction. What I am looking for is to find the rate of change from the distribution center to the store, and when it happened.
Note: the price of the item changes from the distribution center.
Here is an example of what I am looking at:
Date | Item | Price | Kg |
---|---|---|---|
01.02.2022 | Apple | $1.00 | 1 |
02.02.2022 | Meat | $4.00 | 1 |
03.02.2022 | Fish | $3.00 | 1 |
03.02.2022 | Bread | $1.00 | 1 |
15.02.2022 | Meat | $5.00 | 1 |
15.02.2022 | Meat | $3.00 | 1 |
16.02.2022 | Apple | $2.00 | 1 |
20.02.2022 | Fish | $3.00 | 1 |
25.02.2022 | Apple | $0.50 | 1 |
As you can see, the price for the same quantity for the same product changes randomly over time. What I would like to analyse is:
- The rate of change per item
- When the change occured
This is the ideal outcome:
item | kg | 1st_price | 1st_price_date | 2nd_price | 2nd_price_date | amount_of_change |
---|---|---|---|---|---|---|
Apple | 1 | $1.00 | 01.02.2022 | $2.00 | 16.02.2022 | +$1.00 |
Meat | 1 | $4.00 | 02.02.2022 | $5.00 | 15.02.2022 | +$1.00 |
Bread | 1 | $1.00 | 03.02.2022 | N/A | N/A | N/A |
Fish | 1 | $3.00 | 03.02.2022 | $3.00 | 20.02.2022 | +$0.00 |
#Continuing the table below. These columns would go to the right of the columns above. #Unfortunetly, StackOverflow was not able to create a table with everything together. #total_change is for the entire period
item | 3rd_price | 3rd_price_date | amount_of_change | change_duration_period | total_change |
---|---|---|---|---|---|
Apple | $0.50 | 25.02.2022 | -$1.50 | 01.02.2022-25.02.2022 | -$0.50 |
Meat | $3.00 | 15.02.2022 | -$2.00 | 02.02.2022-1502.2022 | -$1.00 |
Bread | N/A | N/A | N/A | 03.02.2022-03.02-2022 | +$0.00 |
Fish | $3.00 | 20.02.2022 | +$0.00 | 03.02.2022-20.02.2022 | +$0.00 |
As you can see, some items can have more price changes per month than others depending on the item. Some items have drastic changes, some have no changes at all.
Presuming there are over 14,000 unique items what would you recommend to gather the data an place them in a table as seen in the "Ideal outcome" section?
I am still new to programming, please don't be too harsh!
Thanks!
Solution 1:[1]
Something like this?
library(tidyverse)
df %>%
# convert Date to a date, and Price to a number
mutate(Date = as.Date(Date, format = "%d.%m.%Y"),
Price = parse_number(Price)) %>%
# for each Item, arrange by Date, tally, and calc price change
group_by(Item) %>%
arrange(Date) %>%
mutate(appearance = row_number(),
change = Price - lag(Price)) %>%
ungroup() %>%
# use the tally to reshape wider the date, price and change
pivot_wider(names_from = appearance,
values_from = c(Date, Price, change),
names_vary = "slowest")
Result
# A tibble: 4 × 11
Item Kg Date_1 Price_1 change_1 Date_2 Price_2 change_2 Date_3 Price_3 change_3
<chr> <int> <date> <dbl> <dbl> <date> <dbl> <dbl> <date> <dbl> <dbl>
1 Apple 1 2022-02-01 1 NA 2022-02-16 2 1 2022-02-25 0.5 -1.5
2 Meat 1 2022-02-02 4 NA 2022-02-15 5 1 2022-02-15 3 -2
3 Fish 1 2022-02-03 3 NA 2022-02-20 3 0 NA NA 0
4 Bread 1 2022-02-03 1 NA NA NA 0 NA NA 0
Source data
df <- data.frame(
stringsAsFactors = FALSE,
Date = c("01.02.2022","02.02.2022",
"03.02.2022","03.02.2022","15.02.2022","15.02.2022",
"16.02.2022","20.02.2022","25.02.2022"),
Item = c("Apple","Meat","Fish",
"Bread","Meat","Meat","Apple","Fish","Apple"),
Price = c("$1.00","$4.00","$3.00",
"$1.00","$5.00","$3.00","$2.00","$3.00","$0.50"),
Kg = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)
)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |