'Rolling mean, standard deviation in dbplyr
I want to set a new variable with rolling function (rolling mean, stdev...etc.) in dbplyr
Here is a database
library(odbc)
library(DBI)
library(tidyverse)
library(zoo)
con <- DBI::dbConnect(odbc::odbc(),
Driver = "SQL Server",
Server = "xx.xxx.xxx.xxx",
Database = "stock",
UID = "userid",
PWD = "userpassword")
startday = 20150101
day = tbl(con, in_schema("dbo", "LogDay"))
I'd like to calculate rolling mean over 5 days, Here's my code, but it does not work
How can I solve this problem?
library(zoo)
day %>%
mutate(ma5 = rollmean(priceClose, k = 5, fill = NA))
error: nanodbc/nanodbc.cpp:1655: 42000: [Microsoft][ODBC SQL Server Driver][SQL Server]키워드 'AS' 근처의 구문이 [Microsoft][ODBC SQL Server Driver][SQL Server]문을 준비할 수
<SQL> 'SELECT TOP 11 "logNo", "stockCode", "logDate", "priceOpen", "priceHigh", "priceLow", "priceClose", "adjRate", "volume", "amount", "numListed", "remark", "marketCap", "foreignRate", "personNetbuy", "foreignNetbuy", "instNetbuy", "financeNetbuy", "insuranceNetbuy", "toosinNetbuy", "bankNetbuy", "gitaFinanceNetbuy", "pensionNetbuy", "gitaInstNetbuy", "gitaForeignNetbuy", "samoNetbuy", "nationNetbuy", rollmean("priceClose", 5.0 AS "k", NULL AS "fill") AS "ma5"
FROM "dbo"."LogDay"
WHERE ("logDate" > 20150101.0)
ORDER BY "stockCode"'
Warning :
Named arguments ignored for SQL rollmean
Solution 1:[1]
The error is occurring because rollmean
does not have a dbplyr translation defined nor is it an SQL command that can be used without translation. This is unsurprising as rollmean
is part of the data.table library and dbplyr focuses on translating dplyr and base R commands.
Part of what you are after is a window function. dplyr has a range of window functions, as does SQL, but translation between these is not always straightforward. But there are ways to do this using commands that have translations defined.
Two possible approaches to consider:
(1) combining lag and lead
df %>%
mutate(prev2_price = lag(priceClose, 2, order_by = date),
prev1_price = lag(priceClose, 1, order_by = date),
next1_price = lead(priceClose, 1, order_by = date),
next2_price = lead(priceClose, 2, order_by = date)) %>%
mutate(ma5 = (prev2_price + prev1_price + priceClose + next1_price + next2_price) / 5)
This approach will not scale well, but it is straightforward and easy to reason about. If you want to work within groups (e.g. separate moving averages for each stock) apply a group_by
before using lag
and lead
.
(2) join and filter out records that are not wanted
df2 = df %>%
select(stockCode, date, priceClose)
df %>%
inner_join(df2, by = "stockCode", suffix = c("","_2") %>%
filter(abs(date - date_2) <= 2) %>% # two records either side = window of width 5
group_by(stockCode, date, priceClose) %>%
summarise(ma5 = mean(priceClose_2)
This approach is much more general, but may be harder to reason about.
Solution 2:[2]
day = tbl(con, in_schema("dbo", "LogDay")) %>% filter(logDate > startday) %>% lazy_dt()
dayt = day %>%
group_by(stockCode) %>%
arrange(logDate) %>%
mutate(rise = (priceClose/lag(priceClose,1)-1)*100,
candle = ifelse(priceClose > priceOpen, 1, 0),
middle = ifelse(priceClose > (priceHigh + priceLow)/2, 1, 0),
ma5 = rollmean(priceClose, k = 5, fill = NA, align = 'right'),
ovnprofit = lead(priceOpen,1)/priceClose,
disparity = priceClose/ma5*100)
Solution 3:[3]
mgirlich's comment on this github issue (Request for time series function (lead, lag, rolling)) should be helpful here.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Simon.S.A. |
Solution 2 | Wookeun Lee |
Solution 3 | Richard Harrison |