Simulate Time Series Events with Accurate Scheduler
I have an API that I need to test. We have already done stress and load testing, but the best test is to run some real-life data through it. I have a fact table with all the historical data from past years. The goal is to find a busy window in that history and "replay" it against our API.
Is there a way to "replay" time-series data in Python to simulate real API request activity?
The input data looks like this, with hundreds of thousands of rows per day:
TimeStamp Input Data
------------------------------------------
2020-01-01 00:00:01:231 ABC
2020-01-01 00:00:01:456 ABD
2020-01-01 00:00:01:789 XYZ
...
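Note that these timestamps put a colon, not a dot, before the milliseconds, so any parser has to match that. A minimal parsing sketch, assuming each line is the timestamp followed by whitespace and a single payload value (the function name `parse_row` is hypothetical):

```python
from datetime import datetime

# Matches "2020-01-01 00:00:01:231". When parsing, %f accepts 1 to 6 digits
# and pads them on the right, so "231" becomes 231000 microseconds (231 ms).
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S:%f"

def parse_row(line):
    """Split a 'date time payload' line (hypothetical layout) into (datetime, payload)."""
    date_part, time_part, payload = line.split(maxsplit=2)
    ts = datetime.strptime(f"{date_part} {time_part}", TIMESTAMP_FORMAT)
    return ts, payload.strip()

rows = [
    "2020-01-01 00:00:01:231 ABC",
    "2020-01-01 00:00:01:456 ABD",
    "2020-01-01 00:00:01:789 XYZ",
]
events = [parse_row(r) for r in rows]
```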
My first thought was to convert each row into a cron entry, so that when each entry fires, it sends a request to the API with the row's data as the payload.
However, this approach adds a lot of overhead from starting Python processes, and it skews the timing: within a single second it may have to start many processes, each loading the libraries, etc.
Is there a way to start a single long-running Python process that replays the time-series data faithfully, ideally accurate to within a few milliseconds?
Something like:
from datetime import datetime

while True:
    currenttime = datetime.now()
    # query the table for rows whose timestamp matches currenttime
    # send a web request for each of those rows
But then the loop is synchronous, and every iteration requires a database lookup.
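One way to avoid a lookup per iteration is to query the busy window once, ordered by timestamp, and keep it in memory; a few hundred thousand short rows typically fit comfortably. A sketch using the standard-library `sqlite3` module as a stand-in for the fact table; the table name `fact_events`, its columns, and the helper `load_window` are hypothetical, and a real warehouse would use its own driver with the same single ordered query:

```python
import sqlite3

def load_window(conn, start, end):
    """Fetch every event in [start, end) with one query, already sorted by time."""
    cur = conn.execute(
        "SELECT ts, payload FROM fact_events "
        "WHERE ts >= ? AND ts < ? ORDER BY ts",
        (start, end),
    )
    return cur.fetchall()

# In-memory stand-in for the real fact table (hypothetical schema).
# Fixed-width timestamp strings compare correctly as plain text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_events (ts TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO fact_events VALUES (?, ?)",
    [
        ("2020-01-01 00:00:01:789", "XYZ"),
        ("2020-01-01 00:00:01:231", "ABC"),
        ("2020-01-01 00:00:01:456", "ABD"),
        ("2020-01-01 00:00:03:000", "OUT"),  # illustrative row outside the window
    ],
)
window = load_window(conn, "2020-01-01 00:00:01:000", "2020-01-01 00:00:02:000")
```

After this, the replay loop only touches the in-memory `window` list and never waits on the database.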
Solution 1:[1]
You could write your real-time playback routine along these lines (pseudocode):
import datetime
import time

def playbackEventsInWindow(startTime, endTime):
    # Offset between the wall clock and the historical clock, fixed when playback starts
    timeDiff = datetime.datetime.now() - startTime
    prevTime = startTime
    while True:
        nextEvent = GetFirstEventInListAfterSpecifiedTime(prevTime)
        if nextEvent:
            nextTime = nextEvent.getEventTimeStamp()
            if nextTime >= endTime:
                return  # we've reached the end of our window
            # Wall-clock moment this event should fire, minus the current time
            sleepTimeSeconds = (nextTime + timeDiff - datetime.datetime.now()).total_seconds()
            if sleepTimeSeconds > 0.0:
                time.sleep(sleepTimeSeconds)
            executeWebRequestsForEvent(nextEvent)
            prevTime = nextTime
        else:
            return  # we've reached the end of the list
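A self-contained variant of the loop above, assuming the window has already been loaded into a list of `(timestamp, payload)` tuples sorted by time. It swaps `datetime.now()` for `time.monotonic()`, which is not affected by system clock adjustments, computes every deadline from one fixed starting anchor so lateness does not accumulate, and walks the list in order so events that share a timestamp are not skipped. The names `sleep_until`, `replay`, `send`, and `speed` are hypothetical; `send` stands in for `executeWebRequestsForEvent`, and in a real replay it should hand the request off (for example to a thread pool) rather than block, or one slow response will delay every later event. The final busy-wait trades some CPU for tighter timing.

```python
import time
from datetime import datetime

def sleep_until(deadline, spin=0.002):
    """Wait until time.monotonic() reaches deadline.

    Uses time.sleep() for most of the wait, then busy-waits for the
    last `spin` seconds to reduce oversleeping.
    """
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return
        if remaining > spin:
            time.sleep(remaining - spin)
        # otherwise keep looping (busy-wait) until the deadline passes

def replay(events, send, speed=1.0):
    """Replay sorted (timestamp, payload) pairs with their original spacing.

    speed > 1 compresses time. Returns how late (in seconds) each send was.
    """
    if not events:
        return []
    first_ts = events[0][0]
    start = time.monotonic()
    lateness = []
    for ts, payload in events:
        # Every deadline is measured from the same anchor, so one late
        # event does not push back the events after it.
        deadline = start + (ts - first_ts).total_seconds() / speed
        sleep_until(deadline)
        lateness.append(time.monotonic() - deadline)
        send(ts, payload)
    return lateness

# Usage with illustrative data (the fourth row is made up to show a shared timestamp).
sent = []
evts = [
    (datetime(2020, 1, 1, 0, 0, 1, 231000), "ABC"),
    (datetime(2020, 1, 1, 0, 0, 1, 456000), "ABD"),
    (datetime(2020, 1, 1, 0, 0, 1, 456000), "ABE"),
    (datetime(2020, 1, 1, 0, 0, 1, 789000), "XYZ"),
]
lateness = replay(evts, lambda ts, payload: sent.append(payload), speed=10.0)
```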
Note that a naive implementation of GetFirstEventInListAfterSpecifiedTime(timeStamp) would simply start at the beginning of the events list, scan linearly until it found an event with a timestamp greater than the specified argument, and return that event. That costs O(n) per call, and therefore O(n²) over a whole replay, so it quickly becomes very inefficient when the events list is long. You could improve it by having it store the index of the value it returned on the previous call and start its linear scan at that position rather than at the top of the list. That lets it return quickly (usually after just one step) in the common case, where the requested timestamps increase steadily.
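The remembered-index idea from the paragraph above can be sketched as a small cursor object; the class name `EventCursor` and method `first_after` are hypothetical. Because the timestamps are sorted, the standard-library `bisect` module is used as a fallback to find the position in O(log n) if a caller ever asks for an earlier time. Like the original function, it returns only events strictly after `t`, so when several events share one timestamp, all but the first are skipped; iterating over the sorted list directly avoids that.

```python
import bisect

class EventCursor:
    """Return the first event strictly after a given time, resuming where it left off."""

    def __init__(self, events):
        self.events = events                   # (timestamp, payload) pairs, sorted by timestamp
        self.times = [ts for ts, _ in events]  # parallel list of timestamps for bisect
        self.pos = 0                           # index just past the last event returned

    def first_after(self, t):
        i = self.pos
        if i > 0 and self.times[i - 1] > t:
            # t moved backwards: locate the position with a binary search instead
            i = bisect.bisect_right(self.times, t)
        # Linear scan from the remembered position; usually one step or none
        while i < len(self.times) and self.times[i] <= t:
            i += 1
        if i == len(self.times):
            return None  # no event after t
        self.pos = i + 1
        return self.events[i]
```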
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |