'Problems while plotting time series against user logins?

I have a large pandas dataframe, which is a log of user ids that login in a website:

  id        datetime
  130    2018-05-17 19:46:18
  133    2018-05-17 20:59:57
  133    2018-05-17 21:54:01
  142    2018-05-17 22:49:07
  114    2018-05-17 23:02:34
  136    2018-05-18 06:06:48
  136    2018-05-18 12:21:38
  180    2018-05-18 12:49:33
           .......

  120    2018-05-18 14:03:58
  120    2018-05-18 15:28:36

How can I visualize the above pandas dataframe as a time series plot? For example I would like to represent the frequency of logins of each person id as a line of a different color (note that I have about 400 ids). Something like this plot (*):

[image output]

I tried to:

from datetime import date
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# set your data as df
# strip only YYYY-mm-dd part from original `datetime` column
df3.timestamp = df3.datetime.apply(lambda x: str(x)[:10])
df3.timestamp = df3.datetime.apply(lambda x: date(int(x[:4]), int(x[5:7]), int(x[8:10])))

# plot
plt.figure(figsize=(150,10))
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
plt.gca().xaxis.set_major_locator(mdates.DayLocator())
plt.plot(df3.datetime[:800], df3.id[:800], '-')
plt.gcf().autofmt_xdate()

and

import matplotlib.dates as dates

df5 = df3.set_index('datetime')
df5.plot(x_compat=True)
plt.gca().xaxis.set_major_locator(dates.DayLocator())

plt.gca().xaxis.set_major_formatter(dates.DateFormatter('%d\n\n%a'))
plt.gca().invert_xaxis()
plt.gcf().autofmt_xdate(rotation=0, ha="center")
plt.figure(figsize=(150,10))

However, I got something like this:

[image1]

Any idea of how to get a plot similar to (*)?



Solution 1:[1]

I've played with your sample data a little so that one user logs in on three days. The problem in your attempt is that you are trying to "just plot" the logins. If you want to see the frequency of logins, you have to calculate that. So I read the data and use a proper DateTime index, then use groupby followed by resample to calculate the frequencies. I think with 400 users this might become a bit messy, but this will do a plot of the daily logins per user.

import pandas
import io

d = """id,datetime
130,2018-05-17T19:46:18
133,2018-05-17T20:59:57
133,2018-05-17T21:54:01
142,2018-05-17T22:49:07
114,2018-05-17T23:02:34
136,2018-05-18T06:06:48
136,2018-05-18T12:21:38
130,2018-05-18T12:49:33
120,2018-05-18T14:03:58
130,2018-05-19T15:28:36"""

# for the data aboce, this is a quick way to parse it
df = pandas.read_csv(io.StringIO(d), parse_dates=['datetime'], index_col='datetime')

# This method is more roundabout but is perhaps useful if you have other data
df = pandas.read_csv(io.StringIO(d))
df.datetime = pandas.to_datetime(df.datetime)
df = df.set_index('datetime')

# Plot daily logins per user id
r = df.groupby('id').resample('D').apply(len).unstack('id').plot()

Sample plot

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1