'Using Python Logging module to save logs to S3, how to capture all levels of logs for all modules?
It's my first time using the logging module in Python(3.7). My code uses imported modules that also have their own log statements. When I first added log statements to my code, I didn't use getLogger(). I just used logging.basicConfig(filename)
and called logger.debug()
directly to log statements. When I did this, all the logs from both my script and also all the imported modules was output to the same file together.
Now I need to convert my code to save logs to s3 instead of a file. I tried the solution mentioned in How Can I Write Logs Directly to AWS S3 from Memory Without First Writing to stdout? (Python, boto3) - Stack Overflow but I have two issues with it:
- None of the 'prefixes' are present in the output when I check on s3.
- Only INFO statements are showing up. I was under the impression that
logging.basicConfig(level=logging.INFO)
would make it would output all logs at or above level INFO, but I'm only seeing INFO. Also, only INFO logs get printed to stdout, when before all levels were. I don't know why the 'prefixes' are missing.
from psaw import PushshiftAPI
api = PushshiftAPI()
import time
import logging
import boto3
import io
import atexit
def write_logs(body, bucket, key):
s3 = boto3.client("s3")
s3.put_object(Body=body.getvalue(), Bucket=bucket, Key=key)
logging.basicConfig(level=logging.INFO)
log = logging.getLogger()
log_stringio = io.StringIO()
handler = logging.StreamHandler(log_stringio)
log.addHandler(handler)
def collectRange(sub,start,end):
atexit.register(write_logs, body=log_stringio, bucket="<...>", key=f'{sub}/log.txt')
s3 = boto3.resource('s3')
object = s3.Object('<...>', f'{sub}/{sub}@{start}-{end}.csv')
now = time.time()
logging.info(f'Start Time:{now}')
logging.debug('First request')
gen = api.search_comments(after=start, before=end,<...>, subreddit=sub)
r=next(gen)
<...>
quit()
Output:
Found credentials in shared credentials file: ~/.aws/credentials
Start Time:1591310443.7060978
https://api.pushshift.io/reddit/comment/search?<...>
https://api.pushshift.io/reddit/comment/search?<...>
Desired output:
INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:root:Start Time:1591310443.7060978
DEBUG:root:First request
INFO:psaw.PushshiftAPI:https://api.pushshift.io/reddit/comment/search?<...>
DEBUG:psaw.PushshiftAPI:<whatever is usually here>
DEBUG:psaw.PushshiftAPI:<whatever is usually here>
INFO:psaw.PushshiftAPI:https://api.pushshift.io/reddit/comment/search?<...>
DEBUG:psaw.PushshiftAPI:<whatever is usually here>
DEBUG:psaw.PushshiftAPI:<whatever is usually here>
Any help is appreciated. Thanks.
Solution 1:[1]
You can at least add the level name (or time) by following this documentation:
Changing the format of displayed messages.
And to get DEBUG as well you need to use the following instead of INFO
logging.basicConfig(..., level=logging.DEBUG)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | SJGD |