'Parsing google alerts using gmail and app script
I have google alerts set up and receive alert emails which look like this:
I would like to parse the alert and save it in a google sheet with the format: alert_name,publication_date, title,news source
Not sure where to start any help/guidance appreciated Thanks
Solution 1:[1]
Although I share the opinion that @Ruben has given in the comments, I think this topic is interesting and could help other users to save and manage their Google Alerts.
Code.gs
/* Retrieve all news from googlealert source */
const threads = GmailApp.search('from:([email protected])')
/* Our sheet for save the news */
const sS = SpreadsheetApp.openById(SS_ID).getSheetByName('Google Alerts')
/* Control the already added answers */
let addedNews = []
try { addedNews = sS.getRange('A1:A' + sS.getLastRow()).getValues().flat() } catch (err) { }
function parseContent() {
const totalNews = []
/* Maybe add a control system for remove the threads already saved */
threads.forEach((th) => {
const msgs = th.getMessages()
msgs.forEach((msg) => {
/* Divide the content in new lines an parse the content */
const body = msg.getPlainBody().split('\n')
/* Extract the filter name eg === News - 2 new results for [python] === */
const filterName = body.slice(0, 1)[0].match(/\[(.*?)\]/)[1]
const date = msg.getDate()
/* Remove the unnecessary lines */
const cleanedBody = body.slice(1, -11)
/* Detect the news via empty new lines "\r" */
const newsIdxs = cleanedBody.reduce((pre, curr, idx) => {
curr === "\r" && pre.push(idx)
return pre
}, [])
newsIdxs.forEach((v, idx, arr) => {
if (idx == arr.length - 1) return
/* From one empty line to the nex */
const parsedNew = cleanedBody.slice(v + 1, arr[idx + 1])
/* Simply extracted from the first line */
const title = parsedNew[0].split('|')[0]
/* Last line and between <> */
const url = parsedNew[parsedNew.length - 1].match(/<(.*?)>/)[1]
/* Extracted from the URL rather than the title due variability */
const source = url.match(/url=https:\/\/(.*?)\//)[1]
totalNews.push({ title, url, date, source, filterName })
})
})
})
totalNews.forEach((nw) => {
/* Hash the object for preventing adding already present */
const id = hashCode(Object.values(nw).toString())
if (addedNews.includes(id)) return
sS.appendRow([id, ...Object.values(nw)])
})
}
/* Extracted from here https://stackoverflow.com/questions/7616461/generate-a-hash-from-string-in-javascript */
const hashCode = s => s.split('').reduce((a, b) => { a = ((a << 5) - a) + b.charCodeAt(0); return a & a }, 0)
Results
Note 1: This script is an approximation of the problem, and has only been tested for News related alerts.
Note 2: Thanks to @DalmTo for the pseudo-code, it has helped me to approach the problem more quickly.
Note 3: The hashCode
function has been extracted from here
Note 4: I have decided to take an approach using RegExp due to the use of getPlainBody()
, but I think that in this case, using a library that allows parsing HTML with getBody()
would be easier to implement.
Solution 2:[2]
You can use the gmail api with appscript the way GmailApp works is very similar to how the gmail api works.
The first thing you should do is create a search in the gmail web app which would return just the messages you are looking for something like this
var threads = GmailApp.search('from:([email protected])');
for(var i=0; i<threads.length; i++)
{
messages = threads[i].getMessages();
for(var j=0; j<messages.length; j++)
{
var date = messages[j].getDate();
var body = messages[j].getPlainBody();
var start = body.indexOf('<table');
var end = body.indexOf('</table>');
}
}
}
}
}
The body comes back in html format so your going to have to do some cleaning on it to find the text you want. Once you find the text you want you can just write it out to a sheet using SpreadsheetApp
function WriteToSheet(date, value){
var sheet = SpreadsheetApp.getActiveSheet();
sheet.appendRow([date, value]);
}
This code is from a script i use to scan my one of my own emails for some text.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Emel |
Solution 2 | DaImTo |