How can I use Google Apps Script's UrlFetchApp to scrape a New Google Sites page?
My group recently migrated from classic to New Google Sites. Many things I used to do with classic Sites and/or the Apps Script Sites Service are no longer possible, because New Sites and Apps Script aren't integrated.
I would like to scrape content from our internal Sites pages using UrlFetchApp. However, running the code below (as a user with access to the page I'd like to scrape) returns the Google sign-in page, not the page's content.
Is it possible to scrape the group's internal Google Site using UrlFetchApp?
function myFunction() {
  var txt = UrlFetchApp.fetch("https://sites.google.com/a/domain.com/home").getContentText();
  Logger.log(txt);
}
This returns:
Logging output too large. Truncating output.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta content="width=300, initial-scale=1" name="viewport">
<meta name="google-site-verification" content="LrdTUW9psUAMbh4Ia074-BPEVmcpBxF6Gwf0MSgQXZs">
<title>Sign in - Google Accounts</title>
<style>
@font-face {
font-family: 'Open Sans';
font-style: normal;
font-weight: 300;
src: url(//fonts.gstatic.com/s/opensans/v15/mem5YaGs126MiZpBA-UN_r8OUuhs.ttf) format('truetype');
}
Solution 1:[1]
I appreciate your question is quite old now, but just to confirm: no, you can't scrape Google Sites using Apps Script. It's infuriating that Google still hasn't implemented an API for Sites.
In your example code, you're not passing in any authorisation, so the UrlFetchApp call is effectively anonymous and wouldn't be expected to reach a restricted page in any case. However, even if you pass in a valid authorisation header, it still won't work. For example:
let url = "https://sites.google.com/YOUR_SITE";
let opts = {
  method: "get",
  headers: { Authorization: `Bearer ${ScriptApp.getOAuthToken()}` }
};
let resp = UrlFetchApp.fetch(url, opts);
console.log(resp.getContentText());
It's disappointing.
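If it helps to confirm what actually came back, here is a minimal sketch that checks whether a response body looks like the sign-in page rather than the site content. The function names `looksLikeSignInPage` and `checkSitesPage` are hypothetical, and the title string is simply the one from the logged output in the question; Google could change it at any time, so treat this as a heuristic, not a reliable test.

```javascript
// Hypothetical helper: returns true when the HTML's <title> matches the
// sign-in page title seen in the question's log output.
// Assumption: the title text "Sign in - Google Accounts" is stable.
function looksLikeSignInPage(html) {
  var match = /<title>([^<]*)<\/title>/i.exec(html);
  return match !== null && match[1].trim().indexOf("Sign in - Google Accounts") === 0;
}

// Apps Script usage. This only runs inside Apps Script, where UrlFetchApp,
// ScriptApp and Logger are available.
function checkSitesPage() {
  var url = "https://sites.google.com/a/domain.com/home"; // URL from the question
  var resp = UrlFetchApp.fetch(url, {
    method: "get",
    headers: { Authorization: "Bearer " + ScriptApp.getOAuthToken() },
    muteHttpExceptions: true // return the response instead of throwing on HTTP errors
  });
  if (looksLikeSignInPage(resp.getContentText())) {
    Logger.log("Got the Google sign-in page, not the site content.");
  } else {
    Logger.log("Response code: " + resp.getResponseCode());
  }
}
```

Running `checkSitesPage` from the script editor should at least make the failure explicit in the log instead of dumping the whole sign-in page.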
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ben |