How to parse a JavaScript object from an HTML page I crawl?
I'm trying to index a food recipe page, and the actual recipe is stored as an object within a script in the page.
One example URL: http://www.dagbladet.no/mat/oppskrift/bakt-potet-med-romme-og-blamuggostdressing
If I open the developer tools in the browser and type:
console.dir(food.recipeItem.title)
I get the title back:
"Bakt potet med rømme- og blåmuggostdressing"
All nice and dandy, and just what I need. But how can I get hold of that script and parse it within a Node.js application? Cheerio might help me find the script, but can it do more than that? I'm not sure how to do this, or which approach is the most computationally efficient or the most robust.
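To make the question concrete, here is a rough sketch of one possible approach, not a confirmed solution: pull the inline script bodies out of the fetched HTML and run them in an isolated context with Node's built-in `vm` module, then read the `food` global. It assumes `food` is assigned by an inline script (as the console test suggests). The helper names `inlineScripts` and `extractGlobal` and the sample HTML are made up for illustration, and the regex stands in for a real HTML parser such as Cheerio.

```javascript
const vm = require('vm');

// Hypothetical helper: collect the bodies of inline <script> tags.
// A regex keeps this sketch dependency-free; an HTML parser such as
// Cheerio would be more robust for locating the scripts.
function inlineScripts(html) {
  const scripts = [];
  const re = /<script\b[^>]*>([\s\S]*?)<\/script>/gi;
  let match;
  while ((match = re.exec(html)) !== null) {
    if (match[1].trim() !== '') scripts.push(match[1]);
  }
  return scripts;
}

// Hypothetical helper: run each inline script in its own context and
// return the first global called `name` that ends up defined.
function extractGlobal(html, name) {
  for (const code of inlineScripts(html)) {
    const sandbox = {};
    sandbox.window = sandbox; // lets `window.food = ...` style assignments work
    vm.createContext(sandbox);
    try {
      vm.runInContext(code, sandbox, { timeout: 1000 });
    } catch (err) {
      // Scripts that expect a real DOM will throw here; skip them.
    }
    if (sandbox[name] !== undefined) return sandbox[name];
  }
  return undefined;
}

// Made-up sample page; the real page's markup is an assumption.
const sampleHtml = `<html><head><script>
  var food = { recipeItem: { title: "Bakt potet med rømme- og blåmuggostdressing" } };
</script></head><body></body></html>`;

const food = extractGlobal(sampleHtml, 'food');
console.log(food.recipeItem.title);
```

Note that `vm` is not a security boundary, so this runs whatever code the crawled page ships. If the object literal turns out to be plain JSON, slicing it out of the script text and passing it to `JSON.parse` would avoid executing anything.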
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow