Ruby -- trying to grab <title>this here</title> even if on multiple lines
Currently, I am grabbing titles using the following method:
title = html_response[/<title[^>]*>(.*?)<\/title>/,1]
This does a great job at catching "This is a title" from <title>This is a title</title>. However, there are some web pages that open the title tag on one line, print the title on the next line, and then close the title tag.
The Ruby line I presented above doesn't catch titles such as those, so I'm just trying to find a fix for that.
Solution 1:[1]
Obligatory "don't use regex with HTML" sentence.
title = html_response[/<title[^>]*>(.*?)<\/title>/m,1]
The m modifier enables multiline mode, in which . also matches newline characters.
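To make the difference concrete, here is a minimal sketch that runs both patterns against a title spread over three lines. The sample markup in html_response is made up for illustration; the trailing strip is an addition, since the capture otherwise keeps the newlines and indentation around the title text.

```ruby
# Hypothetical sample page: the title text sits on its own line.
html_response = <<~HTML
  <html>
    <head>
      <title>
        This is a title
      </title>
    </head>
  </html>
HTML

# Without /m, "." stops at newlines, so the pattern finds no match.
single_line = html_response[/<title[^>]*>(.*?)<\/title>/, 1]

# With /m, "." also matches newlines, so the capture spans all three lines.
raw_title = html_response[/<title[^>]*>(.*?)<\/title>/m, 1]

# The capture includes the surrounding whitespace; strip it (nil-safe).
title = raw_title&.strip

p single_line  # nil
p title        # "This is a title"
```

The &. guard keeps the code from raising on pages that have no title tag at all; in that case title is simply nil.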
Solution 2:[2]
This famous Stack Overflow post explains why it's a bad idea to use regular expressions to parse HTML. A better approach is to use a gem like Nokogiri to parse the document and read the title tag from it.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | cfeduke |
Solution 2 | Community |