Ruby: trying to grab <title>this here</title> even when it spans multiple lines

Currently, I am grabbing titles using the following method:

title = html_response[/<title[^>]*>(.*?)<\/title>/,1]

This works well for extracting "This is a title" from <title>This is a title</title>. However, some web pages open the title tag on one line, put the title text on the next line, and then close the tag on a later line.

The line above returns nil for titles laid out like that, and I'm looking for a fix.
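A minimal reproduction of the problem may help. The sample HTML below is made up for illustration. Without any flag, `.` in a Ruby regex does not match a newline, so the lazy `(.*?)` cannot get from the line with `<title>` to the line with `</title>`.

```ruby
# Sample markup (hypothetical) with the title text on its own line.
html_response = <<~HTML
  <html>
  <head>
  <title>
  This is a title
  </title>
  </head>
  </html>
HTML

one_line = "<title>This is a title</title>"

# Same pattern as in the question: works on one line, fails across lines.
single_match = one_line[/<title[^>]*>(.*?)<\/title>/, 1]
multi_match  = html_response[/<title[^>]*>(.*?)<\/title>/, 1]

p single_match  # => "This is a title"
p multi_match   # => nil
```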



Solution 1:[1]

Obligatory "don't parse HTML with regex" warning.

title = html_response[/<title[^>]*>(.*?)<\/title>/m,1]

The m flag enables Ruby's multiline mode, in which . also matches newline characters, so (.*?) can span line breaks.
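A short sketch of the `/m` fix, using made-up sample markup. The captured group still contains the line breaks and indentation around the title, so calling `strip` on it (an extra step not in the answer) is usually what you want.

```ruby
# Sample markup (hypothetical) with the title split across lines.
html_response = <<~HTML
  <head>
  <title>
    This is a title
  </title>
  </head>
HTML

# With /m, `.` also matches "\n", so the lazy (.*?) can cross lines.
raw_title = html_response[/<title[^>]*>(.*?)<\/title>/m, 1]
p raw_title  # => "\n  This is a title\n"

# Remove the surrounding newlines and indentation; &. guards against nil
# when the page has no <title> at all.
title = raw_title&.strip
p title      # => "This is a title"
```

Adding the `i` flag as well (`/.../mi`) would also match uppercase `<TITLE>` tags, which older pages sometimes use.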

Solution 2:[2]

This famous Stack Overflow post explains why it's a bad idea to use regular expressions to parse HTML. A better approach is to parse the document with a gem such as Nokogiri and extract the title element from it.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 cfeduke
Solution 2 Community