Parse quoted-printable encoded content from a .mht file

I am trying to extract all the images from a .mht file using the Nokogiri gem. But because the .mht file is quoted-printable encoded, every image tag I get back contains stray characters:

<img alt='3D"AFC-Logo' src="3D%22https://upload.=" width='3D"75"' height='3D"75"'>
<img src="3D%22https://en.wikipedia.org/static/images/footer/wikimedia-butto=" width='3D"88"' height='3D"31"' alt='3D"Wikimedia'>
<img src="3D%22https://en.wikipedia.org/static/images/footer/poweredby_mediawiki_8=" alt='3D"Powered' width='3D"88"' height='3D"31"'>

This is the link to that .mht file: https://drive.google.com/file/d/1DtbgrFyCEcggAk1nqpZSluNhRt-k3t95/view?usp=sharing

And below is the code that I am using to get all the images from the .mht file:

require "nokogiri"

# Collect the unique src values of every <img> tag in the given HTML.
def get_image_links(html)
  html_doc = Nokogiri::HTML(html)
  nodes = html_doc.xpath("//img[@src]")
  raise "No <img .../> tags!" if nodes.empty?
  nodes.inject([]) do |uris, node|
    puts node.to_s
    uris << node.attr('src').strip
  end.uniq
end

# The method has to be defined before it is called.
html = File.read("1646037951.mht")
image_links = get_image_links(html)

I have tried decoding it with .unpack('M').first, but that does not work: it returns the same result as above.
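
One likely reason the whole-file attempt changes nothing: a .mht file is a MIME multipart document, and only the body of each part is quoted-printable, not the headers. Ruby's 'M' decoder stops at the first = that is not followed by two hex digits or a line break, and a header such as type="text/html" contains exactly that, so the rest of the file is passed through undecoded. Below is a minimal sketch, using only the standard library, that splits the file on its MIME boundary and decodes just the quoted-printable part bodies. The helper name decode_quoted_printable_parts and the sample document are hypothetical, and the splitting is deliberately simplistic (it assumes a single, non-nested multipart level).

```ruby
# Hypothetical helper: returns the decoded body of every quoted-printable
# part found in a MIME (.mht) document. Assumes one, non-nested boundary.
def decode_quoted_printable_parts(mht)
  # Normalise line endings so header/body splitting is predictable.
  text = mht.gsub("\r\n", "\n")

  # The boundary is declared in the top-level Content-Type header.
  boundary = text[/boundary="?([^";\n]+)"?/i, 1]
  return [] unless boundary

  # Each delimiter line starts with "--" followed by the boundary string.
  pieces = text.split(/^--#{Regexp.escape(boundary)}/)

  pieces.map do |piece|
    # Part headers and part body are separated by the first blank line.
    headers, body = piece.sub(/\A\n+/, "").split(/\n\n/, 2)
    next if body.nil?
    next unless headers =~ /^Content-Transfer-Encoding:\s*quoted-printable/i
    body.unpack1("M") # decode only the body, never the headers
  end.compact
end

# Made-up sample that mimics the layout of a browser-saved .mht file.
sample = <<~MHT
  From: <Saved by Blink>
  MIME-Version: 1.0
  Content-Type: multipart/related;
    type="text/html";
    boundary="----SampleBoundary----"

  ------SampleBoundary----
  Content-Type: text/html
  Content-Transfer-Encoding: quoted-printable

  <img alt=3D"Logo" src=3D"https://example.org/lo=
  go.png" width=3D"75">
  ------SampleBoundary------
MHT

decoded = decode_quoted_printable_parts(sample)
puts decoded.first
```

Each decoded string can then be handed to Nokogiri::HTML in place of the raw file contents. For real-world files, a full MIME parser (for example the mail gem, which Action Mailer depends on) is probably more robust than this hand-rolled split.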

Or does Rails have something for this?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
