How to parse different elements in an HTML document (in Dart/Flutter) and keep their order intact

  • I have a large HTML document containing important information of different types in sequence.
  • I'm parsing it in Dart/Flutter.
  • Obtaining the raw information is fine.
  • My problem is that querying separately for each element type/name (images, text, headings, etc.) loses the order in which the elements appear relative to each other in the document.

E.g., a heading, then an image, then some text, then another image, then some more text.

I really need the equivalent of this: html.getElementsByTagName('title' or 'p' or 'whatever-else-I-need'). Then I can process the matches in a loop and output my model as a properly sequenced list.

Parsing sequence-critical information of different element tags / data types must be a common occurrence. Much appreciated.



Solution 1:[1]

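I'm not an expert on package:html (nor on HTML and CSS in general), but I think you can use Document.querySelectorAll with a selector string that lists every tag you need, separated by commas. The matches should come back in the order they appear in the document: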

import 'package:html/parser.dart' as html;

void main() {
  var htmlStr = r'''
<html>
<head>
<title>My title</title>
</head>
<body>
<p>Lorem ipsum</p>
<img src="foo.png">
</body>
</html>  
''';
  var document = html.parse(htmlStr);
  var elements = document.querySelectorAll('title,p,img');
  elements.forEach(print);
  // Prints: 
  // <html title>
  // <html p>
  // <html img>
}
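
Since the question asks for a sequenced list of model objects, here is a minimal sketch of how the matches might be mapped to a model inside one pass. ContentItem and toItem are hypothetical names made up for this example (they are not part of package:html), and the tag-to-kind mapping is only an assumption about what the model needs:

```dart
import 'package:html/dom.dart' as dom;
import 'package:html/parser.dart' as html;

/// Hypothetical model type for this sketch; not part of package:html.
class ContentItem {
  ContentItem(this.kind, this.value);

  final String kind; // 'heading', 'text', or 'image'.
  final String value; // Text content, or the src attribute for images.

  @override
  String toString() => '$kind: $value';
}

/// Maps one matched element to a [ContentItem], or returns null if the tag
/// is not one this sketch handles.
ContentItem? toItem(dom.Element element) {
  switch (element.localName) {
    case 'h1':
    case 'h2':
      return ContentItem('heading', element.text.trim());
    case 'p':
      return ContentItem('text', element.text.trim());
    case 'img':
      return ContentItem('image', element.attributes['src'] ?? '');
    default:
      return null;
  }
}

void main() {
  var document = html.parse('''
<body>
<h1>Heading</h1>
<img src="a.png">
<p>First paragraph</p>
<img src="b.png">
<p>Second paragraph</p>
</body>
''');

  // A single query covers all wanted tags, so the relative order of
  // headings, images, and paragraphs is kept.
  var items = document
      .querySelectorAll('h1,h2,p,img')
      .map(toItem)
      .whereType<ContentItem>()
      .toList();
  items.forEach(print);
}
```

Because every wanted tag goes through the same query, there is no need to merge separate per-tag result lists afterwards.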

If a selector doesn't do what you want, you could write a function that walks the tree:

import 'package:html/dom.dart' as dom;

/// Walks [document] and invokes [elementCallback] on each element using a preorder
/// traversal.
///
/// [elementCallback] should return true to continue walking the tree, false to
/// abort.
void walk(dom.Document document, bool Function(dom.Element) elementCallback) {
  var stack = <dom.Element>[];
  stack.addAll(document.children.reversed);
  while (stack.isNotEmpty) {
    var element = stack.removeLast();
    if (!elementCallback(element)) {
      break;
    }
    stack.addAll(element.children.reversed);
  }
}

and then you could run walk with an appropriate callback that conditionally adds each Element to some List, e.g.:

var elements = <dom.Element>[];
var wantedTags = {'title', 'p', 'img'};
walk(document, (element) {
  if (wantedTags.contains(element.localName)) {
    elements.add(element);
  }
  return true;
});
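
As a possible variation on the callback approach, the same preorder traversal could be written as a lazy generator, so that the usual Iterable methods (where, map, takeWhile, and so on) handle the filtering and early exit. This is only a sketch; elementsInOrder is a hypothetical helper name, not something package:html provides:

```dart
import 'package:html/dom.dart' as dom;
import 'package:html/parser.dart' as html;

/// Lazily yields every element under [document] using a preorder traversal,
/// i.e. in the order the elements appear in the document.
Iterable<dom.Element> elementsInOrder(dom.Document document) sync* {
  var stack = <dom.Element>[...document.children.reversed];
  while (stack.isNotEmpty) {
    var element = stack.removeLast();
    yield element;
    // Push children reversed so the first child is visited next.
    stack.addAll(element.children.reversed);
  }
}

void main() {
  var document =
      html.parse('<h1>Title</h1><p>One</p><img src="a.png"><p>Two</p>');
  var wantedTags = {'h1', 'p', 'img'};

  // Filter lazily; iteration stops as soon as the consumer stops asking.
  var tags = elementsInOrder(document)
      .where((element) => wantedTags.contains(element.localName))
      .map((element) => element.localName)
      .toList();
  print(tags);
}
```

The trade-off is mostly stylistic: the callback version makes the abort condition explicit, while the generator leaves it to whatever consumes the Iterable.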

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1