How to parse different elements in an HTML document (in Dart/Flutter) and keep the order intact
- I have a large HTML document containing important information of different types in sequence.
- I'm parsing in Dart/Flutter
- Obtaining the raw information is fine
- My problem is that querying for elements of each type/tag name separately (images, text, headings, etc.) loses the order in which the elements appear relative to each other in the document.
For example: a heading, then an image, then some text, then another image, then more text.
What I really need is the equivalent of `html.getElementsByTagName('title' or 'p' or 'whatever-else-I-need')`, so that I can process the matches in a single loop and output my model as a correctly ordered list.
Parsing order-sensitive content made up of different element tags/data types must be a common task. Any help is much appreciated.
Solution 1:[1]
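I'm not an expert on `package:html` (nor on HTML and CSS in general), but I think you can use `Document.querySelectorAll` with a comma-separated selector list. It returns the matching elements in document order, regardless of which selector in the list each one matched: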
```dart
import 'package:html/parser.dart' as html;

void main() {
  var htmlStr = r'''
<html>
<head>
<title>My title</title>
</head>
<body>
<p>Lorem ipsum</p>
<img src="foo.png">
</body>
</html>
''';
  var document = html.parse(htmlStr);
  var elements = document.querySelectorAll('title,p,img');
  elements.forEach(print);
  // Prints:
  // <html title>
  // <html p>
  // <html img>
}
```
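To get from the ordered element list to the "properly sequenced list" of models the question asks for, one option is to map each element to a model object as it comes out of `querySelectorAll`. This is a minimal sketch, assuming Dart 3 (for `sealed` classes and switch expressions); the `ContentBlock`, `Heading`, `ImageBlock` and `TextBlock` types and the `toBlock`/`parseBlocks` helpers are hypothetical stand-ins for your own model, and the tag list is only an example:

```dart
import 'package:html/dom.dart' as dom;
import 'package:html/parser.dart' as html;

// Hypothetical model types: replace with your app's own classes.
sealed class ContentBlock {}

class Heading extends ContentBlock {
  Heading(this.text);
  final String text;
  @override
  String toString() => 'Heading($text)';
}

class ImageBlock extends ContentBlock {
  ImageBlock(this.src);
  final String? src; // null if the <img> has no src attribute
  @override
  String toString() => 'ImageBlock($src)';
}

class TextBlock extends ContentBlock {
  TextBlock(this.text);
  final String text;
  @override
  String toString() => 'TextBlock($text)';
}

/// Maps one matched element to a model object based on its tag name.
ContentBlock toBlock(dom.Element e) => switch (e.localName) {
      'h1' || 'h2' || 'h3' => Heading(e.text.trim()),
      'img' => ImageBlock(e.attributes['src']),
      _ => TextBlock(e.text.trim()),
    };

/// Parses [source] and returns its headings, images and paragraphs
/// in the order they appear in the document.
List<ContentBlock> parseBlocks(String source) {
  final document = html.parse(source);
  // querySelectorAll yields matches in document order, whichever
  // selector in the comma-separated list they matched.
  return document.querySelectorAll('h1,h2,h3,p,img').map(toBlock).toList();
}

void main() {
  const source = '''
<h1>Intro</h1>
<img src="a.png">
<p>First paragraph</p>
<img src="b.png">
<p>Second paragraph</p>
''';
  for (final block in parseBlocks(source)) {
    print(block);
  }
  // Heading(Intro)
  // ImageBlock(a.png)
  // TextBlock(First paragraph)
  // ImageBlock(b.png)
  // TextBlock(Second paragraph)
}
```

Because the list is built in one pass over the selector results, supporting another content type should only mean adding its tag to the selector and a case to the switch.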
If a selector doesn't do what you want, you could write a function that walks the tree:
```dart
import 'package:html/dom.dart' as dom;

/// Walks [document] and invokes [elementCallback] on each element using a preorder
/// traversal.
///
/// [elementCallback] should return true to continue walking the tree, false to
/// abort.
void walk(dom.Document document, bool Function(dom.Element) elementCallback) {
  var stack = <dom.Element>[];
  stack.addAll(document.children.reversed);
  while (stack.isNotEmpty) {
    var element = stack.removeLast();
    if (!elementCallback(element)) {
      break;
    }
    // Push children in reverse so the first child is visited next,
    // which keeps the visiting order identical to document order.
    stack.addAll(element.children.reversed);
  }
}
```
and then call `walk` with a callback that conditionally adds each `Element` to a `List`, e.g.:
```dart
var elements = <dom.Element>[];
var wantedTags = {'title', 'p', 'img'};
walk(document, (element) {
  if (wantedTags.contains(element.localName)) {
    elements.add(element);
  }
  return true;
});
```
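If you would rather not thread a callback through the traversal, the same explicit-stack preorder walk can also be written as a lazy `sync*` generator, so that ordinary `Iterable` methods (`where`, `map`, `takeWhile`) handle the filtering and early exit. A minimal sketch; `preorder` is a hypothetical helper, not part of `package:html`:

```dart
import 'package:html/dom.dart' as dom;
import 'package:html/parser.dart' as html;

/// Lazily yields every element of [document] in preorder, i.e. in
/// document order. Hypothetical helper, not part of package:html.
Iterable<dom.Element> preorder(dom.Document document) sync* {
  // Same explicit-stack approach as walk(): children are pushed in
  // reverse so that the first child is popped (and yielded) first.
  final stack = <dom.Element>[...document.children.reversed];
  while (stack.isNotEmpty) {
    final element = stack.removeLast();
    yield element;
    stack.addAll(element.children.reversed);
  }
}

void main() {
  final document = html.parse('<h2>A</h2><img src="x.png"><p>B</p>');
  const wantedTags = {'h2', 'p', 'img'};
  final ordered = preorder(document)
      .where((e) => wantedTags.contains(e.localName))
      .map((e) => e.localName)
      .toList();
  print(ordered); // [h2, img, p]
}
```

Because the generator is lazy, something like `takeWhile` stops the traversal early, which plays the role of returning false from the callback in `walk`.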
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |