'Stringbuffer access by index or range
TLDR: I am looking for a datastructure with the properties of a string buffer that I can access by index in dart.
Long version: I need the properties of Stringbuffer (very efficient at adding characters/bytes) but with the added capability of accessing characters/bytes at specific indices or ranges. I am very surprised that something as simple as that does not seem to exist, neither in the dart core nor in custom libraries.
Fyi I am trying to build a piece table in dart. I do not think that simply calling stringbuffer.toString().substring(start, end)
is an option, as this probably copies the whole string, which is the opposite of the efficiency I want.
Solution 1:[1]
To answer my own question: After playing around with the typed library and some custom data structures for a bit, it seems that a simple list of Strings is actually the fastest, assuming that I need half as many insertions as I need random read access.
Of course still open to any suggestions from anyone who knows more about dart, microbenchmarking, etc.
class IndexableStringBuffer {
final List<String> _characters = List.empty(growable: true);
int get length => _characters.length;
void writeAll(String string) {
_characters.addAll(string.characters);
}
String getRange(int start, int? end) {
return _characters.sublist(start, end).join();
}
}
Solution 2:[2]
I am very surprised that something as simple as that does not seem to exist, neither in the dart core nor in custom libraries.
I would not expect that to be something that many people need to do, so I am not surprised.
Here is a quick implementation to get you started. The basic principle is to keep track of starting (and/or ending) indices of each string component. When doing lookups, do a binary-search on those indices to find the relevant components. lowerBound
from package:collection
is useful for this.
This implementation operates in terms of UTF-16 code units (which is what Dart's [String
] class uses), but it should not be too hard to make it use Runes
/code-points instead if you want.
import 'package:collection/collection.dart';
/// Part of a [IndexableStringBuffer].
class _Part implements Comparable<_Part> {
const _Part(this.startIndexCodeUnits, [this.string = '']);
final String string;
final int startIndexCodeUnits;
int get endIndexCodeUnits => startIndexCodeUnits + length;
int get length => string.length;
@override
String toString() => string;
@override
int compareTo(_Part other) =>
endIndexCodeUnits.compareTo(other.endIndexCodeUnits);
}
/// Similar to [StringBuffer] but allows extracting a substring.
class IndexableStringBuffer {
final _parts = <_Part>[];
int _totalLengthCodeUnits = 0;
IndexableStringBuffer();
int get totalLengthCodeUnits => _totalLengthCodeUnits;
/// Adds a new [String].
void add(String string) {
_parts.add(_Part(_totalLengthCodeUnits, string));
_totalLengthCodeUnits += string.length;
}
/// Returns a substring in the range \[start, end), where [start] and
/// [end] are indices in UTF-16 code units.
String substring(int start, int end) {
assert(start >= 0);
assert(start <= totalLengthCodeUnits);
assert(end >= 0);
assert(end <= totalLengthCodeUnits);
// Inclusive start and end indices of the relevant substrings from [_parts].
var startPartIndex = lowerBound<_Part>(_parts, _Part(start));
var endPartIndex = lowerBound<_Part>(_parts, _Part(end));
if (startPartIndex == _parts.length) {
return '';
}
var startPart = _parts[startPartIndex];
var startOffset = start - startPart.startIndexCodeUnits;
assert(startOffset >= 0);
var substring = _parts.getRange(startPartIndex, endPartIndex + 1).join();
return substring.substring(startOffset, startOffset + end - start);
}
}
// And some quick tests to kick the tires:
void main() {
var buffer = IndexableStringBuffer();
print(buffer.substring(0, 0));
buffer
..add('hello')
..add('world')
..add('goodbye')
..add('friend');
print(buffer.substring(0, 3)); // hel
print(buffer.substring(1, 4)); // ell
print(buffer.substring(1, 5)); // ello
print(buffer.substring(5, 7)); // wo
print(buffer.substring(0, 10)); // helloworld
print(buffer.substring(3, 9)); // loworl
print(buffer.substring(10, 12)); // go
print(buffer.substring(9, 12)); // dgo
print(buffer.substring(0, 23)); // helloworldgoodbyefriend
print(buffer.substring(0, 0)); //
print(buffer.substring(23, 23)); //
}
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | paul maier |
Solution 2 | jamesdlin |