'Stringbuffer access by index or range

TLDR: I am looking for a datastructure with the properties of a string buffer that I can access by index in dart.

Long version: I need the properties of Stringbuffer (very efficient at adding characters/bytes) but with the added capability of accessing characters/bytes at specific indices or ranges. I am very surprised that something as simple as that does not seem to exist, neither in the dart core nor in custom libraries.

Fyi I am trying to build a piece table in dart. I do not think that simply calling stringbuffer.toString().substring(start, end) is an option, as this probably copies the whole string, which is the opposite of the efficiency I want.



Solution 1:[1]

To answer my own question: After playing around with the typed library and some custom data structures for a bit, it seems that a simple list of Strings is actually the fastest, assuming that I need half as many insertions as I need random read access.

Of course still open to any suggestions from anyone who knows more about dart, microbenchmarking, etc.

class IndexableStringBuffer {
  final List<String> _characters = List.empty(growable: true);

  int get length => _characters.length;

  void writeAll(String string) {
    _characters.addAll(string.characters);
  }

  String getRange(int start, int? end) {
    return _characters.sublist(start, end).join();
  }
}

Solution 2:[2]

I am very surprised that something as simple as that does not seem to exist, neither in the dart core nor in custom libraries.

I would not expect that to be something that many people need to do, so I am not surprised.

Here is a quick implementation to get you started. The basic principle is to keep track of starting (and/or ending) indices of each string component. When doing lookups, do a binary-search on those indices to find the relevant components. lowerBound from package:collection is useful for this.

This implementation operates in terms of UTF-16 code units (which is what Dart's [String] class uses), but it should not be too hard to make it use Runes/code-points instead if you want.

import 'package:collection/collection.dart';

/// Part of a [IndexableStringBuffer].
class _Part implements Comparable<_Part> {
  const _Part(this.startIndexCodeUnits, [this.string = '']);

  final String string;

  final int startIndexCodeUnits;

  int get endIndexCodeUnits => startIndexCodeUnits + length;
  int get length => string.length;

  @override
  String toString() => string;

  @override
  int compareTo(_Part other) =>
      endIndexCodeUnits.compareTo(other.endIndexCodeUnits);
}

/// Similar to [StringBuffer] but allows extracting a substring.
class IndexableStringBuffer {
  final _parts = <_Part>[];
  int _totalLengthCodeUnits = 0;

  IndexableStringBuffer();

  int get totalLengthCodeUnits => _totalLengthCodeUnits;

  /// Adds a new [String].
  void add(String string) {
    _parts.add(_Part(_totalLengthCodeUnits, string));
    _totalLengthCodeUnits += string.length;
  }

  /// Returns a substring in the range \[start, end), where [start] and
  /// [end] are indices in UTF-16 code units.
  String substring(int start, int end) {
    assert(start >= 0);
    assert(start <= totalLengthCodeUnits);
    assert(end >= 0);
    assert(end <= totalLengthCodeUnits);

    // Inclusive start and end indices of the relevant substrings from [_parts].
    var startPartIndex = lowerBound<_Part>(_parts, _Part(start));
    var endPartIndex = lowerBound<_Part>(_parts, _Part(end));

    if (startPartIndex == _parts.length) {
      return '';
    }

    var startPart = _parts[startPartIndex];
    var startOffset = start - startPart.startIndexCodeUnits;
    assert(startOffset >= 0);

    var substring = _parts.getRange(startPartIndex, endPartIndex + 1).join();
    return substring.substring(startOffset, startOffset + end - start);
  }
}

// And some quick tests to kick the tires:
void main() {
  var buffer = IndexableStringBuffer();
  print(buffer.substring(0, 0));
  
  buffer
    ..add('hello')
    ..add('world')
    ..add('goodbye')
    ..add('friend');

  print(buffer.substring(0, 3)); // hel
  print(buffer.substring(1, 4)); // ell
  print(buffer.substring(1, 5)); // ello
  print(buffer.substring(5, 7)); // wo
  print(buffer.substring(0, 10)); // helloworld
  print(buffer.substring(3, 9)); // loworl
  print(buffer.substring(10, 12)); // go
  print(buffer.substring(9, 12)); // dgo
  print(buffer.substring(0, 23)); // helloworldgoodbyefriend
  print(buffer.substring(0, 0)); //
  print(buffer.substring(23, 23)); //
}

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 paul maier
Solution 2 jamesdlin