'Decode base45 string

We are trying to implement the verification of the new EU corona virus test/vaccination certificates, but can't get the base45 decoding working.

Specification is here: https://datatracker.ietf.org/doc/draft-faltstrom-base45/

We nearly finished our class, but we sometimes get wrong values back..

Target is this:

Encoding example 1: The string "AB" is the byte sequence [65 66].
The 16 bit value is 65 * 256 + 66 = 16706. 16706 equals 11 + 45 * 11
+ 45 * 45 * 8 so the sequence in base 45 is [11 11 8].  By looking up
these values in the Table 1 we get the encoded string "BB8".

Encoding example 2: The string "Hello!!" as ASCII is the byte
sequence [72 101 108 108 111 33 33].  If we look at each 16 bit
value, it is [18533 27756 28449 33].  Note the 33 for the last byte.
When looking at the values modulo 45, we get [[38 6 9] [36 31 13] [9
2 14] [33 0]] where the last byte is represented by two.  By looking
up these values in the Table 1 we get the encoded string "%69
VD92EX0".

Encoding example 3: The string "base-45" as ASCII is the byte
sequence [98 97 115 101 45 52 53].  If we look at each 16 bit value,
it is [25185 29541 11572 53].  Note the 53 for the last byte.  When
looking at the values modulo 45, we get [[30 19 12] [21 26 14] [7 32
5] [8 1]] where the last byte is represented by two.  By looking up
these values in the Table 1 we get the encoded string "UJCLQE7W581".

Here is my current code, which produces wrong values:

class Base45

  ALPHABET = {
    "00" => "0",
    "01" => "1",
    "02" => "2",
    "03" => "3",
    "04" => "4",
    "05" => "5",
    "06" => "6",
    "07" => "7",
    "08" => "8",
    "09" => "9",
    "10" => "A",
    "11" => "B",
    "12" => "C",
    "13" => "D",
    "14" => "E",
    "15" => "F",
    "16" => "G",
    "17" => "H",
    "18" => "I",
    "19" => "J",
    "20" => "K",
    "21" => "L",
    "22" => "M",
    "23" => "N",
    "24" => "O",
    "25" => "P",
    "26" => "Q",
    "27" => "R",
    "28" => "S",
    "29" => "T",
    "30" => "U",
    "31" => "V",
    "32" => "W",
    "33" => "X",
    "34" => "Y",
    "35" => "Z",
    "36" => " ",
    "37" => "$",
    "38" => "%",
    "39" => "*",
    "40" => "+",
    "41" => "-",
    "42" => ".",
    "43" => "/",
    "44" => ":"
  }.freeze

  def self.encode_base45(text)
    restsumme = text.unpack('S>*')

    # not sure what this is doing, but without it, it works worse :D
    restsumme << text.bytes[-1] if text.bytes.size > 2 && text.bytes[-1] < 256

    bytearr = restsumme.map do |bytes|
      arr = []
      multiplier, rest = bytes.divmod(45**2)
      arr << multiplier if multiplier > 0

      multiplier, rest = rest.divmod(45)
      arr << multiplier if multiplier > 0
      arr << rest if rest > 0
      arr.reverse
    end
    return bytearr.flatten.map{|a| ALPHABET[a.to_s.rjust(2, "0")]}.join
  end

  def self.decode_base45(text)
    arr = text.split("").map do |char|
      ALPHABET.invert[char]
    end
    textarr = arr.each_slice(3).to_a.map do |group|
      subarr = group.map.with_index do |val, index|
        val.to_i * (45**index)
      end
      ap subarr
      subarr.sum
    end

    return textarr.pack("S>*") # returns wrong values
  end
end

Results:

Base45.encode_base45("AB")
=> "BB8" # works
Base45.decode_base45("BB8")
=> "AB" # works

Base45.encode_base45("Hello!!")
=> "%69 VD92EX" # works
Base45.decode_base45("BB8")
=> "Hello!\x00!" # wrong \x00


Base45.encode_base45("base-45")
=> "UJCLQE7W581" # works
Base45.decode_base45("UJCLQE7W581")
=> "base-4\x005" # wrong \x00

Any hints appreciated :(



Solution 1:[1]

if you'd like a bodgy way of doing this:

return textarr.map{|x| x<256 ? [x].pack("C*") : [x].pack("n*") }.join

looking at this scheme, it feels like a weird way to encode, as we're working with numbers ... if it were me, I'd have started at the tail of the string and worked towards the head, but that's because we're using numbers.

anyway, the reason that my bodge works is that it treats small elements/numbers as 8-bit unsigned instead of 16-bit unsigned.

...

slightly more pleasing to the eye, but probably no better:

def self.decode_base45(text)
  arr = text.split("").map do |char|
    ALPHABET.invert[char]
  end
  textarr = arr.each_slice(3).to_a.map do |group|
    subarr = group.map.with_index do |val, index|
      val.to_i * (45**index)
    end
    ap subarr
    subarr.sum.divmod(256)
  end.flatten.reject(&:zero?)

  return textarr.pack("C*") # returns wrong values
end

Solution 2:[2]

After struggling to get the other answers to work, I made my own method based on your question and this snippet. Abovementioned answers worked in most cases but not in all of them, especially when the string length mod 3 = 2.

class Base45
  ALPHABET = {
    0 => "0",
    1 => "1",
    2 => "2",
    3 => "3",
    4 => "4",
    5 => "5",
    6 => "6",
    7 => "7",
    8 => "8",
    9 => "9",
    10 => "A",
    11 => "B",
    12 => "C",
    13 => "D",
    14 => "E",
    15 => "F",
    16 => "G",
    17 => "H",
    18 => "I",
    19 => "J",
    20 => "K",
    21 => "L",
    22 => "M",
    23 => "N",
    24 => "O",
    25 => "P",
    26 => "Q",
    27 => "R",
    28 => "S",
    29 => "T",
    30 => "U",
    31 => "V",
    32 => "W",
    33 => "X",
    34 => "Y",
    35 => "Z",
    36 => " ",
    37 => "$",
    38 => "%",
    39 => "*",
    40 => "+",
    41 => "-",
    42 => ".",
    43 => "/",
    44 => ":"
  }.freeze

  def self.decode_base45(text)
    raise ArgumentError, "invalid base45 string" if text.size % 3 == 1

    arr = text.split("").map do |char|
      ALPHABET.invert[char]
    end

    arr.each_slice(3).to_a.map do |group|
      if group.size == 3
        x = group[0] + group[1] * 45 + group[2] * 45 * 45
        raise ArgumentError, "invalid base45 string" if x > 0xFFFF
        x.divmod(256)
      else
        x = group[0] + group[1] * 45
        raise ArgumentError, "invalid base45 string" if x > 0xFF
        x
      end
    end.flatten.pack("C*")
  end
end

Solution 3:[3]

It might not be a proper solution to the problem here.

But adding textarr.pack("S>*").gsub(/\x00/, "") solved the problem for the given decoding examples. Also it's really weird that your encode version didn't work well for me (had wrong results in two first examples).

Anyway, this thread led me to contribute a bit by making this as a gem.

Solution 4:[4]

Let's take QED8WEX0 and QED8WEX00 for example, you got [[26, 14, 13], [8, 32, 14], [33, 0]] and [[26, 14, 13], [8, 32, 14], [33, 0, 0]] respectively.

The problem here is their final form are the same: [26981, 29798, 33] and you can't determine whether 33 represents for 1 bytes or 2 bytes.

We can solve it by passing dynamic argument based on input length to Array#pack, for example:

[26981, 29798, 33].pack((input.length % 3).zero? ? 'n*' : "n#{input.length / 3}C")

Here is my implementation:

# frozen_string_literal: true
class Base45Lite
  MAX_UINT18 = 2**16 - 1
  SQUARED_45 = 45**2
  MAPPING = [
    *'0'..'9',
    *'A'..'Z',
    ' ', '$', '%', '*', '+', '-', '.', '/', ':'
  ].map!.with_index { |x, i| [i, x] }.to_h.freeze
  REVERSE_MAPPING = MAPPING.invert.freeze

  def self.encode(input)
    sequence = []
    input.unpack('n*').map! do |uint16|
      i, c = uint16.divmod(45)
      i, d = i.divmod(45)
      _, e = i.divmod(45)
      sequence.push(c, d, e)
    end
    if input.bytesize.odd?
      i, c = input.getbyte(-1).divmod(45)
      _, d = i.divmod(45)
      sequence.push(c, d)
    end
    sequence.map!{ MAPPING[_1] }.join
  end

  def self.decode(input)
    input
      .chars.map! { REVERSE_MAPPING[_1] }
      .each_slice(3).map do |slice|
        c, d, e = slice

        sum = c + d * 45
        sum += e * SQUARED_45 if e

        sum
      end
      .pack((input.length % 3).zero? ? 'n*' : "n#{input.length / 3}C")
  end
end

if __FILE__ == $PROGRAM_NAME
  require 'minitest/autorun'

  class Base45Test < Minitest::Test
    parallelize_me!

    def test_encode
      assert_equal 'BB8', Base45Lite.encode('AB')
      assert_equal '%69 VD92EX0', Base45Lite.encode('Hello!!')
      assert_equal 'UJCLQE7W581', Base45Lite.encode('base-45')
    end

    def test_decode
      assert_equal 'ietf!', Base45Lite.decode('QED8WEX0')
      assert_equal 'AB', Base45Lite.decode('BB8')
      assert_equal 'Hello!!', Base45Lite.decode('%69 VD92EX0')
      assert_equal 'base-45', Base45Lite.decode('UJCLQE7W581')
    end
  end
end

You can find a more comprehensive implementation here https://gist.github.com/tonytonyjan/5eefdfbe7a79cd676e75c138466e921d

You can also install the Ruby gem https://rubygems.org/gems/base45_lite

Benchmark

Compare with https://github.com/wattswing/base45

require 'benchmark/ips'
require 'base45'
require_relative 'base45_lite'


message = 'base-45'
encoded = 'UJCLQE7W581'

Benchmark.ips do |x|
  x.report('Base45Lite.encode'){ |n| n.times { Base45Lite.encode(message) } }
  x.report('Base45.encode'){ |n| n.times { Base45.encode(message) } }
  x.compare!
end

Benchmark.ips do |x|
  x.report('Base45Lite.decode'){ |n| n.times { Base45Lite.decode(encoded) } }
  x.report('Base45.decode'){ |n| n.times { Base45.decode(encoded) } }
  x.compare!
end
Warming up --------------------------------------
   Base45Lite.encode    37.721k i/100ms
       Base45.encode    16.121k i/100ms
Calculating -------------------------------------
   Base45Lite.encode    348.105k (± 6.7%) i/s -      1.735M in   5.008303s
       Base45.encode    148.871k (± 7.1%) i/s -    741.566k in   5.008783s

Comparison:
   Base45Lite.encode:   348105.0 i/s
       Base45.encode:   148870.7 i/s - 2.34x  (± 0.00) slower

Warming up --------------------------------------
   Base45Lite.decode    25.909k i/100ms
       Base45.decode    16.972k i/100ms
Calculating -------------------------------------
   Base45Lite.decode    231.134k (± 6.9%) i/s -      1.166M in   5.068220s
       Base45.decode    148.288k (± 6.5%) i/s -    746.768k in   5.057694s

Comparison:
   Base45Lite.decode:   231134.5 i/s
       Base45.decode:   148287.8 i/s - 1.56x  (± 0.00) slower

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3 Draken
Solution 4