'Decode base45 string
We are trying to implement the verification of the new EU corona virus test/vaccination certificates, but can't get the base45 decoding working.
Specification is here: https://datatracker.ietf.org/doc/draft-faltstrom-base45/
We nearly finished our class, but we sometimes get wrong values back..
Target is this:
Encoding example 1: The string "AB" is the byte sequence [65 66].
The 16 bit value is 65 * 256 + 66 = 16706. 16706 equals 11 + 45 * 11
+ 45 * 45 * 8 so the sequence in base 45 is [11 11 8]. By looking up
these values in the Table 1 we get the encoded string "BB8".
Encoding example 2: The string "Hello!!" as ASCII is the byte
sequence [72 101 108 108 111 33 33]. If we look at each 16 bit
value, it is [18533 27756 28449 33]. Note the 33 for the last byte.
When looking at the values modulo 45, we get [[38 6 9] [36 31 13] [9
2 14] [33 0]] where the last byte is represented by two. By looking
up these values in the Table 1 we get the encoded string "%69
VD92EX0".
Encoding example 3: The string "base-45" as ASCII is the byte
sequence [98 97 115 101 45 52 53]. If we look at each 16 bit value,
it is [25185 29541 11572 53]. Note the 53 for the last byte. When
looking at the values modulo 45, we get [[30 19 12] [21 26 14] [7 32
5] [8 1]] where the last byte is represented by two. By looking up
these values in the Table 1 we get the encoded string "UJCLQE7W581".
Here is my current code, which produces wrong values:
class Base45
ALPHABET = {
"00" => "0",
"01" => "1",
"02" => "2",
"03" => "3",
"04" => "4",
"05" => "5",
"06" => "6",
"07" => "7",
"08" => "8",
"09" => "9",
"10" => "A",
"11" => "B",
"12" => "C",
"13" => "D",
"14" => "E",
"15" => "F",
"16" => "G",
"17" => "H",
"18" => "I",
"19" => "J",
"20" => "K",
"21" => "L",
"22" => "M",
"23" => "N",
"24" => "O",
"25" => "P",
"26" => "Q",
"27" => "R",
"28" => "S",
"29" => "T",
"30" => "U",
"31" => "V",
"32" => "W",
"33" => "X",
"34" => "Y",
"35" => "Z",
"36" => " ",
"37" => "$",
"38" => "%",
"39" => "*",
"40" => "+",
"41" => "-",
"42" => ".",
"43" => "/",
"44" => ":"
}.freeze
def self.encode_base45(text)
restsumme = text.unpack('S>*')
# not sure what this is doing, but without it, it works worse :D
restsumme << text.bytes[-1] if text.bytes.size > 2 && text.bytes[-1] < 256
bytearr = restsumme.map do |bytes|
arr = []
multiplier, rest = bytes.divmod(45**2)
arr << multiplier if multiplier > 0
multiplier, rest = rest.divmod(45)
arr << multiplier if multiplier > 0
arr << rest if rest > 0
arr.reverse
end
return bytearr.flatten.map{|a| ALPHABET[a.to_s.rjust(2, "0")]}.join
end
def self.decode_base45(text)
arr = text.split("").map do |char|
ALPHABET.invert[char]
end
textarr = arr.each_slice(3).to_a.map do |group|
subarr = group.map.with_index do |val, index|
val.to_i * (45**index)
end
ap subarr
subarr.sum
end
return textarr.pack("S>*") # returns wrong values
end
end
Results:
Base45.encode_base45("AB")
=> "BB8" # works
Base45.decode_base45("BB8")
=> "AB" # works
Base45.encode_base45("Hello!!")
=> "%69 VD92EX" # works
Base45.decode_base45("BB8")
=> "Hello!\x00!" # wrong \x00
Base45.encode_base45("base-45")
=> "UJCLQE7W581" # works
Base45.decode_base45("UJCLQE7W581")
=> "base-4\x005" # wrong \x00
Any hints appreciated :(
Solution 1:[1]
if you'd like a bodgy way of doing this:
return textarr.map{|x| x<256 ? [x].pack("C*") : [x].pack("n*") }.join
looking at this scheme, it feels like a weird way to encode, as we're working with numbers ... if it were me, I'd have started at the tail of the string and worked towards the head, but that's because we're using numbers.
anyway, the reason that my bodge works is that it treats small elements/numbers as 8-bit unsigned instead of 16-bit unsigned.
...
slightly more pleasing to the eye, but probably no better:
def self.decode_base45(text)
arr = text.split("").map do |char|
ALPHABET.invert[char]
end
textarr = arr.each_slice(3).to_a.map do |group|
subarr = group.map.with_index do |val, index|
val.to_i * (45**index)
end
ap subarr
subarr.sum.divmod(256)
end.flatten.reject(&:zero?)
return textarr.pack("C*") # returns wrong values
end
Solution 2:[2]
After struggling to get the other answers to work, I made my own method based on your question and this snippet. Abovementioned answers worked in most cases but not in all of them, especially when the string length mod 3 = 2.
class Base45
ALPHABET = {
0 => "0",
1 => "1",
2 => "2",
3 => "3",
4 => "4",
5 => "5",
6 => "6",
7 => "7",
8 => "8",
9 => "9",
10 => "A",
11 => "B",
12 => "C",
13 => "D",
14 => "E",
15 => "F",
16 => "G",
17 => "H",
18 => "I",
19 => "J",
20 => "K",
21 => "L",
22 => "M",
23 => "N",
24 => "O",
25 => "P",
26 => "Q",
27 => "R",
28 => "S",
29 => "T",
30 => "U",
31 => "V",
32 => "W",
33 => "X",
34 => "Y",
35 => "Z",
36 => " ",
37 => "$",
38 => "%",
39 => "*",
40 => "+",
41 => "-",
42 => ".",
43 => "/",
44 => ":"
}.freeze
def self.decode_base45(text)
raise ArgumentError, "invalid base45 string" if text.size % 3 == 1
arr = text.split("").map do |char|
ALPHABET.invert[char]
end
arr.each_slice(3).to_a.map do |group|
if group.size == 3
x = group[0] + group[1] * 45 + group[2] * 45 * 45
raise ArgumentError, "invalid base45 string" if x > 0xFFFF
x.divmod(256)
else
x = group[0] + group[1] * 45
raise ArgumentError, "invalid base45 string" if x > 0xFF
x
end
end.flatten.pack("C*")
end
end
Solution 3:[3]
It might not be a proper solution to the problem here.
But adding textarr.pack("S>*").gsub(/\x00/, "")
solved the problem for the given decoding examples.
Also it's really weird that your encode
version didn't work well for me (had wrong results in two first examples).
Anyway, this thread led me to contribute a bit by making this as a gem.
Solution 4:[4]
Let's take QED8WEX0
and QED8WEX00
for example, you got [[26, 14, 13], [8, 32, 14], [33, 0]]
and [[26, 14, 13], [8, 32, 14], [33, 0, 0]]
respectively.
The problem here is their final form are the same: [26981, 29798, 33]
and you can't determine whether 33
represents for 1 bytes or 2 bytes.
We can solve it by passing dynamic argument based on input length to Array#pack
, for example:
[26981, 29798, 33].pack((input.length % 3).zero? ? 'n*' : "n#{input.length / 3}C")
Here is my implementation:
# frozen_string_literal: true
class Base45Lite
MAX_UINT18 = 2**16 - 1
SQUARED_45 = 45**2
MAPPING = [
*'0'..'9',
*'A'..'Z',
' ', '$', '%', '*', '+', '-', '.', '/', ':'
].map!.with_index { |x, i| [i, x] }.to_h.freeze
REVERSE_MAPPING = MAPPING.invert.freeze
def self.encode(input)
sequence = []
input.unpack('n*').map! do |uint16|
i, c = uint16.divmod(45)
i, d = i.divmod(45)
_, e = i.divmod(45)
sequence.push(c, d, e)
end
if input.bytesize.odd?
i, c = input.getbyte(-1).divmod(45)
_, d = i.divmod(45)
sequence.push(c, d)
end
sequence.map!{ MAPPING[_1] }.join
end
def self.decode(input)
input
.chars.map! { REVERSE_MAPPING[_1] }
.each_slice(3).map do |slice|
c, d, e = slice
sum = c + d * 45
sum += e * SQUARED_45 if e
sum
end
.pack((input.length % 3).zero? ? 'n*' : "n#{input.length / 3}C")
end
end
if __FILE__ == $PROGRAM_NAME
require 'minitest/autorun'
class Base45Test < Minitest::Test
parallelize_me!
def test_encode
assert_equal 'BB8', Base45Lite.encode('AB')
assert_equal '%69 VD92EX0', Base45Lite.encode('Hello!!')
assert_equal 'UJCLQE7W581', Base45Lite.encode('base-45')
end
def test_decode
assert_equal 'ietf!', Base45Lite.decode('QED8WEX0')
assert_equal 'AB', Base45Lite.decode('BB8')
assert_equal 'Hello!!', Base45Lite.decode('%69 VD92EX0')
assert_equal 'base-45', Base45Lite.decode('UJCLQE7W581')
end
end
end
You can find a more comprehensive implementation here https://gist.github.com/tonytonyjan/5eefdfbe7a79cd676e75c138466e921d
You can also install the Ruby gem https://rubygems.org/gems/base45_lite
Benchmark
Compare with https://github.com/wattswing/base45
require 'benchmark/ips'
require 'base45'
require_relative 'base45_lite'
message = 'base-45'
encoded = 'UJCLQE7W581'
Benchmark.ips do |x|
x.report('Base45Lite.encode'){ |n| n.times { Base45Lite.encode(message) } }
x.report('Base45.encode'){ |n| n.times { Base45.encode(message) } }
x.compare!
end
Benchmark.ips do |x|
x.report('Base45Lite.decode'){ |n| n.times { Base45Lite.decode(encoded) } }
x.report('Base45.decode'){ |n| n.times { Base45.decode(encoded) } }
x.compare!
end
Warming up --------------------------------------
Base45Lite.encode 37.721k i/100ms
Base45.encode 16.121k i/100ms
Calculating -------------------------------------
Base45Lite.encode 348.105k (± 6.7%) i/s - 1.735M in 5.008303s
Base45.encode 148.871k (± 7.1%) i/s - 741.566k in 5.008783s
Comparison:
Base45Lite.encode: 348105.0 i/s
Base45.encode: 148870.7 i/s - 2.34x (± 0.00) slower
Warming up --------------------------------------
Base45Lite.decode 25.909k i/100ms
Base45.decode 16.972k i/100ms
Calculating -------------------------------------
Base45Lite.decode 231.134k (± 6.9%) i/s - 1.166M in 5.068220s
Base45.decode 148.288k (± 6.5%) i/s - 746.768k in 5.057694s
Comparison:
Base45Lite.decode: 231134.5 i/s
Base45.decode: 148287.8 i/s - 1.56x (± 0.00) slower
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | |
Solution 3 | Draken |
Solution 4 |