'python string split by separator all possible permutations
This might be heavily related to similar questions as Python 3.3: Split string and create all combinations , but I can't infer a pythonic solution out of this.
Question is:
Let there be a str such as 'hi|guys|whats|app'
, and I need all permutations of splitting that str by a separator. Example:
#splitting only once
['hi','guys|whats|app']
['hi|guys','whats|app']
['hi|guys|whats','app']
#splitting only twice
['hi','guys','whats|app']
['hi','guys|whats','app']
#splitting only three times
...
etc
I could write a backtracking algorithm, but does python (itertools, e.g.) offer a library that simplifies this algorithm?
Thanks in advance!!
Solution 1:[1]
An approach, once you have split the string is to use itertools.combinations
to define the split points in the list, the other positions should be fused again.
def lst_merge(lst, positions, sep='|'):
'''merges a list on points other than positions'''
'''A, B, C, D and 0, 1 -> A, B, C|D'''
a = -1
out = []
for b in list(positions)+[len(lst)-1]:
out.append('|'.join(lst[a+1:b+1]))
a = b
return out
def split_comb(s, split=1, sep='|'):
from itertools import combinations
l = s.split(sep)
return [lst_merge(l, pos, sep=sep)
for pos in combinations(range(len(l)-1), split)]
examples
>>> split_comb('hi|guys|whats|app', 0)
[['hi|guys|whats|app']]
>>> split_comb('hi|guys|whats|app', 1)
[['hi', 'guys|whats|app'],
['hi|guys', 'whats|app'],
['hi|guys|whats', 'app']]
>>> split_comb('hi|guys|whats|app', 2)
[['hi', 'guys', 'whats|app'],
['hi', 'guys|whats', 'app'],
['hi|guys', 'whats', 'app']]
>>> split_comb('hi|guys|whats|app', 3)
[['hi', 'guys', 'whats', 'app']]
>>> split_comb('hi|guys|whats|app', 4)
[] ## impossible
rationale
ABCD -> A B C D
0 1 2
combinations of split points: 0/1 or 0/2 or 1/2
0/1 -> merge on 2 -> A B CD
0/2 -> merge on 1 -> A BC D
1/2 -> merge on 0 -> AB C D
generic function
Here is a generic version, working like above but also taking -1
as parameter for split
, in which case it will output all combinations
def lst_merge(lst, positions, sep='|'):
a = -1
out = []
for b in list(positions)+[len(lst)-1]:
out.append('|'.join(lst[a+1:b+1]))
a = b
return out
def split_comb(s, split=1, sep='|'):
from itertools import combinations, chain
l = s.split(sep)
if split == -1:
pos = chain.from_iterable(combinations(range(len(l)-1), r)
for r in range(len(l)+1))
else:
pos = combinations(range(len(l)-1), split)
return [lst_merge(l, pos, sep=sep)
for pos in pos]
example:
>>> split_comb('hi|guys|whats|app', -1)
[['hi|guys|whats|app'],
['hi', 'guys|whats|app'],
['hi|guys', 'whats|app'],
['hi|guys|whats', 'app'],
['hi', 'guys', 'whats|app'],
['hi', 'guys|whats', 'app'],
['hi|guys', 'whats', 'app'],
['hi', 'guys', 'whats', 'app']]
Solution 2:[2]
Here is a recursive function I came up with:
def splitperms(string, i=0):
if len(string) == i:
return [[string]]
elif string[i] == "|":
return [*[[string[:i]] + split for split in splitperms(string[i + 1:])], *splitperms(string, i + 1)]
else:
return splitperms(string, i + 1)
Output:
>>> splitperms('hi|guys|whats|app')
[['hi', 'guys', 'whats', 'app'], ['hi', 'guys', 'whats|app'], ['hi', 'guys|whats', 'app'], ['hi', 'guys|whats|app'], ['hi|guys', 'whats', 'app'], ['hi|guys', 'whats|app'], ['hi|guys|whats', 'app'], ['hi|guys|whats|app']]
>>>
Solution 3:[3]
One approach using combinations
and chain
from itertools import combinations, chain
def partition(alist, indices):
# https://stackoverflow.com/a/1198876/4001592
pairs = zip(chain([0], indices), chain(indices, [None]))
return (alist[i:j] for i, j in pairs)
s = 'hi|guys|whats|app'
delimiter_count = s.count("|")
splits = s.split("|")
for i in range(1, delimiter_count + 1):
print("split", i)
for combination in combinations(range(1, delimiter_count + 1), i):
res = ["|".join(part) for part in partition(splits, combination)]
print(res)
Output
split 1
['hi', 'guys|whats|app']
['hi|guys', 'whats|app']
['hi|guys|whats', 'app']
split 2
['hi', 'guys', 'whats|app']
['hi', 'guys|whats', 'app']
['hi|guys', 'whats', 'app']
split 3
['hi', 'guys', 'whats', 'app']
The idea is to generate all the ways to pick (or remove) a delimiter 1, 2, 3 times and generate the partitions from there.
Solution 4:[4]
You can find index
all '|'
then in all combination replace '|'
with ','
then split base ','
like below:
>>> from itertools import combinations
>>> st = 'hi|guys|whats|app'
>>> idxs_rep = [idx for idx, s in enumerate(st) if s=='|']
>>> def combs(x):
... return [c for i in range(len(x)+1) for c in combinations(x,i)]
>>> for idxs in combs(idxs_rep):
... lst_st = list(st)
... for idx in idxs:
... lst_st[idx] = ','
... st2 = ''.join(lst_st)
... print(st2.split(','))
['hi|guys|whats|app']
['hi', 'guys|whats|app']
['hi|guys', 'whats|app']
['hi|guys|whats', 'app']
['hi', 'guys', 'whats|app']
['hi', 'guys|whats', 'app']
['hi|guys', 'whats', 'app']
['hi', 'guys', 'whats', 'app']
Solution 5:[5]
If you want all partitions, try partitions
from more-itertools:
from more_itertools import partitions
s = 'hi|guys|whats|app'
for p in partitions(s.split('|')):
print(list(map('|'.join, p)))
Output:
['hi|guys|whats|app']
['hi', 'guys|whats|app']
['hi|guys', 'whats|app']
['hi|guys|whats', 'app']
['hi', 'guys', 'whats|app']
['hi', 'guys|whats', 'app']
['hi|guys', 'whats', 'app']
['hi', 'guys', 'whats', 'app']
If you only want a certain number of splits, then instead of splitting at all separators and then re-joining the parts, you could just get combinations of the separator indexes and take substrings accordingly:
from itertools import combinations
s = 'hi|guys|whats|app'
splits = 2
indexes = [i for i, c in enumerate(s) if c == '|']
for I in combinations(indexes, splits):
print([s[i+1:j] for i, j in zip([-1, *I], [*I, None])])
Output:
['hi', 'guys', 'whats|app']
['hi', 'guys|whats', 'app']
['hi|guys', 'whats', 'app']
Solution 6:[6]
I am surprised that most answers are using combinations
, where this is clearly a binary power sequence (that is, multiple binary cartesian products concatenated).
Let me elaborate: if we have n
separators, we have 2**n
possible strings, where each separator is either on
or off
. So if we map each bit of an integer sequence from 0
to 2**n
to each separator (0
means we don't split, 1
means we split) we can generate the whole thing quite efficiently (without running into stack depth limits, and being able to pause and resume the generator -or even run it in parallel!- using just a simple integer to keep track of progress).
def partition(index, tokens, separator):
def helper():
n = index
for token in tokens:
yield token
if n % 2:
yield separator
n //= 2
return ''.join(helper())
def all_partitions(txt, separator):
tokens = txt.split(separator)
for i in range(2**(len(tokens)-1)):
yield partition(i, tokens, separator)
for x in all_partitions('hi|guys|whats|app', '|'):
print(x)
Explanation:
hi|guys|whats|app
^ ^ ^
bit 0 1 2 (big endian representation)
hi guys whats up
^ ^ ^
0 = 0 0 0
hi|guys whats up
^ ^ ^
1 = 1 0 0
hi guys|whats up
^ ^ ^
2 = 0 1 0
hi|guys|whats up
^ ^ ^
3 = 1 1 0
hi guys whats|up
^ ^ ^
4 = 0 0 1
hi|guys whats|up
^ ^ ^
5 = 1 0 1
hi guys|whats|up
^ ^ ^
6 = 0 1 1
hi|guys|whats|up
^ ^ ^
7 = 1 1 1
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | U12-Forward |
Solution 3 | |
Solution 4 | I'mahdi |
Solution 5 | |
Solution 6 | fortran |