How to find equivalent text in another text that contains added text?
Given:
const textToFind = "Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, "
const paragraph = "Lorem Ipsum has been the industry's [standard](wwww.meh.com) dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book."
I need to output:
Lorem Ipsum has been the industry's [standard](wwww.meh.com) dummy text ever since the 1500s,
i.e. match textToFind against paragraph and then extract the matching span, link markup included.
I have figured out this regex to find markdown links: /\[([^\]]+)\]\([^)"]+\)/g, but I'm not sure what else to do after that.
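For reference, here is a minimal sketch of what that regex captures. The names linkRegex, sample, links and plain are made up for this illustration; only the regex itself comes from the question.

```javascript
// Markdown-link regex from the question; group 1 captures the link text.
const linkRegex = /\[([^\]]+)\]\([^)"]+\)/g;

// Hypothetical sample string, shortened from the paragraph in the question.
const sample = "the industry's [standard](wwww.meh.com) dummy text";

// matchAll yields one match object per link, including its position.
const links = [...sample.matchAll(linkRegex)].map((m) => ({
  markdown: m[0], // full "[text](url)" form
  text: m[1],     // visible link text
  index: m.index, // position of the link in the original string
}));

// Replacing every link with its text gives the plain string that is rendered.
const plain = sample.replace(linkRegex, '$1');

console.log(links);
console.log(plain);
```

The recorded index and text are what any later step would need in order to put a link back at the right place in the plain string.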
textToFind is derived from paragraph in the beginning, and I need it to calculate the width of each line. That is why I'm not considering replacing standard with some unique identifier (to be swapped back for the real text later): if the characters change, so does the width.
Additional Info:
I am using React Native's <Text onTextLayout={....} numberOfLines={x} /> to obtain the lines rendered in a paragraph x, but this text has not been converted from markdown (if it had been, the links would be lost, since it only parses pure text, not Views, not Text properties, etc.)
Currently:
I am thinking of encrypting the plain text of each [plainText](url) link (e.g. reversePlainText().QueueShiftTwoCharacters()) and saving the result in a parallel queue, recordedLinks = Queue<Record<encryptedPlainText, originalUnparsedMarkdown>>(), which is then consulted in order.
This way, when going from [plainText](url) to encryptedPlainText (and almost losing the URL and its position), we can match against recordedLinks in order: as the screen renders each line of this cryptic text, each encryptedPlainText gets its link back in a FIFO way.
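A rough sketch of the FIFO idea, simplified to skip the encryption step: links are recorded with their plain-text offsets and re-attached as lines come back. The function names stripAndRecord and restoreLinks are hypothetical, and the sketch assumes that the rendered lines, concatenated, reproduce the plain text exactly and that no link is split across two lines.

```javascript
const linkRegex = /\[([^\]]+)\]\([^)"]+\)/g;

// Strip the links and record them, in order, with their plain-text offsets.
function stripAndRecord(markdown) {
  const recordedLinks = []; // used as a FIFO queue
  let plain = '';
  let last = 0;
  for (const m of markdown.matchAll(linkRegex)) {
    plain += markdown.slice(last, m.index);
    recordedLinks.push({ start: plain.length, text: m[1], markdown: m[0] });
    plain += m[1];
    last = m.index + m[0].length;
  }
  plain += markdown.slice(last);
  return { plain, recordedLinks };
}

// Re-attach links to the rendered lines, consuming the queue front to back.
// Assumption: lines.join('') === plain and no link spans two lines.
function restoreLinks(lines, recordedLinks) {
  const queue = [...recordedLinks];
  let offset = 0; // plain-text position where the current line starts
  return lines.map((line) => {
    let out = '';
    let cursor = 0;
    while (
      queue.length > 0 &&
      queue[0].start + queue[0].text.length <= offset + line.length
    ) {
      const link = queue.shift();
      const local = link.start - offset;
      out += line.slice(cursor, local) + link.markdown;
      cursor = local + link.text.length;
    }
    out += line.slice(cursor);
    offset += line.length;
    return out;
  });
}

const { plain, recordedLinks } = stripAndRecord(
  'Lorem [Ipsum](a.com) has been the [standard](b.com) dummy text'
);
// Hypothetical line breaks, standing in for whatever the layout reports.
const lines = ['Lorem Ipsum has ', 'been the standard ', 'dummy text'];
const restored = restoreLinks(lines, recordedLinks);
console.log(restored);
```

Because the plain text is left unchanged, the measured widths stay the same, which is the constraint stated above; the queue only carries the information that the plain text lost.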
Solution 1:[1]
The following is a rough and ready solution. It assumes there are no regex special characters in textToFind (if there are, they can be escaped simply enough).
A regex is created from textToFind in which every word may optionally be the link text of a markdown link; for example, Ipsum becomes (?:Ipsum|\[Ipsum\]\([^)"]+\)) in the regex string.
const textToFind = "Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, ";
const paragraph = "Lorem Ipsum has been the industry's [standard](wwww.meh.com) dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.";
// Let every word match either itself or a markdown link whose text is that word.
const regexString = textToFind.replace(/\w+/g, '(?:$&|\\[$&\\]\\([^)"]+\\))');
// With the g flag, match() returns an array of all matched substrings (or null).
const match = paragraph.match(new RegExp(regexString, 'g'));
console.log(match);
If you explain how textToFind is being used to calculate the width of each line, then a more robust solution may be forthcoming.
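The escaping that this solution mentions could look roughly like the sketch below. The helper names escapeRegExp and buildLinkTolerantRegex, and the sample strings, are assumptions made for illustration. Escaping runs first; since it only puts backslashes in front of non-word characters, the later \w+ replacement still sees the same words.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build the pattern as in the solution above, but from the escaped text.
function buildLinkTolerantRegex(textToFind) {
  const escaped = escapeRegExp(textToFind);
  const source = escaped.replace(/\w+/g, '(?:$&|\\[$&\\]\\([^)"]+\\))');
  return new RegExp(source);
}

// Hypothetical input containing regex special characters: $ ( . )
const textToFind = 'costs $5 (approx.) today';
const paragraph = 'It costs $5 ([approx](wwww.meh.com).) today, they say.';

const match = paragraph.match(buildLinkTolerantRegex(textToFind));
console.log(match && match[0]);
```

Without the g flag, match() returns a single match object, so match[0] is the extracted span and match.index is its position in paragraph.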
Solution 2:[2]
Here is the "Pythonic" approach I would take to this problem. The steps are:
- apply your link-matching regex to the paragraph and retrieve triples of the form (<link_match>, <word_match>, <start_index>)
- transform the paragraph into its link-free form and update the triples with the new <start_index> value
- if text_to_find can be found inside the updated paragraph, then for each triple replace the matching word in text_to_find with its link version.
Here's the code:
import re

text_to_find = "Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, "
paragraph = "Lorem [Ipsum](lalala) has been the industry's [standard](www.meh.com) dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to [make](hello?) a type [specimen](yes) book."

# generate triples (<link_match>, <word_match>, <start_index>)
lst = [(m.group(0), m.group(1), m.start(0)) for m in re.finditer(r'\[([^\]]+)\]\([^)"]+\)', paragraph)]

# strip the links from the paragraph and shift each start index accordingly
subtract = 0
for idx, (link, match, i) in enumerate(lst):
    paragraph = paragraph.replace(link, match, 1)
    lst[idx] = (link, match, i + subtract)
    subtract += len(match) - len(link)

# put the links back into text_to_find, right to left so earlier indices stay valid
offset = paragraph.find(text_to_find)  # also handles text that does not start at index 0
if offset == -1:
    print('no reference')
else:
    for link, match, i in reversed(lst):
        start = i - offset
        if start >= 0 and start + len(match) <= len(text_to_find):
            text_to_find = text_to_find[:start] + link + text_to_find[start + len(match):]
    print(text_to_find)
If you have more than one text_to_find in different paragraphs, you can put the paragraph conversion and the text_to_find translation into two separate functions and call them in a loop over the paragraphs and the texts to be found.
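Since the question is in JavaScript, here is a sketch of that two-function split in JavaScript. It is a loose port of the approach, not a line-by-line translation; the names convertParagraph, translateText and jobs are hypothetical, and the sample paragraphs are shortened.

```javascript
const linkRegex = /\[([^\]]+)\]\([^)"]+\)/g;

// Step 1: strip links from a paragraph, remembering where each sat in the plain text.
function convertParagraph(paragraph) {
  const links = [];
  let shift = 0; // number of characters removed so far
  const plain = paragraph.replace(linkRegex, (markdown, text, index) => {
    links.push({ markdown, text, start: index - shift });
    shift += markdown.length - text.length;
    return text;
  });
  return { plain, links };
}

// Step 2: put the links back into textToFind; null if it is not in the paragraph.
function translateText(textToFind, { plain, links }) {
  const offset = plain.indexOf(textToFind);
  if (offset === -1) return null;
  let result = textToFind;
  // Right to left, so earlier positions are not shifted by the insertions.
  for (const { markdown, text, start } of [...links].reverse()) {
    const local = start - offset;
    if (local >= 0 && local + text.length <= textToFind.length) {
      result = result.slice(0, local) + markdown + result.slice(local + text.length);
    }
  }
  return result;
}

// Hypothetical paragraph/text pairs to process in a loop.
const jobs = [
  {
    paragraph: "Lorem [Ipsum](lalala) has been the industry's [standard](www.meh.com) dummy text ever since the 1500s.",
    textToFind: "has been the industry's standard dummy text",
  },
  {
    paragraph: 'It was scrambled to [make](hello?) a type [specimen](yes) book.',
    textToFind: 'a type specimen book.',
  },
];

const results = jobs.map(({ paragraph, textToFind }) =>
  translateText(textToFind, convertParagraph(paragraph))
);
console.log(results);
```

Links that fall outside the found span are simply skipped, so textToFind does not have to start at the beginning of the paragraph.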
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | lemon |