Python regex to print text from one pattern to another, but only if a specific string exists in between
So I have a file like this:
<html>
<div>
<h1>HOiihilasdl</h1>
</div>
<script src=https://example.com/file.js></script>
<script>
blabla
blabla
blabla
blabla
blabla
</script>
<script src=https://example.com/file.js></script>
<script>
blabla
blabla
cow
blabla
blabla
</script>
</html>
I want to print everything from <script> to </script>, but only if the word cow exists in between (I want to do this with a Python regex).
The output would look like this:
<script>
blabla
blabla
cow
blabla
blabla
</script>
I've searched through many answers, but I didn't find one that solves my problem.
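A minimal regex sketch of the main request, assuming the whole file fits in memory as one string: match each <script>...</script> block with a non-greedy pattern, then keep only the blocks that contain the word. The helper name script_blocks_containing is hypothetical, and the sample string is the file from the question.

```python
import re

# Sample input copied from the question's file.
html = """<html>
<div>
<h1>HOiihilasdl</h1>
</div>
<script src=https://example.com/file.js></script>
<script>
blabla
blabla
blabla
blabla
blabla
</script>
<script src=https://example.com/file.js></script>
<script>
blabla
blabla
cow
blabla
blabla
</script>
</html>"""

# The non-greedy .*? stops at the first </script> after each <script>.
# re.DOTALL lets "." match newlines, so one block can span several lines.
# The literal "<script>" does not match "<script src=...>" tags.
SCRIPT_BLOCK = re.compile(r"<script>.*?</script>", re.DOTALL)


def script_blocks_containing(text, word):
    """Return every <script>...</script> block whose body contains `word`."""
    return [block for block in SCRIPT_BLOCK.findall(text) if word in block]


for block in script_blocks_containing(html, "cow"):
    print(block)
```

Filtering after matching keeps the regex simple. Note that "word in block" is a plain substring test, so it would also accept "cowboy"; a word-boundary check such as re.search(r"\bcow\b", block) is stricter.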
I am also wondering whether it is possible to just return "script" if the word "cow" exists between <script> and </script>.
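For the yes/no variant, a single pattern can do the whole check. This sketch uses a "tempered" dot, (?:(?!</script>).)*?, which consumes any character except the start of a closing tag. That way a match that begins at one <script> can never reach a "cow" that sits in a later block, which a naive <script>.*?cow.*?</script> could. The function name tag_containing and its tag parameter are hypothetical.

```python
import re


def tag_containing(text, word, tag="script"):
    """Return `tag` if some <tag>...</tag> block contains `word`, else None."""
    open_tag = re.escape(f"<{tag}>")
    close_tag = re.escape(f"</{tag}>")
    # Tempered dot: match any character that does not start a closing tag,
    # so the match stays inside a single <tag>...</tag> block.
    body = f"(?:(?!{close_tag}).)*?"
    pattern = re.compile(
        open_tag + body + re.escape(word) + body + close_tag,
        re.DOTALL,  # let "." match newlines inside multi-line blocks
    )
    return tag if pattern.search(text) is not None else None


print(tag_containing("<script>\nblabla\ncow\n</script>", "cow"))  # script
```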
I'm using Python 3.10.4
Solution 1:[1]
I am not completely certain what you are going for here. If you only need to handle cases like the one in your question, one solution is to iterate through the file line by line and keep track of opening and closing tags. Whenever you meet an opening tag, you begin storing lines. If a pattern such as "cow" is not found before the next closing tag, the stored lines are discarded and the search starts over at the next opening tag.
Note: The solution below does not work for nested tags, but it can be adapted to handle them.
def find_pattern(file, pattern):
    lines = []
    start = False
    found_pattern = False
    with open(file, 'r') as f:
        # Iterate through the lines in the file
        for line in f:
            # Remove the trailing newline character
            line = line.rstrip("\n")
            # Remove the leading whitespace, so indented tags are recognized too
            stripped_line = line.lstrip()
            # If we meet the start of a tag such as <script>, keep track of the lines until we meet the end tag.
            # Lines like <script src=...></script> open and close at once, so they do not start tracking.
            if not start and stripped_line.startswith("<") and "</" not in line:
                start = True
            # We only append lines while we are keeping track
            if start:
                lines.append(line)
                # If we find the pattern, we set a flag to True
                if pattern in line:
                    found_pattern = True
                # If we meet an end tag, there are two possibilities:
                # if we found the pattern, we stop; otherwise, we reset and keep searching.
                if stripped_line.startswith("</"):
                    if found_pattern:
                        break
                    lines = []
                    start = False
    # Print only if the pattern was found, so an unfinished block without it is never printed
    if found_pattern:
        for line in lines:
            print(line)
find_pattern(file="t.txt", pattern="cow")
Output:
<script>
blabla
blabla
cow
blabla
blabla
</script>
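As for the note that this approach does not handle nested tags, one possible adaptation is to keep a stack of the line indices where tags open and check the enclosed lines whenever a tag closes. This is a sketch under the same assumption as the answer's code: every opening and closing tag sits on its own line. It also does not understand void elements such as <br>, which never close. The name find_pattern_nested is hypothetical, and it takes a list of lines instead of a filename.

```python
def find_pattern_nested(lines, pattern):
    """Return the first tag block (a list of lines) that closes and contains `pattern`.

    Inner tags close before outer ones, so the returned block is an innermost
    block containing the pattern. Returns [] if no block contains it.
    """
    open_lines = []  # stack of indices where currently open tags start
    for i, line in enumerate(lines):
        stripped = line.lstrip()
        if stripped.startswith("</"):
            # A closing tag ends the most recently opened block
            if open_lines:
                start = open_lines.pop()
                block = lines[start:i + 1]
                if any(pattern in text for text in block):
                    return block
        elif stripped.startswith("<") and "</" not in stripped:
            # An opening tag such as <script> or <div>; lines like
            # <script src=...></script> open and close at once and are skipped
            open_lines.append(i)
    return []
```

To run it on the answer's t.txt, pass the file's lines, for example open("t.txt").read().splitlines(), and print each line of the returned block.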
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |