Python: find the start index of a specific word number in a string

I have this string:

myString = "Tomorrow will be very very rainy"

I would like to get the start index of word number 5 (the second "very").

Currently, I split myString into words:

import re

words = re.findall(r'\w+|[^\s\w]+', myString)

But I am not sure how to get the start index of word number 5, which is words[4] because list indices start at 0.

Using str.index() does not work, because it returns the first occurrence (the first "very"):

start_index = myString.index(words[4])
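To see the problem concretely: str.index always reports the first match, so it returns 17 (the first "very") instead of 22. A minimal sketch of one workaround, not taken from the answers: keep the tokenizing pattern from the question but call re.finditer instead of re.findall. Each item it yields is a match object whose start() method gives the token's position, so repeated words are told apart. With this pattern, runs of punctuation count as separate tokens, so "word number" here really means token number; the name tokens is only illustrative.

```python
import re

myString = "Tomorrow will be very very rainy"

# str.index only ever reports the first match, so the second "very" is missed.
print(myString.index("very"))  # 17

# re.finditer yields Match objects, which remember where each token starts.
# The pattern is the one from the question: words, or runs of punctuation.
tokens = [(m.group(), m.start()) for m in re.finditer(r"\w+|[^\s\w]+", myString)]

word, start = tokens[4]  # word number 5 -> zero-based index 4
print(word, start)  # very 22
```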

Solution 1:^[1]

Not very elegant, but you can loop through the split words and compute the index from each word's length plus the separator that follows it (in this case a single space). This answer targets the fifth word in the sentence.

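myString = "Tomorrow will be very very rainy"
target_word = 5  # 1-based word number

split_string = myString.split()

# Skip over each preceding word and the single space that follows it
idx_start = 0
for i in range(target_word - 1):
    idx_start += len(split_string[i])
    if myString[idx_start] == " ":
        idx_start += 1

# The end index is exclusive, so myString[idx_start:idx_end] is exactly the word
idx_end = idx_start + len(split_string[target_word - 1])

print(idx_start, idx_end, myString[idx_start:idx_end])  # 22 26 very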
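The loop above skips at most one space after each word, so two spaces in a row throw the count off. A variant that tolerates any amount of whitespace, sketched here as an assumption rather than the answerer's code, searches for each word with str.index(word, pos), starting just past the end of the previous word. Because only whitespace lies between one word and the next, the first hit at or after pos is always the right occurrence, even for repeated words. The helper name word_span is hypothetical.

```python
def word_span(s, n):
    """Return (start, end) of the n-th (1-based) whitespace-separated word.

    end is exclusive, so s[start:end] is the word itself.
    """
    words = s.split()
    if not 1 <= n <= len(words):
        raise IndexError("word number out of range")
    pos = 0
    for word in words[:n]:
        # Search only from the end of the previous word, so repeated words
        # ("very very") resolve to the right occurrence and any run of
        # whitespace between words is skipped automatically.
        start = s.index(word, pos)
        pos = start + len(word)
    return start, pos


spaced = "Tomorrow will be very  very rainy"  # two spaces before the 2nd "very"
start, end = word_span(spaced, 5)
print(start, end, spaced[start:end])  # 23 27 very
```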

Solution 2:^[2]

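import re

myString = "Tomorrow will be very very rainy"
wordnum = 5  # 1-based; assumes wordnum >= 2 and no leading whitespace

# Each run of spaces ends exactly where the next word starts
l = [x.end() for x in re.finditer(" +", myString)]
pos = l[wordnum - 2]  # the (wordnum - 1)-th gap precedes word number wordnum
print(pos)

Output:

22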

Solution 3:^[3]

If there are only single spaces between words:

Sum the lengths of all words before the wanted word
Add the number of spaces (one after each of those words)

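myString = "Tomorrow will be very very rainy"
word_idx = 4  # zero-based index, i.e. the fifth word
words = myString.split()
# word lengths before the target + one space per preceding word
start_index = sum(len(word) for word in words[:word_idx]) + word_idx
print(start_index)

Result:

22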
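For the sample string, the four words before the target have lengths 8, 4, 2 and 4, which sum to 18; adding the 4 spaces gives 22. The shortcut silently returns a wrong index when the single-space assumption does not hold, so here is a minimal sketch, with the hypothetical name start_index_single_spaces, that checks the assumption first by re-joining the words with one space and comparing against the original string.

```python
def start_index_single_spaces(s, word_idx):
    """Zero-based word_idx -> start index, valid only for single-space text."""
    words = s.split()
    if not 0 <= word_idx < len(words):
        raise IndexError("word index out of range")
    # The shortcut assumes exactly one space between words and no leading
    # or trailing whitespace; re-joining with " " must reproduce s.
    if " ".join(words) != s:
        raise ValueError("words are not separated by single spaces")
    # lengths of the preceding words + one space after each of them
    return sum(len(word) for word in words[:word_idx]) + word_idx


print(start_index_single_spaces("Tomorrow will be very very rainy", 4))  # 22
```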

Solution 4:^[4]

If the string starts with at least 5 words, you can match the first 4 words and capture the fifth one.

Then call the start method of the Match Object with 1 to get the start index of the first capture group.

^(?:\w+\s+){4}(\w+)

Explanation

^ Start of string
(?:\w+\s+){4} Repeat 4 times: match 1+ word characters followed by 1+ whitespace characters
(\w+) Capture group 1, match 1+ word characters

Example

import re

myString = "Tomorrow will be very very rainy"
pattern = r"^(?:\w+\s+){4}(\w+)"
m = re.match(pattern, myString)
if m:
    print(m.start(1))

Output

22

For a broader match, you can use \S+ to match one or more non-whitespace characters.

pattern = r"^(?:\S+\s+){4}(\S+)"
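The quantifier {4} above is hardcoded for the fifth word. A minimal sketch that builds the same anchored pattern for any word number n, using the hypothetical helper name nth_word_start_regex, turns it into {n - 1}. Like the answer's pattern, it assumes the string does not start with whitespace, and the word sub-pattern must not contain capturing groups (otherwise group 1 would no longer be the target word).

```python
import re


def nth_word_start_regex(s, n, word=r"\S+"):
    """Start index of the n-th (1-based) word, found with one anchored regex.

    word is the sub-pattern for a single word, e.g. r"\S+" (default) or r"\w+";
    it must not contain capturing groups. Assumes s has no leading whitespace.
    """
    if n < 1:
        raise IndexError("word number must be >= 1")
    # Skip n - 1 words, each followed by whitespace, then capture the n-th.
    # In the f-string, {{{n - 1}}} renders as e.g. {4}, the answer's quantifier.
    pattern = rf"^(?:{word}\s+){{{n - 1}}}({word})"
    m = re.match(pattern, s)
    if m is None:
        raise IndexError("string has fewer than n matching words")
    return m.start(1)


myString = "Tomorrow will be very very rainy"
print(nth_word_start_regex(myString, 5))  # 22

# Punctuation attached to a word stops \w+ from matching but not \S+.
print(nth_word_start_regex("Hello, world! How are you", 3))  # 14
```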

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Milos Cuculovic
Solution 2
Solution 3	ivvija
Solution 4	The fourth bird
