Error "PowerIterationFailedConvergence: power iteration failed to converge within 100 iterations" when I tried to summarize a text document using python networkx
I got a PowerIterationFailedConvergence: (PowerIterationFailedConvergence(...), 'power iteration failed to converge within 100 iterations') when I tried to summarize a text document using Python networkx, as shown in the code below. The error is raised at the line scores = nx.pagerank(sentence_similarity_graph).
import re
import numpy as np
import networkx as nx
from nltk.corpus import stopwords
from nltk.cluster.util import cosine_distance

def read_article(file_name):
    file = open(file_name, "r", encoding="utf8")
    filedata = file.readlines()
    text = ""
    for s in filedata:
        text = text + s.replace("\n", "")
    text = re.sub(' +', ' ', text)  # collapse repeated spaces
    text = re.sub('—', ' ', text)
    article = text.split(". ")
    sentences = []
    for sentence in article:
        # print(sentence)
        sentences.append(sentence.replace("[^a-zA-Z]", "").split(" "))
    sentences.pop()
    # drop consecutive duplicate words (case-insensitive)
    new_sent = []
    for lst in sentences:
        newlst = []
        for i in range(len(lst)):
            # keep the first word and any word that differs from the one before it
            if i == 0 or lst[i].lower() != lst[i - 1].lower():
                newlst.append(lst[i])
        new_sent.append(newlst)
    return new_sent
def sentence_similarity(sent1, sent2, stopwords=None):
    if stopwords is None:
        stopwords = []
    sent1 = [w.lower() for w in sent1]
    sent2 = [w.lower() for w in sent2]
    all_words = list(set(sent1 + sent2))
    vector1 = [0] * len(all_words)
    vector2 = [0] * len(all_words)
    # build the vector for the first sentence
    for w in sent1:
        if w in stopwords:
            continue
        vector1[all_words.index(w)] += 1
    # build the vector for the second sentence
    for w in sent2:
        if w in stopwords:
            continue
        vector2[all_words.index(w)] += 1
    return 1 - cosine_distance(vector1, vector2)
def build_similarity_matrix(sentences, stop_words):
    # Create an empty similarity matrix
    similarity_matrix = np.zeros((len(sentences), len(sentences)))
    for idx1 in range(len(sentences)):
        for idx2 in range(len(sentences)):
            if idx1 == idx2:  # ignore if both are the same sentence
                continue
            similarity_matrix[idx1][idx2] = sentence_similarity(sentences[idx1], sentences[idx2], stop_words)
    return similarity_matrix
stop_words = stopwords.words('english')
summarize_text = []

# Step 1 - Read the text and split it into sentences
new_sent = read_article("C:\\Users\\Documents\\fedPressConference_0620.txt")

# Step 2 - Generate the similarity matrix across sentences
sentence_similarity_matrix = build_similarity_matrix(new_sent, stop_words)

# Step 3 - Rank sentences in the similarity matrix
sentence_similarity_graph = nx.from_numpy_array(sentence_similarity_matrix)
scores = nx.pagerank(sentence_similarity_graph)

# Step 4 - Sort the ranks and pick the top sentences
ranked_sentence = sorted(((scores[i], s) for i, s in enumerate(new_sent)), reverse=True)
print("Indexes of top ranked_sentence order are ", ranked_sentence)

for i in range(10):
    summarize_text.append(" ".join(ranked_sentence[i][1]))

# Step 5 - Of course, output the summarized text
print("Summarize Text: \n", ". ".join(summarize_text))
Solution 1:[1]
Maybe you have solved it by now.
The problem is that your vectors are too long. They are built over the entire vocabulary, which may be too large for the algorithm to converge within only 100 iterations (the default for pagerank).
You can either shorten the vocabulary (did you check that the stopwords are actually being removed?) or use another technique, such as dropping the least frequent words or switching to TF-IDF.
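For instance, here is a minimal sketch of building the similarity matrix with TF-IDF instead of raw count vectors. It assumes scikit-learn is installed; TfidfVectorizer, cosine_similarity, and the build_similarity_matrix_tfidf helper are illustrative names, not part of the original code:

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def build_similarity_matrix_tfidf(sentences, stop_words):
    # join the tokenized sentences back into strings for the vectorizer
    docs = [" ".join(tokens) for tokens in sentences]
    tfidf = TfidfVectorizer(stop_words=stop_words).fit_transform(docs)
    similarity_matrix = cosine_similarity(tfidf)
    np.fill_diagonal(similarity_matrix, 0)  # ignore self-similarity
    return similarity_matrix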
In my case, I faced the same problem, but with GloVe word embeddings. With 300 dimensions I couldn't get convergence, which was easily solved by switching to the 100-dimension model.
The other thing you could try is to raise the max_iter parameter when calling nx.pagerank:

nx.pagerank(nx_graph, max_iter=600)  # or any number that works for you

The default value is 100 iterations.
Solution 2:[2]
This error happens when the algorithm fails to converge to the specified tolerance within the allowed number of iterations of the power iteration method. You can increase the error tolerance in nx.pagerank() like this:

nx.pagerank(sentence_similarity_graph, tol=1.0e-3)

The default value is 1.0e-6.
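As a rough sketch combining both suggestions, you can also catch the failure and retry with more iterations and a looser tolerance (nx.PowerIterationFailedConvergence is the exception networkx raises here):

try:
    scores = nx.pagerank(sentence_similarity_graph)
except nx.PowerIterationFailedConvergence:
    # retry with more iterations and a looser tolerance
    scores = nx.pagerank(sentence_similarity_graph, max_iter=600, tol=1.0e-3)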
You can check the networkx documentation as well:
PageRank Networkx
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Afrooz Sheikholeslami |