spaCy library to extract noun phrases - ValueError: [E866] Expected a string or 'Doc' as input, but got: <class 'float'>

Currently I'm trying to extract noun phrases from sentences. The sentences are stored in a column of an Excel file. Here is my Python code:

import pandas as pd
import spacy

df = pd.read_excel("xxx.xlsx")

nlp = spacy.load("en_core_web_md")
for row in range(len(df)):
    doc = nlp(df.loc[row, "Title"])
    for np in doc.noun_chunks:
        print(np.text)

But I got this error:

Traceback (most recent call last):
  File "/Users/pusinov/PycharmProjects/textsummarizer/paper_term_extraction.py", line 10, in <module>
    doc = nlp(df.loc[row, "Title"])
  File "/Users/pusinov/PycharmProjects/textsummarizer/venv/lib/python3.9/site-packages/spacy/language.py", line 1002, in __call__
    doc = self._ensure_doc(text)
  File "/Users/pusinov/PycharmProjects/textsummarizer/venv/lib/python3.9/site-packages/spacy/language.py", line 1093, in _ensure_doc
    raise ValueError(Errors.E866.format(type=type(doc_like)))
ValueError: [E866] Expected a string or 'Doc' as input, but got: <class 'float'>.

Can anyone help me fix this code? Thank you very much.

P.S. I'm still a newbie in Python.
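The float in the error message usually comes from an empty cell: pandas reads empty Excel cells as NaN, and NaN is a float, which nlp() rejects. A minimal sketch that reproduces this without spaCy, assuming a hypothetical in-memory DataFrame in place of xxx.xlsx:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_excel("xxx.xlsx"): the second Title cell is empty
df = pd.DataFrame({"Title": ["Deep learning for text", np.nan]})

for row in range(len(df)):
    value = df.loc[row, "Title"]
    # An empty cell is stored as NaN, and NaN is a float, not a str
    print(row, type(value).__name__, value)
# 0 str Deep learning for text
# 1 float nan
```

Row 1 is exactly the kind of value that triggers E866 when it is passed to nlp().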



Solution 1:[1]

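I faced a similar issue and fixed it with:

df['Title'] = df['Title'].astype(str)

This fixes the problem because nlp() only accepts a string (or a Doc), so every value in the column has to be converted to str. The error usually appears when some cells contain a number or are empty (nan/null); pandas stores empty cells as NaN, which is a float. Note that astype(str) may turn those empty cells into the literal text 'nan', which spaCy will then process like any other word.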
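A minimal sketch of a slightly more defensive variant, assuming a hypothetical in-memory DataFrame instead of xxx.xlsx: calling fillna("") before astype(str) guarantees that missing cells become empty strings rather than 'nan', while numeric cells still become their text form.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_excel("xxx.xlsx"):
# one normal title, one empty cell (NaN) and one numeric cell
df = pd.DataFrame({"Title": ["Deep learning for text", np.nan, 2021]})

# Replace missing cells with "" first, then make every value a Python str,
# so nlp() never receives a float or an int
df["Title"] = df["Title"].fillna("").astype(str)

print(df["Title"].tolist())  # ['Deep learning for text', '', '2021']
```

After this step, nlp(df.loc[row, "Title"]) in the question's loop receives only strings.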

Solution 2:[2]

Do a null-value analysis first: if the Title column contains any null values, drop those rows before passing the text to nlp().
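A minimal sketch of that null-value analysis with pandas, again assuming a hypothetical in-memory DataFrame instead of xxx.xlsx: isna().sum() counts the missing titles and dropna(subset=["Title"]) removes those rows.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_excel("xxx.xlsx") with two empty Title cells
df = pd.DataFrame(
    {"Title": ["Deep learning for text", np.nan, "Graph neural networks", None]}
)

# Null-value analysis: count the missing titles
print(df["Title"].isna().sum())  # 2

# Drop rows with a missing Title; reset_index renumbers the rows 0..n-1 so the
# question's loop over range(len(df)) with df.loc[row, ...] still finds every label
clean = df.dropna(subset=["Title"]).reset_index(drop=True)

print(clean["Title"].tolist())  # ['Deep learning for text', 'Graph neural networks']
```

The reset_index(drop=True) call matters for the question's loop: without it the index keeps gaps (0, 2 here), and df.loc[1, "Title"] would raise a KeyError. Also note that dropna only removes missing values; a numeric title would still reach nlp() as an int or float, so combine this with astype(str) if the column can contain numbers.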

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Suraj Rao
Solution 2 Victor S