'Removing [' and '] from CSV

I have several GB of CSV files where values in one of the columns look like this:

URL in brackets

Which is a consequence of this:

urls.append(re.findall(r'http\S+', hashtags_rem))
...
merger = {'Content': clean, 'AttrURL': urls}
cleandf = pd.DataFrame(merger)
...
df.insert(3, "AssocURL", cleandf['AttrURL'])

It took me a while to generate these files and, looking back, I'd certainly write this part differently, but doing it again is a very time-consuming and simply unnecessary endeavour.

Is there another efficient way to remove [' and '] from this column using pandas or csv?



Solution 1:[1]

You can use pandas.DataFrame.apply to remove the squared parentheses. It should be something like this:

df.apply(lambda string: string[2:-2])

Solution 2:[2]

Not a very attractive answer, but how about just with .str.replace ?

df['AssocURL'].str.replace("\'","").str.replace("[","").str.replace("]","")

Solution 3:[3]

From the question it's unclear if the column is a string or if it contains elements which are themselves a list which contains a single string. re.findall returns the second option. If it is the second option eg.,

df = pd.DataFrame({'AssocURL': [['link1'], ['link2']]})
#   AssocURL
# 0  [link1]
# 1  [link2]

You can use explode:

df['AssocURL'] = df['AssocURL'].explode()
#   AssocURL
# 0    link1
# 1    link2

Solution 4:[4]

Super simple, just do:

df['AssocURL'].replace("['", '').replace("']", '')

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Daniel Weigel
Solution 3 Kraigolas
Solution 4 thenoob ofsome number of noobs