Category "text-segmentation"

Parsing HTML into sentences - how to handle tables/lists/headings/etc?

How do you go about parsing an HTML page with free text, lists, tables, headings, etc., into sentences? Take this Wikipedia page for example. There is/are: fr

Python: Cut off the last word of a sentence?

What's the best way to slice the last word from a block of text? I can think of splitting it into a list (by spaces), removing the last item, then reconcatenat
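
A minimal sketch of the split-and-rejoin idea from the excerpt above, next to a single right-hand split that avoids rebuilding the whole list. The function names `drop_last_word_join` and `drop_last_word_rsplit` are made up for illustration, and a "word" is assumed to be any run of non-whitespace characters:

```python
def drop_last_word_join(text: str) -> str:
    """Split on whitespace, drop the last item, and join the rest with single spaces."""
    # Note: this collapses runs of whitespace between words into one space.
    return " ".join(text.split()[:-1])


def drop_last_word_rsplit(text: str) -> str:
    """Split once from the right and keep everything before the last word."""
    # rsplit(None, 1) splits on whitespace at most once, starting from the right.
    parts = text.rsplit(None, 1)
    # With fewer than two parts there is no text left after removing the last word.
    return parts[0] if len(parts) == 2 else ""


if __name__ == "__main__":
    sample = "the quick brown fox"
    print(drop_last_word_join(sample))
    print(drop_last_word_rsplit(sample))
```

The two versions differ mainly in how they treat interior whitespace: the join version normalizes it, while the `rsplit` version leaves the text before the last word unchanged.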