Python and PowerPoint replace '<br><br>' becomes '_x000D_'

I'm creating a PowerPoint with python-pptx. My query result string contains the HTML '<br><br>', and I'm trying to replace it with '\n' like this:

TDPsFirst = "\n" + self.TxtStringFromSQLserver.replace('<br><br>', '\n')
TDPs = TDPsFirst.replace('<br>', '\n')
TipDPsText_run.text = TDPs

This results in the lines ending with '_x000D_'.

What am I doing wrong? How can I convert the '<br>' to returns?



Solution 1:[1]

This behavior is a little bit new, but is the expected behavior:
https://python-pptx.readthedocs.io/en/latest/api/text.html#pptx.text.text._Run.text

A run can only contain text. A line-break or paragraph boundary happens at a higher level. In particular, a line-break can only occur between runs, inside a paragraph. A paragraph "break" can only occur in a text-frame, between, well, paragraphs.

So depending on what you're trying to do, the solution may just be to make the assignment at the text-frame level rather than the run level as your variable-name TipDPsText_run suggests. Line-feed characters (\n) are accepted by TextFrame.text and are turned into paragraph boundaries.
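
As a minimal sketch of the text-frame level assignment, assuming the '<br>' tags are the only markup in the string: the helper name set_frame_text_from_html is made up for illustration, and text_frame stands for any object with a writable .text attribute, such as the text_frame of a python-pptx shape or placeholder.

```python
def set_frame_text_from_html(text_frame, html_text):
    """Convert <br> tags to line feeds and assign at the text-frame level.

    Hypothetical helper for illustration; not part of python-pptx.
    """
    # Same replacements as in the question: a double <br> and a single <br>
    # each become one line feed.
    text = html_text.replace("<br><br>", "\n").replace("<br>", "\n")
    # Assigning to the frame's .text (rather than a run's) is what lets
    # line feeds become paragraph boundaries, as described above.
    text_frame.text = text
    return text
```

With python-pptx this might be called as set_frame_text_from_html(shape.text_frame, query_result), where shape and query_result are placeholders for your own objects.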

That may not entirely solve the problem, but it may (I give it a 90% likelihood) and will at least change the question to one that can be solved.

UPDATE: After further review of the code, a newline by itself ("\x0A") is in fact accepted by Run.text and placed unchanged into the XML, where it probably looks pretty much like a line-break. This legacy courtesy does not extend to carriage-return ("\x0D"), which is rendered just as you see, as "_x000D_". This extra CR byte is in there because you're running on Windows. Accordingly, you may be able to work around this by making sure only "\x0A" reaches your text assignment, with no "\x0D" alongside it (in Python, "\n" is already "\x0A"). But I recommend the text-frame level assignment as the approach more consistent with PowerPoint behavior, where typing in a carriage-return creates a new paragraph.
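
To see where the "_x000D_" text comes from, here is a small illustration. It is not python-pptx's actual code; the helper name escaped_form is an assumption made for this sketch, and it only reproduces the _xHHHH_ pattern visible in the question. The sample string is hypothetical data.

```python
def escaped_form(ch):
    """Return the _xHHHH_ spelling of one character (illustration only)."""
    return "_x%04X_" % ord(ch)

# In Python source, "\n" and "\x0A" are the same one-character string.
assert "\n" == "\x0A"

# Carriage return is code point 0x0D, which matches the text on the slide.
print(escaped_form("\r"))  # _x000D_

# A quick check for stray carriage returns in a query result.
sample = "first line\r\nsecond line"  # hypothetical data
print("\r" in sample)  # True
```

If the check prints True for your query result, the carriage returns are already in the data before any '<br>' replacement happens.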

Solution 2:[2]

PowerPoint handles only 0x0a as a line break. When you use python-pptx to put more than one line into a single placeholder and the source text contains a 0x0d code, the output page shows a strange word, _x000D_, at the end of the line. So I made a simple filter to fix that problem.

This simple code replaces \r\n with a single \n.

def office_comp(usr_txt):
    u_items = usr_txt.splitlines()
    return '\n'.join(u_items)

Then use it with python-pptx like this:

new_slide.placeholders[p].text = office_comp(your_text_asis)
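
One thing to be aware of with this filter: str.splitlines() splits on more than \r\n. It also treats a lone \r, vertical tab (\v), form feed and a few other characters as line boundaries, and the join drops a trailing newline. If that matters (recent python-pptx versions document \v as a line break in assigned text), a narrower variant that only touches carriage returns might look like the sketch below. The name strip_cr is made up for illustration.

```python
def office_comp(usr_txt):
    # The filter from the answer above, repeated so this sketch runs alone.
    return '\n'.join(usr_txt.splitlines())

def strip_cr(usr_txt):
    """Turn CRLF and lone CR into LF, leaving everything else untouched."""
    return usr_txt.replace("\r\n", "\n").replace("\r", "\n")

sample = "a\r\nb\vc\n"
print(repr(office_comp(sample)))  # 'a\nb\nc'
print(repr(strip_cr(sample)))     # 'a\nb\x0bc\n'
```

Either version removes the 0x0d that ends up as _x000D_; the difference only shows when the text contains vertical tabs, other unusual separators, or a trailing newline you want to keep.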

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 richardec