PDFBox - How to change encoding from WinAnsiEncoding to Unicode?
I am trying to find a way to change the font encoding from WinAnsiEncoding to Unicode. I've tried setting the font like this:
PDDocument doc = new PDDocument();
PDPage page = new PDPage(PDRectangle.A4);
doc.addPage(page);
File unicodeFileLocation = new File(getServletContext().getRealPath("/lib/ARIALUNI.TTF"));
PDTrueTypeFont unicodeFont = PDTrueTypeFont.loadTTF(doc, unicodeFileLocation);
...
// Create Table using boxable API
BaseTable table = new BaseTable(yStart, yStartNewPage, bottomMargin, tableWidth, margin, doc, page, true, drawContent);
// Title Field
Row<PDPage> titleRow = table.createRow(rowHeight);
Cell<PDPage> cell = titleRow.createCell(30, "Title");
cell = titleRow.createCell(70, TitleText);
cell.setFont(unicodeFont);
table.draw();
For simple text this works fine and I can see the font change from Helvetica, but if the text contains characters that are not in WinAnsiEncoding (e.g., U+0083), the following exception is thrown:
java.lang.IllegalArgumentException: U+0083 is not available in this font's encoding: WinAnsiEncoding
    org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.encode(PDTrueTypeFont.java:371)
    org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:316)
    org.apache.pdfbox.pdmodel.font.PDFont.getStringWidth(PDFont.java:345)
    be.quodlibet.boxable.text.PipelineLayer.push(PipelineLayer.java:65)
    be.quodlibet.boxable.Paragraph.getLines(Paragraph.java:341)
    be.quodlibet.boxable.Paragraph.getHeight(Paragraph.java:465)
    be.quodlibet.boxable.Cell.getTextHeight(Cell.java:392)
    be.quodlibet.boxable.Cell.getCellHeight(Cell.java:367)
    be.quodlibet.boxable.Row.getHeight(Row.java:166)
    be.quodlibet.boxable.Table.isEndOfPage(Table.java:728)
    be.quodlibet.boxable.Table.drawRow(Table.java:224)
    be.quodlibet.boxable.Table.draw(Table.java:200)
    com.ssl.pew.controller.ExportPEW.processRequest(ExportPEW.java:498)
    com.ssl.pew.controller.ExportPEW.doPost(ExportPEW.java:792)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:648)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
    org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
When I check the font's encoding, it is always WinAnsiEncoding, which is not what I need.
Encoding encoding = unicodeFont.getEncoding();
String encodingName = encoding.getEncodingName();
This gives me WinAnsiEncoding. Is there any way to change this?
It seems to me that the exception is caused by WinAnsiEncoding, and that if I could change the encoding I might be able to solve this issue.
It seems that most people decided to move to iText, which is not an option for me.
Solution 1:[1]
The PDFBox FAQ says:
Font Handling
I’m getting java.lang.IllegalArgumentException: … is not available in this font’s encoding: WinAnsiEncoding
Check whether the character is available in WinAnsiEncoding by looking at the PDF Specification Appendix D. If not, but if it is available in this font (in windows, have a look with charmap.exe), then load the font with PDType0Font.load(), see also in the EmbeddedFonts.java example in the source code download.
This works for me, for example (ClassPathResource here comes from Spring):
PDType0Font.load(document, new ClassPathResource("fonts/OpenSans-Regular.ttf").getFile());
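The first step in the FAQ advice (checking whether a character is in WinAnsiEncoding at all) can be roughly approximated in plain Java before any PDFBox call is made. The sketch below assumes that Java's built-in windows-1252 charset is a close enough stand-in for PDF's WinAnsiEncoding, which is based on that code page but is not guaranteed to match it in every slot. The class and method names (WinAnsiCheck, fitsCp1252, firstUnencodable) are made up for illustration and are not part of PDFBox or boxable.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class WinAnsiCheck {
    // Assumption: windows-1252 approximates PDF WinAnsiEncoding; it is not an exact match.
    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** Returns true if every character of the text can be encoded in windows-1252. */
    static boolean fitsCp1252(String text) {
        CharsetEncoder encoder = CP1252.newEncoder();
        return encoder.canEncode(text);
    }

    /** Returns the first code point that windows-1252 cannot encode, or -1 if there is none. */
    static int firstUnencodable(String text) {
        CharsetEncoder encoder = CP1252.newEncoder();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            String single = new String(Character.toChars(codePoint));
            if (!encoder.canEncode(single)) {
                return codePoint;
            }
            i += Character.charCount(codePoint);
        }
        return -1;
    }

    public static void main(String[] args) {
        // Same kind of input as in the question: plain text plus U+0083.
        String sample = "Title \u0083";
        int codePoint = firstUnencodable(sample);
        System.out.println(codePoint < 0
                ? "all characters fit windows-1252"
                : String.format("U+%04X does not fit windows-1252", codePoint));
    }
}
```

If a check like this reports a character that does not fit, the FAQ's second step applies: confirm the glyph exists in the font file and load the font with PDType0Font.load() instead of PDTrueTypeFont.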
Solution 2:[2]
Try this:
PDFont font = PDTrueTypeFont.load(document, new File(fontPath), WinAnsiEncoding.INSTANCE);
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Community |
| Solution 2 | kautilya hari |