How to extract rotated images from PDF with iText
I need to extract images from a PDF. I know that some of the images are rotated by 90 degrees (I checked with online tools).
I'm using this code:
PdfRenderListener:
public class PdfRenderListener : IExtRenderListener
{
// other methods ...
public void RenderImage(ImageRenderInfo renderInfo)
{
try
{
var mtx = renderInfo.GetImageCTM();
var image = renderInfo.GetImage();
var fillColor = renderInfo.GetCurrentFillColor();
var color = Color.FromArgb(fillColor?.RGB ?? Color.Empty.ToArgb());
var fileType = image.GetFileType();
var extension = "." + fileType;
var bytes = image.GetImageAsBytes();
var height = mtx[Matrix.I22];
var width = mtx[Matrix.I11];
// rotated image
if (height == 0 && width == 0)
{
var h = Math.Abs(mtx[Matrix.I12]);
var w = Math.Abs(mtx[Matrix.I21]);
}
// save image
}
catch (Exception e)
{
Console.WriteLine(e);
}
}
}
When I save images with this code, the rotated ones come out distorted.
I have read the post "iText 7 ImageRenderInfo Matrix contains negative height on Even number Pages" and mkl's answer to it.
The current transformation matrix (mtx) has these values:
0 | 841.9 | 0 |
-595.1 | 0 | 0 |
595.1 | 0 | 1 |
I know the image is rotated 90 degrees. How can I transform it to get a normal, upright image?
Solution 1:[1]
As @mkl pointed out, the real cause of the distortion was not the rotation of the image but the filter applied to it.
I analyzed the PDF file with iText RUPS and found that the image was encoded with the CCITTFaxDecode filter.
Next, I looked for ways to decode this filter and found these questions:
- Extracting image from PDF with /CCITTFaxDecode filter.
- How to use Bit Miracle LibTiff.Net to write the image to a MemoryStream
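I used the BitMiracle.LibTiff.NET library and wrote this method, which wraps the raw CCITT data in an in-memory TIFF so that it can be decoded as a regular TIFF image: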
private byte[] DecodeInternal(byte[] rawBytes, int width, int height, int k, int bitsPerComponent)
{
var compression = GetCompression(k);
using var ms = new MemoryStream();
var tms = new TiffStream();
using var tiff = Tiff.ClientOpen("in-memory", "w", ms, tms);
tiff.SetField(TiffTag.IMAGEWIDTH, width);
tiff.SetField(TiffTag.IMAGELENGTH, height);
tiff.SetField(TiffTag.COMPRESSION, compression);
tiff.SetField(TiffTag.BITSPERSAMPLE, bitsPerComponent);
tiff.SetField(TiffTag.SAMPLESPERPIXEL, 1);
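// Write the still-compressed CCITT data as a single raw strip;
// the COMPRESSION tag set above tells a TIFF reader how to decode it.
var writeResult = tiff.WriteRawStrip(0, rawBytes, rawBytes.Length);
if (writeResult == -1)
{
Console.WriteLine("Decoding error");
}
// Write the directory so the stream holds a readable TIFF before it is copied out.
tiff.CheckpointDirectory();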
var decodedBytes = ms.ToArray();
tiff.Close();
return decodedBytes;
}
private Compression GetCompression(int k)
{
return k switch
{
< 0 => Compression.CCITTFAX4,
0 => Compression.CCITTFAX3,
_ => throw new NotImplementedException("K > 0"),
};
}
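The K > 0 branch of GetCompression is left unimplemented. In the PDF specification, K < 0 means Group 4, K = 0 means one-dimensional Group 3, and K > 0 means mixed one- and two-dimensional Group 3. TIFF expresses the last case as Group 3 compression plus bit 0 of the T4Options tag. Below is a minimal sketch of the full mapping, written in Java (the language of Solution 2). CcittTiffParams and its methods are hypothetical names, and whether a particular TIFF library decodes a K > 0 stream wrapped this way is an assumption that has not been verified here.

```java
// Sketch: map the PDF /K decode parameter to TIFF tag values.
// Class and method names are hypothetical; the numeric values are the
// TIFF 6.0 definitions for the Compression (259) and T4Options (292) tags.
final class CcittTiffParams {
    static final int COMPRESSION_CCITT_T4 = 3; // CCITT Group 3
    static final int COMPRESSION_CCITT_T6 = 4; // CCITT Group 4
    static final int T4OPTIONS_2D = 1;         // bit 0 of T4Options: 2-D coding

    /** TIFF Compression tag value for a given /K. */
    static int compressionFor(int k) {
        return k < 0 ? COMPRESSION_CCITT_T6 : COMPRESSION_CCITT_T4;
    }

    /** TIFF T4Options value for a given /K (only meaningful for Group 3). */
    static int t4OptionsFor(int k) {
        return k > 0 ? T4OPTIONS_2D : 0;
    }
}
```

With this mapping, K = -1 gives Compression 4, while K = 4 gives Compression 3 with T4Options 1.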
After decoding and rotating the image, I was able to save a normal image. Thanks everyone for the help.
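For the rotation step, the angle can be read from the image CTM instead of being hard-coded. The following is a minimal sketch in Java, assuming the usual PDF matrix layout in which a, b, c, d correspond to Matrix.I11, I12, I21, I22 from the question. CtmInfo and its method names are hypothetical and not part of iText.

```java
// Sketch: recover rotation and displayed size from an image CTM.
// a, b, c, d correspond to Matrix.I11, I12, I21, I22 in the question.
final class CtmInfo {
    /** Counter-clockwise rotation of the image's width edge, in degrees. */
    static double rotationDegrees(double a, double b) {
        return Math.toDegrees(Math.atan2(b, a));
    }

    /** Length of the image's width edge on the page, in user-space units. */
    static double scaledWidth(double a, double b) {
        return Math.hypot(a, b);
    }

    /** Length of the image's height edge on the page, in user-space units. */
    static double scaledHeight(double c, double d) {
        return Math.hypot(c, d);
    }
}
```

For the matrix in the question, rotationDegrees(0, 841.9) is 90, scaledWidth(0, 841.9) is 841.9 and scaledHeight(-595.1, 0) is 595.1. These are sizes on the page, not pixel dimensions. The pixel dimensions still come from the image object itself.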
Solution 2:[2]
You can try this. I'm using iText 7 for Java. You still need to write your own listener:
public class MyImageRenderListener implements IEventListener {
protected String path;
protected String extension;
public MyImageRenderListener (String path) {
this.path = path;
}
public void eventOccurred(IEventData data, EventType type) {
switch (type) {
case RENDER_IMAGE:
try {
ImageRenderInfo renderInfo = (ImageRenderInfo) data;
PdfImageXObject image = renderInfo.getImage();
if (image == null) {
return;
}
// true = return the image bytes with the stream filters already applied
byte[] imageByte = image.getImageBytes(true);
extension = image.identifyImageFileExtension();
// path is used as a format pattern with two placeholders: object number, then extension
String filename = String.format(path, image.getPdfObject().getIndirectReference().getObjNumber(), extension);
// try-with-resources closes the stream even if the write fails
try (FileOutputStream os = new FileOutputStream(filename)) {
os.write(imageByte);
}
} catch (com.itextpdf.io.exceptions.IOException | IOException e) {
System.out.println(e.getMessage());
}
break;
default:
break;
}
}
public Set<EventType> getSupportedEvents() {
return null; // null tells iText that this listener accepts every event type
}
}
I tested this with PDFs whose images were rotated by an arbitrary angle and by 90 degrees. In both cases the extracted picture came out without distortion. The listener is driven like this:
public void manipulatePdf() throws IOException {
PdfDocument pdfDoc = new PdfDocument(new PdfReader("path to pdf"), new PdfWriter(new ByteArrayOutputStream()));
MyImageRenderListener listener = new MyImageRenderListener("path to resulting image");
PdfCanvasProcessor parser = new PdfCanvasProcessor(listener);
for (int i = 1; i <= pdfDoc.getNumberOfPages(); i++) {
parser.processPageContent(pdfDoc.getPage(i));
}
pdfDoc.close();
}
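Note that the string passed to MyImageRenderListener is used as a String.format pattern, not as a literal file path: the listener fills in the object number and the detected extension. Here is a small sketch of that naming step on its own. ImageFileNames and the pattern used in the example are hypothetical.

```java
// Sketch: how the listener's "path" pattern turns into a file name.
final class ImageFileNames {
    /** pattern needs two placeholders: object number first, then extension. */
    static String fileNameFor(String pattern, int objectNumber, String extension) {
        return String.format(pattern, objectNumber, extension);
    }
}
```

For example, ImageFileNames.fileNameFor("out/image-%s.%s", 12, "png") returns "out/image-12.png". Each image XObject therefore gets its own file, and the same object drawn several times is simply overwritten.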
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Zhenya Prudnikov |