How can I check if a PDF page is an image (scanned) using PDFBox or XPDF?
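PDFBox problem extracting images: Hi, how can I check whether a PDF page is an image and extract it with the PDFBox library? There is a method to get images, but if the PDF page itself is an image, I do not get it. Can anyone help me solve this problem?
XPDF problem extracting images: I tried to extract images with the XPDF library, but there is a strange flip on the page if it is an image. If the PDF contains a small image object, it gives me the image fine, but if the page is scanned, the result is flipped.
I want to extract the images from a PDF: if a page is scanned, extract it as an image; if a page contains plain text and images, extract the images on that page.
My goal is to extract the images from the PDF, not to render whole pages; but if a page is itself an image, I want to extract it as an image rather than skip it. How can I do that with PDFBox?
XPDF does this, but there is a problem: scanned pages come out flipped (top, right) when I export them.
How can I solve this problem? Thanks.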
```java
PDDocument document = PDDocument.load(new File("/home/dru/ideaprojects2/pdfextractor/test/t1.pdf"));
PDPageTree list = document.getPages();
for (PDPage page : list) {
    PDResources pdResources = page.getResources();
    System.out.println(pdResources.getResourceCache());
    // only looks at XObjects directly in the page resources
    for (COSName c : pdResources.getXObjectNames()) {
        PDXObject o = pdResources.getXObject(c);
        if (o instanceof org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject) {
            File file = new File("/home/dru/ideaprojects2/pdfextractor/test/out/" + System.nanoTime() + ".png");
            ImageIO.write(((org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject) o).getImage(), "png", file);
        }
    }
}
```
Extract images properly
As the updated PDF makes clear, the problem is that it does not have any images immediately on the page; instead, it has form XObjects drawn onto it which contain the images. Thus, the image search has to recurse into the form XObjects.
And that is not all: the pages in the updated PDF all share the same resources dictionary and merely pick a different one of its form XObjects to display. Thus, one really has to parse the respective page content stream to determine which XObject (with which images) is present on a given page.
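The recursion into form XObjects on its own can be sketched as follows, assuming the PDFBox 2.x API; the class and method names `XObjectImageWalker` and `collectImages` are hypothetical. As the paragraph above points out, this only finds the images reachable from a resources dictionary, not the ones actually drawn on a page, so with shared resources it reports the same images for every page.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

// Hypothetical helper: collects image XObjects reachable from a resources dictionary,
// recursing into form XObjects. It does NOT parse content streams, so it cannot tell
// which of these images a given page actually draws.
public class XObjectImageWalker {

    public static List<PDImageXObject> collectImages(PDResources resources) throws IOException {
        List<PDImageXObject> images = new ArrayList<>();
        // identity set of already visited COS objects (resource dictionaries and image streams)
        Set<Object> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        walk(resources, images, visited);
        return images;
    }

    private static void walk(PDResources resources, List<PDImageXObject> images, Set<Object> visited)
            throws IOException {
        // skip missing resources and resource dictionaries seen before (guards against cycles)
        if (resources == null || !visited.add(resources.getCOSObject())) {
            return;
        }
        for (COSName name : resources.getXObjectNames()) {
            PDXObject xobject = resources.getXObject(name);
            if (xobject instanceof PDImageXObject) {
                // report each image stream only once, even if several forms reference it
                if (visited.add(xobject.getCOSObject())) {
                    images.add((PDImageXObject) xobject);
                }
            } else if (xobject instanceof PDFormXObject) {
                walk(((PDFormXObject) xobject).getResources(), images, visited);
            }
        }
    }
}
```

Calling `XObjectImageWalker.collectImages(page.getResources())` for a page of the sample file would list the images in all of the shared form XObjects, which is why the content stream has to be parsed instead.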
Actually, this is what the PDFBox tool ExtractImages does. Unfortunately, though, it does not show on which page it found the image in question, cf. the ExtractImages.java test method testExtractPageImagesTool10948New.
But we can borrow the technique used by that tool:
```java
try (PDDocument document = PDDocument.load(resource)) {
    int page = 1;
    for (final PDPage pdPage : document.getPages()) {
        final int currentPage = page;
        PDFGraphicsStreamEngine pdfGraphicsStreamEngine = new PDFGraphicsStreamEngine(pdPage) {
            int index = 0;

            @Override
            public void drawImage(PDImage pdImage) throws IOException {
                if (pdImage instanceof PDImageXObject) {
                    PDImageXObject image = (PDImageXObject) pdImage;
                    File file = new File(RESULT_FOLDER, String.format("10948-new-engine-%s-%s.%s", currentPage, index, image.getSuffix()));
                    // close the output stream once the image has been written
                    try (FileOutputStream out = new FileOutputStream(file)) {
                        ImageIOUtil.writeImage(image.getImage(), image.getSuffix(), out);
                    }
                    index++;
                }
            }

            // Only images are of interest; path construction and painting operations are ignored.
            @Override public void appendRectangle(Point2D p0, Point2D p1, Point2D p2, Point2D p3) throws IOException { }
            @Override public void clip(int windingRule) throws IOException { }
            @Override public void moveTo(float x, float y) throws IOException { }
            @Override public void lineTo(float x, float y) throws IOException { }
            @Override public void curveTo(float x1, float y1, float x2, float y2, float x3, float y3) throws IOException { }
            @Override public Point2D getCurrentPoint() throws IOException { return null; }
            @Override public void closePath() throws IOException { }
            @Override public void endPath() throws IOException { }
            @Override public void strokePath() throws IOException { }
            @Override public void fillPath(int windingRule) throws IOException { }
            @Override public void fillAndStrokePath(int windingRule) throws IOException { }
            @Override public void shadingFill(COSName shadingName) throws IOException { }
        };
        pdfGraphicsStreamEngine.processPage(pdPage);
        page++;
    }
}
```
(ExtractImages.java test method testExtractPageImages10948New)
This code outputs images with the file names "10948-new-engine-1-0.tiff", "10948-new-engine-2-0.tiff", "10948-new-engine-3-0.tiff", and "10948-new-engine-4-0.tiff", i.e. one per page.
PS: Please remember to include com.github.jai-imageio:jai-imageio-core in your classpath; it is required for the TIFF output.
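For the question in the title (is a page a scan?), one rough heuristic, not part of the answer above, is to check in `drawImage` how much of the page the drawn image covers. The sketch below is pure JDK; the class name `ScanHeuristic` and the 90% threshold are hypothetical choices. It uses the image's bounding box, so rotated or sheared placements are overestimated, clipping is ignored, and scans split into several strips would not be recognized.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

// Hypothetical heuristic: a page "looks scanned" if one image covers most of it.
public class ScanHeuristic {
    // Arbitrary cut-off: at least 90% of the page box must be covered.
    static final double THRESHOLD = 0.9;

    /**
     * Fraction of the page box covered by the bounding box of an image drawn with the given CTM.
     * In image space the image fills the unit square; the CTM maps that square onto the page.
     */
    public static double coverage(AffineTransform ctm, Rectangle2D pageBox) {
        Rectangle2D drawn = ctm.createTransformedShape(new Rectangle2D.Double(0, 0, 1, 1)).getBounds2D();
        Rectangle2D visible = drawn.createIntersection(pageBox);
        if (visible.isEmpty()) {
            return 0.0; // the image lies completely outside the page box
        }
        return (visible.getWidth() * visible.getHeight()) / (pageBox.getWidth() * pageBox.getHeight());
    }

    public static boolean looksScanned(AffineTransform ctm, Rectangle2D pageBox) {
        return coverage(ctm, pageBox) >= THRESHOLD;
    }
}
```

Inside `drawImage` one could pass `getGraphicsState().getCurrentTransformationMatrix().createAffineTransform()` together with a `Rectangle2D` built from the lower-left corner, width, and height of `pdPage.getCropBox()`. Because only the bounding box matters, a vertically mirrored full-page image counts as covering the page just like an upright one.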
Flipped images
Another issue of the OP is that images appear flipped upside-down, e.g. in the case of the newest sample file "t1_edited.pdf". The reason is that those images are indeed stored upside-down as image resources in the PDF.
When the images are drawn onto the page, the current transformation matrix in effect at that time mirrors the image vertically and so creates the expected appearance.
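To see why the sign of the matrix's vertical scale decides the flip, here is a small pure-JDK sketch (the class name `CtmFlipDemo` is hypothetical). It relies on two facts: the six-argument `java.awt.geom.AffineTransform` constructor takes its entries in the same order as a PDF matrix `[a b c d e f]`, and in PDF image space the first (top) row of image samples is mapped to y = 1 of the unit square.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

public class CtmFlipDemo {
    // PDF image space: the image fills the unit square; its first (top) row of
    // samples is mapped to y = 1 and its last (bottom) row to y = 0.
    static double topRowY(AffineTransform ctm) {
        return ctm.transform(new Point2D.Double(0, 1), null).getY();
    }

    static double bottomRowY(AffineTransform ctm) {
        return ctm.transform(new Point2D.Double(0, 0), null).getY();
    }

    public static void main(String[] args) {
        // Upright placement of a 200 x 100 image at (50, 50): [200 0 0 100 50 50]
        AffineTransform upright = new AffineTransform(200, 0, 0, 100, 50, 50);
        // Same page area, but with a negative vertical scale: [200 0 0 -100 50 150]
        AffineTransform mirrored = new AffineTransform(200, 0, 0, -100, 50, 150);

        // PDF user space has y pointing up, so a larger y means higher on the page.
        System.out.println("upright:  top row at y=" + topRowY(upright) + ", bottom row at y=" + bottomRowY(upright));
        System.out.println("mirrored: top row at y=" + topRowY(mirrored) + ", bottom row at y=" + bottomRowY(mirrored));
    }
}
```

With the mirrored matrix the stored top row lands at y = 50, below the stored bottom row at y = 150, so an image stored upside-down appears upright on the page, while an extractor that saves the raw image data shows it upside-down.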
By enhancing the drawImage implementation in the code above, one can include indicators of such flips in the names of the exported images:
```java
@Override
public void drawImage(PDImage pdImage) throws IOException {
    if (pdImage instanceof PDImageXObject) {
        // negative scale factors in the current transformation matrix mean the image is mirrored
        Matrix ctm = getGraphicsState().getCurrentTransformationMatrix();
        String flips = "";
        if (ctm.getScaleX() < 0)
            flips += "h";
        if (ctm.getScaleY() < 0)
            flips += "v";
        if (flips.length() > 0)
            flips = "-" + flips;
        PDImageXObject image = (PDImageXObject) pdImage;
        File file = new File(RESULT_FOLDER, String.format("t1_edited-engine-%s-%s%s.%s", currentPage, index, flips, image.getSuffix()));
        try (FileOutputStream out = new FileOutputStream(file)) {
            ImageIOUtil.writeImage(image.getImage(), image.getSuffix(), out);
        }
        index++;
    }
}
```
Now vertically or horizontally flipped images are marked accordingly.
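If the exported files should look the way the images appear on the page rather than how they are stored, one could mirror the extracted `BufferedImage` before writing it. The following is a minimal pure-JDK sketch; `ImageUnflipper` and `mirror` are hypothetical names, and it only makes sense for axis-aligned placements (no rotation or shear), which is also all the "h"/"v" check above distinguishes.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper that undoes the flips signalled by negative CTM scale factors.
public class ImageUnflipper {

    /** Returns a copy of src mirrored horizontally and/or vertically. */
    public static BufferedImage mirror(BufferedImage src, boolean horizontal, boolean vertical) {
        int w = src.getWidth();
        int h = src.getHeight();
        // src.getType() may be TYPE_CUSTOM, which cannot be instantiated, so pick a standard type;
        // keep an alpha channel only if the source has one (some writers, e.g. JPEG, reject alpha)
        int type = src.getColorModel().hasAlpha() ? BufferedImage.TYPE_INT_ARGB : BufferedImage.TYPE_INT_RGB;
        BufferedImage dst = new BufferedImage(w, h, type);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int sx = horizontal ? w - 1 - x : x;
                int sy = vertical ? h - 1 - y : y;
                dst.setRGB(x, y, src.getRGB(sx, sy));
            }
        }
        return dst;
    }
}
```

In the enhanced `drawImage`, one would then write `ImageUnflipper.mirror(image.getImage(), ctm.getScaleX() < 0, ctm.getScaleY() < 0)` instead of `image.getImage()`. Note that mirroring converts the image to 8-bit RGB(A), so the original color depth is not preserved.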