Tag: PDFBOX

Converting PDF to SVG: I want to convert PDF to SVG. Please suggest some libraries/executables that can do this efficiently. I have written my own Java program using the Apache PDFBox and Batik libraries – PDDocument document = PDDocument.load( pdfFile ); DOMImplementation domImpl = GenericDOMImplementation.getDOMImplementation(); // Create an instance of org.w3c.dom.Document. String svgNS = "http://www.w3.org/2000/svg"; Document svgDocument = domImpl.createDocument(svgNS, "svg", null); SVGGeneratorContext ctx = SVGGeneratorContext.createDefault(svgDocument); ctx.setEmbeddedFontsOn(true); // Ask the test to render into the SVG Graphics2D implementation. for(int i = 0 ; i < document.getNumberOfPages() ; […]

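PDF: finding out whether text is underlined or in a table cell: I have been experimenting with PdfBox and the PDFTextStripperByArea class. I can extract whether text is bold or italic, but I cannot get underline information. As far as I know, underlining in a PDF is done by drawing lines, so in theory I should be able to get some information about the lines around the text. With that information I could work out whether text is underlined or part of a table. Here is my code so far: List<TextPosition> textPos = charactersByArticle.get(index); for (TextPosition t : textPos) { if (t.getFont().getFontDescriptor() != null) { if (t.getFont().getFontDescriptor().getFontWeight() > BOLD_WEIGHT || t.getFont().getFontDescriptor().isForceBold()) { isBold = true; } if (t.getFont().getFontDescriptor().isItalic()) { isItalic = true; } } } I tried inspecting the PDGraphicsState object in the processEncodedText method of the PDFStreamEngine class, but found no line information in it. Any suggestions on where this information can be retrieved from?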
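One way to approach this, assuming PDFBox 2.x: subclass PDFGraphicsStreamEngine to record the coordinates passed to moveTo/lineTo and appendRectangle (underlines are often drawn as thin filled rectangles rather than stroked lines), then compare those shapes with each TextPosition. The helper below sketches only the geometric comparison. The class name, the search window of a quarter of the font size and the 80% coverage threshold are illustrative assumptions, and both inputs must first be brought into the same coordinate system (TextPosition's ...DirAdj values use a top-left origin, while path coordinates are in PDF user space with a bottom-left origin).

```java
import java.util.List;

/** Hypothetical helper: decides whether a run of text sits on top of a horizontal line. */
public final class UnderlineDetector {

    /** A horizontal line segment in page space (y grows upward, as in PDF user space). */
    public static final class Segment {
        final double x0, x1, y;

        public Segment(double x0, double x1, double y) {
            this.x0 = Math.min(x0, x1);
            this.x1 = Math.max(x0, x1);
            this.y = y;
        }
    }

    /**
     * @param runX0     left edge of the text run
     * @param runX1     right edge of the text run
     * @param baselineY baseline of the run, in the same coordinate system as the segments
     * @param fontSize  used to scale the vertical search window
     */
    public static boolean isUnderlined(double runX0, double runX1, double baselineY,
                                       double fontSize, List<Segment> segments) {
        double width = runX1 - runX0;
        if (width <= 0) {
            return false;
        }
        // Underlines usually sit slightly below the baseline: search a window reaching
        // a quarter of the font size below it, with a little slack above it.
        double lowest = baselineY - 0.25 * fontSize;
        double highest = baselineY + 0.05 * fontSize;
        for (Segment s : segments) {
            if (s.y < lowest || s.y > highest) {
                continue;
            }
            double overlap = Math.min(runX1, s.x1) - Math.max(runX0, s.x0);
            // Require the line to cover most of the run so a stray rule does not count.
            if (overlap >= 0.8 * width) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Segment> lines = List.of(new Segment(100, 160, 498.5));
        System.out.println(isUnderlined(100, 150, 500, 12, lines)); // true
        System.out.println(isUnderlined(100, 150, 600, 12, lines)); // false
    }
}
```

Table borders would pass the same test, so telling an underline from a cell border needs extra rules, for example checking whether the line extends well past the text or is joined by vertical lines.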

How to merge two PDF files into one in Java?: I want to merge many PDF files into one using PDFBox, and this is what I did: PDDocument document = new PDDocument(); for (String pdfFile: pdfFiles) { PDDocument part = PDDocument.load(pdfFile); List<PDPage> list = part.getDocumentCatalog().getAllPages(); for (PDPage page: list) { document.addPage(page); } part.close(); } document.save("merged.pdf"); document.close(); where pdfFiles is an ArrayList<String> containing all the PDF files. When I run the above, I always get: org.apache.pdfbox.exceptions.COSVisitorException: Bad file descriptor Am I doing something wrong? Is there another way?

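How to insert a PDPage inside another PDPage with pdfbox: I use various tools, such as Processing, to create vector plots. These plots are written as single-page or multi-page PDF files. I want to use pdfbox to include these plots in a single report. My current workflow includes these PDFs as images, following this pseudocode: PDDocument inFile = PDDocument.load(file); PDPage firstPage = (PDPage) inFile.getDocumentCatalog().getAllPages().get(0); BufferedImage image = firstPage.convertToImage(BufferedImage.TYPE_INT_RGB, 300); PDXObjectImage ximage = new PDPixelMap(document, image); PDPageContentStream contentStream = new PDPageContentStream(document, page); contentStream.drawXObject(ximage, 0, 0, ximage.getWidth(), ximage.getHeight()); contentStream.close(); Although this works, it gives up the benefits of the vector format, in particular file size and print quality. Is it possible with pdfbox to include other PDF pages inside a page as embedded objects (rather than adding them as separate pages)? Can I use PDStream? I would prefer a solution like pdflatex's ability to embed PDF figures into a new PDF document. Which Java libraries would you recommend for this task?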
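In PDFBox 2.x, LayerUtility.importPageAsForm(sourceDocument, pageIndex) returns a PDFormXObject that can be drawn onto another page with PDPageContentStream.drawForm, which keeps the plot as vectors instead of rasterizing it. What remains is positioning. The helper below (FitTransform is a hypothetical name; it uses only java.awt.geom) computes a transform that scales a source box, such as the form's bounding box, uniformly into a target rectangle and centres it there.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

public final class FitTransform {

    /**
     * Returns a transform that scales the source box uniformly so it fits inside
     * the target box, centred, without distorting the aspect ratio.
     */
    public static AffineTransform fit(double srcX, double srcY, double srcW, double srcH,
                                      double dstX, double dstY, double dstW, double dstH) {
        double scale = Math.min(dstW / srcW, dstH / srcH);
        // Centre the scaled box inside the target box.
        double offsetX = dstX + (dstW - srcW * scale) / 2;
        double offsetY = dstY + (dstH - srcH * scale) / 2;
        AffineTransform at = new AffineTransform();
        at.translate(offsetX, offsetY);
        at.scale(scale, scale);
        // Applied first to each point: move the source box's lower-left corner to the origin.
        at.translate(-srcX, -srcY);
        return at;
    }

    public static void main(String[] args) {
        // A US-Letter plot (612 x 792 pt) placed into a 300 x 300 pt slot at (50, 400).
        AffineTransform at = fit(0, 0, 612, 792, 50, 400, 300, 300);
        Point2D lowerLeft = at.transform(new Point2D.Double(0, 0), null);
        Point2D upperRight = at.transform(new Point2D.Double(612, 792), null);
        System.out.println(lowerLeft + " " + upperRight);
    }
}
```

Assuming PDFBox 2.x, the result would then be applied roughly as contentStream.saveGraphicsState(); contentStream.transform(new Matrix(at)); contentStream.drawForm(form); contentStream.restoreGraphicsState();, with the source values taken from form.getBBox(). Saving and restoring the graphics state keeps the scaling from leaking into later drawing on the same page.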

Writing a UTF-8 encoded string to a PDF with PDFBox: I cannot write Unicode characters to a PDF with PDFBox. Here is some sample code that produces garbage characters instead of outputting "š". What can I add to get support for UTF-8 strings? PDDocument document = new PDDocument(); PDPage page = new PDPage(); document.addPage(page); PDPageContentStream contentStream = new PDPageContentStream(document, page); PDType1Font font = PDType1Font.HELVETICA; contentStream.setFont(font, 12); contentStream.beginText(); contentStream.moveTextPositionByAmount(100, 400); contentStream.drawString("š"); contentStream.endText(); contentStream.close(); document.save("test.pdf"); document.close();
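Standard 14 fonts such as PDType1Font.HELVETICA can only show characters covered by their single-byte encoding (WinAnsiEncoding for Helvetica). A common approach in PDFBox 2.x is to keep Helvetica for text that fits and otherwise embed a TrueType font with PDType0Font.load(document, fontFile) and draw with showText. The hypothetical helper below approximates the WinAnsi test with Java's windows-1252 charset, which WinAnsiEncoding closely follows; it is a pre-check under that assumption, not PDFBox's own logic.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public final class WinAnsiCheck {

    // Java's windows-1252 charset, used here as an approximation of PDF's WinAnsiEncoding.
    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** Returns true if every character of the text can be encoded in windows-1252. */
    public static boolean fitsWinAnsi(String text) {
        // CharsetEncoder instances are not thread-safe, so create a fresh one per call.
        CharsetEncoder encoder = CP1252.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        System.out.println(fitsWinAnsi("\u0161"));  // true: s with caron is byte 0x9A
        System.out.println(fitsWinAnsi("\u20AC"));  // true: the Euro sign is byte 0x80
        System.out.println(fitsWinAnsi("\u041F\u0440\u0438\u0432\u0435\u0442")); // false: Cyrillic
    }
}
```

Since š is in code page 1252 (byte 0x9A), it should be drawable with Helvetica in PDFBox 2.x; scripts such as Cyrillic or CJK need an embedded Type 0 font.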

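How to add PDFBox to an Android project, or suggest an alternative: I am trying to open an existing PDF file and then add another page to the PDF document in an Android application. On the added page I need to add some text and an image. I want to give PDFBox a try; other solutions (such as iTextPDF) are not suitable for our company because of their license terms/price. I have a library project with the main code base, plus full and lite projects that reference the library project. I downloaded the jar from http://pdfbox.apache.org/download.html, copied it into the library project's libs folder, and added the pdfbox-app-1.6.0.jar file to the Java build path libraries. I can successfully import from the library, e.g. import org.apache.pdfbox.pdmodel.PDDocument;, and compile all the projects. However, when I run the app it crashes at PDDocument document = new PDDocument(); with the following error: E/AndroidRuntime(24451): java.lang.NoClassDefFoundError: org.apache.pdfbox.pdmodel.PDDocument I read somewhere that PDFBox version 1.5 and later does not work on Android, so I tried downloading the pdfbox-app-1.4.0.jar file instead, but got the same problem. I also added the library to the build path of my full and lite projects, but I get the same error, or Eclipse keeps crashing with out-of-memory errors. Can anyone tell me what I am doing wrong? Did I download the wrong file? Am I importing it correctly? Thanks,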

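Parsing PDF files (especially ones with tables) with PDFBox: I need to parse a PDF file that contains tabular data. I am using PDFBox to extract the file's text and then parse the resulting String. The problem is that text extraction does not work the way I expected for tabular data. For example, I have a file containing a table like this (7 columns: the first two always have data, only one Complexity column has data, and only one Financing column has data):
+-----+-------+--------+------+----------------+-----------+------+
| AIH | Value | Complexity    | Financing                         |
|     |       | Medium | High | Not applicable | MAC/Other | FAE  |
+-----+-------+--------+------+----------------+-----------+------+
| xyz | 12.43 | 12.34  |      |                | 12.34     |      |
+-----+-------+--------+------+----------------+-----------+------+
| abc | 1.56  |        | 1.56 |                |           | 1.56 |
+-----+-------+--------+------+----------------+-----------+------+
Then I use PDFBox: […]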
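A plain getText() call linearizes each row and empty cells produce no text at all, so the column a value came from is lost. Keeping x positions avoids that: either register one region per column with PDFTextStripperByArea.addRegion(name, rectangle) and read it back with extractRegions(page) and getTextForRegion(name), or subclass PDFTextStripper, override writeString(String, List<TextPosition>) and record each word's getXDirAdj(). The sketch below shows only the final step, assuming the words have already been grouped into one row by baseline and the column edges are known; the class name and the edge values are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public final class ColumnBucketer {

    /** A word and the x-coordinate of its left edge (e.g. from TextPosition.getXDirAdj()). */
    public static final class Word {
        final String text;
        final float x;

        public Word(String text, float x) {
            this.text = text;
            this.x = x;
        }
    }

    /**
     * Assigns each word of one table row to a column.
     * columnStarts[i] is the left edge of column i and must be sorted ascending.
     * Columns without a word stay as empty strings, so missing values keep their position.
     */
    public static String[] toRow(List<Word> words, float[] columnStarts) {
        String[] cells = new String[columnStarts.length];
        Arrays.fill(cells, "");
        for (Word w : words) {
            int col = 0;
            // Pick the last column whose left edge is at or to the left of the word.
            while (col + 1 < columnStarts.length && columnStarts[col + 1] <= w.x) {
                col++;
            }
            cells[col] = cells[col].isEmpty() ? w.text : cells[col] + " " + w.text;
        }
        return cells;
    }

    public static void main(String[] args) {
        // Hypothetical left edges of AIH, Value, Medium, High, Not applicable, MAC/Other, FAE.
        float[] starts = {20, 80, 140, 200, 260, 340, 420};
        List<Word> row = List.of(new Word("xyz", 22), new Word("12.43", 85),
                                 new Word("12.34", 145), new Word("12.34", 350));
        System.out.println(String.join(" | ", toRow(row, starts)));
    }
}
```

With the row fixed at seven cells, the "only one Complexity/Financing column has data" rule can then be applied by checking which cell is non-empty.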

Extracting images from a PDF with pdfbox: I am trying to extract images from a PDF using pdfbox. Here is the example PDF, but I only get blank images. The code I am trying: – public static void main(String[] args) { PDFImageExtract obj = new PDFImageExtract(); try { obj.read_pdf(); } catch (IOException ex) { System.out.println("" + ex); } } void read_pdf() throws IOException { PDDocument document = null; try { document = PDDocument.load("C:\\Users\\Pradyut\\Documents\\MCS-034.pdf"); } catch (IOException ex) { System.out.println("" + ex); } List pages = […]

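PdfBox: encoding the Euro currency symbol: I created a PDF document with the Apache PDFBox library. My problem is encoding the Euro currency symbol when drawing a string on the page, because the base font Helvetica does not provide this character. How can I turn the output "þÿ¬" into the "€" symbol?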
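One plausible reading of that garbage, checked here with JDK charsets only: the bytes FE FF are the UTF-16 big-endian byte-order mark, and U+20AC (€) is 20 AC in UTF-16BE. Decoded as a single-byte Latin encoding, FE FF 20 AC becomes "þÿ ¬" (0x20 is a space), which matches the reported output apart from that space. Helvetica's WinAnsiEncoding instead expects € as the single byte 0x80. The class name is hypothetical and windows-1252 stands in for WinAnsiEncoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public final class EuroBytes {

    /** Formats bytes as upper-case hex pairs separated by spaces. */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // the Euro sign

        // Java's "UTF-16" charset encodes big-endian and writes a byte-order mark first.
        byte[] utf16 = euro.getBytes(StandardCharsets.UTF_16);
        System.out.println(hex(utf16)); // FE FF 20 AC

        // Read back as single-byte Latin-1, those four bytes become thorn, y-diaeresis, space, not-sign.
        String garbled = new String(utf16, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals("\u00FE\u00FF \u00AC")); // true

        // In code page 1252 (close to PDF's WinAnsiEncoding) the Euro is the single byte 0x80.
        System.out.println(hex(euro.getBytes(Charset.forName("windows-1252")))); // 80
    }
}
```

In PDFBox 2.x, contentStream.showText("€") with PDType1Font.HELVETICA should encode the character through WinAnsiEncoding itself, so no manual conversion ought to be needed there.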

How to generate multiple lines in a PDF with Apache pdfbox: I am using Pdfbox to generate PDF files with Java. The problem is that when I add long text content to the document, it is not displayed correctly: only part of it is shown, and all on a single line. I want the text to wrap onto multiple lines. My code is as follows: PDPageContentStream pdfContent=new PDPageContentStream(pdfDocument, pdfPage, true, true); pdfContent.beginText(); pdfContent.setFont(pdfFont, 11); pdfContent.moveTextPositionByAmount(30,750); pdfContent.drawString("I am trying to create a PDF file with a lot of text contents in the document. I am using PDFBox"); pdfContent.endText(); My output:
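drawString (showText in PDFBox 2.x) writes a single line and never wraps, so the text has to be split beforehand and drawn one line at a time, moving down between lines. The sketch below (TextWrapper.wrap is a hypothetical name) does greedy word wrapping against any width function; with PDFBox one could pass something like s -> font.getStringWidth(s) / 1000 * fontSize, catching the IOException that getStringWidth declares.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

public final class TextWrapper {

    /**
     * Greedy word wrap: packs as many words per line as fit into maxWidth.
     * A single word wider than maxWidth gets a line of its own rather than being split.
     */
    public static List<String> wrap(String text, double maxWidth, ToDoubleFunction<String> width) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens for empty or all-blank input
            }
            String candidate = current.length() == 0 ? word : current + " " + word;
            if (current.length() > 0 && width.applyAsDouble(candidate) > maxWidth) {
                // The word does not fit: close the current line and start a new one with it.
                lines.add(current.toString());
                current.setLength(0);
                current.append(word);
            } else {
                current.setLength(0);
                current.append(candidate);
            }
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in width function: 6 units per character, for demonstration only.
        ToDoubleFunction<String> mono = s -> 6.0 * s.length();
        String text = "I am trying to create a PDF file with a lot of text contents in the document. I am using PDFBox";
        for (String line : wrap(text, 120, mono)) {
            System.out.println(line);
        }
    }
}
```

Each returned line would then go through its own showText call, followed by newLineAtOffset(0, -leading) in PDFBox 2.x (or moveTextPositionByAmount(0, -leading) in 1.x) before the next one.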