How to convert HTML to PDF using iTextSharp

I want to convert the HTML below to PDF using iTextSharp, but I don't know where to start:

<style> .headline{font-size:200%} </style> <p> This <em>is </em> <span class="headline" style="text-decoration: underline;">some</span> <strong>sample<em> text</em></strong> <span style="color: red;">!!!</span> </p>

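First, HTML and PDF are not related, although they were created around the same time. HTML is intended to convey higher-level information such as paragraphs and tables. Although there are ways to control it, it is ultimately up to the browser to draw these higher-level concepts. PDF is intended to convey documents, and a document must "look" the same wherever it is rendered.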

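In an HTML document you might have a paragraph that is 100% wide. Depending on the width of your monitor it might take 2 lines or 10 lines, when you print it it might be 7 lines, and when you look at it on your phone it might take 20 lines. A PDF file, however, must be independent of the rendering device, so it must always render exactly the same regardless of your screen size.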

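Because of the requirements above, PDF doesn't support abstract things like "tables" or "paragraphs". PDF supports three basic things: text, lines/shapes and images. (There are other things, such as annotations and movies, but I'm trying to keep it simple here.) In a PDF you don't say "here's a paragraph, browser do your thing!". Instead you say, "draw this text at this exact X,Y location using this exact font, and don't worry, I've previously calculated the width of the text so I know it will all fit on this line". You also don't say "here's a table" but instead you say "draw this text at this exact location and then draw a rectangle at this other exact location that I've previously calculated, so I know it will appear to be around the text".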
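To make the "exact X,Y" idea concrete, here is a minimal sketch using iTextSharp 5's direct-content API. The class name AbsoluteDrawingSketch, the coordinates, the padding and the sample text are all assumptions made up for illustration; the point is only that the caller measures the text first and then places both the text and the rectangle by hand.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper for illustration only; not part of iTextSharp.
public static class AbsoluteDrawingSketch
{
    public static byte[] Build()
    {
        using (var ms = new MemoryStream())
        {
            var doc = new Document(PageSize.A4);
            var writer = PdfWriter.GetInstance(doc, ms);
            doc.Open();

            // One of the standard PDF fonts, not embedded
            var font = BaseFont.CreateFont(BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
            const float size = 12f;   // font size in points (assumed)
            const float x = 72f;      // left edge of the text (assumed)
            const float y = 720f;     // baseline of the text (assumed)
            const string text = "Hello, fixed layout";

            // Measure the text up front so we know how wide the box must be
            float width = font.GetWidthPoint(text, size);

            // "Draw this text at this exact X,Y location using this exact font"
            PdfContentByte cb = writer.DirectContent;
            cb.BeginText();
            cb.SetFontAndSize(font, size);
            cb.ShowTextAligned(PdfContentByte.ALIGN_LEFT, text, x, y, 0f);
            cb.EndText();

            // "...then draw a rectangle at this other exact location" around the text
            cb.Rectangle(x - 4f, y - 4f, width + 8f, size + 8f);
            cb.Stroke();

            doc.Close();
            // ToArray() still works after the stream has been closed
            return ms.ToArray();
        }
    }
}
```

Nothing in this sketch knows about paragraphs or tables: if the text were longer than the page, it would simply run off the edge, which is exactly the work that HTMLWorker and XMLWorker take over when they lay out HTML for you.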

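Second, iText and iTextSharp parse HTML and CSS. That's it. ASP.Net, MVC, Razor, Struts, Spring, etc. are all HTML frameworks, but iText/iTextSharp is 100% unaware of them. The same goes for DataGridViews, Repeaters, Templates, Views, etc., which are all framework-specific abstractions. It is your responsibility to get the HTML from your choice of framework; iText won't help you. If you get an exception saying The document has no pages, or you think that "iText isn't parsing my HTML", it is almost certain that you don't actually have HTML, you only think you do.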
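One way to catch the "you only think you have HTML" case early is to check the string before it ever reaches iTextSharp. The sketch below uses only the .NET base class library; HtmlGuard and RequireHtml are hypothetical names, and the check is deliberately crude (it only rejects null, empty or whitespace-only input).

```csharp
using System;

// Hypothetical helper for illustration only; not part of iTextSharp.
public static class HtmlGuard
{
    // Returns the HTML unchanged, or throws if the framework produced nothing to parse.
    public static string RequireHtml(string html)
    {
        if (string.IsNullOrWhiteSpace(html))
        {
            throw new ArgumentException(
                "No HTML was produced, so the PDF would have no pages.", nameof(html));
        }
        return html;
    }
}
```

With a guard like this in front of the parser, something like new StringReader(HtmlGuard.RequireHtml(html)) fails with a message that points at the rendering step of your framework rather than at iTextSharp.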

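Third, the built-in class that has been around for years is HTMLWorker, but it has been replaced by XMLWorker (Java / .Net). HTMLWorker doesn't support CSS files, has only limited support for the most basic CSS properties, and actually breaks on certain tags. If you do not see an HTML attribute or a CSS property and value in this file, then it probably isn't supported by HTMLWorker. XMLWorker can be more complicated at times, but those complications also make it more extensible.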

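Below is C# code that shows how to parse HTML tags into iText abstractions that are automatically added to the document you are working on. C# and Java are very similar, so it should be relatively easy to convert. Example #1 uses the built-in HTMLWorker to parse the HTML string. Since only inline styles are supported, class="headline" gets ignored, but everything else should work. Example #2 is the same as the first except that it uses XMLWorker instead. Example #3 also parses the simple CSS example.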

    //Create a byte array that will eventually hold our final PDF
    Byte[] bytes;

    //Boilerplate iTextSharp setup here
    //Create a stream that we can write to, in this case a MemoryStream
    using (var ms = new MemoryStream()) {

        //Create an iTextSharp Document which is an abstraction of a PDF but **NOT** a PDF
        using (var doc = new Document()) {

            //Create a writer that's bound to our PDF abstraction and our stream
            using (var writer = PdfWriter.GetInstance(doc, ms)) {

                //Open the document for writing
                doc.Open();

                //Our sample HTML and CSS
                var example_html = @"<p>This <em>is </em><span class=""headline"" style=""text-decoration: underline;"">some</span> <strong>sample <em> text</em></strong><span style=""color: red;"">!!!</span></p>";
                var example_css = @".headline{font-size:200%}";

                /**************************************************
                 * Example #1                                     *
                 *                                                *
                 * Use the built-in HTMLWorker to parse the HTML. *
                 * Only inline CSS is supported.                  *
                 * ************************************************/

                //Create a new HTMLWorker bound to our document
                using (var htmlWorker = new iTextSharp.text.html.simpleparser.HTMLWorker(doc)) {

                    //HTMLWorker doesn't read a string directly but instead needs a TextReader (which StringReader subclasses)
                    using (var sr = new StringReader(example_html)) {

                        //Parse the HTML
                        htmlWorker.Parse(sr);
                    }
                }

                /**************************************************
                 * Example #2                                     *
                 *                                                *
                 * Use the XMLWorker to parse the HTML.           *
                 * Only inline CSS and absolutely linked          *
                 * CSS is supported                               *
                 * ************************************************/

                //XMLWorker also reads from a TextReader and not directly from a string
                using (var srHtml = new StringReader(example_html)) {

                    //Parse the HTML
                    iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, srHtml);
                }

                /**************************************************
                 * Example #3                                     *
                 *                                                *
                 * Use the XMLWorker to parse HTML and CSS        *
                 * ************************************************/

                //In order to read CSS as a string we need to switch to a different constructor
                //that takes Streams instead of TextReaders.
                //Below we convert the strings into UTF8 byte array and wrap those in MemoryStreams
                using (var msCss = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(example_css))) {
                    using (var msHtml = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(example_html))) {

                        //Parse the HTML
                        iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, msHtml, msCss);
                    }
                }

                doc.Close();
            }
        }

        //After all of the PDF "stuff" above is done and closed but **before** we
        //close the MemoryStream, grab all of the active bytes from the stream
        bytes = ms.ToArray();
    }

    //Now we just need to do something with those bytes.
    //Here I'm writing them to disk but if you were in ASP.Net you might Response.BinaryWrite() them.
    //You could also write the bytes to a database in a varbinary() column (but please don't) or you
    //could pass them to another function for further PDF processing.
    var testFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "test.pdf");
    System.IO.File.WriteAllBytes(testFile, bytes);

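2017 update

There is good news for HTML-to-PDF needs. As this answer shows, the W3C standard css-break-3 will solve the problem. It is a Candidate Recommendation, planned to become a definitive Recommendation this year, after tests.

As less standard solutions, there are plugins for C#, as shown by print-css.rocks.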

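@Chris Haas has explained very well how to use iTextSharp to convert HTML to PDF, which was very helpful.
My addition is:
By using HtmlTextWriter I put the HTML tags inside an HTML table with inline CSS, and I got my PDF the way I wanted without using XMLWorker.
Edit: adding sample code:
ASPX page: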

    <asp:Panel runat="server" ID="PendingOrdersPanel">
        <!-- to be shown on PDF-->
        <table style="border-spacing: 0;border-collapse: collapse;width:100%;display:none;" >
            <tr><td><img src="abc.com/webimages/logo1.png" style="display: none;" width="230" /></td></tr>
            <tr style="line-height:10px;height:10px;"><td style="display:none;font-size:9px;color:#10466E;padding:0px;text-align:right;">blablabla.</td></tr>
            <tr style="line-height:10px;height:10px;"><td style="display:none;font-size:9px;color:#10466E;padding:0px;text-align:right;">blablabla.</td></tr>
            <tr style="line-height:10px;height:10px;"><td style="display:none;font-size:9px;color:#10466E;padding:0px;text-align:right;">blablabla</td></tr>
            <tr style="line-height:10px;height:10px;"><td style="display:none;font-size:9px;color:#10466E;padding:0px;text-align:right;">blablabla</td></tr>
            <tr style="line-height:10px;height:10px;"><td style="display:none;font-size:11px;color:#10466E;padding:0px;text-align:center;"><i>blablabla</i> Pending orders report<br /></td></tr>
        </table>
        <asp:GridView runat="server" ID="PendingOrdersGV" RowStyle-Wrap="false" AllowPaging="true" PageSize="10" Width="100%"
            CssClass="Grid" AlternatingRowStyle-CssClass="alt" AutoGenerateColumns="false" PagerStyle-CssClass="pgr"
            HeaderStyle-ForeColor="White" PagerStyle-HorizontalAlign="Center" HeaderStyle-HorizontalAlign="Center"
            RowStyle-HorizontalAlign="Center" DataKeyNames="Document#"
            OnPageIndexChanging="PendingOrdersGV_PageIndexChanging"
            OnRowDataBound="PendingOrdersGV_RowDataBound"
            OnRowCommand="PendingOrdersGV_RowCommand">
            <EmptyDataTemplate><div style="text-align:center;">no records found</div></EmptyDataTemplate>
            <Columns>
                <asp:ButtonField CommandName="PendingOrders_Details" DataTextField="Document#" HeaderText="Document #" SortExpression="Document#" ItemStyle-ForeColor="Black" ItemStyle-Font-Underline="true"/>
                <asp:BoundField DataField="Order#" HeaderText="order #" SortExpression="Order#"/>
                <asp:BoundField DataField="Order Date" HeaderText="Order Date" SortExpression="Order Date" DataFormatString="{0:d}"></asp:BoundField>
                <asp:BoundField DataField="Status" HeaderText="Status" SortExpression="Status"></asp:BoundField>
                <asp:BoundField DataField="Amount" HeaderText="Amount" SortExpression="Amount" DataFormatString="{0:C2}"></asp:BoundField>
            </Columns>
        </asp:GridView>
    </asp:Panel>

C# code:

    protected void PendingOrdersPDF_Click(object sender, EventArgs e)
    {
        if (PendingOrdersGV.Rows.Count > 0)
        {
            //to allow paging=false & change style.
            PendingOrdersGV.HeaderStyle.ForeColor = System.Drawing.Color.Black;
            PendingOrdersGV.BorderColor = System.Drawing.Color.Gray;
            PendingOrdersGV.Font.Name = "Tahoma";
            PendingOrdersGV.DataSource = clsBP.get_PendingOrders(lbl_BP_Id.Text);
            PendingOrdersGV.AllowPaging = false;
            PendingOrdersGV.Columns[0].Visible = false; //export won't work if there's a link in the gridview
            PendingOrdersGV.DataBind();

            //to PDF code --Sam
            string attachment = "attachment; filename=report.pdf";
            Response.ClearContent();
            Response.AddHeader("content-disposition", attachment);
            Response.ContentType = "application/pdf";

            //Render the panel (static table + GridView) to an HTML string
            StringWriter stw = new StringWriter();
            HtmlTextWriter htextw = new HtmlTextWriter(stw);
            htextw.AddStyleAttribute("font-size", "8pt");
            htextw.AddStyleAttribute("color", "Grey");
            PendingOrdersPanel.RenderControl(htextw); //Name of the Panel

            //The original created a default Document first and immediately replaced it; one instance is enough
            Document document = new Document(PageSize.A4, 5, 5, 15, 5);
            FontFactory.GetFont("Tahoma", 50, iTextSharp.text.BaseColor.BLUE); //return value is unused, kept from the original
            PdfWriter.GetInstance(document, Response.OutputStream);
            document.Open();

            //Parse the rendered HTML into the document
            StringReader str = new StringReader(stw.ToString());
            HTMLWorker htmlworker = new HTMLWorker(document);
            htmlworker.Parse(str);
            document.Close();

            //The PDF bytes have already been written to Response.OutputStream by the writer.
            //The original called Response.Write(document), which would only emit the object's ToString() text.
            Response.End();
        }
    }

Of course, include the iTextSharp references in the .cs file:

    using iTextSharp.text;
    using iTextSharp.text.pdf;
    using iTextSharp.text.html.simpleparser;
    using iTextSharp.tool.xml;

Hope this helps!
Thanks

Here is the link I used as a guide. Hope this helps!

Convert HTML to PDF using iTextSharp

    protected void Page_Load(object sender, EventArgs e)
    {
        try
        {
            string strHtml = string.Empty;
            //HTML file path -http://aspnettutorialonline.blogspot.com/
            string htmlFileName = Server.MapPath("~") + "\\files\\" + "ConvertHTMLToPDF.htm";
            //PDF file path -http://aspnettutorialonline.blogspot.com/
            string pdfFileName = Request.PhysicalApplicationPath + "\\files\\" + "ConvertHTMLToPDF.pdf";

            //Read the HTML from the file; using blocks close the stream and reader even on failure
            using (FileStream fsHTMLDocument = new FileStream(htmlFileName, FileMode.Open, FileAccess.Read))
            using (StreamReader srHTMLDocument = new StreamReader(fsHTMLDocument))
            {
                strHtml = srHTMLDocument.ReadToEnd();
            }

            strHtml = strHtml.Replace("\r\n", "");
            strHtml = strHtml.Replace("\0", "");

            CreatePDFFromHTMLFile(strHtml, pdfFileName);

            Response.Write("pdf creation successfully with password -http://aspnettutorialonline.blogspot.com/");
        }
        catch (Exception ex)
        {
            Response.Write(ex.Message);
        }
    }

    public void CreatePDFFromHTMLFile(string HtmlStream, string FileName)
    {
        try
        {
            object TargetFile = FileName;
            string ModifiedFileName = string.Empty;
            string FinalFileName = string.Empty;

            /* To add a Password to PDF -http://aspnettutorialonline.blogspot.com/ */
            //HtmlToPdfBuilder and HtmlPdfPage live in the TestPDF namespace of the sample project; they are not iTextSharp classes
            TestPDF.HtmlToPdfBuilder builder = new TestPDF.HtmlToPdfBuilder(iTextSharp.text.PageSize.A4);
            TestPDF.HtmlPdfPage first = builder.AddPage();
            first.AppendHtml(HtmlStream);
            byte[] file = builder.RenderPdf();
            File.WriteAllBytes(TargetFile.ToString(), file);

            //Re-open the unprotected PDF and write an encrypted copy next to it ("name1.pdf")
            iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(TargetFile.ToString());
            ModifiedFileName = TargetFile.ToString();
            ModifiedFileName = ModifiedFileName.Insert(ModifiedFileName.Length - 4, "1");

            string password = "password";
            //FileMode.Create instead of the original FileMode.Append, so a leftover file from an earlier run is overwritten rather than appended to
            iTextSharp.text.pdf.PdfEncryptor.Encrypt(reader, new FileStream(ModifiedFileName, FileMode.Create), iTextSharp.text.pdf.PdfWriter.STRENGTH128BITS, password, "", iTextSharp.text.pdf.PdfWriter.AllowPrinting);
            //http://aspnettutorialonline.blogspot.com/
            reader.Close();

            //Replace the unprotected file with the encrypted one under the original name
            if (File.Exists(TargetFile.ToString()))
                File.Delete(TargetFile.ToString());
            FinalFileName = ModifiedFileName.Remove(ModifiedFileName.Length - 5, 1);
            File.Copy(ModifiedFileName, FinalFileName);
            if (File.Exists(ModifiedFileName))
                File.Delete(ModifiedFileName);
        }
        catch (Exception)
        {
            //Rethrow with "throw;" (not "throw ex;") so the original stack trace is preserved
            throw;
        }
    }

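You can download the sample files. Just put the HTML you want to convert in the files folder and run it. It will automatically generate the PDF file and place it in the same folder. In your case, you can specify your HTML path in the htmlFileName variable.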
