HTML to PDF Converter is one of the applications that can convert HTML files to PDF documents with minimal user input; it was developed specifically to be easy to use.

Convert HTML content to PDF format

Making Web content available as PDF is one way to facilitate the dissemination of content. In some industries, providing access to print-formatted documents, such as employee benefit descriptions, is mandatory. The law dictates that summary plan descriptions (SPDs) be made available in print format even though the content may be provided online. Just printing the Webpage is not sufficient because the print format must include a table of contents with page number references. To add such functionality to a Webpage, developers can convert the HTML content to PDF format; this article illustrates how.

The method illustrated here uses only open source components. Commercial products also support dynamic document generation; Adobe has the Document Server product line, for example, but its cost is substantial. Using an open source solution mitigates the cost factor while adding source code transparency.

The conversion consists of three steps:

1. Convert the HTML to XHTML.
2. Convert the XHTML document to XSL-FO (Extensible Stylesheet Language Formatting Objects) using an XSL stylesheet and an XSLT transformer.
3. Pass the XSL-FO document to a formatter to generate the target PDF document.

This article demonstrates how to perform the translations using the command line interfaces provided by the tools and then introduces a Java program that uses the DOM (Document Object Model) interfaces.

Component versions

The code in this article was tested with the following component versions: JDK 1.5. The inputs and outputs of the steps are shown in the figure below.

Translation
Using the three tools' command line interfaces is an easy way to get started. However, this approach is not suitable for a production-level system because temporary intermediate files would be written to disk, and the extra I/O would result in poor performance. Later in this article, the issue of temporary files becomes moot when the three tools are invoked by a Java program.

Step 1: HTML to XHTML

The first step is to translate the HTML file to a new XHTML file. Of course, if the starting point for the conversion is already XHTML, this step does not apply.

I used JTidy to perform the translation. JTidy is a Java port of the Tidy HTML parser. In the process of translating to XHTML, JTidy also adds missing close tags to create a well-formed XML document. I used the most recent version listed on the SourceForge Website.

To run JTidy, use a tidy script that sets the CLASSPATH variable to include Tidy.jar and invokes org.w3c.tidy.Tidy with the -asxml switch, redirecting the output (>) to a file. The input file is passed as a command line argument. By default, the generated XHTML is directed to standard output. The -modify switch can also be used to overwrite the input file. The -asxml switch directs JTidy to output well-formed XML as opposed to HTML. The script is invoked as tidy followed by the file names. The hello sample files used in this article contain a Hello World title and a single paragraph, <p>Hello World!</p>.

Step 2: XHTML to XSL-FO

To achieve the transformation, I apply an XSL stylesheet processed by an XSLT transformer (Apache Xalan). The stylesheet I used as a starting point is xhtml2fo.xsl from Antenna House, a company that sells a commercial formatter for XSL-FO. The xhtml2fo.xsl stylesheet specifies how each of the HTML tags is to be translated to the corresponding sequence of XSL-FO formatting commands. For instance, for an HTML H2 tag, the translation is defined by an xsl:template whose match attribute names the tag with an html: prefix. The html: prefix indicates that the H2 tag is in the HTML namespace. The stylesheet's namespaces are specified as attributes to the top-level xsl:stylesheet directive.
Looking at the top of the xhtml2fo.xsl stylesheet, the xsl:stylesheet directive declares three namespaces, one for each of the XSL, XSL-FO, and HTML languages.

An XSL-FO block is a region of text rendered based on the values specified for the block's attributes. The attribute set for H2 is defined in the stylesheet with an xsl:attribute-set directive. Using an attribute set makes it easy to change the appearance of all blocks in the PDF document that correspond to the same HTML input tag. Simply change the settings in the attribute set; the output of all translations that use the attribute set will change too.

The next directive in the translation calls the template named process-common-attributes-and-children. Its purpose is to check for some common HTML attributes and translate them into the corresponding XSL-FO directives. To trigger the translation of any tags nested within the top-level H2 tag, the process-common-attributes-and-children template then calls <xsl:apply-templates/>. Hence, if the input is <h2>Hello <em>there</em></h2>, translating the H2 would trigger the invocation of the template that translates the <em> tag. The output of translating an H2 tag is an fo:block element whose attributes begin with start-indent.

The Unix script xalan sets the CLASSPATH variable with the required jar files before calling Xalan (org.apache.xalan.xslt.Process) with the -IN, -XSL, -OUT, and -tt options. Since Xalan requires an XML parser, the Apache Xerces parser and xml-apis JARs are referenced in addition to xalan.jar. All of the jar files are bundled with the Xalan distribution.

To create an XSL-FO file by applying the stylesheet to the XHTML input, invoke the xalan script. I like to use the trace option (-tt) with Xalan to display a trace of the templates that are applied. The result is the file hello.fo.

Step 3: XSL-FO to PDF

I used Apache FOP (Formatting Objects Processor). FOP partially implements the XSL-FO standard and best supports the PDF output format. There is nascent support for PostScript, and support for Microsoft's RTF (rich text format) is planned.
The FOP distribution contains the shell script fop, which takes an XSL-FO file as input and generates the specified PDF output file. The Unix script is run by invoking fop with the input and output files. The only prerequisite is to set the environment variable used by the script for the FOP home directory's path. The file hello.pdf contains the output from FOP and is included in this article's source code, which can be downloaded from Resources.

Since FOP doesn't currently fully implement the XSL-FO standard, there are some limitations. The subset of the standard that is supported is described in detail in the compliance section of the FOP Website.

Java program

Utilizing the DOM APIs of the three tools used in the steps above, I'll now present a Java program that requires two command line arguments (HTML input file and stylesheet) and creates a corresponding PDF. No temporary files are created.

First the program creates an InputStream for the HTML file. The InputStream object is then passed to JTidy. JTidy has a method parseDOM(), which is called from the program's main method to obtain the output XHTML content as a Document object. The stylesheet's templates take a different form in this program; for example, the <xsl:template match= patterns differ from the ones shown earlier. The XHTML Document is then transformed with the stylesheet, and the Document that represents the result of the transform is returned.
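The DOM-to-DOM transform step described above can be sketched with the JAXP classes that ship with the JDK. This is a minimal illustration under stated assumptions, not the article's program: the class name XhtmlToFoSketch, its helper methods, and the two-template stylesheet are hypothetical stand-ins (the stylesheet is not the Antenna House xhtml2fo.xsl), the JDK's built-in XSLT processor is used in place of a separately installed Xalan, and the JTidy and FOP steps are left out so that the sketch needs no external jars. The input Document is parsed from a string here; in the pipeline it would come from JTidy.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XhtmlToFoSketch {

    // Hypothetical stylesheet for illustration only. It declares the XSL,
    // XSL-FO, and HTML namespaces, defines an attribute set for H2, and uses
    // xsl:apply-templates so that nested tags such as <em> are translated too.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:fo='http://www.w3.org/1999/XSL/Format'"
      + " xmlns:html='http://www.w3.org/1999/xhtml'"
      + " exclude-result-prefixes='html'>"
      + "<xsl:attribute-set name='h2'>"
      + "<xsl:attribute name='font-size'>18pt</xsl:attribute>"
      + "</xsl:attribute-set>"
      + "<xsl:template match='html:h2'>"
      + "<fo:block xsl:use-attribute-sets='h2'><xsl:apply-templates/></fo:block>"
      + "</xsl:template>"
      + "<xsl:template match='html:em'>"
      + "<fo:inline font-style='italic'><xsl:apply-templates/></fo:inline>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Parse an XML string into a namespace-aware DOM Document.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)));
    }

    // Apply the stylesheet to the XHTML Document and return the result as a
    // new Document, so nothing is written to disk between the steps.
    static Document xhtmlToFo(Document xhtml, String stylesheet) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(stylesheet)));
        DOMResult result = new DOMResult();
        transformer.transform(new DOMSource(xhtml), result);
        return (Document) result.getNode();
    }

    // Serialize a Document to a string with an identity transform.
    static String serialize(Document doc) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document xhtml = parse(
            "<h2 xmlns='http://www.w3.org/1999/xhtml'>Hello <em>there</em></h2>");
        Document fo = xhtmlToFo(xhtml, STYLESHEET);
        System.out.println(serialize(fo));
    }
}
```

Passing a DOMSource in and collecting a DOMResult out is what removes the intermediate files of the command line approach: the XSL-FO Document returned by xhtmlToFo could be handed directly to the formatter step.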