To install the package on Ubuntu, run:

sudo apt-get install html-xml-utils

There are 31 tools in this package. Here is a summary of what they can do:

cexport – create a header file of exported declarations from a C file
hxaddid – add IDs to selected elements
hxcite – replace bibliographic references by hyperlinks
hxcite-mkbib – expand references and create a bibliography
hxcopy – copy an HTML file while preserving relative links
hxcount – count elements and attributes in HTML or XML files
hxextract – extract selected elements
hxclean – apply heuristics to correct an HTML file
hxprune – remove marked elements from an HTML file
hxincl – expand included HTML or XML files
hxindex – create an alphabetically sorted index
hxmkbib – create a bibliography from a template
hxmultitoc – create a table of contents for a set of HTML files
hxname2id – move some ID= or NAME= from A elements to their parents
hxnormalize – pretty-print an HTML file
hxnum – number section headings in an HTML file
hxpipe – convert XML to a format that is easier to parse with Perl or AWK
hxprintlinks – number links and add a table of URLs at the end of an HTML file
hxremove – remove selected elements from an XML file
hxtabletrans – transpose an HTML or XHTML table
hxtoc – insert a table of contents in an HTML file
hxuncdata – replace CDATA sections by character entities
hxunent – replace HTML predefined character entities by UTF-8
hxunpipe – convert the output of hxpipe back to XML format
hxunxmlns – replace “global names” by XML Namespace prefixes
hxwls – list links in an HTML file
hxxmlns – replace XML Namespace prefixes by “global names”
asc2xml, xml2asc – convert between UTF-8 and entities
hxref – generate cross-references
hxselect – extract elements that match a (CSS) selector
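Once the package is installed, a quick sanity check is to look for each command from the summary on your PATH. This is a small sketch: the tool names are copied from the list above, and the exact set shipped can vary between package versions, so a "not found" line is not necessarily an error.

```shell
# Tool names copied from the summary above (the set may differ between releases).
tools="cexport hxaddid hxcite hxcite-mkbib hxcopy hxcount hxextract hxclean
hxprune hxincl hxindex hxmkbib hxmultitoc hxname2id hxnormalize hxnum hxpipe
hxprintlinks hxremove hxtabletrans hxtoc hxuncdata hxunent hxunpipe hxunxmlns
hxwls hxxmlns asc2xml xml2asc hxref hxselect"

checked=0
found=0
missing=0
# Unquoted $tools is split on spaces and newlines, giving one name per iteration.
for tool in $tools; do
  checked=$((checked + 1))
  if command -v "$tool" >/dev/null 2>&1; then
    found=$((found + 1))
  else
    missing=$((missing + 1))
    echo "not found: $tool"
  fi
done
echo "$found of $checked tools found"
```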
To introduce you to the power of this tool set, here are some examples of how to use a few of the commands.

The “hxnormalize” command reformats (pretty-prints) an HTML file so that it is easy to read. To test this command, we will create a deliberately ugly HTML file. Select and copy the following lines and paste them directly into a terminal window. This will create a file called test.html. The HTML is missing some of the closing tags and is all written on one line.

The hxnormalize command will reformat the file and write the pretty version to the standard output (stdout). Here is how you run the command:

hxnormalize -e test.html

The “-e” flag tells hxnormalize to always write closing tags, including the ones HTML allows you to omit, such as </p> and </li>.
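The sample file from the original walkthrough is not reproduced in the text, so here is a hypothetical stand-in: a here-document that writes a one-line test.html with unclosed <p> and <li> tags, followed by the hxnormalize call described above. The file contents are an assumption for illustration only.

```shell
# Write a deliberately messy, one-line HTML file (hypothetical sample content).
# The <p> and <li> elements are left unclosed on purpose.
cat > test.html <<'EOF'
<html><head><title>Test page</title></head><body><h1>Heading</h1><p>First paragraph<p>Second paragraph<ul><li>One<li>Two</ul></body></html>
EOF

# Pretty-print it to stdout; -e asks hxnormalize to write the optional closing tags too.
if command -v hxnormalize >/dev/null 2>&1; then
  hxnormalize -e test.html
else
  echo "hxnormalize not found; install html-xml-utils first" >&2
fi
```

To keep the tidy version, redirect the output to a new file instead of printing it to the terminal.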
You can also run the command against a web page by replacing “test.html” with a URL.

The hxwls command parses a local HTML file or a web page and lists the links it contains. For example, you can pass it the address of the Make Tech Easier website. Here are the first few lines of output for that site:
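To try hxwls without a network connection, you can point it at a small local file instead of a website. The file name links.html and its contents are hypothetical, chosen only for illustration; piping through head mirrors looking at just the first few lines of output.

```shell
# Create a small local page with three links (hypothetical sample content).
cat > links.html <<'EOF'
<html><body>
<p><a href="https://www.example.com/">Example</a></p>
<p><a href="about.html">About</a> and <a href="contact.html">Contact</a></p>
</body></html>
EOF

# List the links found in the file; head keeps only the first few lines.
if command -v hxwls >/dev/null 2>&1; then
  hxwls links.html | head -n 5
else
  echo "hxwls not found; install html-xml-utils first" >&2
fi
```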
The hxtabletrans command transposes a table, so that rows become columns and columns become rows. Let’s create an HTML file with a simple table. Select and copy the following lines, and then paste them directly into a terminal window. The result is a file called table.html. In a web browser the table would look something like this:

If you run the hxtabletrans command, it writes the transposed table to the standard output. You can redirect the result to another file like this:

hxtabletrans table.html > table2.html

The new file, table2.html, shows Jill Smith and Eve Jackson in columns rather than in rows as in the original. The resulting table will look something like this:

Most of the commands are used in a similar way to the examples above: you specify a file or URL to process, and the output is written to standard output. Experiment with the other commands, as you may find some of them useful. If you have any questions about the HTML-XML utilities, please feel free to ask them in the comments below and we will see if we can help.
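For reference, the hxtabletrans example can be reproduced end to end with a short script. The original table.html is not shown in the text, so this writes a hypothetical table: only the names Jill Smith and Eve Jackson come from the article, while the column headings and ages are invented for illustration.

```shell
# Write a small table with one person per row (hypothetical sample data;
# only the two names are taken from the article).
cat > table.html <<'EOF'
<table>
<tr><th>First name</th><th>Last name</th><th>Age</th></tr>
<tr><td>Jill</td><td>Smith</td><td>50</td></tr>
<tr><td>Eve</td><td>Jackson</td><td>94</td></tr>
</table>
EOF

# Swap rows and columns and save the result in a new file.
if command -v hxtabletrans >/dev/null 2>&1; then
  hxtabletrans table.html > table2.html
else
  echo "hxtabletrans not found; install html-xml-utils first" >&2
fi
```

Opening table2.html in a browser should then show one person per column instead of one per row.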