After a bit of googling, it turns out that I would have been wrong to say it wasn't possible, because XSLT can do almost everything to plain text files that Perl or other text-processing languages can. In particular, you can apply regular expressions to the plain text to do things like removing whitespace or substituting characters.
I don't think I'm the only one who was unaware of this capability: Wikipedia says that XSLT is used for the transformation of XML documents, and OxygenXML's XSLT debugger doesn't even work if you select a text file as input. Instead, you have to pass the name of the text file to the stylesheet as an input parameter. More on this later.
To process plain text with XSLT, you need to use a couple of functions that were new with XSLT 2.0, so you must use an XSLT 2.0 processor such as Saxon 9. For the examples that follow, I used the Saxon EE-9.3.0.5 that comes with OxygenXML 13.2.
The following example is based on an FAQ page maintained by Dave Pawson:
- Start OxygenXML and create a new XSL 2.0 file.
- Specify XML as the output method and declare a parameter called input that will hold the name of the input file (because we need to pass the name of the text file as an input parameter, remember?):
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="2.0">
  <xsl:output method="xml" indent="yes" encoding="utf-8"/>
  <xsl:param name="input" as="xs:string" required="yes"/>
- Next, declare a variable that will hold the entire contents of the text file, split into one element per line of text. I've wrapped everything in <doc> tags:
  <xsl:variable name="src">
    <doc>
      <xsl:for-each select="tokenize(unparsed-text($input, 'iso-8859-1'), '\r\n')">
        <line><xsl:value-of select="."/></line>
      </xsl:for-each>
    </doc>
  </xsl:variable>
Several XSLT functions and instructions are used here:
- unparsed-text reads the contents of the text file (identified by the $input parameter) into a string.
- tokenize then splits this string into a sequence of strings at each CRLF sequence, that is, at the end of each line.
- Finally, the <xsl:for-each> instruction processes each string in turn and wraps it in <line> tags to make the XML output a bit more legible.
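As an aside, here is a slightly more defensive variant of the same idea, offered as a sketch rather than as part of the original FAQ example. It assumes the input might have Unix (LF) as well as Windows (CRLF) line endings, and it checks that the file can be read before tokenizing it. The <error> element is a name made up for this illustration, and the sketch writes straight to the main output instead of going through a variable.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs"
                version="2.0">

  <xsl:output method="xml" indent="yes" encoding="utf-8"/>

  <!-- Name of the text file, passed in as a parameter as before -->
  <xsl:param name="input" as="xs:string" required="yes"/>

  <xsl:template match="/">
    <doc>
      <xsl:choose>
        <!-- Only read the file if it exists and can be decoded -->
        <xsl:when test="unparsed-text-available($input, 'iso-8859-1')">
          <!-- '\r?\n' matches both CRLF and bare LF line endings -->
          <xsl:for-each select="tokenize(unparsed-text($input, 'iso-8859-1'), '\r?\n')">
            <line><xsl:value-of select="."/></line>
          </xsl:for-each>
        </xsl:when>
        <xsl:otherwise>
          <!-- Hypothetical error element, for illustration only -->
          <error>Cannot read <xsl:value-of select="$input"/></error>
        </xsl:otherwise>
      </xsl:choose>
    </doc>
  </xsl:template>

</xsl:stylesheet>
```

Note that if the file ends with a line terminator, tokenize returns a final empty string, so the last <line> element will be empty.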
- The only template necessary generates the XML output file and copies the contents of the $src variable to it:
  <xsl:template match="/">
    <xsl:result-document href="src1.xml">
      <xsl:copy-of select="$src"/>
    </xsl:result-document>
  </xsl:template>
</xsl:stylesheet>
- Before running this script through OxygenXML's debugger, you need to define the input parameter:
- Switch to the XSLT Debugger (Window > Open Perspective > XSLT Debugger).
- Click the Configure parameters button on the toolbar.
- Click New
- Type input for the Name and the name of a text file as the Value, for example regex.txt.
- Click OK twice.
- Now select the XSL file in the XSL box, select any XSL file in the XML box (OxygenXML ignores it), and run the debugger. The output appears in the results window, for example:
<doc>
  <line>27/09/2003 12:36 4,500 andAndOr.xml</line>
  <line>27/09/2003 12:36 2,565 apply-imports.xml</line>
  <line>27/09/2003 12:36 2,054 applytemplates.xml</line>
  <line>22/03/2004 15:53 16,141 approaches.xml</line>
  ...
</doc>
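To go one step further and actually apply regular expressions to each line, something along the following lines should work. This is a minimal sketch, not part of the FAQ example. It assumes every line looks like the directory listing above (date, time, size, file name) and that file names contain no spaces. The <file> element and its attribute names are invented for the illustration. normalize-space collapses runs of whitespace, xsl:analyze-string picks the line apart, and replace strips the thousands separator from the size.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs"
                version="2.0">

  <xsl:output method="xml" indent="yes" encoding="utf-8"/>
  <xsl:param name="input" as="xs:string" required="yes"/>

  <xsl:template match="/">
    <doc>
      <xsl:for-each select="tokenize(unparsed-text($input, 'iso-8859-1'), '\r?\n')">
        <!-- Collapse runs of whitespace to single spaces and trim both ends -->
        <xsl:variable name="clean" select="normalize-space(.)"/>
        <!-- Groups: 1 = date, 2 = time, 3 = size, 4 = file name -->
        <xsl:analyze-string select="$clean"
            regex="^(\d+/\d+/\d+) (\d+:\d+) ([\d,]+) (\S+)$">
          <xsl:matching-substring>
            <!-- Hypothetical output element; replace() drops the commas from the size -->
            <file name="{regex-group(4)}"
                  date="{regex-group(1)}"
                  time="{regex-group(2)}"
                  size="{replace(regex-group(3), ',', '')}"/>
          </xsl:matching-substring>
          <!-- No xsl:non-matching-substring: lines that do not fit are dropped -->
        </xsl:analyze-string>
      </xsl:for-each>
    </doc>
  </xsl:template>

</xsl:stylesheet>
```

For the first line of the listing above, this should give a <file> element with name="andAndOr.xml", date="27/09/2003", time="12:36" and size="4500". One thing to watch: the regex attribute is an attribute value template, so any curly braces in the pattern (for example in a quantifier such as \d{2}) would have to be doubled, which is why the sketch uses \d+ instead.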