Fri, 23 Apr 2021 21:03:34 +0100
README updates
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html> <head> <title>LuaExpat: XML Expat parsing for the Lua programming language</title> <link rel="stylesheet" href="../doc.css" type="text/css"/> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/> </head> <body> <div id="container"> <div id="product"> <div id="product_logo"><a href="http://www.keplerproject.org"> <img alt="LuaExpat logo" src="luaexpat.png"/> </a></div> <div id="product_name"><big><strong>LuaExpat</strong></big></div> <div id="product_description">XML Expat parsing for the Lua programming language</div> </div> <!-- id="product" --> <div id="main"> <div id="navigation"> <h1>LuaExpat</h1> <ul> <li><a href="index.html">Home</a> <ul> <li><a href="index.html#overview">Overview</a></li> <li><a href="index.html#status">Status</a></li> <li><a href="index.html#download">Download</a></li> <li><a href="index.html#history">History</a></li> <li><a href="index.html#references">References</a></li> <li><a href="index.html#credits">Credits</a></li> <li><a href="index.html#contact">Contact</a></li> </ul> </li> <li><strong>Manual</strong> <ul> <li><a href="manual.html#introduction">Introduction</a></li> <li><a href="manual.html#installation">Installation</a></li> <li><a href="manual.html#parser">Parser Objects</a></li> </ul> </li> <li><a href="examples.html">Examples</a></li> <li><a href="lom.html">Lua Object Model</a></li> <li><a href="http://luaforge.net/projects/luaexpat/">Project</a> <ul> <li><a href="http://luaforge.net/tracker/?group_id=13">Bug Tracker</a></li> <li><a href="http://luaforge.net/scm/?group_id=13">CVS</a></li> </ul> </li> <li><a href="license.html">License</a></li> </ul> </div> <!-- id="navigation" --> <div id="content"> <h2><a name="introduction"></a>Introduction</h2> <p>LuaExpat is a <a href="http://www.saxproject.org/">SAX</a> XML parser based on the <a href="http://www.libexpat.org/">Expat</a> library. SAX is the <em>Simple API for XML</em> and allows programs to: </p> <ul> <li>process a XML document incrementally, thus being able to handle huge documents without memory penalties;</li> <li>register handler functions which are called by the parser during the processing of the document, handling the document elements or text.</li> </ul> <p>With an event-based API like SAX the XML document can be fed to the parser in chunks, and the parsing begins as soon as the parser receives the first document chunk. LuaExpat reports parsing events (such as the start and end of elements) directly to the application through callbacks. The parsing of huge documents can benefit from this piecemeal operation.</p> <p>LuaExpat is distributed as a library and a file <code>lom.lua</code> that implements the <a href="lom.html">Lua Object Model</a>.</p> <h2><a name="building"></a>Building</h2> <p> LuaExpat could be built to Lua 5.1 or to Lua 5.2. In both cases, the language library and headers files for the desired version must be installed properly. LuaExpat also depends on Expat 2.0.0+ which should also be installed. </p> <p> LuaExpat offers a Makefile and a separate configuration file, <code>config</code>, which should be edited to suit the particularities of the target platform before running <code>make</code>. The file has some definitions like paths to the external libraries, compiler options and the like. One important definition is the version of Lua language, which is not obtained from the installed software. </p> <h2><a name="installation"></a>Installation</h2> <p>The compiled binary file should be copied to a directory in your <a href="http://www.lua.org/manual/5.1/manual.html#pdf-package.cpath">C path</a>. Lua 5.0 users should also install <a href="http://www.keplerproject.org/compat">Compat-5.1</a>.</p> <p>Windows users can use the binary version of LuaExpat (<code>lxp.dll</code>, compatible with <a href="http://luabinaries.luaforge.net">LuaBinaries</a>) available at <a href="http://luaforge.net/projects/luaexpat/files">LuaForge</a>.</p> <p>The file <code>lom.lua</code> should be copied to a directory in your <a href="http://www.lua.org/manual/5.1/manual.html#pdf-package.path">Lua path</a>.</p> <h2><a name="parser"></a>Parser objects</h2> <p>Usually SAX implementations base all operations on the concept of a parser that allows the registration of callback functions. LuaExpat offers the same functionality but uses a different registration method, based on a table of callbacks. This table contains references to the callback functions which are responsible for the handling of the document parts. The parser will assume no behaviour for any undeclared callbacks.</p> <h4>Constructor</h4> <dl class="reference"> <dt><strong>lxp.new(<em>callbacks [, separator[, merge_character_data]]</em>)</strong></dt> <dd>The parser is created by a call to the function <strong>lxp.new</strong>, which returns the created parser or raises a Lua error. It receives the callbacks table and optionally the parser <a href="#separator"> separator character</a> used in the namespace expanded element names. If <em>merge_character_data</em> is false then LuaExpat will not combine multiple CharacterData calls into one. For more info on this behaviour see CharacterData below.</dd> </dl> <h4>Methods</h4> <dl class="reference"> <dt><strong>parser:close()</strong></dt> <dd>Closes the parser, freeing all memory used by it. A call to parser:close() without a previous call to parser:parse() could result in an error.</dd> <dt><strong>parser:getbase()</strong></dt> <dd>Returns the base for resolving relative URIs.</dd> <dt><strong>parser:getcallbacks()</strong></dt> <dd>Returns the callbacks table.</dd> <dt><strong>parser:parse(s)</strong></dt> <dd>Parse some more of the document. The string <em>s</em> contains part (or perhaps all) of the document. When called without arguments the document is closed (but the parser still has to be closed).<br/> The function returns a non nil value when the parser has been succesfull, and when the parser finds an error it returns five results: nil, <em>msg</em>, <em>line</em>, <em>col</em>, and <em>pos</em>, which are the error message, the line number, column number and absolute position of the error in the XML document.</dd> <dt><strong>parser:pos()</strong></dt> <dd>Returns three results: the current parsing line, column, and absolute position.</dd> <dt><strong>parser:getcurrentbytecount()</strong></dt> <dd>Return the number of bytes of input corresponding to the current event. This function can only be called inside a handler, in other contexts it will return 0. Do not use inside a CharacterData handler unless CharacterData merging has been disabled (see lxp.new).</dd> <dt><strong>parser:setbase(base)</strong></dt> <dd>Sets the <em>base</em> to be used for resolving relative URIs in system identifiers.</dd> <dt><strong>parser:setencoding(encoding)</strong></dt> <dd>Set the encoding to be used by the parser. There are four built-in encodings, passed as strings: "US-ASCII", "UTF-8", "UTF-16", and "ISO-8859-1".</dd> <dt><strong>parser:stop()</strong></dt> <dd>Abort the parser and prevent it from parsing any further through the data it was last passed. Use to halt parsing the document when an error is discovered inside a callback, for example. The parser object cannot accept more data after this call.</dd> </dl> <h4>Callbacks</h4> <p>The Lua callbacks define the handlers of the parser events. The use of a table in the parser constructor has some advantages over the registration of callbacks, since there is no need for for the API to provide a way to manipulate callbacks.</p> <p>Another difference lies in the behaviour of the callbacks during the parsing itself. The callback table contains references to the functions that can be redefined at will. The only restriction is that only the callbacks present in the table at creation time will be called.</p> <p>The callbacks table indices are named after the equivalent Expat callbacks:<br /> <em>CharacterData</em>, <em>Comment</em>, <em>Default</em>, <em>DefaultExpand</em>, <em>EndCDataSection</em>, <em>EndElement</em>, <em>EndNamespaceDecl</em>, <em>ExternalEntityRef</em>, <em>NotStandalone</em>, <em>NotationDecl</em>, <em>ProcessingInstruction</em>, <em>StartCDataSection</em>, <em>StartElement</em>, <em>StartNamespaceDecl</em>, <em>UnparsedEntityDecl</em>, <em>XmlDecl</em> and <em>StartDoctypeDecl</em>.</p> <p>These indices can be references to functions with specific signatures, as seen below. The parser constructor also checks the presence of a field called <em>_nonstrict</em> in the callbacks table. If <em>_nonstrict</em> is absent, only valid callback names are accepted as indices in the table (Defaultexpanded would be considered an error for example). If <em>_nonstrict</em> is defined, any other fieldnames can be used (even if not called at all).</p> <p>The callbacks can optionally be defined as <code>false</code>, acting thus as placeholders for future assignment of functions.</p> <p>Every callback function receives as the first parameter the calling parser itself, thus allowing the same functions to be used for more than one parser for example.</p> <dl class="reference"> <dt><strong>callbacks.CharacterData = function(parser, string)</strong></dt> <dd>Called when the <em>parser</em> recognizes an XML CDATA <em>string</em>. Note that LuaExpat automatically combines multiple CharacterData events from Expat into a single call to this handler, unless <em>merge_character_data</em> is set to false when calling lxp.new().</dd> <dt><strong>callbacks.Comment = function(parser, string)</strong></dt> <dd>Called when the <em>parser</em> recognizes an XML comment <em>string</em>.</dd> <dt><strong>callbacks.Default = function(parser, string)</strong></dt> <dd>Called when the <em>parser</em> has a <em>string</em> corresponding to any characters in the document which wouldn't otherwise be handled. Using this handler has the side effect of turning off expansion of references to internally defined general entities. Instead these references are passed to the default handler.</dd> <dt><strong>callbacks.DefaultExpand = function(parser, string)</strong></dt> <dd>Called when the <em>parser</em> has a <em>string</em> corresponding to any characters in the document which wouldn't otherwise be handled. Using this handler doesn't affect expansion of internal entity references.</dd> <dt><strong>callbacks.EndCdataSection = function(parser)</strong></dt> <dd>Called when the <em>parser</em> detects the end of a CDATA section.</dd> <dt><strong>callbacks.EndElement = function(parser, elementName)</strong></dt> <dd>Called when the <em>parser</em> detects the ending of an XML element with <em>elementName</em>.</dd> <dt><strong>callbacks.EndNamespaceDecl = function(parser, namespaceName)</strong></dt> <dd>Called when the <em>parser</em> detects the ending of an XML namespace with <em>namespaceName</em>. The handling of the end namespace is done after the handling of the end tag for the element the namespace is associated with.</dd> <dt><strong>callbacks.ExternalEntityRef = function(parser, subparser, base, systemId, publicId)</strong></dt> <dd>Called when the <em>parser</em> detects an external entity reference.<br/><br/> The <em>subparser</em> is a LuaExpat parser created with the same callbacks and Expat context as the <em>parser</em> and should be used to parse the external entity.<br/> The <em>base</em> parameter is the base to use for relative system identifiers. It is set by parser:setbase and may be nil.<br/> The <em>systemId</em> parameter is the system identifier specified in the entity declaration and is never nil.<br/> The <em>publicId</em> parameter is the public id given in the entity declaration and may be nil.</dd> <dt><strong>callbacks.NotStandalone = function(parser)</strong></dt> <dd>Called when the <em>parser</em> detects that the document is not "standalone". This happens when there is an external subset or a reference to a parameter entity, but the document does not have standalone set to "yes" in an XML declaration.</dd> <dt><strong>callbacks.NotationDecl = function(parser, notationName, base, systemId, publicId)</strong></dt> <dd>Called when the <em>parser</em> detects XML notation declarations with <em>notationName</em><br/> The <em>base</em> parameter is the base to use for relative system identifiers. It is set by parser:setbase and may be nil.<br/> The <em>systemId</em> parameter is the system identifier specified in the entity declaration and is never nil.<br/> The <em>publicId</em> parameter is the public id given in the entity declaration and may be nil.</dd> <dt><strong>callbacks.ProcessingInstruction = function(parser, target, data)</strong></dt> <dd>Called when the <em>parser</em> detects XML processing instructions. The <em>target</em> is the first word in the processing instruction. The <em>data</em> is the rest of the characters in it after skipping all whitespace after the initial word.</dd> <dt><strong>callbacks.StartCdataSection = function(parser)</strong></dt> <dd>Called when the <em>parser</em> detects the begining of an XML CDATA section.</dd> <dt><strong>callbacks.XmlDecl = function(parser, version, encoding)</strong></dt> <dd>Called when the <em>parser</em> encounters an XML document declaration (these are optional, and valid only at the start of the document). The callback receives the declared XML <em>version</em> and document <em>encoding</em>.</dd> <dt><strong>callbacks.StartElement = function(parser, elementName, attributes)</strong></dt> <dd>Called when the <em>parser</em> detects the begining of an XML element with <em>elementName</em>.<br/> The <em>attributes</em> parameter is a Lua table with all the element attribute names and values. The table contains an entry for every attribute in the element start tag and entries for the default attributes for that element.<br/> The attributes are listed by name (including the inherited ones) and by position (inherited attributes are not considered in the position list).<br/> As an example if the <em>book</em> element has attributes <em>author</em>, <em>title</em> and an optional <em>format</em> attribute (with "printed" as default value), <pre class="example"> <book author="Ierusalimschy, Roberto" title="Programming in Lua"> </pre> would be represented as<br/> <pre class="example"> {[1] = "Ierusalimschy, Roberto", [2] = "Programming in Lua", author = "Ierusalimschy, Roberto", format = "printed", title = "Programming in Lua"} </pre></dd> <dt><strong>callbacks.StartNamespaceDecl = function(parser, namespaceName)</strong></dt> <dd>Called when the <em>parser</em> detects an XML namespace declaration with <em>namespaceName</em>. Namespace declarations occur inside start tags, but the StartNamespaceDecl handler is called before the StartElement handler for each namespace declared in that start tag.</dd> <dt><strong>callbacks.UnparsedEntityDecl = function(parser, entityName, base, systemId, publicId, notationName)</strong></dt> <dd>Called when the <em>parser</em> receives declarations of unparsed entities. These are entity declarations that have a notation (NDATA) field.<br/> As an example, in the chunk <pre class="example"> <!ENTITY logo SYSTEM "images/logo.gif" NDATA gif> </pre> <em>entityName</em> would be "logo", <em>systemId</em> would be "images/logo.gif" and <em>notationName</em> would be "gif". For this example the <em>publicId</em> parameter would be nil. The <em>base</em> parameter would be whatever has been set with <code>parser:setbase</code>. If not set, it would be nil.</dd> <dt><strong>callbacks.StartDoctypeDecl = function(parser, name, sysid, pubid, has_internal_subset)</strong></dt> <dd>Called when the <em>parser</em> detects the beginning of an XML DTD (DOCTYPE) section. These precede the XML root element and take the form: <pre class="example"> <!DOCTYPE root_elem PUBLIC "example"> </pre> </dd> </dl> <h4><a name="separator"></a>The separator character</h4> <p>The optional separator character in the parser constructor defines the character used in the namespace expanded element names. The separator character is optional (if not defined the parser will not handle namespaces) but if defined it must be different from the character '\0'.</p> </div> <!-- id="content" --> </div> <!-- id="main" --> </div> <!-- id="container" --> </body> </html>