doc/us/manual.html

1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
2 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
3 <html>
4 <head>
5 <title>LuaExpat: XML Expat parsing for the Lua programming language</title>
6 <link rel="stylesheet" href="http://www.keplerproject.org/doc.css" type="text/css"/>
7 <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
8 </head>
9 <body>
10
11 <div id="container">
12
13 <div id="product">
14 <div id="product_logo"><a href="http://www.keplerproject.org">
15 <img alt="LuaExpat logo" src="luaexpat.png"/>
16 </a></div>
17 <div id="product_name"><big><strong>LuaExpat</strong></big></div>
18 <div id="product_description">XML Expat parsing for the Lua programming language</div>
19 </div> <!-- id="product" -->
20
21 <div id="main">
22
23 <div id="navigation">
24 <h1>LuaExpat</h1>
25 <ul>
26 <li><a href="index.html">Home</a>
27 <ul>
28 <li><a href="index.html#overview">Overview</a></li>
29 <li><a href="index.html#status">Status</a></li>
30 <li><a href="index.html#download">Download</a></li>
31 <li><a href="index.html#history">History</a></li>
32 <li><a href="index.html#references">References</a></li>
33 <li><a href="index.html#credits">Credits</a></li>
34 <li><a href="index.html#contact">Contact</a></li>
35 </ul>
36 </li>
37 <li><strong>Manual</strong>
38 <ul>
39 <li><a href="manual.html#introduction">Introduction</a></li>
40 <li><a href="manual.html#installation">Installation</a></li>
41 <li><a href="manual.html#parser">Parser Objects</a></li>
42 </ul>
43 </li>
44 <li><a href="examples.html">Examples</a></li>
45 <li><a href="lom.html">Lua Object Model</a></li>
46 <li><a href="http://luaforge.net/projects/luaexpat/">Project</a>
47 <ul>
48 <li><a href="http://luaforge.net/tracker/?group_id=13">Bug Tracker</a></li>
49 <li><a href="http://luaforge.net/scm/?group_id=13">CVS</a></li>
50 </ul>
51 </li>
52 <li><a href="license.html">License</a></li>
53 </ul>
54 </div> <!-- id="navigation" -->
55
56 <div id="content">
57
58 <h2><a name="introduction"></a>Introduction</h2>
59
60 <p>LuaExpat is a <a href="http://www.saxproject.org/">SAX</a> XML
61 parser based on the <a href="http://www.libexpat.org/">Expat</a> library.
62 SAX is the <em>Simple API for XML</em> and allows programs to:
63 </p>
64
65 <ul>
66 <li>process an XML document incrementally, thus being able to handle
67 huge documents without memory penalties;</li>
68
69 <li>register handler functions which are called by the parser during
70 the processing of the document, handling the document elements or
71 text.</li>
72 </ul>
73
74 <p>With an event-based API like SAX the XML document can be fed to
75 the parser in chunks, and the parsing begins as soon as the parser
76 receives the first document chunk. LuaExpat reports parsing events
77 (such as the start and end of elements) directly to the application
78 through callbacks. The parsing of huge documents can benefit from
79 this piecemeal operation.</p>
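
<p>The chunked feeding described above can be sketched as follows. This is
a minimal illustration, assuming Lua 5.1 or later and the <code>lxp</code>
module installed as described in this manual; the sample document and the
chunk boundaries are made up for the example.</p>

<pre class="example">
local lxp = require "lxp"

-- Count start tags while the document arrives in pieces.
local count = 0
local parser = lxp.new {
  StartElement = function(p, name) count = count + 1 end,
}

-- Chunk boundaries are arbitrary and may even split a tag.
local chunks = { "&lt;root&gt;&lt;it", "em/&gt;&lt;item", "/&gt;&lt;/root&gt;" }
for _, chunk in ipairs(chunks) do
  assert(parser:parse(chunk))
end
assert(parser:parse())  -- no argument: the document is complete
parser:close()

print(count)  -- 3: the root element and two item elements
</pre>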
80
81 <p>LuaExpat is distributed as a library and a file <code>lom.lua</code> that
82 implements the <a href="lom.html">Lua Object Model</a>.</p>
83
84
85 <h2><a name="building"></a>Building</h2>
86
87 <p>
88 LuaExpat can be built for Lua 5.0 or Lua 5.1.
89 In both cases,
90 the language library and header files for the desired version
91 must be installed properly.
92 LuaExpat also depends on Expat 2.0.0, which must be installed as well.
93 </p>
94 <p>
95 LuaExpat offers a Makefile and a separate configuration file,
96 <code>config</code>,
97 which should be edited to suit the particularities of the target platform
98 before running
99 <code>make</code>.
100 The file has some definitions like paths to the external libraries,
101 compiler options and the like.
102 One important definition is the version of the Lua language,
103 which is not detected from the installed software and must be set by hand.
104 </p>
105
106
107 <h2><a name="installation"></a>Installation</h2>
108
109 <p>The compiled binary file should be copied to a directory in your
110 <a href="http://www.lua.org/manual/5.1/manual.html#pdf-package.cpath">C path</a>.
111 Lua 5.0 users should also install
112 <a href="http://www.keplerproject.org/compat">Compat-5.1</a>.</p>
113
114 <p>Windows users can use the binary version of LuaExpat (<code>lxp.dll</code>, compatible with
115 <a href="http://luabinaries.luaforge.net">LuaBinaries</a>) available at
116 <a href="http://luaforge.net/projects/luaexpat/files">LuaForge</a>.</p>
117
118 <p>The file <code>lom.lua</code> should be copied to a directory in your
119 <a href="http://www.lua.org/manual/5.1/manual.html#pdf-package.path">Lua path</a>.</p>
120
121 <h2><a name="parser"></a>Parser objects</h2>
122
123 <p>Usually SAX implementations base all operations on the
124 concept of a parser that allows the registration of callback
125 functions. LuaExpat offers the same functionality but uses a
126 different registration method, based on a table of callbacks. This
127 table contains references to the callback functions which are
128 responsible for the handling of the document parts. The parser will
129 assume no behaviour for any undeclared callbacks.</p>
130
131 <h4>Constructor</h4>
132
133 <dl class="reference">
134 <dt><strong>lxp.new(<em>callbacks [, separator]</em>)</strong></dt>
135 <dd>The parser is created by a call to the function <strong>lxp.new</strong>,
136 which returns the created parser or raises a Lua error. It
137 receives the callbacks table and optionally the parser <a href="#separator">
138 separator character</a> used in the namespace expanded element names.</dd>
139 </dl>
140
141 <h4>Methods</h4>
142
143 <dl class="reference">
144 <dt><strong>parser:close()</strong></dt>
145 <dd>Closes the parser, freeing all memory used by it. A call to
146 parser:close() without a previous call to parser:parse() could
147 result in an error.</dd>
148
149 <dt><strong>parser:getbase()</strong></dt>
150 <dd>Returns the base for resolving relative URIs.</dd>
151
152 <dt><strong>parser:getcallbacks()</strong></dt>
153 <dd>Returns the callbacks table.</dd>
154
155 <dt><strong>parser:parse(s)</strong></dt>
156 <dd>Parses some more of the document. The string <em>s</em> contains
157 part (or perhaps all) of the document. When called without
158 arguments the document is closed (but the parser still has to be
159 closed).<br/>
160 The function returns a non-nil value when parsing has
161 succeeded; when the parser finds an error it returns five
162 results: nil, <em>msg</em>, <em>line</em>, <em>col</em>, and
163 <em>pos</em>, which are the error message, the line number,
164 column number and absolute position of the error in the XML document.</dd>
165
166 <dt><strong>parser:pos()</strong></dt>
167 <dd>Returns three results: the current parsing line, column, and
168 absolute position.</dd>
169
170 <dt><strong>parser:setbase(base)</strong></dt>
171 <dd>Sets the <em>base</em> to be used for resolving relative URIs in
172 system identifiers.</dd>
173
174 <dt><strong>parser:setencoding(encoding)</strong></dt>
175 <dd>Sets the encoding to be used by the parser. There are four
176 built-in encodings, passed as strings: "US-ASCII",
177 "UTF-8", "UTF-16", and "ISO-8859-1".</dd>
178 </dl>
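
<p>The methods above can be combined as in the following sketch, which
assumes Lua 5.1 or later. The base URI and the malformed document are
made-up values. Guarding <code>parser:close()</code> with
<code>pcall</code> after a failed parse is a defensive assumption, not
behaviour stated in this manual.</p>

<pre class="example">
local lxp = require "lxp"

local callbacks = {}
local parser = lxp.new(callbacks)

-- The base is stored by the parser and handed back on request.
parser:setbase("http://example.org/docs/")
local base = parser:getbase()
local same_table = (parser:getcallbacks() == callbacks)

-- The closing tag does not match, so parse reports an error.
local ok, msg, line, col, pos = parser:parse("&lt;a&gt;&lt;b&gt;&lt;/a&gt;")
if not ok then
  print(string.format("XML error: %s (line %d, column %d, position %d)",
                      msg, line, col, pos))
end

-- Assumption: closing a parser that stopped on an error may itself
-- raise an error, so the call is guarded.
pcall(parser.close, parser)
</pre>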
179
180 <h4>Callbacks</h4>
181
182 <p>The Lua callbacks define the handlers of the parser events. The
183 use of a table in the parser constructor has some advantages over
184 the registration of callbacks, since there is no need for the API
185 to provide a way to manipulate callbacks.</p>
186
187 <p>Another difference lies in the behaviour of the callbacks during
188 the parsing itself. The callback table contains references to the
189 functions that can be redefined at will. The only restriction is
190 that only the callbacks present in the table at creation time
191 will be called.</p>
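
<p>A small sketch of both points, assuming Lua 5.1 or later: a handler is
replaced while the document is being parsed, and a callback added to the
table after the parser was created is never called. The handler names and
the sample document are invented for the illustration.</p>

<pre class="example">
local lxp = require "lxp"

local seen = {}
local callbacks = {}

local function later(p, name)
  seen[#seen + 1] = "later:" .. name
end

callbacks.StartElement = function(p, name)
  seen[#seen + 1] = "first:" .. name
  callbacks.StartElement = later  -- redefine the handler while parsing
end

local parser = lxp.new(callbacks)

-- Comment was not in the table at creation time, so it is not called.
callbacks.Comment = function(p, text)
  seen[#seen + 1] = "comment"
end

assert(parser:parse("&lt;a&gt;&lt;!-- note --&gt;&lt;b/&gt;&lt;c/&gt;&lt;/a&gt;"))
assert(parser:parse())
parser:close()

print(table.concat(seen, " "))
</pre>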
192
193 <p>The callbacks table indices are named after the equivalent Expat
194 callbacks:<br />
195 <em>CharacterData</em>, <em>Comment</em>,
196 <em>Default</em>, <em>DefaultExpand</em>, <em>EndCdataSection</em>,
197 <em>EndElement</em>, <em>EndNamespaceDecl</em>,
198 <em>ExternalEntityRef</em>, <em>NotStandalone</em>,
199 <em>NotationDecl</em>, <em>ProcessingInstruction</em>,
200 <em>StartCdataSection</em>, <em>StartElement</em>,
201 <em>StartNamespaceDecl</em>, and <em>UnparsedEntityDecl</em>.</p>
202
203 <p>These indices can be references to functions with
204 specific signatures, as seen below. The parser constructor also
205 checks the presence of a field called <em>_nonstrict</em> in the
206 callbacks table. If <em>_nonstrict</em> is absent, only valid
207 callback names are accepted as indices in the table
208 (<em>Defaultexpanded</em> would be considered an error, for example). If
209 <em>_nonstrict</em> is defined, any other fieldnames can be
210 used (even if not called at all).</p>
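
<p>The strict check can be observed with <code>pcall</code>, as in this
sketch. The <em>depth</em> field is a hypothetical piece of application
data, used only to show that extra fields are tolerated once
<em>_nonstrict</em> is set.</p>

<pre class="example">
local lxp = require "lxp"

-- "Defaultexpanded" is a misspelling of DefaultExpand, so the
-- constructor raises an error, which pcall catches.
local ok, err = pcall(lxp.new, { Defaultexpanded = function() end })
print(ok, err)

-- With _nonstrict set, the same table is accepted.
local parser = lxp.new {
  _nonstrict = true,
  Defaultexpanded = function() end,
  depth = 0,  -- hypothetical application data kept next to the callbacks
}
assert(parser:parse("&lt;a/&gt;"))
assert(parser:parse())
parser:close()
</pre>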
211
212 <p>The callbacks can optionally be defined as <code>false</code>,
213 acting thus as placeholders for future assignment of functions.</p>
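
<p>A placeholder might be used like this (a sketch, assuming Lua 5.1 or
later): the slot is reserved with <code>false</code> when the parser is
created, and the real function is assigned before parsing starts.</p>

<pre class="example">
local lxp = require "lxp"

local names = {}
local callbacks = { StartElement = false }  -- placeholder, no handler yet
local parser = lxp.new(callbacks)

-- The slot existed at creation time, so a function assigned later is used.
callbacks.StartElement = function(p, name)
  names[#names + 1] = name
end

assert(parser:parse("&lt;a&gt;&lt;b/&gt;&lt;/a&gt;"))
assert(parser:parse())
parser:close()

print(table.concat(names, " "))
</pre>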
214
215 <p>Every callback function receives as the first parameter the
216 calling parser itself, thus allowing the same functions to be used
217 for more than one parser for example.</p>
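
<p>For instance, one callbacks table can serve two parsers while keeping
separate state for each, keyed by the parser argument. This sketch
assumes that the object passed to the callback is the same one returned
by <code>lxp.new</code>; the <em>counts</em> table is an invention of the
example.</p>

<pre class="example">
local lxp = require "lxp"

-- Per-parser counters, keyed by the parser each callback receives.
local counts = {}
local callbacks = {
  StartElement = function(p, name)
    counts[p] = (counts[p] or 0) + 1
  end,
}

local first = lxp.new(callbacks)
local second = lxp.new(callbacks)

assert(first:parse("&lt;a&gt;&lt;b/&gt;&lt;/a&gt;"))
assert(first:parse())
assert(second:parse("&lt;x/&gt;"))
assert(second:parse())

print(counts[first], counts[second])

first:close()
second:close()
</pre>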
218
219 <dl class="reference">
220 <dt><strong>callbacks.CharacterData = function(parser, string)</strong></dt>
221 <dd>Called when the <em>parser</em> recognizes a <em>string</em> of XML character data (element text or the content of a CDATA section).</dd>
222
223 <dt><strong>callbacks.Comment = function(parser, string)</strong></dt>
224 <dd>Called when the <em>parser</em> recognizes an XML comment
225 <em>string</em>.</dd>
226
227 <dt><strong>callbacks.Default = function(parser, string)</strong></dt>
228 <dd>Called when the <em>parser</em> has a <em>string</em>
229 corresponding to any characters in the document which wouldn't
230 otherwise be handled. Using this handler has the side effect of
231 turning off expansion of references to internally defined general
232 entities. Instead these references are passed to the default
233 handler.</dd>
234
235 <dt><strong>callbacks.DefaultExpand = function(parser, string)</strong></dt>
236 <dd>Called when the <em>parser</em> has a <em>string</em>
237 corresponding to any characters in the document which wouldn't
238 otherwise be handled. Using this handler doesn't affect expansion
239 of internal entity references.</dd>
240
241 <dt><strong>callbacks.EndCdataSection = function(parser)</strong></dt>
242 <dd>Called when the <em>parser</em> detects the end of a CDATA
243 section.</dd>
244
245 <dt><strong>callbacks.EndElement = function(parser, elementName)</strong></dt>
246 <dd>Called when the <em>parser</em> detects the ending of an XML
247 element with <em>elementName</em>.</dd>
248
249 <dt><strong>callbacks.EndNamespaceDecl = function(parser, namespaceName)</strong></dt>
250 <dd>Called when the <em>parser</em> detects the ending of an XML
251 namespace with <em>namespaceName</em>. The handling of the end
252 namespace is done after the handling of the end tag for the element
253 the namespace is associated with.</dd>
254
255 <dt><strong>callbacks.ExternalEntityRef = function(parser, subparser, base, systemId, publicId)</strong></dt>
256 <dd>Called when the <em>parser</em> detects an external entity
257 reference.<br/><br/>
258 The <em>subparser</em> is a LuaExpat parser created with the
259 same callbacks and Expat context as the <em>parser</em> and should
260 be used to parse the external entity.<br/>
261 The <em>base</em> parameter is the base to use for relative
262 system identifiers. It is set by parser:setbase and may be nil.<br/>
263 The <em>systemId</em> parameter is the system identifier
264 specified in the entity declaration and is never nil.<br/>
265 The <em>publicId</em> parameter is the public id given in the
266 entity declaration and may be nil.</dd>
267
268 <dt><strong>callbacks.NotStandalone = function(parser)</strong></dt>
269 <dd>Called when the <em>parser</em> detects that the document is not
270 "standalone". This happens when there is an external subset or a
271 reference to a parameter entity, but the document does not have standalone set
272 to "yes" in an XML declaration.</dd>
273
274 <dt><strong>callbacks.NotationDecl = function(parser, notationName, base, systemId, publicId)</strong></dt>
275 <dd>Called when the <em>parser</em> detects XML notation
276 declarations with <em>notationName</em>.<br/>
277 The <em>base</em> parameter is the base to use for relative
278 system identifiers. It is set by parser:setbase and may be nil.<br/>
279 The <em>systemId</em> parameter is the system identifier
280 specified in the notation declaration and is never nil.<br/>
281 The <em>publicId</em> parameter is the public id given in the
282 notation declaration and may be nil.</dd>
283
284 <dt><strong>callbacks.ProcessingInstruction = function(parser, target, data)</strong></dt>
285 <dd>Called when the <em>parser</em> detects XML processing
286 instructions. The <em>target</em> is the first word in the
287 processing instruction. The <em>data</em> is the rest of the
288 characters in it after skipping all whitespace after the initial
289 word.</dd>
290
291 <dt><strong>callbacks.StartCdataSection = function(parser)</strong></dt>
292 <dd>Called when the <em>parser</em> detects the beginning of an XML
293 CDATA section.</dd>
294
295 <dt><strong>callbacks.StartElement = function(parser, elementName, attributes)</strong></dt>
296 <dd>Called when the <em>parser</em> detects the beginning of an XML
297 element with <em>elementName</em>.<br/>
298 The <em>attributes</em> parameter is a Lua table with all the
299 element attribute names and values. The table contains an entry for
300 every attribute in the element start tag and entries for the
301 default attributes for that element.<br/>
302 The attributes are listed by name (including the inherited ones)
303 and by position (inherited attributes are not considered in the
304 position list).<br/>
305 As an example if the <em>book</em> element has attributes
306 <em>author</em>, <em>title</em> and an optional <em>format</em>
307 attribute (with "printed" as default value),
308 <pre class="example">
309 &lt;book author="Ierusalimschy, Roberto" title="Programming in Lua"&gt;
310 </pre>
311 would be represented as<br/>
312 <pre class="example">
313 {[1] = "author",
314 [2] = "title",
315 author = "Ierusalimschy, Roberto",
316 format = "printed",
317 title = "Programming in Lua"}
318 </pre></dd>
319
320 <dt><strong>callbacks.StartNamespaceDecl = function(parser, namespaceName)</strong></dt>
321 <dd>Called when the <em>parser</em> detects an XML namespace
322 declaration with <em>namespaceName</em>. Namespace declarations
323 occur inside start tags, but the StartNamespaceDecl handler is
324 called before the StartElement handler for each namespace declared
325 in that start tag.</dd>
326
327 <dt><strong>callbacks.UnparsedEntityDecl = function(parser, entityName, base, systemId, publicId, notationName)</strong></dt>
328 <dd>Called when the <em>parser</em> receives declarations of
329 unparsed entities. These are entity declarations that have a
330 notation (NDATA) field.<br/>
331 As an example, in the chunk
332 <pre class="example">
333 &lt;!ENTITY logo SYSTEM "images/logo.gif" NDATA gif&gt;
334 </pre>
335 <em>entityName</em> would be "logo", <em>systemId</em> would be
336 "images/logo.gif" and <em>notationName</em> would be "gif".
337 For this example the <em>publicId</em> parameter would be nil.
338 The <em>base</em> parameter would be whatever has been set with
339 <code>parser:setbase</code>. If not set, it would be nil.</dd>
340 </dl>
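
<p>Several of the callbacks above can be wired together as in the
following sketch, which records the events reported for a one-line
document. It assumes Lua 5.1 or later; the <em>record</em> helper and the
sample document are made up for the illustration.</p>

<pre class="example">
local lxp = require "lxp"

local events = {}
local function record(kind, detail)
  events[#events + 1] = kind .. ": " .. detail
end

local parser = lxp.new {
  ProcessingInstruction = function(p, target, data) record("pi", target) end,
  StartElement = function(p, name, attributes)
    -- Attribute values are looked up by name.
    record("start", name .. " title=" .. tostring(attributes.title))
  end,
  CharacterData = function(p, text) record("text", text) end,
  Comment = function(p, text) record("comment", text) end,
  EndElement = function(p, name) record("end", name) end,
}

local document =
  [[&lt;?render fast?&gt;&lt;book title="Programming in Lua"&gt;&lt;!--draft--&gt;Hello&lt;/book&gt;]]
assert(parser:parse(document))
assert(parser:parse())
parser:close()

for _, event in ipairs(events) do
  print(event)
end
</pre>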
341
342 <h4><a name="separator"></a>The separator character</h4>
343
344 <p>The optional separator character in the parser constructor
345 defines the character used in the namespace expanded element names.
346 The separator character is optional (if not defined the parser will
347 not handle namespaces) but if defined it must be different from
348 the character '\0'.</p>
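
<p>The effect of the separator can be seen by parsing the same document
with and without it. In this sketch the <em>element_names</em> helper,
the namespace URI and the choice of "|" as separator are all
assumptions made for the example.</p>

<pre class="example">
local lxp = require "lxp"

-- Returns the element names reported for a document, in order.
local function element_names(document, separator)
  local names = {}
  local parser = lxp.new({
    StartElement = function(p, name) names[#names + 1] = name end,
  }, separator)
  assert(parser:parse(document))
  assert(parser:parse())
  parser:close()
  return names
end

local document = [[&lt;h:table xmlns:h="http://example.org/html"/&gt;]]

print(element_names(document)[1])       -- h:table
print(element_names(document, "|")[1])  -- http://example.org/html|table
</pre>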
349
350 </div> <!-- id="content" -->
351
352 </div> <!-- id="main" -->
353
354 <div id="about">
355 <p><a href="http://validator.w3.org/check?uri=referer">
356 <img src="http://www.w3.org/Icons/valid-xhtml10" alt="Valid XHTML 1.0!" height="31" width="88" /></a></p>
357 <p><small>$Id: manual.html,v 1.27 2007/06/05 20:03:12 carregal Exp $</small></p>
358 </div> <!-- id="about" -->
359
360 </div> <!-- id="container" -->
361
362 </body>
363 </html>
