|
1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" |
|
2 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> |
|
3 <html> |
|
4 <head> |
|
5 <title>LuaExpat: XML Expat parsing for the Lua programming language</title> |
|
6 <link rel="stylesheet" href="http://www.keplerproject.org/doc.css" type="text/css"/> |
|
7 <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/> |
|
8 </head> |
|
9 <body> |
|
10 |
|
11 <div id="container"> |
|
12 |
|
13 <div id="product"> |
|
14 <div id="product_logo"><a href="http://www.keplerproject.org"> |
|
15 <img alt="LuaExpat logo" src="luaexpat.png"/> |
|
16 </a></div> |
|
17 <div id="product_name"><big><strong>LuaExpat</strong></big></div> |
|
18 <div id="product_description">XML Expat parsing for the Lua programming language</div> |
|
19 </div> <!-- id="product" --> |
|
20 |
|
21 <div id="main"> |
|
22 |
|
23 <div id="navigation"> |
|
24 <h1>LuaExpat</h1> |
|
25 <ul> |
|
26 <li><a href="index.html">Home</a> |
|
27 <ul> |
|
28 <li><a href="index.html#overview">Overview</a></li> |
|
29 <li><a href="index.html#status">Status</a></li> |
|
30 <li><a href="index.html#download">Download</a></li> |
|
31 <li><a href="index.html#history">History</a></li> |
|
32 <li><a href="index.html#references">References</a></li> |
|
33 <li><a href="index.html#credits">Credits</a></li> |
|
34 <li><a href="index.html#contact">Contact</a></li> |
|
35 </ul> |
|
36 </li> |
|
37 <li><strong>Manual</strong> |
|
38 <ul> |
|
39 <li><a href="manual.html#introduction">Introduction</a></li> |
|
40 <li><a href="manual.html#installation">Installation</a></li> |
|
41 <li><a href="manual.html#parser">Parser Objects</a></li> |
|
42 </ul> |
|
43 </li> |
|
44 <li><a href="examples.html">Examples</a></li> |
|
45 <li><a href="lom.html">Lua Object Model</a></li> |
|
46 <li><a href="http://luaforge.net/projects/luaexpat/">Project</a> |
|
47 <ul> |
|
48 <li><a href="http://luaforge.net/tracker/?group_id=13">Bug Tracker</a></li> |
|
49 <li><a href="http://luaforge.net/scm/?group_id=13">CVS</a></li> |
|
50 </ul> |
|
51 </li> |
|
52 <li><a href="license.html">License</a></li> |
|
53 </ul> |
|
54 </div> <!-- id="navigation" --> |
|
55 |
|
56 <div id="content"> |
|
57 |
|
58 <h2><a name="introduction"></a>Introduction</h2> |
|
59 |
|
60 <p>LuaExpat is a <a href="http://www.saxproject.org/">SAX</a> XML |
|
61 parser based on the <a href="http://www.libexpat.org/">Expat</a> library. |
|
62 SAX is the <em>Simple API for XML</em> and allows programs to: |
|
63 </p> |
|
64 |
|
65 <ul> |
|
66 <li>process an XML document incrementally, thus being able to handle |
|
67 huge documents without memory penalties;</li> |
|
68 |
|
69 <li>register handler functions which are called by the parser during |
|
70 the processing of the document, handling the document elements or |
|
71 text.</li> |
|
72 </ul> |
|
73 |
|
74 <p>With an event-based API like SAX, the XML document can be fed to |
|
75 the parser in chunks, and the parsing begins as soon as the parser |
|
76 receives the first document chunk. LuaExpat reports parsing events |
|
77 (such as the start and end of elements) directly to the application |
|
78 through callbacks. The parsing of huge documents can benefit from |
|
79 this piecemeal operation.</p> |
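<p>As a minimal sketch of this chunked feeding: the chunk boundaries and the element-counting callback below are illustrative assumptions, not part of LuaExpat. The document is deliberately split in the middle of its tags, and the parser still reports every element.</p>

```lua
local lxp = require "lxp"

-- Count every start tag reported by the parser
local count = 0
local parser = lxp.new{
  StartElement = function(parser, name)
    count = count + 1
  end,
}

-- Hypothetical chunks; the boundaries fall inside tags on purpose
local chunks = { "<root><it", "em/><item", "/></root>" }
for _, chunk in ipairs(chunks) do
  assert(parser:parse(chunk))   -- a non-nil result means no error so far
end

assert(parser:parse())          -- no argument: the document is complete
parser:close()

print(count)
```

<p>In a real program the chunks would typically come from a file or a socket, read block by block, so the whole document never has to be held in memory.</p>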
|
80 |
|
81 <p>LuaExpat is distributed as a library and a file <code>lom.lua</code> that |
|
82 implements the <a href="lom.html">Lua Object Model</a>.</p> |
|
83 |
|
84 |
|
85 <h2><a name="building"></a>Building</h2> |
|
86 |
|
87 <p> |
|
88 LuaExpat can be built for Lua 5.0 or Lua 5.1. |
|
89 In both cases, |
|
90 the Lua library and header files for the desired version |
|
91 must be installed properly. |
|
92 LuaExpat also depends on Expat 2.0.0, which must be installed as well. |
|
93 </p> |
|
94 <p> |
|
95 LuaExpat offers a Makefile and a separate configuration file, |
|
96 <code>config</code>, |
|
97 which should be edited to suit the particularities of the target platform |
|
98 before running |
|
99 <code>make</code>. |
|
100 The file has some definitions like paths to the external libraries, |
|
101 compiler options and the like. |
|
102 One important definition is the version of the Lua language, |
|
103 which is not obtained from the installed software. |
|
104 </p> |
|
105 |
|
106 |
|
107 <h2><a name="installation"></a>Installation</h2> |
|
108 |
|
109 <p>The compiled binary file should be copied to a directory in your |
|
110 <a href="http://www.lua.org/manual/5.1/manual.html#pdf-package.cpath">C path</a>. |
|
111 Lua 5.0 users should also install |
|
112 <a href="http://www.keplerproject.org/compat">Compat-5.1</a>.</p> |
|
113 |
|
114 <p>Windows users can use the binary version of LuaExpat (<code>lxp.dll</code>, compatible with |
|
115 <a href="http://luabinaries.luaforge.net">LuaBinaries</a>) available at |
|
116 <a href="http://luaforge.net/projects/luaexpat/files">LuaForge</a>.</p> |
|
117 |
|
118 <p>The file <code>lom.lua</code> should be copied to a directory in your |
|
119 <a href="http://www.lua.org/manual/5.1/manual.html#pdf-package.path">Lua path</a>.</p> |
|
120 |
|
121 <h2><a name="parser"></a>Parser objects</h2> |
|
122 |
|
123 <p>Usually SAX implementations base all operations on the |
|
124 concept of a parser that allows the registration of callback |
|
125 functions. LuaExpat offers the same functionality but uses a |
|
126 different registration method, based on a table of callbacks. This |
|
127 table contains references to the callback functions which are |
|
128 responsible for the handling of the document parts. The parser will |
|
129 assume no behaviour for any undeclared callbacks.</p> |
|
130 |
|
131 <h4>Constructor</h4> |
|
132 |
|
133 <dl class="reference"> |
|
134 <dt><strong>lxp.new(<em>callbacks [, separator]</em>)</strong></dt> |
|
135 <dd>The parser is created by a call to the function <strong>lxp.new</strong>, |
|
136 which returns the created parser or raises a Lua error. It |
|
137 receives the callbacks table and optionally the parser <a href="#separator"> |
|
138 separator character</a> used in the namespace expanded element names.</dd> |
|
139 </dl> |
|
140 |
|
141 <h4>Methods</h4> |
|
142 |
|
143 <dl class="reference"> |
|
144 <dt><strong>parser:close()</strong></dt> |
|
145 <dd>Closes the parser, freeing all memory used by it. A call to |
|
146 parser:close() without a previous call to parser:parse() could |
|
147 result in an error.</dd> |
|
148 |
|
149 <dt><strong>parser:getbase()</strong></dt> |
|
150 <dd>Returns the base for resolving relative URIs.</dd> |
|
151 |
|
152 <dt><strong>parser:getcallbacks()</strong></dt> |
|
153 <dd>Returns the callbacks table.</dd> |
|
154 |
|
155 <dt><strong>parser:parse(s)</strong></dt> |
|
156 <dd>Parses some more of the document. The string <em>s</em> contains |
|
157 part (or perhaps all) of the document. When called without |
|
158 arguments the document is closed (but the parser still has to be |
|
159 closed).<br/> |
|
160 The function returns a non-nil value when parsing has been |
|
161 successful; when the parser finds an error, it returns five |
|
162 results: nil, <em>msg</em>, <em>line</em>, <em>col</em>, and |
|
163 <em>pos</em>, which are the error message, the line number, |
|
164 column number and absolute position of the error in the XML document.</dd> |
|
165 |
|
166 <dt><strong>parser:pos()</strong></dt> |
|
167 <dd>Returns three results: the current parsing line, column, and |
|
168 absolute position.</dd> |
|
169 |
|
170 <dt><strong>parser:setbase(base)</strong></dt> |
|
171 <dd>Sets the <em>base</em> to be used for resolving relative URIs in |
|
172 system identifiers.</dd> |
|
173 |
|
174 <dt><strong>parser:setencoding(encoding)</strong></dt> |
|
175 <dd>Sets the encoding to be used by the parser. There are four |
|
176 built-in encodings, passed as strings: "US-ASCII", |
|
177 "UTF-8", "UTF-16", and "ISO-8859-1".</dd> |
|
178 </dl> |
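<p>The sketch below shows how the five results of <code>parser:parse</code> might be used to report an error. The malformed input and the base URI are hypothetical values chosen for illustration; the empty callbacks table simply means that no events are handled.</p>

```lua
local lxp = require "lxp"

-- An empty callbacks table: the parser only checks well-formedness
local parser = lxp.new({})

-- Hypothetical base URI, used here only to show setbase/getbase
parser:setbase("http://example.org/docs/")
local base = parser:getbase()

-- The closing tag </c> does not match <b>, so this call fails
local ok, msg, line, col, pos = parser:parse("<a>\n  <b></c>\n</a>")
if not ok then
  print(string.format("line %d, column %d, position %d: %s",
                      line, col, pos, msg))
end

parser:close()
```

<p>Because a failed call returns <code>nil</code> followed by the message, a successful call can also be wrapped directly in <code>assert</code>.</p>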
|
179 |
|
180 <h4>Callbacks</h4> |
|
181 |
|
182 <p>The Lua callbacks define the handlers of the parser events. The |
|
183 use of a table in the parser constructor has some advantages over |
|
184 the registration of callbacks, since there is no need for the API |
|
185 to provide a way to manipulate callbacks.</p> |
|
186 |
|
187 <p>Another difference lies in the behaviour of the callbacks during |
|
188 the parsing itself. The callback table contains references to the |
|
189 functions that can be redefined at will. The only restriction is |
|
190 that only the callbacks present in the table at creation time |
|
191 will be called.</p> |
|
192 |
|
193 <p>The callbacks table indices are named after the equivalent Expat |
|
194 callbacks:<br /> |
|
195 <em>CharacterData</em>, <em>Comment</em>, |
|
196 <em>Default</em>, <em>DefaultExpand</em>, <em>EndCdataSection</em>, |
|
197 <em>EndElement</em>, <em>EndNamespaceDecl</em>, |
|
198 <em>ExternalEntityRef</em>, <em>NotStandalone</em>, |
|
199 <em>NotationDecl</em>, <em>ProcessingInstruction</em>, |
|
200 <em>StartCdataSection</em>, <em>StartElement</em>, |
|
201 <em>StartNamespaceDecl</em>, and <em>UnparsedEntityDecl</em>.</p> |
|
202 |
|
203 <p>These indices can be references to functions with |
|
204 specific signatures, as seen below. The parser constructor also |
|
205 checks the presence of a field called <em>_nonstrict</em> in the |
|
206 callbacks table. If <em>_nonstrict</em> is absent, only valid |
|
207 callback names are accepted as indices in the table |
|
208 (<em>Defaultexpanded</em>, for example, would be considered an error). If |
|
209 <em>_nonstrict</em> is defined, any other fieldnames can be |
|
210 used (even if not called at all).</p> |
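<p>The effect of <em>_nonstrict</em> can be sketched as follows. The helper <code>noop</code> is a hypothetical placeholder function, and <code>pcall</code> is used only to capture the error raised by the constructor.</p>

```lua
local lxp = require "lxp"

local function noop() end

-- Strict mode (the default): the misspelled name makes lxp.new raise an error
local ok, err = pcall(lxp.new, { Defaultexpanded = noop })
print(ok, err)

-- With _nonstrict set, extra fields are accepted and simply never called
local parser = lxp.new{
  _nonstrict = true,
  Defaultexpanded = noop,
  StartElement = noop,
}
assert(parser:parse("<root/>"))
assert(parser:parse())
parser:close()
```

<p>Leaving the check enabled is a cheap way to catch misspelled callback names early.</p>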
|
211 |
|
212 <p>The callbacks can optionally be defined as <code>false</code>, |
|
213 acting thus as placeholders for future assignment of functions.</p> |
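<p>A minimal sketch of such a placeholder, assuming a document that arrives in two hypothetical chunks: the <em>StartElement</em> entry is present (as <code>false</code>) when the parser is created, and the real function is assigned between the two calls to <code>parser:parse</code>.</p>

```lua
local lxp = require "lxp"

local names = {}
local callbacks = {
  StartElement = false,   -- placeholder: present at creation, no function yet
}
local parser = lxp.new(callbacks)

-- No function is assigned yet, so these elements are not recorded
assert(parser:parse("<root><first/>"))

-- Assign the real handler in the same table the parser was created with
callbacks.StartElement = function(parser, name)
  table.insert(names, name)
end

assert(parser:parse("<second/></root>"))
assert(parser:parse())
parser:close()

print(table.concat(names, ", "))
```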
|
214 |
|
215 <p>Every callback function receives as the first parameter the |
|
216 calling parser itself, thus allowing the same functions to be used |
|
217 for more than one parser for example.</p> |
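<p>One way to take advantage of this, sketched under the assumption that per-parser state is kept in a table keyed by the parser object (the <code>elements</code> table is hypothetical), is to share a single callbacks table between two parsers:</p>

```lua
local lxp = require "lxp"

local elements = {}   -- per-parser counters, keyed by the parser object
local callbacks = {
  StartElement = function(parser, name)
    elements[parser] = (elements[parser] or 0) + 1
  end,
}

-- Both parsers use the same callbacks table
local p1 = lxp.new(callbacks)
local p2 = lxp.new(callbacks)

assert(p1:parse("<a><b/><c/></a>"))
assert(p2:parse("<x/>"))
assert(p1:parse()); p1:close()
assert(p2:parse()); p2:close()

print(elements[p1], elements[p2])
```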
|
218 |
|
219 <dl class="reference"> |
|
220 <dt><strong>callbacks.CharacterData = function(parser, string)</strong></dt> |
|
221 <dd>Called when the <em>parser</em> recognizes a <em>string</em> of XML character data (CDATA).</dd> |
|
222 |
|
223 <dt><strong>callbacks.Comment = function(parser, string)</strong></dt> |
|
224 <dd>Called when the <em>parser</em> recognizes an XML comment |
|
225 <em>string</em>.</dd> |
|
226 |
|
227 <dt><strong>callbacks.Default = function(parser, string)</strong></dt> |
|
228 <dd>Called when the <em>parser</em> has a <em>string</em> |
|
229 corresponding to any characters in the document which wouldn't |
|
230 otherwise be handled. Using this handler has the side effect of |
|
231 turning off expansion of references to internally defined general |
|
232 entities. Instead these references are passed to the default |
|
233 handler.</dd> |
|
234 |
|
235 <dt><strong>callbacks.DefaultExpand = function(parser, string)</strong></dt> |
|
236 <dd>Called when the <em>parser</em> has a <em>string</em> |
|
237 corresponding to any characters in the document which wouldn't |
|
238 otherwise be handled. Using this handler doesn't affect expansion |
|
239 of internal entity references.</dd> |
|
240 |
|
241 <dt><strong>callbacks.EndCdataSection = function(parser)</strong></dt> |
|
242 <dd>Called when the <em>parser</em> detects the end of a CDATA |
|
243 section.</dd> |
|
244 |
|
245 <dt><strong>callbacks.EndElement = function(parser, elementName)</strong></dt> |
|
246 <dd>Called when the <em>parser</em> detects the ending of an XML |
|
247 element with <em>elementName</em>.</dd> |
|
248 |
|
249 <dt><strong>callbacks.EndNamespaceDecl = function(parser, namespaceName)</strong></dt> |
|
250 <dd>Called when the <em>parser</em> detects the ending of an XML |
|
251 namespace with <em>namespaceName</em>. The handling of the end |
|
252 namespace is done after the handling of the end tag for the element |
|
253 the namespace is associated with.</dd> |
|
254 |
|
255 <dt><strong>callbacks.ExternalEntityRef = function(parser, subparser, base, systemId, publicId)</strong></dt> |
|
256 <dd>Called when the <em>parser</em> detects an external entity |
|
257 reference.<br/><br/> |
|
258 The <em>subparser</em> is a LuaExpat parser created with the |
|
259 same callbacks and Expat context as the <em>parser</em> and should |
|
260 be used to parse the external entity.<br/> |
|
261 The <em>base</em> parameter is the base to use for relative |
|
262 system identifiers. It is set by parser:setbase and may be nil.<br/> |
|
263 The <em>systemId</em> parameter is the system identifier |
|
264 specified in the entity declaration and is never nil.<br/> |
|
265 The <em>publicId</em> parameter is the public id given in the |
|
266 entity declaration and may be nil.</dd> |
|
267 |
|
268 <dt><strong>callbacks.NotStandalone = function(parser)</strong></dt> |
|
269 <dd>Called when the <em>parser</em> detects that the document is not |
|
270 "standalone". This happens when there is an external subset or a |
|
271 reference to a parameter entity, but the document does not have standalone set |
|
272 to "yes" in an XML declaration.</dd> |
|
273 |
|
274 <dt><strong>callbacks.NotationDecl = function(parser, notationName, base, systemId, publicId)</strong></dt> |
|
275 <dd>Called when the <em>parser</em> detects XML notation |
|
276 declarations with <em>notationName</em>.<br/> |
|
277 The <em>base</em> parameter is the base to use for relative |
|
278 system identifiers. It is set by parser:setbase and may be nil.<br/> |
|
279 The <em>systemId</em> parameter is the system identifier |
|
280 specified in the entity declaration and is never nil.<br/> |
|
281 The <em>publicId</em> parameter is the public id given in the |
|
282 entity declaration and may be nil.</dd> |
|
283 |
|
284 <dt><strong>callbacks.ProcessingInstruction = function(parser, target, data)</strong></dt> |
|
285 <dd>Called when the <em>parser</em> detects XML processing |
|
286 instructions. The <em>target</em> is the first word in the |
|
287 processing instruction. The <em>data</em> is the rest of the |
|
288 characters in it after skipping all whitespace after the initial |
|
289 word.</dd> |
|
290 |
|
291 <dt><strong>callbacks.StartCdataSection = function(parser)</strong></dt> |
|
292 <dd>Called when the <em>parser</em> detects the beginning of an XML |
|
293 CDATA section.</dd> |
|
294 |
|
295 <dt><strong>callbacks.StartElement = function(parser, elementName, attributes)</strong></dt> |
|
296 <dd>Called when the <em>parser</em> detects the beginning of an XML |
|
297 element with <em>elementName</em>.<br/> |
|
298 The <em>attributes</em> parameter is a Lua table with all the |
|
299 element attribute names and values. The table contains an entry for |
|
300 every attribute in the element start tag and entries for the |
|
301 default attributes for that element.<br/> |
|
302 The attributes are listed by name (including the inherited ones) |
|
303 and by position (inherited attributes are not considered in the |
|
304 position list).<br/> |
|
305 As an example if the <em>book</em> element has attributes |
|
306 <em>author</em>, <em>title</em> and an optional <em>format</em> |
|
307 attribute (with "printed" as default value), |
|
308 <pre class="example"> |
|
309 &lt;book author="Ierusalimschy, Roberto" title="Programming in Lua"&gt; |
|
310 </pre> |
|
311 would be represented as<br/> |
|
312 <pre class="example"> |
|
313 {[1] = "author", |
|
314 [2] = "title", |
|
315 author = "Ierusalimschy, Roberto", |
|
316 format = "printed", |
|
317 title = "Programming in Lua"} |
|
318 </pre></dd> |
|
319 |
|
320 <dt><strong>callbacks.StartNamespaceDecl = function(parser, namespaceName)</strong></dt> |
|
321 <dd>Called when the <em>parser</em> detects an XML namespace |
|
322 declaration with <em>namespaceName</em>. Namespace declarations |
|
323 occur inside start tags, but the StartNamespaceDecl handler is |
|
324 called before the StartElement handler for each namespace declared |
|
325 in that start tag.</dd> |
|
326 |
|
327 <dt><strong>callbacks.UnparsedEntityDecl = function(parser, entityName, base, systemId, publicId, notationName)</strong></dt> |
|
328 <dd>Called when the <em>parser</em> receives declarations of |
|
329 unparsed entities. These are entity declarations that have a |
|
330 notation (NDATA) field.<br/> |
|
331 As an example, in the chunk |
|
332 <pre class="example"> |
|
333 &lt;!ENTITY logo SYSTEM "images/logo.gif" NDATA gif&gt; |
|
334 </pre> |
|
335 <em>entityName</em> would be "logo", <em>systemId</em> would be |
|
336 "images/logo.gif" and <em>notationName</em> would be "gif". |
|
337 For this example the <em>publicId</em> parameter would be nil. |
|
338 The <em>base</em> parameter would be whatever has been set with |
|
339 <code>parser:setbase</code>. If not set, it would be nil.</dd> |
|
340 </dl> |
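<p>As a rough illustration of how several of these callbacks work together, the sketch below reads the attributes of the <em>book</em> element by name and collects its character data. The element content and the <code>book</code> and <code>text</code> variables are assumptions made for the example; since the input has no DTD, no default <em>format</em> attribute is supplied.</p>

```lua
local lxp = require "lxp"

local book          -- filled in by StartElement
local text = {}     -- pieces of character data, in order of arrival

local parser = lxp.new{
  StartElement = function(parser, name, attributes)
    if name == "book" then
      -- Attributes are looked up by name
      book = { author = attributes.author, title = attributes.title }
    end
  end,
  -- Character data may arrive in several pieces, so collect them
  CharacterData = function(parser, s)
    table.insert(text, s)
  end,
}

assert(parser:parse('<book author="Ierusalimschy, Roberto" '
  .. 'title="Programming in Lua">Lua &amp; XML</book>'))
assert(parser:parse())
parser:close()

print(book.author, book.title)
print(table.concat(text))
```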
|
341 |
|
342 <h4><a name="separator"></a>The separator character</h4> |
|
343 |
|
344 <p>The optional separator character in the parser constructor |
|
345 defines the character used in the namespace expanded element names. |
|
346 The separator character is optional (if not defined the parser will |
|
347 not handle namespaces) but if defined it must be different from |
|
348 the character '\0'.</p> |
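<p>A small sketch of namespace handling, assuming <code>"|"</code> as the separator and a hypothetical namespace URI: with a separator defined, element names reach the callbacks as the namespace URI followed by the separator and the local name.</p>

```lua
local lxp = require "lxp"

local names = {}
local parser = lxp.new({
  StartElement = function(parser, name)
    table.insert(names, name)
  end,
}, "|")   -- separator used in the namespace expanded element names

-- Hypothetical default namespace, inherited by the child element
assert(parser:parse('<doc xmlns="http://example.org/ns"><item/></doc>'))
assert(parser:parse())
parser:close()

for _, name in ipairs(names) do
  print(name)
end
```

<p>A character that cannot appear in a URI or in an element name makes the expanded names easy to split again.</p>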
|
349 |
|
350 </div> <!-- id="content" --> |
|
351 |
|
352 </div> <!-- id="main" --> |
|
353 |
|
354 <div id="about"> |
|
355 <p><a href="http://validator.w3.org/check?uri=referer"> |
|
356 <img src="http://www.w3.org/Icons/valid-xhtml10" alt="Valid XHTML 1.0!" height="31" width="88" /></a></p> |
|
357 <p><small>$Id: manual.html,v 1.27 2007/06/05 20:03:12 carregal Exp $</small></p> |
|
358 </div> <!-- id="about" --> |
|
359 |
|
360 </div> <!-- id="container" --> |
|
361 |
|
362 </body> |
|
363 </html> |