csv.lua

author: Matthew Wild <mwild1@gmail.com>
date: Sun, 01 Apr 2012 01:56:09 +0100
changeset: 3:5b24d66365ec
parent: 0:0e2b5dc7ae34
permissions: -rw-r--r--

Fix handling of quoted CSV fields, allowing this year's CSV to be properly parsed

local lpeg = require "lpeg"
local setmetatable, tonumber =
      setmetatable, tonumber;
local s_char = string.char;

module("csv");

local delim = lpeg.P",";

-- A backslash escape: a lowercase letter, backslash, CR, LF or the delimiter,
-- or a run of digits.
local char_escape = lpeg.R"az" + lpeg.S"\\\r\n" + delim;
local numeric_escape = (lpeg.R"09"^1)^-3;
local escape = (lpeg.P"\\" * (char_escape + numeric_escape));

-- Changed in changeset 3 (5b24d66365ec): a double-quoted field may contain
-- the delimiter; an unquoted field runs up to the next unescaped delimiter.
local quoted_value = lpeg.P"\"" * ((1-lpeg.P"\"")^0) * lpeg.P"\"";
local value = quoted_value + (escape + (1-delim))^0;

-- Maps the character following a backslash to its replacement. Characters
-- without an entry fall through to __index: a digit becomes the character
-- with that code, anything else is kept as-is.
local escape_map = setmetatable({
    t = "\t", b = "\b", f = "\f";
    n = "\n", r = "\r", v = "\v"; },
    { __index = function (_, n)
        if tonumber(n) then
            return s_char(tonumber(n));
        else
            return n;
        end
    end
});

-- Calls value_callback once for each field of line, with escapes decoded.
-- value_callback must return no values: lpeg.match returns the captured
-- values instead of the end position whenever the capture produces any.
function read_record(line, value_callback)
    local fieldpos = 0;
    local callback = function (v)
        -- The extra parentheses drop gsub's second result (the match count).
        return value_callback((v:gsub("\\(.)", escape_map)));
    end;
    repeat
        -- Match one field starting just past the previous delimiter.
        fieldpos = lpeg.match(value / callback, line, fieldpos+1);
    until fieldpos >= #line;
end

return _M;
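
The field-splitting rule that this changeset introduces can be illustrated without LPeg. The block below is a minimal sketch using plain string functions, not the module's own implementation: `split_record` is a hypothetical helper that does not appear in csv.lua, and it only reproduces the quoted/unquoted field boundaries, leaving escape decoding out. Like the grammar above, it keeps the surrounding double quotes in the field value.

```lua
-- Hypothetical helper (not part of csv.lua): splits one record into fields
-- with plain string functions, mirroring the quoted/unquoted choice above.
local function split_record(line)
    local fields, pos, len = {}, 1, #line
    while pos <= len do
        local stop
        -- Quoted field: only if a closing double quote exists further on.
        local close = line:sub(pos, pos) == '"' and line:find('"', pos + 1, true)
        if close then
            stop = close + 1
        else
            -- Unquoted field: runs to the next comma; a backslash makes the
            -- following character part of the field, even if it is a comma.
            stop = pos
            while stop <= len and line:sub(stop, stop) ~= "," do
                if line:sub(stop, stop) == "\\" then
                    stop = stop + 1
                end
                stop = stop + 1
            end
        end
        fields[#fields + 1] = line:sub(pos, stop - 1)
        pos = stop + 1 -- skip the delimiter
    end
    return fields
end

-- Usage: the comma inside the quoted field does not split it.
for i, field in ipairs(split_record('2012,"Wild, Matthew",ok')) do
    print(i, field)
end
```

Trying the quoted alternative first and falling back to the unquoted scan mirrors the ordered choice in `value`, so a field with an unterminated quote is still split at the next comma.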
