Words fail me when I try to describe just how awesome LPeg is. Designed as a Lua implementation of the PEG (Parsing Expression Grammar) concept, it is a true programming gem! Please, if you don't know what it is, take some time to familiarize yourself with it! It's not the easiest thing to grasp, but you will *not* regret it! It is certainly one of the most worthwhile learning efforts you can make in general programming.
One great feature of LPeg is that it's binary-safe, meaning that (unlike many regular expression engines) it can be safely used to parse binary data, embedded zero bytes included! This makes it an excellent tool for parsing binary protocols, especially network communication protocols such as the Action Message Format (used by Adobe Flash for making remote calls and even in FLV movie files). I'll leave it to you to explore the possibilities...
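To make the binary-safety claim a bit more tangible, here is a minimal sketch. The record layout (one zero marker byte followed by a 2-byte big-endian length) is made up for illustration and is not taken from the AMF specification; the only thing it is meant to show is that LPeg treats a NUL byte like any other character.

```lua
local lpeg = require "lpeg"

-- Hypothetical record layout (an assumption for this example only):
-- one NUL marker byte, then a 2-byte big-endian length.
local marker = lpeg.P"\0"

-- lpeg.P(2) matches exactly two characters of any value;
-- the function capture turns those two bytes into a number.
local length = lpeg.C(lpeg.P(2)) / function(bytes)
  local hi, lo = bytes:byte(1, 2)
  return hi * 256 + lo
end

local record = marker * length

local data = "\0\1\2rest"      -- bytes 0x00 0x01 0x02, then ordinary text
local len = record:match(data)
print(len)                     --> 258
```

The embedded zero byte neither terminates the subject string nor needs any special escaping in the pattern.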
Beware that from here on, I assume that you know your way around Lua, LPeg and how they work.
The problem
That being said, this article is actually about an unusual roadblock I hit while using LPeg to build a Lua-based AMF parser, and the various solutions I found and/or came up with to overcome it (you didn't think that I mentioned AMF before by accident, did you?).
The issue is LPeg's implementation of repetitive patterns: in particular, its inability to match (or capture) a fixed number of occurrences of a certain pattern. It can match a minimum or a maximum number of such occurrences, which is perfect for stream-oriented parsing (such as parsing programming languages) but insufficient for binary data, where a field typically consists of an exact number of bytes or items.
Just to clarify, here's a small list of LPeg patterns which correspond to the typical PCRE repetition constructs (in each case we're trying to match the string 'cloth'):
| Nr. | Matching occurrences of 'cloth' | PCRE pattern | LPeg pattern |
|---|---|---|---|
| 1 | 0 or more (at least 0) | [cci_text]/(cloth)*/[/cci_text] | [cci_lua]lpeg.P'cloth'^0[/cci_lua] |
| 2 | 1 or more (at least 1) | [cci_text]/(cloth)+/[/cci_text] | [cci_lua]lpeg.P'cloth'^1[/cci_lua] |
| 3 | X or more (at least X) | [cci_text]/(cloth){X,}/[/cci_text] | [cci_lua]lpeg.P'cloth'^X[/cci_lua] |
| 4 | 1 or less (at most 1) | [cci_text]/(cloth)?/[/cci_text] | [cci_lua]lpeg.P'cloth'^-1[/cci_lua] |
| 5 | X or less (at most X) | [cci_text]/(cloth){0,X}/[/cci_text] | [cci_lua]lpeg.P'cloth'^-X[/cci_lua] |
| 6 | precisely X (no more, no less) | [cci_text]/(cloth){X,X}/[/cci_text] | [cci_lua]-- not implemented --[/cci_lua] |
| 7 | anywhere between X and Y | [cci_text]/(cloth){X,Y}/[/cci_text] | [cci_lua]-- not implemented --[/cci_lua] |
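As a quick sanity check of rows 1 to 5, the following sketch runs each LPeg repetition against a subject made of three 'cloth's. The variable names are just for illustration; `match` returns the position right after the matched text, or nil on failure.

```lua
local lpeg = require "lpeg"

local cloth   = lpeg.P"cloth"
local subject = "clothclothcloth"       -- three occurrences, 15 characters

-- Rows 1 and 2: at least 0 / at least 1
local zero_or_more  = (cloth^0):match(subject)   -- consumes all three
local none_is_fine  = (cloth^0):match("silk")    -- matches the empty string
local one_or_more   = (cloth^1):match("silk")    -- fails

-- Row 3: "at least X" is greedy, so X = 2 still consumes all three
local at_least_two  = (cloth^2):match(subject)
local at_least_four = (cloth^4):match(subject)   -- fails, only three present

-- Rows 4 and 5: "at most X" stops after X occurrences
local at_most_one   = (cloth^-1):match(subject)
local at_most_two   = (cloth^-2):match(subject)

print(zero_or_more, none_is_fine, one_or_more)   --> 16  1  nil
print(at_least_two, at_least_four)               --> 16  nil
print(at_most_one, at_most_two)                  --> 6   11
```

Note how `cloth^2` happily swallows a third occurrence: a lower bound on its own says nothing about where the repetition stops, which is exactly the gap that rows 6 and 7 describe.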
For cases 6 and 7, LPeg does not offer any simple construct, so we have to build a more complex one. But let's put case 7 aside for a while and try to tackle case 6 first; then we'll see...