[Noisebridge-discuss] File format specification

Jacob Appelbaum jacob at appelbaum.net
Fri Mar 19 02:06:16 UTC 2010


Meredith L. Patterson wrote:
> Jacob Appelbaum wrote:
>> It does seem like XML may solve some of these problems - I've never
>> actually interfaced with any XML libraries in C, I've barely interfaced
>> with XML as a programmer at all.
> 
> I've done XML in C; it blew. If you need a DOM parser rather than a SAX
> parser, your best bet is probably going to be Sablotron; libgdome2 has
> one of the more awful interfaces in the set of awful interfaces that is
> glib, and it's not portable.
> 

That's what I suspected. I think XML is right out after discovering JSON.

> Python's native XML libraries are pretty nice. I couldn't tell you how
> many times I've used xml.dom.minidom, but it's a lot. lxml is also
> really useful and somewhat faster than the native libs -- but I have
> also made it segfault while asking it to parse a book in DocBook format,
> so while it's mature, it's not bug-free. (That said, the native SAX
> module did the trick. Go figure.)

Eek. That's not an endorsement of XML or common XML parsing libraries. :-)

> 
>> I think that I'd need to make a MagPack DTD and then simply add data to
>> corresponding fields that I have defined. Does that sound about right?
> 
> If this is all you need to do, and if JSON doesn't provide everything
> you need, you could try YAML. YAML is more compact than XML, more
> human-readable, easily machine-parseable, and a superset of JSON. I've
> mostly used it for formatted-document stuff (I wrote an XML-to-YAML
> module that I should really clean up and put on the Cheese Shop), but I
> think it's particularly well suited to protocol documentation as well.
> 

Yeah, I think that YAML or JSON is the clear win for storing the data.
I'm not entirely certain how I'll store arbitrary binary data in a
JSON element. Perhaps I'll base64 encode it and stuff it into an element
first? That's not quite human readable, though.
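
A minimal sketch of the base64 idea, in Python since it came up above.
The field names ("name", "encoding", "data"), the helper names and the
"track1" label are all made up for illustration; nothing here is a
settled MagPack layout. It only uses the standard json and base64
modules.

```python
import base64
import json
from typing import Tuple


def pack_record(name: str, payload: bytes) -> str:
    """Serialize a record with a binary payload as JSON text.

    JSON strings cannot hold arbitrary bytes, so the payload is
    base64 encoded and stored as an ASCII string.
    """
    record = {
        "name": name,            # hypothetical field
        "encoding": "base64",    # hypothetical field, says how to read "data"
        "data": base64.b64encode(payload).decode("ascii"),
    }
    return json.dumps(record, sort_keys=True)


def unpack_record(text: str) -> Tuple[str, bytes]:
    """Parse JSON text produced by pack_record and recover the raw bytes."""
    record = json.loads(text)
    if record.get("encoding") != "base64":
        raise ValueError("unsupported encoding")
    # validate=True rejects characters outside the base64 alphabet
    return record["name"], base64.b64decode(record["data"], validate=True)


raw = bytes([0x00, 0xFF, 0x10, 0x80])   # not valid UTF-8, so it needs encoding
text = pack_record("track1", raw)
name, restored = unpack_record(text)
```

The "encoding" marker is there so a reader of the file can tell the
string is encoded bytes rather than text. The cost is the one already
noted: the payload is about a third larger and not readable by eye.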

> Really what you want to do is start with a protocol definition written
> in (extended) Backus-Naur form, then use that as your basis for
> machine-readable format definitions. This gives you consistent behaviour
> among specifications and increases the likelihood of consistent
> behaviour across implementations (assuming implementors are smart and
> actually implement based on the BNF like they're supposed to).
> 

You're spot on. I was pretty tired last night, so I just dumped my
notes down for data that I thought might be encountered. It makes a lot
of sense to use EBNF. I'll start outlining the likely data
structures in EBNF tonight; perhaps it will bring me clarity about JSON
vs YAML vs something else...

> YMMV, but those are the experiences I've had.

Thank you Meredith! You're awesome. :-)

Best,
Jake
