Write your own format FieldSet class
====================================

Input stream
------------

For several reasons (e.g. addresses with bit granularity), Hachoir uses its
own stream class, ``InputStream``. Don't instantiate it directly: use the
``FileInputStream`` function (which takes a filename) or ``StringInputStream``
(which takes a string). Here is a small example that creates a stream:

>>> from hachoir.stream import StringInputStream
>>> from hachoir.endian import BIG_ENDIAN, LITTLE_ENDIAN
>>> stream = StringInputStream("\x03abc\x02\x00")

The most useful attributes and methods are:

>>> stream.size                              # get size in bits
48
>>> stream.readBits(0, 8, BIG_ENDIAN)        # get 8 bits at address 0
3
>>> stream.readBytes(8, 3)                   # get 3 bytes at address 8
'abc'
>>> stream.readBits(32, 16, BIG_ENDIAN)      # get 16 bits in big endian
512
>>> stream.readBits(32, 16, LITTLE_ENDIAN)   # get 16 bits in little endian
2
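
To make the effect of the endian argument concrete, here is a minimal
pure-Python sketch (Python 3) that reproduces the numbers above. The
``read_bits`` helper is hypothetical and is not Hachoir's implementation: it
only handles byte-aligned reads, whereas Hachoir streams also accept addresses
and sizes that are not multiples of 8.

```python
def read_bits(data, address, nbits, endian):
    """Read an unsigned integer from bytes; address and nbits are in bits.

    Hypothetical helper for illustration: only byte-aligned reads are supported.
    """
    if address % 8 or nbits % 8:
        raise ValueError("this sketch only supports byte-aligned reads")
    start = address // 8
    chunk = data[start:start + nbits // 8]
    if len(chunk) * 8 != nbits:
        raise ValueError("read past the end of the data")
    # endian is "big" or "little", as accepted by int.from_bytes
    return int.from_bytes(chunk, endian)


data = b"\x03abc\x02\x00"
print(len(data) * 8)                      # 48
print(read_bits(data, 0, 8, "big"))       # 3
print(read_bits(data, 32, 16, "big"))     # 512 (0x0200)
print(read_bits(data, 32, 16, "little"))  # 2   (0x0002)
```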


Support your own format using FieldSet
--------------------------------------

In Hachoir, everything is stored in a field. The parent of all field classes
is the class Field, but it can't be used directly. The available field types
are:

* Bit: one bit (True/False) ;
* Bits: unsigned number with a size in bits ;
* Bytes: vector of known bytes (e.g. file signature) ;
* UInt8, UInt16, UInt32, UInt64: unsigned number (size: 8, 16, 32, 64 bits) ;
* Int8, Int16, Int32, Int64: signed number (size: 8, 16, 32, 64 bits) ;
* Float32, Float64: 32/64 bits float number (IEEE 754) ;
* Enum: associates a string with a value (needs another Field as argument) ;
* Character: 8 bits ASCII character ;
* PaddingBits: padding with a size in bits ;
* PaddingBytes: padding with a size in bytes ;
* String: fixed length string ;
* CString: string ending with nul byte ("\0") ;
* UnixLine: string ending with new line character ("\n") ;
* PascalString8, PascalString16 and PascalString32: string prefixed with its
  length as an unsigned 8 / 16 / 32 bits integer (uses the parent endian) ;
* FieldSet: an ordered list of fields (contains other fields).

If you haven't found documentation about part of a format, use the "raw"
types:

* RawBits: unsigned number with a size in bits ;
* RawBytes: vector with a size in bytes.
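
The difference between the terminated and the length-prefixed string types in
the first list can be illustrated in plain Python 3. This is only a sketch of
the two layouts, not Hachoir code: ``read_cstring`` and
``read_pascal_string16`` are hypothetical helpers, and the ``big_endian`` flag
stands in for the endian that Hachoir takes from the parent field set.

```python
import struct


def read_cstring(data, offset=0):
    """Return (value, size in bytes); the size includes the "\\0" terminator."""
    end = data.index(b"\0", offset)
    return data[offset:end], end + 1 - offset


def read_pascal_string16(data, offset=0, big_endian=True):
    """Return (value, size in bytes); the size includes the 2-byte length prefix."""
    fmt = ">H" if big_endian else "<H"
    (length,) = struct.unpack_from(fmt, data, offset)
    start = offset + 2
    return data[start:start + length], 2 + length


print(read_cstring(b"abc\0def"))                                # (b'abc', 4)
print(read_pascal_string16(b"\x00\x03abcXYZ"))                  # (b'abc', 5)
print(read_pascal_string16(b"\x03\x00abc", big_endian=False))   # (b'abc', 5)
```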

A stream is split into several fields which are organised in a tree. Every
field therefore has a parent, except the root. Here is a small example, which
will be used to parse the string "\x03abc\x02\x00":

>>> from hachoir.field import Parser, UInt8, UInt16, String
>>> from hachoir.endian import BIG_ENDIAN
>>> class MyFormat(Parser):
...     tags = {"description": "My first parser", "min_size": 3*8}
...     endian = BIG_ENDIAN
...     def createFields(self):
...         yield UInt8(self, "length", "String length")
...         yield String(self, "text", self["length"].value)
...         yield UInt16(self, "number")
...

One goal of Hachoir is to make writing a parser as easy as possible. You only
have to write one method, createFields, which creates all the fields.

Another goal is to instantiate as few fields as possible. In most cases, no
field is created when a field set is instantiated. Fields are created when you
access them by name. This is why the Python keyword ``yield`` is used: it
allows fields to be created only "on demand" (see the "Hachoir internals"
documentation for more details).

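The "on demand" behaviour can be sketched with a plain Python 3 generator.
The ``LazyFieldSet`` class below is a hypothetical stand-in written for this
illustration, not Hachoir's FieldSet: it stores bare values instead of field
objects and records which names have been created so far.

```python
class LazyFieldSet:
    """Hypothetical stand-in for a field set; not Hachoir's actual class."""

    def __init__(self, data):
        self.data = data
        self._fields = {}                        # values created so far, by name
        self.created = []                        # names, in creation order
        self._generator = self.create_fields()   # nothing runs yet

    def create_fields(self):
        # Same layout as MyFormat: length byte, text, 16-bit big endian number
        length = self.data[0]
        yield "length", length
        yield "text", self.data[1:1 + length]
        yield "number", int.from_bytes(self.data[1 + length:3 + length], "big")

    def __getitem__(self, name):
        # Advance the generator only until the requested field exists
        while name not in self._fields:
            try:
                field_name, value = next(self._generator)
            except StopIteration:
                raise KeyError(name) from None
            self._fields[field_name] = value
            self.created.append(field_name)
        return self._fields[name]


fs = LazyFieldSet(b"\x03abc\x02\x00")
print(fs.created)      # []
print(fs["length"])    # 3
print(fs.created)      # ['length']
print(fs["number"])    # 512
print(fs.created)      # ['length', 'text', 'number']
```

Asking for "number" forces "text" to be created as well, because fields can
only be produced in stream order.
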
The constructor signatures of field classes differ, but the first two
parameters are always the same:

* The first is the parent, of type FieldSet (None for the root) ;
* The second is the name of the field.

OK, let's play with our new field set:

>>> format = MyFormat(stream)
>>> format.size               # get size in bits
48
>>> format["text"].value
'abc'
>>> "length" in format        # test whether the field 'length' exists
True
>>> # Easiest way to display a field set content
>>> for field in format:
...     print "%s=%s" % (field.name, field.display)
...
length=3
text="abc"
number=512

Details about Field class
-------------------------

A field carries a lot of information. Its attributes are:

* *name* (read only): Field name, unique in its parent field set ;
* *size* (read only): Size in bits ;
* *address* (read only): Address in bits, relative to parent address ;
* *absolute_address* (read only): Address in bits from the beginning of
  the stream ;
* *parent* (read only): Parent of the field (is None for root field set) ;
* *root* (read only): Root of all field sets ;
* *value*: Value of the field (integer, string, boolean, ...). Don't print
  this attribute directly; use the display attribute instead ;
* *display*: Human readable (and truncated) representation of the field value ;
* *path*: Full "path" of the field from the root (eg. "/header/content") ;
* *is_field_set* (read only): If the value is True, the field contains other
  fields (means that the class inherits from FieldSet).

Examples:

>>> field = format["text"]
>>> field.name
'text'
>>> field.path
'/text'
>>> field.value
'abc'
>>> field.size
24
>>> field.address
8
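
The relation between *address*, *absolute_address* and *path* can be sketched
by walking up the parent chain. The ``Node`` class below is a hypothetical
model written in Python 3 for this illustration; it is not how Hachoir
implements these attributes, and the field names and sizes are made up.

```python
class Node:
    """Hypothetical model of a field in a tree; addresses and sizes in bits."""

    def __init__(self, name, address, size, parent=None):
        self.name = name
        self.address = address   # relative to the parent address
        self.size = size
        self.parent = parent     # None for the root

    @property
    def absolute_address(self):
        # Sum the relative addresses from this node up to the root
        if self.parent is None:
            return self.address
        return self.parent.absolute_address + self.address

    @property
    def path(self):
        if self.parent is None:
            return "/"
        parent_path = self.parent.path
        if parent_path == "/":
            return "/" + self.name
        return parent_path + "/" + self.name


root = Node("root", 0, 48)
header = Node("header", 8, 40, parent=root)
content = Node("content", 16, 8, parent=header)
print(content.address)           # 16
print(content.absolute_address)  # 24
print(content.path)              # /header/content
```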

Classes which inherit from the Field class may have additional attributes; see
the API documentation.

For example, the String class has the attribute "length", which is the length
of the string in characters.

>>> format["text"].size
24
>>> format["text"].length
3

