Tuesday, November 11, 2008

I have created a space in Assembla to look closely at JvYaml and call it SnakeYAML (http://trac-hg.assembla.com/snakeyaml). The source has been migrated from CVS to Mercurial, and the standard Maven folder structure is applied.
It is very convenient that JvYaml is a direct port of PyYAML: it is easy to read the Python implementation and spot where the port deviates. It is even possible to debug the two implementations in parallel on two computers! (Synergy is dead useful.)
Before the code is changed, let us contribute tests. Many of the examples from http://yaml.org/spec/1.1/ have been turned into tests. Unfortunately, a number of them fail.
These are some deviations from the original Python code:
  • Reader is dropped (in favor of java.io.Reader) and the BOM is not respected. The encoding must be known before the stream is read, which is not always possible (and it is against the specification).
  • Scanner implementation is simplified. All the comments are removed.
  • The Python implementation is not followed very closely. For instance, a boolean in Python may be True, False or None, but the Java implementation uses a primitive boolean instead of the class Boolean, so the third state is gone. This causes, for example, trailing spaces in block scalars to be trimmed.
  • Python module is close to Java package. It helps separate code logically.
  • no tests are imported from PyYAML
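
To make the lost third state concrete, here is a minimal sketch. It assumes the flag in question is the block scalar chomping indicator, which PyYAML keeps as None (clip), True (keep, "+") or False (strip, "-"). The class Chomping and its method are hypothetical and are not taken from JvYaml or SnakeYAML.

```java
// Hypothetical helper for illustration only; not JvYaml or SnakeYAML code.
final class Chomping {

    /**
     * Appends the trailing line breaks of a block scalar to its text.
     *
     * @param text           the scalar content without trailing line breaks
     * @param trailingBreaks how many line breaks followed the content
     * @param chomping       null = clip (default), TRUE = keep, FALSE = strip
     */
    static String apply(String text, int trailingBreaks, Boolean chomping) {
        StringBuilder out = new StringBuilder(text);
        if (trailingBreaks == 0) {
            return out.toString();
        }
        // Clip (null) and keep (TRUE) both retain the final line break.
        if (!Boolean.FALSE.equals(chomping)) {
            out.append('\n');
        }
        // Only keep (TRUE) retains the remaining line breaks as well.
        if (Boolean.TRUE.equals(chomping)) {
            for (int i = 1; i < trailingBreaks; i++) {
                out.append('\n');
            }
        }
        return out.toString();
    }
}
```

With a primitive boolean there is no way to pass null, so "clip" has to collapse into one of the other two modes and the trailing part of the scalar comes out wrong.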

Let us improve the implementation and try to follow the specification as closely as possible.
This is what is done so far:
  • Java does not have multiple inheritance (which is very good!). The way multiple inheritance is used in PyYAML is not very correct. Let us follow a reliable recommendation: "use composition over inheritance". Now Reader is an instance variable in Scanner.
  • Change the public interface to stay closer to PyYAML. Use Iterator instead of List. java.io.InputStream is accepted and the encoding is recognized (and ignored) automatically.
  • Rename classes with respect to "Python module" -> "Java package".
  • Define code formatter which can be imported to Eclipse
  • Go through ScannerImpl and try to stay as close as possible to PyYAML. A number of issues are fixed. The size has almost doubled (~2000 lines), mostly because of the comments in the code.
  • some tests are imported from PyYAML
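
The first two points can be sketched together. This is only an illustration under stated assumptions: BomAwareReader and WordScanner are hypothetical names, UTF-8 is assumed when no BOM is present, and the "tokens" are just whitespace-separated words. The real SnakeYAML classes are different.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical: picks the charset from a byte order mark, skips the mark, then decodes.
final class BomAwareReader {
    private final Reader reader;
    private final Charset charset;

    BomAwareReader(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < 3) { // a single read() may return fewer bytes than requested
            int r = pin.read(head, n, 3 - n);
            if (r < 0) break;
            n += r;
        }
        int b0 = head[0] & 0xFF, b1 = head[1] & 0xFF, b2 = head[2] & 0xFF;
        String name = "UTF-8"; // assumed default when there is no BOM
        int skip = 0;
        if (n >= 3 && b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) {
            skip = 3;
        } else if (n >= 2 && b0 == 0xFE && b1 == 0xFF) {
            name = "UTF-16BE";
            skip = 2;
        } else if (n >= 2 && b0 == 0xFF && b1 == 0xFE) {
            name = "UTF-16LE";
            skip = 2;
        }
        if (n > skip) pin.unread(head, skip, n - skip); // give back the non-BOM bytes
        charset = Charset.forName(name);
        reader = new InputStreamReader(pin, charset);
    }

    int read() throws IOException { return reader.read(); }

    Charset charset() { return charset; }
}

// Hypothetical: the scanner owns a reader (composition) and hands out tokens lazily.
final class WordScanner implements Iterator<String> {
    private final BomAwareReader reader; // an instance variable, not a superclass
    private String next;

    WordScanner(InputStream in) throws IOException {
        reader = new BomAwareReader(in);
        advance();
    }

    // Reads the next whitespace-separated word, or sets next to null at end of stream.
    private void advance() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (Character.isWhitespace(c)) {
                if (sb.length() > 0) break;
            } else {
                sb.append((char) c);
            }
        }
        next = sb.length() > 0 ? sb.toString() : null;
    }

    public boolean hasNext() { return next != null; }

    public String next() {
        if (next == null) throw new NoSuchElementException();
        String token = next;
        try {
            advance();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return token;
    }

    public void remove() { throw new UnsupportedOperationException(); }

    // Convenience for small inputs: collects all tokens from a byte array.
    static List<String> scanAll(byte[] data) {
        try {
            List<String> out = new ArrayList<String>();
            WordScanner s = new WordScanner(new ByteArrayInputStream(data));
            while (s.hasNext()) out.add(s.next());
            return out;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The caller passes raw bytes and never names an encoding; the scanner delegates decoding to the reader it holds, and tokens are produced one at a time instead of being collected into a List up front.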

Because SnakeYAML provides some improvements over existing YAML libraries, I can release the library.
Documentation is much worse than it should be. I will try to improve it later.
If somebody needs a reliable YAML parser for Java, take SnakeYAML!
SnakeYAML 0.4 is born to this beautiful World...