Xml Serialization for Ruby


File: Readme
Download 1.0.pre4: prdownloads.sourceforge.net/clxmlserial/clxmlserial.1.0.pre4.zip
REXML*: www.germane-software.com/software/rexml/
Home Page: clabs.org/clxmlserial.htm
ViewCVS: cvs.sourceforge.net/cgi-bin/viewcvs.cgi/clxmlserial/clxmlserial/
Anon CVS: sourceforge.net/cvs/?group_id=51071

* last tested with 2.7.1

Please review the Security Issues section before using.

Overview

Xml Serialization allows classes to be marshalled to and from XML.

It consists of a module (XmlSerialization) and modified standard classes which add to_xml and from_xml methods. to_xml is an instance method which returns an XML element containing the data from each instance variable in the including class. from_xml is a singleton/class method which accepts an XML element and creates an instance of the class with the data in the element.

Currently, REXML is used for XML parsing. Later versions may allow other XML processors to be plugged in.

This project is still in a pre-release state, though functional. Feel free to give me feedback (code contributions are of course always welcome).

License

Copyright © 2002-2003, Chris Morris (clxmlserial@clabs.org). BSD license.

Install

  % ruby install.rb

Usage

See the examples directory for a sample. Unit tests are also included in SITE/1.x/cl/xmlserial/xmlserialtest.rb. Here’s a quick sample:

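  require 'cl/xmlserial'

  class MyClass
    include XmlSerialization
    attr_accessor :attr

    def initialize
      @attr = 0   # set the instance variable, not a local, so from_xml sees its type
    end
  end

  # read an existing instance from class.xml
  doc = File.open("class.xml") { |f| REXML::Document.new(f) }
  c = MyClass.from_xml(doc.root)
  c.attr = 60

  # write the modified instance back; the block closes the file
  File.open("class.xml", "w") { |f| c.to_xml.write(f, -1) }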

yields either:

  <MyClass>
    <attr>
      <Fixnum>60</Fixnum>
    </attr>
  </MyClass>

or:

  <MyClass>
    <attr>60</attr>
  </MyClass>

The XmlSerialization module includes a singleton configuration class (XmlSerialConf.instance, aliased as XSConf) with an outputTypeElements setting. Setting this to false gives more concise XML (the latter example above). To ensure the data is read in correctly, the instance variables should be initialized in the class’s initialize method.
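
As a rough illustration of the singleton-plus-alias arrangement, here is a minimal sketch using Ruby's standard Singleton module. SketchConf and SConf are hypothetical stand-ins, not the library's XmlSerialConf and XSConf, and the default value shown is an assumption:

```ruby
require 'singleton'

# Hypothetical stand-in for the configuration class. The attribute name
# follows the README; the default value is an assumption.
class SketchConf
  include Singleton
  attr_accessor :outputTypeElements

  def initialize
    @outputTypeElements = true   # assumed default
  end
end

# Short alias for the single instance, in the spirit of XSConf.
SConf = SketchConf.instance

SConf.outputTypeElements = false
# Both names refer to the same object, so the change is visible everywhere.
puts SketchConf.instance.outputTypeElements
```

Because there is only one instance, a setting changed through the alias applies to every later to_xml-style call in the process.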

Attempts to correctly grok Strings and Numerics will be made for uninitialized instance vars, so the latter example above will read in 60 as a Fixnum, even if @attr is not initialized. If the value is neither a valid Integer nor a valid Float, then it’s read in as a String.

All forms of Ruby Numeric notation are supported as well. So this:

  <Array>-5.4,5.a,4e5,0xaabb,123_456</Array>

is read in as:

  [-5.4, "5.a", 400000.0, 43707, 123456]
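
One way to get this behaviour, sketched here with Ruby's strict Kernel#Integer and Kernel#Float conversions. guess_value is a hypothetical helper written for illustration; the library's own parsing code may work differently:

```ruby
# Hypothetical helper, not part of clxmlserial: classify one text value.
def guess_value(text)
  Integer(text)        # accepts signs, 0x prefixes and underscores
rescue ArgumentError
  begin
    Float(text)        # accepts decimals and exponents such as 4e5
  rescue ArgumentError
    text               # anything else stays a String
  end
end

values = "-5.4,5.a,4e5,0xaabb,123_456".split(",").map { |item| guess_value(item) }
p values
```

Trying Integer before Float keeps whole numbers from being read in as floats.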

Arrays and Hashes also work with outputTypeElements set to false, assuming the items/keys/values are all of type String or Numeric. In that case, a CSV string is output. For example:

  c = MyClass.new
  c.attr = ['a', 5]

becomes

  <MyClass>
    <attr>a,5</attr>
  </MyClass>

and

  c = MyClass.new
  c.attr = { 'a' => 5, 'b' => 6 }

becomes

  <MyClass>
    <attr>a=5,b=6</attr>
  </MyClass>

If any of the items/keys/values is neither a String nor a Numeric, then type elements are automagically used:

  c = MyClass.new
  c.attr = ['a', ['b', 'c']]

becomes

  <MyClass>
    <attr>
      <Array>
        <String>a</String>
        <Array>
          <String>b</String>
          <String>c</String>
        </Array>
      </Array>
    </attr>
  </MyClass>
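
The choice between the compact text form and type elements can be sketched as follows. simple? and compact_text are hypothetical names used only for this illustration, and the sketch does not handle items that themselves contain the delimiter:

```ruby
# Hypothetical helpers, not the library's code.
def simple?(value)
  value.is_a?(String) || value.is_a?(Numeric)
end

# Returns the compact text form, or nil when type elements would be needed.
def compact_text(value)
  case value
  when Array
    value.join(',') if value.all? { |item| simple?(item) }
  when Hash
    if value.all? { |k, v| simple?(k) && simple?(v) }
      value.map { |k, v| "#{k}=#{v}" }.join(',')
    end
  end
end

p compact_text(['a', 5])                 # "a,5"
p compact_text({ 'a' => 5, 'b' => 6 })   # "a=5,b=6"
p compact_text(['a', ['b', 'c']])        # nil, so fall back to type elements
```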

As of 1.0.pre3, Xml Serialization can be used with classes that do not have a default/parameterless constructor. Set the XSConf.bypassInitialize attribute to true to have from_xml ignore the initialize method of the class. False is the default setting.

Also changed in 1.0.pre3, attribute accessors are no longer required. instance_eval is used to set attributes directly.
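
A minimal sketch of what bypassing initialize and setting instance data directly looks like in plain Ruby. Point is a hypothetical class; the Change Log mentions allocate for Ruby 1.8, but how from_xml combines these calls internally is not shown here:

```ruby
# Hypothetical class with no parameterless constructor.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end
end

point = Point.allocate                # creates the object without calling initialize
point.instance_eval { @x = 3 }        # block form of instance_eval sets the variable directly
point.instance_variable_set(:@y, 4)   # an eval-free way to do the same thing
p [point.x, point.y]
```

Neither assignment needs a writer method, which is why accessors are no longer required.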

Currently, the following standard classes are supported:

  • String
  • Fixnum
  • Array
  • Hash
  • Time (time format can be set in XSConf.timeFormat)
  • Integer (Fixnum, Bignum)
  • Float
  • TrueClass
  • FalseClass

Security Issues

1.0.pre3 switched from requiring attribute accessors for deserialization to calling instance_eval. This is more convenient, but has a potential security hole.

If the $SAFE level is set to 1, all strings read in from a file are marked tainted, and cannot be passed to instance_eval. However, because REXML passes all strings through Array#pack and String#unpack calls to support various XML encodings, the string’s taintedness is lost, and the instance_eval calls are allowed.

Beyond that, a $SAFE level of 3 or more will simply not allow calls to instance_eval, so the current release won’t work under those conditions.

In 1.0.pre5, I plan to re-add the original code that uses send and requires writer accessor methods, in addition to the instance_eval code, and add an XSConf switch to control this. The default setting will be to require accessor methods, to play it safe with the potential security hole.

I’ve been discussing this issue with Sean Russell, author of REXML, and it’s possible that REXML will be changed to retain the string’s taintedness through the encoding process. In that case the security hole would be closed, but instance_eval would then be rejected for file data at any $SAFE level of 1 or higher, so the option not to use instance_eval will be necessary.
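
To make the risk concrete, here is a simplified sketch of the general problem with evaluating text taken from a file. The Account class, the payload and the way the eval string is built are all assumptions made for illustration; the library's actual instance_eval call may be constructed differently:

```ruby
# Hypothetical class used only for this demonstration.
class Account
  attr_reader :balance, :admin

  def initialize
    @balance = 0
    @admin = false
  end
end

# Text as it might arrive from an untrusted XML file (made-up payload).
untrusted = "60; @admin = true"

# Assumed, simplified eval-based setter: the text becomes Ruby source.
unsafe = Account.new
unsafe.instance_eval("@balance = #{untrusted}")

# A setter that treats the text purely as data.
safe = Account.new
safe.instance_variable_set(:@balance, untrusted)

p unsafe.admin   # true: the payload ran as code
p safe.admin     # false: the payload was stored as a plain string
```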

Guts Overview

The Object class has a few methods appended to it, the main one being to_xml. Its primary role is to set up the base XML element node, including a type element if required. Then it calls instance_data_to_xml, which must be overridden in descendant classes.

The supported standard classes all have instance_data_to_xml methods appended to them. For custom classes, the module XmlSerialization has an instance_data_to_xml method that loops through each instance variable in the including class, calling to_xml on each of them.

from_xml is a singleton method (class method) appended to each supported standard class as well as a singleton method in the XmlSerialization module. It creates a new instance of the class based on the XML element passed to it.
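
The to_xml side of this structure can be sketched roughly as below. This is not the library's code: the method names carry a _sketch suffix, plain strings are emitted instead of REXML elements to keep the example dependency-free, only scalar values are handled, and no XML escaping is done:

```ruby
# Hypothetical mixin mirroring the structure described above.
module SketchSerialization
  # Builds the outer element, then delegates to the per-instance data method.
  def to_xml_sketch
    "<#{self.class.name}>#{instance_data_to_xml_sketch}</#{self.class.name}>"
  end

  # Loops through each instance variable and wraps its value in a
  # name element plus a type element.
  def instance_data_to_xml_sketch
    instance_variables.map do |ivar|
      name = ivar.to_s.delete('@')
      value = instance_variable_get(ivar)
      type = value.class.name
      "<#{name}><#{type}>#{value}</#{type}></#{name}>"
    end.join
  end
end

class MyClass
  include SketchSerialization

  def initialize
    @attr = 60
  end
end

puts MyClass.new.to_xml_sketch
```

On current Ruby versions the type element comes out as Integer rather than Fixnum, since Fixnum and Bignum were later unified into Integer.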

Contributors

  • Harry Ohlsen
    • Support for classes in modules and inner classes
    • Code to use eval instead of send for classes w/o accessors
    • Code to work around the initialize method when instantiating classes with parameterized initializers
  • Stefan Mueller
    • TrueClass and FalseClass support

Thanks

  • matz for Ruby
  • Sean Russell for REXML
  • Dave Thomas for RDoc and general guruness

Change Log

1.0.pre4

  • Support for Ruby 1.8. Still works on 1.6. Minor changes to ward off warnings. When XSConf.bypassInitialize is true, from_xml now uses the allocate method unless running on Ruby 1.6 or 1.7.

1.0.pre3

  • Support for classes in modules and inner classes
  • instance_eval used instead of send to set instance data. Accessor methods no longer required
  • XSConf.bypassInitialize option to deserialize classes without default/parameterless initialize methods
  • TrueClass and FalseClass support

To Do

pre5

  • add back attribute accessor support and an XSConf switch to support both options. Using instance_eval has a potential security hole that is not protected by $SAFE == 1 even when deserializing from an XML file. Using instance_eval is not an option in $SAFE >= 3.
  • xmlserial gets stuck in a loop if the elements in my tree have references to their parents. I had to delete the references before to_xmling the tree, and restore them afterwards. Marshal does not have this problem. [Stefan Mueller]

    Hmmm … this gets complicated fast. Basically, the xml will have to have an id system, so that a child instance can simply refer to an already serialized instance’s id. Then, during deserialization this id system will tie back to Object#id.

    Problem here is now the xml is getting cluttered and I want to keep an option for uncluttered xml — so, how to handle this properly.

  • installation bug from Bret Pettichord <bret@pettichord.com>

    > I found that i can't install clxmlserial directly from the CD:
    >
    > F:\installs\clxmlserial>ruby install.rb
    > chmod 0755 /cygdrive/c/ruby1.6.5/lib/ruby/site_ruby/1.6/cl/xmlserial
    > chmod 0755 /cygdrive/c/ruby1.6.5/lib/ruby/site_ruby/1.6/cl/xmlserial/test
    > install.rb:58:in `open': Permission denied - "clxmls_tmp" (Errno::EACCES)
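
The id system floated for the parent-reference item above might look roughly like this. It is only a sketch under assumptions: Node and dump are hypothetical, nested Hashes stand in for XML elements, and ids are simple counters rather than anything tied to Object#id:

```ruby
# Hypothetical tree node with a back-reference to its parent.
class Node
  attr_accessor :name, :parent, :children

  def initialize(name)
    @name = name
    @parent = nil
    @children = []
  end
end

# Walks the object graph, giving each object an id the first time it is
# seen and emitting only a reference when it is met again.
def dump(obj, seen = {}.compare_by_identity)
  case obj
  when nil, true, false, String, Numeric
    obj
  when Array
    obj.map { |item| dump(item, seen) }
  else
    return { 'ref' => seen[obj] } if seen.key?(obj)

    seen[obj] = seen.size + 1
    data = { 'id' => seen[obj], 'class' => obj.class.name }
    obj.instance_variables.each do |ivar|
      data[ivar.to_s.delete('@')] = dump(obj.instance_variable_get(ivar), seen)
    end
    data
  end
end

root = Node.new('root')
child = Node.new('child')
child.parent = root
root.children << child

p dump(root)
```

The walk terminates because the child's parent is written as a reference to the root's id instead of being serialized again. The cost is the extra id and ref clutter that the note above worries about.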

1.0

  • the example is nice, but due to it being solely complex arrays, there’s no way to showcase the xml sans type elements.
  • Add tests for no type element output on Arrays & Hashes that include the delimiter in the item/key/value, and force type elements then.
  • Install example dir. It’s being ignored right now (dist, but not inst).
  • Test with latest stable REXML.

1.0?

  • RegExp class. Other standard classes?

Future/Never

  • ? Change type to be stored as attribute, not separate element (not sure how much this would buy in the case of an Array or Hash. If type info is moved to an attribute, then a placeholder node (<Item> or <Element>) must be created to hold the attribute … that’s actually more XML).
  • Refactor to be able to test with mock xml parser, to force structure to be friendly to using other parsers
  • Work with xml_pickle (Python) guys for interchangeable xml.
  • xml_pickle (Python) can deserialize an xml file into a class that isn’t predefined … it creates the class on the fly based on the data in the xml file. Cool feature.
  • Review rrr.jin.gr.jp/rwiki?cmd=view;name=Marshal to get lib up to snuff to be All It Can Be.