[Ocaml-i18n] OCamlI18N-0.3 & ldml progress
Matthieu Sozeau
mattam at mattam.org
Wed Sep 8 17:39:37 PDT 2004
Hello localizers,
I have some good news about LDML :) I have an LDML parser working
which processes all ldml-1.1 main/ data files (without collations) in 3.7
seconds, with a memory footprint of 10 MB (500 KB when loading just
fr_FR and its parents). As I'm a lazy guy I didn't want to write the parser
by hand, so I wrote a parser generator that takes a DTD and outputs
class types and class implementations. The generator knows only about the
peculiarities of the LDML data model (inheritance and aliases) and uses some
simple data structures to customize the output (attribute conversion code,
element list conversion code (e.g. to maps), factorization of classes...).
I'm very happy with the result, which is flexible (the code's not that nice
though). For instance, I was able to make alias resolution lazy quickly.
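To make the idea of lazy alias resolution concrete, here is a minimal sketch using the standard Lazy module. The type and function names (node, resolve, months_fr...) are made up for illustration and are not taken from the OCamlI18N sources; the counter is only there to show when resolution actually happens.

```ocaml
(* Hypothetical sketch: these names are NOT from OCamlI18N. *)
type 'a node =
  | Value of 'a                 (* data stored directly in this locale *)
  | Alias of 'a node Lazy.t     (* target is looked up only when forced *)

(* Follow aliases until a concrete value is reached. *)
let rec resolve = function
  | Value v -> v
  | Alias target -> resolve (Lazy.force target)

(* Counts how many times an alias target was actually computed. *)
let resolutions = ref 0

let months_fr = Value [ "janvier"; "fevrier" ]

(* The alias body runs on first use and its result is memoized by Lazy. *)
let months_fr_be = Alias (lazy (incr resolutions; months_fr))
```

Nothing is resolved when months_fr_be is defined; the first call to resolve forces the alias, and later calls reuse the cached target.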
About LDML proper, we now have all the classes and the inheritance mechanism
implemented. It is usable if you don't need number and date formatting or
collation data (so not all that useful yet ;). I will adapt the I18N module to
the classes at some point, but for now I'd like to reflect a little on the
parser generator and the problems discussed in the mail quoted below.
On Friday 20 August 2004 15:48, Yamagata Yoriyuki wrote:
> From: Matthieu Sozeau <mattam at mattam.org>
> Subject: [Ocaml-i18n] OCamlI18N-0.2 & ldml progress
> Date: Tue, 17 Aug 2004 20:16:34 +0200
>
> > > Further it would be good to have a generic mechanism to handle
> > > locale dependent data, not limited to LDML. Your library would be a
> > > natural place for it. Then Camomile can govern all locale specific
> > > data through your library.
> >
> > I wonder what you mean by locale dependent data, is it just objects with
> > different versions for each language ?
>
> Essentially so. However, a collation table can be quite large, so we
> cannot assume they are all in the memory.
I have sort of a proposal here: with a generic parser generator it would be
easy to handle locale dependent data in a standard way. Take for example a
message catalogue system, you pass a DTD like this to the parser generator:
<code type="toy example">
<!ELEMENT messages (alias | (message*, special*) ) >
<!ELEMENT message ( #PCDATA ) >
<!ATTLIST message key NMTOKEN #REQUIRED >
</code>
and it generates the corresponding class hierarchy and parsing code, which
handles aliases and inheritance just like LDML does. Aliases are resolved
lazily, and you can perfectly well specify that collation data should be
constructed lazily too. You can also handcraft a class hierarchy integrated
with the LDML class types. It's not a closed solution, and it should provide
a standard way of dealing with locale data.
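As a rough illustration of what generated code for the toy DTD could look like, here is a hand-written sketch of a class type and an implementation with parent fallback. This is an assumption about the general shape only: the names messages and messages_impl, the association-list storage and the lazy parent argument are all hypothetical, and the real generator output may differ.

```ocaml
(* Hypothetical shape of generated code; NOT actual generator output. *)
class type messages = object
  (* Look up a message by its key attribute. *)
  method message : string -> string option
end

(* [own] holds the <message> elements of one locale file;
   [parent] is the inherited locale, loaded only if a key is missing. *)
class messages_impl
    (own : (string * string) list)
    (parent : messages option Lazy.t) : messages =
  object
    method message key =
      match List.assoc_opt key own with
      | Some _ as found -> found
      | None ->
        (match Lazy.force parent with
         | Some p -> p#message key
         | None -> None)
  end

(* Illustrative data: root <- fr <- fr_FR. *)
let root =
  new messages_impl
    [ ("greeting", "Hello"); ("farewell", "Goodbye") ] (lazy None)
let fr = new messages_impl [ ("greeting", "Bonjour") ] (lazy (Some root))
let fr_fr = new messages_impl [] (lazy (Some fr))
```

With this data, fr_fr#message "greeting" is found in fr, while fr_fr#message "farewell" falls back all the way to root.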
> > > In addition, Camomile needs ISO language and country code. Current
> > > approach is parsing locale name, but if you introduce non-standard
> > > locale name, then some method for getting ISO codes is necessary.
> >
> > What sort of non-standard locale names do you expect ?
>
> Using aliases for locale names appears quite common. (For example,
> catalan for ca_ES)
A canonicalization function should suffice, no? Sure, we could support that by
having a 'string -> locale' function in I18N.Locale that could be modified by
users; is that what you're asking for?
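A possible shape for such a function, as a sketch only: the locale record, the aliases table and both function names below are assumptions for illustration, not the actual I18N.Locale interface. Users would extend the table to register their own names.

```ocaml
(* Hypothetical sketch; not the real I18N.Locale API. *)
type locale = { language : string; territory : string option }

(* User-modifiable table mapping non-standard names to canonical ones. *)
let aliases : (string, string) Hashtbl.t = Hashtbl.create 16
let () = Hashtbl.replace aliases "catalan" "ca_ES"

(* Replace a known alias by its canonical name, else keep the name. *)
let canonicalize name =
  match Hashtbl.find_opt aliases (String.lowercase_ascii name) with
  | Some canonical -> canonical
  | None -> name

(* Only "ll" and "ll_CC" forms are handled here; variants are ignored
   and yield None in this sketch. *)
let locale_of_string name =
  match String.split_on_char '_' (canonicalize name) with
  | [ language ] -> Some { language; territory = None }
  | [ language; territory ] -> Some { language; territory = Some territory }
  | _ -> None
```

The ISO language and country codes Camomile needs can then be read from the record fields instead of being parsed out of the raw name.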
> > ISO codes are just 15kb in total, and you don't have to load a lot of
> > ldml files to support a dozen locales in a program, apart from the
> > collation table creation/loading, what performance problems do you see ?
>
> I forget what I was thinking about, but for example Chinese locale
> definitions are rather large, so we cannot ignore their memory cost and
> loading time.
>
> Just a random thought.
I think laziness should be enough for handling this cost, but some think a
cache would be needed, with weak pointers I suppose (Benjamin?). Is it
really a normal use case to load all data for a particular locale and use
it only once in a run? (In this case, you'd just have to clear the locale
pool to free memory, in my humble opinion.)
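To show what I mean by a locale pool that loads on demand and can be cleared, here is a minimal sketch. The names pool, get and clear_pool, and the stand-in loader, are hypothetical and not the OCamlI18N implementation; a weak-pointer cache would be a different design and is not shown.

```ocaml
(* Hypothetical sketch of a locale pool; NOT the OCamlI18N code. *)

(* Counts real loads, only to make the caching behaviour visible. *)
let loads = ref 0

(* Stand-in for parsing a (possibly large) locale file. *)
let load_locale name =
  incr loads;
  [ ("name", name) ]

let pool : (string, (string * string) list) Hashtbl.t = Hashtbl.create 8

(* Load a locale the first time it is requested, then reuse it. *)
let get name =
  match Hashtbl.find_opt pool name with
  | Some data -> data
  | None ->
    let data = load_locale name in
    Hashtbl.add pool name data;
    data

(* Drop every cached locale so the GC can reclaim the memory. *)
let clear_pool () = Hashtbl.reset pool
```

A program that uses a big locale once can call clear_pool () afterwards; the data is simply reloaded if it is requested again.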
I just released OCamlI18N-0.3. It now requires PXP, and you can read the
README for LDML-related installation instructions.
-- mattam