[Ocaml-pxp-users] xml with any charset encoding
Gerd Stolpmann
info at gerd-stolpmann.de
Thu Aug 14 06:36:05 PDT 2003
On Wed, 2003-08-13 at 21:09, Anastasia Gornostaeva wrote:
> Hello.
>
> I want to parse RSS feeds from websites. These RSS files can use any character
> encoding (not only ASCII or Latin letters).
> I want to feed them into PXP and receive UTF-8 data as output.
> How can I do this quickly and easily?
Simply select UTF-8 as the internal encoding, e.g.:
let config = { default_config with encoding = `Enc_utf8 }
Then pass this config value to the parsing function. The effect is that
PXP can represent all characters that are assigned in Unicode.
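
As a minimal sketch of how the pieces fit together, the following parses a small ISO-8859-1 document and reads the root element's text back in UTF-8. It assumes a PXP release where the relevant names live in Pxp_types and Pxp_tree_parser (older releases expose them through Pxp_yacc). The sample document and the helper root_text_utf8 are made up for illustration.

```ocaml
(* Sketch only. Assumes PXP is installed and linked, e.g.
     ocamlfind ocamlopt -package pxp -linkpkg example.ml
   and that the release provides Pxp_types / Pxp_tree_parser. *)
open Pxp_types

(* UTF-8 as the internal encoding, as suggested above. *)
let config = { default_config with encoding = `Enc_utf8 }

(* Hypothetical sample input: "café", with é as the single
   ISO-8859-1 byte 0xE9, declared in the XML declaration. *)
let latin1_doc =
  "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><title>caf\xE9</title>"

(* Hypothetical helper: parse a well-formed document from a string and
   return the text content of its root element. The returned string is
   in the internal encoding chosen in [config]. *)
let root_text_utf8 (xml : string) : string =
  let doc =
    Pxp_tree_parser.parse_wfdocument_entity
      config (from_string xml) Pxp_tree_parser.default_spec
  in
  doc#root#data

let () =
  print_endline (root_text_utf8 latin1_doc)
```

For feeds stored on disk, from_file can be used in place of from_string. The external encoding is taken from each document's own XML declaration, so nothing else needs to change per feed.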
If the RSS files use different encodings, these are automatically
converted to UTF-8. Actually, this step is performed by the netstring
library, and all character encodings supported by netstring can be
processed. I recommend that you use the newest version of netstring
(included in ocamlnet-0.96), because the conversion algorithm runs much
faster than in previous versions. Furthermore, it is more difficult to
misconfigure netstring.
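
To see the conversion step in isolation, here is a small sketch that calls netstring's Netconversion module directly. It is not needed when parsing with PXP, which does this internally. The helper name to_utf8 and the sample string are assumptions for illustration.

```ocaml
(* Sketch only. Assumes ocamlnet's netstring library is installed, e.g.
     ocamlfind ocamlopt -package netstring -linkpkg example.ml *)

(* Hypothetical helper: convert [s] from the encoding named by
   [charset] (e.g. "ISO-8859-1") to UTF-8. encoding_of_string fails
   on names that netstring does not know. *)
let to_utf8 ~(charset : string) (s : string) : string =
  let in_enc = Netconversion.encoding_of_string charset in
  Netconversion.convert ~in_enc ~out_enc:`Enc_utf8 s

let () =
  (* "café" in ISO-8859-1: é is the single byte 0xE9. *)
  let utf8 = to_utf8 ~charset:"ISO-8859-1" "caf\xE9" in
  Printf.printf "converted length: %d bytes\n" (String.length utf8)
```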
Gerd
--
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
gerd at gerd-stolpmann.de http://www.gerd-stolpmann.de
------------------------------------------------------------