From ermine at ermine.pp.ru Wed Aug 13 12:09:23 2003
From: ermine at ermine.pp.ru (Anastasia Gornostaeva)
Date: Wed, 13 Aug 2003 23:09:23 +0400
Subject: [Ocaml-pxp-users] xml with any charset encoding
Message-ID: <20030813190923.GA14955@ermine.home>

Hello.

I want to parse RSS from websites. These RSS files can be in any charset
encoding (not only ASCII or Latin letters).
I want to feed them into PXP and receive UTF-8 data as output.
How can I do this quickly and easily?

ermine

From info at gerd-stolpmann.de Thu Aug 14 06:36:05 2003
From: info at gerd-stolpmann.de (Gerd Stolpmann)
Date: Thu, 14 Aug 2003 13:36:05 -0000
Subject: [Ocaml-pxp-users] xml with any charset encoding
In-Reply-To: <20030813190923.GA14955@ermine.home>
References: <20030813190923.GA14955@ermine.home>
Message-ID: <1060863970.970.8.camel@ice.gerd-stolpmann.de>

On Wed, 2003-08-13 at 21:09, Anastasia Gornostaeva wrote:
> Hello.
>
> I want to parse RSS from websites. These RSS files can be in any charset
> encoding (not only ASCII or Latin letters).
> I want to feed them into PXP and receive UTF-8 data as output.
> How can I do this quickly and easily?

Simply select UTF-8 as the internal encoding, e.g.

  let config = { default_config with encoding = `Enc_utf8 }

Then pass this config value to the parsing function. The effect is that
PXP can represent all characters that are assigned in Unicode. If the
RSS files use different encodings, these are automatically converted to
UTF-8. This step is actually performed by the netstring library, so all
character encodings supported by netstring can be processed.

I recommend that you use the newest version of netstring (included in
ocamlnet-0.96), because its conversion algorithm runs much faster than
in previous versions. Furthermore, the newer version is harder to
misconfigure.

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
gerd at gerd-stolpmann.de          http://www.gerd-stolpmann.de
------------------------------------------------------------

From ermine at ermine.pp.ru Tue Aug 19 03:52:20 2003
From: ermine at ermine.pp.ru (Anastasia Gornostaeva)
Date: Tue, 19 Aug 2003 14:52:20 +0400
Subject: [Ocaml-pxp-users] pxp and charsets
Message-ID: <20030819105219.GA16563@ermine.home>

Hello.

>> I want to parse RSS from websites. These RSS files can be in any charset
>> encoding (not only ASCII or Latin letters).
>> I want to feed them into PXP and receive UTF-8 data as output.
>> How can I do this quickly and easily?

> Simply select UTF-8 as the internal encoding, e.g.
>   let config = { default_config with encoding = `Enc_utf8 }

> Then pass this config value to the parsing function. The effect is that
> PXP can represent all characters that are assigned in Unicode.

It seems it does not work. All non-Latin letters are replaced with spaces.
It is not interesting :-(

BTW, recode_string in netstring works perfectly.
So, can anybody help me?

ermine

From ermine at ermine.pp.ru Tue Aug 19 14:08:25 2003
From: ermine at ermine.pp.ru (Anastasia Gornostaeva)
Date: Wed, 20 Aug 2003 01:08:25 +0400
Subject: [Ocaml-pxp-users] pxp and charsets
In-Reply-To: <20030819105219.GA16563@ermine.home>
References: <20030819105219.GA16563@ermine.home>
Message-ID: <20030819210825.GB18743@ermine.home>

On Tue, Aug 19, 2003 at 02:52:20PM +0400, Anastasia Gornostaeva wrote:

Please ignore this message. It was a big mistake in my own source code.
Everything works. Big thanks to Gerd Stolpmann.

> Hello.
>
> >> I want to parse RSS from websites. These RSS files can be in any charset
> >> encoding (not only ASCII or Latin letters).
> >> I want to feed them into PXP and receive UTF-8 data as output.
> >> How can I do this quickly and easily?
>
> > Simply select UTF-8 as the internal encoding, e.g.
> >   let config = { default_config with encoding = `Enc_utf8 }
>
> > Then pass this config value to the parsing function. The effect is that
> > PXP can represent all characters that are assigned in Unicode.
>
> It seems it does not work. All non-Latin letters are replaced with spaces.
> It is not interesting :-(
>
> BTW, recode_string in netstring works perfectly.
> So, can anybody help me?
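
For readers who want to see the configuration from this thread in context, here is a minimal sketch of passing such a config to a parsing function. It is an illustration under assumptions, not code from the thread: it assumes a later PXP release in which `default_config` and `from_string` live in `Pxp_types` and the well-formedness parser is `Pxp_tree_parser.parse_wfdocument_entity` (releases from the time of this thread exposed similar functions through `Pxp_yacc`). The `feed` string is a made-up ISO-8859-1 sample, and the build command in the comment assumes the `pxp` findlib package is installed.

```ocaml
(* Sketch only. Assumed build command:
     ocamlfind ocamlopt -package pxp -linkpkg example.ml -o example *)
open Pxp_types

(* UTF-8 as the internal encoding, as suggested in the thread. *)
let config = { default_config with encoding = `Enc_utf8 }

(* Hypothetical sample input: a tiny feed declared as ISO-8859-1.
   The byte \xe9 is "e with acute accent" in that encoding. *)
let feed =
  "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n\
   <rss><channel><title>Caf\xe9</title></channel></rss>"

let () =
  (* Parse in well-formedness mode; no DTD validation is performed. *)
  let doc =
    Pxp_tree_parser.parse_wfdocument_entity
      config
      (from_string feed)
      Pxp_tree_parser.default_spec
  in
  (* The character data of the tree is held in the internal encoding,
     so this prints the title text re-encoded as UTF-8. *)
  print_endline (doc # root # data)
```

The input encoding is taken from the XML declaration of each document, so the same `config` can be reused for feeds in different encodings, provided the installed netstring supports them.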