[Ocaml-pxp-users] Handling undeclared entities
Dario Teixeira
darioteixeira at yahoo.com
Thu Jul 16 09:43:35 PDT 2009
Hi,
I am using PXP to parse a small HTML-like markup. I would like to allow
the use of common HTML entities in the source text (such as &euro;), but I
don't want to include a list of *all* of them in the DTD (note that these
are eventually checked for validity somewhere else; I just don't need this
task to be performed also by PXP).
Now, the PXP manual mentions several times that entities are automatically
converted into regular #PCDATA, and there doesn't seem to be a way of passing
them unmodified to the processing code. Therefore, if they are not declared
in the DTD I get a parsing error.
One solution I can think of is to preprocess the source file, using regexps
to replace entity references by a special node. Something like this:
"the symbol is &euro;" -> "the symbol is <entity>euro</entity>".
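To make the idea concrete, here is a minimal sketch of such a preprocessing pass. It is only an illustration: the function name replace_entities and the <entity> element are hypothetical, and it uses a hand-written scan rather than a regexp so that it depends on nothing beyond the OCaml standard library. It assumes entity names consist of ASCII letters and digits only.

```ocaml
(* Hypothetical helper: rewrite named entity references such as "&euro;"
   into <entity>euro</entity> elements before the text reaches the parser.
   Assumption: entity names are made of ASCII letters and digits only. *)

let is_name_char c =
  (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')

let replace_entities (s : string) : string =
  let n = String.length s in
  let buf = Buffer.create (n + 16) in
  let rec go i =
    if i < n then
      if s.[i] = '&' then begin
        (* Find the end of the run of name characters after '&'. *)
        let j = ref (i + 1) in
        while !j < n && is_name_char s.[!j] do incr j done;
        if !j > i + 1 && !j < n && s.[!j] = ';' then begin
          (* A well-formed "&name;" reference: emit the special node. *)
          Buffer.add_string buf "<entity>";
          Buffer.add_string buf (String.sub s (i + 1) (!j - i - 1));
          Buffer.add_string buf "</entity>";
          go (!j + 1)
        end else begin
          (* Not a named reference (bare '&', "&#...;", etc.): copy as-is. *)
          Buffer.add_char buf '&';
          go (i + 1)
        end
      end else begin
        Buffer.add_char buf s.[i];
        go (i + 1)
      end
  in
  go 0;
  Buffer.contents buf

let () =
  print_endline (replace_entities "the symbol is &euro;")
```

Note that this sketch also rewrites the predefined XML entities (such as &amp; and &lt;) and knows nothing about CDATA sections, comments, or attribute values.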
This solution is of course way too kludgy and error-prone. Is there a better
alternative within PXP?
Thanks!
Best regards,
Dario Teixeira