[Svnmerge] Unicode in log messages
Benson Margulies
bimargulies at gmail.com
Fri Oct 9 12:18:26 PDT 2009
Raman,
I messed up by not rereading the code before writing that message. Yes, of
course, there's a decode. But it's effectively a no-op, since
sys.stdout.encoding is UTF-8 on the machines I have access to. Here is what
I see:
/Users/benson/x/verint/rex-ws/target python
Python 2.6.2 (r262:71600, Apr 16 2009, 09:17:39)
[GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys;
>>> print sys.stdout.encoding;
UTF-8
>>>
>>> import locale;
>>> print locale.getdefaultlocale()[1];
Here's the patch that works fine for me. The critical change is to avoid
locale.getdefaultlocale() if you want to preserve the
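import locale
import sys

def recode_stdout_to_file(s):
    # Leave s untouched if either encoding is unknown.
    if locale.getdefaultlocale()[1] is None or not hasattr(sys.stdout, "encoding") \
            or sys.stdout.encoding is None:
        return s
    # Decode svn's output using the encoding Python reports for stdout...
    u = s.decode(sys.stdout.encoding)
    # ...then always re-encode as UTF-8 instead of the locale encoding.
    #return u.encode(locale.getdefaultlocale()[1])
    return u.encode("utf-8")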
Since svn is not a Python program, it is not obvious to me how
sys.stdout.encoding is related to the encoding svn uses when it writes its
output. In practice, it seems to write UTF-8, and then the rest of this
works for me.
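For illustration, a minimal Python 3 sketch of the difference between the original conversion and the patch above (the svnmerge.py code itself is Python 2, where str holds bytes). The function names and the explicit encoding arguments are hypothetical stand-ins for sys.stdout.encoding and locale.getdefaultlocale()[1], and the sample message assumes svn wrote UTF-8, as observed on this machine.

```python
# Python 3 sketch; hypothetical helpers, not svnmerge.py's actual code.


def recode_original(s: bytes, stdout_encoding: str, locale_encoding: str) -> bytes:
    """Original behaviour: decode with the stdout codec, re-encode with the locale codec."""
    return s.decode(stdout_encoding).encode(locale_encoding)


def recode_patched(s: bytes, stdout_encoding: str) -> bytes:
    """Patched behaviour: decode with the stdout codec, always re-encode as UTF-8."""
    return s.decode(stdout_encoding).encode("utf-8")


# A log message as svn might print it, assuming svn writes UTF-8.
log_bytes = "Fix für Ωmega".encode("utf-8")

# With UTF-8 in and UTF-8 out, the patched recode returns the bytes unchanged.
print(recode_patched(log_bytes, "utf-8") == log_bytes)  # True

# With a Latin-1 locale, the original recode cannot represent the Greek letter.
try:
    recode_original(log_bytes, "utf-8", "iso-8859-1")
except UnicodeEncodeError as exc:
    print("original fails:", exc.reason)
```

The patched version never fails on characters outside the locale's character set, but it only produces a correct commit message if svn reads the -F file as UTF-8; that is exactly the assumption questioned in the reply quoted below.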
On Fri, Oct 9, 2009 at 3:00 PM, Raman Gupta <rocketraman at fastmail.fm> wrote:
> Please keep replies on list...
>
> Benson Margulies wrote:
> > The point is that it only uses the encoding to write the file. It reads
> > the bytes from the log raw, and pushes them into the codec to write them
> > into the file. Thus, it is assuming that the input is UTF-8, and asking
> > for the output to be in the default locale. That's how the codecs work.
> > It isn't using a codec to convert from input, only to convert the output.
>
> I'm sorry Benson, but I believe you are operating under some
> fundamental misconceptions... Of course it has to use a codec to
> convert from input ("input" here is the svn log output).
>
> Any time one reads bytes that one knows are characters (as output by
> svn log), one needs to apply a codec to the bytes to understand what
> those characters are. You contradict yourself by saying that it is
> assuming the input is UTF-8 -- UTF-8 is just another codec, no
> different from other codecs except in the actual byte value(s) used to
> represent characters. Assuming UTF-8 would indeed mean using a codec
> to decode the input.
>
> Here is what it is really doing:
>
> def recode_stdout_to_file(s):
>     [... if statement snipped ...]
>     u = s.decode(sys.stdout.encoding)
>     return u.encode(locale.getdefaultlocale()[1])
>
> i.e. svnmerge.py is decoding the bytes of the svn log output using the
> codec returned by sys.stdout.encoding. This may be UTF-8, but it may
> be something else depending on your local platform and settings. There
> is *no assumption* of UTF-8 here. Then it is encoding those characters
> back into bytes (and eventually writing these bytes to a file), using
> the codec returned by locale.getdefaultlocale()[1]. This encoding is
> what svn expects in the content of files that it reads commit log
> messages from via the -F parameter.
>
> The possible error here is that our assumption of what encoding svn
> uses when printing a log to stdout (i.e. sys.stdout.encoding) or what
> encoding svn uses when reading a commit log file for creating a commit
> message (i.e. locale.getdefaultlocale()[1]) is wrong. If either of
> these assumptions is wrong, then yes, there is a problem that needs to
> be fixed. It has nothing to do with "assuming" UTF-8.
>
> > And this makes sense. It's completely wrong to assume that the svn log
> > messages are in the current user's default locale encoding. It
> > makes some sense that users would want to edit a file in their current
> > encoding; it just doesn't always work.
>
> Huh? Do you have some evidence that svn, when writing a commit log to
> standard output, does not write the data in the encoding specified by
> the python sys.stdout.encoding value? If so, great -- please provide
> such evidence and a patch with your fix.
>
> Cheers,
> Raman
>
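To make the codec point concrete, a small Python 3 sketch using an assumed sample string; recode is a hypothetical helper, not part of svnmerge.py. It shows that "assuming UTF-8" is simply choosing one codec, and that a wrong guess on the read side (sys.stdout.encoding in the real code) can garble text silently, because Latin-1 accepts any byte sequence.

```python
# Python 3 sketch: the same bytes under two different codecs.
raw = "Grüße".encode("utf-8")

as_utf8 = raw.decode("utf-8")         # the intended five characters
as_latin1 = raw.decode("iso-8859-1")  # never raises, but yields seven characters


def recode(data: bytes, read_codec: str, write_codec: str) -> bytes:
    """Hypothetical generalisation of recode_stdout_to_file with both codecs explicit."""
    # Step 1: bytes -> characters, assuming svn wrote `data` in read_codec.
    text = data.decode(read_codec)
    # Step 2: characters -> bytes in the codec svn is assumed to expect for -F.
    return text.encode(write_codec)


print(as_utf8, len(as_utf8), len(as_latin1))
# Correct read codec: the text survives as long as the write codec can represent it.
print(recode(raw, "utf-8", "iso-8859-1") == "Grüße".encode("iso-8859-1"))  # True
# Wrong read codec: no error is raised, the message is silently garbled.
print(recode(raw, "iso-8859-1", "utf-8") == raw)  # False
```

Either assumption can be wrong independently: the read codec decides whether the characters are understood at all, and the write codec decides whether svn will interpret the file the same way.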