[patch] convert-ly cannot deal with accented characters

[patch] convert-ly cannot deal with accented characters

Valentin Villenave
Administrator
Hi frogs, hi Carl, hi John,

I have been trying to address a bug with convert-ly, but it turns out
my patch breaks doc-compiling.

Here's the patch; one of you guys may have a better idea of how to
deal with it. As for me, I'm just plain stuck.
http://git.savannah.gnu.org/gitweb/?p=lilypond.git;a=patch;h=0ce2657b47ac6ba152ee022b27de26e569846db9

Cheers,
Valentin

---
----
Join the Frogs!

Re: [patch] convert-ly cannot deal with accented characters

Graham Percival
On Thu, Mar 4, 2010 at 4:35 PM, Valentin Villenave
<[hidden email]> wrote:
> Here's the patch; one of you guys may have a better idea of how to
> deal with it. As for me, I'm just plain stuck.
> http://git.savannah.gnu.org/gitweb/?p=lilypond.git;a=patch;h=0ce2657b47ac6ba152ee022b27de26e569846db9

Dude, don't be lazy.  Here's the actual bits:

--------
diff --git a/python/lilylib.py b/python/lilylib.py
index 3bdfa1c..9915cd7 100644
--- a/python/lilylib.py
+++ b/python/lilylib.py
@@ -23,6 +23,7 @@ import re
 import shutil
 import sys
 import optparse
+import locale

 ################################################################
 # Users of python modules should include this snippet
@@ -48,7 +49,9 @@ underscore = _
 # Maybe guess encoding from LANG/LC_ALL/LC_CTYPE?

 def encoded_write(f, s):
-    f.write (s.encode (f.encoding or 'utf_8'))
+    f.write (s
+      .decode (sys.stderr.encoding or locale.getdefaultlocale()[1])
+      .encode (f.encoding or 'utf_8'))

 # ugh, Python 2.5 optparse requires Unicode strings in some argument
 # functions, and refuse them in some other places
-----------

Now, have you tried a doc build to see the error yourself?  What part
of this patch is causing the problem?  Can you avoid using that part?

I expect my 1st year students to go "duh, it doesn't work".  You can
do better than them.

- Graham

Re: [patch] convert-ly cannot deal with accented characters

Valentin Villenave
Administrator
On Thu, Mar 4, 2010 at 5:41 PM, Graham Percival
<[hidden email]> wrote:
> I expect my 1st year students to go "duh, it doesn't work".  You can
> do better than them.

Come on, I'm but a pianist :)

The problem comes from

-    f.write (s.encode (f.encoding or 'utf_8'))

since a) the f object *doesn't have* an "encoding" property, and
  b) the s object cannot be encoded without having been decoded first.


Hence, we need to decode it so we can re-encode it. Here comes another
problem: decode it from *what* encoding exactly? By default, Python
tries to read it as an ascii string, which fails when it contains
accented chars.

On Windows, that would be Cp1252. On Mac OSX, that would be utf-8, and
on GNU/Linux it can be anything from ISO8859-n to UTF-8.
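
To make this concrete, here is a minimal sketch (not part of the patch) of the two failure modes described above: decoding as ASCII fails outright, while guessing the wrong encoding silently produces garbage. The byte strings are made-up examples, and explicit byte literals are used so that it behaves the same on current Python versions.

```python
# "café" as raw bytes in two common encodings (illustrative values).
utf8_bytes = b"caf\xc3\xa9"
cp1252_bytes = b"caf\xe9"

# Decoding accented input as ASCII raises an error.
try:
    utf8_bytes.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding with the right encoding gives the same text in both cases.
text_from_utf8 = utf8_bytes.decode("utf-8")
text_from_cp1252 = cp1252_bytes.decode("cp1252")

# Decoding with the wrong guess does not fail, it just yields mojibake.
wrong_guess = utf8_bytes.decode("cp1252")
```

So the decode step only helps if the source encoding is guessed correctly.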

Therefore, I'm using locale.getdefaultlocale() to detect the encoding
(the locale should be set on all operating systems). It returns a
(language, encoding) pair, for example ('en_US', 'UTF-8').
That requires me to import the locale module.

I need the second part of the pair (hence the [1]).

+    f.write (s
+      .decode (sys.stderr.encoding or locale.getdefaultlocale()[1])
+      .encode (f.encoding or 'utf_8'))

Last time I checked, it worked. Turns out it doesn't. Ergo: I'm gonna
give up and play some Chopin instead.

Cheers,
Valentin

Re: [patch] convert-ly cannot deal with accented characters

Graham Percival
On Thu, Mar 4, 2010 at 4:48 PM, Valentin Villenave
<[hidden email]> wrote:
> On Thu, Mar 4, 2010 at 5:41 PM, Graham Percival
> <[hidden email]> wrote:
>> I expect my 1st year students to go "duh, it doesn't work".  You can
>> do better than them.
>
> Come on, I'm but a pianist :)

Have you *seen* some of my students?


> The problem comes from
>
> -    f.write (s.encode (f.encoding or 'utf_8'))
>
> since a) the f object *doesn't have* an "encoding" property, and
>  b) the s object cannot be encoded without having been decoded first.

That's the original problem, which your patch solved for some cases.
But when your patch fails for other cases, how does it fail?

> I need the second part of the variable (hence the [1]).
>
> +    f.write (s
> +      .decode (sys.stderr.encoding or locale.getdefaultlocale()[1])
> +      .encode (f.encoding or 'utf_8'))
>
> Last time I checked, it worked. Turns out it doesn't.

Under what conditions?  (hint: maybe an error message about "NoneType" ?)

Then the question is either "how can we avoid those
conditions" or "how can we make the code do something different in
those conditions".  The answer to the latter question is lab 2,
exercise 6... oh wait, sorry, you're not using my C labs (completely
rewritten by all the doc knowledge of yours truly).
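
For the second question, one possible shape of the answer is sketched below. It is only an illustration, not the fix that went into lilylib.py. It assumes the failure is that both sys.stderr.encoding and the locale encoding can be None (passing None to decode() raises a TypeError), so every candidate is checked before use. The helper first_encoding and the input_encoding parameter are hypothetical names, and locale.getpreferredencoding(False) is swapped in for locale.getdefaultlocale()[1] because the latter is deprecated in recent Python versions.

```python
import io
import locale
import sys


def first_encoding(*candidates):
    """Return the first non-empty encoding name, or 'utf_8' if none is set."""
    for name in candidates:
        if name:
            return name
    return "utf_8"


def encoded_write(f, s, input_encoding=None):
    """Hypothetical variant: decode byte input, then encode for the target."""
    if isinstance(s, bytes):
        # Never hand None to decode(): fall back step by step instead.
        source = first_encoding(
            input_encoding,
            getattr(sys.stderr, "encoding", None),
            locale.getpreferredencoding(False),
        )
        s = s.decode(source)
    # f may lack an "encoding" attribute entirely, hence getattr.
    target = first_encoding(getattr(f, "encoding", None))
    f.write(s.encode(target))


# Usage: Latin-1 bytes in, UTF-8 bytes out, on a stream without .encoding.
buf = io.BytesIO()
encoded_write(buf, b"caf\xe9", input_encoding="latin-1")
result = buf.getvalue()
```

The point of the design is that each lookup that might yield None gets an explicit fallback, instead of relying on `or` with a second value that can itself be None.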


You'll feel better if you solve this yourself instead of waiting for
John to fix it for you.  Trust me.

- Graham

Re: [patch] convert-ly cannot deal with accented characters

Valentin Villenave
Administrator
On Thu, Mar 4, 2010 at 5:55 PM, Graham Percival
<[hidden email]> wrote:
> You'll feel better if you solve this yourself instead of waiting for
> John to fix it for you.  Trust me.

Having to reboot under MS bloody Windows is most certainly *not* going
to make me feel any better. But I'll have a look nevertheless.

Cheers,
V.
