Message-ID: <3E9E5950.1060205 at lix.polytechnique.fr>
Date: Thu, 17 Apr 2003 09:35:44 +0200
From: Benjamin Monate
MIME-Version: 1.0
To: lablgtk at kaba.or.jp
Subject: Re: Problems using the new view widget
References: <20030411152228.3a45a003.maxence.guesdon at inria.fr>
 <20030411154505.0a899eac.maxence.guesdon at inria.fr>
 <20030411161409.047b3143.maxence.guesdon at inria.fr>
 <87of3dqcpb.dlv at wanadoo.fr>
 <20030411152641.GA2621 at iliana>
 <20030411173230.62ead13e.maxence.guesdon at inria.fr>
 <87llyhqc3w.dlv at wanadoo.fr>
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit

Here are my thoughts about UTF-8 in lablgtk2.

The situation is: inside GTK2 all strings are UTF-8 encoded. Only
7-bit ASCII chars are valid UTF-8 as they stand. If you pass 8-bit
chars or any other char encoding directly, GTK2 will warn you at best,
or even seg fault if you are out of luck. In any case, you will not be
able to read your string in your GUI.

>>>BTW, would it not be better to have lablgtk2 call
>>>Glib.Convert.locale_to_utf8 by default each time it is necessary ?
>>>
>>>As ocaml don't use utf8 strings, but gtk+ apparently does.
>>
>>I agree. An utf8 type would prevent such runtime errors.
>

1) The problem with automatic translation is that your code then
relies on a user-level configuration: the locale. This leads to highly
non-portable code. Very few users have set a correct locale. And
calling Utf8.validate + Convert.* each time you access a string would
be costly.

2) Anyway, as you cannot guess the correct encoding of a given string,
you would have to fall back on some heuristic algorithm that will
never be fully correct.

The problem arises "only" from the programmer's point of view. Any
string acquired from Gtk itself is UTF-8. The programmer just has to
be very careful with:
- literal strings inside the code (see README :-))
- strings coming from the rest of the world (I/O)

There is a utf8 string type, but it is not an abstract type. If it
were, you would not be able to use the common OCaml string operations
directly, which would be annoying (^, byte length, matching, lexing,
Str, ...).

What do you think of this? Would it be more convenient to fully
abstract UTF-8 strings and to enforce the use of conversion functions
for standard operations?

Note that one cannot simply convert utf8 back to the locale encoding,
as this may fail when your UTF-8 strings contain chars that the locale
encoding cannot represent.

Friendly,
Benjamin Monate
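
To make the question about a fully abstract type concrete, here is a rough sketch of what it could look like. Everything in it is hypothetical: the module name Utf8_string and its functions are invented for illustration and are not part of lablgtk2. The validity check is written in plain OCaml only to keep the sketch self-contained; a real binding would presumably rely on the Utf8.validate function mentioned above.

```ocaml
(* Hypothetical module, NOT part of lablgtk2: a sketch of a fully
   abstract UTF-8 string type. *)
module Utf8_string : sig
  type t                              (* abstract: always valid UTF-8 *)
  val of_string : string -> t option  (* None if the bytes are not UTF-8 *)
  val to_string : t -> string         (* always safe: exposes the bytes *)
  val concat : t -> t -> t            (* stands in for (^) *)
  val byte_length : t -> int          (* stands in for String.length *)
end = struct
  type t = string

  (* True if byte [i] exists and is a continuation byte 10xxxxxx. *)
  let is_cont s i =
    i < String.length s && Char.code s.[i] land 0xC0 = 0x80

  (* Plain OCaml well-formedness check (assumption: used here only so
     that the sketch does not depend on GLib). *)
  let is_valid s =
    let n = String.length s in
    let rec go i =
      if i >= n then true
      else
        let c = Char.code s.[i] in
        if c < 0x80 then go (i + 1)                 (* 7-bit ASCII *)
        else if c >= 0xC2 && c <= 0xDF then         (* 2-byte sequence *)
          is_cont s (i + 1) && go (i + 2)
        else if c >= 0xE0 && c <= 0xEF then         (* 3-byte sequence *)
          is_cont s (i + 1) && is_cont s (i + 2)
          && (let c1 = Char.code s.[i + 1] in
              (c <> 0xE0 || c1 >= 0xA0)             (* reject overlong *)
              && (c <> 0xED || c1 < 0xA0))          (* reject surrogates *)
          && go (i + 3)
        else if c >= 0xF0 && c <= 0xF4 then         (* 4-byte sequence *)
          is_cont s (i + 1) && is_cont s (i + 2) && is_cont s (i + 3)
          && (let c1 = Char.code s.[i + 1] in
              (c <> 0xF0 || c1 >= 0x90)             (* reject overlong *)
              && (c <> 0xF4 || c1 < 0x90))          (* stay <= U+10FFFF *)
          && go (i + 4)
        else false                                  (* invalid lead byte *)
    in
    go 0

  let of_string s = if is_valid s then Some s else None
  let to_string u = u
  (* Concatenating two valid UTF-8 strings yields valid UTF-8. *)
  let concat a b = a ^ b
  let byte_length = String.length
end

let () =
  (* "caf\xE9" is Latin-1 encoded; "caf\xC3\xA9" is the same word in UTF-8. *)
  assert (Utf8_string.of_string "caf\xE9" = None);
  match Utf8_string.of_string "caf\xC3\xA9" with
  | None -> assert false
  | Some u ->
    assert (Utf8_string.byte_length (Utf8_string.concat u u) = 10)
```

The cost discussed above shows up directly: (^), String.length, Str and pattern matching no longer apply to Utf8_string.t, so each needs a wrapper such as concat, or a round trip through to_string and of_string.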