Dev/Utf-8

From Eggdrop Wiki



Situation now (1.6.19)

#define USE_TCL_BYTE_ARRAYS (Tcl >=8.1)

Tcl commands are added using Tcl_CreateObjCommand and routed through utf_converter. The arguments for the binds are assigned to temporary Tcl variables (_raw0, _pub2, ...) using Tcl_SetVar, which stores them as Tcl strings. The arguments that utf_converter receives are mangled into ByteArrays, which destroys UTF-8.
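To make the "destroys UTF-8" part concrete, here is a minimal sketch in plain C that does not call Tcl at all. It assumes that turning a Tcl string into a ByteArray keeps only the low 8 bits of each character, which is how Tcl 8.x is generally understood to behave. The helper names utf8_decode and string_to_bytearray are made up for this illustration and are not part of Eggdrop or Tcl.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Decode one UTF-8 sequence (1 to 3 bytes, enough for the BMP) starting at
 * src and store the code point in *cp. Returns the number of bytes consumed.
 * Malformed or truncated input is passed through as a single byte.
 */
static size_t utf8_decode(const unsigned char *src, size_t len, unsigned *cp)
{
    assert(src != NULL && cp != NULL && len > 0);
    if (src[0] >= 0xC0 && src[0] < 0xE0 && len >= 2
        && (src[1] & 0xC0) == 0x80) {
        *cp = ((src[0] & 0x1Fu) << 6) | (src[1] & 0x3Fu);
        return 2;
    }
    if (src[0] >= 0xE0 && src[0] < 0xF0 && len >= 3
        && (src[1] & 0xC0) == 0x80 && (src[2] & 0xC0) == 0x80) {
        *cp = ((src[0] & 0x0Fu) << 12) | ((src[1] & 0x3Fu) << 6)
            | (src[2] & 0x3Fu);
        return 3;
    }
    *cp = src[0];
    return 1;
}

/*
 * Hypothetical stand-in for the string -> ByteArray step: every character
 * becomes exactly one byte, keeping only the low 8 bits of its code point.
 * dst must have room for len bytes. Returns the number of bytes written.
 */
static size_t string_to_bytearray(const unsigned char *src, size_t len,
                                  unsigned char *dst)
{
    size_t in = 0, out = 0;
    assert(dst != NULL);
    while (in < len) {
        unsigned cp;
        in += utf8_decode(src + in, len - in, &cp);
        dst[out++] = (unsigned char)(cp & 0xFFu);
    }
    return out;
}
```

Under that assumption, the two bytes C3 A9 (U+00E9) collapse into the single byte E9, which is no longer valid UTF-8 on its own, and the three bytes E2 82 AC (U+20AC) collapse into AC, which loses the character entirely.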

#undef USE_TCL_BYTE_ARRAYS (Tcl <= 8.0)

Tcl commands are added using Tcl_CreateCommand. The arguments those commands get are UTF-8 Strings.

The chanfile can't deal with channels with various encodings

Because the channel names are written out as char[] arrays, the chanfile contains their literal bytes. Saving it works fine. Loading it with Tcl's source (Tcl_EvalFile) makes Tcl automatically convert the contents from the system encoding to UTF-8, which breaks the names. This can be fixed by setting the system encoding to "identity" before loading the file and restoring the previous encoding afterwards.
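The effect of that conversion on the stored bytes can be sketched in plain C, without Tcl. The example assumes the system encoding is iso8859-1, and latin1_to_utf8 is a hypothetical helper written for this illustration, not a Tcl or Eggdrop function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Convert ISO-8859-1 bytes to UTF-8, as a load-time conversion would if the
 * system encoding were iso8859-1 (an assumption for this example).
 * dst must have room for 2 * len bytes. Returns the number of bytes written.
 */
static size_t latin1_to_utf8(const unsigned char *src, size_t len,
                             unsigned char *dst)
{
    size_t i, out = 0;
    assert(src != NULL && dst != NULL);
    for (i = 0; i < len; i++) {
        if (src[i] < 0x80) {
            dst[out++] = src[i];  /* ASCII bytes are unchanged */
        } else {
            /* Bytes 0x80..0xFF expand to a two-byte UTF-8 sequence. */
            dst[out++] = (unsigned char)(0xC0 | (src[i] >> 6));
            dst[out++] = (unsigned char)(0x80 | (src[i] & 0x3F));
        }
    }
    return out;
}
```

A channel name saved as the five bytes 23 63 61 66 E9 comes back as the six bytes 23 63 61 66 C3 A9, so it no longer matches the bytes the name was originally stored with. A name that was already UTF-8 fares no better: C3 A9 is read as two Latin-1 characters and becomes C3 83 C2 A9.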

Something with LC_* detection is broken

???

Facts

How the Tcl API functions we currently use to bridge C behave

This illustrates what we have to deal with and are constrained by. There are, of course, conversion functions available to convert between UTF-8 and any other encoding. I should mention that the Tcl'ers have discussed removing the "identity" encoding, as they consider it bug-prone and think adding it was a mistake.
