Re: [sinhala] Re: Sinhala GNU/Linux

Author: Harshula
Date:  
To: Harsha Senanayake
CC: sinhala
Subject: Re: [sinhala] Re: Sinhala GNU/Linux
On Wed, 2005-03-02 at 10:58 +0600, Harsha Senanayake wrote:

> > The 3 alternate forms are pronounced the same but are, obviously,
> > rendered differently. The rendering of touching and conjunct letters do
> > NOT display the al-lakuna. But when encoding, the al-lakuna needs to be
> > there for sorting.
>
> <quote src="prev mail"> "Sooo.. the touching letters are categorized as
> cursively connected letters? Can 'ksha' be written as touching letters
> without being ligated? in that case will the inherent vowel sound need to
> be cancelled by an 'al-lakuna' ? Can you consider 'nda' (when joined) as a
> ligature, or is it just a cursive form? " </quote>


I'll reply to the previous mail rather than here.

> Sorting is not the primary reason for encoding al-lakuna, it's for
> suppressing the inherent vowel sound of a consonant.. otherwise it'll break
> the phonetic nature of unicode. Talking about sorting, with the ZWJ, won't
> it mess up the order if you have multilingual text to compare? And in
> Sinhala can we take the sequence of unicode characters to be correct order
> for collation?


One benefit of a phonetic encoding is that sorting should be easier.
Hence, using the al-lakuna consistently in all three cases should
simplify sorting.

The sorting/collating algorithms can simply ignore Zero Width
characters. That should take care of it.
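To make that concrete, here is a minimal sketch in Python. It assumes
that only ZWJ (U+200D) and ZWNJ (U+200C) need to be ignored, and that a
conjunct is encoded as consonant + al-lakuna + ZWJ + consonant; the
helper names are made up for illustration, and a real collation
library would do much more than this.

```python
# Zero-width format characters: they influence rendering only, so this
# sketch drops them before comparing. (Assumption: ZWNJ and ZWJ are the
# only ones that matter here.)
ZERO_WIDTH = {"\u200c", "\u200d"}  # ZWNJ, ZWJ


def strip_zero_width(text: str) -> str:
    """Return text with zero-width joiner/non-joiner removed."""
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)


def sort_ignoring_zero_width(words):
    """Sort by code point order, ignoring zero-width characters."""
    return sorted(words, key=strip_zero_width)


# ka + al-lakuna + ZWJ + ssa (conjunct form) versus
# ka + al-lakuna + ssa (al-lakuna displayed); assumed encodings.
KSSA_CONJUNCT = "\u0d9a\u0dca\u200d\u0dc2"
KSSA_PLAIN = "\u0d9a\u0dca\u0dc2"

words = [KSSA_PLAIN + "\u0dba", KSSA_CONJUNCT + "\u0db8"]
print(sorted(words))                    # naive: ZWJ takes part in the comparison
print(sort_ignoring_zero_width(words))  # ZWJ ignored
```

With the naive sort the ZWJ is compared against the following
consonant, so the two spellings of the same syllable end up ordered by
rendering form rather than by the letters that follow.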

I don't think the Sinhala code chart is in the correct order, purely
because diacritics 0x0df2 and 0x0df3 seem to be out of order when
compared with the corresponding letters 0x0d8e and 0x0d90.
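A small Python sketch of what I mean. The pairing of each letter with
its vowel sign is my reading of the code chart, and the list names are
just for illustration.

```python
import unicodedata

# Independent vowel letters, in code chart order, each paired (by
# position) with what I take to be its dependent vowel sign.
LETTERS = ["\u0d8d", "\u0d8e", "\u0d8f", "\u0d90", "\u0d91"]
SIGNS = ["\u0dd8", "\u0df2", "\u0ddf", "\u0df3", "\u0dd9"]


def describe(chars):
    """Return 'U+XXXX NAME' strings for a list of characters."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}" for ch in chars]


# The letters are already in code point order ...
print(describe(sorted(LETTERS)))
# ... but sorting the signs by code point moves 0x0df2 and 0x0df3 to
# the end, so the signs no longer line up with their letters.
print(describe(sorted(SIGNS)))
```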

You may be interested in:
http://www.unicode.org/unicode/reports/tr10/#Tailoring

5 Tailoring
Tailoring is any well-defined syntax that takes the Default Unicode
Collation Element Table and produces another well-formed Unicode
Collation Element Table. This syntax can provide linguistically-accurate
collation, if desired. Such syntax will usually allow for the following
capabilities:

1. Reordering any character (or contraction) with respect to others
in the standard ordering. Such a reordering can represent a
Level 1 difference, Level 2 difference, Level 3 difference, or
identity (in levels 1 to 3). Since such reordering includes
sequences, arbitrary multiple mappings can be specified.

2. Setting the secondary level to be backwards (French) or forwards
(normal).

3. Set variable weighting options.

4. Customizing the exact list of variable collation elements.
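
As a rough illustration of capability 1 (reordering a character with
respect to others), here is a toy single-level sketch in Python. It is
not the Unicode Collation Algorithm: the default weight is simply the
code point, and the tailoring table and function names are
hypothetical. It moves 0x0df2 and 0x0df3 to sit directly after 0x0dd8
and 0x0ddf, which I am assuming are their short counterparts.

```python
# Hypothetical tailoring: override the default weight (the code point)
# for the two long vowel signs so that each sorts directly after its
# short counterpart.
TAILORED_WEIGHTS = {
    "\u0df2": 0x0DD8 + 0.5,  # just after 0x0dd8
    "\u0df3": 0x0DDF + 0.5,  # just after 0x0ddf
}


def primary_weight(ch: str) -> float:
    """Tailored weight if one exists, otherwise the code point."""
    return TAILORED_WEIGHTS.get(ch, ord(ch))


def tailored_key(text: str):
    """Sort key: the list of per-character weights."""
    return [primary_weight(ch) for ch in text]


signs = ["\u0dd9", "\u0df2", "\u0dd8", "\u0df3", "\u0ddf"]
print([f"{ord(ch):04x}" for ch in sorted(signs, key=tailored_key)])
```

A real implementation would express the same reordering as a tailoring
of the Default Unicode Collation Element Table, with separate weights
per level, rather than patching code point values like this.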


cya,
#

