> will play a major role. For example in collation, duplicate encoding
> for similar text or sound will make some problems. Say we want to sort
> a list of name in which "prabath" is also there.
Very true. You should always try to avoid canonical equivalence between two
representations... but this is not a hard and fast rule. In Unicode there are
plenty of cases where such anomalies exist. Our script is quite unique, and
behaves differently from other Indic scripts. The reph feature is always
applied in Devanagari, thus the correct 'spelling' will be with a repaya. Leave
the smart IM aside and concentrate on the encoding: because rakaransaya is
not encoded as a codepoint and is not the default behavior, we use ZWJ to
form rakaransaya. Rakaransaya is a valid consonant modifier used in
contemporary Sinhala, yet it has not been recognized as one. If you say
rakaransaya is shorthand for <al-lakuna> <ra>, then spelling-wise Prabath
could be written without a <rakaransaya>, which in my opinion is wrong
according to the principles underlying accepted usage. Someone might say
spelling should *not* be considered for encoding, but the encoding should
be based on the *correct usage* of a script, and it should *not* be designed
to cater for exceptional cases like 'kurakkanyaya'.
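To make the two spellings concrete, here is a small Python sketch of the
codepoint sequences involved. It assumes the current convention that
rakaransaya is written as consonant + al-lakuna + ZWJ + rayanna; the variable
and function names are only illustrative.

```python
import unicodedata

PA = "\u0db4"         # payanna
AL_LAKUNA = "\u0dca"  # al-lakuna
RA = "\u0dbb"         # rayanna
ZWJ = "\u200d"        # zero width joiner

# "pra" with a visible al-lakuna (no conjoined form requested)
pra_plain = PA + AL_LAKUNA + RA
# "pra" with ZWJ requesting the rakaransaya form
pra_rakaransaya = PA + AL_LAKUNA + ZWJ + RA


def codepoints(text):
    """Return the text as a space-separated list of U+XXXX codepoints."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


print(codepoints(pra_plain))        # U+0DB4 U+0DCA U+0DBB
print(codepoints(pra_rakaransaya))  # U+0DB4 U+0DCA U+200D U+0DBB

# Normalization does not remove ZWJ, so the two spellings stay distinct.
same_after_nfc = (
    unicodedata.normalize("NFC", pra_plain)
    == unicodedata.normalize("NFC", pra_rakaransaya)
)
print(same_after_nfc)  # False
```

So the two spellings are different strings as far as plain comparison and
normalization are concerned; they differ only by the ZWJ.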
I have been thinking about this a bit, and maybe we need much smarter
fonts to handle Sinhala (i.e. more OpenType lookups). Two examples given in
one of Dr. Gihan's papers are 'pasyala' and 'malraj'. These should not be
written with a yansaya and rakaransaya, true. Yansaya or rakaransaya will
be formed only with certain types of consonants. Sinhala being a phonetic
language, we might be able to classify consonants by the manner in which we
articulate them (similar to IPA).
<quote src="http://en.wikipedia.org/wiki/Manner_of_articulation">
Manners of articulation include:
1 Nasals, where there is a total blockage and the sound instead goes
through the nose. Examples include English /m/, /n/, etc.
2 Plosives, or stops, an "explosion" resulting from a momentary closure and
then release of air. Examples include English /p/, /b/, etc.
3 Fricatives, or spirants, where there is continuous friction at the place
of articulation. Examples include English /f/, /s/, etc. Sibilants are a
special type of fricative where the airflow is shaped by the form of the
tongue. /s/ and /z/ are sibilants in English. Lateral fricatives are yet
another type of fricative, where the friction occurs on one or both sides
of the edge of the tongue. The "ll" of the Welsh language is a lateral
fricative.
4 Approximants, (semivowels or liquids), where the sound is only partially
obstructed. Examples include English /w/, /r/, etc.
Lateral approximants, such as the English /l/, is a special type of
approximant formed at one or both sides of the tongue.
5 Taps, where a "tap" at the place of articulation results in an
instantaneous closure and reopening of the vocal tract. The "tt" of "utter"
and the "dd" of "udder" are pronounced as a tap in North American English.
6 Trills, where taps are repeated in rapid succession. The double "r" of
Spanish "perro" is a trill.
7 Ejectives, a special type of stop/plosive where the explosive mechanism
is provided by the glottis (in the throat) instead of the diaphragm.
Implosives, a special type of stop/plosive where there is an inflow of air
due to the downward movement of the glottis.
Clicks (Used in Khoisan languages) These are akin to the "tsk tsk" or "tut
tut" sound in English.
</quote> Also see
http://en.wikipedia.org/wiki/Place_of_articulation.
The rakaransaya consonant modifier would be applicable to plosives and
fricatives, but not to nasals or approximants. Therefore we can form the
rakaransaya by default for those consonants. You might think this can be done
by a spell-checker, but what is it going to say? "According to SLS1134 you
should place a 'ZWJ'??" A Unicode-aware spell-checker won't even see the ZWJ,
as it will be dropped by the collation routine even before comparison!
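As a rough sketch of the rule suggested above, the following Python fragment
classifies a handful of consonants by manner of articulation and decides
whether consonant + al-lakuna + ra would form a rakaransaya by default. The
table is partial and only illustrative (it is my own assumption, not taken from
SLS1134 or any linguistic reference), and the names MANNER and
forms_rakaransaya_by_default are hypothetical.

```python
# Partial, illustrative classification only; a real table would need
# input from Sinhala linguists.
MANNER = {
    "\u0d9a": "plosive",      # ka
    "\u0d9c": "plosive",      # ga
    "\u0dad": "plosive",      # ta
    "\u0daf": "plosive",      # da
    "\u0db4": "plosive",      # pa
    "\u0db6": "plosive",      # ba
    "\u0dc3": "fricative",    # sa
    "\u0db1": "nasal",        # na
    "\u0db8": "nasal",        # ma
    "\u0dbd": "approximant",  # la
    "\u0dc0": "approximant",  # va
}

# Assumption from the discussion: only these classes take a rakaransaya.
TAKES_RAKARANSAYA = {"plosive", "fricative"}


def forms_rakaransaya_by_default(base):
    """True if base + al-lakuna + ra should render as a rakaransaya."""
    return MANNER.get(base) in TAKES_RAKARANSAYA


print(forms_rakaransaya_by_default("\u0db4"))  # pa, as in 'prabath': True
print(forms_rakaransaya_by_default("\u0dbd"))  # la, as in 'malraj': False
```

A font or rendering engine could apply a rule like this without any ZWJ in the
text; consonants missing from the table simply fall back to the visible
al-lakuna form.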
For any script there's an accepted way of writing (encoding); ZWJ and ZWNJ
are just format enforcers used to break the default behavior of a
language. I am not a linguistic specialist, but I feel we have not studied
the behavior of the Sinhala language, and because of this SLS1134 has given a
half-baked solution which can work any way you like. We need people like
J.B. Dissanayake and Arisen Ahubudu on this forum. They might not be aware
of the Unicode technologies, but we can use their in-depth knowledge of
Sinhala to encode Sinhala the best way possible.. it's not too late.
> If we add a codepoint to RAKARANSAYA, then two persons typed the name
> using it and without it will appear in 2 different places. In the
> current standard (payanna + al-lakuna + rayanna + ...) and (payanna +
> rakaransaya + ...) will come to the same place while sorting and also
> it helps in searching text.
As Hashula correctly pointed out, we cannot take the order of characters in
the code table as the correct sorting order. Proper sorting will happen after
normalizing the text and using a language-specific collation algorithm.
<quote author="Hashula"> I don't think the Sinhala codechart is in the
correct order, purely because diacritics 0x0df2 and 0x0df3 seem to be out
of order when compared to the corresponding letters 0x0d8e and 0x0d90.
</quote> What I proposed was to either give it a codepoint or drop the ZWJ
for forming rakaransaya and yansaya.
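To illustrate why code chart order is not what decides sorting and searching,
here is a toy Python sort key that normalizes the text and ignores ZWJ/ZWNJ
before comparison. It is only a sketch: a real implementation would use the
Unicode Collation Algorithm with Sinhala-specific tailoring, whereas this toy
still falls back to raw codepoint order for the remaining characters. The name
toy_sort_key is hypothetical.

```python
import unicodedata

ZWJ = "\u200d"   # zero width joiner
ZWNJ = "\u200c"  # zero width non-joiner


def toy_sort_key(text):
    """Normalize to NFC, then drop the joiners so they are ignored."""
    normalized = unicodedata.normalize("NFC", text)
    return normalized.replace(ZWJ, "").replace(ZWNJ, "")


with_zwj = "\u0db4\u0dca\u200d\u0dbb"  # payanna + al-lakuna + ZWJ + rayanna
without_zwj = "\u0db4\u0dca\u0dbb"     # payanna + al-lakuna + rayanna

print(with_zwj == without_zwj)                              # False
print(toy_sort_key(with_zwj) == toy_sort_key(without_zwj))  # True
```

Under such a key both spellings land in the same place when sorting or
searching, which also means a checker working on the keys cannot tell whether
the ZWJ was typed.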
> The argument is does anybody will type the name as payanna + al-lakuna
> + rayanna + ... I also think NO. But in the case of REPAYA we see that
> different people write it different ways.
Repaya is less commonly used in contemporary Sinhala, and I wonder whether
it could be taken as a consonant modifier. I feel it's more of a style of
classical writing, and the encoding doesn't follow the same principles as
the other two... I might be wrong though :-)
> The current encoding treats RAKARANSAYA, REPAYA and YANSAYA equally.
> So all are working well in input, rendering and collation. My question
> is only by figuring out that RAKARANSAYA can be given a codepoint
> without any problems, why we want to add a codepoint only for
> RAKARANSAYA?
I just took rakaransaya to make a point. You are correct, yansaya should be
treated the same way.
cheers,
Harsha.
ps please let me know if my explanation is poor.. I will try to write it
more clearly :-) and please do reply, as I would like to know everyone's
stand on this important matter.. I might also be wrong, so please enlighten
me.
_______________________________________________
Sinhala mailing list
Sinhala@???
https://secure.linux.lk/mailman/listinfo/sinhala