From sinhala-admin@linux.lk Tue Mar 08 11:16:03 2005
Return-path: <sinhala-admin@linux.lk>
Envelope-to: lurker@linux.lk
Delivery-date: Tue, 08 Mar 2005 11:16:03 +0600
Received: from localhost ([127.0.0.1] helo=penguin.lug.lk)
	by penguin.lug.lk with esmtp (Exim 3.35 #1 (Debian))
	id 1D8X4d-0002DC-00; Tue, 08 Mar 2005 11:16:03 +0600
Received: from hantana.pdn.ac.lk ([192.248.40.1])
	by penguin.lug.lk with esmtp (Exim 3.35 #1 (Debian))
	id 1D8Lus-00068O-00; Mon, 07 Mar 2005 23:21:16 +0600
Received: from tissa.learn.ac.lk (tissa.learn.ac.lk [192.248.1.164])
	by hantana.pdn.ac.lk (8.12.10/8.12.9) with ESMTP id j27HLAuN066897;
	Mon, 7 Mar 2005 23:21:10 +0600 (LKT)
Received: from localhost (localhost [127.0.0.1])
	by tissa.learn.ac.lk (Postfix) with ESMTP id 28B91344BE;
	Mon,  7 Mar 2005 23:21:11 +0600 (LKT)
Received: from tissa.learn.ac.lk ([127.0.0.1])
 by localhost (tissa.learn.ac.lk [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id 09087-06; Mon,  7 Mar 2005 23:20:57 +0600 (LKT)
Received: from ns3.ewisl.net (unknown [220.247.210.88])
	by tissa.learn.ac.lk (Postfix) with SMTP id 07B9E34510;
	Mon,  7 Mar 2005 23:20:56 +0600 (LKT)
Received: from 220.247.210.92 by ns3.ewisl.net (InterScan E-Mail VirusWall NT); Mon, 07 Mar 2005 23:24:20 +0600
Received: from mail.ewisl.net (BHADRA [210.110.110.100]) by mail01.ewisl.net with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2656.59)
	id FV4GSY6M; Mon, 7 Mar 2005 22:58:04 +0600
Message-ID: <422C7F4A.5030103@mail.ewisl.net>
From: donald gaminitillake <semage@mail.ewisl.net>
Reply-To: semage@mail.ewisl.net
Organization: S Donald E Gaminitillake Associates
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.4) Gecko/20030624 Netscape/7.1
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Anuradha Ratnaweera <gnu.slash.linux@gmail.com>
Cc: Harsha Senanayake <harsha.sgit@keells.com>,
 sapumal.jayaratne@gmail.com,
 Harshula <hash@jayasolutions.cjb.net>,
 sinhala@linux.lk,
 sinhala-admin@linux.lk,
 Delan Silva <lakfoil@slt.lk>
Subject: Re: [sinhala] Re: Inscript keyboard layout for Sinhala? & does rakaransaya
 deserve a codepoint?
References: <OFC2700A3B.3C3B89BC-ON46256FBD.0038B83C-46256FBD.00435EBD@keells.com>
In-Reply-To: <OFC2700A3B.3C3B89BC-ON46256FBD.0038B83C-46256FBD.00435EBD@keells.com>
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: by amavisd-new
Sender: sinhala-admin@linux.lk
Errors-To: sinhala-admin@linux.lk
X-BeenThere: sinhala@linux.lk
X-Mailman-Version: 2.0.11
Precedence: bulk
List-Help: <mailto:sinhala-request@linux.lk?subject=help>
List-Post: <mailto:sinhala@linux.lk>
List-Subscribe: <https://secure.linux.lk/mailman/listinfo/sinhala>,
	<mailto:sinhala-request@linux.lk?subject=subscribe>
List-Id: <sinhala.linux.lk>
List-Unsubscribe: <https://secure.linux.lk/mailman/listinfo/sinhala>,
	<mailto:sinhala-request@linux.lk?subject=unsubscribe>
List-Archive: <https://secure.linux.lk/pipermail/sinhala/>
Date: Mon, 07 Mar 2005 22:20:26 +0600

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
  <title></title>
</head>
<body text="#000000" bgcolor="#ffffff">
Dear&nbsp; Anuradha<br>
<br>
<font color="#cc0000">Quote "<br>
</font>
<pre wrap=""><font color="#cc0000">It's not necessary to mention [or rather scream] the above statements in EVERY mail of yours; once is good enough. "

Unquote</font>
</pre>
<br>
<pre wrap=""><font color="#6600cc">"The point that I am stressing is that SLSI 1134 is incomplete." This is the issue here.
<big><b>I have written this many times because you are not answering this point.</b></big></font><big><b> 
</b></big><u><b>
<big>Is SLSI 1134 COMPLETE Sinhala or NOT?</big></b></u>

<b>My answer is NO therefore we have to correct it.</b>

Once we correct the SLSI 1134 everything will fall in line.

Sapumal need not worry about the same name being written in different ways, etc.

Chinese, Korean and Japanese do have many characters for the same phonetic sound.
All these problems have been sorted out.

First we have to focus on what has been registered in UNICODE --- SLSI 1134 has to be corrected.
This is the prime point and you cannot avoid it.


</pre>
<b>Anuradha talks of implementation of Sinhala</b> <font
 color="#990000">(just a joke)</font>&nbsp; just visit our Hon.&nbsp;Prime
Minister's web site. It is only in English!!!!&nbsp;&nbsp; Not in Tamil or in
Sinhala. He is supposed to be our IT Minister!!!!&nbsp;&nbsp; <br>
<br>
A web site has to be viewable on any computer OS (e.g. Unix, Linux,&nbsp; Windows
(any version)&nbsp; and Apple).<br>
<br>
<b>Anuradha talks of implementation of Sinhala SMS</b> but it is
restricted to some models and the data is not compatible with other service
providers. (Is this a game of MONOPOLY, or fair trade practice?)<br>
<big><b><font color="#990000"><br>
The reason for all these problems is the lack of a proper SLSI for SINHALA.</font></b></big><br>
<br>
First we have to correct it and then go forward.<br>
<br>
<br>
Best<br>
<br>
Donald<br>
<br>
<br>
<br>
<br>
Harsha Senanayake wrote:<br>
<blockquote type="cite"
 cite="midOFC2700A3B.3C3B89BC-ON46256FBD.0038B83C-46256FBD.00435EBD@keells.com">
  <blockquote type="cite">
    <pre wrap="">will play a major role. For example in collation, duplicate encoding
for similar text or sound will cause some problems. Say we want to sort
a list of names in which "prabath" is also present.
    </pre>
  </blockquote>
  <pre wrap=""><!---->
Very true. You should always try to avoid canonical equivalence between two
representations... but this is not a hard &amp; fast rule. In Unicode there are
plenty of cases where such anomalies exist. Our script is quite unique, and
behaves differently from other Indic scripts. The reph feature is always
applied in Devanagari, thus the correct 'spelling' will be with a repaya. Leave
the smart IM aside, and concentrate on the encoding.. because rakaransaya is
not encoded as a codepoint and is not the default behavior, we use ZWJ to
form rakaransaya. Rakaransaya is a valid consonant modifier used in
contemporary Sinhala, and it has not been recognized as one. If you say
rakaransaya is shorthand for &lt;al-lakuna&gt; &lt;ra&gt;, then spelling-wise Prabath
could be written without a &lt;rakaransaya&gt;, which in my opinion is wrong
according to the principles underlying accepted usage. Someone might say
spelling should *not* be considered for encoding, but the encoding should
be based on the *correct usage* of a script and it should *not* be designed
to cater for exceptional cases like 'kurakkanyaya'.
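
A minimal sketch of the two spellings being compared, assuming the code
points of the Unicode Sinhala block (the variable and function names are
made up for illustration only):

```python
import unicodedata

# Code points as assigned in the Unicode Sinhala block.
PAYANNA = "\u0db4"    # consonant pa
RAYANNA = "\u0dbb"    # consonant ra
AL_LAKUNA = "\u0dca"  # vowel killer (virama)
ZWJ = "\u200d"        # zero width joiner

# "pra" written with an explicit al-lakuna and a full rayanna.
pra_plain = PAYANNA + AL_LAKUNA + RAYANNA
# "pra" with rakaransaya requested: al-lakuna, ZWJ, rayanna.
pra_rakaransaya = PAYANNA + AL_LAKUNA + ZWJ + RAYANNA


def describe(text):
    """Return a list of 'U+XXXX NAME' strings, one per code point."""
    return ["U+%04X %s" % (ord(ch), unicodedata.name(ch)) for ch in text]


for label, text in (("plain", pra_plain), ("rakaransaya", pra_rakaransaya)):
    print(label)
    for line in describe(text):
        print("  " + line)
```

Only the presence of the ZWJ separates the two sequences; whether the
second one is actually drawn with a rakaransaya is up to the font and the
renderer.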

I have been thinking about this a bit, and maybe we need much smarter
fonts to handle Sinhala (i.e. more OpenType lookups). Two examples given in
one of Dr. Gihan's papers are 'pasyala' &amp; 'malraj'. These should not be
written with a yansaya &amp; rakaransaya, true. Yansaya or rakaransaya will
be formed only with some types of consonants. Sinhala being a phonetic
language, we might be able to classify consonants by the manner in which we
articulate them (similar to IPA).

&lt;quote src=<a class="moz-txt-link-rfc2396E" href="http://en.wikipedia.org/wiki/Manner_of_articulation">"http://en.wikipedia.org/wiki/Manner_of_articulation"</a>&gt;

Manners of articulation include:
1 Nasals, where there is a total blockage and the sound instead goes
through the nose. Examples include English /m/, /n/, etc.
2 Plosives, or stops, an "explosion" resulting from a momentary closure and
then release of air. Examples include English /p/, /b/, etc.
3 Fricatives, or spirants, where there is continuous friction at the place
of articulation. Examples include English /f/, /s/, etc. Sibilants are a
special type of fricative where the airflow is shaped by the form of the
tongue. /s/ and /z/ are sibilants in English. Lateral fricatives are yet
another type of fricative, where the friction occurs on one or both sides
of the edge of the tongue. The "ll" of the Welsh language is a lateral
fricative.
4 Approximants, (semivowels or liquids), where the sound is only partially
obstructed. Examples include English /w/, /r/, etc.
Lateral approximants, such as the English /l/, is a special type of
approximant formed at one or both sides of the tongue.
5 Taps, where a "tap" at the place of articulation results in an
instantaneous closure and reopening of the vocal tract. The "tt" of "utter"
and the "dd" of "udder" are pronounced as a tap in North American English.
6 Trills, where taps are repeated in rapid succession. The double "r" of
Spanish "perro" is a trill.
7 Ejectives, a special type of stop/plosive where the explosive mechanism
is provided by the glottis (in the throat) instead of the diaphragm.
Implosives, a special type of stop/plosive where there is an inflow of air
due to the downward movement of the glottis.
Clicks (Used in Khoisan languages) These are akin to the "tsk tsk" or "tut
tut" sound in English.

&lt;/quote&gt; also see <a class="moz-txt-link-freetext" href="http://en.wikipedia.org/wiki/Place_of_articulation">http://en.wikipedia.org/wiki/Place_of_articulation</a>.

The rakaransaya consonant modifier will be applicable to plosives and
fricatives but not to nasals or approximants. Therefore we can by default form
the rakaransaya for those consonants. You may think this can be done by a
spell-checker, but what is it going to say? "According to SLS1134 you
should place a 'ZWJ'??" A Unicode-aware spell-checker won't even see the ZWJ,
as it'll be dropped by the collation routine even before comparison!
For any script there's an accepted way of writing (encoding); ZWJ &amp; ZWNJ
are just format enforcers used to break the default behavior of a
language. I am not a linguistics specialist, but I feel we have not studied
the behavior of the Sinhala language, and because of this SLS1134 has given a
half-boiled solution which can work any way you like. We need people like
J.B. Dissanayake and Arisen Ahubudu on this forum; they might not be aware
of the Unicode technologies, but we can use their in-depth knowledge of
Sinhala to encode Sinhala the best way possible.. it's not too late.
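
As a rough sketch of that idea, assuming a hypothetical and deliberately
tiny classification table (a real one would have to come from the language
experts), the default could be decided like this:

```python
# Hypothetical table: only a handful of consonants, classified by manner
# of articulation. It is an illustration, not a vetted linguistic source.
MANNER = {
    "\u0db4": "plosive",      # payanna (pa)
    "\u0d9a": "plosive",      # kayanna (ka)
    "\u0dc3": "fricative",    # sayanna (sa)
    "\u0db8": "nasal",        # mayanna (ma)
    "\u0dbd": "approximant",  # layanna (la)
}

# Manners for which rakaransaya would be formed by default (assumption
# taken from the rule stated in the paragraph above).
FORMS_RAKARANSAYA = {"plosive", "fricative"}


def forms_rakaransaya_by_default(base):
    """True if base + al-lakuna + ra should ligate without any ZWJ."""
    return MANNER.get(base) in FORMS_RAKARANSAYA


print(forms_rakaransaya_by_default("\u0db4"))  # pa, as in 'prabath'
print(forms_rakaransaya_by_default("\u0dbd"))  # la, as in 'malraj'
```

Under this rule 'prabath' gets its rakaransaya by default and 'malraj'
does not, with no ZWJ stored in either word.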

  </pre>
  <blockquote type="cite">
    <pre wrap="">If we add a codepoint to RAKARANSAYA, then the name typed by two persons,
one using it and one without it, will appear in 2 different places. In the
current standard (payanna + al-lakuna + rayanna + ...) and (payanna +
rakaransaya + ...) will come to the same place while sorting and also
it helps in searching text.
    </pre>
  </blockquote>
  <pre wrap=""><!---->
As Harshula correctly pointed out, we cannot take the order of characters in
the code table as the correct sorting order. Proper sorting will happen after
normalizing the text and using a language-specific collation algorithm.
&lt;quote author="Harshula"&gt; I don't think the Sinhala codechart is in the
correct order, purely because diacritics 0x0df2 and 0x0df3 seem to be out
of order when compared to the corresponding letters 0x0d8e and 0x0d90.
&lt;/quote&gt; What I proposed was to either give it a codepoint or drop ZWJ for
forming rakaransaya &amp; yansaya.
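
A rough sketch of why both spellings land in the same place when sorting,
assuming a crude stand-in for a real collation key (simple_sort_key is a
made-up helper that only drops ZWJ and ZWNJ; it is not the Unicode
Collation Algorithm and knows nothing about Sinhala ordering):

```python
import unicodedata

ZWJ = "\u200d"   # zero width joiner
ZWNJ = "\u200c"  # zero width non-joiner


def simple_sort_key(text):
    """Made-up helper: drop the joiners, then compare by code point."""
    return text.replace(ZWJ, "").replace(ZWNJ, "")


pra_plain = "\u0db4\u0dca\u0dbb"         # pa + al-lakuna + ra
pra_joined = "\u0db4\u0dca\u200d\u0dbb"  # pa + al-lakuna + ZWJ + ra

# Normalization alone leaves the ZWJ in place, so the strings still differ.
still_differ = (unicodedata.normalize("NFC", pra_plain)
                != unicodedata.normalize("NFC", pra_joined))

# With the joiners ignored, both spellings get the same key.
same_key = simple_sort_key(pra_plain) == simple_sort_key(pra_joined)

words = [pra_joined, "\u0d9a", pra_plain]  # "\u0d9a" is ka
ordered = sorted(words, key=simple_sort_key)
print(still_differ, same_key, [len(w) for w in ordered])
```

A real implementation would replace the code point comparison with a
language-specific collation table; the point here is only that the joiner
takes no part in the comparison.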

  </pre>
  <blockquote type="cite">
    <pre wrap="">The argument is: will anybody type the name as payanna + al-lakuna
+ rayanna + ...? I also think NO. But in the case of REPAYA we see that
different people write it in different ways.
    </pre>
  </blockquote>
  <pre wrap=""><!---->
Repaya is less commonly used in contemporary Sinhala and I wonder whether
it could be taken as a consonant modifier. I feel it's more of a style of
classical writing and the encoding doesn't follow the same principles as
the other two... I might be wrong though :-)

  </pre>
  <blockquote type="cite">
    <pre wrap="">The current encoding treats RAKARANSAYA, REPAYA and YANSAYA equally.
So all are working well in input, rendering and collation. My question
is: having figured out that RAKARANSAYA can be given a codepoint
without any problems, why would we want to add a codepoint only for
RAKARANSAYA?
    </pre>
  </blockquote>
  <pre wrap=""><!---->
I just took rakaransaya to make a point; you are correct, yansaya should be
treated the same way.

cheers,
Harsha.

PS: please let me know if my explanation is poor.. I will try to write it
more clearly :-) and please do reply, as I would like to know everyone's
stand on this important matter.. I might also be wrong, so please enlighten
me.


</pre>
</blockquote>
<br>
<pre wrap="">Hi,

I think there are a few things that should be considered in addition to
typing and displaying text. When we want to process text, the encoding
will play a major role. For example in collation, duplicate encoding
for similar text or sound will cause some problems. Say we want to sort
a list of names in which "prabath" is also present.

If we add a codepoint to RAKARANSAYA, then the name typed by two persons,
one using it and one without it, will appear in 2 different places. In the
current standard (payanna + al-lakuna + rayanna + ...) and (payanna +
rakaransaya + ...) will come to the same place while sorting and also
it helps in searching text.

The argument is: will anybody type the name as payanna + al-lakuna
+ rayanna + ...? I also think NO. But in the case of REPAYA we see that
different people write it in different ways.

The current encoding treats RAKARANSAYA, REPAYA and YANSAYA equally.
So all are working well in input, rendering and collation. My question
is: having figured out that RAKARANSAYA can be given a codepoint
without any problems, why would we want to add a codepoint only for
RAKARANSAYA? Doesn't it violate the beauty and the systematic nature
of the standard? I also guess that there may be more reasons for this
which are known to language experts.

I also have another question about the "eng" key in the keyboard
layout. My guess is that it is for switching to English mode. If so, are
we going to use the same key to switch back to Sinhala mode? That would be
nonsense, as we need the caps lock key when typing in English.

Thanks and regards,


Well said Sapumal...  I totally agree with your comments on collation
and related matters.

In the GNU/Linux input methods (Qt and GTK), caps lock was proposed, but I
chose to go with ctrl+space (shift+space is used for non-breaking
space) instead, for that very reason.

On Sun, 06 Mar 2005 20:19:22 +0600, donald gaminitillake
<a class="moz-txt-link-rfc2396E" href="mailto:semage@mail.ewisl.net">&lt;semage@mail.ewisl.net&gt;</a> wrote:
</pre>
<blockquote type="cite">
  <pre wrap=""><span class="moz-txt-citetags">&gt; </span> 
<span class="moz-txt-citetags">&gt; </span> The point that I am stressing is that SLSI 1134 is incomplete.
<span class="moz-txt-citetags">&gt; </span> What Sri Lanka registered with Unicode is an incomplete set of Sinhala.
<span class="moz-txt-citetags">&gt; </span> We have got to correct this.
  </pre>
</blockquote>
<!---->Dear Donald,
It's not necessary to mention [or rather scream] the above statements
in EVERY mail of yours; once is good enough. I haven't got Amnesia (
<a class="moz-txt-link-freetext"
 href="http://en.wikipedia.org/wiki/Amnesia">http://en.wikipedia.org/wiki/Amnesia</a>
), and I don't think Harshula,
Harsha or anyone else here has it either... ;-p
First, it looked like you wanted to really improve Sinhala encoding,
but to me now it looks as if you are not even reading our mails
carefully, and desperately trying to get anything (such as SLS 1134)
against your "patent" out of the way.
We implemented it for GNU/Linux successfully about a year ago
(keyboard, rendering and encoding), Microsoft has implemented it in
Windows now, and it has recently got into SMS. Those are obviously
deadly blows against your claims, and the "patent". If you are a
learned intellectual, the best and most respectable thing is to accept
reality and join the community; that will be for the
betterment of the country.
<pre wrap="">

        Anuradha
Sapumal.
</pre>
<br>
</body>
</html>



_______________________________________________
Sinhala mailing list
Sinhala@linux.lk
https://secure.linux.lk/mailman/listinfo/sinhala

