Note: Descriptions are shown in the official language in which they were submitted.
This invention relates to a process and a device for
the printing of all languages which use the Arabic-Farsi script.
In languages which use the Arabic-Farsi script, the
alphabetic characters have a phonetic similarity with the English
alphabet, but each character assumes different shapes depending
on its location in a word and on the character or symbol that
precedes and follows it.
The multiplicity of shapes helps in information com-
pression, as characters need not be written in their complete and
isolated form. This advantage in the handwritten form, however,
has led to problems in printing and reading this family of
languages.
The complexity of transfer from the handwritten word
to print may be considered and solved at five levels of decrea-
sing difficulty and cultural acceptance:
I) Handwritten reproduction, using the precision and
elegance of calligraphy, with the diacritics to indicate phonetic
emphasis clearly indicated. This method has been used histori-
cally for the printing of literature and holy scriptures.
II) A simplified version of calligraphy used for everyday
writing. This script is usually written without diacritics and
may be slightly different in appearance among Urdu, Farsi and
Arabic.
III) A simplified subset of the script adapted for manual
or electric typewriters. These, depending on their design, are
likely to have four shapes and keys for each character, i.e.
initial, final, medial and isolated; in some cases only two,
initial (also used as medial) and final (also used as isolated).
The user supplies the linking information, shifting the carriage
on the typewriter keyboard in the middle of the word if necessary,
depending on the position of the character in the word. The
typing process, because of this added requirement to remember
the context, is relatively slow.
IV) The next level of simplification is to have only one
form per character. This printed form is quite different from
the handwritten script. In communication systems that use teletype
or similar output devices, this involves minimum technical
modification. By using a modified printing head, and reversing
the direction of printing, an English teletype can be used to
print Arabic-like languages. Since the output has little resem-
blance to the written form, user acceptance would require a
radical break with deep-seated cultural tradition.
V) Yet another level of simplification is the replacement
of the Arabic script characters by a phonetically equivalent
English alphabet. The language is altered to be written in
Roman form, and is phonetically and semantically the same as
before. Visually it is radically different. This involves no
technical modification to the printing device. It is apparent
that at present functional efficiency in printing and aesthetic
quality are at opposite ends of the scale. Furthermore, the
choice of a particular method of printing is determined by such
diverse factors as effect on employment, cultural tradition,
requirement for high speed output, cost, appearance, equipment
reliability and availability, and resistance to change.
At present the language is transcribed to the printed
form either by hand (level I) or by mechanical means (level III),
both of which are very slow methods compared to the printing speed
of western languages.
For telecommunications, solutions at level IV using
isolated characters have been implemented on telex-type equipment
on an experimental basis. As stated earlier this is an unsuitable
solution, since the machine output has little resemblance
to the written form.
It has been stated earlier that in the languages using
Arabic-Farsi script the shape of a character is dependent upon
its location and contextual position in a word. Consequently
printing devices must have multiple keys and shapes for a single
character of the alphabet. A user must, on the basis of his
knowledge of the script, make the right choice of character shape.
This makes the process of transcribing the language slow and
tedious, while, at the same time, the devices used are themselves
cumbersome and inefficient.
A feature of the present invention is to incorporate
in a logic circuit the tradition and rules of writing and the
related memory requirement of the user whereby to reproduce the
natural style of a language using the Arabic-Farsi script.
In accordance with a specific embodiment, an electronic
digital system comprises electronic circuit means for identifying
the concatenation properties and shape of character strings in
a word from languages that use the Arabic-Farsi script, said
electronic circuit means having an input device, circuit means
for identifying said concatenation properties and shape of said
strings and an output device wherein said circuit means for
identifying the concatenation properties of character strings
comprises a first circuit register for storing a last symbol of
a string of three successive symbols, a second register for
storing the middle symbol of said string of three symbols under
current analysis, an output circuit for storing concatenation
information in a status register circuit after the analysis of
the first character in the said string of three symbols has been
carried out, a memory device for verifying the concatenation
property of said middle and last symbols, and analyser circuit
means for determining the shape of said middle symbol for reproduction
by said output device, and further comprising
a control circuit to synchronise the operation of said
circuit means for identifying the concatenation properties of
characters, said output device being adapted to reproduce said
word at a speed commensurate with the English language while pre-
serving the natural style calligraphy of said languages.
According to a further broad aspect, the present in-
vention provides a method of reproducing languages using the
Arabic-Farsi script comprising the steps of determining if the
character links or does not link; determining if there is a link
on each side of the character; determining the direction of a
link; generating numbers defining possible linking properties
determined in the above steps, and reproducing characters of the
languages at a speed comparable to the English language while
preserving the natural calligraphic style of the language.
The present invention is an advance in the art and
technique of printing the family of languages using the Arabic-
Farsi script to a level comparable to the efficiency of printing
the English language. Potential applications of the invention
are for use with teletypes for business, hospitals, airlines,
industry, and education. Also, the invention will provide for
simplified typewriters, working at the same speed as those for
the western alphabet. Further, the invention can be used for
automatic and photo-composition in the printing industry, graphical
display devices, and writing on illuminated bulbs used in
cities for news and advertising. The latter is a very common
method of communication in big cities in that part of the world
using languages with Arabic-Farsi script.
The present invention also preserves the natural beauty
of calligraphy e.g. Naskh and Riquaa scripts in the case of the
Arabic language, without compromising it with technical limita-
tions. The introduction of new technology which helps to pre-
serve culture and tradition will evoke a very positive emotional
response in the users, and with time new applications will develop
in the countries where the languages using Arabic-Farsi script
are spoken.
The accompanying drawing is a block diagram illustra-
ting a communication system utilizing the present invention.
The word "Urdu" will be used in the following descrip-
tion to denote the family of languages using the script of the
Arabic-Farsi languages. A new theory has been developed to form
the basis of the hardware design of the present invention. This
is a first step in building the logical system, which is a par-
ticular embodiment of the principles delineated below.
Let VE = {A, B, ..., Z} be the set of characters of the
English alphabet, and let V'E be the set of characters of the Urdu
alphabet whose elements have a phonetic similarity with the corresponding
characters in English. However, Urdu, depending on
country and usage, may have up to 35 characters. Let VO be the
complete set of characters of the Urdu alphabet, then VO = V'E ∪
{additional characters of Urdu without correspondence in English}.
Next, define Vx to be the set of symbols that need not
be analysed in the formation of a word, since they are printed
without modification. This set includes numerals, punctuation
marks, and, most important, diacritics that are used in Urdu
to denote phonetic information.
The total alphabet, VA, that needs to be considered is
then:
VA = VO U Vx
For the purpose of the analysis, the set VA is parti-
tioned into four groups. This partitioning is based on the
applicant's interpretation of the script. It may be modified
depending upon the country, language and individual preferences
of the user. The importance of this partitioning will be ex-
plained later.
Let the Urdu character corresponding to the English character
Ci be called ωCi, where Ci ∈ VE. Next, define ωij as
the Urdu character script shape of the type j corresponding to the
English character Ci for i = 1, ..., 26; j ∈ Ii, where for each
i, Ii is the set of j's for which the script shape ωij exists. For
the sake of simplicity one may write ωsj to denote ωij for s = Ci,
e.g. ωA5 = ω1,5. The availability of shapes may be represented by
the Boolean matrix Aij, which signifies that for a given character
Ci, and for j = j', 0 ≤ j' ≤ 7,
Aij' = 1 if ωij' exists
Aij' = 0 if ωij' does not exist.
The availability matrix is implemented in a Read Only
Memory, and plays an important role in the hardware design as
will be described later with reference to a script processor
design.
It should be noted that Urdu is written from right to
left. Consider the concatenation properties of an Urdu character
ωi. Let A, B and C be three Boolean variables which describe
the following concatenation properties.
i) A = 0 symbol concatenates.
A = 1 symbol does not concatenate. It is isolated
or initial or terminal.
ii) B = 0 links down to the left
B = 1 links up to the left
iii) C = 0 links down from the right
C = 1 links up from the right
The properties are summarized in Table I which follows.
A  B  C   Minterm   Comment
0  0  0   P0        Links down L. Links down R. Concatenates in both directions.
0  0  1   P1        Links down L. Links up R. Concatenates in both directions.
0  1  0   P2        Links up L. Links down R. Concatenates in both directions.
0  1  1   P3        Links up L. Links up R. Concatenates in both directions.
1  0  0   P4        Links down R. Terminates on L.
1  0  1   P5        Links up R. Terminates on L.
1  1  0   P6        Links up or down at L. Initial. No links on R.
1  1  1   P7        Does not link on L or R. Isolated symbol.
Table 1 - Link Table
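As a rough illustration of how the three Boolean variables select a row of Table 1, the following Python sketch packs A, B and C into the index n of the minterm Pn. It is not part of the original specification; the names `LINK_TABLE` and `minterm` are hypothetical, and the bit order (A most significant, C least significant) is assumed from the row order of the table.

```python
# Sketch only: A is assumed to be the most significant bit and C the least,
# which matches the row order of the link table (P0 = 000 ... P7 = 111).
LINK_TABLE = {
    0: "links down L, links down R",
    1: "links down L, links up R",
    2: "links up L, links down R",
    3: "links up L, links up R",
    4: "links down R, terminates on L",
    5: "links up R, terminates on L",
    6: "links up or down at L, initial, no links on R",
    7: "isolated, no links on L or R",
}


def minterm(a: int, b: int, c: int) -> int:
    """Return the index n of minterm Pn for the bits A, B and C."""
    return (a << 2) | (b << 1) | c


if __name__ == "__main__":
    n = minterm(1, 0, 1)
    print(f"P{n}: {LINK_TABLE[n]}")
```

The same index n is the suffix j used in the shape names ωij, so a shape number can be turned back into its linking behaviour with a single lookup.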
We assign to j in ωij the suffix of the corresponding P term.
The English characters A, B, D, J, for example will
have the following associated graphic shapes and names in the
Urdu writing system.
Letter             P-term / ωij / graphic shape
English  Urdu      P0    P1    P2    P3    P4    P5    P6    P7
A        ωA                                      ωA5   ωA6   ωA7
B        ωB              ωB1         ωB3         ωB5   ωB6   ωB7
D        ωD                                      ωD5   ωD6   ωD7
J        ωJ                    ωJ2         ωJ4         ωJ6   ωJ7
Table 2 - Shapes of symbols A, B, D and J
The domains for graphic shapes ωCi in Urdu for the English
character Ci are:
ωA = {ωA5, ωA6, ωA7}
ωB = {ωB1, ωB3, ωB5, ωB6, ωB7}
ωD = {ωD5, ωD6, ωD7}
ωJ = {ωJ2, ωJ4, ωJ6, ωJ7}
The first two rows of the availability matrix A
would then be
| 0 0 0 0 0 1 1 1 |
| 0 1 0 1 0 1 1 1 |
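The relation between the availability matrix and the shape domains can be sketched in a few lines of Python. This is an illustration only: the two rows are read as 0 0 0 0 0 1 1 1 (for A) and 0 1 0 1 0 1 1 1 (for B), and the names `AVAILABILITY` and `available_shapes` are hypothetical stand-ins for the Read Only Memory lookup.

```python
# Sketch only: rows of the availability matrix for the letters A and B,
# with column j = 0..7 set to 1 when the shape of type j exists.
AVAILABILITY = {
    "A": [0, 0, 0, 0, 0, 1, 1, 1],
    "B": [0, 1, 0, 1, 0, 1, 1, 1],
}


def available_shapes(letter: str) -> list:
    """Return the shape indices j for which the matrix entry is 1."""
    return [j for j, bit in enumerate(AVAILABILITY[letter]) if bit]


if __name__ == "__main__":
    for letter in ("A", "B"):
        print(letter, available_shapes(letter))
```

Under this reading the lookup returns 5, 6, 7 for A and 1, 3, 5, 6, 7 for B, which agrees with the shape domains listed for those two letters.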
As mentioned earlier, the set of the total alphabet
VA is partitioned into four groups such that the characters
having the same architectural characteristics in their Urdu
form and similar concatenation properties constitute the same
class of the partition.
VA = {Vs, Vu, VD, Vx}
For the purpose of illustration, let the Urdu set V'E = {V's, V'u, V'D},
where V's ⊂ Vs, V'u ⊂ Vu and V'D ⊂ VD.
Vs
The characters in this partition have the property
that they do not concatenate with the successor.
VD
The right link (connecting with the predecessor) of the characters
points downwards. For example characters of the type
ωi0, ωi2 and ωi4 would be included in this partition.
Vu
The right link of the characters points upwards. Urdu graphics
of the type ωi1, ωi3 and ωi5 would be included in this partition.
Vx
This partition, which includes numerals etc., has been described
earlier.
It is assumed that the four partitions do not contain any
common elements.
In the current design
Vs = {[illegible]}
VD = {[illegible], ωJ, [illegible]}
Vu = {[illegible]}
As stated earlier the choice of characters in a
partition is based on the applicant's understanding of the
script. It could vary depending on the language, the country
and the user.
The following description relates to the details of
a transformational grammar, which accepts characters in their
input sequence and performs a forward scan for the analysis.
For the sake of completeness some basic definitions are re-
viewed.
A grammar G = (VT, VN, P, σ) is a 4-tuple that
consists of
VT a terminal vocabulary
VN a non-terminal vocabulary
P a set of production rules
σ a sentence symbol which
is a member of VN.
If each production is of the form
φ A ψ → φ ω ψ
where A is in VN, φ and ψ are in (VT ∪ VN)* and ω is in (VT ∪ VN)* − {ε}, where
ε is the empty word, then the grammar G is called context sensitive.
It should be noted that φ and ψ may be null, and ω may not
be empty. Specifically VN = VO ∪ {σ}, and VT = {ωij | i ∈ {1, ..., 35},
aij ≠ 0} ∪ {#} ∪ {Vx} is the set of terminal Urdu character graphics
augmented by the delimiter #, and the set Vx. It is recalled
that the symbols in Vx are printed without modification.
The grammar described below transforms
words written in Urdu characters, i.e. strings over VO, into words
written in well-formed Urdu script graphics, i.e. strings over VT. It
is assumed that a sufficient number of production rules of the
form σ → α exists, where α is a word written with Urdu characters
(α ∈ VO*). These rules generate the language, e.g. Arabic
or Farsi, and are different for each language. They are of
no concern to the theory of the invention. The rules
which transform the word of a language to its written form are
context sensitive, and are given below as:
R0: This is a large set of production rules of the form
σ → # S1 ... Sn #, where S1, ..., Sn ∈ VO and S1 ... Sn
is the pseudo-English representation of an Urdu word.
R1: Si Sj → ωi7 Sj for Si, Sj ∈ Vx ∪ {#}
R2: Si Cj → ωi7 Cj for Si ∈ Vx ∪ {#} and Cj ∈ VO
R3: ωkℓ Ci Cj → ωkℓ ωi7 Cj for Ci ∈ Vs
and ℓ ∈ {4, 5, 7}
R4: ωkℓ Ci Cj → ωkℓ ωi6 Cj for Ci ∈ VD ∪ Vu
and ℓ ∈ {4, 5, 7}
R5: ωkℓ Ci Cj → ωkℓ ωi5 Cj for Ci ∈ Vs
and ℓ ∈ {0, 2, 6}
R6: ωkℓ Ci Cj → ωkℓ ωi4 Cj for Ci ∈ Vs
and ℓ ∈ {1, 3, 6}
R7: ωkℓ Ci Cj → ωkℓ ωi3 Cj for Cj ∈ Vu,
Ci ∈ Vu and ℓ ∈ {2, 3, 6}
R8: ωkℓ Ci Cj → ωkℓ ωi2 Cj for Cj ∈ Vu,
Ci ∈ VD and ℓ ∈ {0, 1, 6}
R9: ωkℓ Ci Cj → ωkℓ ωi0 Cj for Cj ∈ VD,
Ci ∈ VD and ℓ ∈ {0, 1, 6}
R10: ωkℓ Ci Cj → ωkℓ ωi1 Cj for Cj ∈ VD,
Ci ∈ Vu and ℓ ∈ {2, 3, 6}
R11: ωkℓ Ci # → ωkℓ ωi4 # for Ci ∈ VD
and ℓ ∈ {0, 1, 6}
R12: ωkℓ Ci # → ωkℓ ωi5 # for Ci ∈ Vu ∪ Vs
and ℓ ∈ {2, 3, 6}
R13: ωkℓ Ci # → ωkℓ ωi7 #
These rules formally express the tradition of
writing the Urdu language. This is a new idea, and forms an
important and integral part of the hardware design of the
present invention.
The theory and logical design of the machine which
performs the syntactic transformation described previously are
given below.
It is well known that a context sensitive language is
accepted by a linear bounded automaton. However, in this case,
while the grammar is context sensitive, the requirement is to
find a transducer that would both accept and transform. It
appeared reasonable to find a finite state deterministic automaton.
The production rules of the grammar of script genera-
tion may be re-stated as under:
The string (actually written from right to left in
Urdu)
ωkℓ Ci Cj
and its concatenation characteristics are expressed in terms of
four new Boolean variables Ed, Eg, Ri, and Rj. They are described
below:
Ed
The character Ck that had been previously transformed to
ωkℓ is replaced by Ed, such that
Ed = 0 if ℓ ∈ {4, 5, 7}, and
Ed = 1 otherwise.
Eg
It describes the concatenation characteristics of the two characters
Ci (undergoing analysis) and Cj (last input), as follows:
Eg = 0 if Ci ∈ Vs ∪ Vx or Cj ∈ Vx, and
Eg = 1 otherwise.
Ri and Rj
These Boolean variables, Ri and Rj, describe the
right link properties of the characters Ci and Cj respectively.
R = 0 right link down
R = 1 right link up
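The two status variables can be sketched as small Python functions. This is a hedged illustration, not the patented circuit: the function names are hypothetical, the membership of `VS` is an assumption made only for the demonstration, and the delimiter # is assumed to behave like a member of Vx.

```python
# Sketch only. VS and VX below are illustrative assumptions; the source
# does not legibly list the members of these partitions.
VS = {"A", "D", "O"}   # assumed letters that do not link to their successor
VX = {"#"}             # delimiter assumed to be handled like a Vx symbol

TERMINATING_SHAPES = {4, 5, 7}  # shapes that do not link to the left


def e_d(prev_shape: int) -> int:
    """Ed = 0 when the previously printed shape ends on its left side."""
    return 0 if prev_shape in TERMINATING_SHAPES else 1


def e_g(ci: str, cj: str) -> int:
    """Eg = 0 when Ci cannot link onward to Cj, 1 otherwise."""
    return 0 if (ci in VS or ci in VX or cj in VX) else 1


if __name__ == "__main__":
    print(e_d(6), e_d(5))                 # initial shape links on; shape 5 does not
    print(e_g("J", "O"), e_g("O", "A"))   # J links on to O; O does not link on
```

Keeping Ed as a single stored bit is what lets the processor forget the previous character itself and remember only whether it offered a link.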
Next, the new output Boolean variables S0, S1, S2
are defined, which help in code translation from the input variables
Eg, Ed, Ri and Rj.
The following table may be easily constructed from the
production rules described earlier.
Rj   Ri   Eg   Ed     S0   S1   S2     Output   Rule
-    -    0    0      1    1    1      7        3, 13
-    0    0    1      1    0    0      4        11
-    1    0    1      1    0    1      5        12
-    0    0    1      1    0    0      4        6
-    1    0    1      1    0    1      5        5
-    -    1    0      1    1    0      6        4
0    0    1    1      0    0    0      0        9
0    1    1    1      0    0    1      1        10
1    0    1    1      0    1    0      2        8
1    1    1    1      0    1    1      3        7
Table 3. Code translation Table
By simplification the Boolean variables S0, S1, S2
may be obtained in terms of the variables Eg, Ed, Ri and Rj as follows,
where ¬X denotes the complement of X:
S0 = ¬Eg · Ed + ¬Ed
S1 = Eg · Ed · Rj + ¬Ed
S2 = ¬Eg · ¬Ed + Ed · Ri
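A short Python check can confirm that the simplified expressions reproduce the code translation table. The sketch assumes the expressions read S0 = ¬Eg·Ed + ¬Ed, S1 = Eg·Ed·Rj + ¬Ed and S2 = ¬Eg·¬Ed + Ed·Ri, a reading chosen because it agrees with every row of Table 3; the names `translate`, `shape` and `TABLE_3` are hypothetical.

```python
from itertools import product


def translate(eg: int, ed: int, ri: int, rj: int):
    """Evaluate the assumed simplified expressions for S0, S1, S2."""
    s0 = ((1 - eg) & ed) | (1 - ed)
    s1 = (eg & ed & rj) | (1 - ed)
    s2 = ((1 - eg) & (1 - ed)) | (ed & ri)
    return s0, s1, s2


def shape(eg: int, ed: int, ri: int, rj: int) -> int:
    """Pack S0 S1 S2 into the shape number 0..7 (S0 most significant)."""
    s0, s1, s2 = translate(eg, ed, ri, rj)
    return (s0 << 2) | (s1 << 1) | s2


# Distinct rows of the translation table as (Rj, Ri, Eg, Ed) -> output.
# None marks a "don't care" input.
TABLE_3 = [
    ((None, None, 0, 0), 7),
    ((None, 0, 0, 1), 4),
    ((None, 1, 0, 1), 5),
    ((None, None, 1, 0), 6),
    ((0, 0, 1, 1), 0),
    ((0, 1, 1, 1), 1),
    ((1, 0, 1, 1), 2),
    ((1, 1, 1, 1), 3),
]


def table_agrees() -> bool:
    """Check every table row, expanding don't-care inputs to both values."""
    for (rj, ri, eg, ed), expected in TABLE_3:
        rj_values = (0, 1) if rj is None else (rj,)
        ri_values = (0, 1) if ri is None else (ri,)
        for rj_v, ri_v in product(rj_values, ri_values):
            if shape(eg, ed, ri_v, rj_v) != expected:
                return False
    return True


if __name__ == "__main__":
    print(table_agrees())
```

Expanding the don't-care entries covers all sixteen input combinations, so the check also shows that the three expressions never depend on an input the table leaves unspecified.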
The above represents a code translation scheme
T: {0,1}^m → {0,1}^n, m ≥ n
where m, n are the dimensions of the Boolean spaces (4 and 3
in this case) of the input and output respectively.
Thus, the variables S0, S1, S2 give the representation
of the form of the Urdu graphic ωin corresponding
to the character Ci in the string Ck Ci Cj, in terms of the
concatenation and linking properties of the characters in the
string.
The operation will now be described. The analysis
of the character string is performed in a uniform manner, no
distinction being made between characters in different parti-
tions of VA, i.e. Vu, VD, Vs and Vx. The output follows the
input with a one symbol delay. This mode of operation results
in a simple design, by minimizing the problems of synchroniza-
tion, timing and control. In a communication system where two
teletype like devices are linked to each other, the method pro-
posed here eliminates the impression of erratic functioning on
the user, who anticipates and receives a continuous message, not
being aware of the delay. To the sender, in spite of the one
symbol delay, this method with the feature of continuous output
is equally attractive.
For the purpose of illustration let us recall the
process of analysing the string ωkℓ Ci Cj. It is noted that
the previous symbol Ck had been analysed as the Urdu graphic
ωkℓ, Ci is the symbol under analysis, and Cj is the last symbol
received. The overall design of the script processor shown in
the drawing will now be described with reference to the processing
of the string ωkℓ Ci Cj.
After ~ignal correction, the characters from the
KSR-33 teletype 1 enter the eight bit input register 2, which
now contains the symbol Cj. The present symbol Ci, currently
under analysis was received from the input register 2, and is
now stored in the second register 8. A coupling interface,
not shown in the figure, is placed between the teletype and
the processor; the operation of which is herein described. The
last symbol Cj is decoded in the decoder 3 and the availability
matrix implemented in the Read Only Memory 5 is read to determine
the available shapes for the character Cj. The availability
status is entered in the analyser module 7 and in parallel stored
in the availability register 6. On the completion of the analysis
of the character Ci, discussed later, the symbol Cj is entered
into the register 8 to become the new middle symbol Ci in the
chain, and is analysed on the arrival of a new symbol Cj in the first
register. The availability status is used to give the linking
properties Ri and Rj described earlier. The state of the
Boolean variable Ed, defined earlier, as determined from the
symbol ωkℓ analysed previously, is stored in the status register
11. This register is set by the analyser module 7 or by the
input synchronizer 4. In particular the synchronizer 4 may
enter a blank or initial state in the state register at start
up or on the incidence of carriage control or other special
symbols. The state variable Eg is determined from the variables
Ri and Rj, using the relation defined earlier. The analyser
module 7 implements the code translation scheme and yields an
output of three variables S0, S1, and S2 as described earlier.
The decoder 10 interprets this as one of the eight possible
shapes of the character Ci, currently undergoing analysis and
stored in the register 8. Following the output from the analyser,
it also signals the output controller 9 to print the decoded
shape ωim corresponding to the character Ci on the output
device, the KSR-33 teletype 12 in this case.
For the purpose of testing the processor shown in the
drawing, the teletype output was modified to simulate Urdu
writing with appropriate linkages. In this representation
markers are printed around each character, i.e. before and after,
to indicate its linkages if they exist. The method is shown
below:
U link up forward (right in English, left in Urdu).
~ link down forward (right in English, left in Urdu).
~ link up backward
O link down backward
I initial
D Independent surrounded by blanks
o n Terminal down, up backward.
As an example, let us consider the word JOAB, which
means "answer" in the Farsi language, and is printed on line 2 of
Table 4. The analysis follows as under.
σ → (Rule 0) → # JOAB #
# J → (Rule 2) → ωi7 J
ωi7 J O → (Rule 4) → ωi7 ωJ6 O
ωJ6 O A → (Rule 5) → ωJ6 ωO5 A
ωO5 A B → (Rule 3) → ωO5 ωA7 B
ωA7 B # → (Rule 13) → ωA7 ωB7 #
The string # ωJ6 ωO5 ωA7 ωB7 # is printed on the
teletype as J ! ' O,JA.,B .
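The analysis of JOAB can be replayed with a small software model of the processor. This is a sketch under stated assumptions rather than the circuit itself: the partition `VS`, the right-link table `RIGHT_LINK_UP` and the treatment of # as a Vx symbol are assumptions inferred from the worked example and the shape table, and the names `shape_code` and `process` are hypothetical.

```python
# Sketch only: a software model of the three-symbol analysis window.
VS = {"A", "D", "O"}   # assumed letters that do not link to their successor
VX = {"#"}             # delimiter assumed to behave like a Vx symbol
# Assumed right-link direction per letter: 1 = up, 0 = down.
RIGHT_LINK_UP = {"A": 1, "B": 1, "D": 1, "O": 1, "J": 0}


def shape_code(eg: int, ed: int, ri: int, rj: int) -> int:
    """Code translation from (Eg, Ed, Ri, Rj) to a shape number 0..7."""
    s0 = ((1 - eg) & ed) | (1 - ed)
    s1 = (eg & ed & rj) | (1 - ed)
    s2 = ((1 - eg) & (1 - ed)) | (ed & ri)
    return (s0 << 2) | (s1 << 1) | s2


def process(word: str) -> list:
    """Return (letter, shape) pairs, one per input letter."""
    symbols = list(word) + ["#"]      # closing delimiter flushes the last letter
    ed = 0                            # status register starts in the blank state
    output = []
    ci = symbols[0]
    for cj in symbols[1:]:            # each arrival of Cj releases the shape of Ci
        eg = 0 if (ci in VS or ci in VX or cj in VX) else 1
        ri = RIGHT_LINK_UP[ci]
        rj = RIGHT_LINK_UP.get(cj, 0)  # delimiter has no link; value is unused
        j = shape_code(eg, ed, ri, rj)
        output.append((ci, j))
        ed = 0 if j in (4, 5, 7) else 1
        ci = cj
    return output


if __name__ == "__main__":
    print(process("JOAB"))
```

Because each shape is emitted only when the following symbol arrives, the model shows the one-symbol delay between input and output described for the hardware.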
In addition to the above example, other words are
printed by the processor in pseudo-Urdu showing their correct
linkage and are shown in Table 4, which is the actual output
produced by the system on a KSR-33 teletype.
G~ 'O K A
Jl ~o A B
B ! 'O L
Bl'R Bl'G''E
A G I ~ A
J I l' A N
A Bl'A
G I ' A N
Bl 'B' 'A
Kl 'O FJ 'B' 'A
KI 'E~ ~A R E
A Ml 'E
K 1 'E ' ' A R
A D R
D A R
D A
F I ' D A
Fl 'A D
A M l ' D B l ' D
TABLE 4 - PSEUDO-URDU OUTPUT PRODUCED BY THE PROCESSOR
It is a known fact that the aesthetic sensitivity of
readers of these languages is great. They rightfully value and
take pride in direct contact with the calligraphist. Their
tolerance of any but the most suitable/beautiful script is
limited, a consideration which has been taken very seriously
in the development of the present invention which provides the
tool with which the calligraphist can write in his own way but
at great speed and with very little labour.
This invention is intended to serve the need of a
large population, and ensure the preservation of cultural tradition.
It allows for adaptation to users' needs in contrast to
many other instances where they have had to bow out in favour of
technology for its own sake. The invention was conceived to
combine efficiency, beauty and adaptability.