Patent 1044806 Summary

(12) Patent:	(11) CA 1044806
(21) Application Number:	1044806
(54) English Title:	ELECTRONIC DIGITAL SYSTEM AND METHOD FOR REPRODUCING LANGUAGES USING THE ARABIC-FARSI SCRIPT
(54) French Title:	SYSTEME NUMERIQUE ELECTRONIQUE ET MODE DE REPRODUCTION DE LANGAGES D'APRES L'ECRITURE FARSI-ARABE
Status:	Term Expired - Post Grant Beyond Limit

Bibliographic Data

(51) International Patent Classification (IPC):	G06K 15/02 (2006.01) B41B 27/00 (2006.01) B41J 3/01 (2006.01) B41J 5/00 (2006.01)
(72) Inventors :	HYDER, SYED S. (Canada)
(73) Owners :	ALEPHTRAN SYSTEMS LTD.
(71) Applicants :
(74) Agent:
(74) Associate agent:
(45) Issued:	1978-12-19
(22) Filed Date:	1972-10-30
Availability of licence:	N/A
Dedicated to the Public:	N/A
(25) Language of filing:	English

Patent Cooperation Treaty (PCT):	No

(30) Application Priority Data:	None

Abstracts

English Abstract

ELECTRONIC DIGITAL SYSTEM AND METHOD
FOR REPRODUCING LANGUAGES USING THE
ARABIC-FARSI SCRIPT
Abstract of the Disclosure
An electronic digital system and a method for repro-
ducing languages using the Arabic-Farsi script at a speed com-
mensurate with the English language while preserving the natural
style calligraphy of the languages. The linking properties and
shapes of the characters depend on the succeeding and preceding
characters in a word string. A context-sensitive language model
has been conceived and implemented as a logic circuit to deter-
mine the linking properties and shapes of characters.

Claims

Note: Claims are shown in the official language in which they were submitted.

The embodiments of the invention in which an exclusive
property or privilege is claimed are defined as follows:-
1. An electronic digital system comprising electronic
circuit means for identifying the concatenation properties and
shape of character strings in a word from languages that use
the Arabic-Farsi script, said electronic circuit means having an
input device, circuit means for identifying said concatenation
properties and shape of said character strings and an output
device wherein said circuit means for identifying the concatena-
tion properties of character strings comprises a first circuit re-
gister for storing a last symbol of a string of three successive
symbols, a record register for storing the middle symbol of said
string of three symbols under current analysis, an output circuit
for storing concatenation information in a status register
circuit after the analysis of the first character in the said
string of three symbols has been carried out, a memory device for
verifying the concatenation property of said middle and last
symbols, and analyser circuit means for determining the shape of
said middle symbol for reproduction by said output device, and
wherein there is further comprising a control circuit to synchro-
nise the operation of said circuit means for identifying the
concatenation properties of characters, said output device being
adapted to reproduce said word at a speed commensurate with the
English language while preserving the natural style calligraphy
of said languages.
2. An electronic digital system as claimed in claim 1,
and further comprising verification circuit means for ascertain-
ing the validity of the determined shape of the symbol under
analysis by said analyser circuit means.
3. An electronic digital system as claimed in claim 2,
wherein there is further provided a decoder circuit to identify
incoming symbols from said input device whereby to receive and
process desired symbols and to bypass others directly to said
output device.
4. An electronic digital system as claimed in claim 1,
wherein said concatenation properties are defined by three
variables, one of said variables representative as to whether
a character links or does not link, said other two variables
each representative of the direction of a link and each corres-
ponding to a respective side of said character.
5. An electronic digital system as claimed in claim 1,
wherein said circuit means for identifying the concatenation
properties and shape of character strings in a word is based
on the principle of treating the said character strings as a
context-sensitive grammar whose production rules have been
formulated.
6. An electronic digital system as claimed in claim 5,
wherein said grammar is described by the following production
rules:
R0: is a large set of production rules of the form
$\sigma \to \# S_1, \ldots, S_n \#$, where $S_1, \ldots, S_n \in V_0$ and $S_1, \ldots, S_n$
is the pseudo-English representation of an Urdu word.
R1: $S_i S_j \to \omega_{i7} S_j$ for $S_i, S_j \in V_x \cup \{\#\}$
R2: $S_i C_j \to \omega_{i7} C_j$ for $S_i \in (V_x \cup \{\#\})$ and $C_j \in V_0$
R3: $\omega_{k\ell} C_i C_j \to \omega_{k\ell}\, \omega_{i7} C_j$ for $C_i \in V_S$
and $\ell \in \{4, 5, 7\}$
R4: $\omega_{k\ell} C_i C_j \to \omega_{k\ell}\, \omega_{i6} C_j$ for $C_j \in V_D \cup V_U \cup V_S$
and $\ell \in \{4, 5, 7\}$
R5: $\omega_{k\ell} C_i C_j \to \omega_{k\ell}\, \omega_{i5} C_j$ for $C_j \in V_S$
and $\ell \in \{0, 2, 6\}$
R6: $\omega_{k\ell} C_i C_j \to \omega_{k\ell}\, \omega_{i4} C_j$ for $C_j \in V_S$
and $\ell \in \{1, 3, 6\}$
R7: $\omega_{k\ell} C_i C_j \to \omega_{k\ell}\, \omega_{i3} C_j$ for $C_j \in V_U$
and $C_i \in V_U$ and $\ell \in \{2, 3, 6\}$
R8: $\omega_{k\ell} C_i C_j \to \omega_{k\ell}\, \omega_{i2} C_j$ for $C_j \in V_U$,
$C_i \in V_D$ and $\ell \in \{0, 1, 6\}$
R9: $\omega_{k\ell} C_i C_j \to \omega_{k\ell}\, \omega_{i0} C_j$ for $C_j \in V_D$,
$C_i \in V_D$ and $\ell \in \{0, 1, 6\}$
R10: $\omega_{k\ell} C_i C_j \to \omega_{k\ell}\, \omega_{i1} C_j$ for $C_j \in V_D$,
$C_i \in V_U$ and $\ell \in \{2, 3, 6\}$
R11: $\omega_{k\ell} C_i \# \to \omega_{k\ell}\, \omega_{i4} \#$ for $C_i \in V_D$
and $\ell \in \{0, 1, 6\}$
R12: $\omega_{k\ell} C_i \# \to \omega_{k\ell}\, \omega_{i5} \#$ for $C_i \in V_U \cup V_S$
and $\ell \in \{2, 3, 6\}$
R13: $\omega_{k\ell} C_i \# \to \omega_{k\ell}\, \omega_{i7} \#$ for $\ell \in \{4, 5, 7\}$
wherein:
$C_i$ is an English character;
$C_j$ is an English character;
$\omega_{i7}$ is the Urdu character script of the seventh type
corresponding to an English character $C_i$;
$\omega_{k\ell}$ is the Urdu character script of the $\ell$th type
corresponding to the English character $C_k$;
$V_0$ is the complete set of characters in the Urdu
alphabet;
$V_x$ is the set of symbols that need not be analysed
in the formation of a word;
$V_S$ is a partition containing characters which do
not concatenate with the successor;
$V_D$ is a partition containing characters whose right
link points downwards;
$V_U$ is a partition containing characters whose right
link points upwards;
# is a delimiter.
7. An electronic digital system as claimed in claim 5,
wherein said production rules are implemented into said circuit
means by the following analysis,
the string (actually written from right to left in
the Arabic-Farsi script) is expressed by
$\omega_{k\ell}\, C_i\, C_j$
and its concatenation characteristics are expressed in terms of
four Boolean variables Ed, Eg, Ri and Rj which are described as:
Ed: the character Ck that had been previously transformed
to $\omega_{k\ell}$ is replaced by Ed, such that
$E_d = 0$ if $\ell \in \{4, 5, 7\}$, and
$E_d = 1$ otherwise
Eg: describes the concatenation characteristics of the two
characters Ci (undergoing analysis) and Cj (last input), as
follows:
$E_g = 0$ if $C_i \in V_S \cup V_x$ or $C_j \in V_x$, and
$E_g = 1$ otherwise
Ri and Rj: describe the right link properties of the
characters Ci and Cj respectively.
$R_i, R_j = 0$ right link down
$R_i, R_j = 1$ right link up
next, the new Boolean variables S0, S1, S2 are defined, which
help in code translation from the input variables Eg, Ed, Ri and
Rj, and thus the following table may be constructed from the
above production rules:
<IMG>
by simplification the Boolean variables S0, S1, S2
may be obtained in terms of the variables Eg, Ed, Ri and Rj
as follows:
$S_0 = \bar{E}_g \cdot \bar{E}_d + E_d \cdot R_i$
$S_1 = E_g \cdot E_d \cdot R_j + \bar{E}_d$
and $S_2 = \bar{E}_g + \bar{E}_d$
the above represents a code translation scheme
$T: \{0,1\}^m \to \{0,1\}^n$, $m > n$
where m, n are the dimensions of the Boolean spaces (4 and 3
in this case) of the input and output respectively;
thus, the variables S0, S1, S2 give the representation
of the form of the Urdu graphic $\omega_{im}$ corresponding to the
character Ci in the string Ck Ci Cj, in terms of the concat-
enation and linking properties of the characters in the string

wherein:
Ci is an English character;
Cj is an English character;
$\omega_{k\ell}$ is the Urdu character script of the $\ell$th type
corresponding to the English character Ck.
8. A machine method of determining the concatenation
properties of characters for reproducing languages using the
Arabic-Farsi script including the steps of:
(i) determining if a character links or does not link,
(ii) determining if there is a link on each side of said
character,
(iii) determining the direction of a link,
(iv) generating numbers defining possible linking properties
determined in steps (i), (ii) and (iii), and
(v) reproducing characters of said languages at a speed
comparable to the English language while preserving the natural
calligraphic style of said language.
9. A machine method as claimed in claim 8, wherein said
steps of determining the concatenation properties of characters
in said strings comprises the steps of:
(i) storing a last symbol of a character string of three
successive symbols in a first circuit register,
(ii) storing the middle symbol of said string in a second
circuit register,
(iii) determining the concatenation property of said middle
symbol,
(iv) storing said concatenation information of said middle
symbol in a status register circuit after the analysis of the
last symbol in said string has been carried out, and
(v) reproducing by a suitable output means the shape of
said middle symbol.
10. A method as claimed in claim 9, wherein said steps of
determining the concatenation properties of characters in said
strings comprises the steps of:
(i) storing a last symbol of a character string of three
successive symbols in a first circuit register,
(ii) storing the middle symbol of said string in a second
circuit register,
(iii) determining the concatenation property of said middle
symbol,
(iv) storing said concatenation information of said middle
symbol in a status register circuit after the analysis of the
last symbol in said string has been carried out, and
(v) reproducing by a suitable output means the shape of
said middle symbol.
11. A machine method as claimed in claim 10, wherein said
steps further comprise the step of verifying the validity of said
determined shape of said middle symbol.
12. A machine method as claimed in claim 10, wherein said
steps further comprise the step of identifying incoming symbols
whereby to process desired symbols and to bypass others directly
to said output device.
13. A machine method as claimed in claim 10, wherein said
step of identifying the concatenation properties and shape of
character strings in a word is based on the principle of treating
the said character strings as a context-sensitive grammar whose
production rules have been formulated and are described below:-
R0: is a large set of production rules of the form
$\sigma \to \# S_1, \ldots, S_n \#$, where $S_1, \ldots, S_n \in V_0$ and $S_1, \ldots, S_n$
is the pseudo-English representation of an Urdu word.
R1: $S_i S_j \to \omega_{i7} S_j$ for $S_i, S_j \in V_x \cup \{\#\}$
R2: $S_i C_j \to \omega_{i7} C_j$ for $S_i \in (V_x \cup \{\#\})$ and $C_j \in V_0$.
R3: $\omega_{k\ell} C_i C_j \to \omega_{k\ell}\, \omega_{i7} C_j$ for $C_i \in V_S$
and $\ell \in \{4, 5, 7\}$
R4: $\omega_{k\ell} C_i C_j \to \omega_{k\ell}\, \omega_{i6} C_j$ for $C_j \in V_D \cup V_U \cup V_S$
and $\ell \in \{4, 5, 7\}$
R5: $\omega_{k\ell} C_i C_j \to \omega_{k\ell}\, \omega_{i5} C_j$ for $C_j \in V_S$
R6: $\omega_{k\ell} C_i C_j \to \omega_{k\ell}\, \omega_{i4} C_j$ for $C_j \in V_S$
and $\ell \in \{1, 3, 6\}$
R7: $\omega_{k\ell} C_i C_j \to \omega_{k\ell}\, \omega_{i3} C_j$ for $C_j \in V_U$
and $C_i \in V_U$ and $\ell \in \{2, 3, 6\}$
R8: $\omega_{k\ell} C_i C_j \to \omega_{k\ell}\, \omega_{i2} C_j$ for $C_j \in V_U$,
$C_i \in V_D$ and $\ell \in \{0, 1, 6\}$
R9: $\omega_{k\ell} C_i C_j \to \omega_{k\ell}\, \omega_{i0} C_j$ for $C_j \in V_D$,
$C_i \in V_D$ and $\ell \in \{0, 1, 6\}$
R10: $\omega_{k\ell} C_i C_j \to \omega_{k\ell}\, \omega_{i1} C_j$ for $C_j \in V_D$,
$C_i \in V_U$ and $\ell \in \{2, 3, 6\}$
R11: $\omega_{k\ell} C_i \# \to \omega_{k\ell}\, \omega_{i4} \#$ for $C_i \in V_D$
and $\ell \in \{0, 1, 6\}$
R12: $\omega_{k\ell} C_i \# \to \omega_{k\ell}\, \omega_{i5} \#$ for $C_i \in V_U \cup V_S$
and $\ell \in \{2, 3, 6\}$
R13: $\omega_{k\ell} C_i \# \to \omega_{k\ell}\, \omega_{i7} \#$ for $\ell \in \{4, 5, 7\}$
wherein:
$C_i$ is an English character;
$C_j$ is an English character;
$\omega_{i7}$ is the Urdu character script of the seventh type
corresponding to an English character $C_i$;
$\omega_{k\ell}$ is the Urdu character script of the $\ell$th type
corresponding to the English character $C_k$;
$V_0$ is the complete set of characters in the Urdu
alphabet;
$V_x$ is the set of symbols that need not be analysed
in the formation of a word;
$V_S$ is a partition containing characters which do
not concatenate with the successor;
$V_D$ is a partition containing characters whose right
link points downwards;
$V_U$ is a partition containing characters whose right
link points upwards;
# is a delimiter.
14. A machine method as claimed in claim 13, wherein said
production rules are implemented into said step of identifying
the concatenation properties by the following analysis;
the string (actually written from right to left
in Urdu)
$\omega_{k\ell}\, C_i\, C_j$
and its concatenation characteristics are expressed in terms of
four new Boolean variables Ed, Eg, Ri and Rj and they are des-
cribed below as:
Ed: the character Ck that had been previously transformed
to $\omega_{k\ell}$ is replaced by Ed, such that
$E_d = 0$ if $\ell \in \{4, 5, 7\}$, and
$E_d = 1$ otherwise
Eg: the concatenation characteristics of the two char-
acters Ci (undergoing analysis) and Cj (last input),
as follows:
$E_g = 0$ if $C_i \in V_S \cup V_x$ or $C_j \in V_x$, and
$E_g = 1$ otherwise
Ri and Rj: describe the right link properties of the
characters Ci and Cj respectively.
$R_i, R_j = 0$ right link down
$R_i, R_j = 1$ right link up
the new Boolean variables S0, S1, S2 are defined, which help
in code translation from the input variables Eg, Ed, Ri and Rj,
and thus the following table may be constructed from the above
production rules:
<IMG>
by simplification the Boolean variables S0, S1, S2
may be obtained in terms of the variables Eg, Ed, Ri, and Rj as follows:
$S_0 = \bar{E}_g \cdot \bar{E}_d + E_d \cdot R_i$
$S_1 = E_g \cdot E_d \cdot R_j + \bar{E}_d$
and $S_2 = \bar{E}_g + \bar{E}_d$
the above represents a code translation scheme
$T: \{0,1\}^m \to \{0,1\}^n$, $m > n$
where m, n are the dimensions of the Boolean spaces (4 and 3
in this case) of the input and output respectively;

thus, the variables S0, S1, S2 give the rep-
resentation of the form of the Urdu graphic $\omega_{im}$ corresponding
to the character Ci in the string Ck Ci Cj, in terms of the
concatenation and linking properties of the characters in the
string
wherein:
Ci is an English character;
Cj is an English character;
$\omega_{k\ell}$ is the Urdu character script of the $\ell$th type
corresponding to the English character Ck.

Description

Note: Descriptions are shown in the official language in which they were submitted.

This invention relates to a process and a device for
the printing of all languages which use the Arabic-Farsi script.
In languages which use the Arabic-Farsi script, the
alphabetic characters have a phonetic similarity with the English
alphabet, but each character assumes different shapes depending
on its location in a word and on the character or symbol that
precedes and follows it.
The multiplicity of shapes helps in information com-
pression, as characters need not be written in their complete and
isolated form. This advantage in the handwritten form, however,
has led to problems in printing and reading this family of
languages.
The complexity of transfer from the handwritten word
to print may be considered and solved at five levels of decrea-
sing difficulty and cultural acceptance:
I) Handwritten reproduction, using the precision and
elegance of calligraphy, with the diacritics to indicate phonetic
emphasis clearly indicated. This method has been used histori-
cally for the printing of literature and holy scriptures.
II) A simplified version of calligraphy used for everyday
writing. This script is usually written without diacritics and
may be slightly different in appearance among Urdu, Farsi and
Arabic.
III) A simplified subset of the script adapted for manual
or electric typewriters. These, depending on their design, are
likely to have four shapes and keys for each character, i.e.
initial, final, medial and isolated; in some cases only two,
initial (also used as medial) and final (also used as isolated).
The user supplies the linking information, shifting the carriage
on the typewriter keyboard in the middle of the word if necessary,
depending on the position of the character in the word. The
typing process, because of this added requirement to remember
the context, is relatively slow.
IV) The next level of simplification is to have only one
form per character. This printed form is quite different from
the handwritten script. In communication systems that use teletype
or similar output devices, this involves minimum technical
modification. By using a modified printing head, and reversing
the direction of printing, an English teletype can be used to
print Arabic-like languages. Since the output has little resem-
blance to the written form, user acceptance would require a
radical break with deep-seated cultural tradition.
V) Yet another level of simplification is the replacement
of the Arabic script characters by a phonetically equivalent
English alphabet. The language is altered to be written in
Roman form, and is phonetically and semantically the same as
before. Visually it is radically different. This involves no
technical modification to the printing device. It is apparent
that at present functional efficiency in printing and aesthetic
quality are at opposite ends of the scale. Furthermore, the
choice of a particular method of printing is determined by such
diverse factors as effect on employment, cultural tradition,
requirement for high speed output, cost, appearance, equipment
reliability and availability, and resistance to change.
At present the language is transcribed to the printed
form either by hand (level I) or by mechanical means (level III),
both of which are very slow methods compared to the printing speed
of western languages.
For telecommunications, solutions at level IV using
isolated characters have been implemented on telex-type equipment
on an experimental basis. As stated earlier this is an unsuit-
able solution, since the machine output has little resemblance
to the written form.
It has been stated earlier that in the languages using
Arabic-Farsi script the shape of a character is dependent upon
its location and contextual position in a word. Consequently
printing devices must have multiple keys and shapes for a single
character of the alphabet. A user must, on the basis of his
knowledge of the script, make the right choice of character shape.
This makes the process of transcribing the language slow and
tedious, while, at the same time, the devices used are themselves
cumbersome and inefficient.
A feature of the present invention is to incorporate
in a logic circuit the tradition and rules of writing and the
related memory requirement of the user whereby to reproduce the
natural style of a language using the Arabic-Farsi script.
In accordance with a specific embodiment, an electronic
digital system comprises electronic circuit means for identifying
the concatenation properties and shape of character strings in
a word from languages that use the Arabic-Farsi script, said
electronic circuit means having an input device, circuit means
for identifying said concatenation properties and shape of said
strings and an output device wherein said circuit means for
identifying the concatenation properties of character strings
comprises a first circuit register for storing a last symbol of
a string of three successive symbols, a record register for
storing the middle symbol of said string of three symbols under
current analysis, an output circuit for storing concatenation
information in a status register circuit after the analysis of
the first character in the said string of three symbols has been
carried out, a memory device for verifying the concatenation
property of said middle and last symbols, and analyser circuit
means for determining the shape of said middle symbol for repro-
duction by said output device, and wherein there is further com-
prising a control circuit to synchronise the operation of said
circuit means for identifying the concatenation properties of
characters, said output device being adapted to reproduce said
word at a speed commensurate with the English language while pre-
serving the natural style calligraphy of said languages.
According to a further broad aspect, the present in-
vention provides a method of reproducing languages using the
Arabic-Farsi script comprising the steps of determining if the
character links or does not link; determining if there is a link
on each side of the character; determining the direction of a
link; generating numbers defining possible linking properties
determined in the above steps, and reproducing characters of the
languages at a speed comparable to the English language while
preserving the natural calligraphic style of the language.
The present invention is an advance in the art and
technique of printing the family of languages using the Arabic-
Farsi script to a level comparable to the efficiency of printing
the English language. Potential applications of the invention
are for use with teletypes for business, hospitals, airlines,
industry, and education. Also, the invention will provide for
simplified typewriters, working at the same speed as those for
the western alphabet. Further, the invention can be used for
automatic and photo-composition in the printing industry, gra-
phical display devices, and writing on illuminated bulbs used in
cities for news and advertising. The latter is a very common
method of communication in big cities in that part of the world
using languages with Arabic-Farsi script.
The present invention also preserves the natural beauty
of calligraphy e.g. Naskh and Riquaa scripts in the case of the
Arabic language, without compromising it with technical limita-
tions. The introduction of new technology which helps to pre-
serve culture and tradition will evoke a very positive emotional
response in the users, and with time new applications will develop
in the countries where the languages using Arabic-Farsi script
are spoken.
The accompanying drawing is a block diagram illustra-
ting a communication system utilizing the present invention.
The word "Urdu" will be used in the following descrip-
tion to denote the family of languages using the script of the
Arabic-Farsi languages. A new theory has been developed to form
the basis of the hardware design of the present invention. This
is a first step in building the logical system, which is a par-
ticular embodiment of the principles delineated below.
Let $V_E = \{A, B, \ldots, Z\}$ be the set of characters of the
English alphabet, and let $V_E'$ be the set of characters of the Urdu
alphabet whose elements have a phonetic similarity with the cor-
responding characters in English. However, Urdu, depending on
country and usage, may have up to 35 characters. Let $V_0$ be the
complete set of characters of the Urdu alphabet; then $V_0 = V_E' \cup$
{additional characters of Urdu without correspondence in English}.
Next, define $V_x$ to be the set of symbols that need not
be analysed in the formation of a word, since they are printed
without modification. This set includes numerals, punctuation
marks, and, most important, diacritics that are used in Urdu
to denote phonetic information.
The total alphabet, $V_A$, that needs to be considered is
then:
$V_A = V_0 \cup V_x$
For the purpose of the analysis, the set VA is parti-
tioned into four groups. This partitioning is based on the
applicant's interpretation of the script. It may be modified
depending upon the country, language and individual preferences
of the user. The importance of this partitioning will be ex-
plained later.
Let the Urdu character corresponding to the English cha-
racter $C_i$ be called $\omega_{C_i}$, where $C_i \in V_E$. Next, define $\omega_{ij}$ as
the Urdu character script shape of the type j corresponding to the
English character $C_i$ for i = 1, ..., 26; $j \in I_i$, where for each
i, $I_i$ is the set of j's for which the script shape $\omega_{ij}$ exists. For
the sake of simplicity one may write $\omega_{sj}$ to denote $\omega_{ij}$ for $s = C_i$,
e.g. $\omega_{A5} = \omega_{1,5}$. The availability of shapes may be represented by
the Boolean matrix $A_{i,j}$, which signifies that for a given character
$C_i$ and for $j = j'$, $0 \le j' \le 7$,
$A_{i,j'} = 1$ if $\omega_{ij'}$ exists, and
$A_{i,j'} = 0$ if $\omega_{ij'}$ does not exist.
The availability matrix is implemented in a Read Only
Memory, and plays an important role in the hardware design as
will be described later with reference to a script processor
design.
It should be noted that Urdu is written from right to
left. Consider the concatenation properties of an Urdu charac-
ter $\omega_i$. Let A, B and C be three Boolean variables which describe
the following concatenation properties.
i) A = 0 symbol concatenates.
A = 1 symbol does not concatenate. It is isolated
or initial or terminal.
ii) B = 0 links down to the left
B = 1 links up to the left
iii) C = 0 links down from the right
C = 1 links up from the right
The properties are summarized in Table 1 which follows.
A  B  C   Min-term   Comment
0  0  0   P0         Links down L. Links down R. Concatenates in both directions.
0  0  1   P1         Links down L. Links up R. Concatenates in both directions.
0  1  0   P2         Links up L. Links down R. Concatenates in both directions.
0  1  1   P3         Links up L. Links up R. Concatenates in both directions.
1  0  0   P4         Links down R. Terminates on L.
1  0  1   P5         Links up R. Terminates on L.
1  1  0   P6         Links up or down at L. Initial. No links on R.
1  1  1   P7         Does not link on L or R. Isolated symbol.
Table 1 - Link Table
We assign to j in $\omega_{ij}$ the suffix of the corresponding P term.
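The English characters A, B, D, J, for example, will
have the following associated graphic shapes and names in the
Urdu writing system.
Letter               P-term / $\omega_i$ / graphic shape
English  Urdu        P0  P1              P2              P3              P4              P5              P6              P7
A        $\omega_A$  -   -               -               -               -               $\omega_{A5}$   $\omega_{A6}$   $\omega_{A7}$
B        $\omega_B$  -   $\omega_{B1}$   -               $\omega_{B3}$   -               $\omega_{B5}$   $\omega_{B6}$   $\omega_{B7}$
D        $\omega_D$  -   -               -               -               -               $\omega_{D5}$   $\omega_{D6}$   $\omega_{D7}$
J        $\omega_J$  -   -               $\omega_{J2}$   -               $\omega_{J4}$   -               $\omega_{J6}$   $\omega_{J7}$
Table 2 - Shapes of symbols A, B, D, J
The domains for graphic shapes $\omega_{C_i}$ in Urdu for the English
character $C_i$ are:
$\omega_A = \{\omega_{A5}, \omega_{A6}, \omega_{A7}\}$
$\omega_B = \{\omega_{B1}, \omega_{B3}, \omega_{B5}, \omega_{B6}, \omega_{B7}\}$
$\omega_D = \{\omega_{D5}, \omega_{D6}, \omega_{D7}\}$
$\omega_J = \{\omega_{J2}, \omega_{J4}, \omega_{J6}, \omega_{J7}\}$
The first two rows of the availability matrix A
would then be
| 0 0 0 0 0 1 1 1 |
| 0 1 0 1 0 1 1 1 |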
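As a minimal sketch of how a row of the availability matrix follows from the shape domains listed above, the Python fragment below builds each row from the set of shape types that exist for a letter. The dictionary `SHAPE_TYPES` and the functions `availability_row` and `shape_exists` are hypothetical names chosen for this illustration; the disclosure itself holds the matrix in a Read Only Memory.

```python
# Shape types available for the four example letters of Table 2.
SHAPE_TYPES = {
    "A": {5, 6, 7},
    "B": {1, 3, 5, 6, 7},
    "D": {5, 6, 7},
    "J": {2, 4, 6, 7},
}


def availability_row(shape_types):
    """Return the 8-entry Boolean row: entry j is 1 if shape type j exists."""
    return [1 if j in shape_types else 0 for j in range(8)]


def shape_exists(letter, j):
    """Look up one entry of the availability matrix for an example letter."""
    return availability_row(SHAPE_TYPES[letter])[j] == 1


if __name__ == "__main__":
    for letter in "ABDJ":
        print(letter, availability_row(SHAPE_TYPES[letter]))
```

The rows produced for A and B agree with the two rows of the availability matrix given in the text.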

As mentioned earlier, the set of the total alphabet
$V_A$ is partitioned into four groups such that the characters
having the same architectural characteristics in their Urdu
form and similar concatenation properties constitute the same
class of the partition.
$V_A = \{V_S, V_U, V_D, V_x\}$
For the purpose of illustration, let $V_E = \{V_S', V_U', V_D'\}$,
where $V_S' \subset V_S$, $V_U' \subset V_U$ and $V_D' \subset V_D$.
$V_S$
The characters in this partition have the property
that they do not concatenate with the successor.
$V_D$
The right link (connecting with the predecessor) of the char-
acters points downwards. For example characters of the type
$\omega_{i0}$, $\omega_{i2}$ and $\omega_{i4}$ would be included in this partition.
$V_U$
The right link of the characters points upwards. Urdu graphics
of the type $\omega_{i1}$, $\omega_{i3}$ and $\omega_{i5}$ would be included in this parti-
tion.
$V_x$
This partition, which includes numerals etc., has been described
earlier.
It is assumed that the four partitions do not contain any
common elements.

In the current design
VS =~ J W~, W~ )o}
VD ={~1' wJJ wl~}
U { E Vu Vs}
As stated earlier the choice of characters in a
partition is based on the applicant's understanding of the
script. It could vary depending on the language, the country
and the user.
The following description relates to the details of
a transformational grammar, which accepts characters in their
input sequence and performs a forward scan for the analysis.
For the sake of completeness some basic definitions are re-
viewed.
A grammar $G = (V_T, V_N, P, \sigma)$ is a 4-tuple that
consists of
$V_T$ a terminal vocabulary
$V_N$ a non-terminal vocabulary
$P$ a set of production rules
$\sigma$ a sentence symbol which
is member of $V_N$.
If each production is of the form
$\varphi X \psi \to \varphi\, \omega\, \psi$
where $X$ is in $V_N$, $\varphi$ and $\psi$ are in $(V_T \cup V_N)^*$ and $\omega$ is in
$(V_T \cup V_N)^* - \{\epsilon\}$, where $\{\epsilon\}$ is the empty word, then the
grammar G is called context sensitive. It should be noted that
$\varphi$ and $\psi$ may be null, and $\omega$ may not be empty. Specifically
$V_N = V_0 \cup \{\sigma\}$, and $V_T = \{\omega_{ij} \mid i \in \{1, \ldots, 35\}, a_{ij} \neq 0\} \cup \{\#\} \cup \{V_x\}$
is the set of terminal Urdu character graphics augmented by the
delimiter #, and the set $V_x$. It is recalled
that the symbols in $V_x$ are printed without modification.
The grammar described below transforms
words written in Urdu characters, i.e. strings over $V_0$, into words
written in well-formed Urdu script graphics, i.e. strings over $V_T$. It
is assumed that a sufficient number of production rules of the
form $\sigma \to \alpha$ exists, where $\alpha$ is a word written with Urdu cha-
racters ($\alpha \in V_0^*$). These rules generate the language, e.g. Ara-
bic or Farsi, and are different for each language. They are of
no concern to the theory of the invention. The rules
which transform the word of a language to its written form are
context sensitive, and are given below as:
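R0: This is a large set of production rules of the form
$\sigma \to \# S_1, \ldots, S_n \#$, where $S_1, \ldots, S_n \in V_0$ and $S_1, \ldots, S_n$
is the pseudo-English representation of an Urdu word.
R1: $S_i S_j \to \omega_{i7} S_j$ for $S_i, S_j \in V_x \cup \{\#\}$
R2: $S_i C_j \to \omega_{i7} C_j$ for $S_i \in \{V_x \cup \#\}$ and $C_j \in V_0$
R3: $\omega_{k\ell} C_i C_j \to \omega_{k\ell}\, \omega_{i7} C_j$ for $C_i \in V_S$
and $\ell \in \{4, 5, 7\}$
R4: $\omega_{k\ell} C_i C_j \to \omega_{k\ell}\, \omega_{i6} C_j$ for $C_j \in V_D \cup V_U \cup V_S$
and $\ell \in \{4, 5, 7\}$
R5: $\omega_{k\ell} C_i C_j \to \omega_{k\ell}\, \omega_{i5} C_j$ for $C_j \in V_S$
and $\ell \in \{0, 2, 6\}$
R6: $\omega_{k\ell} C_i C_j \to \omega_{k\ell}\, \omega_{i4} C_j$ for $C_j \in V_S$
and $\ell \in \{1, 3, 6\}$
R7: $\omega_{k\ell} C_i C_j \to \omega_{k\ell}\, \omega_{i3} C_j$ for $C_j \in V_U$
and $C_i \in V_U$ and $\ell \in \{2, 3, 6\}$
R8: $\omega_{k\ell} C_i C_j \to \omega_{k\ell}\, \omega_{i2} C_j$ for $C_j \in V_U$,
$C_i \in V_D$ and $\ell \in \{0, 1, 6\}$
R9: $\omega_{k\ell} C_i C_j \to \omega_{k\ell}\, \omega_{i0} C_j$ for $C_j \in V_D$,
$C_i \in V_D$ and $\ell \in \{0, 1, 6\}$
R10: $\omega_{k\ell} C_i C_j \to \omega_{k\ell}\, \omega_{i1} C_j$ for $C_j \in V_D$,
$C_i \in V_U$ and $\ell \in \{2, 3, 6\}$
R11: $\omega_{k\ell} C_i \# \to \omega_{k\ell}\, \omega_{i4} \#$ for $C_i \in V_D$
and $\ell \in \{0, 1, 6\}$
R12: $\omega_{k\ell} C_i \# \to \omega_{k\ell}\, \omega_{i5} \#$ for $C_i \in V_U \cup V_S$
and $\ell \in \{2, 3, 6\}$
R13: $\omega_{k\ell} C_i \# \to \omega_{k\ell}\, \omega_{i7} \#$ for $\ell \in \{4, 5, 7\}$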
These rules formally express the tradition of
writing the Urdu language. This is a new idea, and forms an
important and integral part of the hardware design of the
present invention.
The theory and logical design of the machine which
performs the syntactic transformation described previously are
given below.
It is well known that a context sensitive language is
accepted by a linear bounded automaton. However, in this case,
while the grammar is context sensitive, the requirement is to
find a transducer that would both accept and transform. It
appeared reasonable to find a finite state deterministic automaton.
The production rules of the grammar of script genera-
tion may be re-stated as under:
The string (actually written from right to left in
Urdu)
$\omega_{k\ell}\, C_i\, C_j$
and its concatenation characteristics are expressed in terms of
four new Boolean variables Ed, Eg, Ri, and Rj. They are des-
cribed below:
Ed
The character Ck that had been previously transformed to
$\omega_{k\ell}$ is replaced by Ed, such that
$E_d = 0$ if $\ell \in \{4, 5, 7\}$, and
$E_d = 1$ otherwise.
Eg
It describes the concatenation characteristics of the two char-
acters Ci (undergoing analysis) and Cj (last input), as follows:
$E_g = 0$ if $C_i \in V_S \cup V_x$ or $C_j \in V_x$, and
$E_g = 1$ otherwise.
Ri and Rj
These Boolean variables, Ri and Rj, describe the
right link properties of the characters Ci and Cj respectively.
$R_i, R_j = 0$: right link down
$R_i, R_j = 1$: right link up
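The four input variables defined above can be sketched as small Python functions. This is an illustration under stated assumptions, not the circuit of the disclosure: the one-letter class labels "S", "U", "D" and "X" stand for the partitions VS, VU, VD and Vx, and the function names `e_d`, `e_g` and `right_link` are invented for this sketch. The right link is defined here only for the classes VD and VU, since the text gives a direction only for those two partitions.

```python
# Shape types that terminate on the left (P4, P5) or are isolated (P7),
# so the preceding character offers no link to the one under analysis.
SHAPES_WITHOUT_LEFT_LINK = {4, 5, 7}


def e_d(prev_shape_type: int) -> int:
    """Ed: 0 if the previous graphic has type 4, 5 or 7, otherwise 1."""
    return 0 if prev_shape_type in SHAPES_WITHOUT_LEFT_LINK else 1


def e_g(ci_class: str, cj_class: str) -> int:
    """Eg: 0 if Ci is in VS or Vx, or Cj is in Vx; otherwise 1."""
    if ci_class in ("S", "X") or cj_class == "X":
        return 0
    return 1


def right_link(char_class: str) -> int:
    """Ri or Rj: 0 for a right link pointing down (VD), 1 for up (VU)."""
    if char_class == "D":
        return 0
    if char_class == "U":
        return 1
    raise ValueError("right link direction is only defined here for D and U")


if __name__ == "__main__":
    print(e_d(6), e_g("D", "U"), right_link("D"), right_link("U"))
```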
Next, the new output Boolean variables S0, S1, S2
are defined, which help in code translation from the input vari-
ables Eg, Ed, Ri and Rj.
The following table may be easily constructed from the
production rules described earlier.
Rj  Ri  Eg  Ed  |  S2  S1  S0  |  Output  |  Rule
-   -   0   0   |  1   1   1   |  7       |  3, 13
-   0   0   1   |  1   0   0   |  4       |  11
-   1   0   1   |  1   0   1   |  5       |  12
-   0   0   1   |  1   0   0   |  4       |  6
-   1   0   1   |  1   0   1   |  5       |  5
-   -   1   0   |  1   1   0   |  6       |  4
0   0   1   1   |  0   0   0   |  0       |  9
0   1   1   1   |  0   0   1   |  1       |  10
1   0   1   1   |  0   1   0   |  2       |  8
1   1   1   1   |  0   1   1   |  3       |  7
Table 3 - Code translation Table
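By simplification, the Boolean variables S0, S1, S2
may be obtained in terms of the variables Eg, Ed, Ri and Rj as follows:
S_0 = \overline{E_g} \cdot \overline{E_d} + E_d \cdot R_i
S_1 = E_g \cdot E_d \cdot R_j + \overline{E_d}
S_2 = \overline{E_g} + \overline{E_d}
(S2 is the most significant bit of the output code in Table 3 and
S0 the least significant.)

The above represents a code translation scheme
T: \{0,1\}^m \to \{0,1\}^n, \quad m \geq n
where m, n are the dimensions of the Boolean spaces (4 and 3
in this case) of the input and output respectively.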
Thus, the variables S0, S1, S2 give the representation
of the form of the Urdu graphic ω_in corresponding
to the character Ci in the string Ck Ci Cj, in terms of the
concatenation and linking properties of the characters in the
string.
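The code translation can be checked mechanically. The following is a minimal Python sketch, not the patented circuit: it evaluates the three output expressions for given values of Eg, Ed, Ri and Rj and compares the resulting shape code with the rows of the code translation table. The names `translate`, `shape_code`, `TABLE_3` and `check_table` are illustrative assumptions, a dash in the table is assumed to mean that the input does not affect the output, and the output code is assumed to be read as 4·S2 + 2·S1 + S0.

```python
from itertools import product


def translate(eg, ed, ri, rj):
    """Return (s2, s1, s0) for Boolean inputs given as 0 or 1."""
    not_eg, not_ed = 1 - eg, 1 - ed
    s0 = (not_eg & not_ed) | (ed & ri)
    s1 = (eg & ed & rj) | not_ed
    s2 = not_eg | not_ed
    return s2, s1, s0


def shape_code(eg, ed, ri, rj):
    """Read (s2, s1, s0) as a binary number (assumed bit order)."""
    s2, s1, s0 = translate(eg, ed, ri, rj)
    return 4 * s2 + 2 * s1 + s0


# Distinct rows of the table as (Rj, Ri, Eg, Ed, output).
# None stands for a dash, assumed here to be a "don't care" input.
TABLE_3 = [
    (None, None, 0, 0, 7),
    (None, 0, 0, 1, 4),
    (None, 1, 0, 1, 5),
    (None, None, 1, 0, 6),
    (0, 0, 1, 1, 0),
    (0, 1, 1, 1, 1),
    (1, 0, 1, 1, 2),
    (1, 1, 1, 1, 3),
]


def check_table():
    """Return True if the expressions reproduce every table row."""
    for rj, ri, eg, ed, out in TABLE_3:
        rj_values = (0, 1) if rj is None else (rj,)
        ri_values = (0, 1) if ri is None else (ri,)
        for rj_v, ri_v in product(rj_values, ri_values):
            if shape_code(eg, ed, ri_v, rj_v) != out:
                return False
    return True
```

Calling `check_table()` walks every row, expanding each dash into both possible values, so a single mismatch between the expressions and the table would be reported as `False`.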
The operation will now be described. The analysis
of the character string is performed in a uniform manner, no
distinction being made between characters in different parti-
tions of VA, i.e. Vu, VD, Vs and Vx. The output follows the
input with a one symbol delay. This mode of operation results
in a simple design, by minimizing the problems of synchronization,
timing and control. In a communication system where two
teletype-like devices are linked to each other, the method
proposed here eliminates the impression of erratic functioning on
the user, who anticipates and receives a continuous message, not
being aware of the delay. To the sender, in spite of the one
symbol delay, this method with the feature of continuous output
is equally attractive.
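The one-symbol delay can be pictured as a sliding window over the input stream. The sketch below is illustrative Python only; the names `windows` and `BLANK` are hypothetical, and the code shows merely when each symbol becomes available for analysis, with no claim about the hardware timing.

```python
BLANK = "#"  # hypothetical stand-in for the word boundary symbol


def windows(symbols):
    """Yield (ci, cj) pairs: ci is analysed only once its successor cj arrives."""
    ci = None
    for cj in symbols:
        if ci is not None:
            yield ci, cj
        ci = cj


# Each character is paired with the symbol that follows it, so the
# output for a character can be produced one symbol after it is typed.
pairs = list(windows("JOAB" + BLANK))
```

Because every symbol is handled the same way, the consumer of `windows` sees a steady stream of pairs, which mirrors the continuous output described above.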
For the purpose of illustration let us recall the
process of analysing the string ω_kℓ Ci Cj. It is noted that
the previous symbol Ck had been analysed as the Urdu graphic
ω_kℓ, Ci is the symbol under analysis, and Cj is the last symbol
received. The overall design of the script processor shown in
the drawing will now be described with reference to the
processing of the string ω_kℓ Ci Cj.

After ~ignal correction, the characters from the
KSR-33 teletype 1 enter the eight bit input register 2, which
now contains the symbol Cj. The present symbol Ci, currently
under analysis was received from the input register 2, and is
now stored in the second register 8. A coupling interface,
not shown in the figure, is placed between the teletype and
the processor; the operation of which is herein described. The
last symbol Cj is decoded in the decoder 3 and the availability
matrix implemented in the Read Only Memory 5 is read to determine
the available shapes for the character Cj. The availability
status is entered in the analyser module 7 and in parallel stored
in the availability register 6. On the completion of the analysis
of the character Ci, discussed later, the symbol Cj is entered
into the register 8 to become the new middle symbol Ci in the
chain, and is analysed on the arrival of a new symbol Cj in the
first register. The availability status is used to give the linking
properties Ri and Rj described earlier. The state of the
Boolean variable Ed, defined earlier, as determined from the
symbol ω_kℓ analysed previously, is stored in the status register
11. This register is set by the analyser module 7 or by the
input synchronizer 4. In particular the synchronizer 4 may
enter a blank or initial state in the status register at start
up or on the incidence of carriage control or other special
symbols. The state variable Eg is determined from the variables
Ri and Rj, using the relation defined earlier. The analyser
module 7 implements the code translation scheme and yields an
output of three variables S0, S1, and S2 as described earlier.
The decoder 10 interprets this as one of the eight possible
shapes of the character Ci, currently undergoing analysis and
stored in the register 8. Following the output from the analyser,
it also signals the output controller 9 to print the decoded
shape ω_im corresponding to the character Ci on the output
device, the KSR-33 teletype 12 in this case.
For the purpose of testing the processor shown in the
drawing, the teletype output was modified to simulate Urdu
writing with appropriate linkages. In this representation
markers are printed around each character, i.e. before and after,
to indicate its linkages if they exist. The method is shown
below:
U link up forward (right in English, left in Urdu).
~ link down forward (right in English, left in Urdu).
~ link up backward
O link down backward
I initial
D Independent surrounded by blanks - -
o n Terminal down, up backward.
As an example, let us consider the word JOAB, which
means "answer" in the Farsi language, and is printed on line 2 of
Table 4. The analysis follows as under.
# JOAB #
# J O        (Rule 4)    # ω_J6 O
ω_J6 O A     (Rule 5)    ω_J6 ω_O5 A
ω_O5 A B     (Rule 3)    ω_O5 ω_A7 B
ω_A7 B #     (Rule 13)   ω_A7 ω_B7 #
The string #ω_J6 ω_O5 ω_A7 ω_B7# is printed on the
teletype as J!'O A B.
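The analysis of JOAB can be replayed in software. The following Python sketch is an illustration under stated assumptions rather than the circuit itself: the character classes `NON_JOINING`, `SEPARATORS` and `LINK_UP` are hypothetical stand-ins chosen only so that this one word can be processed, the initial state is assumed to behave like shape 7 (nothing to link back to), Eg is computed from a simplified reading of its definition, and the shape code is assumed to be 4·S2 + 2·S1 + S0.

```python
NON_JOINING = {"O", "A", "D", "R"}  # hypothetical stand-in for class Vs
SEPARATORS = {"#"}                  # hypothetical stand-in for class Vx
LINK_UP = {"O"}                     # hypothetical: characters whose right link is "up"


def shape(eg, ed, ri, rj):
    """Shape code from the Boolean inputs (assumed bit order S2 S1 S0)."""
    not_eg, not_ed = 1 - eg, 1 - ed
    s0 = (not_eg & not_ed) | (ed & ri)
    s1 = (eg & ed & rj) | not_ed
    s2 = not_eg | not_ed
    return 4 * s2 + 2 * s1 + s0


def analyse(word):
    """Return [(character, shape code)] for one word, one symbol behind the input."""
    symbols = list(word) + ["#"]
    prev_shape = 7  # assumed initial state set at start of word
    result = []
    for ci, cj in zip(symbols, symbols[1:]):
        # Ed: does the previous graphic offer a link to Ci?
        ed = 0 if prev_shape in (4, 5, 7) else 1
        # Eg: simplified assumption, Ci cannot link on or Cj ends the word.
        eg = 0 if ci in NON_JOINING or cj in SEPARATORS else 1
        ri = 1 if ci in LINK_UP else 0
        rj = 1 if cj in LINK_UP else 0
        prev_shape = shape(eg, ed, ri, rj)
        result.append((ci, prev_shape))
    return result
```

With these assumed classes, `analyse("JOAB")` yields the shape codes 6, 5, 7 and 7 for J, O, A and B, which agrees with the string of graphics obtained in the analysis above.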
In addition to the above example, other words are
printed by the processor in pseudo-Urdu showing their correct
linkage and are shown in Table 4, which is the actual output
produced by the system on a KSR-33 teletype.

G!'O K A
J!'O A B
B!'O L
B!'R B!'G''E
A G!'A
J!'A N
A B!'A
G!'A N
B!'B''A
K!'O F!'B''A
K!'E''A R E
A M!'E
K!'E''A R
A D R
D A R
D A
F!'D A
F!'A D
A M!'D B!'D

TABLE 4 - PSEUDO-URDU OUTPUT PRODUCED BY THE PROCESSOR

It is a known fact that the aesthetic sensitivity of
readers of these languages is great. They rightfully value and
take pride in direct contact with the calligraphist. Their
tolerance of any but the most suitable/beautiful script is
limited, a consideration which has been taken very seriously
in the development of the present invention which provides the
tool with which the calligraphist can write in his own way but
at great speed and with very little labour.
This invention is intended to serve the need of a
large population, and ensure the preservation of cultural
tradition. It allows for adaptation to users' needs in contrast to
many other instances where they have had to bow out in favour of
technology for its own sake. The invention was conceived to
combine efficiency, beauty and adaptability.

Administrative Status

Event History

Description	Date
Inactive: IPC expired	2020-01-01
Inactive: IPC deactivated	2011-07-26
Inactive: IPC from MCD	2006-03-11
Inactive: IPC from MCD	2006-03-11
Inactive: IPC from MCD	2006-03-11
Inactive: Expired (old Act Patent) latest possible expiry date	1995-12-19
Grant by Issuance	1978-12-19

Abandonment History

There is no abandonment history.

Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
ALEPHTRAN SYSTEMS LTD.

Past Owners on Record
None

Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.

Documents

Document Description	Date (yyyy-mm-dd)	Number of pages	Size of Image (KB)
Claims	1994-05-24	11	281
Abstract	1994-05-24	1	17
Cover Page	1994-05-24	1	13
Drawings	1994-05-24	1	16
Descriptions	1994-05-24	21	535
