Note: Descriptions are shown in the official language in which they were submitted.
1258135
DATA STREAM SHAPING OF ARABIC CHARACTERS
FIELD OF THE INVENTION
The present invention relates to data-stream processing
of Arabic alphabet characters. It provides apparatus for
converting an input data stream containing basic,
unconcatenated, Arabic words into an output data stream
wherein the Arabic letter shapes, for proper concatenation in
words, have been substituted for the basic shapes originally
transmitted.
BACKGROUND AND PRIOR ART OF THE INVENTION
The problem of converting the basic shape of an Arabic
letter into its context sensitive proper shape within an
Arabic word is not trivial. A review of the background to
this problem is included in the commonly assigned Canadian
Patent No. 1,207,905, of F. Metwaly, issued July 15, 1986,
and entitled "Method and System for the Generation of Arabic
Script".
The prior art patents, mentioned in Canadian Patent
1,207,905, although not germane to the particular problem
addressed by the present invention, demonstrate the degree
of complexity that was necessary to produce Arabic script
from basic letter shapes. At the very least, as Canadian
Patent No. 1,044,806 issued December 19, 1978 to S.S. Hyder
demonstrates, it was necessary to examine an Arabic
character in the context of the character preceding it and
that succeeding it before deciding on its script or display
shape.
A disadvantage of the prior art solutions is that they
are too complex, at least for application within a
communication data stream.
Due to the fact that Arabic characters are entered into
a computer system, stored, manipulated and transmitted in
their basic shape format, it is necessary to process them
every time before display or printing to
produce readable Arabic script. A host computer may be
communicating with subordinate devices. It is desirable
to interpose a simple device to modify the data stream so
that no further processing is necessary before display or
printing at the subordinate device.
SUMMARY OF THE INVENTION
The present invention provides a simple apparatus
which interrupts a data stream, processes two sequential
characters at a time, and outputs a modified data stream
delayed by the duration of two characters. Neither the
transmitter nor the receiver of the data stream is
interfered with adversely.
The basic Arabic alphabet has twenty-eight
letters. For simple Arabic script suitable for business
and the like environments, seventeen letters may each
assume one of two shapes, two letters may each assume
one of three shapes, and two letters may each assume one
of four shapes. The remaining letters have one shape
only. An expanded keyboard alphabet may contain, as
distinct characters, some of the basic letters but
uniquely shaped.
The apparatus performs a mapping operation,
mapping the basic alphabet, or in practice the basic shape
code page, into the Arabic font page. Both the basic code
page and the font page, of course, contain numerals, in
Arabic and Latin scripts, and many other non-Arabic
characters such as the Latin alphabet, all of which do not
change shape and are treated as stand-alone characters.
In the preferred embodiment, the basic code and font pages
are matrices wherein each character is identified by a
unique ASCII code point at the intersection of a row and a
column in the FxF matrices (in hexadecimal notation).
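By way of illustration only, the addressing of a code point by row and column may be sketched as follows. The sketch assumes that the high-order hexadecimal digit selects the row and the low-order digit the column; the figures may use the opposite orientation, and the function names are hypothetical.

```python
from typing import Tuple


def code_point_to_cell(code: int) -> Tuple[int, int]:
    """Split an eight-bit code point into its two hexadecimal digits.

    Assumption: the high digit is taken as the row and the low digit
    as the column of the 16 x 16 matrix.
    """
    if not 0 <= code <= 0xFF:
        raise ValueError("code point must fit in eight bits")
    return code >> 4, code & 0x0F


def cell_to_code_point(row: int, column: int) -> int:
    """Rebuild the eight-bit code point from a (row, column) pair."""
    if not (0 <= row <= 0xF and 0 <= column <= 0xF):
        raise ValueError("row and column must each be one hexadecimal digit")
    return (row << 4) | column
```

Under this assumption the character referred to as F1 occupies row F, column 1.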
The present invention provides apparatus for
modifying a data stream, which includes data words
encoding basic Arabic characters, to generate a delayed
data stream wherein basic Arabic characters are modified
into Arabic script characters, comprising: a data buffer
having a serial input and a serial output for receiving
and outputting data, respectively; means for assigning in
a predetermined manner one of two logic states to one of
two consecutive characters stored temporarily in said data
buffer; and means responsive to said two consecutive
characters and to said one of two logic states for
modifying some characters while temporarily stored in said
data buffer in a predetermined manner, whereby basic
Arabic characters are received and Arabic script
characters are output.
BRIEF DESCRIPTION OF THE DRAWINGS
The preferred embodiment of the invention will
now be described in conjunction with the annexed drawings
in which:
Figure 1 shows the basic Arabic ASCII code page
from which all numerals and other characters have been
omitted for clarity;
Figure 2 shows the Arabic ASCII font page from
which all numerals and other characters have been omitted
for clarity; and
Figure 3 shows a block diagram of an apparatus
according to the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
Referring now to Figure 1 of the drawings, the
basic Arabic alphabet is represented in the ASCII code
page (matrix) by thirty-six code points. Henceforth each
character will be referred to by its ASCII code in
hexadecimal notation; for example, the rightmost
character (called "shadda") is referred to as F1. Some of
these basic characters will change shape when incorporated
in a word, depending on where in the word they are
located. In Figure 2 of the drawings, the permissible
variations of that particular font are represented as code
points in the ASCII font page. The thirty-six basic
characters of Figure 1 retain their code positions in the
matrix of Figure 2. Both code pages are industry
standards, and contain numerals, Latin alphabet characters
and other characters, which are not of particular concern
to this invention. As will be seen later, all non-Arabic
characters, including numerals, are treated as stand-alone
characters and their codes remain unaltered.
The purpose of the apparatus, shown in Figure 3,
is to map input data-stream characters, representing the
basic Arabic characters of Figure 1, onto the font
characters of Figure 2, which then form the output data
stream.
The apparatus in Figure 3 comprises a
ripple-through buffer or register 10, which has a serial
input 11 and a serial output 12. The buffer 10 is capable
of holding two characters of eight bits each, eight bits
being the necessary number of bits to specify a code point
in the 16x16 matrices of Figures 1 and 2. The buffer 10
also has parallel inputs 13 and 13' and parallel outputs
14 and 14' giving parallel access to the bit positions of
a current character (CC), having just been fully entered
from the input 11, and giving parallel access to bit
positions of a preceding character (PC), having just been
fully transferred into the last eight-bit positions of the
buffer 10, respectively. The parallel output 14 is input
to a state analyser logic 15, which determines whether the
current character in the buffer 10 connects (concatenates)
or not. If a character connects, it is assigned a state
of logic 0; if it does not connect, it is assigned a
state of logic 1. The state of CC is entered into a state
register 16. A rule application logic 17 computes from
the character codes in the buffer 10 and the states in the
register 16 whether the characters in the buffer 10 should
be altered, and if so into what characters of the font
page one, the other, or both CC and PC are converted.
This updating of the characters stored momentarily in the
buffer 10 is accomplished via data update bus 18 and the
parallel inputs 13, 13'.
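By way of illustration only, the behaviour of the two-character buffer may be modelled in software as follows. This is a minimal sketch and not the circuit of Figure 3: the name ripple_through and the rule callback are hypothetical, the state register is left out, and the model delays its output by one character, which need not match the timing of the hardware.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

# A rule receives (PC, CC) and returns the possibly altered (PC, CC).
Rule = Callable[[int, int], Tuple[int, int]]


def ripple_through(stream: Iterable[int], rule: Rule) -> Iterator[int]:
    """Pass character codes through a two-slot buffer.

    Each time a new current character (CC) has been entered, the rule
    may rewrite CC and the preceding character (PC) in place; PC is
    then clocked out and CC moves into the PC slot.
    """
    pc: Optional[int] = None
    for cc in stream:
        if pc is not None:
            pc, cc = rule(pc, cc)
            yield pc
        pc = cc
    if pc is not None:
        # Flush the last character once the input ends.
        yield pc
```

A rule that leaves both characters untouched reproduces the input stream unchanged, which corresponds to the treatment of stand-alone characters.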
The state analyser logic 15 and the rule
application logic 17 operate to implement the following
logic/arithmetic equations, which map the code page of
Figure 1 onto the font page of Figure 2 following the
concatenation rules of Arabic script. It should be
understood that these equations are specific to the
particular code pages or matrices as shown in Figures 1
and 2, and, of course, to the rules of script of Arabic.
DEFINITIONS
(Note: In the following logic/arithmetic
equations it is not necessary to distinguish between
character codes of Figures 1 and 2, because those in
Figure 1 occupy the same code points in Figure 2.)
CC means current character
PC means preceding character
CS means state of CC
PS means state of PC
State 0 means character connects.
State 1 means character does not connect.
All bracketed numbers denote hexadecimal ASCII codes.
STATE DETERMINATION EQUATIONS
CS = 0
If CC < (C2), then CS = 1
If (C7) > CC > (C3), then CS = 1
If (D3) > CC > (CE), then CS = 1
If (E1) > CC > (DA), then CS = 1
If CC = (E8), then CS = 1
If CC = (C9), then CS = 1
If CC = (E~), then CS = 1
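By way of illustration only, the state determination equations may be sketched as a single function. The comparison operators and range bounds below are one reading of the printed equations, chosen so that every code listed under "State of CS = 1" in the current character equations receives state 1 and every code listed under "State of CS = 0" receives state 0. The final equation, whose code is not legible, is left out. The function name is hypothetical.

```python
CONNECTS = 0          # state 0: character connects
DOES_NOT_CONNECT = 1  # state 1: character does not connect


def current_state(cc: int) -> int:
    """Return CS for a current character code (hexadecimal code point).

    Assumption: all range bounds are strict, as read from the source.
    The last printed rule is illegible and is not implemented here.
    """
    if cc < 0xC2:
        return DOES_NOT_CONNECT
    if 0xC3 < cc < 0xC7:
        return DOES_NOT_CONNECT
    if 0xCE < cc < 0xD3:
        return DOES_NOT_CONNECT
    if 0xDA < cc < 0xE1:
        return DOES_NOT_CONNECT
    if cc in (0xE8, 0xC9):
        return DOES_NOT_CONNECT
    return CONNECTS
```

Any code below (C2), which covers numerals and Latin letters, falls under the first rule and is therefore treated as non-connecting.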
STATE CHANGE EQUATIONS
If PC = (E9), then PS = 1
If PC = (C7), then PS = 1
If PC = (C2), then PS = 1
If PC = (C3), then PS = 1
CURRENT CHARACTER EQUATIONS
State of CS = 0
If CC = (E7) and PS = 0, then CC = (F4)
If CC = (D9) and PS = 0, then CC = (EC)
If CC = (DA) and PS = 0, then CC = (F7)
If CC = (C7) and PS = 0, then CC = (A8)
If CC = (C2) and PS = 0, then CC = (A2)
If CC = (C3) and PS = 0, then CC = (A5)
State of CS = 1
If CC = (C4), then CC = (C4)
If CC = (C6), then CC = (C6)
If CC = (C9), then CC = (C9)
If CC = (CF), then CC = (CF)
If CC = (D0), then CC = (D0)
If CC = (D1), then CC = (D1)
If CC = (D2), then CC = (D2)
If CC = (E8), then CC = (E8)
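By way of illustration only, the current character equations reduce to a small table consulted when both CS and PS are 0. In this sketch the targets for (E7), (D9) and (DA) are taken directly from the equations; the targets for (C7), (C2) and (C3) are assumed to be the same codes that the preceding character equations give for those letters, namely (A8), (A2) and (A5). The names are hypothetical.

```python
from typing import Dict

# CC -> replacement, applied only when CS = 0 and PS = 0.
CURRENT_WHEN_CONNECTED: Dict[int, int] = {
    0xE7: 0xF4,
    0xD9: 0xEC,
    0xDA: 0xF7,
    0xC7: 0xA8,  # assumed target, see lead-in
    0xC2: 0xA2,  # assumed target, see lead-in
    0xC3: 0xA5,  # assumed target, see lead-in
}


def update_current(cc: int, cs: int, ps: int) -> int:
    """Return the code CC should hold after the current character rules.

    With CS = 1 the equations map each code to itself, so the
    character is returned unchanged.
    """
    if cs == 0 and ps == 0:
        return CURRENT_WHEN_CONNECTED.get(cc, cc)
    return cc
```

The identity rules listed under "State of CS = 1" need no table entries, since leaving the character unaltered is the default.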
PRECEDING CHARACTER EQUATIONS
State of CS = 0
If CC = (C7) and PC = (E4) and PS = 0, then
PC = (9E)
If CC = (C2) and PC = (E4) and PS = 0, then
PC = (FA)
If CC = (C3) and PC = (E4) and PS = 0, then
PC = (9A)
If CC = (C7) and PC = (E4) and PS = 1, then
PC = (9D)
If CC = (C2) and PC = (E4) and PS = 1, then
PC = (F9)
If CC = (C3) and PC = (E4) and PS = 1, then
PC = (99)
State of CS = 1
If PC = (C8), then PC = (A9)
If PC = (CA), then PC = (AA)
If PC = (CB), then PC = (AB)
If PC = (CC), then PC = (AD)
If PC = (CD), then PC = (AE)
If PC = (CE), then PC = (AF)
If PC = (D3), then PC = (BC)
If PC = (D4), then PC = (BD)
If PC = (D5), then PC = (BE)
If PC = (D6), then PC = (EB)
If PC = (E1), then PC = (BA)
If PC = (E2), then PC = (F8)
If PC = (E3), then PC = (FC)
If PC = (E4), then PC = (FB)
If PC = (E5), then PC = (EF)
If PC = (E6), then PC = (F2)
If PC = (F4), then PC = (F3)
If PC = (E7), then PC = (F3)
If PC = (EC), then PC = (C5)
If PC = (D9), then PC = (DF)
If PC = (F7), then PC = (ED)
If PC = (DA), then PC = (EE)
If PC = (C7) and PS = 0, then PC = (A8)
If PC = (C2) and PS = 0, then PC = (A2)
If PC = (C3) and PS = 0, then PC = (A5)
If PC = (E9) and PS = 0, then PC = (F5)
If PC = (EA) and PS = 0, then PC = (F6)
If PC = (EA) and PS = 1, then PC = (FD)
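By way of illustration only, the preceding character equations may be sketched as two look-up tables selected by CS. Several target codes in the group for CS = 0, and the target (F6), are difficult to read in the source; the values used here are a best reading and should be checked against Figure 2 before being relied upon. What happens to the current character once PC = (E4) has been replaced is not stated in the equations and is not modelled. All names are hypothetical.

```python
from typing import Dict, Tuple

# CS = 0: PC = (E4) is replaced according to the following CC and PS.
E4_PAIR: Dict[Tuple[int, int], int] = {
    (0xC7, 0): 0x9E, (0xC2, 0): 0xFA, (0xC3, 0): 0x9A,
    (0xC7, 1): 0x9D, (0xC2, 1): 0xF9, (0xC3, 1): 0x99,
}

# CS = 1: replacement of PC that does not depend on PS.
TERMINAL: Dict[int, int] = {
    0xC8: 0xA9, 0xCA: 0xAA, 0xCB: 0xAB, 0xCC: 0xAD, 0xCD: 0xAE,
    0xCE: 0xAF, 0xD3: 0xBC, 0xD4: 0xBD, 0xD5: 0xBE, 0xD6: 0xEB,
    0xE1: 0xBA, 0xE2: 0xF8, 0xE3: 0xFC, 0xE4: 0xFB, 0xE5: 0xEF,
    0xE6: 0xF2, 0xF4: 0xF3, 0xE7: 0xF3, 0xEC: 0xC5, 0xD9: 0xDF,
    0xF7: 0xED, 0xDA: 0xEE,
}

# CS = 1: replacement of PC that also depends on PS.
TERMINAL_BY_STATE: Dict[Tuple[int, int], int] = {
    (0xC7, 0): 0xA8, (0xC2, 0): 0xA2, (0xC3, 0): 0xA5,
    (0xE9, 0): 0xF5, (0xEA, 0): 0xF6, (0xEA, 1): 0xFD,
}


def update_preceding(pc: int, cc: int, cs: int, ps: int) -> int:
    """Return the code PC should hold after the preceding character rules."""
    if cs == 0:
        if pc == 0xE4:
            return E4_PAIR.get((cc, ps), pc)
        return pc
    if (pc, ps) in TERMINAL_BY_STATE:
        return TERMINAL_BY_STATE[(pc, ps)]
    return TERMINAL.get(pc, pc)
```

Codes that appear in none of the tables, such as numerals and Latin letters, are returned unchanged.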
OPERATION
Any character that is not one of the basic
thirty-six characters shown in Figure 1 is automatically
assigned a state of 1 (i.e. that it does not connect) and
passes through the buffer 10 without alteration. Each of
the remaining (Arabic) characters, as it is fully entered
in the CC position in the buffer 10, is assigned either a
state of 0 or 1, depending on whether the character is
capable of connection to the character succeeding it, i.e.
the character to the left of it (remember that Arabic is
written from right to left). These assignments of a state
may be accomplished by means of a look-up table stored in
a ROM, or by a logic circuit implementing the state
determination equations above-mentioned.
A connectable character that has rippled through
into the PC position in the buffer 10 is altered into its
terminal shape if followed in the CC position by any
non-connecting character, which, of course, includes word
delimiters. For example, the character (CE) if followed
by a numeral will be clocked out of the buffer 10 after
having been updated via bus 18 into the character (AF).
The device is initialized by clearing the buffer
10 and assigning 1 states. As the first CC is clocked in,
its state is determined. As CC becomes PC its state moves
into second position in the state register 16. If CC has
been assigned a state of 1, it passes unaltered into the
PC position. If, however, CC has been assigned a state of
0 and PS (the state of PC) is 0, then CC will be
updated while still in the CC position, as is determined
by the current character equations.
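By way of illustration only, the handling of the two states may be sketched as a two-position register. The sketch assumes that the state change equations take effect at the moment a character moves from the CC position to the PC position, which is one possible reading of the text. The class and method names are hypothetical.

```python
from typing import Optional

# Codes whose state is forced to 1 once they occupy the PC position,
# per the state change equations.
FORCED_NON_CONNECTING = frozenset({0xE9, 0xC7, 0xC2, 0xC3})


class StateRegister:
    """Two-position register holding CS and PS."""

    def __init__(self) -> None:
        # Initialization: buffer cleared, both states set to 1.
        self.cs = 1
        self.ps = 1
        self._cc: Optional[int] = None

    def clock_in(self, code: int, state: int) -> None:
        """Enter a new current character together with its state."""
        # The outgoing CC becomes PC; its state moves to the second
        # position unless the state change equations force it to 1.
        self.ps = 1 if self._cc in FORCED_NON_CONNECTING else self.cs
        self.cs = state
        self._cc = code
```

For instance, after (C8) and then (C7) have been clocked in with state 0, both positions hold 0; when the next character arrives, PS becomes 1 because (C7) is one of the four forced codes.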
The logic/arithmetic equations given above are
most efficiently implemented by means of a
microprocessor. But it is equally possible to implement
the equations by means of look-up tables stored in
read-only memories.
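By way of illustration only, a look-up table of the kind that could be held in a read-only memory may be modelled as a 256-entry byte table indexed by the character code. The sketch covers only replacements that do not depend on the states; a table for the state-dependent rules would also need the state bits as part of its address. The two sample entries are taken from the preceding character equations, and the function name is hypothetical.

```python
from typing import Dict


def build_rom(mapping: Dict[int, int]) -> bytes:
    """Build a 256-entry table; unlisted codes map to themselves."""
    table = bytearray(range(256))
    for source, target in mapping.items():
        table[source] = target
    return bytes(table)


# Two sample entries only; a full table would list every rule.
SAMPLE_ROM = build_rom({0xCE: 0xAF, 0xC8: 0xA9})

# One table access per character, here applied to a whole buffer.
shaped = bytes([0xCE, 0x31]).translate(SAMPLE_ROM)
```

Because every unlisted code maps to itself, numerals and Latin letters pass through such a table unaltered.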
As shown in the preferred embodiment, an input
character maps into exactly one output character. It is
sometimes desirable to have better script resolution by
having some script characters occupy two character slots;
for example, when mapping the input (D~) into the output
(B_) plus its "tail" (9F). In such a case, it would be
necessary to have two character registers for each of CC
and PC, that is to double the size of the ripple-through
buffer 10. However, this would necessitate the speeding
up of the bit rate of the output data stream.
The reverse mapping operation is also possible
and sometimes necessary, wherein script characters are
mapped back into basic (keyboard) characters. As will be
appreciated, such reverse operation is much simpler to
implement and may be carried out with the same or simpler
apparatus with simple mapping equations.
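By way of illustration only, the reverse operation may be sketched as a context-free table that sends each script code back to its basic code. The table below is a partial example derived by reading a few of the forward equations backwards; it is not a complete reverse mapping, and the names are hypothetical.

```python
from typing import Dict

# Script code -> basic code, for a few letters only.
SCRIPT_TO_BASIC: Dict[int, int] = {
    0xF4: 0xE7, 0xF3: 0xE7,              # shapes derived from (E7)
    0xEC: 0xD9, 0xC5: 0xD9, 0xDF: 0xD9,  # shapes derived from (D9)
    0xF7: 0xDA, 0xED: 0xDA, 0xEE: 0xDA,  # shapes derived from (DA)
    0xA9: 0xC8, 0xAF: 0xCE,
}


def to_basic(data: bytes) -> bytes:
    """Map script codes back to basic codes, one character at a time."""
    return bytes(SCRIPT_TO_BASIC.get(code, code) for code in data)
```

No state and no neighbouring character is consulted, which is consistent with the remark that the reverse operation is the simpler of the two.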
CA9-86-001