Patent 1258135 Summary

Third-party information liability

Some of the information on this Web page has been provided by external sources. The Government of Canada is not responsible for the accuracy, reliability or currency of the information supplied by external sources. Users wishing to rely upon this information should consult directly with the source of the information. Content provided by external sources is not subject to official languages, privacy and accessibility requirements.

Claims and Abstract availability

Any discrepancies in the text and image of the Claims and Abstract are due to differing posting times. Text of the Claims and Abstract are posted:

  • At the time the application is open to public inspection;
  • At the time of issue of the patent (grant).
(12) Patent: (11) CA 1258135
(21) Application Number: 1258135
(54) English Title: DATA STREAM SHAPING OF ARABIC CHARACTERS
(54) French Title: STRUCTURATION DE FLUX DE DONNEES POUR CARACTERES ARABES
Status: Term Expired - Post Grant
Bibliographic Data
(51) International Patent Classification (IPC):
(72) Inventors :
  • SMITH, DEREK K.W. (Canada)
(73) Owners :
  • IBM CANADA LIMITED-IBM CANADA LIMITEE
(71) Applicants :
  • IBM CANADA LIMITED-IBM CANADA LIMITEE (Canada)
(74) Agent: KERR, ALEXANDER
(74) Associate agent:
(45) Issued: 1989-08-01
(22) Filed Date: 1986-04-24
Availability of licence: N/A
Dedicated to the Public: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): No

(30) Application Priority Data: None

Abstracts

English Abstract


ABSTRACT
An apparatus for modifying encoded Arabic
characters transmitted as part of a data stream comprises
a ripple-through buffer, where the characters stored
temporarily therein are analysed to determine if they are
Arabic alphabet characters and are modified to convert
them to Arabic script characters by means of logic
processing, preferably by a microprocessor. The inverse,
simpler, operation is also possible, where script
characters are modified into basic shape characters.


Claims

Note: Claims are shown in the official language in which they were submitted.


The embodiments of the invention in which an
exclusive property or privilege is claimed are defined as
follows:
1. Apparatus for modifying an input data stream,
which includes data words encoding Arabic alphabet
characters, to generate an output stream wherein said
Arabic alphabet characters are modified in a predetermined
manner into differently shaped corresponding alphabet
characters, comprising: a ripple-through buffer for
storing two characters of said input data stream at a
time; logic processing means for identifying said two
characters; and data update means for modifying said two
characters in a predetermined manner.
2. Apparatus as defined in claim 1, wherein said
data update means modifies said two characters such that
basic Arabic alphabet characters are modified into Arabic
alphabet script characters according to Arabic rules of
script.
3. Apparatus as defined in claim 1, wherein said
data update means modifies said two characters such that
Arabic alphabet script characters are modified into basic
Arabic alphabet script characters in a predetermined
manner.
4. Apparatus for modifying a data stream, which
includes data words encoding basic Arabic characters, to
generate a delayed data stream wherein basic Arabic
characters are modified into Arabic script characters,
comprising: a data buffer having a serial input and a
serial output for receiving and outputting data,
respectively; logic analysis means for assigning in a
predetermined manner one of two logic states to one of two
consecutive characters stored temporarily in said data
buffer, and logic processing means responsive to said two
consecutive characters and to said one of two logic states
for modifying some characters while temporarily stored in
said data buffer in a predetermined manner whereby basic
Arabic characters are received and Arabic script
characters are output.
5. Apparatus as defined in claims 1, 2 or 4, said
logic processing means being a microprocessor.
6. Apparatus as defined in claim 4, said logic
analysis means being a look-up table stored in a memory,
and said logic processing means being a microprocessor.

Description

Note: Descriptions are shown in the official language in which they were submitted.


DATA STREAM SHAPING OF ARABIC CHARACTERS
FIELD OF THE INVENTION
The present invention relates to data-stream processing
of Arabic alphabet characters. It provides apparatus for
converting an input data stream containing basic,
unconcatenated, Arabic words into an output data stream
wherein the Arabic letter shapes, for proper concatenation in
words, have been substituted for the basic shapes originally
transmitted.
BACKGROUND AND PRIOR ART OF THE INVENTION
The problem of converting the basic shape of an Arabic
letter into its context sensitive proper shape within an
Arabic word is not trivial. A review of the background to
this problem is included in the commonly assigned Canadian
Patent No. 1,207,905, of F. Metwaly, issued July 15, 1986,
and entitled "Method and System for the Generation of Arabic
Script".
The prior art patents, mentioned in Canadian Patent
1,207,905, although not germane to the particular problem
addressed by the present invention, demonstrate the degree
of complexity that was necessary to produce Arabic script
from basic letter shapes. At the very least, as Canadian
Patent No. 1,044,806 issued December 19, 1978 to S.S. Hyder
demonstrates, it was necessary to examine an Arabic
character in the context of the character preceding it and
that succeeding it before deciding on its script or display
shape.
A disadvantage of the prior art solutions is that they
are too complex, at least for application within a
communication data stream.
Due to the fact that Arabic characters are entered into
a computer system, stored, manipulated and transmitted in
their basic shape format, it is necessary to process them
every time before display or printing to
produce readable Arabic script. A host computer may be
communicating with subordinate devices. It is desirable
to interpose a simple device to modify the data stream so
that no further processing is necessary before display or
printing at the subordinate device.
SUMMARY OF THE INVENTION
The present invention provides a simple apparatus
which interrupts a data stream, processes two sequential
characters at a time, and outputs a modified data stream
delayed by the duration of two characters. Neither the
transmitter nor the receiver of the data stream is
interfered with adversely.
The basic Arabic alphabet has twenty-eight
letters. For simple Arabic script suitable for business
and the like environments, seventeen letters may each
assume one of two shapes, two letters may each assume
one of three shapes, and two letters may each assume one
of four shapes. The remaining letters have one shape
only. An expanded keyboard alphabet may contain, as
distinct characters, some of the basic letters but
uniquely shaped.
The apparatus performs a mapping operation,
mapping the basic alphabet, or in practice the basic shape
code page, into the Arabic font page. Both the basic code
page and the font page, of course, contain numerals, in
Arabic and Latin scripts, and many other non-Arabic
characters such as the Latin alphabet, all of which do not
change shape and are treated as stand-alone characters.
In the preferred embodiment, the basic code and font pages
are matrices wherein each character is identified by a
unique ASCII code point at the intersection of a row and a
column in the FxF matrices (in hexadecimal notation).
The present invention provides apparatus for
modifying a data stream, which includes data words
encoding basic Arabic characters, to generate a delayed
data stream wherein basic Arabic characters are modified
into Arabic script characters, comprising: a data buffer
having a serial input and a serial output for receiving
and outputting data, respectively; means for assigning in
a predetermined manner one of two logic states to one of
two consecutive characters stored temporarily in said data
buffer; and means responsive to said two consecutive
characters and to said one of two logic states for
modifying some characters while temporarily stored in said
data buffer in a predetermined manner, whereby basic
Arabic characters are received and Arabic script
characters are output.
BRIEF DESCRIPTION OF THE DRAWINGS
The preferred embodiment of the invention will
now be described in conjunction with the annexed drawings
in which:
Figure 1 shows the basic Arabic ASCII code page
from which all numerals and other characters have been
omitted for clarity;
Figure 2 shows the Arabic ASCII font page from
which all numerals and other characters have been omitted
for clarity; and
Figure 3 shows a block diagram of an apparatus
according to the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
Referring now to Figure 1 of the drawings, the
basic Arabic alphabet is represented in the ASCII code
page (matrix) by thirty-six code points. Henceforth each
character will be referred to by its ASCII code in
hexadecimal notation; for example, the right most
character (called "shadda") is referred to as F1. Some of
these basic characters will change shape when incorporated
in a word, depending on where in the word they are
located. In Figure 2 of the drawings, the permissible
variations of that particular font are represented as code
points in the ASCII font page. The thirty-six basic
characters of Figure 1 retain their code positions in the
matrix of Figure 2. Both code pages are industry
standards, and contain numerals, Latin alphabet characters
and other characters, which are not of particular concern
to this invention. As will be seen later, all non-Arabic
characters, including numerals, are treated as stand-alone
characters and their codes remain unaltered.
The purpose of the apparatus, shown in Figure 3,
is to map input data-stream characters, representing the
basic Arabic characters of Figure 1, onto the font
characters of Figure 2, which then form the output data
stream.
The apparatus in Figure 3 comprises a
ripple-through buffer or register 10, which has a serial
input 11 and a serial output 12. The buffer 10 is capable
of holding two characters of eight bits each, eight bits
being the necessary number of bits to specify a code point
in the 16x16 matrices of Figures 1 and 2. The buffer 10
also has parallel inputs 13 and 13' and parallel outputs
14 and 14' giving parallel access to the bit positions of
a current character (CC), having just been fully entered
from the input 11, and giving parallel access to bit
positions of a preceding character (PC), having just been
fully transferred into the last eight-bit positions of the
buffer 10, respectively. The parallel output 14 is input
to a state analyser logic 15, which determines whether the
current character in the buffer 10 connects (concatenates)
or not. If a character connects, it is assigned a state
of logic 0; if it does not connect, it is assigned a
state of logic 1. The state of CC is entered into a state
register 16. A rule application logic 17 computes from
the character codes in the buffer 10 and the states in the
register 16 whether the characters in the buffer 10 should
be altered, and if so into what characters of the font
page one, the other, or both of CC and PC are converted.
This updating of the characters stored momentarily in the
buffer 10 is accomplished via data update bus 18 and the
parallel inputs 13, 13'.
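As a rough software illustration of the buffer just described, the following Python sketch models the two character positions (CC and PC) together with their state bits. The names RippleBuffer, shift_in and code_point_cell are hypothetical and do not come from the patent, which describes hardware. Which four bits select the row and which the column of the 16x16 matrix is not stated in the text, so the helper only returns the high and low halves of the code.

```python
from dataclasses import dataclass


@dataclass
class RippleBuffer:
    """Software model of a two-character buffer (hypothetical names)."""
    cc: int = 0x00  # current character: the one most recently clocked in
    pc: int = 0x00  # preceding character: the one about to be clocked out
    cs: int = 1     # state of CC (1 = does not connect, 0 = connects)
    ps: int = 1     # state of PC

    def shift_in(self, code, state):
        """Clock in a new 8-bit character; return the character pushed out."""
        out = self.pc
        self.pc, self.ps = self.cc, self.cs
        self.cc, self.cs = code & 0xFF, state
        return out


def code_point_cell(code):
    """Split an 8-bit code into its high and low 4-bit halves (16x16 matrix)."""
    return code >> 4, code & 0x0F
```

For instance, after clocking in (CE) and then a numeral, the model holds (CE) in the PC position and the numeral in the CC position, which is the arrangement the rule application logic inspects.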
The state analyser logic 15 and the rule
application logic 17 operate to implement the following
logic/arithmetic equations, which map the code page of
Figure 1 onto the font page of Figure 2 following the
concatenation rules of Arabic script. It should be
understood that these equations are specific to the
particular code pages or matrices as shown in Figures 1
and 2, and, of course, to the rules of script of Arabic.
DEFINITIONS
(Note: In the following logic/arithmetic
equations it is not necessary to distinguish between
character codes of Figures 1 and 2, because those in
Figure 1 occupy the same code points in Figure 2.)
CC means current character
PC means preceding character
CS means state of CC
PS means state of PC
State 0 means character connects.
State 1 means character does not connect.
All bracketed numbers denote hexadecimal ASCII codes.
STATE DETERMINATION EQUATIONS
CS = 0
If CC < (C2), then CS = 1
If (C~) > CC > (C3), then CS = 1
If (D3) > CC > (CE), then CS = 1
If (E1) > CC > (DA), then CS = 1
If CC = (E8), then CS = 1
If CC = (C9), then CS = 1
If CC = (E~), then CS = 1
STATE CHANGE EQUATIONS
If PC = (E9), then PS = 1
If PC = (C7), then PS = 1
If PC = (C2), then PS = 1
If PC = (C3), then PS = 1
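The state equations can be sketched in Python as follows. This is only an illustration under stated assumptions: the comparison signs in the scanned text are partly illegible and are read here as strict inequalities, and the two state determination rules whose code points cannot be read are left out rather than guessed. The function and constant names are hypothetical.

```python
# Codes whose state is forced to 1 once they reach the PC position
# (state change equations).
STATE_CHANGE_CODES = frozenset({0xE9, 0xC7, 0xC2, 0xC3})


def current_state(cc):
    """Return CS for a code: 0 = connects, 1 = does not connect.

    Only the legible rules are implemented; the comparison signs are
    assumed to be strict inequalities.
    """
    if cc < 0xC2:
        return 1
    if 0xCE < cc < 0xD3:
        return 1
    if 0xDA < cc < 0xE1:
        return 1
    if cc in (0xE8, 0xC9):
        return 1
    return 0  # default: CS = 0


def preceding_state(pc, cs_when_current):
    """Return PS for a character that has moved into the PC position."""
    return 1 if pc in STATE_CHANGE_CODES else cs_when_current
```

Under this reading a numeral such as 0x31 falls below (C2) and gets state 1, while (CE) gets state 0 and keeps it when it becomes the preceding character.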
CURRENT CHARACTER EQUATIONS
State of CS = 0
If CC = (E7) and PS = 0, then CC = (F4)
If CC = (D9) and PS = 0, then CC = (EC)
If CC = (DA) and PS = 0, then CC = (F7)
If CC = (C7) and PS = 0, then CC = (~3)
If CC = (C2) and PS = 0, then CC = (~2)
If CC = (C3) and PS = 0, then CC = (.~)
State of CS = 1
If CC = (C4), then CC = (C4)
If CC = (C6), then CC = (C6)
If CC = (C9), then CC = (C9)
If CC = (CF), then CC = (CF)
If CC = (D0), then CC = (D0)
If CC = (D1), then CC = (D1)
If CC = (D2), then CC = (D2)
If CC = (E8), then CC = (E8)
PRECEDING CHARACTERS EQUATION
State of CS = 0
If CC = (C7) and PC = (E4) and PS = 0, then
PC = (9E)
If CC = (C2) and PC = (E4) and PS = 0, then
PC = (F~)
If CC = (C3) and PC = (E4) and PS = 0, then
PC = (9-~)
If CC = (C7) and PC = (E4) and PS = 1, then
PC = ( 9rJ )
If CC = (C2) and PC = (E4) and PS = 1, then
PC = (F9)
If CC = (C3) and PC = (E4) and PS = 1, then
PC = (~9)

State of CS = 1
If PC = (C8), then PC = (A9)
If PC = (CA), then PC = (AA)
If PC = (CB), then PC = (AB)
If PC = (CC), then PC = (AD)
If PC = (CD), then PC = (AE)
If PC = (CE), then PC = (AF)
If PC = (D3), then PC = (BC)
If PC = (D4), then PC = (BD)
If PC = (D5), then PC = (BE)
If PC = (D6), then PC = (EB)
If PC = (E1), then PC = (BA)
If PC = (E2), then PC = (F8)
If PC = (E3), then PC = (FC)
If PC = (E4), then PC = (FB)
If PC = (E5), then PC = (EF)
If PC = (E6), then PC = (F2)
If PC = (F4), then PC = (F3)
If PC = (E7), then PC = (F3)
If PC = (EC), then PC = (C5)
If PC = (D9), then PC = (DF)
If PC = (F7), then PC = (ED)
If PC = (DA), then PC = (EE)
If PC = (C7) and PS = 0, then PC = (A8)
If PC = (C2) and PS = 0, then PC = (A2)
If PC = (C3) and PS = 0, then PC = (A5)
If PC = (E9) and PS = 0, then PC = (F5)
If PC = (EA) and PS = 0, then PC = (Fo)
If PC = (EA) and PS = 1, then PC = (FD)
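The unconditional preceding-character rules for CS = 1 amount to a fixed substitution table, which the following Python sketch writes out as a dictionary. The values are as read from the scanned text, so any entry that was damaged in scanning may be off. The rules that also depend on PS are left out because one of their outputs is illegible. The names PC_RULES_CS1 and update_preceding are hypothetical.

```python
# Unconditional PC substitutions that apply when the current character
# does not connect (CS = 1). Keys and values are hexadecimal code points.
PC_RULES_CS1 = {
    0xC8: 0xA9, 0xCA: 0xAA, 0xCB: 0xAB, 0xCC: 0xAD, 0xCD: 0xAE, 0xCE: 0xAF,
    0xD3: 0xBC, 0xD4: 0xBD, 0xD5: 0xBE, 0xD6: 0xEB,
    0xE1: 0xBA, 0xE2: 0xF8, 0xE3: 0xFC, 0xE4: 0xFB, 0xE5: 0xEF, 0xE6: 0xF2,
    0xF4: 0xF3, 0xE7: 0xF3, 0xEC: 0xC5, 0xD9: 0xDF, 0xF7: 0xED, 0xDA: 0xEE,
}


def update_preceding(pc, cs):
    """Return the code PC should hold, given the state CS of the character after it."""
    if cs != 1:
        return pc  # these rules only fire when CC does not connect
    return PC_RULES_CS1.get(pc, pc)  # codes without a rule pass through
```

A dictionary with a pass-through default mirrors the statement that characters outside the rules keep their codes unaltered.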
OPERATION
Any character that is not one of the basic
thirty-six characters shown in Figure 1 is automatically
assigned a state of 1 (i.e. that it does not connect) and
passes through the buffer 10 without alteration. Each of
the remaining (Arabic) characters, as it is fully entered
in the CC position in the buffer 10, is assigned either a
state of 0 or 1, depending on whether the character is
capable of connection to the character succeeding it, i.e.
the character to the left of it (remember that Arabic is
written from right to left). These assignments of a state
may be accomplished by means of a look-up table stored in
a ROM, or by a logic circuit implementing the state
determination equations above-mentioned.
A connectable character that has rippled through
into the PC position in the buffer 10 is altered into its
terminal shape if followed in the CC position by any
non-connecting character, which, of course, includes word
delimiters. For example, the character (CE), if followed
by a numeral, will be clocked out of the buffer 10 after
having been updated via bus 18 into the character (AF).
The device is initialized by clearing the buffer
10 and assigning 1 states. As the first CC is clocked in,
its state is determined. As CC becomes PC its state moves
into second position in the state register 16. If CC has
been assigned a state of 1, it passes unaltered into the
PC position. If, however, CC has been assigned a state of
0 and PS (the state of PC) is 0, then CC will be
updated while still in the CC position, as is determined
by the current character equations.
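The operating sequence can be illustrated with a small Python generator. It is a sketch under several assumptions, not the patented device: only a handful of legible rules are included, the garbled comparison signs of the state determination equations are read as strict inequalities, bit-level timing is not modelled, and the end of the input is treated as if a non-connecting character followed. All names are hypothetical.

```python
def current_state(cc):
    """CS for a code: 0 = connects, 1 = does not connect (legible rules only)."""
    if cc < 0xC2 or 0xCE < cc < 0xD3 or 0xDA < cc < 0xE1 or cc in (0xE8, 0xC9):
        return 1
    return 0


CC_RULES = {0xE7: 0xF4, 0xD9: 0xEC, 0xDA: 0xF7}  # used when CS = 0 and PS = 0
PC_RULES = {0xCE: 0xAF, 0xC8: 0xA9, 0xE4: 0xFB,  # used when CS = 1 (subset only)
            0xE7: 0xF3, 0xF4: 0xF3}
STATE_CHANGE = {0xE9, 0xC7, 0xC2, 0xC3}          # PS forced to 1 for these codes


def shape_stream(codes):
    """Yield shaped codes, always holding the latest character back by one step."""
    pc, ps = None, 1  # cleared buffer, states initialised to 1
    for code in codes:
        cs = current_state(code)
        # Current character equations: update CC while it is still in place.
        cc = CC_RULES.get(code, code) if (cs == 0 and ps == 0) else code
        if pc is not None:
            if cs == 1:
                pc = PC_RULES.get(pc, pc)  # preceding character equations
            yield pc
        pc = cc
        ps = 1 if code in STATE_CHANGE else cs
    if pc is not None:
        # Assumption: end of input behaves like a non-connecting character.
        yield PC_RULES.get(pc, pc)
```

For example, list(shape_stream([0xCE, 0x31])) gives [0xAF, 0x31], which corresponds to the case of a connectable character followed by a numeral.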
The logic/arithmetic equations given above are
most efficiently implemented by means of a
microprocessor. But it is equally possible to implement
the equations by means of look-up tables stored in
read-only memories.
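The look-up table alternative can be pictured in Python as a 256-entry byte table indexed by the character code, with every entry defaulting to the code itself. The sketch below fills in only a few of the CS = 1 preceding-character rules for illustration; the names build_rom, PC_ROM and lookup are hypothetical, and the table stands in for a ROM rather than describing any particular memory device.

```python
# A few of the CS = 1 preceding-character rules, used as sample contents.
PC_RULES = {0xC8: 0xA9, 0xCA: 0xAA, 0xCB: 0xAB, 0xCE: 0xAF, 0xE4: 0xFB}


def build_rom(rules):
    """Build a 256-byte table: identity everywhere except the listed rules."""
    table = bytearray(range(256))
    for src, dst in rules.items():
        table[src] = dst
    return bytes(table)


PC_ROM = build_rom(PC_RULES)


def lookup(code):
    """Read the substituted code for one 8-bit character from the table."""
    return PC_ROM[code]
```

Because the table is exactly 256 bytes long, it can also be passed to the standard bytes.translate method to substitute several codes at once, although that ignores the context checks that decide whether a rule applies.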
As shown in the preferred embodiment, an input
character maps into exactly one output character. It is
sometimes desirable to have better script resolution by
having some script characters occupy two character slots;
for example, when mapping the input (D~) into the output
(B_) plus its "tail" (9F). In such a case, it would be
necessary to have two character registers for each of CC
and PC, that is to double the size of the ripple-through
buffer 10. However, this would necessitate the speeding
up of the bit rate of the output data stream.
The reverse mapping operation is also possible
and sometimes necessary, wherein script characters are
mapped back into basic (keyboard) characters. As will be
appreciated, such reverse operation is much simpler to
implement and may be carried out with the same or simpler
apparatus with simple mapping equations.
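The text does not give the reverse mapping equations. One plausible reading, sketched below in Python, is that each script code maps back to a single basic code regardless of its neighbours, so the table can be obtained by inverting the forward rules. Only a few forward rules are included, and the names build_reverse, REVERSE and unshape are hypothetical.

```python
CC_RULES = {0xE7: 0xF4}                          # basic code -> shape used after a connecting character
PC_RULES = {0xCE: 0xAF, 0xC8: 0xA9, 0xE4: 0xFB,  # code in PC position -> terminal shape
            0xE7: 0xF3, 0xF4: 0xF3}


def build_reverse(cc_rules, pc_rules):
    """Invert the forward rules so every script code points at its basic code."""
    reverse = {shaped: basic for basic, shaped in cc_rules.items()}
    for src, shaped in pc_rules.items():
        # src may itself be a shaped code (e.g. F4), so resolve it first.
        reverse[shaped] = reverse.get(src, src)
    return reverse


REVERSE = build_reverse(CC_RULES, PC_RULES)


def unshape(codes):
    """Map script codes back to basic codes; other codes pass through."""
    return [REVERSE.get(c, c) for c in codes]
```

No buffer or state register is needed in this reading, since each code is replaced on its own, which is consistent with the remark that the reverse operation is simpler.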

Administrative Status

2024-08-01:As part of the Next Generation Patents (NGP) transition, the Canadian Patents Database (CPD) now contains a more detailed Event History, which replicates the Event Log of our new back-office solution.

Please note that "Inactive:" events refer to events no longer in use in our new back-office solution.

For a clearer understanding of the status of the application/patent presented on this page, the site Disclaimer, as well as the definitions for Patent, Event History, Maintenance Fee and Payment History should be consulted.

Event History

Description Date
Inactive: IPC expired 2020-01-01
Inactive: IPC deactivated 2011-07-26
Inactive: Expired (old Act Patent) latest possible expiry date 2006-08-01
Inactive: IPC from MCD 2006-03-11
Inactive: First IPC derived 2006-03-11
Grant by Issuance 1989-08-01

Abandonment History

There is no abandonment history.

Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
IBM CANADA LIMITED-IBM CANADA LIMITEE
Past Owners on Record
DEREK K.W. SMITH
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents


Document Description   Date (yyyy-mm-dd)   Number of pages   Size of Image (KB)
Abstract               1993-09-07          1                 13
Claims                 1993-09-07          2                 48
Drawings               1993-09-07          2                 33
Descriptions           1993-09-07          9                 279