Patent 2427468 Summary

(12) Patent Application:	(11) CA 2427468
(54) English Title:	METHOD OF PRODUCING A DATABASE HAVING INPUT FROM A SCANNED DOCUMENT
(54) French Title:	METHODE DE PRODUCTION D'UNE BASE DE DONNEES A PARTIR D'UN DOCUMENT BALAYE
Status:	Dead

Bibliographic Data

(51) International Patent Classification (IPC):	G06F 16/93 (2019.01) G06F 17/20 (2006.01) G06K 9/00 (2006.01)
(72) Inventors :	JANSONS, GIRTS (Canada) TIGWELL, ROB (Canada)
(73) Owners :	JANSONS, GIRTS (Canada) TIGWELL, ROB (Canada)
(71) Applicants :	JANSONS, GIRTS (Canada) TIGWELL, ROB (Canada)
(74) Agent:
(74) Associate agent:
(45) Issued:
(22) Filed Date:	2003-05-02
(41) Open to Public Inspection:	2004-11-02
Availability of licence:	N/A
(25) Language of filing:	English

Patent Cooperation Treaty (PCT):	No

(30) Application Priority Data:	None

Abstracts

English Abstract

A method of producing a database having input from a
scanned document comprises the steps of performing a preliminary
scan of the document to thereby produce a digital image stored in
computer memory; displaying the scanned image on the computer
screen; presenting on the computer screen a plurality of database
field types associated with the database, for subsequent selection
of one of the database field types by a user; retrieving the
properties of the selected database field type; optimizing optical
character recognition software according to the properties of the
selected database field; accepting a user-defined area of the
displayed image on the screen; performing optical character
recognition on the defined area so as to convert images within the
defined area into resultant text; displaying the resultant text;
and storing the resultant text in the database. Steps (a) through
(i) are performed as necessary, to form the database.

Claims

Note: Claims are shown in the official language in which they were submitted.

I CLAIM:
1. A method of producing a database having input from a
scanned document, said method comprising the steps of:
(a) performing a preliminary scan of said document to
thereby produce a digital image stored in computer
memory;
(b) displaying the scanned image on said computer
screen;
(c) presenting on said computer screen a plurality of
database field types associated with said database,
for subsequent selection of one of said database
field types by a user;
(d) retrieving the properties of the selected database
field type;
(e) optimizing optical character recognition software
according to the properties of the selected
database field;
(f) accepting a user-defined area of the displayed
image on said screen;
(g) performing optical character recognition on the
defined area so as to convert images within the
defined area into resultant text;
(h) displaying the resultant text; and,
(i) storing the resultant text as records in said
database;
wherein steps (d) through (i) are performed on an
iterative basis to form said database.
2. The method of claim 1, further comprising the step of:
(e') optimizing said optical character recognition
software according to records stored in said
database;
wherein step (e') is performed on an iterative basis with
steps (d) through (i) to form said database.

3. The method of claim 2, wherein said records stored in
said database represent the record history of said database field.

4. The method of claim 1, further comprising the step of:
(j) creating an edit history of said database field.

5. The method of claim 4, wherein step (j) is performed on
an iterative basis with steps (d) through (i) to form said database

6. The method of claim 4, wherein said edit history
comprises a text file.

7. The method of claim 4, further comprising the step of:
(e'') optimizing said optical character recognition
software according to said edit history;

wherein step (e'') is performed on an iterative basis
with steps (d) through (i) to form said database.

8. The method of claim 1, wherein said type of database
field is a numeric database field.

9. The method of claim 1, wherein said type of database
field is a date database field.

10. The method of claim 1, wherein said type of database
field is a text database field.

11. The method of claim 1, wherein said resultant text is
presented on said screen for editing.

12. The method of claim 1, further comprising the step of:
(h') presenting on said computer screen a list of words
similar to a chosen word from said resultant text;

wherein step (h') is performed on an iterative basis with
steps (d) through (i) to form said database.

13. The method of claim 12, wherein said list of words is
related to the selected database field.

14. The method of claim 1, further comprising the step of:
(b') presenting a list of document numbers for selection
by a user.

15. The method of claim 1, wherein in step (h), the resultant
text is displayed on said computer screen in correlation with the
presentation of the selected database field type.

Description

Note: Descriptions are shown in the official language in which they were submitted.

CA 02427468 2003-05-02
FIELD OF THE INVENTION
[0001] The present invention relates to a method of producing a
database having input from a scanned document, such as a legal
document, wherein optical character recognition is used to populate
the database, and more particularly to such a method wherein the
optical character recognition software is optimized according to
the properties of selected fields in the database.
BACKGROUND OF THE INVENTION
[0002] Many businesses, organizations, and the like, have a need
to store large numbers of paper documents in an organized manner
such that the documents can be identified and retrieved, and
information on these documents can be readily found. For instance,
in the legal profession, hundreds or even thousands of court
documents and other related documents need to be stored for ready
use during work on a trial. It is necessary that information
within such documents be easily located, when needed. Without the
use of a computer, it would be necessary to physically read
through the documents one page at a time, in order to locate the
information. Such reading of these documents is usually completely
impractical as it is extremely time consuming and much of the
information needed could be easily overlooked.
[0003] Alternatively, in order to permit computers to be used
for searching for such information, documents can be scanned into
a computer database using optical character recognition software.
This method is known to be only reasonably accurate, at best, in
terms of actual character recognition, per se, and may need to be
supplemented by corrections typically made by using editing
software, word processing software, or the like, especially when
dealing with legal documents. However, a text document that is
created using optical character recognition software is essentially
only a collection of words and numbers presented in a visual format
of some type. There is no defined significance to the various
characters, words, and numbers. For instance, a name such as John
Doe may appear in the document several times, but that name might
have no significance in the overall context of the document.
Accordingly, searching for information in a complete document
created in its entirety by optical character recognition generally
produces results that have very little meaning. In contrast,
another name, such as Herman Schwartz, might be very significant,
since he might be the subject of the document, the author, and so
on; however, using this method, no significance is associated with
the name.
[0004] It has been found that it is useful to categorize and
store in a database various significant key words related to a
document, such as subject, author, date, location, document type,
and so on. As is well known in the prior art, significant
information from the document is manually coded into the database.
In order to identify particular documents that are being sought,
the database is subsequently viewed to find key words or is
searched by known key words. Due to the time it takes to manually
code such information into a database, this method is extremely
slow and undesirable.
[0005] There is also a consideration of processing time when
using optical character recognition software. A considerable
amount of time can be used when scanning a large number of
documents, many of which may have several dozen pages or more.
Using large amounts of time for creating information databases
relating to documents is highly undesirable, as it is ultimately
expensive.
[0006] One specific attempt to produce a quick and accurate
method of recognizing text from scanned documents is disclosed in
U.S. patent 6,400,845 issued June 4th, 2002, to Volino, and entitled
System and Method of Data Extraction from Digital Images. In this
system and method of extraction of textual data, the digital image
to be processed is first compared against master document images
contained in a database. Upon determining the proper master
document image, a template having predefined data zones is applied
to the image to create zone images. The zone images are optically
read and converted into a character file which is then parsed with
the pattern to locate the text to be extracted. Upon finding data
matching the pattern, that data is extracted and visible portions
are used to populate data fields in the database record associated
with the digital image. In other words, the optical character
recognition is dependent on the template having predefined data
zones. Such templates might include an invoice template, a
purchase order template, a memo template, a name and address
template, and so on. Configuring of an optical character
recognition engine according to a known or suspected template, is
of limited usefulness.
[0007] It is an object of the present invention to provide a
method of producing a database having input from a scanned
document.
[0008] It is another object of the present invention to provide
a method of producing a database having input from a scanned
document, wherein the database is populated by scanning of
documents.
[0009] It is another object of the present invention to provide
a method of producing a database having input from a scanned
document, wherein the database is populated by means of optical
character recognition of selected portions of documents.
[00010] It is another object of the present invention to provide
a method of producing a database having input from a scanned
document, wherein the optical character recognition engine is
optimized according to a database field type relating to a selected
portion of a document.
[00011] It is another object of the present invention to provide
a method of producing a database having input from a scanned
document, wherein the optical character recognition engine is
optimized according to the records of the database.
[00012] It is another object of the present invention to provide
a method of producing a database having input from a scanned
document, wherein the optical character recognition engine is
optimized according to the records of the database, which records
represent the record history of the database.
[00013] It is another object of the present invention to provide
a method of producing a database having input from a scanned
document, wherein the optical character recognition engine is
optimized according to the edit history of the database.
SUMMARY OF THE INVENTION
[00014] In accordance with one aspect of the present invention
there is disclosed a novel method of producing a database having
input from a scanned document. The method comprises the steps of:
(a) performing a preliminary scan of the document to thereby
produce a digital image stored in computer memory; (b) displaying
the scanned image on the computer screen; (c) presenting on the
computer screen a plurality of database field types associated with
the database, for subsequent selection of one of the database field
types by a user; (d) retrieving the properties of the selected
database field type; (e) optimizing optical character recognition
software according to the properties of the selected database
field; (f) accepting a user-defined area of the displayed image on
the screen; (g) performing optical character recognition on the
defined area so as to convert images within the defined area into
resultant text; (h) displaying the resultant text; and (i) storing
the resultant text in the database. Steps (a) through (i) are
performed at least one time each, as necessary, to form the
database.
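The repeated portion of the method, steps (d) through (i), can be pictured as a simple loop over the user's selections. The following is only a minimal sketch under stated assumptions: the names Database, populate, recognize and review are hypothetical, the field name stands in for the retrieval of field properties and the tuning of the recognition software, and the recogniser itself is passed in as a function because the source does not name any particular optical character recognition engine.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# (left, top, right, bottom) of the user-defined area, in image pixels.
Region = Tuple[int, int, int, int]


@dataclass
class Database:
    """Hypothetical in-memory stand-in for the database being formed."""
    # Maps a field name to the list of records stored for that field.
    records: Dict[str, List[str]] = field(default_factory=dict)

    def store(self, field_name: str, text: str) -> None:
        self.records.setdefault(field_name, []).append(text)


def populate(
    db: Database,
    image: object,
    selections: List[Tuple[str, Region]],
    recognize: Callable[[object, Region, str], str],
    review: Callable[[str, str], str] = lambda field_name, text: text,
) -> None:
    """Run steps (d) through (i) once for each (field, region) selection."""
    for field_name, region in selections:
        # (d), (e): the field name stands in for retrieving the field
        # properties and tuning the recogniser; (f): region is the
        # user-defined area of the displayed image.
        text = recognize(image, region, field_name)  # (g) recognition
        text = review(field_name, text)              # (h) display and edit
        db.store(field_name, text)                   # (i) store as a record


# Demonstration with a fake recogniser that returns canned text.
def fake_recognize(image: object, region: Region, field_name: str) -> str:
    return {"date": "2003-05-02", "title": "AFFIDAVIT OF DOCUMENTS"}[field_name]


db = Database()
populate(
    db,
    image=None,
    selections=[("date", (0, 0, 10, 10)), ("title", (0, 20, 100, 40))],
    recognize=fake_recognize,
)
```

Passing the recogniser and the review step in as functions keeps the loop independent of any particular engine or user interface, which the source leaves unspecified.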
[00015] Other advantages, features and characteristics of the
present invention, as well as methods of operation and functions of
the related elements of the structure, and the combination of parts
and economies of manufacture, will become more apparent upon
consideration of the following detailed description and the
appended claims with reference to the accompanying drawings, the
latter of which is briefly described herein below.
BRIEF DESCRIPTION OF THE DRAWINGS
[00016] The novel features which are believed to be
characteristic of the method of producing a database having input
from a scanned document according to the present invention, as to
its organization, use and method of operation, together with
further objectives and advantages thereof, will be better
understood from the following drawings in which a presently
preferred embodiment of the invention will now be illustrated by
way of example. It is expressly understood, however, that the
drawings are for the purpose of illustration and description only,
and are not intended as a definition of the limits of the
invention. In the accompanying drawings:
[00017] Figure 1 is a flow chart of the preferred embodiment of
the method of producing a database having input from a scanned
document according to the present invention;
[00018] Figure 2 is a representation of a computer screen showing
a scanned document;
[00019] Figure 3 is a representation of a computer screen showing
the software employing the method of the present invention, with
the type of database field about to be selected;
[00020] Figure 4 is a representation of a computer screen showing
the software employing the method of the present invention, with a
date type of database field having been selected;
[00021] Figure 5 is a representation of a computer screen as
shown in Figure 4, and additionally showing the user defining an
area of scanned characters for subsequent optical character
recognition;
[00022] Figure 6 is a representation of a computer screen as
shown in Figure 4, and additionally showing the user defined area
having scanned characters therein and the resultant text from the
optical character recognition execution;
[00023] Figure 7 is a representation of a computer screen showing
the software employing the method of the present invention, with a
text type of database field having been selected;
[00024] Figure 8 is a representation of a computer screen as
shown in Figure 7, and additionally showing the user defining an
area of scanned characters for subsequent optical character
recognition;
[00025] Figure 9 is a representation of a computer screen as
shown in Figure 7, and additionally showing the user defined area
having scanned characters therein and the resultant text from the
optical character recognition execution;
[00026] Figure 10 is a representation of a computer screen as
shown in Figure 9, with the resultant text from the optical
character recognition execution being edited;
[00027] Figure 11 is a representation of a computer screen
showing the software employing the method of the present invention,
with a numeric type of database field having been selected;
[00028] Figure 12 is a representation of a computer screen as
shown in Figure 11, and additionally showing the user defining an
area of scanned characters for subsequent optical character
recognition;
[00029] Figure 13 is a representation of a computer screen as
shown in Figure 11, and additionally showing the user defined area
having scanned characters therein and the resultant text from the
optical character recognition execution;
[00030] Figure 14 is a simplified diagrammatic representation of
an edit history of the database field; and,
[00031] Figure 15 is a representation of a computer screen
showing the software employing an alternative embodiment of the
method of the present invention, showing the user defined area
having scanned characters therein and the resultant text from the
optical character recognition execution.
DETAILED DESCRIPTION OF THE PREFERRED AND ALTERNATIVE EMBODIMENTS
[00032] Reference will now be made to Figures 1 through 14, which
show a preferred embodiment of the method of producing a database
22 having input from a scanned document 24, according to the
present invention, as indicated by general reference numeral 20.
The preferred embodiment method 20 of producing a database 22
having input from a scanned document 24 comprises the steps of
first performing a preliminary scan of the document to thereby
produce a digital image 30, as can be best seen in Figure 2. The
digital image 30 of the scanned document 24 is stored in computer
memory. The method 20 of producing a database 22 having input from
a scanned document 24 is embodied in the form of a software program
executed on an appropriate computer, typically a microcomputer.
[00033] The scanned image 32 of the document is then displayed on
the computer screen 34, as is best seen in Figure 3. The document
has a document number associated with it, as shown in box 36 on the
computer screen 34. The desired document number may be entered in
the box 36 entitled "Go to Document", or alternatively the arrow
buttons could be used to navigate through the document images.
Also, a list of the document numbers 37 is presented in a menu for
selection by a user.
[00034] As can be seen in Figures 4 through 13, a plurality of
database field types associated with the database 22 are presented
on the computer screen 34. These database field types include a
date database field 38, text database fields such as document type
41, title 42, summary 43, author 44, recipient 45, and location 46,
and numeric database fields, which might include a page number or
a monetary value 48.
[00035] The database field types are presented on the computer
screen 34 for subsequent selection of one of the database field
types by a user. Typically, a user would move a cursor to the
desired database field on the screen, and select it by clicking a
mouse button, or the like. The computer program then retrieves the
properties of the selected database field type. For instance, the
properties of a date database field type might include limiting
characters to 1 through 12, 1 through 31, and a two digit number
preceded by a "19" or a "20". The properties of a text database
field type might include a maximum length, a list of preferred
field content, and a history of past entries into that field. The
properties of a numeric database field type might include maximum
value, minimum value, and maximum length of content.
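The field properties listed above could be represented as small data structures, one per field type. This is a hedged sketch only: the class and attribute names are hypothetical, and each accepts method is an assumed convenience for checking a value against the properties rather than something described in the source.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DateFieldProperties:
    # Months 1 through 12, days 1 through 31, and a four-digit year
    # that begins with "19" or "20", as listed in the description.
    month_range: range = range(1, 13)
    day_range: range = range(1, 32)
    century_prefixes: Tuple[str, ...] = ("19", "20")

    def accepts(self, year: str, month: int, day: int) -> bool:
        return (
            len(year) == 4
            and year.isdigit()
            and year[:2] in self.century_prefixes
            and month in self.month_range
            and day in self.day_range
        )


@dataclass
class TextFieldProperties:
    max_length: int
    preferred_content: List[str] = field(default_factory=list)
    past_entries: List[str] = field(default_factory=list)

    def accepts(self, text: str) -> bool:
        return len(text) <= self.max_length


@dataclass
class NumericFieldProperties:
    min_value: float
    max_value: float
    max_length: int

    def accepts(self, text: str) -> bool:
        try:
            value = float(text)
        except ValueError:
            return False  # not a number at all, e.g. a letter O for a zero
        return (
            len(text) <= self.max_length
            and self.min_value <= value <= self.max_value
        )
```

For instance, NumericFieldProperties(0, 1000, 7).accepts("150.00") holds, while the same check on "15O.00" (with a letter O) fails because the text does not parse as a number.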
[00036] The optical character recognition software is then
optimized according to the properties of the selected database
field.
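The source does not say how this optimization is carried out. One plausible reading, offered purely as an illustration, is to restrict the recogniser's output to the character set that the selected field type permits. The sketch below applies that idea as a post-processing step; the look-alike table, the allowed character sets and the function name constrain are all assumptions.

```python
from typing import Dict, Optional, Set

# Letters commonly misread in place of digits. This table is an
# illustrative assumption, not taken from the source.
DIGIT_LOOKALIKES: Dict[str, str] = {
    "O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8",
}

# Assumed character sets per field type; text fields are unrestricted.
ALLOWED: Dict[str, Set[str]] = {
    "numeric": set("0123456789.,-$"),
    "date": set("0123456789/-"),
}


def constrain(raw: str, field_type: str) -> str:
    """Coerce raw recogniser output toward the field type's character set."""
    allowed: Optional[Set[str]] = ALLOWED.get(field_type)
    if allowed is None:
        return raw  # e.g. text fields: no character restriction applied
    out = []
    for ch in raw:
        if ch not in allowed:
            # Try to map a look-alike letter onto the digit it resembles.
            ch = DIGIT_LOOKALIKES.get(ch, ch)
        if ch in allowed:
            out.append(ch)  # anything still outside the set is dropped
    return "".join(out)


print(constrain("2OO3-O5-O2", "date"))   # letter O read in place of zero
print(constrain("$l5O.OO", "numeric"))   # lower-case L read in place of one
```

A real engine would more likely apply such a restriction during recognition rather than afterwards; the post-processing form is used here only because it is self-contained.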
[00037] Reference will now be made to Figures 4 through 6, which
show the selection of a date database field type. Figure 4 shows
the date database field 38 being selected. Figure 5 shows a user-
defined area 50 of the displayed image being created on the
computer screen 34. The program then accepts this user-defined
area 50 of the displayed image on the screen for use by the optical
character recognition software. The next step is to perform
optical character recognition on the user-defined area 50 so as to
convert images 52 within the defined area into resultant text 54.
As can be seen in Figure 6, an enlarged version 56 of the captured
image is displayed on the screen, for ready verification by a user
that it is the desired image. The resultant text 54 is then
displayed on the computer screen 34, in box 38. In this manner,
the resultant text is displayed on the computer screen 34 in
correlation with the presentation of the selected database field
type. Further, the resultant text 54 is presented on the computer
screen 34 for editing, if necessary, which will be discussed in
greater detail subsequently.
[00038] The steps discussed above, from retrieving the properties
of the selected database field type through storing the resultant
text as records 28 in the database 22, are performed on an
iterative basis to form the database 22.
[00039] The resultant text is then stored as records 28 in the
database 22, as can be best seen in Figure 1. The records 28
stored in the database 22 represent the record history of the
database field. In subsequent occurrences of the same database
field type, such as the date database field type shown
in Figures 4 through 6, the optical character recognition software
is optimized according to the appropriate records 28 stored in the
database 22. This type of optimization is also performed on an
iterative basis with the above steps to form the database 22.
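One way to picture optimization according to stored records, again only as an assumption since the source gives no mechanism, is to let the record history of a field break ties between alternative readings proposed by the recogniser. The function name pick_candidate and the idea of a candidate list are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def pick_candidate(candidates: List[str], history: Iterable[str]) -> str:
    """Return the candidate seen most often in the field's stored records.

    Ties, including the case where no candidate appears in the history,
    fall back to the recogniser's own ordering (earliest candidate wins).
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    seen = Counter(history)
    # max() returns the first maximal element, so list order breaks ties.
    return max(candidates, key=lambda c: seen[c])


author_history = ["Herman Schwartz", "John Doe", "Herman Schwartz"]
best = pick_candidate(["Hennan Schwartz", "Herman Schwartz"], author_history)
```

Here the second reading is preferred because it already occurs in the author field's records, even though the recogniser ranked it lower.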
[00040] Reference will now be made to Figures 7 through 10, which
show the selection of a text database field type. Figure 7 shows
the text database field 60 being selected, which is a title field
42. Figure 8 shows a user-defined area 61 of the displayed image
being created on the screen. The program then accepts this user-
defined area 61 of the displayed image on the computer screen 34
for use by the optical character recognition software. The next
step is to perform optical character recognition on the user-
defined area 61 so as to convert images 62 within the user-defined
area 61 into resultant text 64. As can be seen in Figure 9, an
enlarged version 66 of the captured image is displayed on the
computer screen 34, for ready verification by a user that it is the
desired image. The resultant text 64 is then displayed on the
computer screen 34, in box 60. In this manner, the resultant text
64 is displayed on the computer screen 34 in correlation with the
presentation of the selected database field type. As is best seen
in Figure 10, it is sometimes necessary to edit the resultant text
presented on the computer screen 34. The term "AFFIDAvrr OF
DOCUMENTS" in box 60, which was the original resultant text generated
by the optical character recognition software, has been amended to
read "AFFIDAVIT OF DOCUMENTS".
[00041] The resultant text is then stored as records 28 in the
database 22, as can be best seen in Figure 1. The records 28
stored in the database 22 represent the record history of the
database field. In subsequent occurrences of the same database
field type, such as the text database field type shown
in Figures 7 through 10, the optical character recognition software
is optimized according to the appropriate records 28 stored in the
database 22.
[00042] The steps discussed above with reference to Figures 7
through 10, are performed on an iterative basis to form the
database 22.
[00043] Reference will now be made to Figures 11 through 13,
which show the selection of a numeric database field type. Figure
11 shows the numeric database field being selected, which is the
monetary value 48. Figure 12 shows a user-defined area 71 of the
displayed image being created on the computer screen 34. The
program then accepts this user-defined area 71 of the displayed
image on the computer screen 34 for use by the optical character
recognition software. The next step is to perform optical
character recognition on the defined area so as to convert images
72 within the defined area 71 into resultant text 74. As can be
seen in Figure 13, an enlarged version 76 of the captured image is
displayed on the computer screen 34, for ready verification by a
user that it is the desired image. The resultant text is then
displayed on the computer screen 34, in box 48. In this manner,
the resultant text is displayed on the computer screen 34 in
correlation with the presentation of the selected database field
type.
[00044] The resultant text is then stored as records 28 in the
database 22, as can be best seen in Figure 1. The records 28
stored in the database 22 represent the record history of the
database field. In subsequent occurrences of the same database
field type, such as the numeric database field type shown
in Figures 11 through 13, the optical character recognition software
is optimized according to the appropriate records 28 stored in the
database 22.
[00045] The steps discussed above with reference to Figures 11
through 13, are performed on an iterative basis to form the
database 22.
[00046] As can be best seen in Figure 14, the present invention
also permits creation of an edit history 80 of the database field,
on an iterative basis with the other steps described above, as
editing of the resultant text is done. This edit history 80 is
preferably stored as a text file associated with the file and the
database 22. The optical character recognition software may also
be optimized according to the edit history.
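The source states only that the edit history is preferably a text file and that it may inform the recognition software. As a hedged illustration, the sketch below assumes a hypothetical file layout of one correction per line, with the original resultant text and the user's corrected text separated by a tab, and reuses a past correction whenever the same resultant text recurs.

```python
from typing import Dict, List


def parse_edit_history(lines: List[str]) -> Dict[str, str]:
    """Parse 'original<TAB>corrected' lines; later lines override earlier ones."""
    corrections: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        original, corrected = line.split("\t", 1)
        corrections[original] = corrected
    return corrections


def apply_edit_history(text: str, corrections: Dict[str, str]) -> str:
    """Replace resultant text that exactly matches a previously corrected value."""
    return corrections.get(text, text)


# The lines would normally come from the edit-history text file.
history_lines = ["AFFIDAvrr OF DOCUMENTS\tAFFIDAVIT OF DOCUMENTS\n", "\n"]
corrections = parse_edit_history(history_lines)
fixed = apply_edit_history("AFFIDAvrr OF DOCUMENTS", corrections)
```

Exact matching is the simplest possible use of the history; a fuller treatment might learn character-level substitutions from the same pairs.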
[00047] Reference will now be made to Figure 15, which shows an
alternative embodiment of the method of producing a database having
input from a scanned document according to the present invention,
as indicated by general reference numeral 120. In the alternative
embodiment method of producing a database having input from a
scanned document, as is seen in Figure 15, an optional list, as
indicated by general reference numeral 122, of words similar to a
chosen word from the resultant text can be presented on the
computer screen 34. These similar words are to assist in choosing
the proper word for the resultant text, and may also preclude a
user from having to type in corrections. This list of words is
related to the selected database field, so as to provide a high
degree of accuracy. These similar words have either been
previously entered into the appropriate database field or have been
entered in the appropriate database field by the software program
embodying the present invention. These words may also be stored in
a list associated with the appropriate database field, for the
purpose of ready review by a user. These similar words are
displayed on the computer screen for selection by a user, via a
drop-down list.
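A list of similar words of this kind can be approximated with ordinary fuzzy string matching over the entries previously stored for the selected field. The sketch below uses get_close_matches from Python's standard difflib module; the function name similar_words, the limit of five suggestions and the 0.6 similarity cutoff are assumptions, and the source does not say which similarity measure the software uses.

```python
import difflib
from typing import List


def similar_words(chosen: str, field_entries: List[str], limit: int = 5) -> List[str]:
    """Return up to `limit` previous entries for the field that resemble `chosen`."""
    return difflib.get_close_matches(chosen, field_entries, n=limit, cutoff=0.6)


# Entries previously stored in a hypothetical author field.
author_entries = ["Herman Schwartz", "John Doe"]
suggestions = similar_words("Hennan Schwartz", author_entries)
```

Restricting the search to entries from the selected field, rather than a general dictionary, mirrors the statement that the list is related to that field.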
[00048] As can be understood from the above description and from
the accompanying drawings, the present invention provides a method
of producing a database having input from a scanned document, a
method of producing a database having input from a scanned
document, wherein the database is populated by scanning of
documents, a method of producing a database having input from a
scanned document, wherein the database is populated by means of
optical character recognition of selected portions of documents, a
method of producing a database having input from a scanned
document, wherein the optical character recognition engine is
optimized according to a database field type relating to a selected
portion of a document, a method of producing a database having
input from a scanned document, wherein the optical character
recognition engine is optimized according to the records of the
database, a method of producing a database having input from a
scanned document, wherein the optical character recognition engine
is optimized according to the records of the database, which
records represent the record history of the database, and a method
of producing a database having input from a scanned document,
wherein the optical character recognition engine is optimized
according to the edit history of the database, all of which
features are unknown in the prior art.
[00049] Other variations of the above principles will be apparent
to those who are knowledgeable in the field of the invention, and
such variations are considered to be within the scope of the
present invention. Further, other modifications and alterations
may be used in the design and implementation of the method of
producing a database having input from a scanned document according
to the present invention without departing from the spirit and
scope of the accompanying claims.

Administrative Status

Title	Date
Forecasted Issue Date	Unavailable
(22) Filed	2003-05-02
(41) Open to Public Inspection	2004-11-02
Dead Application	2007-05-02

Abandonment History

Abandonment Date	Reason	Reinstatement Date
2006-05-02	FAILURE TO PAY APPLICATION MAINTENANCE FEE

Payment History

Fee Type	Anniversary Year	Due Date	Amount Paid	Paid Date
Application Fee			$150.00	2003-05-02
Maintenance Fee - Application - New Act	2	2005-05-02	$50.00	2005-04-27
Back Payment of Fees			$50.00	2006-05-26

Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
JANSONS, GIRTS
TIGWELL, ROB

Past Owners on Record
None

Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.

Documents

Document Description	Date (yyyy-mm-dd)	Number of pages	Size of Image (KB)
Abstract	2003-05-02	1	30
Description	2003-05-02	19	691
Claims	2003-05-02	4	94
Cover Page	2004-10-08	1	34
Correspondence	2003-06-04	1	10
Assignment	2003-05-02	2	81
Correspondence	2005-04-08	1	17
Fees	2005-03-15	4	205
Fees	2005-04-27	1	32
Fees	2005-03-15	1	63
Correspondence	2006-06-02	1	24
Fees	2006-05-26	3	119
Drawings	2003-05-02	15	849
