CA 02805150 2013-02-06
AUTOMATED GENERATION OF STRUCTURED ELECTRONIC
REPRESENTATIONS OF USER-FILLABLE FORMS
TECHNICAL FIELD
[0001] This relates to electronic document processing, and more
particularly, to
methods, software, and devices for generating structured electronic
representations
of user-fillable forms.
BACKGROUND
[0002] Most documents are structured to organize their contents. For
example, a
document may be structured to organize its contents into various regions, such
as a
table of contents, a body, an index, etc. The structure of some documents may
be
hierarchical, such that document regions are further divisible into sub-
regions. For
example, a document body may be divided into chapters, paragraphs, etc.
Document structure may depend on document type and/or document contents.
[0003] With the proliferation of computers and electronic communication,
documents are now commonly represented electronically. A typical electronic
representation of a document includes data reflective of the document's
contents.
In some cases, an electronic representation of a document may include data
reflective of the document's structure. Inclusion of data reflective of the
document's
structure facilitates automatic processing of that document. For example, a
document's title may be automatically modified using structural data that
identifies
the title amongst the document's contents. Similarly, titles of two different
documents may be automatically compared using structural data for those
documents.
[0004] In recent years, one type of document that has become more commonly
represented electronically is the patient order set. A patient order set is a
form
fillable by a doctor to prescribe a course of treatment for a hospital
patient. A typical
patient order form includes a multitude of treatment options, from which a
doctor
may select. Patient order sets are often structured to organize treatment
options by
treatment type, drug type, symptom type, patient type, etc. However,
electronic
representations of patient order sets typically do not include data reflective
of such
structure. Moreover, many organizations create their patient order sets using
off-the-shelf word processing software, which does not have the capability to
process data
reflective of such structure. As such, automatic processing of patient order
sets,
e.g., to modify or compare their contents, has been difficult.
SUMMARY
[0005] According to an embodiment, there is provided a computer-implemented
method of constructing a structured electronic representation of a given user-
fillable
form from a parsable version of said given user-fillable form, said method
comprising: storing a plurality of data structures each representative of a
region of a
pre-defined type of a user-fillable form in structured electronic
representations of
user-fillable forms; receiving a parsable version of said given user-fillable
form;
parsing said parsable version to identify fields in said parsable version of
said given
user-fillable form, said fields including at least one text field and at least
one input
field; grouping said fields to form a plurality of grouping results, each of
said
grouping results corresponding to a region of said given user-fillable form,
and
comprising one or more fields and data parsed from said parsable version; for
each
one of said plurality of grouping results: determining a quality of match
between
said one of said plurality of grouping results and each of said data
structures;
identifying one of said plurality of data structures having a highest quality
of match
between it and said one of said plurality of grouping results; and storing an
indicator
of said identified data structure having the highest quality of match between
it and
said one of said plurality of grouping results.
[0006] According to another embodiment, there is provided a non-transitory
computer-readable storage medium storing instructions which when executed
adapt a computing device to: store a plurality of data structures each
representative
of a region of a pre-defined type of a user-fillable form in structured
electronic
representations of user-fillable forms; receive a parsable version of a given
user-
fillable form; parse said parsable version to identify fields in said parsable
version of
CA 2805150 2019-01-14
said given user-fillable form, said fields including at least one text field
and at least
one input field; group said fields to form a plurality of grouping results,
each
grouping result corresponding to a region of said given user-fillable form,
and
comprising one or more fields and data parsed from said parsable version for
said
one or more fields; for each one of said plurality of grouping results:
determine a
quality of match between said one of said plurality of grouping results and
each of
said data structures; identify one of said plurality of data structures having
a highest
quality of match between it and said one of said plurality of grouping
results; and
store an indicator of said identified data structure having the highest
quality of
match between it and said one of said plurality of grouping results.
[0007] According to yet another embodiment, there is provided a computing
device for constructing a structured electronic representation of a user-
fillable form
from a parsable version of said user-fillable form, said computing device
comprising: at least one processor; memory in communication with said at least
one processor, and software code stored in said memory, which when executed by
said at least one processor causes said computing device to: store a plurality
of
data structures each representative of a region of a pre-defined type of a
user-
fillable form in structured electronic representations of user-fillable forms;
receive a
parsable version of said given user-fillable form; parse said parsable version
to
identify fields in said parsable version of said given user-fillable form, said
fields
including at least one text field and at least one input field; group said
fields to form
a plurality of grouping results, each grouping result corresponding to a
region of
said given user-fillable form, each said grouping result comprising one or
more
fields and data parsed from said parsable version for said one or more fields;
for
each one of said plurality of grouping results: determine a quality of match
between
said one of said plurality of grouping results and each of said data
structures;
identify one of said plurality of data structures having a highest quality of
match
between it and said one of said plurality of grouping results; and store an
indicator
of said identified data structure having the highest quality of match between
it and
said one of said plurality of grouping results.
[0008] Other features will become apparent from the drawings in conjunction
with the following description.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] In the figures, which illustrate example embodiments,
[0010] FIG. 1 is a high level block diagram of a computing device for
generating
structured electronic representations of user-fillable forms, exemplary of
embodiments;
[0011] FIG. 2 illustrates example software organization of the computing
device
of FIG. 1;
[0012] FIG. 3 is a high-level block diagram of the software modules of the
form
processing software of FIG. 2, executed at the computing device of FIG. 1;
[0013] FIG. 4 illustrates an example form page;
[0014] FIG. 5 illustrates a region of the example form page of FIG. 4;
[0015] FIG. 6 illustrates a portion of an example XML document,
representative
of the form region of FIG. 5;
[0016] FIG. 7 schematically illustrates matching of a form region to stored
data
structures, performed by the matching module of FIG. 3;
[0017] FIG. 8 illustrates an example user interface for modifying
boundaries of
form sections, presented by the adjusting module of FIG. 3;
[0018] FIG. 9 illustrates an example user interface containing a tree
diagram
representing form regions in the example form page of FIG. 4, presented by the
adjusting module of FIG. 3;
[0019] FIG. 10 illustrates an example user interface for editing form
regions,
presented by the adjusting module of FIG. 3;
[0020] FIG. 11 illustrates a portion of an example XML document, generated
by
the generating module of FIG. 3, providing a structured representation of the
example form page of FIG. 4; and
[0021] FIG. 12 is a flowchart depicting exemplary blocks performed by the
form
processing software of FIG. 2.
DETAILED DESCRIPTION
[0022] FIG. 1 is a high-level block diagram of computing device 10
operating
as a device for generating structured electronic representations of user-
fillable
forms, exemplary of embodiments. As will become apparent, computing device 10
stores and executes software to adapt it to function in manners exemplary of
embodiments.
[0023] As illustrated, computing device 10 includes processor 12, network
interface 14, a suitable combination of persistent storage memory 16, random
access memory and read only memory, and one or more I/O interfaces 18.
Processor 12 may be an Intel x86, PowerPC, ARM processor or the like. Network
interface 14 interconnects computing device 10 to a data network such as a
private
local area network or the public Internet. Additional input/output peripherals
such as
a keyboard, monitor, mouse, scanner, printer and the like of computing device
10
are not specifically detailed herein. Computing device 10 may also include
peripheral devices operable to load software into memory 16 from a computer-
readable medium, for executing at computing device 10. Peripheral devices may
be
interconnected to computing device 10 by one or more I/O interfaces 18.
[0024] FIG. 2 depicts a simplified organization of example software
components stored within memory 16 of computing device 10. As illustrated,
these
software components include operating system (OS) software 20, database engine
22, database 30, hypertext transfer protocol (HTTP) server software 24, and
form
processing software 26. These software components, when executed, adapt
computing device 10 to operate as a device for generating structured
electronic
representations 34 of user-fillable forms, exemplary of embodiments.
[0025] As depicted in FIG. 2 and further detailed below, form processing
software 26 receives electronic representations 32 of user-fillable forms.
Each
electronic representation 32 stores data reflective of the contents of the
represented user-fillable form, but stores insufficient data reflective of the
structure
of that form. Form processing software 26 processes each electronic
representation
32 of a user-fillable form to generate a corresponding structured electronic
representation 34 of that form.
[0026] OS software 20 may, for example, be a Unix-based operating system
(e.g., Linux, FreeBSD, Solaris, OSX, etc.), a Microsoft Windows operating
system
or the like. OS software 20 allows form processing software 26 to access
processor
12, network interface 14, memory 16 and one or more I/O interfaces 18 of
computing device 10. OS software 20 may include a TCP/IP stack allowing
computing device 10 to communicate with interconnected computing devices
through network interface 14 using the TCP/IP protocol.
[0027] Database engine 22 may be a conventional relational or object-
oriented
database engine, such as Microsoft SQL Server, Oracle, DB2, Sybase, Pervasive,
MongoDB, NoSQL, Hadoop or any other database engine known to those of
ordinary skill in the art. Database engine 22 provides access to one or more
databases 30, and thus typically includes an interface for interaction with OS
software 20, and other software, such as form processing software 26. Database
30 may be a relational, object-oriented or document-oriented database. As will
become apparent, database 30 stores data structures representative of
different
types of structural regions found in user-fillable forms, and also stores data
reflective of past instances in which particular ones of those data structures
have
been used to represent particular structural regions.
[0028] HTTP server software 24 is a conventional HTTP web server
application such as the Apache HTTP Server, nginx, Microsoft IIS or similar
server
application. HTTP server software 24 allows computing device 10 to act as a
conventional HTTP server and provides a plurality of web pages for access by
way
of network-interconnected computing devices (not shown). Web pages may be
implemented using traditional web languages such as HTML, XHTML, Java,
Javascript, Ruby, Python, Perl, PHP, Flash or the like, and stored in memory
16 of
computing device 10. HTTP server software 24 may also receive electronic
representations 32 of user-fillable forms for processing by computing device
10,
and may host structured electronic representations 34 of those user-fillable
forms,
as generated by form processing software 26.
[0029] In the embodiment shown in FIG. 3, form processing software 26
includes the following software modules: receiving module 36, parsing module
38,
grouping module 40, matching module 42, adjusting module 44, and generating
module 46. The functions of each of these software modules are detailed below.
[0030] These software modules may be written, for example, using
conventional programming languages such as Java, J#, C, C++, C#, Perl, Visual
Basic, Ruby, Scala, etc. These modules may additionally be written to use
conventional web application frameworks such as the Java Servlet framework or
the .NET Framework. Thus, the modules of form processing software 26 may be in
the form of one or more executable programs, scripts, routines,
statically/dynamically linkable libraries, or servlets.
[0031] In embodiments in which one or more software modules of form
processing software 26 are in the form of servlets, HTTP server software 24
may
communicate with an interconnected servlet server application (not shown) such as
Apache Tomcat, IBM WebSphere, Red Hat JBoss, or the like. The servlet server
application executes servlets on behalf of HTTP server software 24 to extend
the
ability of HTTP server software 24 to respond to HTTP requests, such as a
request
to process an electronic representation 32 of a user-fillable form.
[0032] The software modules of form processing software 26 may include
various user interfaces, as detailed below. These user interfaces may be
written in
a language allowing their presentation on a web browser, or code that will
dynamically generate such user interfaces. As will be apparent, users of
network-
interconnected computing devices may interact with form processing software 26
by way of these user interfaces. These user interfaces may be provided in the
form
of web pages by way of HTTP server software 24 to network-interconnected
computing devices.
[0033] FIG. 4 depicts one page of an exemplary user-fillable form. As
illustrated, this user-fillable form is a patient order set usable by doctors
to prescribe
treatment for community-acquired pneumonia patients. The depicted page
contains
text detailing various treatment options, as well as input fields (checkboxes
and text
entry fields) to be filled by doctors when filling the form.
[0034] As depicted, the form page is structured into regions to organize
its
contents. For example, the page includes a region corresponding to a Title,
namely,
"Community Acquired Pneumonia Admission Order Set." The page also includes a
region entitled "Antibiotic Therapy", which may be referred to as a Module.
This
Module includes a number of sub-regions each consisting of a checkbox
accompanied by text, which may each be referred to as an Order. This module
also
includes two further sub-regions, respectively entitled "For severely ill
patient" and
"For patient with suspected gross aspiration". These sub-regions may each be
referred to as a Sub-module. Like Modules, Sub-modules may also include
Orders.
Further, each Order may include one or more Sub-orders. For example, the Order
having accompanying text "metroNIDAZOLE 500 mg..." includes two Sub-orders,
respectively accompanied by text "Maintain NPO" and "SLP assessment in a.m. to
assess swallowing."
[0035] The patient order set depicted in FIG. 4 may be represented by an
electronic representation 32. However, while the electronic representation 32
may
store data reflective of the form's constituent text (also referred to as text
fields) and
input fields, the electronic representation 32 may store no data or
insufficient data
reflective of the form's structural regions, such as its Title, Modules, Sub-
modules,
Orders or Sub-orders.
[0036] Receiving module 36 receives electronic representations 32 of user-
fillable forms for processing by form processing software 26. Receiving module
36
may receive each electronic representation 32 from memory 16, or from a
computer-readable medium connected to an I/O interface 18, or through network
interface 14, e.g., by way of HTTP server software 24.
[0037] Receiving module 36 may include user interfaces configured to allow
users to request processing of particular electronic representations 32. These
user
interfaces may be configured to allow users to provide receiving module 36
with
indicators of particular electronic representations 32 to be processed (e.g., a
URL),
and/or copies of such electronic representations 32. In some embodiments,
receiving module 36 may present these user interfaces to users directly
operating
computing device 10. In alternate embodiments, receiving module 36 may present
these user interfaces to users operating network-connected computing devices,
e.g., by way of HTTP server software 24.
[0038] Receiving module 36 receives each electronic representation 32 of a
user-fillable form in a parsable format. This parsable format may include one
or
more electronic documents in Portable Document Format (PDF), Rich Text Format
(RTF), Extensible Markup Language (XML) format, HyperText Markup Language
(HTML) format, Microsoft Word (DOC/DOCX) format or the like. This parsable
format may also include one or more electronic messages, such as HTTP
messages or the like.
[0039] Parsing module 38 parses each electronic representation 32 of a user-
fillable form to determine the form's fields, including text fields and input
fields. To
this end, parsing module 38 may include one or more parsers such as a text
field
parser and an input field parser.
[0040] A text field parser identifies text fields in an electronic
representation 32
of a user-fillable form. Each text field corresponds to a piece of text such
as a
phrase, a word, or a character. Text fields may include numbers, symbols, or
any
Unicode character. A text field parser may also identify properties of text
fields, as
detailed below.
[0041] An input field parser identifies input fields in an electronic
representation 32 of a user-fillable form, such as text input fields for
receiving
textual data from users filling the form or checkbox fields which can be
selected
(checked) by such users. Other types of input fields known to those of
ordinary skill
in the art, such as date input fields, numerical input fields, radio selection
boxes,
drop-down selection boxes, multiple-choice boxes, etc., may also be
identified. An
input field parser may also identify the properties of input fields, as
detailed below.
[0042] In some embodiments, a single parser may perform the functions of
both a text field parser and an input field parser. For example, when
electronic
representation 32 is in the form of one or more PDF documents, PDFBox,
distributed by the Apache Software Foundation, may be used as the text field
parser and/or the input field parser.
[0043] In an embodiment, parsing module 38 creates a new PageElement
object to correspond to each form field identified by parsing electronic
representation 32. The PageElement class is a parent class to a number of
subclasses including TextInputField, CheckboxField, and TextField. The
particular
subclass of each PageElement object created depends on the type of field
identified. For example, each TextInputField object or CheckboxField object is
created to correspond to an identified text input field or a checkbox field,
respectively. Similarly, a TextField object is created to correspond to an
identified
text field. PageElement objects of other subclasses may be created to
correspond
to other types of fields identified by parsing electronic representation 32,
such as
radio buttons, images, hyperlinks, etc.
[0044] Each PageElement object contains the parsed properties of the
corresponding form field, such as x and y coordinates defining the location of
the
field on a form page. Other parsed properties may include font properties such
as
typeface, size or weight. Yet other parsed properties may include layout
properties
such as indentation, justification, line spacing, kerning, etc.
[0045] Parsing module 38 creates an UnstructuredDocument object containing
one or more Page objects, wherein each Page object represents one page of the
user-fillable form represented by electronic representation 32, e.g., as
depicted in
FIG. 4. Each Page object is populated with the PageElement objects
corresponding
to those form fields in the form page represented by that Page object.
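The object model described in paragraphs [0043] to [0045] might be sketched roughly as follows. The class names PageElement, TextField, TextInputField, CheckboxField, Page and UnstructuredDocument come from the description; the attribute names (x, y, text, font_size, bold, elements, pages), the coordinate values, and the choice of Python are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PageElement:
    """Base class for a parsed form field; x and y locate it on the page."""
    x: float
    y: float


@dataclass
class TextField(PageElement):
    """A piece of text with a few parsed font properties (names assumed)."""
    text: str = ""
    font_size: float = 10.0
    bold: bool = False


@dataclass
class TextInputField(PageElement):
    """An input field that receives text from the person filling the form."""


@dataclass
class CheckboxField(PageElement):
    """An input field that can be checked."""


@dataclass
class Page:
    """One page of the form, populated with its PageElement objects."""
    elements: List[PageElement] = field(default_factory=list)


@dataclass
class UnstructuredDocument:
    """Parser output: pages of fields with no structural grouping yet."""
    pages: List[Page] = field(default_factory=list)


# Hypothetical parse result for the "Maintain NPO" line of the example form.
page = Page(elements=[
    CheckboxField(x=90.0, y=400.0),
    TextField(x=105.0, y=400.0, text="Maintain"),
    TextField(x=150.0, y=400.0, text="NPO"),
])
document = UnstructuredDocument(pages=[page])
```

Subclassing PageElement keeps the location properties common to every field type while letting each subclass carry only the properties that apply to it.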
[0046] Grouping module 40 groups form fields, as identified by parsing
module
38. Form fields are grouped to correspond to structural regions in a user-
fillable
form. To this end, grouping module 40 creates a StructuredDocument object
corresponding to the UnstructuredDocument object created by parsing module 38.
The StructuredDocument object contains the PageElement objects of an
UnstructuredDocument object, in which PageElement objects are grouped.
[0047] Like the UnstructuredDocument object, the StructuredDocument object
contains Page objects. Each Page object in turn may contain Section objects
and/or ContentBlock objects. Each Section object corresponds to a section of a
form page such as the header, footer, body, left margin, right margin, etc.
Boundaries for each section (header, footer, body, margins, etc.) may be pre-
defined. Each Section object contains PageElement objects, corresponding to
form
fields within the form section represented by that Section object. PageElement
objects may be populated into Section objects based on their location on a
page,
as identified for example based on parsed x and y coordinates for each
PageElement object.
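One possible way to populate sections from parsed coordinates is sketched below. It assumes that y is measured in points from the top of a 792-point page and that the header, body and footer boundaries take the hypothetical values shown; none of these figures, nor the function names, appear in the description.

```python
from typing import Dict, List, Tuple

# Hypothetical pre-defined vertical boundaries: (section name, top, bottom).
SECTION_BOUNDS: List[Tuple[str, float, float]] = [
    ("header", 0.0, 72.0),
    ("body", 72.0, 720.0),
    ("footer", 720.0, 792.0),
]


def section_for(y: float) -> str:
    """Return the name of the section whose vertical range contains y."""
    for name, top, bottom in SECTION_BOUNDS:
        if top <= y < bottom:
            return name
    raise ValueError(f"y={y} lies outside the page")


def populate_sections(
    elements: List[Tuple[str, float, float]],
) -> Dict[str, List[str]]:
    """Group (label, x, y) elements by the section that contains them."""
    sections: Dict[str, List[str]] = {name: [] for name, _, _ in SECTION_BOUNDS}
    for label, _x, y in elements:
        sections[section_for(y)].append(label)
    return sections


# Illustrative elements; the coordinates are invented for this sketch.
sections = populate_sections([
    ("Community Acquired Pneumonia Admission Order Set", 150.0, 40.0),
    ("Antibiotic Therapy", 250.0, 120.0),
    ("Maintain", 105.0, 400.0),
])
```

Margins could be handled the same way by adding horizontal bounds and testing x as well as y.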
[0048] Grouping module 40 groups PageElement objects into ContentBlock
objects. Grouping is performed using the parsed properties of the PageElement
objects, a set of Structural Rules and a business rule management system
(BRMS)
with a rules engine, such as JBoss Drools, ILOG JRules, FICO Blaze Advisor, or
the like. The set of Structural Rules may include some or all of the following
rules:
[0049] 1) Combine a line starting with a TextField with a TextInputField
beside
it.
[0050] 2) Combine a line starting with a TextField with a TextField beside
it.
[0051] 3) Combine a line starting with a CheckboxField with a TextField.
[0052] 4) Combine a ContentBlock containing a Checkbox with a ContentBlock
below containing only a TextInputField.
[0053] 5) Combine a ContentBlock with a TextField to the right.
[0054] 6) Combine a ContentBlock with a TextInputField to the right.
[0055] 7) Combine TextFields far from previous ContentBlocks.
[0056] 8) Combine a TextInputField with a PageElement to the right where a
ContentBlock to the left has larger font.
[0057] 9) Combine a ContentBlock alone on a line with another ContentBlock
alone on a line below if both include only TextFields.
[0058] 10) Convert a TextField alone on a line into a ContentBlock.
[0059] 11) Convert a TextInputField alone on a line into a ContentBlock.
[0060] By applying the above set of Structural Rules to the PageElement
objects in an UnstructuredDocument object, grouping module 40 groups
PageElement objects into respective ContentBlock objects of a
StructuredDocument object. Each ContentBlock object corresponds to a
structural
region of a user-fillable form. When the user-fillable form is a patient order
set, each
ContentBlock object may, for example, correspond to a Title, a Module, a Sub-
module, an Order, or a Sub-order.
[0061] The above grouping rules are exemplary only, and other rules
suitable
for grouping patient order sets and/or other types of user-fillable forms will
be
readily apparent to those of ordinary skill in the art. Such other grouping
rules may
be used in conjunction with or in place of some or all of the above rules.
Some
embodiments may use an entirely different set of grouping rules altogether. In
some
embodiments, different grouping rules may be applied to different sections of
a
form. In some embodiments, grouping rules may be chosen based on the type of
document being processed, the file format of the document, and/or the language
of
the document.
[0062] FIG. 5 depicts an excerpted Sub-order region of the example form
page
depicted in FIG. 4. As depicted, this region includes a checkbox input field,
and two
text fields: "Maintain" and "NPO". These fields may be respectively
represented by
a CheckboxField object and two TextField objects. These three objects may be
grouped by grouping module 40 into a single ContentBlock object by applying
the
above Structural Rules.
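The grouping of the FIG. 5 fields can be illustrated with a simplified stand-in for Structural Rule 3. The sketch below is a plain Python function rather than a rules engine such as those named above; the Element type, its kind labels, the coordinates, and the assumption that fields on one line share exactly the same y value are all hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass
class Element:
    kind: str      # "checkbox", "text" or "text_input" (labels assumed)
    x: float
    y: float
    text: str = ""


def group_checkbox_lines(elements: List[Element]) -> List[List[Element]]:
    """Apply a simplified form of Structural Rule 3 to each line.

    Elements sharing a y coordinate are treated as one line. A line that
    starts with a checkbox and continues with text fields only becomes a
    single block; every other element is left in a block of its own.
    """
    blocks: List[List[Element]] = []
    ordered = sorted(elements, key=lambda e: (e.y, e.x))
    for _, line_iter in groupby(ordered, key=lambda e: e.y):
        line = list(line_iter)
        rest = line[1:]
        starts_with_checkbox = line[0].kind == "checkbox"
        if starts_with_checkbox and rest and all(e.kind == "text" for e in rest):
            blocks.append(line)
        else:
            blocks.extend([e] for e in line)
    return blocks


# The three fields of FIG. 5, deliberately supplied out of order.
fig5 = [
    Element("text", 150.0, 400.0, "NPO"),
    Element("checkbox", 90.0, 400.0),
    Element("text", 105.0, 400.0, "Maintain"),
]
blocks = group_checkbox_lines(fig5)
```

A production grouping step would likely tolerate small differences in y and apply the remaining rules in sequence; this sketch covers only the single case shown in FIG. 5.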
[0063] Optionally, grouping module 40 may store a representation of the
StructuredDocument object in memory 16. This representation may take the form
of an XML document. FIG. 6 depicts a portion of an example XML document
representative of a StructuredDocument object. The XML code depicted in FIG. 6
represents a ContentBlock object corresponding to the form region depicted in
FIG.
5. As depicted in FIG. 6, the ContentBlock object is represented by a
contentBlock
tag. This contentBlock tag encloses a checkboxField tag and two textField
tags,
which respectively correspond to the checkbox input field and text fields
shown in
FIG. 5.
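Since FIG. 6 is not reproduced in this text, the following sketch only approximates what such an XML fragment might look like. The tag names contentBlock, checkboxField and textField follow the description above; the x and y attributes, their values, and the use of Python's standard xml.etree.ElementTree module are assumptions.

```python
import xml.etree.ElementTree as ET

# Build a contentBlock element enclosing one checkboxField and two textFields.
block = ET.Element("contentBlock")
ET.SubElement(block, "checkboxField", {"x": "90", "y": "400"})
for x, word in (("105", "Maintain"), ("150", "NPO")):
    text_field = ET.SubElement(block, "textField", {"x": x, "y": "400"})
    text_field.text = word

# Serialize to a string, as might be stored in memory 16.
xml_text = ET.tostring(block, encoding="unicode")
```

Parsing xml_text back with ET.fromstring recovers the same nesting, which is what makes the stored representation usable by later processing steps.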
[0064] As noted, database 30 stores data structures representative of
different
types of structural regions found in user-fillable forms. These types of
structural
regions may vary according to the type of user-fillable form. For patient
order sets,
for example, five types of structural regions may be defined: Title, Module,
Sub-
module, Order, and Sub-order. To facilitate processing of electronic
representations
32 of patient order sets, database 30 stores five data structures respectively
representative of each of these five region types. These data structures may
be
stored in the form of object classes, XML tags or the like.
[0065] Other region types found in other types of user-fillable forms will
be
apparent to those of ordinary skill in the art. Data structures representative
of these
other region types may also be stored in database 30.
[0066] Database 30 also includes data reflective of the hierarchical
relationship between region types. For example, a Sub-module is defined to be
a
sub-region of a Module, an Order is defined to be a sub-region of a Module or
a
Sub-module, a Sub-order is defined to be a sub-region of an Order, and so on.
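The hierarchical relationships listed above could be recorded as a simple lookup table. The sketch below encodes only the three relationships stated in the description; the dictionary name, the function name, and the in-memory representation (in place of database 30) are assumptions.

```python
from typing import Dict, Set

# Allowed parent region types for each region type, as stated above.
ALLOWED_PARENTS: Dict[str, Set[str]] = {
    "Sub-module": {"Module"},
    "Order": {"Module", "Sub-module"},
    "Sub-order": {"Order"},
}


def is_valid_nesting(child: str, parent: str) -> bool:
    """Return True if `child` is defined to be a sub-region of `parent`."""
    return parent in ALLOWED_PARENTS.get(child, set())
```

For instance, is_valid_nesting("Order", "Sub-module") holds, while is_valid_nesting("Sub-order", "Module") does not, because a Sub-order is defined only as a sub-region of an Order.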
[0067] Matching module 42 matches each ContentBlock object corresponding
to a structural region of a user-fillable form to the data structure stored
in database
30 that best represents that structural region. To this end, matching module
42
determines a quality of match between each ContentBlock object and each stored
data structure.
[0068] Each quality of match may be represented by a numerical score. In an
embodiment, each numerical match score has an initial value of zero, and is
increased or decreased by applying a set of Scoring Rules. To apply Scoring
Rules,
matching module 42 may use a business rule management system (BRMS) with a
rules engine, such as JBoss Drools, ILOG JRules, FICO Blaze Advisor, etc.
[0069] The set of Scoring Rules may vary based on document type. The
following Scoring Rules are exemplary rules that may be used to calculate
numerical match scores between ContentBlock objects and data structures
representative of structural regions found in patient order sets (viz. Title,
Module,
Sub-module, Order, Sub-order). FIG. 7 schematically illustrates matching of a
ContentBlock object to one of these data structures.
[0070] The numerical match score between a ContentBlock object and the data
structure representative of a Title is increased by one if the ContentBlock
object:
contains only letters;
is centered;
is bold;
is in the header section of the document;
has font size greater than 13; or
was previously matched to the data structure representative of a Title
either through automated scoring or manual user override.
[0071] The numerical match score between a ContentBlock object and the data
structure representative of a Module is increased by one if the ContentBlock
object:
contains only letters;
is centered;
is bold;
is in the body section of the document;
has font size of 12; or
was previously matched to the data structure representative of a
Module either through automated scoring or manual user override.
[0072] The numerical match score between a ContentBlock object and the data
structure representative of a Sub-module is increased by one if the
ContentBlock
object:
contains only letters;
is left justified;
is bold;
is in the body section of the document;
has font size of 11; or
was previously matched to the data structure representative of a Sub-
module either through automated scoring or manual user override.
[0073] The numerical match score between a ContentBlock object and the data
structure representative of an Order is increased by one if the ContentBlock
object:
contains a checkbox or text input field;
is left justified; or
was previously matched to the data structure representative of an
Order either through automated scoring or manual user override.
[0074] The numerical match score between a ContentBlock object and the data
structure representative of a Sub-order is increased by one if the
ContentBlock
object:
contains an input field such as a checkbox or text input field;
is indented;
is preceded by a ContentBlock with a colon;
is preceded by a ContentBlock with an "OR"; or
was previously matched to the data structure representative of a Sub-
order either through automated scoring or manual user override.
[0075] As can be seen from the above rules, the numerical match score may be
calculated taking into account font properties, layout properties, the types
of fields
in the ContentBlock object, etc.
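A minimal sketch of how two of the above rule sets (Title and Order) might be evaluated is given below, using plain Python in place of a rules engine. The BlockFeatures type and its field names are hypothetical, spaces are assumed to be permitted under the "contains only letters" rule, the sample feature values are invented, and the "previously matched" rule is omitted here because it depends on stored match records.

```python
from dataclasses import dataclass


@dataclass
class BlockFeatures:
    """Features of one ContentBlock; all field names are assumptions."""
    text: str
    alignment: str        # "left" or "center"
    bold: bool
    section: str          # "header", "body" or "footer"
    font_size: float
    has_input_field: bool


def only_letters(text: str) -> bool:
    """True if the text consists of letters and spaces only (an assumption)."""
    return text.replace(" ", "").isalpha()


def title_score(b: BlockFeatures) -> int:
    """Start at zero and add one for each Title rule the block satisfies."""
    rules = [
        only_letters(b.text),
        b.alignment == "center",
        b.bold,
        b.section == "header",
        b.font_size > 13,
    ]
    return sum(rules)


def order_score(b: BlockFeatures) -> int:
    """Start at zero and add one for each Order rule the block satisfies."""
    rules = [b.has_input_field, b.alignment == "left"]
    return sum(rules)


# Invented feature values for two regions of the example form page.
title_block = BlockFeatures(
    "Community Acquired Pneumonia Admission Order Set",
    "center", True, "header", 14.0, False,
)
order_block = BlockFeatures("Maintain NPO", "left", False, "body", 10.0, True)
```

With these invented values, the title block satisfies all five Title rules and neither Order rule, while the order block satisfies both Order rules and only the letters-only Title rule.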
[0076] The above scoring rules are exemplary only, and other rules suitable for
patient order sets and/or other types of user-fillable forms will be readily
apparent to
those of ordinary skill in the art. For example, although the above scoring
rules
increase the numerical match score between ContentBlock objects and stored
data
structures, some of such other scoring rules may decrease the numerical match
score. Such other scoring rules may be used in conjunction with or in place of
some
or all of the above rules. Some embodiments may use an entirely different set
of
scoring rules altogether. In some embodiments, different scoring rules may be
applied to different sections of a form. In some embodiments, scoring rules
may be
chosen based on the type of document being processed, the file format of the
document, and/or the language of the document.
[0077] In applying the above Scoring Rules, each ContentBlock object is
deemed to possess the characteristics of PageElement objects contained in that
ContentBlock object. For example, a ContentBlock object is deemed to be in the
body section of a document if its constituent PageElement objects are in the
body
section (i.e., contained in a body Section object). Similarly, a ContentBlock
object is
deemed to be bold if text in its constituent PageElement objects is bold.
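A minimal sketch of this derivation is given below. The class and field names are assumptions. The text does not say whether all or only some constituent PageElement objects must share a characteristic, so the sketch assumes that all of them must.

```python
from dataclasses import dataclass, field


@dataclass
class PageElement:
    text: str
    bold: bool = False
    section: str = "body"   # name of the containing Section (assumed field)


@dataclass
class ContentBlock:
    elements: list = field(default_factory=list)

    def is_bold(self):
        # Assumption: the block is bold only if every element is bold.
        return bool(self.elements) and all(e.bold for e in self.elements)

    def in_section(self, name):
        # Assumption: the block is in a section only if every element is.
        return bool(self.elements) and all(
            e.section == name for e in self.elements
        )


heading = ContentBlock([PageElement("Antibiotic", bold=True),
                        PageElement("Therapy", bold=True)])
mixed = ContentBlock([PageElement("Maintain", bold=True),
                      PageElement("NPO")])
```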
[0078] To determine whether a ContentBlock object was previously matched to
a particular data structure, matching module 42 generates a signature
identifying
the ContentBlock object. In an embodiment, the signature may be generated as a
text string representative of the contents of the ContentBlock object. For
example,
the signature for the ContentBlock object of FIG. 6 may be generated as "[CBX]
Maintain NPO". ContentBlock objects with the same content may share the same
signature. As will be appreciated, ContentBlock objects sharing a signature
are
more likely to be matched to the same stored data structure.
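One possible way to build such a signature is sketched below, for illustration only. The "[CBX]" marker follows the example above. The "[TXT]" marker for text input fields, the (kind, text) tuple layout, and the whitespace normalization are assumptions.

```python
def make_signature(elements):
    """Build a text signature from (kind, text) tuples.

    The kinds "checkbox", "text_input" and "text" are hypothetical labels
    for the PageElement objects contained in a ContentBlock object.
    """
    markers = {"checkbox": "[CBX]", "text_input": "[TXT]"}  # "[TXT]" is assumed
    parts = []
    for kind, text in elements:
        if kind == "text":
            # Collapse runs of whitespace so equal content yields equal signatures.
            parts.append(" ".join(text.split()))
        else:
            parts.append(markers[kind])
    return " ".join(parts)


signature = make_signature([("checkbox", ""), ("text", "Maintain  NPO")])
```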
[0079] Using this signature, matching module 42 searches through records of
past instances in which ContentBlock objects bearing the same signature have
been matched to a particular data structure. In some embodiments, such records
may be stored in database 30. The numerical match score for a ContentBlock
object and a particular data structure may be increased by one for each
recorded
instance found. For example, if a ContentBlock object having a signature of
"[CBX]
Maintain NPO" has previously been matched to the data structure representative
of
Sub-orders in five instances, the numerical match score for the ContentBlock
object and the data structure representative of Sub-orders may be increased by
five.
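The history-based adjustment may be sketched as follows. The MatchHistory class is a hypothetical in-memory stand-in for the records that may be stored in database 30, and the base score is an invented value used only for the example.

```python
from collections import Counter


class MatchHistory:
    """Hypothetical in-memory stand-in for match records in database 30."""

    def __init__(self):
        self._counts = Counter()

    def record(self, signature, structure):
        # Store one more instance of this signature matched to this structure.
        self._counts[(signature, structure)] += 1

    def bonus(self, signature, structure):
        # One point per recorded instance; zero if none was recorded.
        return self._counts[(signature, structure)]


history = MatchHistory()
for _ in range(5):
    history.record("[CBX] Maintain NPO", "Sub-order")

base_score = 2  # assumed score from the layout and field rules
total = base_score + history.bonus("[CBX] Maintain NPO", "Sub-order")
```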
[0080] For each ContentBlock object, matching module 42 determines the
stored data structure that best represents the structural region corresponding
to
that ContentBlock object based on the calculated numerical match scores. For
example, matching module 42 may match each ContentBlock object to the stored
data structure having the highest numerical match score.
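Selecting the highest-scoring data structure may be sketched as below. The dictionary layout is an assumption, and the text does not specify how ties are resolved; in this sketch the first structure listed wins a tie.

```python
def best_match(scores):
    """Return the structure name with the highest numerical match score.

    scores maps a structure name to its score for one ContentBlock object.
    On a tie, the name that appears first in the dictionary is returned.
    """
    if not scores:
        raise ValueError("no candidate data structures were scored")
    return max(scores, key=scores.get)


matched = best_match({"Module": 0, "Order": 2, "Sub-order": 4})
```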
[0081] To store matching results, matching module 42 creates a
ScoredDocument object. This ScoredDocument object inherits the contents of the
StructuredDocument object created by grouping module 40, and additionally
includes an indicator of the data structure matched to each ContentBlock
object.
Each ScoredDocument object may also include the numerical match scores for
each ContentBlock object.
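For illustration, the relationship between the two document objects may be sketched with class inheritance. The field names and the use of block indices as keys are assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class StructuredDocument:
    content_blocks: list = field(default_factory=list)  # e.g. block signatures
    sections: dict = field(default_factory=dict)        # section name -> elements


@dataclass
class ScoredDocument(StructuredDocument):
    matches: dict = field(default_factory=dict)  # block index -> structure name
    scores: dict = field(default_factory=dict)   # block index -> {name: score}

    @classmethod
    def from_structured(cls, doc, scores):
        # Copy the grouped contents and record the best match for each block.
        matches = {i: max(s, key=s.get) for i, s in scores.items()}
        return cls(list(doc.content_blocks), dict(doc.sections), matches, scores)


structured = StructuredDocument(["Antibiotic Therapy", "[CBX] Maintain NPO"])
scored = ScoredDocument.from_structured(
    structured,
    {0: {"Module": 3, "Order": 1}, 1: {"Order": 1, "Sub-order": 4}},
)
```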
[0082] Adjusting module 44 allows users to modify the grouping results
produced by grouping module 40 and the matching results produced by matching
module 42. To this end, adjusting module 44 includes user interfaces
configured
to allow users to view and/or modify ScoredDocument objects. FIGS. 8-11 depict
exemplary screens of user interfaces configured for this purpose. In some
embodiments, adjusting module 44 may present these user interfaces to users
directly operating computing device 10. In alternate embodiments, adjusting
module 44 may present these user interfaces to users operating network-
connected
computing devices, e.g., by way of HTTP server software 24.
[0083] In particular, FIG. 8 illustrates an example user interface
configured to
allow a user to modify the defined boundaries of form sections (e.g., header,
footer,
body, margins, etc.). These boundaries govern the Section object to which each
PageElement object belongs, which is taken into account when calculating
numerical match scores. As such, after section boundaries have been modified,
numerical match scores may be re-calculated. Optionally, the user-defined
section
boundaries may be stored for future use, e.g., by grouping module 40 when
populating Section objects with PageElement objects.
[0084] FIG. 9 illustrates an example user interface configured to contain a
tree
diagram showing ContentBlock objects. These ContentBlock objects correspond to
the structural regions of the form page shown in FIG. 4. As depicted, these
ContentBlock objects have been matched to data structures representative of
particular form regions. For example, the tree diagram includes a ContentBlock
object matched to a data structure representative of a Module (shown as Module
"Antibiotic Therapy"). The tree diagram also includes ContentBlock objects
respectively matched to data structures representative of two Sub-modules
(shown
as Sub-modules "For severely ill patient" and "For patient with suspected
gross
aspiration"). The illustrated tree diagram reflects the hierarchical structure
of the
form page shown in FIG. 4. Thus, for example, the two Sub-modules are shown to
be sub-regions of the Module, in accordance with the defined hierarchical
relationship between structural regions. The tree diagram also includes
ContentBlock objects respectively matched to data structures representative of
various Orders, and two Sub-orders shown as "Maintain NPO" and "SLP
assessment in a.m. to assess swallowing". These two Sub-orders are shown to be
sub-regions of their parent Order (shown as "metroNIDAZOLE 500 mg...").
[0085] The ContentBlock objects contained in the tree diagram of FIG. 9 may
be
selected by a user, e.g., by way of a mouse click. Upon selecting a particular
ContentBlock object, the example user interface shown in FIG. 10 may be
presented to the user. This user interface is configured to allow a
ContentBlock
object, corresponding to a particular structural region, to be manually
edited. As
depicted, the selected ContentBlock object may be merged with other
ContentBlock
objects. The selected ContentBlock object may also be split into its
constituent
PageElement objects. In this way, PageElement objects may be manually re-
grouped to correspond to manually-identified structural regions.
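The merge and split operations may be sketched as follows, assuming for simplicity that each ContentBlock object is represented as a plain list of its PageElement objects. The function names are hypothetical.

```python
def merge_blocks(blocks, i, j):
    """Return a new list in which blocks i and j are merged into one block."""
    if i == j:
        raise ValueError("cannot merge a block with itself")
    lo, hi = sorted((i, j))
    merged = blocks[lo] + blocks[hi]
    # The merged block takes the position of the earlier of the two blocks.
    return blocks[:lo] + [merged] + blocks[lo + 1:hi] + blocks[hi + 1:]


def split_block(blocks, i):
    """Return a new list in which block i is split into one block per element."""
    return blocks[:i] + [[element] for element in blocks[i]] + blocks[i + 1:]


blocks = [["[CBX]"], ["Maintain", "NPO"], ["OR"]]
merged = merge_blocks(blocks, 0, 1)
split = split_block(blocks, 1)
```

Both functions return new lists and leave their input unchanged, which makes it straightforward to discard an edit that the user cancels.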
[0086] The user interface shown in FIG. 10 is also configured to allow the
matching result for a selected ContentBlock object to be manually overridden.
For
example, a ContentBlock object automatically matched by matching module 42 to
a
data structure representative of a Sub-order may be manually matched by way of
this user interface to a data structure representative of an Order.
[0087] Adjusting module 44 modifies the ScoredDocument object based on
modifications entered by users, e.g., by way of the above-described example
user
interfaces.
[0088] Adjusting module 44 may receive indicators from users that some or
all of
the grouping results or matching results are satisfactory.
[0089] Adjusting module 44 stores records of matches modified or confirmed by
users. Each match is stored in association with a signature identifying the
particular
ContentBlock object. This signature may be generated in the manner described
for
matching module 42. Records of matches may be stored in database 30. These
records of matches will be used by matching module 42 when processing future
forms. Thus, matches modified or confirmed by users affect future scoring of
ContentBlock objects. In this way, form processing software 26 learns from the
matching results to improve future scoring.
[0090] Generating module 46 generates structured electronic representations
34
of user-fillable forms using the ScoredDocument object, as created by matching
module 42 and as modified by adjusting module 44. Each structured electronic
representation 34 of a user-fillable form may include one or more electronic
documents. These electronic documents may, for example, be in XML, HTML,
JSON or PDF format. Other parsable formats known to those of ordinary skill in
the
art may also be used.
[0091] The structured electronic representation may include an instance of
each
data structure matched to the ContentBlock objects contained in the
ScoredDocument object. Each instance of one of these data structures may be
populated with data reflective of the contents of the matched ContentBlock
object.
Such data may, for example, include data describing text fields and input
fields, as
represented by PageElement objects contained in the ContentBlock object.
[0092] FIG. 11 depicts an example structured representation 34 of the form
page shown in FIG. 4. As depicted, this structural representation 34 is in XML
format. The XML code includes matched data structures representative of each
of
the form page's structural regions. Each data structure takes the form of an
XML
tag. For example, a data structure in the form of a module tag represents the
Module form region. Similarly, data structures in the form of sub-module tags
represent the Sub-module form regions. Data structures in the form of order
tags
represent form regions corresponding to Orders and Sub-orders. Of these, order
tags for Sub-orders are enclosed in a childOrders tag.
[0093] As depicted, the XML code also includes XML tags and tag properties
reflective of the contents of each structural region (e.g., textual content,
page coordinates, etc.). For each structural region, the XML code is generated by
traversing the PageElement objects contained in the ContentBlock object that
corresponds to that structural region.
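A minimal sketch of generating such XML code is shown below, using the ElementTree module of the Python standard library. The literal tag spellings "module" and "order", the "title" attribute, and the "text" child tag are assumptions; only the childOrders tag name is taken from the description, and FIG. 11 may differ in its details.

```python
import xml.etree.ElementTree as ET


def build_order(text, sub_orders=()):
    """Build an order tag; Sub-orders are nested inside a childOrders tag."""
    order = ET.Element("order")
    ET.SubElement(order, "text").text = text  # "text" tag is an assumption
    if sub_orders:
        children = ET.SubElement(order, "childOrders")
        for sub_text in sub_orders:
            children.append(build_order(sub_text))
    return order


# Hypothetical module containing one Order with two Sub-orders.
module = ET.Element("module", {"title": "Antibiotic Therapy"})
module.append(build_order(
    "Parent order",
    ["Maintain NPO", "SLP assessment in a.m. to assess swallowing"],
))
xml_text = ET.tostring(module, encoding="unicode")
```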
[0094] In some embodiments, generating module 46 may store structural
representations 34 in memory 16. In some embodiments, generating module 46
may provide structural representations 34 to users operating network-
interconnected computing devices, e.g., by way of HTTP server software 24.
Optionally, generating module 46 may include user interfaces configured to
present
structural representations 34 to such users, and/or allow such users to
retrieve
copies of structural representations 34 from computing device 10.
[0095] Structural representations 34 of user-fillable forms may be
subsequently
parsed to easily determine both the form's content and structure. This
facilitates
ready modification and/or comparison of such forms.
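As an illustration of such a comparison, the sketch below parses two XML representations and reports the Orders present in one but not in the other. The tag names "module", "order" and "text" are assumptions, as is the choice to compare Orders by their text alone.

```python
import xml.etree.ElementTree as ET


def order_texts(xml_text):
    """Return the set of order texts found anywhere in the representation."""
    root = ET.fromstring(xml_text)
    return {el.findtext("text", default="") for el in root.iter("order")}


form_a = "<module><order><text>Maintain NPO</text></order></module>"
form_b = (
    "<module>"
    "<order><text>Maintain NPO</text></order>"
    "<order><text>SLP assessment</text></order>"
    "</module>"
)

added = order_texts(form_b) - order_texts(form_a)
removed = order_texts(form_a) - order_texts(form_b)
```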
[0096] The operation of form processing software 26 is further described
with
reference to the flowchart illustrated in FIG. 12.
[0097] As depicted in FIG. 12, form processing software 26 performs blocks
S1200 and onward at computing device 10. At block S1202, receiving module 36
of
form processing software 26 receives an electronic representation 32 of a
user-fillable form. Electronic representation 32 is a parsable version of that
user-fillable form.
[0098] At block S1204, parsing module 38 of form processing software 26
parses electronic representation 32 to identify form fields. These form fields
include
both text fields and input fields. Each parsed field is represented by a
PageElement object, and the PageElement objects are stored in an
UnstructuredDocument object.
[0099] Next, at block S1206, grouping module 40 of form processing software
26 groups form fields in the user-fillable form to correspond to the form's
structural
regions. To this end, grouping module 40 groups PageElement objects
representative of these form fields into ContentBlock objects, with each
ContentBlock object representative of one of the form's structural regions.
Grouping
is performed by applying a set of Structural Rules using a rules engine.
[00100] PageElement objects are also divided into Section objects, each
corresponding to a section of a form page (e.g., header, footer, body,
margins,
etc.). ContentBlock objects and Section objects are stored in a
StructuredDocument
object. Optionally, grouping module 40 may store an electronic representation
of
the StructuredDocument object, as depicted for example in FIG. 6.
[00101] At block S1208, matching module 42 matches each form region (as
represented by a ContentBlock object) to the stored data structure that best
represents that form region. Matching is performed by calculating numerical
match
scores for each prospective match between a form region and one of the stored
data structures. These numerical match scores are calculated by applying a set
of
Scoring Rules using a rules engine. A ScoredDocument object is created, which
includes the contents of the StructuredDocument object and also the matching
results, i.e., the stored data structure matched to each of the form regions.
[00102] Next, at block S1210, a determination is made as to whether grouping
results obtained at block S1206 or matching results obtained at block S1208
should
be manually reviewed and/or adjusted. This determination may be made by
prompting a user, e.g., by presenting a prompt asking whether
review/adjustment is
required. Alternatively, this determination may be made according to pre-
defined
parameters. In some embodiments, manual review/adjustment is always required
or is never required, and block S1210 may be omitted.
[00103] If manual review/adjustment is required, form processing software
26
performs block S1212. Otherwise, block S1212 is skipped and block S1214 is
performed. At block S1212, adjusting module 44 of form processing software 26
presents user interfaces configured to allow the user to review and modify
grouping
results and matching results. Exemplary user interfaces are depicted in FIGS.
8-10.
Adjusting module 44 may receive adjustments from the user to the grouping
results
and/or the matching results. Adjusting module 44 may receive confirmation from
the
user that the grouping results and/or the matching results are satisfactory.
The
ScoredDocument object is updated based on any adjustments received from the
user.
[00104] At block S1214, generating module 46 of form processing software 26
uses the ScoredDocument object to generate structured electronic
representation
34 of the user-fillable form, as depicted for example in FIG. 11. Finally,
generating
module 46 may provide a copy of structured electronic representation 34 to the
user.
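The control flow of blocks S1204 to S1214 may be summarized by the sketch below. The function name and its callable parameters are hypothetical placeholders for the modules described above; the stand-in lambdas exist only to show the call order.

```python
def process_form(representation, parse, group, match, generate,
                 needs_review=None, adjust=None):
    """Run the processing steps in flowchart order (all callables assumed)."""
    unstructured = parse(representation)   # block S1204: identify form fields
    structured = group(unstructured)       # block S1206: group into regions
    scored = match(structured)             # block S1208: match to structures
    # Block S1210: decide whether manual review/adjustment is required.
    if needs_review is not None and needs_review(scored):
        scored = adjust(scored)            # block S1212: apply user adjustments
    return generate(scored)                # block S1214: structured output


# Trivial stand-ins that record which steps were performed.
result = process_form(
    "form",
    parse=lambda r: [r, "parsed"],
    group=lambda u: u + ["grouped"],
    match=lambda s: s + ["matched"],
    generate=lambda d: d + ["generated"],
    needs_review=lambda d: True,
    adjust=lambda d: d + ["adjusted"],
)
```

Omitting needs_review corresponds to embodiments in which manual review is never required and block S1210 is omitted.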
[00105] Of course, the above described embodiments are intended to be
illustrative only and in no way limiting. The described embodiments are
susceptible
to many modifications of form, arrangement of parts, details and order of
operation.
For example, software (or components thereof) described at computing device 10
may be hosted at several devices. Software implemented in the modules
described
above could be implemented using more or fewer modules. The invention is
intended to encompass all such modifications within its scope, as defined by
the claims.