Note: Descriptions are presented in the official language in which they were submitted.
CA 02371244 2001-08-23
WO 00/51024 PCT/IL00/00117
METHOD AND APPARATUS FOR DYNAMICALLY DISPLAYING A SET OF DOCUMENTS ORGANIZED
BY A HIERARCHY OF INDEXING CONCEPTS
FIELD AND BACKGROUND OF THE INVENTION
The amount of textual information that is available in computerized media
has increased dramatically in recent years. As a result, there is an
increasing need for end users to have effective tools for searching,
browsing, navigating, reading and analyzing collections of textual documents.
Current common practice, within organizations as well as on the Internet, is
to have a search engine that indexes a large repository of documents and
enables users to issue a search query and to get in response all documents
that satisfy the search conditions. Usually, a list of titles, along with
some additional information, is presented for each document and the user can
further ask for the display of specific documents from the list.
The list of documents is often sorted by some relevance ranking, which is
intended to approximate the degree of relevance of the document to the query.
Sorting by date is also often available.
A search mechanism typically attaches to each document a set of indexing
concepts. An indexing concept is a symbol or value that characterizes the
document, and is typically used within search queries or within routing
queries ("queries" that specify which documents will be routed to an
addressee). Typical types of indexing concepts include (but are not limited
to):
1. Topical categories (also known as controlled keywords, topics,
descriptors etc.). These are symbols denoting topical issues, which are
usually general or abstract concepts that do not necessarily appear
literally in the text. For example, a topical category may be "Company
Acquisition". This term, serving as the name of the category, may not
appear literally in a document that describes such an event.
2. Important terms and names of entities (such as countries, companies,
products and people) which appear or are referred to in the text (as is
or by synonyms).
3. Document meta-data items, such as document source, type, author and
date.
In the following, a document is considered indexed by the indexing
concepts characterizing it. Apart from being used in ad-hoc search queries,
indexing concepts may also be used to determine routine routing of incoming
documents to addressees.
The process of associating indexing concepts to documents (the indexing
process) is performed either manually, automatically, or by some combination
of the two modes. With respect to indexing concepts that consist of terms and
names from the document text, the indexing process usually involves scanning
the text of the document, identifying words, terms and names, and possibly
bringing these terms to some canonical form (e.g. the grammatical base form
(lemma) of the word). Meta-data indexing concepts are often determined by the
systems in which the document is created or received, but may also be handled
manually.
Of particular interest to the invention is the indexing process for topical
categories (categories, in short). In many systems, it is possible for the
user to manually assign topical categories to a document. More recently, a
number of methods have been developed for assigning topical categories to
documents automatically, which are referred to here as automatic text
classification methods. Such methods classify documents to appropriate
categories taken from a predetermined list of possible categories.
Classification is performed by some mechanism that receives the document text
as input and determines the appropriate categories based on the words, terms
or their combinations that
appear in the document.
There are two common approaches for automatic text classification
methods. The first approach is based on manual definition of the rules, or
some other type of logic, by which a document is classified to a category
based on the terms in the text. For example, some systems allow users (or
administrators) to define complex queries, which may include Boolean and other
types of conditions (such as weights and proximity) that the terms in the
document should satisfy. A document that satisfies these conditions is
classified to the category. An example of such a system is the Topics(TM)
system that was developed by Verity Inc., USA.
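As an illustration of the first approach, the following minimal Python sketch classifies a document with a hand-written rule that combines a Boolean condition with a proximity condition. The rule itself, the chosen terms and the function names (`tokenize`, `within`, `is_company_acquisition`) are hypothetical and are not taken from any of the systems mentioned above.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def within(tokens, a, b, window):
    """True if terms a and b occur at most `window` tokens apart."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

def is_company_acquisition(text):
    """Hypothetical manual rule for the category "Company Acquisition":
    (acquires OR acquired OR buys) AND "shares" within 8 tokens of "company".
    """
    tokens = tokenize(text)
    has_verb = any(v in tokens for v in ("acquires", "acquired", "buys"))
    return has_verb and within(tokens, "shares", "company", window=8)

doc_a = "Alpha Corp acquired all outstanding shares of the company Beta Ltd."
doc_b = "The company published its quarterly report."
print(is_company_acquisition(doc_a), is_company_acquisition(doc_b))
```

Note that the category name never appears in the first example document; the document is assigned to the category only because it satisfies the conditions of the rule.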
The second approach is based on automatic learning of the "logic" which
entails the classification of the document to a category. Methods belonging to
this approach utilize a set of training documents, for which the correct
categories are known in advance (usually as the result of manual
classification of these documents). A learning method may then include a
learning phase, in which some model of the category is constructed. For
example, such a model may include terms that are highly associated with the
category, and possibly some weights that quantify the degree of correlation
(entailment) between each term and the category. Alternatively, a learning
method may be memory based, in which case the learning method simply stores
the training data in some useful format. Then, when a new document is given
for classification, the method classifies it automatically by consulting or
applying the category model (or by simply comparing the document to the
training data, in case of a memory based approach). Examples of trainable
(learning) classification systems are described in:
1. C. Apte and F. Damerau and S. Weiss, 1994. Towards language
independent automated learning of text categorization models, in
Proceedings of ACM-SIGIR Conference on Information Retrieval.
2. W. W. Cohen, Text categorization and relational learning, in Machine
Learning Journal, 1995, pages 124-132.
3. W. W. Cohen and Y. Singer, Context-sensitive learning methods for text
categorization, in Proceedings of the 19th Annual Int. ACM Conference
on Research and Development in Information Retrieval, 1996, pages
307-315.
4. D. Lewis, 1992. An evaluation of phrasal and clustered representations on
a text categorization problem, in Proc. of the 15th Int. ACM-SIGIR
Conference on Information Retrieval, pages 37-50.
5. D. Lewis and M. Ringuette, 1994, A comparison of two learning
algorithms for text categorization, in Proc. of Symposium on Document
Analysis and Information Retrieval, pages 81-93.
6. D. Lewis and R. E. Schapire and J. P. Callan and R. Papka, 1996, Training
algorithms for linear text classifiers, in SIGIR '96: Proc. of the 19th Int.
Conference on Research and Development in Information Retrieval.
7. K. Tzeras and S. Hartmann, 1993, Automatic Indexing Based on Bayesian
Inference Networks, in Proc. of 16th Int. ACM SIGIR Conference on
Research and Development in Information Retrieval, pages 22-34.
8. E. Wiener and J. Pedersen and A. Weigend, 1995, A neural network
approach to topic spotting, in Symposium on Document Analysis and
Information Retrieval, pages 317-332.
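Returning to the second, learning-based approach described before the reference list, the following minimal Python sketch shows one possible learning phase that builds a category model of term weights from manually classified training documents, and a classification step that applies the model to a new document. The weighting formula (difference of document frequencies), the threshold and all names are assumptions made for illustration only; they are not the methods of the cited references.

```python
from collections import Counter

def learn_category_model(training_docs):
    """Learning phase. `training_docs` is a list of (terms, in_category)
    pairs, where `terms` is a set of terms and `in_category` is the manually
    assigned label. Assumes at least one positive and one negative example.
    Returns a weight per term: the fraction of in-category documents
    containing the term minus the fraction of other documents containing it.
    """
    pos = [terms for terms, label in training_docs if label]
    neg = [terms for terms, label in training_docs if not label]
    pos_df = Counter(t for terms in pos for t in terms)
    neg_df = Counter(t for terms in neg for t in terms)
    vocab = set(pos_df) | set(neg_df)
    return {t: pos_df[t] / len(pos) - neg_df[t] / len(neg) for t in vocab}

def classify(model, terms, threshold=0.0):
    """Classification phase: sum the weights of the document's terms."""
    return sum(model.get(t, 0.0) for t in terms) > threshold

training = [
    ({"merger", "acquire", "shares"}, True),
    ({"acquire", "stake", "company"}, True),
    ({"weather", "rain", "company"}, False),
    ({"football", "match"}, False),
]
model = learn_category_model(training)
print(classify(model, {"acquire", "company"}))  # term "acquire" dominates
print(classify(model, {"rain", "match"}))
```

A memory based variant would skip the model construction and instead keep `training` as is, comparing each new document directly against the stored examples.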
Once documents have been obtained by a user, as a result of some search
or some routing mechanism, these documents are typically displayed in one of
several formats. A common method for display is to present a list of items,
each
providing some high level information about a document, such as the document
title, meta-data items (such as author, source or date) and possibly a short
summary. The list may be sorted by document publication date or by some
relevance score, which quantifies the degree of relevance of the document to
the
user's query, as hypothesized by the search system. Another display method is
a
hierarchical display, in which documents are organized in a hierarchical
structure,
similar to a graphical user interface displaying a hierarchical file system.
U.S. patent 5,924,090 (Krellenstein), "Method and Apparatus for
Searching a Database of Records", discloses a system for searching a database
and presenting to the user a small number of categories along with a list of
the most relevant records that satisfy a query. The methodology of the
Krellenstein patent has a sophisticated clustering algorithm that includes
three primary steps: identifying candidate categories, weighting candidate
categories and displaying a set of search result categories selected from the
candidate categories. A typical result of the system according to the
Krellenstein patent is illustrated in Fig. 1, as extracted from the
www.northernlight.com site.
Thus, as shown, the query "text categorization" (1) results in 19,215
documents (records) (2) (of which 6 are shown in the first page). The
documents are assigned to 15 categories (3). The set of categories is
determined after applying the specified sophisticated clustering, including
identifying candidate categories, weighting candidate categories and
displaying a set of search result categories selected from the candidate
categories. In accordance with the specified system, the user can repeat this
process, further narrowing the search with each iteration. Thus, double
clicking the category Special collection documents (4) will result in applying
the specified steps again, giving rise to the search results illustrated in
Fig. 2. As shown, there are 2057 records (5) in the sought category (6) that,
in turn, are assigned to 12 categories (7). As readily arises from the search
results depicted in Fig. 2, the resulting categories are determined
dynamically and, accordingly, each search is likely to give rise to
a different set of categories. This approach has a significant shortcoming in
that every time there is a different list of categories, so the user depends
on "luck" as to whether the categories of interest are included in the list or
not. In addition, there is no fixed structure that the user knows and can
expect, in order to look for the categories that are of interest to him.
According to Ching-Chi Hsu et al., in "Constructing Personal Digital
Library by Multi-Search and Customized Category" (Proceedings Tenth IEEE Intl.
Conf. on Tools with Artificial Intelligence (Cat. No. 98CH36294), Proceedings
of 10th Int. Conf. on Tools with Artificial Intelligence (ICTA'98), Taipei,
Taiwan, 10-12 Nov. 1998, pages 148-155, XP002141059 1998, Piscataway, NJ, USA,
IEEE, USA ISBN: 0-7803-5214-9), the current search tools for retrieving
information on the WWW are not suitable for building a customized information
repository, because these search tools are designed for general users and
yield only an unstructured collection of documents. Ching-Chi Hsu et al.
provide a personal digital library capable of efficiently retrieving
information on the World Wide Web, which adopts several new strategies to
overcome the shortcomings of current tools. The first strategy,
Classification, merges and organizes the retrieved documents to put them in a
structural, hierarchical frame. The second strategy, User Profile, saves time
and bandwidth for the access of the documents and permits the users to build
their own customized category structure. The third strategy, Multi-Search,
capitalizes on the power of multiple search engines to broaden the domains of
information sources and alleviate the overloading of a single search engine.
Furthermore, they derive in detail the techniques for speeding up the
iterative process of clustering.
Several systems and methods provide a summarization mechanism, which
automatically produces a summary for a document. The summary is produced
based on various rules or other criteria that evaluate the degree of
importance of different parts of the document. The summary is typically
constructed as an extract of important sentences or paragraphs taken from the
document. For example, systems that offer summaries include the LinguistX
software package
from InXight Inc., USA, and the "AutoSummarize" option in Word, available from
Microsoft Inc., USA.
When displaying the full text of a document, many search systems
highlight the search words that were matched in the document text.
The current common practice for utilizing textual information does not
sufficiently satisfy the increasing needs of individuals and organizations.
Searching information in large repositories is often a very tedious process,
preventing effective utilization of information that is potentially available
to the user. In particular, searches made with current techniques in large
repositories often retrieve large document sets, making it extremely difficult
and often impractical for the user to browse and sift through the retrieved
documents and extract the relevant knowledge hidden in the vast amount of
information. The bottleneck in information quest processes thus becomes the
amount of time necessary for users to satisfy their information needs, as
current processes require too much of the user's time.
There is accordingly a need in the art to provide for a system and method
that substantially reduces or overcomes the drawbacks of hitherto known
techniques, and for increasing the effectiveness of user effort in
information quest processes.
SUMMARY OF THE INVENTION
The invention provides for a method for dynamically presenting a set of
documents to users, comprising:
(a) providing a predetermined hierarchy of indexing concepts;
(b) providing a set of documents;
(c) providing a hierarchical display of the indexing concepts such that each
indexing concept in the hierarchy is associated with a sub-set of documents
from among said set of documents;
(d) applying steps that include the following (i) to (iii), as many times as
required:
  (i) determining a subset of documents by utilizing the hierarchical
  display, thereby rendering it an organized document subset;
  (ii) defining at least one indexing concept in the hierarchical display so
  as to constitute a respective organizing concept; and
  (iii) providing at least one organizing hierarchical display of indexing
  concepts, wherein the root of said at least one organizing hierarchical
  display is the respective at least one organizing concept, and each concept
  in said at least one organizing hierarchical display is associated with a
  respective subset of documents, from among said organized document subset.
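Steps (a) to (d) above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the hierarchy and the document index are small hypothetical examples, the names (`HIERARCHY`, `INDEX`, `subtree`, `docs_of`, `hierarchical_display`) are invented for the sketch, and a concept is assumed to be associated with every document indexed by the concept itself or by any of its descendants.

```python
# (a) Predetermined hierarchy: each concept maps to its daughters.
HIERARCHY = {
    "All": ["Countries", "Topics"],
    "Countries": ["Latin America"],
    "Latin America": ["Argentina", "Brazil"],
    "Topics": ["Company Acquisition", "Elections"],
}
# (b) Document set: each document id maps to its indexing concepts.
INDEX = {
    "d1": {"Argentina", "Company Acquisition"},
    "d2": {"Brazil", "Elections"},
    "d3": {"Argentina", "Elections"},
    "d4": {"Company Acquisition"},
}

def subtree(root):
    """Return `root` followed by all of its descendants."""
    nodes = [root]
    for child in HIERARCHY.get(root, []):
        nodes.extend(subtree(child))
    return nodes

def docs_of(concept, docs):
    """Documents in `docs` indexed by `concept` or one of its descendants."""
    family = set(subtree(concept))
    return {d for d in docs if INDEX[d] & family}

def hierarchical_display(root, docs):
    """Associate every concept under `root` with its sub-set of `docs`."""
    return {c: docs_of(c, docs) for c in subtree(root)}

# (c) Hierarchical display of the whole document set.
display = hierarchical_display("All", set(INDEX))
# (d)(i) The user selects a node; its documents become the organized subset.
organized = display["Argentina"]
# (d)(ii)-(iii) "Topics" is chosen as the organizing concept and becomes the
# root of an organizing display restricted to the organized subset.
by_topic = hierarchical_display("Topics", organized)
print(by_topic)
```

Because the organizing display is again a mapping from concepts to document subsets, step (d) can be repeated on its result, for example by selecting `by_topic["Elections"]` and organizing it by "Countries". The structure under each root stays the same on every iteration; only the associated document subsets change.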
The invention further provides for a method for presenting a set of documents
to users, comprising:
(a) providing indexing concepts and a set of documents; the set of
documents is associated with concepts in accordance with triggering
terms in said documents;
(b) selecting a document from said set;
(c) selecting at least one concept associated with said document;
(d) quantifying the importance of the triggering terms of said at least one
concept and, in response thereto, determining the important triggering
terms; and
(e) emphasizing the important triggering terms that correspond to said at
least one concept.
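Steps (d) and (e) above can be illustrated by the following minimal Python sketch. The scoring formula (a per-term weight multiplied by the term's frequency in the document), the `top_k` cut-off, the asterisk markers and the example weights are all assumptions made for illustration; the text above does not prescribe how importance is quantified or how emphasis is rendered.

```python
import re

def important_terms(text, trigger_weights, top_k=2):
    """Rank a concept's triggering terms by weight times frequency in `text`
    and return the `top_k` highest-scoring terms that occur in it."""
    scores = {}
    for term, weight in trigger_weights.items():
        pattern = rf"\b{re.escape(term)}\b"
        count = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if count:
            scores[term] = weight * count
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

def emphasize(text, terms):
    """Surround each occurrence of the given terms with asterisks."""
    for term in terms:
        pattern = rf"\b({re.escape(term)})\b"
        text = re.sub(pattern, r"*\1*", text, flags=re.IGNORECASE)
    return text

# Hypothetical triggering terms for the concept "Company Acquisition".
triggers = {"acquired": 3.0, "shares": 2.0, "company": 0.5}
doc = "Alpha acquired Beta. The company acquired all shares."
top = important_terms(doc, triggers)
print(top)
print(emphasize(doc, top))
```

In a graphical display the asterisks would typically be replaced by highlighting; the same list of important terms could also drive the selection of sentences for a summary.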
The invention further provides for a method for presenting a set of documents
to users, comprising:
(a) providing indexing concepts and a set of documents; the set of
documents is associated with concepts in accordance with triggering
terms in said documents;
(b) selecting a document from said set;
(c) selecting at least one concept associated with said document;
(d) quantifying the importance of the triggering terms of said at least one
concept and, in response thereto, determining the important triggering
terms; and
(e) obtaining a summary based on said important triggering terms.
Still further, the invention provides for a method for dynamically
presenting a set of documents to users, comprising:
(a) providing a predetermined hierarchy of indexing concepts;
(b) providing a set of documents;
(c) providing a hierarchical display of the indexing concepts; the
indexing concepts are associated with the set of documents;
(d) determining a subset of documents by utilizing the hierarchical
display, thereby rendering it an organized document subset;
(e) defining at least one indexing concept in the hierarchical
display so as to constitute a respective "organizing" concept; and
(f) providing at least one organizing hierarchical display of
indexing concepts, wherein the root of said at least one organizing
hierarchical display is the respective at least one organizing concept,
and wherein concepts in said organizing hierarchical display are
associated with the organized document subset;
(g) repeating steps (d) to (f), as many times as required.
Still further, the invention provides for a system that includes a processor
associated with a memory and display for dynamically presenting a set of
documents to users, comprising:
(a) the memory is configured to store a predetermined hierarchy of
indexing concepts;
(b) the memory is configured to store a set of documents;
(c) the processor is configured to provide a hierarchical display of the
indexing concepts such that each indexing concept in the hierarchy is
associated with a sub-set of documents from among said set of documents;
(d) the processor is configured to apply steps that include the
following (i) to (iii), as many times as required:
  (i) determining a subset of documents by utilizing the hierarchy
  of display, thereby rendering it an organized document subset;
  (ii) defining at least one indexing concept in the hierarchy of
  display so as to constitute a respective "organizing" concept; and
  (iii) providing at least one organizing hierarchical display of
  indexing concepts, wherein the root of said at least one organizing
  hierarchical display is the respective at least one organizing concept,
  and each concept in said at least one organizing hierarchical display is
  associated with a respective subset of documents, from among said
  organized document subset.
Yet further, the invention provides for a system that includes a
processor associated with a memory and display for presenting a set of
documents to users, comprising:
(a) the memory is configured to store indexing concepts and a set of
documents; the set of documents is associated with concepts in
accordance with triggering terms in said documents;
(b) the processor is configured to select a document from said set;
(c) the processor is configured to select at least one concept associated
with said document;
(d) the processor is configured to quantify the importance of the triggering
terms of said at least one concept and, in response thereto, to determine
the important triggering terms; and
(e) the processor is configured to emphasize in the display the important
triggering terms that correspond to said at least one concept.
The invention provides for a system that includes a processor associated
with a memory and display for presenting a set of documents to users,
comprising:
(a) the memory is configured to store indexing concepts and a set of
documents; the set of documents is associated with concepts in
accordance with triggering terms in said documents;
(b) the processor is configured to select a document from said set;
(c) the processor is configured to select at least one concept associated
with said document;
(d) the processor is configured to quantify the importance of the
triggering terms of said at least one concept and, in response thereto,
to determine the important triggering terms; and
(e) the processor is configured to obtain a summary in said display
based on said important triggering terms.
Still further, the invention provides for a system that includes a processor
associated with a memory and display for dynamically presenting a set of
documents to users, comprising:
(a) the memory is configured to store a predetermined hierarchy
of indexing concepts;
(b) the memory is configured to store a set of documents;
(c) the processor is configured to provide a hierarchical display of
the indexing concepts; the indexing concepts are associated
with the set of documents;
the processor is configured to apply the following steps (d) to (f) as many
times as required:
(d) determining a subset of documents by utilizing the
hierarchical display, thereby rendering it an organized document
subset;
(e) defining at least one indexing concept in the hierarchical
display so as to constitute a respective "organizing" concept; and
(f) providing at least one organizing hierarchical display of
indexing concepts, wherein the root of said at least one
organizing hierarchical display is the respective at least
one organizing concept, and wherein concepts in said organizing
hierarchical display are associated with the organized
document subset.
BRIEF DESCRIPTION OF THE DRAWINGS
For better understanding, the invention will now be described by way of
examples only, with reference to the accompanying drawings in which:
Figure 1 - illustrates a screen result of a database search system in
accordance with the prior art;
Figure 2 - illustrates a screen result of a database search system in
accordance with the prior art;
Figure 3 - illustrates a generalized computer system.
Figure 4 - illustrates a flowchart of the preferred embodiment of the
invention.
Figure 5 - illustrates a Top pane: the concept hierarchical display. Left Top
pane: tree representation of hierarchical document set display. Bottom:
document list of a document subset.
Figure 6 - illustrates a Left Top pane: pie representation of hierarchical
document set display.
Figure 7 - illustrates a Left Top pane: pie representation of hierarchical
document set display.
Figure 8 - illustrates an overlapping window - Top pane: document important
terms. Bottom pane: document full text and terms highlighting.
Figure 9 - illustrates a Left Top pane: a document subset that has been
"organized by". Right Top pane: the topics that have performed the
"organization".
Figure 10 - illustrates an overlapping window - Top pane: document important
terms. Bottom pane: document summary and terms highlighting.
Figure 11 - illustrates a Left Top pane: tree representation of hierarchical
document set display.
Figure 12 - illustrates a Left Top pane: tree representation of hierarchical
document set display. Overlapping window - Top pane: automatic important terms
selection. Bottom pane: document text and automatic selected terms
highlighting.
Figure 13 - illustrates a Left Top pane: a document subset that has been
"organized by" twice. Right Top pane: the topics that have performed the
second "organization"; and,
Figures 14 to 21 illustrate a succession of screen results obtained by
applying the method in accordance with one embodiment of the invention.
DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT
It should be noted that in the context of the invention, the terms concept
and category are used interchangeably. In connection with some embodiments
the term node signifies concept or category.
The invention provides novel methods for utilizing textual information
that considerably increase the effectiveness of the end user when dealing with
large volumes of documents. A typical embodiment of the invention is used in a
computer system, as illustrated e.g. in Fig. 3. The computer system (30)
includes a processor unit (31) with input and output (32 and 33) and
associated display (32) and memory (not shown). The computer system (30) is
configured to display documents and information about them in order to fulfill
some information needs of end users (referred to in the following as
"system"). The invention is, of course, not bound by any specific realization
of a computer system and may include any known structure such as a
conventional Personal Computer (P.C.) in either stand-alone or network
configuration, all as required and appropriate.
Fig. 4 provides a high-level flow chart of a typical embodiment of the
invention within some computer system (the details of the components of the
invention are described below). The system presents a document set (41) in a
hierarchical display (42). The structure of the display may be modified
dynamically by an "organize by" operation (43), maintaining, however, a
predetermined structure of the hierarchy. The user may select a node (standing
for an indexing concept) (44) within the hierarchy, and ask for a display of
information about the documents that are associated with the selected node
(45). The displayed information may include one or more of the following: the
number of documents in the sub-set associated with the specified indexing
concept, the percentage thereof from among the entire document set, the
document title, meta-data elements (such as source and date) and optionally a
short summary of the document. The information is of course not limited to the
specified details and may vary, depending upon the particular application.
The user may then select a particular document (46) for display, leading to
the display of the full document text or of a summary of the document. The
content of the summary, as well as highlighting within the text, are
determined automatically by some indexing concepts that are determined by
default to be in the focus of attention of the user. The user may then select
different indexing concepts to be in focus, leading to modified highlighting
and summary.
The rest of the section describes the details of the preferred embodiment of
the invention.
Setting and Input
Document set
In accordance with the invention, there is provided a method and system for
presenting document sets and their content to the user of a system in an
effective manner. The invention thus refers to any situation in which some
document set has to be presented by the system, at any point of time, for
purposes such as exploration, scanning, reading or analysis. The term document
should be
construed in a broad manner to encompass any record in a database including,
but not limited to, a text and/or text/image document. The displayed document
set may be e.g. the output of a search query that is applied to a search
engine (e.g. Alta Vista(TM)), or an entire document collection indexed by the
system, or any other document set that is provided as an input for displaying
to the user in accordance with the invention.
Indexing concepts
The documents in the presented document set are characterized by indexing
concepts, as described above. That is, a typical document is characterized by
several indexing concepts that are logically associated thereto. A document is
considered indexed by the indexing concepts characterizing it.
Concept hierarchy
The possible indexing concepts for documents in the system are arranged
in a predetermined hierarchy of indexing concepts (hierarchy in short), as
illustrated e.g. in Fig. 5 (31). That is, a parent concept (which is an
indexing concept by itself) is defined for each indexing concept. For example,
in Fig. 5 (33) "Countries" is the parent of (34) "Latin America". One or
several concepts that are defined as roots of the hierarchy may not have a
parent node. For example, in Fig. 5 (32) "All" is the root. Usually, each
concept in the hierarchy has only one parent, giving the hierarchy the form of
a tree data structure (or several trees in case of several roots). The
described functionality can also accommodate situations where some nodes have
more than one parent. The terms concept and node are used interchangeably to
denote an indexing concept within the hierarchy. Those versed in the art will
readily appreciate that the structure of the indexing concept hierarchy is
substantially predetermined.
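One simple way to represent such a hierarchy is a mapping from each concept to its parent, with roots mapped to no parent. The following Python sketch assumes the single-parent (tree) case; the concept names follow the examples in the text, while the names `PARENT`, `roots`, `children` and `ancestors` are hypothetical.

```python
# Each indexing concept maps to its parent; a root has no parent (None).
PARENT = {
    "All": None,
    "Countries": "All",
    "Latin America": "Countries",
    "Argentina": "Latin America",
    "Brazil": "Latin America",
}

def roots():
    """Concepts without a parent (one tree per root)."""
    return sorted(c for c, p in PARENT.items() if p is None)

def children(concept):
    """Daughters of a concept, in alphabetical order."""
    return sorted(c for c, p in PARENT.items() if p == concept)

def ancestors(concept):
    """Chain of parents from the concept up to its root."""
    path = []
    parent = PARENT[concept]
    while parent is not None:
        path.append(parent)
        parent = PARENT[parent]
    return path

print(roots())
print(children("Latin America"))
print(ancestors("Argentina"))
```

To accommodate nodes with more than one parent, each value could instead be a set of parents, in which case `ancestors` would have to follow every parent chain.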
Those versed in the art will readily appreciate that the predetermined
structure does not necessarily mean that the indexing categories may not be
subject to modification.
For example, the hierarchy may include an indexing concept
"Companies", such that some of its specific daughters are not predetermined.
The
system may include a mechanism to recognize dynamically that a new name
appearing in a document is a company, and define that name as an indexing
concept for the document which is a daughter of the node "Companies".
By another embodiment, notwithstanding the predetermined nature of the
hierarchy, the system includes a filtering mechanism which, in response to a
filtering criterion, decides whether or not an indexing concept is displayed
in the hierarchy. For example, the filtering criterion may filter out concepts
associated with a small number of documents, below a certain threshold, or
concepts that are associated only with documents that score low for the search
query whose results list constitutes the document set to be displayed.
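The two example criteria can be sketched as a single filter function. This is a minimal Python illustration; the thresholds `min_docs` and `min_score`, the function name `visible_concepts` and the sample data are assumptions, not values given in the text.

```python
def visible_concepts(doc_sets, scores, min_docs=2, min_score=0.5):
    """Return the concepts to display. A concept is filtered out if it has
    fewer than `min_docs` associated documents, or if every associated
    document scores below `min_score` for the search query."""
    visible = set()
    for concept, docs in doc_sets.items():
        enough_docs = len(docs) >= min_docs
        has_relevant_doc = any(scores[d] >= min_score for d in docs)
        if enough_docs and has_relevant_doc:
            visible.add(concept)
    return visible

# Hypothetical association of concepts with documents, and query scores.
doc_sets = {
    "Argentina": {"d1", "d3"},
    "Brazil": {"d2"},
    "Elections": {"d2", "d5"},
}
scores = {"d1": 0.9, "d2": 0.2, "d3": 0.4, "d5": 0.1}
print(visible_concepts(doc_sets, scores))
```

Here "Brazil" is hidden because it has a single document, and "Elections" is hidden because both of its documents score low; the underlying hierarchy itself is not changed by the filter.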
The Concept Hierarchical Display
In an embodiment of the invention, a system displays the concept
hierarchy (in a hierarchical display) by any visualization mechanism that is
suitable for displaying a hierarchical structure. The most typical display
form for a hierarchy is a tree display, as in Fig. 5 (37), in which each node
of the tree corresponds to one concept in the hierarchy. Clicking on a node
(or on a special sign, such as '+', that is attached to the node) leads to
displaying or hiding its daughters. Other hierarchical display mechanisms may
show one level of siblings in the hierarchy at a time, by showing a list of
elements, each represented by some symbol or icon, where clicking on an
element leads to displaying its siblings, while some other option enables
getting back (up) in the hierarchy (for example, the "My Computer" icon in the
Windows-98/NT system available from Microsoft Inc, USA). For the purpose of
the invention, any hierarchical display mechanism can be used to display the
hierarchy of indexing concepts, where user interaction with the display
mechanism controls the display of different portions of the
hierarchy. Another non-limiting example of hierarchical display is a chart,
e.g. a
pie chart.
Hierarchical Document Set Display
This subsection defines a hierarchical display of a presented document set
(containing documents indexed by indexing concepts). The hierarchical display
serves as a "table of contents" for the document set, which facilitates
navigating and browsing of document sets. The scheme of a hierarchical
document set display is available in previous systems, but the invention
includes some specific enhancements to this scheme, as noted below.
The hierarchical document set display is based on the concept hierarchical
display, and can be realized by any mechanism for displaying hierarchies, just
like the concept hierarchical display discussed above. For example, Fig. 5
(37) is a hierarchical document set display in tree form. In addition to the
predetermined hierarchy of concepts (as explained above), a set of documents,
which is a subset of the currently presented document set, is associated with
each concept (node) in the hierarchy. In Fig. 5, a set of documents is
associated with the node (39) "Countries". The associated document set for a
concept in the hierarchy (the document set of the node) contains all documents
that are indexed by that concept. In certain embodiments of the invention, the
associated document set for a concept is defined to include all documents
associated with any of its descendants in the hierarchy. For example, the
document set of the concept "Countries" includes all documents indexed by any
country or geographical region, assuming that these concepts are all
descendants of the concept "Countries" in the hierarchical display.
It is simple to compute the document set that is associated with a given
node in the hierarchical display. As a non-limiting example, such computation
may scan all documents in the displayed document set and check for each of
them if it is associated with the given concept.
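The scan described above can be sketched as follows. This is only an illustrative sketch: the toy hierarchy, the document identifiers and the function names are assumptions made for the example, and the variant shown is the embodiment in which a node also collects the documents of its descendants.

```python
from typing import Dict, List, Set

# Hypothetical toy hierarchy: concept -> list of daughter concepts.
HIERARCHY: Dict[str, List[str]] = {
    "Countries": ["Latin America", "West Europe"],
    "Latin America": ["Argentina", "Brazil"],
    "West Europe": ["France"],
}

# Hypothetical index: document id -> indexing concepts attached to it.
DOC_CONCEPTS: Dict[str, Set[str]] = {
    "d1": {"Argentina"},
    "d2": {"Brazil", "France"},
    "d3": {"Latin America"},
    "d4": {"France"},
}


def descendants(concept: str) -> Set[str]:
    """Return the concept together with all of its descendants."""
    found = {concept}
    for daughter in HIERARCHY.get(concept, []):
        found |= descendants(daughter)
    return found


def node_document_set(concept: str, displayed: Set[str]) -> Set[str]:
    """Scan the displayed documents and keep those indexed by the concept
    or by any of its descendants."""
    wanted = descendants(concept)
    return {doc for doc in displayed if DOC_CONCEPTS[doc] & wanted}


all_docs = set(DOC_CONCEPTS)
print(sorted(node_document_set("Latin America", all_docs)))  # ['d1', 'd2', 'd3']
```

A linear scan of this kind is sufficient for small displayed sets; an implementation handling large sets might instead keep an inverted index from concepts to documents, which is a design choice not stated in the text.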
A hierarchical document display thus includes a display of the concept
hierarchy (as described above), augmented with some information at each
concept node about the document set associated with that node. The information
about the associated document set may include, by one embodiment, one or more
of the following items:
1. The number of documents in the associated set. In Fig. 5 (40) there are
13 documents in the set associated with "Latin America".
2. The percentage (proportion) of documents associated with the concept
relative to the number of documents associated with its parent in the
hierarchy. In Fig. 3 (41) 7% of the documents in the set associated with
"Latin America" relate to "Argentina" (note that a 0% number
represents a small positive percentage that was rounded to 0).
3. Some key information about prominent topics described within
documents of the document set, such as the most frequent or prominent
key terms within the documents of the set, and/or the list of all or some
of the indexing concepts for the documents.
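As a rough illustration of items 1 and 2, the count and the rounded percentage shown next to a node could be derived as in the sketch below. The function name and the label format are assumptions made for the example. The example data also shows how a non-empty set can be displayed as 0% after rounding.

```python
from typing import Set


def node_info(node_docs: Set[str], parent_docs: Set[str]) -> str:
    """Format the document count and the share relative to the parent's set."""
    count = len(node_docs)
    if not parent_docs:
        # No parent set to compare against (e.g. at the root of the display).
        return f"{count} (n/a)"
    percent = round(100 * count / len(parent_docs))
    return f"{count} ({percent}%)"


parent = {f"doc{i}" for i in range(300)}  # hypothetical parent node set
daughter = {"doc0"}                       # one document out of 300
print(node_info(daughter, parent))        # 1 (0%)
```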
It should be noted that the nature and form of presenting the specified
types of information (in this particular example, number of documents,
percentage and prominent topics) is only an example, and accordingly other
types of information may be presented in addition to or instead of the
specified items. Likewise, and as will be explained in greater detail below,
the concepts and their associated information are not limited to a specific
form of graphical and/or textual representation.
Reverting now to the specified types, these or other types of information
may be presented either textually or graphically. Fig. 5 (37) is a tree
display of the hierarchy with associated information about the document set of
each node, containing the number of documents and the percentage relative to
the parent node document set. In particular, since the hierarchical display of
indexing concepts
may include numerical data, such as numbers and proportions, mechanisms for
displaying quantitative information may be used for the display. For example,
a pie (or bar) chart can be used to display several sibling nodes (daughters
of a common parent). Fig. 6 (44) is a pie representing the daughter nodes of
"Countries". Each pie slice corresponds to one concept and its size indicates
the proportion of its associated document set relative to the parent node
document set. The quantitative graphical display mechanism may be interactive,
in a similar manner to the interactive tree presentation of the concept
hierarchy. For example, double clicking on a pie slice may lead to displaying
the pie of the daughters of the selected node. For example, double clicking on
the slice in Fig. 4 (45), corresponding to "Latin America", leads to the
display in Fig. 7 (47), a pie presenting the daughters of "Latin America".
The displayed daughters of a node may be sorted alphabetically, or by
some characterizing quantitative information, in particular by the size of the
associated document set for each daughter.
In accordance with the invention, different display mechanisms are
provided. Several different display mechanisms may be used interchangeably
within a system for the hierarchical document set display, letting the user
switch from one to another while maintaining the position within the
hierarchy. For example, a system may combine both a pie chart display and a
tree display. When viewing the tree display with a certain node selected, and
switching to the pie chart display, the system will present the pie that
corresponds to the daughters of the selected node.
The graphical display may present further information about the
documents in the associated document set, such as their titles, meta-data
elements, document summaries or the full text of the document. For example,
Fig. 5 (42) is the list of titles for the documents associated with the node
"Latin America". Fig. 10 (48) is a summary of a selected document in the
document list. A display of the full text of a document is presented in Fig. 8
(52). (54) is a list of indexing concepts for the document.
Optionally, concepts in the hierarchical display to which no documents are
attached may be omitted from the display. For example, in Fig. 5 no documents
are associated with the indexing concept (36) "Bahamas" in the concept
hierarchy, thus in the hierarchical indexing concept display, this concept
does not
appear as a daughter of (40) "Latin America".
In a more generalized embodiment, the concepts in the hierarchical display
are subject to a filtering criterion in order to determine whether or not they
will be displayed in said hierarchy. A typical, yet not exclusive, example of
a filtering criterion concerns which folders in deeper levels of the hierarchy
tree will be displayed. The necessity of this criterion stems from the fact
that the display area allocated to the hierarchy in the display screen may not
be sufficient to accommodate the entire hierarchy, and accordingly only a
portion thereof is displayed, e.g. a few levels, and only in response to user
selection are further levels displayed (instead of the previously displayed
higher levels). For example, the top level and only some of its daughters are
shown, with, say, a "..." symbol indicating that there are more daughters that
can be displayed if the user explicitly opens the parent node (as, say, in
AltaVistaTM). A more advanced filtering criterion may rank folders (standing
in this embodiment for nodes) to be presented according to, say, the number of
documents in each folder and the quality of their match to the current "query"
(query means the entire sequence of operations that led to the display of the
current results). Thus, folders having a high rank may be displayed in the
limited display zone instead of other folders having a lower rank,
notwithstanding the fact that the higher ranked folders reside in a lower
level in the hierarchy as compared to the lower ranked folders. Obviously, the
user can display the rest of the folders (which are currently not displayed
due to their low rank) by, say, clicking the specified "..." symbol.
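One possible reading of the ranking criterion is sketched below. The scoring formula (document count multiplied by a match quality between 0 and 1), the folder names and the function name are all assumptions introduced for illustration; the text does not prescribe a particular formula.

```python
from typing import Dict, List, Tuple


def visible_folders(
    folders: Dict[str, Tuple[int, float]], max_rows: int
) -> List[str]:
    """Pick the folders to show in a display area of max_rows rows.

    folders maps a folder name to (number of documents, match quality).
    The rank used here, count * quality, is an assumed scoring formula.
    """
    # Folders with no documents are omitted from the display altogether.
    candidates = {name: v for name, v in folders.items() if v[0] > 0}
    ranked = sorted(
        candidates,
        key=lambda name: candidates[name][0] * candidates[name][1],
        reverse=True,
    )
    if len(ranked) <= max_rows:
        return ranked
    # Reserve the last row for the "..." symbol that expands the rest.
    return ranked[: max_rows - 1] + ["..."]


folders = {
    "Business": (8, 0.9),
    "Products": (8, 0.5),
    "Paging": (122, 0.8),
    "Bahamas": (0, 0.0),
    "Regulation": (3, 0.4),
}
print(visible_folders(folders, max_rows=3))  # ['Paging', 'Business', '...']
```

Because the ranking ignores depth, a highly ranked folder from a deeper level can occupy a row ahead of a shallower folder, which matches the behaviour described in the paragraph above.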
According to an embodiment of the invention, an "Others" node is added
to each list of siblings having a common parent. By this embodiment, the
documents associated with the "Others" node are those associated with the
parent node but not with any of its daughters in the concept hierarchy. For
example, an
"Others" node that is a daughter of the node "Europe" will be associated with
all
documents indexed by "Europe" but not by any particular European country.
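The "Others" node amounts to a set difference, as the following minimal sketch shows. The document identifiers and the function name are hypothetical.

```python
from typing import Dict, Set


def others_documents(
    parent_docs: Set[str], daughter_docs: Dict[str, Set[str]]
) -> Set[str]:
    """Documents of the parent node that belong to none of its daughters."""
    covered: Set[str] = set()
    for docs in daughter_docs.values():
        covered |= docs
    return parent_docs - covered


europe = {"d1", "d2", "d3", "d4"}                    # hypothetical ids
daughters = {"France": {"d1", "d2"}, "Spain": {"d2"}}
print(sorted(others_documents(europe, daughters)))   # ['d3', 'd4']
```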
The hierarchical indexing concept display may be restricted to a particular
sub-part of the hierarchy, determined by some mechanism, rather than presenting
the full hierarchy. For example, it is possible to present the hierarchical
indexing
concept display using only the "Countries" sub-tree of the hierarchy. This
non-limiting modification also falls in the definition of predetermined
hierarchical indexing concept display.
Dynamic Hierarchical Document Indexing Concept Set Display
The hierarchical indexing concept set display serves as a "table of contents"
for the document set and can be used as a method for displaying document sets
to the user. However, the hierarchical indexing concept set display is limited
because it has a static structure, which is equivalent to the structure of the
concept hierarchy. For example, when presenting a large document set by the
hierarchical indexing concept display, one of the leaves of the tree may be
the country "France", as in Fig. 11 (55), containing 45 documents. No further
organization is given for these 45 documents, since "France" is a leaf in the
concept hierarchy. This section defines a novel mechanism provided by the
invention for presenting dynamic "table of contents" displays for document
sets, enabling the user to dynamically modify and refine the document display
whilst maintaining the predetermined hierarchical indexing concept display.
This mechanism is called the dynamic hierarchical document set display
(dynamic display). The dynamic display is by itself hierarchical, utilizing
the specified predetermined hierarchy of categories, and thus provides all the
functionality of the hierarchical document set display, as described above.
In accordance with one embodiment, at the initial stage of the dynamic
display, a document set is presented in some manner, possibly by the (static)
hierarchical indexing concept display. The dynamic display is created by a
series of "organize by" operations, each specified by two definitions:
1. Defining a document subset (or set) to be organized (constituting the
"organized" document subset) by the "organize by" operation. For a
hierarchical presentation, selecting the document set may, preferably,
correspond to selecting a node in the hierarchical document set
display. For example, selecting the node "France" in Fig. 11 (55)
defines the document set associated with this node as the subset to be
organized. This subset is termed the organized document subset.
When the selected subset corresponds to a node in the display, that
node is termed the organized node. The selection of the "organized"
document subset is performed on the basis of information displayed
in the hierarchy, e.g. defining an indexing concept in the hierarchy as
an organized by concept and rendering the documents associated
therewith as the specified "organized" document subset.
2. Defining a node of the concept hierarchy to serve as the root of
the sub-tree by which the document subset will be organized. This
node (or corresponding sub-tree) is termed the organizing node
(sub-tree). For example, the node "Companies" may be selected as
an organizing node (57 in Fig. 9), to organize the document subset
associated with the node "France".
The effect of applying the "organize by" operation is to provide an
"organizing" hierarchical indexing concept display (as defined above) for the
organized document subset, which is restricted to the sub-hierarchy under the
organizing node. In the above example, the documents associated with the node
"France" will be displayed in a hierarchical indexing concept display that is
restricted to the sub-tree of the concept hierarchy rooted by the node
"Companies" (having all companies as daughters). This display appears in Fig.
9 (60), where the "Companies" node (60) is the root of the hierarchical display
for the "France" document set, and (58) are the daughters of (60). This would
have the effect of presenting which companies appear as indexing concepts in
documents that are also indexed by "France", along with quantitative
information about the documents indexed by each company. For example, there
are 27 documents indexed by both "France" and "Boeing". The indexing concept
Boeing (61) signifies, due to its position in the hierarchy, the path from the
root, to wit: All->countries->West Europe->France->Companies->Boeing. Put
differently, indexing concept (61) is associated with the documents indexed by
both "France" (a country in West Europe) and "Boeing" (a company). The
pertinent information that is associated with this concept is 27 (number of
documents) and 60% (standing for 27 documents out of the 45 associated with
indexing concept (60)) for Boeing. Accordingly, any concept in the indexing
concept hierarchy display is associated with a respective subset of documents
from among the organized document subset. Obviously, a document may be
associated with more than one concept of the organizing hierarchical display.
A "respective" subset of documents encompasses also the special situation in
which a concept is associated with no documents.
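The "organize by" operation can be sketched as below for a single level of daughters under the organizing node. The data, the identifiers and the choice to omit empty daughters are assumptions made for the illustration; a fuller implementation would recurse into the whole sub-tree of the organizing node.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical organizing sub-tree and document index.
HIERARCHY: Dict[str, List[str]] = {
    "Companies": ["Boeing", "Airbus", "Lockheed Martin"],
}
DOC_CONCEPTS: Dict[str, Set[str]] = {
    "d1": {"France", "Boeing"},
    "d2": {"France", "Boeing", "Airbus"},
    "d3": {"France"},
    "d4": {"Spain", "Airbus"},
}


def organize_by(
    organized_docs: Set[str], organizing_node: str
) -> List[Tuple[str, int, int]]:
    """Break the organized document subset down by the daughters of the
    organizing node, returning (concept, count, percent of organized subset)."""
    rows = []
    for daughter in HIERARCHY.get(organizing_node, []):
        subset = {d for d in organized_docs if daughter in DOC_CONCEPTS[d]}
        if subset:  # daughters with no documents are left out of the display
            percent = round(100 * len(subset) / len(organized_docs))
            rows.append((daughter, len(subset), percent))
    return rows


france_docs = {d for d, c in DOC_CONCEPTS.items() if "France" in c}
print(organize_by(france_docs, "Companies"))
# [('Boeing', 2, 67), ('Airbus', 1, 33)]
```

Note that document "d2" is counted under both "Boeing" and "Airbus", reflecting the remark that a document may be associated with more than one concept of the organizing display.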
The "organize by" operation may be interpreted as a recursive application of
the hierarchical indexing concept display, as its effect is to provide a new
hierarchical display for a node within a previously displayed hierarchy.
However, the hierarchical display remains predetermined, considering that in
the modified presentation substantially the same concepts are employed, which
makes it easier for the user to follow "well known" and familiar concepts,
even after applying the "organizing" operation.
As a special case, the organizing node can be the root of the concept
hierarchy, in which case the organized document subset will be displayed by a
hierarchical indexing concept set display that corresponds to the entire
concept
hierarchy. A system may apply only this special case (always organizing by the
full hierarchy, considering the root as the organizing node), in which case it
is necessary to define only the organized node in order to apply an "organize
by" operation. Furthermore, a system may implement the hierarchical document
display such that at each point of time the user view is focused only on one
node of the tree. In this case, applying the "organize by" operation implies
implicitly that the organized node is the currently displayed node, obviating
the need for an explicit definition of the organized node. If desired, by a
specific embodiment, the default definition of the organizing concept as the
root node and the organized by concept as the currently displayed node may be
realized by a single user operation, for example, clicking on a predetermined
icon.
As a particular (but not the only) mechanism of operation, in the case
where the organized document subset corresponds to a specific selected node in
a hierarchical display, the hierarchical display of the organized subset is
displayed as a new, dynamically created, daughter (or daughters) of the
selected organized node. In the example above, the node "Companies" in Fig. 7
(60) is added dynamically as a new daughter node of (59), the node "France",
modifying the hierarchical display that was presented to the user just before
applying the "organize by" operation. Several variations of the method may be
implemented, in which a new daughter node either replaces or is added as a
sibling to the previously existing daughters of the organized node.
Notwithstanding the modification, the predetermined hierarchy of concepts is
maintained in the sense that the category "company" is already known to the
user (see e.g. Fig. 3) before applying the specified "organize by" operation.
Once a modified hierarchical display has been created by applying an
"organize by" operation, as described above, any part of the new display may
be subject to further "organize by" operations. In particular, a node that was
added to the hierarchy in a previous "organize by" operation may be selected as
the organized subset in a later operation. Subsequent "organize by" operations
on the modified dynamic display may be applied as requested by the user. In
Fig. 13 (69)
the node "Boeing", which was created by a previous "organize by" operation
(as in Fig. 9), is later selected as an organized node, where the organizing
node is (65) "Activities". Thus, in this example, a node "Activities" (70) is
dynamically added to the display, and its daughters (64) (signifying documents
indexed by both "France" and "Boeing" and by some activity) are each associated
with information that pertains to these documents. For example, there are 19
documents indexed by "France" (67), "Boeing" (69) and "Agreement" (71). The
specified "organize by" operation may be applied recursively (repeated) as many
times as required, each time in respect of newly selected "organized by" and
"organizing" concepts.
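Repeated "organize by" operations narrow the document set step by step: a node reached through several operations stands for the documents indexed by every concept chosen along the way. The sketch below illustrates this under simplifying assumptions: the data and names are hypothetical, only direct indexing is considered, and purely structural nodes such as "Companies" or "Activities" are left out of the path.

```python
from typing import Dict, List, Set

# Hypothetical document index.
DOC_CONCEPTS: Dict[str, Set[str]] = {
    "d1": {"France", "Boeing", "Agreement"},
    "d2": {"France", "Boeing"},
    "d3": {"France", "Airbus", "Agreement"},
    "d4": {"Spain", "Boeing", "Agreement"},
}


def documents_for_path(path: List[str]) -> Set[str]:
    """Documents indexed by every concept selected along a dynamic path."""
    required = set(path)
    return {doc for doc, concepts in DOC_CONCEPTS.items() if required <= concepts}


print(sorted(documents_for_path(["France"])))                         # ['d1', 'd2', 'd3']
print(sorted(documents_for_path(["France", "Boeing"])))               # ['d1', 'd2']
print(sorted(documents_for_path(["France", "Boeing", "Agreement"])))  # ['d1']
```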
The basic form of the "organize by" operation may consist of selecting one
node in a hierarchical display as the organized node, and one node in the
concept
hierarchy as the organizing node. The following paragraph describes extensions
to the basic form.
Multiple selection for simultaneous operation
Multiple selection of organizing nodes within a single "organize by"
operation has the effect, by one embodiment, of adding all the selected nodes
as daughters of the organized node. For example, the organized node "France"
may be organized by "Companies" and "Activities", which means that all the
documents associated with the indexing concept "France" will be organized by
the indexing concept "Companies" and separately by the indexing concept
"Activities". If desired, the nodes "Companies" and "Activities" are added as
daughters to "France".
Multiple selection of organized nodes has the effect of applying the "organize
by" operation simultaneously to all selected nodes. For example, applying an
"organize by" operation with the same organizing node to both nodes "France"
and "Spain". The net effect of selecting more than one organized node is that
each node is associated with its respective organized by subset of documents
and
then some operator or operators is (are) applied to the specified subsets so
as to constitute a resulting organized subset of documents that is then
subject to the organizing operation. In the latter example there is a first
subset of documents associated with France and a second subset of documents
associated with Spain. By this particular example, the operator that is
applied to the subsets is OR, giving rise to a document subset that includes
documents that pertain only to Spain, only to France, or to both. This
resulting subset of documents is then subject to the organizing operation by
one or more organizing concepts.
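Combining the subsets of several organized nodes can be sketched with ordinary set operators. OR (union) is the operator used in the example above; the AND (intersection) call is shown only as an assumed alternative operator, and the function name and data are hypothetical.

```python
import operator
from functools import reduce
from typing import Callable, List, Set


def combine_subsets(
    subsets: List[Set[str]],
    op: Callable[[Set[str], Set[str]], Set[str]] = operator.or_,
) -> Set[str]:
    """Merge the document subsets of several organized nodes with one operator."""
    if not subsets:
        return set()
    return reduce(op, subsets)


france = {"d1", "d2"}   # hypothetical documents associated with France
spain = {"d2", "d3"}    # hypothetical documents associated with Spain

print(sorted(combine_subsets([france, spain])))                 # ['d1', 'd2', 'd3']
print(sorted(combine_subsets([france, spain], operator.and_)))  # ['d2']
```

The combined set would then be passed on as the organized document subset of the subsequent organizing operation.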
In accordance with an embodiment of the invention, the set of documents may
be obtained by applying a search query to, say, a conventional search engine
that operates similarly to AltaVistaTM, and displaying the resulting set along
with the hierarchical display of the invention.
Thus, for example, Figs. 14 to 21 show a succession of screen results obtained
by applying the method in accordance with one embodiment of the invention.
Fig. 14 illustrates a predetermined indexing concept hierarchy (140) that
includes 11,000 documents (142) that constitute the document set and are
broken
down by the hierarchy concepts.
Applying a query (e.g. pagers 143) results in 318 documents (see 151 in
Fig. 15) that are broken down by the concept hierarchy. The list of documents
is displayed (152), and, by this example, the first four documents are shown
in the first page. The query itself ("pagers") is automatically assigned to
categories in the hierarchy as if it were a document. The resulting category is
illustrated in the "Related category" field (153), to wit: Telecom All >
Applications > Messaging > Paging. All the categories, except for "Paging", are
shown in the hierarchical presentation (151, 154, and 155). Paging is a sub
category of Applications and can be shown if the Browse section of the screen
is enlarged, or if the user decides to show it by, say, clicking a specified
symbol (as described above).
Clicking the Paging category will render the latter the organized by category
and All (i.e. the root) the organizing category. The net effect is that the 122
documents that are associated with Paging are now broken down by the entire
hierarchical
tree, as shown in Fig. 16. The "results for" field shows that the display
corresponds to the query "paging" (which by this example matches one of the
categories). The four documents shown in the search section are the first 4
out of
122 documents that meet the search.
Fig. 17 is the same as Fig. 16 except that now the documents that are
associated with sub-category Telecom Service Companies (171) are shown. This
may be achieved by simply clicking the relevant category in the hierarchy (by
this
particular example Telecom Service Companies - not shown in the hierarchy Fig.
17) and the documents associated therewith are shown. The documents that are
shown obviously relate to "paging" and telecom service companies.
Fig. 18 illustrates yet another degree of detail wherein only documents that
pertain to Skytel (181) (which forms a sub-category of the specified Telecom
Service Companies - not shown in the hierarchy of Fig. 18) are shown.
Now, Skytel constitutes the organized by concept and the documents
associated therewith constitute the organized document subset. Next, clicking
the Zoom In symbol (182) will render the Telecom All root category (183) the
organizing category, and the resulting hierarchical display is depicted in Fig.
19.
There are 12 documents (191) broken down by the predetermined
categories. Thus, for example, 8 documents are associated with the category
Business (192). Categories that have no documents associated therewith are not
shown. Incidentally, the information that pertains to the subset of documents
associated with each category is simply the number of documents (12 and 8 in
the latter example). The 12 documents concern both Skytel and paging. Four
out of the 12 pertinent documents are shown in the Search section of the
screen (193).
Considering now that only the documents from among the specified 12
documents that concern product companies (the "products" node) are of
interest, the user simply clicks the products category (200) in Fig. 20 and
the 8 relevant documents are shown at the search section of the screen (201).
If, from among the specified 8 documents, only those that concern
Motorola are of interest, the user simply clicks the Motorola category (210) in
Fig. 21 and in response thereto the pertinent 3 documents are shown.
Selecting text terms and segments for focused reading
In addition to the dynamic display, which provides a "table of contents"
style display for document sets, the invention provides, in accordance with
another aspect thereof, new mechanisms for presenting parts of or all of the
text of a document in a dynamic and effective manner. These mechanisms direct
the attention of the user to relevant parts of the document and enable quick
focusing on these parts. For example, these relevant parts might be text
segments that contain relevant information for the user or can help in
deciding about the relevance of the document. The decision of which parts of
the document should be in focus is dynamic, and may be changed according to
user guidance or to the context in which the document is being displayed.
There are two typical ways in an embodiment of the invention for focusing
the user's attention on particular parts or pieces of information in the
document. The first is by highlighting the parts of the text that should be in
focus (based on important triggering terms), and the second is by creating a
summary for the document that contains the parts in focus (based on important
triggering terms).
According to the invention, the parts of the document which should be
highlighted or be included in a summary are determined according to a set of
(one or more) indexing concepts, among the indexing concepts of the document,
that are considered to be in focus at a certain stage of user interaction with
the system. These indexing concepts are called focus indexing concepts.
According to the invention, the highlighting and summarization for a
given focus indexing concept is determined by the important triggering terms
for that concept. The triggering terms for a concept are the occurrences in the
document of all terms which entail the attachment (or classification) of the
concept to the document. Highlighting and an extracted summary will include
the
important triggering terms for the concept, or short segments of text that are
considered to be important. The degree of importance of terms and segments may
be quantified by some scoring mechanism, where the degree of importance of the
terms in a segment is a factor in determining the degree of importance of the
segment. The invention provides dynamic methods for determining (quantifying)
which triggering terms and segments are important in a given context of the
user interaction with the system that displays the documents.
It should be noted that in the context of obtaining a summary according to
one embodiment of the invention, the quantifying step assigns the same degree
of importance to all triggering terms. The latter option does not apply to the
aspect which concerns emphasizing important triggering terms. Put differently,
insofar as emphasizing important triggering terms is concerned, not all the
triggering terms are ranked with the same degree of importance.
The important triggering terms and segments are presented to the user,
either in the form of an extracted summary, which contains the important terms
and/or segments, or by highlighting the important terms within the display of
the full document, or by some combination of the two methods. The term
"important" refers to the case where the degree of importance of triggering
terms and segments can be quantified and the display is restricted to those
with the highest importance. The number of terms or segments to be included in
the display is determined by some mechanism, such as a threshold on the degree
of importance or on the number of items to be included. This ranking mechanism
by degree of importance is necessary when there are many important terms or
segments and it is desired to limit their display in order to achieve optimal
focus of attention by the user. Fig. 10 (48) displays a summary of a document,
in which the important terms are highlighted. (The important terms were
determined relative to the highlighted indexing concepts "Latin America" (50)
and "Lockheed Martin" (51), which are in the focus of interest to the user, as
explained below). The summary includes segments of the text that contain the
important terms. Fig. 8 (52) presents a full display of a document text, in
which
important terms (relative to the indexing concept (54), see below) are
highlighted.
While the general scheme of making some form of highlighting triggering terms
in a document for display is available in previous systems, the invention, by
this
aspect, concerns selecting important terms, described below.
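Restricting the display to the most important terms, by a threshold on the degree of importance or on the number of items, might look like the following sketch. The scores, the parameter names and the function name are assumptions for illustration only; how the scores are obtained is a separate matter.

```python
from typing import Dict, List, Optional


def select_important(
    scores: Dict[str, float],
    min_score: float = 0.0,
    max_items: Optional[int] = None,
) -> List[str]:
    """Keep terms whose importance reaches min_score, most important first,
    optionally truncated to max_items entries."""
    kept = [term for term, score in scores.items() if score >= min_score]
    kept.sort(key=lambda term: scores[term], reverse=True)
    return kept if max_items is None else kept[:max_items]


# Hypothetical importance scores of triggering terms in one document.
scores = {"missile": 0.9, "contract": 0.6, "announced": 0.1, "radar": 0.7}
print(select_important(scores, min_score=0.5))  # ['missile', 'radar', 'contract']
print(select_important(scores, max_items=2))    # ['missile', 'radar']
```

The same selection could be applied to text segments once each segment has been given a score derived from the terms it contains.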
Selecting the important triggering terms within a text classification system
that quantifies the importance of triggering terms
One non-limiting method in the context of the invention refers to selecting
the important triggering terms in a document with respect to an indexing
concept that is determined to be in focus (of interest) at a certain stage of
the user interaction with the system. For example, in Fig. 8 the indexing
concept "Product specifications/capabilities" (54) is selected to be in focus.
This part of the invention refers to the case where the indexing concept was
assigned to the document by some text classification method, as described
above. Such a method classifies the document to a certain indexing concept
based on words, terms or their combinations that appear in the document. It is
assumed that it is possible to trace within the classification system which
words or terms in the document entailed the classification to the given
indexing concept. Optionally, it is possible to quantify within the system the
relative contribution of each term to the classification of the document to
the indexing concept. Certain embodiments use a trainable text classification
method, in which the terms and the degree to which they entail classification
to the indexing concept are learned from training documents for which it is
previously known whether they belong to the indexing concept or not.
As non-limiting examples for possibilities for determining triggering terms,
consider the following trainable text classification methods.
D. Lewis, 1992, An evaluation of phrasal and clustered representations
on a text categorization problem, in Proc. of the 15th Int. ACM-SIGIR
Conference on Information Retrieval, pages 37-50. This method
applies a Bayesian learning scheme for text classification. For a given
category, the method computes (during the training phase) certain
weights for terms (words or phrases) in the text, with respect to the
category. The score of the category for a particular document is
computed as a function (usually some sort of a normalized sum) of the
weights of the terms that appear in the document. When computing the
category score for a document, it is possible to trace the relative
contribution of each term in the document to the accumulative score.
Thus, triggering terms in this method will be those terms that provided
the highest contribution to the accumulative score of the document.
E. Wiener and J. Pedersen and A. Weigend, 1995, A neural network
approach to topic spotting, in Symposium on Document Analysis and
Information Retrieval, pages 317-332.
W.W. Cohen, Text categorization and relational learning, in Machine
Learning Journal, 1995, pages 124-132. This method learns
classification rules for each category, which consist of words or combinations
of words. Each "firing" of a rule, that is, the occurrence of the word or
word combination of the rule in the document, entails the classification of
the document to the category. Thus, in this method, the words and word
combinations in the rules that matched in the document will be considered
as triggering terms in the document.
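For a rule-based classifier of the kind just described, the triggering terms are the words of the rules that fired. The sketch below illustrates only this matching step, not the rule learning itself; the rules, the whitespace tokenization and the function name are assumptions made for the example.

```python
from typing import FrozenSet, List, Set

# Hypothetical rules for one category; each rule is a word or word combination.
RULES: List[FrozenSet[str]] = [
    frozenset({"acquire"}),
    frozenset({"merger", "agreement"}),
    frozenset({"takeover", "bid"}),
]


def triggering_terms(text: str, rules: List[FrozenSet[str]]) -> Set[str]:
    """Collect the words of every rule whose words all occur in the text."""
    words = set(text.lower().split())  # naive tokenization, for illustration
    terms: Set[str] = set()
    for rule in rules:
        if rule <= words:  # the rule "fires" when all of its words are present
            terms |= rule
    return terms


text = "The two firms signed a merger agreement after the bid"
print(sorted(triggering_terms(text, RULES)))  # ['agreement', 'merger']
```

The word "bid" is not reported because its rule also requires "takeover", which does not occur in the text.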
According to the invention, the important triggering terms, to be included in
a summary or to be highlighted, are those term occurrences that significantly
contributed to the classification of the document to the focus indexing
concept. In Fig. 8 the triggering terms for the indexing concept "Product
Specification/Capabilities" (54) are highlighted within the text (52) of the
document. Furthermore, when the relative contribution of triggering terms to
classification can be determined (traced) then their degree of importance
would
be proportional to this degree of relative contribution to classification.
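When the classifier computes the category score as a sum of term weights, the relative contribution of each term can be traced directly. The following sketch assumes such an additive score with positive, hypothetical weights; it is not a reproduction of any of the cited classification methods.

```python
from typing import Dict, List

# Hypothetical per-term weights learned for one category (assumed positive).
WEIGHTS: Dict[str, float] = {"specification": 2.0, "range": 1.0, "payload": 1.0}


def term_contributions(
    tokens: List[str], weights: Dict[str, float]
) -> Dict[str, float]:
    """Relative contribution of each term to an additive category score."""
    raw: Dict[str, float] = {}
    for token in tokens:
        if token in weights:
            raw[token] = raw.get(token, 0.0) + weights[token]
    total = sum(raw.values())
    return {term: value / total for term, value in raw.items()} if total else {}


tokens = "the specification lists range and payload and range".split()
print(term_contributions(tokens, WEIGHTS))
# {'specification': 0.4, 'range': 0.4, 'payload': 0.2}
```

The resulting shares can serve as degrees of importance, so that a repeated lower-weight term such as "range" may rank as high as a single higher-weight term.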
It should be noted that the method described above for selecting the
important triggering terms for an indexing concept in focus could be combined
with simpler methods for identifying the triggering terms for an indexing
concept (such methods are not part of the invention). For example, when the
indexing concept is identical to a term or name that appears explicitly in the
document text, then the important term is simply the occurrence of the
indexing concept in the text (e.g. when the indexing concept is "France", the
important terms are simply the explicit occurrences of the term "France" in
the text). Another example is a topical indexing concept that is identified in
the text by a manually defined query. In this case the triggering terms are
simply all the terms that appear in the query (similar to document search
systems that highlight matching query terms in the retrieved documents).
Those versed in the art will readily appreciate that the invention is not
bound to the specific techniques described for determining important
triggering terms.
Multiple focus indexing concepts
Another method within the invention refers to selecting important terms
and segments for display by dynamically selecting several focus indexing
concepts. One way of selecting the focus indexing concepts is by letting the
user select them interactively from the list of all indexing concepts of the
document. In Fig. 10 the user has selected (50) "Latin America" and (51)
"Lockheed Martin" as focus indexing concepts. Consequently, the selected
important terms, which are highlighted in the document text (48), are the
triggering terms for both (50) and (51). Other mechanisms for selecting the
set of focus indexing concepts may be applied as well, such as the method
described next. According to the invention, the important triggering terms
and segments are selected from the important triggering terms and segments
of each one of the focus indexing concepts, applying some procedure that
combines them and reevaluates their degree of importance with respect to the
complete set of focus indexing concepts.
For example, the degree of importance of a triggering term or segment with
respect to the complete set of focus indexing concepts may be defined
(referred to also as quantified) by its maximal (or minimal) degree of
importance for any of the individual indexing concepts (applying a
disjunctive (or conjunctive) reasoning criterion), or by computing some
averaging function of the individual importance degrees. According to the
invention, the display of important terms or segments for the complete set
of focus indexing concepts may distinguish between terms that were selected
originally for the different indexing concepts that compose the set. For
example, a different color is attributed to each indexing concept, and the
important terms related to this concept are highlighted by the corresponding
color. In Fig. 10 the indexing concept "LATIN AMERICA" (50) is highlighted
with a blue background and "LOCKHEED MARTIN" (51) is highlighted with a pink
background (blue appears darker than pink in the black and white printing).
Accordingly, the triggering terms for both concepts ("Brazil"
and "Amazon" for "LATIN AMERICA" and "Lockheed Martin" for the indexing
concept "LOCKHEED MARTIN") are highlighted in the corresponding colors in
the document text (48).
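The combination step described above can be sketched as follows. This is an
illustrative sketch only: the names combine_importance and highlight_colors,
the numeric importance values, and the choice to treat a term that does not
trigger a concept as having importance 0.0 for that concept are all
assumptions made for the example, not details specified by the invention.

```python
from statistics import mean

# Disjunctive (max), conjunctive (min), or averaging combination.
COMBINERS = {"max": max, "min": min, "mean": mean}


def combine_importance(per_concept, mode="max"):
    """Combine per-concept importance degrees into one degree per term.

    per_concept maps each focus concept to {term: importance}.
    Assumption: a term missing for a concept has importance 0.0 there.
    """
    combine = COMBINERS[mode]
    terms = set().union(*per_concept.values())
    return {
        term: combine([scores.get(term, 0.0) for scores in per_concept.values()])
        for term in terms
    }


def highlight_colors(per_concept, colors):
    """Give each term the color of the concept for which it matters most."""
    best = {}
    for concept, scores in per_concept.items():
        for term, score in scores.items():
            if term not in best or score > best[term][0]:
                best[term] = (score, colors[concept])
    return {term: color for term, (_, color) in best.items()}


# Hypothetical importance degrees for the two focus concepts of Fig. 10.
per_concept = {
    "LATIN AMERICA": {"Brazil": 1.0, "Amazon": 0.5},
    "LOCKHEED MARTIN": {"Lockheed Martin": 1.0, "Brazil": 0.25},
}
colors = {"LATIN AMERICA": "blue", "LOCKHEED MARTIN": "pink"}

disjunctive = combine_importance(per_concept, "max")
conjunctive = combine_importance(per_concept, "min")
averaged = combine_importance(per_concept, "mean")
term_colors = highlight_colors(per_concept, colors)
```

Under the zero-default assumption, the conjunctive ("min") criterion keeps a
non-zero degree only for terms that trigger every focus concept, whereas the
disjunctive ("max") criterion keeps every term that triggers at least one.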
Default focus indexing concepts
Another method within the invention refers to the selection of default
focus indexing concepts, to be used automatically as the focus indexing
concepts when the document is presented to the user. According to the
invention, the default focus indexing concepts are selected according to the
selection conditions that were applied in the process that led to the
display of the document. In particular, when the document is displayed as a
result of a search query that contains indexing concepts, the indexing
concepts contained in the query become the default focus indexing concepts.
A particular setting for this method occurs when the document is selected
for display within the hierarchical document set display or within the
dynamic hierarchical document set display. In Fig. 12 a document was
selected for display from the node (document subset) (61) "ARGENTINA".
Accordingly, the default focus indexing concept is (62) "ARGENTINA" and the
triggering term "Argentina" (63) is highlighted within the document text.
In this setting of a hierarchical display a document is selected for display
from the document set that is associated with a certain node in the
hierarchy. The documents in this set satisfy a logical condition that is
equivalent to a search query which is a conjunction (logical AND) of all
indexing concepts in the path from the root of the displayed hierarchy to
the selected node. Thus, according to the invention, the default focus
indexing concepts are the concepts along this path. Recall that parts of
this path may correspond to paths within the concept hierarchy and parts of
the path might be created dynamically within the dynamic hierarchical
document set display. For example, in Fig. 13 the documents associated with
the node "Agreement" (71) satisfy a logical AND condition for
all indexing concepts on the displayed path from the root of the tree to this
node. Optionally, for a pair of concepts x and y in the set of default focus
indexing concepts, such that x is an ancestor of y in the concept hierarchy,
it is possible to exclude x from the set of default focus indexing concepts.
In the example of Fig. 11, it is possible to exclude "West Europe" from the
set of default focus indexing concepts since it is likely that the focus of
interest for the user is concerned in particular with "France", which is a
daughter of "West Europe" in the concept hierarchy.
In some systems, the method of viewing document sets that are attached to
concept nodes in a (possibly dynamic) hierarchical document set display may
be combined with the use of explicit search queries issued by the user. In
this case, if the document set attached to a concept node is restricted by
an additional condition supplied in an explicit search query, then the
default focus indexing concepts will be a combination of the concepts of the
path, as described above, and the concepts that are included in the query.
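The selection of default focus indexing concepts described in this section
can be sketched as follows. This is a minimal illustration under stated
assumptions: the function name default_focus_concepts is hypothetical, the
concept hierarchy is assumed to be available as a child-to-parent mapping,
and the path is assumed to list only indexing concepts (no artificial root
label).

```python
def default_focus_concepts(path, query_concepts=(), parent=None,
                           drop_ancestors=False):
    """Return the default focus indexing concepts for a displayed document.

    path           -- concepts from the displayed root to the selected node
    query_concepts -- concepts of an additional explicit search query, if any
    parent         -- assumed child -> parent mapping of the concept hierarchy
    drop_ancestors -- optionally exclude x when a descendant y is also present
    """
    # Combine path and query concepts, keeping order and dropping duplicates.
    focus = list(dict.fromkeys([*path, *query_concepts]))
    if not drop_ancestors or not parent:
        return focus

    def ancestors(concept):
        seen = set()
        while concept in parent:
            concept = parent[concept]
            if concept in seen:  # guard against a malformed, cyclic mapping
                break
            seen.add(concept)
        return seen

    # A concept is redundant if it is an ancestor of another focus concept.
    redundant = set()
    for concept in focus:
        redundant |= ancestors(concept)
    return [c for c in focus if c not in redundant]


# Hypothetical hierarchy fragment and path, loosely following Fig. 11.
parent = {"France": "West Europe", "West Europe": "Europe"}
path = ["West Europe", "France", "Agreement"]

all_on_path = default_focus_concepts(path)
without_ancestors = default_focus_concepts(path, parent=parent,
                                           drop_ancestors=True)
with_query = default_focus_concepts(path, query_concepts=["Lockheed Martin"])
```

With ancestor exclusion enabled, "West Europe" is dropped because "France"
is also in the set, which matches the optional behavior described above.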
Alphabetical characters and Roman symbols are designated in the
description below for convenience only and do not necessarily imply a
particular order of the method steps.
The present invention has been described with a certain degree of
particularity, but those versed in the art will readily appreciate that
various alterations and modifications can be carried out without departing
from the scope of the following Claims. Thus, by way of example, whereas,
typically, the organized document subset is determined by defining one (or
more) of the concepts in the hierarchy as an "organized by" concept, thereby
rendering the subset of documents associated therewith an "organized
document subset", this is not necessarily always the case. Thus, according
to a more generalized embodiment, any determination of a subset of documents
(organized document subset) by utilizing the so displayed hierarchy (i.e.
implemented using information derived from the so displayed hierarchy) is
embraced by the invention.