Patent 3172963 Summary

(12) Patent Application:	(11) CA 3172963
(54) English Title:	ARTIFICIAL INTELLIGENCE ASSISTED REVIEWER RECOMMENDER AND ORIGINALITY EVALUATOR
(54) French Title:	MECANISME DE RECOMMANDATION DE REVISEUR ASSISTE PAR INTELLIGENCE ARTIFICIELLE ET EVALUATEUR DE L'ORIGINALITE
Status:	Report sent

Bibliographic Data

(51) International Patent Classification (IPC):	G06F 40/146 (2020.01) G06F 16/22 (2019.01)
(72) Inventors :	MA, YINGHAO (United States of America) TEJOOKAYA, UTPAL (United States of America) LI, JINGLEI (United States of America) KRANE, SONJA (United States of America) PRAKASH, JOFIA JOSE (United States of America) ZHU, MARLEY (United States of America) VAN PROOIJEN, JEROEN (United States of America) HANSFORD, JONATHAN (United States of America) SCOTT, WALLACE (United States of America) JIANG, CHENGMIN (United States of America)
(73) Owners :	AMERICAN CHEMICAL SOCIETY (United States of America)
(71) Applicants :	AMERICAN CHEMICAL SOCIETY (United States of America)
(74) Agent:	GOWLING WLG (CANADA) LLP
(74) Associate agent:
(45) Issued:
(86) PCT Filing Date:	2022-04-29
(87) Open to Public Inspection:	2022-10-29
Examination requested:	2022-09-22
Availability of licence:	N/A
(25) Language of filing:	English

Patent Cooperation Treaty (PCT):	Yes
(86) PCT Filing Number:	PCT/US2022/026916
(87) International Publication Number:	3172963
(85) National Entry:	2022-09-22

(30) Application Priority Data:

Application No.	Country/Territory	Date
63/181,539	United States of America	2021-04-29
63/181,560	United States of America	2021-04-29

Abstracts

English Abstract

A method is disclosed, involving converting each structured text document stored in a database into a vector, building a search index using the one or more vectors of the structured text documents stored in the database, then receiving a new structured text document, converting the new structured text document into a vector, then searching the search index using the one or more vectors of the new structured text document, and generating a list of N structured text documents from the database similar to the new structured text document based on said search.

Claims

Note: Claims are shown in the official language in which they were submitted.

PATENT
Attorney Docket No.: 09275.0348-00304
CLAIMS
What is claimed is:
1. A method comprising:
converting each structured text document stored in a database into one or
more vectors, each structured text document in the database having a
title, an abstract, and author;
building one or more search index using the one or more vectors of the
structured text documents stored in the database;
receiving a new structured text document, the structured text document
having a title, an abstract, and an author;
converting the new structured text document into one or more vectors;
searching the search index using the one or more vectors of the new
structured text document; and
generating a list of N structured text document from the database similar
to the new structured text document based on said search.
2. The method of claim 1, further comprising:
converting each structured text document stored in a database into one or
more vectors by means of SPECTER embedding; and
converting the new structured text document into one or more vectors by
means of SPECTER embedding.
3. The method of claim 1, further comprising:
searching the search index using the vector or vectors of the new
structured text document using the KNN algorithm.
4. The method of claim 1, wherein:
each structured text document stored in a database associated with a title,
an abstract, an author, and a full text; and
the new structured text document is associated with a title, abstract,
author, and full text.
CA 03172963 2022- 9- 22

5. The method of claim 1 wherein:
each structured text document stored in a database associated with a title,
an abstract, an author, a reviewer, and metadata; and
the new structured text document is associated with a title, an abstract, an
author, and metadata.
6. The method of claim 1, wherein:
each structured text document stored in a database associated with a title,
an abstract, an author, a reviewer, a full text, and metadata; and
the new structured text document is associated with a title, an abstract, an
author, a full text, and metadata.
7. A method comprising:
converting each structured text document stored in a database into one or
more vectors, each structured text document in the database associated
with a title, an abstract, an author, and a reviewer;
building a search index using the vectors of the structured text documents
stored in the database;
receiving a new structured text document, the structured text document
associated with a title, an abstract, and an author;
converting the new structured text document into one or more vectors;
searching the search index using the one or more vectors of the new
structured text document; and
generating a list of N structured text document from the database similar
to the new structured text document based on said search; and
compiling the authors and reviewers of the N most similar structured text
document from the database.
8. The method of claim 7, further comprising:
converting each structured text document stored in a database into one or
more vectors by means of SPECTER embedding; and
converting the new structured text document into one or more vectors by
means of SPECTER embedding.
9. The method of claim 7, further comprising:
searching the search index using the one or more vectors of the new
structured text document using the KNN algorithm.
10. The method of claim 7, wherein:
N equals 100.
11. The method of claim 7, wherein:
each structured text document stored in a database associated with a title,
an abstract, an author, a reviewer, and a full text; and
the new structured text document is associated with a title, abstract,
author, and full text.
12. The method of claim 7, wherein:
each structured text document stored in a database associated with a title,
an abstract, an author, a reviewer, and metadata; and
the new structured text document is associated with a title, abstract,
author, and metadata.
13. The method of claim 7, wherein:
each structured text document stored in a database associated with a title,
an abstract, an author, a reviewer, a full text, and metadata; and
the new structured text document is associated with a title, abstract,
author, full text, and metadata.
14. A system for identifying similar structured text documents to a new
structured text document, comprising:
at least one processor, and
at least one non-transitory computer readable media storing instructions
configured to cause the processor to:
convert each structured text document stored in a database into
one or more vectors, each structured text document in the
database having a title, an abstract, and an author;
build a search index using the vectors of the structured text
documents stored in the database;
receive a new structured text document, the structured text
document having a title, an abstract, and an author;
convert the structured text document into one or more vectors;
search the search index using the vector of the new structured text
document; and
generate a list of N structured text document from the database
similar to the new structured text document based on said search.
15. The system of claim 14, wherein:
each structured text document stored in a database is converted into one
or more vectors by means of SPECTER embedding; and
the new structured text document is converted into one or more vectors by
means of SPECTER embedding.
16. The system of claim 14, wherein:
the search of the search index using the vector or vectors of the new
structured text document uses the KNN algorithm.
17. The system of claim 14, wherein:
each structured text document stored in a database associated with a title,
an abstract, an author, and a full text; and
the new structured text document is associated with a title, abstract,
author, and full text.
18. The system of claim 14, wherein:
each structured text document stored in a database associated with a title,
an abstract, an author, a reviewer, and metadata; and
the new structured text document is associated with a title, abstract,
author, and metadata.
19. The system of claim 14, wherein:
each structured text document stored in a database associated with a title,
an abstract, an author, a reviewer, a full text, and metadata; and
the new structured text document is associated with a title, abstract,
author, full text, and metadata.
20. A system for identifying similar structured text documents to a new
structured text document, comprising:
at least one processor, and
at least one non-transitory computer readable media storing instructions
configured to cause the processor to:
convert each structured text document stored in a database into
one or more vectors, each structured text document in the
database having a title, an abstract, and an author;
build a search index using the vectors of the structured text
documents stored in the database;
receive a new structured text document, the structured text
document having a title, an abstract, and an author;
convert the new structured text document into one or more vectors;
search the search index using the one or more vectors of the new
structured text document; and
generate a list of N structured text document from the database
similar to the new structured text document based on said search;
compile the authors and reviewers of the N most similar structured
text document from the database.
21. The system of claim 20, wherein:
each structured text document stored in a database is converted into one
or more vectors by means of SPECTER embedding; and
the new structured text document is converted into one or more vectors by
means of SPECTER embedding.
22. The system from claim 20, wherein:
the search of the search index using the vector or vectors of the new
structured text document uses the KNN algorithm.
23. The system from claim 20, wherein:
N equals 100.
24. The system from claim 20, wherein:
each structured text document stored in a database associated with a title,
an abstract, an author, and a full text; and
the new structured text document is associated with a title, abstract,
author, and full text.
25. The system from claim 20, wherein:
each structured text document stored in a database associated with a title,
an abstract, an author, a reviewer, and metadata; and
the new structured text document is associated with a title, abstract,
author, and metadata.
26. The system from claim 20, wherein:
each structured text document stored in a database associated with a title,
an abstract, an author, a reviewer, a full text, and metadata; and
the new structured text document is associated with a title, abstract,
author, full text, and metadata.
27. A method comprising:
converting each structured text document stored in a database into one or
more vectors, each structured text document in the database having a title, an
abstract, and an author;
using the vectors of the structured text documents stored in a database to
create one or more similarity search index;
for each structured text document from the database, searching the search
index using the one or more vectors of the structured text document;
for each structured text document from the database, generating a list of N
other structured text document from the database similar to the structured
text document based on said search; and
storing each list of N other structured text document from the database
similar to the structured text document in a table.
28. The method of claim 27, further comprising:
receiving a new structured text document, the structured text document
having a title, an abstract, and an author;
converting the new structured text document into one or more vectors;
searching the search index using the one or more vectors of the structured
text document;
generating a list of N structured text document from the database similar to
the new structured text document using the said search index; and
storing the list of N other structured text document from the database similar
to the new text document in a table.
29. The method from claim 27, further comprising:
converting the structured text document into one or more vectors by means of
SPECTER embedding.
30. The method from claim 29, further comprising:
receiving a new structured text document, the structured text document
having a title, an abstract, and an author;
converting the new structured text document into one or more vectors by
means of SPECTER embedding;
searching the search index using the one or more vectors of the structured
text document;
generating a list of N structured text document from the database similar to
the new structured text document using the said search index; and
storing the list of N other structured text document from the database similar
to the new text document in a table.
31. The method from claim 27, wherein:
N equals 50.
32. The method from claim 28, wherein:
N equals 50.
33. The method from claim 27, wherein:
each structured text document stored in a database associated with a title, an
abstract, an author, and a full text.
34. The method from claim 33, further comprising:
receiving a new structured text document, the structured text document
having a title, an abstract, an author, and a full text;
converting the new structured text document into one or more vectors;
searching the search index using the one or more vectors of the new
structured text document;
generating a list of N structured text document from the database similar to
the new structured text document using the said search index; and
storing the list of N other structured text document from the database similar
to the new text document in a table.
35. The method from claim 27, wherein:
each structured text document stored in a database associated with a title, an
abstract, an author, and metadata.
36. The method from claim 35, further comprising:
receiving a new structured text document, the structured text document
having a title, an abstract, an author, and metadata;
converting the new structured text document into one or more vectors;
searching the search index using the one or more vectors of the new
structured text document;
generating a list of N structured text document from the database similar to
the new structured text document based on said search; and
storing the list of N other structured text document from the database similar
to the new text document in a table.
37. The method from claim 27, wherein:
each structured text document stored in a database associated with a title, an
abstract, an author, a full text, and metadata.
38. The method from claim 37, further comprising:
receiving a new structured text document, the structured text document
having a title, an abstract, authors, a full text, and metadata;
converting the new structured text document into one or more vectors;
searching the search index using the one or more vectors of the new
structured text document;
generating a list of N structured text document from the database similar to
the new structured text document based on said search; and
storing the list of N other structured text document from the database similar
to the new text document in a table.
39. A system for identifying similar structured text documents to a new
structured text document, comprising:
at least one processor, and
at least one non-transitory computer readable media storing instructions
configured to cause the processor to:
convert each structured text document stored in a database into
one or more vectors, each structured text document in the
database having a title, an abstract, and an author;
use the vectors of the structured text documents stored in a
database to create a similarity search index;
for each structured text document from the database, search the
search index using the one or more vectors of the structured text
document;
for each structured text document from the database, generate a
list of N other structured text document from the database similar to
the structured text document based on said search; and
store each list of N other structured text document from the
database similar to the structured text document in a table.
40. The system from claim 39, further comprising:
the at least one non-transitory computer readable media storing
instructions further configured to cause the processor to:
receive a new structured text document, the structured text
document having a title, an abstract, and an author;
convert the new structured text document into one or more vectors;
search the search index using the one or more vectors of the
structured text document;
generate a list of N structured text document from the database
similar to the new structured text document using the said search
index; and
store the list of N other structured text document from the database
similar to the new text document in a table.
41. The system from claim 39, further comprising:
the at least one non-transitory computer readable media storing instructions
further configured to cause the processor to:
convert the structured text document into one or more vectors by means
of SPECTER embedding.
42. The system from claim 41, further comprising:
the at least one non-transitory computer readable media storing
instructions further configured to cause the processor to:
receive a new structured text document, the structured text
document having a title, an abstract, and an author;
convert the new structured text document into one or more vectors
by means of SPECTER embedding;
search the search index using the one or more vectors of the
structured text document;
generate a list of N structured text document from the database
similar to the new structured text document using the said search
index; and
store the list of N other structured text document from the database
similar to the new text document in a table.
43. The system of claim 39, wherein:
N equals 50.
44. The system of claim 40, wherein:
N equals 50.
45. The system from claim 39, wherein:
each structured text document stored in a database associated with a title,
an abstract, an author, and a full text.
46. The system from claim 45, further comprising:
the at least one non-transitory computer readable media storing instructions
further configured to cause the processor to:
receive a new structured text document, the structured text document
having a title, an abstract, an author, and a full text;
convert the new structured text document into one or more vectors;
search the search index using the one or more vectors of the new
structured text document;
generate a list of N structured text document from the database similar to
the new structured text document using the said search index; and
store the list of N other structured text document from the database similar
to the new text document in a table.
47. The system from claim 39, wherein:
each structured text document stored in a database associated with a title,
an abstract, an author, and metadata.
48. The system from claim 47, further comprising:
the at least one non-transitory computer readable media storing instructions
further configured to cause the processor to:
receive a new structured text document, the structured text document
having a title, an abstract, an author, and metadata;
convert the new structured text document into vectors;
compare the new vector of each structured text document to the vectors of
the structured text documents from the database;
generate a list of N structured text document from the database similar to
the new structured text document based on said comparison;
store the list of N other structured text document from the database similar
to the new text document in a table.
49. The system of claim 39, wherein:
each structured text document stored in a database associated with a title,
an abstract, an author, a full text, and metadata.
50. The system of claim 49, further comprising:
the at least one non-transitory computer readable media storing instructions
further configured to cause the processor to:
receive a new structured text document, the structured text document
having a title, an abstract, an author, a full text, and metadata;
convert the new structured text document into one or more vectors;
search the search index using the one or more vectors of the new
structured text document;
generate a list of N structured text document from the database similar to
the new structured text document based on said search; and
store the list of N other structured text document from the database similar
to the new text document in a table.

Description

Note: Descriptions are shown in the official language in which they were submitted.

ARTIFICIAL INTELLIGENCE ASSISTED REVIEWER RECOMMENDER AND
ORIGINALITY EVALUATOR
Cross-Reference to Related Applications
[0001] This application claims priority to provisional patent applications
Nos. 63/181,539 and 63/181,560, filed April 29, 2021.
BACKGROUND
Field
[0002] Embodiments of the present disclosure relate to Artificial Intelligence
Tools for identifying reviewers and locating similar papers. In particular, some
embodiments disclose a system and a method for identifying relevant reviewers
for submissions, while some embodiments disclose a system for identifying
similar documents to a given submission.
Description of Related Art
[0003] Before the advent of modern machine learning, the peer review process
could be assisted only minimally by computers. Editors and managers of
scientific or academic journals or publishers rely on external reviewers in the
field of a particular paper or submission to ensure the submission complies
with best practices and methodologies of the relevant scientific or academic
field.
[0004] However, one risk of identifying reviewers personally known to editors
and managers is reviewer fatigue - editors and managers who return to
known reliable reviewers time and again may exhaust the willingness of those
reviewers to contribute, or overburden them accidentally. Similarly,
without a pipeline from which to draw new reviewers outside of personal
connections, identifying new reviewers is a time-consuming and unreliable
process.
[0005] A similar problem exists for evaluating the originality of a submission.
Conventional search techniques or reliance on the personal knowledge of
editors are limited by the ability of the editor in question to generate search
terms or properly identify similar documents. Existing tools cannot accurately
perform this analysis.
[0006] Therefore, there is a need for improved methods for leveraging machine
learning to identify and recommend reviewers for scientific or academic
journal submissions, as well as to evaluate the originality of each submission.
SUMMARY
[0007] One aspect of the present disclosure is directed to a method for
identifying similar structured text documents to a new structured text
document. The method comprises, for example, converting each structured
text document stored in a database into one or more vectors, each structured
text document in the database having a title, an abstract, and an author. The
method further comprises, for example, building a search index using the one
or more vectors of the structured text documents stored in the database. The
method further comprises, for example, receiving a new structured text
document, the structured text document having a title, an abstract, and
an author. The method further comprises, for example, converting the new structured
text document into one or more vectors. The method further comprises, for
example, searching the search index using the one or more vectors of the
new structured text document. Finally, the method further comprises, for
example, generating a list of N structured text documents from the database
similar to the new structured text document based on said search.
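The steps of this aspect (vectorize the stored documents, build an index, vectorize the new document, search, return the N nearest) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the embed function is a toy hashed bag-of-words vectorizer standing in for whatever learned embedding an implementation would use, the index is a plain list searched by brute-force cosine similarity, and DIM, build_index, and most_similar are hypothetical names that do not come from the application.

```python
import math
import zlib

DIM = 256  # assumed toy dimensionality, not taken from the source


def embed(text: str) -> list[float]:
    """Toy stand-in for a learned embedding: hashed word counts, L2-normalised."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def build_index(corpus: dict[str, str]) -> list[tuple[str, list[float]]]:
    """Convert every stored document (id -> title plus abstract) into a vector."""
    return [(doc_id, embed(text)) for doc_id, text in corpus.items()]


def most_similar(index: list[tuple[str, list[float]]], new_text: str, n: int) -> list[str]:
    """Return the ids of the n stored documents closest to the new document."""
    query = embed(new_text)
    # Vectors are unit length, so the dot product equals cosine similarity.
    scored = [(sum(q * x for q, x in zip(query, vec)), doc_id) for doc_id, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:n]]
```

A brute-force scan like this is only practical for small collections; a production system would more plausibly use a dedicated nearest-neighbour index, which this sketch does not attempt to model.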
[0008] Yet another aspect of the present disclosure is directed to suggesting
reviewers for a new structured text document. The method comprises, for
example, converting each structured text document stored in a database into
one or more vectors, each structured text document in the database
associated with a title, an abstract, an author, and a reviewer. The method
further comprises, for example, building a search index using the vectors of
the structured text documents stored in the database. The method further
comprises, for example, receiving a new structured text document, the
structured text document associated with a title, an abstract, and an author.
The method further comprises, for example, converting the new structured
text document into one or more vectors. The method further comprises, for
example, searching the search index using the one or more vectors of the
new structured text document. The method further comprises, for example,
generating a list of N structured text documents from the database similar to
the new structured text document based on said search. Finally, the method
further comprises, for example, compiling the authors and reviewers of the N
most similar structured text documents from the database.
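The final compiling step can be sketched as follows, assuming the N most similar documents have already been retrieved. The ReviewedDocument type and compile_candidates function are hypothetical names; ranking candidates by how many similar documents they appear on, and excluding named people such as the new document's own authors, are illustrative assumptions rather than anything the text above prescribes.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewedDocument:
    title: str
    authors: tuple[str, ...]
    reviewers: tuple[str, ...]


def compile_candidates(similar_docs, exclude=()):
    """Collect authors and reviewers of the similar documents, most frequent first."""
    counts = Counter()
    for doc in similar_docs:
        # Count each person at most once per document.
        for person in set(doc.authors) | set(doc.reviewers):
            if person not in exclude:
                counts[person] += 1
    return counts.most_common()
```

The result is a list of (name, count) pairs that an editor could treat as a starting pool of candidate reviewers.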
[0009] Yet another aspect of the present disclosure is directed to a system for
identifying similar structured text documents to a new structured text
document. The system comprises, for example, at least one processor, and at
least one non-transitory computer readable media storing instructions
configured to cause the processor to, for example, convert each structured
text document stored in a database into one or more vectors, each structured
text document in the database having a title, an abstract, and an author. The
processor may also, for example, build a search index using the vectors of
the structured text documents stored in the database. The processor may
also, for example, receive a new structured text document, the new structured text
document having a title, an abstract, and an author. The processor may also,
for example, convert the new structured text document into one or more vectors.
The processor may also, for example, search the search index using the
one or more vectors of the new structured text document. Finally, the processor
may also, for example, generate a list of N structured text documents from the
database similar to the new structured text document based on said search.
[0010] Yet another aspect of the present disclosure is directed to a system for
identifying similar structured text documents to a new structured text
document. The system comprises, for example, at least one processor, and at
least one non-transitory computer readable media storing instructions
configured to cause the processor to, for example, convert each structured
text document stored in a database into one or more vectors, each structured
text document in the database having a title, an abstract, and an author. The
processor may also, for example, build a search index using the vectors of
the structured text documents stored in the database. The processor may
also, for example, receive a new structured text document, the new structured text
document having a title, an abstract, and an author. The processor may also,
for example, convert the new structured text document into one or more vectors.
The processor may also, for example, search the search index using the one
or more vectors of the new structured text document. The processor may
also, for example, generate a list of N structured text documents from the
database similar to the new structured text document based on said search.
Finally, the processor may also, for example, compile the authors and
reviewers of the N most similar structured text documents from the database.
[0011] One aspect of the present disclosure is directed at a method for
identifying similar structured text documents to other structured text
documents. The method comprises, for example, converting each structured
text document stored in a database into one or more vectors, each structured
text document in the database having a title, an abstract, an author, a full text,
and metadata. The method further comprises, for example, using the vectors
of the structured text documents stored in a database to create a similarity
search index. The method further comprises, for example, for each structured
text document from the database, searching the search index using the one
or more vectors of the structured text document. The method further
comprises, for example, for each structured text document from the database,
generating a list of N other structured text documents from the database
similar to the structured text document based on said search. Finally, the
method further comprises, for example, storing each list of N other structured
text documents from the database similar to the structured text document in a
table.
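The precomputed table described in this aspect can be sketched as below, assuming each stored document has already been converted to a vector. The names cosine and build_similarity_table are hypothetical, the table is modelled as a plain dictionary from document id to its N nearest other ids, and the exhaustive pairwise comparison is an assumption chosen for brevity rather than a claim about how the search index works.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_similarity_table(vectors: dict[str, list[float]], n: int) -> dict[str, list[str]]:
    """For every stored document, list the ids of its n most similar other documents."""
    table = {}
    for doc_id, vec in vectors.items():
        scored = [
            (cosine(vec, other_vec), other_id)
            for other_id, other_vec in vectors.items()
            if other_id != doc_id  # a document is never listed as similar to itself
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        table[doc_id] = [other_id for _, other_id in scored[:n]]
    return table
```

Storing the result once means later lookups for an existing document need only a table read instead of a fresh search.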
[0012] Yet another aspect of the present disclosure is directed at a system for
identifying similar structured text documents to other structured text
documents, comprising at least one processor, and at least one non-transitory
computer readable media storing instructions configured to cause the
processor to, for example, convert each structured text document stored in a
database into one or more vectors, each structured text document in the
database having a title, an abstract, and an author. The processor may
further be configured to use the vectors of the structured text documents
stored in a database to create a similarity search index. The processor may
further be configured to, for each structured text document from the database,
search the search index using the one or more vectors of the structured text
document. The processor may further be configured to, for each structured
text document from the database, generate a list of N other structured text
documents from the database similar to the structured text document based on
said search. Finally, the processor may further be configured to store each list
of N other structured text documents from the database similar to the
structured text document in a table.
BRIEF DESCRIPTION OF DRAWING(S)
[0013] FIG. 1 depicts a system for performing a method of finding similar
structured text documents stored in a database to a new structured text
document.
[0014] FIG. 2 depicts further embodiments of the system from FIG. 1, where the
method is used to generate a list of recommended reviewers for the new
structured text document.
[0015] FIG. 3 depicts further embodiments of the system from FIG. 2.
[0016] FIG. 4 depicts a system for performing a method of generating a table of
similar structured text documents.
DETAILED DESCRIPTION
[0017] It is an object of embodiments of the present disclosure to improve the

workflow for editors and managers of scientific or academic publishing
houses. Scientific articles and other similar types of academic works,
submitted as structured text documents, require peer-review before
publication in order to ensure the structured text document comports with best

practices and methodologies of the relevant scientific or academic field. It
would also be useful to evaluate these structured text documents for originality
by comparing pre-publication works to published works. Methods are
provided for identifying reviewers and similar articles to structured text
documents.
[0018] It should be understood that the disclosed embodiments are intended to
be performed by a system or similar electronic device capable of
manipulating, storing, and transmitting information or data represented as
electronic signals as needed to perform the disclosed methods. The system
may be a single computer, or several computers connected via the internet or
other telecommunications means.
[0019] A method involves the comparison of structured text documents, the
structured text documents having a title, an abstract, and an author. A
structured text document may be a draft, a manuscript, a book, an article, a
thesis, a dissertation, a monograph, a report, a proceeding, a standard, a
patent, a preprint, a grant, or other working text. An abstract may be a
summary, synopsis, digest, precis, or other abridgment of the structured text
document. An author may be any number of individuals or organizations. A
structured text document may also have a full text, body, or other content. A
structured text document may also have metadata, such as citations. A
person of ordinary skill in the art would understand that a structured text
document could take many forms, such as a Word file, PDF, LaTeX, or even
raw text.
[0020] A method may involve the system receiving a new structured text
document. The new structured text document may be received by various
means, including electronic submission portal, email, a fax or scan of a
physical copy converted into a structured text document through a process
such as optical character recognition or similar means, or other means for
digital transmission.
[0021] The system may convert the new structured text document into a vector
or vectors using a natural language processing algorithm with a vector output.

In broad terms, suitable algorithms accept text as input and render a
numerical representation of the input text, known as a vector, as output.
Suitable natural language processing algorithms include examples such as
Doc2Vec, GloVe/PCA projection, BERT, SciBERT, SPECTER, or
Universal Sentence Encoder, though a person of ordinary skill in the art may
recognize other possible natural language processing algorithms. A vector, in
some embodiments, can be a mathematical concept with magnitude and
direction. In other embodiments, a vector can be a collection of values
representing a word's meaning in relation to other words. In yet other
embodiments, a vector can be a collection of values representing a text's
value in relation to other texts.
[0022] Two embodiments of a vector can be vector 1 with the values (A, B) and
vector 2 with the values (C, D) where A, B, C, and D are variables
representing any number. One possible measure of distance, the Euclidean
distance, between vector 1 and vector 2 is equal to
$\sqrt{(C - A)^2 + (D - B)^2}$. Of
course, one skilled in the art can recognize that vectors can have any number
of values. One skilled in the art would also recognize measures of distance
between vectors beyond the Euclidean distance.
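As a worked illustration of the distance described in paragraph [0022], the following Python sketch computes the Euclidean distance for vectors with any number of values. The function name euclidean_distance and the sample values are assumptions chosen for illustration and do not appear in the disclosure.

```python
import math


def euclidean_distance(v1, v2):
    """Return the Euclidean distance between two equal-length vectors."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same number of values")
    # Sum the squared differences of each pair of values, then take the root.
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(v1, v2)))


# Vector 1 = (A, B) = (1, 2) and vector 2 = (C, D) = (4, 6), illustrative values.
distance = euclidean_distance((1, 2), (4, 6))
```

With these sample values the squared differences are 9 and 16, so the distance is the square root of 25.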
[0023] In some embodiments, different components of a structured text
document may be converted into separate vectors. In other embodiments, not
all components of a structured text document are converted into a vector. For
example, if a structured text document has a title, abstract, author,
metadata,
and full text, the title and abstract may be converted into one vector, the
full
text into another, citation into another, the metadata into one or more
vectors,
and the author is not converted into a vector.
[0024] In some embodiments, the structured text document database may be
implemented as a collection of training data, such as the MSPUBS database or
the Microsoft Academic Graph, or may be implemented using any desired
collection of structured text documents such as a journal's archive or
catalog.
The database may be implemented through any suitable database
management system such as Oracle, SQL Server, MySQL, PostgreSQL,
Microsoft Access, Amazon RDS, HBase, Cassandra, MongoDB, Neo4J,
Redis, Elasticsearch, Snowflake, BigQuery, or the like.
[0025] In some embodiments, the system may convert each structured text
document stored in a database into a vector or vectors using the same
algorithms as described for the new structured text document. The structured
text documents stored in the database may also have a reviewer, editor, or
other non-authorial contributor.
[0026] In some embodiments, the system may build a search index using the
vectors of the structured text documents stored in the database. The search
index may be of any suitable type, such as a flat index, a locality sensitive
hash (LSH), an inverted file index (IVF), or a Hierarchical Navigable Small
World (HNSW) graph. In embodiments where structured text documents are
converted into multiple vectors, the multiple vectors of a structured text
document can be concatenated into a single vector before the search index is
built. In other embodiments where structured text documents are converted
into multiple vectors, each type of vector is used to build a search index for

that type of vector.
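The two indexing options in paragraph [0026] can be sketched as follows. This is a minimal illustration that uses plain Python dictionaries as stand-ins for a flat index; the document identifiers, vector values, and the names concatenate, concatenated_index, and per_type_indexes are hypothetical and not taken from the disclosure.

```python
def concatenate(vectors):
    """Join a document's component vectors into one flat vector."""
    return [value for vector in vectors for value in vector]


# Hypothetical component vectors for two stored documents.
documents = {
    "doc-1": {"title_abstract": [0.1, 0.9], "full_text": [0.4, 0.2, 0.7]},
    "doc-2": {"title_abstract": [0.8, 0.3], "full_text": [0.5, 0.5, 0.1]},
}

# Option 1: a single index built from concatenated vectors.
concatenated_index = {
    doc_id: concatenate([parts["title_abstract"], parts["full_text"]])
    for doc_id, parts in documents.items()
}

# Option 2: one index per type of vector.
per_type_indexes = {}
for doc_id, parts in documents.items():
    for vector_type, vector in parts.items():
        per_type_indexes.setdefault(vector_type, {})[doc_id] = vector
```

Under option 1 a query vector must be concatenated in the same component order before searching; under option 2 each component of the query is searched against its own index.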
[0027] In some embodiments, the system uses the vector or vectors of the new
structured text document to search the search index. The search may be
performed using any suitable algorithm, such as K-nearest neighbors or K-
means clustering. In embodiments using multiple vectors for structured text
documents, where the vectors are concatenated into one vector before the
search index is built, the multiple vectors of the new structured text
document
are likewise concatenated before searching the search index. In other
embodiments where structured text documents are converted into multiple
vectors, and where different search indexes are built for different types of
vectors, each vector of the new structured text document is searched
separately, and the results are ensembled together. Ensembling the results may
be as simple as averaging the results, though more complex methods of
ensembling are possible. Based on the search results, the system identifies
the N structured text documents from the database most similar to the new
structured text document. N may be any desired number, such as 10 or
100, based on the needs of the implementers. For example, after
experimentation, the inventors determined that looking at the top 100 most
similar structured text documents and compiling their authors and reviewers
led to a higher chance of overlap. The inventors found that 100 structured
text
documents strike a good balance between good results and manageable data
size and computation cost.
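One possible reading of the ensembling step in paragraph [0027] is sketched below: each type of vector is searched against its own index, the per-type distances are averaged for each document, and the N closest documents are kept. The exhaustive comparison, the use of Euclidean distance, and all names and values here are assumptions made for illustration; the disclosure leaves the ensembling method open.

```python
import math


def euclidean(v1, v2):
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(v1, v2)))


def ensemble_search(indexes, query_vectors, n):
    """Average per-type distances and return the N closest document ids."""
    per_document = {}
    for vector_type, index in indexes.items():
        query = query_vectors[vector_type]
        for doc_id, vector in index.items():
            per_document.setdefault(doc_id, []).append(euclidean(query, vector))
    # Sort by averaged distance, smallest (most similar) first.
    averaged = sorted(
        (sum(distances) / len(distances), doc_id)
        for doc_id, distances in per_document.items()
    )
    return [doc_id for _, doc_id in averaged[:n]]


# Hypothetical per-type indexes and query vectors.
indexes = {
    "title_abstract": {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [5.0, 5.0]},
    "full_text": {"a": [0.0, 0.0], "b": [0.0, 1.0], "c": [4.0, 4.0]},
}
query_vectors = {"title_abstract": [1.0, 0.0], "full_text": [0.0, 0.5]}
top_two = ensemble_search(indexes, query_vectors, n=2)
```

Here document "b" has an averaged distance of 0.25 and document "a" an averaged distance of 0.75, so "b" is ranked first.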
[0028] In some embodiments, once the system identifies the N most similar
structured text documents from the database, the system compiles the
reviewers and authors associated with the N most similar structured text
documents from the database. In one embodiment, the structured text
document database is queried to provide the information on authors and
reviewers for each of the N most similar structured text documents from the
database. In some embodiments, compilation may consist of listing all
authors and reviewers of each structured text document on the list of similar
documents. In some embodiments, compilation can consist of weighting
authors and reviewers, by, for example, listing authors and reviewers who
authored or reviewed more than one document from the list of similar
documents as the first recommended reviewers. In another embodiment,
compilation can consist of listing the authors and reviewers of the most
similar
structured text document to the new structured text document first on the list

of recommended reviewers.
[0029] For an illustrative example of compiling the reviewers and authors,
consider a system configured to identify the two most similar (N = 2)
structured text documents. The system identifies Text 1, and Text 2. Text 1
has a similarity score of 0.5 and the authors author 1, author 2, and author 3

and the reviewers reviewer 1, reviewer 2, and reviewer 3. The system also
identifies Text 2 with similarity score 0.4, authors author 1, author 4, and
author 5, and reviewers reviewer 4, reviewer 2, and reviewer 3. In some
embodiments, the similarity score of a structured text document is 1 minus
the distance between the vector of the new structured text document
and the vector of that structured text document, though other methods of
calculating similarity score are possible. The compilation of authors would
result in the following ranking:
• Author 1 = 0.5 + 0.4 = 0.9
• Author 2 = 0.5
• Author 3 = 0.5
• Author 4 = 0.4
• Author 5 = 0.4
The compilation of reviewers would result in the following ranking:
• Reviewer 2 = 0.5 + 0.4 = 0.9
• Reviewer 3 = 0.5 + 0.4 = 0.9
• Reviewer 1 = 0.5
• Reviewer 4 = 0.4
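The rankings in the example of paragraph [0029] can be reproduced with a short Python sketch that sums the similarity scores of every similar document a person is associated with. The function name rank_contributors, the dictionary layout, and the alphabetical tie-breaking rule are assumptions for illustration; the scores and names are those of the example.

```python
from collections import defaultdict


def rank_contributors(similar_documents, role):
    """Sum similarity scores per person and sort from highest to lowest."""
    totals = defaultdict(float)
    for document in similar_documents:
        for person in document[role]:
            totals[person] += document["similarity"]
    # Highest total first; ties are broken alphabetically (an assumption).
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


similar_documents = [
    {
        "similarity": 0.5,
        "authors": ["author 1", "author 2", "author 3"],
        "reviewers": ["reviewer 1", "reviewer 2", "reviewer 3"],
    },
    {
        "similarity": 0.4,
        "authors": ["author 1", "author 4", "author 5"],
        "reviewers": ["reviewer 4", "reviewer 2", "reviewer 3"],
    },
]

author_ranking = rank_contributors(similar_documents, "authors")
reviewer_ranking = rank_contributors(similar_documents, "reviewers")
```

Summing scores rewards a person who appears on several similar documents, which matches the example where author 1, reviewer 2, and reviewer 3 each accumulate 0.5 + 0.4.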
[0030] A further method may involve the system converting each structured text

document stored in a database into a vector, each structured text document
in the database having a title, an abstract, and an author. A structured text
document may be a draft, a manuscript, a book, an article, a thesis, a
dissertation, a monograph, a report, a proceeding, a standard, a patent, a
preprint, a grant, or other working text. An abstract may be a summary,
synopsis, digest, precis, or other abridgment of the structured text document.

An author may be any number of individuals or organizations. A structured
text document may also have a full text, body, or other content. A structured

text document may also have metadata, such as citations. The structured text
documents stored in the database may also have a reviewer, editor, or other
non-authorial contributor. A person of ordinary skill in the art would
understand that a structured text document could take many forms, such as a
Word file, PDF, LaTeX, or even raw text.
[0031] In some embodiments, the structured text document database may be
implemented as a collection of training data, such as Microsoft Academic
Graph, or may be implemented using any desired collection of structured text
documents such as a journal's archive or catalog. The database may be
implemented through any suitable database management system such as
Oracle, SQL Server, MySQL, PostgreSQL, Microsoft Access, Amazon RDS,
HBase, Cassandra, MongoDB, Neo4J, Redis, Elasticsearch, Snowflake,
BigQuery, or the like.
[0032] The system's conversion of each structured text document into a vector
or vectors may be accomplished by a natural language processing algorithm
with a vector output. In broad terms, suitable algorithms accept text as input

and render a numerical representation of the input text, known as a vector, as

output. Suitable natural language processing algorithms include examples
such as Doc2Vec, GloVe/PCA projection, BERT, SciBERT, SPECTER, or
Universal Sentence Encoder, though a person of ordinary skill in the art may
recognize other possible natural language processing algorithms. A vector, in
some embodiments, can be a mathematical concept with magnitude and
direction. In other embodiments, a vector can be a collection of values
representing a word's meaning in relation to other words. In yet other
embodiments, a vector can be a collection of values representing a text's
value in relation to other texts.
[0033] In some embodiments, different components of a structured text
document may be converted into separate vectors by the system. In other
embodiments, not all components of a structured text document are
converted into a vector. For example, if a structured text document in the
structured text document database has a title, abstract, author, full text,
and
metadata, the title and abstract may be converted into one vector, the full
text
into another, metadata into one or more vectors, and the author is not
converted into a vector.
[0034] In some embodiments, the system may build a search index using the
vectors of the structured text documents stored in the database. The search
index may be of any suitable type, such as a flat index, a locality sensitive
hash (LSH), an inverted file index (IVF), or a Hierarchical Navigable Small
World (HNSW) graph. In embodiments where structured text documents are
converted into multiple vectors, the multiple vectors of a structured text
document can be concatenated into a single vector before the search index is
built. In other embodiments where structured text documents are converted
into multiple vectors, each type of vector is used to build a search index for

that type of vector.
[0035] In some embodiments, the system uses the vector or vectors of the new
structured text document to search the search index. The search may be
performed using any suitable algorithm, such as K-nearest neighbors or K-
means clustering. In embodiments using multiple vectors for structured text
documents, where the vectors are concatenated into one vector before the
search index is built, the multiple vectors of the new structured text
document
are likewise concatenated before searching the search index. In other
embodiments where structured text documents are converted into multiple
vectors, and where different search indexes are built for different types of
vectors, each vector of the new structured text document is searched
separately, and the results are ensembled together. Ensembling the results may
be as simple as averaging the results, though more complex methods of
ensembling are possible. Based on the search results, the system identifies
the N structured text documents from the database most similar to the new
structured text document. N may be any desired number, such as 50, based
on the needs of the implementers. For example, after experimentation, the
inventors determined that looking at the top 50 most similar structured text
documents led to higher quality lists of similar structured text documents.
The
inventors found that 50 structured text documents strike a good balance
between good results and manageable data size and computation cost.
[0036] In some embodiments, the system may select the list of similar
documents based on the search of the similarity search index for most similar
vectors to the vector of the new structured text document. The list of similar

documents may be the N-nearest structured text documents to each
structured text document, or it may be all structured text documents within a
certain distance D of each structured text document.
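The two selection rules in paragraph [0036] can be contrasted in a short sketch. The distances below are hypothetical values for one query document, and treating "within a certain distance D" as inclusive of D is an assumption.

```python
def n_nearest(distances, n):
    """Return the ids of the N documents with the smallest distances."""
    ordered = sorted(distances.items(), key=lambda item: item[1])
    return [doc_id for doc_id, _ in ordered[:n]]


def within_distance(distances, d):
    """Return the ids of all documents no farther than D, closest first."""
    ordered = sorted(distances.items(), key=lambda item: item[1])
    return [doc_id for doc_id, dist in ordered if dist <= d]


# Hypothetical distances from one query document to four stored documents.
distances = {"doc-1": 0.12, "doc-2": 0.45, "doc-3": 0.30, "doc-4": 0.80}

nearest_two = n_nearest(distances, 2)
close_enough = within_distance(distances, 0.5)
```

The first rule always returns a list of fixed length, while the second returns a list whose length depends on how many documents fall inside the threshold.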
[0037] In some embodiments, the system stores each list of similar documents
in a table constructed by aggregating each list of similar documents for each
document in the structured text document database. In some embodiments,
the table may be implemented as a simple array of arrays or list of lists. In
other embodiments, the table may be a more sophisticated data structure,
such as an Oracle, SQL Server, MySQL, PostgreSQL, Microsoft Access,
Amazon RDS, HBase, Cassandra, MongoDB, Neo4J, Redis, Elasticsearch,
Snowflake, or BigQuery database or data structure store.
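A minimal sketch of the "list of lists" form of the table in paragraph [0037] follows. It assumes an exhaustive comparison with Euclidean distance and identifies documents by their position in a list; these choices, the function name build_results_table, and the sample vectors are illustrative only.

```python
import math


def build_results_table(vectors, n):
    """For each stored vector, list the positions of its N closest other vectors."""
    table = []
    for i, query in enumerate(vectors):
        scored = sorted(
            (math.dist(query, other), j)
            for j, other in enumerate(vectors)
            if j != i  # a document is not listed as similar to itself
        )
        table.append([j for _, j in scored[:n]])
    return table


# Hypothetical vectors for four stored documents.
vectors = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
results_table = build_results_table(vectors, n=1)
```

Row i of the table holds the similar-document list for document i, so a lookup for an already stored document requires no further search.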
[0038] New structured text documents may be searched against the database
of structured text documents. The new structured text document may be
received by various means, including electronic submission portal, email, a
fax or scan of a physical copy converted into a structured text document
through a process such as optical character recognition or similar means, or
other means for digital transmission.
[0039] Once received by the system performing a disclosed embodiment, the
new structured text document may be converted into a vector. Conversion of
the new structured text document into a vector may be accomplished as
previously described. Then, the vector of the new structured text document is
searched against the vectors of each structured text document in the
database, which are stored in the similarity search index. Based on that
comparison a list of similar documents is generated, consistent with the
preceding description of generating a list of similar documents. The list of
similar
documents for the new structured text document is added to the results table.
[0040] FIG. 1 shows a schematic block diagram 100 of a system for performing
the disclosed exemplary embodiment of a method including computerized
systems for identifying similar structured text documents. In some
embodiments, system 100 involves structured text document database 101,
vector calculations 102a and 102b, search index 103, search 110, new
structured text document 104, and a list of similar documents 105.
[0041] In some embodiments, system 100 should be understood as a computer
system or similar electronic device capable of manipulating, storing, and
transmitting information or data represented as electronic signals as needed
to perform the disclosed methods. System 100 may be a single computer, or
several computers connected via the internet or other telecommunications
means.
[0042] In some embodiments, the structured text document database 101 may
be implemented as a collection of training data, such as the Microsoft
Academic Graph, or may be implemented using any desired collection of
structured text documents such as a journal's archive. The database may be
implemented through any suitable database management system such as
Oracle, SQL Server, MySQL, PostgreSQL, Microsoft Access, Amazon RDS,
HBase, Cassandra, MongoDB, Neo4J, Redis, Elasticsearch, Snowflake,
BigQuery, or the like.
[0043] In some embodiments, vector calculation 102a and 102b may be
implemented by system 100 using a natural language processing algorithm
with a vector output. In broad terms, suitable algorithms accept text as input

and render a numerical representation of the input text, known as a vector, as

output. Suitable natural language processing algorithms include examples
such as Doc2Vec, GloVe/PCA projection, BERT, SciBERT, SPECTER, or
Universal Sentence Encoder, though a person of ordinary skill in the art may
recognize other possible natural language processing algorithms. A vector, in
some embodiments, can be a mathematical concept with magnitude and
direction. In other embodiments, a vector can be a collection of values
representing a word's meaning in relation to other words. In yet other
embodiments, a vector can be a collection of values representing a text's
value in relation to other texts.
[0044] In some embodiments, different components of a structured text
document may be converted into separate vectors. In other embodiments, not
all components of a structured text document are converted into a vector. For
example, if a structured text document has a title, abstract, author, full
text
and metadata, the title and abstract may be converted into one vector, the
full
text into another, the metadata into one or more vectors, and the author is
not
converted into a vector. In other embodiments, different components of a
structured text document may be combined and converted into a single
vector or may be converted into respective vectors.
[0045] In some embodiments, the system may build a search index 103 using
the vectors of the structured text documents stored in the database. The
search index may be of any suitable type, such as a flat index, a locality
sensitive hash (LSH), an inverted file index (IVF), or a Hierarchical
Navigable
Small World (HNSW) graph.
[0046] In some embodiments, the system uses the vector or vectors of the new
structured text document to search 110 the search index 103. The search
may be performed using any suitable algorithm, such as K-nearest neighbors
or K-means clustering. In embodiments using multiple vectors for structured
text documents, where the vectors are concatenated into one vector before
the search index is built, the multiple vectors of the new structured text
document are likewise concatenated before searching the search index. In
other embodiments where structured text documents are converted into
multiple vectors, and where different search indexes are built for different
types of vectors, each vector of the new structured text document is
searched separately, and the results are ensembled together. Ensembling the
results may be as simple as averaging the results, though more complex
methods of ensembling are possible. Based on the search results, the system
identifies the N structured text documents from the database most similar to
the new structured text document. N may be any desired number, such
as 10 or 100, based on the needs of the implementers. For example, after
experimentation, the inventors determined that looking at the top 100 most
similar structured text documents and compiling their authors and reviewers
led to a higher chance of overlap. The inventors found that 100 structured
text
documents strike a good balance between good results and manageable data
size and computation cost.
[0047] In some embodiments, the new structured text document 104 may be a
draft, a manuscript, a book, an article, a thesis, a dissertation, a monograph,
a
report, a proceeding, a standard, a patent, a preprint, a grant, or other
working text. An abstract may be a summary, synopsis, digest, precis, or
other abridgment of the structured text document. An author may be any
number of individuals or organizations. The new structured text document
may also have a full text, body, or other content. The new structured text
document may also have metadata, such as citations.
[0048] In some embodiments, the system may select the list of similar
documents 105 based on the search 110 of the search index 103 using the
vector of the new structured text document 104. The list of similar documents
105 may be the N-nearest structured text documents to the new structured
text document, or it may be all structured text documents within a certain
distance D of the new structured text document.
[0049] Referring now to FIG. 2, further embodiments of the disclosed system
100 are shown for performing the disclosed methods of providing a list of
recommended reviewers for a new structured text document. In one
embodiment, the structured text document database 101 includes information
on the authors and reviewers of the structured text documents it contains.
[0050] The vector calculation 102a and 102b, search index 103, search 110,
new structured text document 104, and list of similar documents 105 should
be understood to have the same scope and functionality as disclosed in FIG.
1.
[0051] In some embodiments, the system performs compilation 201 after the list

of similar documents 105 is generated. For each structured text document
on the list of similar documents 105, the structured text document database is

queried to provide the information on authors and reviewers for each
document. This information is compiled to provide a list of recommended
reviewers 202 for the new structured text document.
[0052] In some embodiments, the system performs compilation 201 by
generating a list, stored as an index or other data structure, consisting of all
authors and reviewers of each structured text document on the list of similar
documents. In some embodiments, the list generated as a result of
compilation 201 can consist of a ranked list where authors and reviewers are
listed in order of, for example, the number of times their name appears as an
author or reviewer of a document on the list of similar documents 105. In
another embodiment, the list generated by compilation 201 can consist of a
listing of the authors and reviewers in order of how similar the structured
text
document they are associated with is to the new structured text document.
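The count-based ranking of paragraph [0052] can be sketched with Python's collections.Counter. The names, the dictionary layout, and the rule that a person is counted at most once per document are assumptions for illustration.

```python
from collections import Counter


def rank_by_appearances(similar_documents):
    """Rank people by how many similar documents they authored or reviewed."""
    counts = Counter()
    for document in similar_documents:
        # Assumption: a person who both authored and reviewed the same
        # document is counted once for that document.
        counts.update(set(document["authors"]) | set(document["reviewers"]))
    # Most appearances first; ties are broken alphabetically (an assumption).
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical list of similar documents.
similar_documents = [
    {"authors": ["Ada", "Ben"], "reviewers": ["Cy"]},
    {"authors": ["Ben"], "reviewers": ["Cy", "Dee"]},
    {"authors": ["Eve"], "reviewers": ["Ben"]},
]

recommended = rank_by_appearances(similar_documents)
```

In this sample, Ben appears on three documents and Cy on two, so they head the list of recommended reviewers.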
[0053] Referring now to FIG. 3, further embodiments of the disclosed system
100 for performing the disclosed methods are shown. The vector calculation
102, new structured text document 104, and list of similar documents 105
should be understood to have the same scope and functionality as disclosed
in FIG. 1. The compilation 201 and list of recommended reviewers 202
should be understood to have the same scope and functionality as disclosed
in FIG. 2.
[0054] In the disclosed embodiments consistent with FIG. 3, the search 310
should be understood as being performed using the KNN algorithm for
calculating the N-nearest structured text document vectors to the new
structured text document vector. Distance between vectors can be calculated
using any measure of similarity for comparing vectors, such as the Euclidean
distance, Gaussian distance, or cosine similarity.
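For comparison with the Euclidean distance, cosine similarity can be sketched as below. Defining cosine distance as 1 minus cosine similarity is a common convention and is an assumption here, as are the function names; the disclosure does not specify a formula.

```python
import math


def cosine_similarity(v1, v2):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0 or norm2 == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm1 * norm2)


def cosine_distance(v1, v2):
    """Assumed convention: distance = 1 - similarity."""
    return 1.0 - cosine_similarity(v1, v2)


orthogonal = cosine_similarity([1, 0], [0, 1])
parallel = cosine_similarity([1, 2], [2, 4])
```

Unlike the Euclidean distance, cosine similarity ignores vector magnitude, so two vectors pointing in the same direction are treated as maximally similar regardless of length.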
[0055] In some embodiments, KNN, or K-nearest neighbor, can be implemented
using any measure of distance between vectors as previously described,
such as Euclidean or cosine distance. The algorithm is performed by a system
such as system 100. Given a vector and a number of neighbors, K (hence the
name, K nearest neighbor), the algorithm could calculate the distance
between the vector and each vector in the search index. In practice, this
exhaustive comparison is avoided in order to improve search efficiency.
Instead, the different types of search
index, such as an inverted file index (IVF) or a Hierarchical Navigable Small
World (HNSW) graph, break the search index into partitions of vectors, and
KNN operates to search some of those partitions to identify the K-nearest
vectors in the search index to the vector being used to search the search
index. For example, if the search index is an IVF, the partitions are
constructed as a Dirichlet tessellation. In a Dirichlet tessellation, the
partitions
start with centroids, which are fictional vectors placed into the search index
as
dividing points, but not associated with a structured text document. Each
centroid defines a partition consisting of all vectors closer to the centroid
than
any other centroid. Searching the IVF search index with a vector (the search,
or query vector) begins with identifying the centroid closest to the search
vector. Then, the KNN algorithm is used to compute the distances between
each vector in that centroid's partition, and the search vector. The K vectors

in the partition associated with the K smallest distances are reported by the
KNN algorithm as the K-nearest neighbors to the search vector. In some
embodiments, the KNN algorithm is used to compute the distances between
the search vector and the vectors associated with the centroids adjacent to
the centroid closest to the search vector, to avert omitting vectors that are
close to the search vector but in another partition.
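The IVF search described in paragraph [0055] can be sketched in plain Python as follows. The centroids, stored vectors, and function names are hypothetical, and the nprobe parameter, which searches the partitions of the nprobe centroids closest to the search vector, is used here as a simple stand-in for searching "adjacent" partitions; it is an assumption rather than the disclosed method.

```python
import math


def centroid_order(centroids, vector):
    """Centroid positions ordered from closest to farthest from the vector."""
    return sorted(range(len(centroids)), key=lambda c: math.dist(centroids[c], vector))


def build_ivf(centroids, vectors):
    """Assign each stored vector to the partition of its closest centroid."""
    partitions = {c: [] for c in range(len(centroids))}
    for doc_id, vector in vectors.items():
        partitions[centroid_order(centroids, vector)[0]].append(doc_id)
    return partitions


def ivf_search(centroids, partitions, vectors, query, k, nprobe=1):
    """Run KNN only inside the nprobe partitions closest to the query."""
    candidates = []
    for c in centroid_order(centroids, query)[:nprobe]:
        candidates.extend(partitions[c])
    candidates.sort(key=lambda doc_id: math.dist(vectors[doc_id], query))
    return candidates[:k]


# Hypothetical centroids and stored vectors on a line, for readability.
centroids = [[0.0, 0.0], [10.0, 0.0]]
vectors = {"a": [1.0, 0.0], "b": [4.0, 0.0], "c": [6.0, 0.0], "d": [9.0, 0.0]}
partitions = build_ivf(centroids, vectors)

query = [4.9, 0.0]
one_partition = ivf_search(centroids, partitions, vectors, query, k=2, nprobe=1)
two_partitions = ivf_search(centroids, partitions, vectors, query, k=2, nprobe=2)
```

With a single partition the search misses vector "c", which is close to the search vector but belongs to the other centroid; probing both partitions recovers it, illustrating the trade-off between efficiency and completeness noted at the end of the paragraph.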
[0056] FIG. 4 shows a schematic block diagram 400 of a system for performing
the disclosed exemplary embodiment of another method including
computerized systems for identifying similar structured text documents. In
some embodiments, system 400 involves structured text document database
401, vector calculation 402a and 402b, search 403, new structured text
document 404, a list of similar documents 405, results table 407 and
similarity
search index 406.
[0057] In some embodiments, system 400 should be understood as a computer
system or similar electronic device capable of manipulating, storing, and
transmitting information or data represented as electronic signals as needed
to perform the disclosed methods. System 400 may be a single computer, or
several computers connected via the internet or other telecommunications
means.
[0058] In some embodiments, the structured text document database 401 may
be implemented as a collection of training data, such as the Microsoft
Academic Graph, or may be implemented using any desired collection of
structured text documents such as a journal's archive. The database may be
implemented through any suitable database management system such as
Oracle, SQL Server, MySQL, PostgreSQL, Microsoft Access, Amazon RDS,
HBase, Cassandra, MongoDB, Neo4J, Redis, Elasticsearch, Snowflake,
BigQuery, or the like.
[0059] In some embodiments, vector calculation 402a and 402b may be
implemented using a natural language processing algorithm with a vector
output. In broad terms, suitable algorithms accept text as input and render a
numerical representation of the input text, known as a vector, as output.
Suitable natural language processing algorithms include examples such as
Doc2Vec, GloVe/PCA projection, BERT, SciBERT, SPECTER, or Universal
Sentence Encoder, though a person of ordinary skill in the art may recognize
other possible natural language processing algorithms. A vector, in some
embodiments, can be a mathematical concept with magnitude and direction.
In other embodiments, a vector can be a collection of values representing a
word's meaning in relation to other words. In yet other embodiments, a vector
can be a collection of values representing a text's value in relation to other

texts.
[0060] In some embodiments, different components of a structured text
document may be converted into separate vectors by vector calculations
402a and 402b. In other embodiments, not all components of a structured
text document are converted into a vector by vector calculations 402a and
402b. For example, if a structured text document in structured text document
database 401 has a title, abstract, author, full text, and metadata, the title
and
abstract may be converted into one vector, the full text into another,
metadata
into one or more vectors, and the author is not converted into a vector.
[0061] In some embodiments, the similarity search index 406 is constructed by
system 100, which converts each structured text document stored in the
structured text document database 401 into a vector through vector
calculation 402a. The vector for each structured text document is stored in
the similarity search index. The search index may be of any suitable type,
such as a flat index, a locality sensitive hash (LSH), an inverted file index
(IVF), or a Hierarchical Navigable Small World (HNSW) graph. In some embodiments,
the similarity search index 406 may be implemented using specialized
vector search index tools such as FAISS (Facebook AI Similarity Search) or
Non-Metric Space Library (NMSLIB). In embodiments where structured text
documents are converted into multiple vectors, the multiple vectors of a
structured text document can be concatenated into a single vector before the
search index 406 is built. In other embodiments where structured text
documents are converted into multiple vectors, each type of vector is used to
build a separate search index 406 for that type of vector.
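As a minimal sketch of the two indexing options described above, the following example builds a flat index in plain Python, once over concatenated vectors and once per vector type. The class FlatIndex, the helper concatenate, and the sample vectors are hypothetical names and values chosen for illustration; a production system might instead rely on a dedicated library such as FAISS or NMSLIB.

```python
class FlatIndex:
    """Minimal flat index: stores every vector so a search can compare against all of them."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.ids: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, doc_id: str, vector: list[float]) -> None:
        # Reject vectors of the wrong size so all entries stay comparable.
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} values, got {len(vector)}")
        self.ids.append(doc_id)
        self.vectors.append(list(vector))


def concatenate(parts: dict[str, list[float]], order: list[str]) -> list[float]:
    """Join component vectors in a fixed order so every document lines up."""
    return [value for name in order for value in parts[name]]


# Made-up component vectors for two documents in the database.
database = {
    "doc-1": {"title_abstract": [0.9, 0.1], "full_text": [0.2, 0.8, 0.0]},
    "doc-2": {"title_abstract": [0.1, 0.9], "full_text": [0.7, 0.1, 0.2]},
}
ORDER = ["title_abstract", "full_text"]

# Option 1: a single index built over concatenated vectors.
combined = FlatIndex(dim=5)
for doc_id, parts in database.items():
    combined.add(doc_id, concatenate(parts, ORDER))

# Option 2: one index per type of vector.
per_type = {"title_abstract": FlatIndex(dim=2), "full_text": FlatIndex(dim=3)}
for doc_id, parts in database.items():
    for name, index in per_type.items():
        index.add(doc_id, parts[name])
```

Whichever option is chosen at index-building time would also need to be applied to the query vectors at search time, so that the compared vectors have matching layouts.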
[0062] In some embodiments, the search 403 is a mathematical operation that
can be performed by system 100 using any measure of similarity for
comparing vectors, such as the Euclidean distance, Gaussian distance, or
cosine similarity.
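For illustration only, two of the similarity measures named above, Euclidean distance and cosine similarity, can be written directly from their standard definitions. The function names and sample vectors below are assumptions made for the example.

```python
import math


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance between two vectors; smaller means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; larger means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


a, b = [3.0, 4.0], [6.0, 8.0]
print(euclidean_distance(a, b))  # 5.0
print(cosine_similarity(a, b))   # 1.0
```

The two vectors point in the same direction but differ in magnitude, so the cosine similarity is at its maximum while the Euclidean distance is not zero. This difference is one reason the choice of measure may matter for a given encoder.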
[0063] In some embodiments, the system uses the vector or vectors of each
structured text document to search the search index 406. The search 403
may be performed using any suitable algorithm, such as K-nearest neighbors
or K-means clustering. In embodiments using multiple vectors for structured
text documents, where the vectors are concatenated into one vector before
the search index is built, the multiple vectors of the structured text
document are likewise concatenated before searching the search index. In other
embodiments where structured text documents are converted into multiple
vectors, and where different search indexes are built for different types of
vectors, each vector of each structured text document is searched
separately, and the results are ensembled together. Ensembling the results may
be as simple as averaging the results, though more complex methods of
ensembling are possible. Based on the search results, the system identifies
the N most similar structured text documents from the database for each
structured text document. N may be any desired number, such as 50,
based on the needs of the implementers. For example, after experimentation,
the inventors determined that looking at the top 50 most similar structured
text documents led to higher quality lists of similar structured text
documents. The inventors found that 50 structured text documents strike a good
balance between good results and manageable data size and computation cost.
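The multi-index case described above, in which each type of vector is searched separately and the results are ensembled by averaging, might look like the following sketch. It assumes an exhaustive nearest-neighbor comparison using cosine similarity; the function names, index layout, and sample vectors are hypothetical, and N is set to 2 only to keep the example small.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def ensemble_top_n(
    query_vectors: dict[str, list[float]],
    indexes: dict[str, dict[str, list[float]]],
    n: int,
) -> list[tuple[str, float]]:
    """Search each per-type index separately, average the scores, keep the top n."""
    scores: dict[str, list[float]] = {}
    for name, index in indexes.items():
        for doc_id, vector in index.items():
            score = cosine_similarity(query_vectors[name], vector)
            scores.setdefault(doc_id, []).append(score)
    averaged = {doc_id: sum(s) / len(s) for doc_id, s in scores.items()}
    ranked = sorted(averaged.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Made-up per-type indexes: vector type -> document id -> vector.
indexes = {
    "title_abstract": {"doc-1": [1.0, 0.0], "doc-2": [0.0, 1.0], "doc-3": [1.0, 1.0]},
    "full_text": {"doc-1": [0.0, 1.0], "doc-2": [0.0, 1.0], "doc-3": [1.0, 0.0]},
}
query = {"title_abstract": [1.0, 0.0], "full_text": [1.0, 0.0]}

top = ensemble_top_n(query, indexes, n=2)
print([doc_id for doc_id, _ in top])  # ['doc-3', 'doc-1']
```

Here doc-1 matches the query perfectly on title and abstract but not at all on full text, while doc-3 matches reasonably on both, so averaging ranks doc-3 first. A weighted average or rank-based fusion would be examples of the more complex ensembling the text allows for.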
[0064] In some embodiments, the system may select the list of similar
documents 405 based on the search 403 of the similarity search index for the
vectors most similar to the vector of the new structured text document 404.
The list of similar documents 405 may be the N-nearest structured text
documents to each structured text document, or it may be all structured text
documents within a certain distance D of each new structured text
document.
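The two selection rules described above, the N nearest documents versus all documents within a distance D, can be contrasted in a short sketch. Euclidean distance (via math.dist from the Python standard library) is assumed here purely for illustration, and the function names, index contents, and thresholds are hypothetical.

```python
import math


def n_nearest(query: list[float], index: dict[str, list[float]], n: int) -> list[str]:
    """Return the ids of the n documents closest to the query."""
    ranked = sorted(index, key=lambda doc_id: math.dist(query, index[doc_id]))
    return ranked[:n]


def within_distance(query: list[float], index: dict[str, list[float]], d: float) -> list[str]:
    """Return the ids of every document no farther than d from the query."""
    return [doc_id for doc_id, vector in index.items() if math.dist(query, vector) <= d]


# Made-up index: document id -> vector.
index = {"doc-1": [0.0, 0.0], "doc-2": [1.0, 0.0], "doc-3": [3.0, 4.0]}
query = [0.0, 0.0]

print(n_nearest(query, index, n=2))          # ['doc-1', 'doc-2']
print(within_distance(query, index, d=0.5))  # ['doc-1']
```

The first rule always yields a list of fixed length, while the second yields a list whose length depends on how crowded the neighborhood of the query is.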
[0065] In some embodiments, the results table 407 is constructed by system
100 by aggregating each list of similar documents for each document in the
structured text document database 401. In some embodiments, the results
table 407 may be implemented as a simple array of arrays or list of lists. In
other embodiments, results table 407 may be a more sophisticated data
structure, such as an Oracle, SQL Server, MySQL, PostgreSQL, Microsoft
Access, Amazon RDS, HBase, Cassandra, MongoDB, Neo4J, Redis,
Elasticsearch, Snowflake, or BigQuery database or data structure store.
[0066] In some embodiments, the new structured text document 404 may be a
draft, a manuscript, a book, an article, a thesis, a dissertation, a
monograph, a report, a proceeding, a standard, a patent, a preprint, a grant,
or other working text. An abstract may be a summary, synopsis, digest, precis,
or other abridgment of the structured text document. An author may be any
number of individuals or organizations. The new structured text document
may also have a full text, body, or other content. The new structured text
document may also have metadata, such as citations.
[0067] In some embodiments, the new structured text document 404 is
converted into one or more vectors using vector calculation 402b. Then, the
vector of the new structured text document is searched 403 against the
vectors of each structured text document in the database, which are stored in
the similarity search index 406. Based on that search 403, a list of similar
documents 405 is generated, consistent with the preceding description of
generating a list of similar documents 405. The list of similar documents 405
for the new structured text document 404 is added to the results table 407.
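As a non-limiting sketch of the flow described above, the example below builds a results table as a simple list of lists for the documents already in the database, then searches a new document against the index and appends its list of similar documents. The vectors are made up and stand in for the output of the vector calculation. Excluding a document from its own list of similar documents is an assumption of this sketch, as are the function name top_n and the choice of Euclidean distance.

```python
import math
from typing import Optional


def top_n(
    query: list[float],
    index: dict[str, list[float]],
    n: int,
    exclude: Optional[str] = None,
) -> list[str]:
    """Ids of the n nearest documents, optionally skipping one id (e.g. the query itself)."""
    candidates = [
        (doc_id, math.dist(query, vector))
        for doc_id, vector in index.items()
        if doc_id != exclude
    ]
    candidates.sort(key=lambda pair: pair[1])
    return [doc_id for doc_id, _ in candidates[:n]]


# Made-up similarity search index: document id -> vector.
index = {"doc-1": [1.0, 0.0], "doc-2": [0.0, 1.0], "doc-3": [0.8, 0.2]}
N = 2  # kept small for the example

# Results table as a list of lists: [document id, [ids of similar documents]].
results_table = [
    [doc_id, top_n(vector, index, N, exclude=doc_id)]
    for doc_id, vector in index.items()
]

# A new document arrives; its vector is assumed to come from the vector calculation.
new_id, new_vector = "new-doc", [0.95, 0.05]
results_table.append([new_id, top_n(new_vector, index, N)])

print(results_table[-1])  # ['new-doc', ['doc-1', 'doc-3']]
```

The same list-of-lists shape could be written to any of the database systems mentioned in the description as one row per document.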
[0068] While the present disclosure has been shown and described with
reference to particular embodiments thereof, it will be understood that the
present disclosure can be practiced, without modification, in other
environments. The foregoing description has been presented for purposes of
illustration. It is not exhaustive and is not limited to the precise forms or
embodiments disclosed. Modifications and adaptations will be apparent to
those skilled in the art from consideration of the specification and practice
of the disclosed embodiments. Additionally, although aspects of the disclosed
embodiments are described as being stored in memory, one skilled in the art
will appreciate that these aspects can also be stored on other types of
computer readable media, such as secondary storage devices, for example,
hard disks or CD ROM, or other forms of RAM or ROM, USB media, DVD,
Blu-ray, or other optical drive media.
[0069] While illustrative embodiments have been described herein, the scope
includes any and all embodiments having equivalent elements, modifications,
omissions, combinations (e.g., of aspects across various embodiments),
adaptations and/or alterations as would be appreciated by those skilled in the
art based on the present disclosure. The limitations in the claims are to be
interpreted broadly based on the language employed in the claims and not
limited to examples described in the present specification or during the
prosecution of the application. The examples are to be construed as
non-exclusive. Furthermore, the steps of the disclosed methods may be modified
in any manner, including by reordering steps and/or inserting or deleting
steps. It is intended, therefore, that the specification and examples be
considered as illustrative only, with a true scope and spirit being indicated
by the following claims and their full scope of equivalents.
[0070] Computer programs based on the written description and disclosed
methods are within the skill of an experienced developer. Various programs
or program modules can be created using any of the techniques known to
one skilled in the art or can be designed in connection with existing
software. For example, program sections or program modules can be designed in or by
means of .Net Framework, .Net Compact Framework (and related languages,
such as Visual Basic, C, etc.), Python, Java, C/C++, Objective-C, Swift,
HTML, HTML/AJAX combinations, XML, or HTML with included Java applets.

Representative Drawing

A single figure which represents the drawing illustrating the invention.

Administrative Status

For a clearer understanding of the status of the application/patent presented on this page, the site Disclaimer, as well as the definitions for Patent, Administrative Status, Maintenance Fee and Payment History, should be consulted.

Title	Date
Forecasted Issue Date	Unavailable
(86) PCT Filing Date	2022-04-29
(85) National Entry	2022-09-22
Examination Requested	2022-09-22
(87) PCT Publication Date	2022-10-29

Abandonment History

There is no abandonment history.

Maintenance Fee

Last Payment of $125.00 was received on 2024-03-05

Upcoming maintenance fee amounts

Description	Date	Amount
Next Payment if standard fee	2025-04-29	$125.00
Next Payment if small entity fee	2025-04-29	$50.00

Note: If the full payment has not been received on or before the date indicated, a further fee may be required, which may be one of the following:

the reinstatement fee;
the late payment fee; or
additional fee to reverse deemed expiry.

Patent fees are adjusted on the 1st of January every year. The amounts above are the current amounts if received by December 31 of the current year.
Please refer to the CIPO Patent Fees web page to see all current fee amounts.

Payment History

Fee Type	Anniversary Year	Due Date	Amount Paid	Paid Date
Request for Examination			$814.37	2022-09-22
Application Fee			$407.18	2022-09-22
Registration of a document - section 124		2022-10-21	$100.00	2022-10-21
Registration of a document - section 124		2022-10-21	$100.00	2022-10-21
Maintenance Fee - Application - New Act	2	2024-04-29	$125.00	2024-03-05

Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
AMERICAN CHEMICAL SOCIETY

Past Owners on Record
None

Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.

Documents

List of published and non-published patent-specific documents on the CPD.

Document Description	Date (yyyy-mm-dd)	Number of pages	Size of Image (KB)
National Entry Request	2022-09-22	1	31
Declaration of Entitlement	2022-09-22	1	18
Patent Cooperation Treaty (PCT)	2022-09-22	1	41
Patent Cooperation Treaty (PCT)	2022-09-22	1	41
Description	2022-09-22	31	1,062
Claims	2022-09-22	12	405
Drawings	2022-09-22	4	44
Correspondence	2022-09-22	2	50
Abstract	2022-09-22	1	13
National Entry Request	2022-09-22	11	291
Change to the Method of Correspondence	2022-10-21	3	75
Representative Drawing	2023-01-25	1	6
Cover Page	2023-01-25	2	44
Abstract	2022-12-01	1	13
Claims	2022-12-01	12	405
Drawings	2022-12-01	4	44
Description	2022-12-01	31	1,062
Missing priority documents - PCT National	2023-02-27	4	121
Examiner Requisition	2024-01-25	4	183
