Patent 3217669 Summary

(12) Patent Application: (11) CA 3217669
(54) English Title: COMMODITY SHORT TITLE GENERATION METHOD AND APPARATUS
(54) French Title: PROCEDE ET APPAREIL DE GENERATION DE TITRE D'ABREGE DE PRODUIT
Status: Examination Requested
Bibliographic Data
(51) International Patent Classification (IPC):
  • G06F 40/258 (2020.01)
  • G06F 16/35 (2019.01)
  • G06F 16/38 (2019.01)
  • G06F 16/951 (2019.01)
  • G06F 40/211 (2020.01)
(72) Inventors :
  • ZHU, BIN (China)
  • SHEN, YI (China)
  • QI, KANG (China)
  • NI, HEQIANG (China)
  • CHEN, SHU (China)
(73) Owners :
  • 10353744 CANADA LTD. (Canada)
(71) Applicants :
  • 10353744 CANADA LTD. (Canada)
(74) Agent: HINTON, JAMES W.
(74) Associate agent:
(45) Issued:
(22) Filed Date: 2020-08-28
(41) Open to Public Inspection: 2021-07-01
Examination requested: 2023-10-25
Availability of licence: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): No

(30) Application Priority Data:
Application No. Country/Territory Date
201911373120.5 China 2019-12-27

Abstracts

English Abstract


An apparatus for matching words in a merchandise title library is provided, including: a word matching unit that performs word segmentation on original merchandise title data to obtain plural title words, matches each of the title words with keywords in a library, and outputs the keywords with matches; a processing unit that sieves out at least two effective keywords from plural keywords and stitches the keywords into merchandise short-titles; a data collecting unit that crawls merchandise title data and collects search term data to construct a data set that includes original merchandise title data; a word library unit that categorizes entries in the data set by merchandise categories and extracts the keywords to construct the word library; and a word tagging unit that tags each keyword in the word library as a modifier or category word, according to the keyword's part of speech.


Claims

Note: Claims are shown in the official language in which they were submitted.


1. An apparatus comprising:
a word matching unit, configured to:
perform word segmentation on original merchandise title data to obtain plural
title words;
match each of the title words with key words in a word library;
output the key words with matches;
a processing unit, configured to:
sieve out at least two effective key words from plural key words;
stitch the effective key words into merchandise short-title according to their parts of speech;
a data collecting unit, configured to crawl merchandise title data and collect
search term
data, to construct a corpus data set, wherein the corpus data set includes the
original
merchandise title data;
a word library unit, configured to:
based on a merchandise category table, categorize corpuses in the corpus data
set by merchandise categories;
extract the key words to construct the word library; and
a word tagging unit, configured to tag each key word in the word library as a
modifier
word or a category word according to a part of speech of word.
2. The apparatus of claim 1, wherein based on the merchandise category
table, categorizing
corpuses in the corpus data set by the merchandise categories, and extracting
the key words
to construct the word library comprises:
Date Recue/Date Received 2023-10-25

based on the merchandise category table, categorizing the corpuses in the
corpus data set
one by one according to the merchandise categories;
performing word segmentation on the corpuses, to obtain the plural key words;
de-duplicating and filtering the key words in every merchandise category to
obtain key
word sets each corresponding to the merchandise category; and
uniting the key word sets to form the word library.
3. The apparatus of claim 2, wherein tagging each key word in the word
library as the modifier
word or the category word according to the part of speech of word comprises:
extracting the key words which are the modifier words or the category words
from the
word library by means of manual tagging and tagging corresponding parts of
speech.
4. The apparatus of claim 2, wherein tagging each key word in the word
library as the modifier
word or the category word according to the part of speech of the word
comprises:
extracting the key words that are the modifier words or the category words
from the word
library using a machine tagging model and tagging corresponding parts of
speech.
5. The apparatus of any one of claim 3 to 4, further comprises:
crawling new merchandise title data;
performing word segmentation on the new merchandise title data;
matching resulting words with the key words in the word library;
wherein a number of the key words have matches is smaller than a threshold,
adding the
key words in the new merchandise title data into corresponding key word sets,
and
tagging newly added key words for their parts of speech;
wherein the number of the key words have matches is greater than the
threshold, crawling
the new merchandise title data, performing word segmentation on the new
merchandise
title data, and matching resulting words with the key words in the word
library; and
wherein based on a semantic recognition technology in a machine model,
extracting the
key words that are the modifier words or the category words from newly crawled
merchandise title data, adding them into the corresponding key word sets, and
tagging the
newly added key words for their corresponding parts of speech.
6. The apparatus of any one of claims 3 to 5, wherein performing word
segmentation on the
original merchandise title data to obtain the plural title words, matching
each of the title
words with the key words in the word library, and outputting the key words
with matches
comprises:
recognizing the merchandise categories in the original merchandise title data;
matching the merchandise categories with the key word sets;
segmenting the original merchandise title data into the plural title words;
matching each of the title words with the key words in the key word set; and
sieving out the key words with matches.
7. The apparatus of any of claims 1 to 6, wherein sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech comprises:
recording location information of each of the key words in the original
merchandise title
data;
wherein the key words tagged as the modifier words, there are plural key words
whose
lexical scopes have intersection, only one key word in the intersection is
kept;
wherein the key words tagged as the modifier words, there are plural key words
in which
the lexical scope of one key word contains the lexical scope of another key
word, only
the key word has largest lexical scope is kept;
wherein the key words tagged as the category words have word sense containing
word
sense of any key word tagged as the modifier word, the key word corresponding
to the
modifier word is removed;
defining remaining key words as the effective key words;
stitching the remaining key words into the merchandise short-title according
to locational
sequence;
matching different original merchandise title data with the word library;
performing parallel processing; and
outputting plural corresponding merchandise short-titles.
8. A method comprising:
performing word segmentation on original merchandise title data to obtain
plural title
words;
matching each of the title words with key words in a word library;
outputting the key words with matches;
sieving out at least two effective key words from plural key words;
stitching the effective key words into merchandise short-title according to
their parts of
speech;
crawling merchandise title data and collecting search term data, to construct
a corpus data
set, wherein the corpus data set includes the original merchandise title data;
based on a merchandise category table, categorizing corpuses in the corpus
data set by
merchandise categories;
extracting the key words to construct the word library; and
tagging each key word in the word library as a modifier word or a category
word
according to a part of speech of the key word.
9. The method of claim 8, wherein based on the merchandise category table,
categorizing
corpuses in the corpus data set by the merchandise categories, and extracting
the key words
to construct the word library comprises:
based on the merchandise category table, categorizing the corpuses in the
corpus data set
one by one according to the merchandise categories;
performing word segmentation on the corpuses, to obtain the plural key words;
de-duplicating and filtering the key words in every merchandise category to
obtain key
word sets each corresponding to the merchandise category; and
uniting the key word sets to form the word library.
10. The method of claim 9, wherein tagging each key word in the word library
as the modifier
word or the category word according to the part of speech of word comprises:
extracting the key words which are the modifier words or the category words
from the
word library by means of manual tagging and tagging corresponding parts of
speech.
11. The method of claim 9, wherein tagging each key word in the word library
as the modifier
word or the category word according to the part of speech of the word
comprises:
extracting the key words that are the modifier words or the category words
from the word
library using a machine tagging model and tagging corresponding parts of
speech.
12. The method of any one of claim 10 to 11, further comprises:
crawling new merchandise title data;
performing word segmentation on the new merchandise title data;
matching resulting words with the key words in the word library;
wherein a number of the key words have matches is smaller than a threshold,
adding the
key words in the new merchandise title data into corresponding key word sets,
and
tagging newly added key words for their parts of speech;
wherein the number of the key words have matches is greater than the
threshold, crawling
the new merchandise title data, performing word segmentation on the new
merchandise
title data, and matching resulting words with the key words in the word
library; and
wherein based on a semantic recognition technology in a machine model,
extracting the
key words that are the modifier words or the category words from newly crawled
merchandise title data, adding them into the corresponding key word sets, and
tagging the
newly added key words for their corresponding parts of speech.
13. The method of any one of claims 10 to 12, wherein performing word
segmentation on the
original merchandise title data to obtain the plural title words, matching
each of the title
words with the key words in the word library, and outputting the key words
with matches
comprises:
recognizing the merchandise categories in the original merchandise title data;
matching the merchandise categories with the key word sets;
segmenting the original merchandise title data into the plural title words;
matching each of the title words with the key words in the key word set; and
sieving out the key words with matches.
14. The method of any of claims 8 to 13, wherein sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech comprises:
recording location information of each of the key words in the original
merchandise title
data;
wherein the key words tagged as the modifier words, there are plural key words
whose
lexical scopes have intersection, only one key word in the intersection is
kept;
wherein the key words tagged as the modifier words, there are plural key words
in which
the lexical scope of one key word contains the lexical scope of another key
word, only
the key word has largest lexical scope is kept;
wherein the key words tagged as the category words have word sense containing
word
sense of any key word tagged as the modifier word, the key word corresponding
to the
modifier word is removed;
defining remaining key words as the effective key words;
stitching the remaining key words into the merchandise short-title according
to locational
sequence;
matching different original merchandise title data with the word library;
performing parallel processing; and
outputting plural corresponding merchandise short-titles.
15. A computer readable physical memory having stored thereon a computer
program executed
by a computer configured to:
perform word segmentation on original merchandise title data to obtain plural
title words;
match each of the title words with key words in a word library;
output the key words with matches;
sieve out at least two effective key words from plural key words;
stitch the effective key words into merchandise short-title according to their
parts of
speech;
crawl merchandise title data and collect search term data, to construct a corpus data set, wherein the corpus data set includes the original merchandise title data;
based on a merchandise category table, categorize corpuses in the corpus data
set by
merchandise categories;
extract the key words to construct the word library; and
tag each key word in the word library as a modifier word or a category word
according to
a part of speech of the key word.
16. The memory of claim 15, wherein based on the merchandise category table,
categorizing
corpuses in the corpus data set by the merchandise categories, and extracting
the key words
to construct the word library comprises:
based on the merchandise category table, categorizing the corpuses in the
corpus data set
one by one according to the merchandise categories;
performing word segmentation on the corpuses, to obtain the plural key words;
de-duplicating and filtering the key words in every merchandise category to
obtain key
word sets each corresponding to the merchandise category; and
uniting the key words sets to form the word library; and
wherein tagging each key word in the word library as the modifier word or the
category
word according to the part of speech of word comprises:
extracting the key words which are the modifier words or the category words
from the
word library by means of manual tagging and tagging corresponding parts of
speech.
17. The memory of claim 16, wherein tagging each key word in the word library
as the modifier
word or the category word according to the part of speech of the word
comprises:
extracting the key words that are the modifier words or the category words
from the word
library using a machine tagging model and tagging corresponding parts of
speech.
18. The memory of any one of claim 16 to 17, further comprises:
crawling new merchandise title data;
performing word segmentation on the new merchandise title data;
matching resulting words with the key words in the word library;
wherein a number of the key words have matches is smaller than a threshold,
adding the
key words in the new merchandise title data into corresponding key word sets,
and
tagging newly added key words for their parts of speech;
wherein the number of the key words have matches is greater than the
threshold, crawling
the new merchandise title data, performing word segmentation on the new
merchandise
title data, and matching resulting words with the key words in the word
library; and
wherein based on a semantic recognition technology in a machine model,
extracting the
key words that are the modifier words or the category words from newly crawled
merchandise title data, adding them into the corresponding key word sets, and
tagging the
newly added key words for their corresponding parts of speech.
19. The memory of any one of claims 16 to 18, wherein performing word
segmentation on the
original merchandise title data to obtain the plural title words, matching
each of the title
words with the key words in the word library, and outputting the key words
with matches
comprises:
recognizing the merchandise categories in the original merchandise title data;
matching the merchandise categories with the key word sets;
segmenting the original merchandise title data into the plural title words;
matching each of the title words with the key words in the key word set; and
sieving out the key words with matches.
20. The memory of any of claims 15 to 19, wherein sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech comprises:
recording location information of each of the key words in the original
merchandise title
data;
wherein the key words tagged as the modifier words, there are plural key words
whose
lexical scopes have intersection, only one key word in the intersection is
kept;
wherein the key words tagged as the modifier words, there are plural key words
in which
the lexical scope of one key word contains the lexical scope of another key
word, only
the key word has largest lexical scope is kept;
wherein the key words tagged as the category words have word sense containing
word
sense of any key word tagged as the modifier word, the key word corresponding
to the
modifier word is removed;
defining remaining key words as the effective key words;
stitching the remaining key words into the merchandise short-title according
to locational
sequence;
matching different original merchandise title data with the word library;
performing parallel processing; and
outputting plural corresponding merchandise short-titles.

Description

Note: Descriptions are shown in the official language in which they were submitted.


COMMODITY SHORT TITLE GENERATION METHOD AND APPARATUS
BACKGROUND OF THE INVENTION
Technical Field
[0001] The present invention relates to the technical field of text abstracting, and more particularly to a method and an apparatus for generating merchandise short-titles.
Description of Related Art
[0002] Merchandise short-titles are generally formed by compressing the standard-length titles of merchandise items. As the name implies, short-titles are simple, concise, and short. The purpose of a short-title is to describe the key information of a merchandise item in as few words as possible, so that users can get such key information at a glance. An example of a short-title is "Korean-cutting all-over print dress." Generating short-titles can be regarded as a special text abstracting technology in the sense of natural language processing.
[0003] Traditional text abstracting techniques, such as TextRank and Lead-3, abstract sentences from articles and are not really suitable for generating merchandise titles. With the rapid development of deep learning, various deep-learning models, like seq2seq and pointer-generation, can be used to generate compressed short-titles. However, without a sufficient short-title training corpus, these models are not applicable in practice, particularly for generation of merchandise titles.
SUMMARY OF THE INVENTION
[0004] The objective of the present invention is to provide a method and an apparatus for generating merchandise short-titles, which can generate merchandise short-titles with improved efficiency and precision.
[0005] For achieving the foregoing objective, in a first aspect, the present
invention provides a
method for generating a merchandise short-title, which comprises:
[0006] crawling merchandise title data and/or collecting search term data, so
as to construct a
corpus data set;
[0007] based on a merchandise category table, categorizing corpuses in the
corpus data set by
merchandise categories, and then extracting key words to construct a word
library;
[0008] tagging each key word in the word library as either a modifier word or
a category word
according to a part of speech of the word;
[0009] performing word segmentation on the original merchandise title data so
as to obtain plural
title words, matching each of the title words with the key words in the word
library,
respectively, and outputting the key words that have matches; and
[0010] sieving out at least two effective key words from the plural key words,
and stitching the
effective key words into the merchandise short-title according to their parts
of speech.
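As a rough illustration of steps [0006] to [0010], the following Python sketch runs the matching and stitching stages on a single title. It is a minimal sketch under stated assumptions, not the patented implementation: the word library is a hand-written dictionary, segmentation is a plain whitespace split, and stitching places modifier words before category words. The names `WORD_LIBRARY`, `segment`, `match_key_words`, `stitch` and `short_title` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical word library: key word -> tag ("modifier" or "category").
WORD_LIBRARY: Dict[str, str] = {
    "korean-cut": "modifier",
    "print": "modifier",
    "dress": "category",
}


def segment(title: str) -> List[str]:
    """Assumed segmentation: lower-case the title and split on whitespace."""
    return title.lower().split()


def match_key_words(title_words: List[str],
                    library: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (key word, tag) pairs for title words found in the library."""
    return [(w, library[w]) for w in title_words if w in library]


def stitch(matches: List[Tuple[str, str]], min_words: int = 2) -> str:
    """Join matched key words, modifiers first and category words last.

    The ordering is an assumption. An empty string is returned when fewer
    than `min_words` key words matched.
    """
    if len(matches) < min_words:
        return ""
    modifiers = [w for w, tag in matches if tag == "modifier"]
    categories = [w for w, tag in matches if tag == "category"]
    return " ".join(modifiers + categories)


def short_title(title: str, library: Dict[str, str] = WORD_LIBRARY) -> str:
    """Segment, match and stitch a single original title."""
    return stitch(match_key_words(segment(title), library))


print(short_title("2019 new Korean-cut summer print dress women"))
# prints: korean-cut print dress
```

A production system for Chinese-language titles would presumably need a dedicated word segmenter in place of the whitespace split used here.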
[0011] Preferably, the step of categorizing corpuses in the corpus data set by merchandise categories based on a merchandise category table, and then extracting key words to construct a word library, comprises:
[0012] based on the merchandise category table, categorizing the corpuses in
the corpus data set
one by one according to the merchandise categories;
[0013] performing word segmentation on the corpuses, respectively, so as to obtain the plural key words, and de-duplicating and then filtering the key words in every merchandise category so as to obtain key word sets, each corresponding to one of the merchandise categories; and
[0014] uniting the plural key word sets to form the word library.
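The library construction in [0012] to [0014] might be sketched as follows. This is only an illustration: the layout of the category table, the filter list, and the use of whitespace segmentation are all assumptions, and `CATEGORY_TABLE`, `FILTERED_WORDS` and `build_word_library` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical merchandise category table: category id -> category name.
CATEGORY_TABLE: Dict[str, str] = {"c01": "dresses", "c02": "phones"}

# Hypothetical filter list of words that carry no product information.
FILTERED_WORDS: Set[str] = {"new", "hot", "sale"}


def build_word_library(corpus: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build one key word set per merchandise category.

    `corpus` yields (category id, text) pairs. Each text is segmented
    (assumed: whitespace split), filtered, and added to the set of its
    category; using a set de-duplicates the key words.
    """
    key_word_sets: Dict[str, Set[str]] = defaultdict(set)
    for category_id, text in corpus:
        category = CATEGORY_TABLE[category_id]
        for word in text.lower().split():
            if word in FILTERED_WORDS or word.isdigit():
                continue  # filtering step: drop noise words and bare numbers
            key_word_sets[category].add(word)
    # The union of all per-category sets is the word library.
    return dict(key_word_sets)


corpus = [
    ("c01", "new floral dress"),
    ("c01", "floral print dress 2019"),
    ("c02", "hot sale smart phone"),
]
library = build_word_library(corpus)
print(sorted(library["dresses"]))  # prints: ['dress', 'floral', 'print']
```

Keeping the sets keyed by category, rather than merging them into one flat set, makes the later per-category matching step straightforward.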
[0015] More preferably, the step of tagging each key word in the word library
as either a modifier
word or a category word according to a part of speech of the word comprises:
[0016] extracting the key words that are the modifier words or the category
words from the word
library by means of manual tagging and tagging the corresponding parts of
speech; and/or
[0017] extracting the key words that are the modifier words or the category words from the word library and tagging the corresponding parts of speech using a machine tagging model.
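One way to combine the manual and machine tagging options of [0016] and [0017] is to consult a manual tag table first and fall back to a model for the remaining words. The sketch below assumes this combination; the text does not specify the machine tagging model, so `toy_model` is a purely illustrative stand-in, and `MANUAL_TAGS` and `tag_library` are hypothetical names.

```python
from typing import Callable, Dict, Iterable

# Hypothetical manual tags supplied by a human annotator.
MANUAL_TAGS: Dict[str, str] = {"dress": "category", "floral": "modifier"}


def toy_model(word: str) -> str:
    """Stand-in for a machine tagging model (illustrative rule only)."""
    return "category" if word in {"dress", "skirt", "phone"} else "modifier"


def tag_library(words: Iterable[str],
                manual: Dict[str, str] = MANUAL_TAGS,
                model: Callable[[str], str] = toy_model) -> Dict[str, str]:
    """Tag each key word as "modifier" or "category".

    Manual tags take precedence; untagged words are passed to the model.
    """
    return {w: manual.get(w) or model(w) for w in words}


print(tag_library(["floral", "dress", "skirt", "linen"]))
```

Because the model is passed in as a callable, a trained tagger could replace `toy_model` without changing `tag_library`.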
[0018] Further, after the step of extracting the key words that are the
modifier words or the
category words from the word library by means of manual tagging and tagging
the
corresponding parts of speech, the method further comprises:
[0019] crawling new merchandise title data, performing word segmentation
thereon, and
matching resulting words with the key words in the word library;
[0020] if a number of the key words that have matches is smaller than a
threshold, adding the
key words in the new merchandise title data into the corresponding key word
sets, and
tagging the newly added key words for their parts of speech; or
[0021] if the number of the key words that have matches is greater than the threshold, crawling new merchandise title data, performing word segmentation thereon, and matching resulting words with the key words in the word library again.
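The threshold test in [0019] to [0021] can be sketched as a loop over newly crawled titles. The sketch assumes whitespace segmentation and treats a match count equal to the threshold like the "greater than" branch, which the text leaves open. The name `update_key_word_set` is hypothetical, and the crawl itself is replaced by a list of titles.

```python
from typing import Iterable, List, Set


def update_key_word_set(new_titles: Iterable[str],
                        key_word_set: Set[str],
                        threshold: int) -> List[str]:
    """Extend `key_word_set` in place from newly crawled titles.

    For each title, the segmented words are matched against the set. When
    fewer than `threshold` words match, the title is poorly covered, so its
    unknown words are added. Otherwise the loop moves on to the next title.
    Returns the newly added words, which still need part-of-speech tags.
    """
    added: List[str] = []
    for title in new_titles:
        words = title.lower().split()  # assumed segmentation
        matched = [w for w in words if w in key_word_set]
        if len(matched) < threshold:
            for w in words:
                if w not in key_word_set:
                    key_word_set.add(w)
                    added.append(w)
    return added


dress_words = {"floral", "dress", "print"}
to_tag = update_key_word_set(
    ["floral print dress", "linen wrap skirt"], dress_words, threshold=2
)
print(to_tag)  # prints: ['linen', 'wrap', 'skirt']
```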
[0022] Preferably, after the step of extracting the key words that are the
modifier words or the
category words from the word library using a machine tagging model and tagging
the
corresponding parts of speech using a machine tagging model, the method
further
comprises:
[0023] based on a semantic recognition technology in the machine model, extracting the key words that are the modifier words or the category words from the newly crawled merchandise title data, adding them into the corresponding key word sets, and tagging the newly added key words for their corresponding parts of speech.
[0024] Preferably, the step of performing word segmentation on the original
merchandise title
data so as to obtain plural title words, matching each of the title words with
the key words
in the word library, respectively, and outputting the key words that have
matches
comprises:
[0025] recognizing the merchandise categories in the original merchandise
title data, and
matching them with the corresponding key word sets; and
[0026] segmenting the original merchandise title data into the plural title
words, matching each
of the title words with the key words in the corresponding key word set, and
sieving out
the key words that have matches.
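Steps [0025] and [0026] restrict matching to the key word set of the recognized category. A minimal sketch follows, assuming the category is recognized from a trigger word in the title; the text does not say how recognition is performed. `KEY_WORD_SETS`, `CATEGORY_HINTS`, `recognize_category` and `match_in_category` are hypothetical names.

```python
from typing import Dict, List, Optional, Set

# Hypothetical per-category key word sets.
KEY_WORD_SETS: Dict[str, Set[str]] = {
    "dresses": {"floral", "print", "dress"},
    "phones": {"smart", "phone", "5g"},
}

# Hypothetical trigger words used to recognize the category of a title.
CATEGORY_HINTS: Dict[str, str] = {"dress": "dresses", "phone": "phones"}


def recognize_category(title_words: List[str]) -> Optional[str]:
    """Return the category of the first trigger word found, if any."""
    for w in title_words:
        if w in CATEGORY_HINTS:
            return CATEGORY_HINTS[w]
    return None


def match_in_category(title: str) -> List[str]:
    """Match title words only against the key word set of the title's category."""
    words = title.lower().split()  # assumed segmentation
    category = recognize_category(words)
    if category is None:
        return []
    key_words = KEY_WORD_SETS[category]
    return [w for w in words if w in key_words]


print(match_in_category("Summer floral print dress 5G"))
# prints: ['floral', 'print', 'dress']
```

In this example "5g" is not returned, because it belongs to the phone key word set rather than the dress set.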
[0027] Preferably, the step of sieving out at least two effective key words
from the plural key
words, and stitching the effective key words into the merchandise short-title
according to
their parts of speech comprises:
[0028] recording location information of each of the key words in the original
merchandise title
data;
[0029] if, among the key words tagged as the modifier words, there are plural key words whose lexical scopes intersect, only one of the key words in the intersection is kept;
[0030] if, among the key words tagged as the modifier words, there are plural key words in which the lexical scope of one key word contains the lexical scope of another key word, only the key word that has the largest lexical scope is kept;
[0031] if a key word tagged as the category word has a word sense containing the word sense of any key word tagged as the modifier word, the key word corresponding to the modifier word is removed; and
[0032] defining the remaining key words as the effective key words, and stitching them into the merchandise short-title according to locational sequence thereof.
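The sieving rules in [0028] to [0032] can be illustrated with character spans. The sketch below makes several assumptions that are not in the text: a key word's "lexical scope" is taken to be its character span in the title, the key word kept from an overlapping group is the longest one, and "word sense containment" is approximated by a substring test. All names (`Match`, `locate`, `sieve`, `stitch`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Match:
    word: str
    tag: str    # "modifier" or "category"
    start: int  # character offset in the title (inclusive)
    end: int    # character offset in the title (exclusive)


def locate(title: str, library: Dict[str, str]) -> List[Match]:
    """Record the location of the first occurrence of each key word in the title."""
    found = []
    for word, tag in library.items():
        pos = title.find(word)
        if pos != -1:
            found.append(Match(word, tag, pos, pos + len(word)))
    return found


def overlaps(a: Match, b: Match) -> bool:
    return a.start < b.end and b.start < a.end


def sieve(matches: List[Match]) -> List[Match]:
    """Apply the three removal rules and return the effective key words in title order."""
    modifiers = [m for m in matches if m.tag == "modifier"]
    categories = [m for m in matches if m.tag == "category"]
    kept: List[Match] = []
    # Longest span first: a containing or intersecting modifier wins over shorter ones.
    for m in sorted(modifiers, key=lambda m: (-(m.end - m.start), m.start)):
        if not any(overlaps(m, k) for k in kept):
            kept.append(m)
    # Assumed proxy for word sense containment: modifier text inside a category word.
    kept = [m for m in kept if not any(m.word in c.word for c in categories)]
    return sorted(kept + categories, key=lambda m: m.start)


def stitch(matches: List[Match]) -> str:
    return " ".join(m.word for m in matches)


library = {
    "long sleeve": "modifier", "sleeve": "modifier", "cotton": "modifier",
    "shirt": "modifier", "shirt dress": "category",
}
title = "2019 new long sleeve cotton shirt dress for women"
print(stitch(sieve(locate(title, library))))
# prints: long sleeve cotton shirt dress
```

Here "sleeve" is dropped because "long sleeve" contains it, and "shirt" is dropped because the category word "shirt dress" covers it.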
[0033] Optionally, the method further comprises matching different original merchandise title data with the word library, respectively, performing parallel processing, and outputting plural corresponding merchandise short-titles.
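The optional parallel processing in [0033] could look like the following sketch, which maps a simplified short-title function over several titles with a thread pool from the Python standard library. The choice of a thread pool, the toy `WORD_LIBRARY`, and the function names are assumptions; for CPU-bound work in CPython a process pool may be the better fit.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical word library shared by all workers (read-only).
WORD_LIBRARY: Dict[str, str] = {
    "floral": "modifier", "dress": "category",
    "smart": "modifier", "phone": "category",
}


def short_title(title: str) -> str:
    """Simplified per-title pipeline: keep library words in title order."""
    words = [w for w in title.lower().split() if w in WORD_LIBRARY]
    return " ".join(words) if len(words) >= 2 else ""


def short_titles_parallel(titles: List[str], max_workers: int = 4) -> List[str]:
    """Process the titles concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(short_title, titles))


print(short_titles_parallel(["New floral summer dress", "Hot sale smart 5G phone"]))
# prints: ['floral dress', 'smart phone']
```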
[0034] For example, the search term data represent a collection of search terms input by users when searching for merchandise items.
[0035] As compared to the prior art, the method for generating merchandise
short-titles of the
present invention provides the following beneficial effects:
[0036] In the method for generating merchandise short-titles according to the present invention, a corpus data set is first constructed. Then, based on the merchandise category table, the corpuses in the corpus data set are categorized. From the categorized corpuses, key words are extracted to form a word library. Every key word in the word library is tagged as a modifier word or a category word according to its part of speech. The word library is thus established. Afterward, the original merchandise title data to be compressed are acquired. The original merchandise title data are segmented to obtain plural title words. These title words are matched against the key words in the word library. From the key words that have matches, at least two effective key words are sieved out and stitched into a merchandise short-title according to the order of their parts of speech.
[0037] It is thus clear that the present invention categorizes corpuses before tagging them, thereby effectively reducing the difficulty of the tagging process and tagging key words more efficiently. By segmenting the original merchandise title data and directly matching the data with the key words in the word library, the sieved and stitched merchandise short-title is more precise.
[0038] In another aspect, the present invention provides an apparatus for generating merchandise short-titles, which applies the method for generating merchandise short-titles as described above. The apparatus comprises:
[0039] a data collecting unit, for crawling merchandise title data and/or
collecting search term
data, so as to construct a corpus data set;
[0040] a word library unit, for categorizing corpuses in the corpus data set by merchandise categories based on a merchandise category table, and then extracting key words to construct a word library;
[0041] a word tagging unit, for tagging each key word in the word library as
either a modifier
word or a category word according to a part of speech of the word;
[0042] a word matching unit, for performing word segmentation on the original
merchandise title
data so as to obtain plural title words, matching each of the title words with
the key words
in the word library, respectively, and outputting the key words that have
matches; and
[0043] a processing unit, for sieving out at least two effective key words
from the plural key
words, and stitching the effective key words into the merchandise short-title
according to
their parts of speech.
[0044] As compared to the prior art, the disclosed apparatus for generating merchandise short-titles provides beneficial effects that are similar to those provided by the method for generating merchandise short-titles as enumerated above, which are not repeated herein.
[0045] In a third aspect, the present invention provides a computer-readable
storage medium, in
which a computer program is stored. When run by a processor, the computer
program
executes the steps of the method for generating merchandise short-titles as
described
above.
[0046] As compared to the prior art, the disclosed computer-readable storage medium provides beneficial effects that are similar to those provided by the method for generating merchandise short-titles as enumerated above, which are not repeated herein.
BRIEF DESCRIPTION OF THE DRAWINGS
[0047] The accompanying drawing is provided herein for better understanding of the present invention and forms a part of this disclosure. The illustrative embodiments and their descriptions are for explaining the present invention and by no means form any improper limitation to the present invention, wherein:
[0048] FIG. 1 is a flowchart of a method for generating merchandise short-titles according to a first embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0049] To make the foregoing objectives, features, and advantages of the present invention clearer and more understandable, the following description is directed to some embodiments as depicted in the accompanying drawings to detail the technical schemes disclosed in these embodiments. It is, however, to be understood that the embodiments referred to herein are only a part of all possible embodiments and thus not exhaustive. All other embodiments conceived by people of ordinary skill in the art on the basis of the embodiments of the present invention, without creative labor, shall be encompassed in the scope of the present invention.
[0050] Embodiment 1
[0051] Referring to FIG. 1, the present embodiment provides a method for
generating a
merchandise short-title, comprising:
[0052] crawling merchandise title data and/or collecting search term data, so
as to construct a
corpus data set; based on a merchandise category table, categorizing corpuses
in the
corpus data set by merchandise categories, and then extracting key words to
construct a
word library; tagging each key word in the word library as either a modifier
word or a
category word according to a part of speech of the word; performing word
segmentation
on the original merchandise title data so as to obtain plural title words,
matching each of
the title words with the key words in the word library, respectively, and
outputting the
key words that have matches; and sieving out at least two effective key words
from the
plural key words, and stitching the effective key words into the merchandise
short-title
according to their parts of speech.
[0053] In the method for generating merchandise short-titles according to the present embodiment, a corpus data set is first constructed. Then, based on the merchandise category table, the corpuses in the corpus data set are categorized. From the categorized corpuses, key words are extracted to form a word library. Every key word in the word library is tagged as a modifier word or a category word according to its part of speech. The word library is thus established. Afterward, the original merchandise title data to be compressed are acquired and segmented to obtain plural title words. These title words are matched against the key words in the word library. From the key words that have matches, at least two effective key words are sieved out and stitched into a merchandise short-title according to the order of their parts of speech.
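By way of illustration only, the overall flow described above may be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: it assumes whitespace word segmentation, a small hand-written word library (the hypothetical `WORD_LIBRARY`), and stitching in order of appearance. All names and sample data are hypothetical.

```python
# Hypothetical word library: key word -> tag ("modifier" or "category").
WORD_LIBRARY = {
    "wireless": "modifier",
    "bluetooth": "modifier",
    "headphones": "category",
}


def segment(title):
    """Assumed segmentation: lower-case the title and split on whitespace."""
    return title.lower().split()


def generate_short_title(title, library):
    """Match title words against the library and stitch the matches together."""
    matches = []
    for word in segment(title):
        if word in library and word not in matches:
            matches.append(word)
    # The method calls for at least two effective key words.
    if len(matches) < 2:
        return None
    return " ".join(matches)


short_title = generate_short_title(
    "2019 New Wireless Bluetooth Headphones Free Shipping", WORD_LIBRARY
)
print(short_title)
```

With the sample library, the long title is reduced to the three matching key words; a title with fewer than two matches yields no short-title.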
[0054] It is thus clear that the present invention categorizes corpuses before tagging them, thereby effectively reducing the difficulty of the tagging process and allowing key words to be tagged more efficiently. Because the original merchandise title data are segmented and the resulting title words are matched directly with the key words in the word library, the sieved and stitched merchandise short-title is more precise.
[0055] It is to be noted that the data in the corpus data set are obtained by crawling the merchandise title data and collecting the search term data. For crawling the merchandise title data, it is important to crawl merchandise short-titles from major e-commerce platforms. For collecting the search term data, the search terms used for searching for various merchandise items, namely query data, are gathered.
[0056] In the embodiment, the step of categorizing corpuses in the corpus data set by merchandise categories based on a merchandise category table, and then extracting key words to construct a word library, comprises:
[0057] based on the merchandise category table, categorizing the corpuses in the corpus data set one by one according to the merchandise categories; performing word segmentation on the corpuses, respectively, so as to obtain the plural key words, and de-duplicating and then filtering the key words in every merchandise category so as to obtain key word sets each corresponding to one of the merchandise categories; and uniting the plural key word sets to form the word library.
[0058] Because tagging corpuses directly involves a prodigious workload, it is desirable, in order to reduce the difficulty and improve the efficiency of the tagging task, to categorize corpuses in the corpus data set according to a merchandise category table (e.g., a quaternary merchandise group). For example, the categories may include a clothes corpus group, a pants corpus group, a mobile phone corpus group, etc. Then the categorized corpuses are segmented so that every category group is formed by plural key words. Irrelevant key words are filtered out (denoising), and the key words in every category group are de-duplicated, so as to ensure every key word is unique in its group. Eventually, key word sets are formed, each corresponding to a category group. By uniting all the key word sets, the word library is formed.
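The construction of the word library may be sketched as follows. This is only an illustration under assumptions: whitespace segmentation, a hypothetical `STOP_WORDS` set standing in for the noise filter, and a category table modeled as a simple mapping from item identifier to category. None of these names come from the disclosure.

```python
from collections import defaultdict

# Hypothetical noise words to filter out; a real list would be curated.
STOP_WORDS = {"new", "free", "shipping", "2019"}


def build_word_library(corpus, category_table):
    """Build per-category key word sets, then unite them into a word library.

    corpus: list of (item_id, text) pairs.
    category_table: dict mapping item_id -> merchandise category.
    """
    key_word_sets = defaultdict(set)  # a set de-duplicates within a category
    for item_id, text in corpus:
        category = category_table.get(item_id)
        if category is None:
            continue  # skip corpus entries with no known category
        for word in text.lower().split():  # assumed whitespace segmentation
            if word not in STOP_WORDS:  # filter out irrelevant words
                key_word_sets[category].add(word)
    # Uniting all key word sets yields the word library.
    library = set().union(*key_word_sets.values())
    return dict(key_word_sets), library


corpus = [
    (1, "New Cotton Shirt"),
    (2, "cotton shirt free shipping"),
    (3, "Slim Pants"),
]
category_table = {1: "clothes", 2: "clothes", 3: "pants"}
sets_by_category, library = build_word_library(corpus, category_table)
```

Using a set per category makes de-duplication implicit, so each key word is unique within its group, as the paragraph above requires.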
[0059] In the embodiment, the step of tagging each key word in the word library as either a modifier word or a category word according to a part of speech of the word comprises:
[0060] extracting the key words that are the modifier words or the category words from the word library by means of manual tagging and tagging the corresponding parts of speech; and/or extracting the key words that are the modifier words or the category words from the word library using a machine tagging model and tagging the corresponding parts of speech using a machine tagging model.
[0061] As the name implies, manual tagging refers to manually determining whether a key word in the word library is a modifier word or a category word, and manually tagging the key word. In contrast, a machine tagging model recognizes and tags key words automatically. When the number of key words in the word library is huge, such a machine model is effective in improving tagging efficiency. However, as demonstrated in practice, while a machine model provides high efficiency, its tagging results are less precise than those from manual operation. Therefore, it is preferred to combine the two solutions for tagging key words in the word library. For example, a machine model is first used to pre-tag numerous key words, and then manual verification is performed, so as to balance the efficiency and precision of key-word tagging.
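The combination of machine pre-tagging and manual verification may be sketched as follows. The sketch assumes the machine model is available as a callable and that reviewer corrections arrive as a mapping; `toy_tagger` is a deliberately crude stand-in for a trained model, and all names are hypothetical.

```python
def pretag(key_words, machine_tagger):
    """Pre-tag every key word with the machine model's guess."""
    return {word: machine_tagger(word) for word in key_words}


def apply_manual_review(pretagged, corrections):
    """Overlay human corrections on top of the machine pre-tags."""
    final = dict(pretagged)
    for word, tag in corrections.items():
        if word in final:
            final[word] = tag
    return final


def toy_tagger(word):
    """Stand-in for a trained model: a toy rule, purely for illustration."""
    return "category" if word.endswith("s") else "modifier"


machine_tags = pretag(["wireless", "headphones", "red"], toy_tagger)
# A reviewer spots that "wireless" was mis-tagged and corrects it.
final_tags = apply_manual_review(machine_tags, {"wireless": "modifier"})
```

The reviewer only needs to supply the entries that differ from the machine output, which is where the efficiency gain over fully manual tagging would come from.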
[0062] After the step of extracting the key words that are the modifier words or the category words from the word library by means of manual tagging and tagging the corresponding parts of speech, the method further comprises:
[0063] crawling new merchandise title data, performing word segmentation thereon, and matching resulting words with the key words in the word library; if the number of the key words that have matches is smaller than a threshold, adding the key words in the new merchandise title data into the corresponding key word sets, and tagging the newly added key words for their parts of speech; or if the number of the key words that have matches is greater than the threshold, crawling new merchandise title data, performing word segmentation thereon, and matching resulting words with the key words in the word library again.
[0064] The objective of the embodiment is to increase word sources for the word library. By continually acquiring new merchandise title data, the robustness of the key words in the word library can be evaluated. Specifically, word segmentation is performed on the merchandise title data, and the results are filtered so that only those key words whose parts of speech are identified as modifier words and category words are kept. When the number of remaining key words that match key words in the word library is smaller than a threshold, it indicates that the key words in the word library are not robust enough. In that case, the key words in the merchandise title data that do not have matches are supplemented into the corresponding key word sets, and the newly added key words are tagged by their parts of speech. Conversely, if the number of remaining key words that match key words in the word library is greater than the threshold, it indicates that the collection of key words in the word library is adequate for the current merchandise title data. Thus, a user can continue to crawl new merchandise title data and repeat the foregoing process to continuously assess the word library. By way of example, the threshold is 3.
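The threshold check may be sketched as follows, using the example threshold of 3. The sketch assumes whitespace segmentation and a hypothetical `tag_word` callable that returns a tag for modifier and category words and `None` for anything else. The text does not say what happens when the match count equals the threshold; treating that case as sufficient is an assumption made here.

```python
THRESHOLD = 3  # the example threshold given in the text


def update_library(new_title, library, tag_word, threshold=THRESHOLD):
    """Add unmatched words from a new title when too few words match.

    library: dict mapping key word -> tag; modified in place.
    tag_word: callable returning "modifier", "category", or None for a word.
    Returns the list of newly added key words (empty if the library suffices).
    """
    words = new_title.lower().split()  # assumed whitespace segmentation
    matched = [w for w in words if w in library]
    if len(matched) >= threshold:
        return []  # library copes with this title; move on to the next one
    added = []
    for word in words:
        if word in library:
            continue
        tag = tag_word(word)
        if tag is not None:  # keep only modifier or category words
            library[word] = tag
            added.append(word)
    return added


library = {"wireless": "modifier", "headphones": "category"}
tagger = {"bluetooth": "modifier"}.get  # hypothetical tagger for the demo
first = update_library("Wireless Bluetooth Headphones Sale", library, tagger)
second = update_library("Wireless Bluetooth Headphones", library, tagger)
```

On the first call only two words match, so the untagged-but-relevant word is added; on the second call three words match and the library is left unchanged.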
[0065] After the step of extracting the key words that are the modifier words or the category words from the word library using a machine tagging model and tagging the corresponding parts of speech using a machine tagging model, the method further comprises:
[0066] based on a semantic recognition technology in the machine model, extracting the key words that are the modifier words or the category words from the newly crawled merchandise title data, adding them into the corresponding key word sets, and tagging the newly added key words for their corresponding parts of speech.
[0067] Optionally, the machine model may be a BiLSTM+CRF deep learning model. Using such a deep learning model to extract the key words that are modifier words or category words from the newly crawled merchandise title data, tag the key words, and add them into the corresponding key word sets offers great adaptability, because the model can automatically recognize category words and modifiers in the merchandise title according to contextual information.
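A sequence-labelling model of this kind typically emits one label per token, and key words must then be read off from those labels. The following sketch shows only that post-processing step, not the model itself. The disclosure does not specify a label scheme; the BIO-style labels `B-MOD`, `I-MOD`, `B-CAT`, `I-CAT` and `O` used here are assumptions, as are all function names.

```python
def extract_key_words(tokens, labels):
    """Collect (key word, tag) pairs from per-token labels.

    Assumed label scheme: "B-MOD"/"I-MOD" for modifier words,
    "B-CAT"/"I-CAT" for category words, and "O" for everything else.
    """
    tag_names = {"MOD": "modifier", "CAT": "category"}
    results = []
    current, current_kind = [], None
    for token, label in zip(tokens, labels):
        prefix, _, kind = label.partition("-")
        if prefix == "I" and kind == current_kind and current:
            current.append(token)  # continue the current key word
            continue
        if current:  # close the key word collected so far
            results.append((" ".join(current), tag_names[current_kind]))
            current, current_kind = [], None
        if prefix in ("B", "I") and kind in tag_names:
            current, current_kind = [token], kind  # start a new key word
    if current:
        results.append((" ".join(current), tag_names[current_kind]))
    return results


tokens = ["new", "noise", "cancelling", "headphones"]
labels = ["O", "B-MOD", "I-MOD", "B-CAT"]  # hypothetical model output
key_words = extract_key_words(tokens, labels)
```

The extracted pairs could then be added to the corresponding key word set together with their tags.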
[0068] Further, in the embodiment, the step of performing word segmentation on the original merchandise title data so as to obtain plural title words, matching each of the title words with the key words in the word library, respectively, and outputting the key words that have matches comprises:
[0069] recognizing the merchandise categories in the original merchandise title data, and matching them with the corresponding key word sets; and segmenting the original merchandise title data into the plural title words, matching each of the title words with the key words in the corresponding key word set, and sieving out the key words that have matches.
[0070] Preferably, multiple different original merchandise title data may be acquired at the same time and matched with the word library, respectively. Then parallel processing is performed to output plural merchandise short-titles.
[0071] In practical implementations, merchandise categories in different original merchandise title data can be recognized at the same time and have respective matched key word sets. The original merchandise title data are segmented into plural title words. Then each of the title words is matched with the key words in the corresponding key word set, and the key words that have matches in the original merchandise title data are sieved out.
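Category-specific matching and parallel handling of several titles may be sketched as follows. The disclosure does not state how a category is recognized; picking the category whose key word set shares the most words with the title is an assumption made for illustration, as are the sample `KEY_WORD_SETS` and the use of a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical key word sets, one per merchandise category.
KEY_WORD_SETS = {
    "headphones": {"wireless", "bluetooth", "headphones"},
    "pants": {"slim", "cotton", "pants"},
}


def recognize_category(title_words, key_word_sets):
    """Assumed heuristic: pick the category whose set shares most words."""
    words = set(title_words)
    return max(key_word_sets, key=lambda c: len(key_word_sets[c] & words))


def match_title(title, key_word_sets=KEY_WORD_SETS):
    """Return the title words that match the recognized category's set."""
    words = title.lower().split()  # assumed whitespace segmentation
    category = recognize_category(words, key_word_sets)
    return category, [w for w in words if w in key_word_sets[category]]


titles = ["New Wireless Bluetooth Headphones", "Slim Cotton Pants for Men"]
with ThreadPoolExecutor() as pool:
    results = list(pool.map(match_title, titles))  # preserves input order
```

Restricting the match to one category's key word set keeps words from unrelated categories out of the result.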
[0072] Further, in the embodiment, the step of sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech comprises:
[0073] recording location information of each of the key words in the original merchandise title data; if, among the key words tagged as the modifier words, there are plural key words whose lexical scopes intersect, keeping only one of the intersecting key words; if, among the key words tagged as the modifier words, the lexical scope of one key word contains the lexical scope of another key word, keeping only the key word that has the largest lexical scope; if a key word tagged as a category word has a word sense containing the word sense of any key word tagged as a modifier word, removing that modifier key word; and defining the remaining key words as the effective key words, and stitching them into the merchandise short-title according to their locational sequence. In practical implementations, the key words tagged as the category words in the original merchandise title data are processed first.
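The sieving rules may be sketched as follows. Several interpretations here are assumptions rather than statements of the disclosure: a key word's "lexical scope" is modeled as its character span in the title, a category word is taken to "contain" a modifier when the modifier's span lies inside the category word's span, and when modifiers intersect the longer one is kept. All names and sample data are hypothetical.

```python
def locate(title, library):
    """Record (start, end, word, tag) for each library key word in the title."""
    found = []
    for word, tag in library.items():
        start = title.find(word)  # first occurrence only, for simplicity
        if start != -1:
            found.append((start, start + len(word), word, tag))
    return found


def overlaps(a, b):
    """True if two (start, end, ...) spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def contains(outer, inner):
    """True if the span of `inner` lies entirely within the span of `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def sieve_and_stitch(title, library):
    """Apply the sieving rules and stitch the survivors by location."""
    found = locate(title, library)
    # Category words are processed first and always kept in this sketch.
    categories = [k for k in found if k[3] == "category"]
    # Drop modifiers that a category word already covers.
    modifiers = [
        m for m in found
        if m[3] == "modifier" and not any(contains(c, m) for c in categories)
    ]
    # Longest modifiers first, so a containing or overlapping word wins.
    modifiers.sort(key=lambda m: (-(m[1] - m[0]), m[0]))
    kept = []
    for m in modifiers:
        if not any(overlaps(m, k) for k in kept):
            kept.append(m)
    effective = sorted(categories + kept)  # order by location in the title
    if len(effective) < 2:
        return None
    return " ".join(k[2] for k in effective)


library = {
    "noise": "modifier",
    "noise cancelling": "modifier",
    "bluetooth": "modifier",
    "bluetooth headphones": "category",
}
title = "sony noise cancelling bluetooth headphones"
short_title = sieve_and_stitch(title, library)
```

In this example "noise" is dropped because "noise cancelling" contains it, and "bluetooth" is dropped because the category word "bluetooth headphones" already covers it.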
[0074] It is understandable that, according to the word count of the merchandise short-title, modifier key words and category key words satisfying preset criteria can be found and then they can be stitched together according to their locational sequence, so as to form a fluent merchandise short-title. The described embodiment is for explaining how to generate a merchandise short-title from original merchandise title data. If there are different original merchandise title data, the foregoing process may be repeated as many times as required, thereby facilitating batch generation of merchandise short-titles.
[0075] Embodiment 2
[0076] The present embodiment provides an apparatus for generating merchandise short-titles, comprising:
[0077] a data collecting unit, for crawling merchandise title data and/or collecting search term data, so as to construct a corpus data set;
[0078] a word library unit, for categorizing corpuses in the corpus data set by merchandise categories based on a merchandise category table, and then extracting key words to construct a word library;
[0079] a word tagging unit, for tagging each key word in the word library as either a modifier word or a category word according to a part of speech of the word;
[0080] a word matching unit, for performing word segmentation on the original merchandise title data so as to obtain plural title words, matching each of the title words with the key words in the word library, respectively, and outputting the key words that have matches; and
[0081] a processing unit, for sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech.
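One possible way to picture how the five units cooperate is the following sketch, in which each unit is modeled as a plain callable held by a hypothetical `ShortTitleApparatus` class. The class, its field names, and the toy units passed in are illustrative assumptions and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ShortTitleApparatus:
    """Wires the five units together; each unit is a plain callable here."""

    collect: Callable[[], List[str]]  # data collecting unit
    build_library: Callable[[List[str]], List[str]]  # word library unit
    tag: Callable[[List[str]], Dict[str, str]]  # word tagging unit
    match: Callable[[str, Dict[str, str]], List[str]]  # word matching unit
    process: Callable[[List[str], Dict[str, str]], Optional[str]]  # processing unit

    def run(self, original_title: str) -> Optional[str]:
        corpus = self.collect()
        key_words = self.build_library(corpus)
        tagged = self.tag(key_words)
        matches = self.match(original_title, tagged)
        return self.process(matches, tagged)


# Toy units, purely for demonstration.
apparatus = ShortTitleApparatus(
    collect=lambda: ["wireless headphones", "bluetooth headphones"],
    build_library=lambda corpus: sorted({w for text in corpus for w in text.split()}),
    tag=lambda words: {
        w: ("category" if w == "headphones" else "modifier") for w in words
    },
    match=lambda title, tagged: [w for w in title.lower().split() if w in tagged],
    process=lambda matches, tagged: " ".join(matches) if len(matches) >= 2 else None,
)
result = apparatus.run("New Wireless Headphones")
```

Keeping each unit behind a simple callable interface would let any one of them be replaced, for example swapping a rule-based tagger for a trained model, without touching the others.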
[0082] As compared to the prior art, the disclosed apparatus for generating merchandise short-titles provides beneficial effects that are similar to those provided by the disclosed smart method for generating merchandise short-titles as enumerated above, and thus they are not repeated herein.
[0083] Embodiment 3
[0084] The present embodiment provides a computer-readable storage medium, in which a computer program is stored. When run by a processor, the computer program executes the steps of the method for generating merchandise short-titles as described previously.
[0085] As compared to the prior art, the disclosed computer-readable storage medium provides beneficial effects that are similar to those provided by the disclosed smart method for generating merchandise short-titles as enumerated above, and thus they are not repeated herein.
[0086] As will be appreciated by people of ordinary skill in the art, implementation of all or a part of the steps of the method of the present invention as described previously may be realized by having a program instruct related hardware components. The program may be stored in a computer-readable storage medium and, when executed, performs the individual steps of the methods described in the foregoing embodiments. The storage medium may be a ROM/RAM, a hard drive, an optical disk, a memory card or the like.
[0087] The present invention has been described with reference to the preferred embodiments and it is understood that the embodiments are not intended to limit the scope of the present invention. Moreover, as the contents disclosed herein should be readily understood and can be implemented by a person skilled in the art, all equivalent changes or modifications which do not depart from the concept of the present invention should be encompassed by the appended claims. Hence, the scope of the present invention shall only be defined by the appended claims.
Representative Drawing
A single figure which represents the drawing illustrating the invention.
Administrative Status

For a clearer understanding of the status of the application/patent presented on this page, the site Disclaimer, as well as the definitions for Patent, Administrative Status, Maintenance Fee and Payment History should be consulted.

Title | Date
Forecasted Issue Date | Unavailable
(22) Filed | 2020-08-28
(41) Open to Public Inspection | 2021-07-01
Examination Requested | 2023-10-25

Abandonment History

There is no abandonment history.

Maintenance Fee

Last Payment of $100.00 was received on 2023-12-15


Upcoming maintenance fee amounts

Description | Date | Amount
Next Payment if small entity fee | 2025-08-28 | $100.00
Next Payment if standard fee | 2025-08-28 | $277.00

Note: If the full payment has not been received on or before the date indicated, a further fee may be required, which may be one of the following:

  • the reinstatement fee;
  • the late payment fee; or
  • additional fee to reverse deemed expiry.

Patent fees are adjusted on the 1st of January every year. The amounts above are the current amounts if received by December 31 of the current year.
Please refer to the CIPO Patent Fees web page to see all current fee amounts.

Payment History

Fee Type | Anniversary Year | Due Date | Amount Paid | Paid Date
Advance an application for a patent out of its routine order | | 2023-10-25 | $526.29 | 2023-10-25
DIVISIONAL - MAINTENANCE FEE AT FILING | | 2023-11-03 | $200.00 | 2023-10-25
Filing fee for Divisional application | | 2023-11-03 | $421.02 | 2023-10-25
DIVISIONAL - REQUEST FOR EXAMINATION AT FILING | | 2024-08-28 | $816.00 | 2023-10-25
Maintenance Fee - Application - New Act | 4 | 2024-08-28 | $100.00 | 2023-12-15
Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
10353744 CANADA LTD.
Past Owners on Record
None
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents

Document Description | Date (yyyy-mm-dd) | Number of pages | Size of Image (KB)
Representative Drawing | 2023-12-12 | 1 | 80
Cover Page | 2023-12-12 | 1 | 112
Examiner Requisition | 2023-12-28 | 5 | 245
Claims | 2024-04-29 | 9 | 481
Amendment | 2024-04-29 | 28 | 1,069
New Application | 2023-10-25 | 9 | 272
Abstract | 2023-10-25 | 1 | 22
Claims | 2023-10-25 | 10 | 371
Description | 2023-10-25 | 13 | 637
Drawings | 2023-10-25 | 1 | 159
Divisional - Filing Certificate | 2023-11-06 | 2 | 212
Acknowledgement of Grant of Special Order | 2023-11-29 | 1 | 185