Patent 3144122 Summary

(12) Patent Application: (11) CA 3144122
(54) English Title: DATA VERIFYING METHOD, DEVICE AND SYSTEM
(54) French Title: METHODE, DISPOSITIF ET SYSTEME DE VERIFICATION DE DONNEES
Status: Deemed Abandoned
Bibliographic Data
(51) International Patent Classification (IPC):
  • G06F 07/00 (2006.01)
  • G06F 16/00 (2019.01)
(72) Inventors :
  • CAO, HAIYANG (China)
  • WANG, ZHENZHEN (China)
  • SUN, QIAN (China)
  • GUO, WENPING (China)
  • XU, WEI (China)
(73) Owners :
  • 10353744 CANADA LTD.
(71) Applicants :
  • 10353744 CANADA LTD. (Canada)
(74) Agent: HINTON, JAMES W.
(74) Associate agent:
(45) Issued:
(22) Filed Date: 2021-12-29
(41) Open to Public Inspection: 2022-06-30
Examination requested: 2022-09-16
Availability of licence: N/A
Dedicated to the Public: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): No

(30) Application Priority Data:
Application No. Country/Territory Date
202011625467.7 (China) 2020-12-31

Abstracts

English Abstract


Pertaining to the field of big data processing technology, the present
invention discloses a
data verifying method, and corresponding device and system. The method
comprises: creating a
data verifying task based on an offline data task, the offline data task
including extracting target
data from a source database and writing the target data in a target database;
determining an
execution order of the data verifying task relative to the offline data task;
executing the data
verifying task according to the execution order; and judging during execution
whether the target
data is abnormal according to an abnormality judging condition, if abnormality
is verified,
interrupting the data verifying task and generating verification information,
and, after receiving
data amendment information provided by a user according to the verification
information,
continuing to execute the data verifying task.


Claims

Note: Claims are shown in the official language in which they were submitted.


CLAIMS
What is claimed is:
1. A data verifying method, characterized in comprising:
creating a data verifying task based on an offline data task, wherein the
offline data task
includes extracting target data from a source database and writing the target
data in a target
database;
determining an execution order of the data verifying task relative to the
offline data task;
executing the data verifying task according to the execution order; and
judging during execution whether the target data is abnormal according to an
abnormality
judging condition, if abnormality is verified, interrupting the data verifying
task and
generating verification information, and, after receiving data amendment
information
provided by a user according to the verification information, continuing to
execute the data
verifying task.
2. The method according to Claim 1, characterized in that the step of
executing the data
verifying task according to the execution order includes:
if the data verifying task is a task being executed, extracting the target
data from the source
database and writing the target data in a temporary database, and performing
synchronous
data verification on the target data in the temporary database;
if the target data passes verification, synchronously writing in the target
database the target
data in the temporary database, and deleting the temporary database after the
target data
extracted from the source database has all passed verification and been
written in the target
database; and
if the target data does not pass verification, deleting the temporary
database.
3. The method according to Claim 1, characterized in that the step of
executing the data
verifying task according to the execution order includes:
Date Reçue/Date Received 2021-12-29

if the data verifying task is a predecessor task, executing the data verifying
task in the source
database before the target data is extracted; if the target data passes
verification, extracting it
from the source database and writing it in the target database.
4. The method according to Claim 1, characterized in that the step of
executing the data
verifying task according to the execution order includes:
if the data verifying task is a successor task, executing the data verifying
task in the target
database after the target data has been extracted from the source database and
written in the
target database.
5. The method according to any one of Claims 1 to 4, characterized in
that the step of creating a
data verifying task based on an offline data task includes:
obtaining the offline data task;
judging whether the offline data task has a corresponding data verifying rule,
if yes,
configuring the data verifying rule for the offline data task, and obtaining
resource metadata
and a verification parameter table; and
creating the data verifying task according to the data verifying rule, the
resource metadata
and the verification parameter table, wherein the data verifying rule includes
the abnormality
judging condition and the execution order.
6. The method according to Claim 5, characterized in that the step of judging
whether the offline
data task has a corresponding data verifying rule includes:
reading a verification rule table and a task ID of the offline data task,
wherein the verification
rule table contains the data verifying rule to which various task IDs
correspond; and
matching the task IDs with the verification rule table, and determining the
data verifying rule
to which the offline data task corresponds.
7. The method according to Claim 5, characterized in that the generation of
the verification
parameter table includes:

obtaining tables and/or fields contained in the source database and the target
database to
which the offline data task corresponds; and
generating a verification parameter table corresponding to the offline data
task according to
verification parameters configured by the user for the tables and/or fields.
8. The method according to Claim 7, characterized in that the step of
obtaining tables and/or
fields contained in the source database and the target database to which the
offline data task
corresponds includes: automatically obtaining and analyzing a task script of
the offline data
task, if analysis succeeds, obtaining tables and/or fields contained in the
source database and
the target database, if analysis fails, receiving tables and/or fields input
by the user.
9. A data verifying device, characterized in comprising:
a data verifying task obtaining module, for obtaining a data verifying task
based on an offline
data task, the offline data task including extracting target data from a
source database and
writing the target data in a target database;
an execution order judging module, for determining an execution order of the
data verifying
task relative to the offline data task; and
a verifying module, for executing the data verifying task according to the
execution order,
judging during execution whether the target data is abnormal according to an
abnormality
judging condition, if abnormality is verified, interrupting the data verifying
task and
generating verification information, and, after receiving data amendment
information
provided by a user according to the verification information, continuing to
execute the data
verifying task.
10. A computer system, characterized in comprising:
one or more processor(s); and
a memory, associated with the one or more processor(s), wherein the memory is
employed to
store a program instruction, and the program instruction executes the method
according to
any one of Claims 1 to 8 when it is read and executed by the one or more
processor(s).

Description

Note: Descriptions are shown in the official language in which they were submitted.


DATA VERIFYING METHOD, DEVICE AND SYSTEM
BACKGROUND OF THE INVENTION
Technical Field
[0001] The present invention relates to the field of big data processing
technology, and more
particularly to a data verifying method, and corresponding device and system.
Description of Related Art
[0002] Data warehouse storage technique (ETL, Extract-Transform-Load) is a
technique that
extracts, cleans and transforms, then loads data of business systems into a
data warehouse
for storage and administration to provide basic data for subsequent online
analytical
processing and data mining. In order to ensure the quality of the incoming
data, data
verification should be performed on the data extracted from a data source
before the data
is disposed in the data warehouse. Data verification is mainly directed to the
verification of data type and valuation range, to such bad-point data as
invalid and repetitive data, and to checks on the uniqueness, relevance,
consistency and precision of single fields and on statistic types of record
rows, etc. In the state of the art, since the quality
since the quality
appraisal criteria are different for different data, a new verifying method
should be
introduced for everyday data verification, but with increasing data volume of
incoming
data, the pressure on data verification is also increasingly greater, and it
is therefore
required to take a technical solution enabling quick data verification into
consideration.
SUMMARY OF THE INVENTION
[0003] In order to overcome the problems pending in the state of the art,
embodiments of the
present invention provide a data verifying method, and corresponding device
and system.

The technical solutions are as follows:
[0004] According to the first aspect, there is provided a data verifying
method that comprises:
[0005] creating a data verifying task based on an offline data task, the
offline data task including
extracting target data from a source database and writing the target data in a
target
database;
[0006] determining an execution order of the data verifying task relative to
the offline data task;
[0007] executing the data verifying task according to the execution order; and
[0008] judging during execution whether the target data is abnormal according
to an abnormality
judging condition, if abnormality is verified, interrupting the data verifying
task and
generating verification information, and, after receiving data amendment
information
provided by a user according to the verification information, continuing to
execute the
data verifying task.
[0009] Further, the step of executing the data verifying task according to the
execution order
includes:
[0010] if the data verifying task is a task being executed, extracting the
target data from the
source database and writing the target data in a temporary database, and
performing
synchronous data verification on the target data in the temporary database;
[0011] if the target data passes verification, synchronously writing in the
target database the
target data in the temporary database, and deleting the temporary database
after the target
data extracted from the source database has all passed verification and been
written in the
target database; and
[0012] if the target data does not pass verification, deleting the temporary
database.
[0013] Further, the step of executing the data verifying task according to the
execution order
includes:
[0014] if the data verifying task is a predecessor task, executing the data
verifying task in the
source database before the target data is extracted; if the target data passes
verification,

extracting it from the source database and writing it in the target database.
[0015] Further, the step of executing the data verifying task according to the
execution order
includes:
[0016] if the data verifying task is a successor task, executing the data
verifying task in the target
database after the target data has been extracted from the source database and
written in
the target database.
[0017] Further, the step of creating a data verifying task based on an offline
data task includes:
[0018] obtaining the offline data task;
[0019] judging whether the offline data task has a corresponding data
verifying rule, if yes,
configuring the data verifying rule for the offline data task, and obtaining
resource
metadata and a verification parameter table; and
[0020] creating the data verifying task according to the data verifying rule,
the resource metadata
and the verification parameter table, wherein the data verifying rule includes
the
abnormality judging condition and the execution order.
[0021] Further, the step of judging whether the offline data task has a
corresponding data
verifying rule includes:
[0022] reading a verification rule table and a task ID of the offline data
task, wherein the
verification rule table contains the data verifying rule to which various task
IDs
correspond; and
[0023] matching the task IDs with the verification rule table, and determining
the data verifying
rule to which the offline data task corresponds.
[0024] Further, generation of the verification parameter table includes:
[0025] obtaining tables and/or fields contained in the source database and the
target database to
which the offline data task corresponds; and
[0026] generating a verification parameter table corresponding to the offline
data task according

to verification parameters configured by the user for the tables and/or
fields.
[0027] Further, the step of obtaining tables and/or fields contained in the
source database and the
target database to which the offline data task corresponds includes:
[0028] automatically obtaining and analyzing a task script of the offline data
task, if analysis
succeeds, obtaining tables and/or fields contained in the source database and
the target
database, if analysis fails, receiving tables and/or fields input by the user.
[0029] According to the second aspect, there is provided a data verifying
device that comprises:
[0030] a data verifying task obtaining module, for obtaining a data verifying
task based on an
offline data task, the offline data task including extracting target data from
a source
database and writing the target data in a target database;
[0031] an execution order judging module, for determining an execution order
of the data
verifying task relative to the offline data task; and
[0032] a verifying module, for executing the data verifying task according to
the execution order,
judging during execution whether the target data is abnormal according to an
abnormality
judging condition, if abnormality is verified, interrupting the data verifying
task and
generating verification information, and, after receiving data amendment
information
provided by a user according to the verification information, continuing to
execute the
data verifying task.
[0033] Further, the verifying module is specifically employed for:
[0034] if it is judged that the data verifying task is a task being executed,
extracting the target
data from the source database and writing the target data in a temporary
database, and
performing synchronous data verification on the target data in the temporary
database;
[0035] if the target data passes verification, synchronously writing in the
target database the
target data in the temporary database, and deleting the temporary database
after the target
data extracted from the source database has all passed verification and been
written in the
target database; and

[0036] if the target data does not pass verification, deleting the temporary
database.
[0037] Further, the verifying module is specifically employed for:
[0038] if it is judged that the data verifying task is a predecessor task,
executing the data verifying
task in the source database before the target data is extracted;
[0039] if the target data passes verification, extracting it from the source
database and writing it
in the target database.
[0040] Further, the verifying module is specifically employed for:
[0041] If it is judged that the data verifying task is a successor task,
executing the data verifying
task in the target database after the target data has been extracted from the
source database
and written in the target database.
[0042] Further, the data verifying task obtaining module includes:
[0043] an offline data task obtaining module, for obtaining the offline data
task;
[0044] a data verifying task creating module, for judging whether the offline
data task has a
corresponding data verifying rule, if yes, configuring the data verifying rule
for the offline
data task, and obtaining resource metadata and a verification parameter table;
and creating
the data verifying task according to the data verifying rule, the resource
metadata and the
verification parameter table, wherein the data verifying rule includes the
abnormality
judging condition and the execution order.
[0045] Further, the data verifying task creating module includes:
[0046] a verification rule table determining module for:
[0047] reading a verification rule table and a task ID of the offline data
task, wherein the
verification rule table contains the data verifying rule to which various task
IDs
correspond; and
[0048] matching the task IDs with the verification rule table, and determining
the data verifying
rule to which the offline data task corresponds.

[0049] Further, the data verifying task creating module further includes:
[0050] a verification parameter table determining module for:
[0051] obtaining tables and fields contained in the source database and the
target database to
which the offline data task corresponds; and
[0052] generating a verification parameter table corresponding to the offline
data task according
to verification parameters configured by the user for tables and/or fields.
[0053] Further, the data verifying task creating module further includes:
[0054] an analyzing module, for automatically obtaining and analyzing a task
script of the offline
data task, if analysis succeeds, obtaining tables and/or fields contained in
the source
database and the target database, if analysis fails, receiving tables and/or
fields input by
the user.
[0055] According to the third aspect, there is provided a computer system that
comprises:
[0056] one or more processor(s); and
[0057] a memory, associated with the one or more processor(s), wherein the
memory is employed
to store a program instruction, and the program instruction executes the
method according
to any one of the aforementioned first aspect when it is read and executed by
the one or
more processor(s).
[0058] The technical solutions provided by the embodiments of the present
invention bring about
the following advantageous effects:
[0059] The technical solutions disclosed by the present invention provide
several possibilities
for the execution order of data verification relative to the offline data
task, realize setup
of the execution order through preconfiguration of the user or based on
default
configuration generated by automatically obtaining and analyzing scripts, and
make it
possible to automatically judge the execution order based on the offline data
task,
whereby flexibility of data verifying operation is enhanced, and verification
efficiency is

enhanced.
[0060] The technical solutions disclosed by the present invention supply a
technical solution in
which the verifying task is interrupted to generate verification information
when
abnormality occurs to the data, the data verifying task is continued to be
executed after
the user has made amendment, upstream and downstream tasks can be called
during the
continued execution, and it is not required to notify or operate downstream
tasks on a
one-by-one basis, and it is also not required to perform sorting, appraising
and checking
operations back and forth, whereby production accidents of data verification
are better
avoided.
[0061] The technical solutions disclosed by the present invention also contain
a task being
executed, that is to say, the data verifying task is executed at the same time
when the
offline data task is being executed, and pressure on data verification is
reduced with a
temporary database serving as buffer.
[0062] The technical solutions disclosed by the present invention economize on
production
machine resources, and prevent abnormal data tasks from being executed to
waste spaces
of CPU/memory and magnetic disk, whereby machine cost is further reduced.
BRIEF DESCRIPTION OF THE DRAWINGS
[0063] To more clearly describe the technical solutions in the embodiments of
the present
invention, drawings required to illustrate the embodiments will be briefly
introduced
below. Apparently, the drawings introduced below are merely directed to some
embodiments of the present invention, while persons ordinarily skilled in the
art may
further acquire other drawings on the basis of these drawings without spending
creative
effort in the process.

[0064] Fig. 1 is a flowchart illustrating a data verifying method provided by
an embodiment of
the present invention;
[0065] Fig. 2 is a view schematically illustrating the structure of a data
verifying device provided
by an embodiment of the present invention; and
[0066] Fig. 3 is a view schematically illustrating the structure of a computer
system provided by
an embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0067] To make more lucid and clear the objectives, technical solutions and
advantages of the
present invention, the technical solutions in the embodiments of the present
invention will
be clearly and comprehensively described below with reference to the
accompanying
drawings in the embodiments of the present invention. Apparently, the
embodiments as
described are merely partial, rather than the entire, embodiments of the
present invention.
Any other embodiments makeable by persons ordinarily skilled in the art on the
basis of
the embodiments in the present invention without creative effort shall all
fall within the
protection scope of the present invention.
[0068] As noted in the Description of Related Art, in the data warehouse
storage technique, data
should be verified in the process of writing the data from the source database
in the target
database, so as to ensure validity of the incoming data. A main objective of
the technical
solutions disclosed by the present invention is to propose a data verifying
method capable
of enhancing flexibility and execution efficiency of data verification, with
technical
solutions specified as follows:
[0069] S1 - creating a data verifying task based on an offline data task, the
offline data task
including extracting target data from a source database and writing the target
data in a

target database.
[0070] The offline data task mainly indicates an offline data task in the data
warehouse storage
technique (ETL), and the task not only includes extracting data from a source
database
and writing the data in a target database, but can also include analyzing and
processing
the data. Specifically, the offline data task can be Sqoop, Datax, Spark,
PySpark, SparkSql, Hive, and MR, etc. Accordingly, the above step S1 also
includes judging
whether the
offline task is of a preset offline data task type.
[0071] Taking the SparkSql task for example, Job Schedule Service receives the
task from a
T_WAIT_FOR_TAKE table, and judges whether the task type is the SparkSql task,
specifically, each offline data task type has a task ID, so it is possible to
judge the task
type according to the task ID.
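By way of illustration only, the judgment of the task type according to the task ID might be sketched as follows; the prefix mapping is a hypothetical assumption, as the patent states only that each offline data task type has a task ID.

```python
from typing import Optional

# Hypothetical mapping from task-ID prefix to offline data task type;
# the actual ID scheme is not specified in the patent.
TASK_TYPE_PREFIXES = {
    "SQOOP": "Sqoop",
    "SPARKSQL": "SparkSql",
    "HIVE": "Hive",
}

def judge_task_type(task_id: str) -> Optional[str]:
    """Judge the offline data task type according to the task ID."""
    for prefix, task_type in TASK_TYPE_PREFIXES.items():
        if task_id.upper().startswith(prefix):
            return task_type
    return None  # not one of the preset offline data task types
```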
[0072] In one embodiment, step S1 includes:
[0073] S11 - obtaining the offline data task;
[0074] S12 - judging whether the offline data task has a corresponding data
verifying rule, if yes,
configuring the data verifying rule for the offline data task, and obtaining
resource
metadata and a verification parameter table; and creating the data verifying
task according
to the data verifying rule, the resource metadata and the verification
parameter table,
wherein the data verifying rule includes the abnormality judging condition and
the
execution order.
[0075] The data verifying rule, the resource metadata and the verification
parameter table are all
preconfigured. The abnormality judging condition is a condition to judge
whether data is
valid, and can specifically judge: whether the data is void; the valuation
range of the data;
and the enumeration range of data valuation, etc. The execution order is an
execution
order of the data verifying rule relative to the offline data task, and
includes a predecessor
task, a task being executed, and a successor task, of which the predecessor
task means

that the data verifying task is executed before the data is extracted, the
task being executed
means that the data verifying task is executed during the process of
extracting and
importing the data, and the successor task means that the data verifying task
is executed
after the data has been imported. The resource metadata file is data resource
required to
execute the data verifying task, and specifically is task configuration or
dependency jar
resource, and the verification parameter table contains task parameters such
as time
parameters and frequencies, etc., that are required to execute the data
verifying task.
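The abnormality judging conditions enumerated above (whether the data is void, the valuation range, and the enumeration range) could be expressed as a single predicate; the following is a minimal sketch for illustration, not the patent's implementation.

```python
def is_abnormal(value, not_null=False, value_range=None, enum_values=None):
    """Apply the abnormality judging conditions described above:
    void check, valuation range, and enumeration range of data valuation."""
    if value is None:
        return not_null  # abnormal only when a void value is not permitted
    if value_range is not None:
        lo, hi = value_range
        if not (lo <= value <= hi):
            return True  # outside the valuation range
    if enum_values is not None and value not in enum_values:
        return True      # outside the enumeration range
    return False
```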
[0076] Likewise taking the SparkSql task for example, the data verifying rule
is stored in a
T_JOB_DATA_QUALITY_RULE table. The verification parameter table is specifically
a T_JOB_PRAMAS table. The resource metadata is read by reading the configuration
value whose key is dataquality.file.id in a T_SYSTEM_CONFIG table as an id to
enquire a T_FILE_RESOURCE table, and the resource metadata as read is added to
the T_FILE_RESOURCE list to wait for being downloaded. The specific constructing
process is as follows:
[0077] writing an environment variable whose key is DATE_QUALITY_JARNAME in
resource
name of the resource metadata;
[0078] writing in the data verifying rule according to T_JOB_DATA_QUALITY_RULE;
[0079] reading T_JOB_PRAMAS table to obtain verification parameters;
[0080] reading T_FILE_RESOURCE table to obtain the resource metadata, waiting
for the
resource metadata to be downloaded to completion, completing creation of the
data
verifying task.
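The constructing process of steps [0077] to [0080] might be assembled as sketched below; the reader callables stand in for queries against T_JOB_DATA_QUALITY_RULE, T_JOB_PRAMAS and T_FILE_RESOURCE, and their exact shapes, like the jar file name, are assumptions for illustration.

```python
def create_verifying_task(task_id, read_rule, read_params, read_resource):
    """Assemble a data verifying task from the construction steps above."""
    return {
        # write the environment variable DATE_QUALITY_JARNAME into the
        # resource name of the resource metadata (jar name is hypothetical)
        "env": {"DATE_QUALITY_JARNAME": "data-quality.jar"},
        "rule": read_rule(task_id),          # from T_JOB_DATA_QUALITY_RULE
        "params": read_params(task_id),      # from T_JOB_PRAMAS
        "resource": read_resource(task_id),  # from T_FILE_RESOURCE
    }
```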
[0081] In one embodiment, step S12 of judging whether the offline data task has
a corresponding
data verifying rule includes:
[0082] reading a verification rule table and a task ID of the offline data
task, wherein the
verification rule table contains the data verifying rule to which various task
IDs
correspond; and
[0083] matching the task IDs with the verification rule table, and determining
the data verifying

rule to which the offline data task corresponds.
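The matching of task IDs against the verification rule table reduces to a lookup, sketched here with the rule table represented as a plain mapping; the actual table storage is not constrained by the patent.

```python
def match_verifying_rule(verification_rule_table, task_id):
    """Match the task ID against the verification rule table and return
    the data verifying rule to which the offline data task corresponds,
    or None when the task has no corresponding rule."""
    return verification_rule_table.get(task_id)
```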
[0084] In one embodiment, generation of the verification parameter table
includes:
[0085] obtaining tables and/or fields contained in the source database and the
target database to
which the offline data task corresponds; and
[0086] generating a verification parameter table corresponding to the offline
data task according
to verification parameters configured by the user for the tables and/or
fields.
[0087] Specifically, taking the SparkSql task for example, verification
parameters are configured
through the T_JOB_PRAMAS table.
[0088] Preferably, the data verifying rule to which the offline data task
corresponds is firstly
matched from the verification rule table through the task ID of the offline
data task,
including predecessor, being executed, and successor, and verification rules
are thereafter
configured according to the execution order.
[0089] In one embodiment, the step of obtaining tables and/or fields contained
in the source
database and the target database to which the offline data task corresponds
includes:
[0090] automatically obtaining and analyzing a task script of the offline data
task, if analysis
succeeds, obtaining tables and/or fields contained in the source database and
the target
database, if analysis fails, receiving tables and/or fields input by the user.
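The analyze-or-ask flow above can be sketched as follows; the regular expressions are a crude stand-in for a real SQL lineage analyzer and only illustrate the success/failure branching.

```python
import re

def obtain_tables(task_script, receive_from_user):
    """Automatically analyze the task script for source and target tables;
    if analysis fails, receive the tables input by the user."""
    sources = re.findall(r"\bfrom\s+(\w+)", task_script, re.IGNORECASE)
    targets = re.findall(r"\binsert\s+into\s+(\w+)", task_script, re.IGNORECASE)
    if sources and targets:
        return sources, targets   # analysis succeeded
    return receive_from_user()    # analysis failed: fall back to user input
```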
[0091] Specifically, the user develops such offline data tasks as Sqoop,
Datax, SparkSql, Hive
and MR at the foreground page, and reading/writing script information is
verified in real
time at the foreground. With respect to such jar-type tasks as SparkSql and so
on, the
script is captured via external script parameters. Information of tables and
fields of the
source database and the target database contained in the offline data task is
analyzed via
an automatic and real-time SQL script lineage analyzing module. Based on the
automatic
analysis, data verifying rules and verification parameters of tables or fields
of interest to

the user are configured.
[0092] S2 - determining an execution order of the data verifying task relative
to the offline data
task.
[0093] S3 - executing the data verifying task according to the execution
order.
[0094] As previously mentioned, the execution order of the data verifying task
relative to the
offline data task includes: predecessor, being executed, and successor.
Accordingly,
executing the data verifying task according to the execution order can
specifically include
one or more circumstances in the following embodiment:
[0095] In one embodiment, step S3 of executing the data verifying task
according to the
execution order includes:
[0096] if the data verifying task is a task being executed, extracting the
target data from the
source database and writing the target data in a temporary database, and
performing
synchronous data verification on the target data in the temporary database;
[0097] if the target data passes verification, synchronously writing in the
target database the
target data in the temporary database, and deleting the temporary database
after the target
data extracted from the source database has all been written in the target
database; and
[0098] if the target data does not pass verification, deleting the temporary
database.
[0099] The temporary database includes a temporary datasheet, and the specific
writing-in is in
the temporary datasheet.
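The "task being executed" flow above, with the temporary datasheet serving as buffer, might be sketched as follows; the in-memory lists are an illustrative assumption standing in for real databases.

```python
def verify_while_executing(source_rows, verify, target_db):
    """Stage target data in a temporary datasheet, verify it there
    synchronously, write it to the target database on success, and
    delete the temporary datasheet in either case."""
    temporary = list(source_rows)            # extract into temporary datasheet
    if all(verify(row) for row in temporary):
        target_db.extend(temporary)          # synchronously write to target
        del temporary[:]                     # delete temp after all rows written
        return True
    del temporary[:]                         # verification failed: delete temp
    return False
```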
[0100] In one embodiment, step S3 of executing the data verifying task
according to the
execution order includes:
[0101] if the data verifying task is a predecessor task, executing the data
verifying task in the
source database before the target data is extracted;

[0102] if the target data passes verification, writing the target data
extracted from the source
database in the target database.
[0103] In one embodiment, step S3 of executing the data verifying task
according to the
execution order includes:
[0104] if the data verifying task is a successor task, executing the data
verifying task in the target
database after the target data has been extracted from the source database and
written in
the target database.
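The predecessor and successor orders described above can be contrasted in one dispatch sketch; the callables are hypothetical stand-ins for source-side verification, extraction, target-side verification and loading, and the "being executed" order, which stages data in a temporary database, is omitted here for brevity.

```python
def execute_with_order(order, extract, verify_source, verify_target, load):
    """Execute the data verifying task according to its execution order."""
    if order == "predecessor":
        if not verify_source():
            return False       # target data failed verification at the source
        load(extract())        # extract and write only after verification
        return True
    if order == "successor":
        load(extract())        # write first, then verify in the target database
        return verify_target()
    raise ValueError("unknown execution order: %s" % order)
```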
[0105] Of the aforementioned three execution orders, preferably, the predecessor and
successor execution orders are the default execution orders of the data verifying task,
whereas the being-executed execution order must be enabled by the user before it can be
configured in the data verifying task.
[0106] During specific execution, step S2 can sequentially judge whether the data
verifying task is a predecessor task, a successor task, or a being-executed task.
Taking the data verifying task in the SparkSql task as an example:
[0107] judging whether there is a PER CHECK variable (predecessor) in the
environment
variable in the data verifying task;
[0108] if there is a PER CHECK variable, submitting the predecessor data
verifying task, and
continuing to execute the SparkSql task when the predecessor task is
successfully
executed;
[0109] if there is no PER CHECK variable, directly executing the SparkSql
task, and judging
whether there is a POST CHECK variable (successor) in the environment variable
after
the SparkSql task has been executed;
[0110] if yes, submitting the successor data verifying task, and executing the
data verifying task;
if not, completing task execution.
[0111] judging whether there is a RUNNING CHECK variable (being executed) in
the
environment variable in the data verifying task;
[0112] if yes, upgrading the task to the Spark execution engine so as to execute it as a
Spark task, importing the data from the source database into the temporary database,
executing the data verifying task in the temporary database, and writing the data in the
target database after verification has succeeded;
[0113] if not, directly executing the SparkSql task.
[0114] In another circumstance, step S2 sequentially judges whether the data verifying
task is a predecessor task, a being-executed task, or a successor task. Taking the data
verifying task in the SparkSql task as an example:
[0115] judging whether there is a PER CHECK variable (predecessor) in the
environment
variable in the data verifying task;
[0116] if there is a PER CHECK variable, submitting the predecessor data
verifying task, and
continuing to execute the SparkSql task when the predecessor task is
successfully
executed;
[0117] if there is no PER CHECK variable, directly executing the SparkSql
task, and judging
whether there is a RUNNING CHECK variable (being executed) in the environment
variable after the SparkSql task has been executed;
[0118] if yes, upgrading the task to the Spark execution engine so as to execute it as a
Spark task, importing the data from the source database into the temporary database,
executing the data verifying task in the temporary database, and writing the data in the
target database after verification has succeeded;
[0119] if not, judging whether there is a POST CHECK variable (successor) in
the environment
variable;
[0120] if yes, submitting the successor data verifying task, and executing the
data verifying task;
if not, completing task execution.
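The second judging sequence (predecessor, being executed, successor) can be sketched as below. The environment-variable names follow the description above; the `env` mapping and the two callables are illustrative assumptions, and the being-executed branch is collapsed into a single call rather than spelling out the Spark engine upgrade.

```python
def run_sparksql_with_checks(env, run_check, run_sparksql):
    """Judge the execution order of the data verifying task from
    environment variables, in the order predecessor -> being executed ->
    successor, around a SparkSql task. Callables are illustrative."""
    if "PER CHECK" in env:                  # predecessor verifying task configured?
        if not run_check("predecessor"):
            return False                    # continue only on successful verification
        run_sparksql()
        return True
    run_sparksql()                          # no predecessor: execute SparkSql directly
    if "RUNNING CHECK" in env:              # being-executed verification
        return run_check("being executed")  # stage, verify, then write target
    if "POST CHECK" in env:                 # successor verification
        return run_check("successor")
    return True                             # no checks configured: task complete
```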
[0121] S4 - judging during execution whether the target data is abnormal according to an
abnormality judging condition, if an abnormality is detected, interrupting the data verifying
task and generating verification information, and, after receiving data amendment
information provided by a user according to the verification information,
continuing to
execute the data verifying task.
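Step S4's interrupt-and-resume loop can be illustrated with the minimal sketch below, in which the verification information is reduced to a small report and the user's amendment is modeled as a callback; both are assumptions for illustration only.

```python
def verify_with_interrupt(rows, is_abnormal, request_amendment):
    """Scan the target data against an abnormality judging condition.
    On an abnormal row, interrupt, emit verification information, and
    resume only after the user's amendment. Names are illustrative."""
    i = 0
    while i < len(rows):
        if is_abnormal(rows[i]):
            report = {"index": i, "value": rows[i]}  # verification information
            rows[i] = request_amendment(report)      # user amends the abnormal data
            continue                                 # re-verify the amended row
        i += 1
    return rows
```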
[0122] The verification information mainly indicates a quality report. Once the data has
been judged abnormal, the data verifying task disclosed by the embodiments of the present
invention can be interrupted, and it is resumed after the user has processed the abnormal
data; upstream and downstream tasks are automatically called during the continued
execution, so it is not required to notify or operate downstream tasks on a one-by-one
basis. After the data verifying task has been executed to completion, the user is notified
to check the result in a timely manner. By checking the analyzed data quality in the
quality report, starting an offline quality-report analyzing module, and automatically
collecting the quality reports stored on hdfs, it is possible to analyze abnormality
details of specific tables and fields for the user along the user dimension, to categorize
and summarize abnormality index types by keywords, to issue reports on common data-quality
abnormalities, and to feed these back to the data user. Taking the data verifying task in
the SparkSql task for example, after the data verifying task has been executed to
completion, the SparkSql offline task execution engine stores the quality report via the
hdfs-api interface in a path specified on hdfs, for analysis by the SparkSql offline task.
On the basis of the quality report, the user can precisely locate any problematic data;
after the problematic data has been amended, tasks are pulled up by resetting the data
verifying tasks of the predecessor, being-executed, and successor quality rules, ensuring
normal execution of downstream tasks and thereby providing a reliable guarantee for
precise administration of data.
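The shape of the quality-report storage step can be sketched as follows. The patent stores the report via the hdfs-api interface under an hdfs path; since those details are not given, this sketch writes JSON to a local path as a stand-in, and the path layout and report fields are invented for illustration.

```python
import json
import os

def store_quality_report(report, base_dir, task_id):
    """Persist a quality report for later offline analysis. A local
    stand-in for the hdfs-api write described above; the directory
    layout and report structure are hypothetical."""
    path = os.path.join(base_dir, "quality_reports", "%s.json" % task_id)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(report, fh)           # report is later collected and analyzed
    return path
```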
[0123] As shown in Fig. 2, based on the aforementioned data verifying method,
an embodiment
of the present invention further provides a data verifying device that
comprises the
following modules.
[0124] A data verifying task obtaining module 201 is employed for obtaining a
data verifying
task based on an offline data task, the offline data task including extracting
target data
from a source database and writing the target data in a target database.
[0125] The offline data task mainly indicates an offline data task in the data warehousing
(ETL) process; such a task not only includes extracting data from a source database and
writing the data in a target database, but can also include analyzing and processing the
data. Specifically, the offline data task can be a Sqoop, Datax, Spark, PySpark, SparkSql,
Hive, or MR task, etc. Accordingly, the data verifying task obtaining module is further
employed for judging whether the offline task is of a preset offline data task type,
preferably by judging the type of the offline data task according to the task ID.
[0126] In one embodiment, the data verifying task obtaining module 201
includes:
[0127] an offline data task obtaining module, for obtaining the offline data
task; and
[0128] a data verifying task creating module, for judging whether the offline
data task has a
corresponding data verifying rule, if yes, configuring the data verifying rule
for the offline
data task, and obtaining resource metadata and a verification parameter table;
and creating
the data verifying task according to the data verifying rule, the resource
metadata and the
verification parameter table, wherein the data verifying rule includes the
abnormality
judging condition and the execution order.
[0129] The data verifying rule, the resource metadata and the verification parameter table
are all preconfigured. The abnormality judging condition is a condition used to judge
whether data is valid; specifically it can judge: whether the data is null; the value
range of the data; and the enumerated range of permissible data values, etc. The execution
order is the execution order of the data verifying rule relative to the offline data task,
and includes a predecessor task, a task being executed, and a successor task. The resource
metadata is the data resource required to execute the data verifying task, and the
verification parameter table contains the task parameters that are required to execute the
data verifying task.
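The three abnormality judging conditions named above (null check, value range, enumerated values) can be composed into a single predicate, as in this sketch; the parameter names are illustrative, not taken from the patent.

```python
def make_abnormality_condition(nullable=False, value_range=None, allowed=None):
    """Build an abnormality judging condition from the checks described
    above: whether the data is null, the value range of the data, and
    the enumerated range of permissible values."""
    def is_abnormal(value):
        if value is None:
            return not nullable                     # null data is abnormal unless allowed
        if value_range is not None:
            lo, hi = value_range
            if not (lo <= value <= hi):             # value out of the configured range
                return True
        if allowed is not None and value not in allowed:
            return True                             # outside the enumerated values
        return False
    return is_abnormal
```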
[0130] In one embodiment, the data verifying task creating module includes:
[0131] a verification rule table determining module for:
[0132] reading a verification rule table and a task ID of the offline data
task, wherein the
verification rule table contains verifying rules to which various task IDs
correspond; and
[0133] matching the task IDs with the verification rule table, and determining
the data verifying
rule to which the offline data task corresponds.
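The matching step above amounts to a lookup of the task ID in the verification rule table. A minimal sketch, with an entirely made-up table structure and made-up task IDs:

```python
VERIFICATION_RULE_TABLE = {
    # task ID -> data verifying rule (execution order + abnormality conditions);
    # contents are hypothetical examples
    "etl_orders_daily": {"order": "predecessor", "not_null": ["order_id"]},
    "etl_users_daily":  {"order": "successor",   "not_null": ["user_id"]},
}

def determine_verifying_rule(task_id, rule_table=VERIFICATION_RULE_TABLE):
    """Read the offline data task's ID and match it against the
    verification rule table, returning the corresponding data verifying
    rule, or None when no rule is configured for the task."""
    return rule_table.get(task_id)
```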
[0134] In one embodiment, the data verifying task creating module further
includes:
[0135] a verification parameter table determining module for:
[0136] obtaining tables and/or fields contained in the source database and the
target database to
which the offline data task corresponds; and
[0137] generating a verification parameter table corresponding to the offline
data task according
to verification parameters configured by the user for tables and/or fields.
[0138] In one embodiment, the data verifying task creating module further
includes:
[0139] an analyzing module, for automatically obtaining and analyzing a task
script of the offline
data task, if analysis succeeds, obtaining tables and/or fields contained in
the source
database and the target database, if analysis fails, receiving tables and/or
fields input by
the user.
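The analyzing module's behavior (parse the task script for tables, fall back to user input when analysis fails) can be sketched as follows; the regular expression is a deliberate simplification of real SQL parsing, and the fallback callback is an assumption.

```python
import re

def extract_tables(task_script, ask_user=None):
    """Try to pull table names out of an offline task's SQL script; if
    analysis fails, fall back to tables supplied by the user. The regex
    is a simplification for illustration only."""
    tables = re.findall(r"(?:FROM|INSERT\s+INTO)\s+([\w.]+)",
                        task_script, flags=re.IGNORECASE)
    if tables:
        return sorted(set(tables))       # analysis succeeded
    if ask_user is not None:
        return ask_user()                # analysis failed: receive user input
    return []
```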
[0140] An execution order judging module 202 is employed for determining an
execution order
of the data verifying task relative to the offline data task.
[0141] The execution order judging module judges the execution order of the
data verifying task
relative to the offline data task mainly through an environment variable in
the data
verifying task.
[0142] The execution order of the data verifying task relative to the offline
data task includes:
predecessor, being executed, and successor. During the process of executing
the data
verifying task with the aforementioned three different execution orders,
preferably, the
predecessor and successor execution orders are default execution orders of the
data
verifying task, and the task being executed execution order requires the
opening by the
user before it is configured in the data verifying task.
[0143] During specific execution, it is possible to sequentially judge whether
the data verifying
task is a predecessor task, a successor task, or a task being executed, or to
sequentially
judge whether the data verifying task is a predecessor task, a task being
executed, or a
successor task.
[0144] A verifying module 203 is employed for executing the data verifying
task according to
the execution order, judging during execution whether the target data is
abnormal
according to an abnormality judging condition, if abnormality is verified,
interrupting the
data verifying task and generating verification information, and, after
receiving data
amendment information provided by a user according to the verification
information,
continuing to execute the data verifying task.
[0145] In one embodiment, the verifying module 203 is specifically employed
for:
[0146] if it is judged that the data verifying task is a task being executed,
extracting the target
data from the source database and writing the target data in a temporary
database, and
performing synchronous data verification on the target data in the temporary
database;
[0147] if the target data passes verification, synchronously writing in the
target database the
target data in the temporary database, and deleting the temporary database
after the target
data extracted from the source database has all been written in the target
database; and
[0148] if the target data does not pass verification, deleting the temporary
database.
[0149] In one embodiment, the verifying module 203 is specifically employed
for:
[0150] if it is judged that the data verifying task is a predecessor task,
executing the data verifying
task in the source database before the target data is extracted;
[0151] if the target data passes verification, writing the data extracted from
the source database
in the target database.
[0152] In one embodiment, the verifying module 203 is specifically employed
for:
[0153] if it is judged that the data verifying task is a successor task,
executing the data verifying
task in the target database after the target data has been extracted from the
source database
and written in the target database.
[0154] Based on the aforementioned data verifying method, the present
invention further
provides a computer system that comprises:
[0155] one or more processor(s); and
[0156] a memory, associated with the one or more processor(s), wherein the
memory is employed
to store a program instruction, and the program instruction executes the
aforementioned
data verifying method when it is read and executed by the one or more
processor(s).
[0157] Fig. 3 exemplarily illustrates the framework of the computer system
that can specifically
include a processor 310, a video display adapter 311, a magnetic disk driver
312, an
input/output interface 313, a network interface 314, and a memory 320. The
processor
310, the video display adapter 311, the magnetic disk driver 312, the
input/output
interface 313, the network interface 314, and the memory 320 can be
communicably
connected with one another via a communication bus 330.
[0158] The processor 310 can be embodied as a general CPU (Central Processing
Unit), a
microprocessor, an ASIC (Application Specific Integrated Circuit), or one or
more
integrated circuit(s) for executing relevant program(s) to realize the
technical solutions
provided by the present application.
[0159] The memory 320 can be embodied in such a form as an ROM (Read Only
Memory), an
RAM (Random Access Memory), a static storage device, or a dynamic storage
device.
The memory 320 can store an operating system 321 for controlling the running
of an
electronic equipment 300, and a basic input/output system 322 (BIOS) for
controlling
lower-level operations of the electronic equipment 300. In addition, the
memory 320 can
also store a web browser 323, a data storage administration system 324, and an
equipment
identification information processing system 325, etc. The equipment
identification
information processing system 325 can be an application program that
specifically
realizes the aforementioned various step operations in the embodiments of the
present
application. To sum it up, when the technical solutions provided by the
present application
are to be realized via software or firmware, the relevant program codes are
stored in the
memory 320, and invoked and executed by the processor 310.
[0160] The input/output interface 313 is employed to connect with an
input/output module to
realize input and output of information. The input/output module can be
equipped in the
device as a component part (not shown in the drawings), and can also be
externally
connected with the device to provide corresponding functions. The input means
can
include a keyboard, a mouse, a touch screen, a microphone, and various sensors
etc., and
the output means can include a display screen, a loudspeaker, a vibrator, an
indicator light
etc.
[0161] The network interface 314 is employed to connect to a communication
module (not
shown in the drawings) to realize intercommunication between the current
device and
other devices. The communication module can realize communication in a wired
mode
(via USB, network cable, for example) or in a wireless mode (via mobile
network, WIFI,
Bluetooth, etc.).
[0162] The bus 330 includes a passageway transmitting information between
various component
parts of the device (such as the processor 310, the video display adapter 311,
the magnetic
disk driver 312, the input/output interface 313, the network interface 314,
and the memory
320).
[0163] Additionally, the electronic equipment 300 may further obtain
information of specific
collection conditions from a virtual resource object collection condition
information
database for judgment on conditions, and so on.
[0164] As should be noted, although merely the processor 310, the video
display adapter 311,
the magnetic disk driver 312, the input/output interface 313, the network
interface 314,
the memory 320, and the bus 330 are illustrated for the aforementioned device,
the device
may further include other component parts prerequisite for realizing normal
running
during specific implementation. In addition, as can be understood by persons
skilled in
the art, the aforementioned device may as well only include component parts
necessary
for realizing the solutions of the present application, without including the
entire
component parts as illustrated.
[0165] As can be seen from the description of the aforementioned embodiments, persons
skilled in the art will clearly understand that the present application can be realized
through software plus a general hardware platform. Based on such understanding, the technical
technical
solutions of the present application, or the contributions made thereby over
the state of
the art, can be essentially embodied in the form of a software product, and
such a
computer software product can be stored in a storage medium, such as an
ROM/RAM, a
magnetic disk, an optical disk etc., and includes plural instructions enabling
a computer
equipment (such as a personal computer, a server, or a network device etc.) to
execute the
methods described in various embodiments or some sections of the embodiments
of the
present application.
[0166] The various embodiments are progressively described in the Description,
identical or
similar sections among the various embodiments can be inferred from one
another, and
each embodiment stresses what is different from other embodiments.
Particularly, with
respect to the system or system embodiment, since it is essentially similar to
the method
embodiment, its description is relatively simple, and the relevant sections
thereof can be
inferred from the corresponding sections of the method embodiment. The system or system
embodiment described above is merely exemplary in nature: units described as separate
parts may or may not be physically separate, and parts displayed as units may or may not
be physical units; that is to say, they can be located in a single site, or distributed
over a plurality of network units. Some or all of the modules can be selected according to
practical requirements to realize the objectives of the embodied solutions, which persons
ordinarily skilled in the art can understand and implement without creative effort.
[0167] The technical solutions provided by the embodiments of the present
invention bring about
the following advantageous effects:
[0168] The technical solutions disclosed by the present invention provide several
possibilities for the execution order of data verification relative to the offline data
task, realize setup of the execution order through preconfiguration by the user or through
default configuration generated by automatically obtaining and analyzing scripts, and make
it possible to automatically judge the execution order based on the offline data task,
whereby both the flexibility and the efficiency of data verification are enhanced.
[0169] The technical solutions disclosed by the present invention supply a technical
solution in which the verifying task is interrupted to generate verification information
when an abnormality occurs in the data, and the data verifying task is resumed after the
user has made the amendment; upstream and downstream tasks can be called during the
continued execution, so it is not required to notify or operate downstream tasks on a
one-by-one basis, nor to perform sorting, appraising and checking operations back and
forth, whereby production accidents of data verification are better avoided.
[0170] The technical solutions disclosed by the present invention also provide a
being-executed mode, that is to say, the data verifying task is executed at the same time
as the offline data task, and pressure on data verification is reduced with a temporary
database serving as buffer.
[0171] The technical solutions disclosed by the present invention economize on production
machine resources, and prevent abnormal data tasks from being executed and wasting CPU,
memory and magnetic disk space, whereby machine cost is further reduced.
[0172] All of the aforementioned optional technical solutions can be combined in any
manner to form optional embodiments of the present invention, and will not be enumerated
repetitively here.
[0173] What is described above is merely directed to preferred embodiments of the present
invention, and is not meant to restrict the present invention. Any amendment, equivalent
replacement and improvement made within the spirit and principle of the present invention
shall all fall within the protection scope of the present invention.
Representative Drawing
A single figure which represents the drawing illustrating the invention.
Administrative Status

2024-08-01:As part of the Next Generation Patents (NGP) transition, the Canadian Patents Database (CPD) now contains a more detailed Event History, which replicates the Event Log of our new back-office solution.

Please note that "Inactive:" events refers to events no longer in use in our new back-office solution.

For a clearer understanding of the status of the application/patent presented on this page, the site Disclaimer, as well as the definitions for Patent, Event History, Maintenance Fee and Payment History, should be consulted.

Event History

Description Date
Deemed Abandoned - Failure to Respond to an Examiner's Requisition 2024-08-29
Examiner's Report 2024-03-04
Inactive: Report - QC failed - Minor 2024-01-09
Letter Sent 2023-02-07
Inactive: Correspondence - PAPS 2022-12-23
Request for Examination Received 2022-09-16
Request for Examination Requirements Determined Compliant 2022-09-16
All Requirements for Examination Determined Compliant 2022-09-16
Inactive: Cover page published 2022-08-12
Application Published (Open to Public Inspection) 2022-06-30
Inactive: IPC assigned 2022-04-26
Inactive: First IPC assigned 2022-04-26
Inactive: IPC assigned 2022-04-26
Letter sent 2022-01-24
Filing Requirements Determined Compliant 2022-01-24
Request for Priority Received 2022-01-19
Priority Claim Requirements Determined Compliant 2022-01-19
Application Received - Regular National 2021-12-29
Inactive: Pre-classification 2021-12-29
Inactive: QC images - Scanning 2021-12-29

Abandonment History

Abandonment Date Reason Reinstatement Date
2024-08-29

Maintenance Fee

The last payment was received on 2023-12-20

Note: If the full payment has not been received on or before the date indicated, a further fee may be required, which may be one of the following:

  • the reinstatement fee;
  • the late payment fee; or
  • additional fee to reverse deemed expiry.

Patent fees are adjusted on the 1st of January every year. The amounts above are the current amounts if received by December 31 of the current year.
Please refer to the CIPO Patent Fees web page to see all current fee amounts.

Fee History

Fee Type Anniversary Year Due Date Paid Date
Application fee - standard 2021-12-29 2021-12-29
Request for examination - standard 2025-12-29 2022-09-16
MF (application, 2nd anniv.) - standard 02 2023-12-29 2023-06-15
MF (application, 3rd anniv.) - standard 03 2024-12-30 2023-12-20
Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
10353744 CANADA LTD.
Past Owners on Record
HAIYANG CAO
QIAN SUN
WEI XU
WENPING GUO
ZHENZHEN WANG
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents

Document Description    Date (yyyy-mm-dd)    Number of pages    Size of Image (KB)
Description 2021-12-28 23 1,022
Claims 2021-12-28 3 132
Abstract 2021-12-28 1 22
Drawings 2021-12-28 2 93
Representative drawing 2022-08-11 1 28
Examiner requisition 2024-03-03 3 171
Courtesy - Filing certificate 2022-01-23 1 568
Courtesy - Acknowledgement of Request for Examination 2023-02-06 1 423
New application 2021-12-28 7 227
Request for examination 2022-09-15 9 301
Correspondence for the PAPS 2022-12-22 4 150