DATA SYNCHRONIZATION METHOD AND DEVICE, COMPUTER EQUIPMENT AND STORAGE MEDIUM
Technical Field
[0001] The present disclosure relates to the field of big data analysis technology, and in particular to a data synchronization method, apparatus, computer device, and storage medium.
Background
[0002] On-Line Analytical Processing (OLAP) is a technology for rapidly analyzing shared multi-dimensional information. It uses multi-dimensional database technology to let users observe data from different angles. OLAP is mainly used to support complex analysis operations and focuses on decision support for management personnel. It meets analysts' requirements for fast, flexible, complex queries over large data volumes and presents query results in an intuitive, easy-to-understand form to assist decision-making.
[0003] At present, data is usually synchronized from a data warehouse to an OLAP platform in full-overwrite mode: the data set in the data warehouse is synchronized to the OLAP platform according to partition time. However, the partition time of the data warehouse is the data processing time, not the master time dimension; the master time dimension of a data record is its billing time. As a result, data cannot be synchronized to the OLAP platform according to a specified master time, which reduces the accuracy of data synchronization. In addition, the data records of one partition in the data warehouse can only be written to one partition of the OLAP platform, so concurrent data synchronization cannot be achieved.
Summary
[0004] Accordingly, to address the above technical problems, it is necessary to provide a data synchronization method, apparatus, computer device, and storage medium that can write data in a distributed manner according to a master time specified by the user, thereby achieving concurrent data synchronization and improving its accuracy.
Date Recue/Date Received 2023-02-27
[0005] In a first aspect, a data synchronization method is provided, the method comprising:
[0006] Receiving data synchronization request information from a user, the request information including field information of the data the user analyzes online;
[0007] Obtaining a fact table and a dimension table corresponding to the field information, and left joining the fact table and the dimension table to obtain a join table corresponding to the field information;
[0008] Performing a mapjoin operation on skewed data in the join table to obtain an optimized join table;
[0009] Classifying the data in the optimized join table according to billing time, and saving the classified data to the corresponding partitions of a Hadoop Distributed File System (HDFS) cluster to obtain partitioned data;
[0010] Writing the partitioned data into a temporary table of the column-oriented database management system ClickHouse for data synchronization.
[0011] In an implementation, writing the partitioned data into the temporary table of ClickHouse for data synchronization comprises:
[0012] Counting the data amounts of the temporary table and the optimized join table to obtain the data amount of the temporary table and the data amount of the optimized join table;
[0013] When the data amount of the temporary table is consistent with that of the optimized join table, writing the partitioned data in the temporary table into a master table of ClickHouse to complete data synchronization.
[0014] In an implementation, the method further comprises:
[0015] When the data amount of the temporary table is inconsistent with that of the optimized join table, re-writing the partitioned data into the temporary table of ClickHouse to obtain a rewritten temporary table;
[0016] When the data amount of the rewritten temporary table is consistent with that of the optimized join table, writing the partitioned data in the rewritten temporary table into the master table of ClickHouse to complete data synchronization.
[0017] In an implementation, the method further comprises:
[0018] When the data amount of the temporary table is consistent with that of the optimized join table, after writing the partitioned data in the temporary table into the master table of ClickHouse, recording first status information of the data in the master table as a submission status.
[0019] In an implementation, the method further comprises:
[0020] When the data amount of the temporary table is inconsistent with that of the optimized join table, obtaining meta information and execution information of the temporary table;
[0021] Recording second status information of the data in the temporary table as a pre-submission status;
[0022] Saving the meta information, the execution information, and the second status information to the relational database management system MySQL.
[0023] In an implementation, the method further comprises:
[0024] Obtaining a first fact table to be synchronized and a reverse table within a preset time;
[0025] Multiplying the measurement data in the first fact table by a preset value to obtain a prepared fact table;
[0026] Adding the measurement data in the prepared fact table and the reverse table to obtain a fact reverse table;
[0027] Joining the fact reverse table for data synchronization.
[0028] In a second aspect, a data synchronization apparatus is provided, the apparatus comprising:
[0029] A receiving module configured to receive data synchronization request information from a user, the request information including field information of the data the user analyzes online;
[0030] A joining module configured to obtain a fact table and a dimension table corresponding to the field information, and to left join the fact table and the dimension table to obtain a join table corresponding to the field information;
[0031] An optimizing module configured to perform a mapjoin operation on skewed data in the join table to obtain an optimized join table;
[0032] A partitioning module configured to classify the data in the optimized join table according to billing time and to save the classified data to the corresponding partitions of an HDFS cluster to obtain partitioned data;
[0033] A synchronizing module configured to write the partitioned data into a temporary table of the column-oriented database management system ClickHouse for data synchronization.
[0034] In an implementation, the synchronizing module is specifically configured for:
[0035] Counting the data amounts of the temporary table and the optimized join table to obtain the data amount of the temporary table and the data amount of the optimized join table;
[0036] When the data amount of the temporary table is consistent with that of the optimized join table, writing the partitioned data in the temporary table into a master table of ClickHouse to complete data synchronization.
[0037] In a third aspect, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the data synchronization method of the first aspect or any implementation thereof.
[0038] In a fourth aspect, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the data synchronization method of the first aspect or any implementation thereof.
[0039] With the above data synchronization method, apparatus, computer device, and storage medium, data synchronization request information is received from a user, the request information including field information of the data the user analyzes online. A fact table and a dimension table corresponding to the field information are obtained and left joined to obtain a join table corresponding to the field information. A mapjoin operation is performed on skewed data in the join table to obtain an optimized join table. The data in the optimized join table is classified according to billing time and saved to the corresponding partitions of an HDFS cluster to obtain partitioned data. Finally, the partitioned data is written into a temporary table of the column-oriented database management system ClickHouse for data synchronization. Data can thus be written in a distributed manner according to the master time specified by the user, achieving concurrent data synchronization and improving its accuracy.
Brief Description of the Drawings
[0040] Figure 1 is an application environment diagram of the data synchronization method in an embodiment;
[0041] Figure 2 is a flow chart of the data synchronization method in an embodiment;
[0042] Figure 3 is a structural diagram of the data synchronization apparatus in an embodiment;
[0043] Figure 4 is an internal structural diagram of a computer device in an embodiment.
Detailed Description of Embodiments
[0044] To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit its scope.
[0045] The data synchronization method provided by the present application can be applied to the data synchronization system shown in Figure 1. The system includes a data warehouse module 110, an OLAP joining module 120, and an OLAP engine module 130. The OLAP joining module 120 includes an OLAP-HIVE cluster (an online analytical processing data warehouse tool), and the OLAP engine module 130 includes the database management system ClickHouse. The data warehouse module 110 is configured to synchronize the fact table and the dimension table to the OLAP-HIVE cluster. The OLAP-HIVE cluster is configured to join the fact table and the dimension table and write the joined result to ClickHouse, and ClickHouse is configured to synchronize the joined data.
[0046] In some embodiments, as shown in Figure 2, a data synchronization method is provided, comprising the following steps:
[0047] S210: receiving data synchronization request information from a user, the request information including field information of the data the user analyzes online.
[0048] When a user performs data analysis through the OLAP platform, the user enters a query statement in the OLAP platform interface to query data. The interactive interface of the OLAP platform receives the query input, generates request information according to the query statement, and sends the request information to the data synchronization system. The request information received by the data synchronization system includes field information, that is, the keywords the user employs to obtain online data analysis results.
[0049] S220: obtaining a fact table and a dimension table corresponding to the field information, and left joining the fact table and the dimension table to obtain a join table corresponding to the field information.
[0050] The fact table is the central table in a data warehouse schema. It contains numeric measurement values and keys that link it to the dimension tables, and its data describes specific events within the business.
[0051] A dimension table can be seen as the user's window for analyzing data. It contains the features of the fact records in the fact table. Some features provide descriptive information, while others specify how fact table data is summarized to provide useful information to analysts. A dimension table also contains hierarchies of attributes that help summarize data.
[0052] The fact table and the dimension table corresponding to the field information are obtained from the data warehouse module and joined through the OLAP-HIVE cluster by a left join, with one table taken as the left table and the other as the right table. Every row of the left table appears in the join table. Rows of the right table contribute only where they meet the join condition given by the field information, and where a left-table row has no matching row in the right table, the right-table columns are null. Joining the fact table and the dimension table gathers the data the user analyzes online into a single table, so the user can analyze data more conveniently and intuitively.
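The left-join semantics described above can be illustrated with a small in-memory sketch. It assumes that tables are lists of Python dicts, that the join key is a single column, and that non-key column names do not collide between the two tables. The names `left_join`, `cust_id`, `amount` and `region` are hypothetical. They illustrate the behavior only and do not reproduce the OLAP-HIVE implementation.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def left_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Left join two lists of row dicts on `key`.

    Every left row is kept. Right-table columns are filled in from each matching
    right row, or set to None when the left row has no match.
    """
    right_cols = {col for r in right for col in r if col != key}
    # Index the right table by join key (one key may match several right rows).
    index: Dict[Any, List[Row]] = {}
    for r in right:
        index.setdefault(r[key], []).append(r)

    joined: List[Row] = []
    for lrow in left:
        matches = index.get(lrow[key]) or [{}]  # [{}] yields one all-None row
        for r in matches:
            out = dict(lrow)
            out.update({col: r.get(col) for col in right_cols})
            joined.append(out)
    return joined
```

For example, joining a fact table `[{"cust_id": 1, "amount": 10}, {"cust_id": 2, "amount": 5}]` with a dimension table `[{"cust_id": 1, "region": "east"}]` keeps both fact rows. The second row gets `region` set to `None`.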
[0053] S230: performing a mapjoin operation on skewed data in the join table to obtain an optimized join table.
[0054] When the fact table and the dimension table are joined, the data amounts corresponding to different dimension values may differ greatly, with one or a few dimension values having a particularly large amount of data. This causes data skew, which prolongs data synchronization. The join table therefore needs to be optimized by performing a mapjoin operation on it.
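One way to decide which join keys count as skewed is to compare each key's row count with the average count per key. The sketch below is an assumption for illustration only, because the description does not specify a skew criterion. The function name `find_skewed_keys` and the threshold `factor` are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Hashable, List, Set


def find_skewed_keys(
    rows: List[Dict[str, Any]], key: str, factor: float = 5.0
) -> Set[Hashable]:
    """Return join-key values whose row count exceeds `factor` times the mean count per key.

    `factor` is a hypothetical tuning parameter, not taken from the description.
    """
    counts = Counter(r[key] for r in rows)
    if not counts:
        return set()
    mean = sum(counts.values()) / len(counts)
    return {k for k, n in counts.items() if n > factor * mean}
```

With 100 rows for key `"a"`, 2 for `"b"` and 3 for `"c"`, the mean is 35 rows per key. A factor of 2 therefore flags only `"a"`.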
[0055] The skewed data in the join table is divided into a large table and a small table. The small table is loaded into memory, the large table is scanned sequentially, and the join is performed directly on the map side to obtain the optimized join table. Because data synchronization is then performed from the optimized join table, the impact of skewed data is greatly reduced, data synchronization takes less time, and synchronization speed is improved.
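A mapjoin (map-side or broadcast hash join) can be sketched as a build phase over the small table followed by a streaming probe over the large table. This is a minimal single-process illustration. It assumes the small table fits in memory and has a unique join key. In Hive the equivalent behavior is typically requested with a `MAPJOIN` hint or by enabling `hive.auto.convert.join`. The function name `map_side_join` is hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def map_side_join(large: Iterable[Row], small: List[Row], key: str) -> Iterator[Row]:
    """Map-side left join: hash the small table in memory, stream the large table.

    No shuffle by join key is needed, so a hot key cannot overload a single reducer.
    """
    # Build phase: assumes the small table's join key is unique (e.g. a dimension key).
    lookup: Dict[Any, Row] = {r[key]: r for r in small}
    small_cols = {c for r in small for c in r if c != key}
    # Probe phase: scan the large table sequentially, one row at a time.
    for row in large:
        match = lookup.get(row[key], {})
        joined = dict(row)
        joined.update({c: match.get(c) for c in small_cols})
        yield joined
```

Because the large table is only iterated, it can be a generator over a file or stream rather than a list held in memory.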
[0056] S240: classifying the data in the optimized join table according to billing time, and saving the classified data to the corresponding partitions of an HDFS cluster to obtain partitioned data.
[0057] Data with the same billing date in the optimized join table is classified into one category. The partitions of the Hadoop Distributed File System (HDFS) cluster are divided by billing date, and the classified data is saved to the corresponding path in the HDFS cluster and added as a Hive partition. For example, data with a billing date of 2021.12.25 is classified into one category and data with a billing date of 2021.12.26 into another. The data with the billing date of 2021.12.25 is saved to the 2021.12.25 partition of the HDFS cluster, and the data with the billing date of 2021.12.26 is saved to the 2021.12.26 partition.
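The grouping by billing date and the registration of each directory as a Hive partition can be sketched as follows. The following details are assumptions for illustration only: the `dt=YYYY-MM-DD` directory naming, the base path `/warehouse/olap_join`, the column name `billing_date`, and the table name `olap_join`. The helper only builds the HiveQL string and does not write to HDFS.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def partition_by_billing_date(
    rows: List[Row], base_path: str, date_col: str = "billing_date"
) -> Dict[str, List[Row]]:
    """Group rows into one bucket per billing date, keyed by a Hive-style path."""
    partitions: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        partitions[f"{base_path}/dt={row[date_col]}"].append(row)
    return dict(partitions)


def add_partition_sql(table: str, dt: str, location: str) -> str:
    """HiveQL that registers an HDFS directory as a partition of `table`."""
    return f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION (dt='{dt}') LOCATION '{location}'"
```

Each key of the returned dict is a directory to write, and each directory gets one `ALTER TABLE ... ADD PARTITION` statement.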
[0058] The classified data is saved to the corresponding partition of the HDFS cluster under a global lock, so that concurrent data synchronization can be performed safely.
[0059] S250: writing the partitioned data into a temporary table of the column-oriented database management system ClickHouse for data synchronization.
[0060] According to the billing time of the partitioned data, all partitioned data is written into the partition of the ClickHouse temporary table corresponding to that billing time. The data is then synchronized to the master table to complete data synchronization.
[0061] In this embodiment of the present application, data synchronization request information is received from a user, the request information including field information of the data the user analyzes online. A fact table and a dimension table corresponding to the field information are obtained and left joined to obtain a join table. A mapjoin operation is performed on skewed data in the join table to obtain an optimized join table. The data in the optimized join table is classified according to billing time and saved to the corresponding partitions of an HDFS cluster to obtain partitioned data. Finally, the partitioned data is written into a temporary table of ClickHouse for data synchronization. The method can thus write data in a distributed manner according to the master time specified by the user, achieving concurrent data synchronization and improving its accuracy.
[0062] In some embodiments, writing the partitioned data into the temporary table of ClickHouse for data synchronization comprises:
[0063] Counting the data amounts of the temporary table and the optimized join table to obtain the data amount of the temporary table and the data amount of the optimized join table;
[0064] When the data amount of the temporary table is consistent with that of the optimized join table, writing the partitioned data in the temporary table into a master table of ClickHouse to complete data synchronization.
[0065] After the partitioned data is written into the ClickHouse temporary table, the data amounts of the temporary table and the optimized join table are counted. If the two are consistent, the data from the HDFS cluster has been stored accurately and the synchronization task has succeeded. The data of the temporary table is then synchronized to the master table using the ATTACH PARTITION FROM operation and presented to the user.
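The count check followed by the attach-partition-from step might be sketched as two helpers that build ClickHouse SQL strings. This assumes the temporary and master tables have identical structure and are both partitioned by a `Date` column named `billing_date`, so that a partition can be addressed by its date literal. The table names `tmp_sales` and `sales` are hypothetical. Executing the statements through a ClickHouse client is left out.

```python
from typing import Optional


def count_sql(table: str, dt: str, date_col: str = "billing_date") -> str:
    """Row count of one billing-date slice of a ClickHouse table."""
    return f"SELECT count() FROM {table} WHERE {date_col} = '{dt}'"


def commit_sql(tmp_count: int, join_count: int, master: str, tmp: str, dt: str) -> Optional[str]:
    """Return the statement that moves the partition into the master table,
    or None when the counts differ and the temporary table must be rewritten."""
    if tmp_count != join_count:
        return None
    # ATTACH PARTITION ... FROM copies the partition's data from `tmp` into `master`;
    # both tables must share the same structure and partition key.
    return f"ALTER TABLE {master} ATTACH PARTITION '{dt}' FROM {tmp}"
```

A caller would run `count_sql` against the temporary table, compare the result with the join-table count, and execute the returned statement only when it is not `None`.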
[0066] In some embodiments, the method further comprises:
[0067] When the data amount of the temporary table is inconsistent with that of the optimized join table, re-writing the partitioned data into the temporary table of ClickHouse to obtain a rewritten temporary table;
[0068] When the data amount of the rewritten temporary table is consistent with that of the optimized join table, writing the partitioned data in the rewritten temporary table into the master table of ClickHouse to complete data synchronization.
[0069] If the data amount of the temporary table is inconsistent with that of the optimized join table, the partitioned data from the HDFS cluster has not been completely written into the ClickHouse temporary table, and the synchronization task has failed. At this point, the data in the temporary table cannot be presented to the user through the master table, and data synchronization must be performed again. The partitioned data is rewritten into the ClickHouse temporary table to obtain a new, rewritten temporary table, and its data amount is again compared with that of the optimized join table. If the amounts are still inconsistent, the partitioned data continues to be rewritten until the data amount of the rewritten temporary table matches that of the optimized join table. The partitioned data of the rewritten temporary table is then written into the master table of ClickHouse, completing data synchronization and ensuring its accuracy.
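The rewrite-and-recheck loop can be sketched with callbacks standing in for the actual write and count operations. The description loops until the counts match. The `max_attempts` bound is an added assumption so the sketch cannot loop forever. The callbacks are hypothetical. A real rewrite should replace rather than append to the temporary partition (for example, by dropping it with ClickHouse's `ALTER TABLE ... DROP PARTITION` before reloading); otherwise repeated writes would inflate the count.

```python
from typing import Callable


def write_until_consistent(
    write_partition: Callable[[], None],
    count_temp: Callable[[], int],
    expected: int,
    max_attempts: int = 3,
) -> int:
    """Rewrite the temporary table until its row count equals `expected`.

    Returns the number of attempts used. Raises RuntimeError if the counts never
    match within `max_attempts` (a bound added here; the description has none).
    """
    for attempt in range(1, max_attempts + 1):
        write_partition()  # hypothetical: replace the temp partition with the HDFS data
        if count_temp() == expected:
            return attempt
    raise RuntimeError(
        f"temporary table count never matched {expected} after {max_attempts} attempts"
    )
```

Once the function returns, the caller can move the rewritten temporary partition into the master table.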
[0070] In some embodiments, the method further comprises:
[0071] When the data amount of the temporary table is consistent with that of the optimized join table, after writing the partitioned data in the temporary table into the master table of ClickHouse, recording first status information of the data in the master table as a submission status.
[0072] A transaction is the unit by which a database maintains data consistency: it moves the database from one consistent state to a new consistent state. In short, a set of processing steps is called a transaction if either all of them or none of them are executed. Because synchronization across multiple nodes cannot be guaranteed to succeed on every node, a distributed transaction is deployed during data synchronization to ensure the integrity and reliability of the synchronized data.
[0073] The first status information is the status of data in the master table that has been synchronized successfully. The submission status indicates that the transaction has ended and all data synchronization steps have been completed. After the synchronization task succeeds and the data of the temporary table has been written to the master table, the status of the data in the master table is recorded as the submission status.
[0074] In some embodiments, the method further comprises:
[0075] When the data amount of the temporary table is inconsistent with that of the optimized join table, obtaining meta information and execution information of the temporary table;
[0076] Recording second status information of the data in the temporary table as a pre-submission status;
[0077] Saving the meta information, the execution information, and the second status information to the relational database management system MySQL.
[0078] The meta information is the business description information of the data. The execution information includes data synchronization failure information and data synchronization success information. When the data amount of the temporary table is inconsistent with that of the optimized join table, the data synchronization failure information and the meta information of the temporary table are obtained. The second status information is the status of the data in the temporary table identified by the synchronization task. The pre-submission status indicates that the transaction has ended without the data synchronization succeeding, so ClickHouse data synchronization must be performed again. When the synchronization task fails, the status of the data in the temporary table is recorded as pre-submission and the partitioned data is rewritten into the temporary table. Meanwhile, the meta information, the execution information, and the second status information are saved to the relational database management system MySQL, and the corresponding records in MySQL are updated.
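The status bookkeeping can be sketched with one row per task and billing date. Python's built-in `sqlite3` is used here only as a self-contained stand-in for MySQL. The table `sync_status`, its columns, and the status constants are hypothetical. Against MySQL, the upsert would typically use `INSERT ... ON DUPLICATE KEY UPDATE` instead of SQLite's `INSERT OR REPLACE`.

```python
import json
import sqlite3
from typing import Any, Dict, Optional

PRE_SUBMISSION = "PRE_SUBMISSION"  # second status information: sync failed, must be redone
SUBMISSION = "SUBMISSION"          # first status information: data committed to the master table


def init_store(conn: sqlite3.Connection) -> None:
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sync_status (
               task_id   TEXT NOT NULL,
               dt        TEXT NOT NULL,
               status    TEXT NOT NULL,
               meta      TEXT,
               exec_info TEXT,
               PRIMARY KEY (task_id, dt))"""
    )


def record_status(conn: sqlite3.Connection, task_id: str, dt: str, status: str,
                  meta: Dict[str, Any], exec_info: Dict[str, Any]) -> None:
    """Insert or overwrite the status row for one task and billing date."""
    conn.execute(
        "INSERT OR REPLACE INTO sync_status VALUES (?, ?, ?, ?, ?)",
        (task_id, dt, status, json.dumps(meta), json.dumps(exec_info)),
    )
    conn.commit()


def get_status(conn: sqlite3.Connection, task_id: str, dt: str) -> Optional[str]:
    """Return the recorded status, or None if the task/date has no row yet."""
    row = conn.execute(
        "SELECT status FROM sync_status WHERE task_id = ? AND dt = ?", (task_id, dt)
    ).fetchone()
    return row[0] if row else None
```

A failed run records `PRE_SUBMISSION` with its failure details. A later successful run overwrites the same row with `SUBMISSION`, so each task and date keeps a single current status.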
[0079] After all the data in the temporary table has been written to the master table, the system verifies whether the data amount of the temporary table is consistent with that of the master table. If not, ClickHouse data synchronization is performed again; if so, the execution information of ClickHouse is restored and the global lock is released.
[0080] When the ClickHouse data synchronization process cannot guarantee a transaction, all of the partitioned data is written into the master table of ClickHouse by full overwrite.
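The verify-then-unlock step can be sketched as follows. A `threading.Lock` serves purely as an in-process stand-in for the global lock. The description does not say how the lock is implemented, and a multi-node deployment would need a distributed lock service. The callbacks and `max_attempts` are hypothetical. Because the master write may be retried, an idempotent operation such as ClickHouse's `REPLACE PARTITION ... FROM` is a safer choice than appending.

```python
import threading
from typing import Callable

GLOBAL_SYNC_LOCK = threading.Lock()  # in-process stand-in for the global lock


def sync_partition_with_lock(
    write_to_master: Callable[[], None],
    count_temp: Callable[[], int],
    count_master: Callable[[], int],
    max_attempts: int = 3,
) -> bool:
    """Hold the global lock while writing temp data to the master table and
    verifying counts; the lock is released when the with-block exits."""
    with GLOBAL_SYNC_LOCK:
        for _ in range(max_attempts):
            write_to_master()  # hypothetical: should replace, not append, the partition
            if count_master() == count_temp():
                return True
        return False
```

Using `with` guarantees the lock is released on success, on failure, and if a callback raises.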
[0081] In some embodiments, the method further comprises:
[0082] Obtaining a first fact table to be synchronized and a reverse table within a preset time;
[0083] Multiplying the measurement data in the first fact table by a preset value to obtain a prepared fact table;
[0084] Adding the measurement data in the prepared fact table and the reverse table to obtain a fact reverse table;
[0085] Joining the fact reverse table for data synchronization.
[0086] In some cases, the data of the current day needs to be synchronized additionally; this is called reverse supplement. The reverse data of the current day is valid only for that day and does not affect the scheduled data of the next day.
[0087] The preset time is a day specified by the user, and the preset value is -1. The fact table, the dimension table, and the reverse table of that day are obtained. The measurement data in the fact table is multiplied by -1 to obtain the prepared fact table. The measurement data in the prepared fact table and the reverse table are added to obtain the fact reverse table. Finally, the fact reverse table is joined and synchronized to the master table of ClickHouse for presentation in reports.
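The arithmetic of the reverse supplement can be sketched with keyed measurement values. This assumes that adding the prepared fact table and the reverse table means summing measurements per key, which is equivalent to a UNION ALL followed by a grouped SUM. The key and measure representation and the name `build_fact_reverse` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

PRESET_VALUE = -1  # preset value given in the description


def build_fact_reverse(
    fact: List[Tuple[Hashable, float]], reverse: List[Tuple[Hashable, float]]
) -> Dict[Hashable, float]:
    """Multiply fact measurements by the preset value, then add reverse measurements per key."""
    totals: Dict[Hashable, float] = defaultdict(float)
    for key, measure in fact:
        totals[key] += measure * PRESET_VALUE  # prepared fact table
    for key, measure in reverse:
        totals[key] += measure                 # add the reverse table
    return dict(totals)
```

With a preset value of -1, each key's result is the reverse measurement minus the original fact measurement. For example, a fact value of 10.0 and a reverse value of 12.0 for the same key give 2.0.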
[0088] It should be noted that although the steps in the flow chart of Figure 2 are shown in the sequence indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, the order in which these steps are performed is not strictly limited, and they may be performed in other orders. In addition, at least some of the steps in Figure 2 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be executed at different times. Nor are they necessarily executed sequentially: they may be performed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
[0089] In some embodiments, as shown in Figure 3, a data synchronization apparatus is provided, comprising a receiving module 310, a joining module 320, an optimizing module 330, a partitioning module 340, and a synchronizing module 350, wherein:
[0090] A receiving module 310 configured to receive data synchronization request information from a user, the request information including field information of the data the user analyzes online;
[0091] A joining module 320 configured to obtain a fact table and a dimension table corresponding to the field information, and to left join the fact table and the dimension table to obtain a join table corresponding to the field information;
[0092] An optimizing module 330 configured to perform a mapjoin operation on skewed data in the join table to obtain an optimized join table;
[0093] A partitioning module 340 configured to classify the data in the optimized join table according to billing time and to save the classified data to the corresponding partitions of an HDFS cluster to obtain partitioned data;
[0094] A synchronizing module 350 configured to write the partitioned data into a temporary table of the column-oriented database management system ClickHouse for data synchronization.
[0095] In this embodiment of the present application, the apparatus can write data in a distributed manner according to the master time specified by the user, achieving concurrent data synchronization and improving its accuracy.
[0096] In some embodiments, the synchronizing module is specifically configured for:
[0097] Counting the data amounts of the temporary table and the optimized join table to obtain the data amount of the temporary table and the data amount of the optimized join table;
[0098] When the data amount of the temporary table is consistent with that of the optimized join table, writing the partitioned data in the temporary table into a master table of ClickHouse to complete data synchronization.
[0099] In some embodiments, the apparatus further includes a rewriting module 360 configured for:
[0100] When the data amount of the temporary table is inconsistent with that of the optimized join table, re-writing the partitioned data into the temporary table of ClickHouse to obtain a rewritten temporary table;
[0101] When the data amount of the rewritten temporary table is consistent with that of the optimized join table, writing the partitioned data in the rewritten temporary table into the master table of ClickHouse to complete data synchronization.
[0102] In some embodiments, the apparatus further includes a recording module 370 configured for:
[0103] When the data amount of the temporary table is consistent with that of the optimized join table, after writing the partitioned data in the temporary table into the master table of ClickHouse, recording first status information of the data in the master table as a submission status.
[0104] In some embodiments, the apparatus further includes:
[0105] An obtaining module 380 configured to obtain meta information and execution information of the temporary table when the data amount of the temporary table is inconsistent with that of the optimized join table;
[0106] A recording module 370 configured to record second status information of the data in the temporary table as a pre-submission status;
[0107] A storing module 390 configured to save the meta information, the execution information, and the second status information to the relational database management system MySQL.
[0108] In some embodiments, the apparatus further includes:
[0109] An obtaining module 380 configured to obtain a first fact table to be synchronized and a reverse table within a preset time;
[0110] A multiplying module 3100 configured to multiply the measurement data in the first fact table by a preset value to obtain a prepared fact table;
[0111] An adding module configured to add the measurement data in the prepared fact table and the reverse table to obtain a fact reverse table;
[0112] A synchronizing module configured to join the fact reverse table for data synchronization.
[0113] For specific limitations of the data synchronization apparatus, refer to the limitations of the data synchronization method above, which are not repeated here. Each module of the above data synchronization apparatus may be implemented wholly or partly by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, the processor of the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
[0114] In some embodiments, a computer device is provided. The computer device may be a server, whose internal structure is shown in Figure 4. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements a data synchronization method.
[0115] Those skilled in the art can understand that the structure shown in Figure 4 is only a block diagram of the part of the structure related to the solution of the present application. It does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
[0116] In some embodiments, a computer device is provided, including a memory,
a
processor and a computer program stored in the memory and ran on the processor
configured
to achieve the following steps when the processor executes the computer
program:
[0117] Receiving user data synchronization request information, the request
information
includes field information of data analyzed by user online;
[0118] Obtaining a fact table and a dimension table corresponding to the field
information,
left joining the fact table and the dimension table to obtain a join table
corresponding to the
17
Date Recue/Date Received 2023-02-27
field information;
[0119] Performing a mapjoin operation to skewed data in the join table to
obtain an optimized
join table;
[0120] Classifying according to data billing time in the optimized join table,
saving the
classified data to corresponding partition of a distributed file system HDFS
cluster to obtain
partitioned data;
[0121] Writing the partitioned data into a temporary table of a column-
oriented database
management unit ClickHouse for data synchronization.
[0122] In some embodiments, when executing the computer program, the processor writes the partitioned data into the temporary table of ClickHouse for data synchronization by performing the following steps: counting the data amounts of the temporary table and of the optimized join table; and, when the data amount of the temporary table is consistent with that of the optimized join table, writing the partitioned data in the temporary table into a master table of ClickHouse to complete data synchronization.
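The count check can be sketched as a guard in front of the commit. In this in-memory illustration the temporary and master tables are plain Python lists, which is an assumption made so the example runs on its own. Against a real ClickHouse server the two counts would typically come from `SELECT count() FROM <table>` queries and the commit from an `INSERT INTO <master> SELECT * FROM <temp>` statement, with the actual table names left to the deployment.

```python
from typing import Any

Row = dict[str, Any]


def commit_if_counts_match(temp_table: list[Row], join_table_count: int,
                           master_table: list[Row]) -> bool:
    """Copy the temporary table into the master table only when its row count
    equals the row count of the optimized join table."""
    if len(temp_table) != join_table_count:
        return False  # counts differ: leave the master table untouched
    master_table.extend(temp_table)
    return True
```

Staging the write in a temporary table means a partial or failed load never becomes visible in the master table that analysts query.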
[0123] In some embodiments, the processor further performs the following steps when executing the computer program: when the data amount of the temporary table is inconsistent with that of the optimized join table, re-writing the partitioned data into the temporary table of ClickHouse to obtain a rewritten temporary table; and, when the data amount of the rewritten temporary table is consistent with that of the optimized join table, writing the partitioned data in the rewritten temporary table into the master table of ClickHouse to complete data synchronization.
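The rewrite-and-recheck behaviour can be sketched as a bounded retry loop. Here `write_partitioned_data` is a hypothetical callable that rebuilds the temporary table from scratch and returns its rows, and the `max_attempts` cap is an assumption added so the loop terminates; the source does not state how many rewrites are allowed.

```python
from typing import Any, Callable

Row = dict[str, Any]


def sync_with_retry(write_partitioned_data: Callable[[], list[Row]],
                    expected_count: int, master_table: list[Row],
                    max_attempts: int = 3) -> int:
    """Rewrite the temporary table until its row count matches the optimized
    join table, then commit it to the master table.

    Returns the attempt number that succeeded; raises if none did."""
    for attempt in range(1, max_attempts + 1):
        temp_table = write_partitioned_data()  # (re)write the temporary table
        if len(temp_table) == expected_count:
            master_table.extend(temp_table)  # commit only a verified table
            return attempt
    raise RuntimeError(f"row count still mismatched after {max_attempts} attempts")
```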
[0124] In some embodiments, the processor further performs the following step when executing the computer program: when the data amount of the temporary table is consistent with that of the optimized join table, after the partitioned data in the temporary table has been written into the master table of ClickHouse, recording first status information of the data in the master table as submission status.
[0125] In some embodiments, the processor further performs the following steps when executing the computer program: when the data amount of the temporary table is inconsistent with that of the optimized join table, obtaining meta information and execution information of the temporary table; recording second status information of the data in the temporary table as pre-submission status; and saving the meta information, the execution information, and the second status information to the relational database management system MySQL.
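The two status records can be sketched with Python's built-in `sqlite3` module standing in for MySQL, so the example runs without a database server. The table layout (`sync_status` with `table_name`, `status`, `meta`, `execution` columns) and the JSON encoding of the meta and execution information are assumptions. For simplicity the sketch writes both statuses to the same table, whereas the source only states that the pre-submission status, meta information, and execution information are saved to MySQL.

```python
import json
import sqlite3
from typing import Any


def record_sync_status(conn: sqlite3.Connection, table_name: str, counts_match: bool,
                       meta: dict[str, Any], execution: dict[str, Any]) -> str:
    """Record 'submission' after a successful commit to the master table, or
    'pre-submission' plus meta/execution info for a mismatched temporary table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sync_status (
               table_name TEXT, status TEXT, meta TEXT, execution TEXT)"""
    )
    if counts_match:
        status, meta_json, exec_json = "submission", None, None
    else:
        status = "pre-submission"
        meta_json, exec_json = json.dumps(meta), json.dumps(execution)
    conn.execute("INSERT INTO sync_status VALUES (?, ?, ?, ?)",
                 (table_name, status, meta_json, exec_json))
    conn.commit()
    return status
```

Keeping the meta and execution information of a failed load makes it possible to inspect or resume that load later instead of starting over blind.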
[0126] In some embodiments, the processor further performs the following steps when executing the computer program: obtaining a first fact table to be synchronized and a reverse table within a preset time period; multiplying the measurement data in the first fact table by a preset value to obtain a prepared fact table; adding the measurement data in the prepared fact table and the reverse table to obtain a fact reverse table; and joining the fact reverse table for data synchronization.
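The source leaves the preset value and the exact meaning of "adding" open. The sketch below assumes a preset value of -1 and per-key addition, which makes the fact reverse table hold the net change between the previously synchronized facts and the reverse (corrected) records within the time window. The key and measure names are hypothetical.

```python
from collections import defaultdict
from typing import Any

PRESET_VALUE = -1  # assumption: the source does not state the preset multiplier


def build_fact_reverse_table(first_fact: list[dict[str, Any]],
                             reverse: list[dict[str, Any]],
                             key: str, measures: list[str],
                             preset: float = PRESET_VALUE) -> dict[Any, dict[str, float]]:
    """Scale the first fact table's measures by `preset` (the prepared fact
    table), then add the reverse table's measures key by key."""
    result: dict[Any, dict[str, float]] = defaultdict(lambda: {m: 0.0 for m in measures})
    for row in first_fact:  # prepared fact table = first fact table * preset
        for m in measures:
            result[row[key]][m] += preset * row[m]
    for row in reverse:  # add the reverse table's measurement data
        for m in measures:
            result[row[key]][m] += row[m]
    return dict(result)
```

Under these assumptions, appending the resulting deltas to already-synchronized data makes an aggregate such as `sum(amount)` reflect the corrected records without rewriting old partitions. This is one plausible reading, not necessarily the method's exact intent.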
[0127] In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the following steps are performed:
[0128] Receiving data synchronization request information from a user, the request information including field information of the data that the user analyzes online;
[0129] Obtaining a fact table and a dimension table corresponding to the field information, and left joining the fact table and the dimension table to obtain a join table corresponding to the field information;
[0130] Performing a mapjoin operation on skewed data in the join table to obtain an optimized join table;
[0131] Classifying the data in the optimized join table according to its billing time, and saving the classified data to the corresponding partitions of a Hadoop Distributed File System (HDFS) cluster to obtain partitioned data;
[0132] Writing the partitioned data into a temporary table of the column-oriented database management system ClickHouse for data synchronization.
[0133] In some embodiments, when executing the computer program, the processor writes the partitioned data into the temporary table of ClickHouse for data synchronization by performing the following steps: counting the data amounts of the temporary table and of the optimized join table; and, when the data amount of the temporary table is consistent with that of the optimized join table, writing the partitioned data in the temporary table into a master table of ClickHouse to complete data synchronization.
[0134] In some embodiments, the processor further performs the following steps when executing the computer program: when the data amount of the temporary table is inconsistent with that of the optimized join table, re-writing the partitioned data into the temporary table of ClickHouse to obtain a rewritten temporary table; and, when the data amount of the rewritten temporary table is consistent with that of the optimized join table, writing the partitioned data in the rewritten temporary table into the master table of ClickHouse to complete data synchronization.
[0135] In some embodiments, the processor further performs the following step when executing the computer program: when the data amount of the temporary table is consistent with that of the optimized join table, after the partitioned data in the temporary table has been written into the master table of ClickHouse, recording first status information of the data in the master table as submission status.
[0136] In some embodiments, the processor further performs the following steps when executing the computer program: when the data amount of the temporary table is inconsistent with that of the optimized join table, obtaining meta information and execution information of the temporary table; recording second status information of the data in the temporary table as pre-submission status; and saving the meta information, the execution information, and the second status information to the relational database management system MySQL.
[0137] In some embodiments, the processor further performs the following steps when executing the computer program: obtaining a first fact table to be synchronized and a reverse table within a preset time period; multiplying the measurement data in the first fact table by a preset value to obtain a prepared fact table; adding the measurement data in the prepared fact table and the reverse table to obtain a fact reverse table; and joining the fact reverse table for data synchronization.
[0138] Those skilled in the art can understand that all or part of the procedures in the above methods can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it can carry out the procedures of the method embodiments described above. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
[0139] The technical features of the above embodiments can be combined arbitrarily. For conciseness, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it shall be considered within the scope of this specification.
[0140] The above embodiments express only several implementations of this disclosure. Although they are described specifically and in detail, they should not be understood as limiting the scope of the invention patent. Evidently, those of ordinary skill in the art can make various modifications and variations to the disclosure without departing from its spirit and scope. Therefore, the appended claims are intended to be construed as encompassing the described embodiments and all modifications and variations falling within the scope of the disclosure.