Patent 3125546 Summary

Third-party information liability

Some of the information on this Web page has been provided by external sources. The Government of Canada is not responsible for the accuracy, reliability or currency of the information supplied by external sources. Users wishing to rely upon this information should consult directly with the source of the information. Content provided by external sources is not subject to official languages, privacy and accessibility requirements.

Claims and Abstract availability

Any discrepancies in the text and image of the Claims and Abstract are due to differing posting times. Text of the Claims and Abstract are posted:

  • At the time the application is open to public inspection;
  • At the time of issue of the patent (grant).
(12) Patent Application: (11) CA 3125546
(54) English Title: DISTRIBUTED LOGICAL TIMESTAMP-BASED MANAGEMENT METHOD AND SYSTEM FOR DISTRIBUTED TRANSACTION
(54) French Title: PROCEDE ET SYSTEME DE GESTION DE TRANSACTION DISTRIBUEE BASES SUR UNE ESTAMPILLE TEMPORELLE LOGIQUE DISTRIBUEE
Status: Dead
Bibliographic Data
(51) International Patent Classification (IPC):
  • G06F 16/27 (2019.01)
(72) Inventors :
  • XU, JIANHUI (China)
  • CHEN, YUANXI (China)
  • HE, GUOMING (China)
  • WANG, TAO (China)
(73) Owners :
  • SEQUOIADB CORPORATION (China)
(71) Applicants :
  • SEQUOIADB CORPORATION (China)
(74) Agent: NELLIGAN O'BRIEN PAYNE LLP
(74) Associate agent:
(45) Issued:
(86) PCT Filing Date: 2020-09-11
(87) Open to Public Inspection: 2021-04-29
Examination requested: 2021-06-29
Availability of licence: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): Yes
(86) PCT Filing Number: PCT/CN2020/114654
(87) International Publication Number: WO2021/077934
(85) National Entry: 2021-06-29

(30) Application Priority Data:
Application No. Country/Territory Date
201911014865.2 China 2019-10-23
201911291498.0 China 2019-12-13

Abstracts

English Abstract

Disclosed are a distributed transaction management method and system based on a distributed logic timestamp. The method comprises: setting a transaction start time of a transaction; if the difference between a local logic time of a data node and a local logic time of a coordination node exceeds a preset threshold value, timing the local logic time of the data node, enabling the coordination node to roll back the transaction, and retrying the transaction after the time of the coordination node is timed; if the difference between a transaction pre-submission time and the transaction start time is greater than a transaction tolerance error of the transaction, sending the transaction pre-submission time and a pre-submission message of the transaction to all data nodes that participate in the transaction; and selecting, according to the difference between timestamps of two different transactions, one data node as an arbitration node to arbitrate timestamp orders of the two different transactions. By means of the present invention network overheads can be reduced while distributed storage and processing requirements are met, thereby effectively improving the overall performance of a system.


French Abstract

La présente invention concerne un procédé et un système de gestion des transactions distribuées basés sur une estampille temporelle logique distribuée. Le procédé comprend les étapes consistant : à définir un temps de début de transaction d'une transaction ; si la différence entre un temps logique local d'un nœud de données et un temps logique local d'un nœud de coordination dépasse une valeur seuil prédéfinie, à synchroniser le temps logique local du nœud de données, permettant au nœud de coordination de renvoyer la transaction et de relancer la transaction après que le temps du nœud de coordination est écoulé ; si la différence entre un temps de présoumission de transaction et le temps de début de transaction est supérieure à une erreur de tolérance de transaction de la transaction, à envoyer le temps de présoumission de transaction et un message de présoumission de la transaction à tous les nœuds de données qui participent à la transaction ; et à sélectionner, selon la différence entre des estampilles temporelles de deux transactions différentes, un nœud de données en tant que nœud d'arbitrage pour arbitrer des ordres d'estampille temporelle des deux transactions différentes. Au moyen de la présente invention, des surcharges de réseau peuvent être réduites pendant que des exigences de stockage et de traitement distribués sont satisfaites, ce qui permet d'améliorer efficacement les performances globales d'un système.

Claims

Note: Claims are shown in the official language in which they were submitted.


CLAIMS
WHAT IS CLAIMED IS:
1. A distributed logical timestamp-based management method for a distributed
transaction,
wherein a distributed transaction database comprises a coordination node, a
catalog node, and a data
node, and a local logical timestamp (LLT) of each node is synchronously
calibrated based on a preset
universal logical timestamp (ULT); and
the distributed logical timestamp-based management method for a distributed
transaction
comprises:
when a transaction begins, setting a transaction begin timestamp (TBT) of the
transaction as an
LLT of the coordination node;
when the data node first receives a message sent by the coordination node,
determining whether a
difference between an LLT of the data node and the LLT of the coordination
node exceeds a preset first
error threshold; and if the difference between the LLT of the data node and
the LLT of the coordination
node exceeds the preset first error threshold, synchronously calibrating the
LLT of the data node,
controlling the data node to return an error message to the coordination node
to enable the coordination
node to roll back the transaction, and retrying the transaction after
synchronously calibrating the LLT
of the coordination node;
when the transaction is pre-committed, setting a transaction pre-commit
timestamp (TPCT) of the
transaction as a current LLT of the coordination node; determining whether a
difference between the
TPCT and the TBT is greater than a transaction tolerance of the transaction;
and if the difference
between the TPCT and the TBT is not greater than the transaction tolerance of
the transaction,
suspending the current operation; or if the difference between the TPCT and
the TBT is greater than
the transaction tolerance of the transaction, sending the TPCT and a
transaction pre-committing
message to all data nodes participating in the transaction; and
when two different transactions access same data, determining whether a
difference between
timestamps of the two different transactions is less than a preset second
error threshold; and if the
difference between the timestamps of the two different transactions is less
than the preset second error
threshold, selecting one data node from target data nodes as an arbitration
node based on a preset
algorithm to arbitrate a sequence of the timestamps of the two different
transactions.
2. The distributed logical timestamp-based management method for a distributed
transaction
according to claim 1, wherein the preset first error threshold is twice a ULT
tolerance.
3. The distributed logical timestamp-based management method for a distributed transaction according to claim 1, wherein the second error threshold is set as follows: finding a maximum transaction tolerance in different transactions accessing same data, and setting the second error threshold to be twice the maximum transaction tolerance.
4. The distributed logical timestamp-based management method for a distributed
transaction
according to claim 1, wherein a ULT tolerance is dynamically changed based on
a system status, and
the method further comprises:
when the coordination node performs synchronous time calibration with the
catalog node,
obtaining and recording a current ULT tolerance of a system; and
when the transaction obtains the TBT, storing the current ULT tolerance of the
system in a
metadata control block of the transaction.
5. The distributed logical timestamp-based management method for a distributed
transaction
according to claim 1, further comprising:
when the transaction modifies, deletes, or inserts data, using an identifier
(ID) of the transaction
as a version to mark the data.
6. The distributed logical timestamp-based management method for a distributed
transaction
according to claim 1, wherein an ID of the transaction is constituted by the
TBT of the transaction and
an ID of the coordination node participating in the transaction.
7. A distributed logical timestamp-based management system for a distributed
transaction,
wherein a distributed transaction database comprises a coordination node, a
catalog node, and a data
node, and an LLT of each node is synchronously calibrated based on a preset
ULT; and
the distributed logical timestamp-based management system for a distributed
transaction
comprises:
a transaction time management module, configured to: when a transaction
begins, set a TBT of
the transaction as an LLT of the coordination node;
a transaction access management module, configured to: when the data node
first receives a
message sent by the coordination node, determine whether a difference between
an LLT of the data
node and the LLT of the coordination node exceeds a preset first error
threshold; and if the difference
between the LLT of the data node and the LLT of the coordination node exceeds
the preset first error
threshold, synchronously calibrate the LLT of the data node, control the data
node to return an error
message to the coordination node to enable the coordination node to roll back
the transaction, and retry
the transaction after synchronously calibrating the LLT of the coordination
node;
a transaction pre-committing management module, configured to: when the
transaction is pre-
committed, set a TPCT of the transaction as a current LLT of the coordination
node; determine whether
a difference between the TPCT and the TBT is greater than a transaction
tolerance of the transaction;
and if the difference between the TPCT and the TBT is not greater than the
transaction tolerance of the
transaction, suspend the current operation; or if the difference between the
TPCT and the TBT is greater
than the transaction tolerance of the transaction, send the TPCT and a
transaction pre-committing
message to all data nodes participating in the transaction; and
a distributed arbitration module, configured to: when two different
transactions access same data,
determine whether a difference between timestamps of the two different
transactions is less than a
preset second error threshold; and if the difference between the timestamps of
the two different
transactions is less than the preset second error threshold, select one data
node from target data nodes
as an arbitration node based on a preset algorithm to arbitrate a sequence of
the timestamps of the two
different transactions.
8. The distributed logical timestamp-based management system for a distributed
transaction
according to claim 7, wherein the preset first error threshold is twice a ULT
tolerance.
9. The distributed logical timestamp-based management system for a distributed
transaction
according to claim 7, wherein the second error threshold is set as follows:
finding a maximum
transaction tolerance in different transactions accessing same data, and
setting the second error
threshold to be twice the maximum transaction tolerance.
10. The distributed logical timestamp-based management system for a
distributed transaction
according to claim 7, wherein a ULT tolerance is dynamically changed based on
a system status, and
the system further comprises:
a ULT tolerance recording module, configured to: when the coordination node
performs
synchronous time calibration with the catalog node, obtain and record a
current ULT tolerance of the
system; and
a ULT tolerance update module, configured to: when the transaction obtains the
TBT, store the
current ULT tolerance of the system in a metadata control block of the
transaction.

Description

Note: Descriptions are shown in the official language in which they were submitted.


DISTRIBUTED LOGICAL TIMESTAMP-BASED MANAGEMENT METHOD AND
SYSTEM FOR DISTRIBUTED TRANSACTION
TECHNICAL FIELD
[0001] The present disclosure relates to the field of database technologies,
and in particular, to a
distributed logical timestamp-based management method and system for a
distributed transaction.
BACKGROUND
[0002] A distributed database management system plays an extremely important
role in actual
application of a distributed system. The distributed database management
system has all characteristics
of the distributed system, and also imposes higher requirements for data
storage and processing. A
database management system has been widely used since the 1980s. At the
beginning, each database
is deployed on a single computer, and all transactions are processed in the
standalone database. As data
and businesses increase rapidly, the standalone database has been unable to
satisfy requirements for
data storage and processing capabilities, and a distributed database system
has become a preferred
architecture for deployment of all kinds of applications. Distributed storage,
distributed computing,
and low-delay response under high concurrence in a distributed database are of
great significance to
other distributed systems.
[0003] In the distributed database system, a globally unique identifier (ID)
usually needs to be
maintained to distinguish between concurrent transactions and identify
generated or changed data.
Usually, the ID is free from a single-point failure, is in chronological order
or contains time, and can
control data fragmentation. In addition, the ID cannot be too long.
[0004] Such a globally unique ID is usually implemented based on transaction
initiation time, a
timestamp, and other identification bits. The timestamp is generally generated
based on a clock of
hardware or based on a logical clock implemented by software. When the
timestamp is generated based
on the clock of the hardware, atomic and hardware characteristics are used to ensure that the
mutual error between clock devices over a long period is negligible. In this way, each
computing node in a cluster
can directly query a local clock, and use the local clock as a universal
timestamp. However, this manner
is characterized by high costs and cannot be widely applied. In actual
application, the manner in which
the timestamp is generated based on the logical clock implemented by the
software is used more widely.
In an existing implementation, one global transaction manager (GTM) or one
small GTM cluster is
deployed in a computer cluster of the distributed system to generate and
distribute a unified global ID
including a timestamp. All other computers in the cluster query the GTM
synchronously to obtain an
ID. The ID increases progressively to ensure that a unique ID is queried each
time. However, in this
implementation, when the system cluster is large and there are many concurrent
operations, the GTM
will be heavily loaded, and network overheads will be very large, lowering
overall system performance.
In addition, it is very complex to use the small GTM cluster to avoid a single-
point failure, and
reliability cannot be completely guaranteed.
SUMMARY
[0005] Embodiments of the present disclosure provide a distributed logical
timestamp-based management method and system for a distributed transaction, to avoid forcing all related nodes
to obtain a unique ID from a GTM. This can reduce network overheads while
satisfying requirements
for distributed storage and processing, thereby improving overall system
performance.
[0006] To resolve the foregoing technical problems, an embodiment of the
present disclosure
provides a distributed logical timestamp-based management method for a
distributed transaction. A
distributed transaction database includes a coordination node, a catalog node,
and a data node, and a
local logical timestamp (LLT) of each node is synchronously calibrated based
on a preset universal
logical timestamp (ULT).
[0007] The distributed logical timestamp-based management method for a
distributed transaction
includes:
[0008] when a transaction begins, setting a transaction begin timestamp (TBT)
of the transaction as
an LLT of the coordination node;
[0009] when the data node first receives a message sent by the coordination
node, determining
whether a difference between an LLT of the data node and the LLT of the
coordination node exceeds
a preset first error threshold; and if the difference between the LLT of the
data node and the LLT of the
coordination node exceeds the preset first error threshold, synchronously
calibrating the LLT of the
data node, controlling the data node to return an error message to the
coordination node to enable the
coordination node to roll back the transaction, and retrying the transaction
after synchronously
calibrating the LLT of the coordination node;
[0010] when the transaction is pre-committed, setting a transaction pre-commit
timestamp (TPCT)
of the transaction as a current LLT of the coordination node; determining
whether a difference between
the TPCT and the TBT is greater than a transaction tolerance of the
transaction; and if the difference
between the TPCT and the TBT is not greater than the transaction tolerance of
the transaction,
suspending the current operation; or if the difference between the TPCT and
the TBT is greater than
the transaction tolerance of the transaction, sending the TPCT and a
transaction pre-committing
message to all data nodes participating in the transaction; and
[0011] when two different transactions access same data, determining whether a
difference between
timestamps of the two different transactions is less than a preset second
error threshold; and if the
difference between the timestamps of the two different transactions is less
than the preset second error
threshold, selecting one data node from target data nodes as an arbitration
node based on a preset
algorithm to arbitrate a sequence of the timestamps of the two different
transactions.
[0012] Further, the preset first error threshold is twice a ULT tolerance.
[0013] Further, the second error threshold is set as follows: finding a
maximum transaction tolerance
in different transactions accessing same data, and setting the second error
threshold to be twice the
maximum transaction tolerance.
[0014] Further, the ULT tolerance is dynamically changed based on a system
status, and the method
includes:
[0015] when the coordination node performs synchronous time calibration with
the catalog node,
obtaining and recording a current ULT tolerance of a system; and
[0016] when the transaction obtains the TBT, storing the current ULT tolerance
of the system in a
metadata control block of the transaction.
[0017] Further, the method includes:
[0018] when the transaction modifies, deletes, or inserts data, using an ID of
the transaction as a
version to mark the data.
[0019] Further, the ID of the transaction is constituted by the TBT of the
transaction and an ID of the
coordination node participating in the transaction.
[0020] To resolve the same technical problems, the present disclosure further
provides a distributed
logical timestamp-based management system for a distributed transaction. A
distributed transaction
database includes a coordination node, a catalog node, and a data node, and an
LLT of each node is
synchronously calibrated based on a preset ULT.
[0021] The distributed logical timestamp-based management system for a
distributed transaction
includes:
[0022] a transaction time management module, configured to: when a transaction
begins, set a TBT
of the transaction as an LLT of the coordination node;
[0023] a transaction access management module, configured to: when the data
node first receives a
message sent by the coordination node, determine whether a difference between
an LLT of the data
node and the LLT of the coordination node exceeds a preset first error
threshold; and if the difference
between the LLT of the data node and the LLT of the coordination node exceeds
the preset first error
threshold, synchronously calibrate the LLT of the data node, control the data
node to return an error
message to the coordination node to enable the coordination node to roll back
the transaction, and retry
the transaction after synchronously calibrating the LLT of the coordination
node;
[0024] a transaction pre-committing management module, configured to: when the
transaction is pre-
committed, set a TPCT of the transaction as a current LLT of the coordination
node; determine whether
a difference between the TPCT and the TBT is greater than a transaction
tolerance of the transaction;
and if the difference between the TPCT and the TBT is not greater than the
transaction tolerance of the
transaction, suspend the current operation; or if the difference between the
TPCT and the TBT is greater
than the transaction tolerance of the transaction, send the TPCT and a
transaction pre-committing
message to all data nodes participating in the transaction; and
[0025] a distributed arbitration module, configured to: when two different
transactions access same
data, determine whether a difference between timestamps of the two different
transactions is less than
a preset second error threshold; and if the difference between the timestamps
of the two different
transactions is less than the preset second error threshold, select one data
node from target data nodes
as an arbitration node based on a preset algorithm to arbitrate a sequence of
the timestamps of the two
different transactions.
[0026] Further, the preset first error threshold is twice a ULT tolerance.
[0027] Further, the second error threshold is set as follows: finding a
maximum transaction tolerance
in different transactions accessing same data, and setting the second error
threshold to be twice the
maximum transaction tolerance.
[0028] Further, the ULT tolerance is dynamically changed based on a system
status, and the system
includes:
[0029] a ULT tolerance recording module, configured to: when the coordination
node performs
synchronous time calibration with the catalog node, obtain and record a
current ULT tolerance of the
system; and
[0030] a ULT tolerance update module, configured to: when the transaction
obtains the TBT, store
the current ULT tolerance of the system in a metadata control block of the
transaction.
[0031] According to the present disclosure, based on a fully distributed
logical clock mechanism,
there is no need to force all related nodes to obtain a unique ID from a GTM.
This can reduce network
overheads while satisfying requirements for distributed storage and
processing, thereby effectively
improving overall system performance.
BRIEF DESCRIPTION OF THE DRAWINGS
[0032] FIG. 1 is a schematic diagram of a system architecture of a distributed
database according to
an embodiment of the present disclosure;
[0033] FIG. 2 is a schematic diagram of performing synchronous time
calibration by various nodes
according to an embodiment of the present disclosure;
[0034] FIG. 3 is a schematic flowchart of a distributed logical
timestamp-based management method
for a distributed transaction according to an embodiment of the present
disclosure;
[0035] FIG. 4 is another schematic flowchart of a distributed logical
timestamp-based management
method for a distributed transaction according to an embodiment of the present
disclosure; and
[0036] FIG. 5 is a schematic structural diagram of a distributed logical
timestamp-based management
system for a distributed transaction according to an embodiment of the present
disclosure.
DETAILED DESCRIPTION
[0037] The following clearly and completely describes the technical solutions
in the embodiments of
the present disclosure with reference to the accompanying drawings in the
embodiments of the present
disclosure. Apparently, the described embodiments are merely some rather than
all of the embodiments
of the present disclosure. All other embodiments obtained by a person of
ordinary skill in the art based
on the embodiments of the present disclosure without creative efforts shall
fall within the protection
scope of the present disclosure.
[0038] Refer to FIG. 1. An embodiment of the present disclosure provides a
distributed logical
timestamp-based management method for a distributed transaction. A distributed
transaction database
includes a coordination node, a catalog node, and a data node, and an LLT of
each node is
synchronously calibrated based on a preset ULT.
[0039] In a system shown in FIG. 1, a distributed database (the distributed
transaction database) is
used as an example. An architecture in which storage and computing are
separated is used for the
distributed database. There are three different types of nodes in the system,
and the nodes can be
horizontally expanded. A coordination node is responsible for delivering a
request to a desired data
node. A data node is responsible for data access and storage. A catalog node
stores system metadata
and partition-related information. To ensure high reliability of the system,
each master data node has
a plurality of slave data nodes, and the master data node synchronizes data to
the slave nodes based on
a synchronization log. Similarly, there are also master and slave catalog
nodes. The coordination node
only performs intermediate computing and processing, and does not keep a
status. Therefore, there is
no slave coordination node.
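To make the node roles above concrete, the following is a minimal sketch of the topology described in this paragraph; the class and instance names are illustrative assumptions, not identifiers used by the disclosure.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ReplicaGroup:
    # One master node replicates to its slave nodes through a synchronization log.
    master: str
    slaves: List[str] = field(default_factory=list)

@dataclass
class ClusterTopology:
    coordination_nodes: List[str]    # stateless, so no slave coordination nodes
    catalog: ReplicaGroup            # system metadata and partition information
    data_groups: List[ReplicaGroup]  # data access and storage, horizontally expandable

topology = ClusterTopology(
    coordination_nodes=["coord-1", "coord-2"],
    catalog=ReplicaGroup(master="catalog-m", slaves=["catalog-s1"]),
    data_groups=[
        ReplicaGroup(master="data-1m", slaves=["data-1s1", "data-1s2"]),
        ReplicaGroup(master="data-2m", slaves=["data-2s1", "data-2s2"]),
    ],
)
```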
[0040] For brief description, some terms in this specification are described
by abbreviations. It should
be noted that an abbreviation corresponding to a term has a same meaning as
the term. The terms are
explained as follows:
[0041] LLT: local logical timestamp. It indicates an LLT maintained by each
node (minimum unit:
microsecond).
[0042] ULT: universal logical timestamp (minimum unit: microsecond).
[0043] LRT: local real timestamp. It indicates local coordinated universal
time (UTC) of a node.
[0044] ULT tolerance: universal logical timestamp tolerance. It can be
dynamically adjusted based
on a system status. An initial default value may be set to 1 ms.
[0045] ULT SyncInterval: universal logical timestamp synchronization interval.
It indicates an
interval at which ULTs are synchronized.
[0046] TBT: transaction begin timestamp. It indicates time at which a
transaction begins.
[0047] TPCT: transaction pre-commit timestamp. It indicates time at which a
transaction is pre-
committed.
[0048] TCT: transaction commit timestamp. It indicates time at which a
transaction is committed.
[0049] Refer to FIG. 2. A mechanism of calculating and synchronizing a logical
timestamp in the
present disclosure is as follows:
[0050] ULT allocation may be performed by an independent node. However, the
catalog node in the
system is lightly loaded. Therefore, in a preferred solution, a function of
generating and maintaining
the ULT can be integrated into the catalog node. When the entire system is
started, the ULT may be set
as an LRT of the catalog node. After that, the ULT is generated by a master
catalog node through
accumulation based on a difference between central processing unit (CPU)
ticks. In this way, the ULT
approximates real time, but does not depend on precise machine time. On the
catalog node, the ULT
increases progressively, with a minimum accuracy of microseconds (us). The
master catalog node
writes the ULT to disk every ULT SyncInterval for persistence, and copies
the ULT to a slave catalog
node based on a log. Each time the master catalog node is restarted or
switched, a new master catalog
node compares (ULT + ULT SyncInterval) in the log and a current LRT, and uses
the larger one as a
new ULT.
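The ULT maintenance just described (accumulation from CPU ticks, periodic persistence, and the max(ULT + SyncInterval, LRT) rule on restart) can be sketched as follows; the class and method names, and the in-memory stand-in for disk persistence, are illustrative assumptions.

```python
import time

class CatalogULT:
    """Sketch of universal logical timestamp (ULT) maintenance on the master catalog node."""

    def __init__(self, sync_interval_us: int = 60_000_000):
        self.sync_interval_us = sync_interval_us   # ULT SyncInterval (60 s by default)
        self.ult_us = self._local_real_time_us()   # on first start, ULT = LRT of the catalog node
        self._last_persisted_us = 0

    @staticmethod
    def _local_real_time_us() -> int:
        return time.time_ns() // 1_000             # LRT in microseconds

    def advance(self, cpu_tick_delta_us: int) -> int:
        # The ULT only moves forward, accumulated from CPU tick differences, so it
        # approximates real time without depending on precise machine time.
        self.ult_us += max(0, cpu_tick_delta_us)
        if self.ult_us - self._last_persisted_us >= self.sync_interval_us:
            self._persist(self.ult_us)             # also shipped to slave catalog nodes via the log
        return self.ult_us

    def _persist(self, value_us: int) -> None:
        self._last_persisted_us = value_us         # stand-in for the disk write

    def recover_after_restart(self, persisted_ult_us: int) -> int:
        # A new master catalog node takes max(persisted ULT + SyncInterval, current LRT),
        # so the ULT never moves backwards across a restart or switchover.
        self.ult_us = max(persisted_ult_us + self.sync_interval_us,
                          self._local_real_time_us())
        return self.ult_us
```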
[0051] When being started, each coordination node and data node obtain the ULT
from the catalog
node as their own LLTs. A slave data node directly performs synchronization
with a master data node
to obtain and set an LLT of the slave data node. In a running process of the
system, each coordination
node and data node regularly perform synchronous time calibration with the
catalog node based on a
Network Time Protocol (NTP) algorithm, to adjust their own LLTs. The ULT
SyncInterval is
configurable, and may be set to 60 seconds by default. Specific
synchronization steps are as follows:
[0052] An NTP client sends a delay calculation request to an NTP server (a
node that needs to perform
time synchronization is defined as the NTP client, and a time synchronization
source (for example, the
master catalog node) is defined as the NTP server).
[0053] A time delay can be calculated based on one request and a corresponding
response.
[0054] t1 is an LLT at which the NTP client sends the request.
[0055] t2 is an LLT at which the NTP server receives the request.
[0056] t3 is an LLT at which the NTP server sends the response.
[0057] t4 is an LLT at which the NTP client receives the response.
[0058] A network delay is equal to (t2 - t1) + (t4 - t3).
[0059] The time delay is equal to [(t2 - t1) + (t3 - t4)]/2.
[0060] A ULT tolerance (1 ms by default) is set as follows:
[0061] If the time delay is greater than 0, an updated client LLT is set to the
original client LLT + the
time delay.
[0062] If the time delay is less than 0, the client is forcibly suspended for
the time delay.
[0063] If the network delay is greater than the ULT tolerance,
[0064] time synchronization is re-initiated.
[0065] If the network delay is greater than the ULT tolerance five times,
the ULT tolerance is set
to the ULT tolerance multiplied by a change factor.
[0066] If there is no additional time calibration request within a period of
time, the ULT tolerance is
set to the ULT tolerance divided by the change factor. The change factor may
be set to 1.2.
[0067] After each coordination node/data node performs time calibration with
the catalog node by the
foregoing mechanism, it can be ensured that an LLT error between various
coordination nodes/data
nodes does not exceed 2*ULT tolerance.
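The calibration rules in paragraphs [0052] to [0066] can be summarized with the following sketch; the function names, the return convention, and the bookkeeping of noisy rounds are illustrative assumptions, while the formulas and thresholds follow the text above.

```python
def calibrate_llt(client_llt_us: int, t1: int, t2: int, t3: int, t4: int,
                  ult_tolerance_us: int):
    """One calibration round between an NTP client (coordination/data node) and the
    NTP server (master catalog node). Returns (new_llt_us, suspend_us, retry)."""
    network_delay = (t2 - t1) + (t4 - t3)        # round-trip network cost
    time_delay = ((t2 - t1) + (t3 - t4)) // 2    # estimated offset of the client LLT

    if network_delay > ult_tolerance_us:
        return client_llt_us, 0, True            # too noisy: re-initiate time synchronization

    if time_delay > 0:
        return client_llt_us + time_delay, 0, False   # jump the client LLT forward
    # Negative offset: never move the LLT backwards; suspend the client instead.
    return client_llt_us, -time_delay, False


def adjust_ult_tolerance(ult_tolerance_us: int, noisy_rounds: int, idle: bool,
                         change_factor: float = 1.2) -> int:
    # Widen the tolerance after the network delay exceeds it five times; shrink it
    # again when no additional calibration requests arrive for a period of time.
    if noisy_rounds >= 5:
        return int(ult_tolerance_us * change_factor)
    if idle:
        return int(ult_tolerance_us / change_factor)
    return ult_tolerance_us
```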
[0068] In this embodiment of the present disclosure, logical timestamps are
used to determine a
sequence of distributed transactions and data visibility. In the distributed
database system, each
transaction selects one coordination node to access the database. The selected
coordination node may
be specified by an application, may be any coordination node, or may be
allocated through load balance.
In addition, a quantity of coordination nodes may be dynamically adjusted
based on system load. A
two-phase commit mechanism (XA) is used to maintain consistency of a
distributed transaction on
various nodes. The following will describe an entire transaction access
process based on the
mechanism.
[0069] As shown in FIG. 3 and FIG. 4, the distributed logical timestamp-based
management method
for a distributed transaction includes the following steps.
[0070] Step S1: When a transaction begins, set a TBT of the transaction as an
LLT of the coordination
node. Further, an ID of the transaction is constituted by the TBT of the
transaction and an ID of the
coordination node participating in the transaction.
[0071] Further, the ULT tolerance of the system is dynamically changed based
on a system status.
When the coordination node performs synchronous time calibration with the
catalog node, a current
ULT tolerance of the system is obtained and recorded.
[0072] When the transaction obtains the TBT, the current ULT tolerance of the
system is stored in a
metadata control block of the transaction.
[0073] In this embodiment of the present disclosure, when the transaction
begins, an LLT is obtained
from the coordination node, and is set as the TBT. The corresponding
transaction ID (TID) may be
constituted by the TBT and the ID of the coordination node. If 50 bits are
used to represent the TBT
accurate to microseconds and 14 bits are used to identify the ID of the
coordination node, the system
can support use of 16K nodes for 35 years, and the 64-bit ID will not be
repeated. To support a larger
system cluster or longer time, more bits can be used to identify the TBT and
the node ID. The 64-bit
ID is used as an example herein.
TBTime (Transaction Begin Time): 50 bits
NodeID: 14 bits
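A small sketch of packing and unpacking the 64-bit transaction ID shown above (50 bits of TBT plus 14 bits of coordination node ID); the helper names and the example value are assumptions. Note that 2^50 microseconds is roughly 35.7 years and 2^14 is 16,384 node IDs, matching the capacity stated in this paragraph.

```python
TBT_BITS = 50    # microseconds: 2**50 us is roughly 35.7 years
NODE_BITS = 14   # 2**14 = 16,384 coordination node IDs

def make_tid(tbt_us: int, coord_node_id: int) -> int:
    """Pack the transaction begin timestamp and coordination node ID into a 64-bit TID."""
    assert 0 <= tbt_us < (1 << TBT_BITS)
    assert 0 <= coord_node_id < (1 << NODE_BITS)
    return (tbt_us << NODE_BITS) | coord_node_id

def split_tid(tid: int):
    # Inverse of make_tid: recover (TBT, coordination node ID).
    return tid >> NODE_BITS, tid & ((1 << NODE_BITS) - 1)

# Example: a transaction begun at LLT 1_000_000_000_000 us on coordination node 7.
tid = make_tid(1_000_000_000_000, 7)
assert split_tid(tid) == (1_000_000_000_000, 7)
```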
[0074] The ULT tolerance of the system is dynamically changed based on the
system status
(including the network delay). Therefore, a ULT tolerance of each transaction
when the transaction is
initiated may be different. Therefore, when the coordination node performs
time calibration with the
catalog node, the system obtains the current ULT tolerance of the system,
delivers the ULT tolerance
when the transaction obtains the TBT, and stores the ULT tolerance in the metadata
control block related to the
transaction.
[0075] Step S2: When the data node first receives a message sent by the
coordination node, determine
whether a difference between an LLT of the data node and the LLT of the
coordination node exceeds
a preset first error threshold; and if the difference between the LLT of the
data node and the LLT of the
coordination node exceeds the preset first error threshold, synchronously
calibrate the LLT of the data
node, control the data node to return an error message to the coordination
node to enable the
coordination node to roll back the transaction, and retry the transaction
after synchronously calibrating
the LLT of the coordination node. Further, the preset first error threshold is
twice the ULT tolerance of
the system. Further, when the transaction modifies, deletes, or inserts data,
the ID of the transaction is
used as a version to mark the data.
[0076] It should be noted that the first message sent by the coordination node
to the data node
includes the TBT and the LLT of the coordination node. After receiving the
message, the data node
compares its own LLT and the LLT of the coordination node in the message. If
the difference between
the two LLTs does not exceed 2*ULT tolerance, the data node performs an
operation normally. If the
difference between the two LLTs exceeds 2*ULT tolerance, the data node
performs time calibration
with the catalog node, returns an error, requests the coordination node to
first roll back the transaction,
and then retries the transaction after performing time calibration with the
catalog node. When the
transaction modifies/deletes/inserts the data, the ID of the transaction is
used as the version to mark
the data.
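The data node's check on the first message of a transaction, as described in this paragraph, amounts to the following sketch; the function name and the string labels returned are illustrative.

```python
def on_first_message(data_llt_us: int, coord_llt_us: int, ult_tolerance_us: int) -> str:
    # The first message carries the TBT and the coordination node's LLT.
    if abs(data_llt_us - coord_llt_us) <= 2 * ult_tolerance_us:
        return "proceed"                           # LLTs close enough: operate normally
    # Otherwise the data node recalibrates with the catalog node and returns an error,
    # asking the coordination node to roll back, recalibrate, and retry the transaction.
    return "calibrate_and_request_rollback"
```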
[0077] Step S3: When the transaction is pre-committed, set a TPCT of the
transaction as a current
LLT of the coordination node; determine whether a difference between the TPCT
and the TBT is
greater than a transaction tolerance of the transaction; and if the difference
between the TPCT and the
TBT is not greater than the transaction tolerance of the transaction, suspend
the current operation; or
if the difference between the TPCT and the TBT is greater than the transaction
tolerance of the
transaction, send the TPCT and a transaction pre-committing message to all
data nodes participating
in the transaction.
[0078] In this embodiment of the present disclosure, when the transaction is
pre-committed, the
current LLT of the coordination node is used as the TPCT of the transaction,
and the current LLT of
the coordination node, along with the pre-committing message, is sent to all
the data nodes
participating in the transaction. The system requires that TPCT > TBT + ULT
tolerance. Otherwise,
the current operation is suspended till the condition is satisfied. A sequence
of different transactions
on the data node is determined based on TIDs of the transactions. A typical
application scenario is
determining of data visibility in a multi-version data scenario.
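The pre-commit requirement TPCT > TBT + ULT tolerance can be sketched as a simple gate; the busy-wait, the callback parameter, and the function name are illustrative simplifications of how the coordination node would suspend the operation.

```python
import time

def wait_for_precommit(get_coordinator_llt_us, tbt_us: int, ult_tolerance_us: int) -> int:
    """Return a TPCT only once TPCT > TBT + ULT tolerance, suspending until then."""
    while True:
        tpct_us = get_coordinator_llt_us()
        if tpct_us - tbt_us > ult_tolerance_us:
            return tpct_us   # then send the TPCT and pre-commit message to all participants
        # Suspend the current operation until the condition can be satisfied.
        time.sleep(max(0, tbt_us + ult_tolerance_us - tpct_us) / 1_000_000)
```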
[0079] Step S4: When two different transactions access same data, determine
whether a difference
between timestamps of the two different transactions is less than a preset
second error threshold; and
if the difference between the timestamps of the two different transactions is
less than the preset second
error threshold, select one data node from target data nodes as an arbitration
node based on a preset
algorithm to arbitrate a sequence of the timestamps of the two different
transactions. Further, the
second error threshold is set as follows: finding a maximum transaction
tolerance in different
transactions accessing same data, and setting the second error threshold to be
twice the maximum
transaction tolerance.
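Step S4's first test can be sketched as below. Following claim 3, the second error threshold is read here as twice the larger of the two transactions' recorded tolerances; the function and parameter names are assumptions.

```python
def needs_arbitration(ts_a_us: int, ts_b_us: int,
                      tolerance_a_us: int, tolerance_b_us: int) -> bool:
    # Second error threshold: twice the maximum tolerance among the transactions
    # accessing the same data (per claim 3).
    threshold = 2 * max(tolerance_a_us, tolerance_b_us)
    return abs(ts_a_us - ts_b_us) < threshold
```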
[0080] It should be noted that, in a distributed scenario, transactions are
initiated from a plurality of
different coordination nodes, and obtain their own timestamps separately.
Therefore, in a high-
concurrence scenario, these timestamps may be very approximate. In addition,
the coordination nodes
initiating the transactions perform time calibration with the catalog node at
different time. Therefore,
ULT tolerances obtained by the transactions may be different. The system needs
to record a
corresponding ULT tolerance when each transaction begins. It is assumed that
two transactions obtain
a ULT tolerance A and a ULT tolerance B respectively. When a difference
between timestamps of the
two transactions is less than 2*max(ULT tolerance A, ULT tolerance B), it is
regarded that a sequence
of the timestamps of the two transactions is within an error range, and
therefore, determining cannot
be performed, and arbitration needs to be performed. It should be noted that
arbitration is possibly
needed only when different transactions need to access same data.
[0081] An algorithm (for example, a hash algorithm) may be used to select one
data node from data
nodes participating in each of the two transactions as an arbitration node to
determine the sequence of
the two transactions. For example, IDs of nodes participating in a transaction
A are (1, 2, 3, 4), and IDs
of nodes participating in a transaction B are (3, 4, 5, 6). In this case, the
arbitration node is Func(3, 4).
In this way, the same node performs arbitration for subsequent comparison
between the transaction A
and the transaction B. A result of the first arbitration is used to determine
a sequence of the two
timestamps, and the result is reserved for subsequent queries. The arbitration
result may be cleared
after both the transaction A and the transaction B end or when the oldest
transaction of the system is
generated after max(TBT A, TBT B). The arbitration algorithm is not limited,
and may be selected
randomly or by comparing node IDs or priorities of transaction types.
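The Func(3, 4) selection above can be any deterministic rule over the data nodes shared by the two transactions; the sketch below uses the minimum shared node ID purely as an illustrative choice.

```python
def pick_arbitration_node(nodes_a: set, nodes_b: set) -> int:
    """Select one data node common to both transactions as the arbitration node."""
    common = nodes_a & nodes_b
    if not common:
        raise ValueError("the transactions share no data node, so no arbitration is needed")
    # Any rule works as long as every participant computes the same node deterministically.
    return min(common)

# Example from the description: transaction A on nodes {1, 2, 3, 4} and
# transaction B on nodes {3, 4, 5, 6} are arbitrated by Func(3, 4) -> node 3 here.
assert pick_arbitration_node({1, 2, 3, 4}, {3, 4, 5, 6}) == 3
```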
[0082] The arbitration result needs to be synchronized to a slave node of the
arbitration node in a
specified manner, to ensure that when a single-node fault occurs, the slave
node can still serve as the
arbitration node after becoming a master node, and return the same arbitration
result. For the
synchronization manner, refer to an existing master/slave synchronization
manner.
[0083] Arbitration is performed at a low probability, and is performed only
when two timestamps are
very approximate and common data is used. Arbitration is performed based on
nodes participating in
a transaction, and the arbitrating transactions and nodes are therefore fully distributed, without a single-point failure or a performance bottleneck. Therefore, arbitration does not affect overall system performance.
[0084] It should be noted that, in different scenarios, the same mechanism may
be used to perform
arbitration as required to satisfy system requirements. In the foregoing
example, different timestamps
are compared to determine a sequence of starting or committing corresponding
transactions, or a TBT
of one transaction and a TCT of another transaction are compared to determine
data visibility. In actual
application, key values of other types may be compared in a distributed manner
to determine the
sequence.
[0085] It should be noted that, for the sake of simplicity, the foregoing
method or process
embodiments are described as a series of action combinations, but a person
skilled in the art will
recognize that the embodiments of the present disclosure are not limited by
the sequence of actions
described, since certain steps may be performed in another order or at the same time
according to the
embodiments of the present disclosure. In addition, it should be understood by
a person skilled in the
art that the embodiments described in this specification are preferred
embodiments and the related
actions are not necessarily required for the embodiments of the present
disclosure.
[0086] It may be understood that the present disclosure provides one logical
timestamp maintained
in the entire system, and the logical timestamp is used in the distributed
database to determine a
sequence of starting a transaction and data visibility. When timestamps of
transactions are approximate
and a sequence of the timestamps cannot be directly determined, the system
performs arbitration by a
distributed arbitration mechanism. In this way, a management mechanism without
a single bottleneck
is implemented for distributed transactions in the distributed database.
[0087] Compared with the prior art, the embodiments of the present disclosure
have the following
beneficial effects.
[0088] Compared with implementation of a universal timestamp by hardware,
implementation by the
system has advantages of low costs, easy implementation, and easy
popularization. Compared with
implementation of a distributed logical timestamp by other software,
implementation by the system
has the following advantages.
[0089] 1. The logical timestamp is generated by a node participating in a
transaction instead of by a
globally unified timestamp management node. This reduces corresponding network
interactions,
avoids a single-node performance bottleneck and a single-point failure, and
can provide high-
frequency and high-concurrency transaction services.
[0090] 2. A universal fault tolerance parameter is dynamically adaptive, and
can adapt to different
distributed environments and scenarios.
[0091] 3. In an entire transaction process, a parameter indicating waiting
time triggered by fault
tolerance is only needed for time calibration performed by the coordination
node and the data node
when the transaction begins, and an interval between beginning of the
transaction and committing of
the transaction is required. Therefore, waiting triggered by fault tolerance
due to concurrent
transactions does not occur. The time calibration has a very small occurrence probability, and is
performed only when the global system environment changes significantly (for example, when a
network or machine fault occurs). The interval does not affect an overall system
throughput in normal transaction
logic, and only has a certain delay requirement (in milliseconds), so that
demands for most applications
can be satisfied.
[0092] 4. Different timestamp lengths and node ID lengths can be configured to easily support
clusters of different scales.
[0093] The distributed arbitration mechanism is introduced to eliminate
blocking or waiting caused
when timestamps of a plurality of transactions are approximate in an existing
software-based solution
(for details, refer to Google Spanner). The arbitration mechanism selects an
arbitrator based on nodes
participating in a transaction, and is free from a single-point performance
bottleneck and a single-point
failure.
[0094] Refer to FIG. 5. To resolve the same technical problems, the present
disclosure further
provides a distributed logical timestamp-based management system for a
distributed transaction. A
distributed transaction database includes a coordination node, a catalog node,
and a data node, and an
LLT of each node is synchronously calibrated based on a preset ULT.
[0095] The distributed logical timestamp-based management system for a
distributed transaction
includes:
[0096] a transaction time management module 1, configured to: when a
transaction begins, set a TBT
of the transaction as an LLT of the coordination node;
[0097] a transaction access management module 2, configured to: when the data
node first receives
a message sent by the coordination node, determine whether a difference
between an LLT of the data
node and the LLT of the coordination node exceeds a preset first error
threshold; and if the difference
between the LLT of the data node and the LLT of the coordination node exceeds
the preset first error
threshold, synchronously calibrate the LLT of the data node, control the data
node to return an error
message to the coordination node to enable the coordination node to roll back
the transaction, and retry
the transaction after synchronously calibrating the LLT of the coordination
node;
[0098] a transaction pre-committing management module 3, configured to: when
the transaction is
pre-committed, set a TPCT of the transaction as a current LLT of the
coordination node; determine
whether a difference between the TPCT and the TBT is greater than a
transaction tolerance of the
transaction; and if the difference between the TPCT and the TBT is not greater
than the transaction
tolerance of the transaction, suspend the current operation; or if the
difference between the TPCT and
the TBT is greater than the transaction tolerance of the transaction, send the
TPCT and a transaction
pre-committing message to all data nodes participating in the transaction; and
[0099] a distributed arbitration module 4, configured to: when two different
transactions access same
data, determine whether a difference between timestamps of the two different
transactions is less than
a preset second error threshold; and if the difference between the timestamps
of the two different
transactions is less than the preset second error threshold, select one data
node from target data nodes
as an arbitration node based on a preset algorithm to arbitrate a sequence of
the timestamps of the two
different transactions.
[0100] Further, the preset first error threshold is twice a ULT tolerance.
[0101] Further, the second error threshold is set as follows: finding a
maximum transaction tolerance
in different transactions accessing same data, and setting the second error
threshold to be twice the
maximum transaction tolerance.
[0102] Further, the ULT tolerance is dynamically changed based on a system
status, and the system
includes:
[0103] a ULT tolerance recording module, configured to: when the coordination
node performs
synchronous time calibration with the catalog node, obtain and record a
current ULT tolerance of the
system; and
[0104] a ULT tolerance update module, configured to: when the transaction
obtains the TBT, store
the current ULT tolerance of the system in a metadata control block of the
transaction.
[0105] It may be understood that the foregoing system embodiment corresponds
to the method
embodiment of the present disclosure. The distributed logical timestamp-based
management system
for a distributed transaction in the embodiments of the present disclosure can
implement the distributed
logical timestamp-based management method for a distributed transaction in any
one of the method
embodiments of the present disclosure.
[0106] The descriptions above are preferred implementations of the present
disclosure, and it should
be noted that for a person of ordinary skill in the art, various improvements
and modifications can be
made without departing from the principles of the present disclosure. These
improvements and
modifications should also be regarded as falling into the protection scope of
the present disclosure.

Representative Drawing
A single figure which represents the drawing illustrating the invention.
Administrative Status

For a clearer understanding of the status of the application/patent presented on this page, the site Disclaimer, as well as the definitions for Patent, Administrative Status, Maintenance Fee and Payment History, should be consulted.

Title Date
Forecasted Issue Date Unavailable
(86) PCT Filing Date 2020-09-11
(87) PCT Publication Date 2021-04-29
(85) National Entry 2021-06-29
Examination Requested 2021-06-29
Dead Application 2024-03-21

Abandonment History

Abandonment Date Reason Reinstatement Date
2024-03-11 FAILURE TO PAY APPLICATION MAINTENANCE FEE

Payment History

Fee Type Anniversary Year Due Date Amount Paid Paid Date
Application Fee 2021-06-29 $408.00 2021-06-29
Request for Examination 2024-09-11 $816.00 2021-06-29
Maintenance Fee - Application - New Act 2 2022-09-12 $100.00 2022-08-19
Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
SEQUOIADB CORPORATION
Past Owners on Record
None
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents


List of published and non-published patent-specific documents on the CPD.



Document Description | Date (yyyy-mm-dd) | Number of pages | Size of Image (KB)
Abstract | 2021-06-29 | 1 | 23
Claims | 2021-06-29 | 3 | 162
Drawings | 2021-06-29 | 4 | 393
Description | 2021-06-29 | 13 | 763
Representative Drawing | 2021-06-29 | 1 | 34
Patent Cooperation Treaty (PCT) | 2021-06-29 | 2 | 123
International Search Report | 2021-06-29 | 4 | 144
Amendment - Abstract | 2021-06-29 | 2 | 111
National Entry Request | 2021-06-29 | 7 | 238
Representative Drawing | 2021-09-15 | 1 | 40
Cover Page | 2021-09-15 | 1 | 69