Note: Descriptions are presented in the official language in which they were submitted.
CA 02459123 2004-02-26
METHOD AND SYSTEM FOR DETECTING POTENTIAL
DEADLOCKS IN COMPUTER PROGRAMS
FIELD OF THE INVENTION
[0001] The present invention relates to computer programming and, in particular,
to a method and system for the detection of potential deadlocks in computer
programs.
BACKGROUND OF THE INVENTION
[0002] Multithreaded processing of computer programs has become increasingly
common as more popular operating systems provide support for multithreaded
processing. A thread is a portion of a computer program that may be logically
executed in parallel with another portion of the program. Multithreading
allows
for the effective utilization of processor resources when executing a computer
program by having a separate process utilize clock cycles that would otherwise
go unused by a concurrently executing process. Operating systems that support
multithreaded processing typically include a scheduler for coordinating the
processing of multiple threads.
[0003] A computer system executing a multithreaded computer program often
includes shared resources, such as program objects, to which more than one
thread will require access. To address this conflict, many shared resources are
capable of being locked by a thread, preventing other threads from accessing the
resource until it is unlocked. Such a shared resource is sometimes referred to as
a mutual exclusion object ("mutex").
[0004] A problem that arises with multithreaded computer programs is that two or
more threads may include a certain sequence of locks and unlocks upon shared
resources that, if executed in a particular sequence relative to each other, could
result in a deadlock. The scheduler typically ensures only the order of events
within each thread, but not the timing of particular events as between
threads.
Accordingly, the timing of the processing of two threads relative to each
other
can result in "freeze" or "lock up" behavior, which may only be exhibited
intermittently.
[0005] Reference is made to Figure 1, which shows a diagram illustrating a
potential deadlock situation involving two threads of a computer program. A first
thread 120 and a second thread 140 both require access to a first resource 100
and a second resource 110. The sequence of steps in the first thread 120
includes a first step 122 of locking the second resource 110 and a second step
124 of locking the first resource 100. The second thread 140 includes steps in
the opposite sequence: a first step 142 locks the first resource 100 and a second
step 144 locks the second resource 110. Both threads have subsequent steps
for performing some processing action 126, 146, and unlocking the shared
resources 128, 130, 148, and 150.
[0006] It will be appreciated that the threads 120, 140 may be executed in many
cases without encountering any problems; however, if the threads 120, 140 are
executed such that the first steps 122, 142 of the two threads 120, 140 are
executed one after the other, a deadlock situation will result. For example, if the
first thread 120 locks the second resource 110 and then the second thread 140
locks the first resource 100, neither thread 120, 140 will be capable of advancing
to its second step 124, 144, since neither can ever gain access to the resource
held by the other.
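By way of illustration only, the interleaving described above can be replayed deterministically. The following Python sketch is not part of the original disclosure: the names resource_100 and resource_110 merely mirror the reference numerals of Figure 1, and non-blocking lock requests are an assumption made so that the sketch reports the stalled state instead of actually deadlocking.

```python
import threading

# Hypothetical stand-ins for the first resource 100 and the second resource 110.
resource_100 = threading.Lock()
resource_110 = threading.Lock()

# Step 122: the first thread 120 locks the second resource 110.
resource_110.acquire()
# Step 142: the second thread 140 locks the first resource 100.
resource_100.acquire()

# Steps 124 and 144: each thread now requests the resource held by the other.
# A blocking request would wait forever; blocking=False only reports the outcome.
first_thread_advances = resource_100.acquire(blocking=False)
second_thread_advances = resource_110.acquire(blocking=False)

print(first_thread_advances, second_thread_advances)
```

Neither request succeeds, which corresponds to both threads being stalled before their second steps.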
[0007] The conventional tool for identifying deadlocks in computer programs is a
graphical analysis of the source code to generate a "resource dependency
graph". If the graph is cyclical, a potential deadlock is identified. This technique
is usually performed by hand, and is therefore time-consuming and complicated.
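The cycle test on a resource dependency graph may be sketched as follows. This is a minimal Python illustration under stated assumptions, not the conventional tool itself: the graph is assumed to be a dictionary in which an edge from X to Y means that some thread requests Y while holding X, and the function name has_cycle is hypothetical.

```python
def has_cycle(graph):
    """Return True if the directed graph (node -> iterable of successors) contains a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    colour = {}

    def visit(node):
        colour[node] = GREY
        for nxt in graph.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:  # reached a node already on the current path
                return True
            if state == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in list(graph))


# Figure 1: thread 120 holds 110 and requests 100; thread 140 holds 100 and requests 110.
figure_1 = {"resource_110": ["resource_100"], "resource_100": ["resource_110"]}
# A program in which every thread locks 100 before 110.
consistent_order = {"resource_100": ["resource_110"]}

print(has_cycle(figure_1), has_cycle(consistent_order))
```

The graph for Figure 1 is cyclical, whereas a program that always locks the resources in the same order yields an acyclical graph.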
[0008] Potential deadlocks are difficult to identify during run-time execution
of the
computer program since the errors only arise intermittently. Accordingly, a
brute
force work-around that is commonly employed is to use a timeout value on the
shared resource lock. This solution fails to actually identify and fix the
deadlock
problem.
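The timeout work-around can be illustrated with a short Python sketch, offered only as an assumption-laden example: it uses the timeout parameter of threading.Lock.acquire, and the 0.05 second value is arbitrary.

```python
import threading

mutex = threading.Lock()
mutex.acquire()  # some other party already holds the shared resource

# The lock request gives up after the timeout instead of blocking forever.
acquired = mutex.acquire(timeout=0.05)
if acquired:
    outcome = "locked"
    mutex.release()
else:
    # The caller learns only that it waited too long, not which lock
    # sequences on which threads produced the stall.
    outcome = "timed out"

print(outcome)
```

The request returns without the lock, so the program does not freeze, but nothing in the result identifies the conflicting lock sequences.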
[0009] A similar problem of deadlocking may be encountered when a wait-trigger
event occurs relative to a lock event, wherein one thread has locked a resource
and is stalled at a wait function awaiting a trigger event while a second thread
needs access to the resource before the trigger event can occur, thereby
freezing operation of both threads.
[0010] An automated method of detecting potential deadlocks in a computer
program that addresses, at least in part, these shortcomings would be
advantageous.
SUMMARY OF THE INVENTION
[0011] The present invention provides a method of detecting a potential deadlock
during execution of a computer program that identifies the occurrence of
complementary inverse lock sequences on different threads. The method
records sequenced requests for access to resources and assesses whether
previous sequences of requests reveal a potential deadlock situation.
[0012] In one aspect, the present invention provides a method of detecting
potential deadlocks in a multithreaded computer program during execution of
the
computer program on a computer system, the computer system having a
plurality of resources and the computer program including a plurality of
requests
for access to one or more of the resources. The method includes the steps of
receiving one of the requests for access to a selected resource, recording a
list
of previously requested and unreleased resources in a data element associated
with the selected resource, reading data elements associated with the
previously
requested and unreleased resources, and generating a deadlock indicator if the
selected resource appears in one of the data elements associated with the
previously requested and unreleased resources.
[0013] In another aspect, the present invention provides a method of detecting
potential deadlocks in a multithreaded computer program during execution of
the
computer program on a computer system, the computer system having a
plurality of shared resources. The method includes the steps of recording a
request sequence evidencing a first sequence of two or more of the resources
concurrently requested by a first thread of the computer program, identifying
a
second sequence of at least two resources concurrently requested by a second
thread of the computer program, and determining whether the at least two
resources in the second sequence are included, in inverse order, in the first
sequence and, if so, generating a potential deadlock indicator.
[0014] In another aspect, the present invention provides a method of detecting
potential deadlocks in a multithreaded computer program during execution of
the
computer program on a computer system, the computer system having a
plurality of resources. The method includes the steps of receiving a request
for
access to a selected resource, receiving a request to wait for an event,
determining whether the selected resource has been released, and generating a
potential deadlock indicator if the selected resource has not been released.
[0015] In a further aspect, the present invention provides a computer system for
implementing any of the above-described methods. In yet a further aspect, the
present invention provides a computer software product having a computer-
readable medium tangibly embodying computer executable instructions for
implementing any of the above-described methods.
[0016] Other aspects and features of the present invention will be apparent to
those of ordinary skill in the art from a review of the following detailed
description
when considered in conjunction with the drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0017] Reference will now be made, by way of example, to the accompanying
drawings which show an embodiment of the present invention, and in which:
[0018] Figure 1 shows a diagram illustrating an example of a potential
deadlock
situation involving two threads of a software program;
[0019] Figure 2 shows a flowchart depicting a method for detecting potential
deadlocks according to the present invention;
[0020] Figures 3a to 3f show, in block diagram form, an illustration by way of
an
example of the steps taken by an embodiment of the present invention; and
[0021]Figure 4 shows a flowchart depicting another method for detecting
potential deadlocks according to the present invention.
[0022]Similar reference numerals are used in different figures to denote
similar
components.
DESCRIPTION OF SPECIFIC EMBODIMENTS
[0023]The following description of one or more specific embodiments of the
invention does not limit the implementation of the invention to any particular
computer language or computer operating system. Any limitations presented
that result from a particular computer language or a particular operating
system
are not intended as limitations of the present invention.
[0024] Reference is first made to Figure 2, which shows a flowchart of a
method
200 for detecting potential deadlocks according to the present invention.
[0025] If the author of an implementation of the method 200 has editorial
control
of the operating system, such as in the case of a system based upon open
source code, existing functions for locking shared resources may be modified
so
as to incorporate additional steps to customize the function without renaming
the
function. For example, in a Java environment, the Java synchronized statement is
typically employed to lock a shared resource. If the developer has access to the
code for a Java Virtual Machine (JVM), the method 200 may be implemented
within the JVM to modify the synchronized statement so as to perform the steps of
the method 200 in addition to the steps for locking the shared resource that
would normally be taken. This provides the implementation with transparency,
allowing subsequent developers to incorporate the method 200 without requiring
knowledge of a custom function.
[0026] In other computer environments, where the author does not have control
of the existing shared resource functions, it may not be possible to customize
the
existing functions. In these cases, the method 200 may be implemented through
"wrapping" the existing lock function with a new deadlock detection lock
function
that implements the method 200 and calls the existing predefined lock
function.
An example of such an environment is one based upon use of the Microsoft
Windows™ operating system by developers other than the Microsoft Corporation
itself.
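The "wrapping" approach may be sketched as follows. This Python example is illustrative only and assumes names that do not appear in the description: deadlock_detection_lock is the hypothetical new lock function, check_for_potential_deadlock is a placeholder for the steps of the method 200, and threading.Lock.acquire stands in for the existing predefined lock function.

```python
import threading

requests_seen = []  # hypothetical record of intercepted lock requests


def check_for_potential_deadlock(resource_name):
    """Placeholder for the steps of the method 200; here it only records the request."""
    requests_seen.append(resource_name)


def deadlock_detection_lock(resource_name, existing_lock):
    """New lock function: perform the detection steps, then call the existing lock function."""
    check_for_potential_deadlock(resource_name)
    existing_lock.acquire()  # the predefined lock function, left unmodified


mutex_a = threading.Lock()
deadlock_detection_lock("A", mutex_a)
mutex_a.release()
print(requests_seen)
```

Callers invoke the wrapper in place of the original lock function, so the existing function itself need not be altered.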
[0027]The method 200 operates during run-time execution of a computer
program. It is triggered when a thread of the computer program requests access
to a shared resource. When the computer program requests access to a shared
resource a lock function is called (either the customized lock function or the
"wrapped" lock function).
[0028]The method 200 begins in step 202 wherein the identity of the requested
resource is recorded in a list of currently locked resources. Typically, the
operating system for the computer system will maintain a stack or list of
locked
resources. For example, the JVM maintains a list of which resources have been
locked. When the JVM receives a request for a resource it consults this list
to
determine whether or not access to the resource should be granted to the
requester based upon whether or not the resource is already locked. This step
202 is a step which would be taken by the conventional lock function to lock
the
requested resource.
[0029] If the method 200 is implemented in a 'wrapper' function, it may be
necessary to maintain a separate data structure identifying which resources
have
been locked.
[0030] In step 204, the stack or list of locked resources is read to determine
which resources were locked and have not yet been unlocked prior to the current
resource request. At step 206, this list of prior locked and unreleased resources
is written to a data element or structure associated with the requested resource.
In other words, for each shared resource there is an associated persistent
data
element that records which other shared resources have been locked and not
released each time an attempt is made to lock the resource. This results in a
set
of records of sequential lock operations contained in the computer program.
[0031]In one embodiment, implemented in a Java environment, the associated
data structure is the requested object itself. Other embodiments include a
separate data object associated with each shared resource.
[0032]The method 200 then continues at step 208 wherein, for each currently
locked and unreleased resource, its associated record of antecedent requests,
i.e. previous sequential lock operations, is read. The associated record for
each
locked resource may contain only antecedent requests made during processing
of the current thread, or it may contain multiple sequences reflecting
multiple
requests to access the resource made by multiple threads.
[0033] In step 210, an assessment is made as to whether the requested resource
appears in the associated records for currently locked and unreleased
resources.
If it does, then it indicates a potential cycle, which could result in a
deadlock
situation. Accordingly, if the requested resource is identified in the
associated
records of locked and unreleased resources, then at step 212 a potential
deadlock indicator is generated. At step 214, execution of the computer
program
is continued.
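Steps 202 through 214 may be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than a definitive implementation: the class name DeadlockDetector and its methods are hypothetical, the list of locked resources is kept per thread, the data elements are held in a dictionary instead of in the resources themselves, and the explicit thread_id parameter exists only so that the example runs deterministically in a single thread.

```python
import threading
from collections import defaultdict


class DeadlockDetector:
    def __init__(self):
        self._guard = threading.Lock()        # protects the detector's own bookkeeping
        self._held = defaultdict(list)        # thread id -> resources locked and unreleased
        self._antecedents = defaultdict(set)  # resource -> resources held when it was requested
        self.indicators = []                  # generated potential deadlock indicators

    def on_lock(self, resource, thread_id=None):
        thread_id = threading.get_ident() if thread_id is None else thread_id
        with self._guard:
            held = list(self._held[thread_id])              # step 204: prior unreleased locks
            self._held[thread_id].append(resource)          # step 202: record the request
            self._antecedents[resource].update(held)        # step 206: write the data element
            for prior in held:                              # step 208: read each prior record
                if resource in self._antecedents[prior]:    # step 210: requested resource found?
                    self.indicators.append((prior, resource))  # step 212: generate indicator
        # Step 214: return so that execution of the program continues.

    def on_unlock(self, resource, thread_id=None):
        thread_id = threading.get_ident() if thread_id is None else thread_id
        with self._guard:
            self._held[thread_id].remove(resource)


detector = DeadlockDetector()
# A first thread locks A then B, and releases both.
detector.on_lock("A", thread_id=1)
detector.on_lock("B", thread_id=1)
detector.on_unlock("B", thread_id=1)
detector.on_unlock("A", thread_id=1)
# A second thread later locks B then A.
detector.on_lock("B", thread_id=2)
detector.on_lock("A", thread_id=2)
print(detector.indicators)
```

In this run the two threads never overlap in time, yet the request for A while B is held is flagged because the record for B shows that A was previously held when B was requested.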
[0034]The generation of a potential deadlock indicator may include writing the
potential deadlock to a log file or otherwise identifying the portion of the
computer program code that resulted in the potential deadlock. It may also
include displaying an error message and it may include halting execution of
the
computer program. Other mechanisms for alerting the computer program
developer to the identification of a potential deadlock will be apparent to
those of
ordinary skill in the art.
[0035] It will be appreciated that the method 200 according to the present
invention attempts to detect potential resource dependency cycles in the
computer program during run-time execution of the program by recording partial
resource lock sequences for each requested resource. With each resource
request, prior lock sequences for resources already locked and not released
are
analyzed to ensure that the requested resource does not appear in a prior lock
sequence. If it does, then a potential deadlock scenario is identified.
[0036] In this manner, the present invention identifies when the computer
program performs a sequential lock of, for example, (A ... B), and it assesses
whether or not the sequence (B ... A) has previously been encountered on
another thread. If it has, then there is the potential for a deadlock to occur.
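The comparison of a sequence (A ... B) against a previously recorded sequence (B ... A) can be sketched as a check on ordered pairs. The following Python example is illustrative only; the function name inverse_pairs and the representation of lock sequences as tuples are assumptions.

```python
from itertools import combinations


def inverse_pairs(first_sequence, second_sequence):
    """Return pairs locked x-then-y in the second sequence and y-then-x in the first."""
    # combinations() preserves input order, so each pair reads (earlier, later).
    first_order = set(combinations(first_sequence, 2))
    return [(x, y) for x, y in combinations(second_sequence, 2)
            if (y, x) in first_order]


# One thread locked A, B, D in that order; another locked D, C, B.
conflicts = inverse_pairs(("A", "B", "D"), ("D", "C", "B"))
print(conflicts)
```

Here the pair (D, B) is reported because the other sequence locked B before D.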
[0037] It will be understood that many of the steps of the method 200 may be
performed in an alternative order without affecting the resulting
determination as
to whether a potential deadlock exists.
[0038] It will also be appreciated that the method 200 according to the
present
invention does not require that an actual deadlock occur in order to identify
a
potential deadlock.
[0039] Reference is now made to Figures 3a through 3f, which show, in block
diagram form, an illustration by way of an example of the steps taken by an
embodiment of the present invention.
[0040] In the example shown in Figures 3a through 3f, a computer system 320
includes four shared resources A, B, C, and D, designated with reference
numerals 302a, 302b, 302c, and 302d, respectively. Each of the shared
resources 302 has an associated data element 304 for storing a list of
antecedent lock sequences. In one embodiment, the data elements 304 are
incorporated within the associated resource 302.
[0041]Also present on the computer system 320 is a list 306 of resources
currently locked. The list 306 may be maintained and controlled by the
operating
system of the computer system 320 or it may be a data structure or object
created and stored on computer system 320 by a module or function
implementing the method 200 (Fig. 2) according to the present invention.
[0042]As shown in Figure 3a, a first thread (not shown) has locked resource A
302a, as indicated by the heavy outline. Accordingly, the identity of resource
A
appears in the list 306 of locked resources.
[0043] Next, as shown in Figure 3b, the first thread locks resource B 302b.
The
identity of resource B is added to the list 306 of locked resources. Resource
A
302a remains locked. Moreover, the identity of resource A is recorded in the
data element 304b associated with resource B 302b, since the list 306
indicates
that resource A 302a was locked and not yet unlocked when the first thread
requested access to resource B 302b.
[0044] Reference is now made to Figure 3c, which shows that the first thread
then locks resource D 302d. Accordingly, the identity of resource D is added
to
the list 306 of locked resources and the identities of resources A and B 302a,
302b are added to the data element 304d associated with resource D 302d.
This evidences the fact that resource D 302d was requested by the first thread
at
a time when the first thread had already locked resources A and B 302a, 302b.
[0045] It is now supposed that the first thread unlocks the locked resources
A, B,
and D 302a, 302b, and 302d. At some point in the computer program execution,
a second thread (not shown) locks resource D 302d, as shown in Figure 3d. The
lock sequence data, AB, from the processing of the first thread continues to
exist
in the data element 304d associated with resource D 302d. The list 306 of
locked resources is updated to record the fact that resource D 302d has been
locked.
[0046]As shown in Figure 3e, the second thread then locks resource C 302c.
The identity of resource C 302c is added to the list 306 of locked resources
and
the fact that resource D 302d has already been locked by the second thread is
recorded in the data element 304c associated with resource C 302c.
[0047] Resource D 302d was on the list 306 of currently locked resources prior
to
the request to lock resource C 302c, so the contents of the data element 304d
associated with the currently locked resource D 302d are read to determine
whether or not a previous thread has performed a locking sequence that locked
resource C 302c prior to locking resource D 302d. The associated data element
304d identifies a lock sequence of AB. Because resource C 302c does not
appear in the data element 304d, thus far the lock sequences implemented by
the computer program are acyclical, indicating that no potential deadlock
exists.
[0048] Reference is now made to Figure 3f, which illustrates that the second
thread next locks resource B 302b. Accordingly, the identity of resource B
302b
is added to the list 306 of locked resources. The currently locked resources,
resource D 302d and resource C 302c, are added to the data element 304b
associated with resource B 302b. Thus the data element 304b for resource B
302b now includes two lock sequence records: one corresponding to the first
thread, and one corresponding to the second thread.
[0049]The data elements 304c and 304d associated with the previously locked
and not yet unlocked resources C and D 302c, 302d are then read to assess
whether the requested resource, resource B 302b, appears in prior lock
sequences. It will be seen that the data element 304c corresponding to
resource
C 302c contains only the identity of resource D 302d; however, the data
element
304d corresponding to resource D 302d contains the lock sequence AB.
Therefore, it is apparent that resource B 302b was locked prior to resource D
302d by a different thread, whereas the present thread has locked resource D
302d prior to resource B 302b. Accordingly, a potential deadlock has been
identified. In accordance with step 212 (Fig. 2) of the method 200 (Fig. 2),
an
appropriate deadlock indicator is generated.
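The example of Figures 3a through 3f can be replayed as a trace of lock and unlock events. The following Python sketch is offered by way of illustration only: the function replay, the trace format, and the per-thread bookkeeping of the list 306 are assumptions, and the order in which the first thread unlocks its resources is chosen arbitrarily since the description does not specify it.

```python
def replay(trace):
    """Replay (thread, action, resource) events; return (data elements, indicators)."""
    locked = {}      # thread -> held resources, playing the role of the list 306
    elements = {}    # resource -> recorded lock sequences (the data elements 304)
    indicators = []
    for thread, action, resource in trace:
        held = locked.setdefault(thread, [])
        if action == "unlock":
            held.remove(resource)
            continue
        # Read the data elements of resources already locked and not yet unlocked.
        for prior in held:
            if any(resource in seq for seq in elements.get(prior, [])):
                indicators.append((thread, prior, resource))
        # Record the prior locked resources in the requested resource's data element.
        if held:
            elements.setdefault(resource, []).append(tuple(held))
        held.append(resource)
    return elements, indicators


trace = [
    ("first", "lock", "A"), ("first", "lock", "B"), ("first", "lock", "D"),    # Figs. 3a-3c
    ("first", "unlock", "D"), ("first", "unlock", "B"), ("first", "unlock", "A"),
    ("second", "lock", "D"), ("second", "lock", "C"), ("second", "lock", "B"),  # Figs. 3d-3f
]
elements, indicators = replay(trace)
print(elements)
print(indicators)
```

After the replay, the data element for B holds the two sequences (A) and (D, C), the data element for D holds (A, B), and a single indicator is generated when the second thread requests B while holding D.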
[0050] Although the method 200 as illustrated by the example shown in Figures 3a to 3f
records the complete lock sequence in the data element 304 associated with a
requested resource 302, it will be appreciated that in another embodiment only
the identity of the immediately preceding locked resource could be recorded in
the associated data element 304. In this embodiment, the invention recursively
steps back through the data elements 304 of preceding locked resources to
trace the lock sequence and identify any potential deadlocks. Based upon the
foregoing description, other methods and techniques for tracking the
occurrence
of complementary inverse lock sequences on different threads will be apparent
to those of ordinary skill in the art.
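The alternative embodiment, in which only the immediately preceding locked resource is recorded, may be sketched as a recursive walk. This Python example is an illustration under assumptions: the predecessors dictionary stands in for the data elements 304, and the function name reaches is hypothetical.

```python
def reaches(predecessors, start, target, seen=None):
    """True if target is found by stepping back from start through recorded predecessors."""
    seen = set() if seen is None else seen
    if start in seen:  # guard against revisiting a resource
        return False
    seen.add(start)
    for prev in predecessors.get(start, ()):
        if prev == target or reaches(predecessors, prev, target, seen):
            return True
    return False


# Immediately preceding locked resources recorded up to Figure 3e:
# the first thread locked A, B, D; the second thread locked D, then C.
predecessors = {"B": {"A"}, "D": {"B"}, "C": {"D"}}

# The second thread now requests B while C is its most recently locked resource.
potential_deadlock = reaches(predecessors, "C", "B")
print(potential_deadlock)
```

Stepping back from C leads to D and then to B, so the request for B is identified as a potential deadlock even though no single data element stores a complete lock sequence.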
[0051]The present invention is not limited to the detection of deadlocks
resulting
from the use of lock functions for accessing mutually exclusive resources. The
invention may also be used to detect potential deadlocks related to "event"
structures. An event structure includes a function of waiting upon the event and a
complementary function of triggering the event. The wait function is analogous
to a lock and the trigger function is analogous to an unlock. An event construct
is similar to a lock construct, except that a lock-unlock pair occurs within a
single thread, whereas a wait-trigger pair occurs across separate threads.
[0052]The potential deadlock effect of interacting events and locks across
multiple threads is the same as described above for locks alone. Consider the
following example threads:
THREAD 1            THREAD 2
Lock Mutex A        Lock Mutex A
Wait Event B        Trigger Event B
Unlock Mutex A      Unlock Mutex A
[0053] In the above example, if thread 1 is executed first then Mutex A will
be
locked and the thread will await the triggering of Event B. This trigger can
never
occur because thread 2 cannot access the locked Mutex A. Accordingly, a
deadlock will result.
[0054] Reference is now made to Figure 4, which shows a flowchart depicting
another method 400 for detecting potential deadlocks according to the present
invention. The method 400 of detecting a potential deadlock situation
involving a
wait-trigger event includes a first step 402 of receiving a wait request. The
wait
request has a corresponding trigger event on another thread. The method 400
then includes a second step 404 of assessing whether or not the active thread
has any resources that have been requested and are unreleased when the wait
request is encountered. This step 404 may be performed by consulting the stack
to determine if there are any currently locked resources.
[0055] If there are locked and unreleased resources when the wait request is
encountered, then there is the potential for a deadlock. Accordingly, in the
next
step 406, a deadlock indicator is generated. If there are no locked and
unreleased resources, then the program continues execution 408.
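Steps 402 through 408 of the method 400 may be sketched as a single check. The following Python example is illustrative only; the function name on_wait and the dictionary used as the deadlock indicator are assumptions.

```python
def on_wait(event, held_resources):
    """Return a potential deadlock indicator, or None if execution simply continues."""
    # Step 402: a wait request for the event has been received.
    # Step 404: does the active thread have requested and unreleased resources?
    if held_resources:
        # Step 406: generate a deadlock indicator.
        return {"event": event, "held": list(held_resources)}
    # Step 408: no locked resources, so the program continues execution.
    return None


# Thread 1 of the example above: Mutex A is locked when it waits on Event B.
indicator = on_wait("B", ["A"])
no_indicator = on_wait("B", [])
print(indicator, no_indicator)
```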
[0056]The method 400 may be rendered more sophisticated by recording the
resource that was locked when the wait request was encountered and then later
assessing whether or not the same resource could be locked during the trigger
event.
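One possible reading of this refinement is sketched below in Python. It is an assumption-laden illustration, not the described embodiment: the resources held at each wait request and at each trigger are recorded per event, and an indicator is produced only when the same resource appears on both sides. All names are hypothetical.

```python
from collections import defaultdict

held_at_wait = defaultdict(set)     # event -> resources held by a thread waiting on it
held_at_trigger = defaultdict(set)  # event -> resources held by a thread triggering it


def record_wait(event, held):
    """Record resources locked at a wait request; return resources in conflict so far."""
    held_at_wait[event].update(held)
    return held_at_wait[event] & held_at_trigger[event]


def record_trigger(event, held):
    """Record resources locked at a trigger; return resources in conflict so far."""
    held_at_trigger[event].update(held)
    return held_at_wait[event] & held_at_trigger[event]


# Thread 1 waits on Event B while holding Mutex A.
at_wait = record_wait("B", {"A"})
# Thread 2 must hold Mutex A in order to trigger Event B.
at_trigger = record_trigger("B", {"A"})
print(at_wait, at_trigger)
```

Under this reading, a wait request made while holding a resource that the triggering thread never locks would not be reported.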
[0057]Those of ordinary skill in the art will appreciate that the foregoing
description is not limited to lock functions, and a lock-unlock sequence may
be
generally considered a request and release sequence in relation to a resource.
[0058]The present invention may be embodied in other specific forms without
departing from the spirit or essential characteristics thereof. Certain
adaptations
and modifications of the invention will be obvious to those skilled in the
art.
Therefore, the above discussed embodiments are considered to be illustrative
and not restrictive, the scope of the invention being indicated by the
appended
claims rather than the foregoing description, and all changes which come
within
the meaning and range of equivalency of the claims are therefore intended to
be
embraced therein.