Note: Descriptions are shown in the official language in which they were submitted.
CA 02299505 2000-02-04
wo 99romo4 pcTius~t~s
DATA SECURITY IN MULTIPOINT PUBLISH/SUBSCRIBE
- COMMUNICATIONS
BACKGROUND OF THE INVENTION
S
Technical Field -
This invention relates to multipoiat publish/subscribe communications and,
more
particularly, to secure encrypted communications between computer-based
publisher
and subscriber applications in a publish/subscribe communications system.
IO
Backgrnaad
It is well known that today's electronic communication environment needs
secure .
communications. This need has led to the coding or encryption of electronic
communications. Many encryption algorithms have been proposed and are in use.
15 Probably one of the best known algorithms is the Diffie-Hellman key
exchange .
protocol that allows two parties to establish a shared secret over a public
channel. -
The Diffie-Hellman protocol is based on a "public key" technology relying on
the
difficulty of reversing an exponrntiation function in modular arithmetic. A
description of the Lie-Hellman key exchange protocol can be found in numerous
20 sources. See, for example, Applied Cryptography, Second Edition, by Bruce
Schneier, section 22.1. (ISBN 0-471-11709-9).
One of the main problems with current encrypted (secret) electronic
communications,
however, is that these communications are all point-to-point. Point-to-point
25 communications are basically single source to single receiver
communications in
which both parties (source and receiver) require advance knowledge of the
other
party. At a minimum, this is usually the address of the other party and some
knowledge of the encryption key used by the other party. Thus, a sander
"gives" one
half of the key to the receiver and, at a later stage and independent of the
giving of the
30 key, encrypts a communication (electronic message) and sends that encrypted
message to the receiver. The receiver then decrypts the received message by
using
the previously received key.
1-
CA 02299505 2000-02-04
W O 99N7104 PCTNS98116365
This traditional, point-to-point communication has significant disadvantages.
One of
the primary being is that the advance knowledge of the parties make
conventional
encryption products technological solutions inappropriate in anonymous
publishlsubscribe environments.
In a typical anonymous publishlsubscribe technologies - such as described in
US
patents 5,557,798; 5,339,392; 5,257,369 and 5,187,787 - a publisher
application
publishes information to requesting or subscriber applications without having
any
knowledge of the number, identity or address of any such subscriber
applications. In
fact, no subscriber applications may exist. Instead of knowing about
subscribers, a
publisher will merely publish information applying a context or subject
"label" to the
published message. A subscriber then identifies desired messages by the
content label .
and receives only those messages relevant to the desired content.
Publish/subseribe interactions are event-driven rather than demand-driven. In
this
paradigm, producers (also called publishers) disseminate data to multiple
consumers
(also called subscribers). Publish/subscribe interactions are driven by events
(usually
the arrival or creation of data) in a producer. Consumers place a standing
request for
data by subscribing, but publishing events are independent of subscriptions.
Communication is in one direction only, and is often one-to-many. Example
applications include:
Securities data feed handlers publish the latest stock prices to hundreds of
traders on a trading floor simultaneously;
Materials movement systems distribute data to various materials handlers,
controllers and tracking systems on a factory floor,
Inventory levels flow continuously to accounting, purchasing, marketing and
other departments in a retail store;
A publish/subscribe interaction consists of only one broadcast message. A
subscription is an implicit request-for messages.
2-
CA 02299505 2000-02-04
WO 99107104 PCTNS98/16365
In publish/subscribe interactions producers are decoupled from consumers-they
do not
coordinate data transmission with each other. Producers publish data to the
network
at large. Consumers can receive messages with any subject names. Any
application
can be both a producer and a consumer.
This event-driven paradigm is a natural and effrcient style of computing for
many
applications, including data warehousing, Internet mirrors, incremental
schedulers, ,
just-in-time inventory management, real-time decision support, and data
monitoring
applications. Event-driven computing is emerging as a dominant paradigm for
many
business applications.
The advantages of s~h a publish subscribe, content-based addressing systems
are .
well known and include the ability to totally decouple subscribers and
publishers from
one another. This decoupling allows publishers and subscribers to operate
without '
having any knowledge of the identity, location or address, or communication _
protocols of each other. The flexibility that this offers is enonmous and,
accordingly,
such content/subject-based addressing communication-environments are becoming
increasingly popular. Unfortunately, the very advantages (such as anonymous
decoupling) of these systems, precludes the use of conventional "knowledge
based"
encrypted messaging protocols; protocols that require advanced knowledge
between
applications. Thus, a very real need exits for having both the advantage of
encrypted
messaging and the advantages of content-based, anonymous publish/subscribe
environments.
Similar comments apply to "broadcast" and "bulletin-board" communication
environments. In "broadcast" environments all applications receive every
message
published by every application. Thus, it is not necessary for publishers to
know the
identity of subscribers. In "bulletin-board" environments, subscribers access
or join a
"bulletin-board" type of environment to which messages are posted by a
publisher. In
these environments, the publisher does not necessarily know individual
subscribers'
addresses but, at a minimum, must know the address and/or other details of the
bulletin-board.
3
CA 02299505 2000-02-04
WO 99/07104 PCTNS98I16365
Moreover, it is frequently necessary or desirable to send encrypted data from
publisher to subscriber, especially in the case of personal and privileged
information,
as well as financial information.
Whatever the case, however, both the broadcast and bulletin approaches to
messaging
do not readily lend themselves to encrypting messaging primarily because of a
degree
of anonymity between publisher and subscriber (sender or receiver). Thus,
these
environments also present a obstacle to the implementation of encrypted
messaging.
1 O SUMMARY OF THE INVENTION
Briefly, therefore, this invention provides for securelencrypted messaging
that is
applicable to point-to-point as well as other environments such as anonymous
content-based, publish/subscribe, broadcast and bulletin board environments. '
Generally, our invention is a method of and a program product for publishing
encrypted data in a publish/subscribe communications system having a publisher
and
at least one subscriber (and preferably more). The method includes the steps
of
initiating a key exchange between the subscriber and the publisher, publishing
encrypted data to a subscriber, and thereafter decrypting the published data.
In one embodiment, a publisher publishes an encrypted message with an
identifying
"tag" or "label" and the address of a publisher-side encrypting/decrypting
"responder." This message is sent over conventional data communications
channels
in a computer network and is received by a subscriber - having identified the
message by its "tag" or "label" - who then attempts to rrad the message.
As the message is encrypted, the subscriber cannot read it, so the unread,
encrypted
message is sent to a subscriber-side encrypting/decrypting responder
associated with
the subscriber. The subscriber responder realizes that the message is
encrypted and,
using the publisher responder address attached to the encrypted message, sends
a .
message to the publisher-side responder.
4
CA 02299505 2000-02-04
WO 9910'1104 PCTNS98It6365
As will be described more fully herein below, when a subscriber receives it's
first
message from a given publisher, it will attempt to decrypt it, but will get a
"not
enough info" error. In response to this error, the subscriber will initiate a
key
exchange with the responder specified in the message. Once the key exchange is
complete, the subscriber will be able to decrypt incoming data from the
publisher, and
can optionally now decrypt the original message that initiated the exchange.
Once a
key exchange has taken place, all subsequent messages from the publisher can
be
decrypted as they are received.
According to a still further embodiment, our invention can be used for
authentication
only. When used only to authenticate messages, and not to encrypt them, many
parts
of the protocol are simplified. For instance, a full key-exchange requires two
complete turn-grounds on the network (four messages). When authenticating a
message only, it may only be required to get a certificate from the sender.
This is
accomplished in one turnaround (two messages). Similarly, if a publisher
decides to
publish its certificate along with the data stream, then subscribers can
verify the
identity of the sender without any interactions at all.
The encrypted message can be sent along non-data communication paths, using
instead a more secure path. The subscriber responder acknowledges receipt of
the
originally encrypted message and starts the process of confirmed and verified
message exchanges using and receiving/checking a series of digital signatures.
This
process leads to the subscriber receiving the key to decipher the originally
encrypted
message and, in most cases, subsequent messages from the same publisher.
Advantstges of the Invention
This inventive process has a number of advantages. It provides secure
' communications in rriultipoint environments by allowing communicating
applications
to authenticate messages by creating and checking digital signatures. It does
this
through two major functions:
(i) using message encryption and decryption functions; functions that operate
on
messages and create and verify signatures; and
5.
CA 02299505 2000-02-04
WO y9~p71p4 PCTNS98/16365
(ii) using key and certificate exchange protocol; a protocol that employs
messaging to securely- exchange key information between publishers and
subscribers.
S A further advantage is that either or both endpoints can be anonymous - an
endpoint
that does not have a private key corresponding to a certificate is considered
anonymous - although for maximum security this may not be appropriate.
Yet another advantage is that the invention imposes very little policy as to
the actual
encryption and signing algorithms used, or the key lengths employed. Thus
algorithms can be added without changing any of the message formats.
In sumrnaty, the invention goes beyond the functionality of SSL (the Secure
Socket
Layer protocol originally proposed by Netscape). Whereas SSL implements the
semantics of a TCP socket, with two endpoints and a continuous byte stream of
data,
this invention allows publish/subscribe semantics to be implemented, with many
endpoints and discrete messages.
The invention will be described in greater detail below with reference to the
accompanying drawing.
DESCRIPTION OF THE DRAWING
~ In the attached drawing:
Figure 1 is a schematic representation of a typical publish subscribe
environment
illustrating the messages flow in the protocol of the invention.
DEFINITIONS
Before describing the invention in greater detail, it is necessary to define
certain terms
used in this application:
Asyrurnetric Encryption
6
CA 02299505 2000-02-04
WO 99/07101 PCf/US98/16365
Asymmetric encryption involves the use of two keys, one public and one
private. The
public key is derived from- the private key, and can be distributed to anyone.
Data
encrypted with the public key can only be decrypted with the private key, and
vice
versa. This is especially useful for applications like secure e-mail: a
message is
encrypted with the recipients public key, and no one (except the owner of the
private
key) can decrypt it. -
Digital Signatures
Digital signatures are a special application of asymmetric encryption. A hash
or
'fingerprint' of the message is calculated using a one-way hash function. The
signer
then encrypts this hash with a private key. This encrypted hash value is the
signature.
Any one can verify the signature by decrypting the signature with the signer's
public
key, and comparing it with a hash of the message that they calculated
themselves.
Identity and Certificates
Certificates provide a reasonably secure mechanism for verifying identity in a
distributed and scaleable way. A Certificate Authority signs copies of
everyone's
public keys, and makes these signed keys publicly available. The CA also makes
available a copy of its own public key. Client programs need only keep a copy
of the
CA's public key in order to verify any certificate they receive.
Message Authentication Codes (MACS)
Message Authentication Codes (MACs) are often used in lieu of true digital
signatures. It requires much less computation to generate them, so they are
preferred
in any protocol that may need to support a high data rate. The disadvantage of
MACs
is that they require the sender and receiver to have a shared secret (the MAC
key or
MAC code).
One-way Hash Functions
A one-way hash function is much like a check-sum. The difference is that it is
very
difficult to 'invent' two messages that hash to the same value.
Symmetric Encryption
7.
CA 02299505 2000-02-04
WO 99/07104 PCTIUS98I16365
Symmetric encryption is the most basic form of encryption. A secret key is
used to
obscure data in such a way that it cannot be recovered without 'knowing the
secret
key. This form of encryption is generally faster to perform on a large block
of data
than asymmetric encryption.
SPECIFIC DESCRIPTION -
In Figure 1 a publisher application 10 and a plurality of subscriber
applications 20, ,
20' and 20" are shown. In the preferred embodiment of this invention the
publisher
and subscribers) are soRwate applications based on one or more computers
interconnected by a network providing a data path among the applications. 'The
publisher 10 aad subscribers) 20 preferably implement a content-based
communications protocol whereby a publisher publishes a message indicating
only
the content of the message and without knowing the identity or protocols used
by the
subscribers) 20. These inter-application communications are established by '
communications daemons not shown in Fig. 1, but are well known in the art and
_
described in many publications including the patents referred to above.
Associated with both the publisher 10 and the subscriber 20 is a software
object or
data structure called a responder, 12 and 22 respectively. Each responder 12,
22 is a
single encrypting or decrypting entity. Each responder also is responsible for
keeping
one key for data encryption and one key for signature creation.
Responders can have a definite identity, or they can be anonymous. In order to
have
an identity, the responder must have both a certificate, and the private key
conresponding to that certificate's public key. In contrast, anonymous
responders are
useful for situations in which it is impractical for the.endpoint code to
obtain a private
key without reveaiing it to adversaries.
As will be explained in more detail below, the responders 12, 22 send messages
to
each other in order to accomplish key exchanges. These messages are encrypted
using keys established via the Dike-Hellman key exchange protocol.
8.
CA 02299505 2000-02-04
WO 99/0710 PCTNS98I16365
The Diffie-Hellman key exchange protocol is described by Schneier, above. As
described by Schneier, Diffie-Hellman gets its security from'. the difficulty
of
calculating discrete logarithms in a finite field, as compared with the ease
of
calculating exponentiation in the same field. Diffie-Hellman can be used for
key
distribution-For example, A and B can use this algorithm to generate a secret
key-
but it cannot be used to encrypt and decrypt messages.
In implementing Diffe-Hellman, A and B and Bob agree on a large prime, n and
g,
such that g is primitive mod n. These two integers don't have to be secret;
thus, A
and B can agree to them over some insecure channel. They can even be common
among a group of users.
Then, the protocol goes as follows: .
( 1 ) A chooses a random large integer x and sends B
X= gx mod n
(2) B chooses a random large integer y and sends A
Y=g''modn
(3) A computes
k = Y' mod n
(4) B computes
k' = X'' mod n
Both k and k' are equal to g'''' mod n. No one listening on the channel can
compute
that value; they only know n, g, X, and Y. Unless they can compute the
discrete
logarithm and recover x or y, they do not solve the problem. So, k is the
secret key
that both A and B computed independently.
The choice of g and n can have a substantial impact on the security of this
system.
The number (n -1 ~2 should also be a prime. And most important, n should be
large:
The security of the system is based on the diff culty of factoring numbers the
same
size as n. You can choose any g, such that g is primitive mod n; there's no
reason not
to choose the smallest g you can-generally a one-digit number. (And actually,
g
does not have to be primitive;_ it just has to generate a large subgroup of
the
multiplicitive group mod n.)
9-
CA 02299505 2000-02-04
WO 99107104 PCTNS98I16365
As a first step in the overall process, the publisher 10 has a message to be
encrypted.
To encrypt a message for publishing, an encrypt function is called with the
message
and the responder 12 as parameters. The responder 12's encryption key is used
to
encrypt the data aad its private key to sign the data.
It should be noted that a publisher 10 has great flexibility to assign
responders to
published subjects. A single responder can be used for all messages published
by the
publisher 10, or a responder can be created for each subject published. As one
responder corresponds to one encryption key, partitioning of security is
determined by
the mapping of responders to subjects. The best choice will vary according to
the
security needs of the publisher 10.
The encrypted message is then published (shown schematically as 30) on the
normal
data communications channel of the network. Typically, the message will take
the
form of
subj,resp ~x~J
in which
subj indicates the contentlsubject of the message;
resp indicates the encrypting respondent 12 address; and
(x~aiooicJ is the now-unreadable, encrypted message.
Also, the encrypted message 30 would include the responder 12's Diffe-Hellman
half value DH-x and generically include information of the foam
Ex (M, MIC)
in which
E,r is an encryption key;
M is the encrypted message; and
MIC is a message internal code typically including a hash function as
follows:
h (M, Source, Seguence #, delivery information)
, CA 02299505 2000-02-04
W O 99/0? 104 PCTNS98/16365
This hash function will include a "signature" if authentication is required.
Thus all
hash functions are signed in authenticated communications.
Each subscriber 20, 20', 20" listens for and accesses messages based on the
message
its contentlsubject in the normal way. When the message is received, however,
it is
unreadable and is therefore passed to the appropriate subscriber-side
responder 22.
The subscriber-side responder 22 then responds by communicating with the
publisher-
side responder 12 by way of a MAC-based message 32 to the address (resp)
identified
in the original message 30. This message 32, amongst other things, is the
start of a
message decrypt function and typically takes the format:
S,,t,,~ (sub resp; DH y; E,~ (r~; signature injo))
in which
sub resp is the subscriber-side responder 22's address;
DH y is responder 22's Diffie-Hellman half value;
r, is a first random number, and
signature info is a request for signature informationlsigned message.
The publisher side responder 12 responds with a fully signed message 36 of the
form:
SF~Ly (pub resp; E,~ (r1; r~; signature; signature in, fo))
in which
pub resp is the publisher-side responder 12's address;
r~ is an echo of the first random number;
r1 is a second random number;
signature is responder 12's signature; and
signature info is a request for signature informationla signed message
from responder 22.
When the subscriber-side responder 22 receives this message 36, it calls back
to the
subscriber 20 (shown by arrow 38) to get the subscriber 20's approval. Once
this
approval is received, the responder 22 calls back to responder 12 with a fully
signed
message 40 of the form:
SFUU (sub resp; ExE /r?; r3; signature; key request))
11
CA 02299505 2000-02-04
WO 99/0104 PGTNS98/16365
in which
sub resp is the subscriber-side responder 22's address;
r1 is an echo of the second random number;
r j is a third random number;
signature is responder 22's signature; and
Arey request is a request for the key to decipher the encrypted message
and it could include a data payload from the subscriber.
When the publisher-side responder 12 receives this message 40, it responds to
the
publisher 10 (shown by arrow 42) to get publisher 10 approval to transmit the
key.
Once this approval is received, the responder 12 calls bank to responder 22
with MAC
message 44 incorporating the encryption key as follows:
S,,u~ (pub resp; E~ (r,; rj; keyJgrant)) '
in which
pub resp is the publisher-side responder 12's address;
r3 is an echo of the third random number;
r, is a fourth random number; and
~Eey~rant is the requested decryption key and could include a
datapayload from the publisher.
For all these communications when a challenge value, that is, a random number,
is
sent by a responder 12, 22, it must be returned with additional information
and a
signature before the identity of the remote end is trusted. It is for similar
reasons that
the random numbers are echoed except that challenge values require signatures
(at a
minimum) in response while random number echoes are not always accompanied by
signatures. Thus, only when identity of both parties has been established
securely, a
key request can be sent.
Thus responders 12 and 22 communicate with each other to accomplish key
exchanges. The Diffie-Helhnan protocol is used to establish a secure channel
between the responders for the duration of the negotiation. The actual
protocol used
is similar to the station-to-station protocol which is based on Diffie-
Hellman. In most
12
CA 02299505 2000-02-04
WO 9910'f104 PC'fNS98116365
cases the entire exchange, including the transmission of key values, is
accomplished
in two turn-grounds (a total of four messages).
When satisfying a key request, the default behavior is to supply the most
current key
value. However, responders may cache old key values, and if asked for the key
corresponding to a message with an old sequence number, will send the older
key
value.
The protocol of the invention allows the publisher to choose algorithms for
both
encrypting and signing data and a algorithms can be "plugged in" to any API
incorporating this invention. The choice of algorithm is usually a tn~de-off
between
efficiency and security.
For encryption, a publisher can use any symmetric algorithm. Furthermore, many
algorithms exist, offering varying key lengths and encryption speeds. For
signing, the
publisher must choose between an asymmetric algorithm, such as RSA or E1-Gamal
-
encryption, or using a MAC for authentication. The most secure choice is to
use an
asymmetric algorithm, with a certified copy of the public key. This allows
recipients
of the message to definitely verify the origin of the message.
However, for some applications this level of security may be burdensome in
terms of
~ compute time or inconvenience. In these cases a MAC is more appropriate,
since it is
quick to compute and imposes no administrative overhead. However, since the
publisher must share its MAC secret with all subscribers, each subscriber will
then be
able to impersonate the publisher.
Although not illustrated in the above message formats, all messages contain a
sequence number in order to prevent replay attacks. If a message is received
with an
old sequence number it is flagged to the application as a potential replay.
This is to
prevent adversaries from grabbing whole, encrypted packets from the network
and
simply 'replaying' them at a later time.
13
CA 02299505 2000-02-04
W O 99/07101 PCTNS98/16365
There is nothing built into the protocol that requires the responder that
satisfies a key
request to be the same one tharis used for publishing. As long as-xhe two
responders
have the same key data, everything works fine. What this means is that it is
possible
to set up "key servers" which deal with key requests in a more distributed
fashion. Of
course, the key server must be synchronized with the key information used by
the
publisher.
The above describes a publish/subscribe scenario. In this scenario subscribers
typically receive the first message from a given publisher before they can
begin the
key exchange protocol. This is because the responder address is embedded in
the
message stream itself.
As described above, when a subscriber receives it's first message from a given
publisher, it will attempt to decrypt it, but will get a "not enough info"
error. In '
response to this error, the subscriber will initiate a key exchange with the
responder
specified in the message. Once the key exchange is complete, the subscriber
will be
able to decrypt incoming data from the publisher, and can optionally, now
decrypt the
original message that initiated the exchange. Once a key exchange has taken
place,
all subsequent messages from the publisher can be decrypted as they are
received.
This invention does, however, have broader implementation. For example, when
the
invention is used in a remote procedure call scenario, the request is usually
published
to a broadcast subject, and the reply information is returned to the
publisher. In this
situation, the servers are acting as subscribers in the previous scenario. The
first time
a message is received from a given client, the server must initiate a key
exchange with
that client. Once the key exchange is complete, the server can process the
client
message.
In addition, as far as the invention is concerned, point to point messages
behave
exactly as published messages.
Finally, this invention can be used for authentication only. When used only to
authenticate messages, and not to encrypt them, many parts of the protocol are
14
CA 02299505 2000-02-04
WO 99107104 PGTNS98/16365
simplified. For instance, a full key-exchange requires two complete turn-
grounds on
the network (four messages). -When authenticating a message only, it may only
be
required to get a certificate from the sender. This is accomplished in one
turnaround
(two messages). Similarly, if a publisher decides to publish its certificate
along with
the data stream, then subscribers can verify the identity of the sender
without any
interactions at all. -
All publications and patent applications mentioned in this specification are
herein
incorporated by reference to the same extent as if each individual publication
or
patent application was specifically and individually indicated to be
incorporated by
reference.
The invention now being fully described, it will be apparent to one of
ordinary skill in
the art that many changes and modifications can be made thereto without
departing '
from its spirit or scope.
15-