Patent Summary 2504336

(12) Patent Application:	(11) CA 2504336
(54) French Title:	METHODE ET DISPOSITIF DE CREATION D'UN SYSTEME DE CONTROLEUR AUTONOME
(54) English Title:	METHOD AND APPARATUS FOR BUILDING AN AUTONOMIC CONTROLLER SYSTEM
Status:	Deemed abandoned and beyond the period for reinstatement - awaiting response to the notice of rejected communication

(51) International Patent Classification (IPC):	G06F 15/00 (2006.01) G06F 13/00 (2006.01)
(72) Inventors:	MCCARTHY, JOHN G. (Canada) CRICK, WILLIAM RUSSELL (Canada) ROBITAILLE, BENOIT (Canada) LITKEY, JAY (Canada) TURNBULL, G. SCOTT (Canada)
(73) Owners:	EMBOTICS CORPORATION
(71) Applicants:	EMBOTICS CORPORATION (Canada)
(74) Agent:	DONNELLY, VICTORIA
(74) Associate agent:
(45) Issued:
(22) Filed:	2005-04-15
(41) Open to Public Inspection:	2006-10-15
Examination requested:	2010-01-14
Availability of licence:	N/A
Dedicated to the Public:	N/A
(25) Language of filing:	English

Patent Cooperation Treaty (PCT):	No

(30) Application Priority Data:	N/A

Note: Descriptions are shown in the official language in which they were submitted.

CA 02504336 2005-04-15
Method And Apparatus For Building An Autonomic Controller System
FIELD OF INVENTION
[0001] The present invention relates to computer systems, and more
particularly to a
method and system for building an autonomic controller system.
BACKGROUND OF THE INVENTION
[0002] Embedding devices into a host device is well known in the art. The
embedded
devices have been developed and used in Information Technology.
[0003] However, when the host device unexpectedly changes its operating
state (e.g. power failure, hardware reset), current embedded devices are
subject to reset, can suffer from operating errors, and are unable to
diagnose the root cause. The embedded subsystem needs to re-initialize and
restore itself to a known state before resuming its own operations. To
summarize, traditional approaches are not isolated from the host device.
[0004] Effective management of IT systems is increasingly difficult as the
underlying technologies become more complex. Computing power is constantly
increasing, and applications are being designed with richer features and a
higher level of customizability. An unfortunate consequence is that the
installation, configuration and maintenance of the software is no longer
straightforward. The broad use of heterogeneous systems with so many
custom-configured applications and features can result in a nearly
unmanageable number of interactions and dependencies, causing unexpected
conflicts and imposing many constraints. IBM feels that we are now on the
verge of a limit beyond which we will not be able to benefit from or depend
on more complex systems, unless some of the underlying complexities can be
hidden from the administrator.
[0005] Ideally, the majority of a system administrator's time should be
spent applying domain expertise to the IT infrastructure, supporting an
organization's business goals. Repetitive tasks such as low-level
configuration should be automated, and the verification of system integrity
or analysis of performance should not rely on manual methods.
[0006] In order to address some of these issues, IBM has spent the last few
years researching and calling for development in a new branch of computing
which it has called Autonomic Computing. Autonomic Computing is a relatively
recent field of study which focuses on the ability of computers to
self-manage [Ref. 1]. Autonomic Computing is promoted as the means by which
greater dependability [Ref. 2] will be achieved in systems. This incorporates
self-diagnosis, self-healing, self-configuration and other independent
behaviors, both reactive and proactive. Ideally, a system will adapt and
learn normal levels of resource usage and predict likely points of failure in
the system. Certain benefits of computers that are capable of adapting to
their usage environments and recovering from failures without human
interaction are relatively obvious; specifically, the total cost of ownership
of a device is reduced and levels of system availability are increased.
Repetitive work performed by human administrators is reduced, knowledge of
the system's performance over time is retained (assuming that the machine
records or publishes information about the problems it detects and the
solutions it applies), and events of significance are detected and handled
with more consistency and speed than a human could likely provide.
[0007] However, as described above, it is difficult to apply Autonomic
Computing to current embedded hardware/software and the host device, owing to
the dependency between the embedded hardware/software and the host device.
[0008] It is desired to provide survivability and sanity of the embedded
hardware/software subsystems.
SUMMARY OF THE INVENTION
[0009] It is an object of the invention to provide a method and system that
obviates or mitigates at least one of the disadvantages of existing systems.
[0010] It is an object of the invention to provide an improved Method And
Apparatus
For Building An Autonomic Controller System.
[0011] According to an aspect of the present invention there is provided a
system for building an autonomic controller system, which is embedded in a
host device, which includes: a field programmable gate array (FPGA) system
for implementing survivable hardware/software isolation from the host device.
The FPGA system may include: means for implementing a high bandwidth bus
protocol that supports communications with an embedded processor; means for
interfacing and providing additional functionality to an analog video capture
digitizer; means for implementing multiple PS2 interfaces for host keyboard
and mouse input and output; means for providing interrupt control to
amalgamate board wide interrupts and present a unified interrupt signal to an
IXP42X processor; means for power management, including host
power-on/off/cycle and host-reset control via multiple sources; or means for
providing a dedicated I2C interface for peripherals, comprising: TEMP ADC,
VOLT ADC, EEPROM, or combinations thereof, wherein if real estate is not
sufficient to support all devices, the Temperature ADC will be removed.
[0012] According to a further aspect of the present invention there is
provided a
method of building an autonomic controller system, which is embedded in a host
device, which includes the step of implementing the above FPGA system.
[0013] According to a further aspect of the present invention there is
provided a
program product for building an autonomic controller system, which is embedded
in a
host device, which includes a memory for storing code for implementing the
FPGA
system.
[0014] This summary of the invention does not necessarily describe all
features of the
invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] These and other features of the invention will become more apparent
from the
following description in which reference is made to the appended drawings
wherein:
[0016] Figure 1 is a diagram showing an example of an autonomic element to
which programming and development infrastructure in accordance with the
present invention is suitably applied;
[0017] Figure 2 is a diagram showing an alternative architecture of Figure 1
for the server environment;
[0018] Figure 3 is a view of the ISAC of Figure 2;
[0019] Figure 4 is a diagram showing an application framework for management
of an application;
[0020] Figure 5 is a diagram showing a Common Information Model Object Manager
(CIMOM);
[0021] Figure 6 is a diagram showing the module archive structure of a
management
module;
[0022] Figure 7 is a diagram showing a scenario related to the ISAC of Figure
2;
[0023] Figure 8 is a diagram showing an exemplary operation for normal CPU
monitoring;
[0024] Figure 9 is a diagram showing an exemplary operation for high CPU
monitoring;
[0025] Figure 10 is a diagram showing group management;
[0026] Figure 11 is a diagram showing networks of autonomic elements;
[0027] Figure 12 is a diagram showing the ISAC having an event consumer and
an event generator, and a host;
[0028] Figure 13 is a diagram showing the design of an autonomic controller
system in accordance with one embodiment of the present invention;
[0029] Figure 14 is a diagram showing an example of the autonomic controller
system of Figure 13; and
[0030] Figures 15(a)-15(b) show a further example of the autonomic controller
system.
DETAILED DESCRIPTION
[0031] Embodiments of the present invention are described using a server
system. In accordance with an embodiment of the present invention, an
Intelligent Secure Autonomic Controller (ISAC) for autonomic computing is
provided to the server system.
[0032] The ISAC provides the ability to embed and automate first-line systems
management tasks inside a server. It effectively performs independent,
reliable, predictable and cost-effective administrator functions for every
local and remote server, all fully programmable.
[0033] The ISAC features a fail-safe, real-time, proprietary software engine
(ACE) running on an independent hardware card (ACP) that is inserted into,
for example, a host server's PCI or PCI-X slot. ISAC's independent resources
do not depend on the functional state of the host operating system or
hardware. ISAC offers the equivalent functionality of traditional
"lights-out" remote management cards (e.g. HP RILOE, Dell DRAC, IBM
Supervisor).
[0034] According to a further embodiment of the present invention, a Field
Programmable Gate Array (FPGA) is provided to the ISAC for survivable
hardware/software isolation from the host device and for:
• Implementing a high bandwidth bus protocol that supports communications
  with an embedded processor;
• Interfacing and providing additional functionality to the analog video
  capture digitizer;
• Implementing multiple PS2 interfaces for host keyboard and mouse input and
  output;
• Interrupt control to amalgamate board wide interrupts and present a unified
  interrupt signal to an IXP42X processor;
• Power management: host power-on/off/cycle and host reset control via
  multiple sources;
• A dedicated I2C interface for peripherals: TEMP ADC, VOLT ADC and EEPROM.
  If real estate is not sufficient to support all devices, the Temperature
  ADC will be removed.
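By way of illustration only, the interrupt amalgamation listed above can be modelled in software as a masked OR over the pending board interrupts. The source names and the enable mask below are assumptions made for this sketch; the description does not specify the FPGA register layout, and the real logic would reside in the FPGA fabric rather than in software.

```python
from enum import IntFlag


class Irq(IntFlag):
    """Hypothetical board interrupt sources (names are illustrative only)."""
    VIDEO = 1
    PS2_KEYBOARD = 2
    PS2_MOUSE = 4
    I2C = 8
    POWER = 16


def unified_interrupt(pending: Irq, enabled: Irq) -> bool:
    """Return True if the single line to the processor should be asserted.

    The unified signal is the OR of all pending sources that are enabled.
    """
    return bool(pending & enabled)


def pending_sources(pending: Irq, enabled: Irq) -> list:
    """List the enabled sources that are currently pending, in declaration order."""
    return [source for source in Irq if source & pending & enabled]
```

In such a model the processor services one interrupt line and then reads the pending set to find which sources require attention.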
[0035] The detail of the ISAC and FPGA is described below.
[0036] 1. A Control Plane for Servers
[0037] Figure 1 illustrates an example of an autonomic element to which
programming and development infrastructure in accordance with the present
invention is suitably applied.
[0038] An autonomic element 1 of Figure 1 clearly separates management from
managed element function, providing sensor (S) and effector (E) interfaces
for management. It should minimally impact the functions of the managed
element 4. While not explicitly shown in Figure 1, there is an implicit
requirement that the managed element should not be able to dominate, override
or impede management activity. For example, if the managed element 4 and
autonomic manager 2 share the same processor or memory address space this
cannot be guaranteed owing to the management of these shared resources by a
shared operating system. True autonomy requires a control plane, which has
long been the view in the telecommunications domain.
[0039] Figure 2 illustrates an alternative architecture specifically for the
server environment. While it applies to servers, the architecture generalizes
to other devices with a bus architecture. Figure 2 shows that an autonomic
manager (2 of Figure 1) is instantiated using a hardware and software
platform that communicates with the managed element (the server; 4 of Figure
1) using an independent management bus 44. A PCI bus is used. However, IPMB
may be added.
[0040] The responsibilities of the autonomic manager (2) are real-time
management of the host hardware, operating system and hosted applications.
The autonomic manager (2) runs customizable, policy-based,
server/OS/application management software, thereby automating IT service
management. It performs preventative maintenance tasks, detection, isolation,
notification and recovery of host events/faults and records root cause
forensics and other operating data of user interest.
[0041] In Figure 2, "ISAC" represents "Intelligent Secure Autonomic
Controller".
The ISAC 20 is embedded in a control plane 21. The ISAC 20 includes Autonomic
Controller Engine (ACE) 22.
[0042] The system of Figure 2 is applicable to the server system disclosed in
Canadian Patent Application No. 024?5387, which is incorporated herein by
reference.
[0043] Figure 3 illustrates a view of the ISAC 20 of Figure 2. The ISAC 20 of
Figure
3 is a PCI-X card-based system that plugs into an available slot on the
server.
[0044] The ISAC 20 provides for separation of concerns such as:
• Fail-safe isolation and recovery of faults
• Minimized host resource impact
• Containment of change management risks
• Reduced reliance on the network
[0045] It also provides host-independent security such as:
• Independent policy enforcement
• Delineation of administration roles
• Tamper-proof "black box" and audit trail
• Data persistence
[0046] The ISAC 20 provides traditional "lights out" card functionality that
allows for remote management, such as remote display of host video, keyboard
and mouse redirection over the card's network interface card and virtual
devices for remote booting; e.g. virtual floppy and CD. These functions
relate primarily to management involving human intervention. For details on
remote management card design and function, the reader should consult
[Ref. 3], [Ref. 4] or [Ref. 5]. The ISAC card 20 has full control of power
management functionality and can, through software, cause full power down,
power recycling and reset of the server.
[0047] Referring to Figure 2, there are several architectural components in
the design. On the host server 30, two software components reside: the PCI
driver 32 and the host service 34. On the ISAC card 20, there are several
components: an operating system, a PCI driver, the ACE 22 and management
modules 24.
[0048] Referring to Figures 1-2, the host service 34 provides sensor (S) and
effector (E) implementations. The lines between the sensor (S) and effector
(E) and the autonomic manager 2 are provided by the PCI drivers on the host
30 and ISAC card 20. The knowledge 14 of Figure 1 is provided by the
Management Modules 24 of Figure 2. The Monitor 6, Analyze 8, Plan 10 and
Execute 12 functionality is provided by the ACE 22. It is reasonable to
describe the design of the ISAC card 20 embedded in the host server 30 as an
autonomic element.
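As a non-limiting sketch of the mapping just described, the Monitor, Analyze, Plan and Execute functions can be pictured as one pass of a loop that reads sensors, compares readings against knowledge, and drives effectors. The class name, the threshold-style knowledge and the "recover:" action strings are assumptions introduced for this illustration; the description does not disclose the ACE's internal interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AutonomicManager:
    """Illustrative Monitor/Analyze/Plan/Execute loop (hypothetical API)."""
    sensor: Callable[[], Dict[str, float]]   # stands in for the host service sensor (S)
    effector: Callable[[str], None]          # stands in for the host service effector (E)
    knowledge: Dict[str, float] = field(default_factory=dict)  # thresholds from modules

    def monitor(self) -> Dict[str, float]:
        return self.sensor()

    def analyze(self, readings: Dict[str, float]) -> List[str]:
        # A symptom is any reading above its known threshold.
        return [name for name, value in sorted(readings.items())
                if name in self.knowledge and value > self.knowledge[name]]

    def plan(self, symptoms: List[str]) -> List[str]:
        return ["recover:" + symptom for symptom in symptoms]

    def execute(self, actions: List[str]) -> None:
        for action in actions:
            self.effector(action)

    def run_once(self) -> List[str]:
        actions = self.plan(self.analyze(self.monitor()))
        self.execute(actions)
        return actions
```

In this picture the sensor and effector callables would be backed by the host service over the PCI drivers, while the loop itself runs on the card.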
[0049] Other software components of Figure 2 include the Management Console
40 and Module Development Environment (MDE) 42. The responsibilities of these
two components are ISAC group management and module creation/editing
respectively.
[0050] The MDE 42 is used to create and edit management modules. A management
module is the unit of deployment of management expertise in the Symbium
solution. As such, it consists of policies that assist in managing the server
and its applications along with utilities to support those policies.
Management modules are deployed to one or more ISACs via the SMC. One or more
management modules can be deployed to an ISAC.
[0051] 1.1 Principles of Design
[0052] The design principles for the hardware and software in the ISAC system
are described.
[0053] Referring to Figure 2, the ISAC card 20 has full control over the
server and
can operate with complete autonomy, if required. The card 20, through
software, is
capable of recycling the server power if necessary and can dynamically change
the
order of boot devices if necessary in order to boot from a known image.
[0054] The ISAC card 20 does not depend upon the host for power but can use
power
from the PCI bus if present. As can be seen from Figure 2, batteries are
provided.
However, the card 20 can also be driven from a wall outlet.
[0055] The ISAC card 20 can recycle itself without affecting the host, this
being required when certain upgrades occur. Related to this, important state
information can be stored on the ISAC card 20; non-volatile memory 26 is
provided for this purpose.
[0056] Concerning software design, the system is designed to be hot-swappable
[Ref. 7], [Ref. 8], [Ref. 9]; that is, software elements can be upgraded as
new functionality becomes available or bugs are fixed. Software hot swapping
may be an important characteristic for autonomic systems [Ref. 10]. The main
control principle in our design is derived from Recovery Oriented Computing
(ROC) [Ref. 6]; that is, the minimal part of the system will be reset (also
referred to as microrebooting) when a hung or fault state is detected
[Ref. 11], [Ref. 12]. ROC has been shown to improve availability [Ref. 13] by
focusing on mean time to recovery or repair (MTTR) while allowing for faulty
software at all levels of the software stack.
[0057] The design of the various autonomic element software components is
described.
[0058] 1.2 The Host PCI Driver (32 of Figure 2)
[0059] Referring to Figure 2, the Host PCI Driver 32 is the communications
conduit through which all data is passed. The conduit supports multiple
channels in order to allow for prioritization of data traffic. The lower the
channel number, the higher the priority of the traffic. Channel zero is a
reserved channel meant for management traffic. This channel is used to
synchronize the protocol version running between the two ends of the conduit
and to restart communications when either end of the conduit has been
rebooted. Either end of the conduit can ask to have the conduit restarted. It
is also possible to have a new version of the driver for the host passed from
the card to the host in the case of driver upgrade. A restart of the conduit
automatically occurs when an upgrade has been performed.
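The channel prioritization described above can be sketched, purely as an illustration, with a priority queue keyed on channel number. The Conduit class, its method names and the payloads are hypothetical; the actual driver is stated elsewhere in this description to be implemented in C and its wire protocol is not given here.

```python
import heapq
from itertools import count

MANAGEMENT_CHANNEL = 0  # channel zero is reserved for management traffic


class Conduit:
    """Illustrative multi-channel conduit: lower channel number is served first."""

    def __init__(self):
        self._queue = []
        self._sequence = count()  # keeps first-in, first-out order within a channel

    def send(self, channel: int, payload: bytes) -> None:
        if channel < 0:
            raise ValueError("channel numbers are non-negative")
        heapq.heappush(self._queue, (channel, next(self._sequence), payload))

    def receive(self):
        """Return the (channel, payload) pair with the highest priority."""
        channel, _, payload = heapq.heappop(self._queue)
        return channel, payload

    def __len__(self) -> int:
        return len(self._queue)
```

With this ordering, a synchronization or restart message queued on channel zero is delivered ahead of any measurement or event traffic already waiting on higher-numbered channels.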
[0060] Communications synchronization is transparent to the ACE 22, although
it is
possible for the ACE 22 to request resynchronization. Resynchronization can
occur
when the heartbeat expected from the driver is not received within a
reasonable
amount of time. In this case the driver is restarted on the host; i.e. a
microreboot [Ref.
13] is requested. In certain circumstances, the OS needs to be recycled.
[0061] 1.3 The Host Service (34 of Figure 2)
[0062] Referring to Figure 2, the Host Service 34 acts as a proxy for the ACE
22. The ACE 22 communicates with the Host Service 34 in order to poll for
operational measurements, subscribe for events of interest and manage the
service.
[0063] Two protocols are supported for operational measurements and event
subscription: Windows Management Instrumentation (WMI) and a proprietary
protocol. WMI is Microsoft's interpretation of the Common Information Model
(CIM) and is mature. A proprietary protocol is provided for situations in
which a CIM measurement provider is not available or when legacy measurement
providers already exist; e.g. printer management for certain vendors using
SNMP.
[0064] The Host Service 34 also provides mechanisms for acting on behalf of
the ACE 22; i.e. it provides the instantiation of effectors for the action
requests made by the ACE 22. Effectors include stopping or starting a service
or process and rebooting the operating system, for example.
[0065] Recovery oriented control is also provided for the Host Service 34. A
heartbeat is expected from the Host Service 34. If one is not received within
a user-definable period of time, the service is automatically restarted.
Should the service not restart, a decision is made by the ACE 22 whether to
restart the operating system. While watchdog timer cards can provide some of
the functionality provided above, they suffer from a lack of end user
programmability.
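The heartbeat supervision and escalation described above may be pictured, as a minimal sketch only, by the following watchdog. The class, the callback names and the returned status strings are assumptions made for illustration; timestamps are passed in explicitly so that the behavior does not depend on a real clock.

```python
class HeartbeatWatchdog:
    """Illustrative heartbeat supervision with escalation (hypothetical API)."""

    def __init__(self, timeout_s, restart_service, decide_os_restart):
        self.timeout_s = timeout_s                  # user-definable period
        self.restart_service = restart_service      # returns True if the restart worked
        self.decide_os_restart = decide_os_restart  # policy decision made by the engine
        self.last_beat = 0.0

    def beat(self, now: float) -> None:
        """Record a heartbeat received at time `now` (seconds)."""
        self.last_beat = now

    def check(self, now: float) -> str:
        if now - self.last_beat <= self.timeout_s:
            return "ok"
        # Heartbeat missed: first try the smallest recovery step.
        if self.restart_service():
            self.last_beat = now
            return "service_restarted"
        # The service did not come back: escalate according to policy.
        return "os_restart" if self.decide_os_restart() else "service_down"
```

Because the timeout and both callbacks are supplied by the caller, the escalation policy remains programmable, which is the property the fixed-function watchdog cards mentioned above are said to lack.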
[0066] The Host Service 34 is designed to be extensible and upgradeable. New
host
service components - dynamic link libraries (DLLs) - can be transferred across
the
management bus and register with the service. These libraries are installed in
a well-
known location which is accessed when the service starts. When new or upgraded
functionality is installed, the Host Service 34 is automatically recycled,
which is an
event that is detected by the ACE 22. This microreboot request [Ref. 13]
ensures that
we can upgrade monitoring functionality without interrupting services offered
by the
host. Upon detection, the ACE 22 automatically ensures that events of interest
are
subscribed to.
[0067] The Host Service 34 occupies a small footprint on the host, typically
requiring less than 2% of the processor for measurement operations.
[0068] 1.4 ISAC Card (20 of Figure 2)
[0069] The design of the ACE software, and the customizability of its
behavior via the development of scenarios and policies, are described. It is
noted that other components of the architecture play a significant role as
well. For example, in the embodiment of the present invention the software
resides on a proprietary piece of hardware, a PCI-X card that is installed
inside the server to be managed. For example, the card has its own Intel
PXA255 processor (similar to those found in a personal digital assistant),
which runs a version of Linux as its operating system, as well as a Java
virtual machine that supports J2ME, a subset of Java designed to run on
embedded devices. In the embodiment of the present invention the J9 Java
virtual machine is used; however, the Sun embedded JVM has also been used.
The card 20 also has several other features including its own memory (64 MB),
non-volatile storage (32 MB), and external interfaces for network and serial
(e.g. USB) communications. Although it normally relies on the host's power
supply, it has backup power to ensure that it can stay alive even when the
host is shut off. For example, Figure 3 shows the rechargeable batteries
carried on board.
[0070] Using an independent control plane 21 has multiple benefits; the host
system's CPU is not preoccupied with self-management, which would impede its
performance and might negate many of the benefits that the autonomous
controller can provide. A small portion of the host's resources may be
required for the collection of data and its transmission to the card 20;
however, as much work as possible is delegated to the card 20. Specifically,
the Monitor (6), Analyze (8), Plan (10) and Execute (12) functions of the
autonomic manager (2) are performed by the card processor, not the host
server processor. This configuration is also much more fault-tolerant, as the
ACE 22 can remain active even in the case of a server crash, and can still
attempt to take actions such as rebooting the server it resides in. As the
card 20 is active during the POST of the server itself, it can take actions
that are not possible in the case of a software-only on-server solution.
[0071] 2. Autonomic Controller Engine Design
[0072] In order for autonomic systems to be effective, the adoption of open
standards may be desired. There is little hope for the seamless integration
of applications across large heterogeneous systems if each relies heavily on
proprietary protocols and platform-dependent technologies. Open standards
provide the benefits of both extensibility and flexibility, and they are
likely based on the input of many knowledgeable designers. As such, the
widely-used standards tend to come with all of the other benefits of a well
thought-out design.
[0073] Java is one of the languages for implementation of the ACE 22, for
reasons including its widespread industry use, platform independence, object
model, strong security model and the multitude of open-source technologies
and development tools available for the language. All development was
undertaken in Eclipse, for example.
[0074] The Common Information Model (CIM) is used within the system in order
to
obtain information on the managed objects available on the server. Further
detail on
the use of CIM is described below.
[0075] Also, referring to Figure 2, the extensible markup language (XML) is
used for communications with remote managers, such as the Management Console,
using HTTP as the transport protocol. Web-based Enterprise Management (WBEM)
is used for card manageability, with WS-Management being considered for a
future release.
[0076] The Open Services Gateway initiative (OSGi) framework is used for
service and management module deployment.
[0077] A control plane separates management concerns for the server in the
following
ways.
[0078] It provides an environment which is fail safe. If the control plane
fails, the
server is unaffected. Contrast this with a software agent approach, whereby an
agent
running on the server with a memory leak will cause resources on the server to
become exhausted eventually, possibly making the server unresponsive. An
unresponsive server represents a serious management challenge as remote
control
through an in-band interface may be impossible. A control plane allows for
recovery
at many different levels: application, process, service, operating system and
various
hardware levels. Through an understanding of the dependencies between hardware
and software components it provides the ability to reboot the minimum set
required to
reestablish nominal server operation.
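One way to picture the "minimum set required" is as the faulty component plus everything that depends on it, directly or transitively. The sketch below is offered only as an illustration under that assumption; the dependency table and component names are hypothetical and are not taken from the described system.

```python
from collections import deque

# Hypothetical dependency table: each component maps to what it depends on.
DEPENDS_ON = {
    "application": ["service"],
    "service": ["operating_system"],
    "other_service": ["operating_system"],
    "operating_system": ["hardware"],
    "hardware": [],
}


def minimal_restart_set(faulty, depends_on):
    """Return the faulty component plus every component that depends on it."""
    # Invert the table so that each component lists its dependents.
    dependents = {}
    for name, requirements in depends_on.items():
        for requirement in requirements:
            dependents.setdefault(requirement, []).append(name)

    to_restart = {faulty}
    queue = deque([faulty])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in to_restart:
                to_restart.add(dependent)
                queue.append(dependent)
    return to_restart
```

Under this table, a fault in "service" leads to restarting only the service and the application above it, while the operating system, the hardware and the unrelated service are left running.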
[0079] A control plane minimizes the resources required by the management
solution on the host. Referring to Figure 1, in a control plane all
processing of sensor information occurs within the autonomic manager and all
management state resides there, with critical state being stored in
non-volatile memory. Contrast this with a software agent approach where
multiple agents run on the host. Significant memory and CPU cycles are
required in order to monitor state; state which is lost if the host needs to
be rebooted. A control plane delivers data persistence for follow-on root
cause analysis.
[0080] A control plane contains change management risks. The lifecycle of a
host involves change: change to applications, services and the operating
system. Having a control plane ensures that as upgrades occur, the host can
be monitored and upgrades halted if abnormal or unexpected behavior is
observed. Upgrading the software running on the control plane does not affect
the host at all. Contrast this with the software agent management approach
where unexpected behavior in the new version of an agent may make the host
unmanageable or significantly degrade its performance.
[0081] A control plane does not rely on the network interfaces provided by the
host. It
uses its own network interface for management communication. No management
traffic is transferred over the host data channels, which implies that polling
rates for
management information have no impact on the bandwidth available for host
application traffic.
[0082] 2.1 Service-oriented framework
[0083] Since the engine's behavior depends entirely on a configurable list of
services
to initialize at runtime, as well as the set of modules to be run, a great
deal of
flexibility and extensibility is provided without the need for rebuilding the
engine or
writing very much, if any, code. While an application server would have been
ideal
for this purpose -- with web archives being the unit of deployment -- the
resource
constraints of the card necessitated the creation of a thin application
framework for
management of the lifecycle of services. This is shown in Figure 4.
[0084] The application framework 50 of Figure 4 ensures that services are
restarted if they fail and maintains dependencies between them. The
application framework 50 is also responsible for management of the
application itself. The framework 50 runs on top of the J9 JVM, with the Java
Native Interface (JNI) being used to interface with the various drivers (e.g.
PCI communications driver) that are implemented in the C programming
language. Services can be plugged and unplugged dynamically; i.e. hot
swapping is supported. Services are arranged in bundles, with bundle
lifecycle management being the responsibility of the OSGi [Ref. 14] standard
implemented by the Services Management Framework (SMF) built by IBM. Other
open source implementations of the OSGi specification are available (e.g.
OSCAR); however, SMF may be preferable.
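The two duties attributed to the framework above, starting services in dependency order and restarting a service that fails, can be sketched as follows. This is an illustrative model in Python rather than an OSGi bundle; the Framework class, its methods and the example dependency chain are assumptions, and only the standard-library graphlib module (Python 3.9 or later) is relied upon.

```python
from graphlib import TopologicalSorter


class Framework:
    """Illustrative service lifecycle manager (not the OSGi or SMF API)."""

    def __init__(self, requires):
        self.requires = requires  # service name -> set of services it needs
        self.running = set()
        self.start_log = []

    def start_all(self):
        # static_order() yields each service after the services it requires.
        for name in TopologicalSorter(self.requires).static_order():
            self._start(name)

    def _start(self, name):
        self.running.add(name)
        self.start_log.append(name)

    def report_failure(self, name):
        """Restart a failed service so that it is running again."""
        self.running.discard(name)
        self._start(name)


# Hypothetical dependency chain between the services named in this description.
REQUIRES = {
    "host_communications": set(),
    "managed_object_manager": {"host_communications"},
    "module_management": {"managed_object_manager"},
}
```

A fuller model would also restart the dependents of a failed service; that step is omitted here to keep the sketch short.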
[0085] The OSGi is an effort to standardize the way in which managed services
can be delivered to networked devices. It is being developed through
contributions by experts from many companies in a wide variety of fields
(such as manufacturers of Bluetooth devices, smart appliances, and home
energy/security systems). An open specification is provided for a service
platform so that custom services can be developed (in Java), deployed, and
managed remotely.
[0086] The OSGi Service Platform framework specifies how applications should
be
bundled, the interfaces they must support, as well as a set of standard
services that
must be provided for use by applications. A security model, namespace scoping,
and
specifications for interactions between running services are some of the
features also
provided.
[0087] Management Modules (24 of Figure 2), the units of management knowledge
in the system, are OSGi bundles too. This use of OSGi ensures that one module
cannot clash with another as bundles are managed in separate namespaces.
Extensive security facilities are also provided by OSGi. Interested readers
should consult the OSGi whitepaper [Ref. 15] for further information on OSGi
architecture and services.
[0088] Several services have been implemented for the ACE 22, which include: a
managed object manager service, host communications service and module
management service.
[0089] The managed object manager service is a thin version of a Common
Information Model Object Manager (CIMOM) 52 as shown in Figure 5. A full
CIMOM on card may be difficult owing to the resources that are required to
sustain it. However, standard WBEM interfaces are provided in order to ensure
easy integration with enterprise management systems. Specifically, CIM-XML is
supported. The design of the simplified model is described below.
[0090] The host communications service has been designed for the current form
factor and communications bus. However, although the autonomic controller has
originally been designed to be placed on a PCI-X card in a host machine,
there is really only one service responsible for host communications. Another
service adhering to the same interface could be quickly written and deployed
which would allow the controller software to run directly on the host,
perhaps obtaining information using WMI (for a Windows host). Alternatively
it would be possible to implement the same interface as a simulator,
providing support for testing and development even when hardware is
unavailable.
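The substitution described above depends on the rest of the engine seeing only an interface. A minimal sketch, assuming a hypothetical two-method interface (poll and act) and a hypothetical metric name, shows how a simulator could stand in for the PCI-based service during testing.

```python
from abc import ABC, abstractmethod


class HostCommunications(ABC):
    """Hypothetical interface that any host communications service would satisfy."""

    @abstractmethod
    def poll(self, metric: str) -> float:
        """Return the current value of an operational measurement."""

    @abstractmethod
    def act(self, action: str) -> bool:
        """Request an effector action on the host; return True on success."""


class SimulatedHost(HostCommunications):
    """Simulator implementation for development when no hardware is available."""

    def __init__(self, metrics):
        self.metrics = dict(metrics)
        self.actions = []

    def poll(self, metric: str) -> float:
        return self.metrics[metric]

    def act(self, action: str) -> bool:
        self.actions.append(action)
        return True


def cpu_is_high(host: HostCommunications, threshold: float = 90.0) -> bool:
    """Engine-side logic written against the interface, not an implementation."""
    return host.poll("cpu_percent") > threshold
```

A PCI-backed or WMI-backed class implementing the same two methods could then be swapped in without changing cpu_is_high or any other engine-side code.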
[0091] The module management service is responsible for loading, unloading
and general lifecycle management of modules - the units of management
expertise in the system and the knowledge component (14) shown in Figure 1.
The module management service is responsible for creation of the run time
model, which is a hybrid of event propagation and rule processing.
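The details of the run time model are not given at this point in the description. As one possible reading of "a hybrid of event propagation and rule processing", the sketch below lets events propagate to subscribed handlers that update a set of facts, after which rules are evaluated over those facts. All names, the event type and the CPU threshold are assumptions made for illustration.

```python
from collections import defaultdict


class RuntimeModel:
    """Illustrative hybrid of event propagation and rule processing."""

    def __init__(self):
        self.facts = {}
        self.handlers = defaultdict(list)  # event type -> handlers
        self.rules = []                    # (rule name, condition over facts)
        self.fired = []

    def on(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def add_rule(self, name, condition):
        self.rules.append((name, condition))

    def publish(self, event_type, value):
        # Event propagation: every subscribed handler sees the event.
        for handler in self.handlers[event_type]:
            handler(self.facts, value)
        # Rule processing: evaluate each rule against the updated facts.
        for name, condition in self.rules:
            if condition(self.facts):
                self.fired.append(name)


def record_cpu(facts, value):
    """Handler that stores the latest CPU sample as a fact."""
    facts["cpu"] = value
```

For example, subscribing record_cpu to a "cpu_sample" event and adding a rule that checks whether the stored value exceeds 90 would cause that rule to fire only when a high sample is published.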
[0092] 2.2 Security
[0093] The design of an autonomic element has to pay special attention to
security.
The control plane approach to autonomic element design has particular
advantages in
this regard.
[0094] Through use of a Linux distribution, a firewall is automatically
provided. Authentication to the card is provided by a pluggable
authentication module (PAM). In the embodiment of the present invention, a
simple user-id/password system is provided. However, it may be integrated
with enterprise class LDAP-based authentication mechanisms. As shown in
Figure 2, all communications to and from the server are encrypted using SSL,
with the card certificate being pre-installed on the ISAC 20. Further
application level security is provided through OSGi, where bundles can be run
in different class loaders and considerable control of inter-bundle
interaction is provided. Monitoring of management module activity is also
provided by a built-in management module.
[0095] Security of managed elements is of increasing concern in today's IT
world. New viruses, worms and trojans are reported daily, and the writers of
this software exploit flaws in the operating system or applications, or rely
upon social engineering, to achieve their goals. Malicious software
("malware") writers have become increasingly sophisticated in their attacks
on the operating system and hosted applications, to the point where deployed
anti-virus software can be either shut down or removed from the system
entirely. This is possible as a result of the privilege levels associated
with the entity (user) responsible for running the software. Having an
independent control plane enforce security policy makes it impossible for a
piece of malware to circumvent security, and enforcement becomes the
responsibility of the control plane.
[0096] A further advantage of the control plane is that the security model
employed
becomes independent of the model used within the operating system on the host.
This
independent security plane makes coherent security policy enforcement
possible; that
is, regardless of the operating system running on the host, the same privilege
levels
apply. Separating security responsibilities also implies that separation of
administration roles takes place. Any attempt to compromise the security of
the host
such as changing the privilege levels of a user applies only to the operating
system;
the control plane remains unaffected. With incidents of malicious intent by
seemingly trusted IT insiders being commonplace, independent security
enforcement as delivered by the control plane is critical.
[0097] Yet another benefit of using a control plane versus traditional
software agent-based approaches is that remotely managed systems do not
require the puncturing of their site firewall(s) to allow for the transmittal
of (often sensitive) data to a central management console for analysis. The
control plane can provide fully autonomous, localized data analysis and
policy enforcement, all without burdening the managed system and associated
network compute resources. For situations where reporting to a central
management console is desired, the control plane can report meaningful events
of interest to the console rather than a large volume of raw observations, as
traditional software agents do.
[0098] In a secure system, audit information is collected and made available
for review at some later time. In an audit log stored on the host, intrusive
activity may rewrite or delete important forensic information. When a control
plane is present, logs may be written to non-volatile storage that cannot be
accessed from the host directly. Furthermore, the timestamp on the logs need
not be generated from the host clock, which itself may be affected by
intrusive behavior.
[0099] 2.3 Module Development
[00100] Environments for the creation of autonomic managers have been
proposed [Ref. 16]. Sterritt, in [Ref. 17], describes the event correlation
requirements for autonomic computing systems. In [Ref. 18], Figure 1b, the
requirements of an autonomic manager are described in terms of the functions
that it must perform. Of
particular interest to this paper is the requirement for rules engines and
simple (event)
correlators. The design described here provides both of these elements.
[00101] Event correlation [Ref. 19] has received significant attention in the
research community over the last 15 years, with dependency graphs [Ref. 20]
being a significant mechanism for root cause analysis. Event propagation
systems have also been constructed, such as the Yemanja system [Ref. 21]. The
Yemanja system promotes the idea of loose coupling rather than explicit
graphs, which we find appealing as it reduces the need to maintain accurate
dependency graphs. From [Ref. 21]: "Yemanja is a model-based event
correlation engine for multi-layer fault diagnosis. It targets complex
propagating fault scenarios, and can smoothly correlate low-level network
events with high-level application performance alerts related to quality of
service violations."
[00102] The key concepts built into the autonomic manager in accordance with
the embodiment of the present invention are described below.
[00103] 2.4 Module Components and Concepts
[00104] The ACE 22 was designed to be extensible in many ways. One of the
primary requirements is the ability to define and implement customized
behaviors based on user-defined management scenarios without rewriting or
rebuilding the engine itself. Therefore, management scenarios compile to Java
classes that become part of the running application once loaded.
[00105] Referring to Figure 2, the management module (or module) 24
comprises the knowledge component 14 of the autonomic manager 2. In the
embodiment of the present invention, a module is instantiated in a module
archive, similar in structure and intent to a web archive used by application
servers. A partial example is shown in Figure 6. The module archive is a
directory structure of a standard format that contains classes and resources
that encode a management scenario of interest. The module archive also
contains dynamic link libraries that may be required in order to augment the
low level instrumentation on the host, and HTML
documents that allow a user to interact with the run time version of the
module for
purposes of configuration.
[00106] From an autonomic manager's perspective, the module 24 comprises a
set of scenarios related on a conceptual level. For example, there might be a
module defined to manage printers, another to audit host performance in order
to establish normal levels of resource consumption, and a third to enforce
host-based security.
[00107] A scenario encompasses data and host information to be monitored, as
well as the processing of this information: conditions, filters and
thresholds to be satisfied, and actions to be taken, for instance events to
be logged and alarms to be raised. The modules 24 are completely pluggable,
meaning that they can be installed, updated or reconfigured at runtime, and
require no modifications to the engine framework. Provisions have been made
for the extension of the engine via the development of custom low-level
reusable components as well, thanks in large part to the use of well-defined
interfaces for each component type.
[00108] Figure 7 shows the principal concepts used in the ACE (22) and how
they relate to one another. Figure 7 represents a simple scenario in which
observation tasks feed measurements or system events into an event provider,
which, in turn, feeds them into a policy. It is noted that Figure 7 is
simplified, as event providers can feed multiple policies.
[00109] When a module is loaded, three important processes occur. First, the
definition of each policy 70 is loaded. Second, the definition of each event
provider 72 is loaded. Referring to Figure 5, the repository 54 is consulted
for this information. Third, referring to Figure 7, a linkage between event
providers 72 and policies 70 is created. Policy and event specifications are
stored in properties files. An XML schema is designed.
[00110] An example policy specification is as follows:
policy.class=com.symbium.jeops.JeopsPolicy
kb.properties=os2k_cpu_mon_policy_1.properties
kb.class=com.symbium.jeops.CPUMonPolicy1
name=os2k_cpu_mon_policy_1
description=normal CPU monitoring policy
event.source.0=os2k_cpu_mon_event_1
[00111] The important aspects of policy specification are the class to be
loaded to represent the policy (policy.class), the actual implementation
class for the reasoning used by the policy (kb.class) and the event source(s)
of interest to the policy (event.source.X, X=0,1,2, ...).
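As an illustration of how such a properties file might be read, the following Java sketch uses the standard java.util.Properties class and collects the numbered event.source.X keys in order. The PolicySpec class and its field names are assumptions made for this example; the description does not disclose the loader used by the engine.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

/** Hypothetical in-memory view of a policy specification. */
final class PolicySpec {
    final String policyClass;        // value of policy.class
    final String kbClass;            // value of kb.class
    final List<String> eventSources; // event.source.0, event.source.1, ...

    private PolicySpec(String policyClass, String kbClass, List<String> eventSources) {
        this.policyClass = policyClass;
        this.kbClass = kbClass;
        this.eventSources = Collections.unmodifiableList(eventSources);
    }

    /** Parses properties-file text; stops at the first missing event.source.N. */
    static PolicySpec parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> sources = new ArrayList<>();
        for (int i = 0; props.getProperty("event.source." + i) != null; i++) {
            sources.add(props.getProperty("event.source." + i));
        }
        return new PolicySpec(props.getProperty("policy.class"),
                              props.getProperty("kb.class"), sources);
    }
}
```

A loader of this kind could then instantiate the named classes reflectively (for example with Class.forName), although whether the engine does so is not stated.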
[00112] Policies will likely be defined by system administrators, rather
than
programmers, and as such they should be specified at a level abstracted as
much as
possible from low-level system/implementation details. Policies are built
using the
MDE (42), which is a graphical development environment where a designer drags
elements from a palette onto a canvas. The current prototypical environment is
built as
a series of plug-ins to Eclipse [Ref. 22].
[00113] The ACE (22) currently supports two mechanisms for policy
definition. The first is via rule sets, which are compiled into a
knowledgebase and used by a forward-chaining inference engine (as shown in
the above example policy specification), and the second is through a visual
state-machine editor, which outputs a description of the policy that the
engine can consume and build dynamically. Rules and finite state machines
were selected as two reasonable ways of expressing policy, though the system
could easily be extended with other types of policies as well, because the
framework is completely isolated from the implementation of the underlying
mechanisms. Thus we are not restricted to using a rule-based
forward-chaining inference engine or a finite state machine, and policies in
the future could be developed around neural nets or other
artificial-intelligence constructs, where such concepts are deemed to be
beneficial and an adaptive system is required.
[00114] If rules are used to specify the policy, then conditions and actions
are
evaluated and executed by a forward-chaining inference engine. Currently the
ACE
(22) uses an open-source inference engine called JEOPS [Ref. 23].
Alternatively, the
execution of policies derived from state machines is handled by a proprietary
dynamic
state machine. ABLE [Ref. 24] was considered for the Engine but was found to
be too
resource intensive for our embedded platform.
[00115] A module developer will specify the actions of a policy using a set of
high-level objects, known as effectors, which encapsulate the low-level
details required by the engine to perform common actions. Examples of
effectors are: terminate a process, reboot the server, and remove a file from
the file system. Policies can also be written in Java if desired, though it
is expected that the MDE (42) will be used to facilitate scenario and policy
development with limited or no programming knowledge, using the drag and drop
visual programming paradigm referred to earlier.
[00116] Referring to Figure 7, at the lowest level of a scenario, sensors
convert raw data 76 from the host (such as the value of a performance
counter) into a (typically platform-independent) observation 78. The
observation task 74 provides an important abstraction away from raw
measurements made on the host system. As such, it is intended to decouple
sensing from reasoning. Ideally, the sensory interface would use only the
CIM; however, this proved to be insufficient for certain types of scenario,
e.g. printer queue management. For this reason, the sensor abstraction layer
is present in the system. The layer also, in principle, allows for the use of
the ACE (22) as an autonomic manager (2) in domains where the CIM has yet to
be applied.
[00117] The observation object is used as input to the event processor, where
a dynamic and fully customizable pipeline of atomic software objects called
observation processors filters and manipulates this observation, ultimately
determining the relevance of its contents.
[00118] Figure 8 shows an example of pipelining. Pipelining, or the filter
design pattern of processing, has long been used as a mechanism for combining
simple programming elements, dynamically composed, in order to transform a
data stream (e.g. script programming).
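The pipelining idea can be sketched in Java as a chain of small stages, each of which either passes on a (possibly transformed) observation or drops it. This is a minimal sketch under stated assumptions: the Observation, ObservationProcessor and ObservationPipeline types below are illustrative names, not the interfaces of the described engine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical observation: a named numeric measurement. */
final class Observation {
    final String name;
    final double value;

    Observation(String name, double value) {
        this.name = name;
        this.value = value;
    }
}

/** One pipeline stage: transform the observation, or return empty to drop it. */
interface ObservationProcessor {
    Optional<Observation> process(Observation in);
}

/** Runs stages in order and stops as soon as one of them drops the observation. */
final class ObservationPipeline implements ObservationProcessor {
    private final List<ObservationProcessor> stages = new ArrayList<>();

    ObservationPipeline then(ObservationProcessor stage) {
        stages.add(stage);
        return this;
    }

    @Override
    public Optional<Observation> process(Observation in) {
        Optional<Observation> current = Optional.of(in);
        for (ObservationProcessor stage : stages) {
            if (!current.isPresent()) {
                break;
            }
            current = stage.process(current.get());
        }
        return current;
    }
}
```

Since the pipeline itself implements ObservationProcessor, pipelines can be nested inside other pipelines, which is one way to obtain the dynamic composition mentioned above.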
[00119] Referring to Figure 7, the autonomic controller can use this pipeline
to perform a wide variety of actions. For example, a given observation
processor may be configured to ignore a certain type of observation based on
some configurable criteria, or to store its contents for later use, or it may
use one or more observations to generate an event 80. The event 80 is similar
in structure to the observation 78; however, it differs in that it implies
that something of significance at a higher level has occurred. The
observation processing pipeline is constructed and managed by the event
providers 72, which also handle the dispatching of events 80 to policies 70,
or to other event providers, which can be chained together to allow further
processing.
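The dispatching and chaining of event providers can be illustrated with a small Java sketch. It assumes, purely for illustration, that events are plain strings and that both policies and event providers implement a common EventSink interface; these names are hypothetical and are not taken from the described engine.

```java
import java.util.ArrayList;
import java.util.List;

/** Anything that can receive events: a policy or another event provider. */
interface EventSink {
    void onEvent(String event);
}

/** Hypothetical event provider that fans each event out to its subscribers. */
final class EventProvider implements EventSink {
    private final String name;
    private final List<EventSink> subscribers = new ArrayList<>();

    EventProvider(String name) {
        this.name = name;
    }

    void subscribe(EventSink sink) {
        subscribers.add(sink);
    }

    /** Prefixes the event with this provider's name, then dispatches it. */
    @Override
    public void onEvent(String event) {
        String tagged = name + "/" + event;
        for (EventSink sink : subscribers) {
            sink.onEvent(tagged);
        }
    }
}
```

Because a provider is itself a sink, one provider can subscribe to another, and a single provider can feed several policies.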
[00120] Policies employ high-level system objects called effectors 82, which
have well-defined behaviors and are designed to encapsulate the lower level
details of taking common system actions. Effectors 82 are also configurable
and lightweight, so it is simple to extend the engine's ability to perform
system actions. The effector 82 hides the actual communication with the host
and automatically generates an event 80 when completed, which is fed back to
the policy that invoked it. This ensures that a policy can track whether a
state-changing action has succeeded or not.
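The feedback loop between an effector and the policy that invoked it might look as follows in Java. This is a hedged sketch: Effector, EffectorEvent and EffectorListener are names invented for this example, and treating a thrown exception as failure is an assumption rather than documented behavior.

```java
/** Hypothetical completion event reported back to the invoking policy. */
final class EffectorEvent {
    final String effectorName;
    final boolean succeeded;

    EffectorEvent(String effectorName, boolean succeeded) {
        this.effectorName = effectorName;
        this.succeeded = succeeded;
    }
}

/** The part of a policy that receives effector outcomes. */
interface EffectorListener {
    void onEffectorEvent(EffectorEvent event);
}

/** Base class: runs the host action and always reports the outcome. */
abstract class Effector {
    private final String name;

    Effector(String name) {
        this.name = name;
    }

    /** Performs the host action; implementations throw on failure. */
    protected abstract void perform() throws Exception;

    /** Invokes the action and feeds a completion event back to the caller. */
    final void invoke(EffectorListener policy) {
        boolean ok;
        try {
            perform();
            ok = true;
        } catch (Exception e) {
            ok = false;
        }
        policy.onEffectorEvent(new EffectorEvent(name, ok));
    }
}
```

Concrete effectors (terminate a process, reboot the server, remove a file) would only need to implement perform(); the reporting logic stays in one place.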
[00121] All components involved in observation and event creation,
distribution, and evaluation are handled by the framework using only
well-defined interfaces in order to facilitate customization and extension.
They have been defined with a visual development environment in mind, in
which one could literally drag-and-drop the desired processing components
from a palette and connect them together, allowing the creation of scenarios
of virtually any level of complexity. The palette is extensible and each
processing component is highly configurable: a component is really a template
for a particular type of processing, and each instance can have specific
configuration (such as threshold values, observation filtering, etc.).
[00122] 2.5 An Example Scenario
[00123] To demonstrate the design of the autonomic controller, consider the
following example. The example was identified by a domain expert as a
realistic use case and implemented for Windows 2000 and 2003 servers. As part
of general resource-allocation planning, a system administrator needs to
ensure that a server has
sufficient processing power to handle its normal workload, with enough left
over to allow for occasional peaks in usage. Windows keeps performance
counters that can provide statistical data about the percentage of a CPU
which is being used as well as the processor queue length, both of which can
assist in evaluating how busy a particular processor is. These counters can
be polled programmatically using either a proprietary interface to the API
provided by Windows or the Windows Management Instrumentation
infrastructure.
[00124] Figure 8 provides an encoding of the scenario described above.
Suppose that, at a high level, the administrator defines the following policy to
ensure that
a server has sufficient computing power for its load: if the CPU usage exceeds
85%
for a sustained period of 30 minutes and simultaneously the processor queue
length is
always greater than 2 over the same period, then the processor is considered
to be
unusually busy. It is noted that these statistics are polled, so the actual
values may
fluctuate and could drop below the specified thresholds.
[00125] When a server seems to be experiencing this abnormally heavy load,
the administrator would like the ISAC card (20) to take several actions,
which can ultimately be used in the analysis of the cause. First, an alarm
should be raised and sent to the remote management console(s) monitoring the
card (20). An alarm indicates the time that an issue was detected, the type
of problem that has been observed and its suspected severity level, and
possibly some other relevant information about the host system. In order to
better understand the context for the high CPU usage, the administrator has
specified that when this condition is detected, intensive monitoring of
several other statistics for a specified time would be useful. To do this,
the ACE (22) will initiate the monitoring of about a dozen additional
counters, which will be polled every 10 seconds and averaged over a five
minute window. This information is aggregated and sent to the administrator
in an email message, and normal performance monitoring is resumed.
[00126] To achieve this behavior, a module developer begins by specifying
configuration parameters for two performance counter sensors, one for CPU
usage (PCOT A in Figure 8) and the other for processor queue length (PCOT B
in Figure 8).
The parameters to be configured are the performance counter name and the
polling frequency. Then the observation processing pipeline must be defined
to filter and aggregate the observations to determine whether the triggering
conditions have been met. This processing is performed by small objects with
very specific roles. First, the observation from each sensor is passed to a
separate instance of a type of observation processor called a tripwire
monitor (Tripwire A and B in Figure 8). These processors are each configured
with a threshold value (e.g. 85% for the processing of the CPU usage counter
observation), and each generates an observation that indicates whether the
threshold has been crossed or not. To satisfy the requirement that the
threshold is exceeded for a sustained time period, the next processor
evaluating each observation keeps track of how many times in a row the
threshold has been crossed, and only passes along an observation once enough
occurrences have been counted (Counter A and B in Figure 8). At this point
the pipeline can determine that the requirements have individually been met
to identify high CPU usage, but another piece is required to make sure that
these happen concurrently. To aggregate observations, an observation
processor implementing a dynamic finite state machine was built (FSM in
Figure 8). The states and transitions are entirely configurable so that it
can meet the requirements of a wide variety of applications. In one
embodiment, it has four states: the initial state, a state for counter A, a
state for counter B, and a state for both. Timeouts have also been
implemented so that the FSM can change states automatically after a certain
amount of elapsed time. When the FSM determines that both counters are true,
it generates an event to inform the policy that high CPU usage has been
detected. At this point, the policy raises an alarm and causes another event
provider to start, which controls the sensors for the additional performance
counters and uses its observation processing pipeline to average their
values. These values are sent to an administrator's mail account via an
effector that hides the details of SMTP. The aggregation mechanism is shown
in Figure 9.
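The tripwire, counter and state machine stages described above can be approximated in Java as follows. This is a simplified sketch, not the described implementation: the class names are hypothetical, the state is recomputed on every poll, and the configurable transitions and timeouts of the dynamic FSM are omitted. Because the description gives the sustained period (30 minutes) but not the polling interval, the number of consecutive samples required is left as a constructor parameter.

```java
/** Tripwire: reports whether a sample is above a threshold. */
final class Tripwire {
    private final double threshold;

    Tripwire(double threshold) {
        this.threshold = threshold;
    }

    boolean crossed(double sample) {
        return sample > threshold;
    }
}

/** Counts consecutive crossings; reports true once the run is long enough. */
final class ConsecutiveCounter {
    private final int required;
    private int run;

    ConsecutiveCounter(int required) {
        this.required = required;
    }

    boolean update(boolean crossed) {
        run = crossed ? run + 1 : 0; // any sample below threshold resets the run
        return run >= required;
    }
}

/** Four-state aggregator: initial, counter A only, counter B only, both. */
final class AggregatorFsm {
    enum State { NONE, A_ONLY, B_ONLY, BOTH }

    private State state = State.NONE;

    State update(boolean a, boolean b) {
        if (a && b) {
            state = State.BOTH;
        } else if (a) {
            state = State.A_ONLY;
        } else if (b) {
            state = State.B_ONLY;
        } else {
            state = State.NONE;
        }
        return state;
    }
}

/** Wires the stages together for the CPU usage and queue length counters. */
final class HighCpuDetector {
    private final Tripwire cpuWire = new Tripwire(85.0);  // percent CPU usage
    private final Tripwire queueWire = new Tripwire(2.0); // processor queue length
    private final ConsecutiveCounter cpuRun;
    private final ConsecutiveCounter queueRun;
    private final AggregatorFsm fsm = new AggregatorFsm();

    HighCpuDetector(int requiredSamples) {
        cpuRun = new ConsecutiveCounter(requiredSamples);
        queueRun = new ConsecutiveCounter(requiredSamples);
    }

    /** Feeds one polled pair; returns true when a high CPU event should be raised. */
    boolean poll(double cpuPercent, double queueLength) {
        boolean a = cpuRun.update(cpuWire.crossed(cpuPercent));
        boolean b = queueRun.update(queueWire.crossed(queueLength));
        return fsm.update(a, b) == AggregatorFsm.State.BOTH;
    }
}
```

For instance, with a hypothetical one-minute polling interval, new HighCpuDetector(30) would correspond to the 30 minute sustained period in the example policy.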
[00127] Referring to Figure 9, when the "High CPU Monitoring" policy is
started, the various observation tasks for the performance counters of
interest (PCOT X, X = A, B, ... N) are automatically started. For each
observation made, the measurement is passed through an averaging window
observation processor (Avg. Window Y, Y = A, B, ... N). When sufficient
samples of the performance counters have been collected, a rule fires in the
CPU Monitoring Policy 2 rule base that does two things: it creates a report
to send to an administrator and switches off the monitoring policy. Switching
the policy off automatically stops the polling by the various performance
counter observation tasks.
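An averaging window processor of this kind can be sketched in Java as a fixed-capacity sliding window. The AveragingWindow class is a hypothetical illustration; with the figures given in the example scenario (polling every 10 seconds, averaging over five minutes) the capacity would be 300 / 10 = 30 samples.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sliding-window averager holding the most recent samples. */
final class AveragingWindow {
    private final int capacity;
    private final Deque<Double> samples = new ArrayDeque<>();
    private double sum;

    AveragingWindow(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Adds a sample, evicting the oldest one once the window is full. */
    void add(double sample) {
        samples.addLast(sample);
        sum += sample;
        if (samples.size() > capacity) {
            sum -= samples.removeFirst();
        }
    }

    /** True once enough samples have been collected to fill the window. */
    boolean isFull() {
        return samples.size() == capacity;
    }

    double average() {
        if (samples.isEmpty()) {
            throw new IllegalStateException("no samples collected");
        }
        return sum / samples.size();
    }
}
```

A rule could test isFull() to decide when sufficient samples have been collected before the report is created.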
[00128] Numerous other management scenarios have been captured that
involve access to other information sources, e.g. the Windows registry. The
management of run-away processes has been provided; processes with memory
leaks are automatically terminated and restarted (an example of a
microreboot). Automated printer queue management has been encoded by polling
printer queues to see if jobs hang, hanging being determined by a non-zero
number of jobs but no bytes processed in a specific interval. In the case of
Microsoft Exchange, policies have been constructed that ensure all
services/processes are kept up, restarting them in the correct order when
needed, e.g. the routing engine service. Finally, a security module has been
encoded that allows a user to specify the set of processes that can run, all
other processes being automatically terminated without user intervention.
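The printer queue hang test described above (jobs queued but no bytes processed during an interval) can be written as a small Java sketch. The PrinterHangDetector class and its inputs are assumptions for illustration; in particular, it assumes the host exposes a cumulative count of bytes printed.

```java
/** Hypothetical detector: a queue is hung if jobs wait but no bytes move. */
final class PrinterHangDetector {
    private long lastBytes = -1; // -1 means no previous poll to compare against

    /**
     * Called once per polling interval with the number of queued jobs and the
     * cumulative bytes printed. Returns true if jobs are queued and the byte
     * count has not changed since the previous poll.
     */
    boolean poll(int queuedJobs, long totalBytesPrinted) {
        boolean hung = lastBytes >= 0
                && queuedJobs > 0
                && totalBytesPrinted == lastBytes;
        lastBytes = totalBytesPrinted;
        return hung;
    }
}
```

The first poll never reports a hang, because there is no earlier byte count to compare against.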
[00129] 2.6 ISAC Group Management
[00130] While autonomic elements may well change the way in which devices
are managed, there still remains a need to integrate them with legacy
enterprise management systems. Figure 10 shows how group management is
achieved. The management console 40 is the point of integration where alarms
and events from a group of ISACs (20) are consolidated. It is also the point
through which primary integration with enterprise management systems (e.g. HP
OpenView) takes place. The management console 40 is also capable of
discovering ISACs, although ISAC discovery of a management console is also
possible for deployment scenarios where ISACs reside behind a corporate
firewall and group management is undertaken from outside the firewall.
[00131] 3. Future work
[00132] The design of an autonomic element for the server domain has been
described. We have shown how the architecture and design of the ISAC and ACE
map
onto the autonomic manager.
[00133] Figure 11 (due to J. Kephart of IBM) graphically demonstrates the
direction that the work should follow; namely, networks of autonomic elements
that self-organize to achieve highly available business processes. It is our
view that business process management will only be possible with autonomic
elements. In the future, we will examine autonomic control in distributed
systems, where groups of autonomic controllers coordinate with each other to
provide large systems with the same capabilities that an individual card
currently provides a single host. It is expected that a single autonomic
manager will then take on the responsibility of reporting the well-being of
the business process supported by the autonomic element network, thereby
further reducing the alarm stream reported to legacy enterprise management
systems.
[00134] Several areas have been identified for future development of the ACE
and the ISAC platform as a whole. A considerable amount of work must be done
to fully develop the specified module development environment, which will
facilitate the rapid definition of new scenarios, custom behaviors and
components. The use of the Eclipse Modeling Framework (EMF) and the Graphical
Editing Framework (GEF) will prove crucial here. The prototypical MDE, once
fully developed, will facilitate third party development as well, allowing
others to provide modules that run on the ISAC platform.
[00135] The security model must be further refined, in particular for network
communications. Since the controller is capable of doing so much on the host
system, we must ensure that external access to the engine is strictly limited
to avoid compromising the host.
[00136] Performance enhancements and fine-tuning of the framework may be
necessary to efficiently support large numbers of scenarios requiring
simultaneous processing. The next generation of the hardware platform should
also increase performance.
[00137] Rules may not be sufficient to express all desired policies. As such,
non-rule-based policies (e.g. neural nets, etc.) may be implemented to extend
the engine's abilities. It is intended that modules be created that can
determine the normal resource consumption levels for the server and set
thresholds accordingly once a "burn in" period has elapsed.
[00138] Many of the future extensions to the autonomic controller, such as
the non-rule-based policies and coordinated control, can hopefully occur
without the need for reconstruction of the framework.
[00139] The design of an autonomic system for server management has been
described, which achieves separation of security responsibility (server
administrators do not necessarily have access to ISAC management modules) and
separation of availability considerations (the ACE cannot cause outage of the
server).
[00140] This autonomic system is operational and manages servers in
production environments within corporate settings. One set of users in a
large financial institution has reported a 25% reduction in downtime on their
email servers. Another set of users in an international managed service
provider has been able to reduce recurring incident downtime from hours to
minutes, while tripling the number of servers each administrator can
effectively manage. The real value of the solution will be determined largely
by large user feedback, both in terms of performance and the facility of
extending the framework with custom high-level scenarios. The development of
the ACE so far has been restricted mainly to the infrastructure and the
implementation of selected policies which serve to demonstrate its potential.
[00141] 4. FPGA
[00142] The FPGA according to one embodiment of the present invention is
responsible for handling local processor PCI bus transactions and the
interfacing "glue" that controls virtually all peripheral devices on the
card. In addition, it performs a video data color reduction algorithm.
[00143] Figure 13 shows the design of an autonomic controller system in
accordance with an embodiment of the present invention. Figure 14 shows an
example of the autonomic controller system of Figure 13. Figures 15(a)-(b)
show another example of the autonomic controller system.
[00144] The benefits of the design of Figure 13 are:
• In Figure 13, the Field-Upgradeable Programmable Device (a2) is essentially
  a data management and flow controller between an Autonomic Controller (a1)
  and a host compute element (a3).
• This design provides a clear separation of responsibilities and concerns.
• Architecturally, it ensures there are non-shared resources.
• (a1) is freed up to focus on the autonomic management of policies
  (Monitoring, Analysis, Planning and Execution).
• Bi-directional responsibilities of (a2) include:
  - Detection of events of interest
  - Isolation from events, both expected and unexpected
  - Recovery from events, both expected and unexpected
  - Security point between (a1) and (a3)
  - Traffic shaping of data passed between (a1) and (a3)
  - Caching of data
• (a2) also provides a compute offload for (a1) for resource-intensive
  operations like host video capture and compression.
[00146] (a1) of Figure 13 can take the form of many embodiments (e.g. PCI-X
card, PCI-E card, embedded baseboard management controller (BMC), single
chip/silicon, external appliance connected via USB or wireless connection...).
[00147] (a2) of Figure 13 can take the form of many embodiments (e.g. FPGA,
ASIC, PLC, standalone compute server...).
[00148] (a3) of Figure 13 can take the form of many embodiments (e.g. a
server, desktop computer, laptop, printer, set-top box, handheld device...).
[00149] The detail of the FPGA is disclosed in the Appendix attached herewith.
[00150] The system having the FPGA has the following advantages:
• Hardware and software isolation and survivability through various host
  operating conditions
• Real-time, autonomous recovery from internal software faults
• Real-time, autonomous recovery from host system software corruptions
• Universal host video capture and redirection
• Host fault root cause analysis, identification and storage
• Autonomous systems management operations in isolated (non-networked)
  sites
• Embedded subsystems can be powered down and removed independently from
  the host device state.
[00151] Further detail can be found in the Appendices 1 - 5, which form an
integral part of the Detailed Description section of this patent application.
In addition, all citations listed on pages 32 and 33 are hereby incorporated
by reference.
[00153] The present invention has been described with regard to one or more
embodiments. However, it will be apparent to persons skilled in the art that a
number
of variations and modifications can be made without departing from the scope
of the
invention as defined in the claims.
References
[1] Murch, R., Autonomic Computing, Prentice Hall, 2004.
[2] R. Sterritt, D. W. Bustard, Autonomic Computing - a Means of Achieving
Dependability?, Proceedings of IEEE International Conference on the
Engineering of Computer Based Systems (ECBS'03), Huntsville, Alabama, USA,
April 7-11 2003, pp. 247-251.
[3] AMI MegaRAC, http://www.ami.com/megarac/, accessed 24th January 2005.
[4] J. McGary and D. Bell, Exploring the Next Generation DRAC 4 Dell Remote
Access Controller, Dell Power Solutions Magazine, October 2004, pp. 18-21.
[5] W. Pan and G. Liu, Remote Management with Virtual Media in the DRAC 4,
Dell Power Solutions Magazine, October 2004, pp. 30-35.
[6] Berkeley Recovery Oriented Computing Group, http://roc.cs.berkeley.edu/,
accessed 24th January 2005.
[7] Ao, G., Software Hot-swapping Techniques for Upgrading Mission Critical
Applications on the Fly. M.Eng., Carleton University, May 2000.
[8] Feng N., S-Module Design for Software Hot-Swapping. M.Eng., Carleton
University, May 1999.
[9] Reynaga G., Hot Swapping using State Persistence, M.C.S., Carleton
University, August 2004.
[10] J. Appavoo, K. Hui, C. A. N. Soules, R. W. Wisniewski, D. M. Da Silva,
O. Krieger, D. J. Edelsohn, M. A. Auslander, B. Gamsa, G. R. Ganger,
P. McKenney, M. Ostrowski, B. Rosenburg, M. Stumm, and J. Xenidis. Enabling
autonomic behavior in systems software with hot-swapping. IBM Systems
Journal, 42(1), 2003.
[11] G. Candea and A. Fox, Designing for High Availability and Measurability.
1st Workshop on Evaluating and Architecting System Dependability (EASY),
Goteborg, Sweden, July 2001.
[12] G. Candea, J. Cutler, A. Fox, R. Doshi, P. Garg, R. Gowda, Reducing
Recovery Time in a Small Recursively Restartable System. International
Conference on Dependable Systems and Networks (DSN), Washington, D.C.,
June 2002.
[13] G. Candea, J. Cutler, A. Fox, Improving Availability with Recursive
Microreboots: A Soft-State System Case Study. Performance Evaluation Journal,
Vol. 56, Nos. 1-3, March 2004.
[14] Open Services Gateway Initiative (OSGi), http://www.osgi.org, accessed
24th January 2005.
[15] OSGi Overview,
http://www.osgi.org/documents/osgi_technology/osgi-sp-overview.pdf, accessed
24th January 2005.
[16] S. Hariri, H. Chen, M. Zhang, B. Kim, Y. Zhang and B. Khargharia, An
Autonomic Application Development & Management Environment, submitted to IEEE
Communication: XML-based Management of Networks and Services, 2003, available
at: http://www.ece.arizona.edu/~zhang/xml.pdf, accessed 24th January 2005.
[17] R. Sterritt, Towards Autonomic Computing: Effective Event Management,
Proceedings of 27th Annual IEEE/NASA Software Engineering Workshop (SEW),
Maryland, USA, December 3-5 2002, pp. 40-47.
[18] R. Sterritt, A. McCrea, Autonomic Computing Correlation for Fault
Management System Evolution, Proceedings of IEEE Conference on Industrial
Informatics, Banff, Canada, August 21-24 2003.
[19] I. Katzela and M. Schwartz, Schemes for fault identification in
communication networks, IEEE Transactions on Networking, 3 (6), 1995.
[20] B. Gruschke. Integrated Event Management: Event Correlation using
Dependency Graphs, Proceedings of the 9th IFIP/IEEE International Workshop on
Distributed Systems Operation and Management (DSOM '98), October 1998.
[21] K. Appleby, G. Goldszmidt, and M. Steinder. Yemanja - a layered event
correlation engine for multi-domain server farms. In IFIP/IEEE International
Symposium on Integrated Network Management VII, Seattle, WA, May 2001. IEEE
Publishing.
[22] Eclipse, http://www.eclipse.org, accessed 24th January 2005.
[23] JEOPS, http://www.jeops.org/, accessed 24th January 2005.
[24] Agent Building and Learning Environment (ABLE),
http://www.research.ibm.com/able/, accessed 24th January 2005.

Description	Date
Application Not Reinstated by Deadline	2013-02-04
Inactive: Dead - No reply to s.30(2) Rules requisition	2013-02-04
Deemed Abandoned - Failure to Respond to Maintenance Fee Notice	2012-04-16
Inactive: Abandoned - No reply to s.30(2) Rules requisition	2012-02-06
Inactive: S.30(2) Rules - Examiner requisition	2011-08-04
Inactive: Office letter	2011-08-04
Inactive: S.30(2) Rules - Examiner requisition	2011-07-15
Inactive: Adhoc Request Documented	2011-07-15
Inactive: Correspondence - Formalities	2011-01-28
Letter Sent	2010-06-07
Inactive: Office letter	2010-06-07
Letter Sent	2010-06-07
Inactive: Office letter	2010-06-01
Inactive: Single transfer	2010-04-06
Letter Sent	2010-02-09
All Requirements for Examination Determined Compliant	2010-01-14
Request for Examination Requirements Determined Compliant	2010-01-14
Request for Examination Received	2010-01-14
Letter Sent	2007-04-10
Letter Sent	2007-04-10
Letter Sent	2007-04-10
Inactive: Single transfer	2007-03-06
Revocation of Agent Request	2007-03-06
Appointment of Agent Request	2007-03-06
Inactive: Office letter	2007-02-27
Revocation of Agent Requirements Determined Compliant	2007-02-19
Appointment of Agent Requirements Determined Compliant	2007-02-19
Inactive: Office letter	2007-02-19
Inactive: Office letter	2007-02-19
Letter Sent	2007-02-12
Letter Sent	2007-02-12
Letter Sent	2007-02-12
Inactive: Delete abandonment	2007-02-01
Deemed Abandoned - Failure to Respond to Notice Requiring a Translation	2007-01-03
Appointment of Agent Request	2007-01-02
Revocation of Agent Request	2007-01-02
Inactive: Single transfer	2007-01-02
Inactive: Office letter	2006-11-28
Inactive: Adhoc Request Documented	2006-11-28
Revocation of Agent Request	2006-10-30
Appointment of Agent Request	2006-10-30
Application Published (Open to Public Inspection)	2006-10-15
Inactive: Cover page published	2006-10-15
Inactive: Incomplete	2006-10-03
Extension of Time for Taking Action Requirements Determined Compliant	2006-08-02
Letter Sent	2006-08-02
Inactive: Extension of time for transfer	2006-07-19
Inactive: Correspondence - Formalities	2006-01-03
Inactive: Compliance - Formalities: Response Received	2006-01-03
Inactive: IPC assigned	2005-08-03
Inactive: First IPC assigned	2005-08-03
Application Received - Regular National	2005-05-18
Correction of Inventor Requirements Determined Compliant	2005-05-18
Filing Requirements Determined Compliant	2005-05-18
Inactive: Filing certificate - No RFE (English)	2005-05-18
Inactive: Inventor deleted	2005-05-18

Abandonment Date	Reason	Reinstatement Date
2012-04-16
2007-01-03

Fee Type	Anniversary Year	Due Date	Paid Date
Application fee - standard			2005-04-15
Extension of time			2006-07-19
Registration of a document			2007-01-02
			2007-01-03
MF (application, 2nd anniv.) - standard	02	2007-04-16	2007-01-29
Registration of a document			2007-03-06
MF (application, 3rd anniv.) - standard	03	2008-04-15	2008-01-11
MF (application, 4th anniv.) - standard	04	2009-04-15	2009-03-17
MF (application, 5th anniv.) - standard	05	2010-04-15	2010-01-11
Request for examination - standard			2010-01-14
Registration of a document			2010-04-06
MF (application, 6th anniv.) - standard	06	2011-04-15	2011-02-21

Document Description	Date (yyyy-mm-dd)	Number of pages	Size of Image (KB)
Description	2005-04-14	31	1,414
Abstract	2005-04-14	1	12
Claims	2005-04-14	2	48
Representative drawing	2006-09-24	1	6
Drawings	2007-01-02	18	2,169
Filing Certificate (English)	2005-05-17	1	157
Request for evidence or missing transfer	2006-04-18	1	103
Reminder of maintenance fee due	2006-12-17	1	112
Courtesy - Certificate of registration (related document(s))	2007-02-11	1	105
Courtesy - Certificate of registration (related document(s))	2007-04-09	1	105
Courtesy - Certificate of registration (related document(s))	2007-04-09	1	105
Courtesy - Certificate of registration (related document(s))	2007-04-09	1	105
Reminder - Request for Examination	2009-12-15	1	117
Acknowledgement of Request for Examination	2010-02-08	1	176
Courtesy - Certificate of registration (related document(s))	2010-06-06	1	125
Courtesy - Certificate of registration (related document(s))	2010-06-06	1	125
Courtesy - Abandonment Letter (R30(2))	2012-04-29	1	166
Courtesy - Abandonment Letter (Maintenance Fee)	2012-06-10	1	173
Correspondence	2005-05-17	1	26
Correspondence	2006-07-18	1	38
Correspondence	2006-08-01	1	16
Correspondence	2006-09-26	1	20
Correspondence	2006-10-29	2	54
Correspondence	2006-11-27	1	17
Correspondence	2007-01-01	2	71
Correspondence	2007-01-02	3	122
Correspondence	2007-02-18	1	15
Correspondence	2007-02-18	1	18
Fees	2007-01-28	1	35
Correspondence	2007-03-05	1	50
Fees	2007-03-05	1	48
Fees	2008-01-10	1	34
Fees	2009-03-16	1	36
Correspondence	2010-06-06	1	17
Correspondence	2011-01-27	2	65
Correspondence	2011-08-03	1	13


Past Owners on Record
BENOIT ROBITAILLE
G. SCOTT TURNBULL
JAY LITKEY
JOHN G. MCCARTHY
WILLIAM RUSSELL CRICK