Patent 2358807 Summary

Third-party information liability

Some of the information on this Web page has been provided by external sources. The Government of Canada is not responsible for the accuracy, reliability or currency of the information supplied by external sources. Users wishing to rely upon this information should consult directly with the source of the information. Content provided by external sources is not subject to official languages, privacy and accessibility requirements.

Claims and Abstract availability

Any discrepancies in the text and image of the Claims and Abstract are due to differing posting times. Text of the Claims and Abstract are posted:

  • At the time the application is open to public inspection;
  • At the time of issue of the patent (grant).
(12) Patent: (11) CA 2358807
(54) English Title: MULTIPLE SOFTWARE-FACILITY COMPONENT OPERATING SYSTEM FOR CO-OPERATIVE PROCESSOR CONTROL WITHIN A MULTIPROCESSOR COMPUTER SYSTEM
(54) French Title: PIECE COMPOSANTE D'UN SYSTEME D'EXPLOITATION DOTE D'AMENAGEMENTS POUR MULTIPLE LOGICIELS POUR PROCESSEUR DE COMMANDE CO-OPERATIF A L'INTERIEUR D'UN SYSTEME MULTI-PROCESSEUR D'ORDINATEUR
Status: Term Expired - Post Grant Beyond Limit
Bibliographic Data
(51) International Patent Classification (IPC):
  • G06F 15/76 (2006.01)
  • G06F 13/00 (2006.01)
  • G06F 15/16 (2006.01)
(72) Inventors :
  • HITZ, DAVID (United States of America)
  • SCHWARTZ, ALLAN (United States of America)
  • LAU, JAMES (United States of America)
  • HARRIS, GUY (United States of America)
(73) Owners :
  • NETWORK APPLIANCE, INC.
(71) Applicants :
  • NETWORK APPLIANCE, INC. (United States of America)
(74) Agent: GOWLING WLG (CANADA) LLP
(74) Associate agent:
(45) Issued: 2005-11-01
(22) Filed Date: 1990-08-20
(41) Open to Public Inspection: 1991-04-04
Examination requested: 2001-10-11
Availability of licence: N/A
Dedicated to the Public: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): No

(30) Application Priority Data:
Application No. Country/Territory Date
404,885 (United States of America) 1989-09-08

Abstracts

English Abstract


This is achieved in a computer system employing a
multiple facility operating system architecture. The
computer system includes a plurality of processor units for
implementing a predetermined set of peer-level facilities
wherein each peer-level facility includes a plurality of
related functions and a communications bus for
interconnecting the processor units. Each of the processor
units includes a central processor and a stored program
that, upon execution, provides for the implementation of a
predetermined peer-level facility of the predetermined set
of peer-level facilities, and for performing a
multi-tasking interface function. The multi-tasking interface
function is responsive to control messages for selecting
for execution functions of the predetermined peer-level
facility, and is responsive to the predetermined
peer-level facility for providing control messages to request or
to respond to the performance of functions of another
peer-level facility of the computer system. The multi-tasking
interface functions of each of the plurality of processor
units communicate among one another via the network bus.


Claims

Note: Claims are shown in the official language in which they were submitted.


The embodiments of the invention in which an
exclusive property or privilege is claimed
are defined as follows:
1. A computer system employing a multiple facility operating
system architecture, said computer system comprising:
a) a plurality of processor units provided to co-
operatively execute a predetermined set of operating system
peer-level facilities, wherein each said processor unit is
associated with a respective different one of said operating
system peer-level facilities and not another of said
operating system peer-level facilities, and wherein each of
said operating system peer-level facilities constitutes a
respective separately executed software entity which
includes a respective distinct set of peer-level facility
related functions, each said processor unit including:
i) a processor capable of executing a control program;
and
ii) a memory store capable of storing said control
program, said processor being coupled to said memory store
to obtain access to said control program,
said memory store providing for the storage of a first
control program portion that includes a one of said
respective distinct sets of operating system peer-level
facility related functions and that corresponds to a one of
said predetermined operating system peer-level facilities,
and a second control program portion that provides for the
implementation of a multi-tasking interface function, said
multi-tasking interface function being responsive to control
messages for selecting for execution a one of said peer-
level facility related functions of said one of said
predetermined operating system peer-level facilities and
responsive to said one of said predetermined operating
system peer-level facilities for providing control messages
to request or in response to the performance of said
predetermined peer-level facility related functions of
another operating system peer-level facility; and
b) a communications bus that provides for the
interconnection of said plurality of processor units, said
communications bus transferring said control messages
between the multi-tasking interface functions of said
predetermined set of operating system peer-level facilities.
2. The computer system of claim 1 wherein a first one of
said predetermined set of operating system peer-level
facilities includes a network communications facility and a
second one includes a filesystem facility.
3. The computer system of claim 2 wherein said network
communications facility is coupled to a network to permit
the receipt of network requests, said network communications
facility providing for the identification of a predetermined
filesystem type network request, said multi-tasking
interface function of said network communications facility
being responsive to said predetermined filesystem type
network request to provide a predetermined control message
to said filesystem facility to request the performance of a
predetermined filesystem function.
4. The computer system of claim 3 further comprising a data
store that provides for the storage of data, said
predetermined filesystem type network request directing said
network communications facility to transfer predetermined
data with respect to said network, said data store being
coupled to said network communications facility for storing
said predetermined data.
5. The computer system of claim 3 or 4 wherein said
predetermined set of peer-level facilities further includes
a storage facility and wherein said filesystem facility
provides for the performance of said predetermined
filesystem function, said multi-tasking interface function
of said filesystem facility being responsive to said
filesystem facility to provide control messages to said
storage facility to request the performance of a
predetermined storage access function.
6. The computer system of claim 5 wherein said predetermined
storage access function directs said storage facility to
transfer said predetermined data, said data store being
coupled to said storage facility for storing said
predetermined data.
7. A computer system implementing a co-operative facility
based operating system architecture, said computer system
comprising:
a) a plurality of processors, each being coupled to a
respective control program store and a respective data
store, said plurality of processors being interconnected by
a communications bus; and
b) a multiple facility operating system having a kernel
and providing for the message based co-operative operation
of said plurality of processors, said multiple facility
operating system providing for the operating system internal
execution of a plurality of operating system peer-level
facilities by execution of each of said peer-level
facilities by a respective different one of said plurality
of processors, each of said peer-level facilities
constituting a respective software entity executed
separately from said kernel, wherein each of said plurality
of facilities implements a multi-tasking interface
coupleable between said communications bus and a respective
and unique peer-level control function set to permit message
transfer between each of said plurality of facilities.
8. The computer system of claim 7 wherein said plurality of
facilities includes a network facility and a filesystem
facility, wherein said network facility includes a
communications network peer-level control function coupled
between a first multi-tasking interface and a network
interface and said filesystem facility includes a data
storage peer-level control function coupled between a second
multi-tasking interface and a filesystem.
9. The computer system of claim 8 wherein said network
facility is coupled through said network interface to a
communications network, wherein said network facility is
responsive to a predetermined network filesystem message
received via said network interface to provide a
predetermined filesystem message, and wherein said
filesystem facility is responsive to said predetermined
filesystem message to transfer data with respect to said
filesystem.
10. The computer system of claim 9 further comprising a
common data store, said network facility providing for the
transfer of data between said network interface and said
data store, said filesystem facility providing for the
transfer of data between said data store and said
filesystem, said communications network peer-level control
function directing a message to said filesystem peer-level
control function identifying a predetermined location of
data in said data store with respect to said predetermined
filesystem message.
11. A computer system employing a multiple facility
operating system to provide for co-operative operation of a
plurality of processors,
wherein said operating system includes a kernel and a
plurality of additional component facilities executed
separately from said kernel, each of said component
facilities including a facility sub-component, that defines
the execution operation of a one of said component
facilities, coupled to a multi-tasking interface sub-
component,
wherein said computer system comprises:
a) a plurality of processors executing said operating
system, each of said processors including local memory for
the storage and execution of a respective component
facility;
b) a data memory accessible by each of said processors
for the storage and retrieval of data blocks exchangeable
between said processors; and
c) a communications bus coupling said processors and
said data memory to permit the exchange of control messages
between said processors and data through said data memory,
and wherein said processors each implement a respective
different local sub-set of fewer than all of said component
facilities that depends through the exchange of control
messages on the execution of another sub-set of said
componentized facilities by another of said processors to
co-operatively implement said operating system.
12. The computer system of claim 11 wherein control messages
communicate any of a facility sub-component function
request, a facility sub-component function response, and a
facility sub-component identifier of a memory space within
said data memory to use in connection with said sub-
component function request.
13. The computer system of claim 12 wherein said plurality
of component facilities includes a network facility and a
filesystem facility, wherein a network facility sub-
component is executed by a first processor to process
network requests and data transfers and a filesystem
facility sub-component is executed by a second processor to
process filesystem requests and data transfers derivative of
said network requests and data transfers.
14. The computer system of claim 1, wherein one of the
processor units in said plurality of processor units is
provided further to execute a further operating system peer-
level facility not in said predetermined set of operating
system peer-level facilities.
15. The computer system of claim 7, wherein said multiple
facility operating system provides further for the operating
system internal execution of a further operating system
peer-level facility not in said plurality of operating
system peer-level facilities, by execution of said further
peer level facility by one of the processors in said
plurality of processors.
16. The computer system of claim 7, wherein said kernel is a
Unix kernel.
17. The computer system of claim 11, wherein said kernel is
a Unix kernel.

Description

Note: Descriptions are shown in the official language in which they were submitted.


MULTIPLE SOFTWARE-FACILITY COMPONENT OPERATING SYSTEM FOR
CO-OPERATIVE PROCESSOR CONTROL WITHIN A MULTIPROCESSOR
COMPUTER SYSTEM
CROSS-REFERENCE TO RELATED APPLICATIONS
The present application is related to the following
U.S. Patents:
1. PARALLEL I/O NETWORK FILE SERVER ARCHITECTURE,
inventors: John Row, Larry Boucher, William Pitts, and
Stephen Blightman, now U.S. Patent Nos. 5,163,131 and
5,355,453;
2. ENHANCED VMEBUS PROTOCOL UTILIZING
PSEUDOSYNCHRONOUS HANDSHAKING AND BLOCK MODE DATA
TRANSFER invented by William Pitts, Stephen Blightman and
Daryl D. Starr, now U.S. Patent No. 5,388,231;
3. HIGHSPEED, FLEXIBLE SERVICE/DISTRIBUTION DATA
BURST DIRECT MEMORY ACCESS CONTROLLER, invented by Daryl
D. Starr, Stephen Blightman, and Larry Boucher, now U.S.
Patent No. 5,175,825.
The above applications are all assigned to the
assignee of the present invention.
Field of the Invention:
The present invention is generally related to
operating system software architectures and, in
particular, to a multi-processor operating system
architecture based on multiple independent multi-
tasking process kernels.
Background of the Invention:
The desire to improve productivity, in
circumstances involving computers, is often realized by
an improvement in computing throughput. Conventional
file servers are recognized as being a limiting factor
in the potential productivity associated with their
client workstations.
A file server is typically a conventional computer
system coupled through a communications network, such
as Ethernet, to client workstations and potentially
other workstation file servers. The file server
operates to provide a common resource base to its
clients. The primary resource is typically the central
storage and management of data files, but additional
services including single point execution of certain
types of programs, electronic mail delivery and gateway
connection to other file servers and services are
generally also provided.
The client workstations may utilize any of a
number of communication network protocols to interact
with the file server. Perhaps the most commonly
known, if not most widely used, protocol suite is
TCP/IP. This protocol suite and its supporting utility
programs, provide for the creation of logical
communication channels between multiple client
workstations and a file server. These communication
channels are generally optimized for point-to-point
file transfers, i.e., multi-user file access control or
activity administration is not provided. In addition,
the supporting utility programs for these protocols
impose a significant degree of user interaction in
order to initiate file transfers as well as the entire
responsibility to manage the files once transferred.
Recently, a number of network connected remote
file system mechanisms have been developed to provide
clients with a single consistent view of a file system
of data files, even though portions of the file system
may be physically distributed between a client's own
local storage, one or more file servers or even other
client workstations. These network file system
mechanisms operate to hide the distinction between
local data files and data files in the remotely
distributed portions of the file system accessible only
through the network. The advantages of such file
system mechanisms include retention of multi-user
access controls over the data files physically present
on the server, to the extent intrinsically provided by
a server, and a substantial simplification of a client
workstation's view and productive utilization of the
file system.
Two implementations of a network file system
mechanism are known as the network file system (NFS),
available from Sun Microsystems, Inc., and the remote
file sharing (RFS) system available from American
Telephone and Telegraph, Inc.
The immediate consequence of network file system
mechanisms is that they have served to substantially
increase the throughput requirements of the file server
itself, as well as that of the communications network.
Thus, the number of client workstations that can be
served by a single file server must be balanced against
the reduction in productivity resulting from increased
file access response time and the potentially broader
effects of a degradation in communication efficiency
due to the network operating at or above its service
maximum.
An increase in the number of client workstations
is conventionally handled by the addition of another
file server, duplicating or possibly partitioning the
file system between the file servers, and providing a
dedicated high bandwidth network connection between
the file servers. Thus, another consequence of the
limited throughput of conventional file servers is a
greater cost and configuration complexity of the file
server base in relation to the number of client
workstations that can be effectively serviced.
Another complicating factor, for many technical
and practical reasons, is a requirement that the file
server be capable of executing the same or a similar
operating system as the attached client workstations.
The reasons include the need to execute maintenance
and monitoring programs on the file server, and to
execute programs, such as database servers, that would
excessively load the communications network if executed
remotely from the required file data. Another often
overlooked consideration is the need to avoid the cost
of supporting an operating system that is unique to the
file server.
Given these considerations, the file server is
typically only a conventional general purpose computer
with an extended data storage capacity and
communications network interface that is little
different from that present on each of the client
workstations. Indeed, many file servers are no more
than a physically repackaged workstation.
Unfortunately, even with multiple communications
network interfaces, such workstation-based computers
are, from a cost/performance viewpoint, either
incapable of or ill-suited to acting as a single file
server to a large group of client workstations.
The throughput offered by conventional general
purpose computers, considered in terms of their
sustained file system facility data transfer bandwidth
potential, is limited by a number of factors, though
primarily due to the general purpose nature of their
design. Computer system design is necessarily
dependent on the level and nature of the operating
system to be executed, the nature of the application
load to be executed, and the degree of homogeneity of
applications. For example, a computer system utilized
solely for scientific computations may forego an
operating system entirely, may be restricted to a
single user at a time, and employ specialized
computation hardware optimized for the anticipated
highly homogeneous applications. Conversely, where an
operating system is required, the system design
typically calls for the utilization of dedicated
peripheral controllers, operated under the control of a
single processor executing the operating system, in an
effort to reduce the peripheral control processing
overhead of the system's single primary processor.
Such is the design of most conventional file servers.
A recurring theme in the design of general
purpose computer systems is to increase the number of
active primary processors. In the simplest analysis,
a linear improvement in the throughput performance of
the computer system might be expected. However,
utilization of increasing numbers of primary
processors is typically thwarted by the greater growth
of control overhead and contention for common
peripheral resources. Indeed, the net improvement in
throughput is often seen to increase slightly before
declining rapidly as the number of processors is
increased.
Summary of the Invention:
Therefore, a general purpose of the present
invention is to provide an operating system
architecture for the control of a multi-processor
system to provide an efficient, expandable computer
system for servicing network file system requests.
This is achieved in a computer system employing a
multiple facility operating system architecture. The
computer system includes a plurality of processor units
for implementing a predetermined set of peer-level
facilities, wherein each peer-level facility implements
a plurality of related functions, and a communications
bus for interconnecting the processor units. Each of
the processor units includes a central processor and a
stored program that, upon execution, provides for the
implementation of a predetermined peer-level facility
and for implementing a multi-tasking interface
function. The multi-tasking interface function is
responsive to control messages for selecting for
execution functions of the predetermined peer-level
facility. The multi-tasking interface function is also
responsive to the predetermined peer-level facility for
providing control messages to request or to respond to
the performance of functions of another peer-level
facility of the computer system. The multi-tasking
interface functions of each of the plurality of
processor units communicate among one another via the
network bus.
Thus, in a preferred embodiment of the present
invention, the set of peer-level facilities includes
network communications, file system control, storage
control and a local host operating system.
An advantage of the present invention is that it
provides for the implementation of multiple facilities,
each instance on a respective processor, all within a
single cohesive system while incurring little
additional control overhead in order to maintain
operational coherency.
Another advantage of the present invention is that
direct peer to peer-level facility communication is
supported in order to minimize overhead in processing
network file system requests.
A further advantage of the present invention is
that it realizes a computer system software
architecture that is readily expandable to include
multiple instances of each peer-level facility, and
respective peer-level processors, in a single cohesive
operating system environment including direct peer to
peer-level facility communications between like
facilities.
Yet another advantage of the present invention is
that it may include an operating system as a facility
operating concurrently and without conflict with the
otherwise independent peer to peer-level facility
communications of the other peer-level facilities. The
operating system peer-level facility may itself be a
conventional operating system suitably compatible with
the workstation operating systems so as to maintain
compatibility with "standard" file server operating
systems. The operating system peer-level facility may
be used to handle exception conditions from the other
peer-level facilities including handling of non-network
file system requests. Consequently, the multiple
facility operating system architecture of the present
invention appears to client workstations as a
conventional, single processor file server.
A still further advantage of the present
invention is that it provides a message-based operating
system architecture framework for the support of
multiple, specialized peer-level facilities within a
single cohesive computer system; a capability
particularly adaptable for implementation of a high-
performance, high-throughput file server.
Brief Description of the Drawings
These and other attendant advantages and features
of the present invention will become apparent and
readily appreciated as the same becomes better
understood by reference to the following detailed
description when considered in conjunction with the
accompanying drawings, in which like reference numerals
indicate like parts throughout the figures thereof, and
wherein:
Fig. 1 is a simplified block diagram of a
preferred computer system architecture for implementing
the multiple facility operating system architecture of
the present invention;
Fig. 2 is a block diagram of a network
communications processor suitable for implementing a
network communications peer-level facility in
accordance with a preferred embodiment of the present
invention;
Fig. 3 is a block diagram of a file system
processor suitable for implementing a file system
controller peer-level facility in accordance with a
preferred embodiment of the present invention;
Fig. 4 is a block diagram of a storage processor
suitable for implementing a storage peer-level facility
in accordance with a preferred embodiment of the
present invention;
Fig. 5 is a simplified block diagram of a primary
memory array suitable for use as a shared memory store
in a preferred embodiment of the present invention;
Fig. 6 is a block diagram of the multiple facility
operating system architecture configured in accordance
with a preferred embodiment of the present invention;
Fig. 7 is a representation of a message descriptor
passed between peer-level facilities to identify the
location of a message;
Fig. 8 is a representation of a peer-level
facility message as used in a preferred embodiment of
the present invention;
Fig. 9 is a simplified representation of a
conventional program function call;
Fig. 10 is a simplified representation of an
inter-facility function call in accordance with the
preferred embodiment of the present invention;
Fig. 11 is a control state diagram illustrating
the interface functions of two peer-level facilities in
accordance with a preferred embodiment of the present
invention;
Fig. 12 is an illustration of a data flow for an
LFS read request through the peer-level facilities of a
preferred embodiment of the present invention;
Fig. 13 is an illustration of a data flow for an
LFS write request through the peer-level facilities of
a preferred embodiment of the present invention;
Fig. 14 illustrates the data flow of a non-LFS
data packet between the network communication and
local host peer-level facilities in accordance with a
preferred embodiment of the present invention; and
Fig. 15 illustrates the data flow of a data packet
routed between two network communications peer-level
facilities in accordance with a preferred embodiment of
the present invention.
Detailed Description of the Invention:
While the present invention is broadly applicable
to a wide variety of hardware architectures, and its
software architecture may be represented and
implemented in a variety of specific manners, the
present invention may be best understood from an
understanding of its preferred embodiment.
I. System Architecture Overview
A. Hardware Architecture Overview
A block diagram representing the preferred
embodiment of the hardware support for the present
invention, generally indicated by the reference
numeral 10, is provided in Fig. 1. The architecture of
the preferred hardware system 10 is described in the
above-identified U.S. Patent No. 5,163,131, entitled PARALLEL
I/O NETWORK FILE SERVER ARCHITECTURE.
The hardware components of the system 10 include
multiple instances of network controllers 12, file
system controllers 14, and mass storage processors 16,
interconnected by a high-bandwidth backplane bus 22.
Each of these controllers 12, 14, 16 preferably includes
a high performance processor and local program store,
thereby minimizing their need to access the bus 22.
Rather, bus 22 accesses by the controllers 12, 14, 16
are substantially limited to transfer accesses as
required to transfer control information and client
workstation data between the controllers 12, 14, 16,
system memory 18, and a local host processor 20, when
necessary.
The illustrated preferred system 10 configuration
includes four network controllers 12(1-4), two file
controllers 14(1-2), two mass storage processors 16(1-2), a
bank of four system memory cards 18(1-4), and a host
processor 20 coupled to the backplane bus 22. The
invention, however, is not limited to this number and
type of processors. Rather, six or more network
communications processors 12 and two or more host
processors 20 could be implemented within the scope of
the present invention.
Each network communications processor (NP) 12(1-4)
preferably includes a Motorola 68020 (trade-mark) processor for
supporting two independent Ethernet network
connections, shown as the network pairs 26(1-4). Each
of the network connections directly supports the ten
megabit per second data rate specified for a
conventional individual Ethernet network connection.
The preferred hardware embodiment of the present
invention thus realizes a combined maximum data
throughput potential of 80 megabits per second.
The file system processors (FP) 14(1-2), intended to
operate primarily as specialized compute engines,
each include a high-performance Motorola 68020 based
microprocessor, four megabytes of local data store and
a smaller quarter-megabyte high-speed program memory
store.
The storage processors (SP) 16(1-2) function as
intelligent small computer system interface (SCSI)
controllers. Each includes a Motorola 68020
microprocessor, a local program and data memory, and an
array of ten parallel SCSI channels. Drive arrays
24(1-2) are coupled to the storage processors 16(1-2) to
provide mass storage. Preferably, the drive arrays
24(1-2) are ten-unit-wide arrays of SCSI storage devices
uniformly from one to three units deep. The preferred
embodiment of the present invention uses conventional
768 megabyte 5¼-inch hard disk drives for each unit of
the arrays 24(1-2). Thus, each drive array level
achieves a storage capacity of approximately 6
gigabytes, with each storage processor readily
supporting a total of 18 gigabytes. Consequently, a
system 10 is capable of realizing a total combined data
storage capacity of 36 gigabytes.
The local host processor 20, in the preferred
embodiments of the present invention, is a Sun central
processor card, model Sun 3E120 (trade-mark), manufactured and
distributed by Sun Microsystems, Inc.
Finally, the system memory cards 18 each provide
48 megabytes of 32-bit memory for shared use within the
computer system 10. The memory is logically visible to
each of the processors of the system 10.
A VME bus 22 is used in the preferred embodiments
of the present invention to interconnect the network
communication processors 12, file system processors 14,
storage processors 16, primary memory 18, and host
processor 20. The hardware control logic for
controlling the VME bus 22, at least as implemented on
the network communication processor 12 and storage
processor 16, implements a bus master fast transfer
protocol in addition to the conventional VME transfer
protocols. The system memory 18 correspondingly
implements a modified slave VME bus control logic to
allow the system memory 18 to also act as the fast data
transfer data source or destination for the network
communication processors 12 and storage processors 16.
It should be understood that, while the system 10
configuration represents the initially preferred
maximum hardware configuration, the present invention
is not limited to the preferred number or type of
controllers, the preferred size and type of disk
drives or use of the preferred fast data transfer VME
protocol.
B. Software Architecture Overview
Although applicable to a wide variety of primary,
or full function, operating systems such as MVS and
VMS, the preferred embodiment of the present invention
is premised on the Unix operating system as distributed
under license by American Telephone and Telegraph, Inc.
and specifically the SunOS version of the Unix
operating system, as available from Sun Microsystems,
Inc. The architecture of the Unix operating system has
been the subject of substantial academic study and
many published works including "The Design of the Unix
Operating System", Maurice J. Bach, Prentice Hall,
Inc., 1986.
In brief, the Unix operating system is organized
around a non-preemptive, multi-tasking, multi-user
kernel that implements a simple file-oriented
conceptual model of a file system. Central to the
model is a virtual file system (VFS) interface that
operates to provide a uniform file oriented, multiple
file system environment for both local and remote
files.
Connected to the virtual file system is the Unix
file system (UFS). The UFS allows physical devices,
pseudo-devices and other logical devices to appear and
be treated, from a client's perspective, as simple
files within the file system model. The UFS interfaces
to the VFS to receive and respond to file oriented
requests such as to obtain the attributes of a file,
the stored parameters of a physical or logical device,
and, of course, to read and write data. In carrying
out these functions, the UFS interacts with a low level
software device driver that is directly responsible for
an attached physical mass storage device. The UFS
handles all operations necessary to resolve logical
file oriented operations, as passed from the VFS, down
to the level of a logical disk sector read or write
request.
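To make the layering just described concrete, the following C fragment sketches the idea of a per-file-system operations table through which a VFS-style layer calls into a concrete file system, which in turn resolves the request to sector-level driver calls. The structure and function names (vnode_ops, ufs_read, disk_read_sectors) and the toy offset-to-sector mapping are illustrative assumptions only; they are not taken from the patent or from any particular Unix release.

    #include <stdio.h>
    #include <stddef.h>

    /* Hypothetical sketch: a VFS-style operations table filled in by a
     * concrete file system ("ufs" here).  The VFS calls through the table;
     * the file system resolves the request down to logical sector reads. */
    struct vnode_ops {
        int (*read)(long inode, long offset, void *buf, size_t len);
    };

    /* Stand-in for the low-level mass storage device driver. */
    static int disk_read_sectors(long sector, void *buf, size_t nsectors)
    {
        (void)buf;
        printf("driver: read %zu sector(s) starting at sector %ld\n",
               nsectors, sector);
        return 0;
    }

    /* UFS-level read: translate a file offset into a logical sector. */
    static int ufs_read(long inode, long offset, void *buf, size_t len)
    {
        long sector = inode * 2048 + offset / 512;      /* toy mapping */
        return disk_read_sectors(sector, buf, (len + 511) / 512);
    }

    static const struct vnode_ops ufs_ops = { ufs_read };

    int main(void)
    {
        char buf[1024];
        /* A VFS layer would select the ops table for the file's file system. */
        return ufs_ops.read(7, 4096, buf, sizeof buf);
    }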
The VFS, in order to integrate access to remote
files into the file system model, provides a connection
point for network communications through the network
file system mechanism, if available. The preferred
network file system mechanism, NFS, is itself premised
on the existence of a series of communication protocol
layers that, inclusive of NFS and within the context of
the present invention, can be referred to as an NFS
stack. These layers, in addition to an NFS "layer,"
typically include a series of protocol handling layers
generally consistent with the International Standards
Organization's Open Systems Interconnection (ISO/OSI)
model. The OSI model has been the subject of many
publications, both regarding the conceptual aspects of
the model as well as specific implementations,
including "Computer Networks, 2nd Edition", Andrew S.
Tanenbaum, Prentice Hall, 1988.
In summary, the OSI layers utilized by the present
invention include all seven layers described in the OSI
reference model: application, presentation, session,
transport, network, data link and physical layers.
These layers are summarized below, in terms of their
general purpose, function and implementation for
purposes of the present invention.
The application layer protocol, NFS, provides a
set of remote procedure call definitions, for use in
both server and client oriented contexts, to provide
network file services. As such, the NFS layer
provides a link between the VFS of the Unix kernel and
the presentation protocol layer.
The presentation layer protocol, provided as an
external data representation (XDR) layer, defines a
common description and encoding of data as necessary to
allow transfer of data between different computer
architectures. The XDR is thus responsible for syntax
and semantic translation between the data
representations of heterogeneous computer systems.
The session layer protocol, implemented as a
remote procedure call (RPC) layer, provides a remote
procedure call capability between a client process and
a server process. In a conventional file server, the
NFS layer connects through the XDR layer to the RPC
layer in a server context to support the file oriented
data transfers and related requests of a network
client.
The transport layer protocol, typically
implemented as either a user datagram protocol (UDP)
or transmission control protocol (TCP) layer, provides
for a simple connectionless datagram delivery service.
NFS uses UDP.
The network layer protocol, implemented as an
Internet protocol (IP) layer, performs Internet
routing, based on address mappings stored in an IP
routing database, and data packet fragmentation and
reassembly.
The data link (DL) layer manages the transfer and
receipt of data packets based on packet frame
information. Often this layer is referred to as a
device driver, since it contains the low level software
control interface to the specific communications
hardware, including program control of low level data
transmission error correction/handling and data flow
control. As such, it presents a hardware independent
interface to the IP layer.
Finally, the physical layer, an Ethernet
controller, provides a hardware interface to the
network physical transmission medium.
The conventional NFS stack, as implemented for
the uniprocessor VAX architecture, is available in
source code form under license from Sun Microsystems,
Inc.
The preferred embodiment of the present invention
utilizes the conventional SunOS Unix kernel, the
Sun/VAX reference release of the UFS, and the Sun/VAX
reference release of the NFS stack as its operating
system platform. The present invention establishes an
instantiation of the NFS stack as an independent, i.e.,
separately executed, software entity separate from the
Unix kernel. Instantiations of the UFS and the mass
storage device driver are also established as
respective independent software entities, again
separate from the Unix kernel. These entities, or
peer-level facilities, are each provided with an
interface that supports direct communication between
one another. This interface, or messaging kernel
layer, includes a message passing, multi-tasking
kernel. The messaging kernel layers are tailored to
each type of peer-level facility in order to support
the specific facility's functions. The provision for
multi-tasking operation allows the peer-level
facilities to manage multiple concurrent processes.
Messages are directed to other peer-level facilities
based upon the nature of the function requested. Thus,
for NFS file system requests, request messages may be
passed from an NFS network communications peer-level
facility directly to a UFS file system peer-level
facility and, as necessary, then to the mass storage
peer-level facility. The relevant data path is between
the NFS network communications peer-level facility and
the mass storage peer-level facility by way of the VME
shared address space primary memory. Consequently, the
number of peer-level facilities is not logically
bounded and servicing of the most common type of client
workstation file system needs is satisfied while
requiring only a minimum amount of processing.
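As a reading aid only, the C sketch below illustrates how a request can be steered from one peer-level facility to another on the basis of the function requested, for example an NFS read passed from the network communications facility to the file system facility. The facility identifiers, message layout and send_message() routine are assumptions invented for the example; the patent does not publish this interface.

    #include <stdio.h>

    /* Hypothetical facility identifiers and peer-level function codes. */
    enum facility { FAC_NC, FAC_FS, FAC_S, FAC_HOST };
    enum fs_op    { FS_READ, FS_WRITE, FS_GETATTR };

    struct message {
        enum facility src;       /* facility that sourced the request  */
        int           src_pid;   /* process to unblock with the reply  */
        enum fs_op    op;        /* requested peer-level function      */
        long          handle;    /* e.g. a file handle                 */
    };

    /* Stand-in for the messaging kernel transport (the VME bus above). */
    static void send_message(enum facility dest, const struct message *m)
    {
        printf("to facility %d: op %d from facility %d, pid %d\n",
               dest, m->op, m->src, m->src_pid);
    }

    int main(void)
    {
        /* An NC process has decoded an NFS read; it asks the FS facility
         * to perform the file system work and would block for the reply. */
        struct message req = { FAC_NC, 42, FS_READ, 1001 };
        send_message(FAC_FS, &req);
        return 0;
    }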
Finally, a Unix kernel, including its own NFS
stack, UFS, and mass storage device driver, is
established as another peer-level facility. As with
the other peer-level facilities, this operating system
facility is provided with a multi-tasking interface for
interacting concurrently with the other peer-level
facilities as just another entity within the system 10.
While the operating system kernel peer-level facility
is not involved in the immediate servicing of most NFS
requests, it interacts with the NFS stack peer-level
facility to perform general management of the ARP and
IP data bases, the initial NFS file system access
requests from a client workstation, and to handle any
non-NFS type requests that might be received by the NFS
stack peer-level facility.
II. Peer-Level Processors
A. Network Control Processor
A block diagram of the preferred network control
processor is shown in Fig. 2. The network controller
12 includes a 32-bit central processing unit (CPU) 30
coupled to a local CPU bus 32 that includes address,
control and data lines. The CPU is preferably a
Motorola 68020 processor. The data line portion of the
CPU bus 32 is 32 bits wide. All of the elements
coupled to the local bus 32 of the network controller
12 are memory mapped from the perspective of the CPU
30. This is enabled by a buffer 34 that connects the
local bus 32 to a boot PROM 38. The boot PROM 38 is
utilized to store a boot program and its necessary
start-up and operating parameters. Another buffer 40
allows the CPU 30 to separately address a pair of
Ethernet local area network (LAN) controllers 42, 44,
their local data packet memories 46, 48, and their
associated packet direct memory access (DMA)
controllers 50, 52, via two parallel address, control,
and 16-bit wide data buses 54, 56. The LAN controllers
42, 44 are programmed by the CPU 30 to utilize their
respective local buffer memories 46, 48 for the
storage and retrieval of data packets as transferred
via the Ethernet connections 26. The DMA controllers
50, 52 are programmed by the CPU 30 to transfer data
packets between the buffer memories 46, 48 and a
respective pair of multiplexing FIFOs 58, 60 also
connected to the LAN buses 54, 56. The multiplexing
FIFOs 58, 60 each include a 16-bit to 32-bit wide data
multiplexer/demultiplexer, coupled to the data portion
of the LAN buses 54, 56, and a pair of internal FIFO
buffers. Thus, for example in the preferred embodiment
of the present invention, a first 32-bit wide internal
FIFO is coupled through the multiplexer to the 16-bit
wide LAN bus 54. The second internal FIFO, also 32-bit
wide, is coupled to a secondary data bus 62. These
internal FIFO buffers of the multiplexing FIFO 58, as
well as those of the multiplexing FIFO 60, may be
swapped between their logical connections to the LAN
buses 54, 56 and the secondary data bus 62. Thus, a
large difference in the data transfer rate of the LAN
buses 54, 56 and the secondary data bus 62 can be
maintained for a burst data length equal to the depth
of the internal FIFOs 58, 60.
A high speed DMA controller 64, controlled by the
CPU 30, is provided to direct the operation of the
multiplexing FIFOs 58, 60 as well as an enhanced VME
control logic block 66, through which the data provided
on the secondary data bus 62 is communicated to the
data lines of the VME bus 22. The purpose of the
multiplexing FIFOs 58, 60, besides acting as a 16-bit
to 32-bit multiplexer and buffer, is to ultimately
support the data transfer rate of the fast transfer
mode of the enhanced VME control logic block 66.
Also connected to the local CPU data bus 32 is a
quarter megabyte block of local shared memory 68, a
buffer 70, and a third multiplexing FIFO 74. The
memory 68 is shared in the sense that it also appears
within the memory address space of the enhanced VME bus
22 by way of the enhanced VME control logic block 66
and buffer 70. The buffer 70 preferably provides a
bidirectional data path for transferring data between
the secondary data bus 62 and the local CPU bus 32 and
also includes a status register array for receiving and
storing status words either from the CPU 30 or from the
enhanced VME bus 22. The multiplexing FIFO 74,
identical to the multiplexing FIFOs 58, 60, provides a
higher speed, block-oriented data transfer capability
for the CPU 30.
Finally, a message descriptor FIFO 72 is connected
between the secondary data bus 62 and the local CPU bus
32. Preferably, the message descriptor FIFO 72 is
addressed from the enhanced VME bus 22 as a single
shared memory location for the receipt of message
descriptors. Preferably the message descriptor FIFO 72
is a 32-bit wide, single buffer FIFO with a 256-word
storage capability. In accordance with the preferred
embodiments of the present invention, the message
descriptor FIFO is described in detail in the above-
referenced related U.S. Patent No. 5,388,231; ENHANCED VMEBUS PROTOCOL
UTILIZING PSEUDOSYNCHRONOUS HANDSHAKING AND BLOCK MODE DATA TRANSFER.
However, for purposes of completeness, an enhancement
embodied in the enhanced VME control logic block 66 is
that it preemptively allows writes to the message
descriptor FIFO 72 from the enhanced VME bus 22 unless
the FIFO 72 is full. Where a write to the message
descriptor FIFO 72 cannot be accepted, the enhanced VME
control logic block 66 immediately declines the write
by issuing a VME bus error signal onto the enhanced
VME bus 22.
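The behaviour just described for the enhanced VME control logic, accepting a message descriptor write whenever the FIFO has room and otherwise refusing it so that the sender retries, can be mimicked in ordinary C as below. The fifo_post() routine and its boolean result are purely illustrative stand-ins; on the actual hardware the rejection takes the form of a VME bus error signal, not a function return value.

    #include <stdio.h>
    #include <stdbool.h>

    #define FIFO_DEPTH 256                 /* 256-word descriptor FIFO */

    struct desc_fifo {
        unsigned long word[FIFO_DEPTH];
        int head, tail, count;
    };

    /* Returns true if the write was accepted; false stands in for the
     * VME bus error issued when the FIFO is full. */
    static bool fifo_post(struct desc_fifo *f, unsigned long descriptor)
    {
        if (f->count == FIFO_DEPTH)
            return false;                  /* "bus error": retry later */
        f->word[f->tail] = descriptor;
        f->tail = (f->tail + 1) % FIFO_DEPTH;
        f->count++;
        return true;
    }

    int main(void)
    {
        struct desc_fifo fifo = { {0}, 0, 0, 0 };
        unsigned long descriptor = 0x00fff000;  /* shared-memory address of a message */
        while (!fifo_post(&fifo, descriptor))
            ;                              /* sender retries until accepted */
        printf("descriptor 0x%lx queued\n", descriptor);
        return 0;
    }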
B. File System Control Processor
The preferred architecture of a file system
processor 14 is shown in Fig. 3. A CPU 80,
preferably a Motorola 68020 processor, is connected via
a local CPU address, control and 32-bit wide data bus
82 to the various elements of the file controller 14.
These principal elements include a 256 kilobyte
static RAM block 84, used for storing the file system
control program, and a four megabyte dynamic RAM block
86 for storing local data, both connected directly to
the local CPU bus 82. A buffer 88 couples the local
CPU bus 82 to a secondary 32-bit wide data bus 90 that
is, in turn, coupled through an enhanced VME control
and logic block 92 to the data bus lines of the VME bus
22. In addition to providing status register array
storage, the buffer 88 allows the memory blocks 84, 86
to be accessible as local shared memory on the VME bus
22. A second buffer 94 is provided to logically
position a boot PROM 96, containing the file controller
initialization program, within the memory address map
of the CPU 80. Finally, a single-buffer message
descriptor FIFO 98 is provided between the secondary
data bus 90 and the local CPU bus 82. The message
descriptor FIFO 98 is again provided to allow
preemptive writes to the file controller 14 from the
enhanced VME bus 22.
C. Storage Control Processor
A block diagram of a storage processor 16 is
provided in Fig. 4. A CPU 100, preferably a Motorola
68020 processor, is coupled through a local CPU
address, control and 32-bit wide data bus 102 and a
buffer 104 to obtain access to a boot PROM 106 and a
double-buffered multiplexing FIFO 108 that is, in turn,
connected to an internal peripheral data bus 110. The
internal peripheral data bus 110 is, in turn, coupled
through a parallel channel array of double-buffered
multiplexing FIFOs 112(1-10) and SCSI channel
controllers 114(1-10). The SCSI controllers 114(1-10)
support the respective SCSI buses (SCSI0 - SCSI9) that
connect to a drive array 24.
Control over the operation of the double-buffered
FIFO 112(1-10) and SCSI controller 114(1-10) arrays is
ultimately by the CPU 100 via a memory-mapped buffer
116 and a first port of a dual ported SRAM command
block 118. The second port of the SRAM block 118 is
coupled to a DMA controller 120 that controls the low
level transfer of data between the double-buffered
FIFOs 108, 112(1-10), a temporary store buffer memory 122
and the enhanced VME bus 22. In accordance with a
preferred embodiment of the present invention, the DMA
controller responds to commands posted by the CPU 100
in the dual-ported SRAM block 118 to select any of the
double-buffered FIFOs 108, 112(1-10), the buffer memory
122, and the enhanced VME bus 22 as a source or
destination of a data block transfer. To accomplish
this, the DMA controller 120 is coupled through a
control bus 124 to the double-buffered FIFOs 108,
112(1-10), the SCSI controllers 114(1-10), the buffer memory
122, a pair of secondary data bus buffers 126, 128, and
an enhanced VME control and logic block 132. The
buffers 126, 128 are used to route data by selectively
coupling the internal peripheral data bus 110 to a
secondary data bus 130 and the buffer memory 122. The
DMA controller 120, as implemented in accordance with
a preferred embodiment of the present invention, is
described in detail in the above-referenced related
U.S. Patent No. 5,175,825, HIGHSPEED, FLEXIBLE SERVICE/
DISTRIBUTION DATA BURST DIRECT MEMORY ACCESS CONTROLLER. A
local shared memory block 134, a high speed buffer and
register array 136, and a preemptive write message
descriptor FIFO 138 are provided, connected directly to
the local CPU data bus 102. The buffer 136 is also
coupled to the secondary data bus 130, while the
message descriptor FIFO 138 is coupled to the secondary
data bus 130.
D. Primary Memory Array
Fig. 5 provides a simplified block diagram of the
preferred architecture of a memory card 18. Each
memory card 18 operates as a slave on the enhanced VME
bus and therefore requires no on-board CPU. Rather, a
timing control block 150 is sufficient to provide the
necessary slave control operations. In particular, the
timing control block 150, in response to control
signals from the control portion of the enhanced VME
bus 22 enables a 32-bit wide buffer 152 for an
appropriate direction transfer of 32-bit data between
the enhanced VME bus 22 and a multiplexer unit 154.
The multiplexer 154 provides a multiplexing and
demultiplexing function, depending on data transfer
direction, for a six megabit by seventy-two bit word
memory array 156. An error correction code (ECC)
generation and testing unit 158 is coupled to the
multiplexer 154 to generate or verify, again depending
on transfer direction, eight bits of ECC data per
memory array word. The status of each ECC verification
operation is provided back to the timing control block
150.
E. Host Processor
The host processor 20, as shown in Fig. 1, is a
conventional Sun 3E120 processor. Due to the
conventional design of this product, a software
emulation of a message descriptor FIFO is performed in
a reserved portion of the local host processor's shared
memory space. This software message descriptor FIFO is
intended to provide the functionality of the message
descriptor FIFOs 72, 98, and 138. A preferred
embodiment of the present invention includes a local
host processor 20', not shown, that includes a hardware
preemptive write message descriptor FIFO, but that is
otherwise functionally equivalent to the processor 20.
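Since the stock host board has no hardware message descriptor FIFO, the emulation mentioned above amounts to a small ring buffer kept in a reserved region of the host's shared memory: other processors write descriptors into it and the host consumes them. The C sketch below illustrates that arrangement; the structure, sizes and polling loop are assumptions made only for illustration.

    #include <stdio.h>

    #define SW_FIFO_SLOTS 64

    /* Stand-in for a reserved area of the host's shared memory, visible on
     * the VME bus.  Writers advance 'put'; the host advances 'get'. */
    struct sw_fifo {
        volatile unsigned long slot[SW_FIFO_SLOTS];
        volatile int put;
        volatile int get;
    };

    static int sw_fifo_put(struct sw_fifo *f, unsigned long d)
    {
        int next = (f->put + 1) % SW_FIFO_SLOTS;
        if (next == f->get)
            return -1;                      /* full: writer must retry */
        f->slot[f->put] = d;
        f->put = next;
        return 0;
    }

    static int sw_fifo_get(struct sw_fifo *f, unsigned long *d)
    {
        if (f->get == f->put)
            return -1;                      /* empty */
        *d = f->slot[f->get];
        f->get = (f->get + 1) % SW_FIFO_SLOTS;
        return 0;
    }

    int main(void)
    {
        static struct sw_fifo host_fifo;    /* reserved shared-memory region */
        unsigned long d;
        sw_fifo_put(&host_fifo, 0x00aa0000);
        while (sw_fifo_get(&host_fifo, &d) == 0)   /* host polling loop */
            printf("host received descriptor 0x%lx\n", d);
        return 0;
    }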
III. Peer-Level Facility Architecture
A. Peer-Level Facility Functions
Fig. 6 provides an illustration of the multiple
peer-level facility architecture of the present
invention. However, only single instantiations of the
preferred set of the peer-level facilities are shown
for purposes of clarity.
The peer-level facilities include the network
communications facility (NC) 162, file system facility
(FS) 164, storage facility (S) 166 and host facility
(H) 168. For completeness, the memory 18 is
illustrated as a logical resource 18' and, similarly,
the disk array 24 as a resource 24'.
The network communications facility 162 includes a
messaging kernel layer 178 and an NFS stack. The
messaging kernel layer 178 includes a multi-tasking
kernel that supports multiple processes. Logically
concurrent executions of the code making up the NFS
stack are supported by reference to the process
context in which execution by the peer-level processor
is performed. Each process is uniquely identified by a
process ID (PID). Context execution switches by the
peer-level processor are controlled by a process
scheduler embedded in the facility's multi-tasking
kernel. A process may be "active" -- at a minimum,
where process execution by the peer-level processor
continues until a resource or condition required for
continued execution is unavailable. A process is
"blocked" when waiting for notice of availability of
such resource or condition. For the network
communications facility 162, within the general context
of the present invention, the primary source of process
blocking is in the network and lower layers where an NC
process will wait, executing briefly upon receipt of
each of a series of packet frames, until sufficient
packet frames are received to be assembled into a
complete datagram transferrable to a higher level
layer. At the opposite extreme, an NC process will
block upon requesting a file system or local host
function to be performed, i.e., any function controlled
or implemented by another peer-level facility.
The messaging kernel layer 178, like all of the
messaging kernel layers of the present invention,
allocates processes to handle respective communication
transactions. In allocating a process, the messaging
kernel layer 178 transfers a previously blocked
process, from a queue of such processes, to a queue of
active processes scheduled for execution by the multi-
tasking kernel. At the conclusion of a communication
transaction, a process is deallocated by returning the
process to the queue of blocked processes.
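The allocation scheme described in this paragraph, taking an idle process off a queue of blocked processes for the duration of one communication transaction and returning it afterwards, can be pictured with the small C sketch below. The process structure and queue routines are invented for the illustration and are not the messaging kernel's actual code.

    #include <stdio.h>
    #include <stddef.h>

    struct proc {
        int          pid;
        struct proc *next;
    };

    static struct proc *blocked_q, *active_q;

    static void push(struct proc **q, struct proc *p) { p->next = *q; *q = p; }

    static struct proc *pop(struct proc **q)
    {
        struct proc *p = *q;
        if (p)
            *q = p->next;
        return p;
    }

    static void remove_from(struct proc **q, struct proc *p)
    {
        while (*q && *q != p)
            q = &(*q)->next;
        if (*q)
            *q = p->next;
    }

    /* Allocate a process to handle one communication transaction. */
    static struct proc *allocate_process(void)
    {
        struct proc *p = pop(&blocked_q);
        if (p)
            push(&active_q, p);  /* now schedulable by the multi-tasking kernel */
        return p;
    }

    /* Deallocate it when the transaction completes. */
    static void deallocate_process(struct proc *p)
    {
        remove_from(&active_q, p);
        push(&blocked_q, p);
    }

    int main(void)
    {
        static struct proc pool[3] = { {1, NULL}, {2, NULL}, {3, NULL} };
        for (int i = 0; i < 3; i++)
            push(&blocked_q, &pool[i]);

        struct proc *worker = allocate_process();
        printf("handling transaction with process %d\n", worker->pid);
        deallocate_process(worker);
        return 0;
    }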
As a new communication transaction is initiated,
an address or process ID of an allocated process
becomes the distinguishing datum by which the
subsequent transactions are correlated to the
relevant, i.e., proper handling, process. For example,
where a client workstation initiates a new
communication transaction, it provides its Ethernet
address. The network communication facility will
store and subsequently, in responding to the request,
utilize the client's Ethernet address to direct the
response back to the specific requesting client.
The NC facility similarly provides a unique
facility ID and the PID of its relevant process to
another peer-level facility as part of any request
necessary to complete a client's request. Thus, an NC
facility process may block with certainty that the
responding peer-level facility can direct its response
back to the relevant process of the network
communications peer-level facility.
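To make the request/reply correlation concrete: the requesting facility tags its message with its own facility identifier and the PID of the process that blocks awaiting the answer, and the reply carries the same pair back so the messaging kernel layer can wake exactly that process. The C fragment below sketches that lookup; the names and the process table are hypothetical and used only for illustration.

    #include <stdio.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct reply {
        int dest_facility;    /* facility that issued the original request */
        int dest_pid;         /* process, within that facility, to unblock */
        int status;
    };

    struct proc_slot { int pid; bool blocked; };

    /* Table of this facility's processes (layout invented for the sketch). */
    static struct proc_slot procs[] = { {11, true}, {12, false}, {13, true} };

    /* Called by the messaging kernel layer when a reply message arrives. */
    static void deliver_reply(const struct reply *r)
    {
        for (size_t i = 0; i < sizeof procs / sizeof procs[0]; i++) {
            if (procs[i].pid == r->dest_pid && procs[i].blocked) {
                procs[i].blocked = false;        /* wake the waiting process */
                printf("pid %d unblocked, status %d\n", r->dest_pid, r->status);
                return;
            }
        }
        printf("no blocked process with pid %d\n", r->dest_pid);
    }

    int main(void)
    {
        struct reply r = { 0 /* NC facility */, 13, 0 /* OK */ };
        deliver_reply(&r);
        return 0;
    }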
The network and lower level layers of the NFS
stack necessary to support the logical Ethernet
connections 26' are generally illustrated together as
an IP layer 172 and data link layer 170. The IP layer
172, coupled to the IP route database 174, is used to
initially distinguish between NFS and non-NFS client
requests. NFS requests are communicated to an NFS
server 176 that includes the remaining layers of the
NFS stack. The NFS server 176, in turn, communicates
NFS requests to the network communications messaging
kernel layer 178. By the nature of the call, the
messaging kernel layer 178 is able to discern between
NFS request calls, non-NFS calls from the IP layer 172
and network calls received directly from the network
layers 170.
For the specific instance of NFS requests, making
up the large majority of requests handled by the
network communications facility 162, the relevant NC
process calls the messaging kernel layer 178 to issue
a corresponding message to the messaging kernel layer
180 of the file system facility 164. The relevant NC
process is blocked pending a reply message and,
possibly, a data transfer. That is, when the messaging
kernel layer 178 receives the NFS request call, a
specific inter-facility message is prepared and passed
to the messaging kernel layer 180 with sufficient
information to identify the request and the facility
that sourced the request. As illustrated, messages are
exchanged between the various messaging kernel layers
of the system 160. However, the messages are in fact
transferred physically via the enhanced VME bus
connecting the peer-level processors upon which the
specific peer-level facilities are executing. The
physical to logical relationship of peer-level
facilities to peer-level processors is established upon
the initialization of the system 160 by providing each
of the messaging kernel layers with the relevant
message descriptor FIFO addresses of the peer-level
processors.
In response to a message received, the messaging
kernel layer 180 allocates a FS process within its
multi-tasking environment to handle the communication
transaction. This active FS process is used to call,
carrying with it the received message contents, a local
file system (LFS) server 182. This LFS server 182 is,
in essence, an unmodified instantiation 184 of the
UFS. Calls, in turn, issued by this UFS 182,
ultimately intended for a device driver of a mass
storage device, are directed back to the messaging
kernel layer 180. The messaging kernel layer
distinguishes such device driver related functions
being requested by the nature of the function call.
The messaging kernel layer 180 blocks the relevant FS
process while another inter-processor message is
prepared and passed to a messaging kernel layer 186 of
the storage facility 166.
Since the storage facility 166 is also required to
track many requests at any one time, a single manager
process is used to receive messages. For throughput
efficiency, this S manager process responds to FIFO
interrupts, indicating that a corresponding message
descriptor has just been written to the SP FIFO, and
immediately initiates the SP processor operation
necessary to respond to the request. Thus, the
currently preferred S facility handles messages at
interrupt time and not in the context of separately
allocated processes. However, the messaging kernel
layer 186 could alternately allocate an S worker
process to service each received message request.

The message provided from the file system facility
164 includes the necessary information to specify the
particular function required of the storage facility in
order to satisfy the request. Within the context of
the allocated active S process, the messaging kernel
layer 186 calls the function of the device driver 188
that corresponds to the request.
Depending on the availability and nature of the
resource requested, the device driver 188 will, for
example, direct the requested data to be retrieved from
the disk array resource 24'. As data is returned via
the device driver layer 188, the relevant S process of
the messaging kernel layer 186 directs the transfer of
the data into the memory resource 18'.
In accordance with the preferred embodiments of
the present invention, the substantial bulk of the
memory resource 18' is managed as an exclusive resource
of the file system facility 164. Thus, for messages
requesting the transfer of data to or from the disk
array 24', the file system facility 164 provides an
appropriate shared memory address referencing a
suitably allocated portion of the memory resource 18'.
Thus, as data is retrieved from the disk array 24', the
relevant S process of the messaging kernel layer 186
will direct the transfer of data from the device driver
layer 188 to the message designated location within the

memory resource 18', as illustrated by the data path
190.
Once the data transfer is complete, the relevant S
process "returns" to the messaging kernel layer 186 and
a reply message is prepared and issued by the messaging
kernel layer 186 to the messaging kernel layer 180.
The relevant S process may then be deallocated by the
messaging kernel layer 186.
In response to this reply message, the messaging
kernel layer 180 unblocks its relevant FS process,
i.e., the process that requested the S facility data
transfer. This, in turn, results in the relevant FS
process executing the UFS 182 and eventually issuing a
return to the messaging kernel layer 180 indicating
that the requested function has been completed. In
response, the messaging kernel layer 180 prepares and
issues a reply message on behalf of the relevant FS
process to the messaging kernel layer 178; this message
will include the shared memory address of the requested
data as stored within the memory resource 18'.
The messaging kernel layer 178 responds to the
reply message from the file system facility 164 by
unblocking the relevant NC process. Within that NC
process's context, the messaging kernel layer 178
performs a return to the NFS server 176 with the shared
memory address. The messaging kernel layer 178

transfers the data from the memory resource 18' via the
indicated data path 192 to local stored memory for use
by the NFS server layer 176. The data may then be
processed through the NFS server layer 176, IP layer
172 and the network and lower layers 170 into packets
for provision onto the network 26' and directed to the
originally requesting client workstation.
Similarly, where data is received via the network
layer 170 as part of an NFS write transfer, the data is
buffered and processed through the NFS server layer
176. When complete, a call by the NFS server 176 to
the messaging kernel layer 178 results in the first
message of an inter-facility communication transaction
being issued to the file system facility 164. The
messaging kernel layer 180, on assigning an FS process
to handle the request message, replies to the relevant
NC process of the messaging kernel layer 178 with an
inter-facility message containing a shared memory
address within the memory resource 18'. The NFS data
is then transferred from local shared memory via the
data path 192 by the messaging kernel 178. When this
data transfer is complete, another inter-facility
message is passed to the relevant FS process of the
messaging kernel layer 180. That process is then
unblocked and processes the data transfer request
through the LFS/UFS 182. The UFS 182, in turn,

initiates, as needed, inter-facility communication
transactions through the messaging kernel layer 180 to
prepare for and ultimately transfer the data from the
memory resource 18' via the data path 190 and device
driver 188 to the disk array resource 24'.
The host operating system facility 168 is a
substantially complete implementation of the SunOS
operating system including a TCP/IP and NFS stack. A
messaging kernel layer 194, not unlike the messaging
kernel layers 178, 180, 186 is provided to logically
integrate the host facility 168 into the system 160.
The operating system kernel portion of the facility 168
includes the VFS 196 and a standard instantiation of
the UFS 198. The UFS 198 is, in turn, coupled to a
mass storage device driver 200 that, in normal
operation, provides for the support of UFS 198 requests
by calling the messaging kernel layer 194 to issue
inter-facility messages to the storage facility 166.
Thus, the storage facility 166 does not functionally
differentiate between the local host facility 168 and
the file system facility 164 except during the initial
phase of bootup. Rather, both generally appear as
unique but otherwise undifferentiated logical clients
of the storage facility 166.
Also interfaced to the VFS 196 is a conventional
client instantiation of an NFS layer 202. That is,

the NFS layer 202 is oriented as a client for
processing client requests directed to another file
server connected through a network communications
facility. These requests are handled via a TCP/UDP
layer 204 of a largely conventional instantiation of
the Sun NFS client stack. Connected to the layer 204
are the IP and data link layers 206. The IP and data
link layers 206 are modified to communicate directly
with the messaging kernel layer 194. Messages from the
messaging kernel layer 194, initiated in response to
calls directly from the data link layer 206 are
logically directed by the messaging kernel 178 directly
to the data link layer 170 of a network communications
facility. Similarly, calls from the IP layer 172,
recognized as not NFS requests of a local file system,
are passed through the messaging kernel layers 178 and
194 directly to the TCP/UDP layers 204. In accordance
with the preferred embodiments of the present
invention, the responses by the host facility 168 in
such circumstances are processed back through the
entire host TCP/IP stack 214, 204, 206, the messaging
kernel layers 194, 178, and finally the data link layer
170 of an NC facility 162.
Ancillary to the IP and data link layers 206, a
route database 208 is maintained under the control and
direction of a conventional "routed" daemon

application. This, and related daemons such as the
"mountd", execute in the application program layer as
background processes. In order to maintain coherency
between the route database 208 and the route database
174 present in the network communications facility 162,
a system call layer 212, provided as the interface
between the application program layer and the kernel
functions of the host facility 168, is modified in
accordance with the present invention. The
modification provides for the issuance of a message
containing any update information directed to the route
database 208, from the daemons, to be provided by an
inter-facility communication transaction from the
messaging kernel layer 194 to the messaging kernel
layer 178. Upon receipt of such a message, the
messaging kernel layer 178 directs an appropriate
update to the route database 174.
The system call layer 212 also provides for access
to the TCP/UDP layers via a conventional interface
layer 214 known as sockets. Low level application
programs may use the system call layer 212 to directly
access the data storage system by calling directly on
the device driver 200. The system call layer also
interfaces with the VFS 196 for access to or by the NFS
client 202 and the UFS 198.

In addition, as provided by the preferred
embodiments of the present invention, the VFS 196 also
interfaces to a local file system (LFS) client layer
216. The conventional VFS 196 implements a "mount"
model for handling the logical relation between and
access to multiple file systems. By this model a file
system is mounted with respect to a specific file
system layer that interfaces with the VFS 196. The
file system is assigned a file system ID (FSID). File
operations subsequently requested of the VFS 196 with
regard to a FSID identified file system will be
directed to the appropriate file system.
In accordance with the present invention, the LFS
client layer 216 is utilized in the logical mounting of
file systems mounted through the file system facility
164. That is, the host facility's file oriented
requests presented to the VFS 196 are routed, based on
their FSID, through the LFS client layer 216 to the
messaging kernel layer 194, and, in turn, to the
messaging kernel layer 180 of the file system facility
164 for servicing by the UFS 182. The model is
extended for handling network file system requests. A
client workstation may then issue a mount request for
a file system previously exported through the VFS 196.
The mount request is forwarded by a network
communications facility 162 ultimately to the mountd

daemon running in the application layer 210 of the host
facility 168. The mountd daemon's response in turn
provides the client with the FSID of the file system if
the export is successful. Thereafter, the client's NFS
file system requests received by the network
communications facility 162 will be redirected, based
on the FSID provided with the request, to the
appropriate file system facility 164 that has mounted
the requested file system.
Consequently, once a file system is mounted by the
UFS 182 and exported via the network communications and
host facilities 162, 168, file oriented NFS requests
for that file system need not be passed to or
processed by the host facility 168. Rather, such NFS
requests are expediently routed directly to the
appropriate file system facility 164.
The primary benefits of the present invention
should now be apparent. In addition to allowing
multiple, independent instantiations of the network
communication, file system, storage and host
facilities 162, 164, 166, 168, the immediate
requirements for all NFS requests may be serviced
without involving the substantial performance overhead
of the VFS 196 and higher level portions of the
conventional Unix operating system kernel.

Finally, another aspect of the host facility 168
is the provision for direct access to the messaging
kernel layer 194 or via the system call layer 212 as
appropriate, by maintenance application programs when
executed within the application program layer 210.
These maintenance programs may be utilized to collect
performance data from status accumulation data
structures maintained by the messaging kernel layer 194
and, by utilizing corresponding inter-facility
messages, the accumulated status information from
status data structures in the messaging kernel layers
178, 180 and 186.
B. Messaging Kernel Layer Function
The messaging kernel layers 178, 180, 186 and 194
each include a small, efficient multi-tasking kernel.
As such, each provides only fundamental operating system
kernel services. These services include simple
lightweight process scheduling, message passing and
memory allocation. A library of standard functions and
processes provide services such as sleep(), wakeup(),
error logging, and real time clocks in a manner
substantially similar to those functions of a
conventional Unix kernel.
The list below summarizes the primary function

primitives of the multi-tasking kernel provided in each
of the messaging kernel layers 178, 180, 186 and 194.
k_register(name)     Registers the current process as a
                     provider of a named service.
k_resolve(name)      Returns the process ID for a named
                     service.
k_send(msg,pid)      Sends a message to a specified
                     process and blocks until the
                     message is returned.
k_reply(msg)         Returns a received message to its
                     sender.
k_null_reply(msg)    Returns an unmodified message to
                     the sender. (Faster than
                     k_reply(msg) because the message
                     need not be copied back.)
k_receive()          Blocks until a message is sent to
                     this process.
The balance of the messaging kernel layers 178,
180, 186 and 194 is made up of routines that
presumptively implement, at least from the perspective
of the balance of the facility, the functions that a
given facility might request of another. These
routines are premised on the function primitives
provided by the multi-tasking kernel to provide the
specific interface functions necessary to support the
NFS stack, UFS, storage device driver, or host
operating system. Since such routines do not actually
perform the functions for which they are called, they
may be referred to as "stub routines".
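Purely by way of illustration, the following fragment sketches how such a stub routine might use the multi-tasking kernel primitives to convert a local function call into an inter-facility message. It assumes the fc_getattr transaction and the FS_STD_T and K_MSG message formats shown later in this section, and an FC_FILE_T reply carrying errno and vattr fields as summarized in Table 4; the helper k_alloc_msg(), the manager PID parameter, and the FC_GETATTR request constant are illustrative assumptions rather than a definitive implementation.

    /* Hypothetical stub routine in a sending messaging kernel layer.
     * It formats the caller's data into a message, sends the message
     * to the manager process of the FS facility, blocks in k_send()
     * until the reply is received, and returns the reply data to the
     * calling routine in the relevant process context. */
    int stub_fc_getattr( PID fc_manager_pid, FC_CRED cred, FC_FH file,
                         FC_VATTR *vattr )
    {
        FS_STD_T *msg = (FS_STD_T *)k_alloc_msg();  /* message buffer */

        msg->type    = FC_GETATTR;    /* hypothetical request code constant */
        msg->cred    = cred;          /* access credentials                 */
        msg->file    = file;          /* file handle of interest            */
        msg->un.mask = FC_ATTR_ALL;   /* request all attributes             */

        /* k_send() blocks the calling process until the reply message
         * is returned by the responding facility. */
        msg = (FS_STD_T *)k_send( (K_MSG *)msg, fc_manager_pid );

        if ( msg->errno == 0 )
            *vattr = ((FC_FILE_T *)msg)->vattr;  /* assumed reply layout */
        return msg->errno;
    }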

C. Inter-Facility Communication (IFC) System
Communication of information between the peer-
level facilities is performed as a series of
communication transactions. A transaction, defined as
a request message and a reply message, occurs between a
pair of messaging kernel layers, though others may
"listen" in order to gather performance data or perform
diagnostics. A single transaction may be suspended,
i.e., the reply message held, while the receiving
messaging kernel layer initiates a separate
communication transaction with another peer-level
facility. Once the reply message of the second
transaction is received, a proper reply to the
initial communication transaction can then be made.
1. Message Descriptors and Messages
The transfer of a message between sending and
receiving messaging kernel layers is, in turn,
generally a two step process. The first step is for
the sending messaging kernel layer to write a message
descriptor to the receiving messaging kernel layer.
This is accomplished by the message descriptor being
written to the descriptor FIFO of the receiving peer-
level processor.

The second step is for the message, as identified
by the message descriptor, to be copied, either
actually or implicitly, from the sending messaging
kernel layer to the receiving messaging kernel layer.
This copy, when actually performed, is a memory to
memory copy from the shared memory space of the sending
peer-level processor to that of the receiving peer-
level processor. Depending on the nature of the
communication transaction, the message copy will be
actually performed by the sending or receiving peer-
level processor, or implicitly by reference to the
image of the original message kept by the messaging
kernel layer that initiated a particular communication
transaction.
The message identified by a message descriptor is
evaluated by the receiving messaging kernel layer to
determine what is to be done with the message. A
message descriptor as used by a preferred embodiment
of the present invention is shown in Fig. 7. The
message descriptor is, in essence, a single 32-bit word
partitioned into two fields. The least significant
field is used to store a descriptor modifier, while the
high order 30-bit field provides a shared memory
address to a message. The preferred values of the
modifier field are given in Table 1.

Table 1
Message Modifiers
Modifier Meaning
0 Pointer to a message being sent
1 Pointer to a reply message
2 Pointer to message to be forwarded
3 Pointer to message
acknowledging a forwarded message
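As a concrete but purely illustrative sketch of this layout, the following fragment shows one way the 32-bit descriptor could be packed and unpacked, assuming the two-bit modifier of Table 1 occupies the least significant bits and the word-aligned shared memory address occupies the high order 30 bits; the type and helper names are assumptions, not part of the described embodiment.

    /* Illustrative packing of a message descriptor: the low-order two
     * bits hold the modifier of Table 1 and the remaining 30 bits hold
     * the shared memory (VME) address of the message. */
    typedef unsigned long MSG_DESC;     /* treated as a 32-bit word */

    #define MD_MOD_MASK 0x3UL           /* modifier field mask */

    static MSG_DESC md_pack( unsigned long vme_addr, unsigned int mod )
    {
        /* The message address is assumed to be at least word aligned,
         * leaving its two low-order bits free to carry the modifier. */
        return ( vme_addr & ~MD_MOD_MASK ) | ( mod & MD_MOD_MASK );
    }

    static unsigned int  md_modifier( MSG_DESC md ) { return md & MD_MOD_MASK; }
    static unsigned long md_address( MSG_DESC md )  { return md & ~MD_MOD_MASK; }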
For request messages that are being sent, the
receiving messaging kernel layer performs the message
copy. For a message that is a reply to a prior
message, the sending messaging kernel layer is
effectively told whether a message copy must be
performed. That is, where the contents of a message
have not been changed by the receiving messaging kernel
layer, an implicit copy may be performed by replying
with a message descriptor that points to the
original message image within the sending messaging
kernel layer's local shared memory space. Similarly,
for forwarding type communication transactions, the
receiving messaging kernel layer performs the copy. A
message forwarding transaction is completed when an
acknowledgement message is provided. The purpose of
the acknowledgement is to notify the sending messaging
kernel layer that it can return the referenced
message buffer to its free buffer pool.
The preferred block format of a message is
illustrated in Fig. 8. The message is a single data

structure defined to occupy 128 bytes. The initial
32-bit word of the message encodes the message type and
a unique peer-level facility identifier. The text of
the message then follows with any necessary fill to
reach a current maximum text limit. In the preferred
embodiment of the present invention, the text length is
84 bytes. An inter-facility communication (IFC)
control data block is provided, again followed by any
necessary fill characters needed to complete the 128-
byte long message. This IFC control data preferably
includes a copy of the address of the original message,
the relevant sending and receiving (destination)
process identifiers associated with the current
message, and any queue links required to manage the
structure while in memory.
An exemplary message structure is provided in
Table 2.

Table 2
Exemplary Message Structure
typedef struct m16_msg {
    K_MSGTYPE type;              /* request code */
    char msg[84];
    vme_t addr;                  /* shared memory address of
                                    the original message */
    PID m16_sender_pid;          /* PID of last sender. */
    PID m16_forward_pid;         /* PID of last forwarder. */
    PID m16_dest_pid;            /* PID of dest. process. */
    /* Following value is LOCAL and need
       not be transferred. */
    struct m16_msg *m16_link;    /* message queue link */
} K_MSG;
This structure (K_MSG) includes the message type
field (K_MSGTYPE), the message text (msg[]), and the
IFC block (addr, m16_sender_pid, m16_forward_pid,
m16_dest_pid, and m16_link). This K_MSG structure is
used to encapsulate specific messages, such as
exemplified by a file system facility message structure
(FS_STD_T) shown in Table 3.

Table 3
Exemplary Specific Message Structure
typedef struct {
    K_MSGTYPE type;
    long      errno;
    FC_CRED   cred;       /* Access credentials */
    FC_FH     file;       /* File handle */
    union {
        FSID  fsid;       /* For fc_get_server. */
        long  mode;       /* {READ,WRITE,EXEC} for fc_access. */
        PID   pid;        /* FS facility server pid */
        long  mask;       /* Mask attributes. */
    } un;
} FS_STD_T;
The FS_STD_T structure is overlaid onto a K_MSG
structure with byte zero of both structures aligned.
This composite message structure is created as part of
the formatting of a message prior to being sent. Other
message structures, appropriate for particular message
circumstances, may be used. However, all are
consistent with the use of the K_MSG message and block
format described above.
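The overlay can be pictured, as a purely illustrative sketch, as a union in which the generic and specific views share the same 128-byte message buffer with byte zero aligned; the union name below is hypothetical and does not appear in the message definitions above.

    /* Illustrative view of the overlay: the same buffer may be read
     * either as the generic K_MSG format or as the specific FS_STD_T
     * format, since byte zero of both structures is aligned. */
    typedef union {
        K_MSG    generic;    /* generic 128-byte message format     */
        FS_STD_T fs_std;     /* FS facility specific message format */
    } IFC_MSG_BUFFER;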
2. IFC Message Generation
The determination to send a message, and the
nature of the message, is determined by the peer-level
facilities. In particular, when a process executing on
a peer-level processor requires the support of another
peer-level facility, such as to store or retrieve data
or to handle some condition that it alone cannot

service, the peer-level facility issues a message
requesting the required function or support. This
message, in accordance with the present invention, is
generally initiated in response to the same function
call that the facility would make in a uniprocessor
configuration of the prior art. That is, in a
conventional single processor software system,
execution of a desired function may be achieved by
calling an appropriate routine, that, in turn,
determines and calls its own service routines. This is
illustrated in Fig. 9. A function call to a routine A,
illustrated by the arrow 300, may select and call 302 a
routine B. As may be necessary to carry out its
function, the routine B may call 304 still further
routines. Ultimately, any functions called by the
routine B return to the function B which returns to
the function A. The function A then itself returns
with the requested function call having been completed.
In accordance with the present invention, the
various messaging kernel layers have been provided to
allow the independent peer-level facilities to be
executed on respective processors. This is generally
illustrated in Fig. 10 by the inclusion of the
functions A' and B' representing the messaging kernel
layers of two peer-level facilities. A function call
302 from the routine A is made to the messaging kernel

A'. Although A' does not implement the specific
function called, a stub routine is provided to allow
the messaging kernel layer A' to implicitly identify
the function requested by the routine A and to receive any
associated function call data; the data being needed by
the routine B to actually carry out the requested
function. The messaging kernel layer A' prepares a
message containing the call data and sends a message
descriptor 306 to the appropriate messaging kernel
layer B'. Assuming that the message is initiating a
new communication transaction, the messaging kernel
layer B' copies the message to its own shared memory.
Based on the message type, the messaging kernel B'
identifies the specific function routine B that needs
to be called. Utilizing one of its own stub routines,
a call containing the data transferred by the message
is then made to the routine B. When routine B returns
to the stub process from which it was called, the
messaging kernel layer B' will prepare an appropriate
reply message to the messaging kernel layer A'. The
routine B return may reference data, such as the status
of the returning function, that must also be
transferred to the messaging kernel layer A'. This
data is copied into the message before the message is
copied back to the shared memory space of the A' peer-
level processor. The message copy is made to the

shared memory location where the original message was
stored on the A' peer-level processor. Thus, the image
of the original message is logically updated, yet
without requiring interaction between the two messaging
kernel layers to identify a destination storage
location for the reply message. A "reply" message
descriptor pointing to the message is then sent to the
messaging kernel layer A'.
The messaging kernel layer A', upon successive
evaluation of the message descriptor and the message
type field of the message, is able to identify the
particular process that resulted in the reply message
now received. That is, the process ID as provided in
the original message sent and now returned in the reply
message, is read. The messaging kernel layer A' is
therefore able to return with any applicable reply
message data to the calling routine A in the relevant
process context.
A more robust illustration of the relation between
two messaging kernel layers is provided in Fig. 11. A
first messaging kernel layer 310 may, for example,
represent the messaging kernel layer 178 of the
network communications peer-level facility 162. In
such case, the series of stub routines A1-X include a
complete NFS stack interface as well as an interface to
every other function of the network communications

facility that either can directly call or be called by
the messaging kernel layer 178. Consequently, each
call to the messaging kernel layer is uniquely
identifiable, both in type of function requested as
well as the context of the process that makes the call.
Where the messaging kernel layer calls a function
implemented by the NFS stack of its network
communications facility, a process is allocated to
allow the call to operate in a unique context. Thus,
the call to or by a stub routine is identifiable by the
process ID, PID, of the calling or responding process,
respectively.
The calling process to any of the stub routines
A1-X, upon making the call, begins executing in the
messaging kernel layer. This execution services the
call by receiving the function call data and preparing
a corresponding message. This is shown, for purposes
of illustrating the logical process, as handled by the
logical call format bubbles A1-X. A message buffer is
allocated and attached to a message queue. Depending
on the particular stub routine called, the contents of
the message may contain different data defined by
different specific message data structures. That is,
each message is formatted by the appropriate call
format bubble A1-X, using the function call data and
the PID of the calling process.

The message is then logically passed to an A
message state machine for sending. The A message state
machine initiates a message transfer by first issuing a
message descriptor identifying the location of the
message and indicating, for example, that it is a new
message being sent.
The destination of the message descriptor is the
shared memory address of the message descriptor FIFO as
present on the intended destination peer-level
processor. The specific message descriptor FIFO is
effectively selected based on the stub routine called
and the data provided with the call. That is, for
example, the messaging kernel layer 178 correlates the
FSID provided with the call to the particular file
system facility 164 that has mounted that particular
file system. If the messaging kernel layer 178 is
unable to correlate a FSID with a file system facility
164, as a consequence of a failure to export or mount
the file system, the NFS request is returned to the
client with an error.
Once the message descriptor is passed to the
messaging kernel layer 312 of an appropriate peer-level
facility, the multi-tasking kernel of the messaging
kernel layer 310 blocks the sending process until a
reply message has been received. Meanwhile, the
multi-tasking kernel of the layer 310 continues to

handle incoming messages, initiated by reading message
descriptors from its descriptor FIFO, and requests for
messages to be sent based on calls received through the
stub routines A1-X.
The messaging kernel layer 312 is similar to the
messaging kernel layer 310, though the implementation
of the layer specifically with regard to its call
format, return format, and stub routines B1-X differ
from their A layer counterparts. Where, for example,
the messaging kernel layer 312 is the messaging kernel
layer 180 of the file system facility 164, the stub
routines B1-X match the functions of the UFS 182 and
device driver 188 that may be directly called in
response to a message from another facility or that
may receive a function call intended for another
facility. Accordingly, the preparation and handling of
messages, as represented by the B message parser, call
format and return format bubbles, will be tailored to
the file system facility. Beyond this difference, the
messaging kernel layers 310, 312 are identical.
The B message state machine implemented by the
multi-tasking kernel of the messaging kernel layer 312
receives a message descriptor as a consequence of the
peer-level processor reading the message descriptor
from its message descriptor FIFO. Where the message
descriptor is initiating a new message transaction,

i.e., the message modifier is zero or two, the B
message state machine undertakes to copy the message
pointed to by the message descriptor into a newly
allocated message buffer in the local shared memory of
its peer-level processor. If the message modifier
indicates that the message is a reply to an existing
message transaction, then the B message state machine
assumes that the message has already been copied to the
previously allocated buffer identified by the message
descriptor. Finally, if the message descriptor
modifier indicates that the message pointed to by the
message descriptor is to be freed, the B message state machine
returns it to the B multi-tasking kernel's free
message buffer pool.
Received messages are initially examined to
determine their message type. This step is illustrated
by the B message parser bubble. Based on message type,
a corresponding data structure is selected by which the
message can be properly read. The process ID of the
relevant servicing destination process is also read
from the message and a context switch is made. The
detailed reading of the message is illustrated as a
series of return format bubbles B1-X. Upon reading the
message, the messaging kernel layer 312 selects a stub
routine, appropriate to carry out the function
requested by the received message and performs a

function call through the stub routine. Also, in
making the function call, the data contained by the
message is formatted as appropriate for transfer to the
called routine.
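To make the receive-side flow concrete, the following fragment sketches, under stated assumptions, a manager process loop built from the k_receive(), k_reply() and k_null_reply() primitives; the dispatch_to_worker() helper and the particular message type constants are illustrative assumptions rather than the actual facility code.

    /* Hypothetical manager process loop in a receiving messaging
     * kernel layer.  It blocks in k_receive() until a message is sent
     * to it, selects handling by message type, and hands substantive
     * requests to an allocated worker process, which calls the
     * facility function through a stub routine and issues the reply. */
    void manager_process( void )
    {
        for (;;) {
            K_MSG *msg = k_receive();       /* blocks awaiting a message */

            switch ( msg->type ) {
            case FC_NULL:
                k_reply( msg );             /* message returned as received */
                break;
            case FC_NULL_NULL:
                k_null_reply( msg );        /* unmodified message, faster */
                break;
            default:
                dispatch_to_worker( msg );  /* worker replies when done */
                break;
            }
        }
    }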
3. IFC Communication Transactions
Figure 12 illustrates an exemplary series of
communication transactions that are used for a network
communications facility or a local host facility to
obtain known data from the disk array 24 of the present
invention. Similar series of communication
transactions are used to read directory and other disk
management data from the disk array. For clarity, the
transfer of messages is referenced to time, though
time is not to scale. Also for purposes of clarity, a
pseudo-representation of the message structures is
referenced in describing the various aspects of
preparing messages.
a. LFS Read Transaction
At a time t2, an NFS read request is received by
the messaging kernel layer 178 of the network
communications facility 162 from an executing (sending)
process (PID=A$$). Alternately, the read request at t2
could be from a host process issuing an equivalent LFS
read request. In either case, a corresponding LFS

message (message#1) is prepared
(message#1.msg_type=fc_read; message#1.sender_pid=A$$;
message#1.dest_pid=B$$).
The destination process (PID=BS$) is known to the
messaging kernel layer 178 or 194 as the "manager"
process of the file system facility that has mounted
the file system identified by the FSID provided with
the read request. The association of an FSID with a
particular FS facility's PID is a product of the
initialization of all of the messaging kernel layers.
In general, at least one "manager" process is
created during initialization of each messaging kernel
layer. These "manager" processes, directly or
indirectly, register with a "name server manager"
process (SC_NAME_SERVER) running on the host facility.
Subsequently, other "manager" processes can query the
"name server manager" to obtain the PID of another
"manager" process. For indirect relations, the
supervising "manager" process, itself registered with
the "name server manager" process, can be queried for
the PIDs of the "manager" processes that it supervises.
For example, a single named "file system
administrator" (FC_VICE_PRES) process is utilized to
supervise the potentially multiple FS facilities in the
system 160. The FC_VICE_PRES process is registered
directly with the "name server manager"

(SC_NAME_SERVER) process. The "manager" processes of
the respective FS facilities register with the "file
system administrator" (FC_VICE_PRES) process -- and
thus are indirectly known to the "name server manager"
(SC_NAME_SERVER). The individual FS "manager"
processes register with the given FSIDs of their
mounted file systems. Thus, the "name server manager"
(SC_NAME_SERVER) can be queried by an NC facility for
the PID of the named "file system administrator"
(FC_VICE_PRES). The NC facility can then query for
the PID of the unnamed "manager" process that controls
access to the file system identified by a FSID.
The function of a non-supervising "manager"
process is to be the known destination of a message.
Thus, such a "manager" process initially handles the
messages received in a communication transaction. Each
message is assigned to an appropriate local worker
process for handling. Consequently, the various
facilities need know only the PID of the "manager"
process of another facility, not the PID of the worker
process, in order to send a request message.
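The following fragment is a purely illustrative sketch of how this registration and lookup might be expressed with the k_register() and k_resolve() primitives; the service name string and the fc_find_manager_query() helper are assumptions standing in for the fc_find_manager communication transaction summarized later in Table 4.

    /* Hypothetical use of the name service.  The "file system
     * administrator" registers itself by name at start-up; an NC
     * facility later resolves that name and asks the administrator
     * for the PID of the unnamed manager that has mounted the file
     * system identified by a given FSID. */
    void fc_vice_pres_startup( void )
    {
        k_register( "FC_VICE_PRES" );    /* provider of a named service */
        /* ... create the unnamed manager processes, then serve
         *     fc_find_manager requests ... */
    }

    PID nc_find_fs_manager( FSID fsid )
    {
        PID vice_pres = k_resolve( "FC_VICE_PRES" );   /* supervisor PID */

        /* Stand-in for the fc_find_manager communication transaction,
         * which returns the PID of the responsible manager process. */
        return fc_find_manager_query( vice_pres, fsid );
    }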
At t3, a corresponding message descriptor
(md#1vme_addr; mod=0), shown as a dashed arrow, is sent
to the FS's messaging kernel layer 180.
At t4, the FS messaging kernel layer 180 copies
down the message (message#1), shown as a solid arrow,

for evaluation, allocates a worker process to handle
the request and, in the context of the worker process,
calls the requested function of its UFS 182. If the
required data is already present in the memory resource
18', no communication transaction with the S messaging
kernel layer 186 is required, and the FS messaging
kernel layer 180 continues immediately at t14.
However, if a disk read is required, the messaging
kernel layer 180 is directed by the UFS 182 to initiate
another communications transaction to request retrieval
of the data by the storage facility 166. That is, the
UFS 182 calls a storage device driver stub routine of
the messaging kernel layer 180. A message (message#2),
including a vector address referencing a buffer
location in the memory resource 18'
(message#2.msg_type=sp_read; message#2.vme_addr=xxxxh;
message#2.sender_pid=B$$; message#2.dest_pid=C$$), is
prepared. At t5, a corresponding message descriptor is
sent (md#2vme_addr; mod=0) to the S messaging kernel
layer 186.
At t6, the S messaging kernel layer 186 copies
down the message (message#2) for evaluation, allocates
a worker process to handle the request and calls the
requested function of its device driver 188 in the
context of the worker process. Between t7 and t11, the
requested data is transferred to the message specified

location (message#2.vme_addr=xxxxh) in the memory
resource 18'. When complete, the device driver returns
to the calling stub routine of the S messaging kernel
layer 186 with, for example, the successful (err=0) or
unsuccessful (err=-1) status of the data transfer.
Where there is an error, the message is updated
(message#2.err=-1) and, at t12, copied up to the
messaging kernel layer 180 (md#2vme_addr). A reply
message descriptor (md#2vme_addr; mod=1) is then sent
at t13 to the FC messaging kernel layer 180. However,
where there is no error, a k_null_reply(msg) is used.
This results in no copy of the unmodified message at
t12, but rather just the sending of the reply message
descriptor (md#2vme_addr; mod=1) at t13.
Upon processing the message descriptor and reply
message (message#2), the FS messaging kernel layer 180
unblocks and returns to the calling process of the UFS
182 (message#2.sender_pid=B$$). After completing any
processing that may be required, including any
additional communication transactions with the storage
facility that might be required to support or complete
the data transfer, the UFS 182 returns to the stub
routine that earlier called the UFS 182. The message
is updated with status and the data location in the
memory resource 18' (message#1.err=0;
message#2.vme_addr=xxxxh=message#1.vme_addr=xxxxh) and, at

t14, copied up to the messaging kernel layer 178 or 194
(md#1vme_addr). A reply message descriptor
(md#1vme_addr; mod=1) is then sent at t15 to the
messaging kernel layer of the NC or local host, as
appropriate.
The messaging kernel layer 178 or 194 processes
the reply message descriptor and associated message.
As indicated between t16 and t19, the messaging kernel
layer 178 or 194, in the context of the requesting
process (PID=A$$), is responsible for copying the
requested data from the memory resource 18' into its
peer-level processor's local shared memory. Once
completed, the messaging kernel layer 178 or 194
prepares a final message (message#3) to conclude its
series of communication transactions with the FS
messaging kernel layer 180. This message is the same
as the first message (message#3=message#1), though
updated by the FS facility as to message type
(message#3.msg_type=fc_read_release) to notify the FC
facility that it no longer requires the requested data
space (message#3.vme_addr=xxxxh) to be held. In this
manner, the FC facility can maintain its expedient,
centralized control over the memory resource 18'. A
corresponding message descriptor
(md#3vme_addr=md#1vme_addr; mod=0) is sent at t20.

At t21, the release message (message#3) is copied
down by the FC messaging kernel layer 180, and the
appropriate disk buffer management function of the UFS
182 is called, within the context of a worker process
of the relevant manager process
(message#3.dest_pid=B$$), to release the buffer memory
(message#3.vme_addr=xxxxh). Upon completion of the UFS
memory management routine, the relevant worker process
returns to the stub routine of the FS messaging kernel
layer 180. The worker process and the message
(message#3) are deallocated with respect to the FS
facility and a reply message descriptor (md#3vme_addr;
mod=1) is returned to the messaging kernel layer 178 or
194, whichever is appropriate.
Finally, at t23, the messaging kernel layer 178 or
194 returns, within the context of the relevant process
(PID=A$$), to its calling routine. With this return,
the address of the retrieved data within the local
shared memory is provided. Thus, the relevant process
is able to immediately access the data as it requires.
b. LFS Write Transaction
Figure 13 illustrates an exemplary series of
communication transactions used to implement an LFS
write to disk.

Beginning at a time t1, an LFS write request is
received by the messaging kernel layer 178 of the
network communications facility 162 from an executing
process (PID=A$$) in response to an NFS write request.
Alternately, the LFS write request at t1 could be from
a host process. In either case, a corresponding
message (message#1) is prepared
(message#1.msg_type=fc_write; message#1.sender_pid=A$$;
message#1.dest_pid=B$$) and, at t2, its message
descriptor (md#1vme_addr; mod=0) is sent to the FC
messaging kernel layer 180.
At t3, the FC messaging kernel layer 180 copies
down the message (message#1) for evaluation, allocates
a worker process to handle the request by the manager
process (PID=B$$), which calls the requested function
of its UFS 182. This UFS function allocates a disk
buffer in the memory resource 18' and returns a vector
address (vme_addr=xxxxh) referencing the buffer to the
FC messaging kernel layer 180. The message is again
updated (message#1.vme_addr=xxxxh) and copied back to
the messaging kernel layer 178 or 194 (md#1vme_addr).
A reply message descriptor (md#1vme_addr; mod=1) is
then sent back to the messaging kernel layer 178 or
194, at t5.
Between t6 and t9, the relevant process (PID=A$$)
of the NC or host facility copies data to the memory

resource 18'. When completed, the messaging kernel
layer 178 or 194 is again called, at t9, to complete
the write request. A new message (message#2=message#1)
is prepared, though updated with the amount of data
transferred to the memory resource 18' and message type
(message#2.msg_type=fc_write_release), thereby implying
that the FS facility will have control over the
disposition of the data. Preferably, this message
utilizes the available message buffer of message#1,
thereby obviating the need to allocate a new message
buffer or to copy data from message#1. The message
descriptor (md#2vme_addr=md#1vme_addr; mod=0) for this
message is sent at t10.
The message is copied down by the FC messaging
kernel layer 180 and provided to a worker process by
the relevant manager process (message#2.dest_pid=B$$).
While a reply message descriptor might be provided back
to the messaging kernel layer 178 or 194 immediately,
at t12, thereby releasing the local shared memory
buffer, the present invention adopts the data coherency
strategy of NFS by requiring the data to be written to
disk before acknowledgment. Thus, upon copying down
the message at t11, the messaging kernel layer 180
calls the UFS 182 to write the data to the disk array
24'. The UFS 182, within the context of the relevant
worker process, calls the messaging kernel layer 180 to

initiate another communication transaction to request
a write out of the data by the storage facility 166.
Thus, a storage device driver stub routine of the
messaging kernel layer 180 is called. A message
(message#3), including the shared memory address of a
buffer location in the memory resource 18'
(message#3.msg_type=sp_write;
message#3.vme_addr=xxxxh; message#3.sender_pid=B$$;
message#3.dest_pid=C$$), is prepared. At t16, a
corresponding message descriptor is sent (md#3vme_addr;
mod=0) to the S messaging kernel layer 186.
At t17, the S messaging kernel layer 186 copies
down the message (message#3) for evaluation, allocates
a worker process to handle the request by the manager
process (PID=C$$), which calls the requested function
of its device driver 188. Between t18 and t22, the
requested data is transferred from the message
specified location (message#3.vme_addr=xxxxh) of the
memory resource 18'. When complete, the device driver
returns to the calling stub routine of the S messaging
kernel layer 186 with, for example, the status of the
data transfer (err=0). The message is updated
(message#3.err=0) and, at t23, copied up to the
messaging kernel layer 180 (md#3vme_addr). A reply
message descriptor (md#3vme_addr; mod=1) is then sent
at t24 to the FC messaging kernel layer 180.

Upon processing the message descriptor and reply
message (message#3), the FC messaging kernel layer 180
returns to the calling process of the UFS 182
(message#3.sender_pid=B$$). After completing any UFS
processing that may be required, including any
additional communication transactions with the storage
facility that might be required to support or complete
the data transfer, the UFS 182 returns to the messaging
kernel layer 180. At this point, the UFS 182 has
completed its memory management of the memory resource
18'. At t25, the messaging kernel layer 180 sends the
reply message descriptor (md#2vme_addr; mod=1) to the
messaging kernel layer 178 or 194, as appropriate, to
indicate that the data has been transferred to the
disk array resource 24'.
Finally, at t26, the messaging kernel layer 178 or
194 returns, within the context of the relevant worker
process, to its calling routine.
c. NC/Local Host Transfer Transaction
Figure 14 illustrates the communication
transaction and delivery of data, as provided from a NC
facility process (PID=A$$), to an application program
executing in the application program layer of the local
host facility. The packet, for example, could contain
new routing information to be added to the route data

base. However, since the NC facility does not perform
any significant interpretation of non-NFS packets
beyond identification as an IP packet, the packet is
passed to the local host facility. The local host,
upon recognizing the nature of the non-NFS packet, will
pass it ultimately to the IP client, as identified by
the packet, for interpretation. In this example, the
IP client would be the "route" daemon.
Thus, the transaction begins at t2, with the NC
messaging kernel layer 178 writing a message descriptor
(md#1.vme_addr; mod=0) to the host messaging kernel
layer 194. The referenced message
(message#1.msg_type=nc_recv_ip_pkt;
message#1.sender_pid=D$$; message#1.dest_pid=E$$) is
copied down, at t3, by the host messaging kernel layer
194. A reply message descriptor (md#1.vme_addr; mod=3)
is then returned to the NC messaging kernel layer 178
at t4.
The packet is then passed, by the local host
messaging kernel layer 194, to the TCP/UDP layers 204
of the local host facility for processing and,
eventually, delivery to the appropriate application
program.
As shown at t14, the application program may
subsequently call the host messaging kernel layer 194,
either directly or indirectly through the system call

layer. This call could be, for example, issued as a
consequence of the application program making a system
call layer call to update the host's IP route database.
As described earlier, this call has been modified to
also call the host messaging kernel layer 194 to send a
message to the NC facility to similarly update its IP
route database. Thus, a message descriptor
(md#2.vme_addr; mod=0) is sent at t15 to the NC
messaging kernel layer 178. The referenced message
(message#2.msg_type=nc_add_route;
message#2.sender_pid=E$$; message#2.dest_pid=D$$) is
copied up, at t18, by the NC messaging kernel layer
178. The NC messaging kernel layer 178 then calls the
NC facility function to update the IP route database.
Finally, a reply message descriptor (md#2.vme_addr;
mod=1) is returned to the local host messaging kernel
layer 194 at t17.
d. NC/NC Route Transfer Transaction
Figure 15 illustrates the routing, or bridging, of
a data packet between two NC facility processes. The two NC
processes may be executing on separate peer-level
processors, or exist as two parallel processes
executing within the same NC facility. The packet,
for example, is intercepted at the IP layer within the
context of the first process (PID=A$$).

identifies the logical NC facility that the packet is
to be routed to and calls the messaging kernel layer 178 to
prepare an appropriate message (message#1). The data
packet itself is copied to a portion of the memory
resource 18' (vme_addr=xxxxh) that is reserved for the
specific NC facility; this memory is not under the
control of any FS facility.
Thus, at t2, the NC messaging kernel layer 178
writes a message descriptor (md#1.vme_addr; mod=0) to
the second messaging kernel layer 178. The referenced
message (message#1.msg_type=nc_forward_ip_pkt;
message#1.sender_pid=F$$; message#1.dest_pid=G$$;
message#1.vme_addr=xxxxh;
message#1.ethernet_dst_net=xx) is copied down, at t3,
by the second NC messaging kernel layer 178. The data
packet is then copied, between t4 and t8, from the
memory resource 18' to the local shared memory of the
second NC peer-level processor.
Since the first NC facility must manage its
portion of the memory resource 18', the second NC
messaging kernel layer 178, at t9, returns a reply
message descriptor (md#1.vme_addr; mod=1) back to the
first NC messaging kernel layer 178. This
notifies the first NC facility that it no longer
requires the memory resource 18' data space
(message#1.vme_addr=xxxxh) to be held. In this manner,

the first NC facility can maintain expedient,
centralized control over its portion of the memory
resource 18'.
The packet data is then passed, by the second NC
messaging kernel layer 178, to the IP layer of its NC
facility for processing.
4. Detailed Communication Transaction
Messages, Syntax, and Semantics
A Notation for Communication Transactions
A terse notation for use in describing
communication transactions has been developed. This
notation does not directly represent the code that
implements the transactions, but rather is utilized to
describe them. An example and explanation of the
notation is made in reference to a LFS type transaction
requesting the attributes of a given file.
The communication transaction:
fc_get_attributes( FILE,ATTRIBUTES );
identifies that a message with type FC_GET_ATTRIBUTES,
the expected format of the message, when sent to the
FS facility, for example, is a typedef FILE, and that
when the message is returned, its format is a typedef
ATTRIBUTES.

A second convention makes it very clear when the
FS facility, for example, returns the message in the
same format that it was originally sent. The
communication transaction:
get_buffer( BUFFER,***);
describes a transaction in which the NC facility, for
example, sends a typedef BUFFER, and that the message
is returned using the same structure.
If a facility can indicate success by returning
the message unchanged (k_null_reply()), then the format
is:
free_buffer( BUFFER,* );
Sometimes, when facilities use standard
structures, only some of the fields will actually have
meaning. The following notation identifies meaningful
fields:
get_buffer( BUFFER{data_len},
***{data_len,data_ptr});
This transaction notation describes the same
transaction as get_buffer above, but in more detail.
The facility requests a buffer of a particular length,
and the responding facility returns a pointer to the
buffer along with the buffer's actual length.

a. FS Facility Communication
Transactions
The communication transactions that the FS
facilities of the present invention recognize, and
that the messaging kernel layers of the other
facilities of the present invention recognize as
appropriate to interact with the FS facility, are
summarized in Table 4 below.

Table 4
Summary of FS Communication Transactions
LFS Configuration Management Messages
fc_find_manager ( FC_MOUNT_T, ***{errno,fc_pid} )
fc_mount        ( FC_MOUNT_T, ***{errno,fc_pid,file} )
fc_unmount      ( FC_STD_T{partition.fsid}, *{errno} )
LFS Data Transfer Messages
fc_read         ( FC_RDWR_T{un.in}, ***{errno,un.out.{bd,vattr}} )
fc_write        ( FC_RDWR_T{un.in}, ***{errno,un.out.{bd,vattr}} )
fc_readdir      ( FC_RDWR_T{un.in}, ***{errno,un.out.{bd,new_offset}} )
fc_readlink     ( FC_RDWR_T{un.in.file,un.in.cred}, ***{errno,un.out.bd} )
fc_release      ( FC_RDWR_T{un.out.bd}, *{errno} )
LFS File Management Messages
fc_null         ( K_MSG, *** )
fc_null_null    ( K_MSG, * )
fc_getattr      ( FC_STD_T{cred,file,un.mask}, FC_FILE_T{errno,vattr} )
fc_setattr      ( FC_SATTR_T, FC_FILE_T{errno,vattr} )
fc_lookup       ( FC_DIROP_T{cred,where}, FC_FILE_T )
fc_create       ( FC_CREATE_T, FC_FILE_T )
fc_remove       ( FC_DIROP_T{cred,where}, *{errno} )
fc_rename       ( FC_RENAME_T, *{errno} )
fc_link         ( FC_LINK_T, *{errno} )
fc_symlink      ( FC_SYMLINK_T, *{errno} )
fc_rmdir        ( FC_DIROP_T{cred,where}, *{errno} )
fc_statfs       ( FC_STATFS_T{fsid}, *** )
VFS VOP and Other Miscellaneous LFS Messages
fc_fsync        ( FC_STD_T{cred,file}, *{errno} )
fc_access       ( FC_STD_T{cred,file,mode}, *{errno} )
fc_syncfs       ( FC_STD_T{cred,fsid}, *{errno} )
The use of these communication transactions is
best illustrated from the perspective of their use.
An FS facility process named FC_VICE_PRES directs
the configuration of all FS facilities in the system

160. Even with multiple instantiations of the FS
facility, there is only one FC_VICE_PRES process.
There are also one or more unnamed manager processes
which actually handle most requests. Each file system
--or disk partition--in the system 160 belongs to a
particular manager; however, a manager may own more
than one file system. Since managers are unnamed,
would-be clients of a file system first check with
FC_VICE_PRES to get the FS facility pid of the
appropriate manager. Thus, the FC_VICE_PRES process
does no actual work. Rather, it simply operates to
direct requests to the appropriate manager.
To provide continuous service, managers must avoid
blocking. Managers farm out requests that would block
to a pool of unnamed file controller worker processes.
These details are not visible to FS facility clients.
The significant message structures used by the FS
facility are given below. For clarity, the commonly
used structures are described here. An FSID (file
system identifier) identifies an individual file
system. An FSID is simply the UNIX device number for
the disk array partition which the file system lives
on. An FC_FH structure (file controller file handle)
identifies individual files. It includes an FSID to
identify which file system the file belongs to, along

with an inode number and an inode generation to
identify the file itself.
Start-up, Mounting and Unmounting
Once the FC peer-level processor has booted an
instantiation of the FS facility, the first FS facility
to boot spawns an FC_VICE_PRES process which, in turn,
creates any managers it requires, then waits for
requests. Besides a few "internal" requests to
coordinate the mounting and unmounting of file systems
in the operation of multiple file system facilities,
the only request it accepts is:
fc_find_manager ( FC_MOUNT_T,***{errno,fc_pid} );
The input message includes nothing but an FSID
identifying the file system of interest. The
successful return value is an FS facility process id
which identifies the manager responsible for this file
system. Having found the manager, a client facility
with the appropriate permissions can request that a
file system be made available for user requests (mount)
or unavailable for user requests (unmount). These
requests are made by the local host facility, through
its VFS/LFS client interface; requests for the mounting
and unmounting of file systems are not received
directly from client NC facilities.

The transaction:
fc_mount ( FC_MOUNT_T,***{errno,fc_pid,file} );
returns the root file handle in the requested file
system.
The unmount transaction:
fc_unmount ( FC_STD_T{fsid}, *{errno} );
returns an error code. (The * in the transaction
description indicates that a k_null_reply() is
possible, thus the caller must set errno to zero to
detect a successful reply.)
Data Transfer Messages
There are four common requests that require the
transfer of data. These are FC_READ, FC_READDIR,
FC_READLINK, and FC_WRITE. The FS facility handles
these requests with a two message protocol. All four
transactions are similar, and all use the FC_RDWR_T
message structure for their messages.
typedef struct {
    void  *buf;      /* Buffer id. Valid if non-NULL. */
    vme_t addr;      /* Pointer to data. */
    int   count;     /* Length of data. */
} FC_BUF_DESC;
#define FC_RDWR_BUFS 2

typedef struct {
    int type;
    int errno;
    union {
        struct {
            FC_CRED cred;     /* credentials */
            int     flags;
            FC_FH   file;
            int     offset;
            int     count;
        } in;
        struct {
            /*
             * Structure used in response to
             * fc_release message.
             */
            FC_BUF_DESC bd[FC_RDWR_BUFS];   /* Buffer descriptors. */
            FC_VATTR    vattr;
        } out;
    } un;
} FC_RDWR_T;
The FC_READ transaction is described in some
detail. The three other transactions are described
by comparison.
A read data communication transaction is:
fc_read ( FC_RDWR_T{un.in},
***{errno,un.out.{bd,vattr}} );
As sent by a client facility, the "in" structure
of the union is valid. It specifies a file, an offset
and a count. The FS facility locks the buffers which
contain that information; a series of message
transactions with the S facility may be necessary to
read the file from disk. In its reply, the FS facility
uses the "out" structure to return both the attributes
of the file and an array of buffer descriptors that

identify the VME memory locations holding the data. A
buffer descriptor is valid only if its "buf" field is
non-zero. The FS facility uses non-zero values to
identify buffers, but to client facilities they have no
meaning. The attributes and buffer descriptors are
valid only if no error has occurred. For a read at the
end of a file, there will be no error, but all buffer
descriptors in the reply will have NULL "buf" fields.
After the client facility has read the data out of
the buffers, it sends the same message back to the FS
facility a second time. This time the transaction is:
fc_release ( FC_RDWR_T{un.out.bd}, *{errno} );
This fc_release request must use the same message that
was returned by the fc_read request.  In the reply to
the fc_read, the FS facility sets the "type" field of
the message to make this work.  The following
pseudo-code fragment illustrates the sequence:
msg = (FC_RDWR_T *)k_alloc_msg();
initialize_message;
msg = k_send( msg, fc_pid );
copy_data_from_buffers_into_local_memory;
msg = k_send( msg, fc_pid );
The same message, or an exact duplicate, must be
returned because it contains the information the FS
facility needs to free the buffers.
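A slightly fuller sketch of the same two-message exchange is given below; it is illustrative only, assumes the FC_RDWR_T structure defined in this section, and uses a hypothetical copy_from_vme() helper to stand in for whatever VME data movement the client facility actually performs.
    /* Illustrative sketch of the two-message read protocol. */
    int
    read_through_fs( K_PID fs_pid, FC_FH file, int offset, int count, char *dest )
    {
        FC_RDWR_T *msg = (FC_RDWR_T *)k_alloc_msg();
        int        i, copied = 0;

        msg->type         = FC_READ;
        msg->errno        = 0;
        msg->un.in.file   = file;
        msg->un.in.offset = offset;
        msg->un.in.count  = count;
        msg = k_send( msg, fs_pid );            /* fc_read */
        if (msg->errno != 0)
            return -1;                          /* no buffers are held on error */

        for (i = 0; i < FC_RDWR_BUFS; i++) {
            if (msg->un.out.bd[i].buf == NULL)  /* NULL "buf" ends the valid list */
                break;
            copy_from_vme( dest + copied,       /* hypothetical VME copy helper */
                           msg->un.out.bd[i].addr,
                           msg->un.out.bd[i].count );
            copied += msg->un.out.bd[i].count;
        }

        /* Return the same message; its "type" was already set by the FS
         * facility, so this send is the fc_release that frees the buffers. */
        msg = k_send( msg, fs_pid );
        return copied;
    }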
Although the transaction summary of Table 4 shows
just one fc_release transaction, there are really four:
one for each type of data transfer: fc_read_release,

fc_write_release, fc_readdir_release and
fc_readlink_release.  Since the FS facility sets the
"type" field for the second message, this makes no
difference to the client facility.
If the original read transaction returned an
error, or if none of the buffer descriptors were valid,
then the release is optional.
The FC_WRITE transaction is identical to FC_READ,
but the client facility is expected to write to the
locations identified by the buffer descriptors instead
of reading from them.
The FC_READDIR transaction is similar to read and
write, but no file attributes are returned. Also, the
specified offset is really a magic value--also
sometimes referred to as a magic cookie--identifying
directory entries instead of an absolute offset into
the file. This matches the meaning of the offset in
the analogous VFS/VOP and NFS versions of readdir. The
contents of the returned buffers are "dirent"
structures, as described in the conventional UNIX
"getdents" system call manual page.
The FC_READLINK transaction is the simplest of the
four communication transactions.  It returns no file
attributes and, since links are always read in their
entirety, it requires no offset or count.

In all of these transactions, the requested
buffers are locked during the period between the first
request and the second. Client facilities should send
the fc_release message as soon as possible, because the
buffer is held locked until they do, and holding the
lock could slow down other client facilities that
request the same block.
In the preferred embodiment of the present
invention, these four transactions imply
conventional NFS type permission checking whenever they
are received.  Although conventional VFS/UFS calls do
no permission checking, in NFS and the LFS of the
present invention, they do.  In addition, the FS
facility messages also support an "owner can always
read" permission that is required for NFS.
LFS File Management Messages
The LFS communication transactions, as described
below, are similar to conventional NFS call functions
with the same names.
The communication transaction:
fc_null ( K_MSG,*** );
does nothing but uses k_reply().
The communication transaction:
fc_null_null ( K_MSG,* );

also does nothing, but uses the quicker k_null_reply().
Both of these are intended mainly as performance tools
for measuring message turnaround time.
The communication transaction:
fc_getattr ( FC_STD_T{cred,file,un.mask},
             FC_FILE_T{errno,vattr} );
gets the vnode attributes of the specified file. The
mask specifies which attributes should be returned. A
mask of FC_ATTR_ALL gets them all.  The same structure
is always used, but for un-requested values, the fields
are undefined.
The communication transaction:
fc_setattr ( FC_SATTR_T,FC_FILE_T{errno,vattr} );
sets the attributes of the specified file.  Like
fc_getattr, fc_setattr uses a mask to indicate which
values should be set.  In addition, the special bits
FC_ATTR_TOUCH_[AMC]TIME can be set to indicate that the
access, modify or change time of the file should be set
to the current time on the server. This allows a Unix
"touch" command to work even if the times on the client
and server are not well matched.
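For illustration only, a client wishing to emulate the Unix "touch" behavior might issue a masked fc_setattr such as the sketch below; the exact spelling of the touch bit is not reproduced in this excerpt, so FC_ATTR_TOUCH_MTIME is used here purely as a stand-in.
    /* Illustrative sketch: ask the FS facility to stamp its own current time. */
    void
    touch_file( K_PID fs_pid, FC_FH file, FC_CRED cred )
    {
        FC_SATTR_T *msg = (FC_SATTR_T *)k_alloc_msg();

        msg->type = FC_SETATTR;
        msg->cred = cred;
        msg->file = file;
        msg->mask = FC_ATTR_TOUCH_MTIME;   /* hypothetical touch bit */
        (void) k_send( msg, fs_pid );      /* the server uses its own clock */
    }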
The communication transaction:
fc_lookup ( FC_DIROP_T{cred,where},FC_FILE_T );
searches a directory for a specified file name,
returning the file and its attributes if it exists.
The "where" field of FC_DIROP_T is an FC_DIROP

structure which contains a file, a name pointer, and a
name length. The name pointer contains the vme address
of the name. The name may be up to 256 characters
long, and must be in memory that the FS facility can
read.
The communication transaction:
fc_create ( FC_CREATE_T,FC_FILE_T );
creates files.  The FC_CREATE_T describes what type of
file to create and where.  The vtype field may be used
to specify any file type including directories, so
mkdir is not supported.  If the "FC_CREATE_EXCL" bit is
set in the flag field, then fc_create will return an
error if the file already exists. Otherwise, the old
file will be removed before creating the new one.
The communication transaction:
fc_remove ( FC_DIROP_T{cred,where},*{errno} );
removes the specified name from the specified
directory.
The communication transaction:
fc_rename ( FC_RENAME_T,* );
changes a file from one name in one directory to a
different name in a (possibly) different directory in
the same file system.
The communication transaction:
fc_link ( FC_LINK_T,*{errno} );

links the specified file to a new name in a (possibly)
new directory.
The communication transaction:
fc_symlink ( FC_SYMLINK_T,*{errno} );
creates the specified symlink.
The communication transaction:
fc_rmdir ( FC_DIROP_T{cred,where},*{errno} );
removes a directory.  The arguments for fc_rmdir are
like those for fc_remove.
The communication transaction:
fc_statfs ( FC_STATFS_T{fsid},*** );
returns file system statistics for the file system
containing the specified file.
VFS/VOP LFS Support Transactions
The communication transactions described below
are provided to support the VFS/VOP subroutine call
interface to the LFS client layer. Most VOP calls can
be provided for using the messages already defined
above.  The remaining VOP function call support is
provided by the following transactions.
The communication transactions:
fc_fsync ( FC_STD_T{cred,file},*{errno} );
fc_syncfs ( FC_STD_T{cred,fsid}, *{errno} );
ensure that all blocks for the referenced file or file
system, respectively, are flushed.

The communication transaction:
fc_access ( FC_STD_T{cred,file,mode},*{errno} );
determines whether a given type of file access is legal
for specified credentials ("cred") on the specified
file.  The mode value is "FC_READ_MODE",
"FC_WRITE_MODE", or "FC_EXEC_MODE".  If the mode is
legal, the returned errno is zero.
Table 5 lists the inter-facility message types
supported by the FS facility.
Table 5
FS Facility Message Types
(F MSGTYPE)
#define FC_ID  ( (long)( ('F' << 8) | ('C') ) << 16 )

/* External Messages */
#define FC_FIND_MANAGER      ( 1 | FC_ID )
#define FC_MOUNT             ( 2 | FC_ID )
#define FC_UNMOUNT           ( 3 | FC_ID )
#define FC_READ              ( 4 | FC_ID )
#define FC_WRITE             ( 5 | FC_ID )
#define FC_READDIR           ( 6 | FC_ID )
#define FC_READLINK          ( 7 | FC_ID )
#define FC_READ_RELEASE      ( 8 | FC_ID )
#define FC_WRITE_RELEASE     ( 9 | FC_ID )
#define FC_READDIR_RELEASE   ( 10 | FC_ID )
#define FC_READLINK_RELEASE  ( 11 | FC_ID )
#define FC_NULL              ( 12 | FC_ID )
#define FC_NULL_NULL         ( 13 | FC_ID )
#define FC_GETATTR           ( 14 | FC_ID )
#define FC_SETATTR           ( 15 | FC_ID )
#define FC_LOOKUP            ( 16 | FC_ID )
#define FC_CREATE            ( 17 | FC_ID )
#define FC_REMOVE            ( 18 | FC_ID )
#define FC_RENAME            ( 19 | FC_ID )
#define FC_LINK              ( 20 | FC_ID )
#define FC_SYMLINK           ( 21 | FC_ID )
#define FC_RMDIR             ( 22 | FC_ID )
#define FC_STATFS            ( 23 | FC_ID )
#define FC_FSYNC             ( 24 | FC_ID )
#define FC_ACCESS            ( 25 | FC_ID )
#define FC_SYNCFS            ( 26 | FC_ID )

/* Internal Messages. */
#define FC_REG_PARTITION     ( 27 | FC_ID )
#define FC_UNREG_PARTITION   ( 28 | FC_ID )
The FS facility message structures are listed
below.
/* Standard structure which handles many messages. */
typedef struct {
    K_MSGTYPE type;
    long      errno;
    FC_CRED   cred;        /* Access credentials */
    FC_FH     file;
    union {
        FC_FSID fsid;      /* For fc_get_server. */
        long    mode;      /* {READ,WRITE,EXEC} for fc_access. */
        K_PID   pid;       /* FS facility pid of server. */
        long    mask;      /* Mask attributes. (FC_ATTR_*). */
    } un;
} FC_STD_T;

/* Structure for fs control -- mounting, unmounting. */
typedef struct {
    K_MSGTYPE type;
    long      errno;
    long      fc;          /* IN: Which FC to use. (i.e. 0, 1, ...) */
    long      flags;       /* IN: Mount flags. */
    FC_PARTITION partition;  /* IN: Describes SP partition to use. */
    K_PID     fc_pid;      /* OUT: PID of manager for FS. */
    FC_FH     file;        /* OUT: Root file handle of file system. */
} FC_MOUNT_T;

typedef struct {
    K_MSGTYPE type;
    FC_CRED   cred;
    FC_FH     file;
    long      mask;        /* Mask attributes. (FC_ATTR_*) */
    FC_SATTR  sattr;
} FC_SATTR_T;

typedef struct {
    K_MSGTYPE type;
    long      errno;
    FC_FH     file;
    FC_VATTR  vattr;
} FC_FILE_T;

typedef struct {
    void  *buf;
    vme_t  addr;    /* fc returned data. */
    long   count;   /* fc returned data length. */
} FC_BUF_DESC;
The FC_BUF_DESC structure is used in the two-message
data transfer protocol.  A typical sequence is:
fc_read ( FC_RDWR_T{flags,un.in},
          FC_RDWR_T{flags,un.out} );
fc_release ( FC_RDWR_T{flags,un.out},
             FC_RDWR_T{flags,un.out} );
Note that the "out" union member is the output
for the first message and the input for the second.
#define FC_RDWR_BUFS 2
typedef struct {
    K_MSGTYPE type;
    long      errno;
    union {
        struct {
            FC_FH   file;    /* For first message. */
            FC_CRED cred;
            long    flags;
            long    offset;  /* User requested file offset. */
            long    count;   /* User requested count. */
        } in;
        struct {
            /* Structure used in response to fc_release message. */
            FC_BUF_DESC bd[FC_RDWR_BUFS];  /* Buffer descriptors. */
            FC_VATTR    vattr;             /* For responses. */
            long        new_offset;        /* For READDIR. */
        } out;
    } un;
} FC_RDWR_T;
#define FC_RDWR_SYNC    0x0001
#define FC_RDWR_NOCACHE 0x0002   /* Don't cache buffer. */
This structure is used in those operations that
take a directory file handle and a file name within
that directory, namely "lookup", "remove", and
"rmdir"
typedef struct {
K_MSGTYPE type;
long errno;
FC_CRED cred;
FC_DIROP where; /* File to look up
or remove. */
} FC DIROP T;
Not all fields that can be set can be specified in
a create, so instead of including FC_SATTR, only the
values that can be set are included.

typedef struct {
    K_MSGTYPE type;
    FC_CRED   cred;
    FC_DIROP  where;
    short     flag;
    short     vtype;       /* Type for new file. */
    u_short   mode;        /* Mode for new file. */
    short     major_num;   /* Major number for devices. */
    short     minor_num;   /* Minor number for devices. */
} FC_CREATE_T;

/* Values for the flag. */
#define FC_CREATE_EXCL 0x0001   /* Exclusive. */

typedef struct {
    K_MSGTYPE type;
    long      errno;
    FC_CRED   cred;
    FC_FH     from;
    FC_DIROP  to;
} FC_RENAME_T;

typedef struct {
    K_MSGTYPE type;
    long      errno;
    FC_CRED   cred;
    FC_FH     from;
    FC_DIROP  to;
} FC_LINK_T;

typedef struct {
    K_MSGTYPE type;
    long      errno;
    FC_CRED   cred;
    FC_DIROP  from;      /* File to create. */
    u_short   mode;      /* File permissions. */
    vme_t     to;        /* Pointer to contents for symlink */
    long      to_len;
} FC_SYMLINK_T;

typedef struct {
    K_MSGTYPE type;
    long      errno;
    FC_FSID   fsid;
    long      bsize;       /* Block size. */
    u_long    blocks;      /* Total number of blocks. */
    u_long    bfree;       /* Free blocks. */
    u_long    bavail;      /* Blocks available to non-priv users. */
    u_long    files;       /* Total number of file slots. */
    u_long    ffree;       /* Number of free file slots. */
    u_long    favail;      /* File slots available to non-priv users. */
    struct timeval stime;  /* Server's current time of day. */
} FC_STATFS_T;

#define FC_MAXNAMLEN  255
#define FC_MAXPATHLEN 1024

struct dirent {
    u_long  d_off;                     /* offset of next disk directory entry */
    u_long  d_fileno;                  /* file number of entry */
    u_short d_reclen;                  /* length of this record */
    u_short d_namlen;                  /* length of string in d_name */
    char    d_name[FC_MAXNAMLEN + 1];  /* name (up to MAXNAMLEN + 1) */
};
b. NC Facility Communication
Transactions
The communication transactions that the NC
facilities of the present invention recognize, and that
the other messaging kernel layers of the present
invention recognize as appropriate to interact with the
NC facility, are

summarized in Table 6 below. The NC facility also uses
and recognizes the FS facility communication
transactions described above.
Table 6
Summary of NC Communication Transactions
nc_register_dl      ( NC_REGISTER_DL_T,***{status} )
nc_set_promis       ( NC_IFIOCTL_T{unit,promis}, ***{status} )
nc_add_multi        ( NC_IFIOCTL_T{unit,mc_addr}, ***{status} )
nc_del_multi        ( NC_IFIOCTL_T{unit,mc_addr}, ***{status} )
nc_set_ifflags      ( NC_IFIOCTL_T{unit,flags}, ***{status} )
nc_get_ifflags      ( NC_IFIOCTL_T{unit}, ***{status,flags} )
nc_set_ifmetric     ( NC_IFIOCTL_T{unit,metric}, ***{status} )
nc_set_ifaddr       ( NC_IFIOCTL_T{unit,if_addr}, ***{status} )
nc_get_ifaddr       ( NC_IFIOCTL_T{unit}, ***{status,if_addr} )
nc_get_ifstat       ( NC_IFSTATS_T,*** )
nc_set_macflags     ( NC_IFIOCTL_T{unit,flags}, ***{status} )
nc_get_macflags     ( NC_IFIOCTL_T{unit}, ***{status,flags} )
nc_set_ip_braddr    ( NC_INIOCTL_T, *** )
nc_get_ip_braddr    ( NC_INIOCTL_T, *** )
nc_set_ip_netmask   ( NC_INIOCTL_T, *** )
nc_get_ip_netmask   ( NC_INIOCTL_T, *** )
nc_add_arp_entry    ( NC_ARPIOCTL_T, *** )
nc_del_arp_entry    ( NC_ARPIOCTL_T, *** )
nc_get_arp_entry    ( NC_ARPIOCTL_T, *** )
nc_add_route        ( NC_RTIOCTL_T, *** )
nc_del_route        ( NC_RTIOCTL_T, *** )
NFS Configuration Messages
nc_nfs_start        ( NC_NFS_START_T, * )
nc_nfs_export       ( NC_NFS_EXPORT_T, ***{errno} )
nc_nfs_unexport     ( NC_NFS_UNEXPORT_T, ***{errno} )
nc_nfs_getstat      ( NC_NFS_STATS_T,*** )

Network Interface Data Messages
nc_xmit_pkt         ( NC_PKT_IO_T,* )
nc_recv_dl_pkt      ( NC_PKT_IO_T,* )
nc_recv_ip_pkt      ( NC_PKT_IO_T,* )
nc_recv_promis_pkt  ( NC_PKT_IO_T,* )
nc_forward_ip_pkt   ( NC_PKT_IO_T,* )
Secure Authentication Messages
ks_decrypt ( KS_DECRYPT_T{netname,netnamelen,desblock},
             ***{rpcstatus,ksstatus,desblock} )
ks_getcred ( KS_GETCRED_T{netname,netnamelen},
             ***{rpcstatus,ksstatus,cred} )
A network communications facility can exchange
messages with the host facility, file system facility
and any other network communications facility within
the system 160. The host facility will exchange
messages with the network communications facility for
configuring the network interfaces, managing the ARP
table and IP routing table, and sending or receiving
network packets. In addition, the host facility will
exchange messages with the network communications
facility for configuring the NFS server stack and to
respond in support of a secure authentication service
request. The network communications facility will
exchange messages with the file system facility for
file service using the external FS communication
transactions discussed above. Finally, a network
communication facility will exchange messages with
other network communication facilities for IP packet
routing.

System Call Layer Changes
The exportfs(), unexport(), rtrequest(),
arpioctl() and in_control() function calls in the
system call layer have been modified. The exportfs()
and unexport() functions are called to export new file
systems and unexport an exported file system,
respectively. A call to these modified functions now
also initiates the appropriate NC_NFS_EXPORT or
NC_NFS_UNEXPORT communication transactions to each of
the network facilities.
The rtrequest() function is called to modify the
kernel routing table. A call to the modified function
now also initiates an appropriate NC communication
transaction (NC_ADD_ROUTE for adding a new route or
NC_DEL_ROUTE for deleting an existing route) to each of
the network facilities.
The arpioctl() function is called to modify the
kernel ARP table. This function has now been modified
to also initiate the appropriate NC communication
transaction (NC_ADD_ARP for adding a new ARP entry or
NC_DEL_ARP for deleting an existing entry) to each of
the network facilities.
Finally, the in_control() function is called to
configure the Internet Protocol parameters, such as
setting the IP broadcast address and IP network mask to
be used for a given interface. This function has been

modified to also initiate the appropriate NC
communications transaction (NC_SET_IP_BRADDR or
NC_SET_IP_NETMASK) to the appropriate network
facility.
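A hedged sketch of the fan-out pattern shared by these modified calls is shown below; nc_pids[] and nc_count stand for whatever record of registered network facilities the host keeps (for example, from its sc_get_sys_config reply), and are assumptions of this sketch rather than names taken from the specification.
    /* Illustrative fan-out of one IOCTL transaction to every NC facility. */
    void
    broadcast_add_route( struct rtentry *rt, K_PID nc_pids[], int nc_count )
    {
        int i;

        for (i = 0; i < nc_count; i++) {
            NC_RTIOCTL_T *msg = (NC_RTIOCTL_T *)k_alloc_msg();

            msg->m_type    = NC_ADD_ROUTE;     /* or NC_DEL_ROUTE for deletions */
            msg->route_req = *rt;              /* kernel routing table entry */
            (void) k_send( msg, nc_pids[i] );  /* one transaction per NC facility */
        }
    }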
NC Facility Initialization
When a network communications facility is
initialized following bootup, the following manager
processes are created:
nc_nfs_vp<n>   NFS server process for processing
               NFS_EXPORT and NFS_UNEXPORT
               communication transactions from the
               host;
nc_dlctrl<n>   Network interface control process for
               processing IOCTL communication
               transactions from the host; and
nc_dlxmit<i>   Network transmit process for processing
               NC_XMIT_PKT and NC_FWD_IP_PKT
               communication transactions.
where:
    <n> is the network processor number: 0, 1, 2, or 3.
    <i> is the network interface (LAN) number: 0, 1, 2, 3, 4, 5, 6, or 7.

Once initialized, the NC facilities report the
"names" of these processes to a SC_NAME_SERVER manager
process, having a known default PID, started and
running in the background, of the host facility. Once
identified, the host facility can configure the
network interfaces (each LAN connection is seen as a
logical and physical network interface). The following
command is typically issued by the Unix start-up script
for each network interface:
ifconfig <interface name> <host name> <options> up
where:
<interface name> is the logical name being
used for the interface;
<host name> is the logical host name of the
referenced <interface name>.
The ifconfig utility program ultimately results in two
IOCTL commands being issued to the network processor:
nc_set_ifflags( flags = UP + <options> );
nc_set_ifaddr( ifaddr = address of <host name> );
The mapping of <host name> to address is typically
specified in the "/etc/hosts" file. To start the NFS

service, the following commands are typically then
issued by the Unix start-up script:
nfsd <n>
exportfs -a
where:
<n> specifies the number of parallel NFS server
processes to be started.
The nfsd utility program initiates an "nc_nfs_start"
communication transaction with all network
communication facilities.  The "exportfs" utility program
is used to pass the list of file systems
(specified in /etc/exports) to be exported by the NFS
server using the "nc_nfs_export" communication
transaction.
Once the NFS service is initialized, incoming
network packets addressed to the "NFS server UDP port"
will be delivered to the NFS server of the network
communications facility. It will in turn issue the
necessary FS communication transactions to obtain file
service.  If the secure authentication option is used, the
NFS server will issue requests to the Authentication
server daemon running on the host processor.  The
conventional authentication services include: mapping
(ks_getcred()) a given <network name> to a Unix style
credential, decrypting (ks_decrypt()) a DES key using
the public key associated with the <network name> and

the secret key associated with user ID 0 (ie. with the
<network name> of the local host).
Routing
Once a network communication facility is
initialized properly, the IP layer of the network
communication facility will perform the appropriate IP
packet routing based on the local routing database
table. This routing table is managed by the host
facility using the "nc_add_route" and "nc_del_route"
IOCTL commands. Once a route has been determined for a
particular packet, the packet is dispatched to the
appropriate network interface. If a packet is destined
to the other network interface on the same network
communication facility, it is processed locally. If a
packet is destined to a network interface of another
network communication facility, the packet is forwarded
using the "nc_forward_ip_pkt()" communication
transaction. If a packet is destined to a conventional
network interface attached to the host facility, it is
forwarded to the host facility using the
"nc-forward-ip~kt()" communication transaction.
The host facility provides the basic network
front-end service for system 160. All packets that are
addressed to the system 160, but are not addressed to
the NFS stack UDP server port, are forwarded to the

host facility's receive manager process using the
following communication transactions:
nc_recv_dl_pkt ( NC_PKT_IO_T,* );
where the packet type is not IP; and
nc_recv_ip_pkt ( NC_PKT_IO_T,* );
where the packet type is IP.
The communication transaction:
nc_recv_promis_pkt ( NC_PKT_IO_T,* );
transfers packets not addressed to system 160 to the
host facility when a network communication facility has
been configured to receive in promiscuous mode by the
host facility.
To transmit a packet, the host facility initiates
a communication transaction:
nc_xmit_pkt ( NC_PKT_IO_T,* );
to the appropriate network communication facility.
Finally, the host facility may monitor the
messages being handled by a network communication
facility by issuing the communication transaction:
nc_recv_promis_pkt ( NC_PKT_IO_T,* ),
to the appropriate network communication facility.
Table 7 lists the inter-facility message types
supported by the NC facility.

Table 7
NC Facility Message Types
#define NC_ID  ( (long)( ('N' << 8) | ('C') ) << 16 )
#define NC_IOCTL_CMD_CLASS(type)  (type & 0xfffffff0)

/* NC "mac" ioctl's */
#define MAC_IOCTL_CMDS       ((1 << 4) + NC_ID)
#define NC_REGISTER_DL       (MAC_IOCTL_CMDS+0)
#define NC_SET_MACFLAGS      (MAC_IOCTL_CMDS+1)
#define NC_GET_MACFLAGS      (MAC_IOCTL_CMDS+2)
#define NC_GET_IFSTATS       (MAC_IOCTL_CMDS+3)

/* BSD "if" ioctl's */
#define DL_IOCTL_CMDS        ((2 << 4) + NC_ID)
#define NC_SET_PROMIS        (DL_IOCTL_CMDS+0)
#define NC_ADD_MULTI         (DL_IOCTL_CMDS+1)
#define NC_DEL_MULTI         (DL_IOCTL_CMDS+2)
#define NC_SET_IFFLAGS       (DL_IOCTL_CMDS+3)
#define NC_GET_IFFLAGS       (DL_IOCTL_CMDS+4)
#define NC_SET_IFMETRIC      (DL_IOCTL_CMDS+5)
#define NC_SET_IFADDR        (DL_IOCTL_CMDS+6)
#define NC_GET_IFADDR        (DL_IOCTL_CMDS+7)

/* BSD "in" ioctl's */
#define IN_IOCTL_CMDS        ((3 << 4) + NC_ID)
#define NC_SET_IP_BRADDR     (IN_IOCTL_CMDS+0)
#define NC_SET_IP_NETMASK    (IN_IOCTL_CMDS+1)
#define NC_GET_IP_BRADDR     (IN_IOCTL_CMDS+2)
#define NC_GET_IP_NETMASK    (IN_IOCTL_CMDS+3)

/* BSD "arp" ioctl's */
#define ARP_IOCTL_CMDS       ((4 << 4) + NC_ID)
#define NC_ADD_ARP           (ARP_IOCTL_CMDS+0)
#define NC_DEL_ARP           (ARP_IOCTL_CMDS+1)
#define NC_GET_ARP           (ARP_IOCTL_CMDS+2)

/* BSD "route" ioctl's */
#define RT_IOCTL_CMDS        ((5 << 4) + NC_ID)
#define NC_ADD_ROUTE         (RT_IOCTL_CMDS+0)
#define NC_DEL_ROUTE         (RT_IOCTL_CMDS+1)

/* Host/NC to NC data communication transactions. */
#define NC_DLXMIT_MSGTYPES   ((6 << 4) + NC_ID)
#define NC_XMIT_PKT          (NC_DLXMIT_MSGTYPES+0)
#define NC_FWD_IP_PKT        (NC_DLXMIT_MSGTYPES+1)

/* Data communication transactions to host receiver processes. */
#define NC_DLRECV_MSGTYPES   ((7 << 4) + NC_ID)
#define NC_RECV_DL_PKT       (NC_DLRECV_MSGTYPES+0)
#define NC_RECV_PROMIS_PKT   (NC_DLRECV_MSGTYPES+1)
#define NC_RECV_IP_PKT       (NC_DLRECV_MSGTYPES+2)

/* NFS server communication transactions */
#define NFS_CMDS             ((8 << 4) + NC_ID)
#define NC_NFS_START         (NFS_CMDS+0)
#define NC_NFS_EXPORT        (NFS_CMDS+1)
#define NC_NFS_UNEXPORT      (NFS_CMDS+2)
#define NC_NFS_GETSTAT       (NFS_CMDS+3)
#define NC_NFS_STOP          (NFS_CMDS+4)
The NC facility message structures are listed
below.
/*
* exported vfs flags.
*/
#define EX_RDONLY 0x01 /* exported read only */
#define EX_RDMOSTLY 0x02 /* exported read mostly
*/
#define EXMAXADDRS 10 /* max number address list */
typedef struct {
u_long naddrs; /* number of addresses */
vme_t addrvec; /* pointer to array of
addresses */
} NC_EXADDRLIST;
/*
* Associated with AUTH_UNIX is an array of Internet
* addresses to check root permission.
*/
#define EXMAXROOTADDRS 10
typedef struct {
NC_EXADDRLIST rootaddrs;
} NC_UNIXEXPORT;

/*
* Associated with AUTH_DES is a list of network names
* to check root permission, plus a time window to
* check for expired credentials.
*/
#define EXMAXROOTNAMES 10
typedef struct {
    u_long nnames;
    vme_t  rootnames;     /* names that point to netnames */
    vme_t  rootnamelens;  /* lengths */
    u_int  window;
} NC_DESEXPORT;

typedef struct {
    long val[2];          /* file system id type */
} fsid_t;

/* File identifier.  Should be unique per filesystem
 * on a single machine.
 */
#define MAXFIDSZ 16
struct fid {
    u_short fid_len;              /* length of data in bytes */
    char    fid_data[MAXFIDSZ];   /* data */
};
/*****************************************************
 * NFS Server Communication transaction Structures.
 ****************************************************/
typedef struct {
    K_MSGTYPE m_type;
    int       nservers;   /* number of servers to start up */
} NC_NFS_START_T;

typedef struct {
    K_MSGTYPE  m_type;
    long       errno;       /* error returned */
    fsid_t     fsid;        /* FSID for directory being exported */
    struct fid fid;         /* FID for directory being exported */
    long       flags;       /* flags */
    u_short    anon;        /* uid for unauthenticated requests */
    long       auth;        /* switch for authentication type */
    union {
        NC_UNIXEXPORT exunix;   /* case AUTH_UNIX */
        NC_DESEXPORT  exdes;    /* case AUTH_DES */
    } un;
    NC_EXADDRLIST writeaddrs;
} NC_NFS_EXPORT_T;

typedef struct {
    K_MSGTYPE  m_type;
    long       errno;       /* error returned */
    fsid_t     fsid;        /* FSID of directory being unexported */
    struct fid fid;         /* FID for directory being unexported */
} NC_NFS_UNEXPORT_T;
/*
 * Return server statistics.
 */
typedef struct {
    int rscalls;      /* Out - total RPC calls */
    int rsbadcalls;   /* Out - bad RPC calls */
    int rsnullrecv;
    int rsbadlen;
    int rsxdrcall;
    int ncalls;       /* Out - total NFS calls */
    int nbadcalls;    /*     - calls that failed */
    int reqs[32];     /*     - calls for each request */
} NC_NFS_STATS_T;

/*----------------------------------------------------
 * Network Interface IOCTL communication transaction
 * structures
 *---------------------------------------------------*/
typedef struct {
    K_MSGTYPE m_type;
    short     status;          /* output */
    char      unit;            /* Only used with IF, MAC and IN commands. */
    char      pad;
    K_PID     receiver_pid;
    short     mem_xfer_mode;        /* 0-normal, 1-VME block, 2-AEP */
    long      recv_mem_size;        /* I */
    long      recv_mem_start_addr;  /* I */
    ETHADDR   intf_addr;            /* O: address of interface */
} NC_REGISTER_DL_T;
typedef struct {
    K_MSGTYPE m_type;
    short     status;   /* output */
    char      unit;     /* Only used with IF, MAC and IN commands. */
    char      pad;
    union {
        long    promis;    /* I */
        ETHADDR mc_addr;   /* I: add and delete */
        short   flags;     /* I: set flag; O: get flag */
        long    metric;    /* I */
        struct sockaddr if_addr;   /* I */
    } un;
} NC_IFIOCTL_T;
typedef struct {
    K_MSGTYPE m_type;
    short     status;   /* output */
    char      unit;     /* Only used with IF, MAC and IN commands. */
    char      pad;
    struct if_stats {
        long if_ipackets;    /* packets received */
        long if_ibytes;      /* bytes received */
        long if_ierrors;     /* input errors */
        long if_opackets;    /* packets sent */
        long if_obytes;      /* bytes sent */
        long if_oerrors;     /* output errors */
        long if_collisions;  /* CSMA collisions */
    } if_stats;
} NC_IFSTATS_T;

typedef struct {
    K_MSGTYPE m_type;
    short     status;   /* output */
    char      unit;     /* Only used with IF, MAC and IN commands. */
    char      pad;
    union {
        struct in_addr br_addr;    /* I */
        struct in_addr net_mask;   /* I */
    } un;
} NC_INIOCTL_T;

typedef struct {
    K_MSGTYPE m_type;
    short     status;   /* output */
    char      unit;     /* Only used with IF, MAC and IN commands. */
    char      pad;
    struct arpreq arp_req;
} NC_ARPIOCTL_T;

typedef struct {
    K_MSGTYPE m_type;
    short     status;   /* output */
    char      unit;     /* Only used with IF, MAC and IN commands. */
    char      pad;
    struct rtentry route_req;
} NC_RTIOCTL_T;
/*----------------------------------------------------
 * Network Interface Data Communication transaction
 * Structure
 *---------------------------------------------------*/
typedef struct {
    long    len;
    caddr_t address;
} PKT_DATA_BUFFER;

#define MAX_DL_BUFFRAG        4
#define VME_XFER_MODE_NORMAL  0
#define VME_XFER_BLOCK        1
#define VME_XFER_AEP          2   /* enhanced xfer protocol */

typedef struct ether_xmit {
    K_MSGTYPE m_type;
    char  src_net;        /* Source of packet. */
    char  dst_net;        /* Destination of packet. */
    char  vme_xfer_mode;  /* What transfer mode can be used to
                             access data in buflist. */
    char  pad1;
    short pktlen;         /* Total packet length. */
    short pad2;
    PKT_DATA_BUFFER pkt_buflist[MAX_DL_BUFFRAG+1];
} NC_PKT_IO_T;
/*****************************************************
 * Secure Authentication Server Communication
 * transactions
 ****************************************************/
/*
 * Name under which the key server registers.
 */
#define KEYSERV_NAME "KEYSERV"
/* Key server message types. */
#define KS_DECRYPT 69
#define KS_GETCRED 137
typedef struct {
    K_MSGTYPE type;
    u_long    rpcstatus;    /* RPC status */
    u_long    ksstatus;     /* key server reply status */
    vme_t     netname;      /* netname */
    long      netnamelen;   /* length of netname */
    des_block desblock;     /* DES block in and out */
} KS_DECRYPT_T;

typedef struct {
    K_MSGTYPE type;
    u_long    rpcstatus;    /* RPC status */
    u_long    ksstatus;     /* key server reply status */
    vme_t     netname;      /* netname */
    long      netnamelen;   /* length of netname */
    unixcred  cred;         /* credentials returned */
} KS_GETCRED_T;

c. Host Facility Communication
Transactions
The communication transactions that the host
facility of the present invention recognizes and
provides are summarized in Table 8 below. These
transactions are used to support the initialization
and ongoing coordinated operation of the system 160.
Table 8
Host Facility Message Types
sc_register_fifo   ( SC_REGISTER_FIFO_T,*** );
sc_get_sys_config  ( SC_GET_SYS_CONFIG_T,*** );
sc_register_name   ( SC_REGISTER_NAME_T,*** );
sc_init_complete   ( SC_INIT_COMPLETE_T,*** );
sc_resolve_name    ( SC_RESOLVE_NAME_T,*** );
sc_resolve_fifo    ( SC_RESOLVE_FIFO_T,*** );
sc_time_register   ( SC_TIME_REGISTER_T,*** );
sc_real_time       ( SC_REAL_TIME_T,*** );
sc_err_log_msg     ( SC_ERR_LOG_MSG_T,*** );
sc_err_log_msg2    ( SC_ERR_LOG_MSG2,*** );
Name Service
The name server daemon ("named") is the Unix host
facility process that boots the system and understands
all of the facility services that are present in the
system.  That is, each facility provides at least one
service. In order for any facility to utilize a
service of another, the name of that service must be
published by way of registering the name with the name
server daemon. A name is an ascii string that

represents a service. When the name is registered, the
relevant servicing process PID is also provided.
Whenever the name server daemon is thereafter queried
to resolve a service name, the name server daemon will
respond with the relevant process PID if the named
service is available. This one level of indirection
relieves the need to otherwise establish fixed process
IDs for all of the possible services. Rather, the
multi-tasking kernels of the messaging kernel layers
are allowed to establish a PID of their own choosing to
each of the named services that they may register.
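As an illustrative sketch only, a facility might publish one of its services and later resolve another as follows; strncpy() is from the standard C library, and the service names shown are taken from Table 9 purely as examples.
    /* Illustrative registration and resolution against the name server daemon. */
    void
    publish_and_resolve( K_PID name_server_pid, K_PID my_statman_pid )
    {
        SC_REGISTER_NAME_T *reg = (SC_REGISTER_NAME_T *)k_alloc_msg();
        SC_RESOLVE_NAME_T  *res;

        reg->type = SC_REGISTER_NAME;
        reg->pid  = my_statman_pid;            /* PID chosen by the local kernel */
        strncpy( reg->name, "FC_STATMAN0", K_MAX_NAME_LEN );
        (void) k_send( reg, name_server_pid );

        res = (SC_RESOLVE_NAME_T *)k_alloc_msg();
        res->type = SC_RESOLVE_NAME;
        strncpy( res->name, "SC_TIMED", K_MAX_NAME_LEN );
        res = k_send( res, name_server_pid );
        /* res->pid now holds the servicing PID, or zero if the name is unknown. */
    }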
The communication transaction:
sc_register_fifo ( SC_REGISTER_FIFO_T,*** );
is directed to the named daemon of the host facility to
provide notice that the issuing NC, FS, or S facility
has been started. This transaction also identifies the
name of the registering facility, as opposed to the name of a
service of the facility, its
unique facility ID (VME slot ID) and the shared memory
address of its message descriptor FIFO.
The communication transaction:
sc_get_sys_config ( SC_GET_SYS_CONFIG_T,*** );
is used by a booting facility to obtain configuration
information about the rest of the system 160 from the
name server daemon. The reply message identifies all

facilities that have been registered with the name
server daemon.
The communication transaction:
sc_init_complete ( SC_INIT_COMPLETE_T,*** );
is sent to the name server daemon upon completion of
its initialization, inclusive of handling the reply
message to its sc_get_sys_config transaction.  When the
name server daemon returns a reply message, the
facility is cleared to begin normal operation.
The communication transaction:
sc_register_name ( SC_REGISTER_NAME_T,*** );
is used to correlate a known name for a service with
the particular PID of a facility that provides the
service. The names of the typical services provided in
the preferred embodiment of the present invention are
listed in Table 9.
Table 9
Named Facility Services
Host Facility Resident
SC_NAME_SERVER - the "Name server" daemon -
executes on the host peer-level processor, or
primary host processor if there is more than
one host facility present in the system.
Provides the system wide name service.
Operates also to collect and distribute
information as to the configuration, both
physical (the total number of NCs present in
the system and the VME slot number of each)
and logical (what system services are
available).

SC_ERRD - the "ERRD" daemon - executes on the
host peer-level processor, or primary host
processor if there is more than one host
facility present in the system.  Injects an
error message into the UNIX syslogd system.
This results in the error message being
printed on the system console and, typically,
logged in an error file.
SC_TIMED - the "TIMED" daemon - executes on the
host peer-level processor, or primary host
processor if there is more than one host
facility present in the system. Returns the
current system time. Can also be instructed
to give notification of any subsequent time
changes.
SC_KEYSERV - executes on the host peer-level
processor, or primary host processor if there
is more than one host facility present in the
system. When NFS runs in secure (DES
encryption) mode, it provides access to the
conventional Unix daemon that, in turn,
provides access to keys which authenticate
users.
FS Facility Resident
FC VICE-PRES - executes on the FC peer-level
processor, or primary FC processor if there
is more than one such facility present in the
system. Coordinates the operation of
multiple FS facilities by servicing all
requests to identify the PID of the unnamed
manager process that controls access to a
FSID.  At least one unnamed manager process
runs in each FS facility.
FC_STATMAN# - executes in a respective FC
facility (#). Functions as a "statistics
manager" process on the FC facility to
collect and allow other facilities to request
a report of current statistics, such as the
number of messages received.

S Facility Resident
S_MANAGER# - executes in the respective S
facility (#).  All low-level disk requests
for the disk array coupled to the storage
processor (#) are directed to this manager
process. Unnamed worker processes are
allocated, as necessary to actually carry out
the request.
S_STATMAN# - executes in a respective S
facility (#). Functions as a "statistics
manager" process on the S facility to collect
and allow other facilities to request a
report of current statistics.
NC Facility Resident
NC_NFS_VP# - executes in a respective NC facility
(#). Controls the operation of NFS for its
respective NC facility. Accepts messages
from the host facility for starting and
stopping NFS and for controlling the export
and unexport of selected file systems.
NC_DLCTRL# - executes in a respective NC
facility (#). Functions as the Data Link
controller for its NC facility (#). Accepts
ioctl commands for a local message specified
data link and allocates a worker process, as
necessary, to carry out the message request.
NC_DLXMIT# - executes in a respective NC
facility (#). Functions as the Data Link
transmitter for its NC facility (#). Accepts
transmit commands for a local message
specified data link and allocates a worker
process, as necessary, to carry out the
message request.
NC_STATMAN# - executes in a respective NC
facility (#). Functions as a "statistics
- ~ manager" process on the NC facility to
collect and allow other facilities to request
a report of current statistics.

The communication transaction:
sc_resolve_name ( SC_RESOLVE_NAME_T,*** );
is used by the messaging kernel layer of a facility to
identify the relevant process PID of a service provided
by another facility. The reply message, when returned
by the name server daemon, provides the "resolved"
process ID or zero if the named service is not
supported.
The communication transaction:
sc_resolve_fifo ( SC_RESOLVE_FIFO_T,*** );
is issued by a facility to the name server daemon the
first time the facility needs to communicate with each
of the other facilities. The reply message provided by
the name server daemon identifies the shared memory
address of the message descriptor FIFO that corresponds
to the named service.
Time Service
The time server daemon ("timed") provides system
wide timer services for all facilities.
The communication transaction:
sc_time_register ( SC_TIME_REGISTER_T,*** );
is issued by a facility to the timed daemon to
determine the system time and to request periodic time
synchronization messages. The reply message returns
the current time.
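A minimal, illustrative use of this transaction might look like the sketch below; the timed daemon's PID is assumed to have been resolved through the name service, and the update period shown is arbitrary.
    /* Illustrative time registration with the timed daemon. */
    void
    register_with_timed( K_PID timed_pid, K_PID my_pid )
    {
        SC_TIMED_REGISTER_T *msg = (SC_TIMED_REGISTER_T *)k_alloc_msg();

        msg->type              = SC_TIMED_REGISTER;
        msg->client_pid        = my_pid;   /* later sc_real_time messages come here */
        msg->max_update_period = 300;      /* seconds; arbitrary for illustration */
        msg = k_send( msg, timed_pid );
        /* msg->seconds and msg->microseconds now hold the current system time. */
    }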

The communication transaction:
sc_real_time ( SC_REAL_TIME_T,*** );
is issued by the time server daemon to provide
"periodic" time synchronization messages containing the
current time. These transactions are directed to the
requesting process, based on the "client_pid" in the
originally requesting message. The period of the
transactions is a function of a default time period,
typically on the order of several minutes, or whenever
the system time is manually changed.
Error Logger Service
The error server daemon ("errd") provides a
convenient service to send error messages to the system
console for all facilities.
The communication transaction:
sc_err_log_msg ( SC_ERR_LOG_MSG_T,*** );
prints the string that is provided in the send
message, while the transaction:
sc_err_log_msg2 ( SC_ERR_LOG_MSG2,*** );
provides a message and an "error id" that specifies a
print format specification stored in an "errd message
format" file. This format file may specify the error
message format in multiple languages.
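For illustration, an error could be routed through the errd daemon as sketched below; LOG_ERR is one of the syslog priority levels listed with the SC_ERRD structures, and strncpy() is from the standard C library.
    /* Illustrative error report sent to the errd daemon. */
    void
    report_error( K_PID errd_pid, char *text )
    {
        SC_ERR_LOG_MSG_T *msg = (SC_ERR_LOG_MSG_T *)k_alloc_msg();

        msg->type           = SC_ERR_LOG_MSG;
        msg->priority_level = LOG_ERR;               /* from syslog.h */
        strncpy( msg->msg, text, ERR_LOG_MSG_LEN );
        (void) k_send( msg, errd_pid );              /* errd hands it to syslogd */
    }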

/****************************************************
 * Structures and Constants for the SC_NAMED process.
 ****************************************************/
/*
 * Board types.
 */
#define BT_NONE       0
#define BT_UNIX       1   /* Host Processor */
#define BT_PSA        2   /* Storage Processor */
#define BT_FC         3   /* File Controller. */
#define BT_NC         4   /* Network Controller. */
#define BT_PLESSEY    5   /* Test Environment */
#define BT_TRACE_ANAL 6   /* Message Trace Analyzer. */
#define BT_MEM        7   /* memory board. */

/*
 * Slot descriptor.
 */
typedef struct {
    short board_type;
    short slot_id;
} SLOT_DESC_T;

/*************************************
 * SC_NAMED: Types and structures.
 ************************************/
#define SC_MSG_GROUP      ( (long)( ('S' << 8) | ('C') ) << 16 )
#define SC_REGISTER_FIFO  (1 | SC_MSG_GROUP)
#define SC_RESOLVE_FIFO   (2 | SC_MSG_GROUP)
#define SC_REGISTER_NAME  (3 | SC_MSG_GROUP)
#define SC_RESOLVE_NAME   (4 | SC_MSG_GROUP)
#define SC_DELAY          (5 | SC_MSG_GROUP)
#define SC_GET_SYS_CONFIG (6 | SC_MSG_GROUP)
#define SC_INIT_COMPLETE  (7 | SC_MSG_GROUP)

#define K_MAX_NAME_LEN 32   /* Maximum process name length. */
typedef struct {
    K_MSGTYPE type;
    short     my_slot_id;
    short     sender_slot_id;
    char      name[K_MAX_NAME_LEN];
    M16_FIFO_DESC fifo_desc;
    short     flags;               /* flags defined below */
} SC_REGISTER_FIFO_T;
/*
 * SC_REGISTER_FIFO_T flags:
 */
#define NO_CM_ACCESS 1    /* can't access common memory */

typedef struct {
    K_MSGTYPE type;
    short     my_slot_id;
    short     dest_slot_id;
    M16_FIFO_DESC fifo_desc;       /* 0 => not found */
} SC_RESOLVE_FIFO_T;
typedef struct {
    K_MSGTYPE type;
    K_PID     pid;
    char      name[K_MAX_NAME_LEN];
} SC_REGISTER_NAME_T;

typedef struct {
    K_MSGTYPE type;
    K_PID     pid;                    /* 0 => not found */
    char      name[K_MAX_NAME_LEN];   /* input */
} SC_RESOLVE_NAME_T;

typedef struct {
    K_MSGTYPE   type;
    SLOT_DESC_T config[M16_MAX_VSLOTS];
} SC_GET_SYS_CONFIG_T;

typedef struct {
    K_MSGTYPE type;
    short     my_slot_id;
} SC_INIT_COMPLETE_T;

/*********************************************
 * SC_TIMED: Types and structures.
 ********************************************/
#define SC_TIMED_REGISTER  ( 101 | SC_MSG_GROUP )
#define SC_REAL_TIME       ( 102 | SC_MSG_GROUP )

typedef struct {
    K_MSGTYPE type;
    K_PID     client_pid;
    long      max_update_period;   /* in seconds. */
    /* output */
    long      seconds;             /* seconds since Jan. 1, 1970 */
    long      microseconds;        /* and micro seconds. */
} SC_TIMED_REGISTER_T;

typedef struct {
    K_MSGTYPE type;
    long      seconds;             /* seconds since Jan. 1, 1970 */
    long      microseconds;        /* and micro seconds. */
} SC_REAL_TIME_T;
/*************************************
 * SC_ERRD: Types and Structures.
 ************************************/
/*
 * SC_ERRD message structures.
 * Error log usage notes:
 *  - Must include "syslog.h"
 *  - Priority levels are:
 *      LOG_EMERG    system is unusable
 *      LOG_ALERT    action must be taken immediately
 *      LOG_CRIT     critical conditions
 *      LOG_ERR      error conditions
 *      LOG_WARNING  warning conditions
 *      LOG_NOTICE   normal condition
 *      LOG_INFO     informational
 *      LOG_DEBUG    debug-level messages
 */
#define SC_ERR_LOG_MSG   (301 | SC_MSG_GROUP)
#define SC_ERR_LOG_MSG2  (302 | SC_MSG_GROUP)

#define ERR_LOG_MSG_LEN  (K_MSG_SIZE - sizeof(K_MSGTYPE) - sizeof(short))

typedef struct {
    K_MSGTYPE type;                  /* SC_ERR_LOG_MSG */
    short     priority_level;
    char      msg[ERR_LOG_MSG_LEN];  /* message */
} SC_ERR_LOG_MSG_T;

typedef struct {
    K_MSGTYPE type;      /* SC_ERR_LOG_MSG2 */
    short     id;        /* Message id */
    short     fill1;     /* Unused. */
    union {
        char  c[80];     /* constants. */
        short s[40];
        long  l[20];
    } data;
} SC_ERR_LOG_MSG2_T;
d. S Facility Communication
Transactions
The communication transactions that the S
facilities of the present invention recognize, and
that the other messaging kernel layers of the present
invention recognize as appropriate to interact with the
S facility, are summarized in Table 10 below.

Table 10
Summary of S Communication Transactions
sp_noop_msg                 ( SP_MSG,*** );
sp_send_config              ( SEND_CONFIG_MSG,*** );
sp_receive_config           ( RECEIVE_CONFIG_MSG,*** );
sp_r/w_sector               ( SP_RDWR_MSG,*** );
sp_r/w_cache_pg             ( SP_RDWR_MSG,*** );
sp_ioctl_req                ( SP_IOCTL_MSG,*** );
sp_start_stop_msp           ( SP_IOCTL_MSG,*** );
sp_inquiry_msg              ( SP_MSG,*** );
sp_read_message_buffer_msg  ( SP_MSG,*** );
sp_set_sp_interrupt_msg     ( SP_MSG,*** );
The S facility generally only responds to
communication transactions initiated by other
facilities.  However, a few communication transactions are
initiated by the S facility at boot up as part of the
initial system configuration process.
Each S facility message utilizes the same block
message structure of the FS and NC facility messages.
The first word provides a message type identifier. A
second word is generally defined to return a completion
status. Together, these words are defined by a
SP HEADER structure:
typedef struct {
    char reserved;          /* byte 0 */
    char msg_code;          /* byte 1 */
    char msg_modifier;      /* byte 2 */
    char memory_type;       /* byte 3 */
    char complete_status;   /* byte 4 */
    char bad_drive;         /* byte 5 */
    char sense_key;         /* byte 6 */
    char sense_code;        /* byte 7 */
} SP_HEADER;

The reserved byte will be used by the other facilities
to identify an S facility message.  Msg_code and
msg_modifier specify the S facility functions to be
performed. Memory_type specifies the type of VME
memory where data transfer takes place. The S facility
uses this byte to determine the VMEbus protocols to be
used for data transfer. Memory_type is defined as:
03 -- Primary Memory, Enhanced Block Transfer
01 -- Local Shared Memory, Block transfer
00 -- Others, Non-block transfer
The completion status word is used by the S
facility to return message completion status. The
status word is not written by the S facility if a
message is completed without error. One should zero
out the completion status of a message before sending
it to the S facility. When a reply is received, one
examines the completion status word to differentiate a
k_reply from a k_null_reply.
The bad_drive value specifies any erroneous disk
drive encountered. The higher order 4 bits specify the
drive SCSI ID (hence, the drive set); the lower order 4
bits specify the S facility SCSI port number. The
sense_key and sense_code are conventional SCSI error
identification data from the SCSI drive.
The currently defined S facility functions, and
identifying msg_code bytes are listed in Table 11.

Table 11
S Facility Message Types
01 -- No Op
02 -- Send Configuration Data
03 -- Receive Configuration Data
04 -- S facility IFC Initialization
05 -- Read and Write Sectors
06 -- Read and Write Cache Pages
07 -- IOCTL Operation
08 -- Dump S facility Local RAM
09 -- Start/Stop A SCSI Drive
0A -- not used
0B -- not used
0C -- Inquiry
0D -- not used
0E -- Read Message Log Buffer
0F -- Set S facility Interrupt
The message completion status word (byte 4-7 of a
message) is defined as:
Byte 00 -- completion status
     01 -- SCSI device ID and S facility SCSI port
           number
     02 -- SCSI sense key
     03 -- SCSI sense code

The completion status byte values are defined
below:
00 -- Completed without error
01 -- Reserved
02 -- SCSI Status Error on IOCTL Message
03 -- Reserved
04 -- An inquired message is waiting to be
      executed
05 -- An inquired message is not found
06 -- VME data transfer error
07 -- Reserved
08 -- Invalid message parameter
09 -- Invalid data transfer count or VME data
      address
0A -- S facility configuration data not
      available
0B -- Write protect or drive fault
0C -- Drive off-line
0D -- Correctable data check
0E -- Permanent drive error or SCSI interface
      error
0F -- Unrecovered data check
After receiving a message, the S facility copies
the contents into its memory. After a message's
function is completed, a k_reply or k_null_reply is
used to inform the message sender.  K_null_reply is
used when the processing is completed without error;
k_reply is used when the processing is completed with
error.  When k_reply is used, a non-zero completion
status word is written back to the original message.
Therefore, when a reply is received, a message sender
checks the status word to determine how a message is
completed.  When k_null_reply is used, the original
message is not updated. The S facility simply
acknowledges the normal completion of a message.
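The convention just described can be captured in a short, illustrative check; the sketch assumes the SP_MSG and SP_HEADER structures listed at the end of this section.
    /* Illustrative completion check for an S facility transaction. */
    int
    sp_transaction_ok( SP_MSG *msg, K_PID sp_pid )
    {
        msg->header.complete_status = 0;   /* must be zeroed before sending */
        msg = k_send( msg, sp_pid );

        /* A k_null_reply leaves the message untouched, so a status word that
         * is still zero means the operation completed without error. */
        return (msg->header.complete_status == 0);
    }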

If a message is not directed to a disk drive, it
is executed immediately. Disk I/O messages are sorted
and queued in disk arm elevator queues. Note, the
INQUIRY message returns either 04 or 05 status and uses
k_reply only.
The input parameters for this message are defined
as:
sp_noop_msg ( SP_MSG,*** );
The only parameter needed for this message is the
message header. The purpose for this message is to
test the communication path between the S facility and
a message sender.  A k_null_reply is always used.
Send Configuration Data
The input parameters for this operation are
defined as:
sp_send_config ( SEND_CONFIG_MSG,*** );
This message is used to inform the S facility about the
operating parameters. It provides a pointer pointing
to a configuration data structure. The S facility
fetches the configuration data to initialize its local
RAM. The configuration data is also written to a
reserved sector on each SCSI disk such that they can be
read back when the S facility is powered up. Hence, it

is not necessary to send this message each time the S
facility is powered up.
In the configuration data structure, vme_bus_request_level
specifies the S facility data transfer
request level on the VME bus.  The access_mode
specifies if the S facility should run as independent
SCSI drives or as a single logical drive. In the
latter case, number_of_disks should be the same as
number_of_banks because all nine drives in a bank are
grouped into a single logical disk.
Total_sector is the disk capacity of the attached
SCSI disks.  Total capacity of a disk bank is this
number multiplied by the number_of_disks.  When addition-
al disk banks are available, they could have sizes
different from the first bank.  Hence, total_sector is
a three entry array.  Stripe_size is meaningful only
when the S facility is running as a single logical disk
storage subsystem. Different stripe sizes can be used
for different drive banks.  Finally, online_drive_bit_map
shows the drives that were online at the last
reset.  Bit 5 of online_drive_bit_map[1] being set
indicates drive 5 of bank 1 is online.  Total_sector
and online_drive_bit_map could not and should not be
specified by a user.
The configuration data are written to the disks in
a S facility reserved sector, which is read at every S

facility reset and power up. When the configuration
data are changed, one must reformat the S facility
(erase the old file systems).  When this message is
completed, a k_reply or k_null_reply is returned.
Receive Configuration Data
The input parameters for this operation are
defined as:
sp_receive_config ( RECEIVE_CONFIG_MSG,*** );
This message requests the S facility to return
configuration data to a message sender.  Vme_pointer
specifies a VME memory location for storing the
configuration data.  The same configuration data
structure specified in the last section will be
returned.
Read and Write Sectors
The input parameters for this operation are
defined as:
sp_r/w_sector ( SP_RDWR_MSG,*** );
Unlike most S facility messages, which are processed
immediately, this message is first sorted and queued.
Up to 200 messages can be sent to the S facility at one
time. Up to thirty messages are executed on thirty
SCSI drives simultaneously. The messages are sorted by

their sector addresses. Hence, they are not served by
the order of their arrivals.
There are two possible functions specified by this
message:
msg_mod = 00 -- Sector Read
          01 -- Sector Write
Scsi_id specifies the drive set number.  Disk_number
specifies which SCSI port is to be used.  Sector_count
specifies the number of disk sectors to be transferred.
For a sector read message, erase_sector_count specifies
the number of sectors in the VME memory to be padded
with zeros (each sector is 512 bytes).  For a
sector write message, erase_sector_count specifies the
number of sectors on the disk to be written with zeros
(hence, erased).  To prevent sectors from being erased
inadvertently, a sector write message can only specify
one of the two counters to be non-zero, but not both.
Sector_address specifies the disk sector where the read or
write operation starts.  Vme_address specifies a
starting VME memory location where data transfer takes
place.
There are three drive elevator queues maintained
by the S facility for each SCSI port (or one for each
disk drive). The messages are inserted in the queue
sorted by their sector addresses, and are executed by
their orders in the queue. The S facility moves back
and forth among queue entries like an elevator. This

is done to minimize the disk arm movements.  Separate
queues are kept for separate disk drives.  These queues are
processed concurrently because the SCSI drive disconnects
from the bus whenever there is no data or command
transfer activities on the bus.
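An illustrative sketch of the elevator ordering follows; the per-drive queue structure and its link field are assumptions of the sketch, since the specification describes the behavior but not the data layout.
    /* Illustrative insertion into a per-drive elevator queue, sorted by
     * sector address so the arm sweeps the disk rather than seeking randomly. */
    struct sp_queue_entry {
        struct sp_queue_entry *next;    /* assumed link field */
        SP_RDWR_MSG           *msg;
    };

    void
    elevator_insert( struct sp_queue_entry **head, struct sp_queue_entry *e )
    {
        while (*head != NULL &&
               (*head)->msg->sector_address < e->msg->sector_address)
            head = &(*head)->next;      /* find the sorted position */
        e->next = *head;
        *head   = e;                    /* service order follows sector order */
    }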
If no error conditions are detected from the SCSI
drive(s), this message is completed normally.  When a
data check is found and the S facility is running as a
single logical disk, recovery actions using redundant
data are started automatically. When a drive is down
and the S facility is running as a single logical
disk, recovery actions similar to data check recovery
will take place. Other drive errors will be reported
by a corresponding status code value.
K_reply or k_null_reply is used to report the
completion of this message.
Read/Write Cache Pages
The input parameters for this operation are
defined as:
sp_r/w_cache_pg ( SP_RDWR_MSG,*** );
This message is similar to Read and Write Sectors,
except multiple vme_addresses are provided for
transferring disk data to and from disk sectors.  Each
vme_address points to a memory cache page, whose size
is specified by cache_page_size.  When reading, data

are scattered to different cache pages; when writing,
data are gathered from different cache pages (hence, it
is referred to as scatter-gather function).
There are two possible functions specified by this
message:
msg_mod = 00 -- Cache Page Read
          01 -- Cache Page Write
Scsi_id, disk_number, sector_count, and sector_address
are described in the Read and Write Sectors message.  Both
sector_address and sector_count must be divisible by
cache_page_size.  Furthermore, sector_count must be
less than 160 (or 10 cache pages).  Cache_page_size
specifies the number of sectors for each cache page.
Cache pages are read or written sequentially on the
drive(s).  Each page has its own VME memory address.
Up to 10 vme_addresses are specified.  Note, the limit
of 10 is set due to the size of a S facility message.
Like the sector read/write message, this message is
also inserted in a drive elevator queue first.
If no error conditions are detected from the SCSI
drive(s), this message is completed normally.  When an
error is detected, a data recover action is started.
When there is a permanent drive error that prevents
error recovery action from continuing, an error status
code is reported as completion.
K_reply or k_null_reply is used to report the
completion of this message.

IOCTL Request
The input parameters for this operation are
defined as:
sp_ioctl_req ( SP_IOCTL_MSG,*** );
This message is used to address directly any SCSI disk
or peripheral attached to a SCSI port. Multiple
messages can be sent at the same time.  They are served
in the order of first come, first served.  No firmware
error recovery action is attempted by the S facility.
Scsi_id, scsi_port, and scsi_lun_address identify
uniquely one attached SCSI peripheral device.
Command_length and data_length specify the lengths of
the command and data transfers respectively.  Data_buffer_address
points to a VME memory location for data
transfer.
data to be sent to the addressed SCSI peripheral
device. Note, the data length must be multiples of 4
because the S facility always transfers 4 bytes at a
time.  Sense_length and sense_addr specify the size and
address of a piece of VME memory where device sense
data can be stored in case a check status is received.
These messages are served by the order of their
arrivals.
When this message is terminated with drive error,
a corresponding status code is returned.  K_reply and

k_null_reply are used to report the completion of this
message.
Start/Stop SCSI Drive
The input parameters for this operation are
defined as:
sp_start_stop_msp ( SP_IOCTL_MSG,*** );
This message is used to fence off any message to a
specified drive. It should be sent only when there is
no outstanding message on the specified drive. Once a
drive is fenced off, a message directed to the drive
will receive a corresponding error status back.
When the S facility is running as a single logical
disk, this message is used to place a SCSI disk drive
in or out of service. Once a drive is stopped, all
operations to this drive will be fenced off. In such
case, when the stopped drive is accessed, recovery
actions are started automatically. When a drive is
restarted, the data on the drive is automatically
reconfigured. The reconfiguration is performed while
the system is online by invoking recovery actions when
the reconfigured drive is accessed.
When a drive is reconfigured, the drive configura-
tion sector is updated to indicate that the drive is
now a part of a drive set.

Message Inquiry
The input parameters for this message are defined
as:
sp_inquiry_msg ( SP_MSG,*** );
This message requests the S facility to return the
status of a message that was sent earlier. A k_reply
is always used. The status of the message, if
available in the S facility buffers, is returned in the
completion status word.
This message is used to verify whether a previous
message was received by the S facility. If not, the
message is lost. A lost message should be resent.
A message could be lost due to a local board reset.
However, a message should, in general, not be lost. If
messages are lost often, the S facility should be
considered broken and fenced off.
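A minimal sketch of the inquiry-then-resend policy just described is shown below; sp_inquiry(), sp_send_msg(), SP_MSG_UNKNOWN, LOST_LIMIT and fence_off_s_facility() are illustrative names only, not part of the documented interface.

    /* Hypothetical sketch of the lost-message handling above.          */
    #define LOST_LIMIT  3                /* assumed threshold            */

    static int lost_count;

    void check_and_resend(SP_MSG *m)
    {
        if (sp_inquiry(m) == SP_MSG_UNKNOWN) {  /* S facility never saw it */
            if (++lost_count > LOST_LIMIT) {
                fence_off_s_facility();         /* treat the board as broken */
                return;
            }
            sp_send_msg(m);                     /* resend the lost message */
        }
    }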
Read Message Log
The input parameters for this message are defined
as:
sp_read_message_buffer_msg ( SP_MSG,*** );
The S facility keeps a message buffer which contains
the last 200 messages. Data_buffer specifies a piece
of VME memory to which the messages are sent.
The number of messages should not exceed 200. Each message
is 128 bytes long, as defined at the beginning of this
Section. An application program must allocate a buffer
big enough to accommodate all returned messages.
Normally this message is sent when there are no
active messages. Otherwise, it is very difficult to
determine how many used messages are in the S facility
message buffer. For example, if there are 200 active
messages, there will be no used ones in the message
buffer. Where there are fewer than the requested messages
in the message buffer, 128 bytes of zeros are
transmitted for each shortage. A k_reply or k_null_reply is
used for the completion of this message.
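A minimal sketch of sizing the log buffer, using the 200-message and 128-byte figures above, follows; sp_read_message_log() is an assumed wrapper around this request.

    /* Hypothetical sketch: the caller must provide room for all
     * returned messages; short reads are padded with 128 bytes of
     * zeros per missing entry.                                        */
    #define SP_LOG_MSGS      200
    #define SP_LOG_MSG_SIZE  128

    void dump_message_log(void)
    {
        static u_char log_buf[SP_LOG_MSGS * SP_LOG_MSG_SIZE]; /* 25,600 bytes */

        sp_read_message_log(log_buf, SP_LOG_MSGS);  /* assumed wrapper */
    }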
SP Interrupt
The input parameters for this message are defined
as:
sp_set_sp_interrupt_msg ( SP_MSG,*** );
This message tells the S facility to pass control to an
on-board debug monitor, as present in the SP boot ROM.
After completing this message, the S facility no
longer honors any messages until the monitor returns
control. A k_null_reply is always returned for this
message.

The S facility message structures are listed
below:
typedef struct psa_msg {                 /* A Template Message */
        SP_HEADER header;
        vme_t     vme_addr;
        u_long    data_length;
        u_long    sram_addr;
        u_char    msg_body[K_MSG_SIZE - 32];
        void      (*rtnadr)();           /* return address of a message */
        struct psa_msg *rblink;          /* points to a work area or ready msg link */
        u_long    start_time;
} SP_MSG;
typedef struct {
        char   vme_bus_request_level;
        char   access_mode;
        char   number_of_disks;
        char   number_of_banks;
        short  firmware_revision;
        short  hardware_revision;
        int    total_sector[3];
        int    stripe_size[3];
        int    online_drive_bit_map[3];
} config_data;
typedef struct {
        SP_HEADER   header;              /* byte 0-7 */
        config_data *vme_ptr;            /* byte 8-11 */
        long        data_length;         /* byte 12-15, sizeof config_data */
} SEND_CONFIG_MSG;
typedef struct {
        SP_HEADER   header;              /* byte 0-7 */
        config_data *vme_pointer;
        long        data_length;
} RECEIVE_CONFIG_MSG;
typedef struct {
        SP_HEADER header;                /* byte 0-7   */
        char      scsi_id;               /* byte 8     */
        char      disk_number;           /* byte 9     */
        short     reserved;              /* byte 10-11 */
        short     sector_count;          /* byte 12-13 */
        short     erase_sector_count;    /* byte 14-15 */
        long      sector_address;        /* byte 16-19 */
        u_long    vme_address;           /* byte 20-23 */
} SP_RDWR_MSG;

typedef struct {
        SP_HEADER header;                /* byte 0-7   */
        char      scsi_id;               /* byte 8     */
        char      disk_number;           /* byte 9     */
        short     reserved;              /* byte 10-11 */
        short     sector_count;          /* byte 12-13 */
        short     cache_page_size;       /* byte 14-15 */
        long      sector_address;        /* byte 16-19 */
        u_long    vme_address[10];       /* byte 20-23 */
} SP_RDWR_MSG;
typedef struct {
        SP_HEADER header;                /* byte 0-7   */
        char      scsi_id;               /* byte 8     */
        char      scsi_port;             /* byte 9     */
        char      scsi_lun_address;      /* byte 10    */
        char      command_length;        /* byte 11    */
        u_long    data_length;           /* byte 12-15 */
        u_long    data_buffer_address;   /* byte 16-19 */
        char      command_bytes[20];     /* byte 20-39 */
        u_long    sense_length;          /* byte 40-43 */
        u_long    sense_addr;            /* byte 44-47 */
} SP_IOCTL_MSG;
IV Start-up Operations
A. IFC Initialization
The chart below summarizes the system operations
that occur during system boot.

Table 12
Summary of System Initialization
Phase 1: All peer-level processors
{
    boot to "boot-level" ready state;
}
Phase 2: The host boot level facility
{
    boot Unix image through boot-level S facility;
    execute Unix image;
    start SC_NAME_SERVER process;
}
Phase 3: The host facility
{
    for each boot-level facility {
        probe for existence;
        initialize FIFO for receiving;
    }
    for each ( SP, NC, FC ) {
        read boot image and parameters from boot-level
          S facility;
        download boot image and boot parameters
          (including the PID of the SC_NAME_SERVER
          process) to the shared memory program
          store of the peer-level processor;
        start controller;
    }
}
Phase 4: Each peer-level processor
{
    begin executing facility image;
    initialize controller {
        send SC_REG_FIFO to SC_NAME_SERVER;
        send SC_GET_SYS_CONF to SC_NAME_SERVER;
        send SC_INIT_CMPL to SC_NAME_SERVER;
    }
    start manager processes {
        send SC_REG_NAMEs to SC_NAME_SERVER;
        send SC_RESOLVE_NAMEs to SC_NAME_SERVER;
        send SC_RESOLVE_FIFOs to SC_NAME_SERVER;
    }
}

The SP peer-level processors boot from onboard
EPROMs. The SP boot program, in addition to providing
for power-on diagnostics and initialization to a ready
state, includes a complete S facility. Thus, the SP
peer-level processor is able to perform SCSI disk and
tape operations upon entering its ready state. In
their ready states, the NC, FC, SP and H processors can
be downloaded with a complete instantiation of their
respective types of facilities. The downloaded program
is loaded into local shared memory; for the S facility,
for example, the program is loaded into its local 256K
static ram. The ram download, particularly to static
ram, allows both faster facility execution and use of
the latest release of the facility software.
After powering up or resetting the SP processor,
the host facility, executing its boot program, waits
for the SP boot program to post ready by indicating a
ready state value in an SP status register.
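A minimal sketch of this wait, assuming a memory-mapped SP status register and an illustrative SP_READY value (neither of which is specified here), follows:

    /* Hypothetical sketch: spin until the SP boot program posts ready. */
    #define SP_READY  0x01               /* assumed ready-state value    */

    void wait_for_sp_ready(volatile u_long *sp_status_reg)
    {
        while (*sp_status_reg != SP_READY)
            ;                            /* poll the SP status register  */
    }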
Once the S boot program has posted ready, a Sector
Read message from the host boot program can be used to
retrieve any disk block to any VME memory location.
Generally, the read request is to load the host
facility from disk block 0, the boot block. In
preparing a read-sector message for the S facility
after power up, the local host boot program specifies

the following (in addition to normal read_sector
message contents):
sender_id = 0xffffffff
dest_id   = 0x00000001
By specifying the above, the local host boot program
signals the S facility to bypass normal IFC reply
protocols and, in turn, to signal reply complete
directly by changing the 0xffffffff message value in
the original message image to any other value, such as
the value of the message descriptor. That is, after
building a read sector message, the host boot program
writes a message descriptor to the S facility. The
host boot program can then poll this sender_id word to
determine when the message is completed. Messages to
the S facility are sent in this manner until the full
host facility boot is complete.
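A minimal sketch of this boot-time handshake is shown below; the placement of sender_id within SP_HEADER and the helper write_message_descriptor() are assumptions for illustration only.

    /* Hypothetical sketch of the polled boot-time handshake above.     */
    #define BOOT_SENDER_ID  0xffffffff
    #define BOOT_DEST_ID    0x00000001

    void boot_send_and_poll(volatile SP_MSG *m)
    {
        m->header.sender_id = BOOT_SENDER_ID; /* bypass normal IFC replies */
        m->header.dest_id   = BOOT_DEST_ID;

        write_message_descriptor(m);          /* hand the message to the S facility */

        /* completion is signalled by overwriting the 0xffffffff word    */
        while (m->header.sender_id == BOOT_SENDER_ID)
            ;                                 /* poll until the word changes */
    }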
Once the local host boot program has loaded the
host facility and begun executing its initialization,
the host facility generally switches over to normal IFC
communication with the S facility. To do this, the local
host facility sends an IFC Initialization message to
the S facility. After receiving this message, the S
facility expects a shared memory block, as specified by
the message, to contain the following information:
o Byte 00-03 -- Bootlock, provides synchronization with
the local host facility,
o Byte 04-05 -- S facility board slot id,
o Byte 06-07 -- Reserved,

o Byte 08-09 -- This board's IFC virtual slot ID,
o Byte 10-11 -- System controller process number,
o Byte 12-27 -- System controller fifo descriptor
    Byte 00-01 -- System controller fifo type,
    Byte 02-03 -- System controller slot id,
    Byte 04-07 -- Fifo address,
    Byte 08-09 -- Soft fifo index,
    Byte 10-11 -- Soft fifo index mask,
    Byte 12-13 -- Interrupt request level,
    Byte 14-15 -- Interrupt vector address,
o Byte 28-31 -- Address of this common memory,
o Byte 32-35 -- Size of this common memory, and
o Byte 36-39 -- Hardware fifo address of the S facility.
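For illustration, this shared memory block might be described by the following sketch. The structure and field names are assumed (only the byte offsets come from the list above), and the layout presumes two-byte shorts, four-byte longs, and no compiler padding.

    /* Hypothetical sketch of the IFC initialization block above.       */
    typedef struct {
            u_short fifo_type;                /* byte 00-01 */
            u_short sc_slot_id;               /* byte 02-03 */
            u_long  fifo_address;             /* byte 04-07 */
            u_short soft_fifo_index;          /* byte 08-09 */
            u_short soft_fifo_index_mask;     /* byte 10-11 */
            u_short interrupt_request_level;  /* byte 12-13 */
            u_short interrupt_vector_address; /* byte 14-15 */
    } sc_fifo_descriptor;

    typedef struct {
            u_long  bootlock;                 /* byte 00-03: boot synchronization */
            u_short sp_board_slot_id;         /* byte 04-05 */
            u_short reserved;                 /* byte 06-07 */
            u_short ifc_virtual_slot_id;      /* byte 08-09 */
            u_short sc_process_number;        /* byte 10-11 */
            sc_fifo_descriptor sc_fifo;       /* byte 12-27 */
            u_long  common_memory_address;    /* byte 28-31 */
            u_long  common_memory_size;       /* byte 32-35 */
            u_long  s_facility_fifo_address;  /* byte 36-39 */
    } ifc_init_block;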
The first thing the S facility does is check the
bootlock variable. When it is set to a "BOOTMASTER"
value, it means the local host facility is up and ready
to receive messages from the S facility. Otherwise, the
S facility waits for the local host facility to
complete its own initialization and set the bootlock
word. As soon as the bootlock word is changed, the S
facility proceeds to perform IFC initialization. The
following IFC messages are sent to the local host
facility:
1. Register FIFO
2. Get System Configuration
3. Initialization Complete
4. Register Name
5. Resolve FIFO
The second message allows the S facility to know
who is in what VME slots within the system. The S
facility will only register one name, "SPn" (n is
either 0 or 1), with a processor ID of 1. Hence all
messages directed to the S facility specify PID =
SP_SLOT << 16 + 0x0001. Basically, a processor ID
(PID) is a 4-byte word, in which the higher order two
bytes contain the processor's VME slot ID. The lower
order two bytes identify a process within a processor.
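A minimal sketch of this PID encoding is shown below; the macro names are illustrative, and only the slot-in-the-high-two-bytes, process-in-the-low-two-bytes split comes from the text.

    /* Hypothetical helpers for composing and decomposing a PID.        */
    #define MAKE_PID(slot, proc)   (((u_long)(slot) << 16) | ((proc) & 0xffff))
    #define PID_SLOT(pid)          ((u_short)((pid) >> 16))
    #define PID_PROCESS(pid)       ((u_short)((pid) & 0xffff))

    /* e.g. the S facility service process in VME slot 7:               */
    /*   u_long sp_pid = MAKE_PID(7, 0x0001);                           */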
The register FIFO message formally informs the
local host facility about the S facility's fifo
address. The get system configuration message
retrieves a table describing all available processors
from the local host facility. After completing
initialization, using the Initialization Complete
message, the S facility advertises its services by
issuing the Register Name message, which informs the
host facility that the S facility service process is up
and running. When another facility sends a message to
the S facility for the first time, the S facility uses
a Resolve FIFO message, directed to the host facility,
to obtain the fifo address needed for a reply.
Thus, a multiple facility operating system
architecture that provides for the control of an
efficient, expandable multi-processor system
particularly suited to servicing large volumes of
network file system requests has been described.
Clearly, many modifications and variations of the
present invention are possible in light of the above

teachings. Therefore, it is to be understood that
within the scope of the appended claims, the principles
of the present invention may be realized in embodiments
other than as specifically described herein.

Administrative Status


Event History

Description Date
Inactive: Expired (new Act pat) 2010-08-20
Inactive: Late MF processed 2007-06-06
Letter Sent 2006-08-21
Grant by Issuance 2005-11-01
Inactive: Cover page published 2005-10-31
Inactive: Final fee received 2005-07-18
Pre-grant 2005-07-18
Notice of Allowance is Issued 2005-01-18
Letter Sent 2005-01-18
Notice of Allowance is Issued 2005-01-18
Inactive: Approved for allowance (AFA) 2004-12-24
Inactive: IPC assigned 2004-12-08
Inactive: IPC assigned 2004-12-08
Inactive: Adhoc Request Documented 2004-12-07
Inactive: Adhoc Request Documented 2004-12-07
Inactive: Delete abandonment 2004-12-07
Inactive: Delete abandonment 2004-11-15
Inactive: Abandoned - No reply to s.30(2) Rules requisition 2004-09-03
Inactive: Abandoned - No reply to s.29 Rules requisition 2004-09-03
Amendment Received - Voluntary Amendment 2004-09-02
Inactive: S.30(2) Rules - Examiner requisition 2004-03-03
Inactive: S.29 Rules - Examiner requisition 2004-03-03
Letter Sent 2003-11-18
Inactive: Cover page published 2001-12-21
Inactive: Office letter 2001-11-20
Inactive: First IPC assigned 2001-11-09
Letter sent 2001-10-26
Inactive: Applicant deleted 2001-10-24
Divisional Requirements Determined Compliant 2001-10-24
Application Received - Regular National 2001-10-24
Application Received - Divisional 2001-10-11
Request for Examination Requirements Determined Compliant 2001-10-11
All Requirements for Examination Determined Compliant 2001-10-11
Application Published (Open to Public Inspection) 1991-04-04

Abandonment History

There is no abandonment history.

Maintenance Fee

The last payment was received on 2005-08-10


Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
NETWORK APPLIANCE, INC.
Past Owners on Record
ALLAN SCHWARTZ
DAVID HITZ
GUY HARRIS
JAMES LAU
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents



Document Description   Date (yyyy-mm-dd)   Number of pages   Size of Image (KB)
Representative drawing 2001-11-22 1 3
Description 2001-10-11 141 4,316
Claims 2001-10-11 7 284
Drawings 2001-10-11 9 194
Abstract 2001-10-11 1 32
Cover Page 2001-12-20 1 45
Description 2004-09-02 141 4,307
Claims 2004-09-02 6 249
Drawings 2004-09-02 9 189
Representative drawing 2005-10-12 1 3
Cover Page 2005-10-12 1 46
Commissioner's Notice - Application Found Allowable 2005-01-18 1 162
Maintenance Fee Notice 2006-10-16 1 173
Late Payment Acknowledgement 2007-06-18 1 166
Correspondence 2001-10-24 1 43
Correspondence 2001-11-20 1 15
Correspondence 2005-07-18 1 36
Fees 2007-06-06 1 38