Methods and systems for a storage system | Patent Number 07197662
US 07197662 B2
Melvin James Bullen
David James Herbison
William Thomas Lynch
Steven Louis Dodd
A storage system that may include one or more memory sections, one or more switches, and a management system. The memory sections include memory devices and a section controller capable of detecting faults with the memory section and transmitting messages to the management system regarding detected faults. The storage system may include a management system capable of receiving fault messages from the section controllers and removing the faulty memory sections from service. Additionally, the management system may determine routing algorithms for the one or more switches.
The present application relates to the U.S. patent application Ser. No. 10/284,278 by M. James Bullen, Steven L. Dodd, David J. Herbison, and William T. Lynch, entitled “Methods and Systems for a Storage System Including and Improved Switch,” and the U.S. patent application Ser. No. 10/284,268 by M. James Bullen, Steven L. Dodd, David J. Herbison, and William T. Lynch, entitled “Methods and Systems for a Memory Section,” both of which are incorporated by reference herein in their entireties.
The present invention relates to data storage, and more particularly, to methods and systems for a high throughput storage device.
A form of on-line transaction processing (OLTP) application that requires a high number of data block reads or writes is called an H-OLTP application. A large server or mainframe or several servers typically host an H-OLTP application. Typically, these applications involve the use of a real time operating system, a relational database, optical fiber based networking, distributed communications facilities to a user community, and the application itself. Storage solutions for these applications use a combination of mechanical disk drives and cached memory under stored program control. The techniques for the storage management of H-OLTP applications can use redundant file storage algorithms on multiple disk drives, memory cache replications, data coherency algorithms, and/or load balancing.
A brief overview of the storage management technologies of cached disk arrays (CDAs) and solid-state disk storage systems (SSDs) follows.
Cached disk arrays (CDAs) combine disk drives and solid-state memory systems under common program control. The disk drives in CDAs are servo-mechanical devices. Advances in motor technology currently allow the platters of the disk drives to spin at 15,000 revolutions per minute; advanced systems may spin their platters at 18,000 revolutions per minute.
CDAs combine several racks of rotating disks with a common memory cache in an architecture where capacity may be added through the addition of more racks of devices, more cache, or both. CDAs often are used by companies to provide storage services in their mission critical applications, including H-OLTP applications.
The on-board cache of a CDA stores frequently used data because access times for data in cache memory can be short relative to access times for data on the drives. Such high-end storage system devices with rotating media, such as CDAs, include less than ideally desirable characteristics in terms of total throughput and memory cache size.
A solid-state disk (SSD) is a storage device corresponding to the solid-state memory attached to a computer's central processing unit through its internal bus structure. To an external computer (server or mainframe) the SSD appears as a very fast disk drive when it is directly attached to the computer over a fast communications link or network. Operating under stored program control, SSDs store frequently used information like transaction logs, database indices, and specialized data structures integral to the efficient execution of a company's mission critical applications.
It would be desirable for large capacity storage to provide sufficient throughput for high-volume, real-time applications, especially, for example, in emerging applications in financial, defense, research, customer management, and homeland security areas.
Accordingly, the present invention is directed to methods and systems that address the problems of prior art.
In accordance with the purposes of the invention, as embodied and broadly described herein, methods and systems for an apparatus are provided including one or more memory sections, one or more switches, and a management system. The one or more memory sections include one or more memory devices capable of storing data in storage locations, and a memory section controller capable of detecting faults in the memory section and transmitting a fault message in response to the detected faults. The one or more switches include one or more interfaces for connecting to one or more external devices, and a switch fabric connected to one or more memory sections and the external device interfaces and interconnecting the memory sections and the external device interfaces based on an algorithm. A management system is provided capable of receiving fault messages from the memory section controllers and removing from service the memory section from which the fault message was received, and wherein the management system is further capable of determining an algorithm for use by a switch fabric in interconnecting the memory sections and the external device interfaces, and instructing the switch to execute the determined algorithm.
The summary and the following detailed description should not restrict the scope of the claimed invention. Both provide examples and explanations to enable others to practice the invention. The accompanying drawings, which form part of the description for carrying out the best mode of the invention, show several embodiments of the invention, and together with the description, explain the principles of the invention.
FIG. 1 is a block diagram of a storage hub environment, in accordance with methods and systems provided;
FIG. 2 is a more detailed block diagram of a storage hub, in accordance with methods and systems provided;
FIG. 3 illustrates a logical architecture for a management complex, in accordance with methods and systems provided;
FIG. 4 is a block diagram of a physical architecture for a management complex, in accordance with methods and systems provided;
FIG. 5 is a block diagram of an exemplary memory section, in accordance with methods and systems provided;
FIG. 6 illustrates a functional diagram of a switch and memory section, in accordance with methods and systems consistent with the invention;
FIG. 7 illustrates an alternative functional diagram of a switch and memory section, in accordance with methods and systems provided;
FIG. 8 illustrates a diagram of an alternative exemplary switch, in accordance with methods and systems provided;
FIG. 9 illustrates a diagram of an alternative switch, in accordance with methods and systems provided;
FIG. 10 illustrates an exemplary pipeline shift register, in accordance with methods and systems provided;
FIG. 11 includes a more detailed block diagram of an exemplary embodiment of a memory interface device, in accordance with methods and systems provided;
FIG. 12 illustrates a flow chart for an exemplary writing operation, in accordance with methods and systems provided;
FIG. 13 illustrates a flow chart for an exemplary reading operation, in accordance with methods and systems provided;
FIG. 14 illustrates a logical diagram of partitioned memory devices, in accordance with methods and systems provided;
FIG. 15 illustrates an alternative embodiment of a memory interface device, in accordance with methods and systems provided; and
FIG. 16 illustrates an alternative memory section, in accordance with methods and systems provided.

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
FIG. 1 is a block diagram of one embodiment of a storage hub environment, in accordance with methods and systems provided. As illustrated, the storage hub environment includes a storage hub 10, servers 12-1 and 12-2, external management systems 14-1 and 14-2, a non-volatile storage device 16, an IP network 18 and a connection to another network 20. The storage hub 10 may include a large amount of storage (not shown) and stores the data in data blocks. Although the data may be stored in data blocks, any other mechanism for storing the data may be used without departing from the scope of the invention. The non-volatile storage device 16 may be a magnetic storage device, such as a CDA as described above. The non-volatile storage device 16 may be used to store back-up versions of the data stored by the storage hub 10.

The description below is organized in the following manner. First, a brief overview of the storage hub 10 environment illustrated in FIG. 1 is presented. Then, more detailed descriptions of the components of the storage hub 10 are presented, after which more detailed descriptions of exemplary methods for writing data to the storage hub 10, reading data from the storage hub 10, and testing the storage hub 10 are presented. Then, exemplary alternatives to these components are presented. It should, however, be understood that these are all exemplary descriptions regarding example methods and systems for implementing the invention. As such, one of skill in the art will recognize that there are other methods and systems that may be used for practicing the invention that is defined by the claims of this application.

The servers 12-1 and 12-2 are, for example, standard commercially available servers or farms of servers that can be connected to internal or external networks (not shown). For example, the servers 12-1 and/or 12-2 may be connected to an internal network such as an Ethernet for receiving requests for the retrieval or storage of information from end users connected to the network. Alternatively, the servers 12-1 and/or 12-2 could be connected to external networks, such as the Internet, for receiving requests for retrieval or storage of information from end users connected to the external network. Further, although two servers 12-1 and 12-2 are illustrated, the storage hub 10 may be connected to any number of servers 12.
When an application being executed by the server 12 requires data, the server 12 determines if the storage hub 10 stores the data. The servers 12 may store a record showing whether the data their applications require is on the storage hub 10. The server 12 then sends a data request to the storage hub 10 requesting the data. The storage hub 10 reads the data from the location in which it is stored and sends it to the requesting server 12-1 or 12-2. The server may run different types of applications and database management systems that may require data from the storage hub 10. Typical applications include, by way of example only, billing systems, customer relationship management systems, reservations systems, ordering systems, security systems, etc. Examples of database management systems include ORACLE, DB2, Sybase, Informix, etc.
Additionally, the storage hub 10 may receive a request from a server 12-1 or 12-2 to store data. Thereafter, the storage hub 10 preferably provides the server 12 with either an acknowledgement that the write occurred (i.e., the storage of the data) or a failure message. Such messages could include, for example, an acknowledgement that the data block was safely stored on both the storage (not shown) in the storage hub 10 and on the CDA 16 when a CDA 16 is used as backup for the storage hub 10, an acknowledgement that the data block is safely stored in the storage hub's 10 storage (not shown), no acknowledgement of any sort, or a failure message.
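By way of illustration only, the acknowledgement options described above may be sketched as a small policy function. The names `AckPolicy` and `write_ack` and the message strings are hypothetical and do not appear in the description. The sketch also assumes that a hub configured to send no acknowledgement of any sort stays silent on failure as well, which the description does not state.

```python
from enum import Enum
from typing import Optional


class AckPolicy(Enum):
    """Hypothetical names for the acknowledgement options described above."""
    HUB_AND_BACKUP = "hub_and_backup"  # acknowledge once stored in the hub and on the CDA
    HUB_ONLY = "hub_only"              # acknowledge once stored in the hub's own storage
    NONE = "none"                      # send no acknowledgement of any sort


def write_ack(policy: AckPolicy, hub_ok: bool, backup_ok: bool) -> Optional[str]:
    """Return the message sent to the server, or None when nothing is sent."""
    if policy is AckPolicy.NONE:
        # Assumption: "no acknowledgement" also suppresses failure messages.
        return None
    if not hub_ok:
        return "FAILURE"
    if policy is AckPolicy.HUB_ONLY:
        return "ACK"
    # HUB_AND_BACKUP: the back-up copy must also have been written.
    return "ACK" if backup_ok else "FAILURE"
```

Under this sketch, a write that reaches the hub but not the back-up device is acknowledged under `HUB_ONLY` and reported as a failure under `HUB_AND_BACKUP`.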
The external management system 14 may be directly connected to the storage hub 10, such as external management system 14-2. Or, the external management system 14 may be connected to the storage hub 10 via a network, such as external management system 14-1, which is connected to the storage hub 10 via network 18. Network 18 may be any type of network, such as an internal Ethernet network, an IP network, or the Internet. Although FIG. 1 illustrates both external management systems 14-1 and 14-2 connected to the storage hub 10, in other implementations there may be only one or any number of external management systems, or an external management system 14 need not be included. For example, in other implementations it may be desirable to have 3 or more external management systems. Additionally, the external management system may be a computer running proprietary or commercially available software, such as, for example, HP OpenView. The storage hub 10 may provide surveillance and administration information to the external management system 14, such as the status and location of stored data blocks.

FIG. 2 illustrates a more detailed block diagram of the storage hub 10, in accordance with methods and systems provided. As illustrated, the storage hub 10 includes a switch or switches 22-1 and 22-2, a management complex 26, and memory sections 30-1 thru 30-n. In this embodiment, both switches 22-1 and 22-2 may be active or one of the switches may be active while the other is a redundant switch for use in the event the active switch suffers a problem. Although FIG. 2 illustrates two switches, the storage hub 10 may include only one switch or any number of switches.

In FIG. 2, server 12-2 connects to the storage hub 10 via a network 20 thru an input/output (I/O) controller 24. The network may be any type of internal or external network, such as an Ethernet network or the Internet. The I/O controller 24 preferably is an appropriate I/O controller for connecting to the particular network 20. Preferably, the I/O controller 24 converts signals between a native protocol of the network 20 and a local protocol used by the storage hub 10. Potential protocols include, but are not limited to, Transmission Control Protocol/Internet Protocol (TCP/IP), Systems Network Architecture (SNA)-based protocols, Serial Communications Control Interface (SCCI), Intelligent Serial Communications Interface (ISCI), Fibre Channel, Infiniband, and other third generation input/output (3GIO) protocols.

The memory sections 30 preferably include the storage for the storage hub 10 along with other hardware for accessing the storage. As used herein, the term “memory section” refers to any subsystem including one or more memory devices that may be used for storing information. This architecture is applicable to any device that can store data. Thus, when the storage hub 10 receives a request to store data, the data is forwarded to a memory section 30, which stores the data. Likewise, when a request for data is received by the storage hub 10, the request is directed to the memory section 30 storing the requested information. The memory section 30 then reads the requested data, after which it is sent to the server 12 requesting the data. More detailed descriptions of exemplary memory sections 30 and their operations are presented below.
The management complex 26 of the storage hub 10 performs management-type functions for the storage hub 10 and connects the storage hub 10 with the external management system 14. As used herein, the term “management complex” refers to any software and/or hardware for performing management of the storage hub 10. A more detailed description of the management complex 26 is presented below.
The I/O Controller 24 and switches 22-1 and 22-2 are preferably under common management control by the management complex 26 to allow data blocks to be sent to and received from the storage hub in the native protocol of the network 20.
Each server 12-1 and 12-2 preferably includes a device driver 28-1 and 28-2, respectively. The device driver 28 is a program running in software on a server that permits applications on the server to cause data to be read from or written to (i.e., stored in) the storage hub 10. When a server 12 receives a request to read or write data, the device driver 28 of the server 12 forwards the request to the switch in the storage hub 10. The device driver 28 may be, for example, a standard device driver supplied as part of server-resident software, or it may be, for example, proprietary software supplied by a vendor of storage devices. Additionally, in some applications, the device driver 28 may be independent of any application resident on the server.
The switches 22-1 and 22-2 are connected to the server 12-1, the I/O controller 24, the CDA 16, the memory sections 30-1 thru 30-n, and each other via an industry standard communications interface protocol. These communications interface protocols may be, for example, Fibre Channel, Asynchronous Transfer Mode (ATM), Ethernet, Fiber Distributed Data Interface (FDDI), a Systems Network Architecture (SNA) interface, or X.25. Any type of physical connection, e.g., copper or fiber optic cables, may be used for connecting these various components. The management complex 26 is preferably connected to the switches 22, memory sections 30-1 thru 30-n, the I/O controller 24, and the external management system 14 via gigabit Ethernet connections. Although these are preferable connections, persons skilled in the art will recognize there are numerous other protocols and physical media that may be used to connect these devices. Further, the memory sections 30 may simultaneously support multiple protocols and physical media for connecting these devices.
The switches 22 may be any type of switch using any type of switch fabric, such as, for example, a time division multiplexed fabric or a space division multiplexed fabric. As used herein, the term “switch fabric” refers to the physical interconnection architecture that directs data from an incoming interface to an outgoing interface. For example, the switches 22 may be a Fibre Channel switch, an ATM switch, a switched fast Ethernet switch, a switched FDDI switch, or any other type of switch. The switches 22 may also include a controller (not shown) for controlling the switch.
For write operations, the data block, in addition to being written to the memory sections 30 of the storage hub 10, may also be written to the cached disk array 16 or another storage hub (not shown). After the data is written, the storage hub 10 may send an acknowledgement to the device driver 28 of the server 12 depending upon the configuration management parameters in the management complex 26. Examples of configuration management parameters are status parameters, write-acknowledgement parameters, routing parameters, reporting interval parameters, and the current date and time.
For a read data block request and at the request of the device driver 28 requesting the data block, the switches 22 direct the request to the appropriate memory section 30, which retrieves the data block and transmits it through a switch 22 to the device driver 28 of the server 12 from which the request originated.
During read and write data block operations and depending on the configuration management parameters in the management complex 26, the memory section 30 gathers administrative data that it sends to the management complex 26. The management complex 26 then makes this data available to the external management system 14.
Additionally, the management complex 26 may gather and provide the external management system 14 with surveillance and administrative information. Surveillance information may include, for example, memory section heartbeats (i.e., a signal that shows that the memory section can still communicate), alarms, and acknowledgement of alarms. Administration information may include, for example, statistics about data read and written, statistics about the number of active memory sections, statistics about memory section availability, and reports that present the preceding information to the external management system.
The external management system 14 may also provide the management complex 26 with configuration management data. This configuration management information may include, for example, valid communications network addresses, a period for heartbeat intervals, data block sizes, and command sets.
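As a minimal sketch of the heartbeat surveillance described above, the following assumes that a memory section is treated as silent once it has missed a configurable number of heartbeat intervals. The class name `HeartbeatMonitor`, the `missed_allowed` threshold, and the section identifiers used as keys are illustrative assumptions, not part of the description.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Tracks the last heartbeat time per memory section (hypothetical helper)."""

    def __init__(self, interval_s: float, missed_allowed: int = 3) -> None:
        self.interval_s = interval_s          # configured heartbeat interval
        self.missed_allowed = missed_allowed  # assumed tolerance before raising an alarm
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, section_id: str, now: float) -> None:
        """Record a heartbeat received from a memory section at time `now`."""
        self.last_seen[section_id] = now

    def silent_sections(self, now: float) -> List[str]:
        """Return sections whose last heartbeat is older than the allowed window."""
        limit = self.interval_s * self.missed_allowed
        return sorted(s for s, t in self.last_seen.items() if now - t > limit)
```

A management complex built along these lines could call `silent_sections` once per interval and raise an alarm for each section returned.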
The storage hub 10 may also perform bit-level error recovery using standard means available in the industry. For example, error correction codes (ECC), also referred to as error detection and correction (EDAC) codes, using circuitry and/or software may be used to test data for its accuracy. These codes and techniques include parity bit or cyclic redundancy checks, using multiple parity bits in order to detect and correct errors, or more advanced techniques (e.g., Reed-Solomon codes) to detect multiple errors. Further, each memory section 30 of the storage hub 10 may include its own error correction scheme.
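To illustrate how multiple parity bits can both detect and correct an error, the following sketch uses the textbook Hamming(7,4) code, which corrects any single-bit error in a 7-bit codeword. It is chosen here only as a simple example; the description does not say which code the storage hub 10 uses, and the function names are hypothetical.

```python
from typing import List, Tuple


def hamming74_encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into a 7-bit codeword laid out as p1 p2 d1 p4 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(word: List[int]) -> Tuple[List[int], int]:
    """Return (data bits, error position); position 0 means no error was found."""
    c = list(word)  # work on a copy so the caller's codeword is not modified
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of the flipped bit
    if syndrome:
        c[syndrome - 1] ^= 1         # correct the single-bit error
    return [c[2], c[4], c[5], c[6]], syndrome
```

Codes such as Reed-Solomon extend the same idea to multiple errors at the cost of more redundancy and more complex decoding.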
The following provides a more detailed description of the components of the storage hub 10 illustrated in FIG. 2: the management complex 26, the switches 22, and the memory sections 30. After that, more detailed descriptions of exemplary reading, writing, and testing operations are presented. Then, alternative exemplary embodiments of the memory sections 30 are provided along with exemplary characteristics of the storage hub 10 and its components.

FIG. 3 illustrates a logical architecture for a management complex 26, in accordance with methods and systems provided. As illustrated, the management complex 26 may include functions that manage administrative processes 32 and functions that manage control processes 34. These management functions can include one or more central processing units (CPUs) for executing their respective processes. Additionally, the management complex 26 may use one or more application program interfaces (APIs) for communications between these functions.

FIG. 4 is a block diagram of a physical architecture for a management complex 26, in accordance with methods and systems provided. As illustrated, the management complex includes one or more control processors 34-1 thru 34-n, a shared memory 36, one or more administration processors 32-1 thru 32-m, a storage device 38, and a communications network 40. As discussed above, the control processors 34 may include one or more central processing units (CPUs). These control CPUs 34-1 thru 34-n interface with the shared memory 36. The communications network 40 may be an internal network and may use any type of communications protocol, such as Gigabit Ethernet.

One or more of the control processors (e.g., 34-1 thru 34-m) may function as the master(s), while the remaining control processors (e.g., 34-(m+1) thru 34-n) may be kept in a hot standby mode, so that they can be quickly switched to in the event one of the master control processors (e.g., 34-1) fails.
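The master and hot standby arrangement described above may be sketched as follows. This is a simplified illustration under the assumption that standby processors are promoted in list order; the class name `ControlProcessorPool` and its methods are hypothetical.

```python
from typing import List, Optional


class ControlProcessorPool:
    """Keeps track of master, hot standby, and failed control processors."""

    def __init__(self, processor_ids: List[str], masters: int = 1) -> None:
        self.masters = list(processor_ids[:masters])
        self.standby = list(processor_ids[masters:])
        self.failed: List[str] = []

    def report_failure(self, processor_id: str) -> Optional[str]:
        """Mark a processor as failed; return the promoted standby, if any."""
        if processor_id in self.standby:
            # A failed standby is simply removed from the pool.
            self.standby.remove(processor_id)
            self.failed.append(processor_id)
            return None
        if processor_id not in self.masters:
            return None  # unknown or already failed
        self.masters.remove(processor_id)
        self.failed.append(processor_id)
        if not self.standby:
            return None  # no standby left to take over
        promoted = self.standby.pop(0)  # assumption: promote in list order
        self.masters.append(promoted)
        return promoted
```

For example, with processors 34-1, 34-2, and 34-3 and one master, a failure of 34-1 promotes 34-2 and leaves 34-3 on standby.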
The control CPUs 34 may be attached to a communications network, such as a Gigabit Ethernet network, and be directly attached to the magnetic storage device 38.
The administration processors 32 each may include a memory (not shown) and also be attached to the communications network 40. These administration processors may also connect to the magnetic storage device 38. The magnetic storage device 38 stores various control and administrative information from the control processors 34 and administration processors 32. The magnetic storage device 38 may be any type of magnetic storage device, such as, for example, servo-mechanical disk drives. In other embodiments, the storage device 38 need not be included.
The control processors 34 perform configuration management functions for the memory sections 30, I/O controllers 24, switches 22, and the device drivers 28 of the servers 12. As used herein, the term “configuration” is a broad term that encompasses the various possible operating states of each component of the storage hub. As used herein, an “operating state” refers to a possible way in which the storage hub or one of its components operates as defined by parameter values. These parameter values, for example, may be set by a user of the storage hub, such as, for example, a system administrator, through, for example, an external management system 14. Operating states may include, for example, how often a component (e.g., a memory section 30) sends performance statistics to the management complex 26, the list of events that causes a component (e.g., a memory section, etc.) to report an alarm, and/or the type of alarm reported (e.g., catastrophic failure of component, minor fault with component, etc.). Further, as used herein, the term “configuration management” means the understanding of the current operating states of the storage hub's components and the capability to react to changes in the states of those components as defined by software running in the control processors 34. For example, the control processors 34 may control in real time the number of active memory sections 30 in the storage hub 10, the switches 22, and the device drivers 28 of the servers 12, if any, and any external servers 12 connected to the storage hub.
The software in the control processors 34 may also be capable of bringing new memory sections into service and taking memory sections out of service independently of other functions that the management complex performs and without materially affecting the operation of other memory sections 30 or adversely affecting the overall performance of the storage hub. The instructions to perform this function are carried from the control processors 34 to the switches 22 and may be carried to the device drivers 28 in the servers 12. In the case that new capacity is added to the storage hub 10, it is possible to bring new memory sections 30 into service with the software capability in the control processors 34. In the case that a memory section 30 has failed, the faulty memory section 30 may be replaced and a new one brought into service. A further description of fault management follows.
The control processors 34 may also, for example, be able to perform fault management for the storage hub 10. The term “fault management” as used herein means attempting to detect faults and take corrective action in response to the detection of a fault. For example, the control processors may recognize an operational failure of a memory section 30 or part of a memory section 30 and re-map data to working memory sections 30. Then, the control processors 34 may communicate this re-mapping to the external management system 14 and the device drivers 28 running on servers 12 attached to the storage hub 10.
The control processors 34 may also manage “bad-block” remapping functions when a memory section 30 fails and the writing of data to the magnetic storage device 38 in the event of power failures. Bad block remapping is a process wherein data blocks discovered by the section controller 54 or management complex 26 to be in a damaged memory device are, if possible, recovered.
For example, if the control processors 34 discover that block 65,000 in memory section 30-2 does not read correctly, the control processor 34 may decide to remap block 65,000 in memory section 30-2 to block location 1,999,998 in memory section 30-2. The control processor 34 may then direct the CDA 16 to read the data block and cause it to be written in location 1,999,998 in memory section 30-2. Once completed, the control processor 34 may inform the switches 22 and memory section 30-2 that block 65,000 may now be read from location 1,999,998.
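The remapping in this example may be sketched as a lookup table consulted before each block access. The sketch below is illustrative only: `RemapTable` and its methods are hypothetical names, block addresses are modeled as (memory section, block number) pairs, and restoring the block's contents from the CDA 16 is not modeled.

```python
from typing import Dict, Set, Tuple

BlockAddr = Tuple[str, int]  # (memory section id, block number)


class RemapTable:
    """Maps bad block addresses to their replacement locations."""

    def __init__(self) -> None:
        self._remap: Dict[BlockAddr, BlockAddr] = {}

    def remap(self, bad: BlockAddr, replacement: BlockAddr) -> None:
        """Record that `bad` should now be read from `replacement`."""
        self._remap[bad] = replacement

    def resolve(self, addr: BlockAddr) -> BlockAddr:
        """Follow remap entries until an address with no entry is reached."""
        seen: Set[BlockAddr] = set()
        while addr in self._remap and addr not in seen:
            seen.add(addr)  # guard against accidental cycles in the table
            addr = self._remap[addr]
        return addr


table = RemapTable()
table.remap(("30-2", 65_000), ("30-2", 1_999_998))
print(table.resolve(("30-2", 65_000)))  # ('30-2', 1999998)
```

Following chains of entries allows a replacement block that later goes bad to be remapped again without updating earlier entries.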
As another example of bad block remapping, if for example only one memory device on a memory section is faulty, a control processor 34 in the management complex 26 may inform the section controller 54 about the bad device, determine where the data on the faulty memory device is backed-up (e.g., CDA 16), and direct the backed-up data to be loaded into a replacement memory device on the same memory section or on a different memory section. In the latter case, the management complex also informs the switch about the data being relocated to a new memory section.
As yet another example, in the event the control processors 34 determine that a memory section 30 is faulty, the control processors 34 may direct that the entire memory section 30 is taken out of service and that a replacement memory section takes its place. To accomplish this, the control processors 34 may, for example, direct the CDA 16 to transfer a back-up version of the data for the faulty memory section 30 to another memory section 30-N that may be, for example, a spare memory section 30 for use in the event a memory section 30 goes bad. The new memory section 30-N then may operate as though it were the now faulty memory section 30. The control processors 34 may then communicate this information to the various device drivers 28 and the external management system 14.
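A minimal sketch of this section-level replacement follows, assuming that the management complex keeps a map from each logical memory section to the physical section currently serving it. The names `SectionManager` and `on_fault_message` are hypothetical, and the transfer of back-up data from the CDA 16 and the notification of device drivers 28 are left out.

```python
from typing import Dict, List, Optional


class SectionManager:
    """Removes faulty memory sections from service and substitutes spares."""

    def __init__(self, active: List[str], spares: List[str]) -> None:
        # Each logical section is initially served by the physical section of the same id.
        self.serving: Dict[str, str] = {s: s for s in active}
        self.spares = list(spares)
        self.out_of_service: List[str] = []

    def on_fault_message(self, physical_id: str) -> Optional[str]:
        """Handle a fault message; return the spare now in service, if any."""
        logical = next(
            (l for l, p in self.serving.items() if p == physical_id), None
        )
        if logical is None:
            return None  # the section is not in service, so nothing to remove
        self.out_of_service.append(physical_id)
        if not self.spares:
            del self.serving[logical]  # no spare available: section is offline
            return None
        spare = self.spares.pop(0)
        self.serving[logical] = spare  # the spare operates as the faulty section
        return spare
```

Because requests are addressed to the logical section, servers need not be aware that a spare has taken over once the map is updated.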
The control processors 34 may also provide the memory sections 30, the switch controller(s) 202, and the I/O Controllers 24 with updated and new software. For example, if software used by the memory sections 30 or the switches 22 becomes corrupted and/or fails, the control processors 34 can load backup copies of current or previous versions of a software image from their storage 38. A software image is binary object code that may be run directly by a computer. The software image for the control processor 34 in one embodiment is stored on the magnetic storage 38. Further, the control processors 34 may also control the loading of a data block from the CDA 16 into the memory sections 30 and vice versa.
In addition, the control processors 34 may receive information such as, for example, the time a component sent an alarm or the total elapsed time a component was in alarm from the components of the storage hub 10 over a communications interface.
The control processors 34 also may allow the administration processors 32 to gather data on parameters like the number of active memory sections 30, the total throughput of the storage hub 10 over time, the size of memory section queues, etc., that comprise the operating state of the storage hub. (Note that memory section queues are those queues in the section controller that comprise the list of yet-to-be completed read operations and write operations.) In addition, the control processors 34 are responsible for monitoring their own operational status, such as, for example, determining which control processor is active as master, which are on standby, and which, if any, are not operational. Additionally, the control processors 34 may monitor the storage hub's environment for extreme temperatures or humidity, etc.
The control processors 34 may also store a copy of the software (i.e., a software image) run by the switches 22. A more thorough description of the switches 22 is presented below. If the need arises, the control processors 34 can reload the switch software to one or more of the switches. As discussed below, the switch 22 may include one or more switch controllers (not shown) for executing this software to control the switch 22. In the event the switch 22 uses multiple controllers configured in a master-slave architecture, the control processor 34 may determine which of the controllers in the switch is(are) the master(s) and which is(are) the slave(s).
Additionally, the control processors 34 may determine the status (active, idle, out-of-service) of ports (not shown) on the switch 22, whether the ports are used to connect to servers 12 or to memory sections 30. The control processors 34 may also provide configuration management data to the switches 22. Examples of configuration management data include the date, the time, a routing algorithm to use, an interval for a status check, the identity of active server ports, etc. Further, the control processors 34 may instruct the switch to use different “hunt” algorithms to find idle ports that may be used in establishing connections. These algorithms may be included in the software executed by the switch controller, examples of which include rotary hunt, skip route, and least-used.
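Two of the named hunt algorithms may be sketched as follows. The description does not define them, so the sketch assumes their common meanings: a rotary hunt searches for the next idle port in circular order starting after the port used last, and a least-used hunt selects the idle port with the lowest usage count. The function names and argument layout are hypothetical, and skip route is not shown.

```python
from typing import Optional, Sequence


def rotary_hunt(idle: Sequence[bool], last_used: int) -> Optional[int]:
    """Return the first idle port after `last_used`, wrapping around once."""
    n = len(idle)
    for step in range(1, n + 1):
        port = (last_used + step) % n
        if idle[port]:
            return port
    return None  # every port is busy


def least_used_hunt(idle: Sequence[bool], use_counts: Sequence[int]) -> Optional[int]:
    """Return the idle port with the smallest usage count, if any port is idle."""
    candidates = [p for p in range(len(idle)) if idle[p]]
    if not candidates:
        return None
    return min(candidates, key=lambda p: use_counts[p])
```

A rotary hunt spreads new connections evenly without keeping statistics, whereas a least-used hunt needs per-port counters but also balances load when connections last for different lengths of time.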
The administration processors 32 preferably collect information and statistics from the I/O controllers 24, memory sections 30, switches 22, and the control processors 34. The information and statistics collected may include information for generating statistical reports, telemetry data, and other alarms and administrative data. The administration processors 32 provide this information to the external management system 14 using a protocol, such as, for example, TCP/IP or any other suitable protocol. The administration processors 32 may collect data on such parameters from the device drivers 28, the switches 22, and the memory sections 30.
Users of the external management system 14, such as, for example, a system administrator, may request a change in the configuration management parameters of the storage hub 10. This change may, for example, represent the addition of new memory sections 30. Users of the external management system 14 may also request the administration processors 32 to collect statistical data from a storage area network environment (a set of storage devices connected by a network dedicated solely to the storage devices) including one or more storage hubs 10, a network area storage environment (a set of storage devices connected by a network shared with other traffic) including one or more storage hubs 10, and other external systems. For example, this statistical data may include the total incoming requests from each storage environment or from a particular server.
The administration processors 32 may execute a database program such that the administration data is stored in a standard database, which can then be used to provide the information to system administrators of the storage hub 10 in reports and graphs on a computer screen or on paper. For example, the system administrators of the storage hub may use an external management system 14 to gain access to this information. Alternatively, the system administrators of the storage hub 10 may access this information directly through an interface to the administration processors 32. Like the control processors 34, the administration processors 32 can monitor themselves and communicate their own operational state to the control processors 34, which determine which administration processors 32 are active or inactive for any reason.
The management complex 26 may instruct a non-volatile storage device to load data into one or more of the memory sections 30. For example, as illustrated in FIG. 2, the storage hub 10 may be connected to a non-volatile storage device such as a CDA 16. The management complex 26 may then be able to send instructions to the CDA 16, switches 22, and memory sections 30 to perform various activities. These activities may include the loading of the memory sections 30 from the non-volatile storage device 16 when the storage hub 10 is powered, when the storage hub 10 has been restarted after, for example, having lost power in an outage, as a result of administrative changes to the configuration of the storage hub 10, as a result of the failure of a memory section 30, or as a result of a user-initiated command.
Although the above presents numerous management and control functions capable of being performed by the management complex 26, it should be understood that the management complex 26 may perform all, a subset, or even entirely different functions. Additionally, although FIGS. 3 and 4 illustrate an exemplary management complex being implemented using separate administration processors 32 and control processors 34, a management complex may be implemented using only one, none, or any number of processors.
FIG. 5 is a block diagram of an exemplary memory section 30, in accordance with methods and systems provided. As illustrated, the memory section 30 may include a switch portal (“S-portal”) 42, a section controller 54, a read only memory (ROM) 56, a temporary storage 58, a temporary storage interface device 60, a temporary store selector (“T-selector”) 62, a synchronizer 68, one or more memory interface devices 64-1 thru 64-8, and one or more memory devices 66-1 to 66-n.
The memory devices 66 may be any type of memory devices, such as, for example, dynamic random access memory (DRAMs), synchronous dynamic random access memory (SDRAMs), Rambus DRAMs (RDRAMs), magnetic random access memory, resistance random access memory, ferroelectric random access memory, polymer random access memory, chalcogenide random access memory, single in-line memory module (SIMMs), dual in-line memory module (DIMMs), rambus in-line memory modules (RIMMs), rotating media, etc. Although the term memory interface device is used herein, it should be understood that this term should be interpreted broadly to include any type of access device capable of accessing information stored in a memory device. A more detailed description of exemplary memory interface devices is presented below.
The section controller 54 may, for example, include a microprocessor 51, internal memory 52, a management complex interface(s) 53, memory device control circuitry 55, communications channel interface (CCI) control circuitry 57, test circuitry 59, timing circuitry 61, and a header/test interface 63. The microprocessor 51 may be, for example, a chip such as the Motorola G2 executing appropriate software. The internal memory 52 may be, for example, 32 megabytes of useable SRAM for program and data storage. This internal memory 52 may be included in the microprocessor 51, such as, for example, in a Motorola G2. The management complex interface 53 may, for example, be a gigabit Ethernet interface running TCP/IP that the section controller 54 may use in communicating with the management complex 26. The header/test interface 63 may be an appropriate interface for providing information from the section controller 54 to the memory interface devices 64.
The section controller 54 further may access bootstrap read only memory 56 that may be used by it when power is first applied. This bootstrap read only memory 56 may, for example, contain a small software image that allows the section controller 54 to communicate with the control processors 34 to obtain the current software image via the management complex interface 53. The section controller 54 may further include CCI control circuitry 57 that may, for example, contain direct memory address circuitry for use in the management of the communications channel interface 46.
The section controller 54 may also include memory device control circuitry 55 for controlling the memory devices 66. This memory device control circuitry 55 may, for example, include a memory latching circuit for controlling the state of the memory devices 66 through the binary states of the memory latch. A further description of memory latching is presented below. The section controller 54 may further include test circuitry 59 for testing the memory section 30. A more detailed description of an exemplary test procedure is presented below. Additionally, the section controller 54 may include a header/test interface 63 for providing header type information (e.g., a data block identifier, destination address, etc.) and testing the memory section 30. Also, the section controller 54 may include timing circuitry 61 that may provide master and slave clock signals and other timing signals, such as start and stop read or write signals, etc., for use by the memory section.
The S-portal 42 may include a selector 44 and a communications channel interface 46. The communications channel interface 46 provides the interface for connecting the memory section 30 with the one or more servers 12 via the switches 22. This connection may be, for example, via one or more fiber optic or copper cables. The selector 44 may include circuitry for connecting the communications channel interface 46 with the one or more memory interface devices 64, such that the selector 44 may connect any memory interface device 64 with any I/O port of the communications channel interface 46. The section controller 54 via the CCI circuitry 57 may provide control signals to the selector 44 regarding how the selector should connect the memory interface devices 64 and communication channel interface 46. Additionally, the selector 44 may be directed to send data, such as, for example, test data, from a memory interface device 64 to the section controller 54 via the CCI circuitry 57.
The communications channel interface 46 can use any type of protocol, such as, for example, any standard channel interface protocol, and the selector 44 may or may not be included. Exemplary standard channel interface protocols include Fibre Channel, System Network Architecture-based protocols, Intelligent Serial communications Control Interface, and other third generation input/output (3GIO) protocols.
The temporary storage interface device 60 is any type of device capable of accessing the temporary storage device 58. For example, the temporary storage interface device 60 may include one or more shift register arrays (not shown), including a plurality of shift registers interconnected in series, such that the data may be serially clocked through the shift register arrays. For a further description of shift register arrays and their use in accessing storage media such as memory devices, see the patent application by William T. Lynch and David J. Herbison, entitled “Methods and Systems for Improved Memory Access,” filed on the same day as this application, which is incorporated by reference herein in its entirety.
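As a purely illustrative software model, the serial clocking of data through shift registers interconnected in series may be sketched as follows. The class name `ShiftRegisterArray` and its `clock` method are hypothetical and stand in for hardware; the sketch shows only that a bit clocked in emerges after as many clock ticks as there are registers in the chain.

```python
class ShiftRegisterArray:
    """Hypothetical model of shift registers connected in series."""

    def __init__(self, length):
        # Every register starts out holding 0.
        self.cells = [0] * length

    def clock(self, bit_in):
        """Advance one clock tick: shift every bit one stage down the chain."""
        bit_out = self.cells[-1]                  # bit leaving the last stage
        self.cells = [bit_in] + self.cells[:-1]   # new bit enters the first stage
        return bit_out


# Clock six bits through a three-stage chain.
sr = ShiftRegisterArray(3)
outputs = [sr.clock(b) for b in [1, 0, 1, 1, 0, 0]]
```

Here the first three outputs are the initial register contents, and the next three reproduce the first three input bits in order.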
The temporary storage 58 may be any type of memory device, such as a DRAM, SDRAM, SIMM, DIMM, a disk drive etc. The T-selector 62 may be any type of selector for selecting between a plurality of inputs.
The storage hub 10 may use a synchronizer 68 in embodiments where the temporary storage interface device 60 includes shift register arrays. In such an embodiment, the synchronizer 68 may, for example, accept data to be stored in the memory section 30 and use phase lock loop circuitry to extract a clock frequency from the incoming data stream. A temporary storage interface device 60 including shift register arrays may then use this clock signal to shift the data in writing data to the temporary storage device 58. This clock signal may be used, for example, to compensate for possible differences in either the phase or frequency of the incoming data from the memory section's system clock. When data is shifted out of the temporary storage interface device 60 for storage in the memory devices 66, the system clock for the memory section is preferably used to shift the data.
The section controller 54 may be capable of detecting faults in the memory section 30. For example, the section controller 54 may detect errors in the hardware or protocol used by the communications channel interface 46 through the CCI control circuitry 57. Additionally, the section controller 54 may, for example, detect errors in the memory interface device 64 through the use of the header/test interface 63. Further, if the memory devices 66 include circuitry for detecting and/or correcting faults, such as, for example, electronic error correction circuitry (e.g., DIMMs), the memory devices 66 may communicate detected faults to the section controller 54 through the memory device control circuitry 55. In the event the section controller 54 detects a fault, the section controller 54 may transmit information regarding the fault (e.g., time, component, type of fault) through the management complex interface 53 to the management complex 26.
The section controller 54 may also include an interface available for an external system (not shown) that permits the external system to obtain information about the section controller 54 through interaction with the microprocessor 51. This interface may, for example, support a keyboard and display for direct diagnostic observations. The external system interface (not shown) may also, for example, support an interface to a personal computer or similar system for direct diagnostic observations. The external system (not shown) may also use this interface, for example, to install special software on the microprocessor 51 in support of testing or related diagnostic functions.
The above description provides one exemplary memory section. Other methods and systems may be used for implementing a memory section without departing from the scope of the invention. For example, the discussion below presents a different exemplary embodiment of a memory section using PCI bus technology.
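For illustration, the fault information described above (time, component, type of fault) and its handling by the management complex may be sketched as follows. The JSON encoding, the names `FaultReport` and `ManagementComplex`, and the policy that any reported fault removes the section from service are assumptions made for this sketch; the actual message format and removal policy are not specified here.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FaultReport:
    section_id: int    # which memory section detected the fault
    time: float        # when the fault was detected
    component: str     # e.g., "memory interface device 64"
    fault_type: str    # e.g., "parity"

    def to_message(self) -> bytes:
        """Serialize the report (JSON is an assumed wire format)."""
        return json.dumps(asdict(self)).encode("utf-8")


class ManagementComplex:
    """Hypothetical receiver that tracks which sections remain in service."""

    def __init__(self, section_ids):
        self.in_service = set(section_ids)
        self.fault_log = []

    def receive(self, message: bytes) -> FaultReport:
        report = FaultReport(**json.loads(message.decode("utf-8")))
        self.fault_log.append(report)
        # Assumed policy: a reported fault takes the section out of service.
        self.in_service.discard(report.section_id)
        return report
```

A section controller in this sketch would build a `FaultReport`, call `to_message()`, and send the resulting bytes over the management complex interface; the receiver logs the report and stops treating that section as available.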
FIG. 6 illustrates a functional diagram of a switch 22, in accordance with methods and systems consistent with the invention. As illustrated, the switch 22 includes a switch/server communications interface 204 for interfacing with a server 12, a switch/memory section communications interface 208, a switch fabric 206, and a switch controller 202. The switch/server communications interface 204 and switch/memory section communications interface 208 may be standard switch interfaces found in commercially available switches, and the terms memory section and server are used to indicate the devices to which the connections leaving the switch 22 preferably connect. The switch fabric 206 may be any type of switch fabric, such as an IP switch fabric, an FDDI switch fabric, an ATM switch fabric, an Ethernet switch fabric, an OC-x type switch fabric, or a Fibre Channel switch fabric. Thus, the switch 22 may be any type of commercially available switch.
In this embodiment, the management complex 26 of the storage hub 10 may exercise control over the switch 22 through the switch controller 202, and may exercise control over the communications channel interface 46 of the memory section 30 through the section controller 54. For example, as discussed above, the management complex 26 may provide the switch controller 202 with an algorithm for switching traffic through the switch fabric 206. Further, as discussed above, the management complex 26 may provide other information including, for example, new copies of the software the switch executes, a regular period to send a heartbeat (i.e., a signal that verifies the switch still can communicate), a list of valid communications network addresses, alarm acknowledgements, and command sets. The management complex 26 may also provide instructions to copy a communications message, modify its contents, and then process the new message, as well as instructions to broadcast information to multiple addresses.
FIG. 7 illustrates an alternative functional diagram of the management of the switch 22 and the communications channel interface 46 of the memory section 30, in accordance with methods and systems provided. In this embodiment, the switch controller 202 and memory section interfaces 208 need not be included in the switch 22, and the management complex 26 of the storage hub 10 exercises direct control over the switch fabric 206 and server interfaces 204. Thus, in this embodiment the communications channel interface 46 of the memory section 30 directly connects to the switch fabric 206.
In an alternative embodiment to that of FIGS. 6 and 7, the selector 44 need not be included and all memory interface devices 64 may be connected to the switch fabric 206.
FIG. 8 illustrates a diagram of an alternative exemplary switch 22 that may be used in the storage hub 10, in accordance with methods and systems provided. More particularly, FIG. 8 illustrates a switch 22 for connecting one or more memory sections 30 to one or more servers 12. This example illustrates M servers 12-1, 12-2, . . . 12-M connected to a single memory section 30. In this example, the server interfaces 204 of the switch 22 include M switch/server communications interfaces (SSCI) 204-1 thru 204-M, and the memory section interfaces 208 of the switch include N switch/memory section communications interfaces (SMCI) 208. Additionally, the switch fabric 206 of the switch 22 includes one or more switching planes 808.
In this example, the servers 12 each include a device driver 28, and the memory section 30 includes one or more communications channel interfaces (CCI) 46-1 thru 46-N. In this example, P parallel lines connect each device driver 28 to the switch 22 and each CCI 46 to the switch 22. Although in this example the number of lines in each connection is equal, in other examples they may be different. The device driver 28 may be, for example, the above-discussed device driver 28, or may be included in the device driver 28.
Any of the M servers may generate and transfer a data request from its device driver 28 to a memory section 30 via the switch 22. A server 12 may include in the data request a data block identifier that identifies a particular data block it wishes to write or a data block in the storage hub 10 that it wishes to read. The corresponding SSCI 204 of the switch 22 then receives the data request and forwards it to the switch controller 202. The switch controller 202, in this example, determines the memory section 30 to which the information request is destined from the data block identifier included in the data request.
To determine the memory section 30, the switch controller 202 may, for example, consult a table that defines the relationship between data block identifiers and memory sections, use an algorithm to compute the address, or use some other technique.
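A minimal sketch of these two techniques, assuming integer data block identifiers, might look like the following. The class name `SectionResolver` and the modulo rule used as the fallback algorithm are assumptions made for illustration; the actual table contents and computation are not specified here.

```python
class SectionResolver:
    """Hypothetical resolver mapping data block identifiers to memory sections."""

    def __init__(self, num_sections, table=None):
        self.num_sections = num_sections
        # Explicit table entries take priority over the computed mapping.
        self.table = dict(table or {})

    def resolve(self, block_id: int) -> int:
        # Technique 1: consult a table relating identifiers to sections.
        if block_id in self.table:
            return self.table[block_id]
        # Technique 2 (assumed rule): compute the section from the identifier.
        return block_id % self.num_sections


resolver = SectionResolver(num_sections=4, table={10: 3})
section_for_10 = resolver.resolve(10)  # found in the table
section_for_9 = resolver.resolve(9)    # computed by the fallback rule
```

A table permits arbitrary placement of data blocks at the cost of storing one entry per block, whereas a computed mapping needs no storage but fixes the placement.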
Once the memory section is determined, the switch controller 202 then may establish a transmission path through each switching plane 808 for each of the parallel lines P from the device driver 28 to the SMCI 208 corresponding to the determined memory section 30. The data request may also be modified by the switch controller 202 to contain a new address that may be used by the switch 22 in directing the data request to the correct memory section 30. The modified data request is then transmitted across the switching planes 808 to the SMCI 208. This transmission across the P lines may be synchronous.
While the path through the switch is established, the data may reside in a separate storage queue (not shown) in the switch or in a memory (not shown) for the switch controller 202. The data request may also be copied and further modified by the switch controller 202 in accordance with any particular requirements of the storage hub 10. For example, as previously discussed, the management complex 26 may instruct the storage hub 10 to back up all data that is written to the storage hub 10 or to one or more particular memory sections 30. In such an example, the switch controller 202 may copy the write data request, including the data to be stored, and modify the request in accordance with any particular requirements of the CDA 16. The switch controller 202 may then establish a path for sending the write data request to the CDA 16 and send the modified copy of the request to the CDA 16, so that the write data is backed up. Likewise, subsequent data blocks that comprise the write request and are sent to the memory section 30 may also be copied and sent to the CDA 16. The management complex 26 may, for example, provide the switch controller 202 with any required information and software needed by the switch to determine how to modify data requests, provide multiple destinations with copies of modified data requests, and provide multiple destinations with copies of data.
When a memory section 30 sends information such as data blocks to a server 12, the data blocks from the memory section 30 arrive at the switch 22 through the SMCI 208 corresponding to the CCI 46 for the memory section 30 sending the data block. The data blocks may include an identifier that is inserted into the data by the memory section 30. The memory interface devices of a memory section 30, for example, may insert this identifier, as described below. Further, this identifier may be, for example, a data block identifier identifying the particular data block that was read from the memory section 30, or an address of a port or device to which the data is to be sent. In this example, P parallel lines connect each CCI 46 to the switch 22, although the number of lines in each connection may be different. Further, P may be any number greater than or equal to 1.
The SMCI 208 then forwards the data block to the switch controller 202, which determines the server 12 to which the data block is destined from an identifier (e.g., data block identifier, destination address, etc.) within the transmitted data. The switch controller 202 then establishes with this destination address or data block identifier, for each of the P lines from the CCI 46, a path through the switch 22 to the SSCI 204 to which the data is to be sent. The switch 22 then transfers the data block across the switching planes 808 to the SSCI 204. The transmission of a data block across the P lines may be, for example, synchronous.
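The copying of write data requests for backup may be sketched, under stated assumptions, as follows. The `DataRequest` fields, the function name `route_with_backup`, and the address string `"CDA-16"` are hypothetical and chosen only for this sketch; the actual request format and the modifications required by the CDA 16 are not specified here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DataRequest:
    op: str               # "read" or "write"
    block_id: int         # data block identifier
    destination: str      # address used by the switch for routing
    payload: bytes = b""  # data to be stored (write requests only)


def route_with_backup(request, section_address, backup_enabled,
                      backup_address="CDA-16"):
    """Return the request(s) the switch would forward for one incoming request."""
    # The primary copy is re-addressed to the determined memory section.
    outgoing = [replace(request, destination=section_address)]
    # Assumed policy: only write requests are mirrored to the backup device.
    if backup_enabled and request.op == "write":
        outgoing.append(replace(request, destination=backup_address))
    return outgoing
```

Because each forwarded request is a modified copy, the incoming request is left unchanged, which mirrors the copy-then-modify behavior described above.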
FIG. 9 illustrates an alternative switch 22 connected to one or more memory sections 30, in accordance with methods and systems provided. In this example, muxing (the combining of several data streams into fewer data streams, each on its own communications path) and demuxing (the separation of a set of data streams into more data streams, each on its own communications path) are used in both the memory section 30 and the switch 22. In this example, P parallel lines connect each memory section's CCI 46 to the switch 22, although the number of lines in each connection may be different.
In this example, in memory section 30-1, Q lines emanate from memory interface device 64-1 and R lines emanate from memory interface device 64-2. Corresponding muxes (902-1 and 902-2) then multiplex the lines from each of these memory interface devices (64-1 and 64-2) into P streams, where Q and R are positive integers greater than the positive integer P.
In memory section 30-2, J lines emanate from memory interface device 64-4, where J is a positive integer less than P. A demux 904 then demuxes these J lines to P lines.
The P parallel lines (streams), however, may also be muxed or demuxed anywhere along the switching path. For example, as illustrated, the P lines are muxed into T lines by mux 906 after the SMCI 208-1. The T lines are then passed through the switching planes 808 to demux 908, which demuxes the T lines into P lines and passes the P lines to an SSCI 204.
Additionally, in embodiments employing a memory interface device including a shift array, one or more pipeline shift registers (not shown) may be inserted at points in the transmission and switching path to maintain the clock frequency of those transmissions at the appropriate multiple (muxing function) or sub-multiple (demuxing function) of the clock frequency of the memory interface device shift register array. For example, a shift register pipeline may be included in the CCI 46.
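The relationship between muxing, demuxing, and clock multiples may be illustrated with the following sketch. It assumes simple time-division interleaving and that the larger line count is an exact multiple of the smaller; the function names `mux` and `demux` are hypothetical, and the actual muxes and demuxes may operate differently.

```python
def mux(lines, p):
    """Interleave q input lines onto p output streams (q a multiple of p)."""
    q = len(lines)
    if q % p != 0:
        raise ValueError("sketch assumes q is a multiple of p")
    ratio = q // p  # each output stream must run at `ratio` times the input clock
    streams = []
    for k in range(p):
        group = lines[k * ratio:(k + 1) * ratio]
        # On each input clock tick, emit one symbol from every line in the group.
        streams.append([sym for tick in zip(*group) for sym in tick])
    return streams


def demux(streams, q):
    """Separate p interleaved streams back into q lines (inverse of mux)."""
    ratio = q // len(streams)
    lines = []
    for stream in streams:
        for j in range(ratio):
            # Every `ratio`-th symbol belongs to the same original line.
            lines.append(stream[j::ratio])
    return lines


# Four lines of three symbols each are muxed onto two streams of six symbols.
original = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
muxed = mux(original, 2)
restored = demux(muxed, 4)
```

In this sketch each muxed stream carries twice as many symbols as an input line in the same interval, which corresponds to running at a multiple of the original clock frequency, and demuxing restores the original lines at the sub-multiple rate.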
FIG. 10 illustrates an exemplary pipeline shift register, in accordance with methods and systems provided. For this example, this pipeline shift register is inserted at the outputs of the CCI 46, such that each of the P lines exiting a CCI 46 is attached to a latch shift register 1002-1, 1002-2 . . . 1002-P. As illustrated, each of the P lines is attached to the S input of the latch shift register, and its inverse is connected to the R input of the latch shift register. The latch shift registers further receive a master clock signal that may be generated by a master clock c