So far, the custom software we built at SRON assumed that users would manually configure the software when any hardware was replaced, and that there was only one electronics board present. The following is a preliminary design on the software changes required to support an array of boards that is placed in a standard format rack.

Overview

The basis is, that we are developing standard electronics boards for certain tasks. For instance, a standard design for an electronics card that contains a DAC and an ADC. These cards must work in array-like structures, for instance a standard 19 inch subrack which contains a number of these standard electronics boards. This could be used door biasing a sensor, reading out a sensor, housekeeping duties and PC communication duties. Such a rack would consist of a number of slots, each which consist of a rail that guides an inserted board to the backplane. This backplane provides power and connection to the software. The central requirement is: the user should be able to add and remove the boards without complex procedures. Any procedures like board-specific power-up/power-down sequences should be handled automatically.

The setup thus consists of two parts:

The hardware consisting of:
- A rack with electronics boards
- An off-the-shelf, generic electronics board with an LEON-based FPGA core (made by Pender Electronics)
The software consisting of:
- The PC which runs the EGSE software (Electronic Ground Support Equipment)
- The LEON core running ESW (our embedded software).

Use cases

To support the above sketched requirements, we can recognize several use cases in this setup:

Adding, removing or replacing a board
Powering up/down the rack (planned or unexpectedly)
Powering up/down the software (planned or unexpectedly)

Use case: adding, removing or replacing a board

The user must be able to add or remove a board into the rack, and the software should detect this. Also, most boards must be initialized in some way. Thus there must be hooks that run a particular script when hardware changes. This also means that the hardware must actively identify itself, let the script take care of this, or give the software some uniform way of checking this. More on this later.

Replacing, or the moving of a board from one slot to another can be covered by a simple remove/add action.

Use case: powering up a rack

Since the hardware and software can be powered on and off independently, both situations must be covered. Thus the software must have some sort of discovery mechanism when starting. The hardware must have some way of rate limiting if it actively advertises the adding or removing of a board. More on this later.

Powering down a rack

There are two possible ways in which a rack is powered down: expectedly and unexpectedly. The software does not need to be adapted either way. In the case of an expected power down, there should be a project-specific power down script. In the case of an unexpected power down, it should be determined whether the project needs a way of detecting this.

Powering up the software

When the EGSE is powered up, it should see whether a rack is connected and if so, a discovery mechanism should see what boards are present. More on the discovery mechanism later. When the ESW is powered up, no particular actions are necessary.

Powering down the software

There are two possible ways in which the EGSE is powered down: expectedly and unexpectedly. The software does not need to be adapted either way. In the case of an expected power down, there should be a project-specific power down script. In the case of an unexpected power down, it should be determined whether the project needs a way of detecting this.

The ESW can also be powered down, either accidental or as per request. There is no difference between the two, since the ESW functions as a pass-through and does not maintain state.

Objects and attributes

For the above use cases, the software obviously requires a constant and up to date register of all available boards plus their addresses. The following objects can be found in the use cases: rack, slot, board. A rack is divided in slots. A slot contains a board. Typically, racks can have shelves but for now, we assume that there's only one shelf. Also, racks are contained in a cabinet but again, there can be only one rack for now.

The current requirements do not necessitate that the software exactly knows which slots are occupied. Thus, this concept is currently not taken into account. That leaves us with the following classes:

Rack, with attributes: boards.
Board, with attributes: version, address and insertion script.

Addresses and discovery

There are two options for addressing. Currently, all boards have an address pre-programmed into the FPGA. This is fine in a situation where we can manually increment the unique address. The software will then simply use a discovery mechanism where a dummy request is sent to each possible address. When a reply is received, the board is then added to the present board list. Discovery must be quick since it inhibits other usage of the bus, and is done periodically. Thus the most logical place to run the discovery, is probably the ESW.

But when using multiple off-the-shelf boards, it is much easier to let the boards actively know that they were inserted, and let the software hand out addresses. The software still needs a discovery mechanism in case the software is brought down for some reason. This can be the same as previously mentioned.

Releases

In the first release:

We will let the software poll the hardware chain, and thus detect when hardware has been inserted or removed.
Boards will use pre-programmed addresses.
The concept of slots is not necessary.

For version two, we see the following points:

Inserted boards actively trigger a signal that notifies the software. This is currently not incorporated into the protocol that we use for communication between hardware and software. The hardware must have a way of rate limiting its active signal, perhaps by waiting for a random time.
After a board is inserted, the software will issue an address that uniquely identifies the board. The board will then start listening only to that address.

For version three, we see the following points:

Introduction of the concept of slots into the software and hardware. When (or better: before) a board is removed from the rack, a signal is sent to the software.

Unresolved points

Above, it is assumed that the embedded software runs on a LEON3-based board that's directly connected to the backplane of the rack. Is this assumption correct?
The current bus protocol is not really fast (2 MHz), is this fast enough to support a number of measurement boards?
Do we need to detect when the rack is powered down immediately? Or is it OK to wait for the next action (user or scripted) to generate a communication timeout?
It should be possible for selected boards to be powered down or reset from the software. However, since in the current backplane design one power line is provided for all boards, this does not seem possible right now.

Miscellaneous notes

Some boards are high enough to occupy multiple slots, for instance when the board has a high speed FPGA which has a heat sink stuck on it. But for now, we ignore this. The software does not need to know that some slots appear empty but are actually physically blocked from usage.
If a board is in the middle of a measurement, can it answer the discovery mechanism in the standard hardware design?