Paper ID No. SO96.5.11
THE SOFTWARE MAINTENANCE:
CONCEPT, IMPLEMENTATION AND GAINED EXPERIENCE IN THE SAX PROGRAM
A. Martelli, F. Torchia, G. Battistoni
Alenia Spazio S.p.A. - Strada Antica di Collegno 253, 10146
Torino, Italy
Phone: +39 11 7180203; Fax: +39 11 7180036; E-mail: amartell@to.alespazio.it
U. Marklund, F. Rame
Società Italiana Avionica (S.I.A.) S.p.A. - Strada Antica
di Collegno 253, 10146 Torino, Italy
Phone: +39 11 7720212; Fax: +39 11 725679; E-mail: frame@sia-av.it
ABSTRACT
The SAX Italian scientific satellite for X-ray astronomy, in orbit
since late April '96, has been designed to autonomously support
its two-to-four-year mission lifetime with little need for
ground intervention. It has been programmed to properly manage
both the nominal and the pre-conceived contingency functions, relying
on a complex software architecture distributed over nine on-board
computers. The ground capability of easily operating, programming
and even modifying the SAX software has been identified as, and has
proved to be, a key to the success of the mission. The Software Maintenance
Facility (SMF) started to accomplish its tasks already
before launch, providing software modifications to recover
from "last minute" anomalies, and just after it, supplying
software changes related to the first problems encountered. This
paper gives an overview of the implemented SMF architecture,
presenting the defined hardware configuration and the software
tools developed. The background of the facility is also described,
emphasizing its key points: from the Verification Facility,
widely used during the design, integration and test of the SAX
system software, to the EGSE tools and database, to the interface
link with the Operation Control Centre. Finally, the lessons learned
both during that phase and during the in-orbit commissioning
are presented.
INTRODUCTION
As usual at launch time, after years of software testing performed
at different levels, from the development environment to the final
flight model, after months of system validations and simulations,
and after the longest (and at the same time shortest) hours of
the pre-launch, every member of the software/operations team
attending the launch was, at heart, hoping for a fully nominal
completion of, at least, the first post-separation phases. Things
are often different from expectations, and so it was for SAX: everything
was nominal except for a critical trend in the battery temperature. A
modification of the control performed by the on-board Application
Software soon had to be implemented in order to properly manage any
autonomous reconfiguration potentially induced by this anomalous
behaviour. It was the first step of the SMF support during the
SAX mission. On the other hand, it was just part of normal work
that had its roots in the former Software Verification Facility.
An SMF ready to operate during the satellite mission lifetime is
becoming a must in very large mission programs. On-board malfunctions
can, in fact, affect, more or less severely, either the satellite
performance or the mission targets. To recover from such negative
effects, an extra effort was, in the past, usually required of
the ground, which had to act on the satellite with new contingency
procedures, applied with non-negligible impacts on the commanding
management and telemetry control. The decision whether or not
to implement an SMF should be made early in the system development
cycle on the basis of several parameters. The first is the trade-off
between the functionalities implemented on-board and the capability
to overcome on-board problems by ground operations. A further
constraint is the criticality of the intervention timing
(e.g. the limited satellite visibility w.r.t. the urgency of on-board
updates). A third parameter to take into account is the complexity
of the on-board H/W and S/W structure: the more modules and functions
are implemented on-board, the more likely it is that an anomaly
occurs during the mission because of component failures or software
errors inevitably left undetected during the ground test campaign,
especially in large real-time environments. Last but not least,
the decisive parameter is the need to cope with post-launch
customer requirements to implement new/different on-board functions
and/or to upload software modules not ready before the final pre-launch
integration and test. All these aspects have driven the current
SAX SMF design, with a very useful "side effect": the capability
to perform software modifications not only after but also before
the launch date provided the opportunity to implement and
test software changes arising from last-minute needs.
THE SAX MISSION
The SAX satellite is part of a scientific program whose objective
is to observe celestial X-ray sources in a very broadband spectrum.
The mission is planned to achieve a systematic, integrated and
comprehensive exploration of galactic and extra-galactic sources.
With a circular, equatorial orbit at 600 km of altitude and just one ground
station, SAX has at most 11 minutes of visibility per orbit,
which implies that only about 10% of the mission is under visibility.
These characteristics have determined the need to implement on-board
the capability of autonomously supporting the execution
of mission plans pre-defined on ground. That also means that the
on-board software structure must be capable of continuously and
safely controlling both the nominal activities and the
pre-conceived anomalies.
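As a rough cross-check of the visibility figure, the period of a circular orbit at 600 km can be estimated from Kepler's third law. The sketch below is illustrative only: it assumes standard values for the Earth's gravitational parameter and equatorial radius, and it assumes one pass of at most 11 minutes per orbit over the single ground station. The function names are not taken from any SAX tool.

```python
import math

# Assumed standard constants (not taken from the paper).
MU_EARTH_KM3_S2 = 398600.4418   # Earth's gravitational parameter, km^3/s^2
R_EARTH_KM = 6378.137           # Earth's equatorial radius, km


def circular_orbit_period_s(altitude_km):
    """Period of a circular orbit from Kepler's third law, in seconds."""
    semi_major_axis_km = R_EARTH_KM + altitude_km
    return 2.0 * math.pi * math.sqrt(semi_major_axis_km ** 3 / MU_EARTH_KM3_S2)


def visibility_fraction(altitude_km, pass_minutes):
    """Fraction of one orbit spent in visibility, assuming one pass per orbit."""
    return pass_minutes * 60.0 / circular_orbit_period_s(altitude_km)


period_min = circular_orbit_period_s(600.0) / 60.0
fraction = visibility_fraction(600.0, 11.0)
print(f"period = {period_min:.1f} min, visibility <= {fraction:.1%}")
```

Under these assumptions the period is a little under 97 minutes, so an 11-minute pass is an upper bound of roughly 11% of each orbit, which is consistent with the visibility figure quoted above.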
The management of the SAX operations is implemented by a hierarchical
structure involving the local subsystem/scientific instrument
software, the On-Board Data Handling (OBDH) Application Software
(ASW) and the ground Operation Control Centre (OCC) in an increasing
priority order. Only a few inputs from ground are needed for tuning
either the performance or the functionalities according to
the current mission targets. The ground intervention is limited
to periodically uploading commands for the link management and
attitude/instrument operating plans for the observation programming.
THE ON-BOARD SOFTWARE
The SAX architecture makes extensive use of distributed on-board
intelligence. Nine microprocessor-controlled subsystems, each with its
own software, autonomously perform the proper control and setting
of the nominal operations. They also perform their own FDIR management,
keeping under control the configuration, functioning
and health status of the relevant units. If a malfunction is detected,
the redundant unit is activated or, in severe cases, a
safe mode is entered.
The OBDH is assigned the task of system supervision.
The purpose of the OBDH Basic Software (BSW) is mainly to support satellite
data collection from, and command distribution to, the subsystems. The purpose
of the OBDH ASW is to keep all the subsystem-level operations under control,
ensuring proper nominal/safety satellite consistency. It
coordinates on board all major flight operations, both
with one another and with respect to the ground scheduled plans,
and it detects, isolates and recovers system-level
anomalies. One of the major features of the ASW
design is the capability of easily modifying the SW control devoted
to the system operations by means of simple enabling/disabling
commands. As the most important ASW functions are implemented
by a table-driven mechanism, the relevant control can be enabled,
inhibited or modified by acting on the relevant table entry
via dedicated commands. The capability of acting on the OBDH operating
system allows the ground to intervene at a
very low level. The OBDH SW, and in particular the ASW, is based
on a very modular architecture in which each main function is implemented
as a stand-alone task. By acting on the operating system
primitives, the task scheduling mechanism can be modified. This
mainly allows the introduction of new tasks implementing new/different
functionalities. Patching of the Intelligent Terminal (IT) software
is the lowest level of possible intervention from ground. It is
accomplished through the OBDH BSW, which either autonomously
executes the patch command on itself, if so addressed, or routes
the new data/instructions to the relevant Intelligent Terminal
via the OBDH bus protocol. At least 20% of spare RAM/EPROM has been
left free in each IT for such operations. Modifications of critical
functions are supported and driven by dedicated tasks in order
not to affect the running software; on the other hand, operations
on the memory of non-critical components, such as the scientific instruments,
require the relevant microprocessor to be in a 'wait' state for safety reasons.
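The table-driven mechanism described above can be illustrated with a minimal sketch. It is not the SAX ASW implementation: the entry layout, the parameter name `battery_temp`, the threshold values and the action string are all hypothetical, chosen only to show how a control can be enabled, inhibited or modified by acting on a table entry rather than on the code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ControlEntry:
    """One row of a hypothetical control table."""
    parameter: str                 # monitored telemetry parameter
    threshold: float               # limit above which the action is triggered
    action: Callable[[], str]      # recovery action, returns a command name
    enabled: bool = True


class ControlTable:
    """Controls are data: ground commands edit entries, not code."""

    def __init__(self) -> None:
        self._entries: Dict[str, ControlEntry] = {}

    def add(self, entry: ControlEntry) -> None:
        self._entries[entry.parameter] = entry

    def set_enabled(self, parameter: str, enabled: bool) -> None:
        self._entries[parameter].enabled = enabled

    def set_threshold(self, parameter: str, threshold: float) -> None:
        self._entries[parameter].threshold = threshold

    def run_cycle(self, telemetry: Dict[str, float]) -> List[str]:
        """Check every enabled entry once and collect the triggered actions."""
        triggered = []
        for entry in self._entries.values():
            if not entry.enabled:
                continue
            value = telemetry.get(entry.parameter)
            if value is not None and value > entry.threshold:
                triggered.append(entry.action())
        return triggered


table = ControlTable()
table.add(ControlEntry("battery_temp", 30.0, lambda: "STOP_BATTERY_CHARGE"))

first = table.run_cycle({"battery_temp": 35.0})      # control active
table.set_enabled("battery_temp", False)             # "disable" command
second = table.run_cycle({"battery_temp": 35.0})     # control inhibited
table.set_enabled("battery_temp", True)
table.set_threshold("battery_temp", 40.0)            # "modify" command
third = table.run_cycle({"battery_temp": 35.0})      # new limit not exceeded
```

The point of the design is that the monitoring loop never changes: only the data it reads do, which keeps each ground intervention small and easy to validate.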
THE SMF CONCEPTS
Modification of the on-board software is definitely not an easy task. This is not only because of the already mentioned ground constraints, e.g. applicable procedures, telecommand load, visibility limitation, etc., but mainly because of the implemented H/W and software architectures. The process of acting on the on-board code and data varies depending on the following aspects of the implemented design:
A very thorough knowledge of the flight software and its environment
is therefore required of the SMF team, as well as a rigorously
defined modification process control. For SAX, the same formal
standards and procedures followed for the software development
have been applied to the maintenance activities. The difference lies
in the entry point to the development process:
a software change can be initiated either by a new requirement
or by an identified on-board anomaly. A scheme of the modification/validation
process is provided in figure 1.
If the process starts from a software problem report, the first step of the maintenance
process is, obviously, an accurate evaluation of the related satellite telemetry,
including complementary information such as the system configuration
status, the applied procedures/commands and, in some cases, memory
dumps. This phase is the first assessment of what has been reported by
the Operation Control Centre. The second step is
a further analysis identifying the possible causes of the anomaly,
the required investigations and the suggested recovery solutions.
In parallel with this activity, the facility is set up
to provide an environment representative of
the current on-board configuration. This supports the troubleshooting
and software debugging activities, whose goal is to find
the source of the anomaly. It is expected that, in some
cases, the cause of the reported malfunction cannot be identified
with 100% confidence. In any case, clear indications of
how to overcome the problem have to emerge from this stage.
The software modification process then implements
the new or modified functions. The identified solutions are
verified against the given requirements by an appropriate level of regression
testing. The output of the entire validation process normally
leads to the delivery of a new S/W release, the modified parts
of which are eventually uploaded on-board.
If it is a project directive that activates the SMF process, only the simulation set-up, the software modification and the regression test are nominally involved. Patch preparation, according to the SAX command standards, is the final part of the activity.
Fig. 1 - SMF validation process
To carry out the above maintenance process properly, a facility is evidently needed that is capable of receiving and processing the satellite telemetry and of supporting the SW modification, validation and configuration process, accomplishing the following main tasks:
THE SMF EVOLUTION
SYSTEM ARCHITECTURE
The SMF, as shown in figure 2, is an integrated multifunction system that can be represented as three main blocks, each implementing a specific environment: analysis and data presentation, test & simulation, and the system representative targets.
The analysis and data presentation environment gathers the SAX telemetry data provided, on demand, by the Operation Control Centre. Since the OCC is located in Rome and the SMF at the Alenia site in Turin, an ISDN link has been chosen to support data exchange by means of two 64 kbps channels. This makes all the satellite housekeeping data available within 15 minutes, i.e. much less than one orbit period, allowing telemetry quick look and analysis before the next passage. Once received, data are unpacked and cross-checked with the EGSE database. This database was widely used and validated during the system tests and contains all the information needed for a detailed analysis, e.g. parameter locations, calibration curves, validity criteria, range values, alarms, etc. The data presentation is based on a graphical tool showing both the system and subsystem parameters in synoptic formats on the screen. This gives both the SMF team and the project team user-friendly access to the analysis environment. All data are stored on optical disks to maintain an off-line archive that can be restored for further analysis or problem comparisons.
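To give a feeling for what the 15-minute figure implies, the following back-of-the-envelope sketch computes the largest data volume that two 64 kbps channels can carry in that time. It assumes that 64 kbps means 64,000 bit/s, that both channels are fully used for the whole 15 minutes and that protocol overhead is ignored, so the result is only an upper bound and not a statement about the actual SAX housekeeping volume.

```python
CHANNELS = 2                    # two ISDN channels, as stated in the text
CHANNEL_RATE_BIT_S = 64_000     # assumption: 64 kbps taken as 64,000 bit/s
WINDOW_S = 15 * 60              # 15-minute transfer window


def max_transfer_bytes(channels, rate_bit_s, seconds):
    """Upper bound on transferable bytes, ignoring protocol overhead."""
    return channels * rate_bit_s * seconds // 8


max_bytes = max_transfer_bytes(CHANNELS, CHANNEL_RATE_BIT_S, WINDOW_S)
print(f"at most {max_bytes / 1e6:.1f} Mbytes in 15 minutes")
```

With these assumptions the link can move at most 14.4 Mbytes in 15 minutes.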
The test & simulation environment accomplishes the following main functions:
These operations are carried out in a distributed environment controlled by the test console. Console activities are performed concurrently by specific user processes. Its data presentation allows OBDH bus and telemetry raw data to be traced, giving a quick look at the system status for run-time control. All displayed data, as well as all the other managed S/S data, are logged in specific files to permit an accurate off-line analysis during the validation phase. Furthermore, telemetry data are dispatched to a dedicated monitoring workstation which displays pre-definable data sets in a user-friendly format. This workstation is capable of managing, in real time, 20 seconds of history depth in up to 6 different windows of 60 parameter-words each. It also makes it possible to run the Quick Software Loading function, implemented to upload and/or download the on-board software of the main S/S's. The AOCS console, along with the dynamics simulator, supports all the S/S internal monitoring and stimulation necessary to reproduce the attitude and orbit behaviour. This equipment can be used in a stand-alone configuration, simulating the missing OBDH S/S; when integrated in the SMF, on the other hand, it is controlled by specific test console tasks while maintaining its own evolution, monitoring and log capability. All test procedures used to stimulate and validate the target systems are written in a proprietary interpreted language. Its syntax is very easy to learn and uses no more than 15 keywords. It has been defined to model a finite state machine and, unlike compiled languages, it allows immediate intervention during the test run, since commands can be issued interactively from the keyboard. Test preparation is focused on the definition of the BUS simulation file. This is the most critical operation because it is sometimes necessary to check and modify several monitors and simulated data.
This phase has been optimized by creating a database of BUS simulation files, each corresponding to a nominal satellite condition to be reached. Similarly, a database of prepared TLC, which cover the major needs, has been created. The OBDH bus simulation/monitoring is based on a dedicated Programmable Array Logic circuit board housed in a PC. It is fully capable of handling the raw data
throughput for monitoring and, at the same time, of coping with the
dynamic simulation of any missing SAX S/S. Telemetry and telecommand
management is supported by a dedicated board, also housed in
a PC. It is programmed by means of user SW drivers and driven
by the test console. Telecommands prepared and tested in the SMF
are sent to the OCC in the same formats used for the validation
tests. Furthermore, both boards are capable of running in local mode
using the same input files as when remote-controlled. The SAX
Scientific Instruments data generation is supported by signal
generator programming. Any missing instrument is fully simulated
by the bus front-end, both in timing and in functionalities.
Fig. 2 - SMF system architecture
The system representative targets were set up to reproduce the on-board
satellite software environment. This provides a fully compliant
configuration as far as the system functionalities and performance
are concerned. To achieve this goal, the engineering models of
the satellite intelligent terminals have been integrated in the
SMF, whilst all the peripheral units, e.g. actuators and sensors,
have been simulated.
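The idea of a small interpreted test language modelling a finite state machine, as used for the SMF test procedures, can be sketched as follows. The syntax is invented for illustration and is not the proprietary SAX language: the keywords `STATE`, `SEND`, `ON ... GOTO`, `GOTO` and `END`, the telecommand names and the `MODE` parameter are all hypothetical, and telemetry is a fixed dictionary rather than live data. Interactive keyboard commands are not modelled.

```python
# Hypothetical procedure text; keywords and names are assumptions.
EXAMPLE_PROCEDURE = """
STATE INIT
  SEND OBDH_RESET
  GOTO CHECK
STATE CHECK
  ON MODE SAFE GOTO RECOVER
  GOTO DONE
STATE RECOVER
  SEND EXIT_SAFE_MODE
  GOTO DONE
STATE DONE
  END
"""


def parse(script):
    """Group the statements of a procedure by the state they belong to."""
    states, current = {}, None
    for raw in script.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        keyword, *args = line.split()
        if keyword == "STATE":
            current = args[0]
            states[current] = []
        elif current is None:
            raise ValueError("statement outside any STATE: " + line)
        else:
            states[current].append((keyword, args))
    return states


def run(states, start, telemetry, max_steps=100):
    """Execute the state machine and return the telecommands sent, in order."""
    sent, state = [], start
    for _ in range(max_steps):
        next_state = None
        for keyword, args in states[state]:
            if keyword == "SEND":
                sent.append(args[0])
            elif keyword == "ON":
                # ON <parameter> <value> GOTO <state>: conditional transition
                parameter, expected, _goto, target = args
                if str(telemetry.get(parameter)) == expected:
                    next_state = target
                    break
            elif keyword == "GOTO":
                next_state = args[0]
                break
            elif keyword == "END":
                return sent
            else:
                raise ValueError("unknown keyword: " + keyword)
        if next_state is None:
            return sent  # no transition fired: the procedure stops here
        state = next_state
    raise RuntimeError("step limit reached without END")


procedure = parse(EXAMPLE_PROCEDURE)
safe_run = run(procedure, "INIT", {"MODE": "SAFE"})
nominal_run = run(procedure, "INIT", {"MODE": "NOMINAL"})
```

Because the procedure is interpreted statement by statement, an operator could in principle inject an extra statement between two steps, which is the kind of immediate intervention that a compiled test program does not offer.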
PROBLEMS SOLVED
All the on-board functions underwent extensive test campaigns from module to system level and were frozen only at the last moment, in order to reach the most reliable definition and implementation. Nevertheless, a few non-conformances and missing functionalities were found after the final software release and integration. The problems tackled are listed in table 1, along with a summary of the solutions adopted and the assigned categories.
| PROBLEM DESCRIPTION | IMPLEMENTED SOLUTION | CATEGORY |
|---|---|---|
| Defects in the BSW code in the command management function | Code correction | Low complexity; SW anomaly |
| Malfunction in the on-board time updating mechanism | Introduction of a specific software control | Low complexity; HW anomaly |
| Possible stuck condition of one scientific instrument and of the AOCS, caused by a potential DMA conflict | New OBDH ASW process to periodically check for and clear any stuck condition by resetting to a safe configuration | High complexity; HW anomaly |
| Potential risk of simultaneous overcharge and overtemperature of the batteries | New OBDH ASW safety process to control the stop/start of the battery charge cycle | High complexity; Change of specification |
| Ensure, during launch, proper On/Off/Alarm thresholds in the thermal control | Add in the OBDH ASW the new values to be downloaded to the thermal S/S in case of switchover | Low complexity; Late data delivery |
| Prevent any stuck-open condition of the flow control valves, if used at separation | Modify the AOCS ASW code to manage this contingency | Low complexity; Change of specification |
| Prevent loss of the proper On/Off/Alarm thresholds in the thermal control | Add in the OBDH ASW the new values to be downloaded to the thermal S/S in case of switchover | Low complexity; Late data delivery |
| Make the star tracker dark current control adaptive to the environment conditions | Load in the OBDH ASW data segment the new code to be downloaded to the Star Tracker | High complexity; HW anomaly |
| Optimize the gyro substitution sequence | Update the sequence and reload it into the OBDH ASW data area | Low complexity; Missing functionality |
| Change some values and control logic in the AOCS software as an outcome of the commissioning | Generate and load a new AOCS code release | High complexity; Missing functionality; Change of specification |
Note: last four entries refer to post-launch problems.
Tab. 1 - Problems solved
In these cases the SMF proved decisive for the complete
success of the program, overcoming critical conditions detected
both before and after the SAX launch.
CONCLUSIONS
From the implementation & testing phase and, furthermore,
from the mission support, significant experience has been gained.
It is applicable not only to the SMF itself but can also be
used as feedback to the on-board architectures and implementations.
The key point to be ensured is, of course, the availability
of adequate on-board memory margins. Critical functions have to
be stored in non-volatile memory. This has the advantage of not
losing the software code in case of resets but, on the other
hand, code defects cannot be removed unless the code is downloaded to RAM
at initialization, so that a RAM modification can then be
made. To make any modification permanent, so as not to lose critical
functions or waste upload time during the mission, re-writable
memory, e.g. EEPROM, with dedicated protections, is strongly suggested.
Modifications of critical memory areas, as well as interventions
on running code, have to take into account many constraints, such
as access protocol, task scheduling and timing limitations, that cannot
usually be fully controlled from ground. Specific tasks of the OBDH
should be designed for properly downloading, on request, a set
of data/code to the addressed terminals. Similar specific tasks
should also be designed in the intelligent terminals software
to support access to the protected memory, as
well as the modification of their critical functions, without being
asynchronously interrupted by external patches. A table-driven
design allows ground interventions on most of the on-board functions,
e.g. scientific data allocations, FDIR management, telemetry formatting,
attitude controls, etc.
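The memory considerations above can be made concrete with a small model. It is a sketch under simplifying assumptions, not a description of the SAX hardware: memory is a plain byte array, the code image is copied from read-only memory to RAM at every initialization, and the sizes, addresses and byte values are hypothetical.

```python
def boot_image(eprom: bytes, ram_size: int) -> bytearray:
    """Model of initialization: copy the non-volatile code image into RAM."""
    if len(eprom) > ram_size:
        raise ValueError("code image does not fit in RAM")
    ram = bytearray(ram_size)
    ram[:len(eprom)] = eprom
    return ram


def spare_fraction(used_bytes: int, total_bytes: int) -> float:
    """Fraction of memory left free for future patches."""
    return (total_bytes - used_bytes) / total_bytes


def apply_patch(ram: bytearray, address: int, data: bytes) -> None:
    """Overwrite RAM at the given address; reject patches outside memory."""
    if address < 0 or address + len(data) > len(ram):
        raise ValueError("patch falls outside RAM")
    ram[address:address + len(data)] = data


eprom = bytes(range(8))                 # hypothetical 8-byte code image
ram = boot_image(eprom, ram_size=10)    # 2 of 10 bytes are spare (20%)
margin = spare_fraction(len(eprom), len(ram))

apply_patch(ram, 8, b"\xAA\xBB")        # new code placed in the spare area
apply_patch(ram, 2, b"\xFF")            # existing code redirected to it

after_reset = boot_image(eprom, ram_size=10)   # a reset reloads the old image
```

After the simulated reset the patch is gone, because the read-only image is unchanged. This is the reason for recommending re-writable non-volatile memory when a modification has to survive resets without being uploaded again.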
In many cases different telemetry formats
or a different data sampling rate from the satellite are desired
by the analysts: more useful information can be obtained if
a programmable telemetry can be commanded from ground. This would
greatly help the analysis and troubleshooting activity, directing
the SMF activity much faster towards the problem solution.
The final recommendation concerns the facility design. Possible
delays in the HW or SW deliveries have to be taken into account
so that they do not affect the start and continuation of the SMF activities.
The lack of a unit or a software module should be tackled by a
very modular design, implementing the capability to properly simulate
also those parts of the target environment that are not yet ready
because of delivery delays, or temporarily unavailable because they are being upgraded or repaired.