Paper ID No. SO96.5.11

THE SOFTWARE MAINTENANCE:
CONCEPT, IMPLEMENTATION AND GAINED EXPERIENCE IN THE SAX PROGRAM


A. Martelli, F. Torchia, G. Battistoni

Alenia Spazio S.p.A. - Strada Antica di Collegno 253, 10146 Torino, Italy
Phone: +39 11 7180203; Fax: +39 11 7180036; E-mail: amartell@to.alespazio.it


U. Marklund, F. Rame

Società Italiana Avionica (S.I.A.) S.p.A. - Strada Antica di Collegno 253, 10146 Torino, Italy
Phone: +39 11 7720212; Fax: +39 11 725679; E-mail: frame@sia-av.it


ABSTRACT


The Italian scientific satellite SAX for X-ray astronomy, in orbit since late April '96, has been designed to support its two-to-four-year mission lifetime autonomously, with little need for ground intervention. It has been programmed to manage both the nominal functions and the pre-conceived contingencies, relying on a complex software architecture distributed over nine on-board computers. The ground capability to easily operate, program and even modify the SAX software has been identified as a key to the success of the mission, and has proved to be so. The Software Maintenance Facility (SMF) started to accomplish its tasks even before launch, providing software modifications to recover from "last minute" anomalies, and again just after launch, supplying software changes for the first problems encountered. This paper gives an overview of the implemented SMF architecture, presenting the defined hardware configuration and the software tools developed. The background of the facility is also described, emphasizing its key points: from the Verification Facility, widely used during the design, integration and test of the SAX system software, to the EGSE tools and database, to the interface link with the Operation Control Centre. Finally, the lessons learned both during that phase and during the in-orbit commissioning are presented.


INTRODUCTION


As usual at launch time, after years of software testing performed at different levels, from the development environment to the final flight model, after months of system validations and simulations, and after the longest (and at the same time shortest) hours of the pre-launch phase, every member of the software/operations team attending the launch was, at heart, hoping for a fully nominal completion of at least the first post-separation phases. Things often differ from expectations, and so it was for SAX: everything was nominal except for a critical trend in the battery temperature. A modification to the control performed by the on-board Application Software soon had to be implemented in order to properly manage any autonomous reconfiguration that this anomalous behaviour might induce. It was the first step of the SMF support during the SAX mission. On the other hand, it was just part of normal work with its roots in the former Software Verification Facility. An SMF ready to operate throughout the satellite mission lifetime is becoming a must in very large mission programs. On-board malfunctions can, in fact, affect, more or less severely, either the satellite performance or the mission targets. In the past, recovering from such effects usually required an extra effort from the ground, which had to act on the satellite with new contingency procedures, with a non-negligible impact on command management and telemetry control. The decision whether or not to implement an SMF should be made early in the system development cycle on the basis of several parameters. The first is the trade-off between the functions implemented on-board and the capability to overcome on-board problems by ground operations. A further constraint comes from the criticality of the intervention timing (e.g. the limited satellite visibility with respect to the urgency of on-board updates).
A third parameter to take into account is the complexity of the on-board H/W and S/W structure: the more modules and functions are implemented on-board, the more likely it is that an anomaly will occur during the mission because of component failures or of software errors inevitably left undetected during the ground test campaign, especially in a large real-time environment. Last but not least, the decisive parameter is the need to cope with post-launch customer requirements to implement new or different on-board functions and/or to upload software modules that were not ready before the final pre-launch integration and test. All these aspects have driven the current SAX SMF design, with a very useful "side effect": the capability to carry out software modifications before the launch date, as well as after it, provided the opportunity to implement and test the software changes arising from last-minute needs.


THE SAX MISSION


The SAX satellite is part of a scientific program whose objective is to observe celestial X-ray sources over a very broad spectral band. The mission is planned to achieve a systematic, integrated and comprehensive exploration of galactic and extra-galactic sources. The SAX orbit is circular and equatorial at an altitude of 600 km; with just one ground station, it allows at most 11 minutes of visibility per orbit, which implies that only about 10% of the mission is under visibility. These characteristics have determined the need to implement on-board the capability to execute autonomously the mission plans pre-defined on ground. This also means that the on-board software structure must be capable of continuously and safely controlling both the nominal activities and the pre-conceived anomalies. The management of the SAX operations is implemented by a hierarchical structure involving, in increasing order of priority, the local subsystem/scientific instrument software, the On-Board Data Handling (OBDH) Application Software (ASW) and the ground Operation Control Centre (OCC). Only a few inputs from ground are needed to tune either the performance or the functions according to the current mission targets. Ground intervention is limited to periodically uploading commands for the link management and attitude/instrument operating plans for the observation programming.
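
The visibility figure quoted above can be checked with a short back-of-the-envelope calculation. The sketch below assumes a circular Keplerian orbit, standard textbook values for the Earth's equatorial radius and gravitational parameter, and one ground-station pass per orbit; none of these constants or function names come from the SAX documentation.

```python
import math

MU_EARTH_KM3_S2 = 398600.4418   # Earth's gravitational parameter (standard value, assumed)
R_EARTH_KM = 6378.137           # Earth's equatorial radius (standard value, assumed)


def orbital_period_minutes(altitude_km: float) -> float:
    """Period of a circular orbit at the given altitude, from Kepler's third law."""
    semi_major_axis_km = R_EARTH_KM + altitude_km
    period_s = 2.0 * math.pi * math.sqrt(semi_major_axis_km ** 3 / MU_EARTH_KM3_S2)
    return period_s / 60.0


def visibility_fraction(altitude_km: float, pass_minutes: float) -> float:
    """Fraction of each orbit in ground contact, assuming one pass per orbit."""
    return pass_minutes / orbital_period_minutes(altitude_km)


period = orbital_period_minutes(600.0)
fraction = visibility_fraction(600.0, 11.0)
print(f"period = {period:.1f} min, visibility <= {fraction:.1%}")
```

With an 11-minute maximum pass the result is a little over one tenth of the orbit, in line with the roughly 10% stated in the text.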


THE ON-BOARD SOFTWARE


The SAX architecture makes extensive use of distributed on-board intelligence. Nine microprocessor-controlled subsystems, each with its own software, autonomously perform the control and setting of the nominal operations. They also perform their own FDIR management, keeping under control the configuration, functioning and health status of the relevant units. When a malfunction is detected the redundant unit is activated or, in a severe case, a safe mode is entered. The OBDH is assigned the task of system supervision. The purpose of the OBDH Basic Software (BSW) is mainly to support the collection of satellite data from the subsystems and the distribution of commands to them. The purpose of the OBDH ASW is to keep all the subsystem-level operations under control, ensuring the proper nominal/safety consistency of the satellite. It coordinates all major flight operations on-board, both among themselves and with respect to the ground scheduled plans, and it detects, isolates and recovers system-level anomalies. One of the major features of the ASW design is the capability to easily modify the software control devoted to the system operations by means of simple enabling/disabling commands. As the most important ASW functions are implemented by a table-driven mechanism, the relevant control can be enabled, inhibited or modified by acting on the relevant table entry via dedicated commands. The capability to act on the OBDH operating system allows the ground to intervene on the operating software at a very low level. The OBDH software, and the ASW in particular, is based on a very modular architecture in which each main function is implemented as a stand-alone task. By acting on the operating system primitives, the task scheduling mechanism can be modified. This mainly allows the introduction of new tasks implementing new or different functions.
Patching of the Intelligent Terminal (IT) software is the lowest level of intervention possible from ground. It is accomplished through the OBDH BSW, which either executes the patch command on itself, if so addressed, or routes the new data/instructions to the relevant Intelligent Terminal via the OBDH bus protocol. At least 20% of the RAM/EPROM of each IT has been left free as spare for such operations. Modifications of critical functions are supported and driven by dedicated tasks in order not to affect the running software; on the other hand, operations on the memory of non-critical components, such as the scientific instruments, require the relevant microprocessor to be in 'wait' state for safety reasons.
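
The table-driven mechanism described above can be illustrated with a minimal sketch. The class names, the parameter name `BATT_TEMP`, the limits and the recovery action are all hypothetical and chosen only for illustration; the actual ASW tables and commands are not described in this paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MonitorEntry:
    """One row of a hypothetical control table (field names are illustrative)."""
    parameter: str
    low: float
    high: float
    recovery: str          # name of the recovery action to request
    enabled: bool = True


class ControlTable:
    def __init__(self, entries: List[MonitorEntry]) -> None:
        self._entries: Dict[str, MonitorEntry] = {e.parameter: e for e in entries}

    # The three "ground commands" below only touch table data, never code.
    def enable(self, parameter: str) -> None:
        self._entries[parameter].enabled = True

    def disable(self, parameter: str) -> None:
        self._entries[parameter].enabled = False

    def modify_limits(self, parameter: str, low: float, high: float) -> None:
        entry = self._entries[parameter]
        entry.low, entry.high = low, high

    def check(self, samples: Dict[str, float]) -> List[str]:
        """Return the recovery actions triggered by out-of-limit samples."""
        actions = []
        for name, value in samples.items():
            entry = self._entries.get(name)
            if entry is None or not entry.enabled:
                continue
            if not (entry.low <= value <= entry.high):
                actions.append(entry.recovery)
        return actions


table = ControlTable([MonitorEntry("BATT_TEMP", -5.0, 30.0, "STOP_BATTERY_CHARGE")])
before = table.check({"BATT_TEMP": 33.0})      # limit violated, action requested
table.modify_limits("BATT_TEMP", -5.0, 35.0)   # ground widens the limit
after = table.check({"BATT_TEMP": 33.0})       # same sample is now within limits
table.disable("BATT_TEMP")
disabled = table.check({"BATT_TEMP": 99.0})    # monitoring inhibited
```

The point of the design is that the checking code never changes: the behaviour is altered only by editing table entries, which is what makes short ground commands sufficient.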
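
The patch routing described above, where the BSW either applies a patch to itself or forwards it to the addressed terminal, can be sketched as follows. This is a toy model under stated assumptions: the address value `OBDH_ADDRESS`, the class names and the memory sizes are hypothetical, and the forwarding step is a plain function call standing in for the OBDH bus transfer.

```python
from dataclasses import dataclass, field
from typing import Dict

OBDH_ADDRESS = 0  # hypothetical bus address meaning "the OBDH itself"


@dataclass
class Memory:
    """Toy model of one computer's patchable memory."""
    size: int
    cells: bytearray = field(init=False)

    def __post_init__(self) -> None:
        self.cells = bytearray(self.size)

    def write(self, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > self.size:
            raise ValueError("patch falls outside the addressed memory")
        self.cells[offset:offset + len(data)] = data


class ObdhBasicSoftware:
    def __init__(self, own: Memory, terminals: Dict[int, Memory]) -> None:
        self.own = own
        self.terminals = terminals

    def handle_patch(self, address: int, offset: int, data: bytes) -> str:
        """Apply a patch locally or forward it to the addressed terminal."""
        if address == OBDH_ADDRESS:
            self.own.write(offset, data)
            return "local"
        if address not in self.terminals:
            raise KeyError(f"no intelligent terminal at address {address}")
        # In flight this step would be a transfer over the OBDH bus.
        self.terminals[address].write(offset, data)
        return "routed"


bsw = ObdhBasicSoftware(Memory(64), {3: Memory(32)})
local_result = bsw.handle_patch(OBDH_ADDRESS, 0, b"\x01\x02")
routed_result = bsw.handle_patch(3, 4, b"\xAA")
```

A patch that would run past the end of the addressed memory is rejected rather than truncated, which is one simple way to protect the spare area from a malformed command.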


THE SMF CONCEPTS


Modification of the on-board software is definitely not an easy task. This is not only because of the already mentioned ground constraints, e.g. applicable procedures, telecommand load, visibility limitation, etc., but mainly because of the implemented H/W and software architectures. The process of acting on the on-board code and data varies depending on the following aspects of the implemented design:

A very thorough knowledge of the flight software and its environment is therefore required of the SMF team, as well as a rigorously defined control of the modification process. For SAX, the same formal standards and procedures followed for the software development have been applied to the maintenance activities. The difference lies in the entry point to the development process: a software change can be initiated either by a new requirement or by an identified on-board anomaly. A scheme of the modification/validation process is provided in figure 1.
When the process starts from a software problem report, the first step is, obviously, an accurate evaluation of the related satellite telemetry, including complementary information such as the system configuration status, the applied procedures/commands and, in some cases, memory dumps. This phase is the first assessment of what is reported by the Operation Control Centre. The second step is a further analysis identifying the possible causes of the anomaly, the required investigations and the suggested recovery solutions. In parallel with this activity the facility is set up so as to make available an environment representative of the current on-board configuration. This supports the trouble-shooting and software debugging activities whose goal is to find the source of the anomaly. It is to be expected that, in some cases, the cause of the reported malfunction cannot be established with 100% confidence. In any case, this stage has to produce clear indications of how to overcome the problem. The software modification process then implements the new or modified functions. The identified solutions are verified against the given requirements by an appropriate level of regression testing. The output of the entire validation process normally leads to the delivery of a new S/W release, the modified parts of which are eventually uploaded on-board.

Fig. 1 - SMF validation process

If it is a project directive that activates the SMF process, only the simulation set-up, the software modification and the regression test are nominally involved. The preparation of the patches, according to the SAX command standards, is the final part of the activity. To carry out the maintenance process described above, a facility is clearly needed that is capable of receiving and processing the satellite telemetry and of supporting the SW modification, validation and configuration process, accomplishing the following main tasks:


THE SMF EVOLUTION


The SMF derives from a former facility used for Software Verification (SVF) at system level. The SVF had already been set up and widely used throughout the SAX development life cycle, and had therefore proved to be a powerful environment for trouble-shooting and testing. It was based on in-house development of both the dedicated hardware equipment and the software tools. This background provided the capability to define detailed test cases quickly and to configure the facility easily for future improvements. The kernel of the SVF supported the data monitoring and stimulation of the functional system/subsystem models well, granting full real-time visibility and control of both the external and internal interfaces. To achieve clear test results the SVF required, during development, test preparation and analysis, a detailed knowledge of SAX, its S/Ss, its Ground Segment and their data exchange protocol. Members of the SAX project team were involved in the SMF from an early stage. This eased familiarization and operation, increased insight into SAX, and prompted tuning as the testing progressed. The evolution of the SVF into the SMF is an advantage for system-level as well as low-level trouble-shooting, giving wide flexibility once problem areas are identified. The SMF adds a modular execution capability, with system data optionally taken from configuration libraries. The kernel was merged with two other complementary environments: the telemetry analysis environment, taken from the Electrical Ground Support Equipment (EGSE), and the attitude & dynamics simulation and testing environment. The upgrade also included a science data and failure simulation capability.


SYSTEM ARCHITECTURE


The SMF, as shown in figure 2, is an integrated multifunction system that can be represented as three main blocks, each implementing a specific environment:

  1. analysis and data presentation
  2. test & simulation
  3. system representative targets.

The analysis and data presentation environment is devoted to gathering the SAX telemetry data provided, on demand, by the Operation Control Centre. Since the OCC is located in Rome and the SMF at the Alenia site in Turin, an ISDN link with two 64 kbps channels has been chosen to support the data exchange. This makes all the satellite housekeeping data available within 15 minutes, i.e. in much less than one orbit period, allowing a telemetry quick look and analysis before the next passage. Once received, the data are unpacked and cross-checked against the EGSE database. This database has been widely used and validated during the system tests and contains all the information needed for a detailed analysis, e.g. parameter locations, calibration curves, validity criteria, range values, alarms, etc. The data presentation is based on a graphical tool showing both the system and subsystem parameters in synoptic formats on the screen. This gives both the SMF team and the project team friendly access to the analysis environment. All data are stored on optical disks to maintain an off-line archive that can be restored for further analysis or for comparison between problems.
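
As an illustration of the unpack-and-cross-check step, the sketch below extracts one parameter from a telemetry frame using a database record that holds its location, calibration curve and valid range, and also computes an upper bound on what the link can carry. The record layout, the 16-bit big-endian encoding, the polynomial calibration and all names and numbers except the two 64 kbps channels and the 15 minutes are assumptions made for the example, not details of the EGSE database.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class ParameterDef:
    """Hypothetical database record for one housekeeping parameter."""
    name: str
    offset: int                       # byte offset inside the telemetry frame
    calibration: Sequence[float]      # polynomial coefficients, lowest order first
    valid_range: Tuple[float, float]  # engineering-unit limits


def extract(frame: bytes, definition: ParameterDef) -> Tuple[float, bool]:
    """Return (engineering value, in_range) for one 16-bit big-endian parameter."""
    raw = int.from_bytes(frame[definition.offset:definition.offset + 2], "big")
    value = sum(c * raw ** i for i, c in enumerate(definition.calibration))
    low, high = definition.valid_range
    return value, low <= value <= high


def max_transfer_bytes(channels: int, kbps_per_channel: int, minutes: float) -> float:
    """Upper bound on the data volume a link can carry, ignoring protocol overhead."""
    return channels * kbps_per_channel * 1000 * minutes * 60 / 8


batt_temp = ParameterDef("BATT_TEMP", offset=2, calibration=(-50.0, 0.1),
                         valid_range=(-5.0, 30.0))
frame = bytes([0x00, 0x00, 0x03, 0x84])   # raw value 0x0384 = 900
value, ok = extract(frame, batt_temp)     # calibrated value exceeds the upper limit
budget = max_transfer_bytes(2, 64, 15)    # two 64 kbps channels for 15 minutes
```

Under these assumptions the link bound works out to 14.4 million bytes per 15-minute transfer, which gives a feel for the housekeeping data volume the quoted figures allow.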

The test & simulation environment accomplishes the following main functions:

These operations are carried out in a distributed environment controlled by the test console. Console activities are performed concurrently by specific user processes. The console data presentation allows the OBDH bus and telemetry raw data to be traced, giving a quick look at the system status for run-time control. All displayed data, as well as all the other managed S/S data, are logged in specific files to permit an accurate off-line analysis during the validation phase. Furthermore, telemetry data are dispatched to a dedicated monitoring workstation which displays pre-definable data sets in a user-friendly format. This workstation is capable of managing, in real time, a history depth of 20 seconds, in up to 6 different windows of 60 parameter-words each. It can also run the Quick Software Loading function, implemented to upload and/or download the on-board software of the main S/Ss. The AOCS console, along with the dynamics simulator, supports all the S/S internal monitoring and stimulation necessary to reproduce the attitude and orbit behaviour. This equipment can be used in a stand-alone configuration, simulating the missing OBDH S/S; when integrated in the SMF, on the other hand, it is controlled by specific test console tasks while still maintaining its own evolution, monitoring and log capability. All test procedures used to stimulate and validate the target systems are written in a proprietary interpreted language. Its syntax is very easy to learn and uses no more than 15 keywords. It has been defined to model a finite state machine and, unlike compiled languages, it allows immediate intervention during the test run, since commands can be issued interactively from the keyboard. Test preparation is focused on the definition of the BUS simulation file. This is the most critical operation because it is sometimes necessary to check and modify several monitors and simulated data.
This phase has been optimized by creating a database of BUS simulation files, each corresponding to a nominal satellite condition to be reached. Similarly, a database of prepared TLC, covering the major needs, has been created. The OBDH bus simulation/monitoring is based on a dedicated Programmable Array Logic circuitry board housed in a PC. It is fully capable of handling the raw data throughput for monitoring and, at the same time, of coping with the dynamic simulation of any missing SAX S/S. Telemetry and telecommand management is supported by a dedicated board, also housed in a PC. It is programmed by means of user SW drivers and driven by the test console. Telecommands prepared and tested in the SMF are sent to the OCC in the same formats used for the validation tests. Furthermore, both boards are capable of running in local mode using the same input files as when remote-controlled. The generation of the SAX Scientific Instruments data is supported by programming signal generators. Any missing instrument is fully simulated by the bus front-end, in both timing and functions.
The system representative targets were set up to reproduce the on-board satellite software environment. This provides a configuration fully representative of the flight system in terms of functions and performance. To achieve this goal, the engineering models of the satellite intelligent terminals have been integrated in the SMF, whilst all the peripheral units, e.g. actuators and sensors, have been simulated.

Fig. 2 - SMF system architecture
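
The test procedures mentioned in this section are written in a proprietary language whose syntax is not given in the paper. The Python sketch below only illustrates the general idea of a test procedure modelled as a finite state machine driven by telemetry samples; the state names, telemetry fields and the `run_procedure` function are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Telemetry = Dict[str, float]
# Each state maps to a list of (condition, next_state) pairs, checked in order.
Transitions = Dict[str, List[Tuple[Callable[[Telemetry], bool], str]]]


def run_procedure(transitions: Transitions, start: str, final: str,
                  samples: List[Telemetry]) -> List[str]:
    """Step the state machine once per telemetry sample; return the visited states."""
    state = start
    visited = [state]
    for sample in samples:
        if state == final:
            break
        for condition, next_state in transitions.get(state, []):
            if condition(sample):
                state = next_state
                visited.append(state)
                break
    return visited


# Hypothetical procedure: wait for a command acknowledgement, then verify a mode.
procedure: Transitions = {
    "WAIT_ACK": [(lambda tm: tm["ack"] == 1, "WAIT_MODE")],
    "WAIT_MODE": [
        (lambda tm: tm["mode"] == 2, "PASSED"),
        (lambda tm: tm["mode"] == 9, "FAILED"),
    ],
}
samples = [{"ack": 0, "mode": 0}, {"ack": 1, "mode": 0}, {"ack": 1, "mode": 2}]
trace = run_procedure(procedure, "WAIT_ACK", "PASSED", samples)
```

Because the procedure is plain data interpreted step by step, an operator could in principle change a transition or inject a sample between steps, which is the kind of interactive intervention that an interpreted approach allows.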


PROBLEMS SOLVED


All the on-board functions underwent extensive test campaigns from module to system level and were frozen only at the last moment, in order to reach the most reliable definition and implementation. Nevertheless, a few non-conformances and missing functions were found after the final software release and integration. The problems tackled are listed in table 1, along with a summary of the solutions undertaken and the assigned categories.

| PROBLEM DESCRIPTION | IMPLEMENTED SOLUTION | CATEGORY |
| --- | --- | --- |
| Defects in the BSW code in the command management function | Code correction | Low complexity; SW anomaly |
| Malfunction in the on-board time updating mechanism | Introduction of a specific software control | Low complexity; HW anomaly |
| Possible stuck condition of one scientific instrument and of AOCS, derived by a DMA potential conflict | New OBDH ASW process to periodically control and remove any stuck by resetting to a safe configuration | High complexity; HW anomaly |
| Potential risk of contemporary overcharge and overtemperature of the batteries | New OBDH ASW safety process to control the stop/start of the Battery charge cycle | High complexity; Change of specification |
| Ensure, during launch, proper On/Off/Alarm thresholds in the Thermal control | Add in OBDH ASW the new values to be downloaded to the thermal S/S in case of switchover | Low complexity; Late data delivery |
| Prevent any stuck open condition of the flow control valves - if used at separation | Modify the AOCS ASW code to manage this contingency | Low complexity; Change of specification |
| Prevent loss of the proper On/Off/Alarm thresholds in the Thermal control | Add in OBDH ASW the new values to be downloaded to the thermal S/S in case of switchover | Low complexity; Late data delivery |
| Make the star tracker dark current control adaptative to the environment conditions | Load in the OBDH ASW data segment the new code to be downloaded to the Star Tracker | High complexity; HW anomaly |
| Optimize the gyro substitution sequence | Updating of the sequence and reloading into the OBDH ASW data area | Low complexity; Missing functionality |
| Change some values and control logic in the AOCS Software as outcome from the commissioning | Generate and load new AOCS code release | High complexity; Missing functionality; Change of specification |

Note: last four entries refer to post-launch problems.

Tab. 1 - Problems solved

In these cases the SMF proved decisive for the complete success of the program, overcoming critical conditions detected both before and after the SAX launch.


CONCLUSIONS


Important experience has been gained from the implementation and testing phase and, even more, from the mission support. It applies not only to the SMF itself but can also be used as feedback to the on-board architectures and implementations. The key point to be ensured is, of course, the availability of adequate on-board memory margins. Critical functions have to be stored in non-volatile memory. This has the advantage of not losing the software code in case of resets but, on the other hand, code defects cannot be removed unless the code is downloaded to RAM at initialization, where a RAM modification can then be made. To make any modification permanent, so as not to lose critical functions or waste upload time during the mission, re-writable memory, e.g. EEPROM, with dedicated protections, is strongly suggested.
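
The trade-off between fixed and re-writable non-volatile memory can be made concrete with a toy model. In the sketch below the class `Computer` and its methods are hypothetical, and the dedicated write protections recommended above are deliberately left out to keep the example short.

```python
class Computer:
    """Toy model: code is kept in non-volatile memory and executed from RAM."""

    def __init__(self, image: bytes, rewritable: bool) -> None:
        self.nonvolatile = bytearray(image)   # EPROM (fixed) or EEPROM (re-writable)
        self.rewritable = rewritable
        self.ram = bytearray()
        self.reset()

    def reset(self) -> None:
        """At initialization the stored image is copied to RAM; RAM patches are lost."""
        self.ram = bytearray(self.nonvolatile)

    def patch_ram(self, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > len(self.ram):
            raise ValueError("patch falls outside RAM")
        self.ram[offset:offset + len(data)] = data

    def commit(self) -> bool:
        """Make the current RAM image permanent, if the memory technology allows it."""
        if not self.rewritable:
            return False
        self.nonvolatile = bytearray(self.ram)
        return True


eprom_unit = Computer(b"\x10\x20\x30", rewritable=False)
eprom_unit.patch_ram(1, b"\x99")
eprom_unit.reset()                      # patch is gone and must be uploaded again

eeprom_unit = Computer(b"\x10\x20\x30", rewritable=True)
eeprom_unit.patch_ram(1, b"\x99")
eeprom_unit.commit()
eeprom_unit.reset()                     # patch survives the reset
```

With fixed memory every reset costs another upload during a short visibility window; with re-writable memory the same patch is uploaded once.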
Modifications of critical memory areas, as well as interventions on running code, have to take into account many constraints, such as access protocol, task scheduling and timing limitations, that usually cannot be fully controlled from ground. Specific OBDH tasks should be designed to download, on request, a set of data/code to the addressed terminals. Similar specific tasks should also be designed in the intelligent terminal software to support access to the protected memory, as well as the modification of their critical functions without asynchronous interruption by external patches. A table-driven design allows ground interventions on most of the on-board functions, e.g. scientific data allocations, FDIR management, telemetry formatting, attitude controls, etc.
In many cases the analysts want different telemetry formats or a different data sampling rate from the satellite: more useful information can be obtained if the telemetry is programmable by ground command. This would greatly help the analysis and trouble-shooting activity, steering the SMF work towards the problem solution much faster.
The final recommendation concerns the facility design. Possible delays in the HW or SW deliveries have to be taken into account so that they do not affect the start and continuation of the SMF activities. The lack of a unit or of a software module should be tackled by a very modular design, implementing the capability to simulate those parts of the target environment that are not yet ready, because of a delivery delay, or temporarily unavailable, because they are being upgraded or repaired.
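
One way to read this last recommendation is that real and simulated units should share a common interface, so that the facility can be assembled from whatever has been delivered. The sketch below is a minimal illustration of that idea; the interface, class names and unit names are hypothetical.

```python
from typing import Dict, List, Protocol


class Unit(Protocol):
    def respond(self, request: str) -> str: ...


class EngineeringModel:
    """Stands for real hardware connected to the facility."""

    def __init__(self, name: str) -> None:
        self.name = name

    def respond(self, request: str) -> str:
        return f"{self.name}:HW:{request}"


class SimulatedUnit:
    """Software stand-in used while the real unit is late or under repair."""

    def __init__(self, name: str) -> None:
        self.name = name

    def respond(self, request: str) -> str:
        return f"{self.name}:SIM:{request}"


def build_facility(required: List[str], available: Dict[str, Unit]) -> Dict[str, Unit]:
    """Use the delivered unit when there is one, otherwise fall back to a simulator."""
    return {
        name: available[name] if name in available else SimulatedUnit(name)
        for name in required
    }


facility = build_facility(
    ["OBDH", "AOCS", "STAR_TRACKER"],
    {"OBDH": EngineeringModel("OBDH")},
)
replies = {name: unit.respond("STATUS") for name, unit in facility.items()}
```

Test procedures written against the common interface keep running unchanged when a simulator is later swapped for the delivered unit.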