Article SpaceOps 96, SO96.5.06

MODELLING INFORMATION SYSTEMS WITH MODARCH

C. Panem* F. Jocteur Monrozier**

* CNES, 18 Av. E. Belin, 31055 Toulouse Cedex

Fax : 61.28.26.03, E-mail : panem@cnes.fr

** national service scientist (scientifique du contingent)

Abstract: System designers are used to evaluating data processing needs and to anticipating the operational requirements of their systems. With simulation tools, they can design, model and evaluate the behaviour and performance of an information system architecture, and automatically run as many experiments as there are combinations of varying parameters. Advanced analysis tools help them understand the behaviour and performance of the system. Models are useful in the design phase for evaluating future performance and resource needs, and they can be reused as references for validation purposes; modelling is also increasingly used in the operation and maintenance phases. Since enhancements and evolutions are foreseen even before a system becomes operational, simulation helps to define the best way to integrate these evolutions and to optimise the existing system. This article shows how the MODARCH tool was used to evaluate the architecture of the French CLUSTER data processing and archiving centre, and how it is used for other CNES projects.

1. INTRODUCTION

This article starts with a description of the tool environment currently used for modelling information systems at CNES. It then presents a concrete modelling application of these tools to the French data processing and archiving centre of the ESA CLUSTER project. It ends with some conclusions on the benefits of modelling and on future studies.

2. TOOLS ENVIRONMENT

Modelling data processing architectures is now a reality for CNES ground segment design, thanks to MODARCH/MODLINE®, a set of simulation facilities that is tightly integrated in a design workbench called ELISA (cf. [ELISA]). As shown in figure 1, ELISA inherits its architecture from software engineering technology. It offers three kinds of services. The first are called "horizontal services": configuration management and project management support, technical documentation generation and a traceability service. The second are called "integration services": ELISA manages all the objects produced during architectural design in an object base, according to a predefined data model which represents all the information pertinent to a system designer as an organised network (functions, requirements, documents, equipment, simulation scenarios and results, etc.). The integration services are used to plug in commercial off-the-shelf (COTS) tools through their data, and to give transparent access to their operations through a uniform user interface. The third are called "vertical services" and provide access to external tools for each phase of the design process.

Figure 1: ELISA environment for simulation of data processing systems

With ELISA, the modelling process consists of three phases. First, the system designer defines the system requirements and constraints: the functional decomposition with ASA®, the resource, performance and operational needs in a LOTUS 123® spreadsheet, and the process scheduling with graphical chronograms. The designer can then create as many models as there are alternative hardware and software solutions that satisfy the functional needs (distributed, centralised, etc.), simulate them and analyse the results. The designer starts by simulating the software architecture and its behaviour, and then looks for the hardware configuration model that best supports it. The last step is to select the best solution with respect to commercial offers for equipment and middleware, and to compute the overall cost; for each model, several different configurations (e.g., HP, SUN, DEC) and prices can be proposed. At any time, it is possible to generate the documentation for the current phase: simulation reports can be printed with the Frame Maker® tool and provided to justify the choices made.

MODARCH/MODLINE is itself a complete, integrated set of tools dedicated to studying the performance of distributed computer systems. It is particularly suited to systems such as distributed applications or databases, client/server architectures, multi-processor architectures, industrial local networks or embedded systems. It relies on queueing network technology (QNAP2 from Inria/F and Bull).
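To give a feel for the queueing view that underlies such tools, the sketch below evaluates a single station with the textbook M/M/1 formulas (Poisson arrivals, exponential service times, one server). It is only an illustration: the models discussed in this article are evaluated by simulation over whole networks of stations, and the function name and the numeric values are assumptions, not taken from MODARCH or QNAP2.

```python
def mm1_metrics(arrival_rate: float, service_rate: float) -> dict:
    """Steady-state metrics of a single M/M/1 station (textbook formulas).

    arrival_rate and service_rate are in jobs per time unit (assumed values).
    """
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("arrival_rate must be >= 0 and service_rate > 0")
    utilisation = arrival_rate / service_rate
    if utilisation >= 1.0:
        raise ValueError("unstable station: utilisation must be below 1")
    return {
        # Fraction of time the server is busy.
        "utilisation": utilisation,
        # Mean time a job spends in the station (waiting + service).
        "mean_response_time": 1.0 / (service_rate - arrival_rate),
        # Mean number of jobs in the station (waiting + in service).
        "mean_jobs_in_system": utilisation / (1.0 - utilisation),
    }


# Hypothetical station: 8 requests/s arriving at a server able to handle 10/s.
metrics = mm1_metrics(arrival_rate=8.0, service_rate=10.0)
print(metrics)
```

The response time grows sharply as utilisation approaches 1, which is the kind of effect a queueing network simulation exposes for each resource of the architecture.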

MODARCH is the only tool in its category that allows software applications and computer configurations to be modelled and simulated separately. The behaviour of an application can be evaluated on several candidate hardware architectures, and different software applications can be evaluated on one particular hardware architecture. This is one of the reasons why we selected this tool: since the objective is to find the best match between hardware and software, the mapping of software onto hardware has to be as flexible as possible.

MODARCH provides a simple graphical user interface for describing the system hierarchically, and its library offers a set of built-in hardware and software component models. On the hardware side, macroscopic models include processors, terminals and storage units, which can be connected to each other by communication links: buses, drop cables and point-to-point links. On the application side, the library provides message generators, tasks and communication units (for files or databases). The software components can be linked according to the message exchanges of the application. Task behaviour is described by means of a reduced set of commands devoted to resource consumption or message communication: send/receive, wait, delay, compute, read/write, memory allocation, etc. Basic standard software layers (Unix operating system, communication protocols like Ethernet, X25, NFS, TCP, etc.) are embedded in the library components, and their implementation is hidden from the user. Software elements are mapped onto hardware equipment through graphical links. MODARCH automatically manages the transfer and routing of messages and data through the networks, so no code has to be changed when a task is moved from one computer to another.
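The separation between the software model, the hardware model and the mapping that links them can be illustrated with a minimal sketch. Everything here is hypothetical (task names, CPU demands, host speeds, the `host_utilisation` helper); it is not MODARCH's notation, only a way to show that changing the mapping leaves both models untouched.

```python
from collections import defaultdict

# Hypothetical software model: CPU seconds per activation (on a reference
# machine) and number of activations per hour.
TASKS = {
    "data_input": {"cpu_s": 120.0, "per_hour": 2},
    "processing": {"cpu_s": 900.0, "per_hour": 1},
    "archival": {"cpu_s": 300.0, "per_hour": 1},
}

# Hypothetical hardware model: speed relative to the reference machine.
HOSTS = {"server_a": 1.0, "server_b": 2.0}


def host_utilisation(tasks, hosts, mapping):
    """Return the CPU utilisation of each host for a task-to-host mapping."""
    busy_seconds = defaultdict(float)
    for task_name, host_name in mapping.items():
        task = tasks[task_name]
        busy_seconds[host_name] += task["cpu_s"] * task["per_hour"] / hosts[host_name]
    # Utilisation = busy seconds per hour divided by 3600 seconds.
    return {host: busy_seconds[host] / 3600.0 for host in hosts}


# Two alternative mappings of the same software model onto the same hardware.
centralised = {name: "server_a" for name in TASKS}
distributed = {
    "data_input": "server_a",
    "processing": "server_b",
    "archival": "server_a",
}

print(host_utilisation(TASKS, HOSTS, centralised))
print(host_utilisation(TASKS, HOSTS, distributed))
```

Only the mapping dictionary differs between the two evaluations, which mirrors the idea that no task description has to change when a task is moved to another computer.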

MODARCH includes the MODLINE advanced tools for animating models and for creating and managing the execution of experimental plans: any system characteristic (e.g. CPU value, message length) can be declared as a variable, which generates as many runs as it has different values. MODLINE also automatically collects simulation results, offers user-friendly graphical representations (curves, chronograms, histograms, pie charts, etc.) and computes statistical results such as mean values for hardware component loads, task response times, network traffic, etc. Reports including all simulation objects (models, plans, results, result curves, etc.) can be generated on request in many standard formats.
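The idea of an experimental plan, in which each declared variable multiplies the number of runs, can be sketched as a full factorial enumeration. The variable names and values below are assumptions chosen for illustration, and `experiment_plan` is a hypothetical helper, not a MODLINE function.

```python
from itertools import product


def experiment_plan(variables):
    """Enumerate one run per combination of the declared variable values."""
    names = list(variables)
    value_lists = [variables[name] for name in names]
    return [dict(zip(names, combination)) for combination in product(*value_lists)]


# Hypothetical plan: 3 CPU speed factors x 2 message lengths = 6 runs.
plan = experiment_plan(
    {
        "cpu_speed_factor": [1.0, 1.5, 2.0],
        "message_length_bytes": [512, 4096],
    }
)

for run_number, parameters in enumerate(plan, start=1):
    print(run_number, parameters)
```

The number of runs is the product of the numbers of values declared for each variable, so it is worth restricting a plan to the parameters that are expected to matter.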

The integration within ELISA provides automatic initialisation of models from the functional analysis, traceability management between requirement tables, sequencing hypotheses (chronograms) and simulation objects (scenarios, results, analysis plots), and documentation. Developed for the CNES Toulouse Space Centre, ELISA has been used for demonstration applications since 1994 and is now used with MODARCH at full scale to model several ground architectures.

3. A CONCRETE APPLICATION ON CLUSTER PROJECT

The French CLUSTER ground segment for data processing, distribution and archiving had very demanding resource requirements: 800 Mbytes of raw input data (2 CD-ROMs) producing 7 Gigabytes of results per day, of which 5 Gigabytes are archived per day (3 hours of elapsed time for archiving); the ability to process 14 days of data within 1 week (2 to 3 CP hours for each day of data); graphical output and distribution through CD-ROM production; and operators in the loop, all for a nominal mission duration of 2 or 3 years. Such constraining sizing requirements clearly justified the modelling study.
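A back-of-the-envelope calculation from these figures shows why sizing was a concern. The sketch below only uses the numbers quoted above, under stated assumptions: the 3 hours of archiving apply to each day of data, one day of data is archived per calendar day over the mission, and a year counts 365 days. The variable names are illustrative.

```python
# Figures quoted in the text.
DAYS_OF_DATA_PER_WEEK = 14            # peak rate: 14 days of data in 1 week
CP_HOURS_PER_DAY_OF_DATA = (2, 3)     # lower and upper bound
ARCHIVE_HOURS_PER_DAY_OF_DATA = 3     # elapsed time (assumed per day of data)
ARCHIVED_GB_PER_DAY = 5
MISSION_YEARS = (2, 3)                # nominal mission duration

# Assumptions not stated in the text.
HOURS_PER_WEEK = 7 * 24
DAYS_PER_YEAR = 365

# Processing time needed in one week at the peak rate.
processing_hours = tuple(DAYS_OF_DATA_PER_WEEK * h for h in CP_HOURS_PER_DAY_OF_DATA)

# Archiving time needed in the same week.
archive_hours = DAYS_OF_DATA_PER_WEEK * ARCHIVE_HOURS_PER_DAY_OF_DATA

# Share of the week used if processing and archiving run strictly in sequence.
sequential_share = tuple((p + archive_hours) / HOURS_PER_WEEK for p in processing_hours)

# Total archive volume over the nominal mission.
archive_total_gb = tuple(ARCHIVED_GB_PER_DAY * DAYS_PER_YEAR * y for y in MISSION_YEARS)

print("processing hours per week:", processing_hours)
print("archiving hours per week:", archive_hours)
print("share of the week (sequential):", sequential_share)
print("archive volume over the mission (GB):", archive_total_gb)
```

Under these assumptions, processing and archiving alone occupy a large part of the week before operator validation, CD-ROM handling and sharing with other projects are considered, which is what the simulation model is meant to quantify more precisely.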

The CLUSTER study started in January 96. As the development of the CLUSTER "Centre de Traitement de Masse" (CTM, the mass processing centre) was already under way, the main objectives were to evaluate the operational synchronisation and parallelism between the different application processes (data input, archival, data processing) and to propose optimisations of the resource allocation. As usual, input constraints evolved as the project progressed, and the model had to be updated continuously to take them into account.

Throughout the CLUSTER mission, the CTM had to support the following functions :

- periodically acquire the CD-ROMs containing all raw data sets, from ESOC,

- produce intermediate level data from the raw data concerning 3 French laboratories (CESR, CETP and LPCE), and final products for the whole international community of CLUSTER scientists,

- acquire final CSDS (CLUSTER Science Data System) products maintained at the other Data Centres from Cluster experiments,

- archive the CLUSTER data and products,

- distribute the data, either by systematic distribution or by making them available to the scientists.

The CLUSTER CTM is composed of two main sub-systems: the Data Server, which is responsible for activating the processing chains and for managing all input and output data, and the CLUSTER Science Data System (CSDS) User Interface, which manages the mission data exchanges between the CLUSTER National Data Centres and the access by the scientists.

Figure 2: MODARCH model for the French CLUSTER CTM

The Data Server was installed on a Sun Sparc 2000 of the CNES Computer Centre, connected to two services: the SEM (Media Exchange Service) and the STAF (File Transfer and Archival Service). The CSDS User Interface ran on a dedicated workstation.

The Data Server behaviour was at first modelled in a simplified way, just precise enough to validate the operations chronograms. The model takes into account the validation of all processing steps by human operators. Studying the associated constraints on the processes (distribution over working and non-working hours and days) made it possible to propose several scheduling hypotheses and to estimate the hardware resource needs. The model includes models of the SEM and STAF services, with their own processors, storage devices, output peripherals and associated software tasks.

The graphical model of the CTM Data Server is shown in figure 2. The analysis of the process chronogram, showing the distribution of processes between day and night hours, can be found in figure 3; it assumes that the machine is dedicated to the project (no other tasks running on the same machine).

Figure 3: Day/night distribution of processes in single project hypothesis
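The kind of reasoning behind such a day/night distribution can be sketched very roughly as follows: steps that need operator validation are placed in working hours, unattended steps are placed at night, and each period is checked against its capacity. The working window, the process list and most durations are assumptions made for illustration (only the 3-hour figures echo the processing and archiving times quoted earlier); the actual study relied on simulation, not on this simple sum.

```python
# Assumed working window (hours of the day); not given in the article.
WORK_START, WORK_END = 8, 18


def day_night_load(processes):
    """Sum process hours per period: operator-validated steps go to the day,
    unattended steps go to the night, then check each period's capacity."""
    day_hours = sum(p["hours"] for p in processes if p["needs_operator"])
    night_hours = sum(p["hours"] for p in processes if not p["needs_operator"])
    day_capacity = WORK_END - WORK_START
    night_capacity = 24 - day_capacity
    return {
        "day_hours": day_hours,
        "night_hours": night_hours,
        "fits": day_hours <= day_capacity and night_hours <= night_capacity,
    }


# Hypothetical daily process list for one day of data.
PROCESSES = [
    {"name": "cdrom_input", "hours": 1.0, "needs_operator": True},
    {"name": "processing", "hours": 3.0, "needs_operator": False},
    {"name": "validation", "hours": 2.0, "needs_operator": True},
    {"name": "archival", "hours": 3.0, "needs_operator": False},
]

load = day_night_load(PROCESSES)
print(load)
```

A simulation model goes further than this sum, since it also accounts for precedence between steps and for contention on shared resources.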

A problem arose when it was decided to refine the model by taking into account the fact that the machine, as well as the SEM and STAF services, is in fact shared with several other projects. The objective was to decide how to model the overlapping of applications on a set of systems.

With MODARCH, it was easy to perform sensitivity analyses; the tuning consisted in finding the parameters that had a real impact on the system results. Two kinds of analysis were conducted: a study of the optimal distribution of processes between days and nights, and a study of the availability percentage of shared resources.

Figure 4: Sensitivity analysis of resource loads in single project hypothesis

The first solution was to perform a sensitivity analysis on a selected set of resources, (25% overload on SEM), (50% overload on STAF) and (25% overload on PROJET-CI), and to plot the impact of the resource loads on the actual model outputs (see figure 4). The analysis of the model results made it possible to find which processes and archival operations could be run sequentially or in parallel in order to optimise the resource load over the days and nights of the week.
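One possible reading of these overload percentages, given here as an assumption rather than as the method actually used in the study, is that other projects consume the stated share of each resource, so that an operation needing T hours on a dedicated resource takes T / (1 - share) hours on the shared one. The dedicated durations below are partly taken from the figures quoted earlier (3 hours of archiving, up to 3 CP hours of processing) and partly hypothetical (SEM), and `stretched_time` is an illustrative helper.

```python
def stretched_time(dedicated_hours: float, background_share: float) -> float:
    """Elapsed time on a resource of which a share is used by other projects."""
    if not 0.0 <= background_share < 1.0:
        raise ValueError("background_share must be in [0, 1)")
    return dedicated_hours / (1.0 - background_share)


# Overload percentages quoted in the text, read as background shares (assumption).
BACKGROUND_SHARE = {"SEM": 0.25, "STAF": 0.50, "PROJET-CI": 0.25}

# Elapsed hours per day of data on dedicated resources.
DEDICATED_HOURS = {
    "SEM": 1.0,        # hypothetical CD-ROM handling time
    "STAF": 3.0,       # archiving time quoted in the text
    "PROJET-CI": 3.0,  # upper bound of the processing time quoted in the text
}

shared_hours = {
    resource: stretched_time(DEDICATED_HOURS[resource], BACKGROUND_SHARE[resource])
    for resource in DEDICATED_HOURS
}

for resource, hours in shared_hours.items():
    print(f"{resource}: {DEDICATED_HOURS[resource]:.1f} h dedicated, {hours:.2f} h shared")
```

Under this reading, the archiving step on STAF is the most affected, which illustrates how a sensitivity analysis points to the parameters that really drive the results.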

4. MODELLING DRAWBACKS AND MODELLING BENEFITS

Modelling with tools like MODARCH is not a huge task. The most time-consuming activities are understanding the system behaviour and constraints, making good hypotheses and analysing the simulation outputs. Building the model itself takes little time compared to the time spent on the overall study.

One difficulty came from the use of existing multi-project services, which had to be modelled in a macroscopic but still reliable way.

The benefits of the CLUSTER modelling were the concrete proposals on operations chronograms and on the possible load variations due to cohabitation with other projects. These could have been very useful if the mission had not unfortunately been stopped. Once models are available, a lot of time can be saved when analysing changes in parameters or other possible scenarios (CD-ROM delivery, operator vacations, etc.).

5. CONCLUSIONS

Two studies have naturally followed from the CLUSTER study: one concerns the modelling of multi-purpose systems shared by several projects; the other deals with the integration of a detailed model of a shared service within a higher-level system model.

The objective of the first study is to define how to model a system that is used by several projects, and how to apply the existing operational loads to a new project while hiding the behaviour of the other applications. The study started in May 96 and is now finished; the multi-project model and the associated usage methodology will be reused for new projects within the ELISA scope.

The second study concerns the building of a detailed model of the STAF (File Transfer and Archival Service). This study has two objectives: to get a better understanding of this system for maintenance and evolution purposes, and to analyse how the results of a detailed model can be introduced into a higher-level model like the CLUSTER one. The study started in January 96 and is still ongoing.

Other models are currently being built at CNES. As there are not many new projects and associated data processing ground segments, most of them model existing systems. Several benefits can be gained from "a posteriori" modelling: understanding how the system operates and demonstrating it graphically (a very efficient tutorial for newcomers), proposing optimisations of resource usage or process mapping, proposing different operations schedules if necessary and, above all, reusing the existing models for evolution proposals, as will be done with the SPOT4 models for SPOT5 development in the real-time domain.

[ELISA] ELISA, an integrated environment for information systems design. C. Panem, O. Renaux. System Engineering Workshop, ESTEC, November 95.

ASA® is a registered trademark of Verilog F.

MODARCH/MODLINE® is a registered trademark of Simulog.

LOTUS 123® is a registered trademark of Lotus Development Corp.

EMERAUDE PCTE® is a registered trademark of Transtar.

FRAME MAKER® is a registered trademark of Frame Technology Corp.