
RASSP Hardware / Software Codesign Application Note

6.0 Lessons Learned

This section describes some of the lessons learned about Hardware/Software Codesign during the RASSP benchmarking process. As stated at the outset, Hardware/Software Codesign is the co-development and co-verification of hardware and software through the use of simulation and/or emulation. Based upon the RASSP Benchmarks, the following observations are made:

6.1 Model validation

Model validation is a crucial step in any simulation or analysis activity. Validation occurs in two ways: component validation and process validation. Component validation is the verification of individual models and their characterizations. Process validation is the verification of a system model's results against actual numbers from the resulting system. Individual leaf cell (lowest level model) verification is relatively straightforward. Leaf cells are typically characterized to model different types of components, such as memories, ALUs, etc. These can be verified through simple testbenches, and for off-the-shelf parts, timing numbers are readily available. As the design process progresses, a performance model's accuracy should be continually checked against more detailed models as they become available, or against measurements from the actual components. Any mismatch should be corrected to maintain the performance model's accuracy, to test for continued compliance with requirements, and to support subsequent re-use and model-year upgrades. This activity departs from traditional processes, which do not maintain (and therefore effectively discard) the performance model once the architecture design has been completed.
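
As an illustration of component validation, the following sketch compares a leaf cell characterization's predicted latencies against measured reference timings and flags any characterization that has drifted beyond a tolerance. The component names, timing numbers, and tolerance are assumptions for illustration only, and the sketch is written in Python rather than in any RASSP tool.

    # A minimal sketch of component validation, assuming hypothetical leaf cell
    # names and timing numbers (nanoseconds); not part of any RASSP tool.
    predicted_ns = {"sram_read": 25.0, "alu_mac": 10.0, "xbar_transfer": 40.0}
    measured_ns = {"sram_read": 26.1, "alu_mac": 10.2, "xbar_transfer": 47.5}

    TOLERANCE = 0.05  # a 5% mismatch triggers re-characterization of the model

    def validate(predicted, measured, tol=TOLERANCE):
        """Return the components whose characterization no longer matches reality."""
        stale = []
        for name, pred in predicted.items():
            meas = measured[name]
            error = abs(pred - meas) / meas
            status = "OK" if error <= tol else "RE-CHARACTERIZE"
            print(f"{name:14s} predicted={pred:6.1f} measured={meas:6.1f} "
                  f"error={error:5.1%} {status}")
            if error > tol:
                stale.append(name)
        return stale

    # Components listed here need their characterizations updated so the
    # performance model stays accurate for re-use and model-year upgrades.
    print("Needs update:", validate(predicted_ns, measured_ns))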

6.2 Simulations must be rapid

When conducting performance simulations early in the design process, the simulations must be rapid. However, rapidly simulating a significant portion of the real-time SAR application executing on the full system, composed of as many as 24 PEs and crossbar elements, required much higher efficiency and a higher modeling abstraction than that of the typical ISA-level model. To be abstract yet accurate, only the necessary details were resolved in the model. These included significant protocol events, such as the initiation and termination of data transfers, as well as significant computational events, such as the beginning and end of bounded computational tasks. The resolved events centered on contention for computation and communication resources whose usage time, once allocated to a task or transfer, was highly deterministic. The simulation was valuable because contention for these multiple resources was not easily predictable.
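
The sketch below illustrates this style of abstract performance modeling; it is not the RASSP tool set, and the task names, durations, and two-PE/single-crossbar configuration are assumptions for illustration. Only task begin/end and transfer begin/end events are resolved, and the simulation's job is to work out how deterministic durations interact through contention for the PEs and the shared crossbar.

    import heapq

    # Hypothetical workload: (task_name, assigned_pe, compute_time, transfer_time).
    # Durations are deterministic once a resource is granted; contention is not.
    TASKS = [("range_fft", 0, 5.0, 2.0), ("corner_turn", 1, 1.0, 6.0),
             ("azimuth_fft", 0, 5.0, 2.0), ("detect", 1, 3.0, 1.0)]

    def simulate(tasks, num_pes=2):
        pe_free = [0.0] * num_pes   # time at which each PE next becomes idle
        xbar_free = 0.0             # time at which the shared crossbar becomes idle
        events = []                 # (time, description) event log
        for name, pe, compute, transfer in tasks:
            start = pe_free[pe]                     # wait for the assigned PE
            end_compute = start + compute
            heapq.heappush(events, (start, f"{name}: compute begins on PE{pe}"))
            heapq.heappush(events, (end_compute, f"{name}: compute ends"))
            # the output transfer must additionally wait for the crossbar
            xfer_start = max(end_compute, xbar_free)
            xfer_end = xfer_start + transfer
            heapq.heappush(events, (xfer_start, f"{name}: transfer begins"))
            heapq.heappush(events, (xfer_end, f"{name}: transfer ends"))
            pe_free[pe] = xfer_end                  # PE is held until its transfer ends
            xbar_free = xfer_end
        while events:                               # print the resolved events in time order
            t, what = heapq.heappop(events)
            print(f"t={t:6.1f}  {what}")

    simulate(TASKS)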

6.3 Tools must maximize useful information to the designer

The design group produced time-line graphs from the simulation results, which showed the history of task executions on the PEs. The graphs helped visualize and understand the impact of mapping options, and led the design group to modify, optimize, and ultimately verify the partitioning, allocation, and scheduling of the software tasks onto the hardware elements. The time-line graphs showed the times when PEs were idle due to data starvation or buffer saturation, which helped isolate other resource contention and bottlenecks. Plots of memory allocation as a function of time were also valuable in visualizing and balancing memory usage throughout the algorithm execution. The resultant software task partitioning and schedules led directly to the production of the target source code through a straightforward translation into subroutine calls. Because the ultimate implementation became the sum of time-predictable events for which linear additivity essentially holds, it was not surprising that the simulation results predicted the physical system's actual run-time performance to within a few percent.
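
A minimal sketch of such a time-line view is shown below, built from assumed task records rather than actual benchmark data; the '.' characters make PE idle periods due to data starvation or buffer saturation immediately visible.

    # (pe_id, task_name, start_time, end_time) -- illustrative numbers only
    RECORDS = [(0, "range_fft", 0, 5), (0, "azimuth_fft", 7, 12),
               (1, "corner_turn", 0, 1), (1, "detect", 13, 16)]

    def timeline(records):
        """Print one text row per PE; '#' marks busy time and '.' marks idle time."""
        for pe in sorted({r[0] for r in records}):
            tasks = sorted((r for r in records if r[0] == pe), key=lambda r: r[2])
            row, cursor = "", 0
            for _, name, start, end in tasks:
                row += "." * (start - cursor)   # idle gap: data starvation shows up here
                row += "#" * (end - start)      # busy executing the task
                cursor = end
            labels = ", ".join(f"{name}@{start}" for _, name, start, _ in tasks)
            print(f"PE{pe}: {row:20s} ({labels})")

    timeline(RECORDS)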

6.4 New tools and tool interoperability required

A major portion of the ATL RASSP program was directed toward the development of new tools, as well as toward providing interoperability and seamless transitions between tools. The execution of the benchmark program convinced us that the overall approach taken on the RASSP program is valid. New tools at the architecture definition level, which facilitate rapid tradeoffs among candidate architectures by predicting the performance of the application software on the virtual architecture, improve the early design process. Directly coupling these high-level tools with autocoding tools, which can automatically generate target software comparable in performance to hand-generated code, greatly improves the software development process.

6.5 Software generation for multiprocessor systems will get easier

Software for signal processing, including the interprocessor communications on large systems, can and will be generated automatically with nearly the same efficiency as hand coding for many applications. Experiments on the RASSP Benchmarks using these new tools indicate that such tools are capable of eliminating the software development associated with interprocessor communication, which is where the majority of integration and test time is expended. This elimination of interprocessor communication software development, coupled with graph-based application development, will provide an order-of-magnitude productivity improvement in signal processing software generation. Furthermore, the productivity improvement associated with retargeting an application from one architecture to another will be even greater.

6.6 Software development paradigm shift is required

To achieve the very large productivity improvements promised by autocoding tools, software developers must begin to think in a data flow paradigm. If the overall signal processing is defined in a data flow paradigm using tools designed for this purpose, then architecture tradeoffs naturally flow from this same description; detailed functionality is achieved by systematically incorporating it into the data flow description; and the fully functional data flow graph also drives the automatic code generation process, which is supported by a vendor-supported run-time kernel compatible with the generated autocode. When implemented properly, even larger productivity improvements can be achieved when systems must be modified due to algorithmic upgrades or the insertion of new hardware technology. When applications are developed and maintained at the data flow graph level, retargetability becomes, in large part, a responsibility of the autocode tool vendor rather than the application developer.
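
The sketch below illustrates the idea rather than any particular autocoding product: a small, hypothetical data flow graph (with assumed node names and PE assignments) is the single source from which per-PE subroutine calls and the interprocessor send/receive operations are generated automatically.

    # Hypothetical SAR-like data flow graph, listed in topological order:
    # node -> (assigned PE, downstream nodes).
    GRAPH = {"range_fft":   (0, ["corner_turn"]),
             "corner_turn": (1, ["azimuth_fft"]),
             "azimuth_fft": (1, ["detect"]),
             "detect":      (0, [])}

    def generate_code(graph):
        """Emit a per-PE sequence of subroutine calls and send/recv pairs."""
        program = {pe: [] for pe, _ in graph.values()}
        for node, (pe, successors) in graph.items():
            program[pe].append(f"call {node}(in_buf, out_buf)")
            for succ in successors:
                succ_pe = graph[succ][0]
                if succ_pe != pe:   # this edge crosses PEs: generate the comms
                    program[pe].append(f"send(out_buf, to=PE{succ_pe})")
                    program[succ_pe].append(f"recv(in_buf, from=PE{pe})")
        return program

    for pe, calls in sorted(generate_code(GRAPH).items()):
        print(f"--- generated code for PE{pe} ---")
        for line in calls:
            print("   ", line)

Retargeting to a different PE assignment only requires changing the graph annotations; the communication code is regenerated, not rewritten by hand.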

6.7 Legacy systems

Few systems are developed from scratch. More often, existing applications are upgraded with additional functionality, new algorithms, or new hardware. Transitioning legacy software, even software that remains perfectly valid, to a data flow paradigm can be a major effort and should not be underestimated. Once the viability of autocoding tools is clearly established, algorithm developers will begin to use them for algorithm development, which will bypass the need for later code conversion. Legacy systems and their existing software must still be dealt with, however. The best that current technology offers is that the potentially labor-intensive conversion of existing code to a data flow description needs to be done only once; future software or hardware upgrades then become much easier. In contrast, without this shift to a data flow description supported by automatic code generation, existing systems will continue to require major software rewrites each time the hardware is upgraded.


Approved for Public Release; Distribution Unlimited. Dennis Basara