Welcome to the ROMA website

By Webmaster. Last modified 26/04/2010 15:04

ROMA's home page

ROMA is an ANR (Agence Nationale de la Recherche) project

Programme "Architectures du Futur" (ANR-06-ARFU6-004-01)

ROMA: Reconfigurable Operators for Multimedia Applications (2007-2010)

In multimedia applications, image processing is the major challenge embedded systems have to face. It is computationally intensive and must meet tight power requirements. Pixel-level processing, such as image filtering, edge detection and pixel correlation, and block-level processing, such as motion estimation, have to be accelerated. To that end, the ROMA project proposes to develop a reconfigurable processor with high silicon density and power efficiency, able to adapt its computing structure to computation patterns that can be sped up and/or made more power efficient. Unlike previous attempts to design reconfigurable processors, which focused on defining a complex interconnection network between simple operators, the ROMA project will study a pipeline-based structure of advanced low-power coarse-grain reconfigurable operators, avoiding the interconnection-network overhead that is traditional in reconfigurable devices.

CEA LIST  Contact: Raphael David
THOMSON France R&D  Contact: Erwan Raffin

ROMA in more detail:


In multimedia applications, video and image processing is one of the major challenges embedded systems have to face. Such applications are typically computationally intensive and contain control statements, and designers have to cope with stringent power and performance requirements when investigating system integration. The ROMA project proposes to develop both a design methodology and a reconfigurable processor able to adapt its computing structure to video and image processing applications. The processor is built around a pipeline of coarse-grain reconfigurable operators with efficient power and performance characteristics. Unlike previous reconfigurable processors, flexibility is obtained not through a flexible interconnect network but through mutable units. These units can be configured in terms of the function they implement, the code used to represent data, and the data bit-width. The processor is reconfigured dynamically throughout the application, depending on the tasks to be carried out. An improvement of at least one order of magnitude in power consumption and computing power over state-of-the-art energy-efficient reconfigurable architectures is expected.
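As a purely illustrative software model of this idea (not the ROMA architecture itself), the sketch below treats the processor as a fixed chain of stages whose function and bit-width are changed between tasks, while the connection between stages never changes. All names (`abs_diff`, `multiply`, `accumulate`, `Pipeline`) and the chosen bit-widths are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List

# A stage consumes a stream of values and produces a stream of integers.
Stage = Callable[[Iterable], Iterator[int]]

def abs_diff(bit_width: int) -> Stage:
    """Hypothetical operator: |a - b| on pairs, wrapped to bit_width bits."""
    mask = (1 << bit_width) - 1
    def stage(pairs):
        for a, b in pairs:
            yield abs(a - b) & mask
    return stage

def multiply(bit_width: int) -> Stage:
    """Hypothetical operator: a * b on pairs, wrapped to bit_width bits."""
    mask = (1 << bit_width) - 1
    def stage(pairs):
        for a, b in pairs:
            yield (a * b) & mask
    return stage

def accumulate(bit_width: int) -> Stage:
    """Hypothetical operator: running sum, wrapped to bit_width bits."""
    mask = (1 << bit_width) - 1
    def stage(values):
        total = 0
        for v in values:
            total = (total + v) & mask
        yield total
    return stage

class Pipeline:
    """Linear pipeline: stage i always feeds stage i + 1 (no routing network)."""
    def __init__(self) -> None:
        self.stages: List[Stage] = []

    def configure(self, stages: List[Stage]) -> None:
        # Reconfiguration changes what each unit does, not how units are wired.
        self.stages = list(stages)

    def run(self, stream: Iterable) -> List[int]:
        for stage in self.stages:
            stream = stage(stream)
        return list(stream)

pipe = Pipeline()

# Task 1: sum of absolute differences, as used in block motion estimation.
pipe.configure([abs_diff(8), accumulate(16)])
sad = pipe.run(zip([10, 20, 30, 40], [12, 18, 33, 40]))

# Task 2: same pipeline, reconfigured as a multiply-accumulate filter tap sum.
pipe.configure([multiply(16), accumulate(32)])
dot = pipe.run(zip([1, 2, 3], [1, 2, 1]))
```

The point of the sketch is only that the two tasks reuse the same two-stage structure; the real operators, their configuration format and their timing are not described in this page.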

The methodology is based on a fairly traditional top-down design flow, extended with features that make it possible to produce efficient configurable systems for domain-specific applications (Figure 1).

[Figure: ROMA flow]

Figure 1. Design and compilation flow

The reconfigurable processor implements the parts of the code that correspond to loops and frequently executed fragments, which can be accelerated and/or are good candidates for low-power design. The rest of the application runs on a standard processor core connected to the reconfigurable architecture. This connection will be made both at the level of the processor datapath (instruction set extension) and at the level of peripherals.

Hence, these parts of the code have to be extracted from the specification of the application to implement. A GCC-based compilation flow is used as a front-end. This front-end is coupled to an internal representation based on hierarchical conditional dependency graphs (HCDG). This formal representation has several advantages over classical control data-flow graphs (CDFG). It represents the data and control information of the design in a unified way. Formal semantics, as well as semantics-preserving transformations, can be defined. Synthesis and code generation steps are largely independent of variations in the input specification. Moreover, exclusiveness information can be derived easily, which is of major importance for resource sharing. The compilation techniques studied and developed in the ROMA project operate on this HCDG representation, which is annotated with information from application profiling. Pattern extraction can then be applied to this graph, and the following three tasks can be performed:

1) if necessary, pattern code transformation to expose features the reconfigurable processor can implement efficiently, that is to say, to match the patterns to the processor configuration;

2) accuracy evaluation against the application's accuracy constraints. From this evaluation, the optimized data size can be determined. A fixed-point format will be used because of the low-power requirements;

3) choice of internal data representation. The representation of the internal variables of a particular computation can be optimized to reduce bit activity or improve speed and area.

These tasks aim at increasing performance and reducing power consumption of the system.
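To make the accuracy evaluation task more concrete, here is a minimal sketch of one way a fixed-point data size could be derived from an accuracy constraint. It is an assumption for illustration, not the method used in ROMA: it simply searches for the smallest number of fractional bits whose worst-case rounding error over a set of sample values stays within a given bound. The function names and the sample coefficients are hypothetical.

```python
def quantize(x, frac_bits):
    """Round x to the nearest value representable with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale

def min_frac_bits(samples, max_abs_error, limit=32):
    """Smallest fractional bit count whose worst-case rounding error over
    `samples` is at most `max_abs_error` (a hypothetical accuracy constraint)."""
    for frac_bits in range(limit + 1):
        worst = max(abs(x - quantize(x, frac_bits)) for x in samples)
        if worst <= max_abs_error:
            return frac_bits
    raise ValueError("accuracy constraint not reachable within the bit limit")

# Illustrative filter coefficients and error bound (not taken from the project).
coeffs = [0.1, 0.8, 0.1]
frac_bits = min_frac_bits(coeffs, max_abs_error=0.01)
quantized = [quantize(c, frac_bits) for c in coeffs]
```

A real flow would evaluate the error at the output of the whole computation rather than per coefficient, but the trade-off is the same: fewer bits mean less switching and smaller operators, at the cost of accuracy.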

A compilation step then generates, on the one hand, the software environment (including hardware function calls) to run on a main processor and, on the other hand, the configuration of the configurable architecture. Both the generated software and hardware code are intended to serve as direct input to conventional commercial back-end tools.

Performance and low power consumption mainly depend on the configurable processor. Its structure and its units are therefore designed and optimized for a set of domain-specific computations, in our case video and image processing. The ROMA project is thus based on an initial identification of the processing tasks that will define the granularity of the operators (units). Previous work on configurable architectures has shown that fine- or medium-grain operators (adders, multipliers, …) lead to interconnection-dominated processors. Increasing the granularity of the operators for a set of well-identified computations makes it possible to cope with this problem. For example, a butterfly operator (or structure) can be very useful for FFT-based applications compared to lower-level operators such as adders and multipliers. Performance can be further increased by the pipeline structure of the processor. In addition, the number system used to represent the internal values inside an operator will be optimized, based on an analysis of the nature of the operands and results (integer, real number approximation, accuracy, signed or unsigned, domain). For instance, a signed-digit redundant number system can be chosen to represent sums of several values, in order to speed up the computation and to avoid the switching activity caused by sign extension in the 2's complement representation. The operands and the result of an operator will respect the format given by the source specification, but the representation of any internal variable can be changed. Finally, the data sizing that results from the accuracy evaluation step makes it possible to configure the processor to optimize both performance and power.
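The butterfly example can be illustrated in software. The sketch below is a textbook radix-2 decimation-in-time FFT, not code from the project: the `butterfly` function stands for a single coarse-grain operator that bundles one complex multiplication and two complex additions (four real multiplications and six real additions at a finer grain), and the FFT is expressed only in terms of that operator.

```python
import cmath

def butterfly(a, b, w):
    """Radix-2 butterfly: one complex multiply, one add, one subtract."""
    t = w * b
    return a + t, a - t

def fft(x):
    """Recursive radix-2 FFT built from butterfly() calls.
    The length of x must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], w)
    return out

spectrum = fft([1, 1, 1, 1])
```

With fine-grain operators, each of the ten real operations inside a butterfly would need its own routing between units; with a butterfly-level operator, only the operands and the two results cross the operator boundary.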

Thus, targeting video and image processing applications, the ROMA project aims at generating performance- and energy-efficient architectures, i.e. a synthesizable RTL model of the reconfigurable processor.

The compilation flow (pattern extraction, code transformation, accuracy optimization and data representation), including performance analysis, will be developed based on the consortium's current work. A proof of feasibility will first be attempted on basic applications. A video compression application, to be defined by our industrial partner THOMSON R&D France, is intended to be investigated and prototyped on a standard commercial prototyping board including a processor and configurable logic (typically a reconfigurable SoC).
