Computational fluid dynamics, biochemical algorithms, and molecular modeling are critical applications in domains such as atmospheric and oceanographic simulation, quantum chemistry, biochemistry, energy resources, and magnetic fusion. Supercomputers and PC- or workstation-based clusters, procured under million-dollar contracts, are being employed to meet the computational demand. The rapid increase in microprocessor speed creates a headroom problem for designers of parallel systems: by the end of the design cycle, a parallel computer built from the previous generation's processors must compete against current, faster processors. This leaves only a narrow time window in which a parallel processor can be designed and built to keep pace with the rapid performance growth of processors. The complexity, the variety of techniques and tools, and the high computation, storage, and I/O bandwidth requirements associated with these applications pose several challenges, particularly in terms of scalability, resource utilization (area and energy), and real-time implementation. Current technologies such as general-purpose processors and special-purpose programmable processors fall short of providing low-cost and flexible solutions. These drawbacks have led to the exploration of the reconfigurable architecture design space. Several researchers have shown that the above applications are well suited to execution on spatially parallel processor architectures. These algorithms are characterized by common computation modules, iterative execution, and modules that provide a generic base class, so that large portions of code are reused in implementing new modules; these characteristics make them attractive for a reconfigurable solution. Look-up table (LUT) based field-programmable gate arrays (FPGAs) in particular offer large numbers of on-chip spatially parallel units and are thus capable of performing orders of magnitude faster than conventional serial processors.
However, FPGAs are application agnostic and hence incur penalties: clock cycles lost to redundant reconfigurations, generic routing, and poor memory architectures, all of which impact speed, power, and silicon area. These factors have led us to explore the reconfigurable architecture design space with priority given to the application domain. This research proposes a new methodology to derive a common, application-specific, heterogeneous, hierarchical routing architecture. The methodology involves the following steps: generating control data flow graphs (CDFGs) of the target application domains with the LANCE compiler, mapping the purely data-dependent basic blocks in each CDFG into LUTs, packing the LUT-based netlist into variable-size clusters, representing the cluster hierarchy of each application as a tree, finding the edit distance between the trees representing each application, generating the actual processing elements by extracting the common patterns, profiling the connectivity information between processing elements, deriving the switching and wiring requirements, and finalizing the routing architecture. The specific applications are then mapped onto the processing elements, placed, and routed using existing placement and routing algorithms, modified to fit the constraints of the proposed architecture. The packing and tree comparison modules have been implemented and tested. The packing function outperforms both VPack and RPack. Chawathe's algorithm has been chosen for tree comparison because of its simplicity and its memory and time efficiency. A pre-placement step has been proposed so that the actual placement starts from a smart initial placement instead of a random one. We propose that this methodology will provide optimum interconnection pathways between different hierarchy levels with variable-size processing elements, allocating just enough switching and wiring resources as a result of profiling the computation characteristics of the application domains.
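To make the tree comparison step more concrete, the following is a minimal sketch of an edit distance between two cluster-hierarchy trees. It is not the tree comparison algorithm named in the abstract; it substitutes a simple memoized recursion for ordered trees with unit costs for insert, delete, and relabel. The tree encoding (a `(label, children)` tuple) and the names `size`, `forest_distance`, and `tree_edit_distance` are assumptions made for illustration only.

```python
from functools import lru_cache

# Hypothetical encoding: a tree is (label, children), where children is a
# tuple of trees. Tuples are hashable, so results can be memoized.


def size(tree):
    """Number of nodes in a tree."""
    _label, children = tree
    return 1 + sum(size(child) for child in children)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f and not g:
        return 0
    if not f:
        return sum(size(t) for t in g)  # insert every node of g
    if not g:
        return sum(size(t) for t in f)  # delete every node of f

    # Work on the rightmost tree of each forest.
    (v_label, v_kids), (w_label, w_kids) = f[-1], g[-1]

    # Delete the root of the rightmost tree in f; its children move up.
    delete = forest_distance(f[:-1] + v_kids, g) + 1
    # Insert the root of the rightmost tree in g.
    insert = forest_distance(f, g[:-1] + w_kids) + 1
    # Match the two roots (relabel if they differ), then solve the rest.
    match = (
        forest_distance(v_kids, w_kids)
        + forest_distance(f[:-1], g[:-1])
        + (0 if v_label == w_label else 1)
    )
    return min(delete, insert, match)


def tree_edit_distance(a, b):
    """Edit distance between two single trees."""
    return forest_distance((a,), (b,))


# Illustrative hierarchies: a top level holding clusters of LUTs.
lut = ("lut", ())
app_a = ("top", (("cluster", (lut, lut)), ("cluster", (lut,))))
app_b = ("top", (("cluster", (lut, lut)),))
distance = tree_edit_distance(app_a, app_b)
```

A small distance between two applications suggests that their hierarchies share most of their structure, which is the kind of information the common-pattern extraction step relies on. This recursion is only practical for small trees; a production flow would use a more memory- and time-efficient method.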
Published - Jun 21 2004