ISBN-13: 9789400733060 / English / Paperback / 2012 / 406 pp.
This book presents a systematic methodology for exploiting word-width information in embedded compilers. It details a technique for context-driven strength reduction of constant multiplications, including a trade-off with application accuracy requirements.
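As a rough illustration of what strength reduction of a constant multiplication means, the sketch below replaces a multiply by a sum of shifted copies of the operand, and optionally drops the least significant terms to trade accuracy for fewer operations. It is a minimal sketch using a plain binary decomposition, not the book's context-driven technique; the function names and the `max_terms` parameter are hypothetical.

```python
def shift_add_terms(constant):
    """Return the bit positions set in a non-negative constant.

    The constant equals the sum of 2**s over the returned shifts s.
    """
    if constant < 0:
        raise ValueError("this sketch handles non-negative constants only")
    return [bit for bit in range(constant.bit_length()) if (constant >> bit) & 1]


def multiply_by_constant(x, constant, max_terms=None):
    """Multiply x by constant using only shifts and adds.

    If max_terms is given (hypothetical accuracy knob), only the most
    significant terms are kept, so the result becomes an approximation.
    """
    shifts = sorted(shift_add_terms(constant), reverse=True)
    if max_terms is not None:
        shifts = shifts[:max_terms]
    result = 0
    for shift in shifts:
        result += x << shift  # one shift and one add per retained term
    return result


exact = multiply_by_constant(7, 10)                 # 10 = 8 + 2
approx = multiply_by_constant(7, 11, max_terms=2)   # 11 = 8 + 2 + 1, lowest term dropped
```

Here `exact` equals 7 * 10, while `approx` computes 7 * 10 instead of 7 * 11 because the lowest-weight term is discarded.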
Preface. Contents. Glossary and Acronyms.
1 Introduction. 1.1 Context. 1.2 Focus of this book. 1.3 Overview of the main differentiating elements. 1.4 Structure of this book.
2 Global State of the Art Overview. 2.1 Architectural components and mapping. 2.2 Platform architecture exploration. 2.3 Conclusion and key messages of this chapter.
3 Energy Consumption Breakdown and Platform Requirements. 3.1 Platform view: a processor is part of a system. 3.2 A video platform case study. 3.3 Embedded processor case study. 3.4 High level architecture requirements. 3.5 Architecture Exploration and Trends. 3.6 Architecture optimization of different platform components. 3.7 Putting it together: FEENECS Template. 3.8 Comparison to related work. 3.9 Conclusions and key messages of this chapter.
4 Overall Framework for Exploration. 4.1 Introduction and motivation. 4.2 Compiler and simulator flow. 4.3 Energy estimation flow (power model). 4.4 Comparison to related work. 4.5 Architecture exploration for various algorithms. 4.6 Conclusion and key messages of this chapter.
5 Clustered L0 (Loop) Buffer Organization and Combination with Data Clusters. 5.1 Introduction and motivation. 5.2 Distributed L0 buffer organization. 5.3 An illustration. 5.4 Architectural evaluation. 5.5 Comparison to related work. 5.6 Combining L0 instruction and data clusters. 5.7 Conclusions and key messages of this chapter.
6 Multi-threading in Uni-threaded Processor. 6.1 Introduction. 6.2 Need for light weight multi-threading. 6.3 Proposed multi-threading architecture. 6.4 Compilation support potential. 6.5 Comparison to related work. 6.6 Experimental results. 6.7 Conclusion and key messages of this chapter.
7 Handling Irregular Indexed Arrays and Dynamically Accessed Data on Scratchpad Memory Organisations. 7.1 Introduction. 7.2 Motivating example for irregular indexing. 7.3 Related work on irregular indexed array handling. 7.4 Regular and irregular arrays. 7.5 Cost model for data transfer. 7.6 SPM mapping algorithm. 7.7 Experiments and results. 7.8 Handling dynamic data structures on scratchpad memory organisations. 7.9 Related work on dynamic data structure access. 7.10 Dynamic referencing: locality optimization. 7.11 Dynamic organization: locality optimization. 7.12 Conclusion and key messages of this chapter.
8 An Asymmetrical Register File: The VWR. 8.1 Introduction. 8.2 High level motivation. 8.3 Proposed micro-architecture of VWR. 8.4 VWR operation. 8.5 Comparison to related work. 8.6 Experimental results on DSP benchmarks. 8.7 Conclusion and key messages of this chapter.
9 Exploiting Word-width Information during Mapping. 9.1 Word-width variation in applications. 9.2 Word-width aware energy models. 9.3 Exploiting word-width variation in mapping. 9.4 Software SIMD. 9.5 Comparison to related work. 9.6 Conclusions and key messages of this chapter.
10 Strength Reduction of Multipliers. 10.1 Multiplier strength reduction: motivation. 10.2 Constant multiplications: a relevant sub-set. 10.3 Systematic description of the global exploration/conversion space. 10.4 Experimental results. 10.5 Comparison to related work. 10.6 Conclusions and key messages of chapter.
11 Bioimaging ASIP benchmark study. 11.1 Bioimaging application and quantisation. 11.2 Effective constant multiplication realisation with shift and adds. 11.3 Architecture exploration for scalar ASIP-VLIW options. 11.4 Data-path architecture exploration for data-parallel ASIP options. 11.5 Background and foreground memory organisation for SoftSIMD ASIP. 11.6 Energy results and discussion. 11.7 Conclusions and key messages of chapter.
12 Conclusions. 12.1 Related work overview. 12.2 Ultra low energy architecture exploration. 12.3 Main energy-efficient platform components.
References.
Modern consumers carry many electronic devices, such as a mobile phone, digital camera, GPS, PDA and MP3 player. The functionality of each of these devices has evolved considerably in recent years, with a steep increase in both the number of features and the quality of the services they provide. However, providing the compute power required to support (an uncompromised combination of) all this functionality is highly non-trivial. Designing processors that meet the demanding requirements of future mobile devices requires optimizing the embedded system in general and the embedded processors in particular, as they must strike the right balance between flexibility, energy efficiency and performance. In general, a designer will try to minimize the energy consumption (as far as needed) for a given performance, with sufficient flexibility. Achieving this goal is already complex when the processor is considered in isolation, yet in reality the processor is a single component in a more complex system. To design such a complex system successfully, critical decisions made during the design of each individual component should take into account their effect on the other parts, with the clear goal of moving to a global Pareto optimum in the complete multi-dimensional exploration space.
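To make the notion of a Pareto optimum concrete, the sketch below filters a set of candidate design points down to those not dominated by any other. It is only an illustration: the two-dimensional (energy, delay) cost tuple, the function name, and the sample values are all hypothetical and are not taken from the book.

```python
def pareto_front(points):
    """Return the points not dominated by any other point.

    Each point is an (energy, delay) tuple where lower is better in both
    dimensions. A point is dominated if another, different point is no
    worse in every dimension.
    """
    front = []
    for candidate in points:
        dominated = any(
            other != candidate
            and other[0] <= candidate[0]
            and other[1] <= candidate[1]
            for other in points
        )
        if not dominated:
            front.append(candidate)
    return front


# Hypothetical design points: (energy, delay) in arbitrary units.
designs = [(5, 10), (6, 8), (7, 9), (4, 12), (6, 11)]
front = pareto_front(designs)
```

With these made-up values, `(7, 9)` is dominated by `(6, 8)` and `(6, 11)` by `(5, 10)`, so three trade-off points remain. A real exploration would use more cost dimensions, such as flexibility or area.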
In the complex, global design of battery-operated embedded systems, Ultra-Low Energy Domain-Specific Instruction-Set Processors focuses on the energy-aware architecture exploration of domain-specific instruction-set processors. It covers the co-optimization of the datapath architecture, foreground memory, and instruction memory organisation, with a link to the required mapping techniques or compiler steps at the early stages of the design. An extensive energy breakdown experiment for a complete embedded platform identifies both energy and performance bottlenecks, together with the important relations between the different components. Based on this knowledge, architecture extensions are proposed for all the bottlenecks.