Workload Adaptive Power Management with Live Phase Monitoring and Prediction

Canturk Isci

In current computer systems, power dissipation is widely recognized as one of the primary critical constraints. Improving the power efficiency of current and emerging systems has therefore become a pressing challenge and an active research area over recent years. Dynamic, on-the-fly management techniques aim to address this challenge by adaptively responding to the changes in application execution. These application patterns, commonly referred to as "phases", expose distinct, dynamically-varying and often repetitive characteristics of workloads. Dynamic management techniques, guided by workload phase information, can effectively tune system resources to varying workload demands for improved power-efficiency.
This thesis researches new methods to characterize and predict application behavior, with dynamic power management as the end goal. Specifically, this work has two major thrusts. First, it explores different approaches to characterize and predict dynamically varying workload power behavior. Second, it discusses runtime management techniques for real systems that can proactively adapt processor execution to varying application execution characteristics.
This work develops a runtime, real-system power model that provides processor power consumption details in terms of the component powers of different architectural units. We show that similarity analysis methods applied to these component powers help expose power phase behavior of applications. A small set of "power signatures" can represent overall workload power characteristics within 5% of the actual behavior. We develop a "transition-guided" phase detection framework that can identify repetitive application phase patterns despite system-induced variability effects. This detection strategy can identify recurrent phase signatures with less than 5% false alarms on running systems. Last, we propose a workload-adaptive dynamic power management framework guided by runtime phase predictions. This predictive power management approach is shown to improve the energy-delay product of a deployed platform by 7% when compared to existing reactive techniques and by 27% over the baseline unmanaged system.
Overall, this thesis shows a roadmap to effective on-the-fly phase detection and prediction on real systems for application to workload-adaptive dynamic power management. With the increasing focus on adaptive and autonomous system management, this research offers practical techniques that can serve as integral components for current and emerging power-aware systems.

Chapter 1

1  Background and Motivation

Computing systems have experienced a tremendous sustained growth in performance and complexity for more than two decades. Exponentially increasing transistor integration enables more devices to be packed within single chips, which in turn provides more functionality and state with each generation of processors. Figure 1.1 illustrates this for a range of processor families [12,36,53,56,141,144]. Moreover, reduced process dimensions enable faster switching transistors, driving higher operating frequencies with each generation. Coupled with technology advances, new architectural and compiler techniques have pushed the performance bar even higher with deeper pipelines, high speculation, out-of-order and superscalar microarchitectures, and increasing instruction-level parallelism. In addition, new simultaneously multithreaded and multicore systems enable thread-level parallelism [66,134,156,157,170]. All of these advances translate into more computations per unit time with each new computer generation.
Figure 1.1: Number of transistors within a die over time.
From a historical perspective, this has been tremendous forward progress in computing performance. By leveraging both technological and architectural advances, microprocessor designers have been able to actually surpass the performance trends indicated by Moore's Law [129,133]. For example, when we look at the reported performance results with the SPEC CPU2000 benchmarks between 2000 and 2006, we see a more than 10-fold increase in integer performance and a 14-fold increase in floating point performance for Intel family processors [165]. This unabated push towards higher performance and reduced form factors has provided currently emerging mobile devices with computing capability that was previously confined to mainframe systems.
Figure 1.2: Processor power density over time.
Nonetheless, this forward progress in performance has not come for free. Together with increasing clock rates and performance capabilities, the power dissipation of computing systems has also accelerated rapidly. Figure 1.2 illustrates this for Figure 1.1's processor families over the same time period [12,31,53,57,141]. As this figure demonstrates, processor generations also experienced an exponential increase in power density. This increase in power density has recently become one of the primary constraints in microprocessor design. First, stemming from both increased power dissipation and widespread adoption of personal computers, the overall energy impact of computing systems has become an important issue. Once again looking from a historical perspective, the total worldwide processor power dissipation of personal computers increased by more than 50 times over the last decade [173]. Second, increasing power density has also directly influenced thermal limitations of processors, requiring advanced cooling and thermal management strategies [58,155]. Third, increasing power demand, as well as the temporal and spatial power variations within microprocessors, has produced significant strain on effective and reliable power delivery [92,141]. Last and more recently, the financial and environmental impacts of computing system power dissipation have also been widely acknowledged. Especially in large-scale data centers, the current annual cost of power delivery and cooling has reached the order of millions of dollars. If the current trend, in which advances in computing performance are accompanied by rising power demand, continues in next-generation systems, the ongoing costs of power and cooling can soon surpass the initial cost of the underlying computing hardware by a growing margin [12]. To address the impacts of computer power dissipation, the Environmental Protection Agency has recently announced new specifications for computer power-efficiency [171]. Based on the projections of these specifications, improving the energy-efficiency of computing systems can potentially achieve $1.8 billion of total energy cost savings over the next five years. Moreover, such emphasis on computing power can eliminate greenhouse gas emissions equivalent to the annual emissions of 2.7 million cars.
Interestingly, this is not the first time the computing industry has faced the power challenge. Early mainframe systems that relied on bipolar devices experienced a similar exponential growth in power until the early 1990s, at which point the mainframe industry had to move towards CMOS devices, which enabled order-of-magnitude improvements in power density [146]. Less than two decades later, we have once again approached the limits of power density. As CMOS technology continues to be the viable design option for microprocessors, there is a growing necessity to devise and employ effective power management techniques at all levels of computing systems, from circuits and architectures to systems and software. Indeed, recent years have unveiled numerous research efforts that aim to address power-efficiency at all levels of abstraction.
These different power-management strategies can be categorized as either static or dynamic management approaches. Static, or offline, techniques involve design-time decisions, profile-based optimizations and compiler-driven management responses. These approaches are employed at various design stages and abstraction layers. They include circuit-level techniques such as transistor reordering and dual-threshold circuits [104,118,161], architectural mechanisms such as profiling-based adaptations at subroutine granularities or execution checkpoints [7,75], systems- and application-level approaches such as task partitioning and stretching, deadline-based scheduling, software transformations and remote execution [43,102,114,164], and compiler-driven management techniques that involve profiling and instrumentation of applications with power management hints or state keeping instructions [1,65,71,122,154,180].
Dynamic, or online, power management techniques involve runtime control mechanisms in hardware or software; they tune the configurable computing resources during execution. There is a large variety of dynamic management techniques across the whole spectrum of the computing systems hierarchy, spanning from circuit-level techniques to application- and compiler-level power management. Circuit-level adaptations include techniques such as adaptive body biasing and multi-threshold CMOS circuits (power gating) [4,97,98]. Architectural power management techniques involve pipeline reconfigurations [3,8,26,90,139,153], adaptive cache scaling and decay [41,48,96,140], pipeline-delay-based supply voltage tuning [47], speculation control [23,123], multiple clock domain architectures [147,178] and management techniques for chip multiprocessors [94,103,115]. At the system level, many power-aware adaptations target dynamic management of the system operation and the underlying platform components. One of the most widely used dynamic power management techniques at the system level is workload-dependent dynamic frequency and voltage scaling [33,176]. Other dynamic power management techniques in use include adaptive disk control [60], energy-efficient I/O and memory management [110,162,136,143,177,186], task-level energy budgeting [5,20,119] and power-aware scheduling [67,127]. In addition to system-level management approaches, there are also some power-aware dynamic compilation techniques [73,172,179].
Static approaches generally have the broad view of the entire application, and lead to simpler control. However, they lack the actual dynamic execution information of applications. Many software-level static management approaches also require prior profiling of applications or recompilations to incorporate compiler directives. In contrast, dynamic techniques are directly exposed to the dynamic execution behavior and can guide management responses on-the-fly. However, the major drawback of these online techniques lies in their limited view of application execution as they cannot know a priori the whole application structure. In general, dynamic management also necessitates more elaborate monitoring and control schemes to track execution characteristics and to apply management responses. Nonetheless, as the need for aggressive power management continues to increase, such control mechanisms become more attractive in emerging systems despite the design effort they require. In particular, as current workloads exhibit highly variable and nondeterministic characteristics, and as the pool of legacy applications grows, static techniques bring limited benefits. Dynamic management techniques offer significant additional improvements in overall system power efficiency.
My research particularly aims to leverage the broad view of application execution at runtime by monitoring architectural characteristics of applications and inferring dynamically-varying workload behavior. I use observed runtime workload characteristics to detect and predict repetitive application execution, and this repetitive behavior information guides dynamic management techniques. One of the primary drivers of dynamic power management is the inherent variability in both the running workload demands and the underlying computing systems. Efficiently matching the underlying resources to the dynamically varying application demands by adaptively configuring these computing structures is a powerful enabler for power-efficient computation. My dissertation research focuses on two important research challenges for such workload-adaptive and dynamically-controlled execution: characterizing and predicting dynamically varying workload power behavior, and developing runtime management techniques for real systems that proactively adapt processor execution to varying application characteristics.
One primary focus of my dissertation research is to bring real-system experimentation and validation with real measurements into architecture research. In the following chapters of this dissertation, I provide an overview of the different research aspects and the accomplishments of my research along these two thrusts.

2  Research Overview

My dissertation research explores architectural and real-system techniques to characterize and predict wide-scale power behavior of programs and develops autonomous methods that track and predict dynamically-varying workload characteristics to guide runtime, workload-adaptive power management techniques. Many of the presented studies aim to explore and leverage the phase behavior of workloads. This phase behavior represents the temporal variations in workload behavior that are commonly observed during execution. These workload phases are known to exhibit repetitive patterns due to the iterative nature of dynamic execution and can be observed in various forms such as performance characteristics, power consumption and traversed execution address space. Moreover, different phase patterns can be observed at different phase granularities, from a few hundred instructions to billions of instructions. Figure 1.3 shows an example of this phase behavior with an execution snapshot from the SPEC CPU2000 vortex benchmark when its execution characteristics are classified into two major phases. In this example, the three charts show the phase behavior of vortex for two performance metrics as well as for the actual measured power.
Figure 1.3: Phase behavior as observed from the measured performance metrics and power for the vortex benchmark. This execution snapshot can be roughly separated into two phases that repeat throughout benchmark execution.
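From a high-level perspective, my thesis research contributes to existing literature in four related research areas: live, runtime power estimation; phase analysis for power; mitigation of system-induced variability effects in real-system phase detection; and runtime phase tracking with phase-driven dynamic power management.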
Moreover, in this dissertation I describe three different real-system infrastructures that I developed for experimentation and evaluations. These infrastructures are deployed in running systems for remote power monitoring and estimation, phase analysis with dynamic instrumentation and real-measurement feedback, and phase-prediction-driven dynamic power management. Below, I provide an overview of each of these four major aspects of my dissertation research, which are detailed in the subsequent chapters of this thesis.

2.1  Live, Runtime Power Estimation

The ability to measure or model processor power dissipation lies at the heart of power-oriented computing research. At the architecture level, much of this is performed via simulator infrastructures. These either perform analytical power derivations for architecture components based on technology parameters [24] or use empirical power model macros derived from lower-level production simulators [21]. Regardless of the approach taken, the architectural power modeling principle remains similar: the derived maximum component powers are scaled with component utilization rates and architectural parameters to form component-level power estimates. Together with holding or idle power at zero utilization, these power estimates can then approximate the processor power dissipation. While such simulation-oriented techniques provide extensive detail, they are generally prone to limited absolute accuracy, they are impractical for long-timescale simulations and they often consider applications in an isolated environment, thus lacking the effects of underlying system events. Real system measurements can remedy these shortcomings [51,142,168]. However, they generally lack the architectural detail provided by simulations and focus only on total power dissipation.
This line of my research explores an alternative approach to modeling processor power consumption that aims to leverage the advantages of both domains. I propose a real-system power measurement and estimation approach that can also provide microarchitecture-level detail. Fundamentally, this power modeling approach is similar to the simulation approach, where we consider maximum component powers scaled with activity factors. However, instead of cycle-level accounting, my technique relies on hardware performance monitoring events to track component activity. Moreover, I develop this as a runtime power estimation strategy that operates at native application execution speed. I use real power measurement feedback to calibrate power estimators, to incorporate nonlinear power behavior of processor components due to baseline power management techniques and to provide a validated absolute estimation accuracy. While there are prior studies that also investigate event-counter-based power estimations [13,93,95], these studies do not focus on the distribution of power to the architectural components. Furthermore, they only consider processors with small power variation. My work provides both validated total power estimates and their decomposition into architectural components. These estimates are evaluated on a high-end system with aggressive speculation and baseline power saving techniques, where the observed power at different execution regions can vary by as much as 600%. This runtime power estimation framework can approximate processor power behavior within 5% of actual power consumption, as validated with simultaneous real measurements.
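The modeling principle described above, maximum component powers scaled by activity plus an idle term, can be illustrated with a minimal sketch. The component names, the wattage values and the strictly linear form are assumptions made only for illustration; the actual estimators are calibrated against measurements and account for nonlinear behavior, which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    max_power_w: float   # power at full utilization (hypothetical value)
    idle_power_w: float  # power at zero utilization (hypothetical value)


def component_power(c: Component, utilization: float) -> float:
    """Scale the maximum power by utilization and add the idle power."""
    u = min(max(utilization, 0.0), 1.0)  # clamp the counter-derived rate to [0, 1]
    return u * c.max_power_w + c.idle_power_w


def estimate_power(components, utilizations):
    """Return per-component estimates and their total for one sample."""
    per_unit = {
        c.name: component_power(c, utilizations.get(c.name, 0.0))
        for c in components
    }
    return per_unit, sum(per_unit.values())


# Hypothetical components; a real model would cover every architectural unit.
components = [
    Component("l2_cache", max_power_w=10.0, idle_power_w=1.0),
    Component("int_exec", max_power_w=8.0, idle_power_w=0.5),
    Component("fp_exec", max_power_w=12.0, idle_power_w=0.5),
]

# Utilization rates that would be derived from performance counter readings.
sample = {"l2_cache": 0.25, "int_exec": 0.5, "fp_exec": 0.0}
per_unit, total = estimate_power(components, sample)
print(per_unit, total)
```

In this form, the per-component dictionary corresponds to the decomposition of power into architectural units, and the sum corresponds to the total power estimate that can be compared against a measured value.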

2.2  Phase Analysis for Power

In recent years, there has been a growing interest in application phase behavior. Part of this interest focuses on identifying workload phases for characterization purposes and summarizing execution, while others explore methods to detect phases at runtime to guide dynamic adaptations [6,41,72,90,152,153]. With such phase-based adaptations, computing hardware and software can be tuned at runtime to the demands of different program phases. Prior research has considered a range of possible phase analysis techniques, but has focused almost exclusively on performance-oriented phases. Moreover, the bulk of phase-analysis studies have focused on simulation-based evaluations. However, effective and practical analysis of application phase behavior on real systems is essential to employing these phase-based adaptations on running systems. In addition, there is generally a missing link between phase characterizations and their ability to represent power behavior. Such power characterization is especially important for dynamic power and thermal management, as it provides a direct relation between dynamic workload execution and its impact on processor power consumption.
In this thesis I describe a phase analysis methodology that is targeted directly towards characterizing workload power behavior. This approach uses the temporal similarity among estimated component power dissipations to discern the phase patterns in workload power behavior. The power phase characterizations acquired with this method capture the power variations during workload execution within 5% of actual measurements using a small set of representative phases. These phases generally summarize overall execution with less than 1% of the complete execution information. I develop a novel real-system framework for power-oriented phase analysis that coordinates performance monitoring, power estimations, dynamic instrumentation and real power measurements. With this evaluation infrastructure I demonstrate the comparative benefits of different phase characterization techniques that utilize control-flow or event-counter features of applications. This part of my work shows that while both features reveal significant insight into power phase behavior, event counter features further provide a 33% improvement in the characterization of workload power variations.
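As a rough illustration of similarity-based power phase classification, the sketch below groups per-sample component power vectors into a small set of representative signatures. The Manhattan distance, the threshold value and the greedy first-match grouping are assumptions chosen for brevity, and the sample vectors are invented; the text above does not fix a particular similarity metric or grouping method.

```python
def manhattan(a, b):
    """Sum of absolute per-component differences between two power vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def classify_phases(samples, threshold):
    """Assign each power vector to the first signature within `threshold`;
    otherwise start a new signature from that sample."""
    signatures = []  # one representative vector per phase
    labels = []      # phase id assigned to each sample
    for vec in samples:
        for phase_id, sig in enumerate(signatures):
            if manhattan(vec, sig) <= threshold:
                labels.append(phase_id)
                break
        else:
            signatures.append(vec)
            labels.append(len(signatures) - 1)
    return signatures, labels


# Hypothetical component power vectors (watts), one per sampling interval.
samples = [
    (3.0, 4.0, 0.5),
    (3.2, 4.1, 0.5),
    (1.0, 1.5, 6.0),
    (1.1, 1.4, 6.2),
    (3.1, 3.9, 0.6),
]
signatures, labels = classify_phases(samples, threshold=1.0)
print(labels)
```

Here five samples collapse into two signatures, which mirrors the idea that a small set of representative phases can summarize a much longer execution.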

2.3  Mitigating System Induced Variability Effects on Real-System Phase Detection

One primary requirement for the application of phase-based dynamic adaptations is the ability to discern repetitive execution. Detecting repetitive phases in application execution helps apply dynamic management responses proactively, thus improving their overall effectiveness. Real system experiments bring additional challenges to the detection of such repetitive behavior due to system-induced variations. Therefore, it is essential to understand how these nondeterministic system events alter workload phases from phase to phase and from run to run. Consequently, for a phase detection technique to be effective on real systems, it should be resilient to these variability effects.
This part of my work examines the phase behavior of applications running on real systems to reliably discern and recover phase behavior in the face of application variability stemming from real-system and time sampling effects. I discuss and classify the extent and type of the alterations that application phases experience in real-system experiments. I propose a set of new, "transition-based" phase detection techniques. These techniques can detect repetitive workload phase information from time-varying, real-system measurements with less than 5% false alarm probabilities. In comparison to previous detection methods, my transition-based techniques achieve an average 6-fold improvement in phase detection efficiency by mitigating the system-induced variability effects.
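The intuition behind comparing transitions rather than raw sample values can be sketched as follows. This is not the detection framework itself: the change threshold, the matching tolerance and the two invented traces are all assumptions, used only to show why a time-shifted and slightly perturbed run can still be matched through its phase transitions.

```python
def transitions(series, threshold):
    """Return indices where consecutive samples differ by more than `threshold`."""
    return [
        i for i in range(1, len(series))
        if abs(series[i] - series[i - 1]) > threshold
    ]


def match_fraction(ref, other, tolerance):
    """Fraction of reference transitions that have a transition in `other`
    within `tolerance` samples."""
    if not ref:
        return 0.0
    hits = sum(1 for t in ref if any(abs(t - u) <= tolerance for u in other))
    return hits / len(ref)


# Two hypothetical runs of the same workload; run_b is shifted by one sample
# and its levels are perturbed, as system effects might cause.
run_a = [1.0, 1.0, 5.0, 5.1, 5.0, 1.1, 1.0, 5.0, 5.0]
run_b = [1.2, 1.1, 1.0, 5.3, 5.2, 5.1, 1.0, 1.1, 5.1]

t_a = transitions(run_a, threshold=2.0)
t_b = transitions(run_b, threshold=2.0)
score = match_fraction(t_a, t_b, tolerance=1)
print(t_a, t_b, score)
```

A sample-by-sample comparison of these two traces disagrees strongly at every shifted boundary, whereas every transition in the first run has a counterpart within one sample in the second.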

2.4  Runtime Phase Tracking and Phase-Driven Dynamic Power Management

One of the primary motivations for doing power management dynamically is the highly variable phase behavior within applications at different execution regions. Dynamic management techniques benefit greatly from this application phase behavior, which can help identify workload execution regions with different characteristics, and thus can dictate different dynamic management responses. Most existing dynamic management techniques respond to these phase changes reactively. When they observe a noticeable deviation from previous application characteristics, these techniques adjust the underlying system configurations dynamically, assuming this recent behavior will persist in future execution [33,41,90,162,176,186]. These approaches have difficulty, however, when applications change characteristics at a high rate. In such cases, recognizing and predicting phases on-the-fly provides better adaptation of the applied dynamic configurations. Therefore, it is important to develop methods to identify and predict repetitive phases, so that dynamic management responses can be applied proactively.
My work develops online phase prediction methods that can be applied in running systems and demonstrates how these runtime phase predictors can effectively guide dynamic, on-the-fly processor power management. I describe a general-purpose phase prediction framework that can be configured for different power-performance trade-offs and can be utilized to track various application characteristics for the desired management actions. This phase predictor operates at runtime with negligible overheads and autonomously tracks and predicts application phases. These phase predictions can be employed to guide various management techniques. In my real-system experiments I demonstrate their benefits with dynamic voltage and frequency scaling (DVFS) as an example technique. I implement this complete runtime phase prediction and phase-driven dynamic adaptation infrastructure on a mobile laptop platform. Compared to existing reactive and statistical approaches, my phase predictor significantly improves the accuracy of the predicted workload behavior, reducing the misprediction rates by 2.4X for applications with variable behavior. My experiments demonstrate that DVFS-based dynamic management improves the energy-delay product of the experimental system by 27% on average, when guided by my runtime phase predictor. Compared to prior reactive approaches, these dynamic adaptations improve the energy-delay product of applications by 7%, while incurring less performance degradation.
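The difference between reactive and predictive management can be made concrete with a small sketch that contrasts a last-value predictor with a history-table predictor on a repeating phase sequence. The class name, the history depth and the phase sequence are hypothetical, and the table-lookup scheme shown here is a generic pattern-history design assumed for illustration rather than a description of the predictor developed in this work.

```python
from collections import deque


class HistoryPhasePredictor:
    """Predict the next phase from the last `depth` observed phases."""

    def __init__(self, depth=2):
        self.history = deque(maxlen=depth)
        self.table = {}  # history pattern -> phase that followed it last time

    def predict(self):
        key = tuple(self.history)
        if key in self.table:
            return self.table[key]
        # Fall back to last-value prediction for patterns not seen before.
        return self.history[-1] if self.history else None

    def update(self, actual):
        if len(self.history) == self.history.maxlen:
            self.table[tuple(self.history)] = actual
        self.history.append(actual)


def history_mispredictions(phases, depth):
    """Count wrong predictions after priming with the first sample."""
    predictor = HistoryPhasePredictor(depth)
    predictor.update(phases[0])
    misses = 0
    for actual in phases[1:]:
        if predictor.predict() != actual:
            misses += 1
        predictor.update(actual)
    return misses


def last_value_mispredictions(phases):
    """Reactive baseline: assume the next phase equals the current one."""
    return sum(1 for prev, cur in zip(phases, phases[1:]) if prev != cur)


# Hypothetical phase ids observed at successive sampling intervals.
phases = [1, 1, 2] * 4
print(history_mispredictions(phases, depth=2), last_value_mispredictions(phases))
```

On this sequence the reactive baseline mispredicts at every phase change, while the history-based predictor errs only until each pattern has been seen once. In a DVFS setting, the predicted phase id would select the voltage and frequency setting applied for the next interval.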

3  Literature Review

This section gives a general overview of existing work related to my thesis research. Each of the following chapters provides more detailed discussions of prior work specific to each of the presented studies. Here, I discuss related literature along the main areas of contribution discussed above. These are categorized under three areas: processor power modeling, workload characterization and phase analysis, and workload-adaptive power management.

3.1  Processor Power Modeling

Earlier work on processor power modeling involves power measurement feedback for software and instruction-level power models. These include instruction energy tables and inter-instruction effects for processor and memory [113,126,168]. Software power models aim to map energy consumption to program structure [51,142]. In general, these techniques are employed in simpler or embedded processors with minimal clock gating and power management that exhibit low temporal variations. In these cases, the power behavior largely depends on the operating frequency and voltage [28] and simple table-based approaches provide good approximations to processor power behavior.
Architectural and functional module-level power modeling has also been prevalent in power-aware computing studies. These have focused mostly on high-level abstractions of processor components. These abstractions encompass energy consumption models driven by functional unit complexity, profiled averages or switching activities particular to different units [105]. Starting from simple average-case estimates [145], these power estimators evolved into activity and lookup based power models [106,107] that can also incorporate inter-module interactions [125]. As more capable and detailed execution- or trace-driven architectural simulation tools became available, accompanying cycle-accurate power modeling tools have also been developed.
Among different power estimation frameworks, here I mention several of the most commonly used models. Wattch is a processor power modeling infrastructure that relies on parameterized power models for different processor building blocks such as array and associative memory structures, logic, interconnect and clock tree [24]. SimplePower is another cycle-accurate energy estimation tool that uses energy models together with switch capacitance tables for each microarchitectural unit [175]. These approaches use analytical energy models that rely on circuit capacitance parameters. In contrast, PowerTimer uses an empirical energy estimation model based on circuit-level energy models derived from low-level simulations [21]. Last, SoftWatt provides a full-system power model, including the processor and the complete memory hierarchy [59].
More recently, there has been growing interest in runtime architectural power modeling on real systems. These approaches enable power estimations for the long timescales that are required for system-level and thermal adaptations. Since these approaches lack extensive simulation-style detail, they rely on supporting hardware or software functionality such as performance counters to drive power estimations. Prior work demonstrates that several performance monitoring events correlate highly with processor power dissipation [13]. These events can be configured to track and estimate processor power behavior and can be used to infer the distribution of power to microarchitectural components [93,95,176]. This runtime information is used in conjunction with analytical models for detailed component-level power estimates [18,19,34,111]. Simple runtime models are also employed to track the operating system's contribution to power consumption [116]. While the above approaches consider fixed, static power models, adaptive, feedback-driven power estimation models have also recently been explored [61]. As power dissipation and thermal limitations become pressing issues in large-scale systems, such runtime models are also emerging in the server and cluster domains to enable efficient monitoring and dynamic management of large-scale systems [45,63].
In runtime power modeling, my work is one of the first studies that provides microarchitecture-level power estimations on real systems for a high-end, highly speculative processor. I develop power estimation models that track the power consumption of microarchitectural units in all execution regions with high or low processor utilization. Moreover, my work presents a complete power modeling and validation framework including remote runtime monitoring and real-time power measurement feedback.

3.2  Workload Characterization and Phase Analysis

There is a large body of existing work related to workload characterization and the analysis of application phase behavior. These studies can be classified under various themes such as online and offline approaches, simulation-based and real-system characterization, characterizations with different workload features and for different endgoals.
One set of existing research employs different characterization techniques to summarize execution with representative regions or phases. Some of these techniques use simulations to classify workload execution based on program-level information (such as executed instruction addresses and visited basic blocks) [32,40,72,151,152] or performance characteristics [35,46,101]. Another line of phase characterization research focuses on real-system studies that track hardware events or dynamic program flow [6,29,108,128,131,132,169]. Several of these studies employ a wide range of similarity measures and clustering methods, such as k-means, regression trees, and principal or independent component analysis, for online or offline classification of execution into self-similar regions.
A major area of research focuses on monitoring and detecting workload phase behavior for dynamic adaptations [68]. These studies use various workload features and evaluation techniques in their analyses. Some of these studies focus on different indicators of dynamic program flow to monitor varying workload characteristics, such as branch counts [90], working set signatures [41], traversed basic blocks [109,153] and visited subroutines [75]. These approaches track patterns in execution flow to trigger suitable dynamic management responses that employ various architectural reconfigurations. In addition to the above simulation-oriented studies, some real-system studies consider detecting specific application behavior for dynamic responses. These works track application phases to control management schemes readily available in current systems such as voltage and frequency scaling [176,179], to detect changes in execution space and to drive dynamic optimization strategies in runtime systems [38,100,120].
Application phase monitoring and detection guides dynamic adaptations to react to the changes in observed characteristics. Once the new behavior is detected, corresponding responses in tune with the demands of the new phase can be activated. However, predicting this change in application characteristics can provide additional benefits by initiating management proactively. This is especially important in the case of quickly varying application behavior, where the fundamental frequency at which the application phases change is close to the sampling rate of the tracked characteristics. Existing research has employed different strategies to predict varying workload characteristics. Compiler- and application-level techniques develop static, analytical models based on program structure to predict changes in workload characteristics such as memory access patterns [52,54]. Several prediction schemes that dynamically update their decisions during workload runtime have been proposed at the systems and architecture levels. At the system level, both statistical and table-based approaches that predict specific workload characteristics based on previous history have been proposed [44]. In addition, memory related runtime phase predictors based on memory reuse distance patterns [150], as well as dynamic code region based phase predictions [99] have been studied in prior related work. In architectural studies, the ability to propose hardware support has led to more elaborate phase prediction mechanisms. Run-length and control-flow based phase predictors have been developed with hardware support to predict phases in the dynamic execution space of applications [153]. In addition to predictors of future workload phases, alternative schemes that predict phase changes and durations have also been employed in architectural implementations [109]. 
Overall, these works demonstrate effective prediction techniques across a wide range of granularities, with a variety of workload features spanning both hardware and software mechanisms.
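To make the idea of table-based, history-driven prediction concrete, the following Python sketch contrasts a last-value predictor with a small pattern table indexed by recent phase history. It is an illustration only and does not reproduce the design of any work cited above; the class names, the `depth` parameter and the fallback to last-value prediction for unseen patterns are assumptions made for this example.

```python
from collections import deque


class LastValuePredictor:
    """Reactive baseline: assume the next phase equals the last observed one."""

    def __init__(self):
        self.last = None

    def predict(self):
        return self.last

    def update(self, observed):
        self.last = observed


class HistoryTablePredictor:
    """Predict the next phase ID from the last `depth` observed phase IDs."""

    def __init__(self, depth=2):
        self.depth = depth
        self.history = deque(maxlen=depth)  # most recent phase IDs
        self.table = {}                     # history pattern -> phase that followed it

    def predict(self):
        key = tuple(self.history)
        if key in self.table:
            return self.table[key]
        # Unseen pattern: fall back to last-value prediction (assumed policy).
        return self.history[-1] if self.history else None

    def update(self, observed):
        # Record what followed the current pattern, then shift the history.
        if len(self.history) == self.depth:
            self.table[tuple(self.history)] = observed
        self.history.append(observed)


def count_mispredictions(predictor, phases):
    """Run a predictor over a phase sequence and count wrong guesses."""
    misses = 0
    for phase in phases:
        guess = predictor.predict()
        if guess is not None and guess != phase:
            misses += 1
        predictor.update(phase)
    return misses


if __name__ == "__main__":
    phases = [1, 1, 2, 2] * 10  # synthetic, strictly repetitive phase sequence
    print("last value   :", count_mispredictions(LastValuePredictor(), phases))
    print("history table:", count_mispredictions(HistoryTablePredictor(depth=2), phases))
```

On a strictly repeating sequence such as the one above, the last-value predictor misses every phase transition, whereas the history table with a depth of two stops mispredicting once each two-phase pattern has been observed. Real workloads are noisier, so the gap seen here should be read as a best case for this sketch.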
My research contributes to the existing body of phase analysis work in characterization, detection and prediction of application phases with a primary focus on real-system phase analysis methods. While most of the existing phase characterization work focuses on performance behavior of workloads, my thesis presents new techniques to identify power phase behavior of applications using hardware performance monitoring features. It develops novel strategies to detect repetitive application phases on real systems in spite of the system-induced perturbations on workload characteristics. Last, my work demonstrates a fully-autonomous, real-system phase prediction infrastructure that predicts future phase behavior of applications at runtime by leveraging the pattern behavior in execution phases.

3.3  Workload-Adaptive Power Management

Earlier in this chapter, I have discussed the extensive range of research broadly in the area of dynamic management, spanning from circuits to systems and applications. Here I review some of these approaches that particularly aim to tune system execution to the dynamic changes in the workload characteristics. I discuss related work in workload-adaptive power management under three abstractions: compiler- and application-level techniques, system-level management and architectural adaptations.
High-level workload adaptations involving compilers and applications give high-level software more responsibility for power management. Typically, these approaches can operate in two opposite directions. First, some of the existing work has developed strategies to adapt the workloads themselves for varying power constraints by providing different degrees of quality of service. These adaptations include application features with different qualities or optional application steps that are activated only at high energy settings. Some techniques also involve choosing between local and remote program or data components based on their power-performance trade-offs [50,102,143]. This first direction deliberately induces changes in workload characteristics to respond to energy constraints, and can be referred to as power-driven workload adaptations.
In the second direction, several techniques have considered employing special directives within applications to guide lower-level power management. Such directives are introduced via compiler support or specialized application programming interfaces to perform bookkeeping operations about application characteristics [1,7], to insert offline profiling information for code regions at different power management states [71,122,154] and to inform the underlying system layers about different application operations such as I/O intensive regions [65,177].
System-level power management techniques are applied in two different manners. First, some studies have considered performing operating system tasks such as scheduling and memory management in a power-aware manner. Second, additional studies make use of the operating system to assist lower-level management functionalities in their management decisions. In these applications, the operating system is extended with monitoring and control interfaces that track workload characteristics and provide control directives to the underlying management schemes such as frequency scaling and disk power management. In the first direction, prior studies have considered energy-aware scheduling of workloads with different characteristics to balance power consumption, to reduce power density and to control energy dissipation rate in both single and multiprocessor systems [14,67,127,184]. Other workload-adaptive system research has discussed power-aware memory management [135,186] and page allocation [110]. Some recent studies have also presented methods for power-efficient distribution of parallel, multithreaded applications into multiple homogeneous or heterogeneous processing components [5,37]. In the second direction, previous studies have discussed system-level adaptations for disk power management [60], controlling network interfaces and managing other input/output devices [174]. In addition, there has been a growing body of work in system-level management for dynamic voltage and frequency scaling [33,49,176]. More recently, there has also been interest in machine learning techniques for power management across multiple platform components [167], as well as dynamic compilation support for workload-adaptive power management [73,172,179].
At the architecture level, existing work has proposed several strategies that track varying workload characteristics to perform architectural adaptations. Tracking methods differ significantly in their approaches. These can be simple occupancy or usage based models [3,139], metrics that characterize varying workload performance [8,26], access frequency monitoring [48,96], inconsistency checks [47] or more detailed hardware structures that aim to discern varying application phases [41,90,153]. In general, architectural management approaches focus on modulating the effective size or speed of different hardware units. Among different architectural components, the memory hierarchy is one of the most investigated structures. Different studies have proposed adaptively disabling or reducing supply voltages for different cache ways and unused blocks [48,96,140]. Some work has proposed dynamically configurable caches based on varying working set size information and changes in control flow [9,41,153]. Architectural management schemes for higher levels of the memory hierarchy, including main memory and disks, have also been explored [117,186]. Besides the memory hierarchy, several studies have focused on other architectural adaptations, such as adaptive issue queues [8,26,139]. These approaches have considered monitoring changes in application performance (i.e., rate of executed instructions) and changes in the occupancy of queue structures to tune their configurations to the changes in workload characteristics. Other management schemes have also been proposed for adaptive pipeline scaling and dynamic configurations of other architectural components such as reorder buffers and register files [3,90]. These techniques have also employed some amount of architectural support (for example, the branch behavior buffer and power profiling units) to track dynamically-varying workload demands and to effectively match the dynamic configurations to different application phases.
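As a simple illustration of the occupancy-based tracking described above, the following Python sketch resizes a queue structure based on its average occupancy over a monitoring interval. It is not the mechanism of any cited scheme; the function name, the utilization thresholds, the resize step and the size limits are all hypothetical values chosen for the example.

```python
def resize_queue(active_entries, avg_occupancy,
                 step=8, min_size=16, max_size=64,
                 low=0.5, high=0.9):
    """Return the number of queue entries to enable for the next interval.

    active_entries: entries currently enabled
    avg_occupancy:  average number of occupied entries in the last interval
    All default parameters are illustrative assumptions.
    """
    utilization = avg_occupancy / active_entries
    if utilization > high:
        # Queue is nearly full: enable more entries, up to the physical size.
        return min(max_size, active_entries + step)
    if utilization < low:
        # Queue is mostly idle: disable entries to save power.
        return max(min_size, active_entries - step)
    # Utilization is within the target band: keep the current size.
    return active_entries


if __name__ == "__main__":
    size = 32
    for occupancy in [30, 38, 20, 9, 5]:  # synthetic per-interval averages
        size = resize_queue(size, occupancy)
        print(f"occupancy={occupancy:2d} -> enabled entries={size}")
```

The two thresholds form a dead band so that the configuration does not oscillate when utilization hovers around a single cutoff; how wide that band should be is a tuning decision this sketch does not address.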
My thesis in particular discusses workload-adaptive power management techniques that operate at the architecture and system boundary. It leverages architectural execution information to guide system-level adaptations. Most of the existing system adaptations either function reactively by responding to recent execution behavior or rely on prior profiling information. My work, however, describes a predictive and completely on-the-fly adaptation strategy that utilizes runtime phase predictions to manage dynamic adaptations, without affecting the execution or the structure of workloads.

4  Thesis Contributions

My thesis makes four main contributions to the existing literature. First, I describe a generic approach to microarchitecture-level power modeling using processor hardware performance monitoring features. I demonstrate a detailed, yet practical runtime power monitoring and estimation approach with simultaneous measurement support for runtime validation feedback. Overall, this framework paves the way for many following runtime power and thermal management studies that can benefit from insight on live processor power dissipation.
Second, I provide two important contributions to the general body of workload characterization and phase analysis research. I demonstrate practical real-system methods for identifying application phases at runtime. These techniques can be readily employed in system-level dynamic power and thermal management studies. Moreover, my work defines phases targeted directly to discern varying power characteristics of workloads, using event-counter-based power estimations at the basis of its similarity analysis.
Third, this thesis presents a complete flow of methods that mitigate the negative impacts of system-induced variability and sampling effects on the detection of repetitive application behavior. My work describes a taxonomy of phase transformations due to variability and sampling effects. I introduce a new, transition-based phase characterization, which is shown to be more resilient for repetitive phase detection under the influence of these transformations. This work provides a quantitative evaluation of phase detection techniques and quantifies their effectiveness in recognizing recurrent execution.
Last, in this thesis I demonstrate a complete real-system framework for runtime phase prediction and its application to workload-adaptive power management. I describe a configurable runtime phase prediction methodology that seamlessly operates on a real mobile system with negligible overheads. I depict the immediate benefits of runtime phase prediction for on-the-fly, phase-driven dynamic power management. Although the examples shown in this thesis use certain phase definitions for specific power management techniques, the developed approaches represent a general-purpose phase monitoring and prediction framework. My infrastructure can be employed for monitoring and predicting different workload characteristics that can guide a range of dynamic management techniques.

5  Thesis Outline

The following chapters of this dissertation present the main accomplishments of my research in more detail. I present them in a progressive manner, starting with the experimentation basics and the power analysis framework, followed by phase analysis basics, phase detection and prediction methods and finally their application to dynamic power management. In particular, Chapter 2 presents the fundamentals of my real-system experimentation framework and develops runtime processor power monitoring and estimation techniques. Chapter 3 discusses different phase analysis strategies and demonstrates their effective application for power-oriented workload phase characterization. Chapter 4 focuses on the interesting challenges of phase detection in real-system experiments and develops an effective phase detection framework, which is resilient to system-induced variations in observed workload characteristics. Chapter 5 introduces an efficient real-system phase prediction method and outlines a complete infrastructure that is driven by runtime phase predictions for workload-adaptive power management. This chapter meshes the different aspects of my research together and demonstrates the concrete benefits of phase-based dynamic power management for power-aware computing systems. Last, Chapter 6 presents the final remarks and discusses avenues of future research.


N. AbouGhazaleh, B. Childers, D. Mosse, R. Melhem, and M. Craven. Energy Management for Real-time Embedded Applications with Compiler Support. In Proceedings of the 2003 ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), 2003.
A. R. Alameldeen and D. A. Wood. Variability in Architectural Simulations of Multi-threaded Workloads. In Proceedings of 9th International Symposium on High Performance Computer Architecture (HPCA-9), Feb. 2003.
D. Albonesi, R. Balasubramonian, S. Dropsho, S. Dwarkadas, E. Friedman, M. Huang, V. Kursun, G. Magklis, M. Scott, G. Semeraro, P. Bose, A. Buyuktosunoglu, P. Cook, and S. Schuster. Dynamically Tuning Processor Resources with Adaptive Processing. IEEE Computer, 36(12):43-51, 2003.
M. Anis, S. Areibi, and M. Elmasry. Design and Optimization of Multi-Threshold CMOS (MTCMOS) Circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 22(10):1324-1342, Oct. 2003.
M. Annavaram, E. Grochowski, and J. Shen. Mitigating Amdahl's Law Through EPI Throttling. In Proceedings of the 32nd International Symposium on Computer Architecture (ISCA-32), 2005.
M. Annavaram, R. Rakvic, M. Polito, J.-Y. Bouguet, R. Hankins, and B. Davies. The Fuzzy Correlation between Code and Performance Predictability. In Proceedings of the 37th International Symp. on Microarchitecture, 2004.
A. Azevedo, I. Issenin, R. Cornea, R. Gupta, N. Dutt, A. Veidenbaum, and A. Nicolau. Profile-based Dynamic Voltage Scheduling using Program Checkpoints. In Proceedings of the conference on Design, automation and test in Europe (DATE'02), Mar. 2002.
R. I. Bahar and S. Manne. Power and Energy Reduction Via Pipeline Balancing. In Proceedings of the 28th International Symposium on Computer Architecture (ISCA-28), June 2001.
R. Balasubramonian, D. H. Albonesi, A. Buyuktosunoglu, and S. Dwarkadas. Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures. In International Symposium on Microarchitecture, pages 245-257, 2000.
H. Bao, J. Bielak, O. Ghattas, L. F. Kallivokas, D. R. O'Hallaron, J. R. Shewchuk, and J. Xu. Large-scale Simulation of Elastic Wave Propagation in Heterogeneous Media on Parallel Computers. Computer Methods in Applied Mechanics and Engineering, 152(1-2):85-102, Jan. 1998.
R. D. Barnes, E. M. Nystrom, M. C. Merten, and W.-m. W. Hwu. Vacuum packing: extracting hardware-detected program phases for post-link optimization. In Proceedings of the 35th International Symp. on Microarchitecture, Nov. 2002.
L. A. Barroso. The Price of Performance. ACM Queue, 3(7):48-53, Sept. 2005.
F. Bellosa. The benefits of event-driven energy accounting in power-sensitive systems. In Proceedings of 9th ACM SIGOPS European Workshop, September 2000.
F. Bellosa, A. Weissel, M. Waitz, and S. Kellner. Event-Driven Energy Accounting for Dynamic Thermal Management. In Proceedings of the Workshop on Compilers and Operating Systems for Low Power (COLP'03), New Orleans, Sept. 2003.
B. Bentley. Validating the Intel Pentium 4 microprocessor. In Design Automation Conference, pages 244-248, 2001.
R. Berrendorf and B. Mohr. PCL - The Performance Counter Library: A Common Interface to Access Hardware Performance Counters on Microprocessors (Version 2.0).
R. Bianchini and R. Rajamony. Power and energy management for server systems. IEEE Computer, 37(11), November 2004.
W. Bircher, J. Law, M. Valluri, and L. K. John. Effective Use of Performance Monitoring Counters for Run-Time Prediction of Power. Technical Report TR-041104-01, University of Texas at Austin, Nov. 2004.
W. L. Bircher, M. Valluri, J. Law, and L. K. John. Runtime identification of microprocessor energy saving opportunities. In Proceedings of the 2005 International Symposium on Low Power Electronics and Design (ISLPED), 2005.
B. Brock and K. Rajamani. Dynamic Power Management for Embedded Systems. In Proceedings of the IEEE International SOC Conference, Sept. 2003.
D. Brooks, P. Bose, V. Srinivasan, M. K. Gschwind, P. G. Emma, and M. G. Rosenfield. New Methodology for Early-Stage, Microarchitecture-Level Power-Performance Analysis of Microprocessors. IBM J. of Research and Development, 46(5/6):653-670, 2003.
D. Brooks and M. Martonosi. Dynamically exploiting narrow width operands to improve processor power and performance. In Proceedings of the 5th International Symposium on High Performance Computer Architecture, Jan. 1999.
D. Brooks and M. Martonosi. Dynamic thermal management for high-performance microprocessors. In Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA-7), January 2001.
D. Brooks, V. Tiwari, and M. Martonosi. Wattch: A framework for architectural-level power analysis and optimizations. In Proceedings of the 27th International Symposium on Computer Architecture, June 2000.
S. Browne, J. Dongarra, N. Garner, G. Ho, and P. Mucci. A Portable Programming Interface for Performance Evaluation on Modern Processors. The International Journal of High Performance Computing Applications, 14(3):189-204, 2000.
A. Buyuktosunoglu, S. Schuster, D. Brooks, P. Bose, P. W. Cook, and D. H. Albonesi. An Adaptive Issue Queue for Reduced Power at High Performance. In Proceedings of the First International Workshop on Power-Aware Computer Systems (PACS'00), 2001.
B. Calder, T. Sherwood, E. Perelman, and G. Hamerly. SimPoint web page.
A. P. Chandrakasan and A. Sinha. JouleTrack: A Web Based Tool for Software Energy Profiling. In Proceedings of the 38th Design Automation Conference (DAC'01), June 2001.
F. Chang, K. Farkas, and P. Ranganathan. Energy driven statistical profiling: Detecting software hotspots. In Proceedings of the Workshop on Computer Systems, 2002.
J. Chase, D. Anderson, P. Thakar, A. Vahdat, and R. Doyle. Managing energy and server resources in hosting centers. In Proceedings of the 18th Symposium on Operating Systems Principles (SOSP), October 2001.
M. Chin. Desktop CPU Power Survey. In SPCR Forum, 2006.
C.-B. Cho and T. Li. Complexity-based Program Phase Analysis and Classification. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), Sept. 2006.
K. Choi, R. Soma, and M. Pedram. Dynamic Voltage and Frequency Scaling based on Workload Decomposition. In Proceedings of International Symposium on Low Power Electronics and Design (ISLPED), Aug. 2004.
G. Contreras and M. Martonosi. Power Prediction for Intel XScale Processors Using Performance Monitoring Unit Events. In Proceedings of the 2005 International Symposium on Low Power Electronics and Design (ISLPED), 2005.
J. Cook, R. L. Oliver, and E. E. Johnson. Examining performance differences in workload execution phases. In Proceedings of the IEEE International Workshop on Workload Characterization (WWC-4), 2001.
N. Corporation. NVIDIA GeForce 8800 GPU Architecture Overview. Technical Brief TB-02787-001_v01, NVIDIA Corporation, Nov. 2006.
M. Curtis-Maury, J. Dzierwa, C. D. Antonopoulos, and D. S. Nikolopoulos. Online Power-Performance Adaptation of Multithreaded Programs using Event-Based Prediction. In Proceedings of the 20th ACM International Conference on Supercomputing (ICS), June 2006.
A. Das, J. Lu, and W.-C. Hsu. Region Monitoring for Local Phase Detection in Dynamic Optimization Systems. In Proceedings of the International Symposium on Code Generation and Optimization (CGO), Mar. 2006.
P. J. Denning. The working set model for program behavior. Communications of the ACM, pages 323-333, May 1968.
A. Dhodapkar and J. Smith. Comparing Program Phase Detection Techniques. In 36th International Symp. on Microarchitecture, 2003.
A. Dhodapkar and J. Smith. Managing multi-configurable hardware via dynamic working set analysis. In 29th Annual International Symposium on Computer Architecture, 2002.
R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Second Edition. Wiley Interscience, New York, 2001.
A. Dudani, F. Mueller, and Y. Zhu. Energy Conserving Feedback EDF Scheduling for Embedded Systems with Real-time Constraints. In LCTES/SCOPES '02: Proceedings of the joint conference on Languages, compilers and tools for embedded systems, 2002.
E. Duesterwald, C. Cascaval, and S. Dwarkadas. Characterizing and Predicting Program Behavior and its Variability. In IEEE PACT, pages 220-231, 2003.
D. Economou, S. Rivoire, C. Kozyrakis, and P. Ranganathan. Full-system Power Analysis and Modeling for Server Environments. In Proceedings of the Workshop on Modeling Benchmarking and Simulation (MOBS), June 2006.
L. Eeckhout, R. Sundareswara, J. Yi, D. Lilja, and P. Schrater. Accurate Statistical Approaches for Generating Representative Workload Compositions. In Proceedings of the IEEE International Symposium on Workload Characterization, Oct. 2005.
D. Ernst, N. S. Kim, S. Das, S. Pant, T. Pham, R. Rao, C. Ziesler, D. Blaauw, T. Austin, and T. Mudge. Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation. In Proceedings of the 36th International Symp. on Microarchitecture, Dec. 2003.
K. Flautner, N. S. Kim, S. Martin, D. Blaauw, and T. Mudge. Drowsy Caches: Simple Techniques for Reducing Leakage Power. In Proceedings of the 29th International Symposium on Computer Architecture (ISCA-29), May 2002.
K. Flautner and T. Mudge. Vertigo: Automatic Performance-Setting for Linux. In Proceedings of the Fifth Symposium on Operating System Design and Implementation OSDI'02, 2002.
J. Flinn. Extending Mobile Computer Battery Life through Energy-Aware Adaptation. PhD thesis, Computer Science Department, Carnegie Mellon University, Dec. 2001.
J. Flinn and M. Satyanarayanan. Powerscope: a tool for profiling the energy usage of mobile applications. In Second IEEE Workshop on Mobile Computing Systems and Applications, pages 2-10, Feb. 1999.
B. B. Fraguela, R. Doallo, J. Tourino, and E. L. Zapata. A Compiler Tool to Predict Memory Hierarchy Performance of Scientific Codes. Parallel Computing, 30(2):225-228, 2004.
J. Friedrich, B. McCredie, N. James, B. Huott, B. Curran, E. Fluhr, G. Mittal, E. Chan, Y. Chan, D. Plass, S. Chu, H. Le, L. Clark, J. Ripley, S. Taylor, J. Dilullo, and M. Lanzerotti. Design of the POWER6 Microprocessor. In IEEE International Solid-State Circuits Conference (ISSCC 2007), Feb. 2007.
S. Ghosh, M. Martonosi, and S. Malik. Cache Miss Equations: A Compiler Framework for Analyzing and Tuning Memory Behavior. ACM Transactions on Programming Languages and Systems (TOPLAS), 21(4):703-746, 1999.
S. Gochman, R. Ronen, I. Anati, A. Berkovits, T. Kurts, A. Naveh, A. Saeed, Z. Sperber, and R. C. Valentine. The Intel Pentium M Processor: Microarchitecture and Performance. Intel Technology Journal, Q2, 2003, 7(02), 2003.
M. Golden, S. Arekapudi, G. Dabney, M. Haertel, S. Hale, L. Herlinger, Y. Kim, K. McGrath, V. Palisetti, and M. Singh. A 2.6GHz Dual-Core 64b x86 Microprocessor with DDR2 Memory Support. In IEEE International Solid-State Circuits Conference (ISSCC 2006), Feb. 2006.
M. Gschwind. Chip Multiprocessing and the Cell Broadband Engine. IBM Research Report RC-23921, IBM T. J. Watson Research Center, Feb. 2006.
S. P. Gurrum, S. K. Suman, Y. K. Joshi, and A. G. Fedorov. Thermal Issues in Next-Generation Integrated Circuits. IEEE Transactions on Device and Materials Reliability, 4(4):709-714, Dec. 2004.
S. Gurumurthi, A. Sivasubramaniam, M. J. Irwin, N. Vijaykrishnan, M. Kandemir, T. Li, and L. K. John. Using Complete Machine Simulation for Software Power Estimation: The SoftWatt Approach. In Proceedings of the 8th International Symposium on High-Performance Computer Architecture (HPCA), Feb. 2002.
S. Gurumurthi, A. Sivasubramaniam, M. Kandemir, and H. Franke. DRPM: Dynamic Speed Control for Power Management in Server Class Disks. Computer Architecture News, 31(2):169-181, May 2003.
S. Gurun and C. Krintz. A Run-Time, Feedback-Based Energy Estimation Model For Embedded Devices. In Proceedings of the International Conference on Hardware-Software Codesign and System Synthesis (CODES+ISSS), Oct. 2006.
J. Haid, G. Kafer, C. Steger, R. Weiss, W. Schogler, and M. Manninger. Run-time energy estimation in system-on-a-chip designs. In Asia and South Pacific Design Automation Conference (ASP-DAC), Jan. 2003.
T. Heath, A. P. Centeno, P. George, L. Ramos, Y. Jaluria, and R. Bianchini. Mercury and freon: Temperature emulation and management in server systems. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 2006.
T. Heath, B. Diniz, E. V. Carrera, W. Meira Jr., and R. Bianchini. Energy conservation in heterogeneous server clusters. In Proceedings of the 10th Symposium on Principles and Practice of Parallel Programming (PPoPP), 2005.
T. Heath, E. Pinheiro, J. Hom, U. Kremer, and R. Bianchini. Code Transformations for Energy-Efficient Device Management. IEEE Transactions on Computers, 53(8):974-987, Aug. 2004.
J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers, 2003. Third Edition.
S. Heo, K. Barr, and K. Asanovic. Reducing Power Density through Activity Migration. In Proceedings of International Symposium on Low Power Electronics and Design (ISLPED), Seoul, Korea, Aug. 2003.
M. J. Hind, V. T. Rajan, and P. F. Sweeney. Phase Shift Detection: A Problem Classification. IBM Research Report RC-22887, IBM T. J. Watson, Aug. 2003.
G. Hinton, D. Sager, M. Upton, D. Boggs, D. Carmean, A. Kyker, and P. Roussel. The microarchitecture of the Pentium 4 processor. Intel Technology Journal, First Quarter 2001, 2001.
J. Hom and U. Kremer. Inter-program Compilation for Disk Energy Reduction. In Workshop on Power-Aware Computer Systems (PACS'03), 2003.
C.-H. Hsu and U. Kremer. The design, implementation, and evaluation of a compiler algorithm for CPU energy reduction. In Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation, pages 38-48, 2003.
C. Hu, D. Jimenez, and U. Kremer. Toward an Evaluation Infrastructure for Power and Energy Optimizations. In Workshop on High-Performance, Power-Aware Computing, 2005.
S. Hu, M. Valluri, and L. K. John. Effective Adaptive Computing Environment Management via Dynamic Optimization. In Proceedings of the International Symposium on Code Generation and Optimization (CGO), Mar. 2005.
M. Huang, J. Renau, and J. Torrellas. Profile-Based Energy Reduction in High-Performance Processors. In 4th ACM Workshop on Feedback-Directed and Dynamic Optimization, December 2001.
M. Huang, J. Renau, and J. Torrellas. Positional Adaptation of Processors: Application to Energy Reduction. In Proceedings of the International Symp. on Computer Architecture, 2003.
C. Hughes, J. Srinivasan, and S. Adve. Saving energy with architectural and frequency adaptations for multimedia applications. In Proceedings of the 34th Annual International Symposium on Microarchitecture (MICRO-34), Dec. 2001.
IBM. PMAPI structure and function Reference. pmapi.h.htm.
Intel Corporation. VTune(TM) Performance Analyzer 1.1.
Intel Corporation. Intel Pentium 4 and Intel Xeon Processor Optimization Reference Manual, 2002.
Intel Corporation. Intel Pentium 4 Processor in the 423 pin package / Intel 850 chipset platform, 2002.
Intel Corporation. Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming Guide, 2006.
C. Isci, G. Contreras, and M. Martonosi. Hardware Performance Counters for Detailed Runtime Power and Thermal Estimations: Experiences and Proposals. In Proceedings of the Hardware Performance Monitor Design and Functionality Workshop in the 11th International Symposium on High-Performance Computer Architecture (HPCA-11), Feb. 2005.
C. Isci, G. Contreras, and M. Martonosi. Live, Runtime Phase Monitoring and Prediction on Real Systems with Application to Dynamic Power Management. In Proceedings of the 39th ACM/IEEE International Symposium on Microarchitecture (MICRO-39), 2006.
C. Isci and M. Martonosi. Identifying Program Power Phase Behavior using Power Vectors. In Proceedings of the IEEE International Workshop on Workload Characterization (WWC-6), 2003.
C. Isci and M. Martonosi. Runtime Power Monitoring in High-End Processors: Methodology and Empirical Data. In Proceedings of the 36th International Symp. on Microarchitecture, Dec. 2003.
C. Isci and M. Martonosi. Runtime Power Monitoring in High-End Processors: Methodology and Empirical Data. Technical report, Princeton University Electrical Eng. Dept., Sep 2003.
C. Isci and M. Martonosi. Detecting Recurrent Phase Behavior under Real-System Variability. In Proceedings of the IEEE International Symposium on Workload Characterization, Oct. 2005.
C. Isci and M. Martonosi. Phase Characterization for Power: Evaluating Control-Flow-Based and Event-Counter-Based Techniques. In Proceedings of the 12th International Symposium on High-Performance Computer Architecture (HPCA-12), 2006.
C. Isci, M. Martonosi, and A. Buyuktosunoglu. Long-term Workload Phases: Duration Predictions and Applications to DVFS. IEEE Micro: Special Issue on Energy Efficient Design, 25(5):39-51, Sep/Oct 2005.
A. Iyer and D. Marculescu. Power aware microarchitecture resource scaling. In Proceedings of Design Automation and Test in Europe, DATE, Mar. 2001.
R. Jenkins. Hash functions. Dr. Dobb's Journal, 9709, Sept. 1997.
R. Joseph, D. Brooks, and M. Martonosi. Control techniques to eliminate voltage emergencies in high performance processors. In Proc. of the 9th International Symposium on High Performance Computer Architecture (HPCA-9), February 2003.
R. Joseph and M. Martonosi. Run-time power estimation in high performance microprocessors. In International Symposium on Low Power Electronics and Design, pages 135-140, 2001.
P. Juang, Q. Wu, L.-S. Peh, M. Martonosi, and D. Clark. Coordinated, Distributed, Formal Energy Management of Chip Multiprocessors. In Proceedings of International Symposium on Low Power Electronics and Design (ISLPED'05), Aug. 2005.
I. Kadayif, T. Chinoda, M. T. Kandemir, N. Vijaykrishnan, M. J. Irwin, and A. Sivasubramaniam. vEC: virtual energy counters. In Workshop on Program Analysis for Software Tools and Engineering, pages 28-31, 2001.
S. Kaxiras, Z. Hu, and M. Martonosi. Cache decay: Exploiting generational behavior to reduce cache leakage power. In Proceedings of the 28th International Symposium on Computer Architecture (ISCA-28), June 2001.
A. Keshavarzi, S. Ma, S. Narendra, B. Bloechel, K. Mistry, T. Ghani, S. Borkar, and V. De. Effectiveness of Reverse Body Bias for Leakage Control in Scaled Dual Vt CMOS ICs. In Proceedings of the 2001 International Symposium on Low Power Electronics and Design (ISLPED), Aug. 2001.
C. H. Kim and K. Roy. Dynamic Vth Scaling Scheme for Active Leakage Power Reduction. In Proceedings of the conference on Design, automation and test in Europe (DATE'02), Mar. 2002.
J. Kim, S. V. Kodakara, W.-C. Hsu, D. J. Lilja, and P.-C. Yew. Dynamic Code Region (DCR) Based Program Phase Tracking and Prediction for Dynamic Optimizations. Lecture Notes in Computer Science, 3793:203-217, 2005.
T. Kistler and M. Franz. Continuous Program Optimization: A Case Study. ACM Transactions on Programming Languages and Systems (TOPLAS), 25(4):500-548, 2003.
A. KleinOsowski, J. Flynn, N. Meares, and D. J. Lilja. Adapting the SPEC2000 benchmark suite for simulation-based computer architecture research. In Workshop on Workload Characterization, International Conference on Computer Design, Sept. 2000.
U. Kremer, J. Hicks, and J. Rehg. Compiler-Directed Remote Task Execution for Power Management. In Proceedings of the Workshop on Compilers and Operating Systems for Low Power (COLP'00), 2000.
R. Kumar, K. Farkas, N. P. Jouppi, P. Ranganathan, and D. M. Tullsen. Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction. In Proceedings of the 36th International Symp. on Microarchitecture, Dec. 2003.
E. Kursun, S. Ghiasi, and M. Sarrafzadeh. Transistor Level Budgeting for Power Optimization. In Proceedings of the 5th International Symposium on Quality Electronic Design (ISQED'05), 2004.
P. E. Landman. High-level power estimation. In Proceedings of the 1996 International Symposium on Low Power Electronics and Design (ISLPED), Oct. 1996.
P. E. Landman and J. M. Rabaey. Black-box Capacitance Models for Architectural Power Analysis. In Proceedings of the International Workshop on Low Power Design, Apr. 1994.
P. E. Landman and J. M. Rabaey. Activity-sensitive Architectural Power Analysis for the Control Path. In Proceedings of the International Workshop on Low Power Design, Apr. 1995.
J. Lau, J. Sampson, E. Perelman, G. Hamerly, and B. Calder. The Strong Correlation between Code Signatures and Performance. In IEEE International Symposium on Performance Analysis of Systems and Software, Mar. 2005.
J. Lau, S. Schoenmackers, and B. Calder. Transition Phase Classification and Prediction. In 11th International Symposium on High Performance Computer Architecture, 2005.
A. R. Lebeck, X. Fan, H. Zeng, and C. Ellis. Power Aware Page Allocation. ACM SIGOPS Operating Systems Review, 34(5):105-116, Dec. 2000.
B. Lee and D. Brooks. Accurate and Efficient Regression Modeling for Microarchitectural Performance and Power Prediction. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-XII), October 2006.
K. Lee and K. Skadron. Using Performance Counters for Runtime Temperature Sensing in High-Performance Processors. In Workshop on High-Performance, Power-Aware Computing, 2005.
S. Lee, A. Ermedahl, S. L. Min, and N. Chang. An accurate instruction-level energy consumption model for embedded RISC processors. In LCTES/OM, pages 1-10, 2001.
S. Lee and T. Sakurai. Run-time Voltage Hopping for Low-power Real-time Systems. In Proceedings of the 37th Design Automation Conference (DAC'00), 2000.
J. Li and J. Martinez. Dynamic Power-Performance Adaptation of Parallel Computation on Chip Multiprocessors. In Proceedings of the 12th International Symposium on High-Performance Computer Architecture (HPCA-12), 2006.
T. Li and L. K. John. Run-time Modeling and Estimation of Operating System Power Consumption. In Proceedings of the International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS), 2003.
X. Li, Z. Li, F. David, P. Zhou, Y. Zhou, S. Adve, and S. Kumar. Performance Directed Energy Management for Main Memory and Disks. In Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-XI), 2004.
M. Liu, W.-S. Wang, and M. Orshansky. Leakage Power Reduction by Dual-Vth Designs Under Probabilistic Analysis of Vth Variation. In Proceedings of the 2004 International Symposium on Low Power Electronics and Design (ISLPED), Aug. 2004.
J. R. Lorch and A. J. Smith. Improving Dynamic Voltage Scaling Algorithms with PACE. In Proceedings of the 2001 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, 2001.
J. Lu, H. Chen, P. Yew, and W. Hsu. Design and Implementation of a Lightweight Dynamic Optimization System. The Journal of Instruction-Level Parallelism, 6:1-24, 2004.
C. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. Reddi, and K. Hazelwood. Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation. In Programming Language Design and Implementation (PLDI), June 2005.
G. Magklis, M. Scott, G. Semeraro, D. Albonesi, and S. Dropsho. Profile-based Dynamic Voltage and Frequency Scaling for a Multiple Clock Domain Microprocessor. In Proceedings of the 30th International Symposium on Computer Architecture (ISCA-30), 2003.
S. Manne, A. Klauser, and D. Grunwald. Pipeline gating: Speculation control for energy reduction. In Proceedings of the 25th International Symposium on Computer Architecture, pages 132-141, June 1998.
C. McNairy and R. Bhatia. Montecito: A Dual-Core, Dual-Thread Itanium Processor. IEEE Micro, 25(2):10-20, Mar/Apr 2005.
H. Mehta, R. M. Owens, and M. J. Irwin. Energy characterization based on clustering. In Proceedings of the 33rd Design Automation Conference (DAC'96), 1996.
H. Mehta, R. M. Owens, and M. J. Irwin. Instruction Level Power Profiling. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'96), May 1996.
A. Merkel. Balancing Power Consumption in Multiprocessor Systems. Diploma thesis, System Architecture Group, University of Karlsruhe, Sept. 2005.
M. C. Merten, A. R. Trick, R. D. Barnes, E. M. Nystrom, C. N. George, J. C. Gyllenhaal, and W.-m. W. Hwu. An architectural framework for runtime optimization. IEEE Transactions on Computers, 50(6):567-589, 2001.
G. E. Moore. Cramming more components onto integrated circuits. In Electronics, pages 114-117, Apr. 1965.
J. Moore, J. Chase, P. Ranganathan, and R. Sharma. Making scheduling cool: Temperature-aware workload placement in data centers. In Proceedings of USENIX '05, June 2005.
P. Nagpurkar, C. Krintz, M. Hind, P. Sweeney, and V. Rajan. Online Phase Detection Algorithms. In Proceedings of the International Symposium on Code Generation and Optimization (CGO), Mar. 2006.
P. Nagpurkar, C. Krintz, and T. Sherwood. Phase-Aware Remote Profiling. In Proceedings of the International Symposium on Code Generation and Optimization (CGO), Mar. 2005.
K. Olukotun and L. Hammond. The Future of Microprocessors. ACM Queue, 3(7):27-34, Sept. 2005.
K. Olukotun, B. A. Nayfeh, L. Hammond, K. Wilson, and K.-Y. Chang. The Case for a Single-Chip Multiprocessor. In Seventh International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VII), Oct. 1996.
H. H. Padmanabhan. Design and Implementation of Power-aware Virtual Memory. In Proceedings of USENIX, 2003.
V. Pandey, W. Jiang, Y. Zhou, and R. Bianchini. DMA-Aware Memory Energy Management. In Proceedings of the 12th International Symposium on High-Performance Computer Architecture (HPCA-12), Feb. 2006.
H. Patil, R. Cohn, M. Charney, R. Kapoor, A. Sun, and A. Karunanidhi. Pinpointing Representative Portions of Large Intel Itanium Programs with Dynamic Instrumentation. In Proceedings of the 37th International Symp. on Microarchitecture, 2004.
C. Poirier, R. McGowen, C. Bostak, and S. Naffziger. Power and Temperature Control on a 90nm Itanium-Family Processor. In IEEE International Solid-State Circuits Conference (ISSCC 2005), Feb. 2005.
D. Ponomarev, G. Kucuk, and K. Ghose. Reducing Power Requirements of Instruction Scheduling Through Dynamic Allocation of Multiple Datapath Resources. In Proceedings of the 34th Annual International Symposium on Microarchitecture (MICRO-34), Dec. 2001.
M. Powell, S.-H. Yang, B. Falsafi, K. Roy, and T. N. Vijaykumar. Reducing Leakage in a High-Performance Deep-Submicron Instruction Cache. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 9(1):77-90, 2001.
R. Ronen, A. Mendelson, K. Lai, S.-L. Lu, F. Pollack, and J. P. Shen. Coming Challenges in Microarchitecture and Architecture. Proceedings of the IEEE, 89(3):325-340, Mar. 2001.
J. Russell and M. Jacome. Software power estimation and optimization for high performance, 32-bit embedded processors. In Proceedings of the International Conference on Computer Design, October 1998.
D. G. Sachs, W. Yuan, C. J. Hughes, A. Harris, S. V. Adve, D. L. Jones, R. H. Kravets, and K. Nahrstedt. Grace: A hierarchical adaptation framework for saving energy. Technical report, Computer Science, University of Illinois Technical Report UIUCDCS-R-2004-2409, 2004.
N. Sakran, M. Yuffe, M. Mehalel, J. Doweck, E. Knoll, and A. Kovacs. Implementation of the 65nm Dual-Core 64b Merom Processor. In IEEE International Solid-State Circuits Conference (ISSCC 2007), Feb. 2007.
T. Sato, M. Nagamatsu, and H. Tago. Power and Performance Simulator: ESP and Its Applications for 100 MIPS/W Class RISC Design. In Proceedings of the 1994 International Symposium on Low Power Electronics and Design (ISLPED), Oct. 1994.
R. Schmidt. Liquid Cooling is Back. Electronics Cooling, 11(3), Aug. 2005.
G. Semeraro, G. Magklis, R. Balasubramonian, D. Albonesi, S. Dwarkadas, and M. Scott. Energy-Efficient Processor Design Using Multiple Clock Domains with Dynamic Voltage and Frequency Scaling. In Proceedings of the 8th International Symposium on High-Performance Computer Architecture (HPCA-8), 2002.
J. S. Seng and D. M. Tullsen. The effect of compiler optimizations on Pentium 4 power consumption. In 7th Annual Workshop on Interaction between Compilers and Computer Architectures, Feb. 2003.
Server System Infrastructure (SSI) consortium. Power Supply Management Interface Design Guide, Rev. 2.12, Sept. 2005.
X. Shen, Y. Zhong, and C. Ding. Locality Phase Prediction. In Eleventh International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XI), Oct. 2004.
T. Sherwood, E. Perelman, and B. Calder. Basic block distribution analysis to find periodic behavior and simulation points in applications. In International Conference on Parallel Architectures and Compilation Techniques, Sept. 2001.
T. Sherwood, E. Perelman, G. Hamerly, and B. Calder. Automatically Characterizing Large Scale Program Behavior. In Tenth International Conference on Architectural Support for Programming Languages and Operating Systems, Oct 2002.
T. Sherwood, S. Sair, and B. Calder. Phase tracking and prediction. In Proceedings of the 30th International Symposium on Computer Architecture (ISCA-30), June 2003.
D. Shin, J. Kim, and S. Lee. Low-Energy Intra-Task Voltage Scheduling Using Static Timing Analysis. In Proceedings of the 38th Design Automation Conference (DAC'01), June 2001.
K. Skadron, M. R. Stan, W. Huang, S. Velusamy, K. Sankaranarayanan, and D. Tarjan. Temperature-aware microarchitecture. In Proceedings of the 30th International Symposium on Computer Architecture, June 2003.
L. Spracklen and S. G. Abraham. Chip Multithreading: Opportunities and Challenges. In 11th International Symposium on High Performance Computer Architecture (HPCA-11), 2005.
E. Sprangle and D. Carmean. Increasing Processor Performance by Implementing Deeper Pipelines. In Proceedings of the 29th International Symposium on Computer Architecture (ISCA-29), May 2002.
B. Sprunt. Brink and Abyss Pentium 4 Performance Counter Tools For Linux, Feb. 2002.
B. Sprunt. Pentium 4 Performance-Monitoring Features. IEEE Micro, 22(4):72-82, Jul/Aug 2002.
B. Sprunt. Managing The Complexity Of Performance Monitoring Hardware: The Brink and Abyss Approach. International Journal of High Performance Computing Applications, 20(4):533-540, 2006.
A. Srivastava and D. Sylvester. Minimizing Total Power by Simultaneous Vdd/Vth Assignment. In ASPDAC: Proceedings of the 2003 conference on Asia South Pacific design automation, Jan. 2003.
P. Stanley-Marbell, M. S. Hsiao, and U. Kremer. A Hardware Architecture for Dynamic Performance and Energy Adaptation. In Proceedings of the Workshop on Power-Aware Computer Systems, 2002.
D. Talkin. A robust algorithm for pitch tracking (RAPT). Speech Coding and Synthesis. Elsevier Science B. V., New York, 1995.
T. K. Tan, A. Raghunathan, and N. K. Jha. Software Architectural Transformations: A New Approach to Low Energy Embedded Software. In Proceedings of the conference on Design, Automation and Test in Europe (DATE'03), Mar. 2003.
The Standard Performance Evaluation Corporation. SPEC CPU2000 Results.
The Standard Performance Evaluation Corporation. SPEC CPU2000 Suite.
G. Theocharous, S. Mannor, N. Shah, P. Gandhi, B. Kveton, S. Siddiqi, and C.-H. Yu. Machine Learning for Adaptive Power Management. Intel Technology Journal, 10(4):299-311, 2006.
V. Tiwari, S. Malik, and A. Wolfe. Power analysis of embedded software: A first step towards software power minimization. IEEE Transactions on VLSI Systems, 2(4):437-445, December 1994.
R. Todi. SPEClite: Using representative samples to reduce SPEC CPU2000 workload. In Proceedings of the IEEE International Workshop on Workload Characterization (WWC-4), 2001.
D. M. Tullsen, S. J. Eggers, and H. M. Levy. Simultaneous multithreading: Maximizing on-chip parallelism. In Proceedings of the 22nd International Symp. on Computer Architecture, pages 392-403, June 1995.
United States Environmental Protection Agency. ENERGY STAR Program Requirements for Computers, Version 4.0. Oct. 2006.
P. Unnikrishnan, G. Chen, M. Kandemir, and D. R. Mudgett. Dynamic Compilation for Energy Adaptation. In Proceedings of the 2002 IEEE/ACM international conference on Computer-aided design (ICCAD), 2002.
O. Unsal and I. Koren. System-Level Power-Aware Design Techniques in Real-Time Systems. Proceedings of the IEEE, 91(7), July 2003.
V. Venkatachalam and M. Franz. Power Reduction Techniques for Microprocessor Systems. ACM Computing Surveys (CSUR), 37(3):195-237, 2005.
N. Vijaykrishnan, M. Kandemir, M. J. Irwin, H. S. Kim, and W. Ye. Energy-Driven Integrated Hardware-Software Optimizations Using SimplePower. In Proceedings of the 27th International Symposium on Computer Architecture, June 2000.
A. Weissel and F. Bellosa. Process cruise control: Event-driven clock scaling for dynamic power management. In Proceedings of the International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES 2002), Grenoble, France, Aug. 2002.
A. Weissel, B. Beutel, and F. Bellosa. Cooperative I/O: A Novel I/O Semantics for Energy-Aware Applications. In Proceedings of the Fifth Symposium on Operating System Design and Implementation (OSDI'02), 2002.
Q. Wu, P. Juang, M. Martonosi, and D. W. Clark. Voltage and Frequency Control with Adaptive Reaction Time in Multiple-Clock-Domain Processors. In Proceedings of the 11th International Symposium on High-Performance Computer Architecture (HPCA-11), 2005.
Q. Wu, V. Reddi, Y. Wu, J. Lee, D. Connors, D. Brooks, M. Martonosi, and D. W. Clark. A Dynamic Compilation Framework for Controlling Microprocessor Energy and Performance. In Proceedings of the 38th International Symp. on Microarchitecture, 2005.
F. Xie, M. Martonosi, and S. Malik. Compile-Time Dynamic Voltage Scaling Settings: Opportunities and Limits. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2003), June 2003.
T. Y. Yeh and Y. N. Patt. Alternative implementations of two-level adaptive branch prediction. In 19th Annual International Symposium on Computer Architecture, May 1992.
J. J. Yi, D. J. Lilja, and D. M. Hawkins. A Statistically Rigorous Approach for Improving Simulation Methodology. In Proceedings of the 9th International Symposium on High-Performance Computer Architecture (HPCA-9), Feb. 2003.
J. J. Yi, R. Sendag, L. Eeckhout, A. Joshi, D. J. Lilja, and L. K. John. Evaluating Benchmark Subsetting Approaches. In Proceedings of the IEEE International Symposium on Workload Characterization, Oct. 2006.
H. Zeng, X. Fan, C. Ellis, A. Lebeck, and A. Vahdat. ECOSystem: Managing energy as a first class operating system resource. In Tenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS X), Oct. 2002.
M. T. Zhang. Powering Intel Pentium 4 generation processors. In IEEE Electrical Performance of Electronic Packaging Conference, pages 215-218, 2001.
P. Zhou, V. Pandey, J. Sundaresan, A. Raghuraman, Y. Zhou, and S. Kumar. Dynamic Tracking of Page Miss Ratio Curve for Memory Management. In Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-XI), 2004.
