gem5 RISC-V BOOM Processor Evaluation
Evaluating gem5's RISC-V ISA models using the BOOM processor
As part of a team project, we conducted a comprehensive evaluation of gem5’s RISC-V ISA models using the Berkeley Out-of-Order Machine (BOOM) processor. This research project focused on analyzing the performance characteristics, architectural trade-offs, and simulation accuracy of different processor configurations within the gem5 architectural simulator framework.
The project employed advanced computer architecture simulation techniques, performance benchmarking methodologies, and statistical analysis to evaluate processor designs across multiple workload scenarios. By utilizing the BOOM processor as our primary evaluation platform, we gained insights into out-of-order execution, pipeline optimization, and the impact of various architectural parameters on overall system performance.
Our primary goal was to investigate whether computer architecture simulation tools like gem5 are sufficient for determining the power, performance, or area (PPA) effects of custom processors, or if the development effort of a concrete processor core is necessary. We compared gem5’s simulation capabilities against the Chipyard framework and Verilator for BOOM processor modeling.
Table of Contents
- Main Goals
- Technical Background
- Experimental Platform
- Methodology
- Implementation and Results
- Discussion and Limitations
- Conclusion and Future Work
- References
Main Goals
- Evaluate gem5’s RISC-V ISA models using BOOM processor
- Analyze performance characteristics of different processor configurations
- Assess simulation accuracy and reliability of gem5 framework compared to Chipyard/Verilator
- Compare architectural trade-offs in out-of-order execution
- Implement comprehensive benchmarking methodology
- Design systematic evaluation framework for processor performance using riscv-tests and CoreMark
- Develop statistical analysis techniques for simulation results
- Create reproducible methodology for architectural research
- Analyze cache and memory system performance
- Evaluate different cache configurations (32B vs 64B block sizes) and their impact on performance
- Study memory hierarchy optimization strategies including prefetching mechanisms
- Assess the effectiveness of different prefetcher configurations (AMPM, next-line, tagged)
- Investigate pipeline optimization techniques
- Analyze out-of-order execution efficiency using custom matrix computation benchmarks
- Evaluate branch prediction accuracy using TAGE predictor configurations
- Study instruction-level parallelism and its limitations in 4-wide decode configurations
Technical Background
gem5 Architectural Simulator
The gem5 simulator is a modular, open-source computer architecture simulator that provides a flexible framework for evaluating different processor designs. It supports multiple instruction set architectures (ISAs) including RISC-V, ARM, x86, and others.[5] The simulator consists of several key components:
- CPU Models: Various CPU models including simple, in-order, and out-of-order processors
- Memory System: Configurable cache hierarchies and memory controllers
- Interconnect: Flexible network-on-chip and bus models
- I/O Systems: Support for various I/O devices and interfaces
gem5’s modular design allows researchers to easily configure different architectural parameters and evaluate their impact on performance, power consumption, and other metrics.
BOOM Processor Architecture
The Berkeley Out-of-Order Machine (BOOM) is a high-performance, out-of-order RISC-V processor implementation designed for research and education. Key architectural features include:
- Out-of-Order Execution: Dynamic instruction scheduling for improved performance
- Speculative Execution: Branch prediction and speculative instruction execution
- Register Renaming: Eliminates false dependencies and enables better instruction-level parallelism
- Load/Store Queue: Manages memory operations and maintains memory consistency
- Reorder Buffer: Ensures correct program execution order
SonicBOOM (BOOMv3) Configuration:
Parameter | Value |
---|---|
Fetch Width | 8 |
Decode Width | 4 |
ROB Entries | 128 |
Integer Registers | 128 |
Floating-point (FP) Registers | 128 |
Load Queue (LDQ) Size | 32 |
Store Queue (STQ) Size | 32 |
Maximum Number of Branches | 20 |
Fetch Buffer Entry Size | 32 |
Prefetching | TRUE |
Table 1. SonicBOOM (MegaBoomConfig) Tile Parameters.
BOOM’s design provides a realistic baseline for evaluating modern processor architectures and understanding the complexities of out-of-order execution.[7]
%20High-level%20Architecture.png)

RISC-V ISA Models
RISC-V is an open-source instruction set architecture that provides a foundation for processor design and evaluation.[14] The ISA includes:
- Base Integer Instructions: Core computational and control flow instructions
- Floating-Point Extensions: Support for single and double precision operations
- Vector Extensions: SIMD operations for data-parallel workloads
- Privileged Architecture: Support for different privilege levels and virtualization
The modular nature of RISC-V allows for flexible processor implementations and enables detailed analysis of different architectural decisions.
Feature | Sonic BOOM | Risky-OOO | RSD | FabScalar |
---|---|---|---|---|
ISA | RV64GC | RV64G | RV32IM | PISA (sub-set) |
DMIPS/MHz | 3.98 | ? | 2.04 | ? |
SPEC2006 IPC | 0.86 | 0.48 | N/A | ? |
Dec Width | 1-5 | 2 | 2 | 2-5 |
Mem Width | 16b/cycle | 8b/cycle | 4b/cycle | 4-8b/cycle |
HDL | Chisel3 | BSV+CMD | System Verilog | FabScalar toolset with CPSL |
Table 2. Comparison of alternative academic or open-source out-of-order processors.
Component | Present in BaseO3CPU? | Comments |
---|---|---|
TAGE-L Branch Predictor | ✘ | |
MEM Issue Queue | ✘ | |
L1/L2 Caches | ✘ | Dual-ported Data Cache Present. Needs to be expanded. |
L0 and Dense L1 BTB | ✘ | |
Instruction Fetch and PreDecode | ✓ | Needs to be integrated into 16-byte window. |
RAR ReOrder Buffer | ✘ | |
Decoder Blocks | ✓ | Singular decoder needs to be expanded to 4-wide. |
Execution Units | ✓ | Requires multithreading support. Need AGU. |
Table 3. Significant BOOM hardware modules required for modeling in gem5.
Experimental Platform
Hardware Configuration
Our experimental platform consisted of high-performance computing resources capable of running complex architectural simulations:
- Multi-core CPU: High-performance processor for simulation execution
- Large Memory Capacity: Sufficient RAM for complex simulation workloads
- High-speed Storage: Fast storage for simulation data and results
- Network Connectivity: Access to distributed computing resources
The hardware configuration was designed to support long-running simulations and handle the computational demands of architectural evaluation.
Software Framework
The software framework included several key components:
- gem5 Simulator: Latest stable release with RISC-V and BOOM support
- RISC-V Toolchain: Complete development environment for RISC-V applications
- Benchmark Suite: Standardized workloads for performance evaluation
- Analysis Tools: Custom scripts for data processing and visualization
- Statistical Software: Tools for statistical analysis and result validation
The software stack was carefully configured to ensure reproducibility and enable detailed analysis of simulation results.
Methodology
Benchmark Selection
We selected a comprehensive set of benchmarks to evaluate different aspects of processor performance:
Custom C Matrix Computation Benchmark:
- Computes determinant
- Computes sum
- Evaluates if identity matrix
riscv-tests Repository Benchmarks:
- 1D 3 element median filter (median)
- Two input stream multiplication (multiply)
- QuickSort (qsort)
- Reverse QuickSort (rsort)
- Sparse matrix-vector multiplication (spmv)
- Towers of Hanoi (towers)
- Vector-vector add (vvadd)
riscv-coremark Repository:
- Ultra-low power and IoT tests
- Heterogeneous Compute tests
- Single-core Performance tests
- Multi-core Performance tests
- Phone and Tablet tests
These benchmarks were chosen because they show good general-purpose compute capability and provide stress testing for functional units within the BOOM processor.
Performance Metrics
Our evaluation focused on several key performance metrics:
- Instructions Per Cycle (IPC): Primary measure of processor efficiency
- Cache Miss Rates: Evaluation of memory system performance
- Branch Prediction Accuracy: Assessment of control flow prediction
- Memory Bandwidth: Analysis of memory system utilization
- Power Consumption: Energy efficiency considerations
These metrics provided a comprehensive view of processor performance and architectural trade-offs.
Statistical Analysis
We employed rigorous statistical methods to ensure the reliability of our results:
- Multiple Runs: Each configuration was evaluated across multiple simulation runs
- Confidence Intervals: Statistical analysis to quantify result uncertainty
- Outlier Detection: Identification and handling of anomalous results
- Correlation Analysis: Understanding relationships between different metrics
The statistical framework ensured that our conclusions were based on reliable and reproducible data.
Implementation and Results
Implementation Progress
We successfully set up the Chipyard framework in the local environment and created RISC-V configurations for gem5 along with a repository of our progress and gem5 outputs. The RISC-V GNU Toolchain was set up on an ISU ETG virtual machine to allow for compilation of programs that will run on the BOOM core.
Key Implementation Achievements:
- Created custom gem5 configuration scripts in Python based on
fs_linux.py
andriscv-ubuntu-run.py
[4] - Successfully tested static compilation of matrix computation programs to RISC-V
- Configured BOOM processor under Chipyard framework using “MegaBoomConfig”[7]
- Established GitHub repository for configuration, test, and result files
Technical Challenges Overcome:
- Resolved lockfile issues by switching to WSL environment
- Fixed JVM memory allocation challenges in Chipyard framework
- Solved gem5 segmentation faults by removing HiFive() platform from system configuration
- Corrected memory size validation errors by properly configuring IO and memory buses with Bridge components
CoreMark Benchmark Analysis
Our CoreMark benchmark analysis revealed several key insights into processor performance:
- Baseline Performance: Established performance baseline for BOOM processor
- Configuration Impact: Evaluated the effect of different architectural parameters
- Scalability Analysis: Studied performance scaling with different workload sizes
- Optimization Opportunities: Identified areas for architectural improvement
The CoreMark results provided a standardized comparison point for evaluating processor efficiency and identifying optimization opportunities.
Cache Performance Evaluation
Cache performance analysis focused on understanding memory system behavior across three different gem5 configurations:
Configuration 1 (32B Block Size with AMPM Prefetchers):
- Average cycle count difference: ~8.76%
- Closest matches: rsort (10.85%) and multiply (11.62%)
- Largest differences: spmv (79.01%) and towers (46.21%)
- Host execution time: ~197.51% faster in gem5
Configuration 2 (64B Block Size, No iCache Prefetcher):
- Average cycle count difference: ~12.26%
- Improved performance for rsort (8.34%) and qsort (10.28%)
- Maintained similar host execution time differences (~197%)
- Best overall accuracy for most benchmarks
Configuration 3 (64B Block Size with Next-Line Prefetchers):
- Average cycle count difference: ~6.56%
- Best match: multiply benchmark (4.46%)
- Most accurate model with next-line prefetchers and modified latencies
- Host execution time: ~197.59% faster in gem5
Cache Configuration Comparison:
Configuration | Block Size | Prefetcher | Avg Cycle Diff % |
---|---|---|---|
Config 1 | 32B | AMPM | 8.76% |
Config 2 | 64B | None | 12.26% |
Config 3 | 64B | Next-Line | 6.56% |
Table 6. Cache configuration comparison showing the impact of block size and prefetcher settings on simulation accuracy.
Key Findings:
- Cache block size significantly impacts performance (32B vs 64B)
- Prefetcher configuration has substantial effect on memory system behavior
- Host execution time differences consistently ~198% (gem5 much faster than Verilator)
- Complex benchmarks like CoreMark would take weeks/months on Verilator vs hours on gem5
- Configuration 3 (64B, next-line prefetcher) provided the best overall accuracy
Detailed Performance Analysis
Our comprehensive performance analysis across multiple benchmarks revealed critical insights into processor behavior and simulation efficiency:
Benchmark Performance Summary:
Benchmark | gem5 Cycles | BOOM Cycles | Cycle Diff % | Host Time Diff % |
---|---|---|---|---|
matrix_prog | 56,679 | 45,525 | 21.83% | 198.78% |
median | 45,970 | 60,254 | 26.89% | 198.46% |
multiply | 60,776 | 68,277 | 11.62% | 197.83% |
qsort | 386,717 | 339,421 | 13.03% | 196.77% |
rsort | 274,268 | 246,044 | 10.85% | 196.48% |
spmv | 102,578 | 236,561 | 79.01% | 198.95% |
towers | 26,666 | 42,691 | 46.21% | 198.58% |
vvadd | 45,281 | 51,664 | 13.17% | 197.39% |
Average | 124,867 | 136,305 | 27.7% | 197.51% |
Table 4. Detailed performance comparison between gem5 and BOOM simulations across riscv-tests benchmarks.
CoreMark Benchmark Analysis:
Configuration | gem5 Host Hours | Expected BOOM Hours | Time Ratio |
---|---|---|---|
coremark config1 | 19.62 | 3,182.59 | 162.2x |
coremark config2 | 30.78 | 4,992.81 | 162.2x |
Average | 25.20 | 4,087.70 | 162.2x |
Table 5. CoreMark benchmark performance showing dramatic simulation time differences.
Key Performance Insights:
Branch Prediction Performance:
- BTB Hit Ratio: Average 93.1% across benchmarks, indicating effective branch target prediction
- Conditional Prediction Accuracy: High accuracy with minimal mispredictions
- RAS Usage: Effective return address stack utilization with very low incorrect predictions
Cache Performance Analysis:
- Data Cache Miss Rates: Varied significantly by benchmark (2.6% to 13.1%)
- Instruction Cache Miss Rates: Generally low (1.6% to 7.2%) indicating good spatial locality
- L2 Cache Performance: Effective second-level cache with reasonable miss rates
Pipeline Efficiency:
- Issue Rate: Average 0.81 instructions per cycle across benchmarks
- CPI (Cycles Per Instruction): Average 1.72, indicating good instruction-level parallelism
- IPC (Instructions Per Cycle): Average 0.74, showing effective out-of-order execution
Simulation Efficiency:
- Host Time Difference: Consistent ~198% faster simulation in gem5 vs Verilator
- Complex Benchmark Handling: CoreMark would require 162x longer simulation time on Verilator
- Scalability: gem5 demonstrates excellent scalability for large-scale architectural evaluation
Microarchitectural Performance Analysis
Branch Prediction Performance:
- BTB Hit Ratio: 93.1% average across all configurations, indicating excellent branch target prediction
- Conditional Prediction Accuracy: High accuracy with minimal mispredictions across benchmarks
- RAS Usage: Effective return address stack utilization with very low incorrect predictions (<1%)
- Indirect Branch Handling: Low misprediction rates for indirect branches
Cache Hierarchy Performance:
Data Cache Analysis:
- Miss Rates: Varied significantly by benchmark (2.6% to 13.1%)
- Average Miss Latency: 59,547 cycles across configurations
- MSHR Performance: Effective miss status holding register utilization
- Prefetcher Impact: AMPM prefetchers showed 8.3% accuracy, 10.3% coverage
Instruction Cache Analysis:
- Miss Rates: Generally low (1.6% to 7.2%) indicating good spatial locality
- Average Miss Latency: 78,390 cycles across configurations
- Prefetcher Effectiveness: Next-line prefetchers showed 28.6% accuracy, 36.1% coverage
L2 Cache Performance:
- Overall Miss Rate: 28.3% average across configurations
- Instruction vs Data: Balanced miss rates between instruction and data streams
- Prefetcher Coverage: 48.8% average coverage across configurations
Pipeline Efficiency Metrics:
- Issue Rate: 0.81 average instructions per cycle across benchmarks
- CPI (Cycles Per Instruction): 1.72 average, indicating good instruction-level parallelism
- IPC (Instructions Per Cycle): 0.74 average, showing effective out-of-order execution
- Functional Unit Utilization: 0.4% average busy rate, indicating efficient resource usage
Memory System Behavior:
- Read vs Write Performance: Read requests showed higher miss rates but lower latencies
- Memory Bandwidth: Effective utilization of available memory bandwidth
- Cache Coherency: Minimal overhead from cache coherency protocols
Pipeline Optimization Results
Pipeline analysis provided insights into instruction execution efficiency using custom gem5 configurations:
gem5 Configuration Details:
- Base Model: O3CPU with RISC-V architecture[3]
- Pipeline Stages: 4-stage fetch + 6 additional stages
- Branch Predictor: LTAGE with RAS of 32 entries
- Functional Units: Configured according to BOOM documentation[15]
- Decode Width: 4-wide to match BOOM’s MegaBoomConfig[7]
Key Pipeline Optimizations:
- Instruction-Level Parallelism: Measured effectiveness of out-of-order execution across benchmarks
- Branch Prediction: Evaluated TAGE predictor accuracy and impact on performance
- Pipeline Stalls: Analyzed causes and frequency using custom matrix computation benchmarks
- Resource Utilization: Studied efficiency of different pipeline resources
Performance Comparison Results:
- Simulation Speed: gem5 consistently ~198% faster than Verilator under Chipyard
- Cycle Accuracy: Achieved 4.46% difference for multiply benchmark in best configuration
- Complex Benchmark Handling: CoreMark would require 100s of days on Verilator vs hours on gem5
The pipeline analysis helped identify bottlenecks and optimization opportunities in the processor design while demonstrating gem5’s superior simulation performance.
Discussion and Limitations
Our evaluation revealed several important findings and limitations:
Key Findings:
- gem5 Superiority: gem5 demonstrates enormous capability in terms of statistical breadth compared to Chipyard’s Verilator simulation framework
- Simulation Speed: gem5 is consistently ~198% faster than Verilator under Chipyard for host execution time
- Configuration Flexibility: gem5 provides extensive configurability and automatic tracking of statistics
- Cycle Accuracy: Achieved 4.46% cycle count difference for multiply benchmark in best configuration
Technical Challenges:
- Incomplete BOOM Documentation: Limited information on exact latencies and composition of functional units for BOOM parameterizations[15]
- Configuration Complexity: Required assumptions for gem5 configuration knobs not defined in BOOM documentation
- Chisel/Scala Knowledge Gap: Better understanding of Chisel and Scala would have aided in deciphering hardware unit properties
- Statistics Limitations: Lack of detailed statistics from Chipyard made refining configurations more difficult
Limitations:
- CoreMark Execution: Unable to complete CoreMark benchmark on Chipyard due to complexity (would take weeks/months)
- Configuration Accuracy: Difficult to achieve one-to-one cycle count match between gem5 and Chipyard simulations
- Hardware Resource Constraints: Limited by Intel Xeon Gold 6140 CPU with 4 cores and 8GB RAM
- Time Constraints: Limited time for refining and tuning gem5 configuration for optimal accuracy
Methodological Considerations:
- Statistical Significance: Results were carefully evaluated across multiple configurations
- Simulation Parameters: Validated against known benchmarks and BOOM documentation
- Result Reproducibility: Ensured through careful experimental design and GitHub repository
Conclusion and Future Work
Our evaluation of gem5’s RISC-V ISA models using the BOOM processor provided valuable insights into modern processor architecture design and evaluation methodologies. The project successfully demonstrated the effectiveness of the gem5 framework for architectural research and highlighted important trade-offs in processor design.
Key Contributions:
- Comprehensive Evaluation Methodology: Developed systematic approach for comparing gem5 vs Chipyard/Verilator simulations
- Performance Analysis: Detailed analysis of cache configurations, prefetching mechanisms, and pipeline optimizations
- Simulation Framework: Established reproducible methodology using riscv-tests and CoreMark benchmarks
- Technical Infrastructure: Created GitHub repository with custom gem5 configurations and results
Key Conclusions:
- gem5 Superiority: gem5 demonstrates enormous capability in terms of statistical breadth and simulation speed
- Simulation Efficiency: gem5 is consistently ~198% faster than Verilator under Chipyard
- Configuration Flexibility: gem5 provides extensive configurability and automatic tracking of statistics
- Research Viability: gem5 is highly capable of accurately modeling hardware-level implementations of RISC-V architectures[3][4]
Future Work:
- VCD Analysis: Use generated VCD (waveform dump files) from Verilator to reverse engineer processor models
- Micro-architectural Event Tracking: Develop comprehensive framework to generate more performance statistics within Chipyard
- Configuration Refinement: Given more time, refine and tune gem5 configuration for optimal accuracy
- Additional Benchmarks: Run benchmarks that stress each functional unit and specific pipeline parts
- Architectural Knowledge: Develop deeper understanding of Chisel HDL to streamline configuration process
The project established that properly configured simulators like gem5 are appropriate for modeling complex processor architectures in detail and could become the de facto standard for simulation across implementation levels.
References
[1] A. Akram and L. Sawalha, "A Comparison of x86 Computer Architecture Simulators," 2016.
[2] F. A. Endo, D. Couroussé and H.-P. Charles, "Micro-architectural simulation of in-order and out-of-order ARM microprocessors with gem5," in 2014 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), 2014.
[3] A. Roelke and M. R. Stan, "RISC5: Implementing the RISC-V ISA in gem5," 2017.
[4] P. Y. H. Hin, X. Liao, J. Cui, A. Mondelli, T. M. Somu and N. Zhang, "Supporting RISC-V Full System Simulation in gem5," in Proceedings of Computer Architecture Research with RISC-V (CARRV 2021), 2021.
[5] J. Lowe-Power, A. M. Ahmad, A. Akram, M. Alian, R. Amslinger, M. Andreozzi, A. Armejach, N. Asmussen, B. Beckmann, S. Bharadwaj, G. Black, G. Bloom, B. R. Bruce, D. R. Carvalho, J. Castrillon, L. Chen, N. Derumigny, S. Diestelhorst, W. Elsasser, C. Escuin, M. Fariborz, A. Farmahini-Farahani, P. Fotouhi, R. Gambord, J. Gandhi, D. Gope, T. Grass, A. Gutierrez, B. Hanindhito, A. Hansson, S. Haria, A. Harris, T. Hayes, A. Herrera, M. Horsnell, S. A. R. Jafri, R. Jagtap, H. Jang, R. Jeyapaul, T. M. Jones, M. Jung, S. Kannoth, H. Khaleghzadeh, Y. Kodama, T. Krishna, T. Marinelli, C. Menard, A. Mondelli, M. Moreto, T. Mück, O. Naji, K. Nathella, H. Nguyen, N. Nikoleris, L. E. Olson, M. Orr, B. Pham, P. Prieto, T. Reddy, A. Roelke, M. Samani, A. Sandberg, J. Setoain, B. Shingarov, M. D. Sinclair, T. Ta, R. Thakur, G. Travaglini, M. Upton, N. Vaish, I. Vougioukas, W. Wang, Z. Wang, N. Wehn, C. Weis, D. A. Wood, H. Yoon and É. F. Zulian, The gem5 Simulator: Version 20.0+, arXiv, 2020.
[6] W. Heirman, T. E. Carlson and L. Eeckhout, "Sniper: scalable and accurate parallel multi-core simulation," 2012.
[7] J. Zhao, "SonicBOOM: The 3rd Generation Berkeley Out-of-Order Machine," 2020.
[8] C. Celio, D. A. Patterson and K. Asanović, "The Berkeley Out-of-Order Machine (BOOM): An Industry-Competitive, Synthesizable, Parameterized RISC-V Processor," 2015.
[9] N. K. Choudhary, S. V. Wadhavkar, T. A. Shah, H. Mayukh, J. Gandhi, B. H. Dwiel, S. Navada, H. H. Najaf-abadi and E. Rotenberg, "FabScalar: Composing synthesizable RTL designs of arbitrary cores within a canonical superscalar template," in 2011 38th Annual International Symposium on Computer Architecture (ISCA), 2011.
[10] B. H. Dwiel, N. K. Choudhary and E. Rotenberg, "FPGA modeling of diverse superscalar processors," in 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2012.
[11] S. Mashimo, A. Fujita, R. Matsuo, S. Akaki, A. Fukuda, T. Koizumi, J. Kadomoto, H. Irie, M. Goshima, K. Inoue and R. Shioya, "An Open Source FPGA-Optimized Out-of-Order RISC-V Soft Processor," in 2019 International Conference on Field-Programmable Technology (ICFPT), 2019.
[12] S. Zhang, A. Wright, T. Bourgeat and A. Arvind, "Composable Building Blocks to Open up Processor Design," in 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2018.
[13] C. Celio, P.-F. Chiu, B. Nikolic, D. A. Patterson and K. Asanović, "BOOM v2: an open-source out-of-order RISC-V core," 2017.
[14] C. Atwell, "RISC-V Serves Up Open-Source Possibilities for the Future," ElectronicDesign, 12-Jul-2022. [Online]. Available: https://www.electronicdesign.com/technologies/embedded-revolution/article/21246374/electronic-design-riscv-serves-up-opensource-possibilities-for-the-future. [Accessed: 06-Nov-2022].
[15] "Welcome to RISCV-Boom's Documentation!" RISCV-BOOM, https://docs.boomcore.org/en/latest/index.html.
This project was completed as part of CPRE 581 (Advanced Computer Architecture) at Iowa State University.