Analyzing the Effect of Predictability of Memory References on WCET

 

CSC 714 Real Time Systems - Spring 2014

Amir Bahmani,  Vishwanathan Chandru

abahman@ncsu.edu, vchandr6@ncsu.edu


 

 

Abstract

 

In a commercial multi-core system, the cores share many resources, including DRAM. As a result, two completely unrelated tasks can interfere with each other, especially through memory accesses. The resulting variable and unpredictable delay poses a significant challenge for predictability and isolation. Because real-time systems value predictability over performance, this is a highly undesirable characteristic. In this project we target a particular NoC (Network on Chip) architecture for analysing memory contention and unpredictability. The Tilera TILEPro64 has 64 cores connected by a mesh interconnect, which carries the traffic between tiles and between tiles and the memory controllers. Platforms with several cores (multi-cores) and with more than a dozen cores (many-cores) have become mainstream in many scientific areas, most notably high-performance computing, and are the new frontier in others such as real-time embedded systems. Along with the benefits of abundant processing power and increased system availability come additional latencies, especially for memory accesses, as multiple cores try to access memory at the same time. The problem is worse on a many-core board like the Tilera, which has on the order of 50 or more cores and whose NoC adds further unpredictability. As part of this project we propose a controller-aware memory allocator that reduces the serialization of memory accesses, resulting in better predictability and performance isolation. We also try to quantify the impact of memory-bound tasks on WCET.


Tilera

 

Introduction

 


In a multi-core system, all cores share the same memory. Inefficient use of memory can therefore easily become a performance bottleneck and a source of unpredictable behaviour. We carried out a short literature survey to understand the delay and unpredictability added by contention for shared resources, especially memory. The first issue is contention for the same memory when memory is not divided into banks or bank-aware allocation is not performed. Two analysis procedures are used in this context: request-driven analysis and job-driven analysis. Various inter-bank and intra-bank interferences were also found to cause unpredictability [1]. A bank-aware allocator called PALLOC alleviates this problem to a certain extent [2]. However, static or dynamic partitioning alone cannot solve the problem: on a board like the Tilera, with a large number of cores and NoC packets being sent for both memory requests and communication, prediction becomes tricky [3]. In this scenario the worst-case latency needs to be considered using per-pattern analysis. The involvement of multiple memory controllers and live process migration for load balancing makes the situation even trickier. We therefore concluded that a multi-faceted solution is needed: distributing memory requests across controllers to reduce latency, static or dynamic banking of memory, and optimal placement of processes among cores to minimize network traffic and latency, and thus obtain a more predictable and reliable WCET.
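
The idea of placing processes close to the memory controller they use can be illustrated with a small sketch. This is not the project's task-mapping algorithm. It assumes an 8 x 8 mesh of 64 tiles and minimal routing, so that the hop count between two tiles equals their Manhattan distance. The attachment points in `CONTROLLER_PORTS` are hypothetical values chosen for illustration, not taken from the Tilera documentation, and the helper names are likewise made up.

```python
# Hypothetical sketch: grid size and controller attachment points are
# illustrative assumptions, not values from the Tilera documentation.
GRID = 8  # assumed 8 x 8 mesh of 64 tiles

# Assumed (x, y) tiles at which each memory controller attaches to the mesh.
CONTROLLER_PORTS = {0: (2, 0), 1: (5, 0), 2: (2, 7), 3: (5, 7)}


def hops(src, dst):
    """Hop count under minimal routing: the Manhattan distance."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def nearest_controller(tile):
    """Return (controller_id, hop_count) for the controller closest to tile."""
    candidates = ((mc, hops(tile, port)) for mc, port in CONTROLLER_PORTS.items())
    # Ties are broken in favour of the lower controller id.
    return min(candidates, key=lambda pair: (pair[1], pair[0]))


def best_tiles_for(controller, count):
    """Pick the `count` tiles with the fewest hops to `controller`."""
    port = CONTROLLER_PORTS[controller]
    tiles = [(x, y) for y in range(GRID) for x in range(GRID)]
    return sorted(tiles, key=lambda t: (hops(t, port), t))[:count]
```

For example, under these assumptions a task whose memory is served by controller 3 would be placed with `best_tiles_for(3, n)`, and `nearest_controller(tile)` indicates which controller a task pinned to a given tile should preferably allocate from. Hop count is only a proxy: it ignores queueing in the network and at the controller, which is exactly the contention this project tries to quantify.
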
      In this project we target the Tilera TILEPro64 hardware platform. This board features 64 identical cores (also called tiles) connected via a mesh interconnect. It provides several networks for communication: IDN (I/O Dynamic Network, for OS usage and streaming data), MDN (Memory Dynamic Network, for loads, stores, prefetches, cache misses and DMA), UDN (User Dynamic Network, used in BME), CDN (Coherence Dynamic Network, for L3 invalidations) and TDN (Tile Dynamic Network, used by the caches for core-to-core block transfers). Each tile is a fully fledged processor with its own L1 and L2 caches. A soft L3 cache is formed by sharing the L2 caches of all tiles. There are 4 memory controllers, each controlling up to 16 GB of memory. The platform has a 32-bit virtual address space and a 36-bit global physical address space. In this project we try to analyze and quantify the impact of memory contention on WCET and on the isolation of tasks in terms of memory. Two approaches can be considered: a kernel-level memory-aware allocator, or alternatively a user-space memory allocator.
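
The address-space figures above fit together: 4 controllers x 16 GB = 64 GB = 2^36 bytes, which is exactly what a 36-bit physical address can cover. The sketch below shows how a physical address could be attributed to a memory controller under one simplifying assumption, namely that memory is not striped across controllers and each controller owns one contiguous 16 GB range. Whether the board is actually configured this way is not stated here, and the function names are hypothetical.

```python
PHYS_ADDR_BITS = 36                 # global physical address width
NUM_CONTROLLERS = 4
BYTES_PER_CONTROLLER = 16 * 2**30   # up to 16 GB behind each controller

# Sanity check: the four 16 GB ranges exactly fill the 36-bit space.
assert NUM_CONTROLLERS * BYTES_PER_CONTROLLER == 2**PHYS_ADDR_BITS


def controller_of(paddr):
    """Controller index for a physical address.

    ASSUMPTION: non-striped layout, one contiguous 16 GB range per
    controller, so the index is simply the top two address bits.
    """
    if not 0 <= paddr < 2**PHYS_ADDR_BITS:
        raise ValueError("address outside the 36-bit physical space")
    return paddr // BYTES_PER_CONTROLLER   # same as paddr >> 34


def offset_in_controller(paddr):
    """Byte offset of the address within its controller's range."""
    return paddr % BYTES_PER_CONTROLLER
```

With a striped (interleaved) configuration the mapping would depend on lower address bits instead, so this decoding should be checked against the actual hardware configuration before it is used for controller-aware allocation.
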



 

Project Status

 

Task | Status
Ramp up on Tilera architecture and APIs (both): 24th March 2014 | Completed
Figure out base parameters for colored malloc (Vishwanathan): 24th March 2014 | Completed
Figure out a way to remap the memory accesses (Vishwanathan): 14th April 2014 | Deferred
Determine the corresponding memory controller (MC) from a physical address (Vishwanathan): 14th April 2014 | Completed
Memory Mapping Strategy (Vishwanathan): 14th April 2014 | Completed
Strategy for Contention Analysis (Vishwanathan): 14th April 2014 | Completed
Malloc Implementation (Vishwanathan): 14th April 2014 | Completed
Test Cases Design and Implementation (Vishwanathan): 14th April 2014 | Completed
Task Mapping Algorithm and Implementation (Amir): 21st April 2014 | Deferred
Task Mapping Algorithm Validation (Vishwanathan): 24th April 2014 | Deferred
Final Report (Vishwanathan): 1st May 2014 | Completed



 

Deliverables

Proposal

Interim Report 1

Interim Report 2

Final Report

Source Code

     Git: use the following command to clone the repo (mail me the SSH public key of your system before cloning the repository):

      git clone git@github.ncsu.edu:vchandr6/rtcs.git rtcs

     OR

tar.gz

 

 

 




 

References

[1] Hyoseung Kim, Dionisio de Niz, Björn Andersson, Mark Klein, Onur Mutlu, Ragunathan (Raj) Rajkumar, Bounding Memory Interference Delay in COTS-based Multi-Core Systems
[2] Heechul Yun, Renato Mancuso, Zheng-Pei Wu, Rodolfo Pellizzoni, PALLOC: DRAM Bank-Aware Memory Allocator for Performance Isolation on Multicore Platforms
[3] Borislav Nikolić, Patrick Meumeu Yomsi, Stefan M. Petters, Worst-Case Memory Traffic Analysis for Many-Cores using a Limited Migrative Model
[4] https://android.googlesource.com/kernel/common.git/+/android-3.0/
[5] Tilera Architecture documentation
[6] Robert Radkiewicz, Xiaowen Wang, Porting Barrelfish to the Tilera TILEPro64 Architecture, KTH Information and Communication Technology
[7] Ashkan Tousimojarad, Wim Vanderbauwhede, Cache-aware Parallel Programming for Manycore Processors, School of Computing Science, University of Glasgow, Glasgow, UK
[8] http://fivelinesofcode.blogspot.com/2014/03/how-to-translate-virtual-to-physical.html
[9] Mateusz Berezecki, Eitan Frachtenberg, Mike Paleczny (Facebook), Kenneth Steele (Tilera), Many-Core Key-Value Store
[10] Carl Ramey (Principal Architect, Tilera Corp), TILE-Gx100 ManyCore Processor: Acceleration Interfaces and Architecture
[11] UG104-IO-Device-Guide
[12] Mondello Filippo, TILERA – TILE64™ Processor, Architectures for Multimedia Systems
[13] UG101-User-Architecture-Reference