Python cache hierarchy simulator
pycachesim
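A single-core cache hierarchy simulator written in Python.
The goal is to accurately simulate the caching behavior (allocation, hit, miss, replacement, eviction) of all cache levels found in modern processors. It was developed as a backend for kerncraft, but a command-line interface for replaying load/store instructions is also planned.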
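- Currently supported features:
  - Inclusive cache hierarchies
  - LRU, MRU, RR and FIFO policies
  - N-way cache associativity
  - Write-allocate with write-back caches
  - Non-write-allocate with write-through caches
  - Write-combining with sub-blocking
  - Tracking of cache-line states (e.g., using dirty bits)
  - Speed (the core is implemented in C)
  - Python 2.7+ and 3.4+ support, with no other dependencies
- Planned features:
  - Report cache lines on all levels (preliminary support through backend.verbosity > 0)
  - Report a timeline of cache events (preliminary support through backend.verbosity > 0)
  - Visualize events (HTML file?)
  - Interface to the Valgrind infrastructure (see Lackey) for access history replay
  - (uncertain) Instruction cache
  - Optional classification into compulsory, capacity and conflict misses (by simulating other cache configurations in parallel)
  - (uncertain) Multi-core support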
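The four replacement policies in the feature list differ only in which line of a full set they evict. The following is a minimal sketch of that choice, not pycachesim's C implementation: the function name `choose_victim` and the per-way bookkeeping fields `inserted` and `last_used` are assumptions made for illustration.

```python
import random


def choose_victim(policy, ways, rng=random):
    """Pick the cache line to evict from a full set.

    `ways` is a list of dicts with the hypothetical fields:
      line      - cache-line id stored in this way
      inserted  - logical time the line was placed in the set
      last_used - logical time of the most recent access
    """
    if policy == "LRU":   # least recently used: oldest access goes
        return min(ways, key=lambda w: w["last_used"])["line"]
    if policy == "MRU":   # most recently used: newest access goes
        return max(ways, key=lambda w: w["last_used"])["line"]
    if policy == "FIFO":  # first in, first out: oldest insertion goes
        return min(ways, key=lambda w: w["inserted"])["line"]
    if policy == "RR":    # random replacement: any way may go
        return rng.choice(ways)["line"]
    raise ValueError("unknown policy: %r" % (policy,))


full_set = [
    {"line": 10, "inserted": 0, "last_used": 5},
    {"line": 11, "inserted": 1, "last_used": 2},
    {"line": 12, "inserted": 2, "last_used": 4},
]
for policy in ("LRU", "MRU", "FIFO", "RR"):
    print(policy, choose_victim(policy, full_set, rng=random.Random(0)))
```

Line 10 was inserted first but used last, so FIFO and MRU both evict it while LRU evicts line 11.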
License
pycachesim is licensed under AGPLv3.
Usage

    from cachesim import CacheSimulator, Cache, MainMemory

    mem = MainMemory()
    l3 = Cache("L3", 20480, 16, 64, "LRU")  # 20MB: 20480 sets, 16-ways with cacheline size of 64 bytes
    mem.load_to(l3)
    mem.store_from(l3)
    l2 = Cache("L2", 512, 8, 64, "LRU", store_to=l3, load_from=l3)  # 256KB
    l1 = Cache("L1", 64, 8, 64, "LRU", store_to=l2, load_from=l2)  # 32KB
    cs = CacheSimulator(l1, mem)

    cs.load(2342)  # Loads one byte from address 2342, should be a miss in all cache-levels
    cs.store(512, length=8)  # Stores 8 bytes to addresses 512-519,
                             # will also be a load miss (due to write-allocate)
    cs.load(512, length=8)  # Loads from address 512 until (exclusive) 520 (eight bytes)

    cs.force_write_back()
    cs.print_stats()

This should return:

    CACHE *******HIT******** *******MISS******* *******LOAD******* ******STORE*******
       L1      1 (       8B)      2 (      65B)      3 (      73B)      1 (       8B)
       L2      0 (       0B)      2 (     128B)      2 (     128B)      1 (      64B)
       L3      0 (       0B)      2 (     128B)      2 (     128B)      1 (      64B)
      MEM      2 (     128B)      0 (       0B)      2 (     128B)      1 (      64B)

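Each row refers to one memory level, starting with L1 and ending with main memory. The 3 loads in L1 are the sum of all individual accesses to the cache hierarchy: 1 (from the first load) + 1 (from the store with write-allocate) + 1 (from the second load) = 3.
The 1 hit is for bytes that were already cached. Internally, pycachesim operates on cache lines, and all addresses are converted to cache lines. The 2 misses that pass through all cache levels are therefore two complete cache lines, and once a cache line has been loaded, subsequent accesses to the same cache line are handled as hits. This is also why the data sizes increase from L1 to L2: L1 is accessed byte-wise, while L2 is accessed only at cache-line granularity.
So: hits, misses, stores and loads in L1 are counted byte-wise. All other statistics are based on cache lines.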
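The conversion from byte addresses to cache lines can be retraced by hand. The sketch below is plain Python and does not call pycachesim; `lines_touched` and `capacity_bytes` are hypothetical helper names, and the residency check is a simplification that ignores evictions (none occur for these three accesses). It reproduces the two cache-line misses (128 B) reported for L2, L3 and MEM, and the cache sizes noted in the comments of the usage example.

```python
CL_SIZE = 64  # cache-line size in bytes, as in the usage example


def lines_touched(addr, length=1, cl_size=CL_SIZE):
    """Return the ids of all cache lines covered by [addr, addr + length)."""
    first = addr // cl_size
    last = (addr + length - 1) // cl_size
    return list(range(first, last + 1))


def capacity_bytes(sets, ways, cl_size=CL_SIZE):
    """Total capacity of a set-associative cache."""
    return sets * ways * cl_size


# The three accesses from the usage example: (kind, address, length)
accesses = [("load", 2342, 1), ("store", 512, 8), ("load", 512, 8)]

resident = set()   # cache lines already brought into the hierarchy
line_misses = 0
for kind, addr, length in accesses:
    for line in lines_touched(addr, length):
        if line not in resident:  # first touch: the whole line is fetched
            resident.add(line)
            line_misses += 1
        print(kind, addr, "-> cache line", line)

print("line misses:", line_misses, "bytes moved:", line_misses * CL_SIZE)
print("L3:", capacity_bytes(20480, 16), "L2:", capacity_bytes(512, 8),
      "L1:", capacity_bytes(64, 8))
```

Address 2342 falls into cache line 36 and addresses 512-519 into cache line 8, so the store and the second load share one line and only two lines are ever fetched.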
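When using victim caches, setting victims_to to the victim cache level causes pycachesim to forward unmodified cache lines to this level on replacement. During a miss, the victim cache is checked for the cache line and is only counted as accessed (and hit) if the line is found there. This means that in a victim cache the load statistics equal the hit statistics, and misses should always be zero.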
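The accounting described above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not pycachesim's implementation: both levels are fully associative with LRU replacement, only loads are modeled (so every evicted line is unmodified), and a line found in the victim cache is moved back to the main cache. The class name `ToyVictimHierarchy` and its counters are hypothetical.

```python
from collections import OrderedDict


class ToyVictimHierarchy:
    """A small LRU cache whose evicted lines go to an LRU victim cache."""

    def __init__(self, main_lines, victim_lines):
        self.main_lines = main_lines
        self.victim_lines = victim_lines
        self.main = OrderedDict()    # line id -> True, oldest first
        self.victim = OrderedDict()
        self.stats = {"main_hit": 0, "main_miss": 0,
                      "victim_load": 0, "victim_hit": 0, "victim_miss": 0}

    def load(self, line):
        if line in self.main:
            self.main.move_to_end(line)  # refresh LRU position
            self.stats["main_hit"] += 1
            return
        self.stats["main_miss"] += 1
        if line in self.victim:
            # The victim cache only counts an access when it holds the
            # line, so every victim load is also a victim hit.
            del self.victim[line]
            self.stats["victim_load"] += 1
            self.stats["victim_hit"] += 1
        # Otherwise the line comes from the next level; the victim cache
        # is not charged with a miss.
        self._insert(line)

    def _insert(self, line):
        if len(self.main) >= self.main_lines:
            evicted, _ = self.main.popitem(last=False)  # evict LRU line
            self.victim[evicted] = True                 # forward to victim
            if len(self.victim) > self.victim_lines:
                self.victim.popitem(last=False)
        self.main[line] = True


h = ToyVictimHierarchy(main_lines=2, victim_lines=2)
for line in (0, 1, 2, 0):
    h.load(line)
print(h.stats)
```

In this trace, line 0 is evicted when line 2 arrives and is later served from the victim cache, giving one victim load, one victim hit and zero victim misses.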
Comparison to other Cache Simulators
While searching for a more versatile cache simulator for kerncraft, I came across the following:
- gem5: A very fully featured full-system simulator. Extracting only the memory subsystem is complex.
- dineroIV: Nice and simple code, but it does not support exclusive caches and is not available under an open source license.
- cachegrind: Maintained and stable code of a well-established open source project, but it only supports inclusive first and last level caches.
- callgrind: see cachegrind
- SMPcache: Supports only a single cache and runs on Windows with a GUI. Also not freely available.
- CMPsim: Was only academically published and source code never made available.
- CASPER: Was only academically published and source code never made available.
Package | instructions [0] | blocks [1] | sub-blocks [2] | associativity [3] | LRU [4] | MRU [4] | FIFO [4] | RR [4] | CCC [5] | 3+ levels [6] | exclusive [7] | victim [8] | multi-core [9] | API [10] | open source [11] |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
gem5 | x | x | ? | x | x | x | x | ? | ? | x | ? | ? | ? | python, ruby, c++ | yes, BSD-style |
dineroIV | x | x | x | x | x | x | x | x | x | c | no, free for non-commercial use | ||||
cachegrind | x | x | x | x | cli | yes, GPLv2 | |||||||||
callgrind | x | x | x | x | cli | yes, GPLv2 | |||||||||
SMPcache | x | x | x | x | x | ? | Windows GUI | no, free for education and research | |||||||
CMPsim | x | x | x | x | x | x | x | ? | ? | x | ? | no, source not public | |||
CASPER | x | x | x | x | x | x | x | x | x | x | x | perl, c | no, source not public | ||
pycachesim |  | x | x | x | x | x | x | x |  | x | x | x |  | python, C backend | yes, AGPLv3 |
[0] | Instruction cache support (typically L1I) |
[1] | Cacheline/block granular caching |
[2] | Sub-blocking/sectoring for in-cache storage |
[3] | Support for n-way associativity |
[4] | Support for least-recently-used (LRU), most-recently-used (MRU), first-in-first-out (FIFO) and random (RR) replacement policies |
[5] | Classification of misses into: compulsory (first time access), capacity (access after replacement), conflict (would have been a hit with full-associativity) |
[6] | Combining of at least three cache levels |
[7] | Exclusive cache relations (two levels may not share the same cacheline) |
[8] | Victim caches, where only evicted lines end up (e.g., AMD Bulldozer L3) |
[9] | Multi-core cache hierarchies with private and shared caches and cache coherency protocol |
[10] | Supported interfaces (cli = command-line-interface) |
[11] | Published under an Open Source Initiative approved license? |
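The miss classification in footnote [5] (the CCC column) can be sketched by running a fully associative cache of the same total size alongside the set-associative one. This is an illustrative sketch only, not a pycachesim feature: it assumes LRU replacement in both caches and a simple modulo mapping from cache-line id to set, and the names `LRUSet` and `classify_misses` are hypothetical.

```python
from collections import OrderedDict


class LRUSet:
    """A group of `ways` cache lines with LRU replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # line id -> True, oldest first

    def access(self, line):
        """Return True on a hit; on a miss, insert the line (evicting LRU)."""
        if line in self.lines:
            self.lines.move_to_end(line)
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)
        self.lines[line] = True
        return False


def classify_misses(trace, sets, ways):
    """Count hits and compulsory/capacity/conflict misses for a line trace."""
    cache = [LRUSet(ways) for _ in range(sets)]
    fully_assoc = LRUSet(sets * ways)  # same capacity, a single set
    seen = set()
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}
    for line in trace:
        hit = cache[line % sets].access(line)
        fa_hit = fully_assoc.access(line)  # always advance the shadow cache
        if hit:
            counts["hit"] += 1
        elif line not in seen:
            counts["compulsory"] += 1      # first access ever
        elif fa_hit:
            counts["conflict"] += 1        # full associativity would hit
        else:
            counts["capacity"] += 1        # too small even if fully associative
        seen.add(line)
    return counts


# Direct-mapped cache with two sets: lines 0 and 2 compete for set 0.
print(classify_misses([0, 2, 0], sets=2, ways=1))
print(classify_misses([0, 1, 2, 3, 0], sets=2, ways=1))
```

In the first trace the second access to line 0 is a conflict miss, because a fully associative cache with two lines would still hold it. In the second trace the final access to line 0 is a capacity miss, because four distinct lines have passed through a cache that holds only two.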