CPU cache引发的性能血案

在__为什么转置一个512x512的矩阵,会比513x513的矩阵慢很多?__一文中,作者引用了一个矩阵转置的例子,来讲解由于CPU cache的失效而带来的性能损失。

上面的文章对问题的解释与讨论都非常的透彻。我的这篇文章只是对上面文章的一篇读后感和实验报告。就酱。

CPU cache 之 组相联

组相联

组相联的实现和原理不必再赘述了。我想讨论的是,如何在编程中优化CPU的cache性能。

查看CPU信息

我的系统是Ubuntu 12.04,CPU是i5-3230M。(屌丝机 :D)

wizmann@Wichmann:~$ ls /sys/devices/system/cpu/cpu0/cache/index0/
coherency_line_size  physical_line_partition  size
level                shared_cpu_list          type
number_of_sets       shared_cpu_map           ways_of_associativity

使用cat命令查看文件内容,可以获得CPU L1 cache的一些信息。

Key Value
level …
more ...

玩玩算法题1:Sherlock and Queries

题目大意

给你三个数组:A[N], B[M], C[M]。让你按如下pseudo-code给出的规则计算,求出最终A[N]每一项的值。

for i = 1 to M do
    for j = 1 to N do
        if j % B[i] == 0 then
            A[j] = A[j] * C[i]
        endif
    end do
end do

数据范围

1≤ N,M ≤ 10^5

1 ≤ B[i …

more ...

Codeforces Round #253 Tutorial


443A - Anton and Letters

Simple and easy, solved by two lines of python code.

ls = filter(lambda y: y, map(lambda x: x.strip(), raw_input()[1:-1].split(",")))
print len(set(ls))

443B - Kolya and Tandem Repeat

Brute force. Just enumerate the beginning and the end of the substring, and …

more ...

A simple problem - World at War

Background

This problem is from the book "Algorithm 4th edition" (Exersise 4.1.10)

There are N cities and M undirected roads between those cities. People can travel to any city along the roads.

One day, a war breaks out. Our cities are under attack! As we can't defend all …

more ...

How to implement a queue with stack(s)?

This problem is from the book Algorithms, 4th Edition.

Queue with three stacks. Implement a queue with three stacks so that each queue operation takes a constant (worst-case) number of stack operations.

Warning : high degree of difficulty.

When I search the Internet to find a solution, I find varieties of …

more ...

逗比带你读论文之Barrier

知识共享许可协议

逗比带你读论文之BarrierWizmann 创作

采用 知识共享 署名-非商业性使用-相同方式共享 4.0 国际 许可协议进行许可。

原文地址

Barrier February 17th, 2007

前言:我艹!好多核!

虽然80核心的浮点数运算巨兽离我们还有些遥远,但是多核处理器已经走进了我们的生活。

其实多核处理器已经不是新鲜名词了,在Power Macintosh 9500中,就使用了多核处理器。

现在让我们深入了理解一下多核心处理器吧。

线程技术

一些名词定义

线程

线程只是 抢占调度的(pre-emptively scheduled) 共享地址空间的 执行上下文。

多线程

用来简化控制流和绕开阻塞的系统调用的方法,并非用来实现程序的并行化

并发多线程

在物理上并发的线程,用来利用多核处理器来优化系统性能

为什么“并发多线程”是个神坑

人人都在说”并发多线程“是个神坑,但这种老生常谈并不是因为自然原因 …

more ...

Codeforces Round #242 (Div. 2) Tutorials and Solutions

A. Squats

Trun x => X or X => x to make the number of 'x' is equal to the number of 'X'.

n = int(raw_input())
hamsters = [c for c in raw_input()]

sits = hamsters.count('x')
stands = hamsters.count('X')

if sits == stands:
    print 0
    print ''.join(hamsters)
else:
    if sits > stands …
more ...


"alloca" vs "placement new"

WHAT?!

For most time, we use malloc or new for memory allocation, which will get it on heap.

However, access memory on heap is not as effective as the memory on stack, because the heap is "free-floating region of memory". To the contrary, memory on stack is managed by CPU …

more ...

Quartile and Normal Distribution

What is Quartile?

Quartile is a concept of descriptive statics. The quartiles of a ranked data set are the three points that divide the data set into four equal groups, each group comprising a quarter of data.

Definitions

first quartile (Q1)

First quartile, also known as lower quartile, splits off …

more ...