陈彦志のBlog

Rice’s Theorem

“Any non-trivial property of the behavior of programs in a r.e. language is undecidable.”

r.e. (recursively enumerable) = recognizable by a Turing machine

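Essentially every property we care about and want to check with static analysis is non-trivial.
In other words, no static analysis algorithm can give an exactly correct answer about a non-trivial property for every program.
A perfect static analysis would have to be both sound and complete. Sound roughly means comprehensive: every real behavior is covered, so nothing is missed. Complete roughly means accurate: everything it reports is real.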

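Note: this figure differs slightly from the original author's. The original places False Negatives below Truth, which I believe is wrong; this figure is drawn according to my own understanding.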

Although Rice’s Theorem rules out a perfect static analysis, we can still build a useful one.
Generally speaking, a useful static analysis algorithm either compromises soundness or compromises completeness.

Compromise soundness (false negatives)
Compromise completeness (false positives)

A false positive is an error in binary classification in which a test result incorrectly indicates the presence of a condition (such as a disease when the disease is not present), while a false negative is the opposite error, where the test result incorrectly indicates the absence of a condition when it is actually present. These are the two kinds of errors in a binary test.
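
To make the two kinds of error concrete, here is a minimal sketch that treats an analysis result as a set of reported issues and compares it against the ground truth. The bug names and the three sets are invented purely for illustration; they do not come from any real analyzer.

```python
# Hypothetical ground truth: the bugs that really exist in some program.
truth = {"bug1", "bug2", "bug3"}

# Hypothetical reports from two imagined analyzers.
sound_report = {"bug1", "bug2", "bug3", "bug4"}  # over-approximation of truth
complete_report = {"bug1", "bug2"}               # under-approximation of truth


def false_positives(report, truth):
    """Issues that were reported but are not real."""
    return report - truth


def false_negatives(report, truth):
    """Real issues that were not reported."""
    return truth - report


def is_sound(report, truth):
    """Sound: every real issue is reported (no false negatives)."""
    return truth <= report


def is_complete(report, truth):
    """Complete: every reported issue is real (no false positives)."""
    return report <= truth


print(false_positives(sound_report, truth))     # the spurious "bug4"
print(false_negatives(complete_report, truth))  # the missed "bug3"
```

In this toy setting the sound report has one false positive and no false negatives, while the complete report has one false negative and no false positives.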

Most static analyzers in practice are sound.

Static Analysis: ensure (or get close to) soundness, while making good trade-offs between analysis precision and analysis speed.

Abstraction + Over-approximation

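Static analysis can be divided into may analysis and must analysis.
May analysis can be summed up in two words: abstraction and over-approximation.
Abstraction is shared by both may analysis and must analysis. Since over-approximation is what characterizes may analysis, it follows from the figure above that under-approximation is what characterizes must analysis.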

Abstraction

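Abstraction maps the concrete values in a program onto a small, fixed set of symbols. The point is to shrink the state space and thereby speed up the analysis. Here is an example.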

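This example defines five symbols: positive, negative, zero, unknown (usually called top), and undefined (usually called bottom). The first four cases in the figure above are easy to understand, so I won't go over them; I'll mainly explain the fifth and sixth.
In the fifth case, the variable v has two possible values, either the positive number 1 or the negative number -1, so it doesn't fit into any of the positive, negative, or zero classes. We add a new class, unknown, for statements with more than one possible outcome.
In the sixth case, the variable v triggers a division-by-zero exception. Much like the fifth case, we add another class, undefined, for statements that raise exceptions.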
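
The mapping described above can be sketched as a small abstraction function. This is only an illustration under my own assumptions: the names `Sign`, `sign_of`, and `abstract` are made up, and a statement that raises an exception is modeled as having an empty set of possible values.

```python
from enum import Enum


class Sign(Enum):
    POS = "+"
    NEG = "-"
    ZERO = "0"
    TOP = "unknown"
    BOTTOM = "undefined"


def sign_of(n):
    """Sign of a single concrete integer."""
    if n > 0:
        return Sign.POS
    if n < 0:
        return Sign.NEG
    return Sign.ZERO


def abstract(values):
    """Map the set of concrete values a variable may hold to one symbol.

    Assumption: an empty set models a statement that yields no value at all,
    e.g. because it divides by zero.
    """
    signs = {sign_of(v) for v in values}
    if not signs:
        return Sign.BOTTOM   # no value is produced: undefined
    if len(signs) == 1:
        return signs.pop()   # all possible values agree on one sign
    return Sign.TOP          # several signs are possible: unknown


print(abstract({1000}))   # a variable that is always 1000
print(abstract({1, -1}))  # a variable that is either 1 or -1
print(abstract(set()))    # a variable assigned from a division by zero
```

However many concrete values a variable could take, the analysis only ever has to track one of five symbols, which is where the speed-up comes from.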

Over-approximation

Over-approximation covers two things: transfer functions and control flows.

Transfer functions

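In static analysis, transfer functions are a set of transformation rules designed according to the goal of the analysis and the semantics of each kind of program statement.
Transfer functions operate on the abstract domain.
For example, we can define the eight rules in the figure below.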

With the eight rules in the figure above, we can already do things like division-by-zero detection or negative array index detection.

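Looking at the results produced by the transfer functions in the figure above, it is easy to see that the last three statements may raise exceptions. To know exactly which statement is a division by zero and which is a negative array index, though, we would need further static analysis.
It is also worth pointing out that abstraction ignores concrete values. In this example the last statement does not actually raise an exception, but our over-approximated static analysis still flags it, which is a false positive. So the analysis does raise a false alarm here, but overall it is still useful.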
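
To show how such rules play out, here is a minimal sketch of sign-domain transfer functions. The rule set, the helper names (`add`, `div`, `index`), and the eight-statement program are my own assumptions chosen to match the description above, not a copy of the figure. Division is treated as exact (non-truncating), and a divisor or index that might be bad is conservatively mapped to undefined.

```python
POS, NEG, ZERO, TOP, BOT = "+", "-", "0", "unknown", "undefined"


def add(a, b):
    """Abstract addition on signs."""
    if BOT in (a, b):
        return BOT
    if a == ZERO:
        return b
    if b == ZERO:
        return a
    if a == b:
        return a    # + plus + is +, - plus - is -, unknown stays unknown
    return TOP      # e.g. + plus -: the sign depends on the concrete values


def div(a, b):
    """Abstract division; a divisor that may be zero is flagged as undefined."""
    if BOT in (a, b) or b in (ZERO, TOP):
        return BOT
    if a == ZERO:
        return ZERO
    if a == TOP:
        return TOP
    return POS if a == b else NEG


def index(i):
    """Abstract array read arr[i]; an index that may be negative is flagged."""
    if i in (NEG, TOP, BOT):
        return BOT
    return TOP      # a valid read, but the element's sign is unknown


# Hypothetical program, with the concrete statement in each comment.
x = POS         # x = 10
y = NEG         # y = -1
z = ZERO        # z = 0
a = add(x, y)   # a = x + y   (concretely 9, abstractly unknown)
b = div(z, y)   # b = z / y   (zero)
c = div(a, b)   # c = a / b   (division by zero)
p = index(y)    # p = arr[y]  (negative index)
q = index(a)    # q = arr[a]  (flagged, although concretely a is 9)

print(a, b, c, p, q)
```

The last statement is the false positive: concretely `a` is 9, but the abstract value of `a` is unknown, so the analysis cannot rule out a negative index and reports it anyway.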

Control Flows

You could say that transfer functions are the over-approximation of individual statements, while control flow is the over-approximation of the relationships between statements.

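In practice, enumerating every path is almost impossible, so the vast majority of static analyses use flow merging on the control flow to speed things up: whenever several arrows point to the same statement, their states are merged, as shown by the green box in the figure above.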
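
Flow merging can be sketched as a join of the abstract states arriving at the same statement. The function names and the two-branch example below are assumptions for illustration. The sketch only covers the positive, negative, zero, and unknown symbols and leaves out any special handling of undefined.

```python
POS, NEG, ZERO, TOP = "+", "-", "0", "unknown"


def join(a, b):
    """Merge two abstract values: keep the sign if both agree, else unknown."""
    return a if a == b else TOP


def merge_states(s1, s2):
    """Merge the variable-to-sign maps of two incoming control-flow edges.

    Assumption: a variable missing on one edge is treated as unknown.
    """
    merged = {}
    for var in sorted(set(s1) | set(s2)):
        merged[var] = join(s1.get(var, TOP), s2.get(var, TOP))
    return merged


# Hypothetical branch:  if cond: x = 1; y = 5   else: x = -1; y = 7
then_state = {"x": POS, "y": POS}
else_state = {"x": NEG, "y": POS}

# State at the statement where both branches meet.
after_if = merge_states(then_state, else_state)
print(after_if)
```

Instead of analyzing the rest of the program once per path, the analysis continues from the single merged state, at the cost of losing the information about which branch set `x`.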

reference: https://www.bilibili.com/video/BV1b7411K7P4


Basic Concepts of Static Analysis