Positive and Negative Randomness
Paul Vitanyi CWI, University of Amsterdam
Joint work with Kolya Vereshchagin
Non-Probabilistic Statistics
Classic Statistics--Recalled
Probabilistic Sufficient Statistic
Kolmogorov complexity
K(x) = length of the shortest description of x; K(x|y) = length of the shortest description of x given y.
A string x is random if K(x) ≥ |x|.
K(x) − K(x|y) is the information y knows about x. Theorem (Symmetry of Mutual Information): K(x) − K(x|y) = K(y) − K(y|x), up to additive logarithmic terms.
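K(x) is uncomputable, but the output length of a real compressor gives an upper bound, which is enough to watch the symmetry of mutual information at work in practice. A minimal sketch using Python's zlib, where C(t + s) − C(t) is only a heuristic stand-in for K(s|t); the helper names and example strings are illustrative assumptions, not part of the theory:

```python
import zlib

def C(s: bytes) -> int:
    """Compressed length: a computable upper bound on K(s)."""
    return len(zlib.compress(s, 9))

def C_cond(s: bytes, t: bytes) -> int:
    """Heuristic stand-in for K(s|t): extra compressed bits for s once t is known."""
    return C(t + s) - C(t)

x = b"abracadabra " * 40
y = b"abracadabra " * 39 + b"abracadabXa "

# Symmetry of mutual information predicts these two quantities are close:
i_xy = C(x) - C_cond(x, y)   # information y knows about x
i_yx = C(y) - C_cond(y, x)   # information x knows about y
print(i_xy, i_yx)
```

Because the two strings share almost all their structure, both estimated information quantities come out clearly positive.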
Randomness Deficiency
Algorithmic Sufficient Statistic where model is a set
Algorithmic sufficient statistic where model is a total computable function
Data is a binary string x; the model is a total computable function p; prefix complexity K(p) is the size of the smallest TM computing p; the data-to-model code length is l_x(p) = min_d {|d| : p(d) = x}.
x is typical for p if δ(x|p) = l_x(p) − K(x|p) is small. p is a sufficient statistic for x if K(p) + l_x(p) = K(x) + O(1) and p(d) = x for the d that achieves l_x(p). Theorem: If p is a sufficient statistic for x, then x is typical for p.
p is a minimal sufficient statistic (sophistication) for x if K(p) is minimal among sufficient statistics.
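As a toy illustration of the two-part code K(p) + l_x(p), take the set-model family S_k = {length-n strings with exactly k ones} (set models as on the earlier slide) and use log2(n+1) as a crude stand-in for the model complexity. The family and the complexity proxy are illustrative assumptions, not the construction on the slide:

```python
from math import comb, log2

def two_part_code(x: str) -> float:
    """Two-part code length for x under the toy set-model family
    S_k = {length-n strings with exactly k ones}:
    model cost (stand-in for K(S_k)) + data-to-model cost log2 |S_k|."""
    n, k = len(x), x.count("1")
    model_cost = log2(n + 1)           # enough bits to specify k in 0..n
    data_to_model = log2(comb(n, k))   # log |S_k|
    return model_cost + data_to_model

print(two_part_code("0110100110010110"))  # typical-looking string: large |S_k|
print(two_part_code("1111111111111111"))  # all ones: |S_n| = 1, the model says it all
```

For the balanced-looking string the data-to-model part dominates; for the all-ones string the set shrinks to a singleton and the whole code length is just the model cost.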
Graph Structure Function
[Figure: the structure function h_x(α) plotted against model cost α, with log |S| on the vertical axis and the lower bound h_x(α) ≥ K(x) − α.]
Minimum Description Length estimator, Relations between estimators
Structure function: h_x(α) = min_S {log |S| : x in S and K(S) ≤ α}.
MDL estimator: λ_x(α) = min_S {log |S| + K(S) : x in S and K(S) ≤ α}.
Best-fit estimator: β_x(α) = min_S {δ(x|S) : x in S and K(S) ≤ α}.
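A sketch of the MDL estimator λ_x(α), restricted to a tiny hand-picked model family with crude stand-ins for K(S). Both the family and the complexity proxies are assumptions for illustration only:

```python
from math import comb, log2

def mdl_estimator(x: str, alpha: float) -> float:
    """lambda_x(alpha) = min over models S with K(S) <= alpha of K(S) + log|S|,
    restricted to a two-member toy family with crude complexity stand-ins:
      S_all = all length-n strings          (cost ~ log2 n,   log|S| = n)
      S_k   = strings with exactly k ones   (cost ~ 2 log2 n, log|S| = log2 C(n,k))
    """
    n, k = len(x), x.count("1")
    candidates = []
    if log2(n) <= alpha:
        candidates.append(log2(n) + n)
    if 2 * log2(n) <= alpha:
        candidates.append(2 * log2(n) + log2(comb(n, k)))
    return min(candidates) if candidates else float("inf")

x = "1" * 8 + "0" * 56
print(mdl_estimator(x, 6))   # only the coarse model fits the budget
print(mdl_estimator(x, 12))  # the finer model is admitted; the code length drops
```

This mirrors the general shape of λ_x(α): as the complexity budget α grows, better-fitting models become admissible and the two-part code length decreases.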
Individual characteristics: more detail, especially for meaningful (nonrandom) data
We flip the graph so that log |S| is on the x-axis and K(S) is on the y-axis. This is essentially the rate-distortion graph for list (set) distortion.
Primogeniture of ML/MDL estimators
• ML/MDL estimators can be approximated from above.
• The best-fit estimator cannot be approximated, either from above or from below, to any precision.
• But the approximable ML/MDL estimators yield the best-fitting models, even though we do not know the quantity of goodness-of-fit: ML/MDL estimators implicitly optimize goodness-of-fit.
Positive and Negative Randomness,
and Probabilistic Models
Precision of following a given function h(α)
[Figure: the realized structure function h_x(α) tracks the given function h(α) to within precision d; axes: model cost α, data-to-model cost log |S|.]
Logarithmic precision is sharp
Lemma. Most strings of length n have structure functions close to the diagonal from (0, n) to (n, 0). Those are the strings of high complexity, K(x) ≥ n.
For strings of low complexity, say K(x) < n/2, the number of candidate function shapes is much greater than the number of such strings. Hence there cannot be a string for every such function. But we show that there is a string for every approximate shape of function.
All degrees of neg. randomness
Theorem: For every length n there are strings x whose minimal sufficient statistic has any complexity between 0 and n (up to a logarithmic term).
Proof. All shapes of the structure function are possible, as long as the shape starts from n − k, decreases monotonically, and is 0 at k for some k ≤ n (up to the precision in the previous slide).
Are there natural examples of negative randomness?
Question: Are there natural examples of strings with large negative randomness? Kolmogorov did not think they exist, but we know they are abundant.
Maybe the information distance between strings x and y yields large negative randomness.
Information Distance:
• Information Distance (Li, Vitanyi, 1996; Bennett, Gacs, Li, Vitanyi, Zurek, 1998)
D(x,y) = min { |p|: p(x)=y & p(y)=x}
where p is a binary program for a universal computer (Lisp, Java, C, a universal Turing machine).
Theorem. (i) D(x,y) = max {K(x|y), K(y|x)}, where K(x|y), the Kolmogorov complexity of x given y, is the length of the shortest binary program that outputs x on input y.
(ii) D(x,y) ≤ D'(x,y) for any computable distance D' satisfying ∑_y 2^(−D'(x,y)) ≤ 1 for every x.
(iii) D(x,y) is a metric.
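D(x,y) itself is uncomputable, but this line of work leads to the practical normalized compression distance, which replaces K by the output length of a real compressor. A minimal sketch with zlib; the example strings are illustrative:

```python
import zlib

def C(s: bytes) -> int:
    """Compressed length: a computable upper bound on K(s)."""
    return len(zlib.compress(s, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: a computable proxy for the
    normalized information distance max{K(x|y), K(y|x)} / max{K(x), K(y)}."""
    cx, cy, cxy = C(x), C(y), C(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"the quick brown fox jumps over the lazy dog " * 20
b = b"the quick brown fox jumps over the lazy cat " * 20
c = bytes(range(256)) * 4
print(ncd(a, b))  # similar texts: low distance
print(ncd(a, c))  # unrelated data: high distance
```

The similar pair scores a much smaller distance than the unrelated pair, reflecting property (i): most of one string's description is reusable for the other.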
Not between random strings
• The information distance between random strings x and y of length n does not work.
• If x, y satisfy K(x|y), K(y|x) > n, then p = x XOR y, where XOR means bitwise exclusive-or, serves as a program to translate x to y and y to x. But if x and y are positively random, it appears that p is so too.
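The XOR argument on this slide is easy to check concretely: the single string p = x XOR y converts x into y and y into x. A minimal sketch; the random byte strings are illustrative:

```python
import secrets

n = 32
x = secrets.token_bytes(n)
y = secrets.token_bytes(n)

p = bytes(u ^ v for u, v in zip(x, y))  # p = x XOR y

# p translates each string into the other:
assert bytes(u ^ v for u, v in zip(p, y)) == x
assert bytes(u ^ v for u, v in zip(p, x)) == y
```

For independently drawn random x and y, p is itself a random-looking string, which is the slide's point: this candidate does not obviously supply negative randomness.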
Selected Bibliography
N.K. Vereshchagin, P.M.B. Vitanyi, A theory of lossy compression of individual data, http://arxiv.org/abs/cs.IT/0411014, Submitted.
P.D. Grunwald, P.M.B. Vitanyi, Shannon information and Kolmogorov complexity, IEEE Trans. Information Theory, Submitted.
N.K. Vereshchagin, P.M.B. Vitanyi, Kolmogorov's structure functions and model selection, IEEE Trans. Inform. Theory, 50:12(2004), 3265-3290.
P. Gacs, J. Tromp, P. Vitanyi, Algorithmic statistics, IEEE Trans. Inform. Theory, 47:6(2001), 2443-2463.
Q. Gao, M. Li, P.M.B. Vitanyi, Applying MDL to learning best model granularity, Artificial Intelligence, 121:1-2(2000), 1-29.
P.M.B. Vitanyi, M. Li, Minimum description length induction, Bayesianism, and Kolmogorov complexity, IEEE Trans. Inform. Theory, IT-46:2(2000), 446-464.