Bayesian Statistics and Massive Data Streams
Yoshiyuki Kabashima, Tokyo Institute of Technology

1. Introduction
Given data, inference in Bayesian statistics is carried out by evaluating the posterior probability of unobserved variables. This requires one to take all possible realizations of the variables into account. As a consequence, the cost of the necessary computation grows exponentially in relation to the number of the variables in general. Therefore, the Bayesian approach to massive information processing has been regarded as “unfeasible” for a long time.

2. Statistical mechanics: Bayesian statistics of material objects
Statistical mechanics is a branch of physics which relates the microscopic properties of many elements to the macroscopic behavior of a system that the elements constitute. In this framework, a system in thermal equilibrium is modeled by a probability distribution with respect to the many constituents, and macroscopic behavior is predicted by determining averages of relevant physical quantities with respect to the distribution. This evaluation is parallel to carrying out inference in Bayesian statistics. However, physicists have been utilizing this framework as a “feasible” theory for studying material objects, overcoming the computational difficulty by using numerous approximation techniques and notions inspired by natural phenomena. Recently, such techniques and notions have begun to be employed in research on various inference problems concerning massive data streams.

3. Example: digital wireless communication
As a representative example, let us consider the model of digital wireless communication shown in Fig. 1. This is relevant to communication schemes of cellular phones and wireless LANs in use. Multiple discrete messages are transmitted to a receiver; these messages are linearly transformed and undergo degradation due to channel noise. The receiver has to infer the transmitted messages from the multiple received signals (data). Bayesian statistics offers an optimal inference scheme, which is, unfortunately, computationally difficult to carry out due to the discreteness of the messages. However, statistical mechanics makes it possible to develop nearly optimal and computationally feasible approximate inference algorithms based on “mean field approximation” techniques, accurately assessing the potential abilities and limitations of this system utilizing the notion of “phase transitions” (Fig. 2).

Conclusion
Statistical mechanics can offer practical solutions to the problem of the intrinsic computational difficulty of Bayesian statistics when it is applied to massive data streams. The statistical mechanical approach has been successfully applied not only to the problem of digital wireless communication as shown here, but also to various topics which make use of Bayesian inference, including error correcting codes, cryptography, associative memory, and pattern recognition.

References

Y. Kabashima, A CDMA multiuser detection algorithm on the basis of belief propagation, J. Phys. A 36, 11111-11121 (2003)

T. Tanaka, A statistical-mechanics approach to large-system analysis of CDMA multiuser detectors, IEEE Trans. Inf. Theory 48, 2888-2910 (2002)

Y. Kabashima, T. Murayama and D. Saad, Typical performance of Gallager-type error-correcting codes, Phys. Rev. Lett. 84, 1355-1358 (2000)

Y. Kabashima, T. Murayama and D. Saad, Cryptographical properties of Ising spin systems, Phys. Rev. Lett. 84, 2030-2033 (2000)

S. Uda and Y. Kabashima, Statistical Mechanical Development of a Sparse Bayesian Classifier, J. Phys. Soc. Jpn. 74, 2233-2242 (2005)

Loading more stuff…

Hmm…it looks like things are taking a while to load. Try again?

Loading videos…