Back in June 2013, I gave a lab tutorial on Monte Carlo methods at Microsoft Research. These tutorials are seminar-talk length (45 minutes) but are supposed to be light, accessible to a general computer science audience, and fun.
In this tutorial I explain and illustrate a number of Monte Carlo methods (rejection sampling, importance sampling, sequential Monte Carlo, Markov chain Monte Carlo, and simulated annealing) on the same problem. Although I am not exactly a comedian, to keep the tutorial fun I peppered the talk with historical anecdotes about the inventors of the methods.
This is the first of three parts.
Part 1
The first part (17 minutes) covers the history of modern Monte Carlo methods, their use in scientific computation, and one of the most basic Monte Carlo methods, rejection sampling.
The video files are also available for offline viewing in MP4/H.264, WebM/VP8, and WebM/VP9 formats.
Transcript
(This is a slightly edited and link-annotated transcript of the audio in the above video. As this is spoken text, it is not as polished as my writing.)
Speaker: Thank you all for coming to this lab tutorial. I know many of you have used Monte Carlo techniques in your research or in your projects. Still, I decided to keep the level of this tutorial very basic: I will show you a few different Monte Carlo methods and how they may be useful in your research. I hope that after the talk you will understand how these methods can be applied and what limitations the different methods have. I will introduce the methods in chronological order and also say a little about their interesting history, how these methods were invented.
But before I get to that, I first want to ask you: do you like to play solitaire? I certainly do play solitaire sometimes, and when you play a couple of games, you realize that some games are actually not solvable. No matter what you try, no matter what you do, they are provably not solvable. So if you shuffle a random deck of 52 cards and deal it out as a solitaire game, it is a valid question to ask: with what probability do you get a solvable game? And it is precisely this question, for the game of Canfield Solitaire, that led to the invention of modern Monte Carlo methods.
One way to attack this problem, instead of trying analytic or mathematical approaches that would have to take into account all the rules of the game, is to just take a deck of cards, shuffle it randomly, play a hundred games, and look at how many times you end up with a solvable game. That would give you a ballpark estimate of the probability. And that is precisely what this man, Stanislaw Ulam, recognized is possible.
I want to say a few words about Stanislaw Ulam because he is so crucial to the invention of Monte Carlo methods. He was born in today's Ukraine, in a town called Lviv (formerly Lemberg, in Austria-Hungary). His family was well off, and he enjoyed a very good education. Very early in his life he discovered that he liked doing mathematics. He was part of the Lviv School of Mathematics, which made many contributions to the more abstract parts of mathematics, such as vector spaces, and he is known for some of those mathematical results. In the 1930s he had to flee to the United States, where he became a professor of mathematics and was recruited to Los Alamos; not to work on the first nuclear weapon, but on the second design, the hydrogen bomb.
During that time, in 1946, while working at Los Alamos, he had a breakdown. For a couple of days he had a headache, then he collapsed and was taken to a hospital. The doctors performed emergency surgery and removed part of his skull: it turned out he had a brain infection, his brain had swollen, and he would have died if the doctors had not performed the operation. The doctors told him, "You have to recover, you have to stay at home for half a year, and do not do any mathematics."
He was obsessed with mathematics his whole life. So instead of doing mathematics, he tried to pass the time playing Canfield Solitaire. And while playing Canfield Solitaire he asked the question, "Okay, what is the probability of solving this game?" With his quite broad knowledge of mathematics he made a few different analytic attempts to answer that question, but ultimately he realized that it is much easier to get an estimate by just playing games randomly.
At that time he was already doing research at Los Alamos, and he recognized that this has applications as well for studying different scientific problems, such as neutron transport, which is essential to understand when designing nuclear weapons. He is the inventor, together with Edward Teller, of the first working hydrogen bomb design, and the inventor of the Monte Carlo method, published a few years later in 1949. He is also known for having performed (with Cornelius Everett) probably the most laborious manual computation ever undertaken, to disprove Edward Teller's earlier nuclear weapon design, to show that it could not work. So, a very interesting history. I will talk a little more about him later.
So nowadays, by Monte Carlo methods I really mean any method where you perform some random experiment, which is typically quite simple, and aggregate the results into inferences about a more complex system. Today, Monte Carlo methods are very popular for simulating complex systems, for example models of physical, biological, or chemical processes, such as weather forecasting, and of course nuclear weapon design. Just last week, they were used to simulate the HIV capsid, a simulation of 64 million atoms and a major breakthrough in understanding HIV. So they have huge applications in scientific simulation, and they also have applications in doing inference in probabilistic models. The most famous system there would be the BUGS system, developed here in Cambridge at the University, initially in the early '90s. Infer.NET also supports Monte Carlo inference, and so does the Filzbach system here at MSRC. There is also a now quite popular system from Columbia University called Stan; it is actually named Stan after Stanislaw Ulam.
Monte Carlo methods can also be used for optimization, so not just for simulating but also for optimizing a cost function. We will see an example later, but typically they are used to optimize very complicated systems, something like a circuit layout with many interdependent constraints. They are also used for planning, for games, and for robotics, where it is essential to approximate intractable quantities, to perform planning under uncertainty, or where measurement noise makes it essential to represent uncertainty faithfully. So there are many, many different applications, too many to really list. I want to pick out one application for the rest of the talk and illustrate the different Monte Carlo methods on it.
And that application is protein folding. Protein folding is happening right now in every cell of your body. In every cell you have a structure called the ribosome, which is basically the factory of the cell: it transforms information encoded in the DNA into one long linear structure, the protein. That structure is a long chain that folds itself into very intricate three-dimensional structures. Very beautiful structures arise, and it is really the three-dimensional shape that this long chain folds into that determines its functional properties; it is essential to understand it in order to make predictions about what these molecules do. The folding can take anywhere between a few milliseconds and a few hours, and I think the state of the art on a modern machine is to be able to accurately simulate something like 60 nanoseconds per computer day. So we are nowhere near being able to accurately fold these structures. But there is the Stanford Folding@home project, which uses Monte Carlo methods, and I think they have something like a hundred and fifty thousand computers working right now on the problem of protein folding. It is quite essential for understanding a couple of different diseases.
We are not going to solve protein folding in this talk; I am going to use a much simpler model. You still have a chain, but first, the chain does not live in three dimensions, it only lives in the plane. And we do not have many different amino acids, we only have two: the black ones and the white ones. The white ones like water and the black ones repel water, and so they fold into something that has a black core and a white surrounding. In fact, I am going to make it even simpler: it lives in the plane, but on a grid. And for the next few slides, I simplify one step further: we only have the white elements.
So that is the so-called 2D lattice self-avoiding random walk model. You have a certain length, say 48 bonds, 48 elements, and you have a self-avoiding walk, so the walk is not allowed to cross itself. This is a very simplified model, but already some interesting questions become very hard or actually intractable. For example, if I fix the number of elements in the walk, one question is: how many self-avoiding walks of that length are there in the plane? Another question is: the number is finite, there are many but finitely many possible configurations, so how do I generate such a walk uniformly at random? And the third question: if I am interested in some average property, for example the average distance between the two end points, how do I compute an approximation to that average quantity?
These are really the typical problems that can be addressed with Monte Carlo methods: average quantities, counting problems, and random sampling problems. That is what is going to be with us for the next few slides.
The first method is a very simple one. It is called rejection sampling, and the idea is really very simple to explain. We have this complicated set, the set of all self-avoiding random walks of a certain length, and we want to generate one element uniformly at random from that set. This is hard. So instead we consider a superset, the set of all random walks of that length, where the walks are allowed to cross themselves, and it is very easy to simulate from that set. So we just simulate from this larger orange set, and whenever we end up outside the blue set we discard that sample; whenever we land inside the blue set, we keep the sample. Because we generate samples uniformly from the larger set, the samples we keep are uniform on the blue set, so we can just keep doing this and collect a sample whenever we hit an element of the blue set.
In practice this works as follows: we start and keep appending steps in a randomly chosen direction, one out of three, say, and if we happen to cross ourselves we can already discard that sample and start over. We keep all the walks that we manage to grow to the full length we want, collect maybe a thousand of them, and compute whatever property we want from that sample set.
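To make this concrete, here is a minimal Python sketch of such a rejection sampler with early rejection. This is my own illustration, not the code behind the plots in the talk, and all names are made up. By symmetry it fixes the first step to the right, and it draws each later step from the three directions that do not immediately reverse the previous one, matching the "one out of three" above.

```python
import random

# Unit steps on the square lattice: right, up, left, down.
STEPS = [(1, 0), (0, 1), (-1, 0), (0, -1)]

def try_walk(n_steps):
    """Attempt to grow one self-avoiding walk of n_steps steps.

    Returns the list of visited lattice points, or None as soon as
    the walk crosses itself (early rejection).
    """
    walk = [(0, 0), (1, 0)]     # first step fixed to the right (by symmetry)
    visited = set(walk)
    prev = (1, 0)
    for _ in range(n_steps - 1):
        # Choose uniformly among the three non-reversing directions.
        dx, dy = random.choice([s for s in STEPS if s != (-prev[0], -prev[1])])
        nxt = (walk[-1][0] + dx, walk[-1][1] + dy)
        if nxt in visited:
            return None         # crossed itself: discard and start over
        visited.add(nxt)
        walk.append(nxt)
        prev = (dx, dy)
    return walk

def collect_walks(n_steps, attempts):
    """Keep every attempted walk that reaches full length self-avoiding."""
    return [w for w in (try_walk(n_steps) for _ in range(attempts)) if w is not None]
```

From a thousand or so accepted walks one can then average any property of interest, for example the squared end-to-end distance of a walk `w`, which is just `(w[-1][0] - w[0][0])**2 + (w[-1][1] - w[0][1])**2`.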
Attendee 1: May I ask a question?
Speaker: Yes, sure.
Attendee 1: What happens when instead you say, "Oh dear, I should not have gone down, I should have gone in a different direction"? Do you then just get a biased sample or something?
Speaker: You are anticipating the future. We are still in 1949.
Attendee 1: But I thought this was the-- you said, right to begin with, generating the ones that don't cross themselves is hard.
Speaker: Yes and it would still be hard-- just bear with me for a few slides, this is actually where it's leading to.
Attendee 1: Okay.
Speaker: But this is a simple method. And once we have generated the set of samples, we can compute average properties. For example, the squared extension, where you compute the squared distance in the plane between the two end points; that is a model problem people have considered. More generally, what we would like to do is compute expectations. We have a distribution \(\pi(x)\) over some state space, and we would like to evaluate the expectation of some quantity \(\phi(x)\), for example the distance between the two end points of a given state. That expectation is a sum, an average quantity, and in this case the sum contains exponentially many terms. The Monte Carlo idea is to simply approximate that huge sum with exponentially many terms by one with only, say, a thousand or ten thousand terms: the samples we generated.
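Written out in symbols (my notation, following the description above), the Monte Carlo idea is

\[
\mathbb{E}_{\pi}[\phi(X)] \;=\; \sum_{x} \pi(x)\,\phi(x) \;\approx\; \frac{1}{N} \sum_{i=1}^{N} \phi\big(x^{(i)}\big), \qquad x^{(i)} \sim \pi \ \text{i.i.d.},
\]

where the sum on the left has exponentially many terms, while \(N\) on the right is only the number of samples, say a thousand or ten thousand.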
Here we actually do this rejection sampling as a function of the chain length. I make 10 million attempts at each length and keep all the samples that happen to be self-avoiding. Then I can plot this average distance, and because it is an average of many terms I know that the central limit theorem applies, so I can also plot confidence intervals. So I not only get the inference I am interested in, I also get a confidence interval that captures the true value with a certain probability.
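Concretely, one standard CLT-based recipe (not necessarily the exact one used for the plot) gives an approximate 95% confidence interval

\[
\hat{\phi}_N \pm 1.96\,\frac{\hat{\sigma}_N}{\sqrt{N}}, \qquad \hat{\phi}_N = \frac{1}{N}\sum_{i=1}^{N}\phi\big(x^{(i)}\big), \quad \hat{\sigma}_N^2 = \frac{1}{N-1}\sum_{i=1}^{N}\Big(\phi\big(x^{(i)}\big) - \hat{\phi}_N\Big)^2,
\]

where \(N\) is the number of accepted samples at that chain length.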
And it works up to a chain length of thirty, so already quite long chains. Then the confidence intervals become larger, because fewer and fewer samples are accepted. I used 10 million attempts here, but similar methods are very useful even with a few hundred attempts. Here is a picture from around that time at Los Alamos, where they performed the simulation manually with this drawing device, basically on a sheet of paper: whenever the path crossed from one type of material to another, they would change the wheels, roll a new random number, and move and turn the device in a random direction. They did this a few hundred times and got a global picture of how the neutrons scatter in the material. Because everything was named MANIAC, ENIAC, and so on, and the idea was from Enrico Fermi, they called this device the FERMIAC.
But anyway, another thing we can do is solve the counting problem. We can estimate the acceptance rate: out of the number of attempts we made, how many were accepted, that is, happened to be self-avoiding. Then we can estimate the number of self-avoiding walks simply as the product of this acceptance rate with the total number of 2D walks that are not necessarily self-avoiding. That number is easy to calculate, because the first step was in a fixed direction (to the right) and we had three possibilities at each subsequent step, so we have a simple formula for it, and this gives an estimate of the number of self-avoiding walks. Here is a plot of that. In a paper from 2005 I found that people have exhaustively computed these counts with clever enumeration methods up to a length of 59; beyond that the exact number is unknown. The estimate happens to agree very well with these known ground truths.
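In symbols (my notation): with the first step fixed and immediate reversals excluded, there are \(3^{n-1}\) unrestricted walks of \(n\) steps, so the count of self-avoiding walks is estimated as

\[
\hat{c}_n \;=\; \hat{p}\cdot 3^{\,n-1}, \qquad \hat{p} \;=\; \frac{\text{number of accepted attempts}}{\text{number of attempts}}.
\]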
Attendee 1: Quick question. Is that even with your early rejection business?
Speaker: Yes, that is the same thing.
Attendee 1: Okay.
Speaker: It is exactly with the rejection sampler here. So the acceptance rate is from the rejection sampler; \(P\) is estimated from the rejection sampler.
Attendee 1: What is the acceptance rate when you get to 30?
Speaker: Again, next slide here.
Speaker: I am impressed. One second. Let us first enjoy what we have achieved; let us take a look at Monte Carlo and enjoy some sunshine. So, the name Monte Carlo: what first comes to mind are the casinos, right? And the gambling. That is indeed one of the origins of the name. But the particular reason, and the person who suggested it, was Nicholas Metropolis, a colleague of Stanislaw Ulam. He was very much amused by the stories Stan told about his uncle, Michael Ulam, who was a wealthy businessman in his hometown of Lviv, then switched to the finance industry and spent the rest of his life gambling away his fortune in Monte Carlo. Nicholas Metropolis found this so amusing that he insisted the method be called the Monte Carlo method. So that is the real reason why it is called the Monte Carlo method.
But it is not all sunny, and that is where we come to this slide, which shows the acceptance rate as a function of the chain length. You see that the simple rejection sampler breaks down for long enough chains. Intuitively, when you grow the chain very long, the probability of crossing yourself while walking randomly becomes higher and higher, and the acceptance rate becomes very, very small. I think for a million attempts I had only something like 15 walks accepted at a length of 30. That is why the confidence intervals were blowing up: the estimates become unreliable.
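A known way to quantify this decay (not stated in the talk): the number of \(n\)-step self-avoiding walks on the square lattice grows roughly like \(\mu^{n}\), where \(\mu \approx 2.638\) is the connective constant, while the superset of non-reversing walks grows like \(3^{n}\). The acceptance rate therefore shrinks geometrically,

\[
p_n \;\sim\; \Big(\frac{\mu}{3}\Big)^{n} \quad \text{up to polynomial corrections},
\]

so no fixed budget of attempts survives to long chains.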
The next part is available.