## Homework 1

**Monte Carlo method** is a method to compute the expectation of a random variable.

The basic steps follow like this:

draw random samples ${x_1, x_2, \cdots, x_n}$ according to $p(x)$.

derive the

*monte carlo estimator*of $E(f)$ from

$$

\hat{f}*{\text{mc}}=\frac{1}{n}\sum*{i=1}^n f(x_i)

$$

### Problem 1

Suppose $x_1, x

2, \cdots, x{10}$ satisfy independent identical distributions i.e. $x_i\sim N(0,1)\quad i=1,2,\cdots,10$Set $f(x)=\sum_{i=1}^{10}x_i^2$, use

Monte Carlo methodto compute $E[f(x)]$.Compare your results with the exact value.

**Solution**:

From basic statistics, we can deduce that

$$

\begin{align*}Ef(x)&=E\sum_{i=1}^{10}x_i^2=10\,Ex_i^2\&=10\,[Dx_i+(Ex_i)^2]\&=10\times(1+0)\&=10\end{align*}

$$

Thus, the exact value of $E[f(x)]$ equals 10.

Continuely, we derive a Monte Carlo method for evaluating $E[f(x)]$

1 | using Distributions |

1 | 9.98580160514502 |

A billion Monte Carlo steps yields an even better estimate:

1 | MC_mean(10^9) |

1 | 10.000288570895224 |

Now, let’s take a brief look at the time cost.

1 | 10^6) MC_mean( |

1 | 0.013395715 |

1 | 10^9) MC_mean( |

1 | 50.939757386 |

When it comes to Monte Carlo simulation, our goal is to get good data as fast as possible. For now, let’s look at how to speed up our code. First of all, we can easily notice the time cost grows out of all proportion. The calculation for 1 million Monte Carlo cost is over 1000 times faster than the billion’s.

1 | (10^9))/( MC_mean(10^6)) MC_mean( |

```
1286.3957285556933
```

Of course, we can do better than this. In preparation for these calculations, let us introduce a **parallel computation** for Monte Carlo simulation. We build a function that takes `MC_mean`

and evenly splits the jobs to our processors:

1 | function parallel_calc(f::Function, N::Int, proc::Int=4) |

```
parallel_calc (generic function with 2 methods)
```

Now, we can quickly and efficiently calculate a billion MC samples:

1 | 10^9) parallel_calc(MC_mean, |

```
21.05725137
```

There is also an easier way to see the speed up of parallel processing by plotting the elapsed time. Below, we define a simple code to plot the elapsed times of `MC_mean`

(in blue) and parallel (in red) vs. the number of MC samplings:

1 | using PyPlot |

```
mean_time_plotter (generic function with 1 method)
```

1 | mean_time_plotter(10^9) |

```
PyObject <matplotlib.legend.Legend object at 0x132777510>
```

We can see that the elapsed time of simulation is approximately equal around 0.7 billion MC samplings, after which parallel processing begins to win in favor of the single processor. This trend is continued when we look at a billion MC samples.

Now, we turn to present this MC data and visualize the error. To do this, let’s write a simple program to plot the output of parallel vs. the number of Monte Carlo samplings. Because we already know what the mean is, let’s also plot the accepted value as a straight line in red. Finally, let’s also calculate the percent error and display it in the title. The format used here will be useful in later code where we have to compare with some exact analytical value. The following tedious codes (if I remember to present) are quite similar to each other.

However, one thing unfortunate is that the parallel efficiency does **not** high within one billion samples. In proper case, for example, we need fast result, single processor is fairly enough.

1 | function MC_mean_plotter(f::Function, N::Int, m::Int) |

```
MC_mean_plotter (generic function with 1 method)
```

To get a nice plot, we take a maximum of a million Monte Carlo samples and plot a total of a hundred points:

1 | MC_mean_plotter(MC_mean, 10^9, 100) |

```
PyObject <matplotlib.legend.Legend object at 0x1326e2cd0>
```

Although we appear to obtain a reasonable value for 10(below 0.00421%!), it is difficult to see how well the data is converging. Therefore, let’s build a new function that calculates a list of $M$ Monte Carlo estimates of 10 at a given value of $N$. From this list, we can obtain the average and standard deviation:

1 | function stat_calc(f::Function, N::Int, M::Int) |

```
stat_calc (generic function with 1 method)
```

Now, we can modify `MC_mean_plotter`

to include error bars to better visualize the convergence of our data:

1 | function MC_errorbar(f::Function, N::Int, m::Int, M::Int) |

```
MC_errorbar (generic function with 1 method)
```

With `MC_mean_plot`

, we can easily generalize the program in future modification.

Now, we plot a hundred values of MC samples with a maximum of a billion, with each set of samples performed fifteen times for statistical analysis. The resulting data shows a clear convergence of error.

1 | MC_errorbar(MC_mean, 10^6, 100, 15) |

```
PyObject <matplotlib.legend.Legend object at 0x1262c3fd0>
```

We can test our result by looking at the convergence of the standard deviation with increasing sample size. Recall the Monte Carlo estimation of the integral of some general function $f(x)$

$$

\int*a^b f(x)dx = \frac{b-a}{N}\sum*{i=0}^{N-1}E(f(x_i))

$$

The st.dev. squared can easily be calculated:

$$

\sigma*{int}^2 = \frac{b-a}{N^2}\left( \sum*{i=0}^{N-1}E(f(x_i)) \right)

$$

If the random variables $x_i$ are uncorrelated, we can easily simplify the above further:

$$

\begin{split}

\sigma*{int}^2 &= \frac{b-a}{N^2}\sum*{i=0}^{N-1}\sigma^2(E(f(x_i))) \

&= \frac{b-a}{N}\sigma^2(E(f(x)))

\end{split}

$$

Therefore, we find that the st.dev. goes as the inverse of the square of the sample size:

$$

\sigma_{int}\propto\frac{1}{\sqrt{N}}

$$

To test this result, we plot the numerically calculated st.dev. alongside $1/\sqrt{N}$

1 | function MC_errorbar_visual(f::Function, N::Int, m::Int, M::Int) |

```
MC_errorbar_visual (generic function with 1 method)
```

Running a maximum of a million samples in steps of a hundred, we find roughly the expected behavior, with each sample repeated fifteen times for statistical analysis.

1 | MC_errorbar_visual(MC_mean, 10^6, 100, 15) |

### Problem 2

Here we want to estimate the probability of a point location, i.e. $P[x\in\Omega]$

we have already learned that

$$

P[x\in\Omega] = \int_{\Omega}p(x)\,dx=\int I(x)p(x)\,dx = E_p[I]

$$

Now, set $\Omega: (x_1-5)^2+(x_2-5)^2\leq 1 $ , $p(x)$ to be the PDF (probability density function) of a binormal distribution i.e. $x_1, x_2 \sim N(0,1)$

Use Importance Sampling method (

choose best IS distribution$q(x)$) to get the best result.

The **importance sampling method** comes from a single idea like this

$$

\begin{align*}P[x\in\Omega]&= E_p[I]\&= \int I(x)p(x)\,dx \&= \int I(x)\frac{p(x)}{q(x)}\cdot q(x)\,dx = \int I(x)w(x)\cdot q(x)\,dx\&= E_q[I\cdot w]\end{align*}

$$

Thus

$$

\hat{I}*\text{IS} = \frac1n \sum*{i=1}^n I(x_i)w(x_i) \qquad x_i\in q(x)

$$

where

$\hat{I}_\text{IS}: $ IS estimator ; $q(x): $ IS distribution ; $w(x) :$ IS weight ;

Assume that $q(x)$ is the PDF of a binormal distribution i.e. $q\sim N({\mu_1,\mu_2},1)$

the problem should be solve in following steps:

- choose samples according to $q(x)$ ;
- use these samples to get the value of weight function $w(x)$ ;
- apply monte carlo method to the calculation of $E[I\cdot w]$ ;
- compare our result with the exact probability and filter the parameters ;

To get the best result, the fourth step is always needed.

Actually, we can directly calculate the exact probability

1 | p=Integrate[ |

which gives

1 | 2.23465*10^-10 |

1 | using Distributions |

A natural thought is to set $\mu1=\mu2=5$, $N=10^5$, but the simulated probability seems to be flawed:

1 | MC_prob(5.0, 5.0, 10^6) |

```
2.2415303094407912e-10
```

Indicated by the parameter choosing rules, we can set $\mu1=\mu2$ and make a simple plot of errors on the specific line $y=x$.

1 | using PyPlot |

```
error_plotter (generic function with 1 method)
```

1 | error_plotter(10^6) |

1 | PyObject <matplotlib.legend.Legend object at 0x123259fd0> |

As shown in the plot, $\mu1=\mu2=3.9$ is the best choice, though it is not always perfect.

For fixed parameters, we can do parallel calculation by setting a new function:

1 | function fixed_MC_prob(N::Int) |

```
fixed_MC_prob (generic function with 1 method)
```

1 | fixed_MC_prob(10^6) |

1 | 2.3208602460224826e-10 |

Recall the `parallel_calc`

function before, it will be much more convenient to get the result in large number simulations by running:

1 | parallel_calc(fixed_MC_prob, 10^9) |

Also, we can obtain the elapsed time in the similar way with `mean_time_plotter`

. (and not explained here)

## Homework 2

### Problem 3

Redo Problem 2 by Cross Entropy method. Find the best distribution $q(x)$ to get the persuasive result.

**CE Method for rare probability estimation**

Consider a rare probability

$$

P(x\in\Omega) \ll 1 \quad\Omega={x\mid g(x)>r }

$$

While applying CE method to solve the optimization problem

$$

\max*{q’\in Q}\frac1n \sum*{i=1}^nf(x_i)\ln q’(x_i)u

$$

we can easily find that $x_i\sim q\approx p$

Thus, $I(x_i)=0$ almost for all $x_i$. Obviously, it does nothing to our work.

Pratically, we usually use the variant of CE method i.e.

**Multilevel / Multistage CE Method**

Denote

$$

I_r(x)=\begin{cases}

1 & g(x)>r\

0 & g(x)<r

\end{cases}

$$

First we do sampling work, then choose $r$ according to the samples. Thus, we can fetch the specific number of $g(x)$ satisfying $g(x)< r$

$$

g(x_1)\leq g(x_2)\cdots\leq g(x_n)

$$

Set $0< \rho\leq 1$ to be the confidence coefficient, define that the $(1-\rho)$-th quantile of a r.v. $y$ is the $r$ satisfying that

$$

P[y\geq r] \geq \rho, \quad P[y\leq r]\leq 1-\rho

$$

Hence, we can derive an estimation of $r$

$$

r=y[(1-\rho)N]

$$

if ${y_i}$ is ordered assendingly $y_i=g(x_i)$

The basic algorithm goes like below:

**Algorithom**

- let $q=p$
- draw samples from $q:{ x_1,\cdot,x_n }$
- compute $\hat{r} = (1-\rho)$-th quantile of ${g(x
*i)}*{i=1}^n$ if $\hat{r}>r$, let $\hat{r}=r$ - solve $\max
*{q’\in Q}\frac1n \sum*{i=1}^nI_{\hat{r}}(x_i)\ln q’(x_i)w(x_i)$; let $q=q’$ - if $\hat{r}<r$, return to step 2, otherwise proceed to step 6
- estimate $\hat{P}=\frac1n \sum
*{i=1}^n I*{r}(x_i)w(x_i)$ with $x_i\sim q(x)$

As for Problem 2, the corresponding functions and parameters should be:

$$

p\sim N(0,I_2) \quad g(x)=-(x-5)^2-(y-5)^2 \quad r=-1

$$

For fixed variance $\Sigma=I_2$, we can directly give the optimal mean value in step 4 as taught in lecture 2:

$$

\mu=\frac{E_q[xIw]}{E_q[Iw]}

$$

Hence, it is sufficient enough to give an implement:

1 | function CE_prob(ρ::Float64, N::Int) |

Let’s make single simulation by setting $\rho = 0.2$ and $N=10^6$, the probability is shown below.

1 | CE_prob(0.2,10^6) |

1 | 2.9121974873535745e-10 |

The result is fair enough, but may raise the question in parameter choice of $\rho$. Now, let’s consider the quantile effect by program below:

1 | using PyPlot |

```
quantile_effect (generic function with 1 method)
```

1 | quantile_effect(10^6) |

```
PyObject <matplotlib.legend.Legend object at 0x136c6e050>
```

As shown in the picture, it is resonable for us to choose $\rho$ in $[0,0.5]$. For clarity, here we set $\rho=0.2$

Rewrite the code in `CE_prob`

by appending $\mu1, \mu2$ to the return command, we will obtain the corresponding distribution and probability:

1 | using Distributions |

1 | CE(10^6) |

```
(4.426701996662324, 4.423183271573601, 2.8762783318578916e-10)
```

## Reference

- Blogs of Monte Carlo work in Github: http://onlookerliu.leanote.com/post/5c355c764900
- Gonçalo, Abecasis. “Monte Carlo Integration”, Biostatistics 615/815, Lecture 22: http://csg.sph.umich.edu/abecasis/class/2006/615.22.pdf
- Ho, Shirley. “Introduction to Monte Carlo”, Astro 542 at Princeton University: http://www.phys.ubbcluj.ro/~zneda/edu/mc/mcshort.pdf
- Julia language guide: http://julia-cn.readthedocs.io/zh_CN/latest/

*Kai Liu, wrote on 2017/10/28*

*Student ID: 116071910015*

*Mail: liu1995@sjtu.edu.cn*