
The posterior is proportional to prior times likelihood: $p(\lambda\g r) \,\propto\, e^{-\lambda}\,\frac{\lambda^{r-1}}{r!}.$ Define an ‘energy’, equal to the negative log posterior up to a constant: $E \,=\, \lambda - (r\tm1)\log\lambda.$ First find the MAP estimate, by minimizing the energy: $\pdd{E}{\lambda} = 1 - \frac{r-1}{\lambda}, \qquad \Rightarrow \lambda_\mathrm{MAP} = r-1.$ Then measure the curvature at the minimum: $\left.\pdd{^2E}{\lambda^2}\right|_{\lambda=\lambda_\mathrm{MAP}} = \frac{r-1}{\lambda^2_\mathrm{MAP}} = \frac{1}{r-1}.$ The variance of the Gaussian approximation is the inverse of this curvature, $$r\tm1$$. Unless $$r>1$$, we don’t have a mode at $$\lambda\!>\!0$$ to expand around sensibly.

$\Rightarrow \fbox{p(\lambda\g r) \approx \N(\lambda;\; r\tm1,\; r\tm1),~~~~~r>1.}$
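The derivation above can be checked numerically. The following is a minimal sketch, not the code used for these notes: the function names (`energy`, `laplace_lambda`, and the finite-difference helpers) and the choice $$r\te20$$ are illustrative assumptions. It evaluates the closed-form mode and variance, then confirms with finite differences that the slope of the energy vanishes at the mode and that the curvature there is $$1/(r\tm1)$$.

```python
import math

def energy(lam, r):
    """Negative log posterior over lambda, up to an additive constant."""
    return lam - (r - 1) * math.log(lam)

def laplace_lambda(r):
    """Mean and variance of the Gaussian approximation over lambda."""
    if r <= 1:
        raise ValueError("need r > 1 for a mode at lambda > 0")
    lam_map = r - 1.0                      # where dE/dlambda = 0
    curvature = (r - 1.0) / lam_map ** 2   # d^2E/dlambda^2 at the mode
    return lam_map, 1.0 / curvature        # variance = inverse curvature

def first_derivative(f, x, h=1e-3):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def second_derivative(f, x, h=1e-3):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h ** 2

r = 20  # illustrative count, chosen to match one of the plots
mean, var = laplace_lambda(r)
f = lambda lam: energy(lam, r)
numeric_slope = first_derivative(f, mean)
numeric_curvature = second_derivative(f, mean)
print("mean:", mean, "variance:", var)
print("slope at mode:", numeric_slope)
print("curvature at mode:", numeric_curvature, "expected:", 1.0 / (r - 1))
```

Calling `laplace_lambda(1)` raises an error, mirroring the requirement $$r>1$$.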

The posterior over $$\ell \te \log\lambda$$ is proportional to prior times likelihood: $p(\log\lambda\g r) \,\propto\, e^{-\lambda}\,\frac{\lambda^r}{r!}.$ Compared with the posterior over $$\lambda$$, the power of $$\lambda$$ is one higher, because changing variables multiplies the density by $$|\mathrm{d}\lambda/\mathrm{d}\ell| \te \lambda$$. Define an ‘energy’, equal to the negative log posterior up to a constant: $E = \lambda - r\log\lambda = e^\ell - r\ell.$ First find the MAP estimate of $$\ell$$, by minimizing the energy: $\pdd{E}{\ell} = e^\ell - r, \quad \Rightarrow~ \ell_\mathrm{MAP} = \log r.$ We now only need $$r\!>\!0$$ for the approximation to make sense. Then measure the curvature at the minimum: $\left.\pdd{^2E}{\ell^2}\right|_{\ell=\ell_\mathrm{MAP}} = e^\ell\big|_{\ell=\ell_\mathrm{MAP}} = r.$ The variance of the Gaussian approximation is the inverse of this curvature, $$1/r$$. Writing $$\ell$$ as $$\log\lambda$$ again, we get: $\Rightarrow \fbox{p(\log\lambda\g r) \approx \N(\log\lambda;\; \log r,\; \frac{1}{r}),~~~~~r>0.}$
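As a sanity check on the log-scale result, the mode can also be found numerically instead of analytically. The sketch below is illustrative only: `newton_mode_log` and `laplace_log` are hypothetical names, and Newton's method is just one convenient way to minimize the energy $$E = e^\ell - r\ell$$. It is not how the result above was derived.

```python
import math

def newton_mode_log(r, ell=0.0, iters=100):
    """Minimize E(ell) = exp(ell) - r*ell with Newton's method.

    Returns the mode and the curvature of E at the mode.
    Starting from ell = 0 is an arbitrary choice that works for moderate r.
    """
    for _ in range(iters):
        grad = math.exp(ell) - r   # dE/d(ell)
        curv = math.exp(ell)       # d^2E/d(ell)^2, always positive
        ell -= grad / curv
    return ell, math.exp(ell)

def laplace_log(r):
    """Mean and variance of the Gaussian approximation over log(lambda)."""
    if r <= 0:
        raise ValueError("need r > 0")
    return math.log(r), 1.0 / r

r = 20  # illustrative count
ell_map, curvature = newton_mode_log(r)
mean, var = laplace_log(r)
print("numerical mode:", ell_map, "closed form:", mean)
print("numerical curvature:", curvature, "closed-form variance:", var)
print("mode mapped back to the rate scale:", math.exp(ell_map))
```

The mode on the log scale maps back to $$\lambda \te r$$, not $$r\tm1$$: the location of a mode depends on the parameterization, so the two Laplace approximations are centered at different rates.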

Which approximation is better? Putting the rate on a log scale makes the distribution less skewed, and makes its support unbounded: $$\log\lambda$$ ranges over the whole real line, whereas $$\lambda$$ must be positive. Both of these properties are a better match for a Gaussian approximation. To plot the distribution of $$\log\lambda$$, I put $$\lambda$$ on the $$x$$-axis, but with a log scale. The numbers are then directly comparable between the plots.

$$r\te2$$: for low counts, the Laplace approximation on $$\lambda$$ (dashed, red) puts lots of mass on negative (impossible) values. The approximation based on $$\log\lambda$$ is better, although neither can fit the true distribution (solid, blue) perfectly:

$$r\te20$$: for higher counts, the Laplace approximation on $$\lambda$$ (dashed, red) works better than before. However, the approximation based on $$\log\lambda$$ is still slightly better, as the true distribution (solid, blue) is less skewed in this parameterization:
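The comparisons for $$r\te2$$ and $$r\te20$$ can also be put in numbers. The sketch below is separate from the plotting code and rests on a few assumptions of mine: the exact posterior over $$\lambda$$ is normalized as a Gamma density with shape $$r$$ and rate 1 (using `math.lgamma`), the discrepancy measure is the total variation distance (my choice, not from the notes), and the integration limits and grid size are ad hoc. Total variation distance is unchanged by an invertible change of variables, so each approximation can be scored in its own parameterization. All function names are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def posterior_lambda(lam, r):
    """Exact posterior over lambda: Gamma(shape r, rate 1); zero for lam <= 0."""
    if lam <= 0.0:
        return 0.0
    return math.exp(-lam + (r - 1) * math.log(lam) - math.lgamma(r))

def posterior_log(ell, r):
    """Exact posterior over ell = log(lambda), including the Jacobian factor."""
    return math.exp(-math.exp(ell) + r * ell - math.lgamma(r))

def negative_mass(r):
    """Mass that N(lambda; r-1, r-1) places on impossible values lambda < 0."""
    return 0.5 * math.erfc(math.sqrt((r - 1) / 2.0))

def total_variation(p, q, lo, hi, n=20000):
    """Midpoint-rule estimate of 0.5 * integral of |p - q| over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += abs(p(x) - q(x))
    return 0.5 * h * total

def compare(r):
    """Return (negative mass, TV on the lambda scale, TV on the log scale)."""
    sd = math.sqrt(r - 1.0)
    tv_lam = total_variation(
        lambda lam: posterior_lambda(lam, r),
        lambda lam: normal_pdf(lam, r - 1.0, r - 1.0),
        (r - 1.0) - 10.0 * sd, r + 20.0 * math.sqrt(r))  # ad hoc limits
    tv_log = total_variation(
        lambda ell: posterior_log(ell, r),
        lambda ell: normal_pdf(ell, math.log(r), 1.0 / r),
        math.log(r) - 15.0, math.log(r) + 5.0)           # ad hoc limits
    return negative_mass(r), tv_lam, tv_log

results = {r: compare(r) for r in (2, 20)}
for r, (neg, tv_lam, tv_log) in results.items():
    print(f"r={r}: mass on lambda<0 = {neg:.3g}, "
          f"TV(lambda) = {tv_lam:.3f}, TV(log lambda) = {tv_log:.3f}")
```

A smaller total variation distance means a closer fit. The negative-mass figure applies only to the approximation on $$\lambda$$, since a Gaussian over $$\log\lambda$$ places no mass on negative rates.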

The Matlab/Octave code that produced the plots is available online.