Muon pT resolution

The muon $p_{\mathrm{T}}$ is measured from the curvature of its trajectory in the magnetic field, and the curvature is proportional to $1/p_{\mathrm{T}}$. The uncertainty on the curvature measurement therefore translates directly into the $p_{\mathrm{T}}$ resolution:

\[\begin{align} \Delta\left(\frac{1}{p}\right) &= -\frac{\Delta p}{p^2} \\ \frac{\Delta p}{p} &= -\left[\Delta\left(\frac{1}{p}\right)\right] \cdot p \\ &= k(p) \cdot p \end{align}\]

If the error in the curvature measurement is independent of $p_{\mathrm{T}}$, i.e. $k(p) = k$, then we find that the fractional $p_{\mathrm{T}}$ resolution is proportional to $p_{\mathrm{T}}$:

\[\frac{\Delta p}{p} \propto p\]
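As a quick numerical sketch (the constant curvature uncertainty below is a made-up value, not taken from any real detector), a curvature error that gives a 1% resolution at 10 GeV gives a 10% resolution at 100 GeV and a 100% resolution at 1 TeV:

import numpy as np

k = 1e-3  # assumed constant curvature uncertainty Delta(1/pT) in 1/GeV (hypothetical value)
pt = np.array([10.0, 100.0, 1000.0])  # GeV
print(k * pt)  # fractional resolution Delta(pT)/pT = k * pT -> 0.01, 0.1, 1.0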

Softplus and softminus

The softplus function is a smooth approximation to the ReLU activation function, and is sometimes used in neural networks in place of ReLU.

\[\operatorname{softplus}(x) = \log(1 + e^{x})\]

It is actually closely related to the sigmoid function: as $x \to -\infty$, the two functions become asymptotically identical (both behave like $e^{x}$).

\[\operatorname{sigmoid}(x) = \frac{1}{1 + e^{-x}}\]
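In fact, the derivative of the softplus function is exactly the sigmoid function:

\[\frac{d}{dx}\operatorname{softplus}(x) = \frac{e^{x}}{1 + e^{x}} = \operatorname{sigmoid}(x)\]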

The softplus function also has a relatively unknown sibling, called softminus.

\[\operatorname{softminus}(x) = x - \operatorname{softplus}(x)\]

Similarly, as $x \to +\infty$, softminus becomes asymptotically identical to $\operatorname{sigmoid}(x) - 1$. In the following plots, you can clearly see the similarities between softplus & softminus and sigmoid.


Furthermore, there is also an inverse softplus function, which satisfies $x = \operatorname{softplusinv}(\operatorname{softplus}(x))$.

\[\operatorname{softplusinv}(x) = \log(e^{x} - 1)\]

Using $\operatorname{softplusinv}(x)$ as an additive constant lets you shift the softplus function to a chosen $y$-intercept. For instance, $\operatorname{softplus}(x + \operatorname{softplusinv}(1))$ is a function with $y$-intercept equal to 1.
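A quick sanity check with NumPy (using the same softplus and softplusinv one-liners given further below):

import numpy as np

softplus = lambda x: np.log1p(np.exp(x))
softplusinv = lambda x: np.log(np.expm1(x))
shifted_softplus = lambda x: softplus(x + softplusinv(1.0))
print(shifted_softplus(0.0))  # 1.0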

The inverse of the sigmoid function is called logit. As such, $x = \operatorname{logit}(\operatorname{sigmoid}(x))$.

\[\operatorname{logit}(p) = \log\left(\frac{p}{1-p}\right)\]

As these functions involve $\exp(x)$ and $\log(x)$, you might sometimes run into numerical stability issues: this happens to $\exp(x)$ when $x$ is too large, and to $\log(x)$ when $x$ is close to zero. The following are numerically safer expressions of softplus and softminus that avoid those issues.

\[\operatorname{softplus}(x) = \max(0, x) + \log(1 + e^{-|x|})\] \[\operatorname{softminus}(x) = \min(0, x) - \log(1 + e^{-|x|})\]

For the sigmoid function, you can simply call the hyperbolic tangent function instead, because $\tanh(x)$ is just a scaled and shifted $\operatorname{sigmoid}(x)$.

\[\operatorname{sigmoid}(x) = \frac{1}{2} \left[1 + \tanh\left(\frac{x}{2}\right)\right]\]

As a reminder, $\tanh(x)$ is defined as:

\[\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} = \frac{1 - e^{-2x}}{1 + e^{-2x}}\]

All these functions are easily written with NumPy:

import numpy as np

softplus = lambda x: np.log1p(np.exp(x))

softminus = lambda x: x - softplus(x)

sigmoid = lambda x: 1 / (1 + np.exp(-x))

one_minus_sigmoid = lambda x: 1 / (1 + np.exp(x))

logit = lambda x: np.log(x) - np.log1p(-x)

softplusinv = lambda x: np.log(np.expm1(x))  # inverse of softplus; defined for x > 0

softminusinv = lambda x: x - np.log(-np.expm1(x))  # inverse of softminus; defined for x < 0

safe_softplus = lambda x: x * (x >= 0) + np.log1p(np.exp(-np.abs(x)))

safe_softminus = lambda x: x * (x < 0) - np.log1p(np.exp(-np.abs(x)))

safe_sigmoid = lambda x: 0.5 * (1 + np.tanh(0.5 * x))

safe_one_minus_sigmoid = lambda x: 0.5 * (1 + np.tanh(0.5 * -x))

Softplus is also used to compute the log probabilities that appear in the binary cross-entropy loss function.

\[\operatorname{log prob}_{1}(x) = \log(p(x)) = -\operatorname{softplus}(-x)\] \[\operatorname{log prob}_{0}(x) = \log(1-p(x)) = -\operatorname{softplus}(x)\]

The subscripts “0” and “1” are the class labels, and $p(x)$ is the probability of being class “1”. Substitute $p(x) = \operatorname{sigmoid}(x)$ to get the above results. The following plot shows the log prob curves.
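As a minimal NumPy sketch, the per-example binary cross-entropy for a logit x and a label y in {0, 1} can then be assembled from the safe softplus defined above:

import numpy as np

safe_softplus = lambda x: x * (x >= 0) + np.log1p(np.exp(-np.abs(x)))
log_prob1 = lambda x: -safe_softplus(-x)  # log(p(x))
log_prob0 = lambda x: -safe_softplus(x)   # log(1 - p(x))
bce_loss = lambda x, y: -(y * log_prob1(x) + (1 - y) * log_prob0(x))
print(bce_loss(0.0, 1))  # log(2) ≈ 0.693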


TensorFlow in CMSSW

Recently I had to make a neural net run inside CMSSW. This is possible in CMSSW 10_X_Y thanks to the interface to the TensorFlow C++ library implemented in PhysicsTools/TensorFlow. I converted the NN into a TensorFlow “constant graph”, loaded it in the CMSSW environment, and ran the inference. It worked!

How to use the CMSSW TensorFlow interface is described in these slides. Documentation can be found in this repo: https://gitlab.cern.ch/mrieger/CMSSW-DNN (mirror: https://github.com/riga/CMSSW-DNN).
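For reference, this is roughly how a “constant graph” can be produced with the TensorFlow 1.x Python API. The tiny stand-in model, layer sizes, and node names below are just placeholders, not the actual network I used:

import tensorflow as tf  # TensorFlow 1.x API

# Stand-in model; input size, layer sizes, and node names are hypothetical.
x = tf.placeholder(tf.float32, shape=[None, 4], name="input")
w = tf.Variable(tf.random_normal([4, 2]))
b = tf.Variable(tf.zeros([2]))
y = tf.nn.softmax(tf.matmul(x, w) + b, name="output")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Freeze the variables into constants so that the graph file is self-contained.
    constant_graph = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, ["output"])
    tf.train.write_graph(constant_graph, ".", "constantgraph.pb", as_text=False)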


CRAB: Rerun failed jobs

Sometimes you submit 1,000 jobs for a particular dataset and get back 999 successful jobs and 1 failure. No matter how many times you resubmit the failed job, it always fails. What can you do? crab preparelocal comes to the rescue! See here for documentation.

Basically you can call crab preparelocal -d PROJDIR, which creates a subdirectory called local under PROJDIR. Go in there and call sh run_jobs JOBID to run the job locally, and try to debug it.

After the rerun, you can collect the job outputs by doing tar czf cmsRun.log.tar.gz cmsRun-stdout.log cmsRun-stderr.log FrameworkJobReport.xml.


Bayesian statistics

This is just a note to self.

From the famous Bayes’s theorem:

\[P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}\]
  • $P(A|B)$ is called the posterior probability.
  • $P(B|A)$ is called the likelihood.
  • $P(A)$ is called the prior probability.
  • $P(B)$ is called the marginal likelihood.

Maximum likelihood estimation is based on maximizing the likelihood $\mathcal{L} = P(B|A)$, or equivalently, minimizing $-\log \mathcal{L}$. Maximum a posteriori (MAP) estimation is based on minimizing $-\log \mathcal{P} = -\log P(A|B)$, which includes the prior in the minimization.
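
Concretely, taking the negative log of Bayes's theorem (the marginal likelihood $P(B)$ does not depend on $A$, so it is a constant for the minimization):

\[-\log P(A|B) = -\log P(B|A) - \log P(A) + \log P(B)\]

So MAP estimation is just maximum likelihood estimation with an extra penalty term $-\log P(A)$ coming from the prior.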