There are a number of “don’t do this” rules when working with the FNAL LPC EOS disk, and they are described here. For example, don’t merge ROOT files directly on EOS: the EOS disk is mounted via FUSE, which can cause trouble under heavy I/O. Instead, one should use the dedicated EOS or Xrootd commands.

Recently I had to merge ROOT files (using hadd) from multiple directories on EOS, and it turned out to be not so straightforward using the EOS or Xrootd commands alone. So I wrote some Python, listed below.

#!/usr/bin/env python
import shlex
import subprocess

# List of EOS directories whose ROOT files should be merged
directories = [
]

outfile = '/tmp/jiafu/ntuple.root'

def call_cmd(cmd):
    # Run a command and return its stdout as a list of lines
    p = subprocess.Popen(shlex.split(cmd), stdout=subprocess.PIPE)
    out, _ = p.communicate()
    return out.splitlines()

def list_input_files(directories):
    all_lines = []
    for directory in directories:
        # 'ls -u' prints the full Xrootd URL of each entry
        cmd = 'xrdfs root://cmseos.fnal.gov ls -u {0}'.format(directory)
        lines = call_cmd(cmd)
        lines = [line for line in lines if line.endswith('.root')]
        all_lines += lines
    return ' '.join(all_lines)

# Main
if __name__ == '__main__':
    infiles = list_input_files(directories)
    cmd = 'hadd -f {0} {1}'.format(outfile, infiles)
    lines = call_cmd(cmd)
    #print '\n'.join(lines)
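One caveat: with many input files, the single hadd command line can get very long. A possible workaround is to merge in stages; here is a sketch that only builds the command strings (the `.partN` naming and the chunk size are my own choices for illustration, not an EOS or ROOT convention):

```python
def chunked(items, size):
    """Split items into successive sublists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def staged_hadd_cmds(outfile, infiles, size=500):
    """Build hadd commands that merge chunks into partial files, then merge the partials."""
    cmds = []
    partials = []
    for idx, chunk in enumerate(chunked(infiles, size)):
        partial = '{0}.part{1}'.format(outfile, idx)
        partials.append(partial)
        cmds.append('hadd -f {0} {1}'.format(partial, ' '.join(chunk)))
    # Final merge of all partial files into the requested output file
    cmds.append('hadd -f {0} {1}'.format(outfile, ' '.join(partials)))
    return cmds

cmds = staged_hadd_cmds('out.root', ['a.root', 'b.root', 'c.root'], size=2)
# cmds[0] merges a.root and b.root into out.root.part0, etc.
```

Each command could then be passed to call_cmd from the script above.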


The CRAB project directory is the directory created when you make a new CRAB project (i.e. when you do crab submit). Sometimes you might remove the project directory too quickly, before realizing that you want to resubmit some of the jobs. But without the project directory, you cannot call crab resubmit.

If you know the “task name”, which looks like YYMMDD_HHMMSS:request_name, then it’s possible to recreate the project directory. The timestamp is the time when you called crab submit, whereas the request_name is config.General.requestName from your crab.py. If you don’t remember the task name, you can always check the Task Monitoring dashboard to find it.
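For illustration, the task name can be split back into its two parts in Python (the task name used below is hypothetical):

```python
from datetime import datetime

def parse_task_name(task_name):
    """Split a CRAB task name of the form YYMMDD_HHMMSS:request_name."""
    timestamp_str, request_name = task_name.split(':', 1)
    # The timestamp records when crab submit was called
    timestamp = datetime.strptime(timestamp_str, '%y%m%d_%H%M%S')
    return timestamp, request_name

# Hypothetical task name for demonstration
ts, name = parse_task_name('180101_120000:jdoe_crab_my_analysis')
```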

First, make an empty directory to be used as the CRAB project directory:

mkdir PROJDIR


Then, do the following in python:

from CRABClient.UserUtilities import config
from CRABClient.ClientUtilities import createCache

requestarea = 'PROJDIR'
uniquerequestname = 'TASKNAME'

host = 'cmsweb.cern.ch'
port = ''
voRole = ''
voGroup = ''
instance = 'prod'
originalConfig = config()
createCache(requestarea, host, port, uniquerequestname, voRole, voGroup, instance, originalConfig)


Please replace PROJDIR and TASKNAME in the above with the project directory and the task name.

In TensorFlow, the binary cross entropy loss function is implemented in a way that ensures numerical stability and avoids overflow. The formulation can be found in the official documentation, but it’s not very easy to follow when written in pseudo-code. So I decided to type it out in TeX (replacing the notation $z$ with $y$).

The logistic loss is

\begin{align*}
\mathcal{L} &= - y \log(p) - (1 - y) \log(1-p) \\
&= - y \log(\operatorname{sigmoid}(x)) - (1 - y) \log(1-\operatorname{sigmoid}(x)) \\
&= - y \log \left(\frac{1}{1+e^{-x}} \right) - (1 - y) \log \left(1-\frac{1}{1+e^{-x}} \right) \\
&= - y \log \left(\frac{1}{1+e^{-x}} \right) - (1 - y) \log \left(\frac{e^{-x}}{1+e^{-x}} \right) \\
&= y \log({1+e^{-x}}) + (1 - y)\left[- \log(e^{-x}) + \log({1+e^{-x}}) \right] \\
&= y \log({1+e^{-x}}) + (1 - y)\left[x + \log({1+e^{-x}}) \right] \\
&= (1 - y)\,x + \log({1+e^{-x}}) \\
&= x - x \times y + \log({1+e^{-x}})
\end{align*}

For $x < 0$, to avoid overflow in $e^{-x}$, we reformulate the above

\begin{align*}
\mathcal{L} &= x - x \times y + \log({1+e^{-x}}) \\
&= \log(e^{x}) - x \times y + \log({1+e^{-x}}) \\
&= - x \times y + \log(e^{x} \times ({1+e^{-x}})) \\
&= - x \times y + \log(1 + e^{x})
\end{align*}

Hence, to ensure stability and avoid overflow, the implementation uses this equivalent formulation

\begin{align*}
\mathcal{L} &= \max(x,0) - x \times y + \log({1+e^{-|x|}}) \\
&= \operatorname{ReLU}(x) - x \times y + \log({1+e^{-|x|}})
\end{align*}

(To be clear, the last formulation combines $x - x \times y + \log({1+e^{-x}})$ for $x \geq 0$ and $- x \times y + \log(1 + e^{x})$ for $x < 0$.)
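The equivalence of the naive and stable formulations can be checked numerically. A small sketch in plain Python (not the TensorFlow implementation itself):

```python
import math

def naive_loss(x, y):
    # Direct logistic loss: -y*log(p) - (1-y)*log(1-p), with p = sigmoid(x)
    p = 1.0 / (1.0 + math.exp(-x))
    return -y * math.log(p) - (1 - y) * math.log(1 - p)

def stable_loss(x, y):
    # Stable formulation: max(x, 0) - x*y + log(1 + exp(-|x|))
    return max(x, 0) - x * y + math.log(1 + math.exp(-abs(x)))

# The two agree for moderate x; the stable form also works for very large |x|
for x in (-5.0, -0.5, 0.0, 0.5, 5.0):
    for y in (0.0, 1.0):
        assert abs(naive_loss(x, y) - stable_loss(x, y)) < 1e-9
```

For large negative x with y = 0 the naive form would compute exp(-x) and overflow, while the stable form only ever exponentiates a non-positive number.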

The following links provide very useful tips to help speed up your Python code; some are even useful beyond Python:
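As a flavor of the kind of advice such links give — prefer built-ins over explicit Python-level loops — here is a quick timeit comparison (the functions and numbers are my own illustration; timings will vary by machine):

```python
import timeit

def loop_sum(n):
    # Summing with an explicit Python-level loop
    total = 0
    for i in range(n):
        total += i
    return total

def builtin_sum(n):
    # Summing with the built-in, where the loop runs in C
    return sum(range(n))

t_loop = timeit.timeit(lambda: loop_sum(10000), number=200)
t_builtin = timeit.timeit(lambda: builtin_sum(10000), number=200)
print('loop: %.4fs, builtin: %.4fs' % (t_loop, t_builtin))
```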

I ran into a strange issue related to Python virtualenv and pip in CMSSW_9_3_X (Python 2.7.11, virtualenv 15.1.0). Doing the following will cause an error:

virtualenv venv
source venv/bin/activate
pip install -U pip


The error message reads:

Traceback (most recent call last):
  File "/tmp/venv/bin/pip", line 7, in <module>
    from pip._internal import main
ImportError: No module named _internal


Apparently it is due to the environment variable $PYTHONPATH not being set properly. I fixed it by patching the file venv/bin/activate. Here’s the patch file:

diff --git a/venv/bin/activate b/venv/bin/activate
index 03fa903..c104cf0 100644
--- a/venv/bin/activate
+++ b/venv/bin/activate
@@ -11,6 +11,11 @@ deactivate () {
         export PATH
         unset _OLD_VIRTUAL_PATH
     fi
+    if ! [ -z "${_OLD_PYTHONPATH+_}" ] ; then
+        PYTHONPATH="$_OLD_PYTHONPATH"
+        export PYTHONPATH
+        unset _OLD_PYTHONPATH
+    fi
     if ! [ -z "${_OLD_VIRTUAL_PYTHONHOME+_}" ] ; then
         PYTHONHOME="$_OLD_VIRTUAL_PYTHONHOME"
         export PYTHONHOME
@@ -47,6 +52,10 @@ _OLD_VIRTUAL_PATH="$PATH"
 PATH="$VIRTUAL_ENV/bin:$PATH"
 export PATH
 
+_OLD_PYTHONPATH="$PYTHONPATH"
+PYTHONPATH="$VIRTUAL_ENV/lib/python2.7/site-packages:$PYTHONPATH"
+export PYTHONPATH
+
 # unset PYTHONHOME if set
 if ! [ -z "${PYTHONHOME+_}" ] ; then
     _OLD_VIRTUAL_PYTHONHOME="$PYTHONHOME"


To apply, download it as mypatch.txt in the same directory where virtualenv venv was called. Then do:

patch -p1 < mypatch.txt


Now pip install -U pip should work.