Major changes in PyROOT

PyROOT is the Python binding of the CERN ROOT library. Major changes were introduced in the ROOT v6.22 release (see the Release Note and blog post): the new PyROOT is built on modern technology such as cling and cppyy. It also adds the ability to build PyROOT for both Python 2 and 3 (see manual). Many old PyROOT hacks may become obsolete as a result, but I believe the modernization of PyROOT is very much desired.


C++ enable_if via return type

I find SFINAE, or “Substitution Failure Is Not An Error”, quite fascinating. At first it looked cryptic to me (what do all these “typenames” mean?), so I tended to avoid it. But when used right, std::enable_if (which leverages SFINAE) really helps simplify the code, so I started to depend on it.

Recently I wrote a function based on the example in the Cppreference article on std::void_t. Basically, I wanted to reset a variable that can be either a scalar type (int, float, etc.) or a container. If it is a container, I want to call Container::clear(). Otherwise, I can simply set it to zero.

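#include <type_traits>  // std::void_t, std::enable_if_t (requires C++17)
#include <utility>      // std::declval

// Primary template: assume T has no clear() member.
template <typename T, typename = void>
struct is_clearable : std::false_type {};

// Specialization: selected only when the expression t.clear() is well-formed.
template <typename T>
struct is_clearable<T, std::void_t<decltype(std::declval<T>().clear())> > : std::true_type {};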

template <typename T>
inline constexpr bool is_clearable_v = is_clearable<T>::value;

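// #1: enabled when T has a clear() member; the return type is void.
template <typename T>
std::enable_if_t<is_clearable_v<T> > reset(T* t) {
    t->clear();
}

// #2: enabled when T has no clear() member; assumes T is assignable from 0.
template <typename T>
std::enable_if_t<!is_clearable_v<T> > reset(T* t) {
    *t = 0;
}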

In my example, std::void_t is used to detect whether T has the member function clear(). is_clearable<T>::value is true or false depending on the result. is_clearable_v<T> is a compile-time boolean constant that holds the value of is_clearable<T>::value.
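To make the detection step concrete, here is a minimal sketch that checks the trait at compile time for a few types. The trait is repeated so the snippet compiles on its own (C++17 assumed); the NoClear struct is a hypothetical type introduced only for illustration.

```cpp
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Same trait as in the post, repeated so this snippet is self-contained.
template <typename T, typename = void>
struct is_clearable : std::false_type {};

template <typename T>
struct is_clearable<T, std::void_t<decltype(std::declval<T>().clear())>> : std::true_type {};

template <typename T>
inline constexpr bool is_clearable_v = is_clearable<T>::value;

// Hypothetical type for illustration only: a class without a clear() member.
struct NoClear {};

// Containers with a clear() member select the specialization.
static_assert(is_clearable_v<std::vector<int>>, "std::vector has clear()");
static_assert(is_clearable_v<std::string>, "std::string has clear()");

// For these types the expression inside decltype is ill-formed, so the
// specialization is discarded and the primary template is used.
static_assert(!is_clearable_v<int>, "int has no member functions");
static_assert(!is_clearable_v<NoClear>, "NoClear has no clear()");
```

If any of these assertions were wrong, the snippet would fail to compile, which is a convenient way to experiment with what the trait accepts.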

Then, the reset(T* t) function is defined separately for the two cases. The first version is enabled (via the return type) when T has the member function clear(); the second version is enabled when T does not.
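As a usage sketch, the following shows both overloads being selected. The trait and the two reset() overloads are repeated from the post so the snippet stands alone (C++17 assumed); reset_example() is a hypothetical helper added only for this illustration.

```cpp
#include <type_traits>
#include <utility>
#include <vector>

template <typename T, typename = void>
struct is_clearable : std::false_type {};

template <typename T>
struct is_clearable<T, std::void_t<decltype(std::declval<T>().clear())>> : std::true_type {};

template <typename T>
inline constexpr bool is_clearable_v = is_clearable<T>::value;

// #1: chosen when T has a clear() member.
template <typename T>
std::enable_if_t<is_clearable_v<T>> reset(T* t) {
    t->clear();
}

// #2: chosen when T has no clear() member.
template <typename T>
std::enable_if_t<!is_clearable_v<T>> reset(T* t) {
    *t = 0;
}

// Hypothetical helper (not in the original post): exercises both overloads
// and reports whether each variable ended up in its reset state.
inline bool reset_example() {
    int counter = 42;
    std::vector<double> values{1.0, 2.0, 3.0};

    reset(&counter);  // int has no clear(), so overload #2 sets it to 0
    reset(&values);   // std::vector has clear(), so overload #1 empties it

    return counter == 0 && values.empty();
}
```

Because the two conditions are exact opposites, exactly one overload is viable for any given T, so the call is never ambiguous.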

It works as advertised for me. But to apply this enable_if idiom, one has to figure out what each of these (decltype, declval, constexpr, void_t, enable_if) does, and I think that is not trivial without the help of some good examples.


Simple multiprocessing queue in Python

This is a very simple example of how to work with multiprocessing queues, which I wrote while learning. There are two multiprocessing Queues, task_queue and done_queue, used to submit the tasks and to receive the results. Typically we should tell the Processes to start() and then join(). But I use sentinels (one per worker) to mark the end of task_queue, so I do not have to call join(). For done_queue, I rely on knowing the exact number of items to get(). Usually, if we know the exact number of items, it is better to use a multiprocessing Pool. But I use queues because I am interested in implementing the worker as an iterator (which does not assume the number of items).

import multiprocessing

class Sequence(object):
    """
    A simple sequence that iterates over files obtained from a queue.
    """
    SENTINEL = None

    def __init__(self, files):
        self._files = files

    def __iter__(self):
        while True:
            filename = self._next_file()
            if filename is None:
                break
            yield filename

    def _next_file(self):
        filename = self._files.get()
        if filename is self.SENTINEL:
            return None
        return filename

def worker(task_queue, done_queue):
    seq = Sequence(task_queue)
    for x in seq:
        done_queue.put(x)

# Main
if __name__ == '__main__':
    num_workers = 4
    num_entries = 1000

    task_queue = multiprocessing.Queue()
    done_queue = multiprocessing.Queue()

    for _ in range(num_workers):
        multiprocessing.Process(
            target=worker, args=(task_queue, done_queue)).start()

    # task_queue is supposed to take filenames, but for the purposes
    # of this exercise, it is easier to do integers
    for i in range(num_entries):
        task_queue.put(i)

    for _ in range(num_workers):
        task_queue.put(Sequence.SENTINEL)

    result = []
    for _ in range(num_entries):
        result.append(done_queue.get())

    print('Done: {0}/{1} entries'.format(len(result), num_entries))

    # Sanity check
    assert sum(result) == sum(range(num_entries))

C++ stringizing and token-pasting

Macro expansion is important to understand when doing preprocessor metaprogramming in C++, specifically the stringizing (#) and token-pasting (##) operators. They are also explained in this Cppreference article.

If an argument to the stringizing or token-pasting operator is itself a macro, then two levels of macro expansion are needed, because the operands of # and ## are not macro-expanded before the operator is applied.

#define STRINGIFY_DETAIL(x) #x
#define STRINGIFY(x) STRINGIFY_DETAIL(x)

#define PASTER(x,y) x ## y
#define EVALUATOR(x,y) PASTER(x,y)

Check out this StackOverflow answer to understand how it works.
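To see the difference between one and two levels of expansion, here is a small sketch using the four macros above. The DEMO_VERSION and DEMO_PREFIX macros and the two constants are hypothetical names introduced only for this illustration; C++17 is assumed for std::string_view and for static_assert without a message.

```cpp
#include <string_view>

#define STRINGIFY_DETAIL(x) #x
#define STRINGIFY(x) STRINGIFY_DETAIL(x)

#define PASTER(x, y) x##y
#define EVALUATOR(x, y) PASTER(x, y)

// Hypothetical macros, for illustration only.
#define DEMO_VERSION 3
#define DEMO_PREFIX value_

// Hypothetical constants that the pasted tokens will name.
constexpr int value_1 = 10;       // named by the two-level paste
constexpr int DEMO_PREFIX1 = 20;  // named by the one-level paste

// One level: the argument is stringized as written, without being expanded.
static_assert(std::string_view(STRINGIFY_DETAIL(DEMO_VERSION)) == "DEMO_VERSION");

// Two levels: DEMO_VERSION is expanded to 3 before it reaches the # operator.
static_assert(std::string_view(STRINGIFY(DEMO_VERSION)) == "3");

// One level: DEMO_PREFIX is pasted unexpanded, giving the token DEMO_PREFIX1.
static_assert(PASTER(DEMO_PREFIX, 1) == 20);

// Two levels: DEMO_PREFIX is expanded to value_ first, giving the token value_1.
static_assert(EVALUATOR(DEMO_PREFIX, 1) == 10);
```

The outer macro exists only to force its arguments through a normal expansion pass before the inner macro applies # or ##.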


Useful paths in /cvmfs/cms.cern.ch

CernVM-FS (or CVMFS) is developed by CERN and used in various HEP experiments for software distribution. CMSSW, along with its dependencies, is distributed via CVMFS in the namespace /cvmfs/cms.cern.ch on the CMS Tier-1, 2, and 3 machines. There are a few special paths that are useful to know.

  • To source the environment setup script:
source /cvmfs/cms.cern.ch/cmsset_default.sh
  • To find the available values of the $SCRAM_ARCH environment variable:
ls -d /cvmfs/cms.cern.ch/slc*
  • To list the available CMSSW releases for a given $SCRAM_ARCH:
export SCRAM_ARCH=slc7_amd64_gcc900
scram list CMSSW
# or:
#   ls /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/
  • To set up a particular CMSSW release:
cmsrel CMSSW_11_3_0_pre6
cd CMSSW_11_3_0_pre6/src
cmsenv
# or:
#   scramv1 project CMSSW CMSSW_11_3_0_pre6
#   cd CMSSW_11_3_0_pre6/src
#   eval `scramv1 runtime -sh`
  • To find the source code for a particular CMSSW release:
ls $CMSSW_RELEASE_BASE/src
# or:
#   ls /cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_3_0_pre6/src
  • To find the C++ header files from external libraries (e.g. GCC) used by a particular CMSSW release:
    • Identify the XML config file that belongs to the library under $CMSSW_RELEASE_BASE/config/toolbox/$SCRAM_ARCH/tools/selected/;
    • Figure out the path from the XML config file.
cat $CMSSW_RELEASE_BASE/config/toolbox/$SCRAM_ARCH/tools/selected/gcc-cxxcompiler.xml
# Found the path
ls /cvmfs/cms.cern.ch/slc7_amd64_gcc900/external/gcc/9.3.0/
# Navigate `include`
ls /cvmfs/cms.cern.ch/slc7_amd64_gcc900/external/gcc/9.3.0/include/
# Found the header files
ls /cvmfs/cms.cern.ch/slc7_amd64_gcc900/external/gcc/9.3.0/include/c++/9.3.0/
  • To find the Python packages (e.g. NumPy) used by a particular CMSSW release:
    • Identify the XML config file that belongs to the library under $CMSSW_RELEASE_BASE/config/toolbox/$SCRAM_ARCH/tools/selected/;
    • Figure out the path from the XML config file.
cat $CMSSW_RELEASE_BASE/config/toolbox/$SCRAM_ARCH/tools/selected/py3-numpy.xml
# Found the path
ls /cvmfs/cms.cern.ch/slc7_amd64_gcc900/external/py3-numpy/1.17.5-ljfedo2/
# Navigate `lib` or `lib64`
ls /cvmfs/cms.cern.ch/slc7_amd64_gcc900/external/py3-numpy/1.17.5-ljfedo2/lib/
# Found the source files
ls /cvmfs/cms.cern.ch/slc7_amd64_gcc900/external/py3-numpy/1.17.5-ljfedo2/lib/python3.8/site-packages/numpy/