How to cast to NumPy arrays

This post is for people like me who can never remember how to convert an array-like object back to a NumPy array. Here, array-like objects include the following (not an exhaustive list):

pandas: DataFrame
tensorflow: Tensor
h5py: Dataset
dask: Array

General

If you don’t want to look up the answer on Stack Overflow, just try np.array(). There is a good chance that it will work.

import numpy as np
import pandas as pd
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
arr = np.array(df)
print(arr.__class__)
# <class 'numpy.ndarray'>
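
A closely related option is np.asarray(). As a minimal sketch (the variable names are only illustrative): np.array() copies its input by default, while np.asarray() returns the input itself when it is already an ndarray of the requested dtype. This matters mostly when the input may already be a NumPy array.

```python
import numpy as np

original = np.arange(4)

# np.array() copies by default, so the result is a new object.
copied = np.array(original)

# np.asarray() returns the input itself when no conversion is needed.
same = np.asarray(original)

print(copied is original)  # False
print(same is original)    # True

# Lists and other array-likes are converted as usual.
from_list = np.asarray([[1, 2], [3, 4]])
print(from_list.shape)  # (2, 2)
```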

Pandas

The recommended way to convert a pd.DataFrame to np.ndarray is to_numpy().

import pandas as pd
arr = pd.DataFrame({"A": [1, 2], "B": [3, 4]}).to_numpy()
print(arr.__class__)
# <class 'numpy.ndarray'>
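
One thing worth checking after the conversion is the dtype. The sketch below uses a made-up DataFrame with one integer and one float column: the columns are upcast to a common dtype, and a dtype can also be requested through the dtype argument of to_numpy().

```python
import pandas as pd

# Illustrative data: one integer column and one float column.
df = pd.DataFrame({"A": [1, 2], "B": [0.5, 1.5]})

# Columns with different numeric dtypes are upcast to a common dtype.
arr = df.to_numpy()
print(arr.dtype)  # float64

# The dtype can also be requested explicitly.
arr32 = df.to_numpy(dtype="float32")
print(arr32.dtype)  # float32

# A single column (a Series) converts the same way and yields a 1-D array.
col = df["A"].to_numpy()
print(col.shape)  # (2,)
```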

TensorFlow

For TensorFlow v2 in eager mode (the default), use numpy() to convert a tf.Tensor. The tensor must be an EagerTensor.

import tensorflow as tf
x = tf.constant([[1, 2],
                 [3, 4]])
arr = x.numpy()
print(arr.__class__)
# <class 'numpy.ndarray'>

h5py

You can index an h5py.Dataset with an empty tuple (i.e. ()) to get a np.ndarray. In fact, any supported slicing operation should return a np.ndarray.

import h5py
# Use a context manager so the file is closed after use.
with h5py.File('dummy.h5', 'w') as f:
    dset = f.create_dataset('dset', (10, 10, 10), 'f')
    arr = dset[()]
    print(arr.__class__)
    # <class 'numpy.ndarray'>
    arr = dset[:]
    print(arr.__class__)
    # <class 'numpy.ndarray'>
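
Slicing is also a way to read only part of a dataset into memory. The following is a small sketch under assumed conditions: the temporary file path, the dataset name, and the shape are made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Write a small file to a temporary directory (path is for illustration only).
path = os.path.join(tempfile.mkdtemp(), "dummy.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("dset", data=np.arange(24).reshape(2, 3, 4))

with h5py.File(path, "r") as f:
    dset = f["dset"]
    full = dset[()]        # reads the whole dataset
    part = dset[0, :, :2]  # reads only the selected region

# The results are ordinary in-memory arrays, usable after the file is closed.
print(type(full).__name__, full.shape)  # ndarray (2, 3, 4)
print(type(part).__name__, part.shape)  # ndarray (3, 2)
```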

Dask

Dask arrays are lazy. To convert one into a NumPy array, compute it with the compute() method. To compute several arrays at once, use dask.compute().

import dask
import dask.array as da
x = da.ones(10, chunks=(5,))
arr = x.compute()
print(arr.__class__)
# <class 'numpy.ndarray'>
y = da.ones(10, chunks=(5,))
arr0, arr1 = dask.compute(x, y)
print(arr0.__class__, arr1.__class__)
# <class 'numpy.ndarray'> <class 'numpy.ndarray'>
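
To illustrate what "lazy" means here, the sketch below (with illustrative names and sizes) builds an expression that stays a Dask array until it is computed. It also shows that passing a Dask array to np.asarray() triggers the computation, which is why the generic approach tends to work for Dask too.

```python
import dask.array as da
import numpy as np

x = da.arange(10, chunks=5)

# Building an expression does no work yet; the result is still a Dask array.
total = (x * 2).sum()
print(type(total).__name__)  # Array

# compute() evaluates the task graph and returns a concrete value.
result = total.compute()
print(result)  # 90

# np.asarray() on a Dask array also triggers the computation.
arr = np.asarray(x)
print(type(arr).__name__)  # ndarray
```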

As a footnote, to convert a single-element np.ndarray back to a built-in Python object, use item(). For instance, np.array([1]).item() returns 1.
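
A short sketch of how item() behaves, including the case where the array holds more than one element:

```python
import numpy as np

one = np.array([1]).item()
print(one, type(one).__name__)  # 1 int

# Zero-dimensional arrays work as well.
value = np.array(3.5).item()
print(value, type(value).__name__)  # 3.5 float

# Arrays with more than one element cannot be converted this way.
try:
    np.array([1, 2]).item()
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```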