HLS Arbitrary Precision data types

Posted on Aug 2, 2020 in notes | Tagged firmware, fpga, hls, programming, xilinx

Xilinx provides an arbitrary precision (AP) data types library for use in Vivado HLS projects:

https://github.com/Xilinx/HLS_arbitrary_Precision_Types (C++ header-only library)

It allows the specification of any number of bits for data types, beyond what is provided by the standard C++ data types:

char (8-bit integer)
short (16-bit integer)
int (32-bit integer)
long (32-bit or 64-bit integer, depending on the platform)
long long (64-bit integer)

The number of bits in a data type affects resource usage. For instance, a DSP48 multiplier is 18 bits wide; if the operands are wider than 18 bits, multiple DSP48s are required. Examples of how to define arbitrary precision integers:

ap_int<5> (5-bit signed integer)
ap_uint<65> (65-bit unsigned integer)

The library also supports fixed-point data types. Examples of how to define them (the two template arguments denote the total number of bits and the number of integer bits; the difference is the number of fractional bits):

ap_fixed<11, 6> (11-bit signed word, 6 integer bits, 5 fractional bits)
ap_ufixed<12, 11> (12-bit unsigned word, 11 integer bits, 1 fractional bit)
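As a rough illustration of what the two template arguments imply, the sketch below computes the resolution and value range of a fixed-point format in plain C++14. It does not use the AP library: FixedFormat, pow2, resolution, min_value, and max_value are hypothetical helper names introduced here, and the formulas assume the usual convention that the integer bits of a signed type include the sign bit.

```cpp
// Plain C++14 sketch; does not depend on the AP headers.
// All names below are hypothetical helpers, not part of the Xilinx library.

struct FixedFormat {
    int total_bits;    // W: total word length
    int integer_bits;  // I: bits left of the binary point (sign bit included)
    bool is_signed;    // true for ap_fixed, false for ap_ufixed
};

// 2^e for a (possibly negative) integer exponent.
constexpr double pow2(int e) {
    double r = 1.0;
    for (int i = 0; i < e; ++i) r *= 2.0;
    for (int i = 0; i > e; --i) r /= 2.0;
    return r;
}

// Smallest step between two representable values: 2^-(W - I).
constexpr double resolution(FixedFormat f) {
    return pow2(f.integer_bits - f.total_bits);
}

// Most negative representable value (0 for unsigned formats).
constexpr double min_value(FixedFormat f) {
    return f.is_signed ? -pow2(f.integer_bits - 1) : 0.0;
}

// Largest representable value: one step below the next power of two.
constexpr double max_value(FixedFormat f) {
    return (f.is_signed ? pow2(f.integer_bits - 1) : pow2(f.integer_bits))
           - resolution(f);
}
```

Under these formulas, ap_fixed<11, 6> has a step of 0.03125 and spans -32 to 31.96875, while ap_ufixed<12, 11> has a step of 0.5 and spans 0 to 2047.5.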

The bit widths can be accessed at compile time by ap_[u]int<W>::width and by ap_[u]fixed<W,I>::width and ap_[u]fixed<W,I>::iwidth.

When assigning a value from a narrower word to a wider one, the value is sign-extended if the source variable is signed; the value is zero-extended if the source variable is unsigned. When assigning a value from a wider word to a narrower one, the bits beyond the most significant bit (MSB) of the destination variable are truncated. It doesn’t matter if the destination variable is signed or unsigned.
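The extension and truncation rules can be emulated on raw bit patterns with standard integer types. This is only a sketch in plain C++14, not AP library code: truncate_bits, zero_extend, and sign_extend are hypothetical helpers, and they are written for widths from 1 to 63 bits.

```cpp
#include <cstdint>

// Plain C++14 emulation of the assignment rules on raw bit patterns.
// Hypothetical helpers (not AP library calls); valid for widths 1..63.

// Keep only the lowest `width` bits (what a narrower destination retains).
constexpr std::uint64_t truncate_bits(std::uint64_t raw, unsigned width) {
    return raw & ((std::uint64_t{1} << width) - 1);
}

// Widen an unsigned source: the upper bits are filled with zeros.
constexpr std::int64_t zero_extend(std::uint64_t raw, unsigned width) {
    return static_cast<std::int64_t>(truncate_bits(raw, width));
}

// Widen a signed source: the upper bits are copies of the sign bit.
constexpr std::int64_t sign_extend(std::uint64_t raw, unsigned width) {
    const std::uint64_t sign_bit = std::uint64_t{1} << (width - 1);
    const std::uint64_t low = truncate_bits(raw, width);
    // Flipping the sign bit and subtracting its weight maps the pattern
    // onto the two's complement value.
    return static_cast<std::int64_t>(low ^ sign_bit)
           - static_cast<std::int64_t>(sign_bit);
}
```

For example, the 5-bit pattern 0b11101 reads as -3 after sign extension and as 29 after zero extension. Truncating 501 (0b111110101) to 5 bits keeps 0b10101, which is 21 as an unsigned value and -11 when read as a signed 5-bit value.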

In addition, the library also provides useful bit manipulation methods such as:

length() returns the number of bits.
sign() returns true if negative; false otherwise.
operator [] (int bit) returns the specified bit. The least significant bit (LSB) has index 0, the most significant bit (MSB) has index W - 1.
range(unsigned Hi, unsigned Lo) or operator () (unsigned Hi, unsigned Lo) returns the value represented by the specified range of bits. If Hi has a value less than Lo, the bits are returned in reverse order.
test(unsigned i) returns true if the specified bit is 1; false otherwise.
set(unsigned i, bool v) sets the specified bit to the boolean value.
set(unsigned i) sets the specified bit to the value 1.
clear(unsigned i) sets the specified bit to the value 0.
invert(unsigned i) inverts/toggles the specified bit.

(all of the above work for ap_[u]int types but not necessarily for ap_[u]fixed types).
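To show what these operations do at the bit level, here is a sketch of a few of them written as free functions on a 64-bit word in plain C++14. The function names are hypothetical stand-ins, not the library's methods: they return a new value instead of modifying an object in place, and bit_range only covers the case where Hi is not less than Lo.

```cpp
#include <cstdint>

// Plain C++14 stand-ins for some of the methods listed above.
// Hypothetical names; the real AP methods act on the object itself.

// True if bit i of v is 1.
constexpr bool test_bit(std::uint64_t v, unsigned i) {
    return ((v >> i) & 1u) != 0;
}

// Copy of v with bit i set to b (defaults to 1).
constexpr std::uint64_t set_bit(std::uint64_t v, unsigned i, bool b = true) {
    return b ? (v | (std::uint64_t{1} << i)) : (v & ~(std::uint64_t{1} << i));
}

// Copy of v with bit i set to 0.
constexpr std::uint64_t clear_bit(std::uint64_t v, unsigned i) {
    return set_bit(v, i, false);
}

// Copy of v with bit i toggled.
constexpr std::uint64_t invert_bit(std::uint64_t v, unsigned i) {
    return v ^ (std::uint64_t{1} << i);
}

// Bits hi..lo of v as an unsigned value.
// Assumes hi >= lo and a range narrower than 64 bits.
constexpr std::uint64_t bit_range(std::uint64_t v, unsigned hi, unsigned lo) {
    return (v >> lo) & ((std::uint64_t{1} << (hi - lo + 1)) - 1);
}
```

With v = 0b10110110 (182), bit_range(v, 5, 2) gives 0b1101, set_bit(v, 0) gives 183, clear_bit(v, 1) gives 180, and invert_bit(v, 7) gives 54.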

The full reference guide for how to use the AP data types is provided at:

https://www.xilinx.com/support/documentation/sw_manuals/xilinx2019_2/ug902-vivado-high-level-synthesis.pdf

Install TensorFlow 2 with Miniconda

Posted on Jul 23, 2020 in howto | Tagged anaconda, centos, conda, linux, machine learning, python, rhel, tensorflow

The official TensorFlow installation page no longer features instructions on how to install it with Anaconda (or Miniconda). But Anaconda still provides the instructions.

First, if Anaconda/Miniconda has not been installed yet, select the installer (see the list), and run it according to the Linux-specific instructions. Then, simply follow the instructions provided by Anaconda to install TensorFlow. Note that I’m using Scientific Linux release 7.8 (Nitrogen).

The following is a quick recipe (using Python 3.6):

# Install Miniconda into ~/miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh
bash ~/miniconda.sh -b -p $HOME/miniconda

~/miniconda/bin/conda init

# Create TensorFlow env
conda create -n tf tensorflow python=3.6

# Activate TensorFlow env
conda activate tf

# Try it
python -c "import tensorflow as tf; print(tf.__version__)"
# -> 2.2.0

If GPU support is needed, you should do this instead:

conda create -n tf-gpu tensorflow-gpu python=3.6
conda activate tf-gpu

Note that the command ~/miniconda/bin/conda init modifies your ~/.bashrc. If you prefer to edit it manually, skip the command and add these lines to your ~/.bashrc:

if [ -f "$HOME/miniconda/etc/profile.d/conda.sh" ]; then
    . "$HOME/miniconda/etc/profile.d/conda.sh"
else
    export PATH="$HOME/miniconda/bin:$PATH"
fi

To deactivate conda:

conda deactivate

To uninstall the TensorFlow environment (see Managing environments):

conda remove --name tf --all

To update conda:

conda update -n base conda

The conda equivalent of pip freeze > requirements.txt:

conda env export > environment.yml

And the conda equivalent of pip install -r requirements.txt:

conda env create -f environment.yml

Create gh-pages branch in existing repo

Posted on Jul 9, 2020 in howto | Tagged github, github pages, static sites

It’s easy to serve a website using GitHub Pages by creating the gh-pages branch in a GitHub repo. The instructions can be found here.

In my case, I have an existing repository that has some stuff. I want to use GitHub Pages to serve some .md files, but I don’t want to include the stuff from my master branch. What I had to do was:

Create/checkout an orphan gh-pages branch.
- An orphan branch is not connected to the other branches and commits, and its working tree has no files at all. See here for more info.
Commit .md files to the branch.

To create the orphan gh-pages branch (based on instructions from Hugo):

git checkout --orphan gh-pages
git reset --hard
git commit --allow-empty -m "Initializing gh-pages branch"
git push origin gh-pages
git checkout master

Once the branch is pushed to GitHub, go to the Settings page of the repository. In the section “GitHub Pages”, select gh-pages as the source. The step is described in more detail here. If successful, you will see a message saying “Your site is published at https://your-username.github.io/your-repository/”.

Now you can add files to the gh-pages branch, and they will show up on your new website:

git checkout gh-pages
# Adding files ...
git commit -m "Add files"
git push origin gh-pages
git checkout master

Install Docker on Linux Mint 19.3

Posted on Jun 30, 2020 in howto | Tagged docker, linux, linux mint, ubuntu

This is a simple log of how I installed Docker on my laptop running Linux Mint 19.3 (Tricia). I followed the instructions from the official Docker Documentation website. As Linux Mint 19.3 is based on Ubuntu 18.04 LTS (Bionic Beaver), I followed the instructions in the section “Install on Ubuntu”.

The instructions are straightforward:

# Uninstall any Docker packages
sudo apt remove docker docker-engine docker.io containerd runc

# Install packages to allow apt to use a repository over HTTPS
sudo apt update
sudo apt install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common

# Add Docker's official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

# Verify that the key has been added
sudo apt-key fingerprint 0EBFCD88

# Set up the Docker "stable" repository
sudo add-apt-repository \
    "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable"

# Install Docker Engine
sudo apt update
sudo apt install docker-ce docker-ce-cli

# Allow non-privileged users to run Docker commands
sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker

Check the Docker version:

docker --version
# Output: Docker version 19.03.12, build 48a66213fe

Verify that you can run docker commands without sudo:

docker run hello-world

HLS Glossary

Posted on Jun 9, 2020 in notes | Tagged firmware, fpga, hls, programming, xilinx

Glossary for HLS design (taken from Parallel Programming for FPGAs by R. Kastner, J. Matai, and S. Neuendorffer):

High-level synthesis (HLS)
The hardware design process that translates an algorithmic description into a register transfer level (RTL) hardware description.

Logic synthesis
The process of converting an RTL design into a netlist of primitive FPGA logic elements (and the connections between them).

Place and route
The process of converting a netlist of device-level primitives into the configuration of a particular device (which is called a bitstream).

Task latency
The time between when a task starts and when it finishes.

Task interval
The time between when a task starts and when the next starts.

Initiation interval
The time between successive data provided to the pipeline.
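
To make the last three terms concrete, here is a small sketch of the usual cycle-count estimate for a loop. It assumes a simplified model in which every iteration takes a fixed number of cycles and a pipelined loop starts a new iteration every II cycles. The function names are hypothetical, and real HLS reports may differ by a few cycles of overhead.

```cpp
// Simplified cycle-count model for a loop (an assumption, not tool output).
// iteration_latency: cycles needed to complete one iteration.
// ii: initiation interval, i.e. cycles between the starts of two iterations.

// Without pipelining, an iteration starts only after the previous one ends.
constexpr long sequential_cycles(long trip_count, long iteration_latency) {
    return trip_count * iteration_latency;
}

// With pipelining, the first result appears after iteration_latency cycles
// and each of the remaining iterations adds ii cycles.
constexpr long pipelined_cycles(long trip_count, long iteration_latency, long ii) {
    return trip_count > 0 ? iteration_latency + ii * (trip_count - 1) : 0;
}
```

For a loop of 100 iterations with an iteration latency of 4 cycles, this model gives 400 cycles without pipelining, 103 cycles with an initiation interval of 1, and 202 cycles with an initiation interval of 2.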

From the Vivado HLS tool, these are the different steps:

C Simulation
Compile and validate the C (or C++) code. Also build the test bench.

C Synthesis
Synthesize the C design into an RTL design. Report the performance estimates.

C/RTL Co-Simulation
Verify the RTL design by simulating it with the C test bench.

Export RTL Design
Export the RTL design as an IP.

Note that the Vivado HLS tool only provides estimates for resource usage. To get the real resource usage, one has to go back to Vivado and do place and route.

Also note that Vivado HLS defines the macro __SYNTHESIS__ when synthesis is performed. This can be used to exclude non-synthesizable code, such as std::cout.
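
A minimal sketch of that pattern is shown here. The function sum4 is a made-up example, not something from the Xilinx documentation; the only tool-specific piece is the __SYNTHESIS__ macro, which an ordinary compiler leaves undefined so the print statement is compiled in.

```cpp
#include <iostream>

// Hypothetical function to be synthesized: sums four values.
int sum4(const int data[4]) {
    int sum = 0;
    for (int i = 0; i < 4; ++i) {
        sum += data[i];
    }
#ifndef __SYNTHESIS__
    // Debug output: compiled for C simulation only, skipped during synthesis.
    std::cout << "sum4 = " << sum << std::endl;
#endif
    return sum;
}
```

Guarding with #ifndef keeps the debug code in the same source file as the design, so the C test bench and the synthesized function stay in sync.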

Also note that Vivado HLS (v2020.1) uses gcc 4.6.3, which has partial support for C++11 via the flag -std=c++0x. By default, the C simulation is performed in debug mode. EDIT: Vivado HLS was discontinued after v2020.1 and has been replaced by Vitis HLS.

Also note that for C/RTL co-simulation, the default tool is xsim and the default language used for RTL is Verilog.

I also find these resources from Xilinx very useful:

https://www.xilinx.com/support/documentation/sw_manuals/xilinx2020_1/ug871-vivado-high-level-synthesis-tutorial.pdf
https://www.xilinx.com/support/documentation/sw_manuals/xilinx2020_1/ug902-vivado-high-level-synthesis.pdf
https://www.xilinx.com/support/documentation/sw_manuals/ug998-vivado-intro-fpga-design-hls.pdf
https://github.com/Xilinx/HLS-Tiny-Tutorials
https://xilinx.github.io/Vitis_Accel_Examples/master/html/cpp.html
https://www.xilinx.com/html_docs/xilinx2020_2/vitis_doc/hls_pragmas.html
Examples provided in the installation area: <vivado-hls-installation>/examples/coding/

Some random notes I’ve taken from the UG902 doc:

When a loop or function is pipelined, Vivado HLS unrolls all loops in the hierarchy below the loop or function.
Vivado HLS may perform automatic inlining of small functions. If a function is inlined, the logic is merged into the function above it in the hierarchy, and there is no report or separate RTL file for the inlined function. Also, if the function arguments and interface are incorrect or inaccurate, they can prevent Vivado HLS from applying some optimizations.
To reduce latency, Vivado HLS schedules logic operations and functions to execute in parallel. But it does not schedule loops to execute in parallel. To execute two different loops in parallel, the loops should be captured in separate functions.
Array accesses can often create performance bottlenecks. When an array is implemented as a memory, the number of memory ports limits access to the data. Some care must be taken to ensure arrays that only require read accesses are implemented as ROMs in the RTL.
It is recommended to specify arrays that are intended to be memories with the static qualifier. A static array behaves almost identically to a memory in RTL.
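
The notes on loops and arrays can be sketched together as follows. This is plain C++ with hypothetical function names (scale, offset, top) and a made-up coefficient table. Whether the tool actually runs the two functions in parallel, or maps the table to a ROM, depends on the directives used and on the rest of the design.

```cpp
#include <cstdint>

constexpr int kSize = 8;  // hypothetical array length

// First loop, captured in its own function.
void scale(const std::int32_t in[kSize], std::int32_t out[kSize]) {
    // Read-only table declared static const: a candidate for a ROM.
    static const std::int32_t coeff[kSize] = {1, 2, 3, 4, 5, 6, 7, 8};
    for (int i = 0; i < kSize; ++i) {
        out[i] = in[i] * coeff[i];
    }
}

// Second loop, independent of the first, in a separate function.
void offset(const std::int32_t in[kSize], std::int32_t out[kSize]) {
    for (int i = 0; i < kSize; ++i) {
        out[i] = in[i] + 1;
    }
}

// Top-level function: the two calls share no data, so a scheduler is free
// to overlap them, which it would not do for two loops in one function body.
void top(const std::int32_t a[kSize], const std::int32_t b[kSize],
         std::int32_t out_a[kSize], std::int32_t out_b[kSize]) {
    scale(a, out_a);
    offset(b, out_b);
}
```

In software the two calls simply run one after the other; the point of the structure is only to give the HLS scheduler two function-level units that it can place side by side.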