Computing Memoir: 2022

Wednesday, July 6, 2022

Cross-compile Armv7 from x86_64 using docker

Cross-compiling sounds like a wide river that is hard to cross. Of course it takes a lot of knowledge, but deep down it is just another way of compiling and linking. Here I try to show that by cross-compiling for a Raspberry Pi target from an Ubuntu x86_64 machine.

First, you need a toolchain: the Arm GNU Toolchain. I used 9.2 because it happened to be the version I used at work.

Second, you need part of the target file system: /usr and /lib. For a 32-bit Raspberry Pi, download Raspberry Pi OS Lite, then extract only /usr and /lib from it.
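The Dockerfile in this post ADDs raspios-buster-armhf-lite.usrlib.tar.gz, a tarball with only usr and lib at its top level (Docker's ADD unpacks local tar archives). One way to produce it is the Python sketch below. It assumes the image's root partition is already mounted, for example through a loop device; the mount point /mnt/rootfs in the usage comment is hypothetical. Run it with enough permissions to read every file, typically as root.

```python
import os
import tarfile

def pack_target_root(rootfs: str, out_path: str, subdirs=("usr", "lib")) -> None:
    """Write a .tar.gz containing only the given top-level directories of rootfs."""
    with tarfile.open(out_path, "w:gz") as tar:
        for name in subdirs:
            # arcname drops the mount point, so members are "usr/..." and "lib/...";
            # tarfile stores symlinks as symlinks by default (dereference=False).
            tar.add(os.path.join(rootfs, name), arcname=name)

# Usage, with a hypothetical mount point for the image's root partition:
# pack_target_root("/mnt/rootfs", "raspios-buster-armhf-lite.usrlib.tar.gz")
```

Keeping symlinks intact matters here, because the target's library directories are full of versioned .so symlinks.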

With these two, we can make a Docker image for the build. The build needs the x86_64 host tools, the cross-compiler toolchain (gcc-arm-xxx-x86_64-arm-xxx), and the target root of Raspbian OS.


# Dockerfile
FROM  ubuntu:20.04

RUN useradd -s /bin/bash -u 1000 docker_user && \
    apt-get update && \
    apt-get install -y \
    gcc g++ cmake git ninja-build lcov bison flex curl python3 fakeroot libpcre3-dev libcurl4-openssl-dev libxml2-dev \
    cppcheck clang-tools clang-format clang-tidy
RUN apt-get install -y python3-pip 

# https://developer.arm.com/downloads/-/gnu-a
# AArch32 target with hard float (arm-none-linux-gnueabihf)
ADD gcc-arm-9.2-2019.12-x86_64-arm-none-linux-gnueabihf.tar.xz /opt

# SYSROOT from raspbian buster armhf - usr, lib only
ADD raspios-buster-armhf-lite.usrlib.tar.gz /opt/raspios-buster-armhf-lite
RUN ln -s /opt/raspios-buster-armhf-lite/lib/arm-linux-gnueabihf/ /lib/arm-linux-gnueabihf

... snip ...

Third, we need a CMake cross-compiling toolchain file. The file below sets up cross-compiling by specifying the compiler (gcc-arm-xxx) and the include and library paths under /opt/raspios-xxx. You may wonder why there is no CMAKE_SYSROOT or CMAKE_FIND_ROOT_PATH. I found that those produced a number of errors that called for further changes, which seemed too obscure to pursue. And I think it is more succinct to tell the toolchain where to look when compiling and linking than to use those magic words.


# toolchain_raspberrypi_gcc9.cmake

set(CMAKE_SYSTEM_NAME Linux)
# CMAKE_SYSTEM_PROCESSOR names the target CPU; the triplet belongs in the compiler names below
set(CMAKE_SYSTEM_PROCESSOR arm)
set(CMAKE_C_COMPILER /opt/gcc-arm-9.2-2019.12-x86_64-arm-none-linux-gnueabihf/bin/arm-none-linux-gnueabihf-gcc)
set(CMAKE_CXX_COMPILER /opt/gcc-arm-9.2-2019.12-x86_64-arm-none-linux-gnueabihf/bin/arm-none-linux-gnueabihf-g++)

set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -L/opt/raspios-buster-armhf-lite/usr")
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -L/opt/raspios-buster-armhf-lite/usr/lib")
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -L/opt/raspios-buster-armhf-lite/usr/lib/arm-linux-gnueabihf")

include_directories(BEFORE
    /opt/raspios-buster-armhf-lite/usr/include
    /opt/raspios-buster-armhf-lite/usr/include/arm-linux-gnueabihf
)
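To see that the toolchain file boils down to "which compiler, which include paths, which library paths", here is a small Python sketch that assembles the equivalent g++ command line from the same paths. It is an illustration, not part of the project: the source file main.cpp and the pthread library in the usage line are hypothetical, and the command only makes sense inside the Docker image, where those /opt paths exist.

```python
import shlex

# Paths taken from the toolchain file in this post.
TOOLCHAIN_BIN = "/opt/gcc-arm-9.2-2019.12-x86_64-arm-none-linux-gnueabihf/bin"
TARGET_ROOT = "/opt/raspios-buster-armhf-lite"

def cross_compile_cmd(source: str, output: str, libs=()) -> list:
    """Assemble the g++ command line that the toolchain file effectively configures."""
    cmd = [f"{TOOLCHAIN_BIN}/arm-none-linux-gnueabihf-g++"]  # CMAKE_CXX_COMPILER
    # include_directories(BEFORE ...): -I dirs are searched before the compiler's system headers
    for inc in ("usr/include", "usr/include/arm-linux-gnueabihf"):
        cmd.append(f"-I{TARGET_ROOT}/{inc}")
    cmd += [source, "-o", output]
    # CMAKE_EXE_LINKER_FLAGS: where the linker looks for target libraries
    for lib_dir in ("usr", "usr/lib", "usr/lib/arm-linux-gnueabihf"):
        cmd.append(f"-L{TARGET_ROOT}/{lib_dir}")
    cmd += [f"-l{name}" for name in libs]  # libraries go after the sources
    return cmd

# Hypothetical example: print the command for a single-file program.
print(shlex.join(cross_compile_cmd("main.cpp", "main", libs=["pthread"])))
```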

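With the above CMake toolchain file, we can define the remaining part of the Dockerfile, which prepares Google Test and Google Benchmark: each is built for the host and for the Raspberry Pi, and the ARM build is installed into the target root.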


# Dockerfile
...
ADD toolchain_raspberrypi_gcc9.cmake .

RUN git clone https://github.com/google/googletest.git --depth 1 --branch release-1.11.0 &&\
    cd googletest &&\
    cmake -S . -B ./build_x86_64 &&\
    cmake --build ./build_x86_64 &&\
    cmake --install ./build_x86_64 &&\
    cmake -S . -B ./build_arm -DCMAKE_TOOLCHAIN_FILE=../toolchain_raspberrypi_gcc9.cmake &&\
    cmake --build ./build_arm &&\
    cmake --install ./build_arm --prefix /opt/raspios-buster-armhf-lite/usr/ &&\
    cd ../ &&\
    rm -rf googletest

ADD CMakeLists.benchmark.txt .

RUN git clone https://github.com/google/benchmark.git --depth 1 --branch v1.6.1 &&\
    cd benchmark &&\
    mv ../CMakeLists.benchmark.txt ./CMakeLists.txt &&\
    cmake -S . -B ./build_x86_64 -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_GTEST_TESTS=0 &&\
    cmake --build ./build_x86_64 --config Release &&\
    cmake --install ./build_x86_64 &&\
    cmake -S . -B ./build_arm -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_GTEST_TESTS=0 -DCMAKE_TOOLCHAIN_FILE=../toolchain_raspberrypi_gcc9.cmake &&\
    cmake --build ./build_arm &&\
    cmake --install ./build_arm --prefix /opt/raspios-buster-armhf-lite/usr/ &&\
    cd ../ &&\
    rm -rf benchmark

After you build the Docker image, you can start a container from it and run both the host build and the cross-compile build.


$ docker run --rm -it -v "$PWD":/du -w /du -u 0 samplecpp_build:0.1

/du# rm -rf ./build_x86_64
/du# cmake -S . -B ./build_x86_64
/du# cmake --build ./build_x86_64

/du# rm -rf ./build_arm
/du# cmake -S . -B ./build_arm -DCMAKE_TOOLCHAIN_FILE=../toolchain_raspberrypi_gcc9.cmake
/du# cmake --build ./build_arm
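To confirm that the two build directories really hold binaries for different machines, you can read the e_machine field of the ELF header (the file command reports the same information). A minimal Python sketch; which executable you pass in depends on your project and is not taken from the post.

```python
import struct

# A few e_machine values from the ELF specification.
EM_NAMES = {3: "x86 (i386)", 40: "ARM (32-bit)", 62: "x86-64", 183: "AArch64"}

def elf_machine(header: bytes) -> str:
    """Describe the target machine recorded in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class = header[4]   # 1 = 32-bit, 2 = 64-bit
    ei_data = header[5]    # 1 = little-endian, 2 = big-endian
    endian = "<" if ei_data == 1 else ">"
    # e_machine follows the 16-byte e_ident and the 2-byte e_type in both ELF classes.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    bits = {1: "32", 2: "64"}.get(ei_class, "?")
    name = EM_NAMES.get(machine, f"unknown ({machine})")
    return f"{name}, ELF{bits}"

def describe(path: str) -> str:
    """Read an ELF file's header and describe its target machine."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20))
```

An executable from ./build_arm should be reported as "ARM (32-bit), ELF32", and the same target from ./build_x86_64 as "x86-64, ELF64".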

Cross-compiling means specifying the compiler and the include and library paths. That is all.

You can find the code in my GitLab project, samplecpp.

Sunday, March 27, 2022

Annotated Verilog code for AXI-Stream FIFO interface

On Zynq, the AXI-Stream FIFO is used to send and receive data between the processor and the FPGA. It has two separate FIFOs: one for transmitting data to the FPGA and one for receiving data from it. Here I will explain, with annotations, Verilog code that interfaces with the FIFO.

First, the module, named decoder, needs port definitions. The FIFO's TX ports are received by the module's RX ports: rx_data, rx_valid, rx_ready. You can define a special attribute, (* X_INTERFACE_INFO ... *), to group related ports, which lets you connect the grouped ports as one interface: AXI_STR_TXD to RX, and TX to AXI_STR_RXD.

Here, the FIFO data is treated as instructions, and the FSM is implemented as a FETCH-LOAD-(TX)-DONE cycle.

When reset_n (an active-low reset) is asserted, the FSM is initialized. It can then receive data from the TX FIFO but has no data to send to the RX FIFO.

At FETCH, when the TX FIFO has data (rx_valid is 1), read rx_data.

At LOAD, if the instruction is CMD_WRITE, turn the LED on or off. If it is CMD_READ, prepare tx_data to send to the RX FIFO.

At TX, send tx_data. At DONE, the module is ready to receive again but has nothing to send.
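The FETCH-LOAD-(TX)-DONE cycle can be sketched as a cycle-level Python model. This is not the Verilog from the post: the opcode values for CMD_WRITE and CMD_READ, the word layout (opcode in bits [31:24], payload in bits [7:0]) and the reply contents are all hypothetical. It does follow the AXI-Stream rule that a word is transferred only on a clock edge where valid and ready are both high.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class State(Enum):
    FETCH = auto()
    LOAD = auto()
    TX = auto()
    DONE = auto()

# Hypothetical instruction layout: opcode in bits [31:24], payload in bits [7:0].
CMD_WRITE = 0x01
CMD_READ = 0x02

@dataclass
class Decoder:
    # Values right after reset_n: ready to take a word from the TX FIFO, nothing to send.
    state: State = State.FETCH
    rx_ready: bool = True    # towards AXI_STR_TXD: "I can accept a word"
    tx_valid: bool = False   # towards AXI_STR_RXD: "tx_data holds a word"
    tx_data: int = 0
    instr: int = 0
    led: int = 0

    def step(self, rx_valid: bool, rx_data: int, tx_ready: bool) -> Optional[int]:
        """Advance one clock edge; return the word handed to the RX FIFO, if any."""
        sent = None
        if self.state is State.FETCH:
            if rx_valid and self.rx_ready:       # transfer only when valid and ready are both high
                self.instr = rx_data
                self.rx_ready = False
                self.state = State.LOAD
        elif self.state is State.LOAD:
            opcode, payload = self.instr >> 24, self.instr & 0xFF
            if opcode == CMD_WRITE:
                self.led = payload & 1           # turn the LED on or off
                self.state = State.DONE
            elif opcode == CMD_READ:
                self.tx_data = self.led          # hypothetical reply: current LED state
                self.tx_valid = True
                self.state = State.TX
            else:
                self.state = State.DONE          # unknown opcode: ignore it
        elif self.state is State.TX:
            if self.tx_valid and tx_ready:       # the RX FIFO accepted the word
                sent = self.tx_data
                self.tx_valid = False
                self.state = State.DONE
        else:  # State.DONE: ready to receive again, nothing to send
            self.rx_ready = True
            self.state = State.FETCH
        return sent
```

In TX the model simply waits while tx_ready is low, which is how a stalled RX FIFO would look from the module's side.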

For the complete code, refer below.

Sunday, February 6, 2022

Blocking vs non-blocking assignment on sequential logic

I made a counter that holds high for 1 clock out of every 4 clocks. First, with blocking assignment:


`timescale 1ns / 1ps
module ride_edge (
        input wire clk,resetn,
        output wire hold_p
    );
    
    reg hold;
    reg [1:0] count;
    
    assign hold_p = hold;
    
    always @(posedge clk or negedge resetn) begin
        if (!resetn) begin
            hold = 0;
            count = 0;        
        end
        else begin         
            if ( count == 0 ) hold = 1;    
            else if ( hold == 1 ) hold = 0;
            
            count = count + 1;
        end    
    end
    
endmodule

Then I probed it with an oscilloscope.

Then I made the same code with non-blocking assignment, like below.


`timescale 1ns / 1ps
module ride_edge_nonblocking (
        input wire clk,resetn,
        output wire hold_p 
    );
    
    reg hold;
    reg [1:0] count;
    
    assign hold_p = hold;
    
    always @(posedge clk or negedge resetn) begin
        if (!resetn) begin
            hold <= 0;
            count <= 0;
        end
        else begin         
            if ( count == 0 ) hold <= 1;    
            else if ( hold == 1 ) hold <= 0;
            
            count <= count + 1;
        end    
    end    
endmodule

Here is the Vivado block diagram. FCLK_CLK0 is 100 MHz, and xslice_1 picks Din[4] to make a 3.125 MHz clock (= 100 MHz / 32).
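The divide-by-32 figure follows from how a binary counter works: bit n toggles every 2^n input clocks, so its full period is 2^(n+1) clocks. A small Python check, assuming that xslice_1's Din input is driven by a free-running binary counter clocked by FCLK_CLK0:

```python
def counter_bit_period(bit: int, cycles: int = 1024) -> int:
    """Simulate a free-running binary counter and measure the period of Din[bit] in input clocks."""
    rising = []
    prev = 0
    for count in range(cycles):           # counter value after each input clock
        level = (count >> bit) & 1        # what a slice of that single bit outputs
        if level and not prev:
            rising.append(count)
        prev = level
    return rising[1] - rising[0]

def divided_clock_hz(f_in_hz: float, bit: int) -> float:
    """Bit n toggles every 2**n clocks, so its full period is 2**(n+1) clocks."""
    return f_in_hz / 2 ** (bit + 1)

print(counter_bit_period(4), divided_clock_hz(100e6, 4))  # 32 3125000.0
```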

Then I probed both hold_p outputs.

Puzzled. I expected the non-blocking version to be delayed by 1 clock, but the two outputs look the same. And indeed, the schematic shows exactly the same nets generated for both.

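But why is there no difference between blocking and non-blocking assignment? Looking at the code again, hold and count are each read before they are assigned inside the always block, so both versions compute the next values from the values the registers held before the clock edge. That describes the same flip-flops and the same logic, so Vivado synthesis generates the same nets. The two would diverge only if a variable were read after being assigned in the same block, for example if count = count + 1 came before the if statement.

So in this design '=' and '<=' produced the same sequential logic, but that depends on the statement order rather than being guaranteed. Then is it better to use '<=' in Verilog to avoid confusion?

Sequential logic needs registers. It seems to me that '<=' is the only choice left.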
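This reasoning can be checked with a small Python model of the ride_edge always block. It models the two assignment semantics and is not a Verilog simulator: with '=', an assignment is visible to the statements after it in the same block; with '<=', every statement reads the values from before the clock edge and the new values are committed together. The count_first variant, which moves the increment above the if statement, is hypothetical and only there to show when the choice matters.

```python
def simulate(nonblocking: bool, count_first: bool = False, cycles: int = 8) -> list:
    """Return hold after each clock edge for the ride_edge always block."""
    regs = {"hold": 0, "count": 0}                 # values right after reset
    trace = []
    for _ in range(cycles):
        view = dict(regs)                          # values the statements read
        nxt = dict(regs)                           # values the clock edge commits

        def assign(name, value):
            nxt[name] = value
            if not nonblocking:
                view[name] = value                 # '=': later statements see it at once

        if count_first:                            # hypothetical reordered variant
            assign("count", (view["count"] + 1) % 4)
        if view["count"] == 0:
            assign("hold", 1)
        elif view["hold"] == 1:
            assign("hold", 0)
        if not count_first:                        # original order: increment last
            assign("count", (view["count"] + 1) % 4)   # reg [1:0] wraps from 3 to 0

        regs = nxt
        trace.append(regs["hold"])
    return trace

print(simulate(False), simulate(True))  # [1, 0, 0, 0, 1, 0, 0, 0] twice
```

In the original statement order both semantics produce a pulse on every fourth edge. Only the reordered variant with '=' moves the pulse, because the if statement then sees the already incremented count.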
