essay on programming languages, computer science, information technologies and all.

Thursday, July 11, 2024

Fighting against optimization

Here is a snippet of Verilog code: a simple command decoding module and a block RAM. The decoder receives 32-bit commands on rx_data. If the command is a memory write, the words that follow are the values to store. For example, the sequence 3100_0004, CAFE_BEEF, 1234_5678 stores CAFE_BEEF and 1234_5678 in block RAM starting at address 0x0004.
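As a minimal sketch of that word layout, the fields of the first command word can be pulled apart as below. The module name cmd_fields_demo and the 5-bit width of word_count are assumptions made for illustration (the real declaration of word_count is not shown in the post); the bit slices are the same ones cmd_decoder uses.

```verilog
// Hypothetical demo module, not part of the original design.
module cmd_fields_demo;
    reg  [31:0] data;

    wire [3:0]  cmd        = data[31:28];      // command code, 4'h3 = memory write
    wire [4:0]  word_count = data[27:24] + 1;  // number of data words that follow
    wire [23:0] addr       = data[23:0];       // byte address of the first word

    initial begin
        data = 32'h3100_0004;
        #1;
        // prints: cmd=3 words=2 addr=000004
        $display("cmd=%h words=%0d addr=%h", cmd, word_count, addr);
        $finish;
    end
endmodule
```

Run in any Verilog simulator, this shows why 3100_0004 is followed by exactly two data words: the count field holds 1, and the decoder adds 1 to it.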


module top_nerves
    #(  parameter integer CMD_MEM_ADDR_WIDTH = 8
    ) (                        
...
        input [31:0] rx_data,
        input rx_valid,
        output rx_ready,     
...
    );

	wire [23:0] mem_addr;
	wire mem_wstrb;
	wire [31:0] mem_wdata;
	wire mem_wready;
	wire [31:0] mem_rdata;
	wire mem_rstrb;
	wire mem_rbusy;

	cmd_decoder
		cd(
			.clk(clk),
			.rst_n( rst_n ),
		
			.rx_data( rx_data ),
			.rx_valid( rx_valid ),
			.rx_ready( rx_ready ),
...
			.mem_addr( mem_addr ),
			.mem_wstrb( mem_wstrb ),
			.mem_wdata( mem_wdata ),
			.mem_wready( mem_wready ),
			.mem_rdata( mem_rdata ),
			.mem_rstrb( mem_rstrb ),
			.mem_rbusy( mem_rbusy ),

			.io_out( cmd_io_out )
		);

    bram_32bits #(
...        )
        cmd_mem ( 
            .clk( clk ),
            .waddr( mem_addr[CMD_MEM_ADDR_WIDTH-1:2] ), 
            .wdata( mem_wdata ),
            .wenable( mem_wstrb ),
            .raddr( mem_addr[CMD_MEM_ADDR_WIDTH-1:2] ),
            .rdata( mem_rdata ) 
        );        
endmodule


module cmd_decoder (
        input [31:0] rx_data,
        input rx_valid,
        output reg rx_ready,     
		...
        output reg [23:0] mem_addr,
        output reg mem_wstrb,
        output reg [31:0] mem_wdata,
        input mem_wready,
        input [31:0] mem_rdata,
        output reg mem_rstrb,
        input mem_rbusy,
);
	...
    localparam
        CMD_MEM_WRITE = 4'h3,
        ...
    ;
        
    reg [3:0] state;
    reg [31:0] data;
    wire [3:0] cmd;
    assign cmd = data[31:28];
    reg [3:0] cur_cmd;
    
    always @( posedge clk or negedge rst_n ) begin
		...            
            case (state)
                FETCH : begin
                    if (rx_valid) begin 
                        data <= rx_data;
					...
                    end 
                end
                
                LOAD : begin
                    cur_cmd <= cmd;

                    case (cmd)
                        CMD_MEM_WRITE : begin
                            word_count <= data[27:24] + 1; 
                            mem_addr <= data[23:0] - 4;
                            state <= DONE;
                        end
                        ...
                    endcase
                end
                    
                    
                MEM_WRITE : begin
                    if( mem_wready == 1 ) begin
                        mem_addr <= mem_addr + 4;
                        mem_wstrb <= 1;
                        mem_wdata <= data;
                        word_count <= word_count - 1;
                        state <= DONE;
                    end
                end
			
            	...                
            endcase
        end        
    end
endmodule


After running synthesis, two things turned out to be missing from the result:
- 8 bits of rx_data are missing: [31:0] is mapped to [23:0]. The top 4 bits hold the command and the next 4 bits hold the word count.
- mem_wdata is not connected between cmd_decoder and the bram, even though the code has "mem_wdata <= data;", which should connect mem_wdata to the bram.








The first puzzle came from a wrong assumption: that bits [31:24] of [31:0] had been removed to leave [23:0]. What was actually removed is the unused part of the address, bits [23:16] of [31:0]. Hovering over the ... revealed these optimizations.
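To see why address bits can disappear, it helps to look only at the slice that actually reaches the bram. The sketch below is a hypothetical reduction of the address path: addr_slice_demo and word_addr are made-up names, while mem_addr and CMD_MEM_ADDR_WIDTH come from top_nerves. Which bits a tool really trims depends on the parameter value used in the build, which the post does not show.

```verilog
// Hypothetical reduction of the address path, for illustration only.
module addr_slice_demo
    #(  parameter integer CMD_MEM_ADDR_WIDTH = 8
    ) (
        input  [23:0] mem_addr,
        output [CMD_MEM_ADDR_WIDTH-3:0] word_addr
    );

    // Only bits [CMD_MEM_ADDR_WIDTH-1:2] have a load. Bits [1:0] and
    // every bit above CMD_MEM_ADDR_WIDTH-1 drive nothing, so synthesis
    // is free to remove them along with the logic that computes them.
    assign word_addr = mem_addr[CMD_MEM_ADDR_WIDTH-1:2];
endmodule
```

A register bit with no load is dead logic from the tool's point of view, so the bits that vanish are the ones nobody reads, not the ones at the top of the word.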



The second puzzle comes from an undriven wire. This bram does not need mem_wready, but top_nerves.v declared it anyway and connected it to cmd_decoder. Nothing drives the wire, so the optimizer concluded that mem_wready is always 0 and removed the statements inside the if block.

module cmd_decoder (
...                  
                MEM_WRITE : begin
                    if( mem_wready == 1 ) begin   // mem_wready is always 0
                        ...  // the optimizer drops this part completely
                    end
                end
Once the port is tied to 1, mem_wdata is synthesized correctly.

module top_nerves
...
	cmd_decoder
		cd(
                        ...
			.mem_addr( mem_addr ),
			.mem_wstrb( mem_wstrb ),
			.mem_wdata( mem_wdata ),
			.mem_wready( 1'b1 ),
			...
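The same effect can be reproduced in a much smaller design, which may be easier to experiment with. The sketch below is a hypothetical reduction: gate_demo and gate_demo_top are made-up names, and only mem_wready, mem_wdata and data come from the original code. It keeps the named wire and drives it with an explicit assign instead of a literal in the port list. How an undriven wire is treated can differ between tools; the post observed it being treated as constant 0.

```verilog
// Hypothetical reduction of the problem, for illustration only.
module gate_demo (
    input             clk,
    input             mem_wready,
    input      [31:0] data,
    output reg [31:0] mem_wdata
);
    always @( posedge clk ) begin
        // If mem_wready is a constant 0, this branch is never taken
        // and the mem_wdata register can be optimized away.
        if( mem_wready == 1 ) begin
            mem_wdata <= data;
        end
    end
endmodule

module gate_demo_top (
    input         clk,
    input  [31:0] data,
    output [31:0] mem_wdata
);
    wire mem_wready;
    assign mem_wready = 1'b1;   // remove this line and the wire has no driver

    gate_demo g (
        .clk( clk ),
        .mem_wready( mem_wready ),
        .data( data ),
        .mem_wdata( mem_wdata )
    );
endmodule
```

Synthesizing gate_demo_top with and without the assign line should show the register appearing and disappearing. Many synthesis tools also report undriven nets as warnings, so the log is worth checking before looking at the schematic.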

Sometimes the job of a programmer is to understand what another program is generating before writing one's own.
