Which compilers? gcc-arm didn't optimize it "at all" (for CM4), nor does Xcode's LLVM :-(
00000000 <unpack_u32le>:
0: e5d01002 ldrb r1, [r0, #2]
4: e5d0c001 ldrb ip, [r0, #1]
8: e5d02000 ldrb r2, [r0]
c: e1a03801 lsl r3, r1, #16
10: e183340c orr r3, r3, ip, lsl #8
14: e1830002 orr r0, r3, r2
18: e1800c01 orr r0, r0, r1, lsl #24
1c: e12fff1e bx lr
00000020 <unpack_32be>:
20: e5d02001 ldrb r2, [r0, #1]
24: e5d03000 ldrb r3, [r0]
28: e5d01003 ldrb r1, [r0, #3]
2c: e5d00002 ldrb r0, [r0, #2]
30: e1a02802 lsl r2, r2, #16
34: e1823c03 orr r3, r2, r3, lsl #24
38: e1833001 orr r3, r3, r1
3c: e1830400 orr r0, r3, r0, lsl #8
40: e12fff1e bx lr
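(For reference, the byte-by-byte C these listings presumably came from would be something along these lines — the function names are guessed from the symbols, so treat this as a reconstruction:)

```c
#include <stdint.h>

/* Presumed source for the listings above: assemble a 32-bit value
 * one byte at a time, so alignment and host byte order don't matter. */
uint32_t unpack_u32le(const unsigned char *src)
{
    return (uint32_t)src[0]
         | ((uint32_t)src[1] << 8)
         | ((uint32_t)src[2] << 16)
         | ((uint32_t)src[3] << 24);
}

uint32_t unpack_u32be(const unsigned char *src)
{
    return ((uint32_t)src[0] << 24)
         | ((uint32_t)src[1] << 16)
         | ((uint32_t)src[2] << 8)
         | (uint32_t)src[3];
}
```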
which is not optimal, sure, but not absolutely horrible either. get_native_u32:
ldr r0, [r0]
bx lr
get_byteswap_u32:
ldr r0, [r0]
rev r0, r0
bx lr
since ldr on Cortex-M4 can handle unaligned accesses just fine; but you need something like
static inline uint32_t get_native_u32(const unsigned char *const src)
{
return *(const uint32_t *)src;
}
static inline uint32_t get_byteswap_u32(const unsigned char *const src)
{
uint32_t result = *(const uint32_t *)src;
result = ((result & 0x0000FFFFu) << 16) | ((result >> 16) & 0x0000FFFFu);
result = ((result & 0x00FF00FFu) << 8) | ((result >> 8) & 0x00FF00FFu);
return result;
}
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define get_u32le(src) get_native_u32(src)
#define get_u32be(src) get_byteswap_u32(src)
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define get_u32le(src) get_byteswap_u32(src)
#define get_u32be(src) get_native_u32(src)
#else
#error Unsupported byte order
#endif
to get ARM-GCC to emit that. Personally, I prefer the first one for readability, but will switch to the latter if it makes a measurable difference at run time.
Using || instead of | isn't going to help.
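One caveat with the snippet above: dereferencing src through a cast uint32_t pointer technically violates C's strict-aliasing rules (and assumes the target tolerates unaligned loads). A memcpy-based variant is well-defined C, and GCC and Clang at -O2 typically fold it into the same single ldr — a sketch:

```c
#include <stdint.h>
#include <string.h>

static inline uint32_t get_native_u32(const unsigned char *const src)
{
    uint32_t result;
    /* Well-defined for any alignment; compilers fold this into one load. */
    memcpy(&result, src, sizeof result);
    return result;
}
```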
Does byte order make any difference in the complexity of the VHDL/Verilog code? Especially the ALU, or when loading/storing unaligned multibyte values?
unsigned int get_bit64(const uint64_t *map, const size_t bit)
{
return !!(map[bit/64] & ((uint64_t)1 << (bit & 63)));
}
unsigned int get_bit8(const uint8_t *map, const size_t bit)
{
return !!(map[bit / 8] & (1 << (bit & 7)));
}
then you always have get_bit64(map, i) == get_bit8(map, i). (Ignore any typos in the above code, if you find one.) Not so with big-endian byte order, where you must use a specific word size to access the bit map. Granted, it only matters in some rather odd cases, like when different operations wish to access the binary data in different-sized chunks.
Does byte order make any difference in the complexity of the VHDL/Verilog code? Especially the ALU, or when loading/storing unaligned multibyte values?
Zero difference.
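(As an aside, the get_bit64/get_bit8 equivalence claimed above is easy to check with a quick loop. The helper below only reports agreement on a little-endian host, which is exactly the point:)

```c
#include <stddef.h>
#include <stdint.h>

static unsigned int get_bit64(const uint64_t *map, const size_t bit)
{
    return !!(map[bit / 64] & ((uint64_t)1 << (bit & 63)));
}

static unsigned int get_bit8(const uint8_t *map, const size_t bit)
{
    return !!(map[bit / 8] & (1 << (bit & 7)));
}

/* Returns 1 when both accessors agree on every bit of a 128-bit map.
 * True on little-endian; on big-endian the byte view disagrees. */
static int bits_agree(const uint64_t *map)
{
    const uint8_t *bytes = (const uint8_t *)map; /* char-type view may alias */
    for (size_t i = 0; i < 128; i++)
        if (get_bit64(map, i) != get_bit8(bytes, i))
            return 0;
    return 1;
}
```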
But it is not about the convenience of the implementer. It is about the convenience of the programmer.
Does byte order make any difference in the complexity of the VHDL/Verilog code? Especially the ALU, or when loading/storing unaligned multibyte values?
All the things described above are only problems if you think that BE is better. In that case
Xeons are the highest-performing general-purpose processors on the market right now, so why doesn't CERN use them?
U54-MC
The SiFive U54-MC Standard Core is the world’s first RISC-V application processor, capable of supporting full-featured operating systems such as Linux.
The U54-MC has 4x 64-bit U5 cores and 1x 64-bit S5 core—providing high performance with maximum efficiency. This core is an ideal choice for low-cost Linux applications such as IoT nodes and gateways, storage, and networking.
RISC-V system emulator supporting the RV128IMAFDQC base ISA (user level ISA version 2.2, privileged architecture version 1.10) including:
32/64/128 bit integer registers
32/64/128 bit floating point instructions
4) Technical notes
------------------
4.1) 128 bit support
The RISC-V specification does not define all the instruction encodings for the 128-bit integer and floating point operations. The missing ones were interpolated from the 32- and 64-bit ones.
Unfortunately there is no RISC-V 128-bit toolchain or OS yet (volunteers for the Linux port?), so rv128test.bin may be the first 128-bit code for RISC-V!
Is there a document about all the requirements and specs for making Linux run on RISC-V?
Is there a RISC-V board with PCIe or PCI?
So it's completely ... experimental. But I still wonder WHO needs 128-bit registers, and for what.
The Linux kernel supports RISC-V. There are I suppose at least half a dozen Linux distributions that support RISC-V, the most heavily used probably being Debian, Fedora, and Buildroot.
The main requirement for a board maker is to implement a bootloader and the SBI (Supervisor Binary Interface). The most commonly used bootloader at the moment is BBL (Berkeley Boot Loader, which also implements the SBI), but it's pretty crude, so there is a lot of work going into others such as Das U-Boot and coreboot.