Hardware Cryptography with GNU/Linux: anyone?

DiTBho:
Cryptographic operations can be very expensive in software but, in theory, a cryptographic hardware accelerator can take them over and improve performance.

So, I like the idea behind crypto coprocessors ("Crypto Co-ops") because, at least, they handle the { AES-128, AES-256 } ciphers and the { SHA-1, SHA-256, SHA-512 } hashes, so symmetric-key encryption and message-digest calculation happen in hardware.

Love that, but it's all a new kind of experience for me.

I'd like to play with GNU/Linux. OpenSSH, VPN, IPsec, ... That stuff.

First I have to understand:
- the kernel overhead of "passing" data from user space to the hardware accelerator.
- which applications can really benefit from commercial hardware accelerators (there are just a few miniPCI modules).
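
For the first point, one place to start experimenting is the kernel crypto API, which Linux exposes to user space through AF_ALG sockets. Accelerator drivers register their algorithms with that same API, so a request made this way may or may not land on hardware depending on which implementation the kernel selects (/proc/crypto lists them). A minimal sketch, assuming a Linux kernel with the user-space hash interface enabled and Python 3.6 or later; the function name `kernel_sha256` is made up for illustration:

```python
import hashlib
import socket


def kernel_sha256(data: bytes):
    """Hash a short message through the kernel crypto API (AF_ALG).

    Returns the 32-byte digest, or None when AF_ALG is not available
    (non-Linux system, interface not compiled in, or blocked by a sandbox).
    """
    if not hasattr(socket, "AF_ALG"):
        return None
    try:
        with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as alg:
            # Select the algorithm type and name as registered in the kernel.
            alg.bind(("hash", "sha256"))
            op, _ = alg.accept()
            with op:
                # One send() without MSG_MORE is treated as a complete message,
                # so this sketch is only meant for short inputs.
                op.send(data)
                return op.recv(32)
    except OSError:
        return None


if __name__ == "__main__":
    msg = b"abc"
    print("kernel :", kernel_sha256(msg))
    print("hashlib:", hashlib.sha256(msg).digest())
```

Timing this call against `hashlib` for different message sizes gives a rough idea of how much the user/kernel round trip costs on a given board, before any accelerator is involved.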

berke:
What kind of CPU are you using?  On x86, the AES instructions (AES-NI) work on ordinary CPU registers, so there is no need to leave user space and no kernel-crossing overhead.
Also, you may look into Bernstein's ciphers (ChaCha etc.) that were specifically designed to be very fast without the need for accelerators.
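
To illustrate why ChaCha does well without dedicated hardware: its core step, the quarter round, uses only 32-bit additions, XORs and fixed rotations, which any of the CPUs in this thread can do natively. The sketch below is a plain-Python rendering of that step as specified in RFC 8439, not a full cipher and not an optimized implementation; the helper names are chosen here for illustration.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(a: int, b: int, c: int, d: int):
    """ChaCha quarter round: add, xor, rotate by 16, 12, 8 and 7."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


if __name__ == "__main__":
    # Quarter-round test vector from RFC 8439, section 2.1.1.
    out = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
    print([hex(w) for w in out])
```

There are no table lookups and no data-dependent branches in this step, which is a large part of why it maps well onto plain integer pipelines.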

If I were the NSA, I would talk Intel into adding some gates to the CPU to record a random sampling of keys (or an average?) in some hidden flash cells when those special key scheduling instructions are used, of course dropping 50-60 bits or so of entropy to make sure that you still need beefy hardware.

DiTBho:

--- Quote from: berke on February 10, 2023, 06:21:31 pm ---What kind of CPU are you using?

--- End quote ---

miniPCI crypto Co-op -> { MIPS32R2BE@800MHz, MIPS32R2LE@400MHz }
PCI32 crypto Co-op -> { PPC405@400MHz, PPC7450@1600MHz, PA8900@1100MHz, MIPS4-R14000@600MHz }

None of them has Crypto-instructions  :o :o :o

berke:
It all depends on how the kernel driver is implemented.

For example, for a general write() call, the driver will normally first check that the memory segment passed to the call is indeed readable by the calling process.  It will then copy the data from user space into a kernel-allocated page or maybe an internal buffer, and later program the card's DMA to fetch memory from that page.  The driver has to make arrangements so that the data actually ends up in physical RAM before the card attempts to fetch it (for example by flushing CPU caches on platforms where DMA is not cache-coherent).  Depending on the CPU, this may be anything from trivial to ugly and nasty.
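
A rough way to see the fixed part of that cost from user space is to time write() calls of different sizes against a device that does nothing with the data. The sketch below uses the null device as a stand-in, so it measures only the per-call entry/exit cost plus Python's own overhead; it does not include the copy, cache maintenance or DMA setup that a real crypto driver would add. The function name `write_cost` is made up for illustration.

```python
import os
import time


def write_cost(total_bytes: int, chunk_size: int, path: str = os.devnull):
    """Push total_bytes through write() in chunk_size pieces.

    Returns (number_of_calls, elapsed_seconds).
    """
    buf = bytes(chunk_size)
    calls = total_bytes // chunk_size
    fd = os.open(path, os.O_WRONLY)
    try:
        start = time.perf_counter()
        for _ in range(calls):
            os.write(fd, buf)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return calls, elapsed


if __name__ == "__main__":
    total = 1 << 22  # 4 MiB in every run, only the chunk size changes
    for chunk in (16, 256, 4096, 65536):
        calls, elapsed = write_cost(total, chunk)
        print(f"chunk={chunk:6d}  calls={calls:7d}  {elapsed * 1e6 / calls:8.2f} us/call")
```

If the per-call time is large compared with the time the CPU needs to encrypt or hash one chunk in software, small requests (a 16-byte AES block, a short packet) are unlikely to gain anything from being sent to a card.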

If the card doesn't support DMA, it might expose a buffer in PCI address space, or maybe even a simple register for FIFO operation.  In that case things can get quite slow and inefficient.

The kernel driver could support mmap(), in which case the hardware could DMA into pages that are handed directly to the process without the CPU having to do any copying.  However, that depends on the kernel driver: an mmap() interface can just as well be backed by pages that software fills in.
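
The user-space side of such an interface can be sketched without any hardware. Below, a temporary file stands in for the device node and a second mapping of the same file plays the role of the producer filling the pages; a real driver would supply its own pages through its mmap handler instead. The function name `shared_page_demo` is made up for illustration.

```python
import mmap
import tempfile


def shared_page_demo(payload: bytes) -> bytes:
    """Write through one shared mapping, read the same bytes through another."""
    size = mmap.PAGESIZE
    if len(payload) > size:
        raise ValueError("payload must fit in one page for this sketch")
    with tempfile.TemporaryFile() as backing:
        backing.truncate(size)  # give the stand-in "device" one page of storage
        fd = backing.fileno()
        # Both mappings are shared and refer to the same underlying page.
        with mmap.mmap(fd, size) as producer, mmap.mmap(fd, size) as consumer:
            producer[: len(payload)] = payload       # "device" side fills the page
            return bytes(consumer[: len(payload)])   # process sees it, no read() call


if __name__ == "__main__":
    print(shared_page_demo(b"ciphertext block"))
```

The point of the pattern is that the consumer never issues a read() for the data; it only looks at memory that is already mapped into its address space.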

The last time I wrote a kernel driver was maybe 7 years ago, on Spartan/MicroBlaze and Zynq.  I haven't followed things closely since, but I've heard there are new things such as io_uring.

The read() call is the same thing but in reverse order.

Hope this helps.

DiTBho:
Is there a commodity crypto accelerator that implements "blowfish" or "twofish"?  :-//
