Does RISC-V implement something that allows the equivalent of ARM's "lazy" Floating point stacking? (that's also "neat, but maybe slower than I'd want.")
I don't know the details of whatever Arm does, but haven't such facilities been standard since ... the 68020 and 80386?
RISC-V provides the mechanisms to implement a number of different policies with regard to FP (and other) state. It does not dictate any particular policy. Saving and restoring FP state can each, independently, be eager or lazy.
Anyway, to quote:
----
3.1.6.6 Extension Context Status in mstatus Register
Supporting substantial extensions is one of the primary goals of RISC-V, and hence we define a standard interface to allow unchanged privileged-mode code, particularly a supervisor-level OS, to support arbitrary user-mode state extensions.
To date, the V extension is the only standard extension that defines additional state beyond the floating-point CSR and data registers.
The FS[1:0] and VS[1:0] WARL fields and the XS[1:0] read-only field are used to reduce the cost of context save and restore by setting and tracking the current state of the floating-point unit and any other user-mode extensions respectively. The FS field encodes the status of the floating-point unit state, including the floating-point registers f0–f31 and the CSRs fcsr, frm, and fflags. The VS field encodes the status of the vector extension state, including the vector registers v0–v31 and the CSRs vcsr, vxrm, vxsat, vstart, vl, vtype, and vlenb. The XS field encodes the status of additional user-mode extensions and associated state. These fields can be checked by a context switch routine to quickly determine whether a state save or restore is required. If a save or restore is required, additional instructions and CSRs are typically required to effect and optimize the process.
The design anticipates that most context switches will not need to save/restore state in either or both of the floating-point unit or other extensions, so provides a fast check via the SD bit.
The FS, VS, and XS fields use the same status encoding as shown in Table 3.3, with the four possible status values being Off, Initial (e.g. zeroed), Clean, and Dirty.
When an extension’s status is set to Off, any instruction that attempts to read or write the corresponding state will cause an illegal instruction exception. When the status is Initial, the corresponding state should have an initial constant value. When the status is Clean, the corresponding state is potentially different from the initial value, but matches the last value stored on a context swap. When the status is Dirty, the corresponding state has potentially been modified since the last context save.
During a context save, the responsible privileged code need only write out the corresponding state if its status is Dirty, and can then reset the extension’s status to Clean. During a context restore, the context need only be loaded from memory if the status is Clean.
----
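To make the quoted text a bit more concrete, here is a minimal sketch of decoding those fields from a copy of mstatus. The helper names are mine, not from any header. It assumes RV64 (SD is the top bit, so bit 63; on RV32 it is bit 31) and it works on a plain integer value rather than reading the CSR.

```c
#include <stdint.h>

/* Status encoding shared by the FS, VS, and XS fields. */
enum ext_status { EXT_OFF = 0, EXT_INITIAL = 1, EXT_CLEAN = 2, EXT_DIRTY = 3 };

/* Field positions in mstatus. RV64 is assumed, so SD is bit 63. */
#define MSTATUS_VS_SHIFT 9   /* VS[1:0] = bits 10:9  */
#define MSTATUS_FS_SHIFT 13  /* FS[1:0] = bits 14:13 */
#define MSTATUS_XS_SHIFT 15  /* XS[1:0] = bits 16:15 */
#define MSTATUS_SD (UINT64_C(1) << 63)

/* Hypothetical helpers: extract one 2-bit status field from an mstatus value. */
static inline enum ext_status mstatus_vs(uint64_t mstatus) {
    return (enum ext_status)((mstatus >> MSTATUS_VS_SHIFT) & 3);
}

static inline enum ext_status mstatus_fs(uint64_t mstatus) {
    return (enum ext_status)((mstatus >> MSTATUS_FS_SHIFT) & 3);
}

static inline enum ext_status mstatus_xs(uint64_t mstatus) {
    return (enum ext_status)((mstatus >> MSTATUS_XS_SHIFT) & 3);
}

/* The fast check: SD summarizes "FS, VS, or XS is Dirty". */
static inline int any_state_dirty(uint64_t mstatus) {
    return (mstatus & MSTATUS_SD) != 0;
}

/* Return a copy of mstatus with FS replaced. On real hardware SD is
 * derived from the fields; this helper only edits the FS bits of a value
 * and does not touch SD. */
static inline uint64_t mstatus_set_fs(uint64_t mstatus, enum ext_status s) {
    uint64_t mask = UINT64_C(3) << MSTATUS_FS_SHIFT;
    return (mstatus & ~mask) | ((uint64_t)s << MSTATUS_FS_SHIFT);
}
```

A context-switch routine would test any_state_dirty() first and only look at the individual fields when it returns nonzero.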
FS is for the FPU. VS is for the Vector unit. XS is a summary of any additional standard (none now) or custom extensions that add state that needs to be context-switched. Each such extension will add additional status bits elsewhere.
As well as lazy save (write the state out only if it is Dirty), an OS can choose how to reload the state when switching back to a process: eagerly (load it and set the status to Clean) or lazily (set the status to Off, so that the first FP or Vector instruction traps and the context is loaded only then).
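Here is a rough sketch of that policy choice, written as a plain C simulation rather than real kernel code. Everything in it is hypothetical: struct hart stands in for the FP registers plus the FS field, struct task for the per-process save area, and fp_first_use_trap for whatever the illegal-instruction handler would call.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum ext_status { EXT_OFF = 0, EXT_INITIAL = 1, EXT_CLEAN = 2, EXT_DIRTY = 3 };

struct fp_regs { uint64_t f[32]; uint32_t fcsr; };

/* Simulated hardware: live FP registers and the FS field. */
struct hart { struct fp_regs fp; enum ext_status fs; };

/* Per-task save area plus the task's logical FP status. */
struct task { struct fp_regs saved_fp; enum ext_status fs; };

/* Lazy save: write the registers out only when FS is Dirty.
 * Returns true if the state was actually written. */
static bool fp_switch_out(struct hart *h, struct task *t) {
    if (h->fs == EXT_OFF)
        return false;            /* FP untouched this time slice */
    if (h->fs == EXT_DIRTY) {
        t->saved_fp = h->fp;
        h->fs = EXT_CLEAN;
        t->fs = EXT_CLEAN;
        return true;
    }
    t->fs = h->fs;               /* Initial or Clean: nothing to write */
    return false;
}

/* Load the task's FP context, or the initial (zeroed) state if it has none. */
static void fp_load(struct hart *h, const struct task *t) {
    if (t->fs == EXT_CLEAN) {
        h->fp = t->saved_fp;
        h->fs = EXT_CLEAN;
    } else {
        memset(&h->fp, 0, sizeof h->fp);
        h->fs = EXT_INITIAL;
    }
}

/* Restore: eager loads now, lazy just turns the unit Off. */
static void fp_switch_in(struct hart *h, const struct task *t, bool eager) {
    if (eager && t->fs != EXT_OFF)
        fp_load(h, t);
    else
        h->fs = EXT_OFF;
}

/* Lazy restore completes here, on the first FP instruction after FS=Off. */
static void fp_first_use_trap(struct hart *h, const struct task *t) {
    fp_load(h, t);
}
```

On real hardware the first FP write after the load would flip FS to Dirty by itself; the simulation leaves that out.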
FP state is 128 bytes (32 registers × 4 bytes) for a single-precision FPU and 256 bytes (32 × 8) for a double-precision FPU, plus fcsr. Typically it is the same size as the integer register state, so no big deal if it is not done lazily.
However, vector state is (on an application-class processor) a minimum of 512 bytes (32 registers of 128 bits), and the most common size in the next few years is likely to be 1024 bytes, with 2048 bytes (512-bit vector registers) not uncommon.
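The arithmetic behind those sizes is just 32 registers times the register width. A tiny sketch (the function name is made up, and it counts only the register file, not fcsr or the vector CSRs):

```c
#include <stddef.h>

/* Bytes needed for a file of 32 registers, each `bits` wide.
 * Use FLEN for the FP registers or VLEN for the vector registers. */
static size_t regfile_bytes(unsigned bits) {
    return (size_t)32 * (bits / 8u);
}
```

So regfile_bytes(32) and regfile_bytes(64) cover single- and double-precision FP, while regfile_bytes(128), regfile_bytes(256), and regfile_bytes(512) cover the vector register widths mentioned above.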
The vector state is managed more aggressively than the FP state. The ABI specifies that vector registers are caller-saved, i.e. their contents are undefined after ANY function call, and this includes system calls. This lets the OS set the vector state to Off or Initial before returning from any syscall. The vector state therefore only needs to be saved and restored if a context switch happens as the result of an interrupt (e.g. a 100 Hz system tick), and NOT if it happens because a system call blocks (e.g. on I/O).
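As a sketch of that decision, assuming the kernel knows why it is switching: the enum and function names below are invented for illustration, and picking Initial rather than Off on syscall return is just one of the two options mentioned above.

```c
#include <stdbool.h>

enum ext_status { EXT_OFF = 0, EXT_INITIAL = 1, EXT_CLEAN = 2, EXT_DIRTY = 3 };

/* Hypothetical: why the kernel was entered before this context switch. */
enum switch_reason { SWITCH_SYSCALL, SWITCH_INTERRUPT };

/* Must the vector registers be written out for the outgoing task? */
static bool vector_save_needed(enum ext_status vs, enum switch_reason why) {
    if (why == SWITCH_SYSCALL)
        return false;          /* contents are undefined after any call */
    return vs == EXT_DIRTY;    /* interrupted mid-computation: keep them */
}

/* Status to give VS when returning from a syscall. Assumption: a task that
 * had the unit enabled gets Initial; a task with it Off stays Off. */
static enum ext_status vs_after_syscall(enum ext_status vs) {
    return vs == EXT_OFF ? EXT_OFF : EXT_INITIAL;
}
```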
Also:
----
Changing the setting of FS has no effect on the contents of the floating-point register state. In particular, setting FS=Off does not destroy the state, nor does setting FS=Initial clear the contents. Similarly, the setting of VS has no effect on the contents of the vector register state. Other extensions, however, might not preserve state when set to Off.
----
So, it is also possible to implement a policy of not saving FP or Vector state at all when you switch away from a process: simply set the unit to Off. When that process is later resumed, if no other process has used the FPU or VPU in the meantime, simply turn the unit back on. The state is then never saved or restored; it only has to be written out if another process actually touches the unit first.
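A minimal single-hart simulation of that last policy, to show where the copies happen. All names are hypothetical, and a real kernel would also have to deal with tasks migrating between harts, which this ignores.

```c
#include <stddef.h>
#include <stdint.h>

enum ext_status { EXT_OFF = 0, EXT_INITIAL = 1, EXT_CLEAN = 2, EXT_DIRTY = 3 };

struct fp_regs { uint64_t f[32]; uint32_t fcsr; };

struct task { struct fp_regs saved_fp; };

/* Simulated hart: live registers, FS, and whose state is in the registers. */
struct hart {
    struct fp_regs fp;
    enum ext_status fs;
    struct task *fp_owner;
};

/* Switching away saves nothing. Setting FS=Off leaves the registers intact. */
static void fp_switch_out(struct hart *h) {
    h->fs = EXT_OFF;
}

/* Called when the current task executes an FP instruction while FS=Off.
 * Returns how many register-file copies were needed (0, 1, or 2). */
static int fp_trap(struct hart *h, struct task *cur) {
    int copies = 0;
    if (h->fp_owner != cur) {
        if (h->fp_owner != NULL) {
            h->fp_owner->saved_fp = h->fp;   /* evict the previous owner */
            copies++;
        }
        h->fp = cur->saved_fp;               /* load the new owner */
        copies++;
        h->fp_owner = cur;
    }
    /* Conservative: this sketch does not track Clean vs Dirty and always
     * saves on a change of owner. */
    h->fs = EXT_DIRTY;
    return copies;
}
```

If the same task is the only FP user, every trap after the first returns 0 copies, which is the "just turn it back on" case.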