Author Topic: The "alternatives to interrupts" thread (Read 14322 times)

Nominal Animal · « **Reply #125 on:** December 29, 2020, 12:56:19 am »

Quote from: brucehoult on December 28, 2020, 09:23:49 pm

Quote from: Nominal Animal on December 28, 2020, 06:47:23 pm
Stacks need to grow upwards, though, and therefore you really need addressing modes where an offset is subtracted from the stack or stack frame pointer; especially with a small literal constant (most common use, referring to a local variable).
Local variables on the stack is very old-fashioned.

Needed only when you have more local variables than you can fit in registers, sure. Caller or callee-saving is obviously part of an ABI, and I agree, having any caller-saved registers is old-fashioned; callee-saving is more efficient.

Quote from: brucehoult on December 28, 2020, 09:23:49 pm

But I don't understand why stacks need to grow upwards?

That is due to the "segmented" paged memory model. When the 2^k possible segments all grow upwards, it suffices to have page descriptors from zero offset up to the limit. The entire address range dedicated for the stack does not need to be fully populated by page descriptors. If the stack is only used for return addresses (or in general, grows at most by a page at a time), then a single-page initial stack suffices, and can be dynamically grown when needed (both at the virtual address level and the page table entry level) by the OS kernel without involving the userspace at all.

If you want the same functionality with downwards-growing stacks, each segment needs a base and a limit, with the address range in between covered by page table entries. However, that adds an extra subtraction operation to MMU: subtracting base from the address being dereferenced.

Alternatively, you could have both "upwards" and "downwards" growing stacks, where the base is always 0, and limit is either the upper or the lower limit of the segment, with page table entries covering the region either starting at the beginning of the segment, or always ending at the end of the segment.
I don't like the complexity, so requiring upwards growing stacks simplifies the scheme significantly.

(Okay, you could also have everything grow downwards, with limit always zero (one past last accessable address), and with the base essentially being just size subtracted from zero, but thinking like this weirds me out.)

Let's say you have a full address A, with k most significant bits containing the "segment", and the rest, N-k bits describing the offset. Accessing memory at A involves the following, assuming 2ⁿ-byte pages:

The k most significant bits (bits N-k through N-1, inclusive) of A contain the segment number, S. It specifies which one of the 2^k segment descriptors is used.
The N-k-n intermediate bits (bits n through N-k-1, inclusive) of A contain the page number, P. This identifies the page descriptor index in the page table for the current segment.
The segment descriptor contains the start address for the page table descriptor array, F.
The page descriptor contains the physical memory base address B for the page. Thus, F+P×(size of page descriptor) is the physical memory address where the page descriptor is in.
The n least significant bits of A specify the offset in the page, L. Thus, B+L is the memory address referred to.
Usually, the n low bits of B are required to be zero, so it is not even an addition, just concatenating the bits.

In parallel to 2., the low N-k bits are compared to the segment limit. If they exceed the segment limit, the access is invalid (segment violation).
In 4, the page descriptor present bit needs to be examined, and a hardware interrupt generated if it is not.
When present, the page descriptor read or write flags should be updated (depending on the instruction), and if initial-interrupt flag is set, a hardware interrupt generated if the flag transitions from false to true. This allows copy-on-write pages.

The memory access mode flags can be segment-specific; I don't even recall having to change page address modes outside examples and self-modifying code, except for a full memory mapped object; so I don't think I need page-level access mode flags. Although, if both segment and page access mode flags are used, then they can be AND'ed (if "allow" type flags) or OR'ed (if "deny" type flags).

Simply put, I can get away with a relatively simple MMU with just 2^k segment registers in CPU dedicated memory. I'm sure you can see how even a small page descriptor cache (using the N-n high bits of the virtual address as the key for the cached page descriptors) can speed it up significantly. Yet, such a simple MMU can still provide basically all the features the application programmer needs.

I do think it ~~would make a lot of~~ might make sense to put the page descriptors in a completely separate memory bus than the actual memory contents. Not only should it speed things up (since it is relatively small, and could then be accessed in parallel to "normal" memory accesses; but would essentially have to be big enough to contain the page tables for all tasks, not just the currently running tasks), but it would be really nice way to protect them against programming errors that allow direct access to memory containing page descriptors. It would be particularly interesting for a multi-core processor. The only downside I can immediately see is the complexity it adds to hibernation (suspend-to-disk; completely unpowered "sleep"). And possibly that memory isn't cheap, so having just one pool that gets divvied up as needed is probably more cost-efficient.

Quote from: brucehoult on December 28, 2020, 09:23:49 pm

Quote
Having a hardware comparison instruction that checks if 0 ≤ offset < limit, with offset a register operand, and limit either a register or a memory operand, would help a lot because then the bounds check would be a single instruction.
Such as "set-less-than-unsigned" or "branch-less-than-unsigned"?

Exactly. I'm an idiot, you see.

DiTBho · « **Reply #126 on:** December 29, 2020, 04:44:41 am »

TLB-Page-Table

?

Nominal Animal · « **Reply #127 on:** December 29, 2020, 03:45:11 pm »

Quote from: DiTBho on December 29, 2020, 04:44:41 am

TLB-Page-Table ?

Yes. I'm not sure how long the instruction pipeline would have to be to do it without any caching, since each memory access involves that extra page descriptor access.

In that simple-MMU case, the TLB would be a set of cached page descriptors, and the N-n-bit key (used to find the cached descriptor) would be the segment number S and the page number P.

One possible implementation uses a hash table with size a power of two, say 2^H, and mixes the key bits into a H-bit index, using a fixed hardware algorithm (usually just a bitwise XOR pattern, with each of the H bits depending on at least three different key bits). Each hash table slot contains both the key and the cached page descriptor. If the cached key in the target does not match, a short stall occurs, as the correct page descriptor is loaded to that slot.


EEVblog Main Site	EEVblog on Youtube	EEVblog on Twitter	EEVblog on Facebook	EEVblog on Odysee

EEVblog Electronics Community Forum

Author Topic: The "alternatives to interrupts" thread (Read 14322 times)

Nominal Animal

Re: The "alternatives to interrupts" thread

DiTBho

Re: The "alternatives to interrupts" thread

Nominal Animal

Re: The "alternatives to interrupts" thread

Share me