If you have two mutexes, they must never be allocated at contiguous addresses, or they will both end up in the same cache block, with disastrous consequences: if you SC that block, it's as if you were SC'ing both mutexes!
It is a good idea to make sure two mutexes don't appear in the same block. But a careful reading of the definition of LL/SC will show that there is no architectural commitment as to the size of the address region tracked by the LL address. That is, an implementation can be compliant even if it causes an SC to fail whenever *any* process does an intervening SC within some address range, and that range need not have anything to do with the cache block size.
In other words, for processes A and B running on different processors:
A:: LL [address X]
B:: LL [address Y]
B:: SC [Y]
A:: SC [X]
and
B:: LL [Y]
A:: LL [X]
B:: SC [Y]
A:: SC [X]
In both cases, A's SC may be allowed to fail because of the intervening SC by process B, if X and Y fall within some contiguous range. (For instance, the initial Alpha architecture said that the region was at least 8 aligned bytes and at most one page.)
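To make the two interleavings concrete, here is a toy Python model, not any real processor's implementation, in which a reservation covers an aligned region of `granule` bytes and a successful SC by one processor cancels every other processor's reservation in that region. `ReservationModel`, `run`, the addresses, and the granule sizes are all hypothetical, chosen only for illustration.

```python
class ReservationModel:
    """Toy LL/SC model: a reservation covers an aligned `granule`-byte region."""

    def __init__(self, granule):
        self.granule = granule      # hypothetical region size in bytes
        self.memory = {}            # address -> value
        self.reservation = {}       # cpu -> reserved region index (or None)

    def _region(self, addr):
        return addr // self.granule

    def ll(self, cpu, addr):
        # Load-linked: read the value and reserve the enclosing region.
        self.reservation[cpu] = self._region(addr)
        return self.memory.get(addr, 0)

    def sc(self, cpu, addr, value):
        # Store-conditional: succeed only if this CPU's reservation survived.
        region = self._region(addr)
        held = self.reservation.get(cpu) == region
        self.reservation[cpu] = None
        if not held:
            return False
        self.memory[addr] = value
        # A successful SC cancels every other reservation on the same
        # region, even one that was taken on a different address.
        for other in list(self.reservation):
            if self.reservation[other] == region:
                self.reservation[other] = None
        return True


def run(model, schedule):
    """Replay (cpu, op, addr) steps; return each CPU's last SC result."""
    results = {}
    for cpu, op, addr in schedule:
        if op == "LL":
            model.ll(cpu, addr)
        else:
            results[cpu] = model.sc(cpu, addr, 1)
    return results


X, Y = 0x1000, 0x1008  # hypothetical addresses, 8 bytes apart
first = [("A", "LL", X), ("B", "LL", Y), ("B", "SC", Y), ("A", "SC", X)]
second = [("B", "LL", Y), ("A", "LL", X), ("B", "SC", Y), ("A", "SC", X)]

for granule in (8, 64):
    for schedule in (first, second):
        print(granule, run(ReservationModel(granule), schedule))
```

With X and Y 8 bytes apart, an 8-byte granule (Alpha's minimum) keeps them in separate regions and both SCs succeed; with a 64-byte granule they share a region, and A's SC fails in both schedules even though nothing else ever wrote address X.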
MIPS had an identical requirement; see the description of "SC" on page 302 of "MIPS64® Architecture For Programmers Volume II: The MIPS64® Instruction Set".
There are lots of reasons not to put two semaphores in the same cache block, but LL/SC isn't one of them.
I am reading several books, and as far as I understand, it's all implementation-defined.
MIPS R2K and R3K did not implement any atomic read-modify-write instructions.
MIPS R4K was the first to provide them.
The load-linked instruction performs the first half of an atomic read-modify-write operation: it loads a value from memory and sets a flag in the hardware indicating that a read-modify-write operation to that location is in progress. The operation is completed by the store-conditional instruction, which stores any desired value back to the location that was loaded from, but only if the hardware flag is still set.
Any store to this location by any CPU or IO device since the load-linked instruction was executed clears this flag. Therefore, if the store-conditional instruction finds the flag still set, it is guaranteed that the location hasn't changed since the load-linked was done, and that the entire sequence of instructions from the load-linked to the store-conditional has executed atomically with respect to that memory location.
These two basic instructions can be used to construct more sophisticated atomic operations. In any case, everything depends on how the flag is handled.
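Here is a minimal Python sketch of that flag behavior, and of building a fetch-and-add out of LL/SC with a retry loop. It assumes a simplified model in which each CPU holds at most one link and any store to the linked address clears it; `LinkedMemory`, `fetch_and_add`, and the `between` hook are hypothetical names, and real hardware may also clear the flag for other reasons (such as the region-size issue discussed earlier in the thread).

```python
class LinkedMemory:
    """Toy model: LL sets a per-CPU link on an address; any store to that
    address, from any CPU or IO device, clears every link on it."""

    def __init__(self):
        self.memory = {}   # address -> value
        self.links = {}    # cpu -> address it holds a link on

    def store(self, addr, value):
        # Ordinary store (CPU or DMA): clears all links on this address.
        self.memory[addr] = value
        for cpu in [c for c, a in self.links.items() if a == addr]:
            del self.links[cpu]

    def ll(self, cpu, addr):
        self.links[cpu] = addr
        return self.memory.get(addr, 0)

    def sc(self, cpu, addr, value):
        if self.links.get(cpu) != addr:
            return False            # flag cleared: location may have changed
        self.store(addr, value)     # also clears this CPU's own link
        return True


def fetch_and_add(mem, cpu, addr, delta, between=None):
    """Atomic add from LL/SC: retry until the SC succeeds.
    `between` optionally runs between LL and SC to simulate interference."""
    attempts = 0
    while True:
        attempts += 1
        old = mem.ll(cpu, addr)
        if between is not None:
            between(attempts)
        if mem.sc(cpu, addr, old + delta):
            return old, attempts


mem = LinkedMemory()
mem.store(0x40, 10)

def dma_write_once(attempt):
    # Simulate an IO device writing the location during the first attempt only.
    if attempt == 1:
        mem.store(0x40, 20)

old, attempts = fetch_and_add(mem, "A", 0x40, 5, between=dma_write_once)
print(old, attempts, mem.memory[0x40])  # prints: 20 2 25
```

The first SC fails because the simulated IO store cleared the flag, so the loop reloads the new value and tries again; this retry-until-the-SC-succeeds shape is the usual way LL/SC is turned into larger atomic operations.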
The flag is usually (MIPS4K does it this way) maintained by the cache controller and is invisible to the software.
There are other possibilities:
- Cache-Based
- Exclusive Monitor-based
- TrMem-based (my implementation)
- ...
If it doesn't depend on the cache block size (which is a serious problem on MIPS 4K; I know, because I've been banging my head against it for months), then I think MIPS64 uses an "Exclusive Monitor" to implement exclusive access to memory via load-linked/store-conditional.
ARM uses the Exclusives Reservation Granule technique: when an exclusive monitor tags an address, the minimum region that can be tagged for exclusive access is called the Exclusives Reservation Granule (ERG). The ERG is implementation defined, in the range 8-2048 bytes, in multiples of two words.
Once again, "portable code" must not assume anything about ERG size.
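A small Python sketch of what the ERG means for data layout, assuming (as a simplification) that granules are naturally aligned power-of-two blocks within the quoted 8-2048 byte range. Since portable code cannot know the actual ERG, the hypothetical helper `portable_slot` simply spaces locks by the architectural maximum of 2048 bytes.

```python
ERG_MIN, ERG_MAX = 8, 2048  # architectural ERG range quoted above, in bytes


def same_granule(a, b, erg):
    """True if addresses a and b fall in the same ERG-sized block,
    assuming granules are naturally aligned powers of two."""
    if not (ERG_MIN <= erg <= ERG_MAX and erg & (erg - 1) == 0):
        raise ValueError("ERG must be a power of two in 8..2048")
    return a // erg == b // erg


def portable_slot(base, index):
    """Hypothetical layout helper: give the index-th lock its own
    ERG_MAX-aligned slot, so no two locks can ever share a granule."""
    start = -(-base // ERG_MAX) * ERG_MAX  # round base up to a multiple of ERG_MAX
    return start + index * ERG_MAX


# Two locks 64 bytes apart: which ERG sizes would put them in one granule?
ergs = [1 << k for k in range(3, 12)]  # 8, 16, ..., 2048
print([erg for erg in ergs if same_granule(0x1000, 0x1040, erg)])
# prints [128, 256, 512, 1024, 2048]
print(hex(portable_slot(0x1001, 0)), hex(portable_slot(0x1001, 1)))
# prints 0x1800 0x2000
```

Locks that are safely apart on an implementation with a 64-byte ERG can interfere on one with a 128-byte or larger ERG; only spacing by the maximum avoids depending on the actual value.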
Worse still, ARM uses LDREX/STREX for multiprocessors, but they are not "scalable" to uniprocessors. These instructions do not do what many folks think they do: they are *ONLY* for multiprocessor systems, and uniprocessor systems should consider using "swap".
