Author Topic: shared memory in multi-CPU systems: looking for books, docs, ...  (Read 5594 times)


Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4851
  • Country: nz
Re: shared memory in multi-CPU systems: looking for books, docs, ...
« Reply #50 on: December 09, 2024, 02:33:15 am »
Quote from: SiliconWizard
That definitely took work, and a fair bit of "reverse engineering". But it is cheap and pretty powerful, and enough of it is documented (even if the documentation is sparse and takes a lot of work) to use it bare-metal (which was not the intent of the guys who made it available to the public), so that's still a pretty positive point. Try doing that with any typical SBC out there.

Are you publishing the consolidated information anywhere?
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 15967
  • Country: fr
Re: shared memory in multi-CPU systems: looking for books, docs, ...
« Reply #51 on: December 09, 2024, 04:05:28 am »
Quote from: SiliconWizard
That definitely took work, and a fair bit of "reverse engineering". But it is cheap and pretty powerful, and enough of it is documented (even if the documentation is sparse and takes a lot of work) to use it bare-metal (which was not the intent of the guys who made it available to the public), so that's still a pretty positive point. Try doing that with any typical SBC out there.

Quote from: brucehoult
Are you publishing the consolidated information anywhere?

I'm considering that; I just have to find enough time.
 

Online DiTBhoTopic starter

  • Super Contributor
  • ***
  • Posts: 4479
  • Country: gb
Re: shared memory in multi-CPU systems: looking for books, docs, ...
« Reply #52 on: December 09, 2024, 08:18:36 am »
Don't talk to me about SBC documentation; it's a sore point for me  :o :o :o

Saving my ELTK 2x68060 VME board, with hardware mailbox and hardware semaphores, from the hydraulic press cost me only two bottles of good red wine, paid to the guys hired to destroy those boards so they would turn a blind eye. Officially they didn't see me save that board; officially, the company that commissioned the "cleaning" of the lab knows it was destroyed.

However, documentation was a bloody sore point, because they had already destroyed everything before I was able to save anything, and there is literally nothing on the web.

Boards of this type were also used in industrial sewing machines and in other fields. It's very old stuff: the early '90s were the modern era then, and people didn't upload anything as big as the scans of several books, partly because of the cost of uploading.
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4851
  • Country: nz
Re: shared memory in multi-CPU systems: looking for books, docs, ...
« Reply #53 on: December 09, 2024, 10:02:48 am »
Quote from: SiliconWizard
That definitely took work, and a fair bit of "reverse engineering". But it is cheap and pretty powerful, and enough of it is documented (even if the documentation is sparse and takes a lot of work) to use it bare-metal (which was not the intent of the guys who made it available to the public), so that's still a pretty positive point. Try doing that with any typical SBC out there.

Quote from: brucehoult
Are you publishing the consolidated information anywhere?

Quote from: SiliconWizard
I'm considering that; I just have to find enough time.

If you send me the notes, I could try to find some time to turn it into English.
 

Online DiTBhoTopic starter

  • Super Contributor
  • ***
  • Posts: 4479
  • Country: gb
Re: shared memory in multi-CPU systems: looking for books, docs, ...
« Reply #54 on: December 25, 2024, 10:54:34 am »
Quote
  • The Art of Multiprocessor Programming, by Maurice Herlihy and Nir Shavit

Interesting book  :o :o :o
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Online coppice

  • Super Contributor
  • ***
  • Posts: 10199
  • Country: gb
Re: shared memory in multi-CPU systems: looking for books, docs, ...
« Reply #55 on: December 28, 2024, 08:01:55 pm »
Quote from: DiTBho
  • The Art of Multiprocessor Programming, by Maurice Herlihy and Nir Shavit

Interesting book  :o :o :o
"Written by the world's most revered experts in multiprocessor programming and performance" - so modest.
 

Online radiogeek381

  • Regular Contributor
  • *
  • Posts: 134
  • Country: us
    • SoDaRadio
Re: shared memory in multi-CPU systems: looking for books, docs, ...
« Reply #56 on: January 04, 2025, 12:13:18 am »

Quote
If you have two mutexes, they must never be allocated at contiguous addresses, or they will both end up in the same cache block, with disastrous consequences: if you SC that block, it's as if you were SC'ing both mutexes!


It is a good idea to make sure two mutexes don't end up in the same block. But a careful reading of the definition of LL/SC shows that there is no architectural commitment as to the size of the address region being tracked by the LL address. That is, an implementation that causes an SC to fail if *any* process does an intervening SC within some range independent of the cache block size can still be compliant.

In other words, for processes A and B running on different processors:

A:: LL [address X]
B:: LL [address Y]
B:: SC [Y]
A:: SC [X]

and
B:: LL [Y]
A:: LL [X]
B:: SC [Y]
A:: SC [X]

In both cases, A may be allowed to fail because of the intervening SC by process B, if X and Y lie within some contiguous range. (For instance, the initial Alpha architecture said that the region was at least 8 aligned bytes and at most one page.)

MIPS had an identical requirement -- see the description of SC on page 302 of "MIPS64® Architecture For Programmers, Volume II: The MIPS64® Instruction Set".

There are lots of reasons not to put two semaphores in the same cache block, but LL/SC isn't one of them.
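
A minimal sketch of the defensive pattern this implies (MIPS assembly, GNU as syntax; the labels and register choices are mine, not from the post): because SC reports success or failure in a register, an SC that fails "spuriously" (say, because of an unrelated SC elsewhere in the same reservation region) only costs another trip around the retry loop. Correctness never depends on the size of the tracked region; at worst, liveness does.

        .set    noreorder
acquire_lock:                   # $a0 = address of the lock word (0 = free, 1 = taken)
1:      ll      $t0, 0($a0)     # load-linked: read the lock, open a reservation
        bnez    $t0, 1b         # already held? spin until it looks free
        addiu   $t1, $zero, 1   # (delay slot) the value meaning "taken"
        sc      $t1, 0($a0)     # try to claim it; $t1 = 1 on success, 0 on failure
        beqz    $t1, 1b         # reservation lost, for whatever reason: retry
        nop                     # (delay slot)
        sync                    # memory barrier before entering the critical section
        jr      $ra
        nop                     # (delay slot)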
 

Online DiTBhoTopic starter

  • Super Contributor
  • ***
  • Posts: 4479
  • Country: gb
Re: shared memory in multi-CPU systems: looking for books, docs, ...
« Reply #57 on: January 04, 2025, 03:27:00 pm »

Quote from: radiogeek381
Quote
If you have two mutexes, they must never be allocated at contiguous addresses, or they will both end up in the same cache block, with disastrous consequences: if you SC that block, it's as if you were SC'ing both mutexes!


It is a good idea to make sure two mutexes don't end up in the same block. But a careful reading of the definition of LL/SC shows that there is no architectural commitment as to the size of the address region being tracked by the LL address. That is, an implementation that causes an SC to fail if *any* process does an intervening SC within some range independent of the cache block size can still be compliant.

In other words, for processes A and B running on different processors:

A:: LL [address X]
B:: LL [address Y]
B:: SC [Y]
A:: SC [X]

and
B:: LL [Y]
A:: LL [X]
B:: SC [Y]
A:: SC [X]

In both cases, A may be allowed to fail because of the intervening SC by process B, if X and Y lie within some contiguous range. (For instance, the initial Alpha architecture said that the region was at least 8 aligned bytes and at most one page.)

MIPS had an identical requirement -- see the description of SC on page 302 of "MIPS64® Architecture For Programmers, Volume II: The MIPS64® Instruction Set".

There are lots of reasons not to put two semaphores in the same cache block, but LL/SC isn't one of them.

I am reading several books, and as far as I understand, it's all implementation-defined.

MIPS R2K and R3K did not implement any atomic read-modify-write instructions.
MIPS R4K was the first.

The load-linked instruction performs the first half of an atomic read-modify-write operation: it loads a value from memory and sets a flag in the hardware indicating that a read-modify-write operation to that location is in progress. The operation is completed by the store-conditional instruction, which stores any desired value back to the memory location that was loaded from, but only if the hardware flag is still set.

Any store done to this location by any CPU or I/O device since the load-linked instruction was executed will clear this flag. Therefore, if the store-conditional instruction finds the flag still set, it is guaranteed that the location hasn't changed since the load-linked was done, and that the entire sequence of instructions, from the load-linked through the store-conditional, has executed atomically with respect to the associated memory location.

These two basic instructions can be used to construct more sophisticated atomic operations.
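
For instance, a minimal fetch-and-increment along those lines (MIPS assembly, GNU as syntax; the labels and register choices are mine): if any other CPU or I/O device writes the location between the LL and the SC, the SC stores nothing and returns 0, and the whole read-modify-write is retried.

        .set    noreorder
atomic_inc:                     # $a0 = address of a 32-bit counter
1:      ll      $t0, 0($a0)     # load-linked: read the current value, set the flag
        addiu   $t1, $t0, 1     # compute the new value (old value stays in $t0)
        sc      $t1, 0($a0)     # store-conditional: $t1 = 1 if the flag was still set
        beqz    $t1, 1b         # flag was cleared: retry the whole sequence
        nop                     # (delay slot)
        jr      $ra
        move    $v0, $t0        # (delay slot) return the value before the increment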

In any case, it all depends on how the flag is handled. The flag is usually (the MIPS R4K does it this way) maintained by the cache controller and is invisible to software.
There are other possibilities:
  • Cache-Based
  • Exclusive Monitor-based
  • TrMem-based (my implementation)
  • ...

If it doesn't depend on the cache block size (which is a serious problem on the MIPS R4K; I know because I've been banging my head against it for months), then I think MIPS64 uses an "exclusive monitor" to implement exclusive access to memory via load-linked/store-conditional.

ARM uses the Exclusives Reservation Granule technique: when an exclusive monitor tags an address, the minimum region that can be tagged for exclusive access is called the Exclusives Reservation Granule (ERG). The ERG size is implementation-defined: a power of two in the range of 8 to 2048 bytes.

Once again, "portable code" must not assume anything about ERG size.
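
One conservative layout that respects this (GNU as directives; 2048 bytes is the architectural worst-case ERG from the paragraph above, and the label names are mine): give every lock word its own worst-case granule, trading memory for portability.

        .data
        .balign 2048            # start a new worst-case granule / cache block
lock_a: .word   0               # first lock, alone in its 2048-byte region
        .balign 2048
lock_b: .word   0               # second lock, at least 2048 bytes away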

Worse still, ARM uses LDREX/STREX for multiprocessors, but they do not "scale down" to uniprocessors. These instructions do not do what many folks think they do: they are *ONLY* for multiprocessor systems; uniprocessor systems should consider using "swap" instead.

 :-//
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

