Author Topic: Linux systemctl Scheduling query (Read 1869 times)

mag_therm · « **on:** December 20, 2020, 06:15:38 pm »

I am moving an embedded system up from earlier Linux versions that used rc.local (just a script run by root) for custom settings at boot.

Part of the setup, on a 6-core box, is to start a number of parallel processes (5) owned by root (the number crunchers).

My service is in /etc/systemd/system
Like this:

--------------------------------
[Unit]
Description=<My> setup
After=getty.target
[Service]
Type=forking

#CPUSchedulingPolicy=rr
#CPUSchedulingPriority=85
#Nice=10

User=root
Group=<MyGroup>
ExecStart=<My Setup.sh>

[Install]
WantedBy=multi-user.target
---------------------------------

I have not been able to find any documentation on the internet that correctly details the three Policy/Priority/Nice settings shown (#-ed) above.
Setting Policy=rr (round robin) alone results in an actual nice of -18, which is a really strong priority.
In my case it can result in the users' graphics freezing if the processes are heavily loaded.

There is a new scale for Priority which, according to the documentation, is inverted relative to the actual nice range of +20 to -20: it runs from 1 = low to +99 = high.
But any value I enter here always seems to result in an actual nice of -18, with the possibility of graphics freezing as above.

The new Nice, from my tests, is yet another re-map of the actual nice.
Nice=10 gives an actual nice of -8 and Nice=15 gives an actual nice of -3, which seems strange.
I can't find any doc on the internet that explains the new Nice.

If I can't resolve this I will just proceed with Nice=10, which seems to give good results overall.

Is anybody on here doing this kind of priority setting in systemctl who can comment? Thanks

andersm · « **Reply #1 on:** December 20, 2020, 07:41:03 pm »

The Linux realtime scheduling policies use a different range of priority values from the non-realtime scheduling policies, and they don't map directly onto the normal "nice" range. Look for an explanation of how scheduling is implemented in Linux, starting e.g. with the sched(7) man page.
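To see the two ranges side by side, here is a minimal Python sketch, assuming a Linux box with Python 3 (the helper name `priority_ranges` is made up for illustration). It asks the kernel for the static priority range of each policy. On Linux the real-time policies should report 1..99, which is the scale the CPUSchedulingPriority= directive refers to, while SCHED_OTHER should report 0..0 because ordinary tasks are ordered by nice (-20..19) instead.

```python
import os

# Policies to query; the SCHED_* constants are only defined on Linux builds of Python.
POLICIES = {
    "SCHED_OTHER": os.SCHED_OTHER,
    "SCHED_BATCH": os.SCHED_BATCH,
    "SCHED_FIFO": os.SCHED_FIFO,
    "SCHED_RR": os.SCHED_RR,
}


def priority_ranges():
    """Return {policy name: (min, max)} static priority as reported by the kernel."""
    return {
        name: (os.sched_get_priority_min(policy), os.sched_get_priority_max(policy))
        for name, policy in POLICIES.items()
    }


if __name__ == "__main__":
    for name, (lo, hi) in priority_ranges().items():
        print(f"{name:12s} static priority {lo}..{hi}")
```

Querying the ranges needs no special privileges; actually switching a process to SCHED_FIFO or SCHED_RR normally does.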

If the intent is to run your number-crunching processes at a low priority, using any of the real-time scheduling policies is probably not what you want.

mag_therm · « **Reply #2 on:** December 20, 2020, 08:00:50 pm »

Hi Anders,
Actually the processes need to run at the highest priority that doesn't upset the users too much, as there is a master process running at nice = 0, owned by the user.
Some jerkiness for the user is OK because the machine is dedicated to just solving and running servers.
There are 6 cores, and with the settings I have, the CPU loading is displayed at 83% to 100%, which is good.
I will change from RR to FIFO after reading your link.

PKTKS · « **Reply #3 on:** December 21, 2020, 05:12:08 pm »

Quote from: mag_therm on December 20, 2020, 08:00:50 pm

Hi Anders,
Actually the processes need to run at the highest priority without upsetting the users too much, as there is a master running a nice =0, owned by the user.
Some jerkiness to the user is OK because the machine is dedicated to just solving and running servers..
There are 6 cores, and with the settings I have the cpu loading as displayed at 83% to 100 % which is good.
I will change from RR to FIFO after reading your link.

Totally sounds like a typical CGROUP case.
https://en.wikipedia.org/wiki/Cgroups

I run some real time processes on my server and
with the proper kernel scheduler and proper cgroups
isolation those real time processes run at full priority
without any noticeable interference with other processes.

For the record... It is not relevant if you use
systemV rc.local to manage priority or those systemd
crappy textual scripts..

What matters is the proper implementation of the rules for
your cgroups ( and your scheduler... of course)

Sounds like a typical case that would benefit.

Paul

mag_therm · « **Reply #4 on:** December 21, 2020, 05:51:50 pm »

Thanks PK
The first deployments will be on Fedora 32 and it looks like Red Hat introduced "CGroups v2" in Fedora 31, saying that there is no user experience yet.

I think I will stay with the systemctl services. The problem I have is, those directives are not well described anywhere, that I can find.
I think there may also be a bug in the Priority directive as functionality seems to be the same regardless of the setting.

Is there any graphical UI better than Task Manager to see what is going on?
I could use a tracer to see what process ran on what core and at what time etc.

PKTKS · « **Reply #5 on:** December 21, 2020, 06:25:41 pm »

from my own experience with real time issues...

I would say the following:
- you definitely need a custom kernel
- your kernel should match your needs
- that being: cgroups namespaces and scheduler

My tool to have a clue about process niceness is HTOP
anything can virtually be customized there.. as needed
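For a scriptable view of the same numbers, a rough Python sketch follows. It assumes a Linux /proc filesystem and the field numbering given in the proc(5) man page; the function name `sched_snapshot` is hypothetical. It reads the nice value, the kernel's priority number, the real-time priority and the CPU the task last ran on.

```python
import os


def sched_snapshot(pid="self"):
    """Read nice, kernel priority, RT priority and last-used CPU from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        raw = f.read()
    # The command name (field 2) is wrapped in parentheses and may contain spaces,
    # so split after the last ')' and count the remaining fields from field 3.
    fields = raw[raw.rindex(")") + 1:].split()

    def field(n):
        # n is the 1-based field number used in proc(5)
        return int(fields[n - 3])

    return {
        "priority": field(18),     # per proc(5): 20 + nice for normal tasks, negative for real-time
        "nice": field(19),         # -20 (strongest) .. 19 (weakest)
        "cpu": field(39),          # CPU number the task last executed on
        "rt_priority": field(40),  # 0 for non-real-time tasks, 1..99 under SCHED_FIFO/SCHED_RR
    }


if __name__ == "__main__":
    print(sched_snapshot(os.getpid()))
```

Polling this for each worker PID shows how tasks move between cores, and it keeps the nice value separate from the real-time priority, which are easy to mix up when both are shown in one priority column.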

Paul

mag_therm · « **Reply #6 on:** December 21, 2020, 07:01:10 pm »

htop is just what I needed.
Thanks.

ejeffrey · « **Reply #7 on:** December 21, 2020, 09:01:44 pm »

You don't want SCHED_RR or SCHED_FIFO. Those are for realtime tasks. All SCHED_RR and SCHED_FIFO tasks have higher priority than all SCHED_OTHER tasks, and will always get priority if runnable (i.e., if not waiting on IO). That means that an infinite loop, or even just any CPU bound computation can be a complete DoS for the whole system. These should only be used by applications that need good latency and frequently yield the CPU on their own. The RR vs. FIFO just controls how tasks in the same scheduling priority are handled.

Nice, and the newer more flexible methods like CGROUPS are all within the SCHED_OTHER scheduling class where the OS tries to make sure that everyone gets enough time.

The default scheduler calculates a dynamic priority that takes into account how long a process has been waiting to run, whether it used up its last timeslice or blocked waiting for IO, and other factors to try to achieve the following:

* IO bound processes (including interactive applications) are expected to run for a short period of time and then block so they are run as soon as possible. They don't use much CPU time and generally don't use less if they are made to wait, so it makes sense to run them first
* Processes that need lots of CPU time receive a fair share
* Try not to sabotage overall performance by excessively thrashing caches in the pursuit of the above

Old school nice just worked by adding or subtracting a number from the calculated dynamic priority before deciding who to run. Normally "interactive" processes already get prioritized over long-running CPU bound processes, but it is also common for an interactive application to do something CPU bound for a few seconds. Without "nice", after a short burst these processes would have to share time equally with background processes. But making those background processes nice will reduce the impact. The overall computation is the same, so when the process has been waiting long enough its priority will eventually become high enough to overcome the nice penalty.
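As an illustration of making a background job nice, here is a small Python sketch (the helper `run_niced` and the stand-in workload are made up for this example). It starts a child with its niceness raised by a given increment. Raising niceness needs no privileges; a negative increment normally requires root or CAP_SYS_NICE.

```python
import os
import subprocess
import sys


def run_niced(cmd, increment=10):
    """Run cmd with its niceness raised by `increment` (i.e. at lower priority)."""
    return subprocess.run(
        cmd,
        preexec_fn=lambda: os.nice(increment),  # executed in the child just before exec
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Stand-in workload (hypothetical): a child that only reports its own niceness.
    report = [sys.executable, "-c",
              "import os; print(os.getpriority(os.PRIO_PROCESS, 0))"]
    print("parent nice:", os.getpriority(os.PRIO_PROCESS, 0))
    print("child nice: ", run_niced(report, 10).stdout.strip())
```

The child inherits the parent's niceness plus the increment, capped at 19, so the parent itself is left untouched.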

CGROUPS adds a lot more flexibility both in terms of the type of control and to do it by a group of processes such as a container, but the idea is the same.

Quote

Actually the processes need to run at the highest priority without upsetting the users too much, as there is a master running a nice =0, owned by the user.
Some jerkiness to the user is OK because the machine is dedicated to just solving and running servers..

Prioritizing number crunching over interactive user tasks will normally not result in the number crunching getting done materially faster. The reason is that when the user takes an action that requires X CPU seconds to process, it consumes mostly the same resources whether it completes in X seconds or is spread out over 5*X seconds at 20% CPU. Unless you make things slow enough that the users stop using the system as much, you aren't going to make more time available for number crunching. Increasing the priority really only works if you have multiple number crunching tasks that should share time in an unequal ratio.
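To put rough numbers on "share time in an unequal ratio", here is a back-of-envelope Python sketch. It assumes the CFS weighting used for SCHED_OTHER tasks, where the kernel's weight table is close to 1024 at nice 0 with a factor of about 1.25 per nice step; the formula is an approximation of that table, not the table itself, and the function names are made up.

```python
def cfs_weight(nice):
    """Approximate CFS load weight: 1024 at nice 0, about 1.25x per nice step."""
    return 1024 / (1.25 ** nice)


def cpu_shares(nice_values):
    """Fraction of one contended core that each always-runnable task would get."""
    weights = [cfs_weight(n) for n in nice_values]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    for mix in ([0, 0], [0, 10], [-10, 0], [0, 19]):
        shares = ", ".join(
            f"nice {n:+d}: {s:.1%}" for n, s in zip(mix, cpu_shares(mix))
        )
        print(shares)
```

These shares only apply while tasks compete for the same core. With 5 workers on 6 cores and a mostly idle desktop there is little contention, which is why changing nice may barely alter the solve time.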

mag_therm · « **Reply #8 on:** December 21, 2020, 09:44:34 pm »

Thanks Jeff

There are 6 cores on the Intel 8700 and 5 processes that solve the numbers. They have to run in parallel,
and typically take 15 seconds each and iterate 10 to 20 times to a solution in 200 seconds or so.
The whole computer is dedicated to this. The user's process controls the parallel processes by reading them sequentially and restarting them, staggered by about 50 milliseconds, until a solution is obtained.
I have htop running now, and with the systemctl service set to fifo I can see the 5 cores running up to 100% each, which is what is needed for the fastest solution.
I have Intel hyperthreading OFF in the UEFI.
I think I can see in htop that Linux is cycling the lowest-duty core (doing the user's process, ssh, vnc, graphics etc.) over the 6 cores.
Edit: By the way, the 5 processes are owned by root and are not children of the user.

andersm · « **Reply #9 on:** December 21, 2020, 11:46:21 pm »

You can get 100% processor utilization running your processes at idle priority, that's not a good marker for anything. Have you measured the performance without playing around with scheduling or priorities?

ejeffrey · « **Reply #10 on:** December 22, 2020, 04:41:25 am »

You will always have 100% CPU utilization as long as there are processes able to run, so that doesn't tell you much of anything. You should see the same if you nice +20 the processes. Do keep in mind at various times there have been oddities in how the CPU usage is reported, so even if you see something odd there it doesn't necessarily mean what you think. You really need to be looking at your application performance, not system status.
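One way to look at application performance directly is to time the solve itself under each scheduling setting and compare. A minimal Python sketch, assuming the workload can be launched as a command (the command shown is only a placeholder and `time_runs` is a hypothetical helper):

```python
import statistics
import subprocess
import sys
import time


def time_runs(cmd, repeats=5):
    """Run cmd `repeats` times and return the wall-clock seconds of each run."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Placeholder workload; substitute the real solve command here.
    cmd = [sys.executable, "-c", "sum(i * i for i in range(10**6))"]
    runs = time_runs(cmd)
    print(f"median {statistics.median(runs):.3f}s  "
          f"min {min(runs):.3f}s  max {max(runs):.3f}s")
```

Repeating the measurement with default scheduling, with Nice= set, and with a real-time policy gives wall-clock figures that can be compared, which CPU utilization alone cannot.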

Do you have any ability to change the # of workers, or is 5 set by the underlying problem? Does one batch have to completely finish before the next one starts? Can you run more than 1 at a time?

It probably makes sense to turn off HT. That could hurt performance in older CPUs. I have heard newer processors have minimal downsides, but it isn't likely to help you so leaving it off makes sense.

I think it is expected behavior in a system like this for the hot threads to slowly migrate between cores. It should be slow enough to avoid major hits from the cache misses, but I'm not sure. The more important case here is when you have 7 processes on 6 cores you don't want to have 2 of them get only 50% CPU time while the other 5 get 100%. In any case if you want to avoid CPU migration you can use CPU affinity to pin them, but I wouldn't do that without pretty convincing performance measurements. That is the sort of thing that in the best case is only going to be a few % performance improvement, but if it goes wrong in the future it can be a big hit, such as if you change the # of active processes on the system without fixing the affinity settings. It may be worth it, but I would want to actually measure the performance improvement rather than just doing it "because".
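For anyone who does want to try pinning as an experiment, here is a minimal Python sketch, assuming Linux (the helper `pin_to_cpu` is made up). It restricts the calling process to one CPU from its currently allowed set and then restores the original mask.

```python
import os


def pin_to_cpu(cpu, pid=0):
    """Restrict pid (0 = calling process) to a single CPU; return the previous mask."""
    previous = os.sched_getaffinity(pid)
    if cpu not in previous:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(previous)}")
    os.sched_setaffinity(pid, {cpu})
    return previous


if __name__ == "__main__":
    allowed = sorted(os.sched_getaffinity(0))
    print("allowed CPUs:", allowed)
    old = pin_to_cpu(allowed[-1])
    print("now pinned to:", sorted(os.sched_getaffinity(0)))
    os.sched_setaffinity(0, old)  # restore the original mask
```

In a systemd service the same idea is available through the CPUAffinity= directive. Either way the caveat above applies: measure before and after, and revisit the mask whenever the number of workers changes.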

I'm assuming there is no budget for an upgrade, but keep in mind there are 10 core consumer CPUs from Intel and 16 cores from AMD (if your application uses the Intel MKL AMD may not be an option).
That would let you run 2 or 3 batches of 5 processes in parallel. I don't know if that would actually be an advantage in your application but it might be worth considering.

PKTKS · « **Reply #11 on:** December 22, 2020, 01:08:49 pm »

Quote from: mag_therm on December 21, 2020, 05:51:50 pm

Thanks PK
The first deployments will be on Fedora 32 and it looks like Red Hat introduced "CGroups v2" in Fedora 31, saying that there is no user experience yet.
(..)

You made me wonder how things have changed past decades...

I have used "RedHat" (quoted because it is not the same thing)
long time ago in the 90's ... circa 95/98 while using Slackware
before which is a totally different environment..

But.. "RedHat" has been trapped in the cash pitfall
and since the "fedora" thing started I have switched to
the Debian side (real Debian BEFORE systemd - aka potterix-py)

Now... changes again led RedHat (the thing IBM now owns)
into some sort of limbo land in which IBM probably
only cares about large volume licenses and doesn't give a shit
about users ... fedora and or CentOS or (gone) ScientificLinux

The Potterix-py thing started to implement those stand
alone license volume model.. crapping whatever middle man
or user or sysadmin in the way..

So the future or that Fedora (Potterix-py wayland) thing
is rather cloudy.. Fortunately I diverted both dependencies
( RPM and DEB ) long ago..

But I wonder why anyone now.. would stick with that probably
doomed and cloudy diverted from UNIX path... having far more
choices today than early 90 s or 00 s .. ..

I would be very skeptical translating whatever may be from
those environments to today .. thinking that far more choices
(and better ones) already exist.. Arch (with systemd) or
preferably a fork w/regular sysV unix reliable environment..

I would really change whatever "RedHat" legacy today for Arch
or other better more reliable scenario...

Debian is also trapped in that Potterix-py wayland pitfall
and seems mostly buried there...

2 cents of wonder..
Paul

mag_therm · « **Reply #12 on:** December 22, 2020, 03:11:10 pm »

Over the past 10 years, I have upgraded twice to use the largest number of cores available in fanless industrial boxes, for a tough factory floor environment where fans are not feasible.
At present the 6-core 8700 is the largest core count available in the milled aluminum semi-sealed 12V boxes from the vendors we use.
They run on a machine-level LAN, not connected to the internet.

In about 2012 I looked closely at CUDA, but it would have been a big redevelopment, and the GPU modules needed fans and a 19 inch rack.

ejeffrey comment:

"Do you have any ability to change the # of workers, or is 5 set by the underlying problem?
Does one batch have to completely finish before the next one starts? Can you run more than 1 at a time"

Your comment is valid and I looked at this before the latest 2020 upgrade. The user's master process could easily be upgraded to handle more groups of 5 workers.

With 5 cores running on the 8700, and using the systemctl service, ( the way I have it, for better or worse) we have a significant speed increase.
That is, compared to the previous 4 core version where time slicing was necessary, with the workers as background processes in rc.local without assigning any niceness.

Regarding PK's comment about Fedora: I have used it since 2007, changing to the x86_64 as soon as it was available, in 2013 as I recall.
There were changes in about 2016 that affected my application.
One was the elimination of the HAL layer, another was the re-naming of LAN interfaces.
Now systemctl.
Fedora has been quite reliable for the application overall.
Looking back, there have been no box failures, and about 3 CompactFlash failures before changing to the lower capacity industrial version.
We now use industrial CFast.
Apart from that there were a few o/s failures from "friendly fire": well-meaning client engineers getting in to make an improvement, and crashing it.
I learned that in that case it is only a waste of time trying to do a post-mortem, so just send out new media.

Thanks for all the comments. I am retiring soon and have done this 2020 update with a young engineer who is familiar with linux.

PKTKS · « **Reply #13 on:** December 22, 2020, 03:32:59 pm »

hhmm

making more sense now...

But due to recent changes in RedHat itself (now IBM indeed)
and the probably doomed fedora CentOS panorama...

I am (myself in particular case) ditching anything fedora
redhat or equivalent for the time being in sight..

I can not even guess the income changes on that..

my hunch is not very good for that
Paul

