Process hangs under high CPU load even though its priority is set to the maximum

yucefrizk · 06-18-2010, 10:11 AM

Hi Experts,

Two applications (app1 and app2) are running on a 64-bit server with four quad-core CPUs running RHEL:

Code:

# uname -a
Linux ADM 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

I set the nice value of these applications to -5 and gave them real-time priority (the PR column shows RT):

Code:

top - 12:30:33 up 49 days, 11:15,  3 users,  load average: 6.34, 3.33, 1.95
Tasks: 257 total,   2 running, 255 sleeping,   0 stopped,   0 zombie
Cpu(s):  4.4%us,  0.8%sy,  0.0%ni, 93.8%id,  0.6%wa,  0.0%hi,  0.4%si,  0.0%st
Mem:  32960228k total, 30457160k used,  2503068k free,   609396k buffers
Swap: 65544896k total,      300k used, 65544596k free, 18648080k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
27476 user1     RT  -5 23192 5908  532 S   27  0.0   2:06.65 app1
27438 user1     RT  -5 23192 5908  532 S   23  0.0   2:05.57 app2
13722 user1     15   0  122m  15m 7888 S    8  0.0   7714:35 app3
 7614 root      10  -5     0    0    0 D    6  0.0  91:45.54 kjournald
13782 user1     15   0 1360m 1.2g 7968 R    4  3.9   2597:21 app4
13743 user1     15   0 1360m 1.2g 7968 S    2  3.9   2597:28 app5
13769 user1     15   0 1360m 1.2g 7968 S    2  3.9   2591:10 app6

Even with the high priority of app1 and app2, these two applications hang when I run a dd command that consumes 100% of its core. Shouldn't the "dd" command yield the CPU to these applications when it is running on the same core as them?!

P.S.: app1 and app2 interact with the SCTP stack, and response time is critical for them.

timmeke · 06-18-2010, 10:25 AM

I think there will always be some interruptions of app1 & app2, even if they are shorter and less frequent at the highest priority.
If you don't want any interruption, a variation of realtime Linux (http://www.linuxfordevices.com/c/a/L...ference-Guide/) could be more suitable than a (more general purpose) Red Hat box.

Of course, if you can manage it, it could be useful to dedicate 2 of your cores to app1 & app2 and use the other 2 for most of the other apps, like dd.

yucefrizk · 06-18-2010, 11:15 AM

Thanks timmeke. There's a procedure to isolate cores on RHEL (http://kbase.redhat.com/faq/docs/DOC-15596), and when I start my applications I'll force them to run on the isolated processors.
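
As a rough sketch of that second step: this assumes the cores were isolated with the `isolcpus=` kernel boot parameter (an assumption, the linked procedure may do it differently) and that `taskset` from util-linux is installed. `parse_isolcpus` is a hypothetical helper name, and the `12-15` range and `./app1` path are only illustrative.

```shell
#!/bin/bash
# Extract the CPU list from an isolcpus= entry in a kernel command line.
# Prints nothing if the parameter is absent.
parse_isolcpus() {
    local word
    for word in $1; do
        case "$word" in
            isolcpus=*) printf '%s\n' "${word#isolcpus=}" ;;
        esac
    done
}

# Example with a made-up command line:
parse_isolcpus "ro root=/dev/sda1 isolcpus=12-15 quiet"

# On the real machine (not run here):
#   cpus=$(parse_isolcpus "$(cat /proc/cmdline)")
#   taskset -c "$cpus" ./app1
```

Reading the list back from the running kernel, instead of hard-coding it in the start script, keeps the two from drifting apart if the boot parameter is changed later.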

AFAIK, irqbalance is responsible for distributing jobs across the CPU cores. Can't I configure it to never run a job on my applications' cores (in case I don't want to isolate them totally)?

timmeke · 06-21-2010, 02:06 AM

If I were you, I'd focus on the posted procedure.

I can understand the reluctance to completely disable irqbalance, as proposed in the procedure, and the wish to somehow "reconfigure" it instead.
Do keep in mind that irqbalance does not balance jobs/processes, only the handling of hardware interrupts by the different cores. Assigning your jobs to the cores (and keeping other jobs off those cores) is just part of the story.

From a quick search, there doesn't seem to be an option to configure irqbalance - it only seems to be designed for a sophisticated "equal load" balance. That is, except by disabling it and setting the irq affinity to cpus yourself as outlined in the procedure. But that's a question better suited for the hardware forum...

The IRQ affinity is probably either managed by irqbalance or set manually (in /proc/...), not both.

As I understood your goal, you may want to dedicate cores not just for running app1 and/or app2, but maybe even for handling the device interrupts that need to go into the sctp stack (which in turn is polled by your apps). Limiting these interrupts to just one core, and keeping other interrupts away from that core, should help you process them in a more timely manner.
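
A minimal sketch of setting IRQ affinity by hand, assuming irqbalance is not managing the same IRQ. Interrupts belong to devices rather than to protocols, so the IRQ to pin would be the one `/proc/interrupts` lists for the network card carrying the SCTP traffic. The helper names and IRQ number 90 are hypothetical; `/proc/irq/<n>/smp_affinity` is the kernel's interface and takes a hexadecimal CPU bitmask.

```shell
#!/bin/bash
# Build the hexadecimal bitmask that /proc/irq/<n>/smp_affinity expects
# from a list of CPU numbers (bit N set means CPU N may handle the IRQ).
cpu_mask() {
    local mask=0 cpu
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

# Pin one IRQ to the given CPUs. Needs root.
#   $1 = IRQ number, remaining arguments = CPU numbers
set_irq_affinity() {
    local irq=$1; shift
    cpu_mask "$@" > "/proc/irq/$irq/smp_affinity"
}

cpu_mask 12            # CPU 12 only
cpu_mask 12 13 14 15   # the last four of 16 CPUs

# Not run here (hypothetical IRQ 90):
#   set_irq_affinity 90 12
```

The two calls print `1000` and `f000`. Keeping other interrupts off that core means writing a mask without those bits to the remaining IRQs as well.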

Question remains which cores to pick - this would be architecture related (as the 4 cores are not always completely working independently from each other).

So, in practice, I'd recommend:
1- Just to be sure, make sure you have a bootable CD (Live CD,...) on stand-by in case it does get messed up.
2- Think about which cores to dedicate and how, then try the posted procedure.
3- If you run into trouble with the irqs, post to the hardware forum here on LQ.

yucefrizk · 06-21-2010, 02:32 AM

Quote:

If I were you, I'd focus on the posted procedure.

In fact that's what I'm doing right now on a test server.

Quote:

From a quick search, there doesn't seem to be an option to configure irqbalance - it only seems to be designed for a sophisticated "equal load" balance. That is, except by disabling it and setting the irq affinity to cpus yourself as outlined in the procedure. But that's a question better suited for the hardware forum...

The following page says it's possible to stop irqbalance from using specific isolated CPUs:

http://www.redhat.com/docs/en-US/Red...s_Binding.html

Quote:

As I understood your goal, you may want to dedicate cores not just for running app1 and/or app2, but maybe even for handling the device interrupts that need to go into the sctp stack (which in turn is polled by your apps). Limiting these interrupts to just one core, and keeping other interrupts away from that core, should help you process them in a more timely manner.

How can I specify a core for handling only SCTP interrupts?

Quote:

Question remains which cores to pick - this would be architecture related (as the 4 cores are not always completely working independently from each other).

My server has four quad-core CPUs, so I'll isolate the last four cores.

I'll do my tests and get back to you

Anyway, many thanks for your reply

syg00 · 06-21-2010, 02:48 AM

I've seen that article before - and I don't like it. cgroups (aka cpusets) is a better option IMHO.
Let's see the result of this - "grep -i ^processor /proc/cpuinfo"
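
For reference, a rough sketch of the cpuset approach. It assumes the legacy cpuset filesystem, where the control files are named `cpus`, `mems` and `tasks` (under a cgroup mount they carry a `cpuset.` prefix instead). The function names, the `rtapps` set name, the `/dev/cpuset` mount point, memory node `0` and the CPU range are all illustrative assumptions.

```shell
#!/bin/bash
# Create a cpuset under an already-mounted cpuset filesystem and restrict
# it to the given CPUs and memory nodes.
#   $1 = mount point, $2 = set name, $3 = CPU list, $4 = memory node list
make_cpuset() {
    local root=$1 name=$2 cpus=$3 mems=$4
    mkdir -p "$root/$name" || return 1
    echo "$cpus" > "$root/$name/cpus" || return 1
    echo "$mems" > "$root/$name/mems"
}

# Move a process into the set by writing its PID to the tasks file.
#   $1 = mount point, $2 = set name, $3 = PID
attach_pid() {
    echo "$3" > "$1/$2/tasks"
}

# Not run here; needs root (paths and PID are illustrative):
#   mount -t cpuset none /dev/cpuset
#   make_cpuset /dev/cpuset rtapps 12-15 0
#   attach_pid  /dev/cpuset rtapps 27476
```

Unlike boot-time isolation, a cpuset can be created, resized and removed without a reboot.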

yucefrizk · 06-21-2010, 02:51 AM

Quote:

Originally Posted by syg00

Let's see the result of this - "grep -i ^processor /proc/cpuinfo"

processor : 0
processor : 1
processor : 2
processor : 3
processor : 4
processor : 5
processor : 6
processor : 7
processor : 8
processor : 9
processor : 10
processor : 11
processor : 12
processor : 13
processor : 14
processor : 15

syg00 · 06-21-2010, 03:06 AM

Good - just making sure. Try this to see which tasks are in uninterruptible sleep

Code:

top -b -n 1 | awk '{if (NR <=7) print; else if ($8 == "D") {print; count++} } END {print "Total status D: " (count+0)}'
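
A variation on the same idea, sketched with `ps` so that the kernel wait channel (WCHAN) is shown next to each task in D state, which can hint at what the task is blocked on. `filter_dstate` is a hypothetical helper name, and the sample rows below are made up for illustration, not taken from the server in this thread.

```shell
#!/bin/bash
# Keep the header plus every row whose STAT column starts with D
# (uninterruptible sleep). Expects "PID STAT WCHAN COMMAND" columns.
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Sample input standing in for real ps output (values are made up):
filter_dstate <<'EOF'
  PID STAT WCHAN            COMMAND
 7614 D<   sync_buffer      kjournald
14108 D    log_wait_commit  app1
13722 S    -                app3
EOF

# On the real machine (not run here):
#   ps -eo pid,stat,wchan:32,comm | filter_dstate
```

Matching on the first character of STAT also catches states with modifiers such as `D<` or `D+`.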

yucefrizk · 06-21-2010, 03:29 AM

Quote:

Originally Posted by syg00

Good - just making sure. Try this to see which tasks are in uninterruptible sleep

Code:

top -b -n 1 | awk '{if (NR <=7) print; else if ($8 == "D") {print; count++} } END {print "Total status D: " (count+0)}'

Under high IO load, I can see my applications in uninterruptible sleep status (this is one of many snapshots):

Code:

top - 11:19:45 up 52 days, 10:04,  4 users,  load average: 6.81, 3.96, 2.54
Tasks: 258 total,   2 running, 256 sleeping,   0 stopped,   0 zombie
Cpu(s):  4.5%us,  0.8%sy,  0.0%ni, 93.7%id,  0.5%wa,  0.0%hi,  0.4%si,  0.0%st
Mem:  32960228k total, 32803420k used,   156808k free,   975820k buffers
Swap: 65544896k total,      300k used, 65544596k free, 18799108k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
14108 user1     16   0 23188 5904  532 D    8  0.0  90:21.48 app1
 7614 root      10  -5     0    0    0 D    4  0.0  99:57.97 kjournald
14142 user1     16   0 23188 5904  532 D    4  0.0  90:11.81 app2
 6765 root      15   0     0    0    0 D    0  0.0   0:00.21 pdflush
 6804 root      15   0     0    0    0 D    0  0.0   0:00.31 pdflush
 6880 root      15   0     0    0    0 D    0  0.0   0:00.29 pdflush
 6916 root      15   0     0    0    0 D    0  0.0   0:00.01 pdflush
 6936 root      15   0     0    0    0 D    0  0.0   0:00.00 pdflush
Total status D: 8

yucefrizk · 06-21-2010, 09:10 AM

I noticed that these applications always hang after entering this uninterruptible sleep status. Will isolating their CPUs solve the issue?

P.S.: the IO activities (dd command...) are running on the same disk.

timmeke · 06-21-2010, 10:27 AM

Maybe you can tell me a little more on what your apps will be doing on the hard disk? Do they impose a heavy IO load as well?

yucefrizk · 06-21-2010, 10:58 AM

Quote:

Originally Posted by timmeke

Maybe you can tell me a little more on what your apps will be doing on the hard disk? Do they impose a heavy IO load as well?

The applications read from an SCTP socket, parse the received information, return results and write to log files.

Basically, the traffic they receive is huge and they impose a heavy IO load, but from the output of "top" the total CPU load on their own processor does not exceed 40% and the iowait percentage is 0%, as seen below:

Code:

top - 19:01:16 up 52 days, 17:46,  3 users,  load average: 1.53, 1.64, 1.79
Tasks: 252 total,   1 running, 251 sleeping,   0 stopped,   0 zombie
Cpu(s):  5.9%us,  0.7%sy,  0.0%ni, 92.8%id,  0.0%wa,  0.0%hi,  0.5%si,  0.0%st
Mem:  32960228k total, 25096800k used,  7863428k free,   124332k buffers
Swap: 65544896k total,      300k used, 65544596k free, 13987100k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
29850 user1      5 -10 23188 5896  524 S   28  0.0  16:59.87 app1
29815 user1      5 -10 23184 5896  524 S   25  0.0  16:57.52 app2

yucefrizk · 06-22-2010, 02:28 AM

One more piece of information: if I run the shell script below, which consumes a lot of CPU but performs no IO, my applications keep working fine. So the problem only occurs when there is high IO activity on the machine.

Code:

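#!/bin/bash
# CPU burner: spins forever on shell arithmetic only, with no disk I/O.

while true
do
        i=0
        # Both tests are always true, so i goes 0 -> 1 -> 0 on every pass.
        if [ 3 -ge 2 ]
        then
                let i++
        fi
        if [ 2 -le 3 ]
        then
                let i--
        fi
done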

In this case, will my issue be solved by isolating the CPUs? Any other suggestions? Thanks.

Code:

top - 10:24:03 up 53 days,  9:09,  2 users,  load average: 2.37, 1.76, 1.45
Tasks: 245 total,   3 running, 242 sleeping,   0 stopped,   0 zombie
Cpu(s): 10.2%us,  1.0%sy,  0.0%ni, 88.3%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  32960228k total, 23375120k used,  9585108k free,   254148k buffers
Swap: 65544896k total,      300k used, 65544596k free, 12349956k cached

  PID USER     PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
18073 user1    25   0 63796 1068  908 R  100  0.0   3:03.39 cpu.sh
 2813 user1    15   0 23176 5888  528 S   20  0.0  91:03.08 app1
 2778 user1    15   0 23176 5888  528 S   19  0.0  90:56.56 app2

timmeke · 06-22-2010, 02:38 AM

As IO seems to be the bottleneck, try sorting that out first (as this will have the biggest impact).
A suggestion would be to use the (somewhat old-style) ramdisk to store the logfiles.
You'll need to figure out a way to sync them to the hard drive occasionally.
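
One possible shape for this, sketched with tmpfs standing in for a classic ramdisk (a swapped-in choice, not something tested in this thread). `flush_logs`, the directory names and the size are illustrative assumptions.

```shell
#!/bin/bash
# Copy every regular file from the RAM-backed log directory to a
# directory on disk, preserving timestamps. Meant to be run periodically
# (for example from cron), so the disk copy lags by at most one interval.
#   $1 = RAM-backed source directory, $2 = on-disk destination directory
flush_logs() {
    local src=$1 dst=$2 f
    mkdir -p "$dst" || return 1
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        cp -p "$f" "$dst"/ || return 1
    done
}

# Not run here; needs root (paths and size are illustrative):
#   mount -t tmpfs -o size=512m tmpfs /var/log/apps-ram
#   flush_logs /var/log/apps-ram /var/log/apps-disk
```

Anything written since the last flush is lost on a crash or power failure, so the flush interval is a trade-off between disk load and how much log data you can afford to lose.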

When IO has improved, the bottleneck may shift to CPU, in which case the isolation may become necessary to improve things further.

Bottom line, don't write it off just yet... it doesn't look as promising now, but it may still come in handy later.

syg00 · 06-22-2010, 04:15 AM

I/O isn't necessarily the problem - uninterruptible sleep is generally thought to be caused by (disk) I/O, but not necessarily. It's just an attribute of a process. And as stated, the %wa is zero - that means no CPU was sitting idle while I/O was outstanding. It could be hard to track anyway with that many online CPUs.
In this case I'd say poor code - presumably in one of the applications under discussion or in a device driver.
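
Because the `Cpu(s)` line in top is an average over all 16 CPUs, one CPU that spends all its time in iowait would only show up as roughly 6% there (pressing `1` inside top switches to a per-CPU view). A rough sketch for reading per-CPU iowait straight from `/proc/stat` follows; `per_cpu_iowait` is a hypothetical helper name, the sample numbers are made up, and the figures are cumulative since boot rather than a live rate.

```shell
#!/bin/bash
# Print each CPU's share of time spent in iowait, from input in
# /proc/stat format. Value columns after the cpuN label are:
# user nice system idle iowait irq softirq steal (later ones are ignored).
per_cpu_iowait() {
    awk '/^cpu[0-9]+ / {
        total = 0
        for (i = 2; i <= 9 && i <= NF; i++) total += $i
        if (total > 0) printf "%s %.1f\n", $1, 100 * $6 / total
    }'
}

# Sample input standing in for /proc/stat (values are made up):
per_cpu_iowait <<'EOF'
cpu  300 0 150 1500 50 0 0 0
cpu0 100 0 50 800 50 0 0 0
cpu1 200 0 100 700 0 0 0 0
EOF

# On the real machine (not run here):
#   per_cpu_iowait < /proc/stat
```

For a live rate rather than a since-boot figure, take two readings a few seconds apart and compare the differences.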

kjournald and pdflush are kernel threads - I wouldn't expect them to be in "D" state under a heavy I/O load. The fact that there are so many pdflush processes might indicate the (disk) I/O is very bursty. pdflush is spawned as needed to write the data to disk (after a sync say). I would expect them to go away after a period of I/O inactivity.
I would guess the SCTP driver is holding up the apps whilst decrypting (or whatever), and then dumping a heap of I/O, then doing it all again.
For single-threaded code with that many CPUs, I can't see how binding processes to CPUs is going to help at all.

Just guessing of course.