LinuxQuestions.org
Old 10-09-2023, 03:08 AM   #16
elgholm
LQ Newbie
 
Registered: Oct 2023
Posts: 24

Original Poster
Rep: Reputation: 0

Quote:
Originally Posted by wpeckham View Post
Actually those numbers do not reflect CPU load at all as far as I can tell. They appear to be related to queue loading as measured by stack size and/or response time snapshots. Many things can affect those numbers.

I agree that this does not appear any kind of problem and a detailed investigation may not be a good use of your time, but if the machine were one of mine I would be VERY curious!
Checking the I/O, memory, and CPU usage of the processes running during the event window and comparing against results outside the window might prove instructive. It is certainly something I would try.
Yes, and this is something I've tried to do. But the scripts I've run show me nothing out of the ordinary, and that's why I'm asking here whether there's some other monitoring command or tool I can try - one that will perhaps indicate which process to check out.

And yes, exactly, the CPU isn't swamped with work during this time - so I also believe it to be a queueing of processes/threads that stumble upon each other a bit, wait states, which pushes the load average through the roof for a very short amount of time. The problem is, as I've written above, that this makes our monitoring of the server sad - since the high peaks f*ck up our charts. =(

Yes, very curious indeed =)
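
To illustrate the point about queueing: on Linux the load average counts tasks that are runnable (state R) as well as tasks in uninterruptible sleep (state D, usually waiting on I/O), which is why it can spike while the CPU stays mostly idle. Below is a minimal sketch for sampling those two states, assuming a procps-style ps; the function name count_load_states is made up for this example and is not something from the thread.

```shell
# Count tasks by scheduler state.
# Reads one state per line on stdin, in the format printed by `ps -eo state=`
# (R = runnable, D = uninterruptible sleep). Other states are ignored.
count_load_states() {
    awk '
        $1 ~ /^R/ { r++ }
        $1 ~ /^D/ { d++ }
        END { printf "runnable=%d uninterruptible=%d\n", r, d }
    '
}

# Hypothetical usage: one timestamped sample per second during the window.
#   while sleep 1; do
#       printf '%s %s\n' "$(date +%T)" "$(ps -eo state= | count_load_states)"
#   done
```

Note that ps itself normally shows up as one runnable task in each sample. A burst of D-state tasks around the spike would point towards I/O waits rather than CPU work.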
 
Old 10-09-2023, 03:12 AM   #17
elgholm
LQ Newbie
 
Registered: Oct 2023
Posts: 24

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by syg00 View Post
Yeah, me too.

But if the attempts the OP has already tried don't show anything, I'd be looking at some kernel traces. Hard to convince me it's more important than my weekend slacking off tho' ....

loadavg is probably the most misunderstood (and pointless) metric in linux IMHO.
Kernel traces sound a bit too excessive for my taste, and I hope there are easier methods to just monitor and _find_ what process it is that's doing this at around that time. But we'll see. I'm thinking of starting the psacct service during the window now, and also hammering out more textfiles using pidstat. Hopefully that catches something.

loadavg is certainly pointless when people think of it like "CPU usage" - which it is not, really - but it's actually a quite good metric just for seeing stuff that stands out, like this... =(
 
Old 10-09-2023, 03:33 AM   #18
elgholm
LQ Newbie
 
Registered: Oct 2023
Posts: 24

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by MadeInGermany View Post
Make a cron job that at 3:30 runs
pidstat >pidstat.out
I'm gonna start this monster at 03:25 now.

pidstat -l 1 400 >/tmp/ts.txt &
pidstat -r -l 1 400 >/tmp/tr.txt &
pidstat -t -l 1 400 >/tmp/tt.txt &
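
The three commands above could be wrapped in a small script so that cron can start them, as suggested in the quoted post. This is only a sketch: the PIDSTAT and OUTDIR variables, the capture_window name, and the timestamped file names are assumptions added for illustration, while the pidstat options are the ones used above.

```shell
# Hypothetical knobs: override PIDSTAT or OUTDIR in the environment if needed.
PIDSTAT="${PIDSTAT:-pidstat}"
OUTDIR="${OUTDIR:-/tmp}"

# Start the three samplers in parallel (1-second interval, 400 reports each)
# and wait for all of them to finish.
capture_window() {
    stamp=$(date +%Y%m%d-%H%M)
    "$PIDSTAT" -l 1 400    > "$OUTDIR/ts-$stamp.txt" &   # CPU usage per process
    "$PIDSTAT" -r -l 1 400 > "$OUTDIR/tr-$stamp.txt" &   # memory and page faults
    "$PIDSTAT" -t -l 1 400 > "$OUTDIR/tt-$stamp.txt" &   # per-thread view
    wait
}

# Hypothetical crontab entry (the script path is an assumption), assuming the
# script file ends with a call to capture_window:
#   25 3 * * * /usr/local/bin/capture-window.sh
```

Putting a timestamp in the file names keeps each night's output, so a quiet night can be compared against a night with a spike.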
 
Old 10-09-2023, 03:38 AM   #19
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,998

Rep: Reputation: 7338
you might want to run it several times, from 3:25 to 3:45 even every second (for example).
 
Old 10-10-2023, 02:42 AM   #20
elgholm
LQ Newbie
 
Registered: Oct 2023
Posts: 24

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by pan64 View Post
you might want to run it several times, from 3:25 to 3:45 even every second (for example).
I run it 400 times, once every second, so it covers the period when something is happening. That's what the parameters "1 400" specify.

Also, my previous script runs as a daemon, and wakes up as soon as the load average is above a threshold and then logs a bunch of things for a minute or two. But that script misses the actual burst of jobs happening.
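
The daemon script itself is not shown in the thread, so the following is only a guess at what a threshold trigger of this kind might look like. It also hints at why such a trigger tends to be late: the 1-minute load average is a damped average, so it crosses a threshold only after the burst has been going for a while. The fourth field of /proc/loadavg (currently runnable entities / total entities) is an instantaneous value and may react sooner. The function names here are hypothetical.

```shell
# Succeed if the 1-minute load average in a /proc/loadavg-style line
# exceeds the given threshold.
#   $1: line such as the content of /proc/loadavg
#   $2: numeric threshold
load_exceeds() {
    printf '%s\n' "$1" | awk -v limit="$2" '{ exit !($1 + 0 > limit + 0) }'
}

# Print the instantaneous number of runnable scheduling entities,
# i.e. the part before the "/" in the fourth field.
runnable_now() {
    printf '%s\n' "$1" | awk '{ split($4, parts, "/"); print parts[1] }'
}

# Hypothetical polling loop (the threshold of 4 is an arbitrary example):
#   while sleep 1; do
#       line=$(cat /proc/loadavg)
#       if load_exceeds "$line" 4; then
#           ps -eo pid,state,etimes,args --sort=etimes | head -n 30
#       fi
#   done
```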
 
Old 10-19-2023, 12:42 PM   #21
metaed
Member
 
Registered: Apr 2022
Location: US
Distribution: Slackware64 15.0
Posts: 372

Rep: Reputation: 172
Quote:
Originally Posted by elgholm View Post
that script misses the actual burst of jobs happening
The only tool I know of that is 100% reliable at recording short-lived processes is process accounting. When enabled, process accounting logs every process when it exits, no matter how short-lived.

If there are no short-lived processes logged in the process accounting file around the time in question, then you can safely conclude it is a long-lived process, not a burst of short-lived processes.
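
As a sketch of how the accounting file might be inspected: once accounting is enabled (how to do that differs by distribution, for example through the psacct service mentioned earlier in the thread), lastcomm prints one line per finished process. The helper below tallies command names from that output. It assumes the command name is the first whitespace-separated field, and the name tally_commands is made up for this example.

```shell
# Tally command names from lastcomm-style output on stdin and print
# "count name" lines, most frequent first.
# Assumption: the command name is the first whitespace-separated field.
tally_commands() {
    awk '{ count[$1]++ } END { for (c in count) printf "%d %s\n", count[c], c }' |
        sort -k1,1nr -k2
}

# Hypothetical usage: show the most frequently started commands.
#   lastcomm | tally_commands | head -n 20
```

A command that shows up hundreds of times around the window would suggest a burst of short-lived processes. If nothing stands out, the long-lived-process explanation becomes more likely.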
 
Old 10-20-2023, 04:21 AM   #22
elgholm
LQ Newbie
 
Registered: Oct 2023
Posts: 24

Original Poster
Rep: Reputation: 0
I think I've narrowed it down.
It's exim_tidydb that gets started by anacron a couple of minutes before 03:30.

Now I just need to investigate why it wreaks havoc with my load avg.

Thank you all for giving me some pointers! Really helpful, much appreciated, and also nice to not feel so alone out there =)
 
Old 10-20-2023, 04:40 AM   #23
elgholm
LQ Newbie
 
Registered: Oct 2023
Posts: 24

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by elgholm View Post
I think I've narrowed it down.
It's exim_tidydb that gets started by anacron a couple of minutes before 03:30.
And just to quote myself... I don't even use this server as a mailserver, so, it's a standard exim install - just to be able to shuffle local mail. So it doesn't have any large databases to "tidy" up.
 
Old 10-20-2023, 12:13 PM   #24
metaed
Member
 
Registered: Apr 2022
Location: US
Distribution: Slackware64 15.0
Posts: 372

Rep: Reputation: 172
Quote:
Originally Posted by elgholm View Post
It's exim_tidydb that gets started by anacron a couple of minutes before 03:30
How many messages in queue? exim -bp
 
Old 10-21-2023, 03:08 AM   #25
elgholm
LQ Newbie
 
Registered: Oct 2023
Posts: 24

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by metaed View Post
How many messages in queue? exim -bp
2.

I don't even use exim on this server. It's just default installed.

What's funny is that when I put the anacron-job back, the spike reappeared, so it's most probably that. But when I run it manually, as root, exactly nothing happens - as is expected, since there's nothing to clean. But somehow it manages to find a bunch of stuff to "do" at 03:20-03:30 each day... Weird.
 
Old 10-21-2023, 03:54 AM   #26
elgholm
LQ Newbie
 
Registered: Oct 2023
Posts: 24

Original Poster
Rep: Reputation: 0
Update: I've now cleared everything in the /var/spool/exim/db directory, and restarted Exim - which is just a default setup (it doesn't handle any mail for real, just sends it off to another host).

The directory had some files in it, nothing big. I think the biggest one was like 1-200 KB or so.

Now it's just wait and see..
 
Old 10-21-2023, 06:11 AM   #27
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,819

Rep: Reputation: 1211
Perhaps other stuff is running at that time that makes the disk slow.
Try to shift it to another time slot.
Low mail usage: then let it run at weekends only.
 
Old 11-02-2023, 02:38 AM   #28
elgholm
LQ Newbie
 
Registered: Oct 2023
Posts: 24

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by MadeInGermany View Post
Perhaps other stuff is running at that time that makes the disk slow.
Try to shift it to another time slot.
Low mail usage: then let it run at weekends only.
Yeah, but it wasn't that. Still no idea what is causing this. =(
 
Old 11-02-2023, 03:49 AM   #29
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,998

Rep: Reputation: 7338
What is causing what? Do you have any data/log/info/whatever to analyze?
 
Old 11-03-2023, 02:19 AM   #30
elgholm
LQ Newbie
 
Registered: Oct 2023
Posts: 24

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by pan64 View Post
What is causing what? Do you have any data/log/info/whatever to analyze?
No. That's exactly what I need help to find.
I've checked all the logs I know about, but there's nothing special in them. That's the issue.
I'm hoping for someone to perhaps point me towards something I haven't tried yet.
Attached you can see the load average for tonight... =(
[Attached thumbnail: boring.png, 36.9 KB]
 
  

