LinuxQuestions.org
Linux - Kernel This forum is for all discussion relating to the Linux kernel.

Old 09-24-2022, 06:11 PM   #1
kalaleq
LQ Newbie
 
Registered: Mar 2021
Posts: 27

Rep: Reputation: Disabled
Blank screen on kernel load from grub


Got a bit of a brain-buster here and i'm not sure if anyone can help me, but i'll post anyway just on the off chance someone has any ideas of what i might try next.

First, i've been maintaining my own linux "distro" for the past 20 years, everything is compiled from source and i've implemented some custom package management via encap (similar to gnu stow, basically symlinking /usr from a directory of packages). I also treat the kernel as a package of sorts.

My home network is running off a central server and essentially workstations are just a deployment of the system software with /home (and a few other things) mounted over nfs from the server. Deploying a workstation is pretty easy, i boot up the machine with a system-rescue.org usb stick, partition the drives, copy the system software over from the server, set up an EFI partition using grub and i'm good to go.

I have a ryzen machine i bought in 2020 that i provisioned in just this way and has been my primary workstation since then, with everything working really nicely.

A few days ago my server crashed and i couldn't find a usb stick to save my life, so i pulled the SSD software RAID drives out of it and put them in the ryzen machine to test. I've since found my usb stick and have managed to get the server back up and running. In the end it looks like the EFI BIOS essentially "forgot" about my grub install and was defaulting to trying to boot using the old MBR method. I just had to re-run grub-install to fix it.

However, since i took the server drives out of it my ryzen workstation won't boot. Grub loads off its NVME drive EFI partition, but as soon as it tries to boot the kernel, i just get a blank screen and no activity. No messages at all to indicate what's going on, just a blank screen.

I've gone in with the systemrescue stick and tried just about everything i can think of, including a complete rebuild like i did when i first set it up in 2020. I also tried a rebuild onto a spare SATA SSD instead of the NVME drive. I've tried the last three kernels that are booting all my other machines just fine. No matter what i try, just a blank screen when grub tries to load the kernel.

This is trying to boot my custom kernel off the hard drive. The systemrescue usb stick boots just fine. I can even get it to use its "findroot" mode to find the nvme partition - i had to remove the /sbin/init symlink and replace it with the binary, but then it worked, though the sysresc kernel doesn't support a lot of stuff i need for a proper boot. i've also tried booting with a gentoo install stick, which also worked fine. Finally i tried an arch install disk, and actually installed arch onto the SSD, and that was able to boot fine.

My first assumption was that the kernel was somehow corrupted, or the partition/filesystem itself was (even though grub could see the files on it fine). But after several re-installs where i re-partitioned everything and copied all the known-good files from the server, that can't be it.

I figured maybe it was some BIOS setting but other x86_64 kernels work just fine.

I've recovered from a lot of odd situations over the past 20 years but i'm just not having any luck here. Any idea how i could get *some* kind of error message or anything to see more about what's maybe going on?

Happy to provide more specifics...

Thanks
 
Old 09-24-2022, 07:10 PM   #2
Emerson
LQ Sage
 
Registered: Nov 2004
Location: Saint Amant, Acadiana
Distribution: Gentoo ~amd64
Posts: 7,665

Rep: Reputation: Disabled
Stabbing in the dark ... I'd do a BIOS reset to clear CMOS.
 
Old 09-24-2022, 07:37 PM   #3
colorpurple21859
LQ Veteran
 
Registered: Jan 2008
Location: florida panhandle
Distribution: Slackware Debian, Fedora, others
Posts: 7,375

Rep: Reputation: 1593
have you tried putting video=1024x768 and/or something similar on the linux line of the grub menuentry?
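
For reference, a minimal sketch of where such an option goes. The kernel path and root device are hypothetical placeholders, and the stanza assumes GRUB's $root already points at the filesystem holding the kernel. The script only prints a menuentry for review; it does not modify grub.cfg.

```shell
#!/bin/sh
# Sketch: print a GRUB menuentry that passes a video= mode to the kernel.
# The function name and all arguments are illustrative, not taken from the thread.
print_menuentry() {
    kernel=$1     # kernel image path as GRUB sees it (hypothetical)
    rootdev=$2    # root= device for the kernel (hypothetical)
    video=$3      # mode string, e.g. 1024x768
    cat <<EOF
menuentry 'custom kernel (video=$video)' {
    linux $kernel root=$rootdev ro video=$video
}
EOF
}

print_menuentry /boot/vmlinuz-custom /dev/nvme0n1p2 1024x768
```

For a one-off test, the same option can be added without editing any file: press e on the entry in the GRUB menu and append it to the line starting with linux.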
 
Old 09-24-2022, 08:20 PM   #4
kalaleq
LQ Newbie
 
Registered: Mar 2021
Posts: 27

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by colorpurple21859
have you tried putting video=1024x768 and/or something similar on the linux line of the grub menuentry?
I hadn't, but good thought. I did try nomodeset and some different connection options, though - i've normally used a displayport cable but the video card also has HDMI out, which i tried to no avail. Just tried the suggested video=1024x768 as well as the native resolution of the monitor, also no difference. I'll investigate the various video parameters, but something tells me this isn't the issue...

Also note i can tell there's video output, as the monitor remains on (doesn't go into sleep mode), just blank.

Also i've tried using the system rescue stick version of GRUB to load the kernel off the NVME disk, and that had the same effect i believe. Which really makes it seem like it's the kernel itself? Though it's the same one that's booting all my other machines and worked on this machine too before all this started...

Last edited by kalaleq; 09-24-2022 at 08:27 PM.
 
Old 09-24-2022, 08:26 PM   #5
kalaleq
LQ Newbie
 
Registered: Mar 2021
Posts: 27

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by Emerson
Stabbing in the dark ... I'd do a BIOS reset to clear CMOS.
I'm willing to try any sort of voodoo at this point...

I saw there's a recent BIOS update for my board (MSI MAG Tomahawk X570) so i installed that, reset a few things to my liking (numlock and fullscreen logo and such), booted into the rescue stick and re-installed grub.

Same result.
 
Old 09-24-2022, 08:40 PM   #6
Emerson
LQ Sage
 
Registered: Nov 2004
Location: Saint Amant, Acadiana
Distribution: Gentoo ~amd64
Posts: 7,665

Rep: Reputation: Disabled
I take it you did not reset your BIOS?
A BIOS upgrade won't reset CMOS in full. If your CMOS is corrupt, only a hard reset [with jumper] clears it.
 
Old 09-24-2022, 09:01 PM   #7
kalaleq
LQ Newbie
 
Registered: Mar 2021
Posts: 27

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by Emerson
I take it you did not reset your BIOS?
A BIOS upgrade won't reset CMOS in full. If your CMOS is corrupt, only a hard reset [with jumper] clears it.
You are right of course. Long time since i've had to mess with jumpers! I figured out the method for my board, did it, verified on boot that CMOS was indeed cleared... and, still the same result. Was worth a try!
 
Old 09-24-2022, 09:09 PM   #8
Emerson
LQ Sage
 
Registered: Nov 2004
Location: Saint Amant, Acadiana
Distribution: Gentoo ~amd64
Posts: 7,665

Rep: Reputation: Disabled
OK, has this box ever booted from NVMe? It dawns on me that some motherboards are choosy; mine, for instance, has two NVMe slots, but only one is bootable.
 
Old 09-24-2022, 10:28 PM   #9
kalaleq
LQ Newbie
 
Registered: Mar 2021
Posts: 27

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by Emerson
OK, has this box ever booted from NVMe? It dawns on me that some motherboards are choosy; mine, for instance, has two NVMe slots, but only one is bootable.
Yep, before i opened it up and tested the server SSDs it was booting just fine from NVMe. It just... stopped. But as noted i did also try moving everything over to a SATA SSD and had the same issue.

It really does look like it's something to do with the kernel itself, given that other kernels boot on the machine, and i can get one of those kernels to mount the root partition and start up the system with limited functionality. I've even installed one of those kernels (arch) on the SSD and booted it from there, but mine won't boot. I haven't tried another kernel on NVMe yet but that sure feels like a red herring at this point...

It just doesn't make sense. As i said, it's the same kernel that was booting this machine before, it's the same kernel that boots all my other machines (no other ryzens, but a couple of athlon FXs and an intel NUC), and i've re-copied it several times from the server. So it can't be that the kernel is corrupt, or the file system it's on is corrupt...

I'm stymied. I'll keep poking at it and if there are any other suggestions i'll try them... i'm sure something will work eventually as the machine is obviously okay, but man, this has me scratching my head.
 
Old 09-24-2022, 10:30 PM   #10
kalaleq
LQ Newbie
 
Registered: Mar 2021
Posts: 27

Original Poster
Rep: Reputation: Disabled
I'm wondering if there's some way to run a kernel other than getting grub to start it. I mean, it's also the same kernel i've got installed on a libvirt/qemu virtual machine and works fine there too...
 
Old 09-24-2022, 10:35 PM   #11
Emerson
LQ Sage
 
Registered: Nov 2004
Location: Saint Amant, Acadiana
Distribution: Gentoo ~amd64
Posts: 7,665

Rep: Reputation: Disabled
rEFInd is another loader.
EFI stub kernel is loaded directly by EFI BIOS, no loader needed.
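
A rough sketch of the EFI stub route, assuming the kernel was built with CONFIG_EFI_STUB=y and has already been copied onto the EFI system partition. The disk, partition number, loader path, label and command line are all hypothetical placeholders. By default the script only prints the efibootmgr command so it can be checked before anything is written to the firmware.

```shell
#!/bin/sh
# Sketch: register an EFI-stub kernel as a firmware boot entry.
# All values below are hypothetical placeholders; adjust before use.
DISK=/dev/nvme0n1                    # disk that holds the EFI system partition
PART=1                               # partition number of the ESP on that disk
LOADER='\EFI\custom\vmlinuz.efi'     # kernel image path on the ESP (backslashes)
CMDLINE='root=/dev/nvme0n1p2 ro'     # kernel command line stored in the entry

# DRY_RUN=1 (the default) prints commands; DRY_RUN=0 runs them (needs root).
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

register_efistub() {
    run efibootmgr --create --disk "$DISK" --part "$PART" \
        --label custom-efistub --loader "$LOADER" --unicode "$CMDLINE"
}

register_efistub
```

If an initramfs is used, the EFI stub takes it as an initrd= option in the command line, with the path given relative to the ESP.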
 
Old 09-24-2022, 10:47 PM   #12
Emerson
LQ Sage
 
Registered: Nov 2004
Location: Saint Amant, Acadiana
Distribution: Gentoo ~amd64
Posts: 7,665

Rep: Reputation: Disabled
Does your kernel require initramfs? If yes then failure to load it may be your problem.
 
Old 09-24-2022, 11:52 PM   #13
kalaleq
LQ Newbie
 
Registered: Mar 2021
Posts: 27

Original Poster
Rep: Reputation: Disabled
No, no initramfs required, though i was using one before all this happened, just for early loading of a non-critical SSD RAID0 array, which has since been scrapped anyway so i could try booting from one of the SSDs. I am still using the initramfs but i've tried both with and without.

Besides, wouldn't there be at least a little kernel output before it failed, if that was it? What's making this so hard to diagnose is the total lack of output.

I feel like if i could get the kernel to say *anything* at all, i'd have something to work with.
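
A sketch of command-line options commonly used to coax early output out of a silent kernel. Whether they help in this case is an assumption: earlycon=efifb needs EFI early console support in the kernel build and a framebuffer handed over by the firmware or boot loader, and older kernels used earlyprintk=efi instead. The helper name and the sample command line are made up for illustration.

```shell
#!/bin/sh
# Sketch: turn an existing kernel command line into a "talkative" one.
# debug_cmdline is a hypothetical helper; it only prints the result.
debug_cmdline() {
    kept=
    for arg in $1; do
        case $arg in
            quiet|loglevel=*) ;;          # drop options that suppress messages
            *) kept="$kept$arg " ;;       # keep everything else, in order
        esac
    done
    # earlycon=efifb   early console on the EFI framebuffer
    # keep_bootcon     do not unregister the boot console later
    # ignore_loglevel  print all kernel messages to the console
    # console=tty0     send console output to the virtual terminal
    printf '%s%s\n' "$kept" "earlycon=efifb keep_bootcon ignore_loglevel console=tty0"
}

debug_cmdline "root=/dev/nvme0n1p2 ro quiet"
```

The printed line can then be pasted onto the linux line of the GRUB entry in place of the original options.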

Last edited by kalaleq; 09-24-2022 at 11:54 PM.
 
Old 09-25-2022, 12:44 AM   #14
mrmazda
LQ Guru
 
Registered: Aug 2016
Location: SE USA
Distribution: openSUSE 24/7; Debian, Knoppix, Mageia, Fedora, others
Posts: 5,859
Blog Entries: 1

Rep: Reputation: 2074
Quote:
Originally Posted by kalaleq
I'm wondering if there's some way to run a kernel other than getting grub to start it.
This is what kexec is made for. Boot some kernel normally, then try loading the problem kernel via kexec.
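
A minimal sketch of that sequence, meant to be run as root from a kernel that does boot (the rescue stick, for example). The kernel path and command line are hypothetical placeholders, and the running kernel is assumed to have kexec support enabled. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: load the problem kernel with kexec from an already running system.
# Paths and command line are hypothetical placeholders; adjust before use.
KERNEL=/boot/vmlinuz-custom
INITRD=                                  # leave empty if no initramfs is used
CMDLINE='root=/dev/nvme0n1p2 ro console=tty0 ignore_loglevel'

# DRY_RUN=1 (the default) prints commands; DRY_RUN=0 runs them (needs root).
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

kexec_test_boot() {
    # Stage the kernel (and optional initramfs) in memory.
    if [ -n "$INITRD" ]; then
        run kexec -l "$KERNEL" --initrd="$INITRD" --command-line="$CMDLINE"
    else
        run kexec -l "$KERNEL" --command-line="$CMDLINE"
    fi
    # Jump into the staged kernel.
    run kexec -e
}

kexec_test_boot
```

Note that kexec -e jumps straight into the new kernel without a normal shutdown, so syncing and unmounting filesystems first is prudent.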
 
Old 09-25-2022, 01:59 AM   #15
kalaleq
LQ Newbie
 
Registered: Mar 2021
Posts: 27

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by mrmazda
This is what kexec is made for. Boot some kernel normally, then try loading the problem kernel via kexec.
Interesting! I'd vaguely heard of kexec but never really looked into it. Thanks, will do so tomorrow and see if it gives me any more clues! (or just another blank screen...)
 
  


All times are GMT -5. The time now is 12:40 PM.
