Memtest failures
Background
One of the most common causes of "random crashes" is bad memory. Proper server hardware has ECC memory, which can detect and correct most memory errors. On cheap development hardware and most home PCs, ECC memory is sadly not usually an option, and it is very rarely used even when it is available.
In those cases you need to use a dedicated memory tester such as memtest86 or memtest86+. We've often found bad memory within the first run (which takes only a minute or two). Note that BIOS memory testers are next to useless in our experience; speaking personally, I always turn them off on home systems to reduce boot time.
The problem
Recently, however, memtest86+ (our usual choice) has stopped working on many systems, typically those with larger amounts of memory installed. The symptoms are errors such as "error: too small lower memory (0x99100 > 0x96000)", "Error 28: Selected item cannot fit into memory" or perhaps "Address -x1000 is out of range".
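To get a feel for the numbers in the first message, here is a small shell sketch. It assumes (this is our reading of the message, not something we have confirmed) that the first value is the amount of low memory the loader wanted and the second is the amount actually available. The function name shortfall_bytes is made up for illustration.

```shell
#!/bin/sh
# shortfall_bytes NEEDED AVAILABLE
# Print how many bytes NEEDED exceeds AVAILABLE by.
# Both arguments may be hexadecimal (0x...) or decimal.
# Assumption: in "too small lower memory (A > B)", A is what the
# loader needed and B is what was available.
shortfall_bytes() {
    echo $(( $1 - $2 ))
}

# Values copied from the error message quoted above.
shortfall_bytes 0x99100 0x96000
```

With the values from the message this prints 12544, so under that reading the image missed fitting into low memory by roughly 12 KB.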
The solution
Fortunately, help is at hand. Using the suggestions on this bug report, we recompiled memtest86+ with a different load address and used the ELF version. Note that the ELF version supplied in Debian does not have this alteration and did not work for us. Even the recent v4 memtest86+ does not work in either variant.
First, download our patched memtest and save it as /boot/memtest. Then alter your boot loader configuration accordingly. The exact details vary by distribution, but you want to end up with something like these examples:
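Since the fix depends on using the ELF build rather than the plain binary image, it can be worth checking which one you actually have. The sketch below looks for the standard ELF magic number (the bytes 0x7f 'E' 'L' 'F') at the start of a file. The function name is_elf is ours, purely for illustration.

```shell
#!/bin/sh
# is_elf FILE
# Succeed (exit status 0) if FILE begins with the ELF magic number
# 0x7f 'E' 'L' 'F', and fail otherwise.
is_elf() {
    # Dump the first four bytes as hex and strip spaces and newlines.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "7f454c46" ]
}
```

For example, `is_elf /boot/memtest && echo "ELF image"` should print the message once the patched file is in place. This only checks the file type; it says nothing about whether the load address has been changed.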
For GRUB:
title Memtest86
root (hd0,0) # Change this for your setup
kernel --type=netbsd /boot/memtest
For GRUB2:
menuentry "Memtest86" {
    knetbsd /boot/memtest
}
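On GRUB2 systems the main configuration file is normally generated, so the entry is usually added through a custom snippet rather than edited in directly. As a rough sketch, the helper below (memtest_entry is a name we made up) just prints the stanza shown above for a given image path; where you put it depends on your distribution.

```shell
#!/bin/sh
# memtest_entry PATH
# Print a GRUB2 menu entry that loads PATH with the knetbsd command,
# matching the example stanza above.
memtest_entry() {
    printf 'menuentry "Memtest86" {\n'
    printf '    knetbsd %s\n' "$1"
    printf '}\n'
}
```

On Debian-style systems, for instance, we would expect something like `memtest_entry /boot/memtest >> /etc/grub.d/40_custom` followed by `update-grub` (both as root) to do the job. That file name and command are assumptions about your setup, so check your distribution's documentation first.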
| Attachment | Size |
|---|---|
| memtest. | 122.88 KB |