I have an ESXi host (Dell T620) that had been working for approximately 2 years without an issue.
Last week I ran into a problem:
loading /b.b00
loading /jumpstart.gz
loading /useropts.gz
loading /features.gz
loading /k.b00
loading /vc_intel.b00
loading /procfs.b00
loading /vmx.v00
CRC error during decompression. Received CRC (0x8ae2a503) != calculated CRC (0x886cc395)
gzip_extract failed for /vmx.v00 (size 0): CRC error
Error 20 (CRC error) while loading module: /vmx.v00
Compressed MD5: 0000000000000000000000000000
Fatal error: 28 (CRC Error)
and during a subsequent reboot I saw:
Multibit ECC errors were detected on the RAID controller. If you continue, data corruption can occur.
I downloaded a clean ISO of 7.0U2B and installed it on a clean USB drive in the internal port.
That resolved the issue, and there was no data corruption on the datastores.
This morning I arrived at the office to find that users had no access to the internet (pfSense). Some VMs were frozen and not accessible, even via the host web UI. I tried a soft reboot, and the host did not respond. I power cycled it with the power button, and it hung on reboot at:
nfsclient loaded successfully
I tried to replicate the fix from last week, but the installer froze with:
Loading /EFI/Boot/boot.cfg
UEFI Secure Boot is not enabled
Failed to load crypto64.efi : not found
Falling back to internal crypto suite
Loading /b.b00
Fatal error: 15 (Not Found)
- Swapped the USB drive in the internal port; the install onto a clean drive again failed.
- The Lifecycle Controller was in Maintenance Mode.
- Moved the USB drives to the rear ports of the server chassis, and the install progressed to the user options screen.
- Attempted to boot from the original USB drive (previously in the internal port) from a rear port. The boot stalled as before.
- Started an install of 7.0U3B, which was successful.
My guess is that these errors have something to do with the internal USB port, but that is purely a guess. Has anybody seen anything like this, or does anyone have an idea of how I can narrow down the root cause? It is working now, but I would like to understand the cause to prevent it from happening again.