Hi all
I have a bunch of 5900 servers with 2x 2TB NVMe drives which run “production” VMs. This is bombproof and I love it.
However, in each server I also have 4x 2TB SATA drives to act as general storage, backup targets, etc. These are configured as md0 and attached through XO as usual as EXT-based storage.
However, these are horribly unstable. Out of 6 servers, only 1 still seems to be working as expected. Some show endless coalesce chains that won't rebuild. Some will run VMs, others won't. For a few I have simply had to destroy md0, create md1, and rename the SR to “dead do not use”, as I can't even unplug them. (So far, reattaching as LVM seems to be working.)
The crazy incident that caused me to reach out is that I have a test Windows VM set up on one of the arrays (no other disks on it). This VM can be stopped, started, etc. and works perfectly well (for a Windows VM running on RAID 10 SATA).
However, I still can't scan the SR.
SMlog shows:
May 13 10:56:13 48000 SM: [1886528] ***** Local EXT3 VHD: EXCEPTION <class 'xs_errors.SROSError'>, The SR scan failed [opterr=uuid=9847acc7-8268-4b01-856d-670d6256fff5]
May 13 10:56:13 48000 SM: [1886528] File "/opt/xensource/sm/SRCommand.py", line 385, in run
May 13 10:56:13 48000 SM: [1886528] ret = cmd.run(sr)
May 13 10:56:13 48000 SM: [1886528] File "/opt/xensource/sm/SRCommand.py", line 111, in run
May 13 10:56:13 48000 SM: [1886528] return self._run_locked(sr)
May 13 10:56:13 48000 SM: [1886528] File "/opt/xensource/sm/SRCommand.py", line 161, in _run_locked
May 13 10:56:13 48000 SM: [1886528] rv = self._run(sr, target)
May 13 10:56:13 48000 SM: [1886528] File "/opt/xensource/sm/SRCommand.py", line 370, in _run
May 13 10:56:13 48000 SM: [1886528] return sr.scan(self.params['sr_uuid'])
May 13 10:56:13 48000 SM: [1886528] File "/opt/xensource/sm/FileSR.py", line 208, in scan
May 13 10:56:13 48000 SM: [1886528] self._loadvdis()
May 13 10:56:13 48000 SM: [1886528] File "/opt/xensource/sm/FileSR.py", line 294, in _loadvdis
May 13 10:56:13 48000 SM: [1886528] raise xs_errors.XenError('SRScan', opterr='uuid=%s' % uuid)
May 13 10:56:13 48000 SM: [1886528]
May 13 10:56:13 48000 SM: [1886528] lock: closed /var/lock/sm/760e624c-4383-6326-9138-958c14d59030/sr
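In case it helps anyone comparing notes across hosts, here is a minimal Python sketch for pulling the exception summary lines and the UUIDs out of SMlog-style text like the excerpt above. The function name `summarize_smlog`, the regexes, and the `SAMPLE` string are my own assumptions for illustration and not part of the SM tooling; the line format is inferred only from the lines pasted here. One thing it makes visible is that the UUID in `opterr` is not the same as the UUID in the lock path, so the scan error may be pointing at something other than the SR itself (possibly a single VHD, but that is a guess).

```python
import re

# Assumed line shape, inferred from the excerpt: "<date> <host> SM: [<pid>] <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:]+) (?P<host>\S+) SM: \[(?P<pid>\d+)\] ?(?P<msg>.*)$"
)
UUID_RE = re.compile(r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}")


def summarize_smlog(text):
    """Return (exceptions, lock_uuids) found in SMlog-style text.

    exceptions: list of dicts for lines whose message contains "EXCEPTION"
    lock_uuids: set of UUIDs seen in /var/lock/sm/<uuid>/... lock lines
    """
    exceptions = []
    lock_uuids = set()
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not an SM line in the assumed format
        msg = match.group("msg")
        if "EXCEPTION" in msg:
            exceptions.append(
                {
                    "time": match.group("ts"),
                    "pid": match.group("pid"),
                    "uuids": UUID_RE.findall(msg),
                    "message": msg,
                }
            )
        elif msg.startswith("lock:") and "/var/lock/sm/" in msg:
            lock_uuids.update(UUID_RE.findall(msg))
    return exceptions, lock_uuids


# Two lines copied from the log above, used as sample input.
SAMPLE = """\
May 13 10:56:13 48000 SM: [1886528] ***** Local EXT3 VHD: EXCEPTION <class 'xs_errors.SROSError'>, The SR scan failed [opterr=uuid=9847acc7-8268-4b01-856d-670d6256fff5]
May 13 10:56:13 48000 SM: [1886528] lock: closed /var/lock/sm/760e624c-4383-6326-9138-958c14d59030/sr
"""

if __name__ == "__main__":
    found, locks = summarize_smlog(SAMPLE)
    for exc in found:
        # UUIDs named in the error that do not match any lock-path UUID
        other = [u for u in exc["uuids"] if u not in locks]
        print(exc["time"], "pid", exc["pid"], "error UUIDs not in lock path:", other)
    print("lock path UUIDs:", sorted(locks))
```

To run it against a real log, the contents of the SMlog file can be read into a string and passed to `summarize_smlog` in place of `SAMPLE`.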
Any ideas?