XCP-ng corrupt VM issue

Magicker · June 15, 2023, 12:04pm

Hi all

1st post

In order to best utilise a server with 2x 4 TB drives, I have installed XCP-ng and created 2 VMs:

1st on the 1st drive / SR (2x 2 TB disks merged with LVM to create 4 TB-ish)
2nd on the 2nd drive / SR (2x 2 TB disks merged with LVM to create 4 TB-ish)

These both do nothing other than run rsnapshot (this method halved the time it takes 1 server to back up the same number of servers)

So far good. I am really happy with the performance and everything was working great… till the file system on the second vm went read only… no problem… quick reboot and fsck… nope… this disk is trashed! 4 times through and still showing errors… lots of errors.

So… there is a good chance this is just a broken hard drive…

But

smartctl --test=long /dev/sdb

then

smartctl -l selftest /dev/sdb
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.19.0+1] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
1    Extended offline    Completed without error  00%        2934             -
2    Short offline       Completed without error  00%        1904             -
3    Short offline       Completed without error  00%        1904             -
So if there is no problem on the disk… how on earth did the VM get so completely trashed?

Am I doing anything inherently stupid by setting it up like this? All the VMs do is pull a bunch of files over the network once a night, so not all that much work.

LTS_Tom · June 15, 2023, 12:15pm

Just because SMART does not report an error does not mean the drive does not have any issues. Go through the logs in /var/log/xensource.log and /var/log/xenstored-access.log and look for error messages.
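
A rough sketch of how that log check might be done from the XCP-ng host shell. The helper name scan_log and the search pattern are assumptions for illustration, not anything XCP-ng ships; adjust the pattern to whatever the logs actually contain.

```shell
# Hypothetical helper: print lines from a log file that look like errors.
# The pattern below is a guess at useful keywords, not an official list.
scan_log() {
    # -i: ignore case, -E: extended regular expression
    grep -iE 'error|fail|exception' "$1"
}

# The two log files named above; skip any that are missing or unreadable.
for f in /var/log/xensource.log /var/log/xenstored-access.log; do
    if [ -r "$f" ]; then
        echo "== $f =="
        # Show only the most recent 50 matching lines.
        scan_log "$f" | tail -n 50
    fi
done
```

Redirecting the output to a file makes it easy to paste into a reply as text.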

Magicker · June 15, 2023, 12:44pm

fab …
I am guessing none of this is good news for the drive…

LTS_Tom · June 15, 2023, 1:01pm

That’s hard to read. It is always better to copy and paste logs instead of posting screenshots, but yes, that looks bad.