Proxmox Backup Server on TrueNAS

As I noted above, to fix the error: in the TrueNAS UI, go to the datasets screen, select the dataset you want to change, click the “Edit” button next to Details, open “Advanced Options”, then scroll down and look for “ACL Type”.

PBS chunks are 4MB. A 1MB record size allows ZFS to store each chunk across exactly 4 records. This is much more efficient than the default 128K, which would require 32 records per chunk, so it should offer some performance improvement for backups, maybe 5-15% on HDDs and probably less on SSDs. The garbage collection jobs will probably see the most improvement.
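The record arithmetic can be checked with a quick shell sketch. It assumes an uncompressed chunk of exactly 4 MiB; the helper name `records_per_chunk` is made up for illustration, and chunks that PBS has already compressed will be smaller and need fewer records.

```shell
#!/bin/sh
# Hypothetical helper: how many ZFS records are needed to hold one chunk,
# rounding up. Both arguments are sizes in bytes.
records_per_chunk() {
    chunk_bytes=$1
    record_bytes=$2
    echo $(( (chunk_bytes + record_bytes - 1) / record_bytes ))
}

chunk=$(( 4 * 1024 * 1024 ))   # assumed 4 MiB uncompressed chunk

echo "recordsize=128K: $(records_per_chunk "$chunk" $(( 128 * 1024 ))) records"
echo "recordsize=1M:   $(records_per_chunk "$chunk" $(( 1024 * 1024 ))) records"
```

This prints 32 records for 128K and 4 records for 1M, matching the numbers quoted above.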

If you really want the best speed and performance but at the cost of storage efficiency then mirrors are the way to go.
I have a video and forum post on the topic here:

Yes, but probably not linearly for this specific workload. In ZFS, IOPS scale with the number of VDEVs, while throughput scales with the total number of disks. So it will be faster overall with a 4th VDEV.

Changing the dataset record size to 1M would help as well. PBS chunks are 4MB. A 1MB record size allows ZFS to store each chunk across exactly 4 records. At 128K (default), your 3× RAIDZ1 VDEVs are doing a lot of small-block parity work. At 1MB, the writes become much more “sequential-like” and should boost performance a bit, especially when doing garbage collection.

Most of my demos are on freshly loaded or reloaded TrueNAS systems. It's odd that some people have different defaults, unless they are nesting these under a dataset set up for a share, which would have the other ACL type.

I set it to 1M while creating the dataset. I will try adding a 4th VDEV.

I was just curious what speeds other users get with different pool setups.

Good point Tom, I did indeed create pbs-storage under the proxmox dataset I already had for pve-backups (as SMB/CIFS) and ISOs.

Thank you so much for this Tom. I was running PBS on its own hardware and was using the PBS server as a replication target for my TrueNAS. Now I have it reconfigured with 2x TrueNAS boxes. Fun Sunday afternoon project!

Is there a reference for this 4MB chunk size? Poking around in the .chunks folder I see sizes all over the place. Some are a few hundred KB, others a few MB, and many sizes in between. It would appear these are the compressed sizes. A chunk can be decoded with a command line such as

``proxmox-backup-debug inspect chunk CHUNKNAME --use-filename-as-digest false --decode /tmp/decoded``

which writes the decoded chunk to /tmp/decoded. The decoded chunks are indeed 4MB in size.

For purposes of this discussion, I believe only the compressed size matters. Since the sizes appear to vary significantly depending on the content being backed up, does it still make sense to use a 1MB record size? I see many chunks that are under 1MB.

``
find . -type f -exec stat --format="%s" {} + | awk '{if($1<131072) c1++; else if($1<262144) c2++; else if($1<512000) c3++; else if($1<1048576) c4++; else if($1<2097152) c5++; else if($1<3145728) c6++; else if($1<4194304) c7++; else c8++} END {print "Files < 128KB:", c1, "\nFiles 128KB-256KB:", c2, "\nFiles 256KB-500KB:", c3, "\nFiles 500KB-1MB:", c4, "\nFiles 1MB-2MB:", c5, "\nFiles 2MB-3MB:", c6, "\nFiles 3MB-4MB:", c7, "\nFiles > 4MB:", c8}'
``
Files < 128KB: 27411
Files 128KB-256KB: 35527
Files 256KB-500KB: 18413
Files 500KB-1MB: 30200
Files 1MB-2MB: 28099
Files 2MB-3MB: 9039
Files 3MB-4MB: 13804
Files > 4MB: 6216

This shows the distribution of chunk counts by size range: < 128KB, 128KB-256KB, 256KB-500KB, 500KB-1MB, 1MB-2MB, 2MB-3MB, 3MB-4MB, and > 4MB. Command line courtesy of AI. As can be seen, the largest group (about 37%) is under 256KB, and roughly two thirds are under 1MB.
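To put numbers on the distribution, the cumulative share of files below each boundary can be computed from the counts above. A minimal sketch; the function name `cumulative_share` is made up for illustration, and the counts are simply copied from the output.

```shell
#!/bin/sh
# Bucket counts copied from the output above, smallest bucket first.
counts="27411 35527 18413 30200 28099 9039 13804 6216"

# Print the cumulative share of files below each bucket boundary.
cumulative_share() {
    echo "$1" | awk '{
        n = split("128KB 256KB 500KB 1MB 2MB 3MB 4MB", limit, " ")
        for (i = 1; i <= NF; i++) total += $i
        for (i = 1; i <= n; i++) {
            running += $i
            printf "under %-6s %5.1f%%\n", limit[i], 100 * running / total
        }
    }'
}

cumulative_share "$counts"
```

For these counts it reports 37.3% of files under 256KB and 66.1% under 1MB. Note this is by file count only and says nothing about how much space each bucket uses.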

It would seem a record size of 128KB, 256KB or 512KB might actually be a good balance.

Note, I do have compression disabled on this dataset as PBS does its own compression. It didn’t make much sense to add overhead trying to compress something already compressed.

Turns out in my setup that the PBS container did not inherit the time zone. I was wondering why my hard drives were so noisy when my scheduled tasks weren’t supposed to kick off for another few hours. Small, but I thought it might be worth posting here. I also noticed that PBS says the DNS server is at 127.0.0.3, of all places. It must be some TrueNAS container thing, but it still pings all the right IPs, internal and external.

Interesting about the compression and record size between PBS or TN, I’m interested to see where this discussion goes. Though I just set mine to 1MB for now.

Looking at the chunk documentation here: Technical Overview — Proxmox Backup 4.1.2-1 documentation. It looks like PBS definitely uses 4MiB chunks for block-based backups like VMs, and only mentions file-based backups as the alternative, so I’m assuming CTs are file-based and thus use the dynamically sized chunks? If so, I wonder about a combination: “datastore1” with a 4MiB record size for VMs only, and “datastore2” with different settings for CTs. Or is this really getting way more detailed than is useful? :sweat_smile: Let alone getting into compression.

The documentation Technical Overview — Proxmox Backup 2.4.2-1 documentation says for block based backups (like VMs), fixed-sized chunks are used. The content (disk image), is split into chunks of the same length (typically 4 MiB). But it does use Dynamic chunking for file backups.

A more accurate way to look at this is to count not the number of files per bucket, but the total volume per bucket. While you have many small files, they likely represent a tiny fraction of your actual disk usage.

Here is a script I used on mine:

find . -type f -exec stat --format="%s" {} + | awk '{
    if($1<131072) {c1++; s1+=$1} 
    else if($1<262144) {c2++; s2+=$1} 
    else if($1<512000) {c3++; s3+=$1} 
    else if($1<1048576) {c4++; s4+=$1} 
    else if($1<2097152) {c5++; s5+=$1} 
    else if($1<3145728) {c6++; s6+=$1} 
    else if($1<4194304) {c7++; s7+=$1} 
    else {c8++; s8+=$1}
} 
END {
    printf "%-20s | %-10s | %-10s\n", "Bucket Size", "Count", "Total GiB"
    printf "------------------------------------------------------\n"
    printf "%-20s | %-10d | %-10.2f\n", "< 128KB", c1, s1/1024^3
    printf "%-20s | %-10d | %-10.2f\n", "128KB-256KB", c2, s2/1024^3
    printf "%-20s | %-10d | %-10.2f\n", "256KB-500KB", c3, s3/1024^3
    printf "%-20s | %-10d | %-10.2f\n", "500KB-1MB", c4, s4/1024^3
    printf "%-20s | %-10d | %-10.2f\n", "1MB-2MB", c5, s5/1024^3
    printf "%-20s | %-10d | %-10.2f\n", "2MB-3MB", c6, s6/1024^3
    printf "%-20s | %-10d | %-10.2f\n", "3MB-4MB", c7, s7/1024^3
    printf "%-20s | %-10d | %-10.2f\n", "> 4MB", c8, s8/1024^3
}'

And this is how my PBS datastore data is distributed

Bucket Size          | Count      | Total GiB
------------------------------------------------------
< 128KB              | 1,692      | 0.12
128KB-256KB          | 1,742      | 0.28
256KB-500KB          | 2,236      | 0.74
500KB-1MB            | 3,566      | 2.68
1MB-2MB              | 7,382      | 10.59
2MB-3MB              | 3,686      | 8.83
3MB-4MB              | 8,832      | 31.49
> 4MB                | 18,944     | 74.14

For me the 1M dataset record size makes sense, but if you get a different result because you are doing more file backups, then a smaller record size would be better aligned.

That’s another perspective I hadn’t considered. My PBS is used strictly for VM backups.

My output looks like this:

Bucket Size          | Count      | Total GiB
------------------------------------------------------
< 128KB              | 27567      | 1.59
128KB-256KB          | 35844      | 6.15
256KB-500KB          | 18525      | 6.52
500KB-1MB            | 30342      | 21.14
1MB-2MB              | 28181      | 38.69
2MB-3MB              | 9060       | 21.78
3MB-4MB              | 13826      | 49.12
> 4MB                | 6216       | 25.79

From the numbers above, the bulk of the space is indeed used up by chunks larger than 1MB. In terms of performance, these jobs run during the night. Perhaps a restore would be quicker using the larger record size. Then again, restores are rarely performed; a typical use case might be restoring a VM after a failed upgrade.
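As a rough check on that reading, the share of stored data sitting in chunks of 1MB or larger can be computed from the "Total GiB" column. A minimal sketch; the function name `share_at_least_1mb` is made up for illustration, and the values are copied from the table above.

```shell
#!/bin/sh
# "Total GiB" column copied from the table above, smallest bucket first.
gib="1.59 6.15 6.52 21.14 38.69 21.78 49.12 25.79"

# Share of stored data held in chunks of 1MB or larger (buckets 5 to 8).
share_at_least_1mb() {
    echo "$1" | awk '{
        for (i = 1; i <= NF; i++) { total += $i; if (i >= 5) big += $i }
        printf "%.1f\n", 100 * big / total
    }'
}

echo "Data in chunks >= 1MB: $(share_at_least_1mb "$gib")%"
```

For this datastore it works out to 79.3% of the data by volume, even though those chunks are a minority by count.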

In the end, it would seem one needs to balance overhead against performance. I chose overhead as my priority since the dataset resides on a 1TB NVMe disk and the goal is to maximize disk capacity. It does get replicated to another pool on a spinner daily. PBS garbage collection only happens on the NVMe.


Here’s a thought for a future video (if you haven’t already addressed this): Proxmox host backup. How to back up the entire host, Proxmox settings, other settings, etc. That is, if the disk Proxmox is installed on fails, how to restore to the pre-failure condition in the shortest amount of time.

When using ext4, Proxmox defaults to LVM for itself and all VMs. Restoring backed-up VMs is not a big deal, but restoring the host seems to be overly complicated.

I think there are some scripts to do that, but there is no official Proxmox way to do it. The idea is the hosts are supposed to be disposable and they mostly are but there is some setup work that needs to be done. Not sure if I will do that as a video or not (more likely not) as it’s not that much work to rebuild a host if you documented how you set it up.

is it possible to have a second bridge added to the lxc in truenas?
it would be lovely to have it reachable from different networks (:

Yes, you can add more interfaces to the container as bridges.

nice (:
i even found it in the ui
sadly tho no matter what i tried inside the container the bridges have no network access :frowning:

# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host proto kernel_lo 
       valid_lft forever preferred_lft forever
11: eth0@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 10:66:6a:6d:3c:60 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.178.30/24 metric 1024 brd 192.168.178.255 scope global dynamic eth0
       valid_lft 7010sec preferred_lft 7010sec
    inet6 fe80::1266:6aff:fe6d:3c60/64 scope link proto kernel_ll 
       valid_lft forever preferred_lft forever
13: eth1@if14: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 10:66:6a:91:cf:42 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::1266:6aff:fe91:cf42/64 scope link proto kernel_ll 
       valid_lft forever preferred_lft forever

in proxmox i have no problems adding two bridges to each lxc
in truenas it seems like somehow not every bridge is getting its IPs and routing breaks as soon as a second one is added :confused:
that's really sad, hopefully it gets better with the 26 update

//EDIT1:

that all seems really inconsistent.
i stopped the container and removed the 2nd bridge
now even the first one is without any ip O.o

even if i remove all and only re add one

# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host proto kernel_lo
       valid_lft forever preferred_lft forever
21: eth0@if22: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 10:66:6a:6d:3c:60 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::1266:6aff:fe6d:3c60/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever

//EDIT2

i noticed no matter what i do the SDN network i add does not create a config in /etc/systemd/network
# ls /etc/systemd/network
eth0.network

i compared it to proxmox lxcs where for each nic a config is created

//EDIT3

creating this file manually (copy pasted the eth0 to eth1 file and adapted it a little) worked to fix it (:
i guess truenas will have to do a little more work there (:
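For anyone hitting the same thing, the manual workaround can be sketched roughly as below. This leans on assumptions: the contents of the original eth0.network were not posted, so the file here is just a generic systemd-networkd DHCP config, and `NETWORKD_DIR` is a made-up variable that defaults to a scratch folder so the script is safe to try outside the container.

```shell
#!/bin/sh
# Sketch only: writes a generic DHCP config for a second interface.
# NETWORKD_DIR is a made-up variable for this example; inside the container
# the real location would be /etc/systemd/network.
NETWORKD_DIR="${NETWORKD_DIR:-./networkd-example}"
mkdir -p "$NETWORKD_DIR"

cat > "$NETWORKD_DIR/eth1.network" <<'EOF'
[Match]
Name=eth1

[Network]
DHCP=yes
EOF

echo "Wrote $NETWORKD_DIR/eth1.network"
# After placing the file in /etc/systemd/network inside the container,
# restart networkd so it picks up the new config:
#   systemctl restart systemd-networkd
```

With two DHCP interfaces you may end up with two default routes, so the file will probably need adapting (static address, route metric) to match your networks.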

I followed this guide right when it came out, and everything seemed to work fine. Now I’m trying to move all of my VMs to a new server, and to do that I’m restoring my most recent backups. The restore process is moving at a glacial 1MB/s. I benchmarked the ZFS pool on the new server (3x mirrored pairs of Dell WD enterprise SSDs) and got about 1.5 GB/s write; an iperf3 test between the PVE server and PBS container returned 940 Mbps; and an fio test on the PBS container (of the same underlying storage that PBS is using) shows a read speed of 1009 MB/s. I’ve tried with sync writes disabled on the destination pool, but that changed nothing.

I feel like I must be missing something massive and obvious for it to be performing this poorly.

Any thoughts?

Edit:

I should also note that the CPU usage is nearly 0 on all three: the TrueNAS host, the PBS CT, and the PVE host.

I also tried with backup encryption disabled, but the result was the same.

Backups run at about 60 MB/s

Restores take a lot of CPU power and a lot of calculations. This is really the wrong tool for migrating VMs. If you haven’t decommissioned the old server yet, a ZFS send will move your VMs at the full speed of your network.

Something like:
zfs snapshot rpool/data/vm--disk-0@migrate
zfs send rpool/data/vm--disk-0@migrate | \
  ssh root@your_newServer zfs receive rpool/data/vm--disk-0

Unfortunately the old server isn’t ZFS. I also tried using Proxmox Datacenter Manager to do a migration, but got this error.

2026-03-15 09:32:20 ERROR: migration aborted (duration 00:00:01): error - tunnel command '{"with_snapshots":1,"volname":"vm-110-cloudinit.qcow2","cmd":"disk-import","format":"qcow2","migration_snapshot":0,"allow_rename":"1","export_formats":"qcow2+size","storage":"vm-pool-1"}' failed - failed to handle 'disk-import' command - unsupported format 'qcow2' for storage type zfspool

If the restore process was so CPU intensive, I’d expect at least one of the CPUs in the chain to be at least a little higher than idle - maybe one core slammed to 100%. But I’m not seeing any extra CPU utilization during a restore. Also, I just tried a restore on another PVE / PBS pair, and that restore ran at 305 MB/s. Which seems more reasonable to me. That backup is also encrypted. The only difference there is that PBS is a VM on a Proxmox server, and has a RAID Controller backed virtual disk passed through to it.

First of all, a big thank you to @LTS_Tom for the great video!

At first, I had trouble getting the container to start because the user had a home directory in “/var/empty.” But I managed to get the container running once I changed that.
Otherwise, everything worked fine with the setup, but I’m now having issues with backups. After a lot of testing, I think I’ve found the problem, and it seems to be the onboard network card—or its driver—that’s causing the trouble.
I hope you might have an idea on how I can fix this.

Here’s the problem I’m having, backups keep failing with these errors:

Logs

INFO: 93% (46.7 GiB of 50.0 GiB) in 17m 9s, read: 0 B/s, write: 0 B/s
ERROR: backup write data failed: command error: write_data upload error: pipelined request failed: timed out
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 103 failed - backup write data failed: command error: write_data upload error: pipelined request failed: timed out

INFO: 3% (3.0 GiB of 100.0 GiB) in 17s, read: 146.9 MiB/s, write: 145.1 MiB/s
ERROR: VM 1208 qmp command 'query-backup' failed - got timeout
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 1208 failed - VM 1208 qmp command 'query-backup' failed - got timeout

INFO: 1% (1.7 GiB of 100.0 GiB) in 6s, read: 269.3 MiB/s, write: 146.7 MiB/s
ERROR: VM 1205 qmp command 'query-backup' failed - got timeout
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 1205 failed - VM 1205 qmp command 'query-backup' failed - got timeout

INFO: 11% (11.1 GiB of 100.0 GiB) in 1m 15s, read: 155.4 MiB/s, write: 144.6 MiB/s
ERROR: VM 1208 qmp command 'query-backup' failed - got timeout
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 1208 failed - VM 1208 qmp command 'query-backup' failed - got timeout

After trying everything under the sun, someone suggested I try using a USB network adapter. Now that I’m running everything through it, I no longer have any issues, and all the backups have been completing successfully so far.
However, since I don’t want to use the USB network adapter permanently, I’m hoping one of you might know how I can fix the issue with the internal one.

The motherboard is a:
Base Board Information
Manufacturer: ASUSTeK COMPUTER INC.
Product Name: PRIME B450-PLUS

and the onboard network card is a Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)

I really hope one of you has an idea about this.

If you need any more information, just let me know—I’m afraid I’m at a loss myself

Please excuse my English, which may be a bit strange, but I’m using a translator to help me.

Thanks in advance!

Regards, Cubefan

P.S. I should also mention that the onboard network card works fine otherwise—all traffic flows smoothly—but these issues occur during Proxmox backups.

This is darn awesome! A huge thanks!

I think the dataset you create in TrueNAS might need a small adjustment:

Required:

  • atime: on (on my TrueNAS system this defaulted to off, though I’m not sure if that’s universal)

Optionally:

  • relatime: on (on my system this defaulted to on, but I needed to check using a bash shell. This is optional but with HDDs it’s likely helpful)

Why do we need atime?

Proxmox Backup Server runs a garbage collection process, and this process needs atime (or relatime) enabled. (I’m gonna use relatime on my system.) Here’s a link to the PBS documentation about Garbage Collection
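A minimal sketch of checking and setting these properties from a shell on the TrueNAS host. The dataset name `tank/pbs-storage` is a placeholder, not one from this thread, and the `run` wrapper and `APPLY` variable are made up so the script only prints the commands unless told otherwise. The same settings can usually be changed from the dataset options in the TrueNAS UI, which is probably the safer route.

```shell
#!/bin/sh
# Sketch: print (or apply) the ZFS property changes discussed above.
# "tank/pbs-storage" is a placeholder dataset name, not one from this thread.
DATASET="${DATASET:-tank/pbs-storage}"

# Dry run by default: commands are printed instead of executed.
# Set APPLY=1 on the TrueNAS host to actually run them.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

run zfs set atime=on "$DATASET"
run zfs set relatime=on "$DATASET"
run zfs get atime,relatime,recordsize "$DATASET"
```

In ZFS, relatime only has an effect while atime is on, so both properties are set here.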

What about record size?

I’m considering setting the record size to 512K but I’m not confident in this value. I found a forum post where someone dumped a histogram of their .chunks folder. Most files were at or below ~1–2 MiB. Here’s the post: How to get the exactly backup size in proxmox backup | Page 2 | Proxmox Support Forum (In ZFS the record size is a maximum, and whether 512K or 1M is appropriate depends on how the backup server is implemented).

However, the histogram results in that forum post strike me as unexpected. I would have imagined that most files would be exactly 4MiB. That said, if the PBS instance is mostly backing up file systems (not block devices) then maybe smaller files are more likely. Or maybe each 4MiB chunk gets compressed and that might explain things. In my case, I’m going to be backing up block devices, so I think recordsize=512K might make sense. I’d love to know what others think.

Final Note

I don’t have any experience actually deploying Proxmox backup server :sweat_smile: (yet…) so please be careful before accepting any of my “advice.” Also, if you see anything wrong, please help me learn! I’m eager to know more and I’d be very thankful.