TrueNAS Replication Task Newbie Questions

Dear Community, I recently successfully replicated some data from one TrueNAS system to another. While the task was successful, I am a beginner and have some questions about TrueNAS’ replication. I hope you can help me answer them. Thanks in advance!

Performance

My setup: I was replicating data from machine A (located in North America) to machine B (located in Europe). In other words, I was moving data across the Atlantic Ocean. Speed isn’t that important here, as this is my worst-case offsite backup with data I don’t need to access quickly in an emergency. Still, I wondered why I was only getting about 50Mbps consistently over several days (I was moving several terabytes of data).

The upload capacity of machine A is 250Mbps, while the download speed of machine B is 100-150Mbps. Both machines are quite powerful (modern AMD CPUs, fast drives, 1Gbps network, etc.). My pfSense routers at both sites were also not heavily loaded in terms of CPU, and no other traffic was present during the replication time. The site-to-site connection is through Tailscale, which has handled much higher throughput in other transfers.

  1. Is 50Mbps realistic, given that the transfer latency eats up the rest of the bandwidth?
  2. I assume my bottleneck should be site B's 100-150Mbps download speed, yet I am only getting a third to a half of that. Is this true?
  3. How can I increase my transfer speed?
  4. Are there any notable settings I should change in the replication task settings in terms of performance? What about transport (currently I have it set to SSH)?
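
In case it helps, this is roughly how I would try to isolate the bottleneck myself. It is only a sketch: hostnames, pool and dataset names are placeholders, and it assumes iperf3 is available on both machines.

    # 1. Raw TCP throughput through the Tailscale tunnel, without ZFS or SSH.
    #    Run "iperf3 -s" on machine B first, then from machine A:
    iperf3 -c machine-b -t 30

    # 2. How fast the source pool can actually read the dataset, by sending
    #    it to /dev/null locally (dd reports the rate; interrupt it once
    #    the rate stabilizes):
    zfs send pool/dataset@snapshot | dd of=/dev/null bs=1M status=progress

If the iperf3 number is close to 100Mbps and the local send is much faster than that, the remaining gap would point at the SSH transport rather than the network or the disks.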

Security

I first tried to connect my machines using the Admin Password method, which failed. Then I manually configured an SSH connection in the Backup Credentials section.

  1. Is it possible to change the admin password after a successful connection?
  2. What security steps should I take to minimize the impact of a machine compromise on one of the sites (I am concerned that malware will be able to delete replicated data on the other site)?
  3. I have enabled (or it was the default) “Use sudo for ZFS commands”. I don’t really understand what this means; can you please explain?
  4. I want to replicate my dataset so that it cannot be read on machine B for privacy reasons. To do this, I set up encryption on my dataset and did not manually share the passphrase with site B. Is this the right way to do it, or are keys shared automatically? (See also the sketch after this list.)
  5. What is the additional “Encryption” setting in the replication task settings under Destination?
  6. Which user should be set in the SSH connection settings? I currently use admin instead of root. Any tips here?
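
Regarding question 4, my understanding of ZFS native encryption is that what I want corresponds to a raw (encrypted) send: the blocks travel and are stored still encrypted, and machine B never gets the key. At the ZFS level that would look roughly like this; pool and host names are placeholders, and this is only a sketch of the mechanism, not the exact commands TrueNAS runs.

    # Confirm the source dataset is encrypted and uses its own key
    zfs get encryption,keyformat,keylocation pool/private

    # A raw send (-w) transmits the blocks still encrypted; the receiver
    # stores them but cannot mount or read them without the key
    zfs send -w pool/private@snap | ssh admin@machine-b zfs receive -u backup/private

    # On machine B the received dataset should show keystatus "unavailable"
    # as long as nobody loads the key there
    zfs get keystatus backup/private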

Stability

My replication task failed in the middle of the sync process.

  1. Does TrueNAS replicate all data in a fail-safe manner, even if the process fails or one of the machines crashes/reboots mid-transfer?
  2. Are there any settings for additional replication stability?
  3. Is it recommended to use the “Replicate from scratch” setting?

Thank you for your help, even if I have to write down a lot of questions! If you don’t understand my questions or need additional information to answer them correctly, please ask again!

I have a video that answers all of those questions except the hardening one, which gets complicated; there is a write-up that dives into that: Improving Replication Security With OpenZFS Delegation | Klara Inc

Thanks for your answer. I missed that video. It already answers a lot of my questions, thanks. I still have some open questions I would like to discuss here.

  1. What about my performance issues? Am I correct that I should be getting 100Mbps, as this is the download speed of site B and the slowest point in the connection? I am currently only getting 60Mbps. Why is this happening?

  2. When I switch to SSH+NETCAT, I get an error saying that no TCP ports can be opened. Do I need to change my Tailscale ACLs for this? Still, I think my system can handle more than 60Mbps of SSH crypto, so where could the bottleneck be?

  3. I read the ZFS delegation article you linked. If I understand correctly, I could set up a ransomware-resistant system with TrueNAS, but only through terminal configuration, not via the web interface. Can you confirm this? What other ransomware precautions can I take, or is TrueNAS SCALE not really ready for ransomware-resistant replication?
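
If I understand the article correctly, the core of the setup is done in a shell on the receiving machine and boils down to delegating only the permissions needed to receive, and deliberately not delegating destroy or rollback. A rough sketch, with placeholder user and dataset names (the exact permission list depends on which properties the send stream carries):

    # Delegate only receive-related permissions to an unprivileged user
    zfs allow -u replication receive,create,mount backup/fromA

    # Crucially, destroy and rollback are NOT delegated, so a compromised
    # sender can push new snapshots but cannot delete what already exists
    zfs allow backup/fromA   # prints the current delegation set for review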

I am still not sure about this setting. Can you please help me out?

This basically ensures that if my replicated data is accidentally modified (partially deleted, etc.), TrueNAS will simply re-sync? And what about performance? Would a small change trigger a full sync, or only the blocks that have changed? (I ask because re-syncing all blocks would not be much in the spirit of ZFS, although the name of the setting suggests it.)

  1. Using SSH can be a big performance issue because you are slowing things down to the speed at which SSH can handle the tunnel.
  2. Not completely sure why it does not work.
  3. Yup, lots of command line
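
For point 1, a quick way to see how much the SSH tunnel itself costs is to push a dummy stream through it and compare the rate dd reports, once with the default cipher and once with a cheaper one. Hostnames are placeholders and this is only a rough measurement, not a tuning recommendation:

    # Default cipher
    dd if=/dev/zero bs=1M count=500 | ssh admin@machine-b 'cat > /dev/null'

    # A lighter AEAD cipher for comparison
    dd if=/dev/zero bs=1M count=500 | ssh -c aes128-gcm@openssh.com admin@machine-b 'cat > /dev/null'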

You would use sudo if you want a user other than root to be used, but that user has to be able to use sudo. I suggest reading up on sudo.
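
To make that concrete: as I understand it, enabling “Use sudo for ZFS commands” makes the replication prefix its zfs calls with sudo on the remote side, so the non-root user there needs a passwordless sudo rule for zfs. A way to check, plus an illustrative example of the kind of rule involved (the path and exact rule are assumptions, not what TrueNAS literally writes):

    # List what the admin user may run via sudo on the remote machine
    sudo -l -U admin

    # A rule along these lines has to exist for passwordless zfs via sudo:
    #   admin ALL=(root) NOPASSWD: /usr/sbin/zfs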

Replicate from scratch only occurs when the system cannot sync because something on the target has become corrupted, and it then has to send ALL the data again.
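
In other words, normal scheduled replication is incremental: only the blocks that differ between the last common snapshot and the new one go over the wire. The full re-send is the fallback for when that common base is gone. At the ZFS level the difference looks roughly like this (placeholder names, sketch only):

    # Incremental: only the blocks changed between @monday and @tuesday are sent
    zfs send -i pool/data@monday pool/data@tuesday | ssh admin@machine-b zfs receive backup/data

    # From scratch: the whole dataset up to @tuesday is sent again
    zfs send pool/data@tuesday | ssh admin@machine-b zfs receive -F backup/data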

Thanks for your help! For now, I will wait until my replication is complete and then continue to play around and test. I just want to make sure my data is replicated first.

I will come back to the question of setting up SSH+NETCAT. I am assuming some pfSense or Tailscale ACL misconfiguration.

I will also say that I know what sudo is; it is just that the wording and descriptions in TrueNAS sometimes only seem to make sense if you already know what they mean. It is not as beginner-friendly for the generally tech-savvy person as other projects, in my opinion.

Still, I cannot get over the fact that there is no ransomware protection built into TrueNAS. Shouldn’t this be a first-class feature? Like some kind of data immutability feature? What alternative strategies would you recommend for ransomware data protection? My naive idea would be to have a second independent machine that I replicate data to, but keep offline the rest of the time.
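
Another idea I came across, which could add a layer without a second offline machine, is ZFS holds. If I understand them correctly, a hold blocks zfs destroy on a snapshot until the hold is released, even for a privileged user. A sketch with placeholder names (note that retention tasks would then also be unable to prune held snapshots, so this needs manual housekeeping):

    # Place a named hold on a replicated snapshot on the backup box
    zfs hold keepme backup/data@auto-2024-01-01_00-00

    # Attempts to destroy it now fail while the hold exists
    zfs destroy backup/data@auto-2024-01-01_00-00

    # Release the hold only when you deliberately want it to be prunable
    zfs release keepme backup/data@auto-2024-01-01_00-00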

Also, I’d like to know if latency in general has a significant impact on throughput, or if TrueNAS/ZFS replication isn’t much affected by it. In other words, will the Atlantic eat up my bandwidth?
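
My back-of-the-envelope reasoning, assuming roughly 100 ms of transatlantic round-trip time and a 2 MiB effective window for a single TCP stream (both numbers are assumptions, not measurements): throughput is capped at about the window divided by the RTT.

    # throughput ceiling ~= window / RTT
    awk 'BEGIN { window_bits = 2*1024*1024*8; rtt = 0.100;
                 printf "ceiling ~ %.0f Mbit/s\n", window_bits / rtt / 1e6 }'
    # prints: ceiling ~ 168 Mbit/s

If the effective window is smaller, for example because SSH's own channel windowing limits it, the ceiling drops proportionally, which would be in the ballpark of the 50-60Mbps I am seeing.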

Finally, I would like to know if there are any recommendations regarding full-server replication/backup. Is it advisable to run one replication task for all datasets, or is it better to run one replication task for each dataset? Maybe there is no benefit at all. I would like to find out.

One assumption is that it might be impossible, or at least harder, to recover individual datasets if they were all handled by a single replication task. Is this true, or can I run a full-server (multiple datasets) replication task and still partially restore individual datasets in an emergency?
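
My current understanding, which I would like confirmed, is that a recursive send keeps each child dataset as its own dataset on the target, so a partial restore of just one of them should still be possible. Roughly, with placeholder names and assuming snapshots exist recursively:

    # One task replicating the whole tree recursively to machine B
    zfs send -R pool@auto-2024-01-01 | ssh admin@machine-b zfs receive -u backup/poolA

    # Restoring a single dataset later, run from machine A (pull):
    ssh admin@machine-b zfs send backup/poolA/photos@auto-2024-01-01 | zfs receive -u pool/photos-restored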

Immutability in this context means that data cannot be deleted through the control plane at which it was created, such as an NFS or SMB share. In that sense, TrueNAS snapshots are immutable from the client’s side and are good ransomware protection.

Latency is a factor, I am just not sure how much. And I prefer the granular per-dataset snapshots and replication.

Why do you prefer per-dataset replication? Does it make recovery easier? Can you do partial restores?

Considering TrueNAS immutability from the client side (SMB/NFS connection device) is a good point. Still, I would like to be able to compartmentalize between the storage servers themselves. I think security should always be a practice where multiple layers of defense are applied, and these layers are a bit lacking in TrueNAS. It would be great if iXsystems could bring an easy-to-manage ZFS delegation setup to TrueNAS.

I have multiple snapshot and replication tasks so I can be granular about frequency, retention, and restores.

If there were a simple way to do it, they would implement it.