FreeNAS can't ping some hosts on storage VLAN

Hi all,
I have a setup with a couple of XCP-ng servers and a FreeNAS server.
All servers are connected to two Ubiquiti switches, a 1 Gbit switch and a 10 Gbit switch.
The 1 Gbit switch is used for the management interface, and the 10 Gbit for a SAN and eventually also for office connections.
I have a separate VLAN for SAN (VLAN80) and a VLAN for the office connection.

My problem is that I can’t ping every XCP-ng server, or some of the VMs running on those servers, over the SAN VLAN (VLAN 80). I can ping all servers over the management interfaces.
For instance I can ping the following:

  • 172.16.8.2 (FreeNAS) to 172.16.8.3 (XCP-ng)
  • 172.16.8.2 (FreeNAS) to 172.16.8.4 (XCP-ng)
  • 172.16.8.2 (FreeNAS) to 172.16.8.101 (Debian VM @ 172.16.8.5)
  • 172.16.8.5 (XCP-ng) to 172.16.8.3 (XCP-ng)
  • 172.16.8.5 (XCP-ng) to 172.16.8.4 (XCP-ng)
  • 172.16.8.5 (XCP-ng) to 172.16.8.33 (Debian VM @ 172.16.8.3)
  • 172.16.8.5 (XCP-ng) to 172.16.8.101 (Debian VM @ 172.16.8.5)

But I CAN’T ping the following:

  • 172.16.8.2 (FreeNAS) to 172.16.8.5 (XCP-ng)
  • 172.16.8.2 (FreeNAS) to 172.16.8.33 (Debian VM @ 172.16.8.3)

Yesterday I could ping from .8.2 to .8.33, but after some reboots I can’t anymore.

I’m running FreeNAS 11.3u1 and XCP-ng 8.1 RC1.
I chose the RC1 of 8.1 because we are building a new server stack and I did not want to have to upgrade to a newer XCP-ng right after we finished setting up. I anticipate that 8.1 will be final before we are completely done setting everything up.

The Ubiquiti switches are all configured with the correct VLANs.
I also set the MTU for the 10 Gbit interfaces to 9000, and enabled jumbo frames on the switches.
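
One way to confirm that jumbo frames really pass end to end is to ping with the Don't Fragment bit set and the largest payload that still fits in a 9000-byte MTU. The arithmetic is sketched below in Python; it assumes plain IPv4 without IP options, which the thread does not state explicitly.

```python
# Header sizes for an ICMP echo request over IPv4 (assumption: no IP options).
IPV4_HEADER = 20  # bytes
ICMP_HEADER = 8   # bytes

def max_ping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented packet of this MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER

print(max_ping_payload(9000))  # 8972
```

On FreeNAS (FreeBSD) that would correspond to something like `ping -D -s 8972 172.16.8.3`, and on the XCP-ng side (Linux) to `ping -M do -s 8972 172.16.8.2`. If small pings work but these fail, some hop is probably not passing jumbo frames.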

Has anyone else had the same problem, where some hosts on a VLAN can’t be contacted by FreeNAS?

Below is a simple network diagram:

Kind regards,
Rienk

Check the MAC address table on the switch to see if all the MAC addresses from your VMs and storage interfaces are listed. Also, make sure all your subnet masks are correct (/24 or 255.255.255.0) assuming that is what they should be.
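
A quick way to sanity-check the addressing is sketched below in Python. It assumes the SAN VLAN is meant to be 172.16.8.0/24; the thread never states the prefix length, so both the /24 and the prefixes in the sample dictionary are illustrative and should be replaced with what each host actually reports.

```python
import ipaddress

# Assumed SAN subnet; the /24 is a guess based on the addresses in the thread.
SAN_NETWORK = ipaddress.ip_network("172.16.8.0/24")

# "address/prefix" as read off each host. The prefixes are placeholders.
interfaces = {
    "FreeNAS": "172.16.8.2/24",
    "XCP-ng .3": "172.16.8.3/24",
    "XCP-ng .4": "172.16.8.4/24",
    "XCP-ng .5": "172.16.8.5/24",
    "Debian VM .33": "172.16.8.33/24",
    "Debian VM .101": "172.16.8.101/24",
}

def mismatched(interfaces, expected):
    """Return the names whose configured network differs from the expected one."""
    bad = []
    for name, cidr in interfaces.items():
        if ipaddress.ip_interface(cidr).network != expected:
            bad.append(name)
    return bad

print(mismatched(interfaces, SAN_NETWORK))
```

A host that was accidentally given, say, a /25 mask would show up in the returned list, because its computed network no longer equals the expected one.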

Hi Fred,

Thanks for the quick reply.
Do you know how to get the MAC table from a Ubiquiti switch?

Looks like if you SSH to the switch and run…
enable
show mac-addr-table

Also, take a look at Insights in the web GUI. Someone mentioned it might be there too.

Thanks,
I found the same thing just now via Google :flushed:

I just checked the MAC table in the switch, and I don’t see all the addresses that are on VLAN 80 in the list.
The strange thing is that one of the IPs I can’t ping from FreeNAS has its MAC address in the list. But when I try to ping it from FreeNAS it won’t work, and arp -a shows:
? (172.16.8.33) at (incomplete) on vlan80 expired [vlan]
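
An "(incomplete)" entry means FreeNAS sent an ARP request and never got an answer it accepted. To list every unresolved neighbour at once, the `arp -a` output can be filtered with a small Python sketch like the one below. The first sample line is the one quoted above; the second is an illustrative resolved entry in the usual FreeBSD format, not output from the real machine.

```python
import re

# Matches FreeBSD-style `arp -a` lines: "? (ip) at <mac> on <iface> ..."
ARP_LINE = re.compile(r"\((?P<ip>[\d.]+)\) at (?P<mac>\S+) on (?P<iface>\S+)")

def incomplete_entries(arp_output):
    """Return (ip, interface) pairs whose MAC address was never resolved."""
    result = []
    for line in arp_output.splitlines():
        match = ARP_LINE.search(line)
        if match and match.group("mac") == "(incomplete)":
            result.append((match.group("ip"), match.group("iface")))
    return result

sample = """\
? (172.16.8.33) at (incomplete) on vlan80 expired [vlan]
? (172.16.8.3) at ac:1f:6b:a5:93:04 on vlan80 expires in 1187 seconds [vlan]
"""
print(incomplete_entries(sample))
```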

Have you checked the subnet mask of the interfaces you can’t ping?

I just double-checked the IPs and netmasks, and all are in the same range.
I will dive into the mac lists a bit further today to see if I can find any clues.

I just tried arping instead of ping.
Currently I can’t ping from .8.5 to .8.2, but I can ping from .8.5 to .8.3.
When I use arping I get one response from .8.2, and multiple responses from .8.3.
I have a lot of experience with (embedded) Linux systems, but not so much with network debugging at this level. I hope someone can get some information from the output below:

[10:05 super-builder ~]# arping 172.16.8.3 -s 172.16.8.5 -I xapi1
ARPING 172.16.8.3 from 172.16.8.5 xapi1
Unicast reply from 172.16.8.3 [AC:1F:6B:A5:93:04] 1.531ms
Unicast reply from 172.16.8.3 [AC:1F:6B:A5:93:04] 1.162ms
Unicast reply from 172.16.8.3 [AC:1F:6B:A5:93:04] 0.742ms
Unicast reply from 172.16.8.3 [AC:1F:6B:A5:93:04] 0.659ms
Unicast reply from 172.16.8.3 [AC:1F:6B:A5:93:04] 0.709ms
Unicast reply from 172.16.8.3 [AC:1F:6B:A5:93:04] 0.717ms
Unicast reply from 172.16.8.3 [AC:1F:6B:A5:93:04] 0.712ms
Unicast reply from 172.16.8.3 [AC:1F:6B:A5:93:04] 0.720ms
Unicast reply from 172.16.8.3 [AC:1F:6B:A5:93:04] 0.715ms
^CSent 9 probes (1 broadcast(s))
Received 9 response(s)
[10:05 super-builder ~]# arping 172.16.8.2 -s 172.16.8.5 -I xapi1
ARPING 172.16.8.2 from 172.16.8.5 xapi1
Unicast reply from 172.16.8.2 [AC:1F:6B:F8:AF:8A] 0.998ms
^CSent 9 probes (1 broadcast(s))
Received 1 response(s)
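
To compare runs like these at a glance, the closing lines of the arping output can be reduced to a sent/received pair. This is a minimal Python sketch that assumes the summary format shown above; the sample text is copied from the second run.

```python
import re

def arping_summary(output):
    """Return (sent, received) parsed from arping's closing summary lines."""
    sent = re.search(r"Sent (\d+) probes", output)
    received = re.search(r"Received (\d+) response", output)
    if not sent or not received:
        raise ValueError("no arping summary found")
    return int(sent.group(1)), int(received.group(1))

run = """ARPING 172.16.8.2 from 172.16.8.5 xapi1
Unicast reply from 172.16.8.2 [AC:1F:6B:F8:AF:8A] 0.998ms
^CSent 9 probes (1 broadcast(s))
Received 1 response(s)
"""
sent, received = arping_summary(run)
print(f"{received}/{sent} probes answered")
```

One reply out of nine, with only one of the probes sent as a broadcast, would be consistent with the later unicast frames going missing somewhere, although the output alone does not prove which probe was answered.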

I would make sure you don’t have any duplicate MAC addresses in use. An easy way to figure this out is to determine your .2 MAC address and then unplug that device. Give it 5 minutes and then check the switch MAC address table to see if that address is still there, but tied to a different port.
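
If the MAC table is copied out of the switch, the "same address on a different port" check can be automated. The Python sketch below assumes the table has already been turned into (MAC, VLAN, port) tuples by hand; that tuple layout and the port names are hypothetical, and only the MAC addresses come from the arping output earlier in the thread.

```python
from collections import defaultdict

def macs_on_multiple_ports(entries):
    """Map (mac, vlan) to its ports, keeping only MACs seen on more than one port."""
    ports = defaultdict(set)
    for mac, vlan, port in entries:
        ports[(mac.lower(), vlan)].add(port)
    return {key: sorted(seen) for key, seen in ports.items() if len(seen) > 1}

# Hypothetical table rows; the port names are made up for illustration.
entries = [
    ("AC:1F:6B:F8:AF:8A", 80, "0/1"),
    ("ac:1f:6b:f8:af:8a", 80, "0/7"),
    ("AC:1F:6B:A5:93:04", 80, "0/3"),
]
print(macs_on_multiple_ports(entries))
```

Running it on snapshots taken a few minutes apart would also show a MAC that keeps moving between ports, which is worth looking at when link aggregation is involved.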

I think I found the problem.
I pulled out one of the 10Gbit ports, and it started working.
I have all servers (FreeNAS and XCP-ng) configured with link aggregation, and I also configured the switch ports for link aggregation. But somehow FreeNAS does not seem to like it.
As soon as I plug in both network cables I can’t ping some IPs from the FreeNAS box.

So are the hosts and storage configured for link aggr as well?

Yes, all hosts / storage and switches are configured as link aggr.

I would try to change out or trade a cable with a different system to see if that helps.

Changing the cable did not help.
What I did notice was that I was pulling out the cable of port 1 constantly to make it work.
When I pulled out the cable of port 2 while keeping port 1 plugged in I got a “Network is down” error from the ping command.
I will swap the card for another one to see if that fixes the problem.

I just changed the network card, but this did not change a thing.
I also tried a different set of switch ports but this also did not matter.
The last thing I tried was changing the LACP settings on the XCP-ng machine (not the FreeBSD server I keep unplugging) from MAC address load balancing to IP based. This did not help either. I didn’t think it would, because that is not the machine having the problems.

Maybe change the port that the cable is plugged into? For example if port 1 on the NIC is plugged into port 1 on the switch, move it to port 2 on the switch since you know that is working.

I also tried different switch ports yesterday.
I also found this bug in FreeBSD, but I’m not sure if it is the same problem I have:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221146

Are you running the same driver (3.2.12-k)? Here is something else to consider. Instead of running a single path for iSCSI (over a LAG), why not set up multi-path high availability (MPHA) for iSCSI and dedicate the two ports on your virtual hosts and storage to a single IP/network/VLAN each? This would actually give you better throughput than a LAG would provide.

Edit: I see for FreeNAS they call it MPIO, but here is a good tutorial: https://www.virtualizationhowto.com/2018/08/freenas-iscsi-configuration-for-mpio/.