Netbird crashes my network

Hello, I am experimenting with Netbird, trying to create a “site-to-site” VPN.

This “second site” is a machine that will be placed at my brother’s house, so that we both have access to shared services and I get an offsite backup.

As said, I am still testing it, but every time I enable Netbird on the secondary homelab, the network crashes.
For some unknown reason, the default gateway is still reachable (I can ping it), but the route to the internet is lost.
If I trace, for example, google.com, the first hop is not the gateway 192.168.178.1 (I have replicated its network) but TrueNAS at 192.168.178.22…

The only way to restore everything is to stop the container and reboot everything (VM and router).
On the primary, everything works as expected.

These are some commands I ran while Netbird was enabled:

root@pollon:/home/olimpo# ping 9.9.9.9
PING 9.9.9.9 (9.9.9.9) 56(84) bytes of data.
^C
--- 9.9.9.9 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2054ms

root@pollon:/home/olimpo# ip route show
default via 192.168.178.1 dev enp6s18 onlink 
10.0.20.0/24 dev enp6s19 proto kernel scope link src 10.0.20.23 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
172.18.0.0/16 dev br-143794984aa2 proto kernel scope link src 172.18.0.1 
172.20.0.0/16 dev br-76f07613276d proto kernel scope link src 172.20.0.1 linkdown 
172.21.0.0/16 dev br-d95b2ab635f9 proto kernel scope link src 172.21.0.1 
192.168.178.0/24 dev enp6s18 proto kernel scope link src 192.168.178.23 
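
One caveat about the output above: `ip route show` prints only the main routing table. A VPN client running in host network mode can add policy-routing rules and extra tables that never show up there; whether Netbird does so on this VM is an assumption to verify, not something the output confirms. A minimal sketch of how that could be checked (the helper name `next_hop` is made up for illustration, and the default target is simply the address pinged above):

```shell
#!/bin/sh
# Sketch: look beyond the main routing table.
# TARGET defaults to the address pinged in the post; pass another one as $1.
TARGET="${1:-9.9.9.9}"

# Print the next hop (the address after "via") from `ip route get` text on stdin.
# Prints nothing when the destination is directly connected (no "via").
next_hop() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "via") { print $(i + 1); exit } }'
}

if command -v ip >/dev/null 2>&1; then
    echo "== policy rules =="
    ip rule show 2>/dev/null || true
    echo "== routes in all tables (local/broadcast entries hidden) =="
    ip route show table all 2>/dev/null | grep -Ev '^(local|broadcast) ' || true
    echo "== next hop the kernel picks for $TARGET =="
    ip route get "$TARGET" 2>/dev/null | next_hop || true
fi
```

If the last line prints something other than 192.168.178.1, the decision is being made outside the main table (a policy rule or a cached route exception), which would match a traceroute that starts at the NAS.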


olimpo@pollon:~$ cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
allow-hotplug enp6s18
#iface enp6s18 inet dhcp
iface enp6s18 inet static
  address 192.168.178.23/24
  gateway 192.168.178.1

This is the compose file I used (the same file works perfectly on the primary VM).

---
# -------------------------------------------------------
# servizi
# -------------------------------------------------------

services:
  netbird:
    image: netbirdio/netbird:latest
    container_name: netbird
    restart: unless-stopped
    network_mode: "host"
    security_opt:
      - no-new-privileges:true
    env_file: .env
    volumes:
      - /home/olimpo/docker_data/netbird/etc:/etc/netbird
    environment:
      NB_SETUP_KEY: ${CHIAVE}
    cap_add:
      - NET_ADMIN

To run the tests, I created a virtual pfSense connected to the primary pfSense like so.
VLAN 240 on the primary has full access to the internet and is blocked from reaching the other VLANs.
The other VLANs cannot access VLAN 240.

I am not an expert, so maybe this is the cause, but I don’t know why it crashes the network.

EDIT:

I am an idiot, a tired idiot… it is time to take a break :slight_smile: and finish tomorrow.
Everything was right, but the port on the switch wasn’t assigned to VLAN 240 but to my management VLAN…

Sleep deprivation and IT don’t mix well. Idk how many times I’ve had to force myself to take a break after working on something for hours, only to come back and find out what was wrong in 5 minutes.

You are totally right!
…even if it’s often hard to stop and go to sleep without continuing to think about how to solve it :sweat_smile:

…Anyway 1
Last night, after changing the VLAN on the switch port, everything worked… but only partially. :innocent:
I think I have to fix the routes on both sides so that the traefik services on LAN A can be reached from LAN B and vice versa, but that is another story.

…Anyway 2
I have a little update, because it seems that the problem is still present.

What I have noticed is that if the VM is shut down while Netbird is running, then on the next boot, as soon as the container starts automatically, the network crashes.

If instead I stop the container before powering off the VM, there are no issues at boot, not even when I start Netbird manually.

So the only way to avoid the crash is this:

~/docker_compose/netbird$ docker compose down
~/docker_compose/netbird$ sudo reboot now
--- after the reboot & login ---
~/docker_compose/netbird$ docker compose up -d 

It seems related to the VM boot sequence and the “unless-stopped” restart option in the compose file (which is strange, because its “twin” on LAN A [different key, but same OS, same compose file and same versions] has zero problems).

What is strange is that when it happens, the VM behaves as if its gateway were the NAS IP (192.168.178.22), even though that is not specified anywhere…
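
One thing that might help narrow this down while the problem is active is to look at the neighbour (ARP) table: if the entry for the gateway IP carries the same MAC address as the NAS, packets addressed to 192.168.178.1 would physically land on the NAS even though the routing table is unchanged. This is only a hypothesis to rule in or out, not a confirmed cause. A rough sketch, where the helper name `neigh_mac` is made up and the interface and addresses are the ones from this post:

```shell
#!/bin/sh
# Sketch: compare the MAC the VM has cached for the gateway with the NAS's MAC.
# IFACE, GW and NAS default to the values shown in the post; override via env.
IFACE="${IFACE:-enp6s18}"
GW="${GW:-192.168.178.1}"
NAS="${NAS:-192.168.178.22}"

# Print the link-layer address recorded for IP $1 in `ip neigh` text on stdin.
neigh_mac() {
    awk -v ip="$1" '$1 == ip { for (i = 2; i < NF; i++) if ($i == "lladdr") { print $(i + 1); exit } }'
}

if command -v ip >/dev/null 2>&1; then
    table=$(ip neigh show dev "$IFACE" 2>/dev/null || true)
    gw_mac=$(printf '%s\n' "$table" | neigh_mac "$GW")
    nas_mac=$(printf '%s\n' "$table" | neigh_mac "$NAS")
    echo "gateway $GW -> ${gw_mac:-no entry}"
    echo "nas     $NAS -> ${nas_mac:-no entry}"
    if [ -n "$gw_mac" ] && [ "$gw_mac" = "$nas_mac" ]; then
        echo "WARNING: the gateway IP resolves to the same MAC as the NAS"
    fi
fi
```

Running the same comparison on another LAN host (for example the PC) while the issue is active would show whether the wrong mapping is local to this VM or shared by the whole segment.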

These are the outputs of some commands when the crash occurs:

traceroute IP

olimpo@pollon:~/docker_compose/netbird$ traceroute 9.9.9.9

traceroute to 9.9.9.9 (9.9.9.9), 30 hops max, 60 byte packets
1 truenas.miodominio.com (192.168.178.22) 0.381 ms 0.352 ms 0.337 ms^C

traceroute NAME

olimpo@pollon:~/docker_compose/netbird$ traceroute google.com

traceroute to google.com (142.250.180.174), 30 hops max, 60 byte packets
1 truenas.miodominio.com (192.168.178.22) 0.265 ms 0.235 ms *^C

ip route show


olimpo@pollon:~/docker_compose/netbird$ ip route show

default via 192.168.178.1 dev enp6s18 onlink
10.0.20.0/24 dev enp6s19 proto kernel scope link src 10.0.20.23
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
172.18.0.0/16 dev br-143794984aa2 proto kernel scope link src 172.18.0.1
172.20.0.0/16 dev br-76f07613276d proto kernel scope link src 172.20.0.1
172.21.0.0/16 dev br-d95b2ab635f9 proto kernel scope link src 172.21.0.1
192.168.178.0/24 dev enp6s18 proto kernel scope link src 192.168.178.23

HOSTS

olimpo@pollon:~/docker_compose/netbird$ cat /etc/hosts

127.0.0.1 localhost
127.0.1.1 pollon.miodominio.com pollon

# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

INTERFACES

olimpo@pollon:~/docker_compose/netbird$ cat /etc/network/interfaces

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*
# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
allow-hotplug enp6s18
#iface enp6s18 inet dhcp
iface enp6s18 inet static
address 192.168.178.23/24
gateway 192.168.178.1

allow-hotplug enp6s19
iface enp6s19 inet static
address 10.0.20.23/24

Commands like dhclient, restarting the networking service, or rebooting the VM have no effect.
The only way to restore the network on this VM is to stop Netbird and reboot pfSense.

After the pfSense reboot:

olimpo@pollon:~$ traceroute google.com
traceroute to google.com (142.251.209.14), 30 hops max, 60 byte packets
 1  192.168.178.1 (192.168.178.1)  0.528 ms  0.510 ms  0.500 ms

Here I have no special rules, just the basic ones and no floating rules.

On TrueNAS (192.168.178.22) I set a static route that sends traffic for 192.168.203.0/24 to the Netbird VM (192.168.178.23).
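
For that static route to be useful, the Netbird VM has to forward IPv4 traffic. Since a second router now sits on the same subnet, it may also be worth looking at the ICMP-redirect settings: an accepted redirect changes a host's effective next hop without touching what `ip route show` prints. Whether redirects are involved here is purely an assumption. A small sketch for reading the relevant kernel settings on the VM (the helper name `sysctl_path` is made up for illustration):

```shell
#!/bin/sh
# Sketch: read forwarding and ICMP-redirect settings on the Netbird VM.

# Map a sysctl key such as net.ipv4.ip_forward to its /proc/sys path.
# (Plain dot-to-slash mapping; not valid for interface names containing dots.)
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

for key in net.ipv4.ip_forward \
           net.ipv4.conf.all.accept_redirects \
           net.ipv4.conf.all.send_redirects; do
    path=$(sysctl_path "$key")
    if [ -r "$path" ]; then
        printf '%s = %s\n' "$key" "$(cat "$path")"
    else
        printf '%s = (not readable)\n' "$key"
    fi
done
```

A value of 1 for `net.ipv4.ip_forward` means the VM routes packets between its interfaces; the two redirect values only say whether redirects can be accepted or sent, not that any were.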

What could be the cause?

EDIT.

This issue slowly propagates to the entire network: after a while, even my PC gets 192.168.178.22 as its gateway.