Can't SSH into RancherOS installed via iohyve in FreeNAS, running within a virtual machine
I'm preparing for a server upgrade, but before doing so I want to do a dry run within a VM first.
I'm running Linux Mint on a laptop. Currently I have FreeNAS v9.10.2-U6 installed within QEMU and RancherOS v1.5.6 installed into a VM via iohyve.
[laptop]
 |_ [QEMU]
     |_ [FreeNAS]
         |_ [iohyve]
             |_ [RancherOS]
I'm able to SSH into FreeNAS with no problem, but I can't SSH into Rancher: the connection attempt eventually times out. When I run the ssh command with -vvv, it seems to hang on "debug1: Connecting to <RANCHER_IP> [<RANCHER_IP>] port 22." before eventually timing out.
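The ssh -vvv output stops at the TCP connect step, so it can help to tell apart a port that actively refuses connections from one where the handshake never completes. The probe below is a sketch, not part of the original post: it assumes bash (for its /dev/tcp redirection) and a timeout command are available on the client, and probe_tcp is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch: classify a TCP connection attempt as "open", "timeout" or "failed".
# Assumptions: bash is installed (its /dev/tcp redirection performs the connect)
# and a `timeout` command exists; probe_tcp is a hypothetical helper name.
probe_tcp() {
  host=$1
  port=$2
  secs=${3:-5}
  # The inner bash opens a TCP connection to host:port and closes it immediately.
  timeout "$secs" bash -c ': > "/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
  case $? in
    0)   echo "open" ;;     # the three-way handshake completed
    124) echo "timeout" ;;  # connect() did not complete within $secs seconds
    *)   echo "failed" ;;   # e.g. connection refused or no route to host
  esac
}
```

For example, probe_tcp <RANCHER_IP> 22 would be expected to print timeout for the hang described here, whereas a stopped sshd would typically print failed (connection refused) instead.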
This is what I've tried so far:
- Verified the Rancher VM is reachable from the Host:
  ping <RANCHER_IP>
- Verified sshd is running in the Rancher VM:
  ps -ef | grep sshd
- Verified the SSH port is being listened on in the Rancher VM:
  netstat -nl | grep :22
- Checked the iptables rules on the Host and Guest; there doesn't appear to be any rule that would block the communication.
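The grep :22 check in the list above also matches ports such as 2222 or 220. A slightly stricter filter is sketched below; it assumes the net-tools/BusyBox netstat -nl column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State), and listening_on_port is a hypothetical helper name rather than a standard command.

```sh
#!/bin/sh
# Sketch: print the local addresses that are LISTENing on exactly one TCP port.
# Assumes `netstat -nl` style input on stdin with the columns
#   Proto Recv-Q Send-Q Local-Address Foreign-Address State
# listening_on_port is a hypothetical helper, not a standard command.
listening_on_port() {
  awk -v port="$1" '
    $1 ~ /^tcp/ && $6 == "LISTEN" {
      # The port is whatever follows the last ":" (works for IPv4 and IPv6).
      n = split($4, parts, ":")
      if (parts[n] == port) print $4
    }'
}
```

Usage inside the guest would be netstat -nl | listening_on_port 22, which prints one line per matching listener (for example 0.0.0.0:22 and :::22) and nothing if sshd is not listening on port 22.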
This is my first time dealing with networking within nested VMs, so I'm not certain whether there's something simple I'm missing. I look forward to any insight the community may have.
Solution 1:[1]
TL;DR: I had to disable hardware offloading within the FreeNAS VM. For a persistent fix, in the FreeNAS GUI I went to Init/Shutdown Scripts and created a Post-Init script of type Command that runs:
ifconfig vtnet0 -rxcsum -txcsum -rxcsum6 -txcsum6 -vlanmtu -vlanhwtag -vlanhwfilter -vlanhwtso -tso -tso4 -tso6 -lro -vlanhwtso -vlanhwcsum
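If the same fix ever needs to be applied to a different interface, the one-liner can be wrapped in a small function. The sketch below is not from the original answer: disable_offload and the DRY_RUN variable are hypothetical names, vtnet0 is the interface used in this setup, and the flags are the ones from the Post-Init command above, each listed once.

```sh
#!/bin/sh
# Sketch: disable hardware offloading on a FreeBSD/FreeNAS interface.
# disable_offload and DRY_RUN are hypothetical names; the flags are the ones
# from the Post-Init command above (each listed once).
OFFLOAD_FLAGS="-rxcsum -txcsum -rxcsum6 -txcsum6 -vlanmtu -vlanhwtag -vlanhwfilter -vlanhwtso -tso -tso4 -tso6 -lro -vlanhwcsum"

disable_offload() {
  iface=${1:-vtnet0}
  # $OFFLOAD_FLAGS is left unquoted on purpose so each flag is a separate argument.
  if [ -n "${DRY_RUN:-}" ]; then
    # Only print the command that would run.
    echo ifconfig "$iface" $OFFLOAD_FLAGS
  else
    ifconfig "$iface" $OFFLOAD_FLAGS &&
      ifconfig "$iface" | grep options   # show the resulting options= line
  fi
}
```

Calling disable_offload vtnet0 at the end of the script applies the change, while DRY_RUN=1 disable_offload vtnet0 only prints the ifconfig command. Checking the options= line afterwards (or ifconfig -v, as suggested in the quoted post below) shows whether the flags took effect.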
Full Troubleshooting Steps:
- Verified that the MTU was the same (1500) on the Host, FreeNAS, and Rancher:
  - Host: ifconfig | grep mtu
  - FreeNAS: ifconfig | grep mtu
  - Rancher: ifconfig | grep MTU
- Verified Rancher has outside access:
  ping google.com
- Verified the Host, FreeNAS, and Rancher could all communicate with each other:
  - Host to FreeNAS: ping <FREENAS_IP>
  - Host to Rancher: ping <RANCHER_IP>
  - FreeNAS to Host: ping <HOST_IP>
  - FreeNAS to Rancher: ping <RANCHER_IP>
  - Rancher to Host: ping <HOST_IP>
  - Rancher to FreeNAS: ping <FREENAS_IP>
- Verified sshd is running in the Rancher VM:
  ps -ef | grep sshd
  - Also tried restarting sshd, in case there was some sort of race condition:
    sudo system-docker restart console
- Verified the SSH port is being listened on in the Rancher VM:
  netstat -nl | grep :22
- Verified the routing tables, and that each machine has a default gateway:
  - Host: route
  - FreeNAS: netstat -r
  - Rancher: route
- Tried configuring a dedicated SSH port and listening IP for Rancher, and verified via netstat that only that IP and port were being listened on. This was to rule out any possible port conflicts.
- Checked the iptables rules on the Host and Rancher (FreeNAS doesn't use iptables); there weren't any rules blocking communication.
  - Turned the firewall rules off, then restarted Rancher's sshd (nada), then rebooted the FreeNAS VM (nada).
- There is a firewall tool in FreeNAS, but I verified that nothing was set up with:
  ipfw table all list
- While in FreeNAS, I checked the network traffic to see whether my SSH request was even getting there. For each case I had two terminals open: one connected to FreeNAS, the other used to connect to Rancher. Since the output is very long in the Live environment (because the SSH connection did complete there), I'm only including the first logged packet for each case, since that is where the pertinent info is.
  - On Live:
    sudo tcpdump -nnvvS '(src <HOST_IP> and dst <RANCHER_IP>) or (src <RANCHER_IP> and dst <HOST_IP>)'
    tcpdump: listening on ix0, link-type EN10MB (Ethernet), capture size 65535 bytes
    15:01:53.957264 IP (tos 0x0, ttl 64, id 56881, offset 0, flags [DF], proto TCP (6), length 60) <HOST_IP>.60648 > <RANCHER_IP>.22: Flags [S], cksum 0xfae8 (correct), seq 468317589, win 64240, options [mss 1460,sackOK,TS val 2321761697 ecr 0,nop,wscale 7], length 0
  - On VM:
    sudo tcpdump -nnvvS '(src <HOST_IP> and dst <RANCHER_IP>) or (src <RANCHER_IP> and dst <HOST_IP>)'
    tcpdump: listening on vtnet0, link-type EN10MB (Ethernet), capture size 65535 bytes
    14:59:03.029922 IP (tos 0x0, ttl 64, id 25421, offset 0, flags [DF], proto TCP (6), length 60) <HOST_IP>.45688 > <RANCHER_IP>.22: Flags [S], cksum 0x8403 (incorrect -> 0x69a6), seq 3645881181, win 64240, options [mss 1460,sackOK,TS val 1007017042 ecr 0,nop,wscale 7], length 0
  - Noticed that cksum was "incorrect" a lot, so I ran this on the Host:
    ethtool --show-offload <ETHERNET_INTERFACE_NAME> | grep tx-checksumming
    It reported that TX checksumming was on, so I disabled it:
    sudo ethtool -K <ETHERNET_INTERFACE_NAME> tx off
    I re-ran tcpdump and the ssh command, but still got "incorrect" for cksum, so I re-enabled checksumming:
    sudo ethtool -K <ETHERNET_INTERFACE_NAME> tx on
    At least I thought that last command reset things, but after a reboot of FreeNAS the network was no longer reachable. I ended up running
    sudo ethtool --reset <ETHERNET_INTERFACE_NAME> all
    and eventually recreating the VM from scratch and rebooting my system to get things reset.
- Finally came across the solution in this post after a Google search for "iohyve tap0 or epair", of all things. Quoting the relevant info in case the post disappears at some point:
  "I ran into a very similar situation recently. I could ping the jails to & from bhyve guests but I could not pass any actual traffic. From other physical devices I had no issue passing traffic. The problem ended up being the hardware offloaders (TSO, HWSUM, etc) were causing the issue, which I found kind of ironic considering the traffic was not making it to the hardware in my case. I used tcpdump and could see the traffic had checksum errors. I turn off the hardware offloaders and everything started working, took me two weeks to figure this out. In hindsight I should of ran tcpdump on the first day.
  Try turning off the hardware offloading, then rerun ifconfig -v if it took effect, then test to see if you can pass actual traffic.
  Disable hardware offloading:
  ifconfig igb0 -rxcsum -txcsum -rxcsum6 -txcsum6 -vlanmtu -vlanhwtag -vlanhwfilter -vlanhwtso -tso -tso4 -tso6 -lro -vlanhwtso -vlanhwcsum"
- So for my use case I SSH'd into FreeNAS, made sure the Rancher VM was stopped, disabled the offloading (replacing igb0 with vtnet0), started the Rancher VM back up, and finally tried to SSH into Rancher... and succeeded. Basically, my previous attempt to disable offloading was correct, but I needed to do it within FreeNAS, not on the Host. That's a bit counterintuitive to me, considering it's a bridged network and I'm passing my exact hardware resources through to the VMs.
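The tcpdump comparison between Live and VM is what exposed the problem. To compare captures quickly, the checksum verdicts in tcpdump -vv output can be tallied with awk. This is a sketch rather than part of the original answer: cksum_summary is a hypothetical helper, and it only looks at the "cksum 0x... (correct)" and "(incorrect -> ...)" annotations that tcpdump prints in verbose mode. One caveat: with transmit checksum offload enabled, packets captured on the machine that sends them can show as incorrect because the checksum is filled in after capture, so the incorrect count is most meaningful on the receiving side.

```sh
#!/bin/sh
# Sketch: tally tcpdump -vv checksum verdicts read from stdin.
# cksum_summary is a hypothetical helper; it matches annotations such as
#   "cksum 0xfae8 (correct)" and "cksum 0x8403 (incorrect -> 0x69a6)".
cksum_summary() {
  awk '
    /cksum 0x[0-9a-f]+ \(correct\)/ { ok++ }
    /cksum 0x[0-9a-f]+ \(incorrect/ { bad++ }
    END { printf "correct=%d incorrect=%d\n", ok, bad }'
}
```

For example, piping the FreeNAS capture into it (sudo tcpdump -nnvvS -c 50 'port 22' | cksum_summary) would summarise 50 packets; on the VM capture shown above, the SYN would be counted under incorrect.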
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | theOneWhoKnocks |