Solved: Ansible does not allow handling of "host unreachable" errors
✔️Accepted Answer
Does anyone else agree we need to revisit how we are handling unreachable errors?
We have a use case where we use a play to secure our servers. As part of securing them, we disable SSH logins as root. If the setup is run again on a secured node, it will obviously fail because root is no longer permitted to log in over SSH.
Our inventory is dynamic, meaning we could have 1000 nodes all secured, or 1000 nodes of which 'n' are unsecured.
The problem we encounter is that the secure role initially runs as root, and by the end of the play root login is disabled. For unsecured nodes the play runs with no problem, but secured nodes get an unreachable error. We want to be able to handle that error and let the play pass. The method mentioned above does not seem to work when all nodes are secured: the play is stopped.
I agree with the idea that rescue should be able to catch unreachable errors and decide what to do next.
For now we have made sure that our infrastructure code runs the setup play when the node is initialized, but I think having the option to handle unreachable errors would be very helpful.
Other Answers:
Let's bring this issue back up. Please reopen it. It's really painful when you work with autoscaling groups in the cloud.
What's the issue exactly? Let's start with, for example, 20 nodes gathered from the EC2 dynamic inventory. A couple of them might be terminated by the EC2 API to scale down. If they are, Ansible marks these hosts as unreachable and finishes all tasks on the healthy nodes. But it returns exit code 4, which means an error.
The issue is that sometimes this is not really an error at all. But the CI system (Jenkins, Bamboo, whatever) marks the operation as failed and breaks the whole operational cycle. The meta task doesn't work on multiple nodes. Even if it did, it would try to connect to the hosts again, which seems unreasonable in this situation.
This can be worked around with a shell wrapper that handles exit code 4, but surely that is a nasty workaround. So please include a meta task or some other mechanism to signal to Ansible that unreachable is OK and that no error needs to be returned if some nodes (for example, up to a given percentage) are unreachable.
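For anyone who needs the wrapper workaround in the meantime, here is a rough sketch of what it could look like in Python rather than shell. It is only an illustration, not an Ansible feature: the function names, the 20% default threshold, and the assumption that the PLAY RECAP lines contain an unreachable=N field are all mine. It treats exit code 4 as success only when the share of unreachable hosts stays under the threshold, and passes every other exit code through unchanged.

```python
import re
import subprocess

# Exit code the post above reports for unreachable hosts.
UNREACHABLE_RC = 4

# Assumed PLAY RECAP line shape, for example:
#   web1 : ok=3 changed=1 unreachable=0 failed=0
RECAP_LINE = re.compile(r"^(\S+)[ \t]*:[ \t]*ok=\d+.*?unreachable=(\d+)", re.MULTILINE)


def count_unreachable(output):
    """Return (unreachable_hosts, total_hosts) parsed from the recap text."""
    rows = RECAP_LINE.findall(output)
    unreachable = sum(1 for _host, n in rows if int(n) > 0)
    return unreachable, len(rows)


def effective_exit_code(rc, unreachable, total, max_fraction):
    """Map exit code 4 to 0 when few enough hosts were unreachable."""
    if rc != UNREACHABLE_RC or total == 0:
        # Any other failure (or an unparsable recap) is passed through as-is.
        return rc
    return 0 if unreachable / total <= max_fraction else rc


def run_playbook(args, max_fraction=0.2):
    """Run ansible-playbook with the given arguments and return the adjusted exit code."""
    proc = subprocess.run(["ansible-playbook", *args], capture_output=True, text=True)
    print(proc.stdout, end="")
    unreachable, total = count_unreachable(proc.stdout)
    return effective_exit_code(proc.returncode, unreachable, total, max_fraction)
```

A CI job could then call something like sys.exit(run_playbook(["site.yml"])), where site.yml is a placeholder playbook name. This still only hides the exit code after the run; it does not let the playbook itself react to unreachable hosts, which is what this issue is asking for.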
I will not use Ansible because of the lack of error handling for unreachable hosts
After testing, it seems that - meta: clear_host_errors (and refresh_inventory) does not clear unreachable host errors.
ISSUE TYPE
COMPONENT NAME
Ansible core (?)
ANSIBLE VERSION
I updated to 2.2.0.0 and the behavior is identical.
CONFIGURATION
ssh.config contains:
OS / ENVIRONMENT
N/A
SUMMARY
Ansible does not allow handling of "host unreachable" errors. The following methods of handling errors do not catch "host unreachable" errors and do not allow playbook logic to detect and act upon such situations:
STEPS TO REPRODUCE
Attempts to catch the error:
Retries:
Try/catch block:
EXPECTED RESULTS
Host unreachable errors are handled by the error-handling logic in the playbook.
ACTUAL RESULTS
Ansible behaves as if the error-handling logic does not exist.