The purpose of this document is to identify ways in which the vehicle interface component might fail, and to determine mitigation strategies for these failure modalities.
Application logic
The purpose of the vehicle interface is to perform input validation on control inputs, pass the result to the vehicle, and report information from the vehicle.
The general process for this application is then as follows:
- Receive data
- Timeout -> (optional) hazard lights and slow down
- (Optional) compute acceleration/steer commands via controller
- (Optional) low pass filter on controls
- Read state commands
- (Optional) pass commands through the state machine
- Send final commands to the vehicle
- Read information from vehicle
- Publish information that's been read from vehicle
- (Optional) update the state machine
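The ordering of the steps above can be sketched as a single processing cycle. This is a minimal illustration, not the interface's actual API: `run_cycle`, the dictionary-based command, its field names, and the fallback deceleration value are all assumptions. Reading state commands and updating the state machine are left out for brevity.

```python
from typing import Callable, Optional, Tuple

Stage = Callable[[dict], dict]


def run_cycle(
    command: Optional[dict],
    seconds_since_last_command: float,
    timeout_s: float,
    send_to_vehicle: Callable[[dict], None],
    read_from_vehicle: Callable[[], dict],
    publish: Callable[[dict], None],
    optional_stages: Tuple[Stage, ...] = (),
) -> None:
    """Run one pass of the process in the order listed above (hypothetical sketch)."""
    if command is None and seconds_since_last_command > timeout_s:
        # Timeout: fall back to slowing down with hazard lights on.
        # The values below are placeholders, not configured limits.
        command = {"accel_mps2": -1.0, "steer_rad": 0.0, "hazard_lights": True}
    if command is not None:
        # Optional stages: controller, low pass filter, state machine.
        for stage in optional_stages:
            command = stage(command)
        send_to_vehicle(command)
    # Reading and publishing vehicle information happens every cycle.
    publish(read_from_vehicle())
```

Passing the optional stages as a tuple of callables keeps the mandatory steps (send, read, publish) fixed while letting each optional step be switched on or off independently.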
Failure modalities
For each of the steps, the following failure modes have been identified:
- Receive data
    - Input data is wrong
    - Data from DDS/ADAS stack doesn't arrive in time
    - Data is high frequency
    - Data could be out of order
    - Data is out of range
- Timeout -> (optional) hazard lights and slow down
- (Optional) compute acceleration/steer commands via controller
- (Optional) low pass filter on controls
    - Implementation is wrong (of the state machine, filter, controller, etc.)
    - Filter might not be working correctly
- Read state commands
    - No state commands available (via read)
- (Optional) pass commands through the state machine
    - Conflicting commands are sent
- Send final commands to the vehicle
    - Manual control might happen during autonomous control
    - Commands might not be acknowledged by the vehicle
    - Command might not be executed by the vehicle
- Read information from vehicle
    - Data from the vehicle platform doesn't arrive in time
    - Sensor data from vehicle might be wrong or corrupted
- (Optional) update the state machine
    - State machine doesn't get updated, or misses an update (putting it in an inconsistent state)
Mitigations
For each failure modality, the following mitigations (or rationales on why mitigations are not required) have been identified:
- Input data is wrong
    - Validate inputs via the state machine, ensure system integrity via security features
- Data from DDS/ADAS stack doesn't arrive in time
    - Come to a safe stop on data timeout
- Data is high frequency
    - Low pass filter
- Data could be out of order
    - Keep track of the latest timestamp, ignore (and warn on) data that's older than the latest timestamp
- Data is out of range
    - Clamp values, using configured limits
    - If data is far out of range, issue a warning; could be user error or some other issue
- Implementation is wrong (of the state machine, filter, controller, etc.)
    - Fully test implementations
- Filter might not be working correctly
    - Can do extra validation on the output: FFT, tracking derivatives, or tracking variance
- No state commands available (via read)
    - This is not an error: there is nothing to do
- Conflicting commands are sent
    - State machine makes commands consistent
- Manual control might happen during autonomous control
    - If the vehicle is not in autonomous mode, don't even try to send commands
- Commands might not be acknowledged by the vehicle
    - If the communication mechanism supports acknowledgements, warn if commands have not been acknowledged within some timeout
- Command might not be executed by the vehicle
    - Can add extra state to SafetyStateMachine; might need to keep some history since updates might lag a little
- Data from the vehicle platform doesn't arrive in time
    - Depends on the platform
    - This could be a critical error (e.g. DBW is down)
    - The developer should definitely be notified, and the platform should perform timeout behavior (but might not fully mitigate the risk)
- Sensor data from vehicle might be wrong or corrupted
    - Ensure that all components elsewhere in the stack validate their inputs; the implementer of the interface should use domain-specific knowledge to validate
- State machine doesn't get updated, or misses an update (putting it in an inconsistent state)
    - Minimize path length in the state machine: ensure that the state machine relies on the last observation rather than a history of observations
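Two of the input mitigations above (ignoring out-of-order data and clamping to configured limits, with a warning when a value is far out of range) can be sketched as follows. The class name `InputValidator`, the `warn_margin` parameter, and the logger name are assumptions made for illustration; they do not come from the actual implementation.

```python
import logging
from typing import Optional

logger = logging.getLogger("vehicle_interface_sketch")  # hypothetical logger name


class InputValidator:
    """Drops stale samples and clamps values to configured limits (sketch)."""

    def __init__(self, low: float, high: float, warn_margin: float) -> None:
        self.low = low
        self.high = high
        # Assumed threshold: how far past a limit counts as "far out of range".
        self.warn_margin = warn_margin
        self.latest_stamp = float("-inf")

    def validate(self, stamp: float, value: float) -> Optional[float]:
        """Return the clamped value, or None if the sample is older than the latest seen."""
        if stamp < self.latest_stamp:
            logger.warning("ignoring out-of-order sample (%s < %s)", stamp, self.latest_stamp)
            return None
        self.latest_stamp = stamp
        if value < self.low - self.warn_margin or value > self.high + self.warn_margin:
            logger.warning("value %s is far outside [%s, %s]", value, self.low, self.high)
        return min(max(value, self.low), self.high)
```

A sample that is only slightly out of range is clamped silently; the warning is reserved for values that suggest user error or a deeper problem.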
Summary
The mitigations proposed as a result of the failure analysis can be encoded with the following architectural components:
- Low pass filter
- Safety state machine
    - Data clamping can occur here
    - Validating that data is low frequency can happen here
    - Clamping control data to a safe range can occur here
    - Warning when control data is wildly out of range can occur here
    - State machine should have a short path length
    - Warnings should be emitted if a state transition doesn't occur after being commanded (within some timeout)
    - Combination of control and state commands should be made consistent here
- Platform interface
    - Commands should not be sent if the vehicle is not in autonomous mode
    - If data does not arrive in time from the platform, a warning or error should be raised
    - If the vehicle communication mechanism supports acknowledgements, a warning should be raised if an acknowledgement was not received in time
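As an illustration of the low pass filter component, together with one way of checking that its output really is low frequency, the sketch below uses a first-order exponential filter and a simple rate-of-change (derivative) check. The filter order, the names `LowPassFilter` and `rate_within_limit`, and the parameters are assumptions; the actual component may use a different filter design.

```python
import math
from typing import Optional


class LowPassFilter:
    """First-order low pass filter: y += alpha * (x - y) (illustrative sketch)."""

    def __init__(self, cutoff_hz: float, dt: float) -> None:
        rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # time constant for the cutoff
        self.alpha = dt / (rc + dt)             # smoothing factor in (0, 1)
        self.y: Optional[float] = None

    def update(self, x: float) -> float:
        if self.y is None:
            self.y = x  # initialize on the first sample
        else:
            self.y += self.alpha * (x - self.y)
        return self.y


def rate_within_limit(previous: float, current: float, dt: float, max_rate: float) -> bool:
    """Derivative check: True if the output changed no faster than max_rate per second."""
    return abs(current - previous) / dt <= max_rate
```

Tracking the output derivative is the cheapest of the validation options; an FFT or a running variance would catch the same class of fault at a higher cost.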
And the following behaviors should be enforced by the overall implementation:
- Safe stop on timeout
- Ignore old data
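The safe-stop-on-timeout behavior can be sketched with a small watchdog. The names `CommandWatchdog` and `select_command` are hypothetical, and treating "no command received yet" as a timeout is an assumption of this sketch rather than a documented requirement.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


class CommandWatchdog:
    """Tracks when the last command arrived and reports timeouts (sketch)."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_rx: Optional[float] = None

    def on_command(self, now: float) -> None:
        self.last_rx = now

    def timed_out(self, now: float) -> bool:
        # Assumption: never having received a command counts as a timeout.
        return self.last_rx is None or (now - self.last_rx) > self.timeout_s


def select_command(watchdog: CommandWatchdog, now: float, latest: T, safe_stop: T) -> T:
    """Forward the latest command unless the watchdog has timed out."""
    return safe_stop if watchdog.timed_out(now) else latest
```

Passing `now` in explicitly, rather than reading a clock inside the class, keeps the timeout logic deterministic and easy to test.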
Related issues
- #4944: Add failure analysis document for vehicle interface
- #4770: Review new design articles for the 1.0.0 release