Created: 2022-1-4 21:11:13
Modified: 2022-1-4 21:11:13
AV fuzzing: focus on diversity, find bugs (testing), evaluation
safety(collision testing), functionality “correctness”(planning&tracking) (traffic rules), mobility, and rider’s comfort
traffic violations
traffic offenses can be vehicle-related
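navigation (route information) can resolve whether a maneuver is “changing lanes across the solid line of a guide lane” or “a lane change that should be made”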
semantic action
motivation e.g. “follow the car in front of you” or “quickly overtake that car on your left”
property: a semantic action can have a very long time horizon; small number; semantic action space -(kinematic calculations)> action space
formalize: goals & aggressiveness level
evaluation: quality function based on a machine learning approach trained on a large variety of road types, instead of learning a Q function over immediate geometric action
Responsibility-Sensitive Safety for planning in critical situations
A Lane-Based Coordinate System
requirement without usefulness -> keep Safe Distance
proper response -> best effort (Proper Response with Extra Evasive Effort) -> excessive response
assume: under reasonable Situation
a single lane road: cars cannot perform lateral maneuvers, cars never drive backward
common sense rule: if someone hits you from behind it is not your fault
metrics: Safe longitudinal distance — same direction
proper response: rear car’s acceleration, brake according to safe distance
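sketch of the same-direction safe distance, assuming the formula from the RSS paper as I recall it (rear car accelerates at most a_max_accel during response time rho, then brakes at least a_min_brake; front car brakes at most a_max_brake). Function and parameter names are my own; check against the paper before relying on it.

```python
def safe_longitudinal_distance_same_direction(
    v_rear, v_front, rho, a_max_accel, a_min_brake, a_max_brake
):
    """Minimum safe gap (m) between a rear and a front car driving the same way."""
    # Worst case for the rear car: accelerate during the response time rho,
    # then brake at only a_min_brake until it stops.
    v_rear_after_response = v_rear + rho * a_max_accel
    rear_travel = (
        v_rear * rho
        + 0.5 * a_max_accel * rho ** 2
        + v_rear_after_response ** 2 / (2.0 * a_min_brake)
    )
    # Worst case for the front car: it brakes as hard as possible.
    front_travel = v_front ** 2 / (2.0 * a_max_brake)
    # The distance is clipped at zero (the [x]_+ operator).
    return max(rear_travel - front_travel, 0.0)
```

usage: the gap is safe when the actual distance is greater than the returned value; otherwise the rear car owes the proper response (brake at least a_min_brake after rho).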
a straight road (a general lane geometry): cars cannot perform lateral maneuvers, cars can drive backward
common sense: if someone hits you from behind while you are reversing it is your fault
metrics: Safe longitudinal distance — opposite directions
proper response: both cars’ acceleration, brake according to safe distance
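sketch of the opposite-direction safe distance, again assuming the RSS paper's definition as I recall it: both cars may accelerate during rho, then the car driving in its correct direction (v1 >= 0) brakes at a_min_brake_correct and the other car (v2 < 0) at a_min_brake. Names are my own.

```python
def safe_longitudinal_distance_opposite(
    v1, v2, rho, a_max_accel, a_min_brake_correct, a_min_brake
):
    """Minimum safe gap (m) between two cars driving toward each other."""
    # Speeds after both cars accelerate toward each other for the response time.
    v1_rho = v1 + rho * a_max_accel
    v2_rho = abs(v2) + rho * a_max_accel
    # Distance covered during the response time plus the braking distance.
    travel_1 = (v1 + v1_rho) / 2.0 * rho + v1_rho ** 2 / (2.0 * a_min_brake_correct)
    travel_2 = (abs(v2) + v2_rho) / 2.0 * rho + v2_rho ** 2 / (2.0 * a_min_brake)
    return travel_1 + travel_2
```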
a straight road (two routes of the same geometry)
metrics: Safe Lateral Distance
proper response: both cars’ acceleration, brake according to safe distance
common sense: if you can avoid an accident without causing another accident, you must do it
multiple different road geometries: roundabouts, junctions, and merge into highways; one route has priority over others, and vehicles riding on it have the right of way; two cars on different routes
metrics: Safe Lateral/Longitudinal Distance for Two Routes of Different Geometry (abstract? position relationship?)
special scenario at an intersection with traffic lights, like right of way is given, not taken
no route geometry, e.g. the parking lot
(not considered)
Pedestrians: zebra crossing or residential street / residential road (unstructured roads?)
Occlusion
common sense: When a human driver claims “but I couldn’t see him”, a counter argument is often “well, you should’ve been more careful”, like limit v or proper Behaviour(always adhering to proper response)
stress-testing techniques
fitness functions in the area of testing autonomous systems [25]
AST
Adaptive Stress Testing: Finding Likely Failure Events with Reinforcement Learning
Airborne Collision Avoidance System (ACAS X, TCAS) scenarios of near mid-air collisions (NMACs)
(2 distinct types in 200 hours)
formulations: simulator state (stacked system and environment states) is fully observable or partially observable, Markov process with discrete time and continuous state, sequential decision process -> MDP or MCTS-SA
partial observability: a modified Monte Carlo tree search (MCTS) algorithm using only pseudorandom number generator of the simulator
differential adaptive stress testing (DAST): failure in targeted system but not in baseline
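toy sketch of the AST idea: the simulator is driven only by a pseudorandom seed, and the return is the path log-likelihood with a penalty when no failure is reached. Assumptions: the 1-D gap dynamics, the penalty constant and all names are hypothetical, and plain random search over seeds stands in for the MCTS/RL search used in the papers.

```python
import math
import random


def rollout(seed, steps=20, sigma=0.5, start_gap=5.0, fail_gap=0.5):
    """Toy simulator controlled only through its pseudorandom seed."""
    rng = random.Random(seed)
    gap = start_gap   # distance between the AV and an obstacle
    log_prob = 0.0    # log-likelihood of the sampled disturbances
    for _ in range(steps):
        noise = rng.gauss(0.0, sigma)
        # Log-density of a zero-mean Gaussian disturbance.
        log_prob += -0.5 * (noise / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
        gap += noise
        if gap <= fail_gap:
            return True, log_prob, gap   # failure event reached
    return False, log_prob, gap


def ast_reward(failed, log_prob, final_gap, miss_penalty=1000.0):
    """Likelihood of the path, minus a penalty if it did not end in failure."""
    if failed:
        return log_prob
    return log_prob - miss_penalty - final_gap


def random_search(num_seeds=2000):
    """Stand-in for MCTS/RL: try seeds and keep the best-scoring rollout."""
    best_seed, best_reward = None, -math.inf
    for seed in range(num_seeds):
        reward = ast_reward(*rollout(seed))
        if reward > best_reward:
            best_seed, best_reward = seed, reward
    return best_seed, best_reward
```

replaying rollout(best_seed) reproduces the found path, which is why access to the seed alone is enough under partial observability.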
AV-FUZZER
AV-FUZZER: Finding Safety Violations in Autonomous Driving Systems
adv: edge cases and diversity, all 5 types within 20 hours of search on Apollo
GA + local fuzzer + repeat differently
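toy sketch of the "GA + local fuzzer + repeat differently" loop. Assumptions: the genome is a list of scenario parameters in [0, 1], toy_fitness is a hypothetical stand-in for a simulator score, and a restart here just draws a fresh random population; none of this is AV-FUZZER's actual code.

```python
import random


def toy_fitness(genome):
    """Hypothetical simulator score (higher = closer to a safety violation)."""
    return -sum((g - 0.7) ** 2 for g in genome)


def mutate(genome, rng, scale):
    # Gaussian perturbation, clamped to the valid parameter range.
    return [min(1.0, max(0.0, g + rng.gauss(0.0, scale))) for g in genome]


def crossover(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def genetic_search(fitness, rng, genome_len=4, pop_size=10, generations=20):
    pop = [[rng.random() for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]   # keep the fitter half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng, 0.1)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    return max(pop, key=fitness)


def local_fuzz(best, fitness, rng, tries=50):
    """Small mutations around the GA's best scenario; keep only improvements."""
    for _ in range(tries):
        candidate = mutate(best, rng, 0.02)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best


def fuzz(fitness=toy_fitness, restarts=3, seed=0):
    rng = random.Random(seed)
    found = []
    for _ in range(restarts):            # each restart begins from a new population
        best = genetic_search(fitness, rng)
        found.append(local_fuzz(best, fitness, rng))
    return found
```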
CROUTE
labeled Petri net: map
place: a road
transition: a set of junction lanes connecting the same roads
labels assigned to transitions: traffic signs at junctions
route type at a junction: junction topology features, route features at a junction
junction topology feature: describe how a road connects with other roads at a junction
(a junction contains multiple topology features, junctions maybe topology equivalent)
route: consist of lanes
route feature: describe lane changing and motions for traffic signs (is the feature kept when only the number of lane changes differs? no.)
param format: (initial lanes, target lanes, obstacles positions)
generation method: base + search obstacles (causing lane changing)
mutate method: an adding operator
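sketch of the labeled Petri net map described under CROUTE above: places are roads, a transition groups the junction lanes connecting the same pair of roads, and its label carries the traffic sign. Class and field names are hypothetical, not CROUTE's own.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transition:
    name: str
    inputs: frozenset    # roads entering through these junction lanes
    outputs: frozenset   # roads reached through these junction lanes
    label: str = ""      # traffic sign at the junction, e.g. "stop"


@dataclass
class MapNet:
    places: set = field(default_factory=set)          # roads
    transitions: list = field(default_factory=list)   # junction-lane sets

    def add_transition(self, name, src, dst, label=""):
        self.places.update([src, dst])
        self.transitions.append(
            Transition(name, frozenset([src]), frozenset([dst]), label)
        )

    def successors(self, road):
        """Roads reachable from `road` through one junction, with the sign label."""
        return [
            (out, t.label)
            for t in self.transitions
            if road in t.inputs
            for out in sorted(t.outputs)
        ]
```

usage: net = MapNet(); net.add_transition("j1_left", "r1", "r2", "stop"); net.successors("r1") lists the roads a route can continue onto from r1.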
ATLAS
classification:
road-topology characteristic of a junction lane: describe intersecting lanes of junction lane using incoming and outgoing roads
generate different abstract scenarios describing different interaction patterns
generate in each abstract scenario:
AutoFuzz
Paracosm
programmatically describe complex driving situations: a synchronous reactive programming model (reactive: refer to vehicles or persons)
Components: It -> Ot, graphical assets, physical properties
A Paracosm configuration: components
test parameter coverage: k-wise combinatorial coverage & low dispersion
various test input generation strategies: random sampling over discrete parameters & deterministic quasi-Monte Carlo methods for continuous parameters
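sketch of the two ideas above: a deterministic quasi-Monte Carlo sampler for continuous parameters (a Halton sequence is used here as one common low-dispersion choice; whether Paracosm uses exactly this sequence is an assumption) and a k = 2 combinatorial coverage measure for discrete parameters. Function names are my own.

```python
from itertools import combinations, product


def halton(index, base):
    """index-th element (1-based) of the van der Corput sequence in `base`."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def halton_points(n, bases=(2, 3)):
    """First n points of a Halton sequence in the unit cube, one base per axis."""
    return [tuple(halton(i, b) for b in bases) for i in range(1, n + 1)]


def pairwise_coverage(tests, domains):
    """Fraction of all value pairs (k = 2) that appear in at least one test."""
    covered, total = 0, 0
    for i, j in combinations(range(len(domains)), 2):
        wanted = set(product(domains[i], domains[j]))
        seen = {(t[i], t[j]) for t in tests}
        covered += len(wanted & seen)
        total += len(wanted)
    return covered / total
```

usage: scale halton_points(n) to the continuous parameter ranges (e.g. fog density, car speed) and pair each point with a discrete combination chosen to raise pairwise_coverage.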
Generating and Characterizing Scenarios for Safety Testing of Autonomous Vehicles
interesting scenario: where a collision or near collision happens, and the AV can avoid it in at least one way (including critical scenarios, can be ranked by complexity; scenario having less CriticalTime before collision is unavoidable)
Characterizing: how many safe paths exist and how hard is it to follow them (complexity of the scenario w.r.t. possibility of avoiding accidents)
metrics: SafePathInv UnsafePercent AvgEffort MinEffort NarrowInv CriticalTime
using ‘the number of computed safe driving paths, total paths in the scenario (all driving trajectories), narrowness of safe paths, and the effort required to follow each safe path’.
computed safe driving paths: calculating the annulus sector in the next time-step, discretize the 2D sector with a grid, quantizing the AV valid state
tensor representation for AV only (fix the trajectories of all other actors): enumerate all states of AV at time t
Generating method:
generate an unsafe sequence: initial executed driving scenario as input + perturbations; change the policy of closest actors (three different attacking modes) <-> distance as cost function; diversity?
create a critical scenario: sample interesting scenarios by starting from collision time tc and moving back in time to find t0
Sequences: states of all the vehicles extracted from real-world data or from driving simulations
generate approximately 240 scenarios (80 accidents, more than 90% avoidable) per hour on a single system
Suraksha
analyze the safety effects/sensitivities of using various perception (degraded perception due to HW/SW capabilities or inaccurate perception) parameters or level.
Generating:
goal: efficient scenarios, difficulty levels(depend on AV velocity when the target is revealed)
results: 12 driving scenarios for the four categories with difficulty levels, select four scenarios for analysis.
Metrics:
scenario-independent metrics: Braking L1, Throttle L1, AV coordinate L1, Minimum distance (AV info: Actuators, Location, Velocity, Acceleration; Target obstacle info: Location)
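sketch of the metrics above. Assumption (not confirmed by these notes): each L1 metric is the sum of absolute per-step differences between a baseline run and a degraded-perception run of the same scenario, and minimum distance is the smallest AV-to-target distance over the run. Names are hypothetical.

```python
import math


def l1_difference(baseline, perturbed):
    """Sum of absolute per-step differences between two equally long traces."""
    if len(baseline) != len(perturbed):
        raise ValueError("traces must have the same length")
    return sum(abs(b - p) for b, p in zip(baseline, perturbed))


def coordinate_l1(baseline_xy, perturbed_xy):
    """L1 difference between two (x, y) position traces."""
    if len(baseline_xy) != len(perturbed_xy):
        raise ValueError("traces must have the same length")
    return sum(
        abs(bx - px) + abs(by - py)
        for (bx, by), (px, py) in zip(baseline_xy, perturbed_xy)
    )


def minimum_distance(av_xy, target_xy):
    """Smallest Euclidean distance between the AV and the target over the run."""
    return min(
        math.hypot(ax - tx, ay - ty)
        for (ax, ay), (tx, ty) in zip(av_xy, target_xy)
    )
```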
Perception Quality Requirements Analysis (error allowed analysis):
component-level design choices depend on which metrics
AV implementations improvement & component-level requirements (trade-off principle: system has upper limit with poor components under requirements)
Adversarial evaluation of autonomous vehicles in lane-change scenarios
reward: add penalty for violation of traffic rules
driving performance: lane-change success, collision, velocity
traffic rules: rules break(?)
diversity: N agents (actor and critic in DDPG) with random initializations
Clustering: DP-Means by state Sequences generated
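sketch of DP-Means (k-means-like, but a new cluster is opened whenever a point's squared distance to its nearest centre exceeds a penalty lam). Assumption: each generated state sequence has already been flattened into a fixed-length numeric vector; how the paper featurizes sequences is not recorded in these notes.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def dp_means(points, lam, max_iters=100):
    """Return (centres, labels); the number of clusters is decided by lam."""
    centres = [mean(points)]          # start with one global cluster
    labels = [0] * len(points)
    for _ in range(max_iters):
        changed = False
        for i, p in enumerate(points):
            dists = [squared_distance(p, c) for c in centres]
            best = min(range(len(centres)), key=dists.__getitem__)
            if dists[best] > lam:     # too far from every centre: open a cluster
                centres.append(tuple(p))
                best = len(centres) - 1
            if labels[i] != best:
                labels[i] = best
                changed = True
        # Move each non-empty cluster's centre to the mean of its members.
        for k in range(len(centres)):
            members = [p for p, a in zip(points, labels) if a == k]
            if members:
                centres[k] = mean(members)
        if not changed:
            break
    # Drop clusters that ended up empty and renumber the labels.
    used = sorted(set(labels))
    relabel = {old: new for new, old in enumerate(used)}
    return [centres[k] for k in used], [relabel[a] for a in labels]
```

a larger lam gives fewer, coarser clusters, so it controls how different two failure sequences must be to count as distinct.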
Multimodal Safety-Critical Scenarios Generation for Decision-Making Algorithms Evaluation
Multimodal in multi-modality (pattern/logic Scenarios?)
flow-based generative model (to estimate the multimodal distribution of safety-critical scenarios):
as the objective function
conditional input according to characteristics of the tasks
generated: efficiently querying the task algorithms and a simulator
flow-based to estimate the multimodal distribution
testing at different stress levels
pre-training a generative model (assuming RealNVP, a flow-based model) by maximizing log-likelihood, to approximate the distribution of the real data
a modified flow-based model (with a conditional input): weighted maximum likelihood estimation (WMLE), where weight related to the risk metric and data probability; log-likelihood approximately proportional to risk level
adaptive sampler: step along gradient of worth exploring value related to risk metric minus risk level, calculated by NES under Monte Carlo method
SADL
neuron activation traces
Surprise. with respect to training data.
Surprise Adequacy of sets: the range of surprise
objective: higher(Surprise Coverage)
diversified goal: from those similar to those significantly different and adversarial. with respect to training data
adversarial example classifiers?
sampling inputs for retraining? broader? SA values
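sketch of the SADL notes above: distance-based surprise adequacy (DSA) of one input with respect to training activation traces, and surprise coverage as the fraction of SA buckets hit. This follows the definitions as I recall them from the surprise adequacy paper; activation traces are assumed to be plain numeric vectors already extracted from the network, and names are my own.

```python
import math


def distance_based_surprise(trace, predicted_class, train_traces, train_labels):
    """DSA: distance to the nearest same-class training trace, divided by that
    trace's distance to the nearest training trace of any other class."""
    same = [t for t, y in zip(train_traces, train_labels) if y == predicted_class]
    other = [t for t, y in zip(train_traces, train_labels) if y != predicted_class]
    nearest_same = min(same, key=lambda t: math.dist(trace, t))
    dist_a = math.dist(trace, nearest_same)
    dist_b = min(math.dist(nearest_same, t) for t in other)
    return dist_a / dist_b


def surprise_coverage(sa_values, upper_bound, num_buckets):
    """Fraction of equal-width buckets over (0, upper_bound] hit by some input."""
    width = upper_bound / num_buckets
    covered = set()
    for sa in sa_values:
        if 0 < sa <= upper_bound:
            covered.add(min(num_buckets - 1, math.ceil(sa / width) - 1))
    return len(covered) / num_buckets
```

a test set with higher surprise coverage spans inputs from similar to the training data up to very different from it, which is the diversified goal noted above.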
reference
(attack and defence) Autonomous vehicle security: A taxonomy of attacks and defences 2016
(defense) A non-conservatively defensive strategy for urban autonomous driving 2016
Testing advanced driver assistance systems using multi-objective search and neural networks 2016 PreScan FCW
Deepxplore: Automated whitebox testing of deep learning systems 2017
Safe at any speed: A simulation-based test harness for autonomous vehicles 2017
Automated generation of diverse and challenging scenarios for test and evaluation of autonomous vehicles 2017
Adaptive stress testing: Finding failure events with reinforcement learning 2018
Adaptive stress testing for autonomous vehicles 2018
Scalable end-to-end autonomous vehicle testing via rare-event simulation 2018
Simulation-based adversarial test generation for autonomous vehicles with machine learning components 2018
Testing vision-based control systems using learnable evolutionary algorithms 2018
Deeptest: automated testing of deep-neural-network-driven autonomous cars. In: ICSE (2018)
DeepRoad: GAN-based metamorphic testing and input validation framework for autonomous driving systems (ASE) 2018
Chauffeur-net: Learning to drive by imitating the best and synthesizing the worst. (2019) *
Guiding deep learning system testing using surprise adequacy 2019
Generating adversarial driving scenarios in high-fidelity simulators 2019
Boosting operational DNN testing efficiency through conditioning 2019 FSE/ESEC
Adaptive stress testing with reward augmentation for autonomous vehicle validation 2019
Automatically testing self-driving cars with search-based procedural content generation 2019
Did we test all scenarios for automated and autonomous driving systems? 2019
Towards system-level testing with coverage guarantees for autonomous vehicles 2019
Genetic algorithm-based test parameter optimization for adas system testing 2019
(attack) Attacking vision-based perception in end-to-end autonomous driving models 2019
(attack) Are self-driving cars secure? evasion attacks against deep neural networks for steering angle prediction 2019
Learning accurate and human-like driving using semantic maps and attention. CoRR (2020) *
Generating avoidable collision scenarios for testing autonomous driving systems 2020
Identification and explanation of challenging conditions for camera-based object detection of automated vehicles 2020 *
AV-FUZZER: Finding Safety Violations in Autonomous Driving Systems 2020
Coverage-based Scene Fuzzing for Virtual Autonomous Driving Testing 2021
PGFUZZ: Policy-Guided Fuzzing for Robotic Vehicles 2021
Simulation Driven Design and Test for Safety of AI Based Autonomous Vehicles 2021 CVPR
Generating and characterizing scenarios for safety testing of autonomous vehicles 2021
(scenario) Depiction of priority light-vehicle pre-crash scenarios for safety applications based on vehicle-to-vehicle communications
(scenario) Pegasus—first steps for the safe introduction of automated driving
(dstop, under single bit-flip) Ml-based fault injection for autonomous vehicles: A case for bayesian fault injection
(dsafe) Shared vehicle control using safe driving envelopes for obstacle avoidance and stability
(dsafe) Design and evaluation of a driving mode decision algorithm for automated driving vehicle on a motorway
(fitness) Fitness functions for testing automated and autonomous driving systems
Understanding Error Propagation in Deep Learning Neural Network (DNN) Accelerators and Applications
Binfi: an efficient fault injector for safety-critical machine learning systems
Tensorfi: A configurable fault injector for tensorflow applications
Modeling input-dependent error propagation in programs
Understanding error propagation in gpgpu applications
VerifAI: A toolkit for the formal design and analysis of artificial intelligence-based systems
(test DL) Safety verification of deep neural networks. In: CAV (2017) *
(test DL) Reluplex: An efficient SMT solver for verifying deep neural networks. In: CAV (2017) *
AV accidents reported: California DMV (https://www.dmv.ca.gov/portal/dmv/detail/vr/autonomous/testing), Waymo simulated driving behavior in reconstructed fatal crashes within an autonomous vehicle operating domain, Database approach for the sign-off process of highly automated vehicles, Survey on scenario-based safety assessment of automated vehicles.
Generating effective test cases for self-driving cars from police reports 2019
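the attack surface is the software input: can the system correctly perform certain functions while in its normal state
there are also the pure software level (system not in a normal state) and the pure algorithm level (how satisfactorily certain functions are performed).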
(Formal Verification)a formal scenario description language: to represent an operational design domain (ODD) ( disa: only focus on one component of the system or a simple scenario, hard for perception component)
scenarios: sensor noise, driving maneuvers
Scenic: a language for scenario specification and scene generation. (Defining and substantiating the terms scene, situation, and scenario for automated driving, Scenarios for development, test and validation of automated vehicles, Ontology based scene creation for the development of automated vehicles)
failures: possible (frequency, realistic), dangerous,
Adaptive stress testing (AST) : find the most likely path from a start state to a failure state in a discrete-time simulator, a Markov decision process. (disa: failure is unavoidable; same types of failures )
Fuzzing (AFL, Fuzz revisited: A re-examination of the reliability of unix utilities and services.), grammar-based fuzzing ( Grammar-based whitebox fuzzing., Evolutionary grammar-based fuzzing.), structure-aware fuzzing ( GRIMOIRE: synthesizing structure while fuzzing., SLF: fuzzing without valid seed inputs.).
RL: to be effective, RL needs to observe repeated cases, which requires accumulating a huge set of historical data and can be very time-consuming. In contrast, GA guides the search by trial and error without relying on a huge set of historical data.
metamorphic testing: if two inputs to a DL system are similar with respect to some human sense, the outputs should also be similar
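sketch of a metamorphic check: apply a transformation a human would consider harmless and flag inputs whose output moves by more than a tolerance. The toy model, the brightness transform and the tolerance are hypothetical stand-ins, not taken from DeepTest or DeepRoad.

```python
def metamorphic_violations(model, inputs, transform, tolerance):
    """Return (input, original, follow-up) for outputs that differ too much."""
    violations = []
    for x in inputs:
        original = model(x)
        followup = model(transform(x))
        if abs(original - followup) > tolerance:
            violations.append((x, original, followup))
    return violations


def toy_steering_model(brightness):
    """Hypothetical model: steering angle from a scalar brightness feature."""
    return 0.1 if brightness < 0.8 else 0.9   # deliberately brittle near 0.8


def slightly_brighter(brightness):
    """Hypothetical semantics-preserving transformation."""
    return brightness + 0.05
```

no ground-truth label is needed: the original output serves as the oracle for the transformed input.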
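when you see an odd term, search for it in the original paper first; if it is a description built from ordinary words, you usually need to read the whole paragraph in the original.
a paper's directly related work is mostly from about one year earlier; both older and newer work is cited less.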
distribution
function
workflow
objective: reward, fitness, loss