An alternative but related concept to creating metrics is creating benchmarks that must be passed. Ideally, not all of these benchmarks would require evaluation on the real hardware platform; some could instead rely on data logs and simulations (a minimal sketch of such a benchmark gate follows the list below). This raises several open questions:
- How can we best leverage data logs to evaluate self-driving algorithms?
- How can we evaluate the “value” of a specific data log?
- What are the right annotations for these data logs?
- How can we best leverage simulation to evaluate self-driving algorithms?
- How can we compose component-level benchmarks into system-level benchmarks?
- How can we decompose a system-level benchmark into measurable and tractable component-level benchmarks?
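To make the gating idea concrete, here is a minimal sketch, assuming higher-is-better scores and a simple "every component must pass" composition rule. All names here (`BenchmarkResult`, `run_log_replay`, `system_gate`) are hypothetical and purely illustrative; this is one possible way to combine log-replay and simulation results into a system-level gate, not a prescribed framework.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class BenchmarkResult:
    """One component-level benchmark outcome (hypothetical structure)."""
    name: str
    score: float      # measured metric value, higher is better by assumption
    threshold: float  # minimum score required to pass

    @property
    def passed(self) -> bool:
        return self.score >= self.threshold

def run_log_replay(detector: Callable[[object], str],
                   frames: List[dict]) -> BenchmarkResult:
    """Replay recorded sensor frames through a perception component and
    score its output against the log's annotations (a component-level
    benchmark built from data logs)."""
    if not frames:
        return BenchmarkResult("log_replay_detection", 0.0, threshold=0.95)
    correct = sum(1 for f in frames
                  if detector(f["sensor_data"]) == f["annotation"])
    return BenchmarkResult("log_replay_detection",
                           correct / len(frames), threshold=0.95)

def system_gate(results: List[BenchmarkResult]) -> bool:
    """Compose component-level results into a system-level pass/fail gate.
    The rule here is simply 'all components must pass'; choosing a sound
    composition rule is exactly the open question posed above."""
    for r in results:
        print(f"{r.name}: {r.score:.3f} vs. threshold {r.threshold} -> "
              f"{'PASS' if r.passed else 'FAIL'}")
    return all(r.passed for r in results)

if __name__ == "__main__":
    # Toy data standing in for a recorded drive and a simulation run.
    frames = [{"sensor_data": i, "annotation": "car"} for i in range(100)]
    detector = lambda _: "car"  # stand-in perception component
    results = [
        run_log_replay(detector, frames),                       # from data logs
        BenchmarkResult("sim_scenario_pass_rate", 0.97, 0.90),  # from simulation
    ]
    print("release gate:", "PASS" if system_gate(results) else "FAIL")
```

The sketch deliberately leaves open what the questions above ask: how logs are selected and annotated, how simulation scenarios are weighted, and whether a conjunction of component thresholds is actually a faithful decomposition of a system-level requirement.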