Keynote 1: Distribution-aware Test Adequacy for Neural Networks
Prof. Dr. Matthew Dwyer (University of Virginia)

Matthew B. Dwyer is the Robert Thomson Distinguished Professor in the Department of Computer Science at the University of Virginia. He has authored more than 140 scholarly publications in program analysis, software specification, and automated formal methods. His research contributions have been recognized with the ICSE “Most Influential Paper” award (2010), the SIGSOFT “Impact Paper” award (2010), the ESEC/FSE “Test of Time” award (2018), and the SIGSOFT “Impact Paper” award (2021). He was named an ACM Distinguished Scientist (2007), a Fulbright Research Scholar (2011), an IEEE Fellow (2013), a Parnas Fellow (2018), and an ACM Fellow (2019). He is most proud of his work with his 18 PhD students.

Testing a neural network (NN) is a critical step in determining whether it meets its deployment requirements. This traditionally involves assessing test accuracy to judge how well the learned model generalizes, but it can also include assessment of robustness, fairness, and other properties. A key challenge in testing any software system is determining when to stop the process, but for critical systems it is just as important to provide evidence about how thoroughly testing covers possible system behavior.

Since NNs are trained to produce a statistically accurate function approximation for data drawn from a target data distribution, we contend that testing efforts should be assessed in terms of how well they cover the data distribution.

In this talk, we first describe how prior work that ignores the data distribution can lead to wasted testing effort and inaccurate characterizations of the thoroughness of testing. We then describe a new approach, called “input distribution coverage” (IDC), that formulates test adequacy measures over the latent distribution of generative models to avoid these problems. IDC is the first systematic black-box coverage criterion for assessing the adequacy of NN test suites. Moreover, IDC’s configurable design allows it to support high-confidence applications and test suite augmentation.
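The core idea can be illustrated with a toy sketch. This is not the authors' implementation; it merely assumes test inputs have already been mapped to latent vectors of a generative model, quantizes each latent dimension into equal-probability bins under a standard-normal prior (typical for VAE/GAN latents), and reports 2-way combinatorial coverage of the bins. All names and parameters are illustrative.

```python
import itertools
import random
from statistics import NormalDist

def quantize(z, num_bins=4):
    """Map each latent coordinate to a bin index, using equal-probability
    bins under a standard-normal prior."""
    edges = [NormalDist().inv_cdf(k / num_bins) for k in range(1, num_bins)]
    return tuple(sum(v > e for e in edges) for v in z)

def pairwise_coverage(latents, num_bins=4):
    """Fraction of all (dimension-pair, bin-pair) combinations exercised
    by the test suite's latent codes -- a 2-way combinatorial measure."""
    dims = len(latents[0])
    covered = set()
    for z in latents:
        bins = quantize(z, num_bins)
        for i, j in itertools.combinations(range(dims), 2):
            covered.add((i, j, bins[i], bins[j]))
    total = (dims * (dims - 1) // 2) * num_bins * num_bins
    return len(covered) / total

# Stand-in "test suite": 500 latent codes drawn from the prior itself.
random.seed(0)
suite = [[random.gauss(0, 1) for _ in range(8)] for _ in range(500)]
print(f"2-way latent coverage: {pairwise_coverage(suite):.2f}")
```

A suite drawn from the prior covers the latent bins well; a suite that clusters in one region of latent space would score low, flagging untested behavior.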

Keynote 2: Towards Certification of Machine Learning in Aeronautical Applications
Dr. Ganesh Pai (KBR / NASA Ames Research Center)

Dr. Ganesh Pai is a Technical Fellow for System Safety Assurance at KBR, Inc., and a contractor member of the scientific staff in the Robust Software Engineering technical area of the Intelligent Systems Division at NASA Ames Research Center. His professional practice supports the safe engineering and operations of unmanned aircraft systems, while his research interest is aerospace systems and software dependability, with a focus on safety. Ganesh has authored more than 40 articles on dependability and safety in systems and software engineering, and he has served on the program committees of numerous conferences and workshops on those topics. Dr. Pai is actively involved in safety standardization efforts through the IEEE Functional Safety Standards Committee, the IEEE Systems and Software Engineering Standards Committee in the Software Safety Working Group, and the SAE G-34 Artificial Intelligence in Aviation Committee in the System Safety and Process Considerations Subgroups.

Machine learning (ML)-based functionality intended for use in aeronautical applications will require certification/approval (C/A) before it can be deployed into operational service. Among the key steps in C/A is establishing acceptable means and methods of compliance with the applicable regulations. The underlying objective is the provision of assurance that ML-based applications will perform as intended. The assurance landscape for this purpose can be categorized broadly as based on development assurance and, more recently, assurance argumentation. In this talk, we will first explore the available guidance and standards constituting this assurance landscape, examining their scope and applicability to different types of ML. We will then present ongoing work on Dynamic Assurance Cases as a framework that leverages principles from both development assurance and assurance argumentation, reflecting on its suitability as a means of compliance in C/A of ML-based applications.

Invited Talk 1: Quantifying Misalignment Between Agents
Prof. Dr. Shiri Dori-Hacohen (University of Connecticut)

Prof. Shiri Dori-Hacohen is an Assistant Professor in the Department of Computer Science & Engineering at the University of Connecticut, where she leads the Reducing Information Ecosystem Threats (RIET) Lab. She is also the Founder & Chair of the Board at AuCoDe. Prof. Dori-Hacohen’s research focuses on threats to the information ecosystem online and to healthy public discourse, approached from an information retrieval lens and informed by insights from the social sciences. Her research has been funded by the National Science Foundation (NSF) and Google, among others; she has served as PI or Co-PI on over $1.7M worth of federal funds from the NSF. Her career in both academia and industry spans Google, Facebook, and the University of Massachusetts Amherst, among others. She received her M.Sc. and B.Sc. (cum laude) at the University of Haifa in Israel and her M.S. and Ph.D. from the University of Massachusetts Amherst, where she researched computational models of controversy. Prof. Dori-Hacohen is the recipient of several prestigious awards, including the 2011 Google Lime Scholarship and first place at the 2016 UMass Amherst Innovation Challenge. She has taken an active leadership role in broadening participation in Computer Science on a local and global scale.

Growing concerns about the AI alignment problem have emerged in recent years. Previous work has mostly been qualitative in its description of the alignment problem and/or has attempted to align AI actions with human interests by focusing on value specification and learning. However, we still lack a systematic understanding of how misalignment should be defined and measured. Prior work on controversy in computational social science offers a mathematical model of contention among populations (of humans), which seems to offer a promising avenue with regard to misalignment. In this talk, we will present work in progress that extends this mathematical model of contention to quantify misalignment among any population of agents (human or otherwise).
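To make the notion of population-level misalignment concrete, here is a deliberately simple toy measure, not the model presented in the talk: agents hold stance vectors over a set of issues, and misalignment is the mean disagreement rate over all agent pairs and all issues. The function name and the -1/0/+1 stance encoding are illustrative assumptions.

```python
import itertools

def pairwise_misalignment(stances):
    """Toy misalignment score for a population of agents.
    Each stance is a vector of -1/0/+1 positions on a set of issues.
    Returns the fraction of (agent pair, issue) combinations that
    disagree: 0 = full alignment, 1 = every pair disagrees everywhere.
    (Illustrative only; the contention model is more refined.)"""
    pairs = list(itertools.combinations(stances, 2))
    issues = len(stances[0])
    disagreements = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return disagreements / (len(pairs) * issues)

# Three agents, three issues: partial disagreement.
agents = [(1, 1, -1), (1, -1, -1), (-1, 1, 1)]
print(pairwise_misalignment(agents))
```

Nothing here is specific to humans: the same computation applies to any population of agents whose positions can be compared, which is the spirit of generalizing contention to misalignment.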

Invited Talk 2: A System Safety Perspective for Developing and Governing Artificial Intelligence
Dr.ir. Roel Dobbe (TU Delft)

Dr.ir. Roel Dobbe is an Assistant Professor working at the intersection of engineering, design, and governance of data-driven and algorithmic control and decision-making systems. Roel holds a PhD in Control, Intelligent Systems and Energy from the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. He was an inaugural postdoc at the AI Now Institute at New York University. His main interests are in modernizing critical infrastructure and sensitive decision-making through data-driven tools for analysis and decision-making that enable safe, sustainable, and democratically just societal systems. He works closely with public, private, and societal organizations, with an emphasis on energy systems and public services.

The rise of AI systems across societal domains and public organizations and infrastructures contributes to new safety hazards. While these lead to both grave and pernicious harm and loss, there is still a lack of consensus on how to diagnose and eliminate new hazards. For decades, the field of system safety has dealt with accidents and harm in safety-critical systems governed by varying degrees of software-based automation and decision-making. This field understands safety and other values as socio-technical and emergent system properties, requiring integral approaches to instantiate them across the design and operation of a system as well as its institutional context. This talk draws implications for safeguarding AI systems from seven lessons drawn up by system safety pioneer Nancy Leveson, exemplified with recent AI system accidents. These lessons make clear that safeguarding AI systems requires a transdisciplinary AI system safety approach that establishes both institutional safety control structures and organizational cultures that empower all involved and affected stakeholders to identify hazards and translate them into concrete safety constraints in system design, operation, and governance. The system safety canon thereby offers a welcome and actionable set of tools and design principles to broaden more technical conceptions of AI safety and to bridge the gap towards preventing emerging harms and hazards in AI systems.

Link to handbook chapter that this technical talk proposal is based on:

Invited Talk 3: Safety in AI-Enabled Warfare Decision Aids
Dr. Bonnie W. Johnson (Naval Postgraduate School)

Dr. Bonnie Johnson is a senior lecturer with the Systems Engineering Department at the Naval Postgraduate School (NPS). She has a BS in physics from Virginia Tech, an MS in systems engineering from Johns Hopkins, and a PhD in systems engineering from NPS. She leads research projects for ONR, OPNAV, Army, and Marine Corps sponsors in directed energy laser weapon systems, automated battle management aids, artificial intelligence, predictive analytics, and complex systems engineering. Prior to working at NPS, she was a senior systems engineer with Northrop Grumman and SAIC, working on Naval and Joint air and missile defense systems.

Advances in computational technologies and Artificial Intelligence (AI) methods present new opportunities for developing automated tactical decision aids to support warfighters making weapons engagement decisions. Tactical decisions become increasingly complex and can overwhelm human decision-making as threats increase in number, speed, diversity, and lethality. The deployment of such AI-enabled decision aids must consider system safety. This Naval Postgraduate School study explores the potential safety risks and failure modes that may arise as automation and AI technologies are introduced and implemented to support human-machine weapons engagement decisions, along with their root causes and the mitigation and engineering strategies needed to prevent, address, and recover from this new class of possible safety failures.

The study includes five research initiatives: (1) a holistic study of safety risks, root causes, and mitigation strategies related to the introduction of AI into warfare decision aids, (2) analysis of the decision risk involved in human-machine teaming for making weapon engagement decisions, (3) analysis of human-machine trust for AI-enabled weapon engagement decisions, (4) an evaluation of the safety risks in implementing AI as a decision aid for air and missile defense decisions, and (5) a study of military data gathering and management activities and how they relate to AI safety. This technical talk will discuss the five research initiatives and report on their findings to date.