Dr. Ece Kamar, Microsoft Research AI
Keynote: AI in the Open World: Discovering Blind Spots of AI

With the widespread use of AI systems in many domains that are important for our society, shortcomings of AI systems create concerns around reliability, safety, bias, and trust. Therefore, it is critically important to understand why AI systems fail, how these failures can be uncovered through detailed evaluation, and how systems can be continuously improved. In this talk, I will discuss how blind spots emerge in different learning frameworks due to incomplete training. I will present approaches that effectively combine algorithms and human feedback to debug AI systems and overcome the limitations of AI algorithms, with the goal of developing reliable systems.

Ece Kamar is a Principal Researcher in the Adaptive Systems and Interaction Group at Microsoft Research AI and serves as the Technical Advisor for Microsoft’s advisory committee on AI, ethics, and effects in engineering and research. Ece received her Ph.D. in computer science from Harvard University in 2010. Ece works on developing AI systems that combine the complementary strengths of machines and people. She is passionate about investigating the impact of AI on society and studying ways to develop AI systems that are reliable, unbiased, and trustworthy. She has over 40 peer-reviewed publications at top AI and HCI venues and served on the first Study Panel of Stanford’s One Hundred Year Study on Artificial Intelligence (AI100). She is currently leading internal efforts at Microsoft focused on improving the reliability and safety of AI systems.

Prof. François Terrier, Commissariat à l’Énergie Atomique
Invited Talk: Considerations for Evolutionary Qualification of Safety-Critical Systems with AI-based Components

Current certification practices for safety-critical systems are antiquated and cannot scale with the corrective and evolutionary increments that AI-based components undergo, since AI quality can change disproportionately relative to the change factors (e.g., usage context and the data used). While qualification/certification of safety-critical systems with AI-based components remains a grand challenge, in this talk we explore considerations for adopting a more evolutionary qualification/certification approach.

Professor at CEA-INSTN and Director of Research at CEA, François Terrier is the head of the software and system engineering department of Institut CEA List of CEA Tech. Building on the world-class excellence of its teams in model-based engineering and formal methods, the department is in charge of developing methods and tools that ensure system and software conformity with safety and security requirements. François Terrier has a PhD in artificial intelligence and worked for 10 years on expert systems using three-valued, temporal, or fuzzy logics. Since 1994, he has conducted research on engineering trustworthy software. His research has produced more than 100 publications. As head of the system and software engineering department, François Terrier is also in charge of developing new activities along three strategic axes: fair and trustworthy distributed systems and blockchains; trustworthy embedded artificial intelligence; and design and validation environments for quantum computing. Since 2019, he has also led CEA’s strategic transversal program on Trustworthy Artificial Intelligence.

Dr. Sameer Singh, University of California, Irvine
Invited Talk: Evaluating and Testing Natural Language Processing Systems

Current evaluation of the generalization of natural language processing (NLP) systems, and of much of machine learning, primarily consists of measuring accuracy on held-out instances of the dataset. Since the held-out instances are often gathered using the same annotation process as the training data, they contain the same biases, which act as shortcuts for machine learning models and allow them to achieve accurate results without actual natural language understanding. Thus, held-out accuracy is a poor proxy for generalization and, further, says little about robustness to attacks. In this talk, I will introduce several approaches we are investigating to evaluate NLP systems more thoroughly. I will first describe automated techniques for perturbing dataset instances that identify loopholes and shortcuts in NLP models, including semantic adversaries and universal triggers. I will then describe recent work on creating comprehensive tests and evaluation benchmarks for NLP that aim to directly evaluate comprehension capabilities. The talk will cover a number of NLP tasks, including sentiment analysis, textual entailment, and question answering.

Dr. Sameer Singh is an Assistant Professor of Computer Science at the University of California, Irvine (UCI). He works primarily on the robustness and interpretability of machine learning algorithms, along with models that reason with text and structure for natural language processing. Sameer was a postdoctoral researcher at the University of Washington and received his PhD from the University of Massachusetts, Amherst, during which he also worked at Microsoft Research, Google Research, and Yahoo! Labs. His group has received funding from Amazon, the Allen Institute for AI, NSF, DARPA, Adobe Research, and FICO.