FLI AI Race - Incentives for safety agreement compliance in an AI race
Sep 2018 - Aug 2020
An AI race for technological advantage towards powerful AI systems could have serious negative consequences, especially when ethical and safety procedures are underestimated or even ignored. For all to enjoy the benefits of safe, ethical and trustworthy AI, it is crucial to enact incentive strategies that ensure mutually beneficial, normative behaviour and safety compliance from all parties involved. Using methods from Evolutionary Game Theory, this project will develop computational models (both analytic and simulated) that capture key factors of an AI race, revealing which strategic behaviours are likely to emerge under different conditions and hypothetical scenarios of the race. Moreover, applying methods from incentive and agreement modelling, we will systematically analyse how different types of incentives (namely, positive vs. negative, peer vs. institutional, and their combinations) influence safety-compliance behaviour over time, and how such incentives should be configured to ensure desired global outcomes without undue restrictions that would slow down development. The project will thus provide foundations for determining which incentives stimulate such outcomes, and how they should be deployed, within incentive boundaries suited to the types of players involved, in order to achieve a high level of compliance with a cooperative safety agreement and avoid AI disasters.
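As a minimal illustration of the kind of evolutionary game-theoretic model described above, the sketch below simulates replicator dynamics for a two-strategy AI race game (SAFE developers who comply with a safety agreement vs. UNSAFE developers who cut corners for speed), with an optional institutional sanction as a negative incentive. All payoff values, parameter names, and the functional form of the sanction are illustrative assumptions for exposition, not the project's actual model.

```python
# Hypothetical sketch of an AI race as a 2-strategy evolutionary game.
# Strategies: SAFE (index 0, complies with the safety agreement) vs.
# UNSAFE (index 1, skips safety to develop faster). All numbers below
# are illustrative assumptions, not results from the project.

def race_payoff(benefit=4.0, speed_gain=2.0, risk_cost=3.0, sanction=0.0):
    """Build a 2x2 payoff matrix payoff[row][col] for row player vs. col player.

    UNSAFE players gain speed_gain over SAFE play, but any interaction
    involving an UNSAFE player carries risk_cost (the chance of an AI
    disaster), and an institutional sanction is charged to UNSAFE players.
    """
    return [
        [benefit, benefit - risk_cost],                    # SAFE vs SAFE, SAFE vs UNSAFE
        [benefit + speed_gain - sanction,                  # UNSAFE vs SAFE
         benefit + speed_gain - risk_cost - sanction],     # UNSAFE vs UNSAFE
    ]

def replicator_trajectory(payoff, x0=0.5, steps=2000, dt=0.01):
    """Evolve the fraction x of SAFE players under replicator dynamics:
        dx/dt = x * (1 - x) * (f_SAFE - f_UNSAFE)
    using simple forward-Euler integration; returns the final x."""
    x = x0
    for _ in range(steps):
        f_safe = x * payoff[0][0] + (1 - x) * payoff[0][1]
        f_unsafe = x * payoff[1][0] + (1 - x) * payoff[1][1]
        x += dt * x * (1 - x) * (f_safe - f_unsafe)
    return x

if __name__ == "__main__":
    # Without any sanction, the speed advantage makes UNSAFE dominant;
    # a sufficiently large institutional sanction reverses the dynamics.
    print(f"SAFE share, no sanction:  {replicator_trajectory(race_payoff(sanction=0.0)):.3f}")
    print(f"SAFE share, sanction=3.0: {replicator_trajectory(race_payoff(sanction=3.0)):.3f}")
```

With these illustrative numbers, UNSAFE strictly dominates when no incentive is applied (the population converges towards all-UNSAFE), while a sanction larger than the speed gain makes SAFE dominant. The project's analysis asks exactly this kind of question at scale: how large, and of what type, incentives must be to shift the race towards safety compliance without over-penalising development.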
Summary for Laypeople
The race towards powerful artificial intelligence (AI) systems amongst development teams (e.g. organisations, nations) could lead to serious negative consequences when safety procedures are underestimated or ignored. Safety agreements and regulations must be adopted to guarantee that the public is not harmed in the process. However, experience with international treaties (such as those on climate change) shows that the autonomy and sovereignty of the parties involved make it difficult to enforce compliance with standards and norms. Therefore, for all to enjoy the benefits of AI, while ensuring the safety of users and the public, it is crucial to provide the right incentives, so that AI race participants readily accept the safety and normative requirements that should be taken for granted. This project seeks to understand the race dynamics and the strategic behaviours likely to emerge under different conditions and hypothetical scenarios of the race, and how to influence them. We shall study how different types of incentives (positive vs. negative, peer vs. institutional, and their combinations) alter participant behaviour, thereby providing suggestions for policy makers on regulating the AI race to meet public expectations in terms of safety, privacy, etc. In summary, by analysing AI race dynamics and intervention strategies, our project aims to ensure positive outcomes and avoid societal disasters.