Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

According to Lin et al. (2025), AI safety and AI security differ in the following way:

Main documents on AI safety and AI security

AI Safety risks from the AI Safety report

Selected risks from the AI safety report relevant in research context

  1. Risks from malicious use

    • Harm to individuals through fake content. AI-generated fake content can be used to manipulate governance processes, sabotage collaborations, or personally target researchers and undermine trust in science.

    • Manipulation of public opinion. AI-generated fake content and narrative manipulation can directly target researchers, research projects, research ideas, and public trust in science.

    • Cyber offence. AI-assisted cyber attacks may affect research infrastructures, collaborations, and sensitive scientific outputs.

    • Biological and chemical attacks.

  2. Risks from malfunctions

    • Reliability issues. This can lead violations of research ethics, research integrity, and research governance due to irresponsible use of AI systems having reliability issues.

    • Bias. Biased data, models, and AI systems can lead to discriminatory or misleading results violating research ethics principles. Researchers have an ethical duty to identify, mitigate, and transparently report bias.

  3. Systemic risks

    • Risks to the environment. Key ethical question for research projects: Is the environmental cost of using an AI system justified by the expected scientific and societal benefit?

    • Risks to privacy. This can lead to violations of research ethics, research integrity, and research governance.

    • Risks of copyright infringement. This can lead to violations of research ethics, research integrity, and research governance.

  4. Impact of open-weight general-purpose AI models on AI risks.

Risks from open-weight general-purpose AI models on AI risks

The AI safety report has the following points:

From the other side:

AI Security Exchange

Threat model: Types of threats

Three types of threats (https://owaspai.org/goto/threatsoverview):

  1. threats during development-time: when data is obtained and prepared, and the model is trained/obtained. Example: data poisoning (injecting bad data into the training data)

  2. threats through using the model: through inference; providing input and getting the output. Examples:

    • direct prompt injection (malicious prompt into the user interface),

    • indirect prompt injection (malicious prompt is embedded in external content) or

    • evasion (hidden malicious instructions via obfuscation, encoding, hidden text, and payload splitting)

  3. other threats to the system during runtime: in operation - not through inference. Example: stealing model input

Threat model: Impacts

6 types of impacts that align with three types of attacker goals (disclose, deceive and disrupt):

  1. disclose: hurt confidentiality of train/test data

  2. disclose: hurt confidentiality of model Intellectual property (the model parameters or the process and data that led to them)

  3. disclose: hurt confidentiality of input data

  4. deceive: hurt integrity of model behaviour (the model is manipulated to behave in an unwanted way and consequentially, deceive users)

  5. disrupt: hurt availability of the model (the model either doesn’t work or behaves in an unwanted way - not to deceive users but to disrupt normal operations)

  6. disrupt/disclose: confidentiality, integrity, and availability of non AI-specific assets

Threats to agentic AI

Threats:

References
  1. Lin, Z., Sun, H., & Shroff, N. (2025). AI Safety vs. AI Security: Demystifying the Distinction and Boundaries. arXiv. 10.48550/ARXIV.2506.18932
  2. Hall, P., Mundahl, O., & Park, S. (2025). The Pitfalls of “Security by Obscurity” and What They Mean for Transparent AI. Proceedings of the AAAI Conference on Artificial Intelligence, 39(27), 28042–28051. 10.1609/aaai.v39i27.35022