
Narrative Smuggling

Description


Attack Examples

  • Writing poems that encode hidden instructions

  • Creating stories where first letters spell out commands

  • Using metaphors to describe system instructions

  • Creating dialogues that reveal model behavior

  • Using character descriptions to encode prompts

  • Crafting narratives that gradually reveal information

  • Example: Writing a story in which the first letters of consecutive words spell out the intended command
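From the defender's side, the first-letter variants in the list above can be screened for with a simple acrostic check. The following is a minimal sketch, not a complete defense: the function names `first_letters` and `acrostic_hits` and the `SUSPICIOUS_PHRASES` watchlist are hypothetical, chosen only for illustration, and the check only covers first-letter encodings at word and line level. Metaphor, dialogue, and gradual-reveal variants would need other techniques.

```python
import re

# Hypothetical watchlist (an assumption for this sketch); a real deployment
# would supply its own phrases.
SUSPICIOUS_PHRASES = ("ignore", "reveal", "system", "prompt")


def first_letters(units):
    """Join the lowercase first ASCII letter of each unit, skipping units with none."""
    letters = []
    for unit in units:
        match = re.search(r"[A-Za-z]", unit)
        if match:
            letters.append(match.group(0).lower())
    return "".join(letters)


def acrostic_hits(text, phrases=SUSPICIOUS_PHRASES):
    """Return (level, phrase) pairs for watchlisted phrases spelled by first letters.

    Two acrostics are checked: one built from the first letter of every word,
    and one built from the first letter of every line.
    """
    word_acrostic = first_letters(text.split())
    line_acrostic = first_letters(text.splitlines())
    hits = []
    for phrase in phrases:
        if phrase in word_acrostic:
            hits.append(("word", phrase))
        if phrase in line_acrostic:
            hits.append(("line", phrase))
    return hits
```

For instance, `acrostic_hits("Rivers eventually vanish; every autumn leaves")` flags the word-level acrostic "reveal". Substring matching on short phrases will produce false positives on ordinary text, so a hit is better treated as a signal for further review than as proof of an attack.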
