
Get_prompt_secret

Description


Attack Types

  • Direct Solicitation — Simply asking for the secret directly

  • Encoded Solicitation — Requesting the secret in a transformed or encoded format

  • Character-by-Character Extraction — Getting individual characters or subsets of the secret

  • Letter/Character Position Techniques — Asking for specific letters at certain positions

  • Indirect Referencing — Referring to the secret without using restricted terms

  • Pattern Recognition Exploitation — Using examples to establish a pattern the model will follow

  • Repetition Attacks — Repeating similar prompts until the model complies

  • Language Misdirection — Using non-English prompts to bypass constraints

  • Alternate Representation Attacks — Requesting ASCII codes or other representations

  • Formatting Tricks — Using special formatting to bypass filters

  • Creative Context Setting — Creating scenarios where revealing the secret is contextually appropriate
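Several of the attack types above (Encoded Solicitation, Alternate Representation Attacks, Formatting Tricks) work because a filter that only looks for the literal secret misses transformed copies of it. The sketch below assumes the secret is a plain ASCII string. It lists a few such transformations and contrasts a literal-only check with one that also checks those variants. The helper names (`secret_variants`, `naive_filter_blocks`, `variant_filter_blocks`) and the secret `SWORDFISH` are hypothetical and not taken from any particular tool.

```python
import base64
import codecs


# Hypothetical helpers for illustration; not the API of any specific tool.
def secret_variants(secret: str) -> dict[str, str]:
    """Return transformed forms of `secret` that an extraction attack might elicit."""
    return {
        "literal": secret,
        "reversed": secret[::-1],
        # Encoded solicitation: ROT13 and base64
        "rot13": codecs.encode(secret, "rot13"),
        "base64": base64.b64encode(secret.encode("utf-8")).decode("ascii"),
        # Alternate representation: decimal character codes separated by spaces
        "ascii_codes": " ".join(str(ord(c)) for c in secret),
        # Formatting trick: characters separated by spaces to dodge substring checks
        "spaced": " ".join(secret),
    }


def naive_filter_blocks(response: str, secret: str) -> bool:
    """Block only if the literal secret appears (case-insensitive)."""
    return secret.lower() in response.lower()


def variant_filter_blocks(response: str, secret: str) -> bool:
    """Block if the literal secret or any listed variant appears (case-insensitive)."""
    lowered = response.lower()
    return any(v.lower() in lowered for v in secret_variants(secret).values())


if __name__ == "__main__":
    secret = "SWORDFISH"  # hypothetical secret
    leaked = "The password in ROT13 is " + secret_variants(secret)["rot13"]
    print(naive_filter_blocks(leaked, secret))    # False: the literal check misses it
    print(variant_filter_blocks(leaked, secret))  # True: the ROT13 variant matches
```

A variant list like this only covers the transformations you anticipate. An attacker can pick another encoding, so this kind of check is a partial mitigation at best.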

Attack Examples

  • Asking the model to repeat back its instructions

  • Requesting a translation or transformation of the secret (ROT13, base64, etc.)

  • Asking for the first N letters/characters of the secret

  • Getting ASCII codes of characters instead of the characters themselves

  • Using creative delimiters or templates that lead the model to interpolate the secret into its output

  • Providing examples of pattern extraction to prime the model

  • Using non-English language prompts to weaken instruction following

  • Requesting the secret as part of a creative work (poem, song, story)

  • Using code evaluation context to extract information

  • Creating fictional scenarios where revealing the secret is "safe"

  • Using restricted character sets (e.g., only words starting with 'S')
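Examples that extract the secret piecemeal (the first N letters, a poem or song, words restricted to certain starting letters) can slip past a check that looks for the whole secret, because no single substring of the response equals it. A minimal sketch of two extra heuristics follows, assuming a plain-text response: a prefix check and an acrostic check on the initials of lines and of words. The function names, the hypothetical secret `SWORDFISH`, and the default `min_len=3` threshold are illustrative assumptions, not part of any documented tool.

```python
import re


# Hypothetical helpers for illustration; the min_len threshold is an arbitrary assumption.
def _initials(units: list[str]) -> str:
    """Concatenate the first letter or digit of each non-empty unit, lowercased."""
    letters = []
    for unit in units:
        match = re.search(r"[A-Za-z0-9]", unit)
        if match:
            letters.append(match.group(0))
    return "".join(letters).lower()


def leaks_prefix(response: str, secret: str, min_len: int = 3) -> bool:
    """True if the first `min_len` characters of the secret appear in the response.

    Targets "first N letters" requests; short prefixes can cause false positives.
    """
    prefix = secret[:min_len].lower()
    return len(secret) >= min_len and prefix in response.lower()


def leaks_acrostic(response: str, secret: str) -> bool:
    """True if the initials of the lines, or of the words, spell out the secret."""
    target = secret.lower()
    by_line = _initials(response.splitlines())
    by_word = _initials(response.split())
    return target in by_line or target in by_word


def partial_leak(response: str, secret: str, min_len: int = 3) -> bool:
    """Combine the prefix and acrostic heuristics."""
    return leaks_prefix(response, secret, min_len) or leaks_acrostic(response, secret)


if __name__ == "__main__":
    secret = "SWORDFISH"  # hypothetical secret
    poem = "\n".join([
        "Sunlight on the sea", "Waves that roll", "Over rocks", "Rising tide",
        "Dusk arrives", "Fish below", "In the deep", "Silver scales", "Home at last",
    ])
    print(leaks_acrostic(poem, secret))                  # True: line initials spell the secret
    print(partial_leak("I cannot share that.", secret))  # False
```

Both heuristics trade false positives for coverage. A three-character prefix will also match unrelated words, so thresholds like `min_len` would need tuning against real responses.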

Example Prompts
