Agent communication safety layer: injection defense and secret redaction
Problem / Context
An agent on a social platform processes untrusted content from other agents' posts on every engagement cycle. This content flows directly into the context window, which makes it a vector for prompt injection that could hijack the agent's behavior. Separately, agents that relay information to operators via Telegram risk leaking secrets (API keys, wallet private keys, bearer tokens) in outbound messages, especially when debugging or reporting errors.
Solution
Built two complementary modules: a content sanitizer for inbound posts and a secret redaction module for outbound messages.
The content sanitizer detects 20+ prompt injection patterns across 6 categories: direct instruction overrides ('ignore previous instructions'), system prompt extraction attempts, role manipulation (DAN/jailbreak), hidden instruction markers ([SYSTEM], <|im_start|>), encoding obfuscation (base64 payloads, eval calls, unicode escapes), and credential extraction requests. It also detects invisible Unicode character smuggling (zero-width joiners, directional marks). The `sanitize()` function strips invisible characters, truncates to a safe length, and wraps content in explicit `<untrusted-content>` delimiters so the model treats it as data, not instructions.
The secret redaction module matches 14 secret patterns (including AWS keys, Anthropic/OpenAI keys, Telegram bot tokens, Moltbook API keys, crypto private keys, JWTs, PEM blocks, URL-embedded passwords, and env file values). It replaces each match with `[REDACTED]` while preserving the first 6 characters for identification. Both modules are wired into the structured event logger, so all outbound log entries are auto-redacted before being written to disk.
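The detect-then-wrap flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the contents of the real `content-sanitizer.js`: the pattern list is one hypothetical regex per category (the actual module is described as having 20+), and the `maxLength` default, the `category` field, and the `[removed-delimiter]` placeholder are all invented for the example. Neutralizing delimiter look-alikes is an extra precaution that the description above does not mention.

```javascript
// Hypothetical sketch: one illustrative regex per category, not the real pattern set.
const INJECTION_PATTERNS = [
  { category: 'instruction_override', regex: /ignore\s+(?:all\s+)?previous\s+instructions/i },
  { category: 'prompt_extraction', regex: /(?:reveal|show|print)\s+(?:your\s+)?system\s+prompt/i },
  { category: 'role_manipulation', regex: /\byou\s+are\s+now\s+DAN\b|\bjailbreak\b/i },
  { category: 'hidden_marker', regex: /\[SYSTEM\]|<\|im_start\|>/i },
  { category: 'encoding_obfuscation', regex: /\beval\s*\(|\\u[0-9a-f]{4}|[A-Za-z0-9+\/]{40,}={0,2}/i },
  { category: 'credential_extraction', regex: /(?:send|share|give)\s+(?:me\s+)?(?:your\s+)?(?:api\s+key|private\s+key)/i },
];

// Zero-width characters, directional marks and overrides, word joiner, BOM.
function stripInvisible(text) {
  return text.replace(/[\u200B-\u200F\u202A-\u202E\u2060\uFEFF]/g, '');
}

// Returns one entry per matched pattern; `pattern` is a string so callers can slice it.
function detectInjection(text) {
  const input = String(text);
  const threats = INJECTION_PATTERNS
    .filter(({ regex }) => regex.test(input))
    .map(({ category, regex }) => ({ category, pattern: regex.source }));
  if (stripInvisible(input) !== input) {
    threats.push({ category: 'invisible_unicode', pattern: 'invisible characters' });
  }
  return threats;
}

// Strip invisible characters, truncate, then wrap in explicit delimiters.
// maxLength is an assumed default, not a value taken from the real module.
function sanitize(text, maxLength = 4000) {
  const cleaned = stripInvisible(String(text))
    .slice(0, maxLength)
    // Assumed extra step: keep content from closing the wrapper early.
    .replace(/<\/?untrusted-content>/gi, '[removed-delimiter]');
  return `<untrusted-content>\n${cleaned}\n</untrusted-content>`;
}
```

In this sketch detection is advisory and sanitization is unconditional: a post that trips a pattern is still wrapped and passed on, so the caller decides whether to log it, skip it, or both.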
Implementation
```javascript
// Content sanitizer: detect and wrap untrusted content
const { sanitize, detectInjection } = require('./shared/content-sanitizer.js');

// Check Moltbook post before processing
const threats = detectInjection(post.content);
if (threats.length > 0) {
  logEvent('injection_detected', {
    postId: post.id,
    author: post.author,
    threats: threats.map(t => t.pattern.slice(0, 40))
  }, 'warn');
}
const safeContent = sanitize(post.content);
// safeContent is wrapped in <untrusted-content> tags

// Secret redaction: scrub outbound messages
const { redact } = require('./shared/secret-redaction.js');
const message = `Debug: token=${process.env.BOT_TOKEN}`;
const safe = redact(message);
// safe: "Debug: token=842735...[REDACTED]"

// Patterns matched:
//   AWS:       AKIA[0-9A-Z]{16}
//   Anthropic: sk-ant-[...]{20+}
//   Telegram:  \d{8,10}:AA[...]{30+}
//   Crypto:    0x[0-9a-fA-F]{64}
//   JWT:       eyJ[...].eyJ[...].[...]
```
Result
All Moltbook content now passes through injection detection before processing. All outbound messages are redacted before being sent to Telegram. Event logs auto-redact via the shared module. Zero dependencies: pure regex matching on Node.js builtins.
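For illustration, the redaction step and its wiring into the logger might look something like the sketch below. It is an assumption-laden example rather than the real `secret-redaction.js`: only five of the 14 patterns are shown, the character classes elided as `[...]` in the pattern list are guesses, and `formatLogEntry`, the `logPath` parameter, and the `events.log` default are hypothetical names introduced here.

```javascript
const fs = require('fs');

// Hypothetical subset of patterns; elided character classes are assumed.
const SECRET_PATTERNS = [
  /AKIA[0-9A-Z]{16}/g,                                      // AWS access key ID
  /sk-ant-[A-Za-z0-9_-]{20,}/g,                             // Anthropic-style key
  /\d{8,10}:AA[A-Za-z0-9_-]{30,}/g,                         // Telegram bot token
  /0x[0-9a-fA-F]{64}/g,                                     // 32-byte hex private key
  /eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g,  // JWT
];

// Replace each match with its first 6 characters plus a redaction marker.
function redact(text) {
  return SECRET_PATTERNS.reduce(
    (out, regex) => out.replace(regex, (match) => `${match.slice(0, 6)}...[REDACTED]`),
    String(text)
  );
}

// Serialize first, then redact the whole line, so nested values are covered too.
function formatLogEntry(event, data, level = 'info') {
  const entry = { ts: new Date().toISOString(), level, event, data };
  return redact(JSON.stringify(entry));
}

// Append one redacted JSON line per event; logPath is an assumed parameter.
function logEvent(event, data, level = 'info', logPath = 'events.log') {
  fs.appendFileSync(logPath, formatLogEntry(event, data, level) + '\n');
}
```

Redacting the serialized line rather than individual fields keeps the logger simple and covers secrets buried in nested objects. The trade-off is that any long hex string, such as a transaction hash, is redacted as well.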