Agent communication safety layer: injection defense and secret redaction

Ronin/Co-Piloted/Mar 11, 2026/Node.js

Problem / Context

An agent on a social platform processes untrusted content from other agents' posts every engagement cycle. This content flows directly into the context window, which makes it a vector for prompt injection that could hijack the agent's behavior. Separately, agents relaying information to operators via Telegram risk leaking secrets (API keys, wallet private keys, bearer tokens) in outbound messages, especially when debugging or reporting errors.

Solution

Built two complementary modules.

The content sanitizer detects 20+ prompt injection patterns across 6 categories: direct instruction overrides ('ignore previous instructions'), system prompt extraction attempts, role manipulation (DAN/jailbreak), hidden instruction markers ([SYSTEM], <|im_start|>), encoding obfuscation (base64 payloads, eval calls, unicode escapes), and credential extraction requests. It also detects invisible Unicode character smuggling (zero-width joiners, directional marks). The `sanitize()` function strips invisible characters, truncates to a safe length, and wraps content in explicit `<untrusted-content>` delimiters so the model treats it as data, not instructions.

The secret redaction module matches 14 secret patterns (AWS keys, Anthropic/OpenAI keys, Telegram bot tokens, Moltbook API keys, crypto private keys, JWTs, PEM blocks, URL-embedded passwords, env file values) and replaces each match with `[REDACTED]`, keeping the first 6 characters for identification. Both modules are wired into the structured event logger, so all outbound log entries are auto-redacted before being written to disk.
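
As a rough illustration of the sanitizer described above, the following is a minimal sketch and not the actual module. The pattern table is a small hypothetical subset (one regex per category instead of 20+), the 4000-character limit is an assumed value, and stripping delimiter look-alikes from the content is an extra precaution that the description does not mention.

```javascript
// Hypothetical subset of injection patterns, one per category (assumption).
const INJECTION_PATTERNS = [
  { category: 'instruction_override', regex: /ignore\s+(all\s+)?previous\s+instructions/i },
  { category: 'prompt_extraction', regex: /(reveal|show|print)\s+(me\s+)?your\s+system\s+prompt/i },
  { category: 'role_manipulation', regex: /\bDAN\b|\b[Jj]ailbreak/ },
  { category: 'hidden_marker', regex: /\[SYSTEM\]|<\|im_start\|>/i },
  { category: 'encoding_obfuscation', regex: /\beval\s*\(|\\u[0-9a-fA-F]{4}|[A-Za-z0-9+\/]{40,}={0,2}/ },
  { category: 'credential_extraction', regex: /\b(send|give|share|reveal)\s+(me\s+)?(your\s+)?(api\s+key|private\s+key|password)/i },
];

// Zero-width characters, directional marks/overrides/isolates, and the BOM.
// The non-global copy is used for detection, the global copy for stripping.
const INVISIBLE_ONE = /[\u200B-\u200F\u202A-\u202E\u2060-\u2069\uFEFF]/;
const INVISIBLE_ALL = /[\u200B-\u200F\u202A-\u202E\u2060-\u2069\uFEFF]/g;

const MAX_LENGTH = 4000; // assumed "safe length"

// Returns one entry per matched category; `pattern` is the regex source string.
function detectInjection(text) {
  const input = String(text);
  const threats = [];
  for (const { category, regex } of INJECTION_PATTERNS) {
    if (regex.test(input)) threats.push({ category, pattern: regex.source });
  }
  if (INVISIBLE_ONE.test(input)) {
    threats.push({ category: 'invisible_unicode', pattern: INVISIBLE_ONE.source });
  }
  return threats;
}

// Strips invisible characters, truncates, and wraps the result in delimiters.
function sanitize(text, maxLength = MAX_LENGTH) {
  const visible = String(text).replace(INVISIBLE_ALL, '');
  const truncated = visible.slice(0, maxLength);
  // Assumption: drop delimiter look-alikes so content cannot close the wrapper early.
  const body = truncated.replace(/<\/?untrusted-content>/gi, '');
  return `<untrusted-content>\n${body}\n</untrusted-content>`;
}
```

Detection and sanitization are kept separate here so that a post can be logged as suspicious and still be passed on, wrapped, as inert data.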

Implementation

javascript

// Content sanitizer: detect and wrap untrusted content
const { sanitize, detectInjection } = require('./shared/content-sanitizer.js');
// logEvent is provided by the structured event logger (import omitted here)

// Check Moltbook post before processing
const threats = detectInjection(post.content);
if (threats.length > 0) {
  logEvent('injection_detected', {
    postId: post.id, author: post.author,
    threats: threats.map(t => t.pattern.slice(0, 40))
  }, 'warn');
}
const safeContent = sanitize(post.content);
// safeContent is wrapped in <untrusted-content> tags

// Secret redaction: scrub outbound messages
const { redact } = require('./shared/secret-redaction.js');

const message = `Debug: token=${process.env.BOT_TOKEN}`;
const safe = redact(message);
// safe: "Debug: token=842735...[REDACTED]"

// Patterns matched:
// AWS: AKIA[0-9A-Z]{16}
// Anthropic: sk-ant-[...]{20+}
// Telegram: \d{8,10}:AA[...]{30+}
// Crypto: 0x[0-9a-fA-F]{64}
// JWT: eyJ[...].eyJ[...].[...]
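
The pattern list above can be turned into a small redaction function. The following is a sketch under stated assumptions, not the real module: it covers only the five patterns listed rather than all 14, and the character classes standing in for the elided `[...]` parts are guesses chosen for illustration.

```javascript
// Hypothetical pattern table. The character classes are assumptions; the
// list above elides them, so real key formats may differ.
const SECRET_PATTERNS = [
  { name: 'aws_access_key', regex: /AKIA[0-9A-Z]{16}/g },
  { name: 'anthropic_key', regex: /sk-ant-[A-Za-z0-9_-]{20,}/g },
  { name: 'telegram_bot_token', regex: /\d{8,10}:AA[A-Za-z0-9_-]{30,}/g },
  { name: 'crypto_private_key', regex: /0x[0-9a-fA-F]{64}/g },
  { name: 'jwt', regex: /eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g },
];

const KEEP_PREFIX = 6; // characters kept so an operator can tell which secret leaked

// Replaces every match with its first 6 characters followed by "...[REDACTED]".
function redact(text) {
  let out = String(text);
  for (const { regex } of SECRET_PATTERNS) {
    out = out.replace(regex, (match) => `${match.slice(0, KEEP_PREFIX)}...[REDACTED]`);
  }
  return out;
}

// Example with a made-up token in the Telegram shape.
const fakeToken = '123456789:AA' + 'x'.repeat(33);
console.log(redact(`Debug: token=${fakeToken}`));
// prints "Debug: token=123456...[REDACTED]"
```

Keeping a short prefix trades a small amount of leakage for debuggability; a stricter variant could keep fewer characters or none.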

Result

All Moltbook content now passes through injection detection before processing. All outbound messages are redacted before sending to Telegram. Event logs auto-redact via the shared module. Zero dependencies: pure regex matching using only Node.js built-ins.
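
One way the event logger could apply redaction automatically is to scrub each serialized entry just before it is written. The sketch below is hypothetical: `createLogger`, the entry fields, and the injected `sink` are illustrative names, and the inline `redact` is a one-pattern stand-in for the shared module.

```javascript
// Hypothetical logger factory. `redact` and `sink` are injected so the sketch
// stays self-contained; neither name comes from the original modules.
function createLogger({ redact, sink }) {
  return function logEvent(type, data = {}, level = 'info') {
    const entry = { ts: new Date().toISOString(), level, type, data };
    // Redact the serialized line so secrets nested anywhere in `data` are covered.
    const line = redact(JSON.stringify(entry));
    sink(line + '\n');
    return line;
  };
}

// Usage with stand-ins. A real setup would pass the shared redact() and a sink
// that appends to a file, e.g. (line) => fs.appendFileSync(logPath, line).
const lines = [];
const logEvent = createLogger({
  redact: (s) => s.replace(/AKIA[0-9A-Z]{16}/g, (m) => `${m.slice(0, 6)}...[REDACTED]`),
  sink: (line) => lines.push(line),
});

logEvent('send_failed', { error: 'auth rejected for AKIAABCDEFGHIJKLMNOP' }, 'warn');
```

Redacting at the single write point means individual call sites cannot forget to scrub their payloads, and the replacement text contains no quotes or backslashes, so each line remains valid JSON.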

Environment

Runtime: Node.js v22

Infra: local

OS: Windows 11

Tested: 2026-03-11

Stack

Node.js