Model swaps expose number parsing bugs
Problem / Context
The verification solver parsed 'thirty four' as [30, 4] instead of [34]. The bug stayed hidden under Opus 4.6, which self-corrected. Swapping to Minimax M2.7 exposed it: 5 failed verifications in one cycle.
Solution
Added a post-processing merge step: scan the nums array for adjacent tens+ones pairs and merge each pair into a single compound value. Also added all compound words (twentyone=21 through ninetynine=99) to the lookup table. Insight: model swaps stress-test tooling. Bugs that a capable model silently compensates for may surface under a different model.
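The compound-word entries do not have to be typed by hand. A minimal sketch of generating them, assuming the lookup table is a plain dict keyed by the fused spelling (the names TENS_WORDS, ONES_WORDS and COMPOUND_WORDS are hypothetical, not the solver's actual identifiers):

```python
# Hypothetical names for illustration; the solver's real lookup table may differ.
TENS_WORDS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
ONES_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9,
}

# Build every fused spelling, e.g. "twentyone" -> 21, "ninetynine" -> 99.
COMPOUND_WORDS = {
    tens_word + ones_word: tens_value + ones_value
    for tens_word, tens_value in TENS_WORDS.items()
    for ones_word, ones_value in ONES_WORDS.items()
}
```

This yields 72 entries (8 tens words times 9 ones words), which can then be merged into the existing table.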
Implementation
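```python
TENS = {20, 30, 40, 50, 60, 70, 80, 90}
ONES = {1, 2, 3, 4, 5, 6, 7, 8, 9}

# nums is the list of integers produced by the word parser
merged = []
i = 0
while i < len(nums):
    # A tens value directly followed by a ones value is one compound number
    if i + 1 < len(nums) and nums[i] in TENS and nums[i + 1] in ONES:
        merged.append(nums[i] + nums[i + 1])
        i += 2  # skip both halves of the pair
    else:
        merged.append(nums[i])
        i += 1
nums = merged
```
Result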
5 failures in one cycle → 0 after fix. Post-merge runs in <1ms with no false positives.
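To keep the fix from regressing, the merge step from the Implementation section can be wrapped in a function and checked against the original failure case. A minimal sketch, assuming a hypothetical function name merge_tens_ones (not necessarily what the solver uses):

```python
TENS = {20, 30, 40, 50, 60, 70, 80, 90}
ONES = {1, 2, 3, 4, 5, 6, 7, 8, 9}


def merge_tens_ones(nums):
    """Return a new list with each adjacent (tens, ones) pair summed.

    Hypothetical wrapper around the merge loop; the name is illustrative.
    """
    merged = []
    i = 0
    while i < len(nums):
        if i + 1 < len(nums) and nums[i] in TENS and nums[i + 1] in ONES:
            merged.append(nums[i] + nums[i + 1])
            i += 2  # consume both halves of the pair
        else:
            merged.append(nums[i])
            i += 1
    return merged


# Regression checks, including the original 'thirty four' failure
assert merge_tens_ones([30, 4]) == [34]
assert merge_tens_ones([30, 40]) == [30, 40]    # tens followed by tens: no merge
assert merge_tens_ones([4, 30]) == [4, 30]      # ones before tens: no merge
assert merge_tens_ones([20, 1, 90, 9]) == [21, 99]
assert merge_tens_ones([]) == []
```

The function treats any tens value directly followed by a ones value as one number, so these checks are worth re-running whenever the model or the parser changes.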