The honest tell · Passive webdriver flags
What the test does
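import { test, expect } from '@playwright/test';

test('Level 1 sign in', async ({ page }) => {
  // Relative URL: resolved against baseURL from the Playwright config.
  await page.goto('/bot-detection/level-1/');
  await page.getByLabel('Email').fill('[email protected]');
  await page.getByLabel('Password').fill('hunter2');
  await page.getByRole('button', { name: 'Sign in' }).click();
  // Passes only if the page's bot checks let the sign-in through.
  await expect(page.getByText('Access granted')).toBeVisible();
});
What Playwright sees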
Error: expect(locator).toBeVisible() failed
Locator: getByText('Access granted')
Expected: visible
Received: hidden
Timeout: 5000ms
Plain-English explanation
Browsers volunteer a lot about themselves to every site they visit: what version they are, which plugins are installed, whether they are being controlled by an automation program. When Playwright drives a browser, the browser honestly admits "I am being automated" through a flag called navigator.webdriver that any site can read in a single line of JavaScript. Stock Playwright also has no plugins installed, no notification permissions set, and identifies itself as "HeadlessChrome" in its User-Agent string. Each of these is a yes/no question a site can ask in milliseconds.
The browser inside a VNC session is a regular, fully-fledged Chrome that a regular user started. Nothing is automating it from the inside — the automation happens outside the browser, at the operating-system level, by moving a mouse and pressing keys on a remote desktop. The browser does not know it is being driven, so none of these flags get set, and it reports back the same values any real human visitor would.
Playwright context: could this test be fixed in Playwright?
2/5 · Stealth-plugin arms race
Verdict: technically patchable, but it's an arms race that the page always wins eventually.
Each of the five signals checked at this level can in principle be spoofed from Playwright:
- navigator.webdriver can be hidden via --disable-blink-features=AutomationControlled plus an addInitScript that redefines the property.
- The User-Agent can be spoofed with --user-agent="..." to strip the HeadlessChrome token.
- navigator.plugins, navigator.languages, and the Notification.permission / permissions.query pair can all be patched via Object.defineProperty in an init script.
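The patches listed above might be wired together roughly as follows. This is a minimal sketch, not a vetted evasion: stripHeadlessToken, applyPatches, buildLaunchArgs, and openPatchedPage are hypothetical helper names, the single plugin entry is a crude stand-in for a real PluginArray, the Notification.permission patch is left out, and the chromium argument is assumed to be the object exported by the playwright package.

```javascript
// Hypothetical helper: remove the "Headless" marker from a User-Agent string.
function stripHeadlessToken(userAgent) {
  return userAgent.replace('HeadlessChrome', 'Chrome');
}

// Hypothetical helper: redefine the passive flags on a navigator-like object.
// It takes its target as a parameter so it can be exercised outside a browser.
function applyPatches(nav) {
  const define = (name, value) =>
    Object.defineProperty(nav, name, { get: () => value, configurable: true });
  define('webdriver', undefined);
  // Assumption: one placeholder entry; a real PluginArray has more structure.
  define('plugins', [{ name: 'PDF Viewer' }]);
  define('languages', ['en-US', 'en']);
}

// Build the two Chrome switches named in the list above.
function buildLaunchArgs(realUserAgent) {
  return [
    '--disable-blink-features=AutomationControlled',
    `--user-agent=${stripHeadlessToken(realUserAgent)}`,
  ];
}

// Usage sketch: `chromium` is assumed to come from require('playwright').
async function openPatchedPage(chromium, realUserAgent) {
  const browser = await chromium.launch({ args: buildLaunchArgs(realUserAgent) });
  const context = await browser.newContext();
  // addInitScript accepts a script string; it runs before any page script.
  await context.addInitScript(`(${applyPatches.toString()})(navigator);`);
  return context.newPage();
}
```

Passing the target object into applyPatches keeps the patch logic testable against a plain object; inside the page, the serialized function is simply invoked with the real navigator.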
Off-the-shelf stealth bundles (playwright-extra + puppeteer-extra-plugin-stealth) ship most of these patches already. The catch: new Chrome releases keep introducing new tells, and commercial bot-detection vendors (Cloudflare, DataDome, PerimeterX, Imperva) maintain fingerprint databases of known stealth-plugin signatures. You spend more time updating your evasions than writing tests, and you only ever win temporarily.
AIVA context: what would need to change in AIVA to pass this
1/5 5/5
AIVA fails this level because of one signal: navigator.webdriver = true. AIVA launches Chrome via Puppeteer in aiva-node/src/control-server/src/browser.ts:204 (puppeteer.launch({...})), and Chrome sets this flag itself when it is started with the automation switches Puppeteer passes by default (such as --enable-automation).
The pragmatic fix is a single init script. Add this to AIVA's page-setup flow (e.g., next to the existing hideCursorScript wiring):
// Registered once per page; runs before any of the site's own scripts.
await page.evaluateOnNewDocument(() => {
  Object.defineProperty(navigator, 'webdriver', {
    get: () => undefined,
    configurable: true,
  });
});
Bot Arena's L1 check is literally navigator.webdriver === true → FAIL, so returning undefined makes the check pass. Stealth plugins (puppeteer-extra-plugin-stealth, playwright-extra-stealth, etc.) apply patches of the same kind.
The original "multi-week refactor" estimate was for the architecturally pure fix: replacing Puppeteer/CDP entirely with a non-CDP control plane. That's the right answer if you need to pass sophisticated bot-detection vendors that fingerprint the shape of navigator.webdriver (own vs prototype descriptor, getter behaviour, etc.). For Bot Arena and most "naive equality check" detection layers, the six-line patch above is sufficient.
Trade-off: the init-script patch is detectable by sites that audit property descriptors. If AIVA's target customers operate sites with enterprise-grade detection, the architectural path becomes the right long-term investment. For this demo and a wide class of real-world cases, the patch is the right answer today.
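A site that audits property descriptors could notice the patch with a check along these lines. This is a sketch resting on one stated assumption: in an unmodified Chrome, webdriver is an accessor defined on Navigator.prototype, so the navigator object itself has no own property of that name. The function name webdriverLocation and the stand-in objects are hypothetical.

```javascript
// Report where the `webdriver` property lives on a navigator-like object.
function webdriverLocation(nav) {
  if (Object.getOwnPropertyDescriptor(nav, 'webdriver')) {
    return 'own'; // defined directly on the object, as the init-script patch does
  }
  const proto = Object.getPrototypeOf(nav);
  if (proto && Object.getOwnPropertyDescriptor(proto, 'webdriver')) {
    return 'prototype'; // where an unpatched browser is assumed to keep it
  }
  return 'missing';
}

// Stand-ins for a real navigator, so the sketch runs outside a browser.
const fakePrototype = { get webdriver() { return false; } };
const untouched = Object.create(fakePrototype);
const patched = Object.create(fakePrototype);
Object.defineProperty(patched, 'webdriver', {
  get: () => undefined,
  configurable: true,
});

console.log(webdriverLocation(untouched)); // prototype
console.log(webdriverLocation(patched));   // own
```

Both objects would pass a plain equality check on the property's value; only the descriptor location tells them apart, which is why the patch holds up against simple checks and not against this kind of audit.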
Why it failed — Detection Log
- fail webdriver — navigator.webdriver = true
- fail plugins — navigator.plugins.length = 0 (expected > 0)
- pass languages — navigator.languages = [en-US]
- fail ua-headless — User-Agent contains "HeadlessChrome/148.0.7778.96"
- pass notif-permission — Notification.permission and permissions.query agreed
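
The five entries in the log could come from checks shaped roughly like the sketch below. Bot Arena's actual implementation is not shown in this write-up, so everything here is an assumption: the function name runPassiveChecks, the snapshot field names, the shortened User-Agent string, and the example permission values. A real page would read these values from navigator, Notification, and navigator.permissions.query().

```javascript
// Evaluate the five passive signals against a snapshot of browser values.
function runPassiveChecks(snapshot) {
  // Notification.permission reports "default" where permissions.query reports "prompt".
  const expectedQueryState =
    snapshot.notificationPermission === 'default'
      ? 'prompt'
      : snapshot.notificationPermission;
  return [
    { name: 'webdriver', pass: snapshot.webdriver !== true },
    { name: 'plugins', pass: snapshot.pluginCount > 0 },
    { name: 'languages', pass: Array.isArray(snapshot.languages) && snapshot.languages.length > 0 },
    { name: 'ua-headless', pass: !snapshot.userAgent.includes('HeadlessChrome') },
    { name: 'notif-permission', pass: snapshot.queriedPermissionState === expectedQueryState },
  ];
}

// Values modelled on the log above; the permission pair is an assumed example.
const results = runPassiveChecks({
  webdriver: true,
  pluginCount: 0,
  languages: ['en-US'],
  userAgent: 'Mozilla/5.0 HeadlessChrome/148.0.7778.96',
  notificationPermission: 'default',
  queriedPermissionState: 'prompt',
});

for (const { name, pass } of results) {
  console.log(`${pass ? 'pass' : 'fail'} ${name}`);
}
```

Each check is a single comparison with no timing or behavioural analysis, which is what makes this level cheap for a site to run and cheap for a patched browser to defeat.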