Contextual AI AppSec Review

Map your application's real attack surface first. Then give that evidence to AI - instead of hoping it guesses correctly from raw code.

Why Generic AI Scans Fall Short

Feeding a codebase into an LLM and asking it to find vulnerabilities produces findings that are technically plausible but contextually wrong. The model doesn't know which inputs are stored, which routes are public-facing, or what your secrets are named - so it guesses. Most of what it returns is noise, and the real issues get buried in it.

This process is designed to minimize false positives. Before any AI analysis runs, we use custom scripts to build a structured, factual picture of what your application actually accepts, stores, and exposes. That evidence becomes the model's working context - not the full source code, but a precise inventory of what matters for security.

What changes with structured context

When AI is given verified data about specific inputs, confirmed data flows, and mapped configuration exposure, it can reason about actual paths rather than theoretical ones. Findings reference real code locations. False positives drop significantly. Critical issues surface clearly instead of being diluted by generic warnings.

Generic Scan vs. Contextual Review

The same AI model produces very different results depending on what you give it.

  • Input to the model - Generic: raw source code or file paths. Contextual: a structured inventory of inputs, flows, and config exposure.
  • Data flow awareness - Generic: inferred, often incorrectly. Contextual: verified programmatically before analysis.
  • Finding specificity - Generic: pattern-matched from training data. Contextual: tied to actual code locations and variable names.
  • False positive rate - Generic: high, because validation context is missing. Contextual: lower, because the model sees what is actually adjacent to each input.
  • Secret detection - Generic: pattern matching on naming conventions. Contextual: pattern matching plus runtime exposure analysis.
  • Review vectors - Generic: whatever the model prioritizes. Contextual: OWASP Top 10 applied one category at a time to structured data.

How It Works

Five sequential steps, each building on the output of the previous one. The first four are programmatic. The fifth is where AI analysis runs.

Step 1: Build the ground truth

Custom scripts parse your backend schemas - Prisma schema files, raw SQL migrations - to produce a master list of every defined database column. A separate scan reads the source code for every external data entry point: REST route parameters, JSON request bodies, GraphQL resolver arguments, and incoming webhooks.

The result is two inventories: what the database expects, and what the application accepts from the outside world.

→ Output: database column list, external input list
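The two scans in this step can be sketched roughly as follows. This is an illustrative approximation, not the actual tooling: the regexes are deliberately naive stand-ins for real parsers, and the function names are invented for the sketch.

```typescript
// Illustrative sketch only: extract column names from a Prisma schema and
// external input names from Express-style route code using simple regexes.
// A production version would use real parsers instead of pattern matching.

function parsePrismaColumns(schema: string): string[] {
  const columns: string[] = [];
  // Grab each `model Name { ... }` block, then take the first word of each field line.
  const blocks = schema.match(/model\s+\w+\s*\{[^}]*\}/g) || [];
  for (const block of blocks) {
    const lines = block.split("\n").slice(1, -1); // drop the model header and closing brace
    for (const line of lines) {
      const field = line.trim().split(/\s+/)[0];
      if (field && field.indexOf("@@") !== 0) columns.push(field); // skip model-level attributes
    }
  }
  return columns;
}

function findRouteInputs(source: string): string[] {
  const inputs: string[] = [];
  const add = (name: string) => {
    if (inputs.indexOf(name) < 0) inputs.push(name);
  };
  // Route parameters such as "/users/:id" and JSON body reads such as req.body.email.
  for (const m of source.match(/:(\w+)/g) || []) add(m.slice(1));
  for (const m of source.match(/req\.body\.(\w+)/g) || []) add(m.slice("req.body.".length));
  return inputs;
}
```

Run over a route like `app.post("/users/:id", ...)` that reads `req.body.email`, the second scan yields one entry per distinct external input.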
Step 2: Classify every input

The two inventories are cross-referenced. Each external input is compared against the column list and placed into one of two categories:

  • Persistent - matches a database column; data this application intends to store. Injection attacks and unauthorized writes live here.
  • Volatile - no column match; data consumed in transit or reflected back to the client. Reflected vulnerabilities live here.

Knowing which category an input belongs to means the AI can be asked targeted questions about it, rather than one blanket query that treats all inputs the same way.

→ Output: classified input inventory
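In code, the cross-reference is little more than a membership check. This sketch assumes both inventories are plain string lists; the names are hypothetical.

```typescript
// Illustrative sketch: mark each external input persistent (matches a DB
// column) or volatile (no match). Names are hypothetical, not from any tool.
type InputClass = "persistent" | "volatile";

function classifyInputs(
  externalInputs: string[],
  dbColumns: string[]
): { [input: string]: InputClass } {
  const result: { [input: string]: InputClass } = {};
  for (const input of externalInputs) {
    // Case-insensitive match, since route params and columns often differ in casing.
    const match = dbColumns.some((c) => c.toLowerCase() === input.toLowerCase());
    result[input] = match ? "persistent" : "volatile";
  }
  return result;
}
```

An input like `email` that matches a column lands in the persistent bucket; a transient `searchTerm` with no column match lands in the volatile bucket.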
Step 3: Verify the data flows

Classification alone is not enough. For each identified input, the tooling extracts 30–40 lines of surrounding code - enough to see what actually happens to the value after it arrives. Pattern matching then looks for concrete storage operations within that window: ORM queries, raw SQL adapters, Redis writes, Kafka producer calls.

Two discrepancies are flagged automatically:

  • A persistent input with no storage operation in its execution window
  • A volatile input that appears in a database write

The goal is to replace assumptions about what the code does with verification of what it actually does.

→ Output: verified data flow map with code context
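The window check can be approximated like this. The storage patterns shown are a deliberately small, illustrative subset, and the helper names are invented for the sketch.

```typescript
// Illustrative sketch: look for storage operations in the code window around
// an input, and flag mismatches between classification and observed behavior.
const STORAGE_PATTERNS: RegExp[] = [
  /\bprisma\.\w+\.(create|update|upsert)/, // ORM writes
  /\bquery\s*\(/,                          // raw SQL adapter calls
  /\bredis\.(set|hset|lpush)/,             // Redis writes
  /\bproducer\.send\s*\(/,                 // Kafka producer calls
];

// Take roughly 30 lines of context around the input's entry point.
function extractWindow(lines: string[], entryLine: number, radius: number = 15): string {
  const start = Math.max(0, entryLine - radius);
  return lines.slice(start, entryLine + radius + 1).join("\n");
}

function flagDiscrepancy(
  classification: "persistent" | "volatile",
  window: string
): string | null {
  const stored = STORAGE_PATTERNS.some((p) => p.test(window));
  if (classification === "persistent" && !stored) {
    return "persistent input with no storage operation in its window";
  }
  if (classification === "volatile" && stored) {
    return "volatile input appears in a storage write";
  }
  return null; // classification and observed behavior agree
}
```

A persistent input whose window contains `prisma.user.create(...)` passes; the same input whose window only reflects the value back via `res.json(...)` gets flagged.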
Step 4: Map the configuration surface

Pattern matching scans for embedded environment variables, Vault PKI paths, hardware security module references, and financial credentials. Each extracted value is assessed for exposure risk based on naming conventions and where it ends up at runtime.

Two categories receive immediate flags:

  • Framework-specific variables with client-side prefixes (e.g., NEXT_PUBLIC_) that contain backend secrets
  • Hardcoded bearer strings or authentication tokens that appear directly in source without referencing an environment variable

Hardcoded secrets survive credential rotation and persist in version history. This step surfaces them explicitly before AI analysis begins.

→ Output: configuration exposure inventory
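A first pass at the two flags above might look like the following. The secret-name heuristic and the bearer-token pattern are illustrative simplifications, not the production matching rules.

```typescript
// Illustrative sketch: flag (1) client-exposed variables whose names look
// like backend secrets and (2) bearer tokens hardcoded in source. The
// name heuristic is deliberately crude.
const SECRET_NAME = /(secret|token|key|password|credential)/i;

function scanConfigExposure(source: string): string[] {
  const flags: string[] = [];
  // Client-side prefix carrying a secret-looking name, e.g. NEXT_PUBLIC_API_SECRET.
  for (const m of source.match(/NEXT_PUBLIC_\w+/g) || []) {
    if (SECRET_NAME.test(m.slice("NEXT_PUBLIC_".length))) {
      flags.push("client-exposed secret: " + m);
    }
  }
  // Hardcoded bearer string with no environment-variable reference on the line.
  for (const line of source.split("\n")) {
    if (/Bearer\s+[A-Za-z0-9_\-.]{16,}/.test(line) && line.indexOf("process.env") < 0) {
      flags.push("hardcoded bearer token: " + line.trim());
    }
  }
  return flags;
}
```

A token built from `process.env` at runtime passes; a literal bearer string baked into the source gets flagged, along with any `NEXT_PUBLIC_` variable whose name suggests a backend secret.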
Step 5: Run structured AI analysis

The three datasets - classified inputs, verified data flows, configuration exposure - become the model's working context. They are fed in logical chunks sized to the model's context window. The model does not receive the full codebase.

Analysis runs against specific review vectors rather than a broad "find vulnerabilities" prompt. For web applications, this means the OWASP Top 10 is applied one category at a time: broken access control, injection, insecure design, and so on. Each pass asks the model to reason about the structured data against that specific threat class.

Findings produced this way are tied to actual code locations, actual data flows, and actual configuration exposures. Each one can be followed from the entry point to the flagged destination using the extracted code context.

→ Output: finding report with code locations and data flow evidence
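The chunk-and-iterate loop can be sketched as below. A character budget stands in for real token counting, the category list is truncated, and the actual model call is omitted; everything here is illustrative structure, not the production prompts.

```typescript
// Illustrative sketch: pack evidence lines into context-sized chunks and
// build one prompt per OWASP category per chunk.
const OWASP_CATEGORIES = [
  "A01 Broken Access Control",
  "A03 Injection",
  "A04 Insecure Design",
];

function chunkEvidence(lines: string[], budget: number): string[][] {
  const chunks: string[][] = [[]];
  let size = 0;
  for (const line of lines) {
    const current = chunks[chunks.length - 1];
    if (size + line.length > budget && current.length > 0) {
      chunks.push([line]); // start a new chunk when the budget would overflow
      size = line.length;
    } else {
      current.push(line);
      size += line.length;
    }
  }
  return chunks;
}

function buildPrompts(evidence: string[], budget: number): string[] {
  const prompts: string[] = [];
  for (const chunk of chunkEvidence(evidence, budget)) {
    for (const category of OWASP_CATEGORIES) {
      // One focused pass per threat class, never a broad "find vulnerabilities" ask.
      prompts.push(
        "Review this verified input and flow evidence for " +
          category +
          " only:\n" +
          chunk.join("\n")
      );
    }
  }
  return prompts;
}
```

With two chunks and three categories, the loop produces six focused passes rather than one diffuse scan.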

Reading the Output

The analysis will surface real problems. It will also surface findings that look alarming but aren't - because validation logic lives in middleware outside the extracted window, or because the AI reasoned incorrectly from incomplete information. Three questions sort them out.

Does the data path actually connect?

Trace the input from the entry point to the flagged destination. If a condition, type check, validation call, or permission gate sits between them, the finding needs further investigation before it can be confirmed.

Is the path reachable without authentication?

A vulnerability on an admin-only endpoint is serious. The same vulnerability on a public webhook is critical. Findings behind specific permission requirements carry lower urgency than findings on unauthenticated routes.

Does the fix already exist elsewhere in the codebase?

Check how similar inputs on similar routes are handled. If the flagged instance is the outlier, it is a real gap. If it matches how everything else is handled, the analysis likely missed shared handling code - a false positive.

What a Real Finding Looks Like

A confirmed finding has three characteristics. Any finding missing one of these warrants closer inspection before action.

Specific location

The finding names the file, function, or route where the input enters the application. Not a class of vulnerability in the abstract - a concrete place in the code.

Confirmed data path

The value can be traced from its entry point to the flagged destination using the extracted code context, with nothing between them that inspects, transforms, or restricts it.

No contradicting code nearby

The surrounding 30–40 lines contain no validation calls, middleware references, or type checks that would handle the input before it reaches the vulnerable operation.

What This Process Covers

The methodology is framework-agnostic. It works wherever the underlying patterns exist in the source.

Input sources

  • REST route parameters
  • JSON request bodies
  • GraphQL resolver arguments
  • Webhook payloads
  • Query strings and headers

Storage targets

  • ORM queries (Prisma, Sequelize, TypeORM)
  • Raw SQL adapters
  • Redis writes
  • Kafka producer calls
  • File system writes

Configuration types

  • Environment variables
  • Vault PKI paths
  • HSM references
  • Financial credentials
  • Hardcoded auth tokens

Start with a Mapping Session

We walk you through the process the first time, build the necessary automation for your stack, and establish a reusable review workflow your team can run on each release.

Schedule a Mapping Session