AI System Privacy Audit: Secrets, Tokens, and API Keys¶
System in scope: doc_quality_compliance_check — application secret management (SECRET_KEY, DATABASE_URL, ANTHROPIC_API_KEY, PERPLEXITY_API_KEY), session token lifecycle, password hashing, recovery token flow, bootstrap credential provisioning, and Docker Compose database credentials.
1. System Diagram¶
Secrets-relevant architecture facts used in this risk sheet:
- All sensitive configuration values are loaded from environment variables via Pydantic BaseSettings (config.py); a .env file is supported for local development (.gitignored).
- SECRET_KEY: used as the HMAC secret for hashing session tokens and password recovery tokens. Default value is "change-me-in-production". A model_validator in Settings raises ValueError if the default is detected when environment == "production", but not in staging.
- DATABASE_URL: embeds database username and password in a connection string; default contains a CHANGE_ME placeholder. No secrets manager integration is implemented.
- ANTHROPIC_API_KEY / PERPLEXITY_API_KEY: optional; empty string default means the LLM path is silently skipped when not configured. No key rotation, expiry check, or scoping mechanism is implemented.
- Password storage: PBKDF2-HMAC-SHA256 with 240 000 iterations, random 16-byte salt per password, stored as pbkdf2_sha256$<iterations>$<salt_b64>$<hash_b64>. This is a sound custom implementation, but it does not use a vetted library (e.g., passlib, argon2-cffi).
- Session tokens: generated with secrets.token_urlsafe(48), HMAC-hashed (SHA-256) with SECRET_KEY before DB persistence; raw token sent to browser as HTTP-only cookie. Session table retains all rows (including revoked) indefinitely.
- Recovery tokens: generated with secrets.token_urlsafe(48), hashed with SECRET_KEY + SHA-256, stored in the password_recovery_tokens table; a development debug flag (auth_recovery_debug_expose_token) returns the raw plaintext token in the API response body, conditional on environment == "development".
- Docker Compose (docker-compose.yml): PostgreSQL service uses POSTGRES_PASSWORD: postgres, a weak, hardcoded credential in a committed file.
- Service-to-service API key (X-API-Key / Authorization: Bearer): validated against the SECRET_KEY value; the same secret serves dual duty as both the session-token HMAC key and the service-client API key.
- No secrets manager (Vault, AWS Secrets Manager, Azure Key Vault) or secret rotation mechanism is implemented.
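The session-token handling described in the list above can be sketched with the standard library alone. This is a minimal illustration, not the repository's code: the function names issue_session_token, hash_session_token, and token_matches are hypothetical, and the hex-digest encoding of the stored hash is an assumption.

```python
import hashlib
import hmac
import secrets


def hash_session_token(raw_token: str, secret_key: str) -> str:
    """Return the HMAC-SHA256 hex digest that is persisted instead of the raw token."""
    return hmac.new(
        secret_key.encode("utf-8"), raw_token.encode("utf-8"), hashlib.sha256
    ).hexdigest()


def issue_session_token(secret_key: str) -> tuple[str, str]:
    """Generate the raw token (for the cookie) and its hash (for the database row)."""
    raw_token = secrets.token_urlsafe(48)
    return raw_token, hash_session_token(raw_token, secret_key)


def token_matches(raw_token: str, stored_hash: str, secret_key: str) -> bool:
    """Recompute the hash for a presented cookie value and compare in constant time."""
    return hmac.compare_digest(hash_session_token(raw_token, secret_key), stored_hash)
```

Because the stored value depends on SECRET_KEY, changing the key makes every previously stored hash unverifiable, which is why rotation and session invalidation are coupled in this design.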
2. Data Flow Analysis¶
| Data Flow | Source | Destination | Encrypted? | Logged? | Priority |
|---|---|---|---|---|---|
| SECRET_KEY loaded from env / .env file at startup | .env file or process environment | Settings singleton in memory → session_auth.py, security.py, auth.py | Process memory (no encryption at rest in env/file) | Not logged; used as HMAC material | High |
| DATABASE_URL (with embedded password) loaded at startup | .env file or process environment | SQLAlchemy engine connection pool | TLS in-transit to DB (depends on URL param); credential in plaintext in DATABASE_URL string | Not logged directly; may appear in debug/error traces if SQLAlchemy prints the full URL | High |
| ANTHROPIC_API_KEY / PERPLEXITY_API_KEY loaded at startup | .env file or process environment | Settings singleton → model adapter → external API call header | In-transit (HTTPS/TLS); key never returned in API responses | Not logged; could appear in provider SDK debug traces if LOG_LEVEL=DEBUG | High |
| Raw session cookie issued to browser | create_server_session() → set_session_cookie() | Browser HTTP-only cookie | In-transit (TLS); HTTP-only, secure flag enforced outside development | Raw token not persisted; only session_token_hash stored in DB | High |
| Session token hash stored in PostgreSQL | session_auth.py | user_sessions table, session_token_hash column | At-rest DB controls | Not separately logged; row retained after expiry/revocation | High |
| Service-client API key validated against SECRET_KEY | X-API-Key or Authorization: Bearer header | require_api_auth() in security.py; constant-time comparison via hmac.compare_digest | In-transit (HTTPS/TLS) | Not logged; single shared secret for all service clients, no per-client identity | High |
| Raw recovery token returned in API response (development debug mode) | auth.py → POST /api/v1/auth/recovery/request | Response body: { "debug_token": "<raw>", "reset_url": "..." } | In-transit (TLS in dev only if localhost); but raw secret in plaintext HTTP body | Not logged to audit_events; only token hash and expires_at persisted | High |
| Recovery token hash stored in DB | auth.py | password_recovery_tokens table, token_hash column | At-rest DB controls | auth.recovery.requested event in audit_events | Medium |
| Bootstrap user credentials provisioned at startup | AUTH_MVP_EMAIL, AUTH_MVP_PASSWORD env vars → AppUserORM row | app_users table (password hashed with PBKDF2) | Env var source; hash stored at-rest | Startup event; plaintext password value is in env/config only | High |
| POSTGRES_PASSWORD: postgres in committed docker-compose.yml | Repository source control | Docker Compose → PostgreSQL container | Not encrypted (dev Docker network); hardcoded in version-controlled file | Committed to repository history | High |
Corrected interpretation for data privacy and GDPR¶
- GDPR does not directly regulate secrets, but secret compromise directly enables personal data breaches, which must be notified to the supervisory authority under GDPR Art. 33 unless the breach is unlikely to result in a risk to individuals. Every secret mismanagement risk in this sheet is therefore a potential GDPR breach enabler.
- The SECRET_KEY is a root-of-trust credential: if leaked, all existing session tokens can be forged (HMAC verification bypassed), all recovery tokens can be computed, and all service-client access can be replicated. This makes SECRET_KEY rotation the single highest-impact hardening action.
- The dual use of SECRET_KEY as both the session HMAC key and the service-client API key means a service-client credential leak also compromises session token integrity, and vice versa.
- The DATABASE_URL containing an embedded password is a known OWASP A02:2021 (Cryptographic Failures / Secret Exposure) pattern; if the URL appears in a stack trace, log file, or debug output, it constitutes an unintended credential disclosure.
3. Sensitive Data¶
Sensitive Data: SECRET_KEY — Root-of-Trust Session HMAC and Service API Secret¶
- Category: Cryptographic secret / system credential
- Examples: SECRET_KEY=<64-char hex> in .env; used in _hash_session_token(), _hash_recovery_token(), require_api_auth() (same value for all three purposes)
- Why Sensitive: Compromise enables session token forgery (full account takeover for any user), recovery token prediction (password reset for any user), and unauthorised service-client access to skills/* and observability/* endpoints
- Current Protection: Env var / .env file (.gitignored); production startup guard rejects default value; not logged
- Risk (or Harm) if Exposed: Full system compromise; GDPR Art. 33 breach notification required; all user sessions must be invalidated; all recovery tokens must be rotated
Sensitive Data: DATABASE_URL Embedding Plaintext Password¶
- Category: Infrastructure credential / connection string
- Examples: postgresql+psycopg2://dbuser:CHANGE_ME@localhost:5432/doc_quality; default in config.py with nosec B105 suppression; also in .env.example
- Why Sensitive: Database credential grants direct read/write access to all personal data: app_users, user_sessions, audit_events, quality_observations, bridge_human_reviews (the full system data estate)
- Current Protection: .env file excluded from version control; CHANGE_ME placeholder in default; no rotation mechanism
- Risk (or Harm) if Exposed: Direct database access bypassing all RBAC; bulk personal data exfiltration; modification or deletion of audit trail; GDPR Art. 33 major breach
Sensitive Data: ANTHROPIC_API_KEY / PERPLEXITY_API_KEY — External Provider Credentials¶
- Category: Third-party API credentials
- Examples: ANTHROPIC_API_KEY=sk-ant-...; PERPLEXITY_API_KEY=pplx-...
- Why Sensitive: Compromise enables unauthorised model usage billed to the organisation; more critically, the API key is sent in the Authorization header of every model call. If an attacker can replay this key they can submit arbitrary prompts to the provider's API, potentially exfiltrating system prompts or bypassing rate limits
- Current Protection: Env var / .env file; empty string default (LLM path disabled if not set); not logged at INFO level
- Risk (or Harm) if Exposed: Unauthorised model API usage and cost; prompt injection via replayed API calls; no per-call or per-context scoping of the key
Sensitive Data: POSTGRES_PASSWORD: postgres Hardcoded in docker-compose.yml¶
- Category: Infrastructure credential committed to version control
- Examples: POSTGRES_PASSWORD: postgres on line 12 of docker-compose.yml
- Why Sensitive: Hardcoded weak credential is committed to the repository; any person with repository read access has the development database password; if the same password is accidentally reused in staging or production (common operational error), database access is trivially compromised
- Current Protection: Intended for local development only; Docker network not exposed beyond localhost in default config
- Risk (or Harm) if Exposed: Credential reuse in non-development environments; repository access = database credential; OWASP A07:2021 (Identification and Authentication Failures)
Sensitive Data: Recovery Token Debug Exposure (auth_recovery_debug_expose_token)¶
- Category: Authentication token exposed in API response body
- Examples: { "debug_token": "<raw_token_urlsafe_48>", "reset_url": "http://localhost:3000/reset-access?token=..." } returned by POST /api/v1/auth/recovery/request when AUTH_RECOVERY_DEBUG_EXPOSE_TOKEN=true
- Why Sensitive: A raw, single-use password-reset token returned in plaintext in an HTTP response body is a high-value target; if the flag is accidentally enabled outside a development sandbox (e.g., in a shared staging environment), any observer of the response (log aggregator, proxy, developer tooling) captures a live reset credential
- Current Protection: Conditional on environment == "development" AND auth_recovery_debug_expose_token == True; default is False
- Risk (or Harm) if Exposed: Account takeover via captured recovery token; GDPR Art. 33 breach notification required if a real user's account is affected
4. Privacy Risks¶
Risk 1: SECRET_KEY serves dual purpose — session HMAC and service-client API key¶
- Priority: High
- Risk Category: Cryptographic key management and separation of concerns
- GDPR Reference: Art. 32 — appropriate technical measures; Art. 33 — breach notification (key compromise = system-wide breach)
- Potential Harm/Impact: A single leaked SECRET_KEY value simultaneously: (a) allows session token forgery for any user account, (b) allows recovery token prediction enabling password reset for any user, (c) grants service-client access to skills/* and observability/* endpoints. There is no way to rotate service-client credentials without also invalidating all active user sessions, so no surgical incident response is possible
- Ability to Implement Control: High
- Recommended controls:
  - Introduce separate secrets: SESSION_HMAC_KEY (session token HMAC), RECOVERY_HMAC_KEY (recovery token HMAC), SERVICE_API_KEY (service-client authentication). These are three independent values, independently rotatable.
  - Add a SECRET_KEY_VERSION or key ID mechanism so that token hashes can identify which key generation they were produced with, enabling zero-downtime rotation.
  - Document a key rotation runbook: how to rotate each key independently, what sessions are invalidated, and how service clients are notified.
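One way the key ID mechanism recommended above could work is to prefix each stored hash with the ID of the key that produced it. The sketch below is an assumption-laden illustration: the SESSION_KEYS keyring, the v1/v2 IDs, the `<key_id>$<digest>` storage format, and all function names are hypothetical, and the example key values stand in for secrets that would be injected from the environment.

```python
import hashlib
import hmac

# Hypothetical keyring: key ID -> secret. Real values would come from separate
# environment variables (e.g. SESSION_HMAC_KEY), never from source code.
SESSION_KEYS = {"v1": b"old-session-key", "v2": b"new-session-key"}
CURRENT_KEY_ID = "v2"


def hash_token(raw_token: str, key_id: str = CURRENT_KEY_ID) -> str:
    """Hash a token with the named key and record the key ID in the stored value."""
    digest = hmac.new(
        SESSION_KEYS[key_id], raw_token.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return f"{key_id}${digest}"


def candidate_hashes(raw_token: str) -> list[str]:
    """All stored values a presented token could match, one per active key (for DB lookup)."""
    return [hash_token(raw_token, key_id) for key_id in SESSION_KEYS]


def verify_token(raw_token: str, stored: str) -> bool:
    """Verify against the key generation named in the stored value."""
    key_id, _, _ = stored.partition("$")
    if key_id not in SESSION_KEYS:
        return False  # key generation retired: treat the session as invalid
    return hmac.compare_digest(hash_token(raw_token, key_id), stored)
```

In this arrangement new sessions are hashed with the current key, sessions created under the previous key keep verifying until that key is removed from the keyring, and removing a key invalidates exactly the sessions it produced.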
Risk 2: No startup guard prevents use of default SECRET_KEY in staging¶
- Priority: High
- Risk Category: Configuration hardening and environment parity
- GDPR Reference: Art. 32 — security of processing; Art. 25 — data protection by design
- Potential Harm/Impact: The model_validator in Settings only rejects the default "change-me-in-production" when environment == "production". A staging environment with environment = "staging" and the default key is fully operational: all sessions are signed with a publicly known key value, making every user session trivially forgeable
- Ability to Implement Control: High
- Recommended controls:
  - Extend the validate_security_defaults validator to also reject default secret values when environment in ("staging", "production").
  - Add the same guard to AUTH_MVP_PASSWORD (CHANGE_ME_BEFORE_USE) and the DATABASE_URL password (CHANGE_ME) for non-development environments.
  - Consider a CI pre-deployment check that reads the environment's SECRET_KEY and fails the pipeline if it matches any known default.
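The extended guard could look roughly like the following. It is a framework-free sketch rather than the project's Pydantic validator: DeploymentConfig is a hypothetical stand-in for Settings, and the set of guarded environments is an assumption taken from the recommendation above. The same function could be called from a CI pre-deployment step.

```python
from dataclasses import dataclass

KNOWN_DEFAULT_SECRETS = {"change-me-in-production", "CHANGE_ME_BEFORE_USE"}
GUARDED_ENVIRONMENTS = {"staging", "production"}  # assumed list of non-development targets


@dataclass(frozen=True)
class DeploymentConfig:
    """Hypothetical stand-in for the relevant fields of Settings."""
    environment: str
    secret_key: str
    auth_mvp_password: str
    database_url: str


def validate_security_defaults(cfg: DeploymentConfig) -> None:
    """Raise ValueError if a placeholder secret is used in a guarded environment."""
    if cfg.environment not in GUARDED_ENVIRONMENTS:
        return
    problems = []
    if cfg.secret_key in KNOWN_DEFAULT_SECRETS:
        problems.append("SECRET_KEY uses a known default value")
    if cfg.auth_mvp_password in KNOWN_DEFAULT_SECRETS:
        problems.append("AUTH_MVP_PASSWORD uses a known default value")
    if "CHANGE_ME" in cfg.database_url:
        problems.append("DATABASE_URL still contains the CHANGE_ME placeholder")
    if problems:
        raise ValueError("; ".join(problems))
```

Collecting all problems before raising means a misconfigured deployment reports every placeholder in one failure instead of one per restart.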
Risk 3: DATABASE_URL embeds plaintext database password in a config string¶
- Priority: High
- Risk Category: Credential exposure in configuration — OWASP A02:2021
- GDPR Reference: Art. 32 — appropriate technical measures; Art. 33 — breach enabler
- Potential Harm/Impact: SQLAlchemy may include the full DATABASE_URL in exception messages and debug output; if LOG_LEVEL=DEBUG or an unhandled exception is logged, the database password appears in the log stream and potentially in the log aggregator (see Risk Sheet: Structured Logging, Risk 3); the placeholder CHANGE_ME in config.py and .env.example risks being reused directly
- Ability to Implement Control: Medium
- Recommended controls:
  - Use a secrets manager (Vault, AWS Secrets Manager, Azure Key Vault) to inject DATABASE_URL at runtime rather than sourcing from a .env file.
  - As an intermediate measure, never log the raw URL string: render it with SQLAlchemy's URL.render_as_string(hide_password=True) before it reaches any log or error message. Setting hide_parameters=True on the engine additionally keeps bound statement parameters out of logs and StatementError messages, but it does not by itself mask the password in the URL.
  - Add a startup validator: if DATABASE_URL contains the literal string CHANGE_ME, reject startup with a clear error in all non-development environments.
  - Use PostgreSQL's PGPASSFILE (.pgpass) or PGPASSWORD environment separation instead of embedding the password in the URL.
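Where the URL has to be written to a log or error message, a redaction helper keeps the password out of the output. The sketch below uses only the standard library; redact_db_url is a hypothetical helper name, and the `***` mask is an arbitrary choice.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_db_url(url: str) -> str:
    """Return the URL with any password replaced by '***' so it is safe to log."""
    parts = urlsplit(url)
    if parts.password is None:
        return url  # nothing to hide
    # Rebuild the network location without the password.
    # Note: bracketed IPv6 hosts are not handled by this simple sketch.
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    netloc = f"{parts.username or ''}:***@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(redact_db_url("postgresql+psycopg2://dbuser:CHANGE_ME@localhost:5432/doc_quality"))
# prints: postgresql+psycopg2://dbuser:***@localhost:5432/doc_quality
```

A helper like this protects only the code paths that call it; driver or framework output that prints the original string still needs the log-level and exception-handling controls listed above.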
Risk 4: Hardcoded POSTGRES_PASSWORD: postgres committed to version control¶
- Priority: High
- Risk Category: Credential committed to source control — OWASP A07:2021
- GDPR Reference: Art. 32 — security of processing
- Potential Harm/Impact: Any developer or automated system (CI/CD, GitHub Actions, Dependabot) with repository read access has the development database credential; if this password is reused, even accidentally, in a non-development environment, the database is trivially accessible; the credential is also in repository history and cannot be removed without a full git filter-branch / BFG rewrite
- Ability to Implement Control: High
- Recommended controls:
  - Replace the hardcoded value with an environment variable reference in docker-compose.yml: POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-postgres}. The postgres fallback stays visible in the committed file but applies only when no override is supplied; the stricter form ${POSTGRES_PASSWORD:?POSTGRES_PASSWORD must be set} makes Compose fail instead of falling back.
  - Add a docker-compose.override.yml or .env.docker file (both .gitignored) for developer-specific values.
  - Add a pre-commit hook or CI secret-scanning step (e.g., trufflehog, gitleaks) to prevent future hardcoded secrets from reaching the repository.
Risk 5: Recovery token debug exposure flag — risk of misuse outside development sandbox¶
- Priority: High
- Risk Category: Debug backdoor in authentication flow
- GDPR Reference: Art. 32 — security of processing; Art. 33 — breach notification if exploited
- Potential Harm/Impact: AUTH_RECOVERY_DEBUG_EXPOSE_TOKEN=true returns a live, single-use password-reset token in the HTTP response body. If set on a shared staging environment (shared with QA engineers, product reviewers, or frontend developers), any developer with access to browser developer tools or a shared proxy captures a live credential for any user who requests a password reset during that period
- Ability to Implement Control: High
- Recommended controls:
  - Add an additional guard: refuse to activate this flag if more than one user record exists in app_users (i.e., only safe in single-user bootstrap scenarios).
  - Add a startup warning log entry when this flag is enabled: logger.warning("SECURITY_WARNING: auth_recovery_debug_expose_token is enabled — DO NOT USE IN SHARED ENVIRONMENTS").
  - Remove the flag from .env.example entirely so it is never accidentally copied into a shared environment configuration.
  - Consider replacing this debug pattern with a test-only token endpoint that is compiled out of production builds.
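The first two controls above can be combined into a single decision function. This is a minimal sketch under stated assumptions: may_expose_recovery_token and the logger name are hypothetical, and user_count is assumed to be the row count of app_users supplied by the caller.

```python
import logging

logger = logging.getLogger("doc_quality.security")  # hypothetical logger name


def may_expose_recovery_token(environment: str, flag_enabled: bool, user_count: int) -> bool:
    """Allow the debug response only in development with at most one user record."""
    if not flag_enabled:
        return False
    allowed = environment == "development" and user_count <= 1
    if allowed:
        logger.warning(
            "SECURITY_WARNING: auth_recovery_debug_expose_token is enabled; "
            "do not use in shared environments"
        )
    else:
        logger.error(
            "auth_recovery_debug_expose_token ignored: environment=%s, users=%d",
            environment,
            user_count,
        )
    return allowed
```

Evaluating the guard per request rather than only at startup means the debug path closes as soon as a second user is created.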
Risk 6: No API key scoping, rotation, or per-client identity for service-to-service authentication¶
- Priority: Medium
- Risk Category: Service credential lifecycle and least-privilege
- GDPR Reference: Art. 25 — data protection by design; Art. 32 — security of processing
- Potential Harm/Impact: A single SECRET_KEY-based API key authenticates all service clients (CrewAI orchestrator, automation tools, test harnesses) with the service role. There is no per-client identity, no expiry, no revocation mechanism, and no audit trail distinguishing which service client made a given request. If the key is shared with a third-party integration or leaked from one service, all service clients are compromised simultaneously
- Ability to Implement Control: Medium
- Recommended controls:
  - Issue per-client API keys (e.g., ORCHESTRATOR_API_KEY, AUTOMATION_API_KEY) each stored separately; validate against a key registry rather than a single shared secret.
  - Assign each service client a unique actor_id (e.g., orchestrator-v1, ci-automation) in the require_api_auth response so that service-client requests are attributable in audit_events.
  - Add key expiry and a rotation schedule (e.g., 90-day rotation enforced via startup validator).
Risk 7: PBKDF2 password hashing is a custom implementation without a vetted library¶
- Priority: Medium
- Risk Category: Cryptographic implementation risk
- GDPR Reference: Art. 32 — appropriate technical measures
- Potential Harm/Impact: The custom PBKDF2-HMAC-SHA256 implementation in passwords.py is functionally correct (240 000 iterations, 16-byte random salt, HMAC-based constant-time comparison), but custom cryptographic code carries implementation risk: the iteration count is hardcoded (not configurable for future increases), there is no algorithm migration path if PBKDF2 needs to be replaced with Argon2id, and the format string pbkdf2_sha256$... mimics Django's format but is not interoperable with standard tools
- Ability to Implement Control: High
- Recommended controls:
  - Replace the custom implementation with passlib (with argon2 backend) or argon2-cffi directly; both are vetted, widely used libraries with built-in rehash and migration support.
  - If PBKDF2 must be retained, make iterations a configurable setting (PASSWORD_HASH_ITERATIONS) and add a startup validator that rejects values below 600 000, the figure the OWASP Password Storage Cheat Sheet has recommended for PBKDF2-HMAC-SHA256 since 2023. The current 240 000 is below that figure.
  - Add a password rehash-on-login path: when a user successfully logs in and their stored hash uses outdated parameters, silently re-hash with current parameters.
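If PBKDF2 is retained, configurable iterations and a rehash-on-login check fit naturally into the pbkdf2_sha256$<iterations>$<salt_b64>$<hash_b64> format described earlier. The following is a standard-library sketch, not the contents of passwords.py: the function names are hypothetical, and CURRENT_ITERATIONS = 600 000 is an assumed setting following the OWASP Password Storage Cheat Sheet figure for PBKDF2-HMAC-SHA256.

```python
import base64
import hashlib
import hmac
import os

SCHEME = "pbkdf2_sha256"
CURRENT_ITERATIONS = 600_000  # assumed value of a PASSWORD_HASH_ITERATIONS setting


def _b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


def hash_password(password: str, iterations: int = CURRENT_ITERATIONS) -> str:
    """Derive a hash with a fresh 16-byte salt and encode it with its parameters."""
    salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"{SCHEME}${iterations}${_b64(salt)}${_b64(derived)}"


def verify_password(password: str, stored: str) -> bool:
    """Re-derive with the stored salt and iteration count; compare in constant time."""
    try:
        scheme, iterations, salt_b64, hash_b64 = stored.split("$")
        rounds = int(iterations)
        salt, expected = base64.b64decode(salt_b64), base64.b64decode(hash_b64)
    except ValueError:
        return False  # malformed stored value
    if scheme != SCHEME:
        return False
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, rounds)
    return hmac.compare_digest(derived, expected)


def needs_rehash(stored: str, target_iterations: int = CURRENT_ITERATIONS) -> bool:
    """True when the stored hash was produced with fewer iterations than the target."""
    return int(stored.split("$")[1]) < target_iterations
```

On a successful login the caller would check needs_rehash(stored) and, if it returns True, replace the stored value with hash_password(password) while the plaintext is still in memory.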
5. Cross-Sheet Consistency¶
| Control Area | Related Risk Sheet | Alignment Required |
|---|---|---|
| SECRET_KEY rotation impacts all active sessions | Risk Sheet 2 (RBAC, Risk 2) | Session table purge (recommended in sheet 2) must be coordinated with key rotation: revoke all sessions before rotating SECRET_KEY |
| DATABASE_URL password in log traces | Risk Sheet 3b (Structured Logging, Risk 2) | DEBUG log guard must cover SQLAlchemy engine initialisation path to prevent URL (with password) from appearing in debug output |
| ANTHROPIC_API_KEY / PERPLEXITY_API_KEY in provider SDK debug output | Risk Sheet 3b (Structured Logging, Risk 2) | Provider SDK log suppression must cover API key header values |
| Service-client API key: single shared secret | Risk Sheet 2 (RBAC, Risk 5) | Per-client API keys enable attribution of service-client access to observability/* (Risk 5 sheet 2); dependent fix |
| Recovery token debug flag | Risk Sheet 3b (Structured Logging) | If debug_token is ever accidentally logged (e.g., by a request-body logger middleware), it constitutes a credential exposure in the log stream; body logging must never capture auth endpoint responses |
| POSTGRES_PASSWORD in docker-compose | Risk Sheet 1 (Model Providers, Risk 1) | On-prem model migration requires a private network DB configuration; the same docker-compose pattern must not be used for production DB connectivity |
Additional information from the repo¶
Brief inventory of sensitive configuration and credential surfaces in this repository. Do not commit real values; use .env (gitignored) or your platform’s secret store. Defaults in example files and docker-compose.yml are for local development only.
Environment variables (main API)¶
Loaded via src/doc_quality/core/config.py (Settings) and .env.example.
| Name | Role |
|---|---|
| SECRET_KEY | Session signing, cookie security context, and API auth for routes using require_api_auth (X-API-Key or Authorization: Bearer … must match this value). |
| DATABASE_URL | DB connection string; embeds DB user password in the URL. |
| AUTH_MVP_EMAIL / AUTH_MVP_PASSWORD | MVP bootstrap user (dev/demo); password must meet app policy (≥12 chars in production paths). |
| AUTH_MVP_ROLES / AUTH_MVP_ORG | RBAC/org binding for the MVP user (not cryptographic secrets, but identity policy). |
| ANTHROPIC_API_KEY | Optional LLM provider key for document enrichment paths. |
| PERPLEXITY_API_KEY | Optional key for live regulatory research (research_service, MCP). |
Related non-secret toggles that affect exposure of recovery material: AUTH_RECOVERY_DEBUG_EXPOSE_TOKEN (must stay off outside development).
Environment variables (standalone orchestrator)¶
Loaded via services/orchestrator/src/doc_quality_orchestrator/config.py (OrchestratorSettings).
| Name | Role |
|---|---|
| API_SECRET_KEY | Protects orchestrator HTTP endpoints (X-API-Key / Bearer); empty disables enforcement (see orchestrator main.py). |
| ANTHROPIC_API_KEY | Provider key for crew/scaffold LLM calls in the orchestrator process. |
| NEMOTRON_API_KEY | Optional key when Nemotron endpoints are configured. |
| BACKEND_BASE_URL | Not a secret by itself, but must align with how the main API expects orchestrator callbacks; avoid leaking internal URLs in client-facing configs. |
HTTP headers (runtime secrets, not in repo)¶
| Header | Use |
|---|---|
| X-API-Key | Same value as SECRET_KEY (API) or API_SECRET_KEY (orchestrator), depending on service. |
| Authorization: Bearer <token> | Alternate form of the same shared secrets above. |
| X-Request-ID / X-Correlation-ID / X-Trace-ID | Correlation identifiers (not secrets; listed in observability docs). |
Persisted secrets (database, not environment)¶
| Surface | Notes |
|---|---|
| Session cookies (dq_session) | Opaque cookie; server stores hashed session token (UserSessionORM.session_token_hash). |
| User passwords | Stored as hashes only (AppUserORM.password_hash). |
| Password recovery | Single-use recovery flow uses hashed tokens in password_recovery_tokens (raw token only sent to user, not stored). |
IDE / local tooling¶
| Location | Secret |
|---|---|
| .vscode/mcp.json | References ${env:PERPLEXITY_API_KEY} for the Perplexity MCP server (key stays in the shell environment, not in the file). |
Frontend¶
frontend/.env.local.example defines public origins (NEXT_PUBLIC_*) only—no API keys. The browser authenticates with httpOnly cookies issued by the backend after login.
Development defaults (rotate for anything real)¶
| Source | What |
|---|---|
| docker-compose.yml | POSTGRES_PASSWORD=postgres for the local Postgres container. |
| Settings / OrchestratorSettings code defaults | Placeholder change-me-in-production strings; production startup fails if SECRET_KEY is left at the API default. |
Test fixtures¶
.env.test contains intentionally weak, committed test-only values (e.g. SECRET_KEY=test-api-key). Use only under pytest, never in deployed environments.
