AI System Privacy Audit: Role-Based Access Controls and GDPR Compliance
System in scope: doc_quality_compliance_check (backend API, session auth, RBAC layer, PostgreSQL user/session storage).
1. System Diagram
RBAC-relevant architecture facts used in this risk sheet:
- The backend exposes a FastAPI application with two authentication modes: browser users via HTTP-only session cookies (email/password login) and service clients via an `X-API-Key` / `Authorization: Bearer` header.
- Role enforcement uses a `require_roles(...)` dependency on every protected route. Roles defined: `qm_lead`, `architect`, `riskmanager`, `auditor`, `service`.
- The `service` role is restricted to explicitly machine-to-machine routes (`/api/v1/skills/*`, `/api/v1/observability/*`); it is no longer a blanket bypass.
- Two routes are currently unauthenticated: `/api/v1/dashboard/*` and `/api/v1/templates/*`.
- Session data (email, roles, org, `last_seen_at`) is persisted to PostgreSQL. Bootstrap/MVP credentials are configured via environment variables.
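
The role-gate behaviour described above can be sketched without any framework. This is a minimal illustration, not the repository's `require_roles` implementation: the function name `is_authorized` and its parameters are assumptions, and only the role names and the `allow_service` opt-in come from this risk sheet.

```python
SERVICE_ROLE = "service"
# Human roles listed in this risk sheet.
HUMAN_ROLES = frozenset({"qm_lead", "architect", "riskmanager", "auditor"})


def is_authorized(caller_roles, allowed_roles, allow_service=False):
    """Return True if the caller may use a route gated by `allowed_roles`.

    Hypothetical helper: the real dependency may be structured differently.
    """
    caller = set(caller_roles)
    # Human roles pass when they intersect the route's allowed set.
    if caller & (set(allowed_roles) & HUMAN_ROLES):
        return True
    # The service role passes only on routes that opt in explicitly,
    # so it is never a blanket bypass.
    return allow_service and SERVICE_ROLE in caller
```

Under this sketch, a `service` caller is rejected on a route gated for `qm_lead` unless that route sets `allow_service=True`, and an undefined role such as `viewer` never matches.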
2. Data Flow Analysis
| Data Flow | Source | Destination | Encrypted? | Logged? | Priority |
|---|---|---|---|---|---|
| Login request (email + password) | Browser / API client | `POST /api/v1/auth/login` → session store (PostgreSQL) | In-transit (HTTPS/TLS) | Auth event (success/failure, timestamp, email); login throttle state | High |
| Session cookie issued to browser | FastAPI auth route | Browser (HTTP-only `Set-Cookie`) | In-transit (TLS); cookie is HTTP-only, secure flag enforced outside dev | Session record in PostgreSQL (token hash, email, roles, org, expiry) | High |
| Authenticated request (cookie or API key) | Browser / service client | FastAPI route → `require_roles` dependency | In-transit (HTTPS/TLS) | `last_seen_at` updated in session row; access control outcome not separately audit-logged | High |
| Service-client auth (API key / Bearer) | Orchestrator or automation tool | FastAPI `/skills/*` or `/observability/*` endpoints | In-transit (HTTPS/TLS) | API key check in `require_api_auth`; no per-call access log entry | High |
| Unauthenticated dashboard access | Any caller (no credentials required) | `GET /api/v1/dashboard/*` | In-transit (HTTPS/TLS) | No auth check; no identity logged | Medium |
| Unauthenticated template access | Any caller (no credentials required) | `GET /api/v1/templates/*` | In-transit (HTTPS/TLS) | No auth check; no identity logged | Medium |
| Bootstrap user provisioning | Environment config (`AUTH_MVP_*` vars) | FastAPI startup → `app_users` table | Config-level (env vars / secrets manager) | Startup event; credential in env var outside runtime log | High |
| Role and org resolution (session row) | PostgreSQL session table | `resolve_user_from_cookie` / `require_roles` | At-rest DB controls | Implicit (session row lookup); no dedicated access-decision log | Medium |
| Session revocation on logout | `DELETE /api/v1/auth/logout` | PostgreSQL (`is_revoked = True`) | In-transit (HTTPS/TLS) | Logout event in session row state; no separate audit-trail entry | Medium |
Corrected interpretation for RBAC and GDPR
- The primary GDPR boundary in the auth flow is the persistence of identity data (`user_email`, `user_org`, `user_roles`, `last_seen_at`) in the session table; minimisation and retention obligations apply.
- The `viewer` role appears in tests (`test_auth_authorization_api.py`) but is not in the documented implemented role set. A `viewer` caller receives `403` on all tested routes, meaning this role effectively has zero access, a potential misconfiguration gap.
- The `service` role hardening (no blanket bypass) is a positive control. Residual risk: the `observability/*` endpoint remains service-accessible and can return rich trace payloads containing personal data (see also Risk Sheet 1, model trace over-collection).
- The two unauthenticated routes expose application data without identity context, which is inconsistent with the GDPR accountability and access-control principles (GDPR Art. 5(1)(f), Art. 25).
3. Sensitive Data
Sensitive Data: User Identity in Session Store
- Category: Personal data (identified natural persons) — GDPR Art. 4(1)
- Examples: `user_email` (primary identifier), `user_org`, session `session_id`, `expires_at`, `last_seen_at`
- Why Sensitive: Directly identifies users and links their organisational role to access patterns and audit trails; retained in PostgreSQL with no visible TTL-based purge policy
- Current Protection: Server-side session with hashed token; DB access controls; session expiry and revocation support
- Risk (or Harm) if Exposed: Unauthorised disclosure of user identity and role assignments; GDPR breach; profiling of users from access patterns
Sensitive Data: Role Assignments and Permission Scope
- Category: Attributes linked to natural persons — implicit GDPR personal data when combined with identity
- Examples: `user_roles` array per session row (e.g., `["qm_lead"]`, `["auditor"]`), org isolation field `user_org`
- Why Sensitive: Reveals organisational responsibilities and access privileges; can be used for social engineering or targeted attacks; GDPR data minimisation applies
- Current Protection: Stored in session row; resolved per request by `require_roles`; role set validated at route boundary
- Risk (or Harm) if Exposed: Privilege mapping; lateral movement by attacker with partial DB access; GDPR accountability gap if role-to-action mapping is not audited
Sensitive Data: Bootstrap/MVP Credentials in Environment Configuration
- Category: Authentication secrets and provisioning data
- Examples: `AUTH_MVP_EMAIL`, `AUTH_MVP_PASSWORD`, `AUTH_MVP_ROLES`, `AUTH_MVP_ORG` (env vars); `SECRET_KEY` (API key secret)
- Why Sensitive: Compromise of bootstrap credentials gives an attacker a fully provisioned account with configurable roles; `SECRET_KEY` grants service-client access to `skills/*` and `observability/*`
- Current Protection: Environment-variable configuration, kept out of source code
- Risk (or Harm) if Exposed: Full account takeover; unrestricted access to trace data; GDPR breach; credential reuse risk if same password is used across environments
Sensitive Data: Access Decision and Audit Context Not Separately Persisted
- Category: Audit/traceability gap — GDPR Art. 5(2) accountability
- Examples: Which role accessed which route at what time; 403 access denials; service-client route usage with payload summary
- Why Sensitive: Absence of an access-decision audit log prevents retrospective investigation of data-access incidents; it also weakens the evidence available to support the GDPR Art. 30 Record of Processing Activities and breach response
- Current Protection: `last_seen_at` updated on session lookup (coarse-grained); login throttle state tracked per email/IP
- Risk (or Harm) if Exposed: Inability to detect or evidence unauthorised access; weakened GDPR breach-response capability
4. Privacy Risks
Risk 1: Unauthenticated routes expose application data without identity context
- Priority: High
- Risk Category: Access control — missing authentication
- GDPR Reference: Art. 5(1)(f) — integrity and confidentiality; Art. 25 — data protection by design
- Potential Harm/Impact: Any network-accessible caller can read dashboard data and templates without establishing identity; GDPR accountability principle violated (no record of who accessed what); potential leak of structural compliance artefacts or meta-information
- Ability to Implement Control: High
- Recommended controls:
  - Add a `require_authenticated_user` dependency to all `/api/v1/dashboard/*` and `/api/v1/templates/*` routes.
  - If anonymous read-only access is intentional for templates, scope it to non-personal, non-compliance-sensitive data only and document the deliberate design choice in the privacy notice.
  - Add access logging (caller IP or session ID + timestamp) for these routes until auth is enforced.
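
One way to keep this gap from reappearing is a small audit check, for example in CI, that flags every route without an auth dependency unless it is on a documented public allowlist. The sketch below is framework-agnostic and rests on assumptions: `find_unprotected_routes`, the `ROUTES` inventory, and the concrete sub-paths (`summary`, `list`) are hypothetical and not taken from the codebase.

```python
from fnmatch import fnmatchcase


def find_unprotected_routes(routes, public_allowlist=()):
    """List routes that lack an auth dependency and are not documented as public.

    `routes` maps a path to True when an auth dependency is attached.
    """
    return sorted(
        path
        for path, has_auth in routes.items()
        if not has_auth
        and not any(fnmatchcase(path, pattern) for pattern in public_allowlist)
    )


# Hypothetical route inventory; the concrete paths are illustrative only.
ROUTES = {
    "/api/v1/documents": True,
    "/api/v1/dashboard/summary": False,
    "/api/v1/templates/list": False,
}

# With no allowlist, both unauthenticated routes are reported.
unprotected = find_unprotected_routes(ROUTES)
```

If anonymous template access is kept deliberately, passing `["/api/v1/templates/*"]` as the allowlist makes that decision explicit and leaves only the dashboard route flagged.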
Risk 2: Session table retains personal data (email, org, `last_seen_at`) without visible purge policy
- Priority: High
- Risk Category: Data retention and minimisation — GDPR Art. 5(1)(e) and Art. 25
- GDPR Reference: Art. 5(1)(e) — storage limitation; Art. 13/14 — data subject transparency on retention
- Potential Harm/Impact: Expired or revoked sessions (`is_revoked = True`) remain in `app_user_sessions` with full personal data indefinitely; contradicts the GDPR storage limitation principle; complicates data-subject erasure requests (Art. 17)
- Ability to Implement Control: High
- Recommended controls:
  - Implement a scheduled job (e.g., nightly) to hard-delete session rows where `is_revoked = True OR expires_at < NOW() - <grace_period>`.
  - Define and document a maximum retention period for session records (e.g., 30 days post-expiry).
  - On GDPR erasure request, immediately revoke and delete all session rows for the subject's email.
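
The purge and erasure controls above can be sketched as two small functions. This is an illustration under stated assumptions, not the project's job: it uses an in-memory SQLite table in place of PostgreSQL so that it runs standalone, the column layout of `app_user_sessions` is simplified, `expires_at` is stored as epoch seconds, and the 30-day grace period is the example value from the text. It follows the literal condition, so revoked rows are deleted without a grace period.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(days=30)  # assumption: example retention window


def demo_sessions_db():
    """Create a simplified, hypothetical stand-in for app_user_sessions."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE app_user_sessions ("
        "session_id TEXT PRIMARY KEY, user_email TEXT, "
        "is_revoked INTEGER, expires_at REAL)"
    )
    return conn


def purge_sessions(conn, now, grace=GRACE_PERIOD):
    """Hard-delete revoked rows and rows expired longer ago than `grace`."""
    cutoff = (now - grace).timestamp()
    cur = conn.execute(
        "DELETE FROM app_user_sessions WHERE is_revoked = 1 OR expires_at < ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount


def erase_subject(conn, email):
    """Delete every session row for one data subject (Art. 17 request)."""
    cur = conn.execute(
        "DELETE FROM app_user_sessions WHERE user_email = ?", (email,)
    )
    conn.commit()
    return cur.rowcount
```

A nightly scheduler would call `purge_sessions(conn, datetime.now(timezone.utc))`; passing `now` explicitly keeps the cutoff testable.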
Risk 3: No dedicated access-decision audit log (GDPR accountability gap)
- Priority: High
- Risk Category: Audit trail and accountability — GDPR Art. 5(2), Art. 30
- GDPR Reference: Art. 5(2) — accountability; Art. 32 — security of processing; Art. 33 — breach notification readiness
- Potential Harm/Impact: `last_seen_at` on session lookup provides a coarse activity signal but does not record which route was accessed, what role was used, or whether a 403 was returned; in a breach scenario, no evidence trail exists for forensic investigation or Data Protection Authority reporting
- Ability to Implement Control: High
- Recommended controls:
  - Add a FastAPI middleware or route-level dependency that writes structured access-decision log entries: `session_id`, `email`, `roles`, `method`, `path`, `status_code`, `timestamp`.
  - Include 403 denial events in the log (route attempted, role required vs role held).
  - Protect this log with the same retention and access controls as the main audit-trail table.
  - Cross-reference: access-decision log entries should link to the `audit_events` `correlation_id` where available.
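
A structured access-decision entry with the fields listed above might look like the following sketch. It uses only the standard library; the logger name `access_decision`, the function names, and the `required_roles` key are assumptions for illustration, and wiring it into FastAPI middleware or a dependency is left out.

```python
import json
import logging
from datetime import datetime, timezone

access_log = logging.getLogger("access_decision")  # hypothetical logger name


def access_decision_entry(*, session_id, email, roles, method, path,
                          status_code, required_roles=None,
                          correlation_id=None, now=None):
    """Build one access-decision record as a plain dict."""
    entry = {
        "session_id": session_id,
        "email": email,
        "roles": sorted(roles),
        "method": method,
        "path": path,
        "status_code": status_code,
        "timestamp": (now or datetime.now(timezone.utc)).isoformat(),
    }
    if status_code == 403:
        # Denials also record what the route required vs what was held.
        entry["required_roles"] = sorted(required_roles or [])
    if correlation_id is not None:
        entry["correlation_id"] = correlation_id
    return entry


def log_access_decision(**fields):
    """Emit the record as one JSON line and return that line."""
    line = json.dumps(access_decision_entry(**fields), sort_keys=True)
    access_log.info(line)
    return line
```

Because the entry itself contains `email`, it is personal data and falls under the same retention and access controls as the audit trail.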
Risk 4: `viewer` role defined in test code but absent from authorised role set creates misconfiguration risk
- Priority: Medium
- Risk Category: RBAC gap and role lifecycle governance
- GDPR Reference: Art. 25 — data protection by design; Art. 32 — appropriate technical measures
- Potential Harm/Impact: A caller holding the `viewer` role receives `403` on all tested routes, but the role is not defined in the canonical role registry. If a user is assigned `viewer` (e.g., through bootstrap config or future user management), they receive zero access, which may be the intended behaviour but is undocumented. Alternatively, if a future route is added and `viewer` is not excluded, it could gain unintended access
- Ability to Implement Control: High
- Recommended controls:
  - Define an explicit canonical role registry (enum or constants file) listing all valid roles with their intended permission scope and whether they are human or machine roles.
  - Add a startup assertion or test that no undefined role names can be provisioned via bootstrap config.
  - Document whether `viewer` is a planned future role (read-only access) or should be removed entirely.
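
A canonical registry with a bootstrap validation step could be sketched as below. The enum values are the roles documented in this risk sheet; the `Role` class, `MACHINE_ROLES`, `parse_bootstrap_roles`, and the assumption that `AUTH_MVP_ROLES` is a comma-separated string are all hypothetical.

```python
from enum import Enum


class Role(str, Enum):
    """Canonical role registry (hypothetical layout)."""
    QM_LEAD = "qm_lead"
    ARCHITECT = "architect"
    RISKMANAGER = "riskmanager"
    AUDITOR = "auditor"
    SERVICE = "service"


# Machine-to-machine roles; every other role is a human role.
MACHINE_ROLES = frozenset({Role.SERVICE})


def parse_bootstrap_roles(raw):
    """Parse a comma-separated role list and reject undefined names.

    Assumes AUTH_MVP_ROLES is comma-separated; adjust if the real format differs.
    """
    names = [name.strip() for name in raw.split(",") if name.strip()]
    valid = {role.value for role in Role}
    unknown = [name for name in names if name not in valid]
    if unknown:
        raise ValueError(f"Undefined role(s) in bootstrap config: {unknown}")
    return [Role(name) for name in names]
```

Calling this at startup means a value such as `viewer` fails fast instead of silently provisioning an account with zero access.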
Risk 5: Service-client (`X-API-Key`) can access observability endpoints containing personal data in traces
- Priority: Medium
- Risk Category: Least-privilege and machine-identity access to personal data
- GDPR Reference: Art. 25 — data protection by design (least privilege); Art. 32 — appropriate technical measures
- Potential Harm/Impact: The `observability/*` endpoint is explicitly marked `allow_service=True`, meaning an automated orchestrator or any bearer-token holder can retrieve rich trace payloads that may contain personal data from documents and prompts (see Risk Sheet 1, Risk 2). Service clients have no session identity or org context, making attribution difficult
- Ability to Implement Control: Medium
- Recommended controls:
  - Apply pre-response redaction to observability payloads served to service clients: strip `prompt_text`, `output_text`, and any fields flagged as personal-data-bearing before returning.
  - Log service-client access to observability endpoints with an `X-API-Key` identifier hash and query parameters in the access-decision log.
  - Consider restricting service-client access to operational metadata only (latency, status, counts), not the rich trace payload.
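
The redaction and key-hashing controls above can be illustrated with two helpers. This is a sketch under assumptions: the trace payload shape is unknown, so the function walks nested dicts and lists generically; `redact_trace` and `api_key_fingerprint` are hypothetical names; and truncating the SHA-256 digest to 16 hex characters is an arbitrary choice for log readability.

```python
import hashlib

# Field names taken from the control text above.
REDACTED_FIELDS = frozenset({"prompt_text", "output_text"})


def redact_trace(payload, extra_fields=()):
    """Return a copy of `payload` with personal-data-bearing fields removed."""
    blocked = REDACTED_FIELDS | set(extra_fields)
    if isinstance(payload, dict):
        return {
            key: redact_trace(value, extra_fields)
            for key, value in payload.items()
            if key not in blocked
        }
    if isinstance(payload, list):
        return [redact_trace(item, extra_fields) for item in payload]
    return payload


def api_key_fingerprint(api_key):
    """Short, non-reversible identifier for logging; never log the key itself."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()[:16]
```

The function builds a new structure rather than mutating its input, so the stored trace is unchanged and only the response served to service clients is reduced.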
Risk 6: Bootstrap/MVP credential pattern must not persist to production
- Priority: Medium
- Risk Category: Credential management and production hardening
- GDPR Reference: Art. 32 — appropriate technical measures for security of processing
- Potential Harm/Impact: If the `AUTH_MVP_EMAIL` / `AUTH_MVP_PASSWORD` env vars are reused or not rotated between environments, a staging credential compromise grants production access; bootstrap accounts may accumulate active sessions that are never revoked; the default password (`CHANGE_ME_BEFORE_USE`, visible in test code) presents a brute-force risk if left in place
- Ability to Implement Control: High
- Recommended controls:
  - Enforce a startup check: if `AUTH_MVP_ENABLED=true` and the environment is `production`, reject startup unless the password has been explicitly changed from the default.
  - Document the bootstrap account as a break-glass account: rotate immediately after initial provisioning, restrict to a dedicated `bootstrap` role with minimal scope, and disable after handover.
  - Use a secrets manager (e.g., Vault, AWS Secrets Manager) for all production credential injection rather than plain environment variables.
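
The startup check in the first control could be sketched as follows. It is a minimal illustration: `check_bootstrap_config` is a hypothetical function, the `ENVIRONMENT` variable name is an assumption (the risk sheet does not say how the environment is identified), and the default password string is the one reported as visible in test code.

```python
DEFAULT_PASSWORD = "CHANGE_ME_BEFORE_USE"


def check_bootstrap_config(env):
    """Raise RuntimeError if production would start with an unsafe MVP password.

    `env` is any mapping of settings, for example os.environ.
    """
    enabled = env.get("AUTH_MVP_ENABLED", "false").lower() == "true"
    # Assumption: the deployment environment is exposed as ENVIRONMENT.
    environment = env.get("ENVIRONMENT", "development").lower()
    if not (enabled and environment == "production"):
        return
    password = env.get("AUTH_MVP_PASSWORD", "")
    if not password or password == DEFAULT_PASSWORD:
        raise RuntimeError(
            "AUTH_MVP_ENABLED=true in production with a missing or default "
            "AUTH_MVP_PASSWORD; refusing to start"
        )
```

Calling `check_bootstrap_config(os.environ)` before the application is created turns a silent misconfiguration into a failed deployment.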
5. RBAC Consistency with Risk Sheet 1 (Model Providers)
The following cross-cutting controls are shared with the model-provider risk set and must remain consistent:
| Control Area | Risk Sheet 1 Finding | Risk Sheet 2 Alignment Required |
|---|---|---|
| Observability endpoint access | Over-retained trace data with personal data | Service-client access to observability must apply same redaction policy (sheet 1, Risk 2) |
| Audit trail completeness | Model/version metadata per run | Access-decision log must link to same correlation_id as audit events |
| Least-privilege principle | Prompt/output access restricted by role | Role enforcement on observability and skills endpoints must not be wider than role enforcement on document/compliance routes |
| HITL approval for high-impact outputs | HITL role required for approval actions | HITL-approver role must map to a defined RBAC role (qm_lead or auditor) with explicit route permission — no anonymous or service-client approval permitted |
Additional information from the repo
What exists
- RBAC: `require_roles(...)` gates API routes (e.g. documents, skills, research).
- Cookies: Session cookie is `HttpOnly`, `SameSite=lax`, `Secure` enforced outside development (`session_auth.py`, `config.py`).
- `AuthenticatedUser` includes `org` from session (`user_sessions.user_org`).
- `audit_events`, `audit_schedules`, `LogEventRequest`: optional `tenant_id` / `org_id` fields for labeling.
Document and HITL scope
- `SkillDocumentORM` has no `org_id` or `tenant_id` column.
- `search_documents` (`src/doc_quality/services/skills_service.py`) queries all `skill_documents` rows (filters: type, optional text search against `extracted_text` / filename), capped by `limit`.
- `GET /api/v1/documents` uses `search_documents` with an empty query, so it returns up to 100 documents for any authenticated user with an allowed role, with no filter by `user.org`.
- HITL (`hitl_workflow.py`): no org/tenant parameters in queries.
Conclusion: The codebase matches a single-tenant / shared-database MVP: isolation is role-based, not organization-row-level. Multi-customer SaaS would need schema + query changes and consistent propagation of org from JWT/session into every read/write path.
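
To make the required change concrete, the sketch below shows what an organisation-scoped document query might look like. It is purely illustrative: the `org_id` column does not exist in the current schema, `search_documents_for_org` is a hypothetical name, and an in-memory SQLite table with a reduced column set stands in for the real `skill_documents` table.

```python
import sqlite3


def demo_documents_db():
    """Hypothetical schema: skill_documents extended with an org_id column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE skill_documents (filename TEXT, org_id TEXT)")
    conn.executemany(
        "INSERT INTO skill_documents VALUES (?, ?)",
        [("a.pdf", "org-a"), ("b.pdf", "org-b"), ("c.pdf", "org-a")],
    )
    return conn


def search_documents_for_org(conn, org_id, limit=100):
    """Return at most `limit` documents that belong to `org_id`."""
    return conn.execute(
        "SELECT filename FROM skill_documents "
        "WHERE org_id = ? ORDER BY filename LIMIT ?",
        (org_id, limit),
    ).fetchall()
```

In this shape the caller's org would come from the authenticated session rather than from a request parameter, so a client cannot ask for another organisation's rows.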
