Building Identity & Access Management for a Self-Hosted PaaS


## where it started

Every platform starts with simple auth. Ours was no different — a user signs up, creates a project, and has full control. Other people can be added as members with one of three roles: owner, admin, or member. A hardcoded hierarchy decided who could do what.

Then someone needed read-only access to logs. Then an agent needed deploy permissions but nothing else. Then a contractor needed temporary access to a database — not the whole project, just the database. The three-role model had nothing to say about any of this.

We needed IAM. But we also needed it to stay simple.

## the permission language

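The simplest way to express "this person can do this thing to this resource" is a triple of subject, operation, and resource. A policy statement carries the operation and the resource; the subject is attached separately through group bindings: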

operations : resource-uri

Operations are named actions — read_file, deploy_service, query_db. Resources are hierarchical paths that mirror the platform's structure:

/project/myapp/service/api
/project/myapp/file/config/database.yaml
/project/myapp/db

Globs give you range: `**` matches any depth, `*` matches a single segment:

`read_file:/project/myapp/file/secrets.env` (one file)
`*:/project/myapp/**` (every operation on everything in the project)
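The post doesn't show how patterns are matched. Here is a minimal Python sketch, assuming `**` matches zero or more whole segments (so `/project/myapp/**` also covers `/project/myapp` itself) and `*` matches exactly one; `match_resource` and `parse_statement` are hypothetical names, not the platform's code.

```python
def match_segments(pattern: list[str], path: list[str]) -> bool:
    """Match path segments against glob segments: '**' = zero or more, '*' = exactly one."""
    if not pattern:
        return not path
    head, rest = pattern[0], pattern[1:]
    if head == "**":
        # Let '**' consume 0, 1, 2, ... leading segments and try to match the rest.
        return any(match_segments(rest, path[i:]) for i in range(len(path) + 1))
    if not path:
        return False
    if head == "*" or head == path[0]:
        return match_segments(rest, path[1:])
    return False


def match_resource(pattern: str, resource: str) -> bool:
    """True if a resource URI such as /project/myapp/db falls under the glob pattern."""
    return match_segments(pattern.strip("/").split("/"), resource.strip("/").split("/"))


def parse_statement(statement: str) -> tuple[set[str], str]:
    """Split 'op1,op2:/resource/pattern' into a set of operations and a resource pattern."""
    ops, _, pattern = statement.partition(":")
    return {op.strip() for op in ops.split(",")}, pattern.strip()


ops, pattern = parse_statement("read_file:/project/myapp/file/secrets.env")
print(ops, match_resource(pattern, "/project/myapp/file/secrets.env"))  # {'read_file'} True
```

Matching on whole segments rather than raw characters is what keeps `*` from leaking across `/` boundaries.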

We explicitly decided against deny rules. Allow-only, deny by default. More verbose when you want "everything except X," but it eliminates an entire class of bugs where interacting allow and deny rules produce surprising results.

## named roles and groups

Policies get names so you don't paste operation lists repeatedly:

service-deployer : deploy_service,start_service,... : /project/myapp/service/**
project-admin : * : /project/myapp/**
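As a sketch of how a named role might be represented and checked, assuming a `*` in the operation list means "any operation" (as in `*:/project/myapp/**`): `Role`, `parse_role`, and `role_allows` are hypothetical names, and because the post elides the rest of service-deployer's list with `...`, only the two operations shown are used.

```python
from dataclasses import dataclass


def match_resource(pattern: str, resource: str) -> bool:
    """Glob match on '/'-separated segments: '**' = zero or more, '*' = exactly one."""
    def go(pat: list[str], path: list[str]) -> bool:
        if not pat:
            return not path
        if pat[0] == "**":
            return any(go(pat[1:], path[i:]) for i in range(len(path) + 1))
        return bool(path) and pat[0] in ("*", path[0]) and go(pat[1:], path[1:])
    return go(pattern.strip("/").split("/"), resource.strip("/").split("/"))


@dataclass(frozen=True)
class Role:
    name: str
    operations: frozenset[str]
    pattern: str


def parse_role(line: str) -> Role:
    """Parse 'name : op1,op2 : /resource/pattern' into a Role."""
    name, ops, pattern = (part.strip() for part in line.split(":"))
    return Role(name, frozenset(op.strip() for op in ops.split(",")), pattern)


def role_allows(role: Role, operation: str, resource: str) -> bool:
    """A role allows a request only when both the operation and the resource match."""
    op_ok = "*" in role.operations or operation in role.operations
    return op_ok and match_resource(role.pattern, resource)


deployer = parse_role("service-deployer : deploy_service,start_service : /project/myapp/service/**")
admin = parse_role("project-admin : * : /project/myapp/**")
print(role_allows(deployer, "deploy_service", "/project/myapp/service/api"))  # True
print(role_allows(deployer, "query_db", "/project/myapp/db"))                 # False
```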

Users never bind directly to roles. Everything goes through groups:

group/backend-team → role/service-deployer, role/file-viewer
user/alice → group/backend-team

This felt like unnecessary indirection at first. But team management is the common case: add someone to a group and they get the right access. Revocation is atomic too: remove them from the group and every permission that came through it disappears.
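A minimal sketch of the groups-only rule, using hypothetical in-memory dictionaries in place of the real tables: there is deliberately no method that binds a role to a user, and revoking a team's access is a single membership removal. `Bindings` and its methods are illustrative names.

```python
class Bindings:
    """Hypothetical in-memory model: users join groups, roles bind to groups, never to users."""

    def __init__(self) -> None:
        self.user_groups: dict[str, set[str]] = {}  # user -> groups
        self.group_roles: dict[str, set[str]] = {}  # group -> roles

    def add_member(self, user: str, group: str) -> None:
        self.user_groups.setdefault(user, set()).add(group)

    def remove_member(self, user: str, group: str) -> None:
        self.user_groups.get(user, set()).discard(group)

    def bind_role(self, group: str, role: str) -> None:
        self.group_roles.setdefault(group, set()).add(role)

    def roles_for(self, user: str) -> set[str]:
        """Effective roles: the union of roles bound to every group the user belongs to."""
        return {
            role
            for group in self.user_groups.get(user, set())
            for role in self.group_roles.get(group, set())
        }


iam = Bindings()
iam.bind_role("backend-team", "service-deployer")
iam.bind_role("backend-team", "file-viewer")
iam.add_member("alice", "backend-team")
print(sorted(iam.roles_for("alice")))  # ['file-viewer', 'service-deployer']
iam.remove_member("alice", "backend-team")
print(iam.roles_for("alice"))  # set()
```

The group keeps its role bindings after Alice leaves, so the next person added to backend-team gets the same access without any extra work.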

## the personal group trick

Strict "groups only" creates a problem: what about one-off permissions? Creating a group for temporary database access is absurd.

The fix: every user gets an auto-created personal group.

user/alice → group/user-alice (auto-created on registration)

One-off permissions bind to the personal group:

group/user-alice → role/temp-db-reader

The system's rule stays consistent — users only bind to groups. But it feels like giving Alice direct access. The personal group is invisible plumbing.

This was the key insight that made the whole design work. One rule, two use cases, zero exceptions.
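The personal-group pattern can be sketched the same way. This assumes the `user-<name>` naming from the example above and hypothetical method names; registration creates the group and the membership, and a one-off grant is still an ordinary group binding.

```python
class Directory:
    """Hypothetical sketch: every user gets a personal group at registration."""

    def __init__(self) -> None:
        self.user_groups: dict[str, set[str]] = {}  # user -> groups
        self.group_roles: dict[str, set[str]] = {}  # group -> roles

    @staticmethod
    def personal_group(user: str) -> str:
        return f"user-{user}"

    def register(self, user: str) -> None:
        # Auto-create the personal group and make the user a member of it.
        group = self.personal_group(user)
        self.group_roles.setdefault(group, set())
        self.user_groups.setdefault(user, set()).add(group)

    def grant_one_off(self, user: str, role: str) -> None:
        # Still a group binding, so "users only bind to groups" holds without exceptions.
        self.group_roles[self.personal_group(user)].add(role)

    def roles_for(self, user: str) -> set[str]:
        return {
            role
            for group in self.user_groups.get(user, set())
            for role in self.group_roles.get(group, set())
        }


d = Directory()
d.register("alice")
d.grant_one_off("alice", "temp-db-reader")
print(d.roles_for("alice"))  # {'temp-db-reader'}
```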

## delegation without complexity

How do you give someone permission to give permissions? You could invent meta-operations like grant_read_file, but you'd need one for every operation, and that scales horribly.

Instead, grant and revoke are just standard operations:

role/file-owner : grant,revoke : /project/myapp/file/**

The system enforces scope: you can only grant access to resources where you yourself hold grant. A file owner can delegate file access but can't grant service permissions. Two universal operations replaced what could have been dozens of meta-operations.
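The post states the scope rule but not the check. One conservative way to implement it, sketched in Python: treat the pattern being handed out as the target and verify, segment by segment, that one of the granter's `grant`-bearing roles covers it. `covers` and `can_grant` are hypothetical names, and a target covered only by the union of several granter roles is rejected, which errs on the safe side.

```python
def split(pattern: str) -> list[str]:
    return pattern.strip("/").split("/")


def covers(granter: list[str], target: list[str]) -> bool:
    """Conservative check that every path matched by `target` is also matched by `granter`."""
    if not granter:
        return not target
    head, rest = granter[0], granter[1:]
    if head == "**":
        # '**' on the granter side can absorb any run of target segments, wildcards included.
        return any(covers(rest, target[i:]) for i in range(len(target) + 1))
    if not target or target[0] == "**":
        return False  # nothing left to match, or a target '**' that '*' or a literal can't cover
    if head == "*" or head == target[0]:
        return covers(rest, target[1:])
    return False


def can_grant(granter_roles: list[tuple[set[str], str]], target_pattern: str) -> bool:
    """The granter needs a role with `grant` (or `*`) whose pattern covers the target pattern."""
    return any(
        ("grant" in ops or "*" in ops) and covers(split(pattern), split(target_pattern))
        for ops, pattern in granter_roles
    )


file_owner = [({"grant", "revoke"}, "/project/myapp/file/**")]
print(can_grant(file_owner, "/project/myapp/file/config/*"))  # True
print(can_grant(file_owner, "/project/myapp/service/**"))     # False
```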

## how a request gets authorized

When DELETE /projects/{id}/services/api hits the API:

01. Authenticate: verify the Ed25519 signature.
02. Map: translate the request to `delete_service` on `/project/myapp/service/api`.
03. Resolve: one SQL query walks user → groups → roles via JOINs.
04. Evaluate: for each role, check whether the operation matches and whether the resource matches the role's pattern.
05. Decide: first match wins, and since every rule is an allow, a match means the request is allowed. No match means denied.

Platform admins bypass all of this: `is_admin` acts as an implicit `*:/**`.
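Steps 02, 04 and 05 plus the admin bypass fit in a few lines. This Python sketch rests on assumptions: the route table, the project-ID lookup (`"42"` to `myapp`), and the names `map_request` and `authorize` are hypothetical; authentication and the SQL resolve step are skipped, and roles arrive as already-resolved `(operations, pattern)` pairs.

```python
def match_resource(pattern: str, resource: str) -> bool:
    """Glob match on '/'-separated segments: '**' = zero or more, '*' = exactly one."""
    def go(pat: list[str], path: list[str]) -> bool:
        if not pat:
            return not path
        if pat[0] == "**":
            return any(go(pat[1:], path[i:]) for i in range(len(path) + 1))
        return bool(path) and pat[0] in ("*", path[0]) and go(pat[1:], path[1:])
    return go(pattern.strip("/").split("/"), resource.strip("/").split("/"))


# Hypothetical route table: (HTTP method, URL collection) -> operation, and URL -> URI segment names.
ROUTE_OPERATIONS = {("DELETE", "services"): "delete_service"}
RESOURCE_SEGMENTS = {"services": "service"}


def map_request(method: str, path: str, project_names: dict[str, str]) -> tuple[str, str]:
    """Step 02: turn e.g. DELETE /projects/{id}/services/api into (operation, resource URI)."""
    _, project_id, collection, name = path.strip("/").split("/")
    operation = ROUTE_OPERATIONS[(method, collection)]
    resource = f"/project/{project_names[project_id]}/{RESOURCE_SEGMENTS[collection]}/{name}"
    return operation, resource


def authorize(is_admin: bool, roles: list[tuple[set[str], str]], operation: str, resource: str) -> bool:
    """Steps 04-05 plus the admin bypass. Allow-only, so the first match is an allow."""
    if is_admin:
        return True  # implicit *:/**
    for ops, pattern in roles:
        if ("*" in ops or operation in ops) and match_resource(pattern, resource):
            return True
    return False  # deny by default


op, res = map_request("DELETE", "/projects/42/services/api", {"42": "myapp"})
deployer = [({"deploy_service", "start_service"}, "/project/myapp/service/**")]
print(op, res, authorize(False, deployer, op, res))  # delete_service /project/myapp/service/api False
```

Because there are no deny rules, the order in which roles are checked never changes the outcome; it only affects how early the loop can stop.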

The whole evaluation is a single indexed query plus in-memory pattern matching. Sub-millisecond on SQLite.
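Step 03 could look like this in SQLite, sketched with Python's built-in `sqlite3` module. The schema, table and column names, and the comma-separated `operations` column are assumptions for illustration, as are the operations and pattern given to temp-db-reader; the indexes on `group_members(user_id)` and `group_roles(group_id)` are the kind that let the JOINs run as a single indexed query.

```python
import sqlite3

# Hypothetical schema; the post doesn't publish its table or column names.
SCHEMA = """
CREATE TABLE roles (id INTEGER PRIMARY KEY, name TEXT UNIQUE, operations TEXT, resource_pattern TEXT);
CREATE TABLE group_members (user_id TEXT, group_id TEXT);
CREATE TABLE group_roles (group_id TEXT, role_id INTEGER REFERENCES roles(id));
CREATE INDEX idx_group_members_user ON group_members(user_id);
CREATE INDEX idx_group_roles_group ON group_roles(group_id);
"""

# Step 03: user -> groups -> roles in one query.
RESOLVE_ROLES = """
SELECT r.name, r.operations, r.resource_pattern
FROM group_members AS gm
JOIN group_roles AS gr ON gr.group_id = gm.group_id
JOIN roles AS r ON r.id = gr.role_id
WHERE gm.user_id = ?
ORDER BY r.name
"""


def resolve_roles(conn: sqlite3.Connection, user_id: str) -> list[tuple[str, set[str], str]]:
    """Return (role name, operations, resource pattern) for every role reachable from the user."""
    rows = conn.execute(RESOLVE_ROLES, (user_id,)).fetchall()
    return [(name, set(ops.split(",")), pattern) for name, ops, pattern in rows]


def demo_connection() -> sqlite3.Connection:
    """In-memory database seeded with a team group and a personal group for alice."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO roles (id, name, operations, resource_pattern) VALUES (?, ?, ?, ?)",
        [
            (1, "service-deployer", "deploy_service,start_service", "/project/myapp/service/**"),
            (2, "temp-db-reader", "query_db", "/project/myapp/db"),
        ],
    )
    conn.executemany(
        "INSERT INTO group_members VALUES (?, ?)",
        [("alice", "backend-team"), ("alice", "user-alice")],
    )
    conn.executemany("INSERT INTO group_roles VALUES (?, ?)", [("backend-team", 1), ("user-alice", 2)])
    return conn


print(resolve_roles(demo_connection(), "alice"))
```

The resolved rows can then be fed straight into in-memory pattern matching, which is the split the post describes.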

## what we kept in mind

→ Explicit over abstract. No role inheritance, no deny rules. If something is allowed, there's a role that says so. You can always trace a permission back to a specific role bound to a specific group.

→ Personal groups solve the 80/20 problem. Team access uses regular groups. One-off access uses personal groups. These two patterns cover nearly every case.

→ Grant and revoke as operations, not meta-operations. Delegation works with the same primitives as everything else.

→ Backwards compatibility through migration, not abstraction. We migrated the old `project_members` table into IAM roles and bindings, then removed it. One source of truth, no split brain.

→ Start without caching. Two SQLite queries plus pattern matching is fast enough on a single VPS. We'll add caching when we have evidence it's needed.

## what's next

→ Time-bound permissions: `expires_at` on role bindings, for temporary access without manual revocation.

→ Policy caching: a short-lived in-memory cache to eliminate redundant queries during request bursts.

→ Condition-based access: IP ranges and time of day, without changing the core schema.
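For the first item, one possible shape is a nullable `expires_at` on bindings plus a filter in the resolve query, sketched here with Python's `sqlite3`. The post only names the `expires_at` field; storing it as Unix seconds, the table layout, and `active_roles` are assumptions. `NULL` means the binding never expires.

```python
import sqlite3

# Hypothetical binding table: expires_at holds Unix seconds, NULL = permanent binding.
SCHEMA = "CREATE TABLE group_roles (group_id TEXT, role_name TEXT, expires_at INTEGER)"

ACTIVE_ROLES = """
SELECT role_name FROM group_roles
WHERE group_id = ? AND (expires_at IS NULL OR expires_at > ?)
ORDER BY role_name
"""


def active_roles(conn: sqlite3.Connection, group_id: str, now: int) -> list[str]:
    """Roles bound to the group whose binding is permanent or hasn't expired as of `now`."""
    return [name for (name,) in conn.execute(ACTIVE_ROLES, (group_id, now))]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO group_roles VALUES (?, ?, ?)",
    [("user-alice", "file-viewer", None), ("user-alice", "temp-db-reader", 1_000)],
)
print(active_roles(conn, "user-alice", now=500))    # ['file-viewer', 'temp-db-reader']
print(active_roles(conn, "user-alice", now=2_000))  # ['file-viewer']
```

Expired rows simply stop matching, so nothing has to run at expiry time; cleaning them up later is optional housekeeping.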

But for now, the system does what we need: fine-grained, delegatable, auditable access control that doesn't make you want to throw your laptop out a window. That's about all you can ask from IAM.