About SCAMPER

SCAMPER aims to overcome the limitations and vulnerabilities of federated identity management and self-sovereign identity (SSI) by building upon a privacy-preserving multifactor distributed authentication solution, while guaranteeing adherence to EU legislation. The solution fully aligns with the GDPR and eIDAS regulations and employs distributed storage mechanisms to provide a reliable user authentication service. The main aim in this regard is to provide good usability, revocability, and accuracy and guarantee the highest privacy preservation standards.


1. Introduction and Goals

This document describes the software architecture for the Secure Data Vault (SDV) component of the SCAMPER project (Work Package 4).

The SDV serves as the decentralised personal cloud vault that is controlled by the person’s edge wallet. Unlike traditional cloud storage solutions, the SDV is designed according to the principles of self-sovereign identity (SSI), giving the user full control and ownership over their data. The SDV is designed to minimise risk when the SDV provider is untrusted, prioritising data portability and cryptographic access control.

This architecture is inspired by the SOLID project and includes additional measures to ensure privacy and security for storing Personal Data. The outcome is the design of a Secure Digital Data Vault that is compatible with SSI systems, offering a decentralized storage solution where users maintain full control over their digital identities and sensitive information. The design will consider utility and adoptability while adhering to strict requirements regarding privacy and security. This outcome will not only support privacy-preserving digital identity management but will also pave the way for advancements in secure, user-owned data storage solutions.

In summary, we aim to achieve this objective through a Secure Data Vault that aligns with the latest privacy standards, empowers users to control their sensitive information autonomously, and supports compatibility with SSI.

1.1. Requirements Overview

The architecture is driven by the specific constraints of the SCAMPER Work Package 4 (WP4).

Table 1. WP4 Functional Requirements
ID Description

REQ-WP4-1

Client-Side Encryption
Final data encryption and decryption operations must occur at the Edge (User Wallet or permissioned Agent). The Vault must never possess the keys to decrypt stored data (Confidentiality).

REQ-WP4-2

Capability-Based Access Control
Access to the Vault must be governed completely by the user. The architecture must be able to cryptographically verify every read/write request, replacing traditional Access Control Lists (ACLs) managed by the server admin.

REQ-WP4-3

Separation of Identity and Storage
The Vault architecture must verify authorization based on cryptographic signatures (did:key), not database accounts. The user’s identity must remain decoupled from the storage infrastructure to allow for future portability. To ensure this, all identifiable data should be encrypted.

REQ-WP4-4

Multi-Party Encrypted Sharing
The architecture must support mechanisms to allow authorized third parties to read or write specific files without sharing the user’s encryption keys.

REQ-WP4-5

Cryptographic User Control
The Vault must only be controlled by the user. Both access control and the protection of sensitive data must be based on cryptography, so that access control and vault migration are enforced by the user alone.

REQ-WP4-6

User Unlinkability
The user’s privacy must be protected not only by encrypting data but also by obscuring the user’s identity. Third parties should not be able to infer information or identity of a user from the usage of a single decentralised identity (did:key).

1.2. Quality Goals

The Quality Goals for the Secure Data Vault focus on the resilience, security and privacy of the storage mechanism itself.

Table 2. Quality Goals
Goal Motivation

Confidentiality

The storage provider is treated as an untrusted party and a potential adversary, and the design minimizes the risk in this scenario. A malicious administrator at the SDV provider with database permissions can read or write encrypted data, but should not be able to gain access to any private data of the user. They should only be able to disrupt services, not break confidentiality.

Integrity

Users should not have to trust their SDV provider that data in the SDV has not been tampered with. Instead, only the user can grant cryptographic access to data. As long as their private key has not been compromised, they can trust that only third parties to whom they gave access can read or write data.

Performance

The Vault is used during real-time verifications. Availability and latency are paramount in making the SDV commercially viable.

Portability

The architecture must avoid vendor lock-in for the storage layer. A user needs to have full control over where their data is stored and what data is stored. It should be easy to migrate from one SDV provider to another.

1.3. Stakeholders

Role/Name Example Expectations

Data Owner

End User

Expects their data to be accessible only to them and third parties they granted access. Expects easy data migration and access revocation. Expects easy use of edge wallet implementation that aligns with the current state of the art.

Service Provider

Bank, Employer, Hospital, …​

Expects a standard JSON-RPC API to fetch and manipulate data. Expects high availability: if the User grants them access, the Vault should serve the file even if the User is offline.

Storage Host

SCAMPER Registry

Expects to host data with no liability. They want an architecture that mathematically proves they cannot see the data, protecting them from GDPR data processing obligations. Expects easy setup and deployment.

DID Registry

Public Blockchain

Expects data owners and service providers to follow best practices when using existing DID registries.

VC Issuer

Government / Employer

Expects to be unlinkable to VCs issued to data owners when service providers request to validate those VCs.

2. Architecture Constraints

The Secure Data Vault and Edge Wallet architecture is bound by strict requirements regarding biometric security, legal compliance (GDPR), and technical interoperability. These constraints limit the design choices but ensure the final system is secure, compliant, and usable in industrial contexts.

2.1. Technical Constraints

These constraints concern the hardware, software, and cryptographic primitives we must use.

Table 3. Technical Constraints
Constraint Explanation

TC-1: Biometric Key Derivation

The system is built upon key derivation from biometrics. The SDV design assumes that the user can create a cryptographic master key from their biometrics and can also recover that key in case of Edge Wallet loss. The Edge Wallet will thus be constrained to devices that support the specific biometric input used for key derivation.

TC-2: Mobile-First Edge Computing

All sensitive cryptographic operations must occur on the Edge Wallet (likely a Smartphone). The architecture cannot rely on server-side processing for cleartext data. Target devices must be able to process cryptographic operations efficiently and securely. To manage sensitive cryptographic keys, Edge Wallets need to have a secure hardware module (e.g., Secure Enclave on iOS, TrustZone on Android).

TC-3: W3C Standardization

The architecture must adhere to W3C standards as much as possible to ensure interoperability. This will drive architectural decisions, prioritising open source libraries and protocols that are widely adopted in the SSI ecosystem. The core standards include:
* Identity: DID Core (specifically did:key and did:peer).
* Credentials: Verifiable Credentials Data Model v1.1.
* Authorization: UCAN 1.0 (User Controlled Authorization Networks).

TC-4: Post-Quantum Readiness

Given the long lifespan of identity data, the cryptographic layer must be designed using Post-Quantum Cryptography (PQC) algorithms (e.g., Dilithium, Kyber) without compromising compatibility with existing systems.

2.2. Organizational Constraints

These constraints involve the project team, stakeholders, and development process.

Table 4. Organizational Constraints
Constraint Explanation

OC-1: Open Source License

The core components (Wallet SDK, Vault Connector, UCAN Validator, …​) must be released under a permissive Open Source license (e.g., Apache 2.0 or MIT) to follow the principles of self-sovereign identity and encourage community adoption.

OC-2: No Vendor Lock-in

The architecture must not depend on proprietary cloud features (e.g., AWS KMS, Azure AD). The "Secure Data Vault" must be deployable on any generic object store.

OC-3: Auditability

All access to the Data Vault must leave a tamper-evident audit trail. However, this audit log itself must preserve privacy.
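One common way to make a log tamper-evident is a hash chain, in which each entry commits to the hash of the entry before it. The sketch below illustrates only that idea. The entry fields, the function names `append_entry` and `verify_chain`, and the use of a salted hash of the accessor DID as a privacy measure are assumptions made for illustration, not part of the SCAMPER design.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(body: dict) -> str:
    # Canonical JSON so the same entry always hashes identically.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(log: list, accessor_did: str, cmd: str, salt: bytes) -> dict:
    """Append an entry that commits to the previous entry's hash."""
    body = {
        # Store a salted hash instead of the DID itself (assumed privacy measure).
        "accessor": hashlib.sha256(salt + accessor_did.encode()).hexdigest(),
        "cmd": cmd,
        "prev": log[-1]["hash"] if log else GENESIS,
    }
    entry = dict(body, hash=_digest(body))
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Return True only if every entry matches its hash and links to its predecessor."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev"] != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

A chain like this does not by itself detect truncation of the most recent entries; that would require the Data Owner to keep or countersign the latest hash.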

3. Context and Scope

The Secure Data Vault (SDV) acts as the user’s sovereign agent in the cloud. It is a dual-zone storage system:

  1. Public Zone (Cleartext): Stores non-sensitive, unlinkable technical data to facilitate discovery and decentralized connection establishment.

  2. Private Zone (Encrypted): Stores sensitive Personal Data, Verifiable Credentials (VCs), or other user data, encrypted entirely at the edge.

The SDV operates within a hostile environment (the SDV provider itself is assumed to be untrusted for the sake of designing a secure system) but must serve data to trusted parties via standard protocols.

3.1. Business Context

The Business Context describes the information flow between the SDV and its stakeholders. The core principle is User-Centric Control with Privacy by Design.

context-diagram
Table 5. Business Communication Partners
Communication Partner Inputs (Sent to SDV) Outputs (Received from SDV)

Data Owner + (User Wallet)

Encrypted Blobs (VCs, Private Notes) + Public Data (DID Docs, Public Keys) + UCAN Tokens (Delegation of rights)

Synchronization (Fetching latest state) + Audit Logs (Who accessed my vault?) + Outstanding permissions (Signed UCANs)

Service Provider + (Data User)

Access Token (UCAN for authorization) + Read Request (Specific resource path)

Requested Data (Encrypted file or specific VP) + Note: The Verifier must possess the decryption key (shared via did:peer channel) to read Private Zone data.

Service Provider + (e.g. VC issuer)

Personal Data of Data Owner (as encrypted blobs) + Verifiable Credentials of Owner (as encrypted blobs)

-

Data Provider + (Discovery / Resolver)

Discovery Request (Lookup via HTTP GET)

Public Technical Data (DID Document, Service Endpoint) + Constraint: This data must be unlinkable to the physical person (REQ-WP4-6).

Storage Host + (Infrastructure Provider)

Hosting Resources for Secure Data Vault

Usage Metrics (Storage quota, Bandwidth) + Liability: The Host cannot see Private Data content.

3.2. Technical Context

The SDV is exposed as a web-accessible storage service (similar to a Solid Pod or S3 Bucket) but secured via UCAN (User Controlled Authorization Networks) instead of traditional OAuth/Accounts.

It differentiates technical interfaces based on the sensitivity of the data zone.

Table 6. Technical Interfaces
Interface Channel / Protocol Description

Public Read API

HTTPS (GET)
No Auth Headers

Serves data from the /.well-known/ or /public/ namespace. Used for statistical aggregations, public lookups, or other public data (e.g. number of employees, public keys).

Protected Data API

JSON-RPC 2.0 over HTTPS
No Authorization header — the UCAN invocation token IS the request body

Accesses the User’s private storage via signed UCAN 1.0 invocation tokens.
* Read: Invocation token with cmd: "/doc/read".
* Write / Create: Invocation token with cmd: "/doc/create" or cmd: "/doc/update".
* Encryption: The args.payload field carries the encrypted blob (application/octet-stream or JWE).
* Replay Protection: Each invocation carries a unique nonce, verified by the SDV to prevent replays.

Replication / Backup

HTTPS

Interface for migrating the entire vault to a new provider.
* Relies on the Portability quality goal.
* The structure is standardized (e.g. Tar file) to allow easy migration.
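The migration interface mentions a standardized archive structure such as a tar file. The following is a minimal sketch of that idea, assuming the vault can be represented as a mapping from resource path to bytes that are already encrypted where required. The names `export_vault` and `import_vault` are hypothetical, and the actual archive layout has not been specified in this document.

```python
import io
import tarfile


def export_vault(entries: dict) -> bytes:
    """Pack {path: bytes} vault entries into an uncompressed tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path, data in sorted(entries.items()):
            info = tarfile.TarInfo(name=path.lstrip("/"))
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def import_vault(archive: bytes) -> dict:
    """Inverse of export_vault: restore the {path: bytes} mapping."""
    entries = {}
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        for member in tar.getmembers():
            if member.isfile():
                entries["/" + member.name] = tar.extractfile(member).read()
    return entries
```

Because private entries are moved as opaque ciphertext, neither the old nor the new provider needs any key material during migration.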

Table 7. Mapping Input/Output to Technical Channels
Interaction Technical Implementation

Publishing Identity

User → SDV: JSON-RPC invocation cmd: "/doc/create"
args.endpoint: "/public/did.json"
Invocation references Root UCAN delegation in prf.
Content: Plaintext JSON-LD (Non-identifiable).

Backing up Credentials

User → SDV: JSON-RPC invocation cmd: "/doc/create"
args.endpoint: "/private/credentials/vc-123.jwe"
args.payload.blob: XChaCha20-Poly1305 Encrypted Blob.
Invocation references Root UCAN delegation in prf.

Sharing with Bank

Bank → SDV: JSON-RPC invocation cmd: "/doc/read"
args.endpoint: "/private/credentials/vc-123.jwe"
Invocation prf contains the delegation chain User → Bank.
SDV checks the UCAN proof chain, validates cmd against delegation’s cmd pattern, and checks args.endpoint against delegation’s pol.

3.3. Deployment Context

TODO: in development…​

4. Solution Strategy

The solution strategy for the Secure Data Vault (SDV) rests on three fundamental pillars: Edge-Side Encryption, Capability-Based Authorization, and Dual-Zone Storage.

These strategies ensure that the Vault remains a passive storage utility while the intelligence and security controls reside entirely with the user.

4.1. Strategic Decisions

The following table maps the project’s key goals to the specific architectural strategies chosen to achieve them.

Table 8. Key Strategic Decisions
Goal Strategy Details

Confidentiality

Client-Side Encryption

The SDV stores data as opaque binary blobs. PQC Encryption is used with keys derived from the user’s biometric wallet (did:key). Even if the database is leaked, the data remains inaccessible as long as the underlying PQC algorithm stays secure. The server never touches the decryption keys.

User Control

UCAN Authorization

We replace traditional Access Control Lists (ACLs) with User Controlled Authorization Networks (UCAN). Access rights are embedded in cryptographically verified tokens (JWTs) issued by the user, not permissions tables stored in the server database. The user can grant access to a third party (Verifier) without the SDV’s intervention.

Discoverability vs. Privacy

Dual-Zone Storage

We separate data storage based on sensitivity but allow Public Data to index Private Data. The Public Zone stores unencrypted data that cannot be used to identify the user. The Private Zone stores encrypted Credentials and Files. Public data can be used to index private data by linking these nodes without revealing the identity of the user to the vault.

Portability

Decoupled Identity & Storage

The system uses DIDs (Decentralized Identifiers) as the primary user key, rather than an account/password managed by the provider. The user interacts via did:key (a cryptographic key pair) or a did:peer. So if the user switches providers, they simply upload their data to a new SDV and update their DID Document. No "account deletion" or "password reset" logic is needed.

Unlinkability

Derived Identity

The system derives new DIDs (Decentralized Identifiers) to share with third parties, rather than sharing their primary key. The user’s activity cannot be tracked across different providers/vaults.

4.2. Technology Decisions

The following technology stack was selected to implement these strategies, prioritizing open standards and browser/mobile compatibility.

Table 9. Technology Stack Overview
Component Technology Rationale

Identity

did:peer (Pairwise)

Used for private relationships between Users and Verifiers. It leaves no public footprint on any blockchain, ensuring maximum privacy (unlinkability). A peer DID essentially is a custom keypair used by both the User and the Verifier for a private relation, and public keys / DID documents are shared through third party channels (e.g., QR codes on first relation establishment).

Authorization

UCAN (User Controlled Authorization Networks)

Unlike OAuth2, UCANs support offline delegation. A user can generate a token for a Verifier (e.g., a Bank) directly from their phone without needing the Vault to be online at that exact moment. This makes migration of data easier as access control configuration does not need to be migrated.

Encryption

[Decision Pending]

The specific cryptographic primitive is currently under evaluation. Most likely, we will combine a post-quantum cryptographic protocol with a classical protocol to ensure PQ readiness and mitigate the risk that a novel algorithm is compromised. The most common model for this is X25519 and ML-KEM (Kyber 768) combined into a hybrid KEM: X25519MLKEM768. We intend to follow such industry standards.

Storage Backend

Graph Database (RDF / Triple Store)

Aligns with Solid specifications and enables rich queries. Instead of a flat file system, data is stored as Linked Data. This allows users to query metadata without the server needing to understand the encrypted content itself. Metadata is stored in the graph (Public Zone), payload is the encrypted blob (Private Zone).

API Surface

JSON-RPC 2.0 over HTTPS

UCAN 1.0 invocation tokens are self-contained, signed RPC calls where cmd is the method and args holds the parameters. JSON-RPC aligns the transport with this invocation model and avoids mapping UCAN capabilities to HTTP verb semantics. All vault operations are expressed as named commands rather than HTTP methods against resource paths.

4.3. Top-Level Decomposition Pattern

We follow the "Smart Client, Dumb Server" architectural pattern. The SDV acts as a high-availability "Digital Locker." It enforces who can open a locker (via UCAN check) but has no visibility into what is inside. It has mechanisms to facilitate finding data. The Client (Wallet) handles all business logic, data validation, encryption, and key management. Additionally, some of the heavier computation could be offloaded to the SDV by using a Trusted Execution Environment (TEE) on the server. Through attestation of the SDV TEE, a user wallet could delegate some heavy computation to the server. We will rely on this only in rare cases, as it places more trust in the SDV provider.

decomposition-diagram

5. Building Block View

The Building Block View shows the static decomposition of the system. It explains the code structure and the high-level functional modules.

5.1. Level 1: Overall System

At the highest level, the Secure Data Vault (SDV) is a standalone server application that exposes HTTP interfaces to the outside world. It interacts with three primary external actors: the Data Owner (Wallet), the Verifier, and Public Entities.

level1-diagram
Motivation

The system is decomposed into a single deployable unit (the SDV) to ensure portability. The internal complexity is hidden behind a standard JSON-RPC API. The storage layer is abstracted so the SDV can run on local disks, cloud object stores, or decentralized networks.

Contained Building Blocks
Name Responsibility

Secure Data Vault (SDV)

The core system. Handles authentication, request routing, validation of UCANs, and data persistence.

Edge Wallet

The primary client. Handles all encryption, key management, and UCAN issuance.

Physical Storage

The underlying infrastructure where bytes are saved (e.g., AWS S3, local filesystem, or a hosted Graph Database).

Important Interfaces
  • Public API: Open access for did:web Documents and Biometric Helper Data (Read-Only for public).

  • Protected API: Authenticated access (via UCAN) for encrypted User Data.

5.2. Level 2: Internal Structure of the SDV

Here we zoom into the Secure Data Vault to see its internal software architecture.

The SDV follows a Layered Architecture:
1. Interface Layer: Handles HTTP and Routing.
2. Security Layer: Validates Authorization (UCAN).
3. Domain Layer: Logic for Linked Data and Blob management.
4. Persistence Layer: Translates domain objects to database calls.

level2-diagram

5.2.1. API Gateway (Interface Layer)

The entry point for all incoming traffic. It handles JSON-RPC dispatch: routing JSON-RPC 2.0 method calls to the correct domain handler based on the cmd field of the UCAN invocation token in the request body. It distinguishes between Public (unauthenticated HTTPS GET) and Private (JSON-RPC invocation) requests. While not implementing application-specific security controls, the API gateway presents the first line of defense against common attacks such as DoS, injection, and replay attacks. It also handles HTTP/2 and TLS termination.
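The dispatch step can be sketched as a lookup table keyed by the cmd string. This is an illustrative sketch only: the `command` decorator, the `HANDLERS` registry, and the shape of the handler result are assumptions, and the invocation is assumed to have already passed UCAN validation.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]
HANDLERS: Dict[str, Handler] = {}


def command(cmd: str) -> Callable[[Handler], Handler]:
    """Register a handler for one UCAN command string."""
    def register(fn: Handler) -> Handler:
        HANDLERS[cmd] = fn
        return fn
    return register


@command("/doc/read")
def doc_read(args: dict) -> dict:
    # Placeholder: a real handler would fetch the encrypted blob.
    return {"status": "ok", "endpoint": args["endpoint"]}


def dispatch(invocation: dict) -> dict:
    """Route an already-validated invocation to its handler by `cmd`."""
    handler = HANDLERS.get(invocation.get("cmd"))
    if handler is None:
        # -32601 is the JSON-RPC 2.0 "Method not found" error code.
        return {"error": {"code": -32601, "message": "Method not found"}}
    return handler(invocation.get("args", {}))
```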

5.2.2. UCAN Validator (Security Layer)

The security and permission engine at the heart of the vault. Since the SDV is "blind" to the data, it relies entirely on Delegated Authorization. The UCAN invocation token arrives as the JSON-RPC request body; there is no separate Authorization header. The validator verifies the EdDSA signature of the invocation token and of every delegation token in its prf chain. Per request it validates:
- The time bounds (exp, nbf): Is the invocation still valid?
- The nonce: Has this exact invocation already been processed (replay protection)?
- The revocation list: Was any token in the delegation chain revoked by the Data Owner?
- The cmd: Does the invocation’s command match the pattern permitted by the delegation’s cmd field?
- The pol constraints: Do the invocation’s args (e.g. endpoint) satisfy all policy restrictions in the delegation chain?
- The sub: Does the invocation’s subject (sub) match the vault namespace being accessed?
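Some of the checks in the list above can be sketched in a few lines. The sketch below covers only the time bounds, nonce, revocation, and subject checks; signature verification and cmd/pol matching are deliberately left out. The function name `check_invocation`, the in-memory sets, and the treatment of prf entries as opaque token strings are assumptions for illustration.

```python
import time
from typing import Optional


def check_invocation(inv: dict, vault_did: str, seen_nonces: set,
                     revoked: set, now: Optional[float] = None) -> Optional[str]:
    """Return None if the checks pass, else a short rejection reason.

    Signature, cmd, and pol checks are deliberately out of scope here.
    """
    now = time.time() if now is None else now
    if now >= inv["exp"]:
        return "expired"
    if now < inv.get("nbf", 0):
        return "not yet valid"
    if inv["nonce"] in seen_nonces:
        return "replayed nonce"
    # `prf` is assumed to hold opaque token strings; revocation is by identity.
    if any(token in revoked for token in inv.get("prf", [])):
        return "revoked delegation"
    if inv["sub"] != vault_did:
        return "wrong subject"
    seen_nonces.add(inv["nonce"])  # record only after every check passed
    return None
```

Recording the nonce only after all checks pass keeps rejected requests from filling the nonce store.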

5.2.3. Private User Data Store (Domain Layer)

This component manages the structured data and metadata of the private, encrypted part of the vault. It is the primary domain layer component for the private data of Data Owners and the potential relations between data elements. Where the data is not fully encrypted or is public, the data store uses known data vocabularies and ontologies from Solid or the broader ecosystem. However, in many cases the data should be kept private from the server and thus becomes an encrypted data blob. To manage large chunks of encrypted data (e.g., personal videos), the Private User Data Store relies on the Blob Manager for scalable and performant blob data retrieval and streaming.

To make the most of the underlying linked data store for encrypted data, this component will rely on the concept of blind indexing, which keeps retrieval performant without giving the server any insight into the data it is storing.
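A blind index is commonly built by having the client compute a keyed hash of a field value and letting the server match on that token alone. The sketch below shows this with HMAC-SHA256 from the Python standard library. The names `blind_index`, `store`, and `lookup`, the normalization rule, and the in-memory `server_index` are assumptions for illustration; the document does not fix a particular scheme.

```python
import hashlib
import hmac


def blind_index(index_key: bytes, field: str, value: str) -> str:
    """Deterministic keyed token for an exact-match lookup on `value`."""
    normalized = value.strip().lower()
    message = f"{field}\x00{normalized}".encode("utf-8")
    return hmac.new(index_key, message, hashlib.sha256).hexdigest()


# Server side: maps tokens to blob identifiers, never sees plaintext values.
server_index = {}


def store(token: str, blob_id: str) -> None:
    server_index.setdefault(token, []).append(blob_id)


def lookup(token: str) -> list:
    return server_index.get(token, [])
```

Deterministic tokens reveal when two entries share the same value, so this approach fits fields where equality leakage is acceptable.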

5.2.4. Public Data Store (Domain Layer)

The Public Data Store handles data that can be queried publicly by participants in the ecosystem. This can include non-sensitive information about Data Owners or data elements they wish to share with the public, or non-sensitive relations between data elements that do not reveal any of the underlying data. An example of this is the Biometric Helper Data used for fuzzy matching, which on its own does not reveal any identity information about the Data Owner. Since the amount of public data is expected to be relatively small, we do not foresee a need for scalable blob storage for this part of the architecture.

5.2.5. Blob Manager (Domain Layer)

Handles the storage of raw, large, encrypted binary files. Unlike the Graph Engine (which deals with metadata or data with encrypted 'fields'), the Blob Manager deals with the actual content (Verifiable Credentials, PDFs, Images). It treats every file as an opaque stream of bytes.

Whether the data is stored as blobs or as relational graph data is a choice made by users of the system. To illustrate this, take a Verifiable Credential that proves a Data Owner’s name, address, and age. This Verifiable Credential contains clearly private data items that should always be stored encrypted, but it also contains metadata that could be stored unencrypted. Users of the system could decide to store the whole VC as an encrypted data blob, where the existence of individual fields or even metadata about the issuer is obfuscated. Alternatively, a user could decide to store only the final values of name, address, and age as encrypted data and leave the structure and issuer metadata of the VC unobfuscated. Which choice is desirable depends entirely on the use case and the type of data. As much data as possible should be stored encrypted, but for some use cases, having the metadata or data structure unencrypted can support more advanced features such as faster querying and data lookup, or statistical calculations.

5.2.6. Storage Adapters (Persistence Layer)

This is an abstraction layer that decouples the application from the physical storage technologies or providers. It manages either Blob data storage (File systems, Cloud storage buckets, Data lakes, …​) or Graph data storage (Apache Jena, Neo4j, Blazegraph, Amazon Neptune, …​) with specific adapters. The goal of this layer is to be technology-agnostic such that the SDV can run on different physical stacks or data storage technologies. New adapters can be added for different technologies to achieve maximal portability.

The Blob Storage Adapter is relatively simple in that it only needs to store arbitrary data streams under a unique, unlinkable identifier. The Graph Storage Adapter will abstract graph data technologies to be able to store RDF data and run SPARQL-like queries.
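The adapter contract for blobs can be sketched as a small abstract interface with one trivial implementation. The class and method names below (`BlobStorageAdapter`, `InMemoryBlobAdapter`, `put`, `get`) are hypothetical, and the random identifier is one possible reading of "unique unlinkable identifier".

```python
import secrets
from abc import ABC, abstractmethod


class BlobStorageAdapter(ABC):
    """Minimal contract: store opaque bytes under an unlinkable identifier."""

    @abstractmethod
    def put(self, data: bytes) -> str: ...

    @abstractmethod
    def get(self, blob_id: str) -> bytes: ...


class InMemoryBlobAdapter(BlobStorageAdapter):
    """Reference adapter for tests; a real one would target disk or a bucket."""

    def __init__(self) -> None:
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # Random identifier: carries no information about owner or content.
        blob_id = secrets.token_urlsafe(16)
        self._blobs[blob_id] = data
        return blob_id

    def get(self, blob_id: str) -> bytes:
        return self._blobs[blob_id]
```

An adapter for a filesystem or an object store would implement the same two methods, so the domain layer would not need to change.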

5.3. Level 3: Details of Key Blocks

5.3.1. UCAN Validator

The UCAN Validator is complex enough to warrant a closer look. It does not just check a password; it validates a Chain of Trust.

level3-ucan

Responsibility:
1. Signature Checker: Ensures the invocation token and each delegation in the prf chain was signed by the key it claims.
2. Root DID Resolver: Fetches the public key of the issuer from the iss DID (no external registry needed for did:key / did:peer) to verify the EdDSA signature.
3. Command Matcher: Verifies the invocation’s cmd (e.g. /doc/read) is permitted by the delegation’s cmd pattern (e.g. /doc/ or /) and that args satisfy the delegation’s pol constraints (e.g. endpoint allowlist).
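The Command Matcher step can be sketched as follows. Two assumptions are made here that this document does not confirm: a delegation cmd ending in "/*" permits every command under that prefix, and each pol rule of the form ["any", ".field", [values]] is treated as an allowlist for args[field]. The function names are hypothetical.

```python
def cmd_allowed(pattern: str, cmd: str) -> bool:
    """Exact match, or prefix match when the pattern ends in '/*'."""
    if pattern.endswith("/*"):
        prefix = pattern[:-1]  # keep the trailing slash, e.g. "/doc/"
        return cmd.startswith(prefix)
    return pattern == cmd


def pol_satisfied(pol: list, args: dict) -> bool:
    """Every ["any", ".field", [allowed...]] rule must admit args[field]."""
    for op, selector, allowed in pol:
        if op != "any":
            return False  # unknown operators are rejected, not ignored
        if args.get(selector.lstrip(".")) not in allowed:
            return False
    return True


def invocation_permitted(delegation: dict, invocation: dict) -> bool:
    """Combine the command check with the policy check for one delegation."""
    return (cmd_allowed(delegation["cmd"], invocation["cmd"])
            and pol_satisfied(delegation.get("pol", []), invocation.get("args", {})))
```

With a delegation for "/doc/read" restricted to two endpoints, a read of one of them passes, while a read of another endpoint or a "/doc/update" on the same endpoint is refused.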

6. Runtime View

The runtime view describes the concrete behavior of the system. The SCAMPER Secure Data Vault relies on a "Smart Client" approach, meaning most complexity (Encryption, Key Management, Authorization) happens on the User’s device before data ever reaches the server.

The following scenarios cover the lifecycle of a user’s data vault, from creation to sharing and revocation. The architecture describes the following concrete runtime scenarios in step-by-step sequence diagrams:

  • User signup & Pod creation: Happens the first time a new user interacts with the system and sets up their data vault linked to their personal Wallet.

  • Storing personal data: A user stores private data on the SDV that can later be shared to third parties or retrieved by themselves.

  • Sharing data: A user shares data with a service provider by issuing UCAN tokens.

  • Retrieving data: A service provider reads items in the user’s vault for which they have been granted permission.

  • Writing data: A service provider writes or creates data in the user’s vault.

  • Revoking access: A user revokes access to the service provider with whom they previously shared data.

  • Synchronization: The user’s Wallet synchronizes with the Secure Data Vault for caching information and data of the user at the edge.

6.1. Scenario 1: User Initialization & Pod Claiming

This flow describes "Cryptographic Signup." Unlike traditional web apps, there is no central admin to approve an account. The user asserts their existence by generating keys and claiming a namespace/pod.

sequence-signup

The SDV does not check a database table for users. It mathematically validates that the UCAN signature matches the DID for which the pod is created. If validation succeeds, the SDV generates the basic resources needed to bootstrap a user pod, such as storage identifiers or table entries in the graph database.

The Root UCAN is self-signed. It is the genesis delegation token that proves the user controls the private keys associated with the pod. The root delegation will look like this:

{
  "iss": "did:peer:zROOT_VAULT_KEY...",
  "aud": "did:peer:zROOT_VAULT_KEY...",
  "sub": "did:peer:zROOT_VAULT_KEY...",
  "cmd": "/*",
  "exp": 9999999999,
  "nonce": "a1b2c3d4",
  "pol": [],
  "prf": []
}

For every actual request to the SDV the User Wallet creates a per-request invocation token. The invocation token is the JSON-RPC call body and references the root delegation in its prf array:

{
  "iss": "did:peer:zROOT_VAULT_KEY...",
  "sub": "did:peer:zROOT_VAULT_KEY...",
  "exp": 9999999999,
  "nonce": "a1b2c3d4e5f6g7h8",
  "cmd": "/vault/init",
  "args": {
    "peer_did": "did:peer:zROOT_VAULT_KEY..."
  },
  "prf": [
    "eyJhbGciOiJFZERTQS...<Root Delegation UCAN above>"
  ]
}
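Assembling such an invocation payload in the wallet can be sketched as below. This is a sketch under stated assumptions: `build_invocation` is a hypothetical helper, the 60-second default lifetime is an arbitrary choice, and the EdDSA signing step that turns the payload into a token is omitted.

```python
import secrets
import time


def build_invocation(issuer: str, subject: str, cmd: str, args: dict,
                     proofs: list, ttl_seconds: int = 60) -> dict:
    """Assemble an unsigned invocation payload with a fresh nonce."""
    return {
        "iss": issuer,
        "sub": subject,
        "exp": int(time.time()) + ttl_seconds,  # short-lived by default (assumption)
        "nonce": secrets.token_hex(8),          # 16 random hex characters per call
        "cmd": cmd,
        "args": args,
        "prf": list(proofs),
    }
```

A short expiry combined with a random nonce limits how long the SDV has to remember nonces for replay protection.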

The primary security concern for the SDV is a DoS attack in which many fake pods are requested for users who do not exist. While this attack vector cannot be fully avoided, its impact can be mitigated with classical IP-based rate limiting and routine cleanup of empty vaults.
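IP-based rate limiting of pod creation can be sketched with a sliding window per client address. The class name `SlidingWindowLimiter` and the chosen limits are illustrative assumptions; a deployment would more likely rely on an existing gateway or reverse-proxy feature.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client IP."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```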

6.2. Scenario 2: Storing Personal Data (Envelope Encryption)

This scenario details how the Vault stores data it cannot read. We use Envelope Encryption (Hybrid Encryption): data is encrypted with a symmetric key (DEK), and that key is encrypted for the user with the public key related to their DID.

sequence-store-data

A symmetric cipher (e.g., ChaCha20) should be used for the actual file content. Encrypted data is wrapped into a meta structure that contains information about the encryption and about the wrapping of the decryption key. This meta structure looks like this:

{
  "dataEncryption": [
    {
      "did": "did:peer:zkey1",
      "dek": "ENCRYPTED_DEK_BYTES"
    }
  ],
  "ciphertext": "BASE64_ENCODED_BYTES"
}

The invocation token sent as the JSON-RPC body places this encrypted structure in args.payload:

{
  "iss": "did:peer:zROOT_VAULT_KEY...",
  "sub": "did:peer:zROOT_VAULT_KEY...",
  "exp": 9999999999,
  "nonce": "a1b2c3d4e5f6g7h8",
  "cmd": "/doc/create",
  "args": {
    "endpoint": "/blog/posts",
    "headers": {
      "content-type": "application/octet-stream"
    },
    "payload": {
      "blob": "0xabcd..."
    }
  },
  "prf": [
    "eyJhbGciOiJFZERTQS...<Root Delegation UCAN>"
  ]
}
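The envelope structure used in this scenario can be sketched as a function that encrypts the content once and wraps the DEK for each recipient. Because the cryptographic primitives are still under evaluation, the sketch takes them as parameters: `encrypt` and `wrap` are hypothetical callables supplied by the caller, and `build_envelope` is an illustrative name. Base64 encoding of the wrapped DEK is also an assumption.

```python
import base64
import secrets
from typing import Callable, Dict


def build_envelope(plaintext: bytes,
                   recipients: Dict[str, bytes],
                   encrypt: Callable[[bytes, bytes], bytes],
                   wrap: Callable[[bytes, bytes], bytes]) -> dict:
    """Encrypt once with a fresh DEK, then wrap the DEK per recipient.

    `recipients` maps a DID to that DID's public key bytes. `encrypt(dek, data)`
    and `wrap(public_key, dek)` are caller-supplied primitives.
    """
    dek = secrets.token_bytes(32)  # fresh 256-bit data encryption key
    return {
        "dataEncryption": [
            {"did": did, "dek": base64.b64encode(wrap(pub, dek)).decode()}
            for did, pub in recipients.items()
        ],
        "ciphertext": base64.b64encode(encrypt(dek, plaintext)).decode(),
    }
```

Keeping the primitives injectable means the same envelope logic would work with a classical, post-quantum, or hybrid key-wrapping scheme.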

6.3. Scenario 3: Sharing Data with a Service Provider

To share data without revealing the user’s master key, the User Wallet performs a "Key Rewrapping" operation. It adds a new header entry encrypted for the Service Provider with whom the user wishes to share data. The operation is triggered when the user agrees to share data with the Service Provider.

To improve privacy, each data sharing relation between a Data Owner and a Service Provider should work with Peer DIDs. When a Data Owner and a Service Provider first initialize their 'relation', they generate new Peer DIDs derived from their Root DID. Each party generates one new DID and shares the public DID document with the other party via a channel other than the SDV (for example, through QR codes on the Service Provider’s onboarding website).

To avoid a range of attacks, it is important that the sharing of Peer DID documents happens through secure out-of-band channels. For example, it is important to use a secure TLS communication channel to avoid well-known attacks on the exchange of public keys (such as man-in-the-middle attacks).

For recovery purposes, we recommend a deterministic scheme to derive Peer DIDs from a Root DID, although this cannot be enforced by the SDV. A user needs to share their Peer DIDs with the SDV so that the SDV can build aliases for a vault. Put simply, without this step, the user would leak their actual Root DID through the URI used for the vault. The SDV needs to be able to route JSON-RPC invocations for both the Root DID and the Peer DID as vault subjects, resolving through the Alias Registry.
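One possible deterministic scheme is to derive a per-relationship seed from a root secret with a keyed pseudorandom function, then generate the Peer DID key pair from that seed. The sketch below covers only the seed derivation with HMAC-SHA256. The function name `derive_peer_seed`, the domain label "scamper-peer-did-v1", and the use of a relationship identifier string are assumptions for illustration; the document does not prescribe a scheme.

```python
import hashlib
import hmac


def derive_peer_seed(root_secret: bytes, relationship_id: str) -> bytes:
    """Derive a 32-byte seed for one relationship from the root secret.

    HMAC-SHA256 acts as the pseudorandom function: the same inputs always
    give the same seed, while seeds for different relationships look unrelated
    to anyone who does not hold `root_secret`.
    """
    label = b"scamper-peer-did-v1:" + relationship_id.encode("utf-8")
    return hmac.new(root_secret, label, hashlib.sha256).digest()
```

After losing a device, a wallet that recovers the root secret could recompute every relationship seed from the list of relationship identifiers alone.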

sequence-share-data

The large data blob is not touched; only the small key (DEK) is re-encrypted. The /doc/share call atomically registers the SP’s Peer DID as an alias in the vault’s Alias Registry and appends the new wrapped DEK entry to the document header. The file now has two valid headers: both the User and the Service Provider can open the envelope.

Note that, from the Service Provider’s point of view, they interact directly with the user’s newly generated Peer DID. This ensures they cannot link the DID they see with the DIDs that other Service Providers see when interacting with the same user.

In practice, a Service Provider would generate a QR code on their onboarding page that contains their Peer DID and the access they request. The User Wallet performs many actions in the background, but only one more interaction with the Service Provider is needed: at the end of the flow, the User Wallet calls the Service Provider back with a UCAN token that contains all the information needed to access the data on the SDV.

6.4. Scenario 4: Service Provider Retrieving Data

The Service Provider uses the UCAN and their own Private Key to read the data.

sequence-sp-read

The SDV validates the UCAN chain to ensure the Service Provider is authorized. The SDV also selects the wrapped key that matches the Peer DID of the requesting Service Provider in order not to expose additional encrypted data. Decryption happens entirely on the Service Provider’s side.

The User Wallet issues a delegation token to the Service Provider. The pol field restricts which endpoints the SP may access:

{
  "iss": "did:peer:zAliceVerifier1Key...",
  "aud": "did:key:zVerifier1Key...",
  "sub": "did:peer:zAliceVerifier1Key...",
  "cmd": "/doc/read",
  "exp": 1735689600,
  "nonce": "ffffffffffffffff",
  "pol": [[ "any", ".endpoint",
    ["/shared-resource-1", "/shared-resource-2"]
  ]],
  "prf": []
}

The Service Provider then creates a per-request invocation token (the JSON-RPC call body) referencing this delegation in prf:

{
  "iss": "did:key:zVerifier1Key...",
  "sub": "did:peer:zAliceVerifier1Key...",
  "exp": 1735689900,
  "nonce": "b1b2c3d4e5f6g7h8",
  "cmd": "/doc/read",
  "args": {
    "endpoint": "/shared-resource-1"
  },
  "prf": [
    "eyJhbGciOiJFZERTQS...<Delegation UCAN above>"
  ]
}
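How the SDV might check such an invocation against the delegation can be sketched as follows. This is a simplified illustration, not a UCAN library: signature verification is omitted, only a single delegation is considered, and the policy evaluator understands only the `["any", ".field", [values]]` shape shown in the example above. The function names and the `now` parameter are assumptions for this sketch.

```python
def command_allowed(invoked: str, delegated: str) -> bool:
    """A delegated command covers itself and every sub-command beneath it."""
    if delegated == "/":
        return True
    prefix = delegated.rstrip("/")
    return invoked == prefix or invoked.startswith(prefix + "/")


def policy_allows(pol: list, args: dict) -> bool:
    """Evaluate only the ["any", ".field", [values]] form used in the example."""
    for op, selector, allowed in pol:
        if op != "any":
            return False  # unknown operators are rejected rather than ignored
        if args.get(selector.lstrip(".")) not in allowed:
            return False
    return True


def invocation_matches(delegation: dict, invocation: dict, now: int) -> bool:
    """Check audience linkage, subject, time bounds, command, and policy."""
    return (
        invocation["iss"] == delegation["aud"]
        and invocation["sub"] == delegation["sub"]
        and now < invocation["exp"]
        and now < delegation["exp"]
        and command_allowed(invocation["cmd"], delegation["cmd"])
        and policy_allows(delegation.get("pol", []), invocation.get("args", {}))
    )
```

With the two tokens above, a read of /shared-resource-1 is accepted while both tokens are unexpired, whereas the same invocation with any other endpoint is rejected by the policy check.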

6.5. Scenario 5: Service Provider Writing Data

The Service Provider can also be granted access to a User’s data vault to write data. Writing covers both editing existing data items and creating new ones. This is useful where a Service Provider produces data that actually belongs to an end user, such as a patient’s medical file. When existing data is edited, the encrypted data blob is fully replaced, but the DEK and its wrapped keys are not touched. When a new data item is created, the same procedure is followed as when a User creates data, except that the item is stored with two wrapped copies of the DEK from the start.

sequence-sp-edit

Importantly, the Secure Data Vault also stores old versions of the same data item where possible. This prevents Service Providers from being able to "destroy" existing User Data. Secure Data Vault providers can configure how many versions are stored and how old versions may become before they are cleaned up. Retention can be part of the SLA that Secure Data Vault providers commercialize.

sequence-sp-write

The primary difference between a Service Provider creating data on behalf of the user and a user creating their own data is the number of DEK wrappings. In this case, the Service Provider stores the data with two wrapped keys from the start. It is important that the SDV validates these two wrapped keys: one must be linked to the DID of the requester, and the other must be a known Peer DID alias of the User. Additionally, more than two wrapped keys must be rejected, because the Service Provider is not allowed to preselect other parties to share the data with. The user must always be involved in deciding with whom their data is shared.
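The wrapped-key check described above can be sketched as a small predicate. This is an illustrative sketch only: the function name and the representation of wrapped keys as a list of recipient DIDs are assumptions, and the set of DIDs belonging to the user is assumed to come from the Alias Registry.

```python
def validate_create_keys(
    recipients: list[str], requester_did: str, user_dids: set[str]
) -> bool:
    """Accept a new document only if its wrapped-key recipients are legitimate.

    recipients    -- DIDs for which a wrapped DEK was submitted
    requester_did -- DID that signed the invocation
    user_dids     -- Root DID and known Peer DID aliases of the vault owner
    """
    # At most two wrapped keys, and no duplicate recipients.
    if len(recipients) > 2 or len(set(recipients)) != len(recipients):
        return False
    # At least one wrapped key must belong to the vault owner.
    if not any(did in user_dids for did in recipients):
        return False
    # A Service Provider must include a wrapped key for itself.
    if requester_did not in user_dids and requester_did not in recipients:
        return False
    # No wrapped key may target a party other than the owner or the requester.
    return all(did in user_dids or did == requester_did for did in recipients)
```

A Service Provider submitting one key for the user’s alias and one for itself passes, while a submission that adds a key for any other party is rejected.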

6.6. Scenario 6: Revoking Access (Key Rotation)

Revocation in a cryptographic system requires key rotation. Revoking the issued UCAN is always the first step and already provides enough protection in almost all cases. However, to remain robust against hacks or bugs on the SDV, the architecture also proposes a full key rotation and re-encryption, so that the revoked Service Provider cannot regain access through accidentally leaked ciphertexts.

sequence-revocation

A key distinction between a typical write update and an access revocation is that not only the data blob is replaced but all the wrapped key headers as well. At the same time, old versions are not kept (or should not be kept for long), as this would defeat the purpose of the mitigation against accidental leaks. Even if the Service Provider kept a copy of the old encrypted blob, they cannot decrypt the new updates. If they try to access the current file on the SDV, they will fail because their key no longer works (in the rare case that their UCAN is still accepted).

For simple data items, like an address or bank account number, the re-encryption is not a performance issue and can easily happen on the device of the end user. However, when the shared data that needs to be revoked is big, like large video files, the full download, decryption, re-encryption, and upload can become expensive or, in some cases, even impossible due to the constraints of the end user’s device. To solve this issue, the SDV can use Trusted Execution Environments to perform the bulk of the rotation on the Secure Data Vault. In essence, the end user delegates their work to the SDV. The flow would then look like the diagram below.

sequence-revocation-on-sdv

For brevity, the scenario omits the UCAN validation that takes place for every interaction with the SDV.

A critical part of this process is the validation of the Secure Data Vault TEE key when the user hands over the decrypted DEK. The user’s device needs to validate the full chain of trust of the TEE attestation to make sure that it is not giving the decryption keys of their data to an unauthorized third party or a system administrator of the SDV provider. Different technologies exist for Trusted Execution Environments, such as Intel SGX, Intel TDX, AMD SEV, ARM CCA, and AWS Nitro Enclaves, among others. The attestation document differs per technology but follows a similar approach: it combines a known root of trust (such as Intel’s well-known public keys or AWS’s Nitro Hypervisor public keys), enclave measurements, and attested data. The root of trust is well known to any user of TEEs, but in this case the enclave measurements must also be known beforehand to the End User Wallet. The enclave measurements essentially prove which code is running in the enclave. The End User can only trust code that is open-source, validated by independent researchers, and published with known enclave measurements. This prevents an SDV Provider from putting backdoors in the enclaves they run on behalf of users.

6.7. Scenario 7: Synchronization (Cache vs. Cloud)

This scenario occurs during the Authentication & Presentation flow. Before presenting credentials to a verifier, the Wallet must ensure its local cache is up to date with the Cloud Vault ("Source of Truth"). This is important because the user may have been offline or may use multiple devices, and an Issuer with write access might have updated a VC in the background.

sequence-sync

The /vault/sync call is bidirectional: in a single request, the Wallet uploads any pending local mutations and fetches server-side changes, using since_hash as the common reference point. The Wallet always syncs before generating a proof to ensure it is not using revoked credentials. If the Vault is unreachable, the Wallet falls back to the local cache, accepting the risk of staleness.
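The bidirectional exchange and the offline fallback could look roughly like the sketch below. It is a toy model under several assumptions: the vault keeps a hash-chained change log, `since_hash` identifies the last entry the wallet has seen, and conflict resolution is ignored. The names `VaultLog`, `GENESIS`, and `sync_or_fallback` are hypothetical; only `since_hash` comes from the scenario.

```python
import hashlib
import json

GENESIS = "0" * 64  # reference point for a wallet that has never synced


class VaultLog:
    """Server side: an ordered change log where each entry hashes its predecessor."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, dict]] = []  # (entry hash, mutation)

    def head(self) -> str:
        return self.entries[-1][0] if self.entries else GENESIS

    def _append(self, mutation: dict) -> None:
        body = json.dumps(mutation, sort_keys=True)
        digest = hashlib.sha256((self.head() + body).encode("utf-8")).hexdigest()
        self.entries.append((digest, mutation))

    def sync(self, since_hash: str, pending: list[dict]) -> dict:
        """Return changes after since_hash, then apply the wallet's mutations."""
        hashes = [entry_hash for entry_hash, _ in self.entries]
        if since_hash == GENESIS:
            start = 0
        elif since_hash in hashes:
            start = hashes.index(since_hash) + 1
        else:
            raise ValueError("unknown since_hash")
        missed = [mutation for _, mutation in self.entries[start:]]
        for mutation in pending:
            self._append(mutation)
        return {"changes": missed, "head": self.head()}


def sync_or_fallback(vault_sync, cache: dict) -> tuple[dict, bool]:
    """Wallet side: sync if possible, otherwise keep the possibly stale cache."""
    try:
        result = vault_sync(cache["head"], cache["pending"])
    except ConnectionError:
        return cache, False  # caller knows the data may be stale
    docs = cache["docs"] + result["changes"] + cache["pending"]
    return {"head": result["head"], "docs": docs, "pending": []}, True
```

The boolean returned next to the cache lets the wallet decide whether to warn the user that a proof is being generated from unsynchronized data.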

6.8. Deep dive into UCAN permissions

In the different scenarios, different types of UCAN permissions need to be validated. This section summarizes the full validation chain for UCAN 1.0 invocation tokens and puts extra attention on what each command entails. Each request to the SDV is a JSON-RPC call whose body is a signed invocation token. Validation proceeds as follows:

  1. The public key of the iss DID in the invocation token is resolved (self-contained for did:key / did:peer, no external registry lookup required).

  2. The EdDSA signature of the invocation token is verified against the resolved public key.

  3. The exp (and optional nbf) time bounds are validated against the current time.

  4. The nonce of the invocation is checked against recently processed nonces to prevent replay attacks.

  5. The sub field of the invocation is checked to confirm it matches the vault namespace being accessed.

  6. Each delegation token in the prf chain is validated: its signature, time bounds, and that iss of the inner delegation matches the aud of the outer delegation, until the root self-signed delegation is reached.

  7. The invocation’s cmd is matched against the delegation’s cmd pattern (e.g. /doc/read satisfies /doc/ or /).

  8. The invocation’s args are checked against the delegation’s pol constraints (e.g. args.endpoint must be in the allowed endpoint list).

  9. Special restrictions apply per command:

    • cmd: /doc/read: The SDV returns the Ciphertext and only the wrapped DEK for the DID matching the invocation’s iss. No other wrapped keys are exposed.

    • cmd: /doc/update: When content.data_encryption is absent, only the Ciphertext is replaced and the wrapped key headers are not modified. When content.data_encryption is present (key rotation after revocation), the submitted wrapped key set must be a strict subset of the original set — new Peer DIDs may not be introduced.

    • cmd: /doc/create: A new Ciphertext with at most 2 wrapped keys is accepted. At least one wrapped key must correspond to a known User Peer DID or Root DID. A second wrapped key must match the DID of the requester.

    • cmd: /doc/delete: No additional restrictions on the payload.

    • cmd: /doc/share: The Ciphertext must not be modified; only a single new dek_entry may be appended and the associated peer_did is registered as an alias in the same atomic operation.
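The command-specific restrictions for /doc/read, /doc/update, and /doc/share can be sketched as three small checks. The in-memory document shape used here (a `ciphertext` plus a list of `dek_entries`, each with a `peer_did`) is an assumption for illustration, as are the function names; the rules themselves follow the list above.

```python
def read_view(doc: dict, requester_did: str) -> dict:
    """/doc/read: return the ciphertext and only the requester's wrapped DEK."""
    own = [e for e in doc["dek_entries"] if e["peer_did"] == requester_did]
    return {"ciphertext": doc["ciphertext"], "dek_entries": own}


def update_allowed(doc: dict, new_entries) -> bool:
    """/doc/update: None keeps the headers; otherwise require a strict subset."""
    if new_entries is None:
        return True  # plain content update, wrapped keys stay untouched
    old = {e["peer_did"] for e in doc["dek_entries"]}
    new = {e["peer_did"] for e in new_entries}
    return new < old  # key rotation: at least one recipient removed, none added


def share_allowed(doc: dict, proposed: dict) -> bool:
    """/doc/share: same ciphertext, exactly one new dek_entry appended."""
    old, new = doc["dek_entries"], proposed["dek_entries"]
    if proposed["ciphertext"] != doc["ciphertext"]:
        return False
    if len(new) != len(old) + 1 or new[: len(old)] != old:
        return False
    return new[-1]["peer_did"] not in {e["peer_did"] for e in old}
```

Comparing recipient sets rather than wrapped key bytes matters for rotation: after revocation every remaining recipient receives a freshly wrapped DEK, so only the set of Peer DIDs can be compared with the original header.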

7. Deployment View

The Deployment View describes the technical infrastructure used to execute the system. It maps the SCAMPER software building blocks to physical hardware or containerized environments.

7.1. Infrastructure Level 1: The Solid-Based Vault

The SCAMPER Secure Data Vault (SDV) is built upon a W3C Solid implementation (e.g., Community Solid Server). It is deployed as a cloud-agnostic, containerized architecture that strictly separates metadata (LDP-RS) from raw binary files (LDP-NR). To support unlinkable identities, a custom DID resolution and UCAN authorization pipeline sits in front of the native Solid routing.

deployment-level1

Motivation

By adopting a W3C Solid foundation, the Vault inherits industry-standard data routing (Linked Data Platform). However, we replace the centralized WebID/OIDC paradigm with SCAMPER’s decentralized DID/UCAN model. Because we use Pairwise DIDs for privacy, we must inject a custom Namespace Resolution Layer (DID Alias Resolver) to seamlessly map multiple public-facing service endpoints to a single, deduplicated internal storage structure without exposing the user’s root identity.

Quality and Performance Features
  • Scalability: The SDV Application Container is entirely stateless. The UCAN Validator requires no session memory. We could deploy multiple instances behind the Reverse Proxy to handle high traffic without coordination issues.

  • Storage Optimization: Large files are pushed directly to cheap Object Storage. The Graph Database is kept small and fast, containing only URIs, blind indexes, and relationships.

  • Cloud Agnosticism: By targeting Docker, an S3-compatible API (MinIO), and a standard SPARQL endpoint, the entire Vault can be deployed on any cloud provider or self-hosted environment.

Mapping of Building Blocks to Infrastructure
Building Block | Deployment Target | Rationale
SCAMPER UCAN Middleware | Solid Server Pipeline | Intercepts requests before Solid routing. Handles DID verification and UCAN capability validation against the public Service DID requested.
DID Alias Resolver | Solid Server Pipeline | Acts as an IdentifierStrategy override. Once authorized, it translates the public did:peer namespace to the Vault’s internal Storage DID to enable deduplication.
LDP Router | Solid Server Core | The native Solid router. Inspects HTTP Content-Type on the translated internal path to route data to either the Graph DB or Object Storage.
LDP-RS Persistence | Graph Database Node | Stores RDF Sources (Linked Data, Verifiable Credentials, Alias Mappings). Accessible via SPARQL.
LDP-NR Persistence | MinIO / Local FS | Stores Non-RDF Sources (Images, Encrypted PDFs) using self-hosted S3-compatible APIs.

7.2. Infrastructure Level 2: LDP-RS vs LDP-NR Storage Routing

To keep the Graph Database fast for blind-index queries and relation traversals, the Solid backend strictly routes data based on its content type:

  • The Metadata Path (LDP-RS): Any payload submitted as application/ld+json or text/turtle is routed by the Solid server to the Graph Database. This handles the lightweight, highly connected structural data of the Vault.

  • The Binary Path (LDP-NR): Any unstructured payload (e.g., a 5MB encrypted application/pdf or standard JWE blob) bypasses the graph entirely. The Solid server streams these bytes directly to MinIO (or the local file system). The server then automatically generates a tiny RDF metadata node in the Graph Database acting as the "pointer" to the MinIO object, ensuring the graph remains aware of the file without hosting its weight.
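The two paths can be sketched as a routing function plus a store operation that writes the pointer node for binaries. Plain dictionaries stand in for the Graph Database and for MinIO, and the function names and pointer fields are assumptions for this illustration rather than Solid server APIs.

```python
RDF_TYPES = {"application/ld+json", "text/turtle"}


def route(content_type: str) -> str:
    """Pick the persistence target from the media type, ignoring parameters."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return "graph" if media_type in RDF_TYPES else "object-store"


def store(path: str, content_type: str, body: bytes, graph: dict, objects: dict) -> None:
    """Persist a payload; binaries also get a small pointer node in the graph."""
    if route(content_type) == "graph":
        graph[path] = body
        return
    objects[path] = body
    # The pointer keeps the graph aware of the file without holding its bytes.
    graph[path] = {"pointer": path, "content_type": content_type, "size": len(body)}
```

For example, storing a PDF places its bytes in the object store and leaves only a three-field pointer record in the graph.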

8. Cross-Cutting Concepts

This section describes the overarching design principles, rules, and standardized solutions that apply across multiple building blocks within the SCAMPER Secure Data Vault (SDV) and Edge Wallet ecosystem.

8.1. Privacy-Preserving Logging and Monitoring

Standard web server logging (e.g., Nginx default access logs) is highly dangerous in a zero-trust architecture, as it routinely captures URIs, IPs, and identifiers. To maintain the Unlinkability requirement of the SDV, all logging must be strictly sanitized.

  • No Plaintext Identifiers: did:peer values, IP addresses, and specific resource paths MUST NOT be written to application logs in plaintext.

  • Cryptographic Hashing in Logs: When tracing a request through the system (e.g., tracking a failed UCAN validation), the system must log a one-way cryptographic hash (e.g., SHA-256) of the DID or URI. This allows Vault administrators to debug traffic flows and trace errors without ever knowing who is making the request.

  • Scrubbed Payloads: The LDP Router and API Gateway must automatically redact HTTP request/response bodies from all error logs.
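A log helper following these rules might look like the sketch below. The function names, the log line layout, and the truncation of each digest to 16 hexadecimal characters are assumptions for this example.

```python
import hashlib


def pseudonymize(value: str) -> str:
    """One-way SHA-256 digest of an identifier, hex encoded."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def log_line(event: str, did: str, uri: str) -> str:
    """Build a log line that carries digests instead of plaintext identifiers."""
    return (
        f"event={event} "
        f"did_sha256={pseudonymize(did)[:16]} "
        f"uri_sha256={pseudonymize(uri)[:16]}"
    )
```

The same DID always produces the same digest, so an administrator can follow one request through the logs. Note that low-entropy inputs such as IP addresses can be recovered from an unkeyed hash by enumeration, so a keyed hash may be preferable for those values.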

8.2. "Dumb Server, Smart Client" Cryptography

To ensure absolute digital sovereignty, the cryptographic boundary is strictly enforced between the Edge Wallet and the SDV.

  • Edge-Side Key Generation: All private keys, including the Master Biometric Seed and deterministically derived Pairwise keys (HKDF), exist exclusively on the User’s Edge Wallet.

  • No Server-Side Decryption: The Vault application layer is mathematically incapable of decrypting the user’s LDP-NR file blobs or LDP-RS blind indexes. The Vault only performs public-key signature verification (via the UCAN Invocation Validator). The invocation token arrives as the JSON-RPC request body; there is no separate Authorization header.

  • Self-Certifying Revocation: State changes regarding access control (revocation) are not trusted by default. The Vault only accepts state changes that are cryptographically signed by the owning identity (e.g., the /access/revoke command).

8.3. Unified Error Handling (Anti-Probing)

Error responses from the Vault must be carefully designed to prevent malicious actors from mapping the Vault’s internal structure or discovering Alias routes.

  • Opaque JSON-RPC Error Codes: If a Verifier submits a validly formatted invocation token but lacks the capability, or if the requested endpoint does not exist, the Vault must return a generic JSON-RPC error (e.g., code -32001 with a fixed message "vault error") without elaborating on the specific failure. The JSON-RPC data field must never expose whether a resource exists but is inaccessible versus simply not existing, as distinguishing the two allows an attacker to probe vault contents.

  • Sanitized Error Messages: Internal stack traces, database query failures (e.g., SPARQL timeouts), or storage adapter exceptions must be caught by a global exception handler and translated into generic JSON-RPC error codes before being returned to the Verifier or Wallet.
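A global exception handler of this kind can be sketched in a few lines. The wrapper name `respond` and the handler-as-callable convention are assumptions; the error code and message are the example values given above.

```python
import json

# One fixed error object for every failure, whatever its cause.
VAULT_ERROR = {"code": -32001, "message": "vault error"}


def respond(request_id, handler) -> str:
    """Run a handler and serialize either its result or the opaque error."""
    try:
        result = handler()
    except Exception:
        # Missing resource, missing capability, and internal failures all
        # collapse into the same response, so callers cannot tell them apart.
        return json.dumps({"jsonrpc": "2.0", "id": request_id, "error": VAULT_ERROR})
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "result": result})
```

Details of the caught exception would still be recorded internally, subject to the logging rules of this chapter, but they never reach the response body.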

8.4. Data Persistence & Routing Rules (Solid LDP)

The system enforces a strict bifurcation of data storage to maintain performance and privacy.

Data Profile | Content-Type Examples | Persistence Target | Encryption Standard
Metadata / Graph (LDP-RS) | application/ld+json, text/turtle | RDF Triple Store | Blind Indexed (HMAC) or Public Plaintext (Helper Data)
Binary / Files (LDP-NR) | application/pdf, image/jpeg | Object Storage (MinIO) | Envelope Encrypted (JWE + DEK)

  • The Pointer Rule: Every LDP-NR binary stored in Object Storage must have a corresponding, lightweight LDP-RS node in the Graph Database acting as a pointer. This ensures the Solid routing logic can resolve the file’s metadata without downloading the raw binary.
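The "Blind Indexed (HMAC)" entry in the table above can be illustrated with a short sketch. It assumes HMAC-SHA256 with an index key that never leaves the Edge Wallet; the function name and the normalization step (trimming and lowercasing) are choices made for this example.

```python
import hashlib
import hmac


def blind_index(index_key: bytes, term: str) -> str:
    """Deterministic keyed digest of a search term, computed on the wallet."""
    normalized = term.strip().lower()
    return hmac.new(index_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
```

The wallet stores and queries metadata under `blind_index(key, term)`, so the Vault can match equal digests without learning the term. Because the digest is deterministic, the Vault can still observe that two queries used the same term.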

8.5. Idempotency and Offline-First Synchronization

Because the Edge Wallet operates on a mobile device, network connectivity is assumed to be intermittent.

  • Idempotent Vault Operations: All /doc/update, /doc/delete, and /doc/share JSON-RPC invocations to the SDV must be idempotent. If the Edge Wallet loses connection mid-upload and retries the exact same invocation (same nonce), the Vault must handle it gracefully without creating duplicate entries or throwing structural errors. Note: the nonce replay check prevents the SDV from processing a stale retransmission as a fresh invocation — the client must issue a new invocation with a fresh nonce for each genuinely new operation.
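One way to reconcile idempotent retries with the nonce replay check is to remember the response for each processed nonce. The sketch below assumes an in-memory cache keyed by issuer and nonce; the class name and the fingerprint comparison are illustrative, and a real deployment would also need to expire entries once the invocation’s exp has passed.

```python
import hashlib
import json


class NonceCache:
    """Remember processed invocations so exact retries return the same response."""

    def __init__(self) -> None:
        self._seen: dict[tuple[str, str], tuple[str, object]] = {}

    def execute(self, invocation: dict, operation):
        key = (invocation["iss"], invocation["nonce"])
        fingerprint = hashlib.sha256(
            json.dumps(invocation, sort_keys=True).encode("utf-8")
        ).hexdigest()
        if key in self._seen:
            stored_fingerprint, response = self._seen[key]
            if stored_fingerprint != fingerprint:
                # Same nonce with different content is a replay attempt.
                raise ValueError("nonce reused for a different invocation")
            return response  # exact retry: do not run the operation again
        response = operation(invocation)
        self._seen[key] = (fingerprint, response)
        return response
```

An exact retransmission therefore returns the stored response without repeating the side effect, while a genuinely new operation must carry a fresh nonce.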

9. Architecture Decisions

This section documents the most critical architectural decisions that shape the SCAMPER Secure Data Vault (SDV) and Edge Wallet. These decisions focus on balancing strict zero-knowledge privacy with system performance and storage efficiency.

9.1. ADR 1: UCAN 1.0 over OAuth2 / OIDC for Authorization

  • Context: Traditional web architectures rely on centralized Authorization Servers (OAuth2) and Identity Providers (OIDC). SCAMPER requires a decentralized, offline-first approach where the Vault acts as a "dumb" resource server without knowing the user’s root identity. UCAN 1.0 introduces a formal separation between delegation tokens (granting capability from one DID to another) and invocation tokens (exercising a capability in a specific request), replacing the single-token model of earlier versions.

  • Decision: We use UCAN 1.0. The Edge Wallet generates and signs delegation tokens locally and creates a fresh, per-request invocation token for each JSON-RPC call. Invocation tokens are the JSON-RPC request body — no separate Authorization header is needed. The nonce field in each invocation provides per-request replay protection without requiring server-side session state.

  • Consequences:

    • Positive: Complete elimination of centralized identity bottlenecks. The Vault requires zero session state and minimal database tables to verify access. The pol field in delegation tokens enables fine-grained, declarative access policies (e.g., endpoint allowlists) without custom server logic.

    • Negative: Shifts immense cryptographic complexity to the Edge Wallet. The user must manage their own key material, requiring robust recovery mechanisms (e.g., Biometric/SFE seed recovery). Clients must implement UCAN 1.0 token generation and proof-chain construction.

9.2. ADR 2: Solid LDP-RS / LDP-NR Storage Split

  • Context: The Vault needs to support highly optimized Linked Data queries (SPARQL) to resolve verifiable credentials and helper data, while also storing heavy binary files (e.g., 5MB PDFs, medical scans).

  • Decision: We strictly adhere to the W3C Solid protocol’s data bifurcation. RDF Sources (LDP-RS) are routed to a dedicated Graph Database (e.g., Oxigraph/Jena), while Non-RDF Sources (LDP-NR) are routed to an S3-compatible Object Store (e.g., MinIO).

  • Consequences:

    • Positive: Prevents memory exhaustion in the Graph Database. Lets the Vault scale its binary storage independently, so that heavy binary payloads do not degrade query performance.

    • Negative: Introduces architectural complexity. The Solid router must manage "pointers" (metadata nodes in the graph that reference the S3 blobs) to keep the systems synchronized.

9.3. ADR 3: DID Alias Routing for Storage Deduplication

  • Context: To maintain Unlinkability, the Edge Wallet generates a unique did:peer for every Service Provider (Bank, University). Storing a separate physical copy of a 5MB file for every did:peer that has access would cause massive storage bloat.

  • Decision: We implemented a Namespace Resolution Layer (Alias Resolver) in the Vault’s application pipeline. The Wallet registers public Service DIDs as aliases that route to a single, hidden Storage DID.

  • Consequences:

    • Positive: Massive storage efficiency. Files are stored and encrypted only once (using Envelope Encryption), and access control is enforced at the alias level. Simplifies UCAN issuance for the Wallet.

    • Negative: We accept a localized correlation risk. While the Service Providers cannot collude, the Vault Administrator could theoretically look at the Alias Registry and deduce that multiple Service DIDs point to the same Storage DID. We accept this trade-off for operational practicality.
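The alias resolution described in this decision can be sketched as a lookup table that rewrites the public namespace to the internal one. The class name, method names, and path layout are assumptions for illustration.

```python
from typing import Optional


class AliasRegistry:
    """Map public-facing Service DIDs to one hidden internal Storage DID."""

    def __init__(self) -> None:
        self._alias_to_storage: dict[str, str] = {}

    def register(self, alias_did: str, storage_did: str) -> None:
        self._alias_to_storage[alias_did] = storage_did

    def resolve(self, subject_did: str, path: str) -> Optional[str]:
        """Translate an alias-scoped path to the internal path, or None if unknown."""
        storage_did = self._alias_to_storage.get(subject_did)
        if storage_did is None:
            return None  # caller answers with the generic vault error
        return f"/{storage_did}{path}"
```

Two Service Providers that address the same file through different aliases resolve to one internal path, which is what makes deduplication possible. The mapping table itself is the source of the correlation risk noted above.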

9.4. ADR 4: Cryptographic Append-Only Revocation Logs

  • Context: In a local-first architecture, the Edge Wallet cannot act as an always-online server to issue short-lived tokens. We must issue long-lived UCANs, which necessitates a robust revocation mechanism that survives server migrations.

  • Decision: Instead of using a standard database table for a Certificate Revocation List (CRL), the Vault uses an Append-Only Cryptographic Log. Revocations are processed as /access/revoke commands signed by the issuing DID.

  • Consequences:

    • Positive: The revocation state is self-certifying. The Vault can be migrated to new infrastructure without needing to establish trust in the old server’s database; the new server simply re-verifies the cryptographic signatures on the log.

    • Negative: The Vault must perform signature verification on the revocation log every time a token is presented, slightly increasing compute overhead per request. The revocation logs are crucial during data migration. Long-lived tokens pose risks in situations where verifiers lose control over their keys without the user knowing.
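A self-certifying log of this kind can be sketched as follows. Signature verification is injected as a callable because the Python standard library has no Ed25519 implementation; the class and method names, the entry layout, and the use of a token identifier string are assumptions for this illustration.

```python
from typing import Callable

# (issuer DID, signed message, signature) -> True if the signature is valid
Verifier = Callable[[str, bytes, bytes], bool]


class RevocationLog:
    """Append-only list of signed revocations that can be re-verified anywhere."""

    def __init__(self, verify: Verifier) -> None:
        self._verify = verify
        self._entries: list[tuple[str, str, bytes]] = []

    def append(self, issuer: str, token_id: str, signature: bytes) -> None:
        """Accept a revocation only if the issuer's signature checks out."""
        if not self._verify(issuer, token_id.encode("utf-8"), signature):
            raise ValueError("invalid revocation signature")
        self._entries.append((issuer, token_id, signature))

    def is_revoked(self, token_id: str) -> bool:
        return any(entry_id == token_id for _, entry_id, _ in self._entries)

    def export(self) -> list[tuple[str, str, bytes]]:
        return list(self._entries)

    @classmethod
    def restore(cls, entries, verify: Verifier) -> "RevocationLog":
        """Rebuild the log on a new server, re-verifying every entry."""
        log = cls(verify)
        for issuer, token_id, signature in entries:
            log.append(issuer, token_id, signature)
        return log
```

During migration the new server calls `restore` on the exported entries, so a tampered or forged entry is rejected without any trust in the old server’s storage.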

9.5. ADR 5: JSON-RPC over REST for the Protected Data API

  • Context: UCAN 1.0 invocation tokens are self-contained, signed RPC calls: the cmd field is the method name and args holds the parameters. Mapping this model onto REST requires an artificial translation between HTTP verbs (GET/PUT/POST/PATCH/DELETE) and UCAN commands, adding complexity without benefit. JSON-RPC 2.0 aligns the transport directly with the UCAN invocation model.

  • Decision: The SDV exposes its protected data API as JSON-RPC 2.0 over HTTPS. Each request body is a UCAN 1.0 invocation token. The cmd field resolves to the JSON-RPC method; args provides the parameters. The Public Read API (unauthenticated discovery endpoints) remains plain HTTPS GET for maximum compatibility.

  • Consequences:

    • Positive: The transport is a natural fit for the UCAN invocation model. cmd and args are self-describing and version-stable; no mapping from HTTP verb semantics is needed. JSON-RPC error codes replace ambiguous HTTP status codes, and the data field can carry structured error details without leaking vault internals.

    • Negative: Loses HTTP caching and CDN edge-layer optimizations for read requests (since all protected calls are POST bodies). Clients require a JSON-RPC library and cannot rely on simple curl-style HTTP tooling for authenticated vault access.
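The direct mapping from cmd to a method can be illustrated with a small dispatch table. The decorator, the handler body, and the returned structure are placeholders for this sketch; UCAN validation is assumed to have happened before dispatch.

```python
HANDLERS = {}


def command(name: str):
    """Register a function as the handler for one UCAN command."""
    def register(fn):
        HANDLERS[name] = fn
        return fn
    return register


@command("/doc/read")
def doc_read(args: dict) -> dict:
    # Placeholder body: a real handler would load the document from storage.
    return {"read": args["endpoint"]}


def dispatch(invocation: dict):
    """Resolve cmd to a handler and call it with args; no HTTP verb mapping."""
    handler = HANDLERS.get(invocation["cmd"])
    if handler is None:
        raise LookupError("unknown command")
    return handler(invocation.get("args", {}))
```

Adding a command then means registering one more handler, without touching routing rules tied to HTTP verbs or paths.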

10. Quality Requirements

This section defines the core quality attributes of the SCAMPER Secure Data Vault (SDV) and Edge Wallet. These requirements act as the benchmark for the system’s success, focusing on privacy, sovereign security, and edge-centric usability.

10.1. Quality Tree

The quality tree provides a high-level overview of the system’s driving quality attributes:

  • Privacy (Unlinkability)

    • Service Provider Isolation: Verifiers cannot collude to track a user’s resources that they do not have access to.

    • Zero-Knowledge Storage: The SDV cannot read the user’s files or search queries.

  • Security (Data Sovereignty)

    • Cryptographic State: All access control and revocation must be mathematically verifiable.

    • Edge Authority: The server cannot issue or modify capabilities without the Edge Wallet’s private keys.

  • Performance

    • Storage Efficiency: Deduplication of heavy binary files (LDP-NR).

    • Query Speed: Millisecond resolution of graph traversals (LDP-RS).

  • Usability (Resilience)

    • Device Recovery: Users can recover their full identity and vault access without centralized backups.

  • Portability (Vault Provider Independence)

    • Completeness: Users can migrate their vault data between different providers without losing any data or control.

    • Ease of Migration: The migration process does not require complex actions from the user or trust in the old provider’s infrastructure.

10.2. Quality Scenarios

These concrete scenarios translate the abstract quality attributes into highly specific, testable conditions for the engineering and QA teams.

10.2.1. Scenario 1: Privacy (The Collusion Test)

  • Stimulus: The Bank and the University attempt to merge their data to build a master profile of Alice.

  • Response: Because the Edge Wallet deterministically generated unique Pairwise DIDs (did:peer) for each institution and issued separate UCANs pointing to isolated Alias paths, the institutions have no matching identifiers. They could still match documents through the Personal Data itself if they had access to the underlying data; mitigating that risk is out of scope for the SDV.

  • Measure: An external audit verifies that no overlapping identifiers exist between the two Service Providers' requests.

10.2.2. Scenario 2: Security (The Rogue Admin Test)

  • Stimulus: A malicious administrator at the cloud hosting provider gains direct database access to the Vault’s Graph Database and Object Storage.

  • Response: The administrator sees only opaque UUIDs in MinIO and blinded HMAC hashes in Oxigraph. They can destroy or scramble data, but they cannot forge any meaningful data without the Edge Wallet’s private keys.

  • Measure: If the administrator tries to act on the data through the API, the API Gateway’s UCAN Validator immediately rejects the request, because neither the Vault nor its administrator holds the Edge Wallet’s private key required to sign a valid delegation chain (prf). Access is mathematically denied.

10.2.3. Scenario 3: Performance (The Heavy Load Test)

  • Stimulus: 1,000 Verifiers simultaneously request 5MB encrypted medical scans (LDP-NR) using valid UCANs.

  • Response: The Solid LDP Router intercepts the requests, verifies the UCANs, and streams the binaries directly from the S3-compatible MinIO storage.

  • Measure: The RDF Graph Database experiences no memory bloat during the file transfers. CPU usage on the SDV Application Container scales linearly with the signature verification load, and the system maintains a predefined success rate under load.

10.2.4. Scenario 4: Usability (The Device Loss Protocol)

  • Stimulus: A user loses their smartphone and buys a completely new, blank device.

  • Response: The user scans their face/biometrics using Secure Fuzzy Extraction (SFE). The Edge Wallet reconstructs the Master Biometric Seed. The Wallet then uses HKDF to re-derive the Vault Storage DID and downloads the encrypted Active Grants backup file.

  • Measure: After the biometric scan and recovery of the biometric seed, the new device can successfully query the Vault’s Graph DB via blind indexing and find the user’s VCs, audit logs and active grants (recovering the backup).

10.2.5. Scenario 5: Migration (The Sovereignty Test)

  • Stimulus: The user decides they no longer trust their current SDV provider and wants to self-host their Vault.

  • Response: The user copies the raw Object Storage files and the RDF Triples to the new server.

  • Measure: Because the revocation list is an append-only log of cryptographically signed ucan/revoke tokens (not a fragile SQL state), the new server immediately enforces all previous revocations without needing to "trust" the old server’s data.

11. Risks and Technical Debt

This section outlines the known technical risks, accepted architectural compromises, and accumulated technical debt within the SCAMPER Secure Data Vault (SDV) and Edge Wallet ecosystem. Identifying these upfront allows the engineering team to prioritize mitigation strategies in future sprints.

11.1. Architectural Risks

These are systemic risks introduced by the core design choices (documented in Chapter 9: Architecture Decisions).

  • Risk 1: Edge-Side Key Recovery Failure (The "Lockout" Risk)

    • Description: Because SCAMPER uses a strict Zero-Knowledge, Zero-Trust architecture, there is no centralized "Forgot Password" database. If the user loses their device and the Secure Fuzzy Extraction (SFE) biometric algorithm fails to perfectly reconstruct their Master Seed, the user is permanently locked out of their Vault. The data becomes mathematically unrecoverable cryptographic noise.

    • Mitigation: (WP2 responsibility)

  • Risk 2: Alias Registry Correlation

    • Description: As accepted in ADR 3, the SDV’s Alias Resolver maps multiple public-facing Service DIDs (e.g., Bank, University) to a single internal Storage DID to save disk space. A malicious Vault Administrator with database access could analyze this registry and deduce that the Bank and the University are interacting with the exact same human, degrading the theoretical Unlinkability of the system.

    • Mitigation: We accept this trade-off for storage efficiency. To mitigate it, the Vault infrastructure must employ strict internal access controls, disk encryption at rest, and automated audit logging for admin actions.

11.2. Technical Debt

These are engineering compromises made to accelerate development or integrate with existing standards.

  • Debt 1: Building on Existing Solid Server

    • Description: By building upon a W3C Solid implementation and stripping out its native WebID/OIDC authentication to inject a custom UCAN middleware and DID resolver, we are likely to diverge from the upstream repository.

    • Impact: Every time the core Solid project releases a major security patch or architectural update, our team will incur the technical debt of porting those changes into our heavily customized fork, specifically ensuring our UCAN/DID pipeline does not break the LDP router.

  • Debt 2: Revocation Log Computation Overhead

    • Description: Because we use an append-only cryptographic log for token revocations, the SDV must re-verify the signature chain of the log upon request. As the user revokes more connections over a span of years, this log will grow linearly.

    • Impact: Eventually, parsing the revocation log will introduce latency to standard file retrieval requests.

12. Glossary

This section defines the core terminology, acronyms, and domain-specific concepts used throughout the SCAMPER architecture. It serves as the ubiquitous language for the engineering, security, and product teams.

Term Definition

Alias Registry

A routing table inside the SDV that translates a public-facing Service DID (e.g., did:peer:bank) into a hidden internal Storage DID. Used to deduplicate files without exposing the user’s root identity.

Blind Indexing

A cryptographic technique (typically using HMAC) that maps search terms to deterministic, opaque tokens. It allows the Edge Wallet to search the Vault’s Graph Database for specific metadata without the Vault learning what is being searched for.
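A minimal sketch of the idea, assuming HMAC-SHA-256 and a wallet-held index key. The key value, the normalization step, the document identifiers, and the function names are illustrative assumptions, not the SCAMPER implementation.

```python
import hashlib
import hmac


def blind_index(index_key: bytes, term: str) -> str:
    """Map a search term to a deterministic token that hides the term."""
    normalized = term.strip().lower().encode("utf-8")
    return hmac.new(index_key, normalized, hashlib.sha256).hexdigest()


# Placeholder key; a real key would be derived and kept inside the Edge Wallet.
index_key = bytes(range(32))

# What the Vault stores: blind tokens mapped to document ids, no plaintext terms.
vault_index = {
    blind_index(index_key, "passport"): "doc-17",
    blind_index(index_key, "salary slip"): "doc-42",
}


def vault_lookup(token: str):
    """Vault-side search: an exact match on the opaque token."""
    return vault_index.get(token)


print(vault_lookup(blind_index(index_key, "Passport ")))
```

Because the token is deterministic, equal terms always produce equal tokens. That property is what makes the search work, and it also means the Vault can observe that two queries are equal even though it cannot read them.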

DID (Decentralized Identifier)

A globally unique W3C standard identifier that resolves to a public key document without relying on a centralized registry (like a DNS provider or tech monopoly).

DEK (Data Encryption Key)

A randomly generated symmetric key used to encrypt a single data payload. The DEK itself is then encrypted (wrapped) with the public key of each authorized party, forming the core of SCAMPER’s Envelope Encryption structure. Multiple wrapped copies of the same DEK can coexist in a file’s header — one per authorized recipient — without exposing the plaintext to the Vault.

did:peer

A specific DID method used in SCAMPER. It generates pairwise, unique identifiers for every new connection, ensuring that Service Providers cannot collude or link a user’s activities across different domains.

Edge Wallet

The "Smart Client" mobile application controlled by the user. It is the sole cryptographic authority in the SCAMPER ecosystem, responsible for key derivation, data encryption, and UCAN issuance.

Envelope Encryption

A hybrid encryption pattern in which data is encrypted once with a fast symmetric key (the DEK), and the DEK is encrypted separately for each authorized recipient using their public key. This allows efficient multi-party sharing of large files: only the small DEK header needs to be re-encrypted when access is granted or revoked, not the file content itself.

HKDF (HMAC-based Key Derivation Function)

The key derivation function (specified in RFC 5869) used by the Edge Wallet to deterministically generate thousands of reproducible Pairwise DIDs and keys from a single Master Biometric Seed.
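The derivation can be sketched with the extract-and-expand construction from RFC 5869, using only the Python standard library. The salt, the `pairwise-did:` info prefix, and the `derive_pairwise_key` helper are assumptions made for illustration; the source does not specify the labels the Edge Wallet uses.

```python
import hashlib
import hmac

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """RFC 5869 extract step: condense the input keying material into a PRK."""
    return hmac.new(salt or b"\x00" * HASH_LEN, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """RFC 5869 expand step: stretch the PRK into `length` output bytes."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output is too long")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_pairwise_key(master_seed: bytes, service_label: str) -> bytes:
    """Hypothetical helper: one 32-byte key per service from the same seed."""
    prk = hkdf_extract(b"scamper-sketch-salt", master_seed)
    return hkdf_expand(prk, b"pairwise-did:" + service_label.encode("utf-8"), 32)


seed = b"\x11" * 32  # stand-in for the Master Seed
bank_key = derive_pairwise_key(seed, "bank")
university_key = derive_pairwise_key(seed, "university")
```

The same seed and label always reproduce the same key, while different labels give unrelated keys. This is the property that lets the wallet regenerate every pairwise key after recovering only the seed.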

JWE (JSON Web Encryption)

The standard format used for Envelope Encryption in the Vault. The heavy file (e.g., PDF) is encrypted with a symmetric Data Encryption Key (DEK), and the DEK is encrypted for the specific Verifier.

Key Rewrapping

The operation of decrypting an existing DEK with one party’s private key and re-encrypting it with a different recipient’s public key, without touching the underlying ciphertext. In SCAMPER, the Edge Wallet performs Key Rewrapping to grant a Service Provider access to an already-stored file, appending a new wrapped DEK header for that provider’s Peer DID.

LDP-NR (Non-RDF Source)

A Solid protocol term for unstructured binary data (e.g., PDFs, JPEGs, encrypted JWE blobs). In SCAMPER, these are strictly routed to the Object Storage layer (MinIO).

LDP-RS (RDF Source)

A Solid protocol term for structured Linked Data (e.g., JSON-LD, Turtle, Triples). In SCAMPER, these are strictly routed to the Graph Database to enable fast metadata traversal.
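The routing rule for the two LDP resource kinds can be sketched as a lookup on the request's media type. The set of RDF media types and the backend labels below are assumptions for illustration; the source does not state how the SDV classifies a resource.

```python
# Media types treated as RDF sources in this sketch (an assumed, partial list).
RDF_MEDIA_TYPES = {
    "text/turtle",
    "application/ld+json",
    "application/n-triples",
    "application/n-quads",
}


def route(content_type: str) -> str:
    """Pick a storage backend from a Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in RDF_MEDIA_TYPES:
        return "graph-database"  # LDP-RS
    return "object-storage"      # LDP-NR, e.g. encrypted JWE blobs


print(route("text/turtle; charset=utf-8"))
print(route("application/pdf"))
```

Anything that is not recognized as RDF falls through to object storage, which matches the description of LDP-NR as the category for unstructured binary data.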

PQC (Post-Quantum Cryptography)

Cryptographic algorithms designed to remain secure against attacks by quantum computers. SCAMPER uses a hybrid scheme combining a classical algorithm (X25519) with a post-quantum algorithm (ML-KEM-768, the standardized form of Kyber-768) into a single key exchange (X25519MLKEM768), ensuring security against both classical and future quantum adversaries without sacrificing compatibility with current systems.

SDV (Secure Data Vault)

The cloud-agnostic "Dumb Server" that hosts the user’s encrypted data. It possesses no private keys, cannot decrypt LDP-NR files, and cannot reverse LDP-RS blind indexes.

SFE (Secure Fuzzy Extraction)

The cryptographic process of turning noisy, analog biometric readings (like a 3D face scan) into a perfectly stable, reproducible cryptographic Master Seed, enabling deviceless key recovery.

SPARQL

The standard query language used to retrieve and manipulate data stored in Resource Description Framework (RDF) format. Used by the SDV’s Graph Database to execute blind searches and resolve helper data.

SSI (Self-Sovereign Identity)

A design philosophy in which individuals fully own and control their digital identities and personal data without relying on any centralized authority (government, corporation, or identity provider). SCAMPER is built on SSI principles: users hold their own cryptographic keys, control vault access via UCAN delegation, and can migrate their data at any time without dependence on their SDV provider.

TEE (Trusted Execution Environment)

A hardware-isolated, cryptographically attested compute context (e.g., Intel SGX, AMD SEV, ARM CCA, AWS Nitro Enclaves) whose code integrity can be verified by any external party through an attestation document. In SCAMPER, TEEs are optionally used by the SDV to perform expensive key rotation and re-encryption on behalf of the Edge Wallet for large files. The user’s wallet validates the TEE attestation — including enclave measurements against a known open-source build — before trusting it with DEK material.

UCAN 1.0 (User-Controlled Authorization Networks)

A decentralized authorization protocol based on JSON Web Tokens (JWTs). UCAN 1.0 uses two distinct token types: a delegation token (grants a named capability cmd to an aud DID, optionally constrained by pol) and an invocation token (exercises a delegated capability as an actual JSON-RPC request, carrying cmd, args, and the delegation chain in prf). Instead of an Authorization Server granting access, the Edge Wallet cryptographically signs delegations locally and creates a fresh invocation token per request. No Authorization: Bearer header is used; the invocation itself is the request body.

Delegation Token

A UCAN 1.0 token issued by the Data Owner to grant a capability to another DID. Fields: iss (issuer), aud (recipient), sub (vault namespace), cmd (permitted command pattern, e.g. /doc/read or /*), pol (policy constraints on args, e.g. endpoint allowlist), nonce, exp, prf (proof chain to root). Self-signed root delegations have iss == aud == sub.

Invocation Token

A UCAN 1.0 token that constitutes an actual JSON-RPC request. Sent as the JSON-RPC 2.0 request body. Fields: iss (caller), sub (vault namespace), cmd (command being invoked, e.g. /doc/create), args (parameters: endpoint, headers, payload), nonce (per-request unique value for replay protection), exp, prf (delegation chain proving the caller has the right to invoke cmd). Has no aud field.
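The relationship between the two token types can be sketched with plain dictionaries that use the field names listed above. The DIDs, nonces, timestamps, the `/*` suffix matching rule, and both function names are assumptions for illustration. Signature verification, `pol` evaluation, and walking a multi-step `prf` chain are deliberately left out.

```python
def cmd_allowed(pattern: str, cmd: str) -> bool:
    """Assumed matching rule: exact match, or a prefix match for a '/*' suffix."""
    if pattern.endswith("/*"):
        return cmd.startswith(pattern[:-1])
    return pattern == cmd


def invocation_authorized(delegation: dict, invocation: dict, now: int) -> bool:
    """Check one invocation against one delegation (no signature checks here)."""
    return (
        delegation["aud"] == invocation["iss"]      # delegated to the caller
        and delegation["sub"] == invocation["sub"]  # same vault namespace
        and cmd_allowed(delegation["cmd"], invocation["cmd"])
        and now < delegation["exp"]
        and now < invocation["exp"]
    )


# Hypothetical payloads; every value is a placeholder.
delegation = {
    "iss": "did:peer:owner", "aud": "did:peer:bank", "sub": "did:peer:owner",
    "cmd": "/doc/read", "pol": [], "nonce": "d-1", "exp": 2_000_000_000, "prf": [],
}
invocation = {
    "iss": "did:peer:bank", "sub": "did:peer:owner", "cmd": "/doc/read",
    "args": {"endpoint": "/vault/passport"}, "nonce": "i-1",
    "exp": 1_900_000_000, "prf": [delegation],
}

print(invocation_authorized(delegation, invocation, now=1_800_000_000))
```

The invocation carries no `aud` field. The check instead ties the invocation's `iss` to the delegation's `aud`, which is how the caller shows that the capability was granted to it.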

JSON-RPC

The transport protocol used for all authenticated interactions with the SDV’s Protected Data API. Each request is a UCAN 1.0 invocation token encoded as a JSON-RPC 2.0 method call over HTTPS. The invocation’s cmd field maps to the JSON-RPC method name; args contains the parameters. The unauthenticated Public Read API uses plain HTTPS GET.

VC (Verifiable Credential)

A tamper-evident digital credential (e.g., a digital passport or salary slip) whose authorship and integrity can be cryptographically verified by any party.

Verifier

A Service Provider (e.g., a bank, employer, or hospital) that receives a UCAN delegation from the Data Owner and uses it to retrieve and decrypt data from the SDV. Each Verifier establishes a unique Peer DID relationship with the Data Owner and holds its own private key to unwrap the copy of the DEK that was re-encrypted specifically for it during the Key Rewrapping step.