5. Claims Schema
5.1 Structure overview
A PCT consists of three components, following the JWT convention:
- Header. Metadata about the token itself: the signing algorithm, key identifier, and PCT specification version.
- Payload. The structured claims object encoding the data obligations.
- Signature. The cryptographic signature over the header and payload.
When serialised for transmission, the PCT MUST be encoded as a string of the form header.payload.signature, where each segment is Base64URL-encoded, consistent with RFC 7519 compact serialisation. Human-readable JSON representations MAY be used for documentation, debugging, and audit log storage.
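The compact serialisation above can be sketched in a few lines. This is an illustrative construction only, assuming the HS256 option from Section 5.2; the key identifier, secret, and payload values are hypothetical, and a real issuer would follow the full signing procedure in Section 6.

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    """Base64URL-encode without padding, as used in compact serialisation."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

# Hypothetical header and truncated payload, for illustration only.
header = {"alg": "HS256", "kid": "example-key-1", "typ": "PCT", "pct_version": "0.1"}
payload = {"pct_id": "3f1c9c8e-1111-4222-8333-444455556666",
           "issuer": "https://issuer.example"}

header_b64 = b64url(json.dumps(header, separators=(",", ":")).encode())
payload_b64 = b64url(json.dumps(payload, separators=(",", ":")).encode())
signing_input = header_b64 + "." + payload_b64

secret = b"shared-secret-for-illustration"   # hypothetical HS256 key
signature = b64url(hmac.new(secret, signing_input.encode(), hashlib.sha256).digest())

token = signing_input + "." + signature       # header.payload.signature
```

Padding characters are stripped, so the resulting token contains only URL-safe characters and exactly two dots.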
5.2 Header fields
| Field | Type | Required | Description |
|---|---|---|---|
alg | string | REQUIRED | Signing algorithm. Must be RS256 (RSA + SHA-256) or HS256 (HMAC + SHA-256). RS256 is recommended for multi-party deployments. |
kid | string | REQUIRED | Key identifier. A reference to the signing key used, enabling key rotation without token invalidation. |
typ | string | REQUIRED | Token type. Must be the literal string PCT. |
pct_version | string | REQUIRED | The version of this specification the token conforms to. For this version, must be 0.1. |
5.3 Core payload fields
| Field | Type | Required | Description |
|---|---|---|---|
pct_id | string (UUID v4) | REQUIRED | Globally unique identifier for this PCT instance. Must be a UUID v4. |
issued_at | integer (Unix epoch) | REQUIRED | Timestamp at which the PCT was issued, in seconds since Unix epoch (UTC). |
valid_from | integer (Unix epoch) | REQUIRED | Timestamp from which the PCT is valid. May equal issued_at. |
expires_at | integer (Unix epoch) | REQUIRED | Timestamp after which the PCT is no longer valid. Verifiers must reject expired PCTs. |
issuer | string (URI) | REQUIRED | URI identifying the issuing entity. Should be a stable, resolvable identifier. |
subject_id | string | REQUIRED | Identifier for the dataset, data flow, or processing subject this PCT is attached to. |
subject_type | enum | REQUIRED | Category of subject. Permitted values: dataset, data_flow, api_request, ai_interaction, transfer. |
data_origin | string (ISO 3166-1 alpha-2) | REQUIRED | Two-letter country code of the jurisdiction where the data was originally collected. |
data_categories | array of enum | REQUIRED | The categories of data present. Permitted values include: personal, sensitive, special_category, health, financial, biometric, genetic, criminal, communications, children, pseudonymised, anonymised. |
lawful_basis | object | REQUIRED | The legal ground(s) for processing. See Section 5.4. |
allowed_purposes | array of string | REQUIRED | The purposes for which the data may be used. Values should be drawn from a controlled vocabulary (see Appendix B) or expressed as URIs. |
consent_status | boolean | CONDITIONAL | Required when lawful_basis includes consent. True indicates valid, informed, current consent exists. |
consent_scope | array of string | CONDITIONAL | Required when consent_status is true. The specific purposes covered by the consent, consistent with allowed_purposes. |
consent_record_ref | string (URI) | OPTIONAL | Reference to an external consent record, enabling verification against the system of record. |
jurisdiction_rules | object | REQUIRED | Constraints on where the data may be processed. See Section 5.5. |
transfer_restrictions | object | CONDITIONAL | Required when subject_type is transfer or when cross-border processing is anticipated. See Section 5.6. |
retention_limit | string (ISO 8601 duration) | OPTIONAL | The maximum period for which the data may be retained (e.g. P2Y for two years). |
automated_decision_flag | boolean | OPTIONAL | Set to true if the data may be used in automated decision-making subject to Article 22 GDPR or equivalent. |
data_hash | string | REQUIRED | Cryptographic hash of the canonical serialised form of the data payload at the time of token issuance. See Section 5.8 for canonicalisation requirements. |
hash_algorithm | enum | REQUIRED | Hashing algorithm used to produce data_hash. Permitted values: sha-256, sha-384, sha-512. sha-256 is RECOMMENDED. MD5 and SHA-1 are explicitly prohibited. |
hash_scope | enum | REQUIRED | Defines what was hashed. Permitted values: full_payload (entire data payload hashed as a single unit), merkle_root (Merkle tree hash structure; see Section 5.8.3). |
data_format | string | OPTIONAL | MIME type or format descriptor of the data payload at the time of hashing (e.g. application/json, text/csv, application/octet-stream). Assists verifiers in reproducing the canonical form. |
ai_context | object | CONDITIONAL | Required when subject_type is ai_interaction. See Section 5.9. |
extensions | object | OPTIONAL | Extension namespace claims. See Section 5.7. |
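A minimal payload exercising the REQUIRED core fields might look as follows. Every value here is a placeholder rather than a normative example; the consent fields are included only because the illustrative lawful_basis includes consent, per the CONDITIONAL rules above.

```python
import time
import uuid

now = int(time.time())
# Illustrative core payload; all values are placeholders.
payload = {
    "pct_id": str(uuid.uuid4()),
    "issued_at": now,
    "valid_from": now,
    "expires_at": now + 86400,            # 24-hour validity window
    "issuer": "https://issuer.example/pct",
    "subject_id": "dataset-0042",
    "subject_type": "dataset",
    "data_origin": "GB",
    "data_categories": ["personal"],
    "lawful_basis": {"bases": ["consent"]},
    "allowed_purposes": ["service_provision"],
    "consent_status": True,               # CONDITIONAL: bases includes consent
    "consent_scope": ["service_provision"],
    "jurisdiction_rules": {"permitted_regions": ["GB", "IE"]},
    "data_hash": "placeholder",           # set per Section 5.8.4
    "hash_algorithm": "sha-256",
    "hash_scope": "full_payload",
}

REQUIRED = {"pct_id", "issued_at", "valid_from", "expires_at", "issuer",
            "subject_id", "subject_type", "data_origin", "data_categories",
            "lawful_basis", "allowed_purposes", "jurisdiction_rules",
            "data_hash", "hash_algorithm", "hash_scope"}
assert REQUIRED <= payload.keys()
```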
5.4 The lawful_basis object
The lawful_basis object must contain at least one basis. Where multiple bases apply, all must be listed.
| Field | Type | Required | Description |
|---|---|---|---|
bases | array of enum | REQUIRED | The applicable lawful basis or bases. Permitted values: consent, contract, legal_obligation, vital_interests, public_task, legitimate_interests, not_applicable (for anonymised data). |
legitimate_interests_assessment_ref | string (URI) | CONDITIONAL | Required when bases includes legitimate_interests. Reference to the Legitimate Interests Assessment (LIA) on record. |
legal_obligation_ref | string (URI) | CONDITIONAL | Required when bases includes legal_obligation. Reference to the specific legal instrument creating the obligation. |
framework | string | OPTIONAL | The regulatory framework under which the lawful basis is assessed (e.g. GDPR, UK_GDPR, HIPAA). Where omitted, GDPR is assumed. |
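The CONDITIONAL rules in this table lend themselves to a simple consistency check. The sketch below is one possible validator, not a normative algorithm; the function name and error strings are the author's own.

```python
def validate_lawful_basis(lb: dict) -> list[str]:
    """Return a list of problems; an empty list means the object is consistent."""
    problems = []
    bases = lb.get("bases") or []
    if not bases:
        problems.append("bases must contain at least one value")
    if "legitimate_interests" in bases and "legitimate_interests_assessment_ref" not in lb:
        problems.append("legitimate_interests requires legitimate_interests_assessment_ref")
    if "legal_obligation" in bases and "legal_obligation_ref" not in lb:
        problems.append("legal_obligation requires legal_obligation_ref")
    return problems
```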
5.5 The jurisdiction_rules object
| Field | Type | Required | Description |
|---|---|---|---|
permitted_regions | array of string (ISO 3166-1 alpha-2) | REQUIRED | Country codes in which processing is permitted. Use * to indicate no restriction, though this is discouraged for sensitive data. |
restricted_regions | array of string (ISO 3166-1 alpha-2) | OPTIONAL | Country codes in which processing is explicitly prohibited, overriding any general permission. |
residency_required | boolean | OPTIONAL | If true, data must remain within permitted_regions at all times and may not be temporarily processed elsewhere. |
sovereignty_framework | string | OPTIONAL | Reference to a sovereignty or adequacy framework under which processing is permitted (e.g. GDPR_adequacy, UK_adequacy, APEC_CBPR). |
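The interaction between permitted_regions (including the * wildcard) and the overriding restricted_regions can be captured in a small evaluation function. This is a sketch of the rules as stated in the table above; the function name is illustrative.

```python
def processing_permitted(rules: dict, region: str) -> bool:
    """Evaluate jurisdiction_rules for a candidate processing region."""
    # restricted_regions overrides any general permission, including "*".
    if region in rules.get("restricted_regions", []):
        return False
    permitted = rules.get("permitted_regions", [])
    return "*" in permitted or region in permitted

rules = {"permitted_regions": ["*"], "restricted_regions": ["RU"]}
assert processing_permitted(rules, "DE") is True
assert processing_permitted(rules, "RU") is False
```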
5.6 The transfer_restrictions object
| Field | Type | Required | Description |
|---|---|---|---|
permitted_destinations | array of string (ISO 3166-1 alpha-2) | REQUIRED | Country codes to which transfer is permitted. |
transfer_mechanism | enum | REQUIRED | The legal mechanism authorising the transfer. Permitted values: adequacy_decision, standard_contractual_clauses, binding_corporate_rules, derogation, not_required, other. |
transfer_mechanism_ref | string (URI) | OPTIONAL | Reference to the specific instrument (e.g. the executed SCCs) authorising the transfer. |
onward_transfer_permitted | boolean | OPTIONAL | Whether the recipient may further transfer the data to a third party. |
5.7 Extension namespaces
Extension claims are added to the extensions object using the prefix convention x-{framework}:{field}. Extension keys must not conflict with core schema field names. The following extension namespaces are defined in this version:
- x-hipaa: Claims addressing HIPAA-specific obligations (minimum_necessary, phi_flag, permitted_disclosure, baa_in_place).
- x-dora: Claims addressing DORA-specific obligations (ict_risk_classification, third_party_flag, incident_trigger).
- x-duaa: Claims addressing UK Data Use and Access Act obligations (access_condition, trusted_research_env).
- x-pecr: Claims addressing PECR / ePrivacy obligations (tracking_consent, comms_data_flag, marketing_permission).
- x-ai-act: Claims addressing EU AI Act obligations (risk_tier, human_oversight_required, prohibited_use_check, training_data_flag, conformity_assessment_ref).
Any implementer may define additional extension namespaces using the x-{label}: prefix. Extension namespaces not defined in this specification must be documented publicly by the defining party. Verifiers encountering unknown extension namespaces must not fail silently — they must either evaluate the extension claim or flag the PCT as requiring human review.
5.8 Data binding and integrity verification
5.8.1 Purpose
The data binding mechanism ensures that a PCT token is cryptographically bound to the specific data payload it was issued to govern. A verifier receiving a PCT token and a data payload can confirm:
- That the data has not been modified since the token was issued
- That the token has not been detached from its original data and reattached to a different payload
- That the token's claims apply to the data presented, and no other data
5.8.2 Canonicalisation requirement
To ensure consistent and reproducible hash values across different systems and implementations, the data payload MUST be serialised into a canonical form before hashing. The canonical form is defined as follows:
- For JSON payloads: RFC 8785 JSON Canonicalisation Scheme (JCS). All keys MUST be sorted lexicographically. All whitespace outside string values MUST be removed. Unicode characters MUST be encoded consistently per RFC 8785.
- For binary payloads: The raw byte sequence as transmitted, with no transformation applied.
- For structured data in other formats (CSV, XML, etc.): The implementation MUST document the specific canonicalisation method applied in the data_format field and MUST apply it consistently across all issuance and verification operations.
Failure to use a canonical form risks hash verification failures caused by insignificant formatting differences rather than genuine data modification. This would undermine the utility of the binding mechanism and MUST be avoided.
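For simple JSON payloads, the canonical form can be approximated with sorted keys and stripped whitespace, as sketched below. This is an approximation of RFC 8785 for illustration; a conformant implementation should use a full JCS library, since JCS also constrains number formatting and Unicode escaping in ways json.dumps does not.

```python
import hashlib
import json

def canonical_json(obj) -> bytes:
    """Rough JCS approximation: lexicographically sorted keys, no
    insignificant whitespace, UTF-8 encoding. Does not cover JCS's
    number-formatting rules; illustrative only."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")

# Same logical content, different key order and formatting,
# must yield the same hash after canonicalisation.
a = {"name": "Ada", "id": 7}
b = {"id": 7, "name": "Ada"}
assert hashlib.sha256(canonical_json(a)).hexdigest() == \
       hashlib.sha256(canonical_json(b)).hexdigest()
```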
5.8.3 Large dataset handling — Merkle tree hashing
For large datasets where computing a hash of the entire payload at every verification event is computationally impractical, implementations MAY use a Merkle tree hash structure. In this case:
- The data payload is divided into chunks of a consistent, implementation-defined size
- Each chunk is hashed individually using the algorithm specified in hash_algorithm
- The hashes are combined into a Merkle tree and the root hash is stored in data_hash
- The field hash_scope MUST be set to merkle_root
- The chunk size and tree construction method MUST be documented in the implementation's conformance statement
Merkle tree hashing allows individual chunks of a large dataset to be verified independently without requiring the entire dataset to be re-hashed, which is particularly valuable in streaming and pipeline processing scenarios.
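A toy construction of the Merkle root described above might look as follows. The chunk size and the odd-node rule (duplicating the last node) are arbitrary choices here; as the list above notes, both are implementation-defined and must be stated in the conformance statement.

```python
import hashlib

def merkle_root(payload: bytes, chunk_size: int = 4) -> str:
    """Toy Merkle root: fixed chunk size, last node duplicated on odd
    levels. Both choices are implementation-defined in the PCT spec."""
    chunks = [payload[i:i + chunk_size]
              for i in range(0, len(payload), chunk_size)] or [b""]
    level = [hashlib.sha256(c).digest() for c in chunks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])            # duplicate last node
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()
```

A single chunk degenerates to a plain hash of that chunk, which makes small payloads cheap while preserving the same verification interface.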
5.8.4 Token issuance with data binding
When issuing a PCT token with data binding, the issuer MUST:
- Serialise the data payload into its canonical form as defined in Section 5.8.2
- Compute the hash of the canonical form using the algorithm specified in hash_algorithm
- Set data_hash to the Base64URL-encoded hash value
- Set hash_algorithm to the algorithm identifier
- Set hash_scope to full_payload or merkle_root as appropriate
- Include all data binding fields in the PCT payload before signing
- Sign the complete payload including the data binding fields using the signing mechanism defined in Section 6
The data binding fields are part of the signed payload and are therefore protected by the token signature. Any modification to data_hash, hash_algorithm, or hash_scope after signing will cause signature verification to fail.
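The issuance steps can be sketched as below. The payload bytes are assumed to already be in canonical form (Section 5.8.2), and signing (Section 6) is omitted; the function name and the mapping from spec identifiers such as sha-256 to Python's hashlib names are the author's own.

```python
import base64
import hashlib

def bind_data(claims: dict, canonical_data: bytes,
              algorithm: str = "sha-256") -> dict:
    """Populate the data binding fields of a PCT payload before signing.
    canonical_data must already be the canonical serialised form."""
    # Spec identifier "sha-256" -> hashlib name "sha256".
    digest = hashlib.new(algorithm.replace("-", ""), canonical_data).digest()
    bound = dict(claims)
    bound["data_hash"] = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    bound["hash_algorithm"] = algorithm
    bound["hash_scope"] = "full_payload"
    return bound
```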
5.8.5 Verification of data binding
When verifying a PCT token and its associated data payload, the verifier MUST:
1. Verify the token signature as defined in Section 6
2. Extract data_hash, hash_algorithm, and hash_scope from the verified payload
3. Serialise the received data payload into its canonical form using the same method as the issuer
4. Compute the hash of the canonical form using the algorithm identified in hash_algorithm
5. Compare the computed hash with the value in data_hash
6. If the hashes do not match, the verification MUST fail and the data MUST NOT be processed under the claims in the token
7. If the hashes match, the verifier MAY proceed to evaluate the token's claims
A verification failure at step 6 indicates one of the following conditions:
- The data payload has been modified since the token was issued
- The token has been detached from its original data and presented with a different payload
- The canonical serialisation method used by the verifier differs from that used by the issuer (implementation error)
In all cases, processing MUST be halted and the event MUST be recorded in the audit log as a data integrity failure.
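Steps 3 to 5 of the verification procedure reduce to recomputing and comparing the hash. In the sketch below, signature verification is assumed to have already succeeded; the constant-time comparison via hmac.compare_digest is a conservative choice by the author, not a requirement of this specification.

```python
import base64
import hashlib
import hmac

def verify_binding(claims: dict, canonical_data: bytes) -> bool:
    """Recompute the hash of the canonical payload and compare it with
    data_hash from the already-verified token payload."""
    algorithm = claims["hash_algorithm"].replace("-", "")   # "sha-256" -> "sha256"
    digest = hashlib.new(algorithm, canonical_data).digest()
    computed = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return hmac.compare_digest(computed, claims["data_hash"])
```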
5.8.6 Legitimate data transformation
Some processing operations permitted by a PCT token may materially change the data payload, for example anonymisation, aggregation, pseudonymisation, or format conversion. Such transformations will invalidate the original data binding because the transformed data will produce a different hash.
Where a permitted transformation produces a materially different data payload, the following rules apply:
- A new PCT token MUST be issued for the transformed payload, with a new data_hash computed from the transformed data in its canonical form
- The new token SHOULD reference the original token's pct_id in a derived_from field to maintain the audit chain
- The original token SHOULD be explicitly deprecated by the issuer
- The transformation event MUST be recorded in the audit log, referencing both the original and new token identifiers
Minor transformations that do not change the logical content of the data, such as re-encoding from JSON to CBOR while preserving all field values, require re-issuance only if the canonical serialisation produces a different byte sequence. Implementations SHOULD test canonical equivalence before determining whether re-issuance is required.
5.8.7 Algorithm selection and deprecation
Implementations MUST use one of the permitted hash algorithms listed in the hash_algorithm field definition. The following algorithms are explicitly prohibited:
- md5: Vulnerable to collision attacks. MUST NOT be used.
- sha-1: Vulnerable to collision attacks. MUST NOT be used.
The RECOMMENDED algorithm is sha-256. Implementations requiring additional collision resistance MAY use sha-384 or sha-512.
The list of permitted algorithms will be reviewed with each major version of the PCT specification. Implementations SHOULD be designed to support algorithm agility, meaning the ability to update the hashing algorithm without requiring a full re-implementation of the data binding mechanism.
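One way to achieve algorithm agility is to resolve the spec's algorithm identifiers through a registry, so that adding a newly permitted algorithm is a one-line change. This sketch assumes Python's hashlib; the registry and function names are illustrative.

```python
import hashlib

# Registry mapping PCT algorithm identifiers to hash constructors.
HASH_ALGORITHMS = {
    "sha-256": hashlib.sha256,
    "sha-384": hashlib.sha384,
    "sha-512": hashlib.sha512,
}
PROHIBITED = {"md5", "sha-1"}

def get_hasher(identifier: str):
    """Resolve a hash_algorithm identifier, rejecting prohibited values."""
    if identifier in PROHIBITED:
        raise ValueError(f"{identifier} is explicitly prohibited")
    try:
        return HASH_ALGORITHMS[identifier]
    except KeyError:
        raise ValueError(f"unsupported hash algorithm: {identifier}") from None
```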
5.9 The ai_context object
Required when subject_type is ai_interaction. This object addresses the specific obligations arising when personal or sensitive data is used in connection with an AI model.
| Field | Type | Required | Description |
|---|---|---|---|
model_id | string | REQUIRED | Identifier for the AI model being invoked. |
model_region | string (ISO 3166-1 alpha-2) | REQUIRED | The jurisdiction in which the model will process the data. |
risk_tier | enum | REQUIRED | AI risk classification under the EU AI Act or equivalent. Permitted values: minimal, limited, high, unacceptable. |
prohibited_use_check | boolean | REQUIRED | Attests that the intended use has been checked against the list of prohibited AI applications under applicable law. Must be true to permit use. |
human_oversight_required | boolean | OPTIONAL | Indicates whether human review of the AI output is required before any decision is actioned. |
training_data_flag | boolean | OPTIONAL | Set to true if the data may be used to train, fine-tune, or evaluate the model. |
output_retention_permitted | boolean | OPTIONAL | Whether AI-generated outputs derived from this data may be retained. |
conformity_assessment_ref | string (URI) | OPTIONAL | For high-risk AI systems, reference to the conformity assessment documentation. |