5. Claims Schema
5.1 Structure overview
A PCT consists of three components, following the JWT convention:
- Header. Metadata about the token itself: the signing algorithm, key identifier, and PCT specification version.
- Payload. The structured claims object encoding the data obligations.
- Signature. The cryptographic signature over the header and payload.
When serialised for transmission, the PCT MUST be encoded as a string of the form header.payload.signature, where each segment is Base64URL-encoded, consistent with RFC 7519 compact serialisation. Human-readable JSON representations MAY be used for documentation, debugging, and audit log storage.
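The compact serialisation above can be sketched in a few lines. This is an illustrative construction only, assuming the HS256 option from Section 5.2; the key identifier, secret, and payload values are hypothetical, and a real issuer would follow the full signing procedure in Section 6.

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    """Base64URL-encode without padding, as used in compact serialisation."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

# Hypothetical header and truncated payload, for illustration only.
header = {"alg": "HS256", "kid": "example-key-1", "typ": "PCT", "pct_version": "0.1"}
payload = {"pct_id": "3f1c9c8e-1111-4222-8333-444455556666",
           "issuer": "https://issuer.example"}

header_b64 = b64url(json.dumps(header, separators=(",", ":")).encode())
payload_b64 = b64url(json.dumps(payload, separators=(",", ":")).encode())
signing_input = header_b64 + "." + payload_b64

secret = b"shared-secret-for-illustration"   # hypothetical HS256 key
signature = b64url(hmac.new(secret, signing_input.encode(), hashlib.sha256).digest())

token = signing_input + "." + signature       # header.payload.signature
```

Padding characters are stripped, so the resulting token contains only URL-safe characters and exactly two dots.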
5.2 Header fields
| Field | Type | Required | Description |
|---|---|---|---|
alg | string | REQUIRED | Signing algorithm. Must be RS256 (RSA + SHA-256) or HS256 (HMAC + SHA-256). RS256 is recommended for multi-party deployments. |
kid | string | REQUIRED | Key identifier. A reference to the signing key used, enabling key rotation without token invalidation. |
typ | string | REQUIRED | Token type. Must be the literal string PCT. |
pct_version | string | REQUIRED | The version of this specification the token conforms to. For this version, must be 0.1. |
5.3 Core payload fields
| Field | Type | Required | Description |
|---|---|---|---|
pct_id | string (UUID v4) | REQUIRED | Globally unique identifier for this PCT instance. Must be a UUID v4. |
issued_at | integer (Unix epoch) | REQUIRED | Timestamp at which the PCT was issued, in seconds since Unix epoch (UTC). |
valid_from | integer (Unix epoch) | REQUIRED | Timestamp from which the PCT is valid. May equal issued_at. |
expires_at | integer (Unix epoch) | REQUIRED | Timestamp after which the PCT is no longer valid. Verifiers must reject expired PCTs. |
issuer | string (URI) | REQUIRED | URI identifying the issuing entity. Should be a stable, resolvable identifier. |
subject_id | string | REQUIRED | Identifier for the dataset, data flow, or processing subject this PCT is attached to. |
subject_type | enum | REQUIRED | Category of subject. Permitted values: dataset, data_flow, api_request, ai_interaction, transfer. |
data_origin | string (ISO 3166-1 alpha-2) | REQUIRED | Two-letter country code of the jurisdiction where the data was originally collected. |
data_categories | array of enum | REQUIRED | The categories of data present. Permitted values include: personal, sensitive, special_category, health, financial, biometric, genetic, criminal, communications, children, pseudonymised, anonymised. |
lawful_basis | object | REQUIRED | The legal ground(s) for processing. See Section 5.4. |
allowed_purposes | array of string | REQUIRED | The purposes for which the data may be used. Values should be drawn from a controlled vocabulary (see Appendix B) or expressed as URIs. |
consent_status | boolean | CONDITIONAL | Required when lawful_basis includes consent. True indicates valid, informed, current consent exists. |
consent_scope | array of string | CONDITIONAL | Required when consent_status is true. The specific purposes covered by the consent, consistent with allowed_purposes. |
consent_record_ref | string (URI) | OPTIONAL | Reference to an external consent record, enabling verification against the system of record. |
jurisdiction_rules | object | REQUIRED | Constraints on where the data may be processed. See Section 5.5. |
transfer_restrictions | object | CONDITIONAL | Required when subject_type is transfer or when cross-border processing is anticipated. See Section 5.6. |
retention_limit | string (ISO 8601 duration) | OPTIONAL | The maximum period for which the data may be retained (e.g. P2Y for two years). |
automated_decision_flag | boolean | OPTIONAL | Set to true if the data may be used in automated decision-making subject to Article 22 GDPR or equivalent. |
data_hash | string | REQUIRED | Cryptographic hash of the canonical serialised form of the data payload at the time of token issuance. See Section 5.8 for canonicalisation requirements. |
hash_algorithm | enum | REQUIRED | Hashing algorithm used to produce data_hash. Permitted values: sha-256, sha-384, sha-512. sha-256 is RECOMMENDED. MD5 and SHA-1 are explicitly prohibited. |
hash_scope | enum | REQUIRED | Defines what was hashed. Permitted values: full_payload (entire data payload hashed as a single unit), merkle_root (Merkle tree hash structure; see Section 5.8.3). |
data_format | string | OPTIONAL | MIME type or format descriptor of the data payload at the time of hashing (e.g. application/json, text/csv, application/octet-stream). Assists verifiers in reproducing the canonical form. |
ai_context | object | CONDITIONAL | Required when subject_type is ai_interaction. See Section 5.9. |
extensions | object | OPTIONAL | Extension namespace claims. See Section 5.7. |
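A minimal payload exercising the REQUIRED core fields might look as follows. Every value here is a placeholder rather than a normative example; the consent fields are included only because the illustrative lawful_basis includes consent, per the CONDITIONAL rules above.

```python
import time
import uuid

now = int(time.time())
# Illustrative core payload; all values are placeholders.
payload = {
    "pct_id": str(uuid.uuid4()),
    "issued_at": now,
    "valid_from": now,
    "expires_at": now + 86400,            # 24-hour validity window
    "issuer": "https://issuer.example/pct",
    "subject_id": "dataset-0042",
    "subject_type": "dataset",
    "data_origin": "GB",
    "data_categories": ["personal"],
    "lawful_basis": {"bases": ["consent"]},
    "allowed_purposes": ["service_provision"],
    "consent_status": True,               # CONDITIONAL: bases includes consent
    "consent_scope": ["service_provision"],
    "jurisdiction_rules": {"permitted_regions": ["GB", "IE"]},
    "data_hash": "placeholder",           # set per Section 5.8.4
    "hash_algorithm": "sha-256",
    "hash_scope": "full_payload",
}

REQUIRED = {"pct_id", "issued_at", "valid_from", "expires_at", "issuer",
            "subject_id", "subject_type", "data_origin", "data_categories",
            "lawful_basis", "allowed_purposes", "jurisdiction_rules",
            "data_hash", "hash_algorithm", "hash_scope"}
assert REQUIRED <= payload.keys()
```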
5.4 The lawful_basis object
The lawful_basis object must contain at least one basis. Where multiple bases apply, all must be listed.
| Field | Type | Required | Description |
|---|---|---|---|
bases | array of enum | REQUIRED | The applicable lawful basis or bases. Permitted values: consent, contract, legal_obligation, vital_interests, public_task, legitimate_interests, not_applicable (for anonymised data). |
legitimate_interests_assessment_ref | string (URI) | CONDITIONAL | Required when bases includes legitimate_interests. Reference to the Legitimate Interests Assessment (LIA) on record. |
legal_obligation_ref | string (URI) | CONDITIONAL | Required when bases includes legal_obligation. Reference to the specific legal instrument creating the obligation. |
framework | string | OPTIONAL | The regulatory framework under which the lawful basis is assessed (e.g. GDPR, UK_GDPR, HIPAA). Where omitted, GDPR is assumed. |
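The CONDITIONAL rules in this table lend themselves to a simple consistency check. The sketch below is one possible validator, not a normative algorithm; the function name and error strings are the author's own.

```python
def validate_lawful_basis(lb: dict) -> list[str]:
    """Return a list of problems; an empty list means the object is consistent."""
    problems = []
    bases = lb.get("bases") or []
    if not bases:
        problems.append("bases must contain at least one value")
    if "legitimate_interests" in bases and "legitimate_interests_assessment_ref" not in lb:
        problems.append("legitimate_interests requires legitimate_interests_assessment_ref")
    if "legal_obligation" in bases and "legal_obligation_ref" not in lb:
        problems.append("legal_obligation requires legal_obligation_ref")
    return problems
```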
5.5 The jurisdiction_rules object
| Field | Type | Required | Description |
|---|---|---|---|
permitted_regions | array of string (ISO 3166-1 alpha-2) | REQUIRED | Country codes in which processing is permitted. Use * to indicate no restriction, though this is discouraged for sensitive data. |
restricted_regions | array of string (ISO 3166-1 alpha-2) | OPTIONAL | Country codes in which processing is explicitly prohibited, overriding any general permission. |
residency_required | boolean | OPTIONAL | If true, data must remain within permitted_regions at all times and may not be temporarily processed elsewhere. |
sovereignty_framework | string | OPTIONAL | Reference to a sovereignty or adequacy framework under which processing is permitted (e.g. GDPR_adequacy, UK_adequacy, APEC_CBPR). |
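The interaction between permitted_regions (including the * wildcard) and the overriding restricted_regions can be captured in a small evaluation function. This is a sketch of the rules as stated in the table above; the function name is illustrative.

```python
def processing_permitted(rules: dict, region: str) -> bool:
    """Evaluate jurisdiction_rules for a candidate processing region."""
    # restricted_regions overrides any general permission, including "*".
    if region in rules.get("restricted_regions", []):
        return False
    permitted = rules.get("permitted_regions", [])
    return "*" in permitted or region in permitted

rules = {"permitted_regions": ["*"], "restricted_regions": ["RU"]}
assert processing_permitted(rules, "DE") is True
assert processing_permitted(rules, "RU") is False
```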
5.6 The transfer_restrictions object
| Field | Type | Required | Description |
|---|---|---|---|
permitted_destinations | array of string (ISO 3166-1 alpha-2) | REQUIRED | Country codes to which transfer is permitted. |
transfer_mechanism | enum | REQUIRED | The legal mechanism authorising the transfer. Permitted values: adequacy_decision, standard_contractual_clauses, binding_corporate_rules, derogation, not_required, other. |
transfer_mechanism_ref | string (URI) | OPTIONAL | Reference to the specific instrument (e.g. the executed SCCs) authorising the transfer. |
onward_transfer_permitted | boolean | OPTIONAL | Whether the recipient may further transfer the data to a third party. |
5.7 Extension namespaces
Extension claims are added to the extensions object using the prefix convention x-{framework}:{field}. Extension keys must not conflict with core schema field names. The following extension namespaces are defined in this version:
- x-hipaa: Claims addressing HIPAA-specific obligations (minimum_necessary, phi_flag, permitted_disclosure, baa_in_place).
- x-dora: Claims addressing DORA-specific obligations (ict_risk_classification, third_party_flag, incident_trigger).
- x-duaa: Claims addressing UK Data Use and Access Act obligations (access_condition, trusted_research_env).
- x-pecr: Claims addressing PECR / ePrivacy obligations (tracking_consent, comms_data_flag, marketing_permission).
- x-ai-act: Claims addressing EU AI Act obligations (risk_tier, human_oversight_required, prohibited_use_check, training_data_flag, conformity_assessment_ref).
Any implementer may define additional extension namespaces using the x-{label}: prefix. Extension namespaces not defined in this specification must be documented publicly by the defining party. Verifiers encountering unknown extension namespaces must not fail silently — they must either evaluate the extension claim or flag the PCT as requiring human review.
5.8 Data binding and integrity verification
5.8.1 Purpose
The data binding mechanism ensures that a PCT token is cryptographically bound to the specific data payload it was issued to govern. A verifier receiving a PCT token and a data payload can confirm:
- That the data has not been modified since the token was issued
- That the token has not been detached from its original data and reattached to a different payload
- That the token's claims apply to the data presented, and no other data
5.8.2 Canonicalisation requirement
To ensure consistent and reproducible hash values across different systems and implementations, the data payload MUST be serialised into a canonical form before hashing. The canonical form is defined as follows:
- For JSON payloads: RFC 8785 JSON Canonicalisation Scheme (JCS). All keys MUST be sorted lexicographically. All whitespace outside string values MUST be removed. Unicode characters MUST be encoded consistently per RFC 8785.
- For binary payloads: The raw byte sequence as transmitted, with no transformation applied.
- For structured data in other formats (CSV, XML, etc.): The implementation MUST document the specific canonicalisation method applied in the data_format field and MUST apply it consistently across all issuance and verification operations.
Failure to use a canonical form risks hash verification failures caused by insignificant formatting differences rather than genuine data modification. This would undermine the utility of the binding mechanism and MUST be avoided.
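For simple JSON payloads, the canonical form can be approximated with sorted keys and stripped whitespace, as sketched below. This is an approximation of RFC 8785 for illustration; a conformant implementation should use a full JCS library, since JCS also constrains number formatting and Unicode escaping in ways json.dumps does not.

```python
import hashlib
import json

def canonical_json(obj) -> bytes:
    """Rough JCS approximation: lexicographically sorted keys, no
    insignificant whitespace, UTF-8 encoding. Does not cover JCS's
    number-formatting rules; illustrative only."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")

# Same logical content, different key order and formatting,
# must yield the same hash after canonicalisation.
a = {"name": "Ada", "id": 7}
b = {"id": 7, "name": "Ada"}
assert hashlib.sha256(canonical_json(a)).hexdigest() == \
       hashlib.sha256(canonical_json(b)).hexdigest()
```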
5.8.3 Large dataset handling — Merkle tree hashing
For large datasets where computing a hash of the entire payload at every verification event is computationally impractical, implementations MAY use a Merkle tree hash structure. In this case:
- The data payload is divided into chunks of a consistent, implementation-defined size
- Each chunk is hashed individually using the algorithm specified in hash_algorithm
- The hashes are combined into a Merkle tree and the root hash is stored in data_hash
- The field hash_scope MUST be set to merkle_root
- The chunk size and tree construction method MUST be documented in the implementation's conformance statement
Merkle tree hashing allows individual chunks of a large dataset to be verified independently without requiring the entire dataset to be re-hashed, which is particularly valuable in streaming and pipeline processing scenarios.
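A toy construction of the Merkle root described above might look as follows. The chunk size and the odd-node rule (duplicating the last node) are arbitrary choices here; as the list above notes, both are implementation-defined and must be stated in the conformance statement.

```python
import hashlib

def merkle_root(payload: bytes, chunk_size: int = 4) -> str:
    """Toy Merkle root: fixed chunk size, last node duplicated on odd
    levels. Both choices are implementation-defined in the PCT spec."""
    chunks = [payload[i:i + chunk_size]
              for i in range(0, len(payload), chunk_size)] or [b""]
    level = [hashlib.sha256(c).digest() for c in chunks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])            # duplicate last node
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()
```

A single chunk degenerates to a plain hash of that chunk, which makes small payloads cheap while preserving the same verification interface.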
5.8.4 Token issuance with data binding
When issuing a PCT token with data binding, the issuer MUST:
- Serialise the data payload into its canonical form as defined in Section 5.8.2
- Compute the hash of the canonical form using the algorithm specified in hash_algorithm
- Set data_hash to the Base64URL-encoded hash value
- Set hash_algorithm to the algorithm identifier
- Set hash_scope to full_payload or merkle_root as appropriate
- Include all data binding fields in the PCT payload before signing
- Sign the complete payload including the data binding fields using the signing mechanism defined in Section 6
The data binding fields are part of the signed payload and are therefore protected by the token signature. Any modification to data_hash, hash_algorithm, or hash_scope after signing will cause signature verification to fail.
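The issuance steps can be sketched as below. The payload bytes are assumed to already be in canonical form (Section 5.8.2), and signing (Section 6) is omitted; the function name and the mapping from spec identifiers such as sha-256 to Python's hashlib names are the author's own.

```python
import base64
import hashlib

def bind_data(claims: dict, canonical_data: bytes,
              algorithm: str = "sha-256") -> dict:
    """Populate the data binding fields of a PCT payload before signing.
    canonical_data must already be the canonical serialised form."""
    # Spec identifier "sha-256" -> hashlib name "sha256".
    digest = hashlib.new(algorithm.replace("-", ""), canonical_data).digest()
    bound = dict(claims)
    bound["data_hash"] = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    bound["hash_algorithm"] = algorithm
    bound["hash_scope"] = "full_payload"
    return bound
```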
5.8.5 Verification of data binding
When verifying a PCT token and its associated data payload, the verifier MUST:
1. Verify the token signature as defined in Section 6
2. Extract data_hash, hash_algorithm, and hash_scope from the verified payload
3. Serialise the received data payload into its canonical form using the same method as the issuer
4. Compute the hash of the canonical form using the algorithm identified in hash_algorithm
5. Compare the computed hash with the value in data_hash
6. If the hashes do not match, the verification MUST fail and the data MUST NOT be processed under the claims in the token
7. If the hashes match, the verifier MAY proceed to evaluate the token's claims
A verification failure at step 6 indicates one of the following conditions:
- The data payload has been modified since the token was issued
- The token has been detached from its original data and presented with a different payload
- The canonical serialisation method used by the verifier differs from that used by the issuer (implementation error)
In all cases, processing MUST be halted and the event MUST be recorded in the audit log as a data integrity failure.
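Steps 3 to 5 of the verification procedure reduce to recomputing and comparing the hash. In the sketch below, signature verification is assumed to have already succeeded; the constant-time comparison via hmac.compare_digest is a conservative choice by the author, not a requirement of this specification.

```python
import base64
import hashlib
import hmac

def verify_binding(claims: dict, canonical_data: bytes) -> bool:
    """Recompute the hash of the canonical payload and compare it with
    data_hash from the already-verified token payload."""
    algorithm = claims["hash_algorithm"].replace("-", "")   # "sha-256" -> "sha256"
    digest = hashlib.new(algorithm, canonical_data).digest()
    computed = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return hmac.compare_digest(computed, claims["data_hash"])
```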
5.8.6 Legitimate data transformation
Some processing operations permitted by a PCT token may materially change the data payload, for example anonymisation, aggregation, pseudonymisation, or format conversion. Such transformations will invalidate the original data binding because the transformed data will produce a different hash.
Where a permitted transformation produces a materially different data payload, the following rules apply:
- A new PCT token MUST be issued for the transformed payload, with a new data_hash computed from the transformed data in its canonical form
- The new token SHOULD reference the original token's pct_id in a derived_from field to maintain the audit chain
- The original token SHOULD be explicitly deprecated by the issuer
- The transformation event MUST be recorded in the audit log, referencing both the original and new token identifiers
Minor transformations that do not change the logical content of the data, such as re-encoding from JSON to CBOR while preserving all field values, require re-issuance only if the canonical serialisation produces a different byte sequence. Implementations SHOULD test canonical equivalence before determining whether re-issuance is required.
5.8.7 Algorithm selection and deprecation
Implementations MUST use one of the permitted hash algorithms listed in the hash_algorithm field definition. The following algorithms are explicitly prohibited:
- md5: Vulnerable to collision attacks. MUST NOT be used.
- sha-1: Vulnerable to collision attacks. MUST NOT be used.
The RECOMMENDED algorithm is sha-256. Implementations requiring additional collision resistance MAY use sha-384 or sha-512.
The list of permitted algorithms will be reviewed with each major version of the PCT specification. Implementations SHOULD be designed to support algorithm agility, meaning the ability to update the hashing algorithm without requiring a full re-implementation of the data binding mechanism.
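One way to achieve algorithm agility is to resolve the spec's algorithm identifiers through a registry, so that adding a newly permitted algorithm is a one-line change. This sketch assumes Python's hashlib; the registry and function names are illustrative.

```python
import hashlib

# Registry mapping PCT algorithm identifiers to hash constructors.
HASH_ALGORITHMS = {
    "sha-256": hashlib.sha256,
    "sha-384": hashlib.sha384,
    "sha-512": hashlib.sha512,
}
PROHIBITED = {"md5", "sha-1"}

def get_hasher(identifier: str):
    """Resolve a hash_algorithm identifier, rejecting prohibited values."""
    if identifier in PROHIBITED:
        raise ValueError(f"{identifier} is explicitly prohibited")
    try:
        return HASH_ALGORITHMS[identifier]
    except KeyError:
        raise ValueError(f"unsupported hash algorithm: {identifier}") from None
```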
5.9 The ai_context object
Required when subject_type is ai_interaction. This object addresses the specific obligations arising when personal or sensitive data is used in connection with an AI model.
| Field | Type | Required | Description |
|---|---|---|---|
model_id | string | REQUIRED | Identifier for the AI model being invoked. |
model_region | string (ISO 3166-1 alpha-2) | REQUIRED | The jurisdiction in which the model will process the data. |
risk_tier | enum | REQUIRED | AI risk classification under the EU AI Act or equivalent. Permitted values: minimal, limited, high, unacceptable. |
prohibited_use_check | boolean | REQUIRED | Attests that the intended use has been checked against the list of prohibited AI applications under applicable law. Must be true to permit use. |
human_oversight_required | boolean | OPTIONAL | Indicates whether human review of the AI output is required before any decision is actioned. |
training_data_flag | boolean | OPTIONAL | Set to true if the data may be used to train, fine-tune, or evaluate the model. |
output_retention_permitted | boolean | OPTIONAL | Whether AI-generated outputs derived from this data may be retained. |
conformity_assessment_ref | string (URI) | OPTIONAL | For high-risk AI systems, reference to the conformity assessment documentation. |