Version: 0.1 (Current)

5. Claims Schema

5.1 Structure overview

A PCT consists of three components, following the JWT convention:

  • Header. Metadata about the token itself: the signing algorithm, key identifier, and PCT specification version.
  • Payload. The structured claims object encoding the data obligations.
  • Signature. The cryptographic signature over the header and payload.

When serialised for transmission, the PCT must be encoded in compact form: three Base64URL-encoded segments (header, payload, and signature) joined by periods, giving a string of the form header.payload.signature, consistent with RFC 7519 compact serialisation. Human-readable JSON representations may be used for documentation, debugging, and audit log storage.
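The compact form above can be unpacked mechanically. The sketch below, a minimal illustration rather than a reference implementation, splits a compact-serialised PCT into its three components and restores the Base64URL padding that compact serialisation omits:

```python
import base64
import json

def b64url_decode(segment: str) -> bytes:
    # Compact serialisation strips "=" padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def split_compact_pct(token: str) -> tuple:
    """Split a compact-serialised PCT into (header, payload, raw signature)."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    payload = json.loads(b64url_decode(payload_b64))
    return header, payload, b64url_decode(signature_b64)
```

Signature verification over the decoded parts is covered separately in Section 6.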

5.2 Header fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| alg | string | REQUIRED | Signing algorithm. Must be RS256 (RSA + SHA-256) or HS256 (HMAC + SHA-256). RS256 is recommended for multi-party deployments. |
| kid | string | REQUIRED | Key identifier. A reference to the signing key used, enabling key rotation without token invalidation. |
| typ | string | REQUIRED | Token type. Must be the literal string PCT. |
| pct_version | string | REQUIRED | The version of this specification the token conforms to. For this version, must be 0.1. |
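For illustration, a header conforming to the fields above might look as follows. All values here are examples only; the key identifier "key-2025-01" is hypothetical:

```python
# Illustrative PCT header (Section 5.2). Values are examples, not normative.
pct_header = {
    "alg": "RS256",        # RS256 recommended for multi-party deployments
    "kid": "key-2025-01",  # hypothetical key identifier; enables key rotation
    "typ": "PCT",          # must be the literal string PCT
    "pct_version": "0.1",  # version of this specification
}
```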

5.3 Core payload fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| pct_id | string (UUID v4) | REQUIRED | Globally unique identifier for this PCT instance. Must be a UUID v4. |
| issued_at | integer (Unix epoch) | REQUIRED | Timestamp at which the PCT was issued, in seconds since Unix epoch (UTC). |
| valid_from | integer (Unix epoch) | REQUIRED | Timestamp from which the PCT is valid. May equal issued_at. |
| expires_at | integer (Unix epoch) | REQUIRED | Timestamp after which the PCT is no longer valid. Verifiers must reject expired PCTs. |
| issuer | string (URI) | REQUIRED | URI identifying the issuing entity. Should be a stable, resolvable identifier. |
| subject_id | string | REQUIRED | Identifier for the dataset, data flow, or processing subject this PCT is attached to. |
| subject_type | enum | REQUIRED | Category of subject. Permitted values: dataset, data_flow, api_request, ai_interaction, transfer. |
| data_origin | string (ISO 3166-1 alpha-2) | REQUIRED | Two-letter country code of the jurisdiction where the data was originally collected. |
| data_categories | array of enum | REQUIRED | The categories of data present. Permitted values include: personal, sensitive, special_category, health, financial, biometric, genetic, criminal, communications, children, pseudonymised, anonymised. |
| lawful_basis | object | REQUIRED | The legal ground(s) for processing. See Section 5.4. |
| allowed_purposes | array of string | REQUIRED | The purposes for which the data may be used. Values should be drawn from a controlled vocabulary (see Appendix B) or expressed as URIs. |
| consent_status | boolean | CONDITIONAL | Required when lawful_basis includes consent. True indicates valid, informed, current consent exists. |
| consent_scope | array of string | CONDITIONAL | Required when consent_status is true. The specific purposes covered by the consent, consistent with allowed_purposes. |
| consent_record_ref | string (URI) | OPTIONAL | Reference to an external consent record, enabling verification against the system of record. |
| jurisdiction_rules | object | REQUIRED | Constraints on where the data may be processed. See Section 5.5. |
| transfer_restrictions | object | CONDITIONAL | Required when subject_type is transfer or when cross-border processing is anticipated. See Section 5.6. |
| retention_limit | string (ISO 8601 duration) | OPTIONAL | The maximum period for which the data may be retained (e.g. P2Y for two years). |
| automated_decision_flag | boolean | OPTIONAL | Set to true if the data may be used in automated decision-making subject to Article 22 GDPR or equivalent. |
| data_hash | string | REQUIRED | Cryptographic hash of the canonical serialised form of the data payload at the time of token issuance. See Section 5.8 for canonicalisation requirements. |
| hash_algorithm | enum | REQUIRED | Hashing algorithm used to produce data_hash. Permitted values: sha-256, sha-384, sha-512. sha-256 is RECOMMENDED. MD5 and SHA-1 are explicitly prohibited. |
| hash_scope | enum | REQUIRED | Defines what was hashed. Permitted values: full_payload (entire data payload hashed as a single unit), merkle_root (Merkle tree hash structure; see Section 5.8.3). |
| data_format | string | OPTIONAL | MIME type or format descriptor of the data payload at the time of hashing (e.g. application/json, text/csv, application/octet-stream). Assists verifiers in reproducing the canonical form. |
| ai_context | object | CONDITIONAL | Required when subject_type is ai_interaction. See Section 5.9. |
| extensions | object | OPTIONAL | Extension namespace claims. See Section 5.7. |
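The REQUIRED and CONDITIONAL rules in the table above lend themselves to a simple structural check. The following sketch is one possible pre-validation pass, not a normative conformance test; it covers field presence only, not value formats:

```python
# REQUIRED core payload fields from Section 5.3.
REQUIRED_CORE_FIELDS = {
    "pct_id", "issued_at", "valid_from", "expires_at", "issuer",
    "subject_id", "subject_type", "data_origin", "data_categories",
    "lawful_basis", "allowed_purposes", "jurisdiction_rules",
    "data_hash", "hash_algorithm", "hash_scope",
}

def missing_core_fields(payload: dict) -> set:
    """Return the REQUIRED core fields absent from a payload."""
    return REQUIRED_CORE_FIELDS - payload.keys()

def conditional_field_errors(payload: dict) -> list:
    """Check the CONDITIONAL presence rules from Section 5.3."""
    errors = []
    bases = payload.get("lawful_basis", {}).get("bases", [])
    if "consent" in bases and "consent_status" not in payload:
        errors.append("consent_status required when lawful_basis includes consent")
    if payload.get("consent_status") is True and "consent_scope" not in payload:
        errors.append("consent_scope required when consent_status is true")
    if payload.get("subject_type") == "ai_interaction" and "ai_context" not in payload:
        errors.append("ai_context required when subject_type is ai_interaction")
    if payload.get("subject_type") == "transfer" and "transfer_restrictions" not in payload:
        errors.append("transfer_restrictions required when subject_type is transfer")
    return errors
```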

5.4 The lawful_basis object

The lawful_basis object must contain at least one basis. Where multiple bases apply, all must be listed.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| bases | array of enum | REQUIRED | The applicable lawful basis or bases. Permitted values: consent, contract, legal_obligation, vital_interests, public_task, legitimate_interests, not_applicable (for anonymised data). |
| legitimate_interests_assessment_ref | string (URI) | CONDITIONAL | Required when bases includes legitimate_interests. Reference to the Legitimate Interests Assessment (LIA) on record. |
| legal_obligation_ref | string (URI) | CONDITIONAL | Required when bases includes legal_obligation. Reference to the specific legal instrument creating the obligation. |
| framework | string | OPTIONAL | The regulatory framework under which the lawful basis is assessed (e.g. GDPR, UK_GDPR, HIPAA). Where omitted, GDPR is assumed. |

5.5 The jurisdiction_rules object

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| permitted_regions | array of string (ISO 3166-1 alpha-2) | REQUIRED | Country codes in which processing is permitted. Use * to indicate no restriction, though this is discouraged for sensitive data. |
| restricted_regions | array of string (ISO 3166-1 alpha-2) | OPTIONAL | Country codes in which processing is explicitly prohibited, overriding any general permission. |
| residency_required | boolean | OPTIONAL | If true, data must remain within permitted_regions at all times and may not be temporarily processed elsewhere. |
| sovereignty_framework | string | OPTIONAL | Reference to a sovereignty or adequacy framework under which processing is permitted (e.g. GDPR_adequacy, UK_adequacy, APEC_CBPR). |

5.6 The transfer_restrictions object

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| permitted_destinations | array of string (ISO 3166-1 alpha-2) | REQUIRED | Country codes to which transfer is permitted. |
| transfer_mechanism | enum | REQUIRED | The legal mechanism authorising the transfer. Permitted values: adequacy_decision, standard_contractual_clauses, binding_corporate_rules, derogation, not_required, other. |
| transfer_mechanism_ref | string (URI) | OPTIONAL | Reference to the specific instrument (e.g. the executed SCCs) authorising the transfer. |
| onward_transfer_permitted | boolean | OPTIONAL | Whether the recipient may further transfer the data to a third party. |

5.7 Extension namespaces

Extension claims are added to the extensions object using the prefix convention x-{framework}:{field}. Extension keys must not conflict with core schema field names. The following extension namespaces are defined in this version:

  • x-hipaa: Claims addressing HIPAA-specific obligations (minimum_necessary, phi_flag, permitted_disclosure, baa_in_place).
  • x-dora: Claims addressing DORA-specific obligations (ict_risk_classification, third_party_flag, incident_trigger).
  • x-duaa: Claims addressing UK Data Use and Access Act obligations (access_condition, trusted_research_env).
  • x-pecr: Claims addressing PECR / ePrivacy obligations (tracking_consent, comms_data_flag, marketing_permission).
  • x-ai-act: Claims addressing EU AI Act obligations (risk_tier, human_oversight_required, prohibited_use_check, training_data_flag, conformity_assessment_ref).

Any implementer may define additional extension namespaces using the x-{label}: prefix. Extension namespaces not defined in this specification must be documented publicly by the defining party. Verifiers encountering unknown extension namespaces must not fail silently — they must either evaluate the extension claim or flag the PCT as requiring human review.
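One way to meet the no-silent-failure rule is to partition extension claims into those a verifier can evaluate and namespaces it must escalate. This is a sketch of that triage, not a mandated verifier behaviour:

```python
# Extension namespaces defined in this version of the specification.
KNOWN_NAMESPACES = {"x-hipaa", "x-dora", "x-duaa", "x-pecr", "x-ai-act"}

def triage_extensions(extensions: dict) -> tuple:
    """Split extension claims into (evaluable claims, namespaces needing
    human review), per the Section 5.7 rule that unknown namespaces must
    not be silently ignored."""
    evaluable, needs_review = {}, set()
    for key, value in extensions.items():
        namespace = key.split(":", 1)[0]  # keys follow x-{framework}:{field}
        if namespace in KNOWN_NAMESPACES:
            evaluable[key] = value
        else:
            needs_review.add(namespace)
    return evaluable, sorted(needs_review)
```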

5.8 Data binding and integrity verification

5.8.1 Purpose

The data binding mechanism ensures that a PCT token is cryptographically bound to the specific data payload it was issued to govern. A verifier receiving a PCT token and a data payload can confirm:

  1. That the data has not been modified since the token was issued
  2. That the token has not been detached from its original data and reattached to a different payload
  3. That the token's claims apply to the data presented, and no other data

5.8.2 Canonicalisation requirement

To ensure consistent and reproducible hash values across different systems and implementations, the data payload MUST be serialised into a canonical form before hashing. The canonical form is defined as follows:

  • For JSON payloads: RFC 8785 JSON Canonicalisation Scheme (JCS). All keys MUST be sorted lexicographically. All whitespace outside string values MUST be removed. Unicode characters MUST be encoded consistently per RFC 8785.
  • For binary payloads: The raw byte sequence as transmitted, with no transformation applied.
  • For structured data in other formats (CSV, XML, etc.): The implementation MUST document the specific canonicalisation method applied in the data_format field and MUST apply it consistently across all issuance and verification operations.

Failure to use a canonical form risks hash verification failures caused by insignificant formatting differences rather than genuine data modification. This would undermine the utility of the binding mechanism and MUST be avoided.
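For JSON payloads restricted to objects, arrays, strings, integers, booleans, and null, the canonical form can be approximated with the standard library, as sketched below. Full RFC 8785 conformance additionally mandates specific number formatting and UTF-16-based key ordering, so a conformant implementation should use a dedicated JCS library; this is an illustration of the principle only:

```python
import base64
import hashlib
import json

def canonical_json(payload) -> bytes:
    # Approximation of RFC 8785 JCS: keys sorted lexicographically, no
    # insignificant whitespace, UTF-8 output. Not fully conformant for
    # floats or keys outside the Basic Multilingual Plane.
    return json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")

def compute_data_hash(payload) -> str:
    """Base64url-encoded SHA-256 of the canonical form, as used for the
    data_hash claim (see Section 5.8.4)."""
    digest = hashlib.sha256(canonical_json(payload)).digest()
    return base64.urlsafe_b64encode(digest).decode().rstrip("=")
```

Because the canonical form is deterministic, two systems that receive the same logical payload with different key ordering or whitespace produce the same hash.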

5.8.3 Large dataset handling — Merkle tree hashing

For large datasets where computing a hash of the entire payload at every verification event is computationally impractical, implementations MAY use a Merkle tree hash structure. In this case:

  • The data payload is divided into chunks of a consistent, implementation-defined size
  • Each chunk is hashed individually using the algorithm specified in hash_algorithm
  • The hashes are combined into a Merkle tree and the root hash is stored in data_hash
  • The field hash_scope MUST be set to merkle_root
  • The chunk size and tree construction method MUST be documented in the implementation's conformance statement

Merkle tree hashing allows individual chunks of a large dataset to be verified independently without requiring the entire dataset to be re-hashed, which is particularly valuable in streaming and pipeline processing scenarios.
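A minimal Merkle root construction over fixed-size chunks might look as follows. The chunk size and the pairing rule (here, duplicating the last node at odd levels) are implementation-defined choices that must be documented in the conformance statement; this sketch is one option, not a normative construction:

```python
import hashlib

def merkle_root(data: bytes, chunk_size: int = 1024) -> bytes:
    """Compute a Merkle root over fixed-size chunks (Section 5.8.3)."""
    # Leaf level: hash each chunk individually.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    level = [hashlib.sha256(c).digest() for c in chunks]
    # Combine pairwise until a single root remains.
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the odd node out
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

A single-chunk payload degenerates to a plain hash of that chunk, so small payloads verify with no extra cost.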

5.8.4 Token issuance with data binding

When issuing a PCT token with data binding, the issuer MUST:

  1. Serialise the data payload into its canonical form as defined in Section 5.8.2
  2. Compute the hash of the canonical form using the algorithm specified in hash_algorithm
  3. Set data_hash to the Base64url-encoded hash value
  4. Set hash_algorithm to the algorithm identifier
  5. Set hash_scope to full_payload or merkle_root as appropriate
  6. Include all data binding fields in the PCT payload before signing
  7. Sign the complete payload including the data binding fields using the signing mechanism defined in Section 6

The data binding fields are part of the signed payload and are therefore protected by the token signature. Any modification to data_hash, hash_algorithm, or hash_scope after signing will cause signature verification to fail.
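The issuance steps above can be sketched for a JSON payload with full_payload scope. The canonicalisation here is the same standard-library approximation of RFC 8785 noted in Section 5.8.2, and signing (step 7) is deliberately left to Section 6:

```python
import base64
import hashlib
import json

def bind_data(payload_claims: dict, data_obj) -> dict:
    """Add the Section 5.8.4 data binding fields to a PCT payload prior to
    signing. Assumes a JSON data payload and full_payload hash scope."""
    # Steps 1-2: canonicalise, then hash the canonical bytes.
    canonical = json.dumps(
        data_obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    digest = hashlib.sha256(canonical).digest()
    # Steps 3-6: set the binding fields on a copy of the claims.
    bound = dict(payload_claims)
    bound["data_hash"] = base64.urlsafe_b64encode(digest).decode().rstrip("=")
    bound["hash_algorithm"] = "sha-256"
    bound["hash_scope"] = "full_payload"
    return bound  # step 7: sign the complete payload (Section 6)
```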

5.8.5 Verification of data binding

When verifying a PCT token and its associated data payload, the verifier MUST:

  1. Verify the token signature as defined in Section 6
  2. Extract data_hash, hash_algorithm, and hash_scope from the verified payload
  3. Serialise the received data payload into its canonical form using the same method as the issuer
  4. Compute the hash of the canonical form using the algorithm identified in hash_algorithm
  5. Compare the computed hash with the value in data_hash
  6. If the hashes do not match, the verification MUST fail and the data MUST NOT be processed under the claims in the token
  7. If the hashes match, the verifier MAY proceed to evaluate the token's claims

A verification failure at step 6 indicates one of the following conditions:

  • The data payload has been modified since the token was issued
  • The token has been detached from its original data and presented with a different payload
  • The canonical serialisation method used by the verifier differs from that used by the issuer (implementation error)

In all cases, processing MUST be halted and the event MUST be recorded in the audit log as a data integrity failure.
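Steps 2 through 6 of the verification procedure can be sketched as follows, assuming signature verification (step 1) has already succeeded, a JSON payload, and full_payload scope. The constant-time comparison is a defensive choice, not a requirement stated in this section:

```python
import base64
import hashlib
import hmac
import json

def verify_data_binding(verified_payload: dict, data_obj) -> bool:
    """Recompute the hash of the received data and compare it with the
    data_hash claim from an already signature-verified payload."""
    algorithms = {
        "sha-256": hashlib.sha256,
        "sha-384": hashlib.sha384,
        "sha-512": hashlib.sha512,
    }
    algo = algorithms[verified_payload["hash_algorithm"]]
    # Same canonicalisation as at issuance (Section 5.8.2).
    canonical = json.dumps(
        data_obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    computed = base64.urlsafe_b64encode(algo(canonical).digest()).decode().rstrip("=")
    # Constant-time comparison; False means halt processing and log the failure.
    return hmac.compare_digest(computed, verified_payload["data_hash"])
```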

5.8.6 Legitimate data transformation

Some processing operations permitted by a PCT token may materially change the data payload, for example anonymisation, aggregation, pseudonymisation, or format conversion. Such transformations will invalidate the original data binding because the transformed data will produce a different hash.

Where a permitted transformation produces a materially different data payload, the following rules apply:

  1. A new PCT token MUST be issued for the transformed payload, with a new data_hash computed from the transformed data in its canonical form
  2. The new token SHOULD reference the original token's pct_id in a derived_from field to maintain the audit chain
  3. The original token SHOULD be explicitly deprecated by the issuer
  4. The transformation event MUST be recorded in the audit log, referencing both the original and new token identifiers

Minor transformations that do not change the logical content of the data, such as re-encoding from JSON to CBOR while preserving all field values, require re-issuance only if the canonical serialisation produces a different byte sequence. Implementations SHOULD test canonical equivalence before determining whether re-issuance is required.
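The canonical-equivalence test in the last paragraph reduces, for JSON payloads, to comparing canonical byte sequences, as in this sketch (using the same standard-library approximation of RFC 8785 as elsewhere in this section):

```python
import json

def reissuance_required(original_obj, transformed_obj) -> bool:
    """Section 5.8.6: re-issuance is needed only if the canonical
    serialisations of the two payloads differ."""
    def canonical(obj) -> bytes:
        return json.dumps(
            obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
        ).encode("utf-8")
    return canonical(original_obj) != canonical(transformed_obj)
```

A key-reordered copy of the same payload canonicalises to identical bytes and needs no new token, while any change to field values does.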

5.8.7 Algorithm selection and deprecation

Implementations MUST use one of the permitted hash algorithms listed in the hash_algorithm field definition. The following algorithms are explicitly prohibited:

  • md5: Vulnerable to collision attacks. MUST NOT be used.
  • sha-1: Vulnerable to collision attacks. MUST NOT be used.

The RECOMMENDED algorithm is sha-256. Implementations requiring additional collision resistance MAY use sha-384 or sha-512.

The list of permitted algorithms will be reviewed with each major version of the PCT specification. Implementations SHOULD be designed to support algorithm agility, meaning the ability to update the hashing algorithm without requiring a full re-implementation of the data binding mechanism.
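Algorithm agility can be as simple as isolating the algorithm choice behind a registry, so that a future specification version changes one table entry rather than the binding mechanism. A sketch:

```python
import hashlib

# Permitted hash_algorithm identifiers mapped to their constructors.
PERMITTED_HASHES = {
    "sha-256": hashlib.sha256,  # RECOMMENDED
    "sha-384": hashlib.sha384,
    "sha-512": hashlib.sha512,
}
# Explicitly prohibited due to collision attacks (Section 5.8.7).
PROHIBITED_HASHES = {"md5", "sha-1"}

def get_hasher(identifier: str):
    """Resolve a hash_algorithm identifier, rejecting prohibited or
    unknown values."""
    if identifier in PROHIBITED_HASHES:
        raise ValueError(f"{identifier} is explicitly prohibited")
    try:
        return PERMITTED_HASHES[identifier]
    except KeyError:
        raise ValueError(f"unknown hash_algorithm: {identifier}") from None
```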

5.9 The ai_context object

Required when subject_type is ai_interaction. This object addresses the specific obligations arising when personal or sensitive data is used in connection with an AI model.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model_id | string | REQUIRED | Identifier for the AI model being invoked. |
| model_region | string (ISO 3166-1 alpha-2) | REQUIRED | The jurisdiction in which the model will process the data. |
| risk_tier | enum | REQUIRED | AI risk classification under the EU AI Act or equivalent. Permitted values: minimal, limited, high, unacceptable. |
| prohibited_use_check | boolean | REQUIRED | Attests that the intended use has been checked against the list of prohibited AI applications under applicable law. Must be true to permit use. |
| human_oversight_required | boolean | OPTIONAL | Indicates whether human review of the AI output is required before any decision is actioned. |
| training_data_flag | boolean | OPTIONAL | Set to true if the data may be used to train, fine-tune, or evaluate the model. |
| output_retention_permitted | boolean | OPTIONAL | Whether AI-generated outputs derived from this data may be retained. |
| conformity_assessment_ref | string (URI) | OPTIONAL | For high-risk AI systems, reference to the conformity assessment documentation. |
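For illustration, an ai_context object for a high-risk system might look as follows. The model identifier and reference URI are hypothetical examples, not values defined by this specification:

```python
# Illustrative ai_context object (Section 5.9). All values are examples;
# "example-model-v1" and the URI below are hypothetical.
ai_context = {
    "model_id": "example-model-v1",
    "model_region": "IE",
    "risk_tier": "high",
    "prohibited_use_check": True,   # must be true to permit use
    "human_oversight_required": True,
    "training_data_flag": False,    # data may not be used for training
    "output_retention_permitted": False,
    "conformity_assessment_ref": "https://example.org/assessments/example-model-v1",
}
```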