DGOV DTT Architecture Decision Records

Architecture records and current decision state to support infrastructure and platform operations for the Office of Digital Government (DGOV) Digital Transformation and Technology Unit (DTT).

Supporting training material is available at the DGOV Technical - DevSecOps Induction (guided by the WA Cyber Security Policy).

The Architecture Principles guide all decisions documented here.

Structure

This project uses mdBook to generate documentation from markdown files:

  • ADRs are organized in directories by domain (development/, operations/, security/)
  • Navigation is defined in SUMMARY.md
  • Build output goes to book/ directory

Quick Start

just setup    # One-time setup
just serve    # Preview locally (port 8080)

Run just to see all available commands.

View as PDF | Browse online

Development

First-time setup requires installing tools; run just setup for automated installation.

Contributing

New ADRs follow a structured format documenting the context (problem), decision (solution), and consequences (risks).

  1. Create new markdown file in appropriate directory (e.g., security/019-new-topic.md)
  2. Add entry to SUMMARY.md in the correct section
  3. Follow the ADR template structure

See the Contributing Guide for complete workflow and quality standards.

Architecture Principles

Status: Accepted | Date: 2025-03-07

1. Establish secure foundations

Integrate security practices from the outset, and throughout the design, development and deployment of products and services, per the ACSC Foundations for modern defensible architecture.

2. Understand and govern data

Use authoritative data sources to ensure data consistency, integrity, and quality. Embed data management and governance practices, including information classification, records management, and Privacy and Responsible Information Sharing, throughout information lifecycles.

3. Prioritise user experience

Apply user-centred design principles to simplify tasks and establish intuitive mappings between user intentions and system responses. Involve users throughout design and development to iteratively evaluate and refine product goals and requirements.

4. Preference tried and tested approaches

Adopt sustainable open-source software and mature managed services where capabilities closely match business needs. When necessary, bespoke service development should be led by internal technical capabilities to ensure appropriate risk ownership. Bespoke software should preference open standards and open code to avoid vendor lock-in.

5. Embrace change, release early, release often

Design services as loosely coupled modules with clear boundaries and responsibilities. Release often with tight feedback loops to test assumptions, learn, and iterate. Enable frequent and predictable high-impact changes (your service does not deliver or add value until it is in the hands of users) per the CNCF Cloud Native Definition.

6. Default to open

Encourage transparency, inclusivity, adaptability, collaboration, and community by defaulting to permissive licensing of code and artifacts developed with public funding.

ADR 001: Application Isolation

Status: Accepted | Date: 2025-02-17

Context

Not isolating applications and environments creates significant security risks. Lateral movement means that a vulnerability in a single application can compromise other applications or the entire environment. This lack of isolation can enable the spread of malware, unauthorised access, and data breaches.

Decision

To mitigate the risks associated with shared environments, all applications and environments should be isolated by default. This isolation can be achieved through the following approaches:

  1. Dedicated Accounts: Use separate cloud accounts / resource groups for different environments (for example, development, testing, production) to ensure complete isolation of resources and data.
  2. Kubernetes Clusters: Deploy separate Kubernetes clusters for different applications or environments to isolate workloads and manage resources independently.
  3. Kubernetes Namespaces: Within a Kubernetes cluster, use namespaces to logically separate different applications or environments, providing a level of isolation for network traffic, resource quotas, and access controls.

The preferred approach to isolation should be driven by data sensitivity and product boundaries.
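As an illustration of the namespace-level option, the minimal sketch below uses the Kubernetes Python client (the namespace name and quota values are hypothetical) to create an isolated namespace with a resource quota so one workload cannot exhaust shared cluster resources.

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (use load_incluster_config() inside a cluster)
config.load_kube_config()
core = client.CoreV1Api()

namespace = "app-a-dev"  # hypothetical namespace for one application environment

# Create the namespace that logically separates this application
core.create_namespace(client.V1Namespace(metadata=client.V1ObjectMeta(name=namespace)))

# Attach a resource quota so the workload cannot exhaust shared cluster capacity
quota = client.V1ResourceQuota(
    metadata=client.V1ObjectMeta(name="default-quota", namespace=namespace),
    spec=client.V1ResourceQuotaSpec(hard={"requests.cpu": "4", "requests.memory": "8Gi", "pods": "20"}),
)
core.create_namespaced_resource_quota(namespace=namespace, body=quota)
```

Network policies and access controls would typically accompany the quota; account- or cluster-level isolation requires no application code at all.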

Consequences

If applications and environments are not isolated by default, the following consequences may arise:

  1. Increased Risk of Compromise: A vulnerability in one application can lead to the compromise of other applications or the entire environment.
  2. Difficulty in Incident Response: Without isolation, it becomes challenging to contain and mitigate security incidents.
  3. Compliance Issues: Failure to isolate environments may lead to non-compliance with regulatory requirements and industry standards.
  4. Data Breaches: Sensitive data may be exposed or stolen through unauthorised access resulting from the lack of isolation.

By adopting this decision, we aim to enhance the security posture of our systems, reduce the risk of security incidents, and ensure compliance with relevant standards and regulations.

ADR 005: Secrets Management

Status: Accepted | Date: 2025-02-25

Context

Per the Open Web Application Security Project (OWASP) Secrets Management Cheat Sheet:

Organizations face a growing need to centralize the storage, provisioning, auditing, rotation and management of secrets to control access to secrets and prevent them from leaking and compromising the organization. Often, services share the same secrets, which makes identifying the source of compromise or leak challenging.

To address these challenges, we need a standardised, auditable approach to managing and rotating secrets within our environments. Secrets should be accessed at runtime by workloads and should never be hard-coded or stored in plain text.

Decision

Use AWS Secrets Manager to store and manage secrets.
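A minimal sketch of runtime retrieval with boto3, assuming a hypothetical secret name and the ap-southeast-2 region; the secret is fetched at startup rather than hard-coded or stored in plain text.

```python
import json

import boto3

# Fetch database credentials at runtime instead of embedding them in code or config
client = boto3.client("secretsmanager", region_name="ap-southeast-2")
response = client.get_secret_value(SecretId="app-a/prod/database")  # hypothetical secret name
db_config = json.loads(response["SecretString"])

# db_config now holds values (e.g. username, password) managed and rotated by Secrets Manager
```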

Consequences

Positive:

  • Automated Management: Reduces human error and ensures consistent updates.
  • Compliance: Meets auditing and compliance requirements.

Negative:

  • Dependency on AWS: Using AWS Secrets Manager for all secrets could make future migrations away from AWS difficult. Ensuring secret rotation is straightforward and documented should minimise this consequence.

Risks of not implementing:

  • Security Risk: Frequent or manual handling of secrets increases exposure risk.
  • Operational Overhead: Manual processes for configuring and rotating secrets can be error-prone and inefficient.

By implementing this decision, we aim to enhance the security and efficiency of our secret management processes, ensuring that sensitive information is handled securely and automatically.

ADR 008: Email Authentication Protocols

Status: Proposed | Date: 2025-07-22

Context

Government email domains are prime targets for cybercriminals who exploit them for phishing attacks, business email compromise, and brand impersonation. Citizens and businesses expect government emails to be trustworthy, making email authentication critical for maintaining public confidence and preventing fraud.

Without proper email authentication, attackers can easily spoof government domains to conduct social engineering attacks, distribute malware, or harvest credentials from unsuspecting recipients.

References:

Decision

Implement email authentication standards for all government domains:

Required Standards:

  • SPF: Publish records defining authorised mail servers with strict policies ("~all" or "-all")
  • DKIM: Sign all outbound email with minimum 2048-bit RSA keys, rotate annually
  • DMARC: Progress from "p=none" to "p=reject" with subdomain policies and reporting
  • BIMI: Implement verified brand logos with Verified Mark Certificates (VMCs)

Implementation:

  • Monitor DNS records for tampering
  • Regular authentication testing and effectiveness reviews
  • Incident response procedures for authentication failures
  • Integration with email security gateways
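As a small sketch of the "regular authentication testing" item above, the dnspython library can confirm that SPF and DMARC records are still published as expected (the domain below is hypothetical).

```python
import dns.resolver

domain = "example.wa.gov.au"  # hypothetical domain

# SPF: published as a TXT record on the domain itself
for record in dns.resolver.resolve(domain, "TXT"):
    text = b"".join(record.strings).decode()
    if text.startswith("v=spf1"):
        print("SPF:", text)

# DMARC: published as a TXT record on the _dmarc subdomain
for record in dns.resolver.resolve(f"_dmarc.{domain}", "TXT"):
    text = b"".join(record.strings).decode()
    if text.startswith("v=DMARC1"):
        print("DMARC:", text)
```

A scheduled check like this can alert when records drift toward weaker policies or disappear entirely.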

Consequences

Risks of not implementing:

  • Email spoofing and phishing attacks using government domains
  • Brand reputation damage and reduced email deliverability
  • Compliance violations with security requirements

Benefits:

  • Significant reduction in email-based attacks
  • Enhanced brand protection and email trust
  • Improved compliance and threat visibility

ADR 011: AI Tool Governance

Status: Proposed | Date: 2025-07-29

Context

AI tools across all organisational functions process sensitive data and can make automated decisions affecting security, privacy, and compliance. Without governance ensuring human oversight of AI decisions, these tools pose significant risks including unauthorized data exposure, biased decision-making, and compliance violations. The highest risk areas are AI tools that process government data or make decisions without explicit human approval.

Current high-risk scenarios include:

  • Automated Decision-Making: AI tools making policy, approval, or resource allocation decisions without human review
  • Government Data Processing: Sensitive organisational data processed by offshore AI services
  • Uncontrolled AI Outputs: AI-generated content, code, or analysis used without human validation
  • Privacy Violations: Personal information processed by AI without appropriate consent or controls

References:

Decision

Implement mandatory human oversight for all AI tool usage with pre-approval for any AI tools that process organisational data or generate outputs used in official capacity.

Human Oversight Requirements:

All AI tools must ensure:

  • Human-in-the-Loop: No automated AI decisions without explicit human review and approval
  • Output Validation: All AI-generated content must be reviewed by qualified humans before use
  • Decision Accountability: Clear human responsibility for all AI-assisted decisions
  • Override Capability: Humans must be able to override or reject AI recommendations

Covered AI Tools:

  • Development and coding assistance tools
  • Content generation and writing assistants
  • Data analysis and business intelligence platforms
  • Decision support and recommendation systems
  • Customer service and communication chatbots
  • Document processing and workflow automation

Implementation Examples:

  • Approved: Independently assured developer tools with mandatory code review processes for all AI-generated code
  • Rejected: Automated HR screening tools that make hiring decisions without human review

Mandatory Isolation Requirements:

Any approved AI tools must be fully sandboxed with no access to:

  • Production credentials or secrets
  • Privileged system accounts
  • Sensitive data repositories
  • Network-privileged environments
  • Administrative interfaces

Technical Implementation:

  • AI tools must run in isolated environments with minimal permissions
  • No network access to internal systems or databases
  • Separate service accounts with restricted access scopes
  • Regular audit of AI tool permissions and data access
  • Clear data classification and handling procedures
  • Automated detection of credential or sensitive data exposure
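As one way to implement the "automated detection of credential or sensitive data exposure" item above, a hedged sketch of a pattern-based scan applied to AI-generated output before it is accepted (patterns and the sample are illustrative only):

```python
import re

# Illustrative signatures for common credential shapes
PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "bearer_token": re.compile(r"\b[Bb]earer\s+[A-Za-z0-9._\-]{20,}\b"),
}


def scan_ai_output(text: str) -> list[str]:
    """Return the names of credential patterns found in AI-generated output."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]


if __name__ == "__main__":
    sample = "aws_access_key_id = AKIAIOSFODNN7EXAMPLE"  # AWS documentation example key
    print(scan_ai_output(sample))  # ['aws_access_key']
```

Outputs with any findings would be blocked and routed to the human reviewer this ADR requires.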

Consequences

Risks of not implementing:

  • Unaccountable Decisions: AI making critical decisions without human oversight or review
  • Data Exposure: Sensitive government data processed by offshore AI services
  • Compliance Violations: Breach of Privacy Act and data sovereignty requirements
  • Operational Errors: Unchecked AI outputs causing incorrect policy or technical decisions

Benefits of implementation:

  • Human accountability for all AI-assisted decisions
  • Compliance with privacy and data sovereignty requirements
  • Reduced risk of AI-generated errors affecting operations
  • Clear audit trail demonstrating responsible AI usage

ADR 012: Privileged Remote Access

Status: Proposed | Date: 2025-07-22

Context

Traditional privileged access methods using jump boxes, bastion hosts, and shared credentials create security risks through persistent network connections and broad administrative access. Modern cloud-native alternatives provide better security controls and audit capabilities for administrative tasks.

Decision

Replace traditional bastion hosts and jump boxes with cloud-native privileged access solutions:

Prohibited Methods:

  • Bastion hosts and jump boxes with persistent SSH access
  • Direct SSH/RDP access to production systems
  • Shared administrative credentials and keys
  • VPN-based administrative access

Required Access Methods:

  • Server Access: AWS Systems Manager Session Manager (replaces SSH to EC2)
  • Infrastructure Management: AWS CLI with temporary credentials (replaces persistent VPN)
  • Kubernetes Access: kubectl with IAM authentication (replaces cluster SSH)
  • Infrastructure Deployment: Terraform Cloud with audit trails (replaces manual deployment)

Access Controls:

  • Multi-factor authentication for all access
  • Time-limited sessions (maximum 4 hours)
  • Identity-based access through cloud IAM
  • Approval workflows for privileged access
  • Session recording and audit logging

Implementation:

  • All sessions initiated through APIs only
  • Short-lived credentials (maximum 1-hour validity)
  • Real-time monitoring and alerting
  • Integration with SIEM systems
  • Emergency break-glass procedures with full audit
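A minimal sketch of the short-lived credential requirement using AWS STS (the role ARN and session name are hypothetical); the session expires after one hour and every call made with it is attributable to the named session.

```python
import boto3

sts = boto3.client("sts")
response = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/ops-read-only",  # hypothetical role
    RoleSessionName="jdoe-change-1234",                      # ties the session to a person and ticket
    DurationSeconds=3600,                                    # maximum 1-hour validity
)
creds = response["Credentials"]

# Use the temporary credentials for the administrative task; they expire automatically
session = boto3.Session(
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print(session.client("sts").get_caller_identity()["Arn"])
```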

Consequences

Risks of not implementing:

  • Unauthorized lateral movement through network connections
  • Persistent access leading to prolonged breaches
  • Non-compliance with zero-trust principles

Benefits:

  • Significantly reduced attack surface
  • Enhanced audit capabilities and compliance
  • Better credential security through short-lived tokens

ADR 013: Identity Federation Standards

Status: Proposed | Date: 2025-07-29

Context

Applications need to integrate with multiple identity providers including jurisdiction citizen identity services, enterprise directories, and cloud identity platforms. Current approaches use inconsistent protocols (SAML, OIDC, proprietary) creating integration complexity and security inconsistencies.

Modern identity federation requires support for emerging standards like verifiable credentials while maintaining compatibility with legacy enterprise systems.

Decision

Standardize on OpenID Connect (OIDC) as the primary federation protocol for all new identity integrations, with SAML 2.0 support only for legacy systems that cannot support OIDC.

Protocol Standards:

  • Primary: OpenID Connect for modern identity providers and new integrations
  • Legacy Support: SAML 2.0 only when upstream providers require it and OIDC is unavailable
  • Security: Implement PKCE for OIDC public clients and proper token validation
  • Compliance: Support Digital ID Act 2024 requirements for jurisdiction identity services
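A small sketch of the PKCE requirement for public OIDC clients listed above: generate a one-time code verifier and derive the S256 challenge sent with the authorisation request (standard library only).

```python
import base64
import hashlib
import secrets

# Code verifier: high-entropy, URL-safe string kept by the client
code_verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()

# Code challenge: SHA-256 of the verifier, sent in the authorisation request (method S256)
code_challenge = (
    base64.urlsafe_b64encode(hashlib.sha256(code_verifier.encode()).digest()).rstrip(b"=").decode()
)

print(code_challenge)  # the token request later proves possession by sending code_verifier
```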

Architecture Requirements:

  • Applications must integrate through managed identity platforms, not directly with identity providers
  • Separate privileged and standard user domains for clear administrative access isolation
  • Support multiple upstream identity providers per application
  • Maintain audit trails distinguishing privileged from standard user activities

Identity Federation Flow:

Emerging Standards:

  • Client applications can leverage emerging OIDC standards around verifiable credentials to simplify adoption of federated identity with jurisdiction providers

Consequences

Benefits:

  • Consistent modern federation standard across all applications
  • Better security through OIDC’s improved token handling and PKCE support
  • Simplified integration with jurisdiction citizen identity services
  • Clear separation of administrative and standard user access

Risks:

  • Legacy systems may require SAML-to-OIDC translation overhead
  • Dependency on external identity provider availability
  • Additional complexity from managed identity platform requirements

Mitigation:

  • Implement fallback authentication mechanisms for critical systems
  • Choose identity platforms with high availability and data export capabilities
  • Maintain audit trails following ADR 007: Centralized Security Logging

ADR 016: Web Application Edge Protection

Status: Proposed | Date: 2025-07-22

Context

Government web applications face heightened security threats including state-sponsored attacks, DDoS campaigns by activist groups, and sophisticated application-layer exploits targeting public services. These attacks can disrupt critical citizen services and damage public trust.

Traditional perimeter security is insufficient for protecting modern web applications that serve millions of citizens. Edge protection through CDNs and WAFs provides the first line of defense, filtering malicious traffic before it reaches application infrastructure.

References:

Decision

All public web applications and APIs must use CDN with integrated WAF protection:

CDN Requirements:

  • Geographic distribution with SSL/TLS termination at edge
  • Cache optimization and origin shielding
  • IPv6 dual-stack support

WAF Protection:

  • OWASP Top 10 protection rules enabled
  • Layer 7 DDoS protection and rate limiting
  • Geo-blocking and bot management
  • Custom rules for application-specific threats
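A hedged sketch of enabling an OWASP-aligned managed rule set with boto3 (names, metrics, and scope are illustrative; a real deployment would define this in infrastructure as code per ADR 010):

```python
import boto3

# CLOUDFRONT-scoped web ACLs must be created in us-east-1
wafv2 = boto3.client("wafv2", region_name="us-east-1")

wafv2.create_web_acl(
    Name="public-web-acl",  # hypothetical name
    Scope="CLOUDFRONT",
    DefaultAction={"Allow": {}},
    Rules=[{
        "Name": "owasp-common",
        "Priority": 0,
        "Statement": {"ManagedRuleGroupStatement": {"VendorName": "AWS", "Name": "AWSManagedRulesCommonRuleSet"}},
        "OverrideAction": {"None": {}},
        "VisibilityConfig": {"SampledRequestsEnabled": True, "CloudWatchMetricsEnabled": True, "MetricName": "owasp-common"},
    }],
    VisibilityConfig={"SampledRequestsEnabled": True, "CloudWatchMetricsEnabled": True, "MetricName": "public-web-acl"},
)
```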

DDoS Protection:

  • AWS Shield Advanced or equivalent
  • Real-time attack monitoring and alerting
  • DDoS Response Team access

Implementation:

  • WAF logs integrated with SIEM systems
  • Fail-secure configuration (no fail-open)
  • Regular penetration testing and rule tuning
  • CI/CD integration for automated deployments

Consequences

Risks of not implementing:

  • Service disruption from DDoS attacks
  • Exploitation of web application vulnerabilities
  • Poor performance and user experience

Benefits:

  • Significant reduction in successful attacks
  • Improved application performance and availability
  • Enhanced security posture at network edge

ADR 002: AWS EKS for Cloud Workloads

Status: Accepted | Date: 2025-02-17

Context

The challenge is to manage and scale bespoke workloads efficiently and securely. Traditional server management can be cumbersome and inefficient for dynamic workloads. Provider-specific control planes can result in lock-in and artificial constraints limiting technology options.

Decision

To address these challenges, we propose using a CNCF Certified Kubernetes platform with automatically managed infrastructure resources. Due to hyperscaler availability and size this would be AWS EKS (Elastic Kubernetes Service) in auto mode. This approach leverages Kubernetes for orchestration, AWS EKS for managed Kubernetes services, AWS Elastic Block Store (EBS) for storage and AWS load balancers for traffic management.

  • AWS EKS Auto Mode: Provides a managed Kubernetes service that automatically scales the infrastructure based on workload demands.
  • Managed Storage and NodePools: Ensures that the underlying infrastructure is maintained and updated by AWS.
  • Load Balancers: Standardises ingress and traffic management.

Consequences

If we do not adopt this approach, we risk the following:

  • Inefficient Resource Usage: Manual scaling and management can lead to over-provisioning or under-provisioning of resources.
  • Increased Operational Overhead: Managing bespoke workload tooling or Kubernetes clusters and underlying infrastructure manually can be time-consuming and error-prone.
  • Security Risks: Operating system, network, and workload orchestration updates and patches can lead to security vulnerabilities if not managed.
  • Reduced Availability: With self-managed storage and node pools, the system may not handle traffic spikes, leading to downtime.

ADR 006: Automated Policy Enforcement

Status: Proposed | Date: 2025-07-29

Context

Cloud infrastructure requires automated policy enforcement to prevent misconfigurations, ensure compliance, and provide secure network access patterns. Manual checking cannot scale effectively across multiple accounts and services.

Decision

Implement comprehensive automated policy enforcement using AWS native services for governance, network security, and access control.

Governance Foundation

  • AWS Control Tower: Account factory, guardrails, and compliance monitoring across organisation
  • Service Control Policies: Preventive controls blocking non-compliant resource creation
  • AWS Config Rules: Detective controls for compliance monitoring and drift detection

Network Security & Access

  • Transit Gateway: Central hub for intra-account resource exposure via security groups
  • Security Group References: Use security group IDs instead of hardcoded IP addresses for dynamic, maintainable access policies
  • Shield Advanced: DDoS protection and egress intrusion detection for public-facing resources
  • VPC Flow Logs: Complete egress traffic monitoring and analysis per WA SOC Cyber Network Management Guideline

Note: This approach creates dependency on AWS for traffic and network protection. Open-source equivalents include Security Onion for network security monitoring, OPNsense and pfSense for firewall and intrusion detection capabilities.
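A minimal sketch of the security-group-reference pattern with boto3 (group IDs are hypothetical): the database tier admits traffic from whatever instances carry the application tier's security group, with no hardcoded IP addresses.

```python
import boto3

ec2 = boto3.client("ec2")

# Allow the application tier (by security group reference) to reach PostgreSQL on the database tier
ec2.authorize_security_group_ingress(
    GroupId="sg-0db0000000000000a",  # hypothetical database security group
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 5432,
        "ToPort": 5432,
        "UserIdGroupPairs": [{
            "GroupId": "sg-0app000000000000b",  # hypothetical application security group
            "Description": "App tier to PostgreSQL",
        }],
    }],
)
```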

Core Policy Areas

  • Encryption: Mandatory encryption for all data stores and communications
  • Access Control: IAM least-privilege access and security group-based resource access
  • Resource Tagging: Governance and cost allocation requirements
  • Data Sovereignty: Geographic restrictions for jurisdiction compliance
  • Network Segmentation: Security group-based micro-segmentation over IP-based rules

Consequences

Benefits:

  • Proactive prevention of security misconfigurations through Control Tower guardrails
  • Complete egress traffic visibility and monitoring capabilities
  • Dynamic, maintainable access policies using security group references
  • Centralised network access management via Transit Gateway
  • Automated compliance with jurisdiction requirements
  • DDoS protection for critical public resources

Risks:

  • Dependency on AWS native services for policy enforcement
  • Complexity in multi-account Transit Gateway routing
  • Potential performance impact from comprehensive logging

Mitigation:

  • Implement policy validation in CI/CD pipelines following ADR 010: Infrastructure as Code
  • Use security group references over hardcoded IPs for maintainable policies
  • Monitor VPC Flow Logs for egress traffic analysis and anomaly detection

ADR 007: Centralised Security Logging

Status: Accepted | Date: 2025-02-25

Context

Security logs should be centrally collected to support monitoring, detection, and response capabilities across workloads. Logging of sensitive information must be minimised to comply with data protection regulations and reduce the risk of data breaches. Audit and authentication logs are critical for security monitoring and should be collected by default.

Decision

Use centralised logging with Microsoft Sentinel and Amazon CloudWatch.

  • Configure default collection of audit and authentication logs to simplify security investigations.
  • Container workloads should be configured with Container Insights with enhanced observability, plus EKS control plane logging of audit and authentication logs by default.
  • Logging should be configured to avoid capturing and exposing Personally Identifiable Information (PII).
  • Review and update logging configurations to ensure coverage and privacy requirements are met.
  • Log information used during an investigation should be extracted and archived to an appropriate location (in alignment with record keeping requirements).
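One hedged way to keep PII out of application logs is a redaction filter applied before records reach the central collector; the patterns below are illustrative only.

```python
import logging
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
TFN = re.compile(r"\b\d{3}\s?\d{3}\s?\d{3}\b")  # rough tax-file-number shape, illustrative only


class RedactPII(logging.Filter):
    """Replace likely PII with placeholders before the record is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        message = EMAIL.sub("[REDACTED-EMAIL]", message)
        record.msg = TFN.sub("[REDACTED-TFN]", message)
        record.args = ()
        return True


logger = logging.getLogger("app")
logger.addFilter(RedactPII())
logger.warning("login failed for jane.citizen@example.com")  # emitted with the address redacted
```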

Consequences

Risks of not implementing:

  • Decentralized logs may lead to delayed detection and response to security incidents.
  • Increased risk of sensitive information exposure leading to potential data breaches and non-compliance with regulations.
  • Incomplete audit trails may hinder forensic investigations and compliance audits.

Benefits:

  • Improved incident detection and response times.
  • Simplified compliance with data protection regulations.
  • Centralized management of security logs, reducing operational overhead.

ADR 010: Infrastructure as Code

Status: Accepted | Date: 2025-03-10

Context

All environments must be reproducible from source to minimize drift and security risk. Manual changes and missing version control create deployment failures and vulnerabilities.

Compliance Requirements:

Decision

Golden Path

  1. Git Repository Structure: Single repo per application with environments/{dev,staging,prod} folders matching AWS account names (e.g., app-a-infra repo → app-a-dev, app-a-staging, app-a-prod accounts)
  2. State Management: Terraform remote state with locking, separate state per environment
  3. CI Pipeline:
    • Validate: Trivy scan + terraform plan/kubectl diff drift check
    • Plan: Show proposed changes on PR
    • Apply: Deploy on tagged release only
  4. Versioning: Git tags = semantic versions (x.y.z) deployable to any environment
  5. Disaster Recovery: Checkout tag + run just deploy --env=prod with static artifacts from ADR 004

Required Tools & Practices

| Tool | Purpose | Stage | Mandatory |
| --- | --- | --- | --- |
| Trivy | Vulnerability scanning | Validate | Yes |
| Terraform or kubectl/kustomize | Configuration management | Deploy | Yes |
| Justfiles | Task automation | All | Recommended |
| devcontainer-base | Dev environment | Local | Recommended |
| k3d | Local testing | Dev | Optional |

Infrastructure as Code Workflow:

Consequences

Without this approach: Configuration drift, security vulnerabilities, failed rollbacks, and inconsistent environments.

With this approach: Secure, reproducible deployments with reliable disaster recovery and automated drift prevention.

References

ADR 014: Object Storage Backups

Status: Proposed | Date: 2025-07-22

Context

Current backup approaches lack cross-region redundancy and automated lifecycle management, creating single points of failure and compliance risks for government data retention requirements. Traditional storage systems do not provide the durability and geographic distribution needed for critical government systems.

Key challenges:

  • Single region backup storage creating vulnerability to regional outages
  • Manual backup processes prone to human error
  • Lack of automated recovery testing
  • Insufficient geographic separation for disaster recovery

References:

Decision

Implement standardized object storage backup solution with automated cross-region replication and lifecycle management for all critical systems and data.

Storage Requirements:

Critical Systems Definition:

  • Production databases containing citizen or business data
  • Application source code and deployment configurations
  • Security logs and audit trails
  • Infrastructure as Code templates and state files

Geographic Distribution:

Lifecycle Management:

  • Automated storage tiering based on age and access patterns
  • Compliance-based retention policies
  • Recovery testing and validation procedures
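A minimal sketch of automated tiering with boto3 (the bucket name, transition days, and retention period are illustrative and would be set per the applicable retention schedule):

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="agency-backups-primary",  # hypothetical bucket name
    LifecycleConfiguration={"Rules": [{
        "ID": "tier-then-expire",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # apply to all backup objects
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": 2555},  # illustrative ~7-year retention
    }]},
)
```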

Recovery Objectives:

  • Recovery Time Objective (RTO): 4 hours for critical systems, 24 hours for standard systems
  • Recovery Point Objective (RPO): 1 hour for databases, 24 hours for static content
  • Implementation Example: AWS S3 Cross-Region Replication to Australian regions

Consequences

Risks of not implementing:

  • Permanent data loss from single point of failure
  • Extended recovery times during disasters
  • Compliance violations from inadequate data retention
  • Regional outages affecting backup availability

Benefits of implementation:

  • Automated recovery capabilities meeting defined RTO/RPO objectives
  • Compliance with government data retention requirements
  • Geographic redundancy protecting against regional disasters
  • Reduced operational overhead through automated lifecycle management

ADR 015: Data Governance Standards

Status: Proposed | Date: 2025-07-28

Context

Data transformation pipelines require basic governance to ensure quality and compliance. SQLMesh provides built-in capabilities for contracts and lineage that reduce governance overhead.

Decision

Use SQLMesh’s semantic understanding for automated data governance with git-based workflows.

Priority Focus Areas

Consequences

Benefits

  • Automated Governance: SQLMesh reduces manual governance overhead
  • Git-Based Workflow: Familiar version control approach
  • Built-in Validation: Immediate feedback on quality and contract violations

Trade-offs

  • SQLMesh Dependency: Governance tied to SQLMesh framework
  • Learning Curve: Teams need SQLMesh knowledge

ADR 017: Analytics Tooling Standards

Status: Proposed | Date: 2025-07-28

Context

Organisations need simple, secure reporting that avoids complex JavaScript toolchains. Static reports reduce maintenance and security overhead.

Decision

Use Quarto for static reporting with Evidence BI for high-interactivity cases only.

Primary: Quarto Framework

  • Static Reports: Markdown-based with embedded visualizations
  • Multi-format: HTML, PDF outputs from single source
  • Git Integration: Reports alongside code in version control
  • Compliance Ready: Accessibility and professional formatting

Secondary: Evidence BI

  • Limited Use: Only useful for interactive drilldown reporting (simpler to host than e.g. PowerBI)
  • SQL-based: Minimal JavaScript complexity

Integration

Consequences

Benefits

  • Low Maintenance: Static reports reduce operational overhead
  • Security: Reduced attack surface vs dynamic dashboards
  • Git Integration: Version-controlled reports ensure consistency

Trade-offs

  • Limited Interactivity: Static reports lack real-time exploration
  • Learning Curve: Teams need Quarto knowledge

ADR 018: Database Patterns

Status: Proposed | Date: 2025-07-28

Context

Applications need managed databases with automatic scaling and jurisdiction-compliant backup strategies.

Decision

Use Aurora Serverless v2 outside EKS clusters with automated scaling, multi-AZ deployment, and dual backup strategy.

Implementation
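A hedged sketch of the decision using boto3 (identifiers and capacity limits are illustrative; production deployments would express this in infrastructure as code per ADR 010): an Aurora PostgreSQL cluster with Serverless v2 scaling and a serverless writer instance.

```python
import boto3

rds = boto3.client("rds", region_name="ap-southeast-2")

rds.create_db_cluster(
    DBClusterIdentifier="app-a-prod",            # hypothetical identifier
    Engine="aurora-postgresql",
    MasterUsername="app_admin",
    ManageMasterUserPassword=True,               # credentials held in Secrets Manager per ADR 005
    StorageEncrypted=True,
    ServerlessV2ScalingConfiguration={"MinCapacity": 0.5, "MaxCapacity": 8},
)

rds.create_db_instance(
    DBInstanceIdentifier="app-a-prod-writer",
    DBClusterIdentifier="app-a-prod",
    DBInstanceClass="db.serverless",             # Serverless v2 instance class
    Engine="aurora-postgresql",
)
```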

Consequences

Benefits

  • Cost Efficiency: Serverless scaling reduces costs during low usage
  • Low Maintenance: Managed service with automatic scaling and high availability
  • Compliance: Dual backup strategy meets jurisdiction requirements

Trade-offs

  • Vendor Lock-in: AWS-specific (consider Percona Everest or Pigsty for development/non-AWS)
  • Cold Start: Brief delays when scaling from zero

ADR 003: API Documentation Standards

Status: Accepted | Date: 2025-03-26

Context

Secure, maintainable APIs require mature frameworks with low complexity and industry standard compliance. Where existing standards exist, prefer them over bespoke REST APIs.

Compliance Requirements:

Decision

API Requirements

| Requirement | Standard | Mandatory |
| --- | --- | --- |
| Documentation | OpenAPI Specification | Yes |
| Testing | Restish CLI scripts | Yes |
| Framework | Huma or FastAPI | Recommended |
| Naming | Consistent convention | Yes |
| Security | OWASP API security coverage | Yes |
| Exposure | No admin APIs on Internet | Yes |

Development Guidelines

  • Self-Documenting: Use frameworks that auto-generate OpenAPI specs
  • Data Types: Prefer standard types over custom formats
  • Segregation: Separate APIs by purpose
  • Testing: Include security vulnerability checks in test scripts
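A minimal FastAPI sketch of the self-documenting guideline above (service name and route are illustrative): the OpenAPI specification is generated automatically from the typed route definitions.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Example Records Service", version="1.0.0")  # hypothetical service


class Document(BaseModel):
    id: int
    title: str


@app.get("/api/v1/documents/{document_id}", response_model=Document)
def get_document(document_id: int) -> Document:
    """Return a single document; the typed schema drives the generated OpenAPI spec."""
    return Document(id=document_id, title="Example")

# Serving the app exposes /openapi.json and interactive docs at /docs with no extra work
```

Restish scripts can then exercise the generated specification as part of the mandatory API testing.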

API Development Flow:

Consequences

Without this approach: Security vulnerabilities, documentation drift, poor quality APIs, and increased maintenance overhead.

With this approach: Secure, well-documented APIs with automated testing and reduced maintenance burden.

ADR 004: CI/CD Quality Assurance

Status: Accepted | Date: 2025-03-10

Context

Ensure security and integrity of software artifacts that are consumed by infrastructure repositories per ADR 010. Threat actors exploit vulnerabilities in code, dependencies, container images, and exposed secrets.

Compliance Requirements:

Decision

CI/CD Pipeline Requirements

Pipeline Flow: Code Commit → Build & Test → Quality Assurance → Release

| Stage | Tools | Purpose | Mandatory |
| --- | --- | --- | --- |
| Build | Docker Bake | Multi-platform builds with SBOM/provenance | Yes |
| Scan | Trivy | Vulnerability scanning | Yes |
| Analysis | Semgrep | Static code analysis | Yes |
| Test | Playwright | End-to-end testing | Recommended |
| Performance | Grafana K6 | Load testing | Optional |
| API | Restish | API validation per ADR 003 | Optional |

Development Environment

CI/CD Pipeline:

Consequences

Without this approach: Vulnerable containers deployed, exposed secrets, compromised application integrity, and compliance violations.

With this approach: Secure, tested artifacts with automated vulnerability remediation and compliance alignment.

References

ADR 009: Release Documentation Standards

Status: Accepted | Date: 2025-03-04

Context

To ensure clear communication of changes and updates to security and infrastructure operations teams, release notes should be standardised. The release notes should succinctly capture key information, including new features, improvements, bug fixes, security updates, and infrastructure changes, with links to relevant changelogs.

Decision

Adopt a standardised release notes template in Markdown format. Brief descriptions should include the security implications and operational impacts of changes such as vulnerability fixes, compliance improvements, or changes to authentication and authorization mechanisms. Descriptions should also detail operational aspects, including deployment processes, logging & monitoring considerations, and any modifications to Infrastructure as Code (IaC).

Git Tagging Requirements:

  • Create a git tag for each release following semantic versioning (v1.0.0, v1.1.0, etc.)
  • Tags must be annotated with a release notes summary
  • Tags should be created after all ADR acceptance and README updates
  • Tag message should reference the release documentation

A template is provided below that can be tailored per project. A completed release notes Markdown document should be provided with all proposed changes.

## Release Notes

### Overview
- **Name:** Name
- **Version:** [Version Number](#)
- **Previous Version:** [Version Number](#)

### Changes and Testing

High level summary

**New Features & Improvements**:
- [Feature/Improvement 1]: Brief description including testing.
- [Feature/Improvement 2]: Brief description including testing.

**Bug Fixes & Security Updates**:
- [Bug Fix/Security Update 1]: Brief description with severity level and response timeline.
- [Bug Fix/Security Update 2]: Brief description with severity level and response timeline.
- **Response Timelines**: Critical (24h), High (7d), Medium (30d), Low (90d)

### Changelogs

*Only include list items changed by this release*

- **Code**: Brief description. [View Changes](#)
- **Infrastructure**: Brief description. [View Changes](#)
- **Configuration & Secrets**: Brief description.

### Known Issues
- [Known Issue 1]: Brief description.
- [Known Issue 2]: Brief description.

### Action Required
- [Action 1]: Brief description of any action required by users or stakeholders.
- [Action 2]: Brief description of any action required by users or stakeholders.

### Contact
For any questions or issues, please contact [Contact Information].

Consequences

Risks of not implementing:

  • Miscommunication between teams.
  • Loss of context leading to weakened decision making procedures.

Positive Consequences:

  • Improved clarity and consistency in communicating release updates.
  • Consistent tracking of changes and updates for security and infrastructure teams.
  • Enhanced collaboration and understanding across teams.

Reference Architecture: Content Management

Status: Proposed | Date: 2025-07-28

When to Use This Pattern

Use when building websites, content portals, or applications requiring structured content management with editorial workflows.

Overview

Template for implementing content management systems to meet jurisdiction compliance requirements.

Core Components

Project Kickoff Steps

Foundation Setup

  1. Apply Isolation - Follow ADR 001: Application Isolation for CMS service network and runtime separation
  2. Deploy Infrastructure - Follow ADR 002: AWS EKS for Cloud Workloads for CMS container deployment
  3. Configure Infrastructure - Follow ADR 010: Infrastructure as Code for reproducible deployments
  4. Setup Database - Follow ADR 018: Database Patterns for Aurora Serverless v2 content storage

Security & Operations

  1. Configure Secrets Management - Follow ADR 005: Secrets Management for database credentials and API keys
  2. Setup Logging - Follow ADR 007: Centralized Security Logging for audit trails and editorial tracking
  3. Setup Backup Strategy - Follow ADR 014: Object Storage Backups for content and media backup
  4. Configure Edge Protection - Follow ADR 016: Web Application Edge Protection for CDN and WAF setup
  5. Identity Integration - Follow ADR 012: Privileged Remote Access for editorial authentication

Implementation Details

Content Workflows & Editorial:

Compliance & Accessibility:

  • Configure WCAG 2.1 AA accessibility compliance and automated testing
  • Setup jurisdiction-specific compliance requirements (e.g., privacy policies, cookie consent)
  • Implement content governance and retention policies per ADR 015: Data Governance Standards
  • Configure multilingual content management if required

Performance & SEO:

  • Setup SEO metadata management and structured data (JSON-LD)
  • Implement content performance monitoring per ADR 007: Centralized Security Logging
  • Configure CDN caching strategies and cache invalidation
  • Setup content analytics and user behavior tracking

Reference Architecture: Data Pipelines

Status: Proposed | Date: 2025-01-28

When to Use This Pattern

Use when building data processing systems for analytics, ETL workloads, or real-time data transformation across organisational systems.

Overview

Template for implementing scalable data pipelines using SQLMesh framework with S3 object storage, Aurora database patterns, and modern analytics engines (DuckDB/DuckLake) or legacy engines (Iceberg/Trino/Athena) based on existing data lake alignment.

Core Components

  • Data Sources (databases, APIs, files) feed SQLMesh, which provides orchestration, transformation, and quality checks
  • SQLMesh writes raw and processed data to S3 storage and metadata and schema to Aurora PostgreSQL
  • The analytics engine (DuckDB/Athena) reads from S3 storage
  • Reports are produced with Quarto, Evidence BI, or a data API

Project Kickoff Steps

Foundation Setup

  1. Apply Isolation - Follow ADR 001: Application Isolation for data processing network and runtime separation
  2. Deploy Infrastructure - Follow ADR 002: AWS EKS for Cloud Workloads for SQLMesh container deployment
  3. Configure Infrastructure - Follow ADR 010: Infrastructure as Code for reproducible data infrastructure
  4. Setup Database - Follow ADR 018: Database Patterns for Aurora Serverless v2 as data warehouse

Security & Operations

  1. Configure Secrets Management - Follow ADR 005: Secrets Management for data source credentials and API keys
  2. Setup Logging - Follow ADR 007: Centralized Security Logging for transformation audit trails
  3. Setup Backup Strategy - Follow ADR 014: Object Storage Backups for data warehouse backup
  4. Data Governance - Follow ADR 015: Data Governance Standards for SQLMesh data contracts and lineage

Development Process

  1. Configure CI/CD - Follow ADR 004: CI/CD Quality Assurance for automated testing of data transformations
  2. Setup Release Process - Follow ADR 009: Release Documentation Standards for SQLMesh model versioning
  3. Analytics Tools - Follow ADR 017: Analytics Tooling Standards for Quarto and Evidence BI integration

Implementation Details

Data Processing & Quality:

  • Configure SQLMesh for data transformation and orchestration
  • Setup S3 object storage for data files and Aurora for metadata
  • Implement data quality validation rules and testing frameworks
  • Configure DuckDB/DuckLake or Iceberg/Trino based on existing data lake alignment

Cost Optimization & Performance:

  • Configure Aurora Serverless v2 autoscaling for cost-effective metadata storage
  • Implement S3 lifecycle policies for data archival (Intelligent Tiering → Glacier)
  • Setup DuckDB for cost-effective analytics vs Athena for large-scale queries
  • Configure SQLMesh incremental processing to minimize compute costs

Data Governance & API Access:

Reference Architecture: Identity Management

Status: Proposed | Date: 2025-07-29

When to Use This Pattern

Use when building systems requiring federated identity management, single sign-on across multiple services, or integration with jurisdiction identity providers using OIDC standards.

Overview

Template for implementing OIDC-based identity federation with upstream identity providers (verifiable credentials, Australian Government Digital ID) and downstream identity consumers (AWS Cognito, Microsoft Entra ID) for comprehensive identity management across organisational boundaries.

Identity Federation Pattern

This pattern implements a broker-based identity federation that translates between upstream identity providers (Government Digital ID, verifiable credentials) and downstream identity consumers (AWS Cognito, Microsoft Entra ID).

Key Benefits:

  • Single integration point for multiple upstream providers
  • Standardised OIDC/SAML interface for downstream consumers
  • Centralised policy enforcement and audit logging
  • Support for both government and commercial identity ecosystems

Core Components

The architecture consists of three main layers:

Upstream Identity Providers supply user identities through standards-based protocols:

  • Government Digital ID for Australian citizens
  • Verifiable Credentials providers for professional credentials
  • External OIDC providers for commercial identity systems

Identity Broker acts as the central translation layer that:

  • Accepts authentication from multiple upstream providers
  • Normalises identity claims and attributes across providers
  • Issues standardised tokens to downstream consumers

Downstream Identity Consumers consume the normalised identity tokens:

  • AWS Cognito for cloud-native applications
  • Microsoft Entra ID for enterprise applications
  • Custom applications using OIDC/SAML protocols

Project Kickoff Steps

  1. Infrastructure Foundation - Follow ADR 001: Application Isolation, ADR 002: AWS EKS for Cloud Workloads, and ADR 018: Database Patterns for identity service deployment and data separation
  2. Security & Secrets - Follow ADR 005: Secrets Management for OIDC client secrets and ADR 007: Centralized Security Logging for authentication audit trails
  3. Identity Federation - Follow ADR 013: Identity Federation Standards for upstream provider integration and downstream consumer configuration
  4. Privileged Administration - Follow ADR 012: Privileged Remote Access for identity service administration access

Implementation Considerations

Privacy & PII Protection (Digital ID Act 2024):

  • Data minimisation: Prohibit collection beyond identity verification requirements
  • No single identifiers: Prevent tracking across services using persistent identifiers
  • Marketing restrictions: Prohibit disclosure of identity information for marketing purposes
  • Voluntary participation: Users cannot be required to create Digital ID for service access
  • Biometric safeguards: Restrict collection, use, and disclosure of biometric information
  • Breach notification: Implement cyber security and fraud incident management processes

Identity Proofing Level Selection:

  • IP1-IP2: Low-risk transactions with minimal personal information exposure
  • IP2+: Higher-risk services requiring biometric verification and stronger assurance
  • Risk assessment: Match proofing level to transaction risk and data sensitivity
  • Credential binding: Ensure authentication levels align with proofing requirements

Standards Compliance:

Reference Architecture: OpenAPI Backend

Status: Proposed | Date: 2025-07-28

When to Use This Pattern

Use when you need clear separation between user-facing and administrative operations in REST APIs requiring structured documentation and testing.

Overview

Template for implementing OpenAPI-first API services with complete separation between user-facing and administrative operations.

Core Components

  • API Clients -> Edge Protection -> Routing
  • Routing -> Standard APIs (api.domain) and Admin APIs (admin.domain)
  • Standard APIs -> User Database; Admin APIs -> System Database and User Database (admin access)

Standard APIs (/api/v1/*): Business operations for authenticated users

  • Examples: /api/v1/users/profile, /api/v1/orders, /api/v1/documents
  • Domain: api.domain with Standard Realm authentication

Administrative APIs (/admin/v1/*): System management for privileged users

  • Examples: /admin/v1/users, /admin/v1/system/config, /admin/v1/audit-logs
  • Domain: admin.domain with Admin Realm authentication

Complete Separation: Two separate routers provide full network and authentication isolation between standard and administrative operations per ADR 013: Identity Federation Standards.

Project Kickoff Steps

  1. Infrastructure Foundation - Follow ADR 001: Application Isolation and ADR 002: AWS EKS for Cloud Workloads
  2. API Standards - Follow ADR 003: API Documentation Standards for OpenAPI specification
  3. Identity Federation - Follow ADR 013: Identity Federation Standards for domain separation
  4. Edge Protection - Follow ADR 016: Web Application Edge Protection for rate limiting and security
  5. Database & Secrets - Follow ADR 018: Database Patterns and ADR 005: Secrets Management
  6. Logging & Monitoring - Follow ADR 007: Centralized Security Logging for audit trails

ADR ###: Specific Decision Title

Status: Proposed | Date: YYYY-MM-DD

Context

What problem are we solving? Include background and constraints.

Decision

What we decided and how to implement it:

  • Requirement 1: Specific implementation detail
  • Requirement 2: Configuration specifics
  • Requirement 3: Monitoring approach

Consequences

Positive:

  • Benefit 1 with explanation
  • Benefit 2 with explanation

Negative:

  • Risk 1 with mitigation
  • Risk 2 with mitigation

Reference Architecture: Pattern Name

Status: Proposed | Date: YYYY-MM-DD

When to Use This Pattern

Clear use case description for when to apply this architecture.

Overview

Brief template description focusing on practical implementation.

Core Components

Project Kickoff Steps

  1. Step Name - Follow relevant ADRs for implementation
  2. Next Step - ADR needed for missing standards
  3. Final Step - Reference to existing practices

Contributing Guide

When to Create ADRs

Create ADRs for foundational decisions only:

  • High cost to change mid/late project
  • Architectural patterns and technology standards
  • Security frameworks and compliance requirements
  • Infrastructure patterns that affect multiple teams

Don’t create ADRs for:

  • Implementation details (use documentation)
  • Project-specific configurations
  • Operational procedures that change frequently
  • Tool-specific guidance that belongs in user manuals

Quick Workflow

  1. Open in Codespaces → Automatic tool setup
  2. Get number → just next-number
  3. Create file → ###-short-name.md in correct directory (see content types)
  4. Write content → Follow template below
  5. Lint → just lint to fix formatting, check SUMMARY.md, and validate links
  6. Add to SUMMARY.md → Include new ADR in navigation (required for mdBook)
  7. Submit PR → Ready for review

ADR Creation Workflow:

Directory Structure

| Directory | Content |
| --- | --- |
| development/ | API standards, CI/CD, releases |
| operations/ | Infrastructure, logging, config |
| security/ | Isolation, secrets, AI governance |
| reference-architectures/ | Project kickoff templates |

Content Types: When to Use What

ADRs (Architecture Decision Records)

Purpose: Document foundational technology decisions that are expensive to change
Format: ###-decision-name.md in development/, operations/, or security/
Examples: “AWS EKS for workloads”, “Secrets management approach”, “API standards”

Reference Architectures

Purpose: Project kickoff templates that combine multiple existing ADRs
Format: descriptive-name.md in reference-architectures/
Examples: “Content Management”, “Data Pipelines”, “Identity Management”

Rule: Reference architectures should only link to existing ADRs, not create new ones.

ADR Template

See templates/adr-template.md for the complete template.

Note: ADR numbers are globally unique across all directories (gaps from removed drafts are normal)

Reference Architecture Template

See templates/reference-architecture-template.md for the complete template.

Quality Standards

Before submitting:

  • Title is concise (under 50 characters) and actionable
  • All acronyms defined on first use
  • Active voice (not passive)
  • Passes just lint without errors

Title Examples:

  • GOOD: “ADR 002: AWS EKS for Cloud Workloads” (concise, ~30 chars)
  • GOOD: “ADR 008: Email Authentication Protocols” (specific, clear)
  • BAD: “ADR 004: Enforce release quality with CI/CD prechecks and build attestation” (too long)
  • BAD: “Container stuff” or “Security improvements” (too vague)

Status Guide

| Status | Meaning |
| --- | --- |
| Proposed | Under review |
| Accepted | Active decision |
| Superseded | Replaced by newer ADR |

ADR References

Reference format:

  • [ADR 005: Secrets Management](../security/005-secrets-management.md)
  • Quick reference: per ADR 005
  • Multiple refs: aligned with ADR 001 and ADR 005

Examples:

Writing Tips

  • Be specific: “Use AWS EKS auto mode” not “Use containers”
  • Include implementation: How, not just what
  • Define scope: What’s included and excluded
  • Reference standards: Link to external docs
  • Australian English: Use “organisation” not “organization”, “jurisdiction” not “government”
  • Character usage: Use plain-text safe Unicode - avoid emoji, smart quotes, em-dashes for PDF compatibility
  • D2 diagrams: Use D2 format for diagrams with clean syntax and universal compatibility
    • Use when text alone isn’t sufficient (system relationships, data flows, workflows)
    • Keep simple: 5-7 components max, clear labels, logical flow, consistent colors

Glossary

Acronyms and Definitions

ACSC - Australian Cyber Security Centre
ADR - Architecture Decision Record
API - Application Programming Interface
ATT&CK - Adversarial Tactics, Techniques & Common Knowledge (MITRE)
AWS - Amazon Web Services
BIMI - Brand Indicators for Message Identification
CDN - Content Delivery Network
CI/CD - Continuous Integration/Continuous Deployment
CNCF - Cloud Native Computing Foundation
DGOV - Office of Digital Government (Western Australia)
DKIM - DomainKeys Identified Mail
DMARC - Domain-based Message Authentication, Reporting and Conformance
DNS - Domain Name System
DTT - Digital Transformation and Technology Unit
EKS - Elastic Kubernetes Service (AWS)
GCP - Google Cloud Platform
IAM - Identity and Access Management
IAP - Identity-Aware Proxy
JIT - Just-In-Time
OWASP - Open Web Application Security Project
RDP - Remote Desktop Protocol
RPO - Recovery Point Objective
RTO - Recovery Time Objective
SIEM - Security Information and Event Management
SPF - Sender Policy Framework
TLS - Transport Layer Security
VPN - Virtual Private Network
WAF - Web Application Firewall