Designing Safer Production Releases: A Practical Journey with Azure DevOps

Production systems don’t usually fail because of missing tools.
They fail because too much happens implicitly.

A merge triggers a deploy.
A fix goes live unintentionally.
Weeks later, no one is entirely sure what version is actually running.

This article documents a deliberate shift I made in how production releases are handled—moving from implicit deployment behavior to explicit, intentional releases using Git tags and infrastructure templates in Azure DevOps.

This wasn’t about adding complexity.
It was about removing ambiguity.


The Problem I Wanted to Solve

Before the change, the release model had familiar weaknesses:

  • Merges to main were tightly coupled to deployment
  • Production changes could happen without a conscious “release decision”
  • Version visibility in production was inconsistent
  • Pipelines mixed application logic and platform concerns

None of this caused daily failures—but it created latent risk.

The question I asked was simple:

How do I make production boring, predictable, and explainable?


The Guiding Principles

Instead of starting with tooling, I started with principles:

  1. Production changes must be intentional
  2. Releases must be immutable and auditable
  3. Application code and platform logic should not live together
  4. Developers should not need to understand deployment internals
  5. The system should scale from solo to enterprise without redesign

Everything else followed from these.


The Core Decision: Tag-Based Releases

The single most important change was this:

Production deployments are triggered only by Git tags.

Not by merges.
Not by branch updates.
Not by UI clicks.

A release now requires an explicit action:

git tag vX.Y.Z
git push origin vX.Y.Z

That’s the moment a human says: “This is production.”
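
In Azure Pipelines YAML, the entry file's trigger block can express this directly. A minimal sketch (the v* pattern is an assumption matching the vX.Y.Z convention above):

# Trigger only on version tags, never on branch updates
trigger:
  branches:
    exclude:
      - '*'
  tags:
    include:
      - 'v*'

Azure DevOps evaluates branch and tag filters independently, so excluding all branches leaves tags as the only trigger path.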


Separating Responsibilities with Repositories

To support this model cleanly, responsibilities were split across two repositories:

Application Repository

  • Contains UI, APIs, and business logic
  • Has a single, thin pipeline entry file (sketched below)
  • Decides when to release (via tags)

Infrastructure Repository

  • Contains pipeline templates and deployment logic
  • Builds and deploys applications
  • Defines how releases happen

This separation ensures:

  • Platform evolution doesn’t pollute application repos
  • Multiple applications can share the same release model
  • Infrastructure changes are treated as infrastructure—not features
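
To make this concrete, here is a minimal sketch of what the thin entry file in the application repo can look like; the repository alias, project/repo name, and template path are hypothetical:

# azure-pipelines.yml in the application repo
resources:
  repositories:
    - repository: infra                    # alias for the infrastructure repo (hypothetical)
      type: git
      name: Platform/infrastructure        # Azure DevOps project/repo (hypothetical)

extends:
  template: pipelines/release.yml@infra    # all build and deployment logic lives here

The application repo only declares that it participates in the release model; everything else is defined in the template.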

Pipelines as Infrastructure, Not Code

A key mindset shift was treating pipelines as platform infrastructure.

That meant:

  • Pipeline entry files are locked behind PRs
  • Changes are rare and intentional
  • Developers generally don’t touch them
  • Deployment logic lives outside the app repo

This immediately reduced accidental breakage and cognitive load.


Versioning: Moving from Build-Time to Runtime

Once releases were driven by tags, traditional assembly-based versioning stopped being useful—especially for static web applications.

Instead, version information is now injected at build time into a runtime artifact:

/version.json

Example:

{ "version": "v2.0.5" }

The application reads this file at runtime to display its version.

This approach:

  • Works cleanly with static hosting
  • Reflects exactly what was released
  • Is easy to extend with commit hashes or timestamps
  • Decouples versioning from build tooling
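
A minimal sketch of the injection step, assuming a Bash script step in the release pipeline and a dist/ output folder (both assumptions):

# For tag-triggered runs, Build.SourceBranch is refs/tags/vX.Y.Z
VERSION="${BUILD_SOURCEBRANCH#refs/tags/}"
echo "{ \"version\": \"${VERSION}\" }" > dist/version.json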

The Day-to-Day Experience

After the setup, daily work became simpler—not more complex.

  • Developers work in feature branches
  • Code is merged into main without fear
  • Nothing deploys automatically
  • Production changes require an explicit tag

Releases are boring.
And that’s exactly the goal.


Rollbacks and Auditability

Because releases are immutable:

  • Redeploying a version is trivial
  • Rollbacks are predictable
  • There’s always a clear answer to: “What code is running in production?”

This is especially valuable in regulated or client-facing environments.
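
Under this model, a rollback is itself an explicit release: tag the last known-good commit and push. A sketch (the hash and version are placeholders):

git tag v2.0.6 3f9c2ab        # point a new release tag at the known-good commit
git push origin v2.0.6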


Tradeoffs and Honest Costs

This approach isn’t free.

Costs:

  • Initial setup takes time
  • Azure DevOps YAML has sharp edges
  • A pipeline must already exist before a tag push can trigger it
  • Early experimentation may require tag resets

Benefits:

  • Zero accidental prod deploys
  • Clear ownership and accountability
  • Clean separation of concerns
  • Reusable platform foundation
  • Long-term operational confidence

For long-lived systems, the tradeoff is worth it.


When This Pattern Makes Sense

This model works best when:

  • Production stability matters
  • Systems are long-lived
  • Auditability or compliance is a concern
  • Teams want clarity over convenience

It’s less suitable for:

  • Hackathons
  • Throwaway prototypes
  • “Merge = deploy” cultures

The Leadership Lesson

The most important takeaway wasn’t technical.

Good systems make intent explicit.
Great systems remove ambiguity from critical outcomes.

Production safety doesn’t come from moving slower.
It comes from designing systems where important changes happen on purpose.


Final Thoughts

This wasn’t about Azure DevOps specifically.
The same principles apply anywhere.

If you can answer these questions clearly, you’re on the right path:

  • Who decided this went to production?
  • When did that decision happen?
  • What exactly was released?

If those answers are obvious, production becomes boring.

And boring production is a feature.

WordPress on Azure Container Apps (ACA)

Architecture, Backup, and Recovery Design

1. Overview

This document describes the production architecture for WordPress running on Azure Container Apps (ACA) with MariaDB, including backup, recovery, monitoring, and automation. The design prioritizes:

  • Low operational overhead
  • Cost efficiency
  • Clear separation of concerns
  • Fast, predictable recovery
  • No dependency on VM-based services or Backup Vault

This architecture is suitable for long-term operation (multi‑year) with minimal maintenance.


2. High-Level Architecture

Core Components

  • Azure Container Apps Environment
    • Hosts WordPress and MariaDB container apps
  • WordPress Container App (ca-wp)
    • Apache + PHP WordPress image
    • Stateless container
    • Persistent content via Azure Files
  • MariaDB Container App (ca-mariadb)
    • Dedicated container app
    • Internal-only access
    • Database for WordPress
  • Azure Files (Storage Account: st4wpaca)
    • File share: wpcontent
    • Mounted into WordPress container
    • Stores plugins, themes, uploads, logs
  • Azure Blob Storage
    • Stores MariaDB logical backups (.sql.gz)

3. Data Persistence Model

WordPress Files

  • wp-content directory is mounted to Azure Files
  • Includes:
    • Plugins
    • Themes
    • Uploads
    • Logs (debug.log)

Database

  • MariaDB runs inside its own container
  • No local persistence assumed
  • Database durability ensured via daily logical backups

4. Backup Architecture

4.1 WordPress Files Backup (Primary)

Method: Azure Files Share Snapshots

  • Daily snapshots of wpcontent file share
  • Snapshot creation automated via Azure Automation Runbook
  • Retention enforced (e.g., 14 days)

Why this works well:

  • Instant snapshot creation
  • Very fast restore
  • Extremely low cost
  • No application involvement
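
For illustration, the snapshot itself is a single call; shown here with the Azure CLI, though the runbook may use PowerShell instead (authentication omitted):

# Create a point-in-time snapshot of the wpcontent share
az storage share snapshot \
  --name wpcontent \
  --account-name st4wpaca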

4.2 MariaDB Backup (Primary)

Method: Logical database dumps (mysqldump)

  • Implemented via Azure Container App Jobs
  • Backup job runs on schedule (daily)
  • Output compressed SQL file
  • Stored in Azure Blob Storage

Additional Jobs:

  • Cleanup job to enforce retention
  • Restore job for controlled database recovery
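
A minimal sketch of the backup job's core commands, assuming connection details arrive as environment variables and a db-backups blob container (both assumptions):

# Dump, compress, and upload to Blob Storage
mysqldump -h "$DB_HOST" -u "$DB_USER" -p"$DB_PASS" "$DB_NAME" | gzip > /tmp/backup.sql.gz
az storage blob upload \
  --account-name st4wpaca \
  --container-name db-backups \
  --name "wp-$(date +%F).sql.gz" \
  --file /tmp/backup.sql.gz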

4.3 Backup Automation

Azure Automation Account (aa-wp-backup)

  • Central automation control plane
  • Uses system-assigned managed identity
  • Hosts multiple runbooks:
    • Azure Files snapshot creation
    • Snapshot retention cleanup

Key Vault Integration:

  • Secrets stored in kv-tanolis-app
    • Storage account key
    • MariaDB host
    • MariaDB user
    • MariaDB password
    • MariaDB database name
  • Automation and jobs retrieve secrets securely
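
For example, a job can pull a secret at startup with the Azure CLI under its managed identity (the secret name is illustrative):

az keyvault secret show \
  --vault-name kv-tanolis-app \
  --name mariadb-password \
  --query value -o tsv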

5. Restore Scenarios

Scenario 1: Restore WordPress Files Only

Use case:

  • Plugin or theme deletion
  • Media loss

Steps:

  1. Select Azure Files snapshot for wpcontent
  2. Restore entire share or specific folders
  3. Restart WordPress container app

Scenario 2: Restore Database Only

Use case:

  • Content corruption
  • Bad plugin update

Steps:

  1. Download appropriate SQL backup from Blob
  2. Execute restore job or import via MariaDB container
  3. Restart WordPress container
  4. Save permalinks in WordPress admin
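
Step 2 boils down to streaming the dump back into MariaDB; a sketch with a placeholder file name and connection variables:

gunzip -c wp-YYYY-MM-DD.sql.gz | mysql -h "$DB_HOST" -u "$DB_USER" -p"$DB_PASS" "$DB_NAME"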

Scenario 3: Full Site Restore

Use case:

  • Major failure
  • Security incident
  • Rollback to known-good state

Steps:

  1. Restore Azure Files snapshot
  2. Restore matching MariaDB backup
  3. Restart WordPress container
  4. Validate site and permalinks

6. Monitoring & Alerting

Logging

  • Azure Container Apps logs
  • WordPress debug log (wp-content/debug.log)

Alerts

  • MariaDB backup job failure alert
  • Container restart alerts
  • Optional resource utilization alerts

External Monitoring

  • HTTP uptime checks for site availability

7. Security Considerations

  • No public access to MariaDB container
  • Secrets stored only in Azure Key Vault
  • Managed Identity used for automation
  • No credentials embedded in scripts
  • Optional IP restrictions for /wp-admin

8. Cost Characteristics

  • Azure Files snapshots: very low cost (delta-based)
  • Azure Blob backups: pennies/month
  • Azure Automation: within free tier for typical usage
  • No Backup Vault protected-instance fees

Overall, backup costs stay in the low single digits of USD per month.


9. Operational Best Practices

  • Test restore procedures quarterly
  • Keep file and DB backups aligned by date
  • Maintain at least 7–14 days retention
  • Restart WordPress container after restores
  • Document restore steps for operators

10. Summary

This architecture delivers:

  • Reliable backups without over-engineering
  • Fast and predictable recovery
  • Minimal cost
  • Clear operational boundaries
  • Long-term maintainability

It is well-suited for WordPress workloads running on Azure Container Apps and avoids VM-centric or legacy backup models.

Building a Practical Azure Landing Zone for a Small Organization — My Hands-On Journey

Over the past few weeks, I went through the full process of designing and implementing a lean but enterprise-grade Azure Landing Zone for a small organization. The goal wasn’t to build a complex cloud platform — it was to create something secure, governed, and scalable, while remaining simple enough to operate with a small team.

This experience helped me balance cloud architecture discipline with practical constraints, and it clarified what really matters at this scale.

Here’s what I built, why I built it that way, and what I learned along the way.


🧭 Starting with the Foundation: Management Groups & Environment Separation

The first step was establishing a clear environment structure. Instead of allowing resources to sprawl across subscriptions, I organized everything under a Landing Zones management group:

Tenant Root
 └─ Landing Zones
     ├─ Development
     │   └─ Dev Subscription
     └─ Production
         └─ Prod Subscription

This created clear separation of environments, enforced consistent policies, and gave the platform team a single place to manage governance.

For a small org, this structure is lightweight — but future-proof.
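
For reference, the hierarchy can be created with a handful of CLI calls (group names are illustrative):

az account management-group create --name LandingZones --display-name "Landing Zones"
az account management-group create --name Development --display-name "Development" --parent LandingZones
az account management-group create --name Production --display-name "Production" --parent LandingZones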


🔐 Designing RBAC the Right Way — Without Over-Permissioning

Next came access control — usually the most fragile part of small Azure environments.

I replaced ad-hoc permissions with a clean RBAC model:

  • tanolis-platform-admins → Owner at Landing Zones MG (inherited)
  • Break-glass account → Direct Owner for emergencies only
  • Dev users → Contributor or RG-scoped access only in Dev
  • Prod users → Reader by default, scoped contributor only when justified

No direct Owner permissions on subscriptions.
No developers in Prod by default.
Everything through security groups, not user assignments.

This drastically reduced risk, while keeping administration simple.


🧯 Implementing a Real Break-Glass Model

Many organizations skip this — until they get locked out.

I created a dedicated break-glass account with:

  • Direct Owner at the Landing Zones scope
  • Strong MFA + secure offline credential storage
  • Sign-in alerts for monitoring
  • A documented recovery runbook

I tested recovery scenarios to ensure the account could restore access safely and quickly.

It wasn’t about giving more power — it was about preventing operational dead-ends.


🛡️ Applying Policy Guardrails — Just Enough Governance

Instead of trying to deploy every policy possible, I applied a starter baseline:

  • Required resource tags (env, owner, costCenter)
  • Logging and Defender for Cloud enabled
  • Key Vault protection features
  • Guardrails against unsafe exposure where reasonable

The focus was risk-reduction without friction — especially important in small teams where over-governance leads to shadow IT.


🧱 Defining a Simple, Scalable Access Model for Workloads

For Dev workloads, I adopted Contributor at subscription or RG level, depending on the need.
For Prod, I enforced least privilege and scoped access.

To support this, I created a naming convention for access groups:

<org>-<env>-<workload>-rg-<role>

Examples:

  • tanolis-dev-webapi-rg-contributors
  • tanolis-prod-data-rg-readers

This makes group intent self-documenting and audit-friendly — which matters more as environments grow.
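
Granting access is then one auditable assignment per group; a sketch with placeholder IDs and names:

az role assignment create \
  --assignee-object-id <group-object-id> \
  --assignee-principal-type Group \
  --role Contributor \
  --scope /subscriptions/<sub-id>/resourceGroups/<rg-name>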


📘 Documenting the Platform — Turning Architecture into an Operating Model

Technology wasn’t the final deliverable — operability was.

I created lightweight but meaningful platform artifacts:

  • Platform Operations Runbook
  • Subscription & Environment Register
  • RBAC and access governance model
  • Break-glass SOP and validation checklist

The goal was simple:

The platform should be understandable, supportable, and repeatable — not just functional.


🎯 What This Experience Reinforced

This project highlighted several key lessons:

  • 🟢 Small orgs don’t need complex cloud — they need clear boundaries and discipline
  • 🟢 RBAC and identity design matter more than tools or services
  • 🟢 A working break-glass model is not optional
  • 🟢 Policies should guide, not obstruct
  • 🟢 Documentation doesn’t have to be heavy — just intentional
  • 🟢 Good foundations reduce future migration and security pain

A Landing Zone is not just a technical construct — it’s an operating model for the cloud.


🚀 What’s Next

With governance and identity foundations in place, the next evolution will focus on:

  • Network & connectivity design (simple hub-lite or workload-isolated)
  • Logging & monitoring baselines
  • Cost governance and budgets
  • Gradual shift toward Infrastructure-as-Code
  • Backup, DR, and operational resilience

Each step can now be layered safely — because the core platform is stable.


🧩 Final Thought

This experience reinforced that even in small environments, doing cloud “the right way” is absolutely achievable.

You don’t need a massive platform team — you just need:

  • good structure
  • intentional governance
  • and a mindset of sustainability over quick wins.

That’s what turns an Azure subscription into a true Landing Zone.

TLS on a Simple Dockerized WordPress VM (Certbot + Nginx)

This note documents how TLS was issued, configured, and made fully automatic for a WordPress site running on a single Ubuntu VM with Docker, Nginx, PHP-FPM, and MariaDB.

The goal was boring, predictable HTTPS — no load balancers, no Front Door, no App Service magic.


Architecture Context

  • Host: Azure Ubuntu VM (public IP)
  • Web server: Nginx (Docker container)
  • App: WordPress (PHP-FPM container)
  • DB: MariaDB (container)
  • TLS: Let’s Encrypt via Certbot (host-level)
  • DNS: Azure DNS → VM public IP
  • Ports:
    • 80 → HTTP (redirect + ACME challenge)
    • 443 → HTTPS

1. Certificate Issuance (Initial)

Certbot was installed on the VM (host), not inside Docker.

Initial issuance was done using standalone mode (acceptable for first issuance):

sudo certbot certonly \
  --standalone \
  -d shahzadblog.com

This required:

  • Port 80 temporarily free
  • Docker/nginx stopped during issuance

Resulting certs live at:

/etc/letsencrypt/live/shahzadblog.com/
  ├── fullchain.pem
  └── privkey.pem

2. Nginx TLS Configuration (Docker)

Nginx runs in Docker and mounts the host cert directory read-only.

Docker Compose (nginx excerpt)

nginx:
  image: nginx:alpine
  ports:
    - "80:80"
    - "443:443"
  volumes:
    - ./wordpress:/var/www/html
    - ./nginx/default.conf:/etc/nginx/conf.d/default.conf
    - /etc/letsencrypt:/etc/letsencrypt:ro

Nginx config (key points)

  • Explicit HTTP → HTTPS redirect
  • TLS configured with Let’s Encrypt certs
  • HTTP left available only for ACME challenges

# HTTP (ACME + redirect)
server {
    listen 80;
    server_name shahzadblog.com;

    location ^~ /.well-known/acme-challenge/ {
        root /var/www/html;
        allow all;
    }

    location / {
        return 301 https://$host$request_uri;
    }
}

# HTTPS
server {
    listen 443 ssl;
    http2 on;

    server_name shahzadblog.com;

    ssl_certificate     /etc/letsencrypt/live/shahzadblog.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/shahzadblog.com/privkey.pem;

    root /var/www/html;
    index index.php index.html;

    location / {
        try_files $uri $uri/ /index.php?$args;
    }

    location ~ \.php$ {
        fastcgi_pass wordpress:9000;
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    }
}

3. Why Standalone Renewal Failed

Certbot auto-renew initially failed with:

Could not bind TCP port 80

Reason:

  • Docker/nginx already listening on port 80
  • Standalone renewal always tries to bind port 80

This is expected behavior.


4. Switching to Webroot Renewal (Correct Fix)

Instead of stopping Docker every 60–90 days, renewal was switched to webroot mode.

Key Insight

Certbot (host) and Nginx (container) must point to the same physical directory.

  • Nginx serves:
    ~/wp-docker/wordpress → /var/www/html (container)
  • Certbot must write challenges into:
    ~/wp-docker/wordpress/.well-known/acme-challenge

5. Renewal Config Fix (Critical Step)

Edit the renewal file:

sudo nano /etc/letsencrypt/renewal/shahzadblog.com.conf

Change:

authenticator = standalone

To:

authenticator = webroot
webroot_path = /home/azureuser/wp-docker/wordpress

⚠️ Do not use /var/www/html here — that path exists only inside Docker.


6. Filesystem Permissions

Because Docker created WordPress files as root, the ACME path had to be created with sudo:

sudo mkdir -p /home/azureuser/wp-docker/wordpress/.well-known/acme-challenge
sudo chmod -R 755 /home/azureuser/wp-docker/wordpress/.well-known

Validation test:

echo test | sudo tee /home/azureuser/wp-docker/wordpress/.well-known/acme-challenge/test.txt
curl http://shahzadblog.com/.well-known/acme-challenge/test.txt

Expected output:

test

7. Final Renewal Test (Success Condition)

sudo certbot renew --dry-run

Success message:

Congratulations, all simulated renewals succeeded!

At this point:

  • Certbot timer is active
  • Docker/nginx stays running
  • No port conflicts
  • No manual intervention required
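
To confirm the background renewal is actually scheduled (the unit name depends on whether Certbot came from apt or snap):

systemctl list-timers | grep -i certbot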

Final State (What “Done” Looks Like)

  • 🔒 HTTPS works in all browsers
  • 🔁 Cert auto-renews in background
  • 🐳 Docker untouched during renewals
  • 💸 No additional Azure services
  • 🧠 Minimal moving parts

Key Lessons

  • Standalone mode is fine for first issuance, not renewal
  • In Docker setups, filesystem alignment matters more than ports
  • Webroot renewal is the simplest long-term option
  • Don’t fight permissions — use sudo intentionally
  • “Simple & boring” scales better than clever abstractions

This setup is intentionally non-enterprise, low-cost, and stable — exactly what a long-running personal site needs.

Rebuilding My Personal Blog on Azure: Lessons From the Trenches

In January, I decided to rebuild my personal WordPress blog on Azure.

Not as a demo.
Not as a “hello world.”
But as a long-running, low-cost, production-grade personal workload—something I could realistically live with for years.

What followed was a reminder of why real cloud engineering is never about just clicking “Create”.


Why I Didn’t Use App Service (Again)

I initially explored managed options like Azure App Service and Azure Container Apps. On paper, they’re perfect. In practice, for a personal blog:

  • Storage behavior mattered more than storage size
  • Hidden costs surfaced through SMB operations and snapshots
  • PHP versioning and runtime controls were more rigid than expected

Nothing was “wrong” — but it wasn’t predictable enough for a small, fixed-budget site.

So I stepped back and asked a simpler question:

What is the most boring, controllable architecture that will still work five years from now?


The Architecture I Settled On

I landed on a single Ubuntu VM, intentionally small:

  • Azure VM: B1ms (1 vCPU, 2 GB RAM)
  • OS: Ubuntu 22.04 LTS
  • Stack: Docker + Nginx + WordPress (PHP-FPM) + MariaDB
  • Disk: 30 GB managed disk
  • Access: SSH with key-based auth
  • Networking: Basic NSG, public IP

No autoscaling. No magic. No illusions.

Just something I fully understand.


Azure Policy: A Reality Check

The first thing that blocked me wasn’t Linux or Docker — it was Azure Policy.

Every resource creation failed until I added mandatory tags:

  • env
  • costCenter
  • owner

Not just on the VM — but on:

  • Network interfaces
  • Public IPs
  • NSGs
  • Disks
  • VNets

Annoying? Slightly.
Realistic? Absolutely.

This is what production Azure environments actually look like.
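
In practice this just means passing tags on every create call; a sketch with illustrative values (other required flags omitted):

az vm create \
  --resource-group rg-blog \
  --name vm-blog \
  --image Ubuntu2204 \
  --tags env=prod costCenter=personal owner=shahzad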


The “Small” Issues That Matter

A few things that sound trivial — until you hit them at 2 AM:

  • SSH keys rejected due to incorrect file permissions on Windows/WSL
  • PHP upload limits silently capped at 2 MB
  • Nginx + PHP-FPM + Docker each enforcing their own limits
  • A 129 MB WordPress backup restore failing until every layer agreed
  • Choosing between Premium vs Standard disks for a low-IO workload

None of these are headline features.
All of them determine whether the site actually works.
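
For reference, getting a large upload through means aligning at least these settings (256M is an arbitrary example; each layer has its own config file):

nginx (default.conf):

client_max_body_size 256M;

PHP (a custom uploads.ini mounted into the PHP-FPM container; the file layout is an assumption):

upload_max_filesize = 256M
post_max_size = 256M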


Cost Reality

My target budget: under $150/month total, including:

  • A static site (tanolis.us)
  • This WordPress blog

The VM-based approach keeps costs:

  • Predictable
  • Transparent
  • Easy to tune (disk tier, VM size, shutdown schedules)

No surprises. No runaway meters.


Why This Experience Matters

This wasn’t about WordPress.

It was about:

  • Designing for longevity, not demos
  • Understanding cost behavior, not just pricing
  • Respecting platform guardrails instead of fighting them
  • Choosing simplicity over abstraction when it makes sense

The cloud is easy when everything works.
Engineering starts when it doesn’t.


What’s Next

For now, the site is up.
Backups are restored.
Costs are under control.

Next steps — when I feel like it:

  • TLS with Let’s Encrypt
  • Snapshot or off-VM backups
  • Minor hardening

But nothing urgent. And that’s the point.

Sometimes the best architecture is the one that lets you stop thinking about it.