Thinking Like an Azure Architect: The 4-Question Framework I Use to Evaluate Any System

In cloud engineering, tools are easy.

Azure Application Insights. Log Analytics. Key Vault. Entra ID. ADF. Kubernetes. You name it.

But tools don’t create good architecture.

Thinking does.

Over time — across Azure landing zones, identity refactoring, incident recovery, and cost governance work — I noticed something consistent:

Senior Azure architects evaluate systems using a simple mental model.

Not documentation-heavy frameworks.
Not 40-page design templates.

Just four questions.

This article captures that framework for future reference.


The 4-Question Azure Architect Framework

You can apply this to:

  • A file router
  • Monitoring strategy
  • Identity design
  • Networking segmentation
  • SaaS MVP architecture
  • Even a small internal utility

1️⃣ What Happens When It Fails?

Most engineers ask:

“Does it work?”

Architects ask:

“What happens when it breaks?”

Failure-first thinking changes everything.

For example:

  • If a file router crashes, is the file retried?
  • If a background job fails silently, who detects it?
  • If a dependency times out, does it cascade?
  • If logging is disabled, can we reconstruct events?

In Azure environments, this usually translates to:

  • Proper use of Azure Application Insights
  • Dead-letter queues
  • Retry policies
  • Correlation IDs
  • Alert rules

Resilience is not about uptime — it’s about recoverability and visibility.
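
A minimal sketch of failure-first handling in plain Python, with no Azure SDK involved: a retry wrapper that tags every attempt with a correlation ID and parks the item in a dead-letter list once retries run out. The names process_with_retry, dead_letters, and max_attempts are illustrative assumptions, not Azure APIs.

```python
import logging
import uuid

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("router")

# Hypothetical in-memory stand-in for a real dead-letter queue.
dead_letters = []


def process_with_retry(item, handler, max_attempts=3):
    """Run handler(item); retry on failure, dead-letter when attempts run out."""
    correlation_id = str(uuid.uuid4())  # one ID ties all attempts together
    for attempt in range(1, max_attempts + 1):
        try:
            result = handler(item)
            log.info("ok id=%s attempt=%d", correlation_id, attempt)
            return result
        except Exception as exc:  # broad on purpose: this is a sketch
            log.warning("fail id=%s attempt=%d error=%s",
                        correlation_id, attempt, exc)
    # Nothing is lost silently: the item and its ID are kept for replay.
    dead_letters.append({"id": correlation_id, "item": item})
    return None


def flaky_handler_factory(fail_times):
    """Build a handler that fails `fail_times` times, then succeeds."""
    state = {"calls": 0}

    def handler(item):
        state["calls"] += 1
        if state["calls"] <= fail_times:
            raise RuntimeError("transient failure")
        return item.upper()

    return handler
```

Every log line carries the same ID, so the full history of one item can be reconstructed later, and a non-empty dead_letters list is a natural thing to alert on.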


2️⃣ Who Feels the Impact?

Not all failures are equal.

Ask:

  • Is this internal tooling?
  • Does it affect customers?
  • Is revenue tied to it?
  • Is compliance exposure involved?

For example:

If a low-risk internal service fails, default telemetry in Azure Application Insights might be sufficient.

If the system routes financial transactions or regulatory documents, monitoring maturity must increase.

Architecture maturity should match business criticality.

Over-engineering internal tools wastes cost.
Under-engineering customer-facing systems creates risk.


3️⃣ Can We Evolve This Without Rebuilding It?

This is where architecture becomes strategy.

Perfect systems don’t exist.
Evolvable systems do.

Ask:

  • Can we add custom telemetry later without refactoring?
  • Can we scale logging without rewriting the app?
  • Can we introduce alerts without redesigning the service?
  • Can we move from single-region to multi-region if needed?

Good Azure design allows layering.

For example:

  • Start with default App Insights.
  • Later add custom events.
  • Then introduce dashboards.
  • Then configure alerting rules.
  • Eventually integrate with SIEM if required.

If improvement requires a rewrite, the original design was brittle.


4️⃣ Is Complexity Justified Right Now?

Azure makes it easy to add services.

It’s also easy to overspend and overbuild.

Before adding complexity, ask:

  • Are we solving today’s real problem?
  • Or anticipating hypothetical risk?
  • Is there operational pain?
  • Is the cost proportional?

This question protects teams from unnecessary engineering.

Many environments only need:

  • Baseline monitoring
  • Basic alerting
  • Clear logging structure

Not every service needs enterprise-grade observability from day one.

Maturity should evolve with operational pressure.


Applying This to a Real Scenario

Imagine someone says:

“We just use default App Insights. We don’t go much further.”

Instead of reacting, run the framework:

  1. What happens when it fails?
  2. Who feels the impact?
  3. Can we evolve monitoring later?
  4. Is deeper observability justified now?

The answer might be:

  • Baseline telemetry is fine today.
  • Add lifecycle logging only if routing becomes business-critical.
  • Keep architecture flexible.

That’s architect thinking.

Not reactive.
Not dramatic.
Not tool-obsessed.


Why This Framework Matters

In my experience working across Azure infrastructure, identity, DevOps pipelines, and operational recovery scenarios:

The biggest difference between mid-level engineers and senior architects is not tool knowledge.

It’s:

  • Systems thinking
  • Failure awareness
  • Tradeoff evaluation
  • Calm decision-making

Architects don’t chase perfection.

They design for evolution.


Final Thought

Cloud architecture is not about using more services.

It’s about asking better questions.

Before adding monitoring.
Before redesigning identity.
Before introducing complexity.

Ask the four questions.

They work every time.

Most .NET Developers Allocate Memory They Don’t Need To — Span Fixes That

If you work with .NET long enough, you eventually discover that performance issues rarely come from complex algorithms.

They come from small allocations happening millions of times.

And many of those allocations come from code that looks perfectly harmless.


The Hidden Problem: Unnecessary Heap Allocations

Consider common operations like:

  • .Substring()
  • .Split()
  • .ToArray()

These methods feel lightweight, but each one creates new objects on the heap.

That means:

  • More memory usage
  • More work for the garbage collector
  • More latency under load

In an API handling thousands of requests — or inside a tight parsing loop — these tiny costs accumulate quickly.


Enter Span<T>

Span<T> solves this by letting you work with existing memory instead of allocating new memory.

Think of it as:

A lightweight window into data that already exists.

No copying.
No allocations.
No extra GC pressure.


A Simple Example

Imagine you have a date string:

string date = "2025-06-15";

Most developers extract the year like this:

var year = date.Substring(0, 4);

This creates a brand-new string "2025" on the heap.

Now compare that with:

ReadOnlySpan<char> year = date.AsSpan(0, 4);

Same characters, but zero allocation. The result is a ReadOnlySpan<char> rather than a string, and APIs such as int.Parse accept it directly.

You’re simply pointing to a slice of the original string.


The Core Mental Model

A Span<T> does not own memory.

It only references memory that already exists.

Think of it like:

Original data  ───────────────
                  [ window ]
                     Span

You move the window around instead of copying the data.


Where Span<T> Really Shines

Once you understand the concept, you’ll start seeing opportunities everywhere:

Parsing workloads

  • CSV or log file parsing without generating thousands of temporary strings.

HTTP processing

  • Parse headers without copying byte arrays.

Binary protocols

  • Slice buffers directly instead of creating intermediate arrays.

String processing

  • Replace Split() calls that allocate an array plus a new string for every segment.

Real-World Impact

In parsing-heavy production services, commonly reported gains (exact figures depend heavily on the workload) include:

  • 40–60% fewer allocations
  • Noticeably reduced GC pauses
  • Higher throughput under load

Less copying means more CPU time spent doing real work.


The Three Rules You Need to Remember

1️⃣ Span<T> lives on the stack

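Span<T> is a ref struct: the span itself can only live on the stack, even though the memory it points to may be on the heap. You cannot box it, store it in class fields, or keep it alive across an await.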

2️⃣ Use ReadOnlySpan<T> for read-only data

Most string scenarios fall into this category.

3️⃣ Use Memory<T> when persistence is required

If you need to store the reference in a field or keep it across an await, use Memory<T> and read its .Span property at the point where you actually touch the data.


How to Adopt It Without a Big Rewrite

You don’t need to refactor your entire codebase.

Start small:

  1. Profile your application
  2. Identify hot paths
  3. Look for repeated Substring, Split, or ToArray calls
  4. Replace them with Span slicing
  5. Measure again

Performance improvements here are often immediate and measurable.


Final Thought

Most .NET performance problems aren’t about writing clever code.

They’re about avoiding unnecessary work.

Span<T> gives you a simple, safe way to reduce allocations and let your application scale more efficiently — without changing how your logic works.

Once you start using it in hot paths, it becomes difficult to go back.

The Modern .NET Developer in 2026: From Code Writer to System Builder

There was a time when being a .NET developer mostly meant writing solid C# code, building APIs, and shipping features. If the application worked and the database queries were fast enough, the job was done.

That world is gone.

In 2026, a modern .NET developer isn’t just a coder. They’re a system builder, balancing application development, cloud architecture, DevOps, security, and increasingly, AI-driven decisions.

One Feature, Many Disciplines

Consider a typical modern feature:

  • A scheduled job populates data into a database.
  • That data feeds reporting tools like Power BI.
  • Deployment pipelines push updates across environments worldwide.
  • Cloud services scale automatically under load.
  • Monitoring and security controls are part of the delivery.

One feature now touches multiple domains. Delivering it requires understanding infrastructure, automation, data, deployment, and operations—not just application logic.

The scope of the role has expanded dramatically.

Fundamentals Still Matter

Despite all the change, the core skills haven’t disappeared.

Developers still need to:

  • Build REST APIs that handle real-world load
  • Write efficient Entity Framework queries
  • Understand async/await and concurrency
  • Maintain clean, maintainable codebases

Bad fundamentals still break systems, regardless of how modern the infrastructure is.

But fundamentals alone are no longer enough.

Cloud Decisions Are Now Developer Decisions

In many teams, developers now influence—or directly make—architecture decisions:

  • Should this workload run in App Service, Containers, or Functions?
  • Should data live in SQL Server or Cosmos DB?
  • Do we need messaging via Service Bus or event-driven patterns?

These choices affect cost, scalability, reliability, and operational complexity. Developers increasingly need architectural awareness, not just coding ability.

DevOps Is Part of the Job

Deployment is no longer someone else’s responsibility.

Modern developers are expected to:

  • Build CI/CD pipelines that deploy automatically
  • Containerize services using Docker
  • Ensure logs, metrics, and monitoring are available
  • Support production reliability

The boundary between development and operations has largely disappeared.

Security Is Developer-Owned

Security has shifted left.

Developers now regularly deal with:

  • OAuth and identity flows
  • Microsoft Entra ID integration
  • Secure data handling
  • API protection and access control

Security mistakes are expensive, and modern developers are expected to understand the implications of their implementations.

AI Changes How We Work

Another shift is happening quietly.

In the past, developers searched for how to implement something. Today, AI tools increasingly help answer higher-level questions:

  • What are the long-term tradeoffs of this architecture?
  • How will this scale?
  • What operational risks am I introducing?

The developer’s role moves from solving isolated technical problems to designing sustainable systems.

From Specialist to Swiss Army Knife

The modern .NET developer is no longer just a backend specialist. They are expected to be adaptable:

  • Application developer
  • Cloud architect
  • DevOps contributor
  • Security implementer
  • Systems thinker

Not every developer must master every area—but awareness across domains is increasingly required.

The New Reality

The job has evolved from writing features to building systems.

And while that can feel overwhelming, it’s also exciting. Developers now influence architecture, scalability, reliability, and user experience at a system-wide level.

The industry hasn’t just changed what we build.

It’s changed what it means to be a developer.

And in 2026, being versatile isn’t optional—it’s the job.

WordPress on Azure Container Apps (ACA)

Architecture, Backup, and Recovery Design

1. Overview

This document describes the production architecture for WordPress running on Azure Container Apps (ACA) with MariaDB, including backup, recovery, monitoring, and automation. The design prioritizes:

  • Low operational overhead
  • Cost efficiency
  • Clear separation of concerns
  • Fast, predictable recovery
  • No dependency on VM-based services or Backup Vault

This architecture is suitable for long-term, multi-year operation with minimal maintenance.


2. High-Level Architecture

Core Components

  • Azure Container Apps Environment
    • Hosts WordPress and MariaDB container apps
  • WordPress Container App (ca-wp)
    • Apache + PHP WordPress image
    • Stateless container
    • Persistent content via Azure Files
  • MariaDB Container App (ca-mariadb)
    • Dedicated container app
    • Internal-only access
    • Database for WordPress
  • Azure Files (Storage Account: st4wpaca)
    • File share: wpcontent
    • Mounted into WordPress container
    • Stores plugins, themes, uploads, logs
  • Azure Blob Storage
    • Stores MariaDB logical backups (.sql.gz)

3. Data Persistence Model

WordPress Files

  • wp-content directory is mounted to Azure Files
  • Includes:
    • Plugins
    • Themes
    • Uploads
    • Logs (debug.log)

Database

  • MariaDB runs inside its own container
  • No local persistence assumed
  • Database durability ensured via daily logical backups, so losing the container can cost up to a day of database changes

4. Backup Architecture

4.1 WordPress Files Backup (Primary)

Method: Azure Files Share Snapshots

  • Daily snapshots of wpcontent file share
  • Snapshot creation automated via Azure Automation Runbook
  • Retention enforced (e.g., 14 days)

Why this works well:

  • Instant snapshot creation
  • Very fast restore
  • Extremely low cost
  • No application involvement
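
The retention rule above can be sketched as a small selection function. This shows only the decision logic; listing and deleting snapshots would go through the runbook's own tooling. The function name and the 14-day default are assumptions taken from the example figure in this section.

```python
from datetime import datetime, timedelta, timezone


def snapshots_to_delete(snapshot_times, now, retention_days=14):
    """Return snapshot timestamps older than the retention window, oldest first."""
    cutoff = now - timedelta(days=retention_days)
    return sorted(t for t in snapshot_times if t < cutoff)


# Example: 20 daily snapshots, the newest taken "today".
now = datetime(2025, 6, 15, 2, 0, tzinfo=timezone.utc)
daily = [now - timedelta(days=d) for d in range(20)]
expired = snapshots_to_delete(daily, now)
print(len(expired))  # 5 (the snapshots that are 15 to 19 days old)
```

A snapshot that is exactly 14 days old is kept; only strictly older ones are returned for deletion.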

4.2 MariaDB Backup (Primary)

Method: Logical database dumps (mysqldump, shipped as mariadb-dump in recent MariaDB releases)

  • Implemented via Azure Container App Jobs
  • Backup job runs on schedule (daily)
  • Output compressed SQL file
  • Stored in Azure Blob Storage

Additional Jobs:

  • Cleanup job to enforce retention
  • Restore job for controlled database recovery
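
As a rough illustration of what the backup job assembles, the sketch below builds the dump command line and a date-stamped blob name. It does not run the dump or upload anything. The naming scheme, the helper names, and the choice of flags are assumptions; the password is expected to reach the client through a secret-backed environment variable or option file rather than the command line.

```python
from datetime import datetime, timezone


def dump_command(host, user, database):
    """Build the argv for a consistent logical dump (no password on the command line)."""
    return [
        "mysqldump",             # newer MariaDB images name this binary mariadb-dump
        "--single-transaction",  # consistent InnoDB snapshot without locking tables
        "--routines",            # include stored procedures and functions
        f"--host={host}",
        f"--user={user}",
        database,
    ]


def backup_blob_name(database, taken_at):
    """Date-stamped name so backups sort chronologically and match file snapshots."""
    return f"{database}/{taken_at:%Y-%m-%d}/{database}-{taken_at:%Y%m%dT%H%M%SZ}.sql.gz"


taken = datetime(2025, 6, 15, 2, 0, 0, tzinfo=timezone.utc)
print(backup_blob_name("wordpress", taken))
# wordpress/2025-06-15/wordpress-20250615T020000Z.sql.gz
```

Putting the date in the blob path makes the cleanup job's retention check a simple prefix comparison.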

4.3 Backup Automation

Azure Automation Account (aa-wp-backup)

  • Central automation control plane
  • Uses system-assigned managed identity
  • Hosts multiple runbooks:
    • Azure Files snapshot creation
    • Snapshot retention cleanup

Key Vault Integration:

  • Secrets stored in kv-tanolis-app
    • Storage account key
    • MariaDB host
    • MariaDB user
    • MariaDB password
    • MariaDB database name
  • Automation and jobs retrieve secrets securely

5. Restore Scenarios

Scenario 1: Restore WordPress Files Only

Use case:

  • Plugin or theme deletion
  • Media loss

Steps:

  1. Select Azure Files snapshot for wpcontent
  2. Restore entire share or specific folders
  3. Restart WordPress container app

Scenario 2: Restore Database Only

Use case:

  • Content corruption
  • Bad plugin update

Steps:

  1. Download appropriate SQL backup from Blob
  2. Execute restore job or import via MariaDB container
  3. Restart WordPress container
  4. Re-save permalinks in WordPress admin (Settings → Permalinks) to flush rewrite rules

Scenario 3: Full Site Restore

Use case:

  • Major failure
  • Security incident
  • Rollback to known-good state

Steps:

  1. Restore Azure Files snapshot
  2. Restore matching MariaDB backup
  3. Restart WordPress container
  4. Validate site and permalinks

6. Monitoring & Alerting

Logging

  • Azure Container Apps logs
  • WordPress debug log (wp-content/debug.log)

Alerts

  • MariaDB backup job failure alert
  • Container restart alerts
  • Optional resource utilization alerts

External Monitoring

  • HTTP uptime checks for site availability

7. Security Considerations

  • No public access to MariaDB container
  • Secrets stored only in Azure Key Vault
  • Managed Identity used for automation
  • No credentials embedded in scripts
  • Optional IP restrictions for /wp-admin

8. Cost Characteristics

  • Azure Files snapshots: very low cost (delta-based)
  • Azure Blob backups: pennies/month
  • Azure Automation: within free tier for typical usage
  • No Backup Vault protected-instance fees

Overall cost remains low single-digit USD/month for backups.


9. Operational Best Practices

  • Test restore procedures quarterly
  • Keep file and DB backups aligned by date
  • Maintain at least 7–14 days retention
  • Restart WordPress container after restores
  • Document restore steps for operators
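
Keeping file and database backups aligned by date is easy to check mechanically. The sketch below picks the newest date for which both a file-share snapshot and a database dump exist. The function name and the not_after parameter are illustrative assumptions.

```python
from datetime import date


def latest_aligned_date(snapshot_dates, dump_dates, not_after=None):
    """Newest date that has BOTH a file snapshot and a DB dump, or None."""
    common = set(snapshot_dates) & set(dump_dates)
    if not_after is not None:
        common = {d for d in common if d <= not_after}
    return max(common) if common else None


snapshots = [date(2025, 6, 12), date(2025, 6, 13), date(2025, 6, 14)]
dumps = [date(2025, 6, 11), date(2025, 6, 12), date(2025, 6, 14)]

print(latest_aligned_date(snapshots, dumps))                              # 2025-06-14
print(latest_aligned_date(snapshots, dumps, not_after=date(2025, 6, 13)))  # 2025-06-12
```

The not_after argument covers the rollback case: when an incident began on a known date, restore from the last aligned pair before it.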

10. Summary

This architecture delivers:

  • Reliable backups without over-engineering
  • Fast and predictable recovery
  • Minimal cost
  • Clear operational boundaries
  • Long-term maintainability

It is well-suited for WordPress workloads running on Azure Container Apps and avoids VM-centric or legacy backup models.

How Azure Handles Large File Uploads: From Blob Storage to Event-Driven Processing (and What Breaks at 2AM)

Uploading a large file to Azure sounds simple — until you need to process it reliably, at scale, with retries, alerts, and zero surprises at 2AM.

This article walks through how Azure actually handles large file uploads, using a 10-GB video as a concrete example, and then dives into real-world failure modes that show up only in production.

We’ll cover:

  • How Azure uploads large files safely
  • When and how events are emitted
  • How Functions and queues fit together
  • Why retries and poison queues exist
  • What silently breaks when nobody is watching

Azure Blob Storage: Large Files, Small Pieces

Azure Blob Storage supports extremely large files, but large uploads are not sent in a single request.

Most files are stored as block blobs, which are composed of many independently uploaded blocks.

Block blob limits (the important ones)

  • Max block size: 4,000 MiB (just under 4 GiB)
  • Max blocks per blob: 50,000
  • Max blob size: ~190.7 TiB (4,000 MiB × 50,000)

Example: Uploading a 10-GB video

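At the maximum block size, a 10-GB video could in principle be uploaded as just three blocks:

  • Block 1: ~4 GB
  • Block 2: ~4 GB
  • Block 3: ~2 GB

In practice, clients usually choose much smaller blocks (a few MiB up to around 100 MiB) and upload them in parallel, so the same file becomes hundreds or thousands of blocks.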

Each block is uploaded with Put Block, and once all blocks are present, a final Put Block List call commits the blob.

Key insight: Blocks are an upload implementation detail. Once committed, the blob is treated as a single file.

Client tools like AzCopy, Azure SDKs, and Storage Explorer handle this chunking automatically.
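
To make the chunking concrete, here is a small planning sketch that splits a file into blocks and checks the plan against the service limits (4,000 MiB per block, 50,000 blocks per blob). It performs no network calls. The function name, the ID scheme, and the 100 MiB block size are assumptions for illustration, and real SDKs pick block sizes on their own.

```python
import base64

MIB = 1024 * 1024
MAX_BLOCK_SIZE = 4000 * MIB   # service limit per block
MAX_BLOCKS = 50_000           # service limit per blob


def plan_blocks(file_size, block_size):
    """Return (block_id, offset, length) tuples covering the whole file."""
    if not 0 < block_size <= MAX_BLOCK_SIZE:
        raise ValueError("block size outside service limits")
    count = (file_size + block_size - 1) // block_size  # ceiling division
    if count > MAX_BLOCKS:
        raise ValueError("too many blocks; use a larger block size")
    plan = []
    for i in range(count):
        offset = i * block_size
        length = min(block_size, file_size - offset)
        # Block IDs are Base64 strings and must all have the same length in one blob.
        block_id = base64.b64encode(f"{i:08d}".encode()).decode()
        plan.append((block_id, offset, length))
    return plan


ten_gib = 10 * 1024 * MIB
print(len(plan_blocks(ten_gib, 100 * MIB)))   # 103
print(len(plan_blocks(ten_gib, 4000 * MIB)))  # 3
```

Each tuple corresponds to one Put Block call, and the ordered list of block IDs is what the final Put Block List call would commit.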


When Does Azure Emit an Event?

Uploading blocks does not trigger processing.

Events are emitted only after the blob is fully committed.

This is where Azure Event Grid comes in.

BlobCreated event flow

  1. Final Put Block List completes
  2. Blob Storage emits a BlobCreated event
  3. Event Grid routes the event to subscribers

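Important: The event is raised once per committed blob, not once per block. Delivery is at-least-once, so a handler can still receive the same event more than once.

For block blobs, this means downstream systems are not notified about uncommitted blocks.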
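
A handler can also check the event itself before doing any work. The sketch below filters on the eventType and on the data.api field that Blob Storage events carry. The sample payload is trimmed and its values are made up, and the set of accepted APIs is an assumption for a pipeline that only handles block blobs.

```python
import json

# Trimmed example payload in the Event Grid event schema; values are made up.
SAMPLE_EVENT = json.loads("""
{
  "eventType": "Microsoft.Storage.BlobCreated",
  "subject": "/blobServices/default/containers/uploads/blobs/video.mp4",
  "data": {
    "api": "PutBlockList",
    "contentLength": 10737418240,
    "eTag": "0x8D4BCC2E4835CD0",
    "url": "https://example.blob.core.windows.net/uploads/video.mp4"
  }
}
""")

# Assumption: this pipeline only cares about completed block-blob writes.
COMMIT_APIS = {"PutBlob", "PutBlockList"}


def should_process(event):
    """True for BlobCreated events raised by a completed block-blob write."""
    if event.get("eventType") != "Microsoft.Storage.BlobCreated":
        return False
    data = event.get("data", {})
    return data.get("api") in COMMIT_APIS and data.get("contentLength", 0) > 0
```

Rejecting zero-length blobs here is a cheap guard against placeholder files created by some upload clients.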


Azure Functions: Reacting to Blob Uploads

Azure Functions does not poll Blob Storage in modern designs. Instead, it reacts to events.

Two trigger models (only one you should use)

  • Event Grid trigger (recommended)
    Push-based, near real-time, scalable
  • Classic Blob trigger (legacy)
    Polling-based, slower, less predictable

In production architectures, Event Grid–based triggers are the standard.


Why Queues Are Inserted into the Pipeline

Direct processing works — until load increases or dependencies slow down.

This is why many designs add a queue:

Azure Storage Queue

Blob uploaded
   ↓
Event Grid event
   ↓
Azure Function
   ↓
Message written to queue

Queues provide:

  • Backpressure
  • Retry handling
  • Isolation between ingestion and processing
  • Protection against traffic spikes

Visibility Timeouts: How Retries Actually Work

Storage queues have no separate acknowledge call. A worker deletes the message when it succeeds, and a visibility timeout covers the case where it doesn't.

What is a visibility timeout?

When a worker dequeues a message:

  • The message becomes invisible for a configured period
  • If processing succeeds → the worker deletes the message
  • If processing fails or the worker crashes → the message becomes visible again once the timeout expires

Each dequeue increments DequeueCount.

This is the foundation of retry behavior in Azure Storage Queues.
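
The mechanics are easier to see in a toy model. The class below is an in-memory imitation with a fake clock, not the Azure SDK: receiving hides a message for the timeout, deleting removes it, and an undeleted message reappears with a higher dequeue count. All names are illustrative.

```python
class FakeQueue:
    """Tiny in-memory model of visibility timeouts (not the Azure SDK)."""

    def __init__(self):
        self.now = 0.0          # fake clock, in seconds
        self.messages = []      # dicts: body, visible_at, dequeue_count

    def put(self, body):
        self.messages.append({"body": body, "visible_at": 0.0, "dequeue_count": 0})

    def receive(self, visibility_timeout=30.0):
        """Return the first visible message and hide it for the timeout."""
        for msg in self.messages:
            if msg["visible_at"] <= self.now:
                msg["visible_at"] = self.now + visibility_timeout
                msg["dequeue_count"] += 1
                return msg
        return None

    def delete(self, msg):
        """Called by the worker after successful processing."""
        self.messages.remove(msg)


q = FakeQueue()
q.put("video.mp4")
first = q.receive()    # worker A takes the message, then crashes
hidden = q.receive()   # still invisible: nobody else can take it
q.now += 31            # timeout expires without a delete
second = q.receive()   # message reappears for another attempt
```

If the timeout is shorter than the real processing time, the message reappears while the first worker is still busy, which is one way the same file ends up processed twice.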


Poison Queues: When Retries Must Stop

Retries should never be infinite.

With Azure Functions + Storage Queues:

  • Once a message has failed maxDequeueCount times (default: 5)
  • The message is automatically moved to: <queue-name>-poison

Poison queues:

  • Prevent endless retry loops
  • Preserve failed messages for investigation
  • Enable alerting and replay workflows
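
A simplified model of that behavior, again in plain Python rather than the Functions runtime: each failed attempt puts the message back, and once the attempt count reaches the limit the message is set aside under the poison queue's name. The drain function and its return shape are assumptions made for illustration.

```python
from collections import deque

MAX_DEQUEUE_COUNT = 5  # mirrors the Functions default for queue triggers


def drain(queue_name, bodies, handler, max_dequeue_count=MAX_DEQUEUE_COUNT):
    """Process messages; return (processed, poison_queue_name, poisoned)."""
    pending = deque((body, 0) for body in bodies)   # (body, dequeue_count)
    processed, poisoned = [], []
    while pending:
        body, count = pending.popleft()
        count += 1                                   # this dequeue
        try:
            handler(body)
            processed.append(body)
        except Exception:
            if count >= max_dequeue_count:
                poisoned.append(body)                # stop retrying, keep for review
            else:
                pending.append((body, count))        # becomes visible again later
    return processed, f"{queue_name}-poison", poisoned


def handler(body):
    """Example handler: files ending in .bad can never be parsed."""
    if body.endswith(".bad"):
        raise ValueError("unparseable file")
```

Good messages are not blocked by the bad one: it is retried a bounded number of times and then preserved for investigation.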

Failure Modes: “What Breaks at 2AM?”

This is where systems separate happy-path demos from production-ready architectures.

Most failures don’t look like outages — they look like silent degradation.


1️⃣ Event Grid Delivery Failures

Symptom: Blob exists, but processing never starts.

Cause

  • Subscription misconfiguration
  • Endpoint unavailable
  • Permission or auth issues

Mitigation

  • Enable Event Grid dead-lettering
  • Monitor delivery failure metrics
  • Build replay logic

2AM reality: Files are uploaded — nothing processes them.


2️⃣ Duplicate Event Delivery

Symptom: Same file processed twice.

Why
Event Grid guarantees at-least-once delivery, not exactly-once.

Mitigation

  • Idempotent processing
  • Track blob names, ETags, or IDs
  • Reject duplicates at the application layer

2AM reality: Duplicate records, duplicate invoices, duplicate emails.
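
A minimal sketch of idempotent handling, keyed on the blob URL plus ETag from the event payload. The in-memory set stands in for a durable store such as a table or database, and the handler and variable names are hypothetical.

```python
processed_keys = set()   # in production this would be a durable store
side_effects = []        # stands in for "send invoice", "write record", ...


def handle_blob_created(event):
    """Apply the side effect at most once per (url, eTag) pair."""
    data = event["data"]
    key = (data["url"], data["eTag"])
    if key in processed_keys:
        return False                      # duplicate delivery: skip quietly
    side_effects.append(data["url"])
    processed_keys.add(key)               # record only after success
    return True


event = {"data": {"url": "https://example.blob.core.windows.net/uploads/a.mp4",
                  "eTag": "0x1"}}
print(handle_blob_created(event))  # True
print(handle_blob_created(event))  # False (same blob version delivered again)
```

Including the ETag in the key means a genuine re-upload of the same blob name is still processed, while a redelivered event for the same version is ignored.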


3️⃣ Function Timeouts on Large Files

Symptom: Processing restarts or never completes.

Cause

  • Large file downloads
  • CPU-heavy transformations
  • Insufficient plan sizing

Mitigation

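  • Raise functionTimeout where the plan allows (the classic Consumption plan caps it at 10 minutes) and keep messages invisible for longer than the expected processing time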
  • Stream blobs instead of loading into memory
  • Offload heavy work to batch or container jobs

2AM reality: Queue backlog grows quietly.


4️⃣ Queue Backlog Explosion

Symptom: Queue depth grows uncontrollably.

Cause

  • Ingestion spikes
  • Downstream throttling
  • Scaling limits

Mitigation

  • Monitor queue length and age
  • Scale consumers
  • Add rate limiting or backpressure

2AM reality: Customers ask why files are “stuck.”


5️⃣ Poison Queue Flood

Symptom: Many messages land in -poison.

Cause

  • Bad file formats
  • Schema changes
  • Logic bugs

Mitigation

  • Alert on poison queue count > 0
  • Log full failure context
  • Build replay workflows

2AM reality: Work is failing — but nobody is alerted.


6️⃣ Storage Cost Spikes from Retries

Symptom: Azure Storage bill jumps unexpectedly.

Cause

  • Short visibility timeouts
  • Repeated blob downloads
  • Excessive retries

Mitigation

  • Tune visibility timeouts
  • Cache progress
  • Monitor transaction counts, not just data size

2AM reality: Finance notices before engineering does.


7️⃣ Partial or Corrupted Uploads

Symptom: Function triggers but input file is invalid.

Cause

  • Client aborted uploads
  • Corrupted block lists
  • Non-atomic upload logic

Mitigation

  • Validate file size and checksum
  • Enforce minimum size thresholds
  • Delay processing until integrity checks pass
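
The size and checksum validation can be done in one streaming pass, without loading the file into memory. This sketch assumes the uploader supplies the expected size and a Base64-encoded MD5 (for example through the blob's Content-MD5 property or the message metadata). The function names and the min_size parameter are illustrative.

```python
import base64
import hashlib


def content_md5(chunks):
    """Base64 MD5 of streamed chunks, the same encoding Content-MD5 uses."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return base64.b64encode(digest.digest()).decode()


def is_valid_upload(chunks, expected_size, expected_md5, min_size=1):
    """Check size and checksum in a single streaming pass."""
    size = 0
    digest = hashlib.md5()
    for chunk in chunks:
        size += len(chunk)
        digest.update(chunk)
    actual = base64.b64encode(digest.digest()).decode()
    return size >= min_size and size == expected_size and actual == expected_md5
```

A file that fails this check is better routed to a quarantine location than retried, since retrying will not repair the bytes.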

8️⃣ Downstream Dependency Failures

Symptom: Upload succeeds — final destination fails (SharePoint, APIs, DBs).

Mitigation

  • Exponential backoff
  • Dead-letter after max retries
  • Store intermediate results for replay

2AM reality: Azure is healthy — the external system isn’t.
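
Exponential backoff with a cap and jitter can be sketched in a few lines. The base delay, cap, retry count, and function names below are assumptions to tune per dependency; the injected sleep function only exists so the logic can be exercised without waiting.

```python
import random
import time


def backoff_delays(max_retries, base=1.0, cap=60.0, jitter=True, rng=None):
    """Delay (seconds) before each retry: base * 2**n, capped, optional full jitter."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0, ceiling) if jitter else ceiling)
    return delays


def call_with_backoff(operation, max_retries=5, sleep=time.sleep, **kwargs):
    """Try once, then retry after each delay; return (ok, result_or_error)."""
    last_error = None
    for delay in [0.0] + backoff_delays(max_retries, **kwargs):
        sleep(delay)
        try:
            return True, operation()
        except Exception as exc:  # broad on purpose: this is a sketch
            last_error = exc
    return False, last_error      # caller dead-letters the work item


print(backoff_delays(5, jitter=False))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Jitter spreads retries out so that many workers hitting the same failing dependency do not all retry at the same instant.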


9️⃣ Silent Failure (The Worst One)

Symptom: System is broken — nobody knows.

Fix
Monitor:

  • Function failure rates
  • Queue depth and age
  • Poison queue counts
  • Event Grid delivery failures

Final Takeaway

Large files in Azure Blob Storage are uploaded in blocks, but Event Grid emits a single event only after the blob is fully committed. Azure Functions react to that event, often enqueueing work for durable processing. Visibility timeouts handle retries, poison queues stop infinite failures, and production readiness depends on designing for duplicate events, backlogs, cost creep, and observability — not just the happy path.