The Taxonomy of Enterprise Agentic Systems with Microsoft Agent Framework

The pace of change in the agentic AI ecosystem over the past 18 months has been relentless.

New frameworks, orchestration models, SDKs, runtimes, and tooling ecosystems are appearing almost weekly. Alongside Microsoft’s own evolution through Semantic Kernel and AutoGen, the broader market has rapidly adopted frameworks such as LangChain, LangGraph, CrewAI, and an expanding ecosystem of open-source orchestration runtimes.

At the same time, many teams are bypassing frameworks entirely. Some are directly integrating with model APIs. Others are rapidly vibe-coding assistants, stitching together orchestration with custom Python or JavaScript, or experimenting with highly autonomous systems built on emerging runtimes.

That experimentation is valuable because it has accelerated the industry’s understanding of what agentic systems can become. But as enterprise adoption increases, a different challenge begins to emerge. The challenge is no longer simply how to build agents. The challenge is how to build governed, observable, scalable, and operationally reliable agentic systems that can operate as enterprise software rather than isolated demos.

Enterprise agent systems are not prompts wrapped in chat interfaces. They are distributed runtime systems composed of reasoning, orchestration, middleware, memory, tools, grounding systems, governance boundaries, deterministic execution paths, and operational controls.

Over the past year, I’ve been experimenting heavily with Microsoft Agent Framework (MAF) as Microsoft’s long-term direction for enterprise agent orchestration. MAF represents the convergence of lessons learned from Semantic Kernel and AutoGen, combined with the operational requirements Microsoft has identified through enterprise adoption of AI systems on Microsoft Foundry.

I first started working with MAF shortly after its early previews in mid-2025 and have followed its evolution closely through to General Availability earlier this year. One of the key reasons I gravitated toward Microsoft Agent Framework was not simply the framework itself, but its first-class alignment to Microsoft Foundry and the broader Microsoft AI platform strategy.

With the current rate of change across the industry, I increasingly wanted:

a framework aligned to the platform where enterprise AI services are actually operationalised
consistent orchestration primitives
integrated governance and evaluation capabilities
first-class support for hosted runtimes
and an approach that felt strategically aligned to where Microsoft is clearly investing long term

What has become increasingly clear throughout this journey is that enterprise AI systems require architectural discipline. The organisations that succeed with agentic systems will not necessarily be the ones building the most autonomous agents. They will be the ones building systems that combine orchestration, governance, observability, memory, and deterministic execution into something the business can actually trust.

The Enterprise Agent Stack

One of the biggest challenges in the current agentic AI landscape is that many terms are used interchangeably. “Agent”, “copilot”, “workflow”, “orchestration”, “memory”, “RAG”, “tool calling”, and “automation” are often blended together despite representing very different responsibilities within a system.

As solutions mature beyond simple chat experiences, having a consistent architectural vocabulary becomes increasingly important, particularly when multiple engineering teams, architects, and operational stakeholders are involved.

When working with Microsoft Agent Framework and Microsoft Foundry, I’ve found it useful to think about enterprise agent systems as a layered architecture composed of distinct responsibilities.

MAF Stack

Foundry as the Control Plane

Microsoft Foundry is evolving beyond being simply a place to deploy models and experiment with prompts. Increasingly, it is becoming the operational control plane for enterprise AI systems, bringing together model routing, evaluations, tracing, lifecycle management, governance, and identity integration into a unified platform surface.

This matters because enterprise AI systems require far more than model hosting. They need operational visibility, governance controls, security boundaries, evaluation frameworks, deployment management, and runtime observability.

Microsoft Agent Framework as the Orchestration Layer

At the centre of the stack sits Microsoft Agent Framework. Rather than viewing MAF purely as an SDK, I increasingly view it as a composable orchestration abstraction for enterprise agent systems. The framework introduces architectural primitives such as agents, threads, runs, workflows, tools, middleware, orchestration patterns, and runtime controls.

What I’ve appreciated most while experimenting with MAF over the past year is that it feels increasingly aligned to the operational realities of enterprise AI systems rather than simply optimising for standalone chatbot experiences.

Understanding Microsoft’s Agent Development Models

Microsoft Foundry currently supports three primary agent development models, each representing a different level of orchestration ownership, runtime control, and operational abstraction. You can read more about these models in the official Microsoft documentation, but at a high level they can be thought of as progressively increasing levels of orchestration ownership.

Agent Types

Prompt agents are configuration-driven agents built using instructions, models, and tools, with Microsoft Foundry handling orchestration and hosting automatically. This model provides the highest level of abstraction and the fastest path to building functional AI experiences, making it well suited to lightweight assistants, internal productivity scenarios, and rapid prototyping.
Workflow agents introduce declarative orchestration for coordinating multi-step execution and multi-agent interactions. Workflows can include branching logic, approvals, sequential execution paths, and human-in-the-loop patterns. This model shifts focus away from pure prompt engineering and toward explicit orchestration design for structured automation scenarios.
Hosted agents are code-based agents deployed as containers on Microsoft Foundry Agent Service. In this model, developers own the orchestration logic directly while Foundry manages the underlying hosting, scaling, and runtime infrastructure. This model provides the highest degree of runtime flexibility and operational ownership, making it better suited to enterprise-scale agentic systems with complex orchestration requirements.

The important architectural distinction is not simply how the agent is built, but who owns orchestration complexity. Prompt agents optimise for speed and abstraction. Workflow agents optimise for structured orchestration. Hosted agents optimise for runtime flexibility and operational ownership.

DevUI and Local Orchestration Development

One of the challenges with agentic systems is that orchestration can quickly become difficult to reason about once multiple agents, workflows, tools, memory systems, and middleware layers are involved. Unlike traditional applications, the execution path is often dynamic and partially non-deterministic. Understanding why an agent selected a tool, how context evolved during execution, or where a workflow diverged becomes critically important during development.

This is where DevUI becomes extremely valuable. Microsoft Agent Framework includes DevUI as a local orchestration development and debugging experience for inspecting workflows, runs, threads, tool execution, events, state transitions, and orchestration behaviour during development. Rather than treating orchestration as a black box, DevUI gives developers visibility into how the system is actually behaving while workflows execute.

This becomes especially useful when experimenting with multi-agent orchestration patterns, middleware execution, memory updates, human-in-the-loop flows, or complex tool interactions. Developers can inspect orchestration state in real time, validate execution paths, observe intermediate outputs, and identify where reasoning or workflow behaviour diverges from expectation.

Importantly, this reinforces a broader architectural point. Enterprise agentic systems still require mature engineering workflows. As orchestration complexity increases, teams need ways to test, debug, observe, and validate agent behaviour before promoting systems into production. DevUI helps close the gap between local experimentation and operational runtime behaviour while supporting the broader goal of building observable and governable enterprise agentic systems.

DevUI

Runtime Architecture

Enterprise agentic systems already introduce complexity through orchestration, memory, grounding, reasoning, and tool execution. The surrounding runtime architecture should compensate with simplicity, reliability, and operational predictability rather than amplifying complexity further. While Microsoft Foundry provides the foundational AI platform - models, observability, evaluations, governance, and hosted agent capabilities - enterprise orchestration itself should remain an explicit runtime concern.

A pragmatic and highly effective pattern for this is running orchestration frameworks within Azure Container Apps. This provides strong local development-to-production parity, lightweight operational overhead, portable deployment boundaries, event-driven scaling, and a consistent runtime model between local development and cloud execution.

I personally prefer to use Azure Container Apps to operate agents leveraging Microsoft Agent Framework. A container represents a logical orchestration boundary rather than a single isolated agent. A useful mental model is to treat a container as a domain-aligned business unit containing multiple related agents operating collectively through orchestration patterns.

A sales operations runtime, finance runtime, or customer support runtime may each operate within their own orchestration boundary. This creates clearer ownership, deployment, and security separation between domains while keeping operational concerns physically contained.

Azure Container Apps

Agentic Orchestration Patterns

Microsoft Agent Framework can absolutely be used to build simple, singular agents. Its real strength, however, emerges when those agents need to operate as part of a broader orchestration model. At enterprise scale, the challenge is rarely “can we create an agent?” The harder question is how multiple agents, tools, approvals, workflows, and deterministic systems coordinate together in a way that is reliable, observable, governable, and operationally useful.

This is where orchestration becomes the primary design concern.

Microsoft Agent Framework orchestration patterns include sequential, concurrent, handoff, group chat, and magentic orchestration. Human-in-the-loop interactions through approvals and request-info flows are also supported.

Enterprise value does not come from creating more agents. It comes from orchestrating agents around business outcomes. A simple agent may answer a question or perform a task. An orchestrated agentic system can break work into stages, involve specialist agents, invoke deterministic tools, pause for approval, observe results, and continue execution dynamically based on context.

In practical terms, orchestration is where agentic systems begin to resemble repeatable human workflows performed by knowledge workers. The goal is not to make every process fully autonomous. The goal is to select the orchestration pattern that matches the business problem, required level of control, and acceptable level of autonomy.

Sequential Orchestration

Sequential orchestration is the simplest and most controlled orchestration pattern. In this model, agents execute in a defined order, passing context and results between stages. Each agent or activity owns a specific responsibility and execution flows linearly from start to finish.

Sequential orchestration

The value of this pattern is clarity. Each step has a defined responsibility, failures are easier to isolate, and controls can be inserted at predictable points. Human approval can occur before an action is taken. Deterministic systems can be called only after the required context has been gathered.

This works especially well when the underlying business process is already structured.

Sequential orchestration is often ideal for the dull and repetitive work that consumes human time without requiring significant judgement. It may not look revolutionary, but it can create enormous value by removing preparation, classification, routing, and drafting work from humans so they can focus on exceptions, conversations, and decisions that actually require human attention.

Concurrent Orchestration

Concurrent orchestration allows multiple agents or activities to execute in parallel before results are consolidated.

Concurrent orchestration

This pattern becomes valuable when different forms of analysis, retrieval, or reasoning can occur independently. One agent may assess commercial risk. Another may review legal considerations. Another may evaluate technical feasibility. Another may compare against historical examples. The value here is not simply speed, but the quality improvements that emerge from multiple specialised reasoning paths operating in parallel.

Using multiple agents with different instructions, roles, data sources, or even models can reduce the likelihood that a single reasoning path dominates the outcome. Different perspectives can expose blind spots, challenge assumptions, and improve overall decision quality.

Concurrent orchestration works particularly well for research, proposal generation, compliance analysis, policy review, and multi-domain assessment. The trade-off is coordination complexity. Parallel agents may disagree, duplicate work, or produce findings with different confidence levels. The orchestration layer needs a way to consolidate and resolve those outputs before presenting a final result.

Handoff Orchestration

Handoff orchestration allows responsibility to move dynamically between specialised agents based on context or intent.

Handoff orchestration

Rather than one large general-purpose agent attempting to solve every problem, orchestration routes execution toward the capability best suited to handle the next step. One agent may perform intake and triage before handing execution to a specialist agent for finance, legal, HR, operations, or technical analysis. This mirrors how work already moves through real organisations, and it also creates cleaner governance and security boundaries.

Different agents can have different tools, instructions, permissions, memory scopes, and policy constraints. A finance agent does not require the same permissions as a customer support agent. A legal review agent may need access to precedent and policy but not operational systems. This separation reduces blast radius and keeps responsibilities clearer.

Group Chat Orchestration

Group chat orchestration allows multiple agents to collaborate in a shared conversational context.

Group chat orchestration

Each agent can contribute insights, ask questions, challenge assumptions, and build on the outputs of other agents in real time. This resembles a brainstorming session or cross-functional meeting where multiple perspectives come together to solve a problem. For example, a proposal generation workflow might include a solution architect agent, a commercial agent, a delivery risk agent, and a customer value agent. The value is not simply task completion, it is better thinking.

However, group chat orchestration should be used deliberately. It can increase token consumption, execution time, and observability complexity. Without clear boundaries, agents can over-discuss, drift, or create unnecessary reasoning loops. This pattern works best when roles are clearly defined, outcomes are specific, and conversations have a clear stopping condition.

Magentic Orchestration

Magentic orchestration introduces a manager-style agent that dynamically coordinates specialised agents during runtime. Rather than defining every workflow path upfront, the manager agent decides which agents to invoke, how work should be coordinated, and how execution should adapt as context evolves.

Magentic orchestration

This pattern is valuable for ambiguous or evolving problems where the execution path is not known in advance. Examples include broad research, complex troubleshooting, dynamic planning, investigation workflows, and multi-stage advisory processes. Magentic orchestration offers flexibility and autonomy, but it also introduces substantially more complexity around observability, explainability, cost control, governance, and deterministic boundaries. This pattern should be applied deliberately where the business value justifies the additional autonomy rather than being treated as the default orchestration model.

Memory and Grounding

Enterprise agents are only useful when they understand the context they are operating in. A model on its own does not know your current policies, customer history, internal language, delivery standards, approval rules, pricing models, or the decisions your team made last week. This is where grounding and memory become critical.

Grounding provides the agent with relevant context at the time of execution. This may come from documents, knowledge bases, search indexes, business systems, records, previous examples, or structured data. In many architectures this is implemented through retrieval augmented generation, but grounding is broader than RAG.

Grounding is the discipline of giving the agent the right context, from the right sources, at the right time. Memory serves a different purpose.

Memory allows the agentic system to retain useful information across interactions and executions rather than treating every interaction as completely isolated. This becomes increasingly important as systems move from simple chat experiences into operational business workflows.

Like people, agents rarely arrive fully formed. Much of the most valuable organisational knowledge is never formally documented. The way a business actually operates often lives in edge cases, tribal knowledge, and operational nuance rather than process diagrams. Over time, strong teams build instinct. Enterprise agentic systems need mechanisms to build operational memory in a similar way. The most valuable operational knowledge is often learned through exception handling rather than happy-path execution. Human feedback therefore becomes one of the most important inputs into memory systems.

When an agent encounters ambiguity, edge cases, policy conflicts, or unfamiliar scenarios, humans provide guidance. Over time, those decisions can become part of the organisation’s operational memory, allowing the system to improve incrementally as it encounters new situations.

Memory and Grounding

Different forms of memory serve different purposes.

Short-term memory helps the agent maintain context within an interaction or workflow. This is useful for multi-step reasoning, handoffs, approvals, and conversational continuity.
Long-term memory helps the agent retain useful patterns across interactions. This may include preferences, recurring decisions, known edge cases, or domain-specific instructions.
Operational memory captures how work is progressing. This may include workflow state, decisions made, dependencies, approvals, unresolved issues, and actions already taken.
Institutional memory captures the organisation’s preferred way of working. This may include good examples, style preferences, risk appetite, delivery standards, pricing assumptions, or common exception handling patterns.

As memory systems mature, the distinction between “prompting” and “organisational learning” starts to blur. This is where enterprise agentic systems become fundamentally different from standalone chatbots. The system is no longer simply responding to prompts. It is gradually accumulating operational context about how the organisation actually works.

Tools, Skills, and Deterministic Capability Providers

The terminology can be confusing because “tool”, “skill” and “MCP” are often discussed together. A simple way to think about it is this:

Tools are callable actions exposed to an agent. They are the interface the agent can invoke during reasoning. In Agent Framework, tools can include function tools, code interpreter, file search, web search, hosted MCP tools, local MCP tools, and Foundry toolboxes.
Skills are packaged capabilities. A skill is best thought of as a reusable business capability or task-level behaviour. It may be implemented using one or more tools, prompts, APIs, workflows, or agents.
MCP is a protocol for connecting agents to tools and external capability providers. MCPs may run locally within the same runtime boundary or remotely as a hosted service. The important distinction is not whether something is “MCP” or “not MCP”, but where the capability runs, who operates it, and what trust boundary it crosses.

Agents create value when they can move beyond reasoning and safely interact with the business. That interaction should not happen through vague, uncontrolled access to enterprise systems. It should happen through deliberate, well-scoped capabilities.

Deterministic Boundaries

The agent may decide what needs to happen next, but the execution of important business actions should remain predictable, permissioned, observable, and repeatable. Pricing calculations, approval workflows, identity changes, financial transactions, customer record updates, and policy checks should not be left to generative reasoning alone.

The agent reasons and coordinates. Deterministic capability providers execute.

A proposal agent might help draft a statement of work, but pricing calculations should still come from an approved pricing model. A support agent might interpret a customer request, but account changes should still flow through a controlled workflow. The more important the business action, the more important the deterministic boundary becomes.

Middleware and Runtime Controls

If tools define what an agent can do, middleware helps control how and when it does it. Middleware is a way to intercept, modify, and enhance agent interactions at different stages of execution. This allows cross-cutting concerns such as logging, security validation, error handling, and result transformation to sit outside the core agent or function logic where it is largely transparent to the agent. It does not need to be part of the agent’s persona or business instructions.

This matters because security checks, telemetry capture, approval handling, redaction, error handling, and result transformation should not need to be rewritten into every agent prompt or tool implementation. They should be reusable runtime controls that can be applied consistently. This makes middleware one of the most important enterprise control points in the architecture.

Middleware

For example, a policy might state that an agent cannot update a customer record without approval. Middleware can help enforce that by intercepting the tool call, checking the execution context, validating the request, logging the decision, and only allowing the deterministic capability provider to execute when the required conditions are met.

Evaluations and Observability

A working demo proves that an agent can produce a useful answer once. An enterprise system needs to prove that it can produce useful, safe, and reliable outcomes repeatedly, this is where evaluations and observability become critical. For agentic systems, this matters because traditional application testing is not enough.

A conventional application can often be validated by checking that the same input produces the same output. Agentic systems are different. They may retrieve different context, reason through different paths, select different tools, or adapt their response based on prior interaction. That flexibility creates value, but it also changes how quality needs to be managed.

A small prompt change, model upgrade, grounding adjustment, or tool modification may subtly alter how the agent reasons, responds, or selects actions. The system may still technically “work” while behaving differently in ways that materially affect quality, safety, cost, or operational reliability. This is why evaluations become a continuous operational concern rather than a one-time testing activity.

Observability provides the operational view. It should help teams understand what the agent was asked, what context it retrieved, what memory it used, what tools it selected, what decisions it made, which approvals were requested, what errors occurred, and what output was produced. That level of traceability is what turns agent behaviour from a black box into something teams can inspect and improve.

Infrastructure metrics still matter. Latency, errors, token consumption, throughput, and cost all need to be monitored. But agentic systems also need behavioural observability. That means tracing the reasoning path, retrieval path, tool path, approval path, and execution path.

Security

Security in agentic systems is not just application security with an AI label attached. Agents introduce new risks because they can reason over data, retrieve context, call tools, interpret instructions, and interact with enterprise systems. The security model needs to account for that broader execution surface. Importantly, these risks are manageable when agentic systems are designed with clear runtime boundaries, least-privilege access, deterministic execution paths, middleware controls, observability, and approval workflows.

The most important principle is still least privilege. An agent should only have access to the data, memory, tools, and deterministic capability providers required for its purpose. A proposal drafting agent does not need the ability to approve discounts. A customer support agent does not need unrestricted access to finance systems.

Prompt injection is another key concern. Enterprise agents will often retrieve content from documents, emails, tickets, web pages, knowledge bases, and other user-generated sources. Some of that content may contain instructions that attempt to manipulate the agent. The agent must treat retrieved content as data, not authority. Grounding sources should inform the response, but they should not override system instructions, security policy, tool constraints, or approval requirements.

Closing Thoughts — Building Governed Agentic Systems

Enterprise agentic systems are moving quickly from experimentation into architecture. The question is no longer whether an organisation can build an agent. Most teams can build something impressive with a model, a prompt, and a few tools. The harder question is whether that agentic system can be operated safely, improved over time, integrated into business processes, and trusted with real work.

The strongest agentic systems are not necessarily the most autonomous. They are the systems that apply the right level of autonomy to the right business problem. Sometimes that means a simple sequential workflow that removes repetitive preparation work from a team. Sometimes it means multiple specialist agents working in parallel to improve the quality of a recommendation. Sometimes it means handoff patterns that mirror the way work already moves through the organisation.

The architecture should follow the business problem, this is where Microsoft Agent Framework becomes important. It gives teams a way to move beyond isolated assistants and start building agentic systems as composable enterprise software. Agents, workflows, tools, middleware, memory, and evaluations can be treated as first-class architectural concerns rather than ad hoc implementation details.