Server-Side AI Architecture

Problem & Motivation

In traditional multiplayer game architectures, AI simulation runs alongside other critical systems (physics, networking, input handling) on the same game server thread. This creates several problems: AI updates compete for CPU time with frame-critical systems, a bug in AI behavior can crash the entire game server, and iterating on AI logic requires full server restarts that disconnect all players.

I wanted to explore an alternative: decoupling AI into a dedicated server process. The AI server runs independently, handling only NPC state machines, behavior trees, and spatial queries. The main game server continues to handle player input, physics, and networking without AI overhead. This separation provides several benefits:

Performance isolation - AI spikes don't affect player responsiveness
Hot-reload capability - The AI server can be restarted without disconnecting players
Faster iteration - AI logic changes don't require full game server rebuilds
Scalability - Multiple AI servers can handle different world regions

A second constraint shaped the design early on: bandwidth. Broadcasting all NPC state to all clients every tick doesn't scale. A player on one side of the map shouldn't receive updates about NPCs on the other side. Solving this cleanly, without coupling the AI system to the network layer, became one of the main design challenges.

Inspiration: This architecture was inspired by my experience with Knight Online , an MMORPG where I observed how tightly-coupled AI logic could impact server stability and iteration speed during development.

The result is a server-authoritative AI system where the simulation runs entirely on the dedicated AI server. Clients receive only the state they need to render NPCs correctly, they don't know why an NPC moved, only where it is and what animation state to play.

NPC Behavior System

The server implements a state machine with 8 NPC states and priority-based transitions:

Idle - Standing still, minimal movement
Patrol - Wandering within spawn area
Chase - Pursuing player (30m detection range)
Attack - Melee combat (2m range, 1.5s cooldown)
Retaliate - Responding to damage, alerts group
Fleeing - Running away when below 30% health
Stunned - Temporary incapacitation from heavy hits (>35% damage)
Rallying - Group leaders coordinating attacks

NPCs are organized into groups with leaders. When a member is attacked, the entire group is alerted. Leaders can rally group members, creating coordinated responses.

Architecture

The system is cross-platform by design. The server runs on Linux using BSD sockets, while the client runs on Windows using WinSock2 inside a custom game engine. Both sides share the same message protocol and serialization layer, keeping the network boundary platform-agnostic.

The system is split into three layers. The AI Manager ticks every NPC each frame, updating their state machines or behavior trees and writing new positions and states. A spatial index (quadtree) is rebuilt from those updated positions. The network layer then iterates over connected clients, queries the quadtree for each client's view region, and dispatches only the visible, changed subset.

The key property is that the AI Manager knows nothing about clients or sockets, and the network layer knows nothing about AI logic. The quadtree is the only shared structure, it takes positions in and returns entity IDs out.

Priority-Based State Machine

The state machine uses priority values - higher priority transitions are evaluated first. This ensures critical behaviors (like fleeing or stunning) override normal behaviors immediately.

State Machine NPCStateMachine.h - priority transitions

enum class NPCState : uint8_t
{
    Idle      = 0,
    Patrol    = 1,
    Chase     = 2,
    Attack    = 3,
    Dead      = 4,
    Retaliate = 5,
    Fleeing   = 6,
    Stunned   = 7,
    Rallying  = 8,
};

struct Transition
{
    NPCState               myToState;
    std::function<bool()>  myCondition;
    int                     myPriority = 0;
};

class NPCStateMachine
{
public:
    explicit NPCStateMachine(NPCState aInitialState)
        : myState(aInitialState), myPreviousState(aInitialState), myStateTime(0.0f) {}

    void AddTransition(NPCState aFrom, NPCState aTo, 
                      std::function<bool()> aCondition, int aPriority = 0)
    {
        auto& vec = myTransitions[aFrom];
        vec.push_back({aTo, std::move(aCondition), aPriority});
        std::sort(vec.begin(), vec.end(), [](const Transition& a, const Transition& b)
        {
            return a.myPriority > b.myPriority;
        });
    }

    void Tick(float aDeltaTime)
    {
        myStateTime += aDeltaTime;
        
        auto it = myTransitions.find(myState);
        if (it == myTransitions.end()) return;

        for (const auto& transition : it->second)
        {
            if (transition.myCondition())
            {
                myPreviousState = myState;
                myState = transition.myToState;
                myStateTime = 0.0f;
                return;
            }
        }
    }

    NPCState GetState() const { return myState; }
    void SetState(NPCState aState) { myState = aState; myStateTime = 0.0f; }

private:
    NPCState myState;
    NPCState myPreviousState;
    float    myStateTime;
    std::unordered_map<NPCState, std::vector<Transition>> myTransitions;
};

Group AI & Coordination

NPCs are spawned in organized groups with leaders. The group system enables:

Group Alerting - When one member is attacked, the whole group is notified
Leader Promotion - If a leader dies, the next alive member becomes leader
Rally Behavior - Leaders can call group members to converge on threats
Spatial Clustering - Groups spawn in clusters (30m spread)

Group System NPCGroup structure and alert logic

struct NPCGroup
{
    uint32_t              id;
    uint32_t              leaderId;
    std::vector<uint32_t> memberIds;
    bool                  isAlerted;
    uint32_t              alertTargetId;
};

void AIServer::AlertGroup(uint32_t groupId, uint32_t targetClientId)
{
    auto git = m_groups.find(groupId);
    if (git == m_groups.end()) return;

    NPCGroup& group = git->second;
    group.isAlerted = true;
    group.alertTargetId = targetClientId;

    for (uint32_t memberId : group.memberIds)
    {
        auto* m = GetNPCMutable(memberId);
        if (!m || !m->isAlive) continue;
        m->aggroTarget = targetClientId;
        m->retaliateTimer = RETALIATE_DURATION;
        
        if (m->isLeader)
            m->stateMachine->SetState(NPCState::Rallying);
        else if (m->state == NPCState::Idle || m->state == NPCState::Patrol)
            m->stateMachine->SetState(NPCState::Retaliate);
    }
}

Process

The project was built outside-in: get bytes moving between two programs first, then layer AI on top of a working network foundation, then optimize. Skipping steps would have made debugging significantly harder, a broken AI system and a broken network layer failing at the same time is very hard to untangle.

Week 1 - 2 - Getting bytes moving

The first goal was deliberately minimal: a server that accepts a UDP packet and sends one back. Nothing else. Before writing any AI, I needed verification that the socket layer worked and that I understood what was landing on the other side.

I built a small custom server class rather than reaching for a library. The server tracks connected client sessions, handles message dispatch, and exposes a QueueMessage() API that buffers outgoing messages and flushes them as a single UDP packet at frame end. This keeps call sites clean, callers just queue and forget:

Networking Server.h - client session and QueueMessage

struct ClientSession
{
    ClientAddress                         address{};
    std::string                           username{};
    NetworkID                             id{};

    NetworkEngine::NetworkBuffer          buffer;
    uint32_t                              lastProcessedSequenceNumber{};
    uint32_t                              lastReceivedSequenceNumber{};
    float                                 avgSequenceNumberDelta{};

    std::chrono::system_clock::time_point lastReceivedPacketTime;
};

// Queues a message into the client's buffer.
// Everything in the buffer is sent as one UDP packet at end of frame.
template<typename TMessage>
bool Server::QueueMessage(TMessage&& aMsg, const ClientAddress& anAddress)
{
    auto client = GetClient(anAddress);
    if (!client) return false;

    if constexpr (std::is_base_of_v<NetworkEngine::NetworkMessage, TMessage>)
    {
        return aMsg.Serialize(client->buffer);
    }
    return false;
}

With a working echo test, the next step was connecting the custom game client and confirming that a hardcoded position sent from the server showed up in the right place in the engine. No AI yet. This is also where the 1024-byte MTU became concrete, at that size, message layout needs to be deliberate from the start.

Week 3 - 4 - First AI on the server

With the transport stable, I added the first NPC. The starting point was a simple state machine: Idle, Patrol, Chase, Attack. Each NPC holds one and ticks it every server frame. The state machine is a transition table, each state maps to a list of guarded transitions evaluated in order.

State Machine NPCStateMachine.h - transition table and tick

#pragma once
#include <functional>
#include <unordered_map>
#include <vector>

enum class NPCState : uint8_t
{
    Idle   = 0,
    Patrol = 1,
    Chase  = 2,
    Attack = 3,
};

struct Transition
{
    NPCState               to;
    std::function<bool()>  condition;
};

class NPCStateMachine
{
public:
    explicit NPCStateMachine(NPCState aInitialState)
        : myState(aInitialState) {}

    void AddTransition(NPCState aFrom, NPCState aTo, std::function<bool()> aCondition)
    {
        myTransitions[aFrom].push_back({ aTo, std::move(aCondition) });
    }

    void Tick()
    {
        auto it = myTransitions.find(myState);
        if (it == myTransitions.end()) return;

        for (const auto& transition : it->second)
        {
            if (transition.condition())
            {
                myState = transition.to;
                return;
            }
        }
    }

    NPCState GetState() const { return myState; }

private:
    NPCState                                                myState;
    std::unordered_map<NPCState, std::vector<Transition>> myTransitions;
};

The first version broadcast full NPC state to all clients every tick. With a few NPCs in a test scene this was fine. The problem became obvious when simulating more clients, the first culling pass used a simple per-client radius check before queuing. Crude, but enough to validate the idea before spending time on a proper spatial structure.

Week 5 - 6 - Behavior trees and the quadtree

The state machine worked, but adding a new NPC type meant writing a new transition table from scratch. It also couldn't express parallel concerns cleanly, you'd want an NPC to patrol and monitor health simultaneously, and flat transition tables make that awkward. A behavior tree solves both problems: composites let you reuse subtrees across NPC types, and you can wire the same leaf node into ten different trees without duplicating it.

Behavior Tree BehaviorTree.h - Sequence, Selector, leaf node example

#pragma once
#include <memory>
#include <vector>

enum class BTStatus { Running, Success, Failure };

class BTNode
{
public:
    virtual ~BTNode() = default;
    virtual BTStatus Tick() = 0;
};

// Runs children left to right, fails on first child failure
class Sequence : public BTNode
{
public:
    void Add(std::unique_ptr<BTNode> aChild)
    {
        myChildren.push_back(std::move(aChild));
    }

    BTStatus Tick() override
    {
        for (auto& child : myChildren)
        {
            BTStatus status = child->Tick();
            if (status != BTStatus::Success) return status;
        }
        return BTStatus::Success;
    }
private:
    std::vector<std::unique_ptr<BTNode>> myChildren;
};

// Runs children left to right, succeeds on first child success
class Selector : public BTNode
{
public:
    void Add(std::unique_ptr<BTNode> aChild)
    {
        myChildren.push_back(std::move(aChild));
    }

    BTStatus Tick() override
    {
        for (auto& child : myChildren)
        {
            BTStatus status = child->Tick();
            if (status != BTStatus::Failure) return status;
        }
        return BTStatus::Failure;
    }
private:
    std::vector<std::unique_ptr<BTNode>> myChildren;
};

// Leaf: succeeds if closest player is within aggro range
class IsPlayerNearby : public BTNode
{
public:
    IsPlayerNearby(const Blackboard& aBlackboard, float aRange)
        : myBlackboard(aBlackboard), myRange(aRange) {}

    BTStatus Tick() override
    {
        return myBlackboard.GetClosestPlayerDistance() <= myRange
            ? BTStatus::Success
            : BTStatus::Failure;
    }
private:
    const Blackboard& myBlackboard;
    float             myRange;
};

The radius culling from the previous week was O(n·m), every client iterated over every NPC. That's fine for small counts but degrades fast. Replacing it with a quadtree brought the per-client query cost down to roughly O(log n + k) where k is the result count. The tree is rebuilt from scratch each tick rather than maintained dynamically, simpler to implement and fast enough given the scale.

Culling Quadtree.h - spatial partitioning, AABB query

#pragma once
#include <array>
#include <memory>
#include <vector>
#include <cmath>

struct AABB
{
    float x{}, y{}, halfW{}, halfH{};

    bool Contains(float aPx, float aPy) const
    {
        return aPx >= x - halfW && aPx <= x + halfW
            && aPy >= y - halfH && aPy <= y + halfH;
    }

    bool Intersects(const AABB& aOther) const
    {
        return std::abs(x - aOther.x) < halfW + aOther.halfW
            && std::abs(y - aOther.y) < halfH + aOther.halfH;
    }
};

template<typename T, int Capacity = 8>
class Quadtree
{
public:
    explicit Quadtree(AABB aBounds, int aDepth = 0)
        : myBounds(aBounds), myDepth(aDepth) {}

    void Insert(T aItem, float aX, float aY)
    {
        if (!myBounds.Contains(aX, aY)) return;

        if (myItems.size() < Capacity || myDepth >= 6)
        {
            myItems.push_back({ aItem, aX, aY });
            return;
        }

        if (!myChildren[0]) Subdivide();
        for (auto& child : myChildren) child->Insert(aItem, aX, aY);
    }

    void Query(const AABB& aRange, std::vector<T>& outResults) const
    {
        if (!myBounds.Intersects(aRange)) return;

        for (const auto& entry : myItems)
            if (aRange.Contains(entry.x, entry.y))
                outResults.push_back(entry.item);

        for (const auto& child : myChildren)
            if (child) child->Query(aRange, outResults);
    }

    void Clear()
    {
        myItems.clear();
        for (auto& child : myChildren) child.reset();
    }

private:
    struct Entry { T item; float x, y; };

    void Subdivide()
    {
        float hw = myBounds.halfW * 0.5f;
        float hh = myBounds.halfH * 0.5f;
        myChildren[0] = std::make_unique<Quadtree>(AABB{ myBounds.x - hw, myBounds.y - hh, hw, hh }, myDepth + 1);
        myChildren[1] = std::make_unique<Quadtree>(AABB{ myBounds.x + hw, myBounds.y - hh, hw, hh }, myDepth + 1);
        myChildren[2] = std::make_unique<Quadtree>(AABB{ myBounds.x - hw, myBounds.y + hh, hw, hh }, myDepth + 1);
        myChildren[3] = std::make_unique<Quadtree>(AABB{ myBounds.x + hw, myBounds.y + hh, hw, hh }, myDepth + 1);
    }

    AABB                                    myBounds;
    int                                     myDepth;
    std::vector<Entry>                    myItems;
    std::array<std::unique_ptr<Quadtree>, 4> myChildren{};
};

Week 6 - 7 - Delta compression and dispatch

With culling in place, the remaining bandwidth problem was redundancy: sending the full state of every visible NPC every tick, even when nothing had changed. An idle NPC contributes the same bytes every frame for no reason. Delta compression addresses this by only sending state that has actually changed since the last frame sent to that client.

The server keeps a per-client snapshot of last-sent state. On dispatch, it diffs current state against the snapshot, if position and state are both identical, the entity is skipped. Only changed entities go into the outgoing packet. This kept packet sizes small and stable even with many NPCs in range, as long as most weren't actively moving.

Dispatch itself is a straightforward loop over culled results, queuing one message per changed entity, all flushed as one packet at frame end:

Dispatch AIServer.cpp, cull, delta check, QueueMessage

void AIServer::DispatchAIUpdates()
{
    for (const auto& [address, session] : myServer.GetClients())
    {
        AABB localViewBox{ session.worldX, session.worldY, 500.0f, 500.0f };

        std::vector<EntityID> outVisible;
        myQuadtree.Query(localViewBox, outVisible);

        for (EntityID id : outVisible)
        {
            const NPCSnapshot& current = myNPCManager.GetSnapshot(id);
            const NPCSnapshot& last    = myLastSentState[session.id][id];

            if (current.pos == last.pos && current.state == last.state)
                continue;

            myServer.QueueMessage(
                MessageNPCState{ id, current.pos, current.state },
                address
            );

            myLastSentState[session.id][id] = current;
        }
    }
}

At 1024-byte MTU, message budget matters. Each MessageNPCState carries a 4-byte entity ID, two 4-byte floats for position, and a 1-byte state enum, 13 bytes per update. That fits roughly 75 NPC state changes in one packet before hitting the limit, which was plenty for the view radius tested.

Week 7 - 8 - Demo scene and cleanup

The final stretch was wiring everything into a demo scene, several NPC types visible at once, with a debug overlay showing which entities each simulated client was receiving. This made culling visually verifiable: moving a simulated client position updated the overlay in real time, making it easy to see exactly which NPCs crossed in and out of range.

The last pass was cleanup: removing hardcoded values that had crept into the dispatch loop, fixing a few places where NPC logic was reading stale state across a tick boundary, and ensuring the whole loop could run cleanly headless for load testing.

Reflection

The hardest part wasn't the networking or the AI in isolation, it was the boundary between them. The AI system and the network layer are naturally tempted to reach into each other: the AI wants to know about clients for targeting decisions, and the network layer wants to understand AI state for smarter compression. Keeping them separated through the quadtree as the only shared interface took discipline but made both sides significantly easier to reason about and change independently.

The most impactful next step would be zone-based visibility for map-structured games. Raw distance is a poor proxy for visibility when geometry matters, a player behind a wall shouldn't receive updates about NPCs on the other side even if they're within radius. The quadtree is a solid baseline but doesn't understand the world.

I'd also look at update frequency tiering, sending updates for nearby entities every tick but throttling distant ones to every other tick or less. The current system treats all visible entities equally. Prioritizing by distance would make better use of the packet budget in crowded scenes.

Timeline

W1

Research & Architecture

References from MMO server design and server-authoritative systems. Architecture sketches, protocol decision, environment setup planning.

W2

UDP Foundation

Socket layer, client session tracking, echo tests between server and game client. Hardcoded position packet verified end to end.

W3

First AI on Server

State machine, NPC patrol/idle/chase logic. First real AI position packets visualized in the game.

W4

Authoritative Movement + Radius Culling

Server fully owns NPC movement. Per-client radius culling before dispatch. Multi-client simulation tests.

W5

Behavior Tree + MVP

Behavior tree implemented alongside state machine. MVP complete - server-driven AI replicated to multiple clients with culling working.

W6

Group AI & States

Added Retaliate, Fleeing, Stunned states. Implemented group system with leader promotion and rallying behavior.

W7

Priority Transitions + Combat

Priority-based state transitions. Attack cooldowns, damage thresholds, and group alert system.

W8

Testing, Cleanup & Portfolio

Headless load testing, boundary condition fixes, code cleanup, and documentation.

Live 2D AI Demo

Connect to the AI server. Move with WASD to control a player, watch NPCs chase you, attack when you get close, and flee when damaged!

Disconnected

NPCs: 0 Ping: 0 ms

Server-Side AI Architecturefor Multiplayer Games