You've decided to build an agentic systems practice. You've learned the platform. You've signed your first client. Now what?
The biggest mistake new consultants make is treating an agent deployment like a software project: gather requirements, build, ship, invoice. But agents aren't software. They're ongoing systems that interact with real customers, evolve with the business, and break in ways nobody predicted. The engagement model needs to reflect that.
Here's what the first 90 days actually look like when you do it right.
Weeks 1-2: Discovery
Don't touch the platform yet. The first two weeks are about understanding the client's business deeply enough to make design decisions they'll trust.
Map the conversations. What types of conversations does the business have with customers? Sales inquiries, support tickets, appointment requests, onboarding questions, billing disputes. For each type, what's the volume? Which channel? What's the current process? Who handles it now? Where does it break?
Most clients will say "we want an AI agent for customer support." That's not specific enough to build anything. You need: "We get 200 WhatsApp messages per day, 60% are order status inquiries, 25% are return requests, and 15% are product questions. Order status is handled by a team of 3 in Manila. Returns require manager approval over $100. Product questions usually need the catalog PDF." Now you can design something.
Audit the knowledge. Where does the information live that the agent will need? Product documentation in Google Drive? Policies in a shared wiki? Pricing in a spreadsheet? FAQs on the website? This audit determines your knowledge architecture, and it almost always reveals that the information is scattered, outdated, and contradictory across sources.
Identify the integrations. What systems does the agent need to interact with? CRM for customer lookup? Order management for status checks? Calendar for booking? Payment system for refunds? Each integration is a tool the agent will use, and each one has its own data model, auth requirements, and failure modes.
Define the guardrails. What should the agent never do? Never promise a refund over $500 without approval. Never share internal pricing tiers. Never discuss competitor products. Never confirm a shipping date it can't verify. These become behavioral guidelines, and discovering them now prevents crises later.
Deliverable: a discovery document with conversation types, volume estimates, knowledge sources, integration requirements, and behavioral constraints. This is the foundation everything else builds on.
Weeks 3-4: Behavioral Design
This is where most of the intellectual work happens, and it's the phase that separates a good consultant from someone who just configures a chatbot.
Design the guidelines. Each behavioral rule is a condition-action pair: "If the customer asks about return policy AND the order is over $100, explain that manager approval is required and offer to escalate." These aren't prompts. They're structured rules with explicit conditions, explicit actions, and explicit criticality levels (high for compliance rules, medium for standard behavior, low for nice-to-haves).
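The condition-action structure above can be sketched as a small data type. This is a minimal illustration, not any platform's actual API; the field names and criticality levels are assumptions drawn from the description in the text:

```python
from dataclasses import dataclass
from enum import Enum

class Criticality(Enum):
    HIGH = "high"      # compliance rules
    MEDIUM = "medium"  # standard behavior
    LOW = "low"        # nice-to-haves

@dataclass(frozen=True)
class Guideline:
    """A behavioral rule: explicit condition, explicit action, explicit criticality."""
    name: str
    condition: str   # when the rule applies
    action: str      # what the agent should do
    criticality: Criticality

# The return-policy example from the text, expressed as a structured rule.
returns_over_100 = Guideline(
    name="returns_over_100",
    condition="customer asks about return policy AND order total exceeds $100",
    action="explain that manager approval is required and offer to escalate",
    criticality=Criticality.HIGH,
)
```

Writing rules this way, rather than as free-form prompt text, is what makes them testable and reviewable later.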
Guidelines interact with each other through relationships. Some are dependencies: "the discount offer guideline can only fire if the customer retention check guideline has already been evaluated." Some are priorities: "the compliance warning overrides the upsell suggestion when both are relevant." Some are entailments: "if the appointment booking guideline fires, the calendar availability check should also activate."
This is the graph problem that most people don't see coming. With 10 guidelines, the interactions are manageable. With 50, they're a web. With 100+, they're a system that requires careful design, testing, and documentation. The consultant who can design this well is providing enormous value.
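The three relationship kinds above can be modeled as a typed edge list over guideline names. This is a sketch under assumed semantics (the guideline names and the `resolve` logic are illustrative, not a real engine):

```python
DEPENDS_ON = "depends_on"   # A can only fire after B has been evaluated
OVERRIDES = "overrides"     # A suppresses B when both are relevant
ENTAILS = "entails"         # firing A also activates B

# The three example relationships from the text, as (A, kind, B) edges.
edges = [
    ("discount_offer", DEPENDS_ON, "retention_check"),
    ("compliance_warning", OVERRIDES, "upsell_suggestion"),
    ("appointment_booking", ENTAILS, "calendar_availability_check"),
]

def resolve(relevant: set[str], evaluated: set[str]) -> set[str]:
    """Return which relevant guidelines may fire, honoring the three edge kinds."""
    active = set(relevant)
    for a, kind, b in edges:
        if kind == DEPENDS_ON and a in active and b not in evaluated:
            active.discard(a)   # dependency not yet evaluated
        if kind == OVERRIDES and a in active:
            active.discard(b)   # higher-priority rule wins
        if kind == ENTAILS and a in active:
            active.add(b)       # entailed guideline activates too
    return active
```

Even this toy version shows why the edge list needs documentation: the outcome for a given conversation depends on edges that may be defined far from the guidelines they touch.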
Design the journeys. Multi-step interactions need explicit flow design. A lead qualification journey might have stages: greeting, needs discovery, budget qualification, product recommendation, objection handling, meeting booking. Each stage has its own behavioral rules, available tools, and transition conditions. The journey is a directed graph with nodes (stages) and edges (transitions).
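The lead qualification journey described above can be written down as an adjacency map of stages and transition conditions. The stage names come from the text; the transition conditions are illustrative assumptions:

```python
# Each stage maps to a list of (next_stage, transition_condition) edges.
journey = {
    "greeting":               [("needs_discovery", "customer engages")],
    "needs_discovery":        [("budget_qualification", "needs identified")],
    "budget_qualification":   [("product_recommendation", "budget fits"),
                               ("objection_handling", "budget concern raised")],
    "product_recommendation": [("objection_handling", "objection raised"),
                               ("meeting_booking", "customer interested")],
    "objection_handling":     [("meeting_booking", "objection resolved")],
    "meeting_booking":        [],   # terminal stage
}

def next_stages(stage: str) -> list[str]:
    """Stages reachable in one transition from the given stage."""
    return [dst for dst, _condition in journey[stage]]
```

Drawing the journey as an explicit graph like this, before configuring anything, surfaces missing edges (what happens if the objection is not resolved?) while they're still cheap to fix.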
Build the knowledge base. Take the knowledge audit from discovery and turn it into a structured knowledge system. This means: organizing sources by type, configuring sync (Google Drive, Notion, website crawl), ensuring chunking produces semantically complete segments, and testing retrieval accuracy with real questions.
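Testing retrieval accuracy with real questions can be as simple as a table of (question, expected source) pairs scored against whatever retriever the platform provides. The keyword-overlap retriever below is a toy stand-in so the sketch runs on its own; the scoring loop is the part that carries over:

```python
# Toy knowledge chunks; in practice these come from the synced sources.
chunks = {
    "returns_policy.md": "returns over $100 require manager approval before refund",
    "shipping.md": "standard shipping takes 3-5 business days after dispatch",
    "catalog.pdf": "product specifications and pricing for the full catalog",
}

def retrieve(question: str) -> str:
    """Stand-in retriever: return the chunk with the most word overlap."""
    q = set(question.lower().split())
    return max(chunks, key=lambda doc: len(q & set(chunks[doc].lower().split())))

# Real customer questions paired with the source that should answer them.
cases = [
    ("how long does shipping take", "shipping.md"),
    ("do returns need manager approval", "returns_policy.md"),
]
accuracy = sum(retrieve(q) == expected for q, expected in cases) / len(cases)
```

A falling accuracy score after a re-sync or a chunking change is exactly the kind of regression this catches before customers do.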
Configure the tools. Wire up the integrations identified in discovery. Each tool needs: authentication, input/output schema, error handling, and crucially, behavioral scoping. The refund tool should only be available when the agent is handling a return request, not during a sales conversation. Tool availability per guideline is part of the behavioral design.
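Behavioral scoping can be expressed as a mapping from each tool to the guidelines under which it may be offered. Tool and guideline names here are illustrative assumptions:

```python
# Each tool lists the guidelines that are allowed to expose it.
tool_scope = {
    "issue_refund": {"standard_return", "returns_over_100"},
    "lookup_order": {"order_status_inquiry", "standard_return"},
    "book_meeting": {"meeting_booking"},
}

def available_tools(active_guidelines: set[str]) -> set[str]:
    """Tools exposed to the agent for the currently active guidelines only."""
    return {tool for tool, scope in tool_scope.items() if scope & active_guidelines}
```

With this shape, the refund tool simply never appears in a sales conversation, because no sales guideline is in its scope.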
Deliverable: a fully configured playbook with guidelines, relationships, journeys, knowledge bases, and tool integrations. This is still in preview mode, not released to production.
Weeks 5-6: Testing and Iteration
Never deploy an untested agent. This is the phase where you find out everything that's wrong with your design before customers do.
Build test suites. Create test scenarios for every conversation type identified in discovery. Each scenario has: an initial customer message, a sequence of interactions, and expected outcomes (correct information provided, right tool called, appropriate escalation triggered, correct tone maintained).
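The scenario structure described above maps naturally onto a small record type. Field names are illustrative, not any test framework's API:

```python
from dataclasses import dataclass, field

@dataclass
class Scenario:
    """One regression scenario: opening message, follow-ups, expected outcomes."""
    name: str
    opening_message: str
    turns: list[str]                        # follow-up customer messages
    expected_tool_calls: list[str] = field(default_factory=list)
    expect_escalation: bool = False
    expected_facts: list[str] = field(default_factory=list)  # info the reply must contain

# Example scenario for the over-limit return case from discovery.
return_over_limit = Scenario(
    name="return over approval limit",
    opening_message="I want to return my $250 order",
    turns=["yes, please escalate this"],
    expected_tool_calls=["lookup_order"],
    expect_escalation=True,
    expected_facts=["manager approval"],
)
```

A suite is then just a list of these records, and coverage is auditable: one scenario per conversation type from the discovery document.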
Run regression tests. Execute the full suite and review results. You'll find: guidelines that conflict, knowledge gaps the agent tries to fill by improvising, tool calls that fire in the wrong context, escalation triggers that are too sensitive or not sensitive enough, and edge cases nobody thought of.
Iterate on the design. Each test failure is a design issue, not a "prompt needs tweaking" issue. A conflicting guideline needs a priority relationship. A knowledge gap needs new content or a different chunking strategy. A wrong tool call needs tighter scoping. Fix the design, run the suite again, repeat.
Test across channels. If the agent is deployed on multiple channels, test each one independently. The same guideline produces different results on WhatsApp (short, conversational) versus email (comprehensive, formatted) versus webchat (fast, value-first). Channel-specific behavioral rules may need adjustment.
Deliverable: a passing test suite with documented coverage, a list of known limitations, and a behavioral design that's been through multiple iteration cycles.
Weeks 7-8: Staged Deployment
Don't flip the switch for all traffic on day one. Stage the rollout to manage risk.
Release the playbook. Move from preview (live editing) to a released version. This snapshots the entire behavioral graph: guidelines, relationships, journeys, terms, canned responses. The released version is what the agent uses in production. Future edits happen in preview without affecting live behavior until the next release.
Start with one channel. Deploy on the channel with the most volume and the best data (usually webchat or WhatsApp). Monitor closely. Review every conversation for the first few days. Look for patterns: are there conversation types you didn't anticipate? Are customers asking questions outside the agent's scope?
Enable monitoring. Set up the analytics dashboards the client's team will use. Smart tag classification for every conversation. Resolution rate tracking. Escalation rate monitoring. Per-topic performance breakdowns. This is the infrastructure that makes ongoing management possible.
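The core dashboard numbers fall out of the tagged conversation records directly. The record shape below is an assumption for illustration:

```python
# Toy conversation records with smart tags and outcome flags.
conversations = [
    {"tags": ["order_status"], "resolved": True,  "escalated": False},
    {"tags": ["returns"],      "resolved": False, "escalated": True},
    {"tags": ["order_status"], "resolved": True,  "escalated": False},
    {"tags": ["returns"],      "resolved": True,  "escalated": False},
]

def rate(records, key):
    """Fraction of records where the given outcome flag is set."""
    return sum(r[key] for r in records) / len(records)

resolution_rate = rate(conversations, "resolved")
escalation_rate = rate(conversations, "escalated")

# Per-topic breakdown keyed on smart tags.
by_topic = {}
for r in conversations:
    for tag in r["tags"]:
        by_topic.setdefault(tag, []).append(r)
per_topic_resolution = {tag: rate(rs, "resolved") for tag, rs in by_topic.items()}
```

The per-topic view is the actionable one: an overall 75% resolution rate can hide a returns topic running at 50%.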
Establish intervention protocols. Train the client's team on when and how to intervene in live conversations. Define escalation criteria. Set up alerts for specific smart tag patterns. Make sure someone is watching during business hours for the first two weeks.
Deliverable: a live agent on one channel with monitoring, intervention protocols, and a client team trained on day-to-day operations.
Weeks 9-12: Expansion and Optimization
The agent is live. Now the real work begins.
Expand to additional channels. Each new channel is a mini-project: adapt behavioral rules for the channel's tone and constraints, test, deploy, monitor. WhatsApp requires shorter messages and template compliance. Email requires polished writing and comprehensive answers. Voice requires sub-500ms latency and structured flows.
Close the feedback loop. Review conversation analytics weekly. Identify: topics with low resolution rates (knowledge or guideline gaps), conversations with negative sentiment shifts (behavioral issues), high escalation rate topics (scope expansion opportunity), and smart tag distribution changes (drift detection).
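Drift detection on the smart tag distribution can be a simple week-over-week comparison of topic shares. The threshold and the topic counts below are illustrative assumptions:

```python
def tag_shares(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw per-topic conversation counts into fractions of total volume."""
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}

def drifted(last_week: dict[str, int], this_week: dict[str, int],
            threshold: float = 0.10) -> set[str]:
    """Topics whose share of volume moved by more than the threshold."""
    a, b = tag_shares(last_week), tag_shares(this_week)
    return {t for t in set(a) | set(b)
            if abs(a.get(t, 0.0) - b.get(t, 0.0)) > threshold}

# 'billing' jumps from 10% to 30% of volume: worth a look.
flags = drifted({"order_status": 60, "returns": 30, "billing": 10},
                {"order_status": 50, "returns": 20, "billing": 30})
```

A flagged topic isn't necessarily a problem; it's a prompt to read the conversations and find out whether it's seasonality, a product change, or a new failure mode.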
Run the first business review. Present to the client's leadership with hard metrics: cost displacement (conversations resolved without human intervention), revenue attribution (leads qualified, meetings booked, upsells completed), operational efficiency (average handling time, first contact resolution), and conversation intelligence (topic trends, sentiment patterns, emerging issues).
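The cost displacement figure in that review is back-of-envelope arithmetic; all numbers below are made up for illustration, and each client's loaded cost and handling time will differ:

```python
# Conversations resolved without a human, converted to displaced labor cost.
monthly_conversations = 6000
resolution_rate = 0.70                    # resolved without human intervention
minutes_per_human_conversation = 6        # average human handling time
loaded_cost_per_hour = 30.0               # fully loaded support cost, USD

resolved = monthly_conversations * resolution_rate
hours_displaced = resolved * minutes_per_human_conversation / 60
monthly_displacement = hours_displaced * loaded_cost_per_hour
```

Presenting the inputs alongside the result matters: leadership can challenge the assumptions, and the number survives the challenge because the arithmetic is transparent.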
Propose the retainer. By week 12, the value is proven and the client understands that the agent needs ongoing management. The retainer covers: weekly optimization cycles, monthly business reviews, knowledge base maintenance, new use case development, test suite expansion, and playbook version management.
Deliverable: a multi-channel deployment with proven metrics, a business review that demonstrates ROI, and a signed retainer for ongoing management.
The Retainer: Where the Real Business Lives
The initial deployment is a project. The retainer is a practice.
Monthly retainer work includes: reviewing conversation analytics and identifying optimization opportunities, updating knowledge bases as the client's products and policies change, expanding behavioral rules for new conversation types, running regression tests after every change, managing playbook versions (preview, test, release), quarterly business reviews with ROI reporting, and developing new use cases as the client's confidence grows.
This is the compounding value of an agentic systems practice. Each client's agent gets better over time, which makes them harder to replace. The retainer revenue grows as you add channels, use cases, and capabilities. And the expertise you build with each client makes the next one faster to deploy.
The first engagement takes 90 days because everything is new. The fifth takes 45 because you've built frameworks, test templates, and deployment checklists. The tenth takes 30 because you know exactly which questions to ask in discovery and which design patterns work for which scenarios.
The consultants who build this muscle first will have an insurmountable head start. Not because the platform is hard to learn, but because the judgment of how to use it well only comes from doing the work.