AI integration · Kenya

AI integration built in from day one, not bolted on.

The most common version of "AI in your product" is a chat widget grafted onto an existing workflow that nobody asked for. It adds latency, maintenance cost, and occasionally hallucinates at your customers — while the underlying process stays exactly as slow and error-prone as before. That is not what we build. The cases where AI earns its place are specific: a decision that repeats hundreds of times a day, a document type that follows a pattern but defies a simple rule, a body of internal knowledge too large for any individual to hold in their head. We start every engagement with a one-week audit to identify exactly where those cases exist in your stack — and to be direct about where they do not. We run AI in production ourselves, which means we know what the failure modes look like before we introduce them into your system.

In production

We run Claude Vision disease detection across 26,000+ coffee plants in MkulimaOS — field staff photograph a suspect leaf, the image goes to Claude, and a diagnosis with a treatment plan comes back before the supervisor moves to the next row. The same system runs AI-categorized finance entries before they are committed to the ledger, and generates automated weekly operations summaries for farm management. These are not demos; they are how the farm runs. Spidey Labs is led by the Anthropic Claude Code Ambassador for Kenya.

Our AI practice

AI as architecture, not a feature

There is a meaningful difference between adding an AI endpoint to an existing application and designing a system so that an AI model participates at the right point in the workflow. The first produces a feature. The second changes what the system can do. In MkulimaOS, the vision pipeline is not a sidebar feature — it is how disease is detected on the farm. A field supervisor photographs a leaf, the image is processed by Claude Vision, and the diagnosis and treatment recommendation are returned to a mobile device in the field before the supervisor has moved to the next plant. That only works because the image capture, the model call, the response formatting, and the offline sync were designed together from the start, not assembled afterwards. The finance categorization layer works the same way: before an expense entry is committed to the ledger, a model passes it through a classification step that enforces consistent accounting categories across the team, regardless of how the person entering the record describes the line item. The weekly operations summaries aggregate structured data from attendance, harvest, and finance records and produce a plain-language briefing that the farm manager reads on Monday morning. Each of these is AI doing a specific, bounded task it is genuinely better at than a hand-written rule — and nothing more.

  • AI designed into the data flow — not added after the build
  • Each model call has a defined input format, output schema, and failure fallback
  • Latency budgets set before integration, not discovered after go-live
  • Human review gates where model confidence is not sufficient

Vendor-neutral: we will tell you when a different model is the right call

We are Claude-native — our tooling, our production systems, and the majority of our integration work use Anthropic models. That is not a commercial arrangement; it reflects our view that Claude is the strongest model on the tasks that appear most in business software: long-context document processing, structured output generation, tool use pipelines, and code generation. It does not mean Claude is always the answer. For tasks where privacy is the hard constraint and data cannot leave a client's infrastructure, a locally-hosted open model — Ollama with Llama or Qwen — is the right call, and we build and run those too. For specific narrow classification tasks where a smaller fine-tuned model outperforms a large general model at a fraction of the cost, we will say so and use it. For clients operating in markets where GPT-4o has better language coverage for a specific language, that factor belongs in the model selection conversation. Our audit process surfaces these trade-offs with numbers: latency benchmarks, cost per inference at your expected volume, accuracy on a sample of your actual data. The recommendation follows the data, not a vendor preference.

Start with a one-week AI audit

Before any model is called in your system, we spend one week mapping where AI will and will not earn its place. That audit covers your current workflows, data formats, decision volumes, and existing tooling. We are looking for three things: repetitive decisions that currently require human judgment, documents or data that arrive in inconsistent formats that need normalisation, and knowledge bases that exist but are not queryable. We are equally interested in the counter-cases. A workflow where the input varies too much for a model to be reliable, a volume that is too low to justify the integration complexity, a task where a deterministic rule handles 95% of cases and the 5% remainder is better routed to a human than left to a model with uncertain confidence — these are findings we report and act on. At the end of the week you have a written report: which AI integration points have a clear business case, what build approach each one requires, what the expected cost per month is at your volume, and what comes first. That report is the foundation for any build work. It is also useful on its own if you need to make an internal case for AI investment before committing to a build.

What we build — and what you own on delivery

We build classifiers that extract structured data from documents — invoices, forms, field reports, contracts — where the format varies but the underlying information is consistent. We build internal copilots for operations, support, and sales teams, grounded in your own documentation and data rather than general training knowledge. We build retrieval-augmented knowledge bases that make a body of internal content queryable by people who need to make decisions, without surfacing information to the wrong audience. We build multi-step workflow automation using model tool-use: sequences where the model decides which action to take next, calls an API or queries a database, and proceeds based on the result — the kind of pipeline that replaces a process that currently requires a person sitting at two screens forwarding data between them. We build vision pipelines for structured data extraction from photographs: equipment condition assessment, crop health, document digitisation, quality grading. We build prompt engineering and cost optimisation layers for teams that already have AI in production and are spending more than they should per inference. In every case, full IP transfers to you on delivery: source code, prompts, schemas, documentation. No licensing fees, no vendor lock-in on our side.

What it covers

The modules, end to end.

Document classifiers & extraction

Extract structured fields from invoices, forms, contracts, and field reports — variable format input, consistent schema output.

Internal copilots

Operations, support, and sales copilots grounded in your own documentation — not general training data. Scoped access, auditable outputs.

Retrieval-augmented knowledge bases (RAG)

Make large internal knowledge bodies queryable. Built with access controls so the right information reaches the right people.

Workflow automation (multi-step tool use)

Model-driven pipelines that call APIs, query databases, and branch on results — replacing manual data-forwarding processes.

Vision pipelines

Image → structured data: crop disease detection, equipment condition grading, document digitisation, quality assessment. Runs on Claude Vision or a locally-hosted model where data cannot leave the premises.

AI cost optimisation & prompt engineering

Latency benchmarking, prompt compression, caching strategy, and model selection review for teams already running AI that is costing more than it should per inference.

Questions

Frequently asked.

Which model should we use — Claude, GPT, or a local/open model?
That depends on your task, your data privacy requirements, and your volume. We are Claude-native and recommend Anthropic models for most business software tasks — long-context document processing, structured output, tool use pipelines. Where data cannot leave your infrastructure, we use and support locally-hosted models (Ollama, Llama, Qwen). Where a narrow task has a better-fit model at lower cost, we will say so and benchmark it on your data. The audit process surfaces these trade-offs before any build decision is made.
How do we know where AI actually adds value in our stack?
We start every engagement with a one-week AI audit. We map your workflows, data formats, decision volumes, and existing tooling, and identify the specific points where AI has a clear case — and the points where it does not. You receive a written report at the end: which integrations have a business case, what each one requires to build, and what the monthly inference cost looks like at your volume. The audit is useful on its own; it is the foundation for any build work that follows.
Our data is sensitive. Can AI run on private or local infrastructure?
Yes. Not every integration needs to send data to a cloud API. For cases where data privacy is the hard constraint — patient data, confidential financial records, proprietary operational data — we build on locally-hosted models that run entirely within your infrastructure. We run Ollama with Llama and Qwen models in production and can design systems that use a cloud model for non-sensitive steps and a local model for sensitive ones within the same pipeline.
Can you add AI to our existing system, or does it require a rebuild?
In most cases, AI integration layers onto an existing system without a rebuild. We add model calls at the points in your current data flow where they belong — an extraction step before a form submission, a classification step before a database write, a summarisation step on a scheduled job. The audit identifies exactly where those insertion points are and what the integration requires. A full rebuild is only warranted if the existing architecture makes it technically impossible to add the integration cleanly, which is rare.
What does AI integration cost, and how long does it take?
Every engagement starts with a one-week AI audit priced from $4,500. The audit produces a written report and integration plan; it is useful on its own and is the basis for any build estimate. Build projects run from $15K to $75K depending on scope, delivered in fixed-fee phases — you know the price of each phase before work begins. Ongoing iteration and support is available as a retainer from $3,500 per month. We sign an NDA before discussing specifics of your stack or data. Full IP transfers to you on delivery.

Build it properly

Tell us what your operation needs.

Fixed-scope, fixed-fee phases. Full IP transfer on delivery. We respond within one working day, and there's an NDA before any specifics.