AI Orchestration in Logistics: Visibility vs Execution

Ask a VP of supply chain what their control tower actually does, and the honest answer is: it tells them something's wrong. Ask what happens after the alert fires, and the answer gets quieter. Someone opens the TMS, checks the WMS, logs into a carrier portal that has no API, and fixes it by hand. That's the actual state of AI orchestration in logistics at most large shippers and 3PLs right now. Very good at seeing problems. Still almost entirely manual at resolving them.

The numbers back this up. Only 23% of supply chain organizations have a formal AI strategy, per Gartner's survey of supply chain leaders. Gartner also predicts 60% of supply chain digital and AI adoption efforts will fail to deliver their promised value by 2028. Its 2025 Hype Cycle puts generative AI in supply chain squarely in the trough of disillusionment, with fewer than 30% of pilots reaching production.

None of this is because the models got worse. It's because logistics runs on more legacy surface area than almost any other industry. EDI still carries 78.4% of B2B transactions industry-wide, and the systems processing it were built for batch file transfer, not for an agent that needs to reason across a shipment's full lifecycle in real time. Stack a TMS, a WMS, an ERP, and a dozen carrier portals with no API on top of that, and "orchestration" stops being a model problem and turns into a systems-integration problem that most AI vendors quietly route around.

Visibility platforms didn't invent this gap. They're the clearest evidence of it.

project44 is the best-known name in supply chain visibility for a reason. Gartner has named it the leader in real-time transportation visibility for five straight years, and it tracks shipments across more than 267,000 carriers in 186 countries. If any company has earned the right to claim it solved supply chain AI, it's this one.

Which makes it worth reading what project44 itself says about why AI agent pilots in supply chains fail: agents lack operational context (carrier history, regional constraints, shipment urgency), and tooling is fragmented across vendors with no clear owner when something breaks. No competitor needs to say this about project44. project44 said it about itself, diagnosing the exact problem its own visibility product doesn't solve alone.

Its answer has been to build outward from tracking into execution: an AI Agent Portfolio covering freight procurement, disruption management, exceptions, slot booking, and carrier onboarding, followed a month later by a no-code deployment layer called Autopilot. The results project44 reports are real: a jump from 500 to 30,000 weekly agent-driven carrier interactions, and a claimed 4% cut in freight spend. But look at where those agents run. On project44's own visibility graph and carrier network, wired into a customer's existing SAP TM, Oracle OTM, or Blue Yonder deployment as, in project44's own words, "a layer on top". Implementation still takes three to six months, and independent reviews cite integration with legacy systems as a recurring pain point.

That's not a knock on the product. It's the natural ceiling of a company that built its business on visibility data, now extending into execution scoped to the transportation workflows it already touches: carrier communication, procurement, appointments. What it isn't is a model of the rest of the customer's stack. The WMS. The ERP. The finance system that actually has to process the chargeback once the exception gets resolved. The Loadstar put it plainly when project44 announced its TMS push: visibility is meaningless in isolation. We'd go a step further. So is execution that only reaches as far as one vendor's own product surface.

Four stages, one gap

Gartner frames the last two decades of supply chain software as four stages: business intelligence dashboards in the 2000s, control towers in the 2010s, command centers and digital twins in the 2020s, and, only now emerging, orchestration platforms that don't just display a problem or model it but prescribe and perform the fix. AWS frames the same shift more bluntly: control towers can display, but they can't reason. A retailer AWS worked with cut root-cause diagnosis of a supply disruption from half a day of an analyst's time to a single-sentence question answered in 30 seconds, once the system moved from showing data to reasoning over it and acting.

Most of what's marketed as AI orchestration in logistics today is still stage three with agent-shaped branding on top. A digital twin. A control tower with a chat interface bolted on. A dashboard that reasons a little further before handing back to a person. Stage four needs something none of that has: a durable, validated model of how the specific company's systems actually connect to each other, kept current, and governed well enough that an agent can act on it without someone re-checking every step.

The workflows that stall are the expensive ones

This isn't an abstract gap. Truck driver detention alone cost the US trucking industry an estimated $15.1 billion in 2023 between direct expense and lost productivity. Every hour of that detention is a negotiation across a TMS, a driver app, and often a shipper's own dock-scheduling system, exactly the kind of thing a visibility dashboard can flag but not resolve. Freight invoices run error rates of 5-15%, and a mid-market 3PL can lose $30,000 to $300,000 a year to mistakes that manual review only catches, at best, 60% of the time. A better dashboard doesn't fix any of that. What fixes it is something that can pull the order record, check the rate confirmation, reconcile the discrepancy, and post the correction, across systems that were never built to talk to each other, with a human approving anything above a set dollar threshold.
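To make that last step concrete, here is a minimal sketch of the reconcile-and-route decision, assuming the order record and rate confirmation have already been pulled from their systems. Every name in it (FreightInvoice, RateConfirmation, reconcile) and the $500 approval threshold are hypothetical placeholders for illustration, not any vendor's API or a recommended limit.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class RateConfirmation:
    load_id: str
    agreed_total: Decimal  # what the shipper agreed to pay


@dataclass(frozen=True)
class FreightInvoice:
    load_id: str
    billed_total: Decimal  # what the carrier actually billed


@dataclass(frozen=True)
class Resolution:
    load_id: str
    discrepancy: Decimal  # billed minus agreed; positive means overbilled
    action: str           # "none", "auto_correct", or "needs_approval"


def reconcile(
    invoice: FreightInvoice,
    confirmation: RateConfirmation,
    approval_threshold: Decimal = Decimal("500.00"),  # hypothetical limit
) -> Resolution:
    """Compare an invoice to its rate confirmation and decide who acts."""
    if invoice.load_id != confirmation.load_id:
        raise ValueError("invoice and rate confirmation refer to different loads")

    discrepancy = invoice.billed_total - confirmation.agreed_total
    if discrepancy == 0:
        action = "none"
    elif abs(discrepancy) > approval_threshold:
        # Large corrections wait for a person to sign off.
        action = "needs_approval"
    else:
        # Small corrections can be posted without review.
        action = "auto_correct"
    return Resolution(invoice.load_id, discrepancy, action)
```

Decimal is used instead of float so the comparison against the threshold isn't thrown off by rounding. The hard part in practice is everything this sketch assumes away: getting the two records out of systems that were never built to talk to each other.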

We've written before about why 40% of AI projects fail for close to this exact reason: the model can call any individual API correctly and still fail to sequence them into a finished workflow, because nobody encoded how the company's systems actually depend on each other. Vertical AI beats general-purpose tools precisely because it starts from a specific company's constraints instead of a generic capability. And as we argued in measuring AI ROI at the workflow level, a workflow that completes 40% of the time doesn't just underperform. It can end up costing more than doing the job by hand, once the escalations are counted.

What an orchestration layer actually needs

Getting past stage three means extracting a validated model of the company's own systems. Not a vendor's network graph, but the customer's actual TMS, WMS, ERP, EDI feeds, and the dock-scheduling tool that never had an API to begin with, pulled from sources that already encode how the work gets done: OpenAPI specs, SOPs, runbooks, database schemas. It means validating that model against a staging environment before an agent ever touches production. And it means a control plane that gates the actions that matter (a chargeback dispute above $10,000, a detention negotiation with a repeat carrier) behind human approval, while letting the routine 90% run end to end. That's the architecture we've built at Hintas: a knowledge graph extracted from a company's own sources of truth, exposed to any agent through two tools instead of one per endpoint, governed by a control plane that decides what's allowed to run on its own.
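As an illustration only, the gating rule described above could look something like the sketch below. It is not Hintas's implementation: the ProposedAction shape, the gate function, and the carrier ID are hypothetical, and the two rules simply mirror the examples in the paragraph (a chargeback dispute above $10,000, a detention negotiation with a repeat carrier).

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum


class Decision(Enum):
    RUN = "run"
    HOLD_FOR_APPROVAL = "hold_for_approval"


@dataclass(frozen=True)
class ProposedAction:
    kind: str        # e.g. "chargeback_dispute", "detention_negotiation"
    amount: Decimal  # dollar value at stake
    carrier_id: str


# Hypothetical policy values mirroring the examples in the text.
CHARGEBACK_LIMIT = Decimal("10000")
REPEAT_CARRIERS = frozenset({"CARRIER-042"})


def gate(
    action: ProposedAction,
    repeat_carriers: frozenset = REPEAT_CARRIERS,
) -> Decision:
    """Decide whether an agent's proposed action may run unattended."""
    if action.kind == "chargeback_dispute" and action.amount > CHARGEBACK_LIMIT:
        return Decision.HOLD_FOR_APPROVAL
    if action.kind == "detention_negotiation" and action.carrier_id in repeat_carriers:
        return Decision.HOLD_FOR_APPROVAL
    # Everything else is routine and runs end to end.
    return Decision.RUN
```

Keeping the policy as plain data plus one pure function means the rules can be reviewed and tested separately from whatever agent proposes the actions.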

Visibility was never the hard part. Fixing what visibility finds, across systems nobody designed to cooperate, is. That's what the rest of this series digs into: why voice agents for freight brokers still stop at the load board, and why bespoke AI agent builds don't survive past the engagement that built them.

If you're interested in early access, reach out at hintas.com.