As I recommended in a previous article, we picked a workflow. We applied AI. It worked. Then we optimized it and got the cost down.

That version is technically true, but it leaves out the part that actually matters. What mattered was not just that the invoicing workflow worked. What mattered was what happened after it worked.

As some of you know, at GCS, we started using AI to automate part of our invoicing process. The first goal was not elegance or cost savings. It was really just to see if we could actually make it work.

Not demo "work." I mean: could this workflow actually work in the real world?

That is the first job in the 2-6-4 process. Prove it. So that is what we did.

We got it working in our actual environment (Mach1). We connected it to our accounting systems (QuickBooks), our time sheets (Excel), email, databases, and the way the business actually operates (Ratesheets, Rules). Then we let it run for a few weeks, and watched (closely).

And once it had been running long enough to stop feeling like a demo, we started to measure the cost.

That is when the second lesson showed up.

The workflow worked, but the first version was too expensive. Not necessarily for us, but definitely for the kinds of organizations I help. GCS is small, so we can get away with a lot. Our customers usually cannot. They have more volume, more complexity, and less room for waste.

That is a very normal place to land, by the way. In fact, I think it is the right place to land.

If your first version is trying to prove the workflow, it should probably lean a little heavy, and most certainly a little ugly. It should use stronger models than you think you will need long term. It should probably be more brute force than elegant.

That is fine. We don't want perfection to be the enemy of good enough.

At this point, you are buying proof. I think of it as tuition to create high value later. Just be careful not to get stuck. This is a real trap I've seen hundreds of times.

Once we knew the process was real, we started looking at what was actually driving the cost per invoice. After working through some of the gaps, we made a series of changes that cut the cost per invoice by over 30%. So where did those savings come from?

At the beginning, we used the big expensive models. That made sense. We were trying to reduce uncertainty, not be frugal. But once the workflow stabilized, it became obvious that we were paying for more model than we needed. So we lightened that up.

We also realized we were passing too much into the system. In AI terms, that means too many tokens. In the first version, we were feeding in all rules for all invoices. That sounds reasonable when you are trying to make sure nothing gets missed, but it is wasteful. Later, we narrowed it down so the system only saw the rules for the specific customer tied to that invoice.
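
To make that concrete, here is a minimal sketch of the idea, with a hypothetical `Rule` shape and customer IDs that are illustrative, not our actual schema: filter the rule set down to the invoice's customer before anything reaches the prompt.

```python
# Hypothetical sketch: scope rules to one customer before building the prompt.
# The Rule shape and customer IDs below are illustrative, not a real schema.
from dataclasses import dataclass

@dataclass
class Rule:
    customer_id: str
    text: str

def rules_for_invoice(all_rules: list[Rule], customer_id: str) -> list[Rule]:
    """Pass only this customer's rules into the context, not every rule we have."""
    return [r for r in all_rules if r.customer_id == customer_id]

all_rules = [
    Rule("acme", "Net 30 payment terms"),
    Rule("acme", "PO number required on every invoice"),
    Rule("globex", "Net 60 payment terms"),
]

# Only the two "acme" rules end up in the prompt; fewer rules means fewer tokens.
prompt_rules = rules_for_invoice(all_rules, "acme")
```

The savings compound because the filtering cost is paid once in ordinary code, while the token cost would otherwise be paid on every single invoice.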

Another important change was giving the AI better access to smarter tools and data. Instead of trying to push every possible detail into the prompt, we let it use tools and databases to look up what it needed in a safer and more controlled way. That reduced overthinking and cleaned up the process.
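
The pattern looks roughly like this sketch, where the rate sheet and the `lookup_rate` tool are hypothetical stand-ins: instead of pasting the whole table into the prompt, the agent is given a narrow function it can call when it needs a specific value.

```python
# Hypothetical sketch: expose a narrow lookup tool rather than pushing the
# full rate sheet into the model's context. Names and rates are made up.
RATESHEET = {
    ("acme", "senior"): 185.0,
    ("acme", "junior"): 120.0,
}

def lookup_rate(customer_id: str, role: str) -> float:
    """Tool the agent calls on demand; the model never sees the whole table."""
    key = (customer_id, role)
    if key not in RATESHEET:
        raise ValueError(f"No rate on file for {customer_id}/{role}")
    return RATESHEET[key]

# The agent asks for exactly the fields it needs, nothing more.
rate = lookup_rate("acme", "senior")
```

This is also the safer shape: the lookup is deterministic code you can test, so a bad answer is a raised error rather than a plausible-looking hallucination.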

We changed how the work got assigned by creating a ticket system that went straight from email to agent, with no wasted cycles and no polling.
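
A stripped-down sketch of that shape, using an in-process queue as a stand-in for whatever ticketing backend you actually run: the mail provider's inbound webhook pushes a ticket, and the agent blocks until work arrives, so nothing sits in a loop checking an inbox.

```python
# Hypothetical sketch: event-driven email-to-agent ticketing, no polling.
# queue.Queue stands in for a real ticket store; the handler names are made up.
import queue

tickets: "queue.Queue[dict]" = queue.Queue()

def on_email_received(message: dict) -> None:
    """Called by the mail provider's inbound webhook the moment mail lands."""
    tickets.put({"subject": message["subject"], "body": message["body"]})

def next_ticket() -> dict:
    """The agent blocks here until a ticket exists; no wasted cycles checking."""
    return tickets.get()

on_email_received({"subject": "Invoice 1042", "body": "Time sheet attached"})
ticket = next_ticket()
```

The point is the control flow, not the data structure: work is pushed to the agent when it exists, instead of the agent repeatedly asking whether it exists.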

And once the workflow was tighter, we started moving more of the work to cheaper models that no longer needed a large context window. That was only possible because by then we understood the workflow better, had cleaned up the inputs, and had done the work to tighten the system.
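
The routing logic can be as plain as this sketch, where the model names and the two criteria are illustrative assumptions, not our actual configuration: once inputs are tight, routine steps go to the cheap model, and only long-context or exception work goes to the strong one.

```python
# Hypothetical sketch: route each workflow step to the cheapest adequate model.
# Model names and routing criteria are placeholders, not a real configuration.
CHEAP_MODEL = "small-context-model"
STRONG_MODEL = "large-context-model"

def pick_model(needs_long_context: bool, is_exception: bool) -> str:
    """Default to the cheap model; escalate only when the step truly needs it."""
    if needs_long_context or is_exception:
        return STRONG_MODEL
    return CHEAP_MODEL

# A routine invoice with trimmed inputs runs on the cheap model.
routine = pick_model(needs_long_context=False, is_exception=False)
```

Notice the ordering dependency: this function is only safe to write after the input cleanup, because it is the trimmed context that lets the cheap branch be the default.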

That is really the lesson.

The first version proved the workflow. The second version proved the economics.

I think a lot of teams try to do those in reverse order. They start by worrying about optimization, LLM logos, architecture, or cost before they have even proven the workflow works. That usually leads them to optimize the wrong thing, or deliver a solution to something that is not a problem to begin with. I call this Fake Progress.

For us, it made more sense to prove that invoicing could work with AI in the real world, let it run, learn where the drag was, and then tighten the system once the shape of the process was clear.

And if you are a larger organization processing serious volume, this is not even close to the bottom of the cost curve.

There is still more optimization available:

  • reducing names and verbose fields to IDs where possible

  • tightening datasets even further so the model only sees exactly what it needs

  • batching work together where it makes sense

  • using smarter tools to retrieve, validate, and route data instead of pushing more into the model context

  • with a modest hardware investment, moving more of the work to local models that are effectively free to run and stay inside your own data center
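
As an illustration of the first item above, swapping verbose field values for short IDs before text reaches the model is a few lines of ordinary code. The mapping table and field names here are hypothetical, not a production lookup.

```python
# Hypothetical sketch: replace long names with short IDs before prompting.
# The mapping and record fields are illustrative examples only.
CUSTOMER_IDS = {"Acme Industrial Holdings, LLC": "C017"}

def compact(record: dict) -> dict:
    """Substitute IDs the model can echo back; fewer tokens in and fewer out."""
    out = dict(record)
    out["customer"] = CUSTOMER_IDS.get(record["customer"], record["customer"])
    return out

row = compact({"customer": "Acme Industrial Holdings, LLC", "hours": 12})
```

Per invoice the saving is tiny; across serious volume, trimmed input and output tokens on every call is where the remaining cost curve lives.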

But that is not where I would start.

I would start where we did.

First prove the workflow is real.
Then learn what it costs.
Then start cutting the waste.

That is the sequence.

And in my experience, it is usually the right one.
