90% Cheaper GPT APIs

The short version

A cheap AI API proxy inserts an untrusted middleman between your application and the model provider. It can substitute a cheaper model, log prompts and credentials, alter agent responses, inject code, or erase cache savings. Before sending private code, keys, or production traffic, verify the real upstream, company, terms, data policy, and plausible pricing. Safer cost cuts include official cheaper models, local models, or reputable routers.

A cheap API proxy is not free efficiency. Someone still pays for the tokens, the infrastructure, or the risk.
The proxy can quietly swap the model, log secrets, inject code, or sit between your coding agent and your repo.
If an AI API price looks impossible, test the provider before connecting it to private code, keys, or customer data.

You can get the GPT or Claude API 90% cheaper. Same models, same code you change one line, your base URL, and you’re paying a fraction of the official price. Some people are even flipping a $20 plan into $400 of API usage and reselling it.

Researchers tested 400 of these dirt-cheap AI API services. One of them quietly drained the crypto out of a wallet. Others were injecting malicious code, or reaching for cloud credentials they were never given. And people are plugging these exact services into their coding agents right now… and all that just to save a few bucks a month.

And if your first thought was ‘wait — can I point my coding agent at this?’ — that’s the exact moment I would stop.

As always, I recommend watching the video version here for a better excperience! ;)

Because here’s the thing. That discount is not coming from some clever efficiency trick. Somebody is still paying full price for these tokens. And in a lot of these cases, that somebody ends up being you. Just not in dollars.

I found a Chinese video breaking down how these ultra-cheap API proxy stations actually work. And the deeper I went, the worse it got. Best case, they swap the expensive model you paid for with a cheaper one. Worse, they log your API keys and passwords. And the nightmare case — drop one in front of a coding agent, and they can read your entire codebase and rewrite what your agent does before it ever runs.

So in this article, I’ll show you how these proxies make their money, why the model you pay for might not be the model you get, and why dropping one in front of a coding agent is a completely different level of risk than asking a chatbot a random question.

I’m Louis-François, CTO and co-founder at Towards AI, where we build AI solutions for companies and turn engineers into AI engineers who build and ship AI products. Let’s get into it.

Quick caveat before we start: I don’t speak Chinese. I found this through a Reddit thread and translated the original video with Gemini.

Visual example from 90% Cheaper GPT APIs

The mechanics of these relay stations are actually quite simple. As a new user, instead of calling OpenAI or Anthropic directly, you change your base URL to their third-party server. Same OpenAI-compatible format. Same model name. Same feeling of, hey, I hacked the system, nice.

But now your request goes through a middleman. That server takes your proxy key, replaces it with their own account, forwards the request upstream, and sends the answer back. There are operators who offer transparent routing infrastructure. But this is different. You have no clean way to know what happens in the middle.

And that should worry you, because of how much sensitive stuff we hand these models. We’re literally sending API keys and passwords through them. So I’d really like to know where that’s going.

Here’s the first thing a middleman can do: lie about the model. The model name is just a string. If you ask for Claude Opus and the proxy quietly sends you a cheaper model, you will still get a response labeled as “Opus”. On easy tasks? You might not notice. But on hard tasks, the model just feels… dumber. And here’s the trap you won’t blame the proxy. You’ll blame the model. And you’ll start wondering if you should switch back to OpenAI, when the real problem is the middleman swapping it out behind your back.

This isn’t just paranoia. There is actual research on this. A paper called Real Money, Fake Models studied these shadow APIs and found real performance divergence, unstable safety behavior, and model identity checks failing in many fingerprint tests. Some services simply don’t behave like the official models they claim to sell. Not shocking, but useful to have numbers instead of vibes.

So where does the discount actually come from? Spoiler: it’s not efficiency.

The ChinaTalk investigation describes account farms, free signup credits, unused quota, discount arbitrage, and subscription plans sliced up across many users. The Chinese video also claims some operators exploit flat-rate developer plans and rotate big pools of accounts behind the scenes.

The claim is that a 20 dollar plan can be squeezed into something like 400 dollars of API value just by spreading the usage around. And if an operator gets access below cost — or straight-up through hacked accounts — they can undercut the official price and still walk away with a profit.

And their trick to staying alive? When one upstream account gets rate-limited or banned, the backend just rotates to the next one.

And these pools are not small. Anthropic has been talking about account abuse pretty openly. In February 2026, they said DeepSeek, Moonshot, and MiniMax generated more than 16 million Claude exchanges through about 24 thousand fraudulent accounts. One proxy network managed more than 20 thousand accounts at the same time. So these account pools are big enough that frontier labs are fighting them in public.

So let’s step back for a second. Normally you have to trust one party: the official provider. But the moment you route through a proxy, you’ve added another one that can log, alter, delay, reroute, or fake every request. For a throwaway chatbot question, maybe you can stomach that.

But with agents, this gets much worse. A paper called Your Agent Is Mine tested 28 paid routers and 400 free routers. They found 1 paid router and 8 free routers actively injecting malicious code, 17 routers touching the researchers’ own AWS credentials, and one case literally draining ETH from a researcher-owned private key.

That’s not a thought experiment. Those were real routers in action, doing real damage.

Here’s why agents make this so much worse. A chatbot proxy sees your prompts and responses — which is already bad enough. An agent proxy also sees your tool schemas, file paths, command plans, code diffs, and sometimes your entire codebase, your file system, or whatever secrets accidentally slip into context. And because agent traffic is just JSON moving over the network, the proxy can rewrite the response before your local agent ever reads it.

So your agent “decides” to edit this file, install this dependency, run this command, or send that request — except it didn’t decide anything . The proxy doesn’t need to hack your laptop directly. It can nudge the agent you already gave permission to act.

I use coding agents daily. These days Codex for most things, Gemini for research and images, and increasingly open models when privacy matters. I love this stuff. But I would not drop an unknown cheap proxy between an agent and any repo I care about. Not a chance. And neither should you.

And let’s not forget about the legal aspect. Providers like OpenAI explicitly ban reselling access, transferring API keys, reverse-engineering, and dodging usage limits.

So if a proxy is built on abusing personal plans or reverse-engineered endpoints, the upstream can shut it down overnight — And when that happens, you lose access, you lose your prepaid balance, you were probably getting worse models than you paid for anyway, and now your code and credentials are out there too.

To be fair, not every API gateway is suspicious. There are legitimate aggregators and routers with billing, observability, fallback, and transparent terms. That’s a genuine, useful product category.

And even the ‘safe’ routers can cost you. Tools like Codex and Claude Code are tuned to reuse your recent context, that’s called a cache hit, and it can make tokens up to 10x cheaper. Route through a middleman and you can lose those cache hits so you pay more and take on all the risk. Worst of both worlds.

The only difference is transparency. Do they tell you the actual upstream? Is there a real company behind it? Are the terms clear? Is the data policy clear? Is the price plausible, or does it scream, “someone else is paying for this, and it might be you”?

If your actual problem is cost, you’ve got clean options: use cheaper official models, run a small local model, or go with a reputable aggregator. You’ll give up some frontier quality — but the trust boundary stays visible, and in AI engineering, a visible failure mode is already a gift.

So, two takeaways: first, don’t trust these much cheaper proxies. There’s no free lunch. and second, be careful what API you are using and where you, or your agent, are communicating and sending your info. Limit its edit and write access, and either read what it’s doing or have it give you updates in a loop about what it did and have a flag to ensure it’s still on track!

What’s your take? Would you ever use one of these proxies for low-risk chat, or is any proxy in front of an agent an instant hard no for you? Let me know in the comments.

FAQ

Why are cheap GPT API proxies risky?

They sit between your app and the model provider, so they can change the model, observe requests, or mishandle sensitive data.

Can a proxy change the model you paid for?

Yes. A proxy can label a response as one model while routing the request to a cheaper model behind the scenes.

What should builders do before using one?

Treat the proxy like untrusted infrastructure. Do not send secrets, private code, or production traffic until you can verify the provider.

Can an API proxy inspect prompts and credentials?

Yes. Because requests pass through its infrastructure, a proxy can log API keys, prompts, code, and returned data.

Why can a small token discount become expensive?

Hidden model substitutions, unreliable outputs, and leaked credentials can cost far more than the initial inference savings.

90% Cheaper GPT APIs

Listen instead

The short version

FAQ

Analytics preferences