By Gabriel, 19 Oct 2025, updated 26 Oct 2025
This post is about a real-life experience with ChatGPT, where it failed to provide accurate information about its own billing system. Or: how I lost $10 of OpenAI credit! It illustrates some shortcomings of LLMs, even for a state-of-the-art model on a known domain of knowledge.
I had a bad experience last week with OpenAI’s billing system: I lost $10, and ChatGPT provided wrong information on a topic I would expect it to know well.
Me: I have added $10 credit to the wrong workspace. Can I transfer it to my personal workspace, same OpenAI account?
OpenAI ChatGPT: Yes, no worries. Super common issue, happens a lot. Just ask OpenAI support to move it.
Me: sweet as
OpenAI support chatbot: No, cannot be done. Credits are tied to the workspace they were added to.
Me: … ChatGPT suggested it was done for other users. Check again.
OpenAI support chatbot: No, cannot be done. Credits are tied to the workspace they were added to.
Me: Whom to believe?
OpenAI chatGPT: oh I see the contradiction. I meant that up until around mid 2024 it was possible. As of late 2025 it is not. Goodbye.
Me: I lost $10 🤦♂️
While OpenAI’s ChatGPT is free to use (the basic version, with limitations), the OpenAI API that gives access to the underlying models is a paid service, billed pay-as-you-go based on usage. I wanted to experiment with the API for some personal Python projects, so I created an OpenAI account and added $10 of credit to it.
I added the credit to the default “organization” workspace instead of my personal workspace, unaware of the distinction at the time. Then I went on to create an API key and started using it.
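My first experiment looked roughly like the sketch below. This is not the exact code I ran; the model name and prompt are placeholders, but it is the general shape of a streaming call with OpenAI’s official Python SDK.

```python
from openai import OpenAI, BadRequestError

client = OpenAI()  # reads OPENAI_API_KEY from the environment

try:
    # A streaming chat completion; model and prompt are placeholders.
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
except BadRequestError as err:
    # 400 errors from the API land here; this is where the
    # "organization must be verified" message showed up for me.
    print(f"API call rejected: {err}")
```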
The first API call returned a 400 error with the message: “Your organization must be verified to stream this model. Please go to: https://platform.openai.com/settings/organization/general and click on Verify Organization. If you just verified, it can take up to 15 minutes for access to propagate.”
hmmm…
To verify an organization on OpenAI, one needs to provide an identity document (driver’s licence, passport, national ID card). I noticed that, today and for Australia, it uses “Persona”, a third-party service. This process makes sense to me as a way to deter industrial-scale fraud through fear of consequences, since the fraudster is identified. But I won’t be using the API much; surely there is a way to use it for small experiments without this layer of security. I’ll ask ChatGPT, it should know!
Next are the two conversations I had to work through the issue, one with OpenAI’s ChatGPT and one with OpenAI’s AI support agent.
What I expected to be a quick 10-minute chat solving my problem ended up being a disappointing 30 minutes of back and forth with contradictory answers that solved nothing. The OpenAI support agent was straightforward and clear about what is and is not possible, while OpenAI’s ChatGPT went into detailed, reassuring explanations that were incorrect. (Was that outdated information or simply hallucinated knowledge?)
I’m not willing to go through the identity verification process for such small usage. So the next step is probably to forfeit the $10 credit (a refund is not possible in that scenario) and add another $10 to my personal workspace when I get back to this project.
Setting aside the fact that OpenAI is the company behind ChatGPT, it is today a major worldwide technology company, and I would expect all state-of-the-art models to have good knowledge of it.
The best way I can describe this kind of challenging scenario for an LLM: the question requires reasoning (over existing knowledge), i.e. the straight answer is not written repeatedly somewhere in the training data because of the variables involved (here the amounts, the multiplicity of workspaces, policies changing over time), and it is not verifiable by running code.
Another way to describe these scenarios: an LLM is not a rule-based system, it is a statistical model. Totally different logic inside. When only a small number of rules is involved, it works well (even with a huge corpus of data). But as the number of rules involved increases, the chance that the statistical model correctly simulates the output of a rule-based system decreases rapidly. Later in the discussion, when the user provides crucial new information, the statistical model regains ground and can adjust its answer accordingly (as it did here), but by then the damage is done.
It also seems that ChatGPT is biased towards giving a positive answer. The OpenAI support agent doesn’t mind saying no.
The chat tool shows a lot of specific knowledge about the topic (see excerpts below); it is hard to tell whether that knowledge is outdated or hallucinated.
Knowing the answer now, this display of knowledge is probably just a case of sycophancy. Not an everyday word for me (!), but it is the one OpenAI used to describe the now-addressed issue with the older GPT‑4o model (see here in April 2025 and there in May 2025). Or it could be plain, classic LLM hallucination.
Was it really possible before to move credit from an organization workspace to a personal one within the same account?
no idea
Is GPT-5 incorrect on that because its knowledge cutoff predates a change of policy?
GPT-5’s knowledge cutoff is September 2024. I won’t speculate on whether there was a change of policy or when it happened. Yeah, OK, it is possible that things have changed in the last 12 months.
But I would counter that, later in the chat, GPT-5 seems aware of it. We know the knowledge cutoff is not a hard limit: the ChatGPT bot is more than just an LLM, it is a “unified system” that combines a router (deciding which model to use based on the conversation) with access to several tools, especially web search. So I would expect it to provide the correct information about OpenAI’s billing system on the first try.
What about a refund?
I tried but no luck: “Credits […] are generally not refundable, except in cases required by law or if there is confirmed account compromise.” - OpenAI’s AI support chat, October 2025
What model is powering OpenAI’s AI support chat (assuming it is based on one of their published LLM models)?
I couldn’t find any official information. I risked asking ChatGPT itself (even though at this point my confidence in its answers on that topic is shaky at best). GPT-5 (or GPT-5 Mini) answered that, while it doesn’t know for sure, it “is probably backed by one of OpenAI’s ‘assistant-capable’ models (like o3, o4-mini, GPT-4o, or something in that family)” (see shared chat above).
I found it interesting to report this frustrating experience, but I do use LLM-based tools a lot, and quite successfully for some use cases. My first use is as a coding assistant, where it does save me time. Another promising use of LLMs for me is structured data extraction from unstructured text (I will publish a post on that in a couple of weeks). But it has been said before and is worth repeating: the LLM answer to some prompts can be deceptively convincing, yet completely wrong. It is hard to tell a confident wrong answer from a correct one. Of course, when the incorrect answer is challenged later in the conversation with appropriate context from the user, the model will correct itself (as it did here), but by then the damage is done, the confidence is shattered (and I lost $10 of credit :-) ). LLMs/generative AI have already demonstrated their usefulness in some areas: coding, writing, summarization, image-to-text description (still with some caveats). For others they are not useful at all.
I feel like in a lot of traditional businesses (potential users of these new AI tools) the attitude is “That’s a great tool, impressive. What else can I do with it?”. Slightly different from the hype generated by AI companies and AI investors! (“a significant leap in intelligence over all our previous models”, “state-of-the-art performance”, “provide expert-level responses”).
When the tool provides a good answer to a prompt, the questions for a business wanting to leverage that capability to offer better services or products to its customers are: will it scale? Is it stable? How will it handle edge cases? Can we put measures in place to prevent or mitigate the bad scenarios? Overall, is it a benefit or a hindrance for my customers and my business?