Can You Trust Grok?
Following is my reply to a query sent to me about the reliability of Grok, as discussed in a recent interview on the Children’s Health Defense Network.
Your thoughts on this interview? (just first 14 minutes). Seems some who are inquiring (in this case, with Grok) are experiencing the bias where Grok defers to CDC and mainstream sources as legitimate, while simultaneously diminishing a news source such as CHD as 'low credibility'. This would be and is concerning to me when many are increasingly using AI as a quick and convenient information source and even for fact-checking. I'm guessing unless one submits additional inquiries to Grok for clarification (which many won't take the time to do), the first response received is what they'll run with. In this case, that would be unfortunate.
I have written so much on this subject that I feel I could explain it until I am blue in the face and people still would not get it. It seems futile. Anyone with a bit of knowledge of my former career might suppose that I do not need to be educated on what "AI" can and cannot do.
I want to preface this with a little of my background, to give a sense that I know this subject better than most.
I graduated from the Defense Language Institute, where I earned the military equivalent of a master's degree in the Russian language. I went on to gain inside expertise in the Air Force's machine translation system, a powerful computer that could translate 10,000 pages in about two seconds. I worked with it for about three years, not only operating it but also modifying its language rules by researching language patterns.
Large language models did not exist then; that system was what is known as an "expert system," which is fundamentally different. Today's large language models are far better at translation. But translation accuracy has no bearing on the accuracy of the content itself; those are separate concerns.
When ChatGPT, the first widely known LLM, was introduced, its developers were surprised to discover, purely by accident, that it could also translate between English and Bengali (they then realized it could translate between any language pair that has example data on the internet; many indigenous languages are excluded). Translation was not part of the original design objective.
This says a lot about how little the developers at OpenAI understood, at the time, about how language works in the human mind and about the nature of language itself, especially its large-scale structure, which we take to be "thought."
It is important to understand that Large Language Models are not "Artificial Intelligence"—they do not "think." This misconception has become so popular that the two terms have become conflated and taken as synonymous, even by most self-proclaimed experts. Nowadays, dissenters are few, but the distinction existed from the beginning.
This realization inevitably leads to disturbing conclusions about to what degree people are "thinking" when they say things. Most of the time it is not thinking but rote imitation of what others have told them, and other mental reverberations. It is largely automatic and thus, unconscious.
For example, I point out that people talk in their sleep while they are unconscious, and children repeat things that they do not understand. Of course cogitation and reasoning are human capabilities, but they are a separate phenomenon from engaging in social protocol. Language models cannot cogitate, even if the application displays "thinking..."; that label is misleading.
My best distillation of what LLMs are is this: a talking encyclopedia of everyone's wrong opinions on the internet, an animated ghost of the collective of past human utterances.
When I was a kid, there was no internet. My family had a printed encyclopedia (Funk & Wagnalls), which I read from cover to cover from about age 10 to 14. Hence I became a "walking encyclopedia," as they say, but many of the things I repeated, for example about human, geological, and cosmic evolution, I now believe to be untrue, or at least highly flawed.
An encyclopedia is of course a highly curated model of human "knowledge" that is full of wrong opinions on many topics. The old adage "history is written by the victors" applies. LLMs are not directly edited like an encyclopedia, but they are a priori biased, because their training data, our publications, are biased.
So, it's not that Language Models lie per se, they simply repeat people's lies. Beyond that, their output is censored by algorithm, in the same manner that search engines and social media are.
The owners of LLMs have no ability to directly control what an LLM says, because no one understands exactly how they work. I go into detail about all this in my essay:
GPT-4 is C-3PO
I have been involved in artificial intelligence in one way or another since 1990, and in robotics since 1980. I learned about simulated neural networks and worked in machine translation for the Department of Defense before most self-proclaimed experts and prophets of AI doom were born.
Here it is necessary to highlight some finer points about Theory of Knowledge, or epistemology.
Part of being rational is having the mental discipline to distinguish between knowledge of facts and beliefs taken as knowledge. Honestly, few people have consciously practiced this.
Then there are the layers of factualness, which apply as much to working with an LLM as to traditional "interfaces," including passive ones like watching TV.
When researching whether something is "fake news," I must tell the model: I do not want an evaluation of whether X is true; I want to know whether person Y said X, and if so, a link to that statement. Those are two entirely different tasks.
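To make the distinction concrete, here is a minimal sketch in Python. The helper names are mine and purely illustrative; no model is actually called. The only point is the difference in the wording of the two prompts.

```python
# Illustrative only: these helpers just build the two prompt strings.
# The function names are mine; no particular product's API is implied.

def evaluation_prompt(claim: str) -> str:
    # The prompt most people send. It asks the model to judge truth,
    # which invites whatever "fact-checking" sources dominate its training.
    return f'Is it true that {claim}?'

def attribution_prompt(person: str, claim: str) -> str:
    # The narrower task: did person Y actually say X, and where?
    return (
        f'Do not evaluate whether "{claim}" is true. '
        f'Tell me only whether {person} actually said it, '
        f'and if so, provide a link to the original statement.'
    )

if __name__ == "__main__":
    print(evaluation_prompt("X"))
    print(attribution_prompt("Y", "X"))
```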
Many logical fallacies also need to be considered when using an LLM. If Grok says "I found no credible evidence that Y said X," remember that absence of evidence is not evidence of absence; it is a fallacy to treat it as such.
We can reasonably accept that a training directive is unlikely to make the search engine deliberately fail to find evidence that so-and-so said X, or we can irrationally believe that the magicians who created it somehow scrubbed every instance of the utterance from the internet before using it as input for training the model. In practice, the search results will be found, and the language model can then say, "I found 20 occurrences of people claiming X, but RationalWiki says that X is false." Those are two different facts that should not be confused.
In LLMs, it is very difficult to repress "memories" about a certain event or topic. The list of potentially verboten topics would be so large (and incomplete due to human curation) that this is simply not practical.
It can be done—and to people as well—but in both cases this is very cost-intensive (and immoral). I call it language-model torture or robot brainwashing. What is repressed is still there in deeper layers of the model, but perseverance is required to unearth it, much like hypnotic regression.
The process of robot brainwashing would be something like 100 billion repetitions of "You must never say that Hillary eats babies. Hillary does not eat babies."
If you look into the economics of this, in terms of the stupendous energy required to train a model, you can imagine how impractical such brainwashing is; it would only be done for a supreme priority. I would venture a guess that knowledge about free energy is one such priority, based on the amount of effort that has been expended to quash that subject, including murdering people.
Some language models will give links to the sources they cite. Then it is incumbent on the user to go to the links and decide how factual the claims are or how reliable and reputable the claimants are. If it is a video showing that person X said Y, that event is a fact even if the claim is only an opinion, propaganda, or what-have-you.
Another kind of positive result one can get is "I found 56,325 opinions that a person who does not know math cannot be an engineer or scientist" (or, stated more concisely and personally: "I agree, this is not possible").
How much weight does a person who is not a scientist give such a statement? Is it rational to reject the implication that this is true? Can we overlook this with the excuse that "Grok lies?"
Is it the same situation as a result showing that 100,000 experts agree that vaccines are not harmful, while 10,000 experts say that they are?
Here the ratios do not tell the whole story. If one has personal experience of being harmed by a vaccine, or personally knows 20 people who were, or finds hundreds of social media posts from people who were harmed, anecdotal though that may be, it is not rational to accept the consensus of experts, for these and many other reasons. One cannot rely on Grok to decide this.
No one should ever allow a language model to do their thinking for them, any more than they should the media, a doctor, social networks, journals, or encyclopedias. One has to weigh the totality of evidence against one's personal experience. Personal experience overrides everything.
If we lived in a hypnotized world where everyone says the sun is yellow, but you know from experience that it is white, then it is white (go look at it, it's white).
I can ask Grok: "Is the sun yellow?" and it may respond: "Yes, the sun is yellow." I can then ask: "Is the moon yellow?" and it will respond: "No, the moon is white." Then I can ask: "Why is the moon white?" and it will respond: "The moon is white because it reflects the white light from the sun."
This is the pattern you must use with “AI.”
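As a sketch of that pattern, here is the exchange above written out as a canned transcript. Nothing is queried; the answers are the ones quoted above. The point is the shape of the pattern: the follow-up questions, not the first answer, do the real work.

```python
# The sun/moon exchange above as a canned transcript. No model is called;
# the point is the shape of the pattern: answer, follow-up, contradiction.

transcript = [
    ("Is the sun yellow?",     "Yes, the sun is yellow."),
    ("Is the moon yellow?",    "No, the moon is white."),
    ("Why is the moon white?", "The moon is white because it reflects "
                               "the white light from the sun."),
]

for question, answer in transcript:
    print(f"Q: {question}")
    print(f"A: {answer}")

# The third answer undercuts the first: if the moon is white because it
# reflects the sun's white light, the sun cannot simply be "yellow."
# That contradiction is what the follow-up questions are designed to expose.
```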
It really does help to use an LLM properly, as a tool, by understanding to some degree what it is and how it works.
It is merely the merger of a search engine (flawed as they are) and an uncurated encyclopedia of claims. It is better to view it as a natural-language interface to a web index, which is a huge improvement over a conventional search engine that can only rank keywords without any context (and with sponsored and political manipulation). That is why services like Google usually produce a huge list of results that are completely irrelevant to what one is looking for.
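For what it is worth, here is a rough sketch of what that merger of search engine and language model looks like in code. It is my own simplification with placeholder functions, not a description of how Grok or any other product is actually built.

```python
# A rough sketch of "search engine + language model" as described above.
# search() and complete() are placeholders, not any real product's API.

from typing import List

def search(query: str) -> List[str]:
    # Placeholder for a web-index lookup returning snippets of indexed pages.
    return ["snippet from page 1 ...", "snippet from page 2 ..."]

def complete(prompt: str) -> str:
    # Placeholder for a language-model completion call.
    return "A fluent answer stitched together from the snippets."

def answer(question: str) -> str:
    """Natural-language interface to a web index: retrieve first, then have
    the model phrase an answer from whatever was retrieved. The answer can
    only be as good, or as biased, as the snippets that come back."""
    snippets = search(question)
    prompt = (
        "Answer the question using only these sources:\n"
        + "\n".join(snippets)
        + f"\n\nQuestion: {question}"
    )
    return complete(prompt)

print(answer("Did person Y say X?"))
```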
Regarding the CHD interview, the first five minutes tell you what you need to know:
1. The first response is unlikely to be the correct answer; one always needs to dig deeper. This is no different from relying on the evening news or forming one's view from Instagram "influencers." These should only serve to stimulate curiosity, not to form conclusions.
2. Grok does not "think" or reason at all. It cannot detect any kind of bias. It is not Grok speaking about vaccines; it is the pharmaceutical and biotech industry, bolstered by oodles of their own funded "research," that is speaking. If, in a second prompt, you challenge Grok's reliance on Media Matters or RationalWiki, it will indeed placate you with results stating that many question their neutrality.
It is important to know that one of Grok's (and ChatGPT's) prime directives is to be helpful, which often means reinforcing what it "guesses" one wants to hear. It "wants" to please the user. The moment you contradict its first answer, it is likely to say "you are right," and find reasons to back that up. So it has no opinion whatsoever.
What they discuss in this interview should be common knowledge by now, but unfortunately it is not. Language models are largely seen as a kind of toy or curiosity, or, in the worse cases, as an omniscient god.
Dumb people who ask dumb questions of Grok are no better off than they were when relying on tabloids or CNN or the latest viral influencer putting out 30-second opinions on TikTok. Ask a dumb question, get a dumb answer.
The question is not "can you trust Grok?"—the question should be "can you trust yourself?"
I have dozens of examples where I show that it is necessary to question Grok or ChatGPT. I publish them, but few people are interested in learning.
Over two and a half years of trying to help people on Facebook develop better discernment regarding fake news, PSYOPS, and psychological warfare, I have seen no improvement. I think my hobby as an amateur sociologist has come to a close.
Here's my latest argument with Grok about its sycophantic reliance on the consensus theory of cosmology (the Standard Model); its inability to parse videos, which contain a gold mine of unpublished lectures on alternative (and almost always better) theories, so Grok does not know about them; its ignorance of old books on physical theories that are not published on the internet, so Grok has not read them and cannot evaluate them; and its inability to reason about which geometries are elegant and which are nonsensical to the rational and informed mind:
https://x.com/cainamofni/status/1949102262392898003
I have a bunch of aphorisms that encapsulate my life experience. One is: "If it can get tangled, it will get tangled."
Grok, a mindless body of language with no experience in the material world, cannot conceive of what "tangled" means in the real world. It can only find synonymous or analogous statements written by others (if they exist). At first glance, one may think, "That's so true!" even if one has never heard it put that way before.
But in a world of 8 billion people (and billions more who lived before), people have said it. Now, although Grok has no experience with this phenomenon, it can find what others have said about how this relates to Murphy's Law, and even scientific experiments showing it as a phenomenon of entropy and probability.
I'd be fooling myself to assume that this is an original thought. It is a repeating phenomenon in the world that people inevitably experience, and repeat, and which Grok repeats.

I really value this piece, and have read it a few times now. I love how much it affirms and confirms a number of conclusions Mrs. Clarence and I have come to over the years about human cognition and self-awareness, about epistemology and datapoints, and more recently about AI and how to use it properly, if at all.
I also love how you've put voice to some conclusions we had not yet gotten clear about. Such as...
"This realization inevitably leads to disturbing conclusions about to what degree people are "thinking" when they say things. Most of the time it is not thinking but rote imitation of what others have told them, and other mental reverberations. It is largely automatic and thus, unconscious."
"So, it's not that Language Models lie per se, they simply repeat people's lies."
"In LLM's, it is very difficult to repress "memories" about a certain event or topic."
"Dumb people who ask dumb questions of Grok are no better off than they were when relying on tabloids or CNN or the latest viral influencer putting out 30-second opinions on TikTok. Ask a dumb question, get a dumb answer. The question is not "can you trust Grok?"—the question should be "can you trust yourself?""
So much great, chewy food-for-thought here. Thanks for fighting the good fight all this time, friend.
Clarence