The babysitter problem in AI-assisted research
by Marcus Birkenkrahe

Last week I attended a very impressive seminar with Dr. Steven Ge on “Vibe Writing with Claude Code, Git, and Obsidian” (March 20, 2026), where he showed his workflow for generating manuscripts with the help of agentic AI.
What disquiets me is not the productivity gain, but the combination of opacity and the loss of process.
While I am a heavy user of AI for many of my teaching activities, I found myself unsettled by several of the implications of his demonstration. I’ve tried to capture that with the title for this post, which goes back to a remark Dr. Ge made: “[Claude Code] is like a babysitter for you.”
Being babysat is quite far away from where I like to be as a teacher and as a researcher — though I don’t think I do either of these as well as I should, so perhaps I need more help than I think.
As a teacher, I believe that speed and awesomeness should take second place to learning foundational skills. In my métier, computer science, this includes learning as much of the (often tedious) computational detail as possible — and then forgetting most of it, as I certainly do all the time.
What are the foundational skills that students perhaps don't want to learn, but should, given that AI appears to master most of them, at least on the surface, and often better than its human counterparts?
I think that these foundational skills are almost invariant across disciplines and applications. They can be found in two books by Hungarian mathematicians that I’ve carried with me ever since I was a teenager, George Pólya’s “How to solve it” (1945) and Imre Lakatos’ “Proofs and Refutations” (1976).
Pólya's[1] approach to problem solving is highly iterative and incremental, but it rests on spending most of one's time understanding a problem rather than jumping to a solution too fast. AI can help with the former, but it encourages the latter.
Lakatos[2] treats knowledge as an evolving system, using error and failure as central learning mechanisms. He encourages the development of competing models of explanation: his focus is on the path and not (only) the solution.
These dual approaches - hovering over the problem until it becomes clear, and looking at it from many angles with failure as the guiding beam - have shaped my own way of problem-solving for many years.
Confession: As a researcher, originally trained in mathematical physics, I don’t work with others easily. Perhaps irrationally, I distrust people who publish too much, and with too many others. As a writer, I prefer the life of a novelist who starts 50 novels and may only finish one, if ever. A goal like “one paper a week,” which Dr. Ge mentioned in passing, is not a goal I could share, though I applaud the experiment on procedural grounds (not because I think it will produce much value). After seeing his wizardry with Claude Code online, I believe he could do it.
But I also don’t have enough ideas. I’ve only ever had two, perhaps three good ones, which I keep reworking, partly because I am still interested in what I can build with them. I am more fascinated by building and making things than by finishing them.
Now I probably sound like an AI Luddite, but nothing could be farther from the truth: though I know Pólya’s and Lakatos’ positions very well, I still asked AI to summarize them for me to jog my memory. Otherwise I would have had to do some extra work, which for the purpose of this article might have frustrated me. But it would also have led me to reacquaint myself with both of these scientists, rekindling an intellectual relationship rather than just extracting bullet points.
The other thing that makes me uneasy, and it came up repeatedly during the seminar, is our lack of understanding of how AI works, and why it doesn’t when it doesn’t. This strikes me as non-scientific at the core. One might say that AI is just another Kuhnian paradigm—but if so, it is not driven by anomalies within the old system, but by the availability of a more attractive alternative. On one mundane level, AI is just another clown whose show is faster, louder, and more attractive than the old one.
Claude Code allows you to create agents that will do some of the work for you. But the way these agents work is not immediately transparent. This can be mitigated to an extent by providing strict guardrails (often with the help of AI), though the agent will occasionally, and unpredictably, escape the fence you've built for it. With this, the process itself becomes more distant: another layer of abstraction has been added. But this is AI abstraction, which is largely impenetrable in practice.

I fail to see how this will not reduce insight at the level of individual results; and if that scales, it devalues the project as a whole, because maximizing insight and transparency is at the heart of scientific research. Instead, the emphasis shifts to the final answer alone. But throughout history, much of the value of research has come from the struggle, from false starts and stops. That is also where the fun is, not in being done. As with cooking, which (ironically) I neither enjoy nor practice: the value of the meal is not just in the final dish.
This goes deeper: the agents are capable of creating connectors to other apps, and running these apps autonomously when properly initialized. This hides the mechanics of infrastructure, a much maligned body of knowledge because of its tedious detail. Knowledge of AI infrastructure, however, is increasingly essential; its masters will be functional generalists rather than hardware specialists.
“[AI] is like a babysitter for you” could imply that we have become the babies, though we should remain the adults. Just an expression, of course, but a telling one that reminds me to insist on maintaining an adult level of control as much as possible.
Having written all of that, next week I am going to use AI in class to teach machine learning with Python to students who have not written a line of code. Using AI (in this case, Gemini in a Google Colaboratory notebook) is the fastest way I know to onboard them - in a way, I am using AI to babysit them.
To come back to the start: I very much enjoyed Dr. Steven Ge's sense of humor, his open-mindedness, and his willingness to share his experience with AI. You should check out his Claude Code tutorial and the recording of his talk on 20 March 2026, courtesy of its host, Dr. Blaine Mooers, and his team at the University of Oklahoma Health Campus.
Footnotes
[1] Pólya's book was highly influential in developing awareness and mastery of problem-solving skills, and he was a teacher at ETH Zürich while John von Neumann, later a central figure in the development of modern computing, was a student there.
[2] Lakatos, the translator of Pólya's "How to solve it" into Hungarian, is remembered for his labeling of Darwinism, Freudian psychoanalysis, and Soviet Marxism as "pseudoscience".