In recent years, there has been considerable debate about how AI could replace all of our jobs. For a long time I completely ignored the subject; most of it sounded like pure bullshit. I had already studied neural networks and understood the basics of LLMs, machine learning, deep learning, natural language processing and expert systems. I had even tried some generative AI tools, but none of them impressed me; they were buggy, full of hallucinations, and constrained.
Last year (2024), I looked at the topic more closely. A lot had changed and major improvements had been made, but there was still nothing close to replacing a real software engineer, despite what many headlines suggested at the time.
Since then, I have followed the field closely, conducting experiments and trying to understand how this technology could affect our daily work. Here’s how this experiment came to life: I decided to use AI solely to redesign the visual theme of my website (rafaelhs-tech.com). The idea was simple: change only the appearance of the site and nothing else.
Technology stack
I originally built my site in 2021 using Angular (version 7 or 8, I can’t remember exactly). It was pretty basic, just a few components with a space-themed design.
The original version
This was the original version. The journey with AI agents began with Claude 3.7. My first message was something like this:
“This is an Angular XX application and I want to change the style to something with a cyberpunk theme. The folders are organized by components and each component represents a page.”
Claude scanned the project, found the SASS files, and started generating new code… but then it started to freak out: it created React files inside an Angular project.
I tried to fix it, but it kept drifting, breaking the app, mixing files, generating React code over and over again. So I scrapped everything and started over.
This time, I was more specific: I mentioned the Angular version, described the folder structure, highlighted key files, and gave a complete overview of the project. The result? Slightly better visually, but the UI was still broken: misaligned pages, inconsistent styling, and worst of all, the code was a mess.
This kind of output was common. For those unfamiliar, mixing HTML, CSS, and JS in this way is bad practice in modern web development.
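To make "mixing" concrete, here is a minimal TypeScript sketch. It is not the code Claude generated; the two templates, the class name `neon-button`, the handler `onEnter()`, and the helper `findInlineMixing` are all hypothetical names made up for illustration. The first template puts styling and behavior directly in the markup; the second keeps only markup and an Angular-style `(click)` binding, leaving styles to a stylesheet and logic to the component class.

```typescript
// Hypothetical example of mixed concerns: styling and behavior live inside the markup.
const mixedTemplate = `
<button style="color: #0ff; background: #111" onclick="document.title = 'clicked'">
  Enter
</button>`;

// The same button with concerns separated: a CSS class and an Angular event binding.
// The class name and handler name are assumptions made for this sketch.
const separatedTemplate = `
<button class="neon-button" (click)="onEnter()">
  Enter
</button>`;

// Rough heuristic that reports which kinds of inline styling or scripting
// appear in a template string. It is a simple pattern check, not an HTML parser.
function findInlineMixing(template: string): string[] {
  const findings: string[] = [];
  if (/\sstyle\s*=\s*["']/i.test(template)) findings.push("inline style attribute");
  if (/\son[a-z]+\s*=\s*["']/i.test(template)) findings.push("inline event handler");
  if (/<script[\s>]/i.test(template)) findings.push("inline script tag");
  if (/<style[\s>]/i.test(template)) findings.push("inline style tag");
  return findings;
}

console.log(findInlineMixing(mixedTemplate));
console.log(findInlineMixing(separatedTemplate));
```

A check like this only catches the most obvious cases, but it shows the difference a reviewer would look for: the first template is flagged twice, the second not at all.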
It wasn’t exactly surprising, but it reinforced a key point: these agents still lack a real understanding of good development practices. I continued using Claude 3.7 for a while, but it became frustrating. I would ask for one thing and it would give me something completely different. After more than 15 prompts to align a single button, I realized it would have been faster to fix it manually, but I had committed to simulating a non-technical user’s experience.
I also tried CodeLLM, but it had another problem: it doesn’t retain long-term memory. If a message fails or the thread reaches the token limit, continuing in a new thread resets everything, with no memory of the previous context. That got really annoying.
Common hallucinations with Claude 3.7:
– It generated things that I didn’t ask for.
– Changes could not be reverted.
– It removed everything and regenerated the code in loops.
I then switched to Claude 3.5, which behaved more reliably and followed instructions more precisely. I continued modifying things and finally got this result:
The final version
After about 40 prompts, I got a decent visual result. But the code was still messy: CSS mixed with SASS, inline JS with HTML, strange file names, useless comments… nothing reusable. I decided to try other LLMs like Gemini 2.5, GPT o1 and GPT o4-mini. The outcome was just as frustrating, and each run produced noticeably different output (as expected).
Reflections
There is no doubt that AI is changing the way we code, learn, and approach problems.