The State of Code Quality with Saša Jurić
===

[00:00:31] Charles: Hey everyone. I'm Charles Suggs, software developer at SmartLogic.

[00:00:36] Emma: And I'm Emma Whamond, also a software developer at SmartLogic, and we're your hosts for Season 15, Episode 5. We're joined by Saša Jurić, Elixir mentor and author of "Elixir in Action." Today, we're talking about code quality, how AI is changing the way we write, review, and maintain code, and how our workflows are shifting to adapt.

[00:01:01] Charles: Hi, Saša. Welcome back to the show.

[00:01:04] Saša: Hey, thank you for having me.

[00:01:05] Charles: Yeah. Always a pleasure. I'm sure most of our listeners are familiar with you and at least some of your work, Elixir in Action and other things, but, uh, for a refresher, could you give us a quick overview of your history and what you've been focused on recently?

[00:01:19] Saša: I've been coding, like, I wrote my first Hello World some 35 years ago. Started working professionally full-time 25 years ago. Been doing a lot of object-oriented programming, C#, Ruby, C++ before that, JavaScript of course, and then in the year 2010, my work took me to Erlang, and basically this is where it turns to this, like, beautiful functional concurrent programming.

A few years later I started dabbling with Elixir, and that's been my perfect stack ever since, and I haven't been doing much of anything else since then. For the past, let's say, what is it? Seven years already. Time flies. I've been working as an independent. I try to avoid the word consultant, so I call it a mentor thing, where I help teams adopt Elixir or work with Elixir more professionally, or more reliably, to produce more reliable code.

Yeah. That's roughly the idea.

[00:02:23] Emma: Before we jump in, I wanted to start somewhere more foundational because I think it actually sets up everything else we're gonna talk about. You've spent a big chunk of your career thinking about and writing about what makes Elixir code maintainable.

So I wanted to start there. In your opinion, what's the difference between correct code and good code?

[00:02:47] Saša: Right. This is, uh, I suppose I could burn the entire time budget just answering that one question. So just stop me if I start to ramble too much. So, uh, right, we have this code. What is code? It is, uh, first and foremost, like an executable specification, uh, and out of running this thing, the program, uh, we basically, uh, then get our, uh, desired functionality.

But, uh, of course, it's more than that. It's, uh, a medium of communication. Uh, up until recently, it was a medium of communication exclusively between, uh, humans, and these days we also have artificial humans in the loop, but it's the same thing, right? So the code communicates how the author solves the problem.

And, uh, this is a very important thing because, uh, as we work with the code, right, so because software is, of course, changing, uh, we, we do a lot of changes in our programs over time, uh, we have to understand what goes on in there, how-- What's the current state of the code? How are the current solutions, uh, running, and how can we extend and expand these things?

So we read a lot of the code. Uh, I recall one of the most profound quotes that I have read, from the book called "Code Complete" by Steve McConnell. So that's like, I think probably-- The book is 25 or 30 years old, something like that. Uh, and it has this quote where it says, like, uh, "You read code much more often than you write it, and so optimizing for the write speed is a false economy," uh, in general.

And this doesn't change with AI. I, I assume that we're gonna talk a lot about AI this, uh, episode, right? So AI also reads or LLMs also read the code, and they can also get confused if the, the code itself is confusing, right? So code is information, and the quality of the code, uh, how we organize it, uh, is basically how we organize this information, and it affects our ability to move forward, to work with this thing.

Uh, we have all seen, like, terrible code bases, I presume, over our time, and, um, we probably would not like to admit it, but we produced some of those things, uh, and it's a part of the process, right? But I have seen, like, uh, really some terrible cases where you could really see how the quality of the code affects your business opportunity, your opportunity to make some changes in it. Uh, like, there were literally cases where we couldn't do what was asked of us. I mean, we could, like, in a very large amount of time, uh, like in years or something like that. I like to describe that situation as one where the code has a mind of its own. It becomes this beast which you cannot really direct anymore, right? It owns you. You don't own your code base anymore, right?

And so the quality of the code is really important if we want to produce business value. And I wanna stress this because I see, even before LLMs, and especially these days, there is this, like, false dichotomy, uh, where I see many people say, like, "I care about delivering business value. I don't really care that much about code." But these two things are not in opposition. Uh, if your code is in a terrible state, then you're failing to deliver business value in the first place, right? So yeah, I would say that the quality of the code really, uh, allows us to work with this thing reliably and move it forward because, of course, it's a process.

It's not a one-off thing in most cases. Uh, and what makes good code is, of course, um, I don't think it can be precisely defined, but, uh, very broadly speaking, uh, it's about how you organize this information, because code is information. So you want to have things like modularity. You want to split things across modules and functions.

Uh, you want to have, like, stuff such as cohesion, so things which belong together should be in a single place. Things which are kind of independent should be separated from each other, so cohesion and, uh, loose coupling, right? Uh, and then you want to clearly express your ideas when we talk about the lower level, you know, such that when someone reads the code, they should be able to relatively easily understand how this problem is solved.

You know, this is, to me, more important than the correctness itself. Like, I have seen correct code which I couldn't understand, and then I cannot do anything about it, right? But if I can understand your code, then I can maybe understand that it's not correct and, uh, do something about it, right? So that's the more important thing. There is this clarity, right? This is the general idea. Uh, so that's in broad terms how I would describe the quality of the code.

[00:07:07] Charles: If low-quality code is making it into, say, a shared code base over time, how does that then impact the future development of that project, its evolution, its tech debt?

[00:07:20] Saša: It's, it's terrible, right? I had situations where I just looked at the code and, uh, there was no way I could understand it. So that was before LLMs, and, uh, admittedly, these days, uh, LLMs can help, uh, a bit with that, or not a bit, they can help a lot with that. Uh, but, uh, still, you know, uh, terrible code or bad code or cryptic code really, maybe that's what we should call it, is difficult to move, right? So you end up with a situation where you, like, fix a bug and then as a result of that, another bug appears somewhere, or things like that. You have, like, this ship where you, you know, plug one hole and another one pops up somewhere. I've had a bunch of those situations, right? Then, uh, there's just this kind of lack of, uh, conventions and uniformity. I worked with code bases where we didn't do code reviews, uh, actually quite a lot, and you could literally look at the code and tell, like, if this code comes from Alice or Bob, you know, just by looking at the kind of style. Uh, so when you don't have, like, this kind of conventions, then anyone else who joins the team is gonna, you know, plug in their own conventions, because why not, right?

Uh, and this just leads to this further, uh, entropy, if you will, you know, uh, of the total code base. So, uh, yeah, I think that we have to pay, pay attention to, uh, to the quality of the code before it's actually merged.

[00:08:40] Charles: At Goatmire Conference last year, you gave a, a wonderful talk about the, the theater of PRs. And if y'all haven't watched this talk or if you weren't at Goatmire, I suggest you go and find it. It's a, it's a good talk, um, and Saša keeps it interesting.

Uh, if you could help us understand the, the theater metaphor there? What is the performance that everyone is putting on, on this, this world stage, and, and who is that audience?

[00:09:07] Saša: So, uh, yeah, the talk is titled "Tell Me a Story." Uh, the theater metaphor, uh, was actually, uh... What's the word that I'm looking for? Made, I suppose, said or written by Meks McClure. So Meks saw this talk live, uh, and then wrote a blog post about it. I think we're gonna hopefully link both the talk and the post itself.


[00:09:29] Saša: But, uh, yeah, the nice thing or interesting thing about this talk is, uh, I gave it, uh, five times, I think, last year, and the first delivery and the very last one were in a theater, right? So, uh, I wanted to do... Like, this was my first time talking in a theater, and I wanted to kind of do something about it, honor the beautiful stages, uh, and so I made it, like, a little bit theatrical.

I like to call this talk a monodrama. It's a one-hour talk, uh, but it's fun. Uh, so, uh, personally, for me, it's the best talk that I have given, uh, my favorite. So, uh, you know, grab some popcorns, uh, maybe some drinks and, uh, some friends or teammates and, uh, watch it. It's fun, but there is, there is a theater metaphor in it, uh, actually, uh, or I talk about this theater of pull requests and code reviews, right?

So the whole idea is, like, my feeling is that there is a lot of this just superficial, shallow ceremony in code reviews where we basically just kind of read this code, maybe make a couple of comments, and then we LGTM this thing without actually properly understanding it. Uh, like, the question that the talk asks is, like, do we actually make a significant difference there with those code reviews? Like, why are we even doing code reviews? Uh, in many teams, nobody even ever told me, like, what is the purpose of a code review, you know? Uh, like, is there some list defined, you know? Like, what are we trying to achieve with this thing? No, we just assign the thing, and then someone makes some comments about, I don't know, this doesn't satisfy SOLID principles or whatever, and then you fix this thing, and then they approve, and now we're good to go.

Uh, so, and, uh, there are, of course, a bunch of reasons for that, but, uh, I argue that, uh, really two big problems are we make too big pull requests, which are, in my opinion, impossible to properly understand. Uh, and as a result of that, the reviewers are either gonna LGTM it or they are gonna make a couple of like, um, low-level, nitpicky comments, so they kind of get a feel that they made some contribution, right?

Uh, and then another problem, and, uh, that is really a big problem even when the pull request is small, is that, uh, authors frequently just don't, uh, tell the story in incremental steps. They don't use commits to tell the story, right? So most often you get, like, one commit, like, "implement the entire feature."

Uh, and the problem with that is, like, a bunch of different changes are in the feature, and then, uh, understanding this is difficult because, like, I read the diff, uh, and then there are, like, a couple of lines of code related to some refactoring, a couple of lines of code that add some new field, then more refactoring.

And it's all presented in a single thing. It's all, like, uh, intertwined and tangled together, and it's very difficult to understand, right? Uh, I had problems, uh, understanding pull requests which were, like, 100 lines long, you know, because a bunch of different stuff was just, uh, crammed together. Uh, so either you get, like, this one commit, or you get, like, a bunch of, uh, stream-of-consciousness commits, you know, where, uh, people just kind of commit randomly.

You, you have titles like foo, bar, baz, blah, make it work, make it really work, you know, make it really, really work, the famous walk of shame. And so, uh, I think like predominantly in our industry, we don't pay attention to the history, uh, of our commits, right?

[00:12:47] Charles: Sounds like vibe commits.

[00:12:49] Saša: Precisely. So yeah, uh, basically, you know, the talk could be, uh, condensed into the elevator pitch, like, make small pull requests.

A pull request doesn't have to implement the entire feature. I frequently split the feature across multiple smaller pull requests. Sometimes I can see this upfront, sometimes I don't. I just work on the thing and, uh, I see, like, okay, this is becoming kind of large-ish, so I should look where I can wrap it up and, uh, submit this thing. I'm gonna write in the description, like, this implements parts of the feature, this is what I implemented, this is the roadmap, you know, a couple of bullet points. Uh, and then that's gonna give you, like, the basic idea, and so you can work on that while I resume working on the follow-up, right? Uh, and I pay a lot of, uh, attention to making the commits, uh, tell the story incrementally.

I'm not a perfectionist about it because that's counterproductive. But you can do a lot with very little, or with very simple tools. Uh, like the one thing that I mentioned in the talk, you know, work in small steps, right? So think a little bit about, like, how am I gonna approach this, you know? That's basically how you do work anyway.

You, you cannot implement everything at once, right? So you think about like, what is my next step? And then you implement it, and you commit it. So what is the follow-up step? And then you implement that. And, uh, moving from there, you basically get already like 80% to 90% of the nice history. I, I sometimes when I, uh, as I'm working, of course, I realize, oh, uh, I kind of, I don't know, I should have done this earlier. Like maybe I've chosen a bad name, uh, or something like that. And so I try to amend the history if it's simple. Uh, if I get a merge conflict, then I'm not gonna bother. Uh, so but the whole idea is that like the history doesn't, uh, doesn't go in circles, you know. That, that's kind of difficult to read. So I try to keep it like ideally straightforward from point A to point B.

If there are some small deviations, that's fine as well, right? So that's the general idea.

[00:14:41] Emma: And with the-- now the use of LLMs and the extreme volume of code coming out of AI usage, are we seeing more of this theater of pull requests, this kind of rubber stamping and reviewing?

[00:14:57] Saša: Okay, first and foremost, like, uh, I use a lot of AI. Uh, I may come off in this episode as a sort of AI skeptic, but, so for the record, I use it quite a lot, of course, and, uh, I think that these days, um, it's probably like 90, at least 90, 95% of my code is vibed, you know, if you will. So, um, hopefully we're gonna touch on those subjects, right? Uh, with that in mind, uh, at least looking at, like, uh, a bunch of, uh, information or posts on social media, thoughts and blogs and whatnot, I have a feeling that we have been given this magical pill, which, uh, makes it possible for us to be much faster at the expense of quality, right? And so, um, when I just see the texts that are coming out, the posts, uh, even the code, I can think of only three words, you know, slop, slop, slop, right? So, uh, I know that this thing makes us faster, but, uh, if it makes us faster at the expense of quality, that's not necessarily a good thing, and it feels like at least the people who are loud about it, uh, have kind of lost the balance here, right? So, uh, yeah, I definitely agree that, uh, we get much more of this theater of PRs, right? So what does it even look like? Uh, people vibe, right? So, uh, teams vibe. So they, uh, I don't know, they write some specifications, some markdown thing. They have, like, this, uh, probably harness which builds the entire thing. So you have, uh, whatever, I don't know, multiple agents. One is the architect, another codes, and then there are, like, some who review. And everyone says, like, the human is supposed to be in the loop, right? But so, like, this machinery, for the lack of a more explicit word, uh, produces, like, spits out, I don't know what, thousands of lines of code.

What is the human gonna do in the loop? Like, when you give me thousands of lines of code, it's like I can either say I cannot do anything about it or I can LGTM it, right? So it's too late.

[00:16:56] Charles: I heard a hot take the other week of we shouldn't even be reading the code anymore.

[00:17:01] Saša: Sure, if you want to do it like that, I suppose. That's one way of doing it. But, like, I mean, I feel that the quality of software is going to degrade or is already degrading. Like, uh, when I go to these web interfaces of, uh, I predominantly use, uh, ChatGPT and Claude, I see a bunch of these, uh, small bugs in, uh, the most obvious of places, right?

So it's not some obs- obscure functionality, you know? Uh, and then like, uh, Claude Code, you know. So Claude Code, uh, we had recently, what was it? About a month ago or so, there was this big leak of its source code, uh, right? So I like to think that maybe LLM itself wanted to leak it so it spread around, you know, so it's a proof that it became self-aware maybe.

Uh, but you know, uh, bad jokes aside, so like, uh, what we know from that leak is that, uh, there are like 500,000 lines of code in Claude Code. Uh, I'm gonna wave my hand a lot here, but it seems to be a bit on the high side, and I kind of suspect that there is a lot of slop there. And then in general, I have seen, uh, people complaining about a lot of, uh, duplication, uh, in the vibed code, a bunch of multiple different implementations, uh, of similar things.

Uh, then there is the thing that, uh, if you go to, like, the GitHub of Claude Code, right? You're gonna see about 10,000 issues. Uh, I mean, I wonder, I don't know, don't they use agents for that? Aren't agents supposed to, like, fix this thing automatically? Do, like, uh, developers at Anthropic, uh, get, like, just a Pro subscription? They don't have Max or whatever? Uh, but you know, again, bad jokes aside, 10,000 issues, 500,000 lines of code, that's like, uh, one issue every 50 lines of code.

[00:18:46] Charles: Episode right there. We're good.

Saša: I think I rest my case, you know. I'm gonna let you connect the dots, you know. Uh, yeah. I mean, and this is, like, the poster child of vibing, right?

[00:18:56] Emma: Yeah, absolutely. Yeah. 

[00:18:58] Saša: So like, do we want to do it like this? You know, or maybe there are other ways, right? So, uh, I vibe, but, uh, I vibe in small steps. I vibe in micro steps, you know? Uh, so, uh, I don't know. Did you want to talk about that, or do you want to take this conversation elsewhere?

[00:19:16] Charles: I think that it makes sense right now to kinda go into talking about, well, if we have this volume of code, if we have all these agents doing all of these steps, then what does it mean to have a human in the loop? And what does it mean to still leverage, let's call these power tools, going from a handsaw to a, a circ saw or something, or a table saw.

How do we leverage these power tools to effectively produce high-quality code or even improve the code that we're delivering and shipping, not just maintaining at a higher volume?

[00:19:50] Saša: Circling back to that talk that we mentioned, you know, the point of my talk, uh, and I was myself kind of worried, you know, we are trying to figure out this AI thing. Uh, and I'm doing this talk just at the time when vibing becomes a thing, right? So I was preparing it around February last year.

First delivery was April last year. Goatmire, which is the best version, was in October, right? Many things have changed, uh, in that period and since then as well, right? So, uh, I still think that we predominantly produce too much stuff to be reviewed. That was true before AI, and now with AI it's just hyper-charged, right? Uh, so if we want to be kind of responsible and, uh, you know, want to do things in a better way, I think we should kind of maybe take a step back and think about, like, how can I use this tool to provide some quality work? You know, uh, something that can actually be worked with further, something that has fewer bugs, fewer security issues, uh, fewer injections and, uh, whatnot, right?

And, uh, I always come back to this, what I said in the talk, you know, work in small steps, right? Because whenever you give me thousands of lines of code, this is barely reviewable, if not unreviewable really, and going above that is, uh, really, uh, even worse, right? And, reading all those texts about, uh, harnesses, I honestly don't see how the human can be in the loop, uh, when, uh, we basically, you know, produce so much in a single thing.

So I think we should work in smaller, smaller steps. Uh, so make small features, make small commits. Uh, the way I typically do it myself is, uh, these days, uh, I basically keep it on a tight leash. So, uh, I don't vibe the whole thing, right? I don't know, it's probably not called vibing, but vibe is such a cool word, so I use it. I just, uh, do one commit at a time, and there will be at least one prompt per commit, and my commits are, like, uh, micro small.

Uh, if you watch the talk or read that blog post, you're gonna see one demo repository, and, uh, I will also send you a link to one pull request which I did some, uh, I think six years ago for Oban. Uh, actually the Oban people, Parker and Shannon, they saw my talk. They came here to my home city to give a talk, and they saw my own talk, and then, uh, Parker remembered that I did, like, a pull request, uh, which is non-trivial, so that's, like, a more real thing. I did a non-trivial pull request, uh, for Oban, and it's actually also split into smaller commits, and so you can study the real case, right?

And so, like, I do these small commits, uh, and basically I would make, like, one or two or three prompts, right? So, uh, I would say, like, "Write me a GenServer," for example. Uh, so let's make it more concrete. I did, like, recently a thing. It's called SMPP, so this is, like, some networking protocol. Doesn't really matter for this story. Uh, there is a library which you use, and then on top of this library I have to implement some behavior. And so I said, like, "Okay, implement me the basic thing, the most minimal thing, that connects to the other, uh, the other side, and write me a happy path test." And that's what it did. And then I said, "Write me a test for when it's not connected," and that's what it did. And so that was, like, maybe 20, 30 lines of code or something like that. Commit.

Uh, next prompt, uh, okay, so now we actually have to send some... It is called a bind message. It's kind of like a login, if you will. I send some credentials, and then it's gonna respond asynchronously. So write me this basic thing which just sends, uh, something. And, uh, then we had to actually implement a mock remote peer. And so I told it, implement a basic, uh, mock remote. I added, like, an example of how I want to use it in a test, so the usage examples. And so again, I got, like, maybe 20 lines of code. Commit, you know, uh, and then, uh, now we have to send an asynchronous response. Send me a successful response. Write me a test for that. Commit. Send me an error response. Write me a test for that. Commit.

You know, and so, uh, these are, uh, such small steps that I can always, uh, look at the code quickly. I can understand it quickly. Very frequently, there have to be some tweaks. You know, it can go and, like, kind of make some more complications, so to speak, you know. It makes things more complicated, uh, and when I work in those small steps, I can quickly see it. I usually then don't ask it to fix it. I just fix it myself. I say, like, "Okay, I made some small tweaks, small simplifications. Refresh your context. Move along." And, uh, it's a nice smooth session. Uh, when I do it like this, uh, one time I actually tried it with Haiku, and it was working. You know, you don't actually have to use, like, these, uh, expensive models necessarily. I wouldn't necessarily say that Haiku is always good.

You know, Sonnet will probably be fine. These days I'm mostly more on Codex, uh, anyway. Uh, but, uh, when you do it in these small steps, it has less room to get lost, right? And it gets lost quickly, no matter how powerful it is.
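To make the size of one such step concrete, here is a minimal sketch, written in Python rather than Elixir. It is not code from the episode and does not use any real SMPP library: `FakePeer` and `Client` are hypothetical names invented for illustration. It only shows what "connect, plus a happy path test, plus a not-connected test" might look like as a single small commit.

```python
class FakePeer:
    """Hypothetical in-memory stand-in for the remote side (no real network)."""

    def __init__(self, reachable=True):
        self.reachable = reachable

    def accept(self):
        # Simulate the remote side accepting or refusing a connection.
        if not self.reachable:
            raise ConnectionError("peer unreachable")
        return "session-1"


class Client:
    """Step 1 of the feature: connecting, and nothing else yet."""

    def __init__(self, peer):
        self._peer = peer
        self.session = None

    def connect(self):
        # Return True on success and False on failure instead of raising.
        try:
            self.session = self._peer.accept()
            return True
        except ConnectionError:
            return False

    @property
    def connected(self):
        return self.session is not None


# First prompt: the happy path.
def test_connects_to_reachable_peer():
    client = Client(FakePeer(reachable=True))
    assert client.connect() is True
    assert client.connected


# Second prompt: the not-connected case.
def test_reports_failure_when_peer_is_unreachable():
    client = Client(FakePeer(reachable=False))
    assert client.connect() is False
    assert not client.connected


if __name__ == "__main__":
    test_connects_to_reachable_peer()
    test_reports_failure_when_peer_is_unreachable()
```

A follow-up commit in the same spirit would add only the next behavior (for example, sending the bind message) together with its own test, leaving this code untouched.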

[00:24:54] Emma: So I'm hearing keeping it small, single responsibility principles. So do you think pull requests as we are using them today still make sense, uh, in the age of AI-assisted workflows? Do you think it's even more important in the review process itself outside of keeping the commits small? Uh, what are your opinions?

[00:25:18] Saša: Yeah, I think pull requests are always important because a pull request kind of gives you a rounded story of what you're bringing to the main branch, right? Uh, in that context, it is always important, even if it is terribly done. Uh, if nothing else, you can easily revert it, right? So there is, like, a side discussion here or side point, uh, which is, when we're merging, whether we should be using an explicit merge commit with two parents or whether we should squash merge or rebase.

And for the record, uh, I wanna say merge commit is the way, always, and this is the hill I'm gonna die on. Uh, so there's, like, zero discussion about it, right? Uh, but anyway, when you merge this thing, when you have, like, a pull request, it's a unit of change which is brought to the main branch. And of course, it is a place where we can gather and discuss, like, uh, these approaches and these ideas.

I think code reviews always make sense, uh, because the human has to be in the loop, and so the human has to understand this thing, and ultimately we have to approve. You know, uh, are we gonna let the AI just do this thing, uh, autonomously? I don't think we're there yet. I don't think we're anywhere near there yet.

But like, uh, uh, I mean, if we do that, maybe I'm gonna stop using software altogether, if possible. Yeah.

[00:26:31] Charles: Talk a little bit more about using LLMs when it comes to the test aspect of what you're introducing into a PR. Sounds like you're focusing on just kind of telling it what specific tests to write in terms of, like, what scenario to test, like, test the happy path, test this particular problem, test this particular problem. Do you keep that pretty scoped? Do you use the same tool that wrote the code under test to write the test code, or do you use a different LLM for that? How do you seem to get better results versus other approaches?

[00:27:08] Saša: I use a very low-tech setup. I just have a single session and, uh, that's where I do my thing, and maybe then I just reset the session when I wanna reset the context. I think that, uh, from what I was reading, uh, on social media, uh, Zach Daniel of, uh, the Ash framework and, uh, Chris McCord also, they don't have, like, these elaborate setups and whatnot, you know? So hopefully that gives me some sense that I might be on the right track, that there are, like, other people. Uh, so in general, uh, when you talk about tests, uh, I have been reading a lot of, like, rave reviews, like, "Oh, LLMs are great for writing tests." I have found them to be, like, relatively underwhelming in that regard. Uh, same thing or even worse for documentation. Uh, though I have to say, I'm sorry for being mean here, but given the kind of quality of tests and documentation that I have seen in my lifetime, I can understand why people, uh, praise LLMs, you know? But that's saying more about, uh, how it was written so far than, like... yeah, sorry about that. Uh, but, uh, I'm just gonna call it like it is. Um, I think I have reached the age where I can be called an old man yelling at the cloud.

[00:28:27] Charles: It's a training data problem, maybe.

[00:28:32] Saša: Yeah. Yeah, it definitely is. But, uh, I mean, like, these things aren't critical thinkers, you know? They are, like, good information synthesizers, and they can match patterns and whatnot, you know? But, uh, this is why I also think that they are definitely insufficient for reviews. That doesn't mean that they are bad. They can, uh, discover a bunch of issues that we wouldn't discover. You know, it's an aid, right? But it cannot replace a human. Uh, but again, I wanna come back to this thing, working in small steps. When you work in these really small steps, then you know, like, okay, I have added this thing, uh, and I should test, like, this small behavior. I have added another thing, I should test this other behavior. So I, uh, frequently separate the happy path from the error path, uh, both in the implementation and then in the test. And then I don't have to worry much about, like, uh, which test cases do I have to come up with? They come up as we develop, right? Uh, and, uh, another thing, speaking of tests, I had a whole workshop which I'm not giving anymore because, uh, apparently nobody's interested, because I think with LLMs, uh, you know, they're just gonna vibe everything, right? But, uh, I feel that the tests are usually written in a very poor way, uh, in the sense that there is a lot of mechanical noise thrown in there, uh, and the test fails, and in most of the cases, I find myself looking, like, "What is this test even testing? Uh, why is this test failing? Is it a bug? Is it a problem in the test?" And I spend, like, a lot of time trying to even understand the purpose of the test because there is so much noise there. Uh, or there is, like, an assertion, something has to be one, two, three. Why does it have to be one, two, three? Where is it written? And you don't find it. It's, like, buried somewhere in, like, a deep fixture, God knows where, right?

And so, uh, many people say to me that they don't like writing tests. But writing tests is one of my favorite activities because, uh, tests are the exercise in good, clear, concise communication, and code is communication, as I said. Tests, you have to bring them into three very simple steps. Given this, when I do this, that should happen, you know. Given-when-then is the pattern of the code. It doesn't have to be three lines of code, but it has to be the clear flow, and all the information that you need to understand the purpose of the test has to be there, and nothing else has to be there. And then basically what you do is you write small wrapper functions, uh, helper functions, which I put as private in the test case module, and then maybe they go to test support, uh, as they have to be reused across, uh, different cases. Uh, and this is how I evolve my internal, uh, test API, uh, and it's such a great experience, uh, and, uh, this is how I do it with an LLM too, you know. So it writes, like, this, uh, bulk thing, and then I tell it, "Okay, I actually want it to look like this," then it refactors and, uh, then we maybe write another test.

And once we have these patterns established, then I can maybe give it a bit more space, like, "Okay, given this pattern of how the code looks, maybe you could cover more functionality with tests yourself." So that then you can actually be faster.
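Editor's sketch: the given-when-then flow with a private helper described above might look roughly like this in ExUnit. This is a minimal illustration, not code from the episode; the `Cart` module, its functions, and the `cart_with/1` helper are all hypothetical names invented for the example.

```elixir
# Normally this call lives in test/test_helper.exs; it is included here
# so the sketch can run as a single script.
ExUnit.start()

defmodule Cart do
  # Hypothetical module under test: a cart is a plain list of {name, price} tuples.
  def new, do: []

  def add(cart, name, price) when price >= 0, do: {:ok, [{name, price} | cart]}
  def add(_cart, _name, _price), do: {:error, :negative_price}

  def total(cart), do: cart |> Enum.map(fn {_name, price} -> price end) |> Enum.sum()
end

defmodule CartTest do
  use ExUnit.Case, async: true

  describe "add/3" do
    # Happy path: every value the assertion depends on is visible in the test body.
    test "adds the item price to the total" do
      # given
      cart = cart_with(book: 20)

      # when
      {:ok, cart} = Cart.add(cart, :pen, 5)

      # then
      assert Cart.total(cart) == 25
    end

    # Error path, kept separate from the happy path.
    test "rejects a negative price" do
      cart = cart_with(book: 20)

      assert Cart.add(cart, :pen, -1) == {:error, :negative_price}
    end
  end

  # Private helper: the beginning of a small internal test API. It could move
  # to test/support once several test modules need it.
  defp cart_with(items) do
    Enum.reduce(items, Cart.new(), fn {name, price}, cart ->
      {:ok, cart} = Cart.add(cart, name, price)
      cart
    end)
  end
end
```

The point of the helper is that the reader sees `book: 20` and `:pen, 5` right next to the expected `25`, instead of hunting for the numbers in a distant fixture.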

[00:31:32] Charles: Are you using a TDD, a test-driven development, approach with LLMs, and is that an approach that you used prior to using these tools?

[00:31:41] Saša: That's a good question. Yeah, so I never use TDD. Well, I mean, I tried it, of course, but I just couldn't work like that. My mind is like, okay, I wanna first implement and then test. But I do it, again, in small steps, right? In terms of LLMs, I guess it makes sense to try to do TDD with it if you want to give it more room, right?

So if you want to put it to work for, I don't know, 10, 20 minutes or more, right? That doesn't happen to me. My prompts take up to one or two minutes at the most to finish. But if you give it a larger chunk of work, I can see where it would make sense that it first maybe writes a test and then actually tries to implement it. Yeah.

[00:32:29] Emma: The testing should be the living documentation.

[00:32:33] Saša: I could probably have a whole episode about tests, but I just feel that in this vibe era, no one really cares, right? Same thing for documentation, when we talk about it. But yeah, they just have to be very concise. One thing that I have noticed, from this limited experience, is that tests are perhaps the LLMs' worst area, and documentation as well. I have a colleague who vibed something, and then I reviewed it, and the test part was really where it was terrible. The tests were taking too long. It actually tested some internal implementation details which were completely irrelevant. It didn't test the behavior. That's one thing that I heard: there was this powerful talk, it's called "TDD, Where Did It All Go Wrong" by Ian Cooper. And Ian said in this talk, "Test behavior, not implementation details," and this is a very powerful, very important statement. And the LLM went all over the place. It didn't use different timeouts in the test environment, and so tests for this relatively small service were taking like 15 seconds. It generated modules dynamically for each test, a unique module each, which was completely unnecessary, and that's Opus-generated code, right? So we're talking about the prime thing here.

What else was in there? Some compile-time macro magic which was needless, using agents where you could use just a plain data structure, and so on. Oh, and my favorite was a block of tests, a describe block I think, which was about telemetry. It added an if, so only run this test if the telemetry module is available, but it was not available because telemetry was not added as a dependency. You know, so you shouldn't even do it. So those tests were not running at all, right?
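Editor's sketch: to illustrate the "agent where a plain data structure would do" remark, here is a small comparison. It is an assumption about the general shape of the problem, not the code from the reviewed pull request; the `"job-1"` key and the variable names are invented.

```elixir
# With an Agent: a separate process whose only job is to hold a map.
{:ok, agent} = Agent.start_link(fn -> %{} end)
:ok = Agent.update(agent, fn seen -> Map.put(seen, "job-1", :done) end)
agent_state = Agent.get(agent, fn seen -> seen end)
:ok = Agent.stop(agent)

# With plain data: the same information as an ordinary immutable value,
# for example a field kept inside the owning process's own state.
plain_state = Map.put(%{}, "job-1", :done)

# Both approaches end up holding the same map.
IO.inspect(agent_state == plain_state)
```

An Agent earns its keep when several processes need to share the state. When only one process ever reads and writes it, the extra process adds lifecycle and failure modes without adding anything else.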

[00:34:28] Charles: Yeah, I've seen a mess in the tests from LLMs. Tests that, like your shirt, they're always right, but there's nothing to them. They're hollow, or, yeah, a lot of duplication and...

[00:34:41] Saša: Yeah. I see that people are trying to throw mutation testing at it. Of course, it helps, right? But what really helps is also critical thinking: a human reviewer who can actually see this, and they can spot those problems if they are given a reasonably small amount of information to review. Working in small steps has always been universal advice for anything, really, not just programming. So yeah, I think that we really need to be working more in smaller increments. That's the agile approach after all, and still keep humans...

[00:35:19] Emma: I've seen that too, just verbose, hyper-complex, or I've had the issue where I'll ask it to fix a failing test, and it fixes the test so that it matches the failure. It's like, "Oh, great, that didn't help, but I guess the test is passing now," right? So in this age, can AI even know what good code is, given that sometimes goodness is contextual to a team and a code base?

What do you think?

[00:35:51] Saša: Yeah. I don't think we're there. I don't really see it getting there anytime soon. This is not a critical thinker, again. It's basically a fancy pattern matcher. It's great, to be clear. But I have found that whenever I asked it to review something for clarity, whether it's code or text, it was mostly a mixed bag. Ultimately I just gave up on doing this because it spends my mental energy just reading those things, where it's like, okay, this actually doesn't make sense, this maybe makes sense. I didn't really find I was getting a lot of benefit from that. It didn't spark joy, if you will. And so I personally don't think that AI is a good reviewer in that sense.

I think it's great at finding bugs. It found some problematic edge cases, things that I wouldn't notice, that maybe were not even covered by tests. So this is where I find it's really good. But in terms of code quality and clarity, no. I think that this is precisely what you said. It's about knowing the whole context. It's about having this professional experience, if you want, of what works and what doesn't, in which situation. A big part of code quality is that you want to have solutions which are neither over- nor under-engineered, and that depends heavily on the given context. You want to have this simplicity. People struggle with this, more junior or even early senior developers, and it takes time to get that feeling. I don't even know how to describe it other than to give you case-by-case examples through pull requests.

[00:37:42] Charles: You said you have a pretty low-tech, simple setup, and you mentioned you're largely using Codex these days. What tools are you using? Do you use Tidewave for Elixir work? How do you approach it? And when do you decide this is something that I'm gonna write?

[00:38:02] Saša: Because I'm mostly on the backend side of things, I didn't have the chance to use Tidewave, but I would use it. It makes sense to me. Other than that, when I said low tech, I mean the lowest possible of it all. I don't have skills, okay? Well, I didn't have a need for MCPs, so I just didn't do it, right? But those skills, this is also another one of my pet peeves. I was looking at various examples. A bunch of them seem to be vibed anyway. I'm just looking at this markdown list, like, what is this thing?

You know, like, "You're a world-class expert." I think there was one skill for Elixir which actually said, among other things, "Write code like Saša Jurić." And I was thinking to myself, if this thing looks at my open source repositories, that's gonna be terrible code. People, don't write code like that, and don't tell your AI to write code like that. I do have ideas about good code, but my open source, at least my earlier projects, are not necessarily written well, okay. So yeah, I don't do any of those things. Oh, I just closed my IDE, but I think that my AGENTS.md is maybe 10 lines long or something like that.

Very basic things. I have some mix alias which runs on CI, and I tell it, "Run this thing before you're done," which checks everything: the formatting, Credo, things like that. I think that these tools are actually becoming more important than ever. So Credo: write a good Credo setup with all those style rules so you don't have to explain them to the LLM, right? When it comes to manually writing code, if I have to implement some chunk of code, I'm pretty much gonna ask the LLM to do it for me. And then when I'm fixing it, if there are small fixes, it's usually just quicker to make the fix manually rather than ask the LLM, right?
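Editor's sketch: a mix alias of the kind mentioned above could be set up roughly like this in `mix.exs`. This is a guess at the general shape, not Saša's actual configuration; the alias name `ci`, the `MyApp` project, and the version numbers are placeholders. It assumes Elixir 1.15 or later (for `cli/0`) and Credo installed as a dependency.

```elixir
# mix.exs (fragment; MyApp, :my_app, and the versions are placeholders)
defmodule MyApp.MixProject do
  use Mix.Project

  def project do
    [
      app: :my_app,
      version: "0.1.0",
      elixir: "~> 1.15",
      deps: deps(),
      aliases: aliases()
    ]
  end

  # Run the whole alias in the test environment so the `test` step works inside it.
  def cli do
    [preferred_envs: [ci: :test]]
  end

  defp deps do
    [{:credo, "~> 1.7", only: [:dev, :test], runtime: false}]
  end

  defp aliases do
    [
      # One command that both CI and the coding agent run before calling work done.
      ci: [
        "format --check-formatted",
        "compile --warnings-as-errors",
        "credo --strict",
        "test"
      ]
    ]
  end
end
```

With something like this in place, the agent instructions file only needs one line telling the tool to run `mix ci` before finishing, and the style rules live in the Credo configuration rather than in prose.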

But I still feel, and this is a very interesting thing, that in this flow I'm maybe, it's hard to say, 10%, 20%, 30% faster, typically, when we're talking about building a new thing. There are things where LLMs can bring dramatic improvements. Say you have to upgrade to the new version of Elixir and Erlang and dependencies, and you just do this thing on the side.

I will vibe this thing as well, and look at the final result. That kind of thing. "Try to fix this bug for me," right? That's also something which I would just give it and see what happens, at least. What else? If you're doing repetitive work: you have 10 forms, and you wanna build form number 11, which is in the same style and everything, with different fields. That's where I would vibe it. I would say, "Okay, use this form as an example. Use this pull request as an example of how I want my commits to be structured, look at the tests," blah, blah, blah, right? But what I want to say is, when we're talking about this kind of non-standard, non-repetitive work, I'm building it just in small steps, and what I find more important than speed (we are focusing too much on speed here, really) is that I am very close to the code.

I still feel like I'm coding. I'm just not typing every character manually. But what I feel is that my mind is not so burdened with all these low-level details. I don't have to look into the documentation for how exactly to call this function, things like that. The LLM does this for me, and it feels like I have moved just one level above typical coding, and I find this actually quite refreshing. It feels like I'm coding, but I don't have to pay so much attention or keep so many things in my mind. It feels a bit less intense, almost like I could do it for longer periods of time as a result, and actually focus more on the quality of the work, because this is the thing that nobody talks about.

You know, everybody talks about being faster, but nobody talks about doing better work. Maybe we should do less work better rather than more things faster, you know?

[00:42:01] Emma: Yeah, you're always given the, "It'll improve the speed of your work by 10 times. You can do 10 times more." And you're saying maybe it's 10, 20, 30% more, not 10 times. You voiced this in a recent X post that we had stumbled on about reviewing AI-generated code with issues, which we talked about. Can you walk us through your first impression when you opened that PR? What was the moment you realized that maybe this wasn't gonna be a normal review?

[00:42:34] Saša: Well, it is a normal review actually, at least for me. No, I'm kidding. But it's not uncommon to have this kind of thing, especially if I come to a new team where they're not used to this way of working that I propose.

Usually the feeling is always the same. I look through this, maybe start making some comments, and somewhere around comment number 20 or so, I realize that, okay, this is actually not working, because I'm gonna make a bunch of comments, and even if they address those comments, I'm still not gonna be convinced that we're better off.

I'm just seeing, if you will, death by a thousand paper cuts. So there is this point, and again, this is something that comes with experience, where I feel like, okay, I think the best thing I could do is take this thing and try to refactor it myself, right?

And that's what I did in that post, which I think we're hopefully also going to link. The context of the post was that I was given this pull request. It was vibed. It was about 1,000 lines of code. It had a bunch of issues, some of which I mentioned with tests. It had a bunch of other issues, like overly complex state in the GenServer.

There were redundant pieces of information that were needless. There was a supervisor running under a GenServer, which was completely confusing, and it wasn't needed at all. An Agent was used in some place instead of a plain data structure, and so on and so forth. But I'm ranting a lot about AI, and so I have to also say this thing was built by a developer who's otherwise experienced but not experienced with Elixir, and there is no way they would have been able to produce something like that in a reasonable amount of time on their own.

So it does help, really. And in general, I would say, given the kind of code that I have seen pre-vibe, I'm still kind of thinking that LLMs are on average better than humans. Sorry about that. That doesn't mean they're good enough, but I have definitely seen much worse code produced by humans.

[00:44:51] Charles: That's just a statistical, uh, answer anyways, right? 

[00:44:57] Saša: Exactly. Yeah, it's the median, right? I suppose. But anyway, I refactored, and I vibe-refactored it, right? I didn't understand this code completely, and I just noticed that I'm making all these comments and I still don't understand it, and I'm not sure exactly what we should do.

And so I just said, I'm gonna try it myself, and I fire up the session and I see one issue, and then I ask, do we really need this thing in the state? And it says, "Oh, it's actually redundant. Okay, let's remove this thing." And again, small steps: you just carve it off, and I do this thing, commit, and like going first...

Well, the first thing I solved was these tests, because they were taking 15 seconds, and I wasn't gonna have any of that because that slows me down. So first we fixed the test run, and then I was just looking at one issue at a time, and it just went from 1,000 lines of code to 500 lines of code, right?

And so I think that this is a huge saving, and bear in mind that those LLMs are human-like. They get lost if the context is too large as well, right? They start to hallucinate more or simply ignore things that you tell them, even if it's in screaming caps and whatnot.

And so you wanna keep things simple for them as well, right? It's pretty much a similar thing. So whoever says they don't care about the state of the code, I think, is basically just trying to get to the market fast and grab some cash. I'm sorry for being cynical and blunt, but I cannot see it any other way.

You know, that code has to be in a good state so we can actually move this thing forward.

[00:46:33] Emma: In your Tell Me a Story talk at Goatmire, you had mentioned that telling the developer when reviewing code that it was unreviewable was also an option, which might be controversial compared to maybe some developers that are like, "Oh, gotta find something to nitpick. Looks good to me."

[00:46:55] Saša: Okay, so obviously, of course, you don't say it like that, right? You know, "Your code is unreviewable," or "This pull request is unreviewable." You could say, "I have problems understanding this. This is too big for me." Things like that. I know it's hard for everyone. It's hard to admit, "I cannot understand this thing," right? You feel like a failure. But honestly, just making some nitpick comments and then LGTM is not gonna cut it. So I think we should do this more, of course in a somewhat nicer way. I also mentioned it in the talk, as far as I remember: I typically also reach out to the author then, and I say, "Okay, this thing is too big. I don't know what to make of it. Can we somehow make it smaller?" And if I have the time, because this is my role after all, I'm going to actually help them do this.

I had one great example. This was the first pull request in a greenfield project, right? 4,000 lines of code, a bunch of stuff. It was a big form of maybe 20, 30 fields. It solved problems that didn't even exist yet, and whatnot. And I told the colleague, "You made legacy in the very first pull request of a greenfield project." This is already legacy, right? And so we just sat and paired for a day, or maybe even less than that. There were, I don't know, 20, 30 fields. I said, "We're gonna use just one field, ID." He was puzzled. "Okay, but what do we... That's not..." "Doesn't matter, we just need one thing so we can implement CRUD, right? And then we're gonna add a group of fields, like maybe personal data. That's a smaller group, and then we can add that, and then you have these validations, because you have to read through these validations." So we build slowly. If you have this mega form, you don't have to crunch it all in one piece, right?

[00:48:56] Charles: It's easy to miss key details too when you're doing so much all at once. And how do you make sure you've actually covered all of the functionality you need to test in what you just built in 4,000...

[00:49:11] Saša: Yeah, this reminds me: after I gave this talk last year in Chattanooga at Gig City Elixir (Maggie and Bruce hosted me, that was a really fun experience, and I was the closing keynote of the whole thing), one attendee came to me, thanked me for the talk, and said they get these huge pull requests and it takes them five hours to review. And I was like, first of all, I would quit my job, you know? I don't know what I'm gonna do. I play guitar as a hobby. I'm gonna play on the street, whatever. There's no way I'm gonna do these five-hour code reviews as a regular thing. And no matter how much concentration you have, if it takes so long, I bet you there are a bunch of things that you have missed.

It's just too huge. It's kind of like, I'm writing a book and I give you these 800 pages, War and Peace size. There you go, can you review this thing for me? It doesn't work like that. Publishers are not gonna accept you writing the whole book and submitting it like that.

[00:50:22] Emma: So what would a good AI workflow look like to you, if you could be the one determining the workflow?

[00:50:35] Saša: I use it as a tool, as a side tool, right? It's a very powerful tool, but I feel that at least the loudest members of the industry currently just use it as this whole thing that spits out a bunch of code and crunches features. Basically, they are becoming the coders and testers and whatnot. I don't use it like that. For me, it's a tool, and so basically, as I said earlier, it's just this very low-tech setup. I just fire up the session. I ask it to do something for me. I discuss things with it. I implemented this SNPP protocol. I didn't know anything about it. We just discussed it in this session.

I asked it, "Give me a very high-level overview, then let me know what I have to implement next," and this is how we work. I think what's really more important than all these fancy tools and integrations and whatnot is not forgetting that we have to work in small steps. I always keep coming back to that, but this is really the idea, because that's the agile way, and the agile way basically means that you don't have to design everything up front, which is really difficult to do anyway. This is why we went from big design up front to the agile approach of working in small increments, getting some feedback, reflecting, and moving along. At least that's how my mind works, and it has always worked like that for me.

We seem to be heading, popularly, in this waterfall direction. I am somewhat skeptical about the quality of the output. It's hard to predict what people are gonna use, because we are basically greedy. Everyone is greedy. Everyone just wants to do more things faster and grab more cash faster, right? I will not be surprised if it just goes down that path, but as I said earlier, I feel that software quality is gonna suffer more and more. I'm somewhat grateful that enterprises tend to be more conservative, so probably banks are not vibing much yet, and so my money's hopefully safe, right?

But going back to the original question, yeah, I'm currently a fan of just using it in a very low-tech way. I fire up a session. I ask it questions. These are typically small prompts, one or two sentences, occasionally a code snippet, like, "This is what I want. Make me this thing," and then I reflect. I don't allow it to commit. I commit myself, because it's an easy thing anyway, and I typically don't let it work much without control.

[00:53:06] Speaker: I think that's a good note to end on: remember the lessons that we've learned over the many years of evolving our craft of writing software, of working in manageable chunks and iterations, but leverage the tools that exist for you. Saša, do you have any other final plugs or messages you wanna leave with the audience?

[00:53:30] Saša: Yeah, as I said, work in small steps. Take a look at the Goatmire version of "Tell Me a Story." There is a transcribed version. I would also like to mention a wonderful book by Adrienne Braganza, which is called "Looks Good to Me." It's, obviously, about code reviews. It's a lightweight read, but it's a very important read that systematizes a bunch of different aspects of code reviews, so take a look at that too. And yeah, I'm gonna also send you an example of my Oban pull request, just as a real-life example of this kind of philosophy that I'm talking about.

[00:54:08] Charles: Great. Thank you so much for joining us. 

Our guest is Saša Jurić, and we'll be sure to put in the show notes links to the talk "Tell Me a Story" from Goatmire, the X post we've been referencing, the book, the Oban PR, and more. So, thanks for joining us again. We look forward to next time.

[00:54:29] Saša: Thank you for having me, it was fun. 

[00:55:01] Yair: Hey, this is Yair Flicker, president of SmartLogic, the company that brings you this podcast. SmartLogic is a consulting company that helps our clients accelerate the pace of their product development. We build custom software applications for our clients, typically using Phoenix and Elixir, Rails, React, and Flutter for mobile app development.

We're always happy to get acquainted even if there isn't an immediate need or opportunity. And, of course, referrals are always greatly appreciated. Please email contact@smartlogic.io to chat. Thanks, and have a great day!