S11E07 Garbage Collection with Dan Plyukhin and Manuel Rubio

Intro: Welcome to another episode of Elixir Wizards, a podcast brought to you by SmartLogic, a custom web and mobile development shop. This is Season 11, where we're branching out from Elixir to compare notes with experts from other communities.

Dan: Hey everyone. I'm Dan Ivovich, Director of Engineering at SmartLogic, and I'm your host for today's episode. For episode seven, we're joined by Dan Plyukhin, postdoctoral researcher at the University of Southern Denmark, and Manuel Rubio, author of Erlang/OTP: A Concurrent World. In this episode, we're comparing notes on garbage collection in Erlang and Akka. Welcome to the show, Dan.

Dan P: Thanks. I'm happy to be here.

Dan: And welcome to the show, Manuel.

Manuel: Thank you. Happy to be here as well.

Dan: Fantastic. So let's learn a little bit more about you guys. Manuel, why don't you kick us off? Who are you? Where are you from? What are you up to? And why Erlang?

Manuel: I'm from Spain, but at this moment I'm living in the Netherlands. I've been using Erlang since 2009, and Elixir a bit less; I started using it in 2016. My first step into this world of Erlang, Elixir, and the BEAM came because we needed something different from PHP, Java, or C, which were the typical languages we were using in a telco company. We were searching for something that could cluster servers and share information, handling concurrency in a good way, because we were a bit tired of always using POSIX signals and shared memory in the operating system. We found Erlang, and in those first years, 2009 and 2010, we implemented different solutions; we even helped implement telco solutions in Spain. After that I was convinced to dedicate the rest of my career to it, because I was focused on backend development, and I think Erlang, and later Elixir, fit that very well.

Dan: Great. And Dan, how about you? Background, language experience. What are you up to these days?

Dan P: Yeah, pretty different background. So I'm a researcher. I'm based in Chicago, but I work for the University of Southern Denmark, and I'm just finishing my PhD at the University of Illinois at Urbana-Champaign. Urbana, Illinois being the greatest city in the world, of course. I did my thesis on the topic of actor garbage collection, so my background is a lot more in theoretical computer science; like 80 percent of my thesis is proofs. But as part of that work, I developed an actor garbage collector for the Akka actor framework. Doing that gave me a bit of knowledge of the JVM and Scala type of stuff, but actor garbage collection is a very different type of problem than ordinary garbage collection, which I think we'll talk about later in this episode.

Dan: Awesome. So for our audience that is more heavily on the Elixir and Erlang side, could you just explain quickly how Scala, Akka, and the JVM all relate?

Dan P: Scala is a language for the JVM, just like Java is, and Akka has pretty much emerged as the standard actor framework for the JVM, for both Java and Scala. I'm not sure whether other JVM-based languages use Akka or not, but definitely Scala and Java do. You can basically think of Akka as a re-implementation of Elixir and Erlang for Scala.
It has the same ideas of supervision trees, messages, clustering. The main difference is the terminology: what you call a process in Elixir, they call an actor in Akka, and what Erlang calls a PID is an actor reference in Akka. Pretty much everything else is a straight translation.

Dan: Fantastic. So Manuel, when you think about the importance of garbage collection in Erlang, in actors, in your Concurrent World book, what is your perspective on the actor approach?

Manuel: In Erlang, or in the BEAM more specifically, garbage collection matters because most of the languages on top of the BEAM are immutable in how they use memory. The BEAM also has the property of being soft real-time, so the garbage collector has to meet the requirement that it cannot stop the work: it has to remove the memory that is no longer in use, when possible, without pausing everything. And because immutability is a fact, something you cannot avoid when using the BEAM, it's very easy for the algorithm to reclaim all the memory that is not in use anymore when functions finish inside a process, or when the process itself finally finishes.

Dan: Awesome. And so Dan, your work on garbage collection of actors: what does it mean for an actor to be garbage collected, from your perspective? And how is that different from the ordinary garbage collection that programmers know and love so well?

Dan P: That's a good place to start. So an actor, again, that's like a process, and I'll say the word "process" whenever possible, given the audience. The idea is that we're automatically figuring out when these processes can safely be killed. And it's a pretty different problem. With ordinary garbage collection, what you're really trying to do is figure out when objects are garbage, and an object is garbage if no thread can reach it. So we're really talking about reachability from threads. That's why you have these things called tracing garbage collectors; most of the big garbage collectors we know, in the JVM for example, are tracing collectors. For actors, it's a pretty different problem, and we use a lot of different approaches.

Maybe I'll give an example of when you would use actor garbage collection. A very simple case would be in Elixir: you spawn off some process to do a little bit of work, maybe you send it some messages, it sends you back some messages, and then you're done with that process. Now that process is just sitting there taking up memory. So hopefully you remember to kill it, hopefully you make sure to do it in the correct way, and hopefully nobody else is still using that process. That's a very simple case. Normally you have to do it manually; it would be nice if we could do it automatically.
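A minimal Elixir sketch of the simple case Dan describes: a short-lived worker that the parent has to remember to stop by hand (module and function names here are illustrative, not from any real project):

```elixir
defmodule WorkerExample do
  # A worker that loops forever, waiting for requests.
  def loop do
    receive do
      {:work, from, input} ->
        send(from, {:result, input * 2})
        loop()
    end
  end

  def run do
    worker = spawn(&loop/0)

    send(worker, {:work, self(), 21})

    receive do
      {:result, value} -> IO.puts("Got #{value}")
    end

    # The worker is still alive and still holding its heap. Nothing will
    # ever message it again, but the runtime cannot know that, so today
    # we have to remember to stop it ourselves. An actor garbage
    # collector would detect this situation and reclaim the process.
    Process.exit(worker, :kill)
  end
end
```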
A more complicated example is maybe in Hadoop, the cluster computing framework. You've got all these different nodes in your cluster and you're trying to allocate containers to those nodes, and those containers are going to do a little bit of work. If you actually look in the Hadoop code base, those containers are wrapped inside what is basically an actor. That actor manages the container, but it's also talking to some kind of manager on another node, and to distributed file system actors on other nodes. We want to figure out when this container actor is safe to kill, because then we can deallocate that container and reuse the space for something else. But that's a pretty challenging thing, because now you have to reason about the whole state of your distributed system. How does this actor relate to all these other actors? Are they done talking to one another? Have messages been dropped? Have some nodes crashed? You have to reason about all these kinds of failures, in addition to all the differences between actors and objects. That gives a little bit of a taste of how it works, but the main idea I want to get across is that we're working in some kind of distributed system, at least that's what most of my research focused on, and trying to figure out when one of these processes' memory is safe to free up.

Dan: Right, right. So you're taking the actor idea and saying, now it's distributed. The people who may need to send a message to this actor, the people who care about it still existing, are somewhere within a cluster. And that takes the question of "when can we really free this, when can this process really go away?" to a whole other level, if you're trying to do it automatically rather than having your system self-supervise and self-orchestrate when pieces are no longer necessary.

Dan P: Yeah, exactly.

Dan: Great. And so Manuel, thinking about that, how do you feel it compares to your work in Erlang and with OTP?

Manuel: Well, regarding the specific part of sending messages, I think it is pretty similar to when you call a function, because it depends on whether the target is a process on the same machine or a process on another machine. When you are sending something between processes inside the same machine, the information is copied for the receiving process, because each process has its own heap, so the data needs to end up inside the heap of that process. But it can also happen that we configure the message queue to be off-heap. Then the message fragments are allocated outside the process heap and handled a bit differently; when we configure off-heap memory for messages, the behavior changes, and that is a tuning you can apply depending on the use case of your processes. If a process is going to be very heavy, receiving a lot of messages while it runs continuously, and you don't want it interrupted to copy information into its heap, you can configure it as off-heap. Then memory outside the heap is reserved for the messages and the process is not interrupted. In that way, the handling of memory for the heaps is completely isolated. And because neither Erlang nor Elixir has objects, it's pretty simple: you have only processes and functions, and that's the only way you copy and handle memory between the different parts of the code.
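The off-heap mailbox tuning Manuel describes is a standard BEAM process flag; a small sketch (the GenServer module name is hypothetical):

```elixir
# Called from inside a process that receives a heavy stream of messages:
# keep the mailbox outside the process heap so a large backlog does not
# add garbage collection pressure on that process.
:erlang.process_flag(:message_queue_data, :off_heap)

# The same option can be set at start time, for example for a GenServer:
{:ok, _pid} =
  GenServer.start_link(MyApp.BusyConsumer, [],
    spawn_opt: [message_queue_data: :off_heap]
  )
```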
Dan: So then does that mean there's a concept of generational garbage collection in Erlang, in addition to the actor approach? Could you go a little bit deeper on that?

Manuel: Yeah. The way memory is handled in Erlang is, as you say, generational. When a process gets memory, it starts with a specific heap, the young heap. As the variables and functions running there request more memory, there is a mark, and when the heap grows past that mark, the garbage collector runs. As more and more functions run and you need repeated runs of the garbage collector, a second heap is created, the old heap. That is where the long-lived elements of memory live, because they stay in use longer than the younger ones that keep getting cleaned up. I'm maybe simplifying a bit; it's not that easy to explain without graphics. But when Erlang cleans memory, it always performs a copying collection: it creates a new heap, copies over only what should be kept, and then discards the whole heap that is no longer in use. Separately, for the old heap with the long-lived elements, there is another, less frequent pass that later takes the long-lived elements from the young heap and moves them to the old heap. There are a lot of papers showing this is a good way to clean up, because the young heap is where most of the garbage appears. Once everything long-lived has moved to the old heap, you don't need to run the garbage collector so often for that heap. You end up running the cleaning, the garbage collection, over both heaps, but at different frequencies.

Dan: Excellent. And so Dan, does generational garbage collection apply to Akka's actor approach as well?

Dan P: Well, let me clarify something that might be ambiguous. Akka is built on the JVM, and the JVM has its own existing garbage collector. So the stuff Manuel is talking about applies equally well to what Akka is already doing, and the stuff I'm working on, actor garbage collection, is something you can add to Akka. That's what I implemented, but it could also potentially be applied to Erlang and Elixir. So these are complementary approaches. That said, I do know the JVM uses a very advanced form of generational collection. It doesn't do stop-the-world; I think that's a common myth about the JVM. It's extremely efficient, extremely concurrent; they do very smart things there. It's a very different problem, and the BEAM can take advantage of a lot of the things Manuel talked about because it has immutability by default.

So, to the question about actor garbage collection in Akka: it's a very different paradigm, although there is some overlap. Historically there were some tracing-based garbage collectors for actors, although they needed a lot of extra complications because actors are different from objects. My approach is pretty different, and there's also a well-known actor programming language called Pony, which has actor garbage collection in addition to regular garbage collection. My approach and Pony's approach both take a more message-based approach, without going too much into the details. The question about generational collection is interesting, because Pony's algorithm is not very flexible in terms of what kinds of optimizations you can do, for example some kind of generational collection.
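For reference, the young-heap/old-heap state Manuel walked through is visible per process on the BEAM; a quick way to peek at it:

```elixir
# Every BEAM process has its own young heap and old heap. The runtime
# exposes the generational state of a process through Process.info/2:
{:garbage_collection, gc} = Process.info(self(), :garbage_collection)

# :minor_gcs counts young-heap collections so far; :fullsweep_after is
# how many minor collections may happen before a full sweep of both heaps.
IO.inspect(Keyword.take(gc, [:minor_gcs, :fullsweep_after, :min_heap_size]))
```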
Mine kind of builds on top of Pony and relaxes some restrictions. For example, Pony's algorithm requires causal message delivery, which is something you often don't get in a distributed system when you have multiple machines in a cluster, although you do often get it on a single machine. Mine removes some of those restrictions, and removing them creates a lot more opportunities for optimization. For example, what actors do is periodically send updates to the local garbage collector, and you can tune how frequently they send those updates. If you have an actor that you suspect is going to be long-lived, maybe because it has already survived for some amount of time, so you think it probably isn't going to become garbage any time soon, you can reduce the frequency of how often it sends updates to the garbage collector. The garbage collector itself, the actor collector, builds a kind of view of the distributed system, its own version of the heap, and it needs to search that heap. There you can also do generational-style approaches: you can say, I'm only interested in checking these particular nodes, or these particular short-lived actors in the cluster. So you can add those kinds of optimizations. What I want to emphasize is that this is very new research that we're actively developing, and we're finding ways that the traditional garbage collection approaches can be applied in this kind of scenario. That's really exciting, so I have a lot of hope that generational approaches and other insights can be brought to bear on this work.

Dan: Thinking about this garbage collection approach, there must be some downsides, right? Manuel, do you have thoughts on how garbage collection is handled in this actor process model? Are there any downsides, or things you feel like we hit up against inside OTP?

Manuel: I haven't studied theoretically what the downsides could be, but in my experience I found a couple of them, because when I was called in as a consultant for companies, I found that sometimes the release of memory is not happening often enough, or the opposite: it happens too often, and then the memory is released but the CPU is overloaded, because the heavy load on the system requires us to tune the system to find the right values for garbage collection. In Erlang you have specific functions to force a run of the garbage collection, on the old heap or even on the young one; for the old heap, there's a specific call where you say: at this moment, I want you to remove the old elements. That is used, for example, when you are implementing a web service that handles massive amounts of JSON or XML, where you have to transform a lot of data, and those transformations make a lot of changes to that data in memory. All of that lives in memory inside those specific functions, so before returning from the function you might say: okay, at this moment, release the memory, to clean up all the mess I made with the transformations.
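The explicit collection Manuel mentions, and the per-process tuning knobs, look roughly like this in Erlang/Elixir (the numbers are illustrative, not recommendations):

```elixir
# Force an immediate collection of the calling process, e.g. right before
# returning from a function that built large intermediate structures
# while transforming JSON or XML:
:erlang.garbage_collect()

# A collection can also be requested for another process:
# :erlang.garbage_collect(pid)

# GC behaviour can be tuned per process at spawn time:
pid =
  :erlang.spawn_opt(
    fn ->
      receive do
        :stop -> :ok
      end
    end,
    fullsweep_after: 10,   # full sweep after at most 10 minor collections
    min_heap_size: 4_096   # start with a larger young heap (in words)
  )

send(pid, :stop)
```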
The downside is that, because you keep everything immutable, everything that gets copied as you call a lot of functions stays around. So at certain moments you can find you are running out of memory, or, if you need to run the garbage collection many times, it can look like the server is overloaded even though you aren't receiving many requests. It's a matter of adjusting those values. Maybe that is one of the downsides to keep in mind: it works this way, and you need to adjust those values.

Dan: Yeah, it reminds me of a few years ago when we were working on a Phoenix API with an external DevOps team that wasn't familiar with Erlang and its approach to things. We were pegging the CPU a lot, because we were just processing as much as we could, but we weren't memory-constrained at all. The memory was flat, but the CPU was pegged, so they thought it wasn't behaving, and we were like, no, no, it's doing what it's supposed to do: it's not fragmenting memory and growing exponentially, it's just leveraging every processor it can. We could scale it up if we wanted to, but it was processing things fast enough, so it was all good; we just needed to tune our alerts. And it makes me think, similarly, that like you said, you can free memory so fast that you end up with other constraints, or maybe you want to hold on to things longer. I think it's important to remember that we get really great performance out of the box with Erlang and in the Elixir community, but there is still opportunity to adjust how things go. So Dan, thinking through that, I'm sure downsides must be a big part of what you think about in your research. Any particular things you want to touch on about actor garbage collection and its limitations?

Dan P: Yeah, I would say the exact same problems Manuel was talking about, we would have in an actor garbage collection approach. How do you tune how much work the actor garbage collector is going to do versus how much work the application is going to do? That's always going to be a balancing act. It would also be interesting to incorporate some static approaches in a language, for example what Rust is doing; there are some flaws to Rust's approach, I believe, but it's an interesting step as well. Okay, sorry, I got sidetracked there for a second.

I'll say this, particularly about actor garbage collection: when you're detecting whether an actor is garbage or not, one of the restrictions we have to impose that is unusual is that we manage your actor references for you. Right now, in Erlang and Elixir and in Akka, a process ID is just a value you can pass around. You can stick it in a data structure, you can do whatever you want with it, you can probably even write it to a file, at least in Akka. It's analogous to how in C you can just throw around pointers. You can do whatever you want with pointers, but pointers are the enemy of garbage collectors: if you can do whatever you want with a pointer, the garbage collector has a hard time keeping track of it all.
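To make the "PIDs are just values" point concrete: in Erlang and Elixir a PID can escape through any channel, for example a file, which is exactly the kind of flow a reference-tracking actor collector would have to restrict (the file path is arbitrary):

```elixir
# A PID is an ordinary term, so it can be serialized and stashed anywhere.
pid = spawn(fn -> receive do msg -> IO.inspect(msg, label: "got") end end)

File.write!("/tmp/leaked_pid.bin", :erlang.term_to_binary(pid))

# Later, code with no visible connection to the spawner can recover the
# PID from the file and message the process. No collector tracing only
# in-memory references could see this edge.
leaked =
  "/tmp/leaked_pid.bin"
  |> File.read!()
  |> :erlang.binary_to_term()

send(leaked, :hello_from_nowhere)
```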
And so you get a language like Java, where you don't have access to pointers directly; you have access to references, which are more constrained. You can't store a pointer to a Java object in a file. Similarly, if you have an actor garbage collector, we need to keep track of the actor references you're passing around, so there are going to be some additional restrictions on how you use them. You can't just write your actor reference to a file, have somebody else read that file, and then have them send a message to that actor, because the garbage collector has no way of tracking the flow of information that leaked through the file. So that's a restriction in terms of the programming of it. On the other hand, hopefully, if you have an actor garbage collector, it reduces some of the code you would have to write: obviously the killing of processes, but also the tracking of when it is safe to kill a particular process. So that's the programming angle.

But there's also the runtime angle, and there are some differences there too for actors versus ordinary objects. For objects, as we talked about a little bit, if you have a copying garbage collector, you get rid of the garbage and move the objects to some other part of memory, closer together, so you can actually get increased locality as a result of garbage collection. Some work has found that if you introduce a garbage collector into certain kinds of algorithms, you actually get a speedup: faster allocations and improved locality. You don't get that situation for actors, because actors don't really benefit from being close to one another; they just have a different processing model. So we aren't going to get the locality speedups you'd get with ordinary garbage collection, which is kind of interesting. On the other hand, you might get some performance advantages from the fact that, if you were killing your processes in an incorrect way, maybe killing a process too early and triggering some fault-handling mechanism that reduces performance, an automatic garbage collector would speed things up. So there's a trade-off where you're letting this thing do work for you, and hopefully it does a better job than you would, but you are giving it some of your time, your precious time, that you would otherwise spend doing work for your own application.

Dan: Fantastic. So then, Manuel, how do you think about balancing those performance trade-offs? You mentioned it a little earlier, right: freeing too quickly, freeing not quickly enough, CPU pressure. Any other important things to consider when you're thinking about performance trade-offs in the Erlang garbage collection approach?

Manuel: In my experience I have suffered through a lot of different languages and virtual machines, and I have to admit I started with the BEAM because of the requirements of the kind of work I usually do. In terms of trade-offs, I think it comes down to the way memory management works inside the BEAM.
The garbage collector needs to run a lot of times because it copies: we have separate heaps, so most of the time it isn't possible to reuse portions of memory. That's not completely true, because if we are using binaries larger than 64 bytes, those are stored in a specific area of memory that belongs to the virtual machine. But for the rest of the elements, if we need to compose a list or a tuple and pass that information to another process, handling it between functions is not a problem, but when transmitting it to a different process it's not possible to just reference that information; it gets copied again and again. So I think the trade-off is not really in the garbage collector, but in the way memory is handled inside the BEAM. And maybe I'm wrong, and the newer features they've been implementing to speed up the BEAM address this specifically. You know that every year Ericsson ships new features for speedups, and the garbage collector and memory management were revisited, I think about three years ago, along with the speedups from just-in-time compilation. So in terms of how the memory handling works, from my point of view and experience it is not as performant as languages like Java, because Java is very optimized for I/O and memory handling. But in other respects I think the BEAM is more secure, because data is copied rather than referenced, so if some process needs to be stopped or terminated, or is corrupting information, the rest is protected. Well, it's very difficult to corrupt anything with immutable state, but if it does happen somewhere, for example because we can always plug C code into our programs, the rest is still protected, because it's in another heap.

Dan: Mm hmm. So Dan, are there any specific fault recovery or fault tolerance aspects to the actor approach and to what Akka adds as a framework?

Dan P: The main thing I was trying to overcome in my thesis was dealing with faults: for example, nodes crashing, or nodes operating very slowly, or messages being dropped. That's something that wasn't really addressed very well in prior work, or hasn't been the focus. Actor garbage collection is much less researched than other kinds of garbage collection, especially distributed garbage collection. For example, Pony's garbage collection, which I mentioned before, doesn't address faults or dropped messages, because that wasn't in the model they were targeting. There's another actor language called SALSA that does have some limited support for fault tolerance: if a particular node is slow, or if it crashes, the garbage collectors on the other nodes will ignore the information from that faulty node and do the best garbage collection they can on their half of the cluster. The main things I was trying to work on were, number one, dealing with dropped messages, which wasn't really addressed in some of the prior work, and another case, which is a limitation in the SALSA garbage collector.
The terminology is a little confusing here, so I have to clarify. We say the SALSA garbage collector is fault tolerant because, if a node crashes, the other nodes can still make progress and do some garbage collection. But here's the problem. Let's say I have an actor on my node, it has a reference to an actor on Manuel's node, and my node crashes. In a fault-tolerant garbage collector like SALSA, Manuel's actor can never be garbage collected, because we never explicitly handle the fact that my node is permanently down and my actor will never again send messages to Manuel's actor. We have to be pessimistic and say: maybe eventually it will come back online and send Manuel a message. One of the things we were able to do, partly because of Akka's particular fault model, which is slightly different from Erlang's, is take advantage of what happens in Akka: if a node is slow enough, it gets kicked out of the cluster, and all of its actors have to be killed. And there's a guarantee that if that node rejoins, it rejoins as a new incarnation, with a new set of actors. So you can actually guarantee those old actors are gone. Because we can guarantee my node is really crashed, really gone (or maybe it wasn't really crashed, maybe it was just very slow and got kicked out of the cluster; you have to consider those cases in a distributed system), we can figure out that the actor on Manuel's node can indeed be garbage collected. That was really the hardest problem we needed to solve. Crucially, if you're going to do any kind of resource management in a distributed system, you need to be able to reason about dropped messages and crashed nodes, or slow nodes that got removed. So that's the kind of thing I was working on.

Dan: I think it's interesting that the optimistic side of garbage collection came up a lot in the first half hour of this conversation, and here we are finally saying, here's where you have to be pessimistic, but maybe so pessimistic that you assume it will never come back. It reminds me of Erlang's "let it crash" approach: if it has an issue, let it die, and then handle the fact that it is dead and going to get recreated as just normal behavior. I think there are some parallels to what you were saying, Dan: if it's slow, if it gets network-isolated, or if it actually crashes, just say, you're welcome to come back when you're fixed, but we have collected everything that was related to you. It's pessimistic about whether you'll come back, but it's also optimistic in that we want that memory, we want that space back, and you may never join again. These trade-offs I think are particularly interesting.

Dan P: I think you're right about that, and it's interesting that Akka does "let it crash" in that sense better than Erlang. Because in Erlang, if, for example, I'm monitoring an actor on a different node and that node crashes, the message I get back just says the node disconnected. I don't get any kind of promise that the node will never rejoin. In that sense Akka gives us a slightly stronger guarantee, and that's part of the reason I said this work could potentially be applied to Erlang and Elixir. That's the main limitation: if you want the fault-recovering aspect my collector uses, we would have to somehow patch that stronger semantics into Erlang.
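The Erlang-side behavior Dan refers to can be seen with a plain monitor: when the remote node disconnects, all you get is a :DOWN message with reason :noconnection, with no promise about whether the node might come back. A small sketch:

```elixir
defmodule RemoteWatch do
  # Monitor a process that may live on another node. If that node
  # disconnects, the monitor fires with reason :noconnection, which says
  # nothing about whether the node (or the process) might come back.
  def await(remote_pid) do
    ref = Process.monitor(remote_pid)

    receive do
      {:DOWN, ^ref, :process, _pid, :noconnection} ->
        # The hosting node went away; the rejoin policy is up to us.
        {:error, :node_disconnected}

      {:DOWN, ^ref, :process, _pid, reason} ->
        # The process itself terminated with `reason`.
        {:error, {:process_down, reason}}
    end
  end
end
```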
Dan: Sure. Excellent. I guess to either of you, and maybe we'll start with Manuel: advice you'd give to developers dealing with garbage collection challenges in distributed systems, or things you wish developers understood about how Erlang thinks about garbage collection. I mean, you wrote a whole book, so there's probably an idea or two.

Manuel: I was thinking about the last thing Dan said, because in Erlang, as he said, when you are monitoring processes on other nodes, it is not possible to know whether you were disconnected because of a network split or something else. You have the problem that when the network split is resolved, or the connectivity comes back, all you knew is that you were disconnected from the other node. But I think Erlang prepares the programmer to think about that, because, even for the garbage collector, you have different flags and different configurations. You are in charge of tuning it and developing your own mechanics on top of the basics. Erlang provides the idioms to create the software, but the implementation of Raft, gossip protocols, or whatever else is on you. Erlang can give you the tools for monitoring other nodes and monitoring other processes, but if something is failing, you are in charge of deciding how to act on it. You can use a library that implements Paxos, another that implements Raft, or whatever else; you are not depending on the low level at that moment, you have to implement your own.

Answering Dan's question, the advice I give about garbage collection is always the same. If you are developing a heavy-load system, because you are going to receive a lot of messages, or you are creating a cluster that is going to share information or give access to different processes (it's not that common in Elixir, but I have the hope that people are starting to use OTP more in Elixir, developing processes, actors, supervisors, and the other mechanics the language has), then they can see that it is very simple and very easy to get into those elements, to understand how the system is working for them, how it is handling memory, how it is handling every part of the system, and how they can adjust it so their system runs as well as possible. That is my main advice.

Dan: Excellent. Dan, your research is, you know, research, maybe a little less in the weeds with developers, but is there anything from your view, as you've done your research, that you wish developers understood about your approach or the challenges that exist in distributed systems?
Dan P: I'd like them to know that a garbage collector for actors exists and that they could potentially use it, or at least that it's in development. The software I've made is still a prototype, but I'd like people to know it exists and that they can contribute; pull requests are very welcome. I'm looking forward to a day when it's more integrated into existing workflows. But there's still a question for me, which is how relevant this will actually be for real developers, because a lot of the time when I talk to Akka or Erlang or actor people and say I'm working on a garbage collector, the first thing they say is, oh, fantastic, this is such a great thing, I hate killing processes manually, and so on. And then we talk a little more and they say, well, actually, I kind of designed my application around the assumption that there's no garbage collection, so I don't really need actor garbage collection now. So it's going to be interesting to see, if we give this to developers, how they will change the way they program. Maybe I'd encourage people to think about how they're developing their systems and ask: would an actor garbage collector help in this circumstance? Maybe it will, maybe it won't. How would you change your behavior if you had one? These are interesting questions to be thinking about down the line. You know, I've been in the ivory tower and now I'm trying to bring my work out to people and say, hey, check this thing out. One thing I'd like to hear from people over time is what types of applications they're developing, how they're using actors, and how my work can better serve what they're doing.

Dan: Right. So the advice is communicate, share, contribute, talk to the researchers.

Dan P: Talk to the researchers. Researchers are just free; we're doing free work for you guys. Like, I'm subsidized. I'll do whatever you want me to, just text me, send me an email.

Dan: Fair enough. Fair enough. Manuel, do you have any thoughts as an author? Maybe you have some ideas on how we can better teach or communicate about garbage collection to new developers, or developers who have never really had to think much about garbage collection.

Manuel: Well, with the information I have now, I could also say: talk to the researchers. It's good advice, because I think most of us backend developers are more worried about how the server is working and how the virtual machine is behaving. We keep some of those aspects in mind, but I have to admit that, for example, the garbage collector only becomes a worry for a project when it is failing. While it is working as expected, nobody thinks about the garbage collector; things run smoothly, like a breeze, and it's not something on the roadmap. But yes, having the knowledge of how the garbage collector works, and how to use the parameters the language gives us to adjust it, is very important.

Dan: All right, sounds great. So think about it before it's a problem, and ask your researcher. Some important takeaways. As we bring this to a close, I was curious: did you two have any questions for each other, or anything you want to make sure you say before we wrap up?
Dan P: Well, one thing I was interested in was Manuel talking about rolling your own Paxos or whatever implementation when you're dealing with failures. And I take back what I said about Akka being better than Erlang at this; that was maybe a little too far. But from the perspective of a garbage collection researcher, I'm wondering to what extent, in the Erlang and Elixir world, you could make sure that all actors have some kind of fault policy like this, where if a node crashes, eventually that node will be removed from the cluster and never come back. Could you implement that as a library and then put the garbage collector on top of that library? Would that work in that world? I don't know, I'm not familiar enough with the ecosystem. Maybe that's a very technical question.

Manuel: Yeah. I mean, the garbage collector, as I said, is something people don't worry about too much. When we think about processes, actors, and communication between them, even, as you said, across different nodes of a cluster, the garbage collector is only the small, specific part inside every node that ensures that when a process dies, its memory is recovered to be used by another process. In Erlang, messages are completely asynchronous. When you send a message to another process, you have no guarantee, no receipt, that the message was delivered. Even if the process has been dead for a long time and you still have its PID, if you send a message to that dead PID, the system accepts it; it's completely asynchronous and doesn't check. The only way you get an error is if you send to a name: if that name was registered to a process and the process died, the registration is removed and the name no longer exists, so the error you get is that the name is not related to a process. But the point is that when you are developing in Erlang, you need to know that messages can be lost. So you take the approach of sending idempotent messages, because maybe you need to send a message twice or a third time to ensure it arrives, and you develop mechanics that help you answer: what happens if something fails at this point, because I sent a message and I got a timeout with no response? You have to implement what happens in case of the timeout. It's an implementation at the high level. And the reason I was replying to you about the implementation of Raft, Paxos, and the others is that I have the feeling that when, at the low level, you try to implement something that should live at the application level, you force the developer to work with that implementation and there is no chance to change it. For me, the more minimal the lower layers are, the better, because you keep the different layers of the implementation separate and you can swap those layers out.

Dan: Fantastic.
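A small sketch of the delivery semantics Manuel describes: a send to a dead PID is silently accepted, a send to an unregistered name raises, and callers guard request/reply exchanges with their own timeouts (the message shapes here are illustrative):

```elixir
# Sends are fire-and-forget. Sending to a dead PID is accepted silently:
pid = spawn(fn -> :ok end)   # this process exits almost immediately
Process.sleep(10)
send(pid, :hello)            # no error; the message simply goes nowhere

# Sending to a registered *name* that no longer exists does raise:
# send(:no_such_name, :hello)   # ** (ArgumentError)

# So callers typically add their own timeout (and make requests
# idempotent so they can safely be retried):
ref = make_ref()
send(pid, {:request, self(), ref, :do_work})

receive do
  {:reply, ^ref, result} -> {:ok, result}
after
  5_000 -> {:error, :timeout}   # here the worker is dead, so this always times out
end
```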
Dan P: I don't know to what extent I should clarify that, but, for example, in Akka it is still asynchronous messaging, and you program actors in a very similar way to how you do in Erlang. The only difference is this particular case of what happens when a node is gone for a very long time. Will those actors ever be able to rejoin again, or will they not? So, definitely no shade to the Erlang model. It's just that, for me, trying to roll out this fault-recovering thing, it's like, oh, dang, I was so close to being able to get this for Erlang as well, but because of this tiny limitation there's a slight difference.

Dan: Sure. Awesome. So as we bring this to a close, I just wanted to give you each a chance to do any plugs or asks for the audience. Dan, I know you said pull requests are encouraged. Do you want to call out your side project or any way people can reach you?

Dan P: As a researcher, the number one way to contact me is by looking at my website and sending me an email. I'm also nominally on Twitter, and I do have a YouTube channel where I uploaded my thesis defense; if you're interested in the technical details of how this stuff works, you can check that out. If there's more interest, I'd love to make more expository videos about how this work goes, because I'm in service of the public; I see myself that way. Another thing: the name of this project is UIGC, and you can find it on my GitHub page.

Dan: Yeah, we'll have that link in the show notes.

Dan P: And there are lots of issues there; if you don't see an issue that appeals to you, feel free to reach out. Or if you just have a question about actor garbage collection, or you want to tell me about your experience using actors, or ask about any of that stuff, I'd love to talk to more practitioners about this type of thing. I'd also love to see an actor garbage collector for Erlang and Elixir, if anybody can figure out how to do it.

Dan: Awesome. Manuel, any asks for the audience or things you'd like to plug here at the end?

Manuel: Well, my main content is in Spanish, but on the altenwald.com page I have my books, now translated to English. The first book talks about Erlang as a language: the basics, how processes work, everything you need to complete a project. The second book is all about OTP, the part about the actor model: how it compares to object-oriented programming, and how to implement servers, state machines, supervisors, and everything else. I'm also very active on Twitter, so we can share that as well. And that's all.

Dan: All right. Well, Dan Plyukhin and Manuel Rubio, thank you both for your time. This was a really great conversation about actor models and garbage collection.

Dan P: Thanks for having us.

Outro: Elixir Wizards is a production of SmartLogic. You can find us online at smartlogic.io, and we're @SmartLogic on Twitter. Don't forget to like, subscribe, and leave a review. This episode was produced and edited by Paloma Pechenik for SmartLogic. We'll see you next week for more as we branch out from Elixir.