S11E06 Machine Learning using Elixir vs. Python, SQL, and MATLAB with Katelynn Burns & Alexis Carpenter

Intro: Welcome to another episode of Elixir Wizards, a podcast brought to you by SmartLogic, a custom web and mobile development shop. This is Season 11, where we're branching out from Elixir to compare notes with experts from other communities.

Dan: Hey everyone, I'm Dan Ivovich, Director of Engineering at SmartLogic, and I'm your host for today's episode. For episode six, we're joined by Katelynn Burns, software engineer at LaunchScout, and Alexis Carpenter, senior data scientist at Cars.com. In this episode, we're going to compare machine learning with Elixir, Python, SQL, and MATLAB. Welcome to the conversation, everybody.

Katelynn: Hi, good to be here.

Alexis: Great to be here.

Dan: Katelynn, kick us off? Tell us a little bit about who you are, what you're up to, and what your path to machine learning has been.

Katelynn: Yeah, totally. I'm a software engineer at LaunchScout, as you mentioned earlier. I've been software engineering for about two years now, and I live in Brooklyn. I started getting into machine learning because I was interested in motion tracking specifically, body positioning and tracking that with motion tracking. I started on that journey and wanted to help others on that journey, so I ended up doing a talk about it. And that's where I'm at.

Dan: Fantastic. I'm sure we're going to dive more into that in a little bit. Alexis, same kind of question. Who are you? How'd you get into this? Where are you from?

Alexis: Yeah, I'm currently a senior data scientist. I'm in Denver, Colorado, and right now I'm working on most things search related for the Cars Commerce marketplace. I've worked on a wide variety of projects at Cars, but right now most things are search related. Prior to Cars, I was a data scientist on Wayfair's search and recommendations team, working on sort and also their experimentation platform. And then prior to that, I was completing my PhD, where I was working with behavioral and neural data for the most part.

Dan: Fantastic. Katelynn, since this is an Elixir podcast, we'll start specifically with your experience with Elixir and machine learning and the projects you're working on in that space.

Katelynn: Yeah, so for the most part, what I have been doing with machine learning in Elixir so far has been using a library called Bumblebee, which lets you use already pre-trained models. That's a really great way in for people who want to break into machine learning but are a little overwhelmed; machine learning is so big, it's huge, so it's a great way to learn the basics. It's a good step into the field. I have been using that, as I said earlier, for motion tracking, and I'm currently working on implementing that a little bit more. And I would like to learn how to make my own model soon. There's a book that came out by Sean Moriarty, Machine Learning in Elixir, so I'm looking forward to reading that and training my own model soon.

Dan: Fantastic. And so Alexis, it sounds like your experience is maybe a little more traditional; machine learning certainly goes back further with some of the more original tooling. Speak a little bit to the tools you use, and how what you do could lead to something Katelynn could use in Bumblebee from a training-model standpoint, et cetera.
Alexis: The work that I've done at Cars has been pretty wide-ranging. I've worked on building models for things like determining the likelihood of a lead submission resulting in a vehicle sale, and also models related to how we should rank vehicles on a search results page. We've also done several POCs related to using pre-trained models like generative AI and LLMs for various customer- and consumer-facing applications. So all of those things are either based on pre-trained models or something that we're training in Python based on the Cars.com data.

Dan: So when we're talking about a pre-trained model, where does that come from? Is that something that is generally reusable, or is the fact that we're using pre-trained models in Bumblebee something kind of atypical?

Alexis: It depends on the use case. I wouldn't say that it's atypical. For example, there are a lot of pre-trained language models that you can use, and you can fine-tune those language models on a company-specific data set. You get to keep a lot of the information that those models learned during pre-training, and you get to add in additional information that makes them more specific to your company's use cases. Working with things like generative AI, those are all pre-trained models. You can build a wrapper or a custom set of logic around something like ChatGPT or one of the open-source models, and those would all still be based on already pre-trained models as well. So I wouldn't say it's atypical.

Dan: So Katelynn, when you're using these pre-trained models, are you seeing the boundaries of what they can do, or are you looking at leveraging fine-tuning of those models? How does your experience with pre-trained models align with what Alexis is sharing?

Katelynn: Yeah, I haven't gotten very far into fine-tuning them yet, so I definitely am seeing the limitations, especially because what I'm using is images, and it's hard to identify an image. With text it's a little easier than trying to identify a video or an image. There's so much data to work from that it's a little harder for the model to parse what it's looking at. It might have a general idea of what it's looking at, but it might not understand what it is that you're specifically trying to focus on.

Dan: Great. Could you talk a little bit about Bumblebee specifically, your ElixirConf talk, and what you see as the state of machine learning in Elixir?

Katelynn: Bumblebee is fairly new, so it doesn't have a lot of pre-trained models, but it has access to some text models, some image models, and some sound modeling, and it really is just an easy way to bring machine learning into your project. I especially liked using it for mine because machine learning wasn't the main reason I was making the project, but it was an integral piece of it, so it was nice to have that piece already there and easy to access when it wasn't the final result I was looking for. My talk was about motion tracking, which was my final goal. I think the biggest thing with Elixir is that it has a lot of potential for machine learning, but I don't think we have really grasped that potential yet. I had to do a lot of research, found a lot of articles in Python, and needed to take Python's learnings and adapt them for myself. So I think we're in the early stages, which is very exciting. We're on the ground floor of it, but I think there's a lot of potential.
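For context, the Bumblebee entry point Katelynn describes looks roughly like the minimal sketch below: it loads a pre-trained image-classification model from Hugging Face and runs it on a single frame. The model name, file path, and EXLA compiler are illustrative choices rather than details from the episode, and it assumes the bumblebee, exla, and stb_image packages are added as dependencies.

# Load a pre-trained vision model and its featurizer from Hugging Face
{:ok, model_info} = Bumblebee.load_model({:hf, "microsoft/resnet-50"})
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "microsoft/resnet-50"})

# Build an Nx.Serving that handles pre- and post-processing for us
serving =
  Bumblebee.Vision.image_classification(model_info, featurizer,
    top_k: 3,
    defn_options: [compiler: EXLA]
  )

# Classify one video frame loaded as an image
frame = StbImage.read_file!("frame_001.png")
Nx.Serving.run(serving, frame)
# => %{predictions: [%{label: "...", score: ...}, ...]}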
Dan: Yeah, it definitely seems like the community is very excited about some of the things with Bumblebee and Hugging Face and the integration of pre-trained models. So Alexis, I'm curious: you're definitely more on the science, creation, and training side. When that work is done, Katelynn's and my experience is a pre-trained model integrated into code. How do you work with the developers at Cars, or anyone else, to get it implemented correctly and get that feedback? I'm curious what is different about integrating a model you train versus something Katelynn and I might pull from, say, Hugging Face.

Alexis: The steps getting there are definitely different, and productionalizing it is also slightly different. Depending on the type of project, it usually starts with exploring the requirements, exploring the data sources, and understanding the type of model that we might want to use in a given scenario. All of the information we gain from that and the decisions we make then play into how we would productionalize the model. For example, if we're working on something that needs to be real time or near real time, that constrains the choices we're able to make model-wise, because you have to take into account the inference-time versus quality or accuracy trade-off. But typically for something that's batch, run once or twice daily, we have fewer constraints. And typically we'll use Docker to wrap up our preprocessing, training, scoring, and evaluation jobs, and then we pass those off to engineering so that they can be scheduled.

Dan: Could you speak then maybe a little more to the specific tools? I know at the top we talked about Python, SQL, and MATLAB. How do those fit into what you just outlined?

Alexis: Usually when we're exploring the data, we'll be working in Python, maybe some SQL as well. In grad school I primarily used MATLAB, working with neural data, as well as Python. I haven't worked with MATLAB much in industry, or at all in industry; it's primarily been Python and SQL. So when we're building out the preprocessing, training, evaluation, and scoring, all of those will be mostly in Python with some SQL. For the preprocessing jobs, if we're working with an extremely large amount of data, and depending on the requirements, we'll create a PySpark job to handle that as well. Those are typically the tools used for each of those steps.

Dan: Cool. Katelynn, could you talk a little bit about what it was like getting started with machine learning and Elixir? Kind of your first experience with Bumblebee?

Katelynn: Getting into Bumblebee and machine learning, definitely the biggest hurdle was just trying to figure out where to get started, because it's so new there's not a lot of documentation on it. There's a lot of great documentation for word parsing, and there's some for sound parsing as well, but image parsing in particular is still in its early stages, so trying to find where to start with that was pretty difficult.
But once I figured out what models are available through Bumblebee (because, as you said, there are certain models on Hugging Face that you're able to connect to through Bumblebee), it was about deciding which one of those to connect to, and, because I was working with video, how to parse that into images and send it into the model without overwhelming it, since I'm sending it frame by frame. Figuring that out was, I would say, the hardest part. Once I got started, it was a lot easier.

Dan: There's a similarity between what I'm hearing Alexis say and what I'm hearing Katelynn say: a lot of the work is just getting all the data into the right shape. And then it's just a lot of data, which has its own challenges to mess with. The preprocessing is something you both keep mentioning over and over again. Alexis, from your standpoint, with developers getting more into machine learning because of tools like Bumblebee, or being more interested, or trying things out, I don't want to say how does it make you feel, but how do you see that going? Any hot takes on developers getting more hands-on with machine learning?

Alexis: I think it's great if more people who are interested in it want to learn more about it. There's definitely a lot to learn, and it definitely takes a while to get up to speed at first, but it's also okay not to know everything. I don't think anyone expects someone to know literally everything in the field. It would be almost impossible.

Dan: Mm hmm. And Katelynn, besides your ElixirConf 2023 talk, any really great resources, or where do you think somebody should start to get into machine learning with Elixir?

Katelynn: Yeah, I would definitely say the Machine Learning in Elixir book that Sean Moriarty wrote is a big one. I would also suggest going to the Elixir Forum. It's a great source for anything, really, but there are a lot of people talking about machine learning on there, as well as the Bumblebee documentation itself.

Dan: Alexis, from your standpoint, if I want to get into machine learning, how would I do that? Where would I start?

Alexis: Yeah, there are a ton of great resources, books, and tutorials out there online. For example, things that are super well known, like Andrew Ng's machine learning course from Stanford, are pretty good for the basics. In terms of books, there's the machine learning with scikit-learn book and Data Science for Business as starter references. I think reading and all of these courses are great, but I do feel like the most helpful thing for people who want to get into machine learning is going to be hands-on practice. There are also a ton of great data sets online, so I would say find one that interests you, learn about it, learn the issues with it, figure out some questions you might want to answer with that data set, and then try a couple of different approaches, evaluate performance, and try to improve. But one of the most important things is also understanding why certain models are more suited for certain types of questions you're trying to answer. And I feel like that's something you learn from hands-on practice as well. You'll hit roadblocks that you wouldn't have thought you might hit, and you can read about different approaches that people take to figure that out.
But it's a little bit different having to actually translate that knowledge into an actual application, actually doing it yourself. So I feel like that is something you can really only get from hands-on practice and actually working with some data.

Katelynn: I definitely agree with you on that. That's what got me started in machine learning in the first place: picking something you're interested in or passionate about and playing around with it, even if it's not something you think you'll use to build something at your company, or even have a finished product by the end of it. If it's something you're interested in, it's a great way to start picking at it and finding out what those pieces do.

Dan: So Alexis, I'm curious if you have any questions for Katelynn, as somebody who's new to this and just trying it out and working with it in Elixir, as you see all of your developer friends at Cars.com loving Elixir. Anything on your mind, given what we've talked about so far?

Alexis: Sure. I guess, where do you see Elixir with respect to machine learning going in the near future? You mentioned that right now it's primarily pre-trained models. What do you think the next steps are?

Katelynn: That's a good question. I do think that people are going to start training their own models, now that we are getting more interested in it as a community, especially since Elixir is really good at its back end communicating with its front end very smoothly over open sockets. I think it's going to be huge for machine learning once we get that momentum. So I'm hoping there's going to be interest in and progress on building our own models in the future.

Dan: Yeah, I think that, at least for me personally, having Elixir experience and then having a way to very easily add in some machine learning and just try it out is huge. And I think, Alexis, like you were saying, just try, right? Throw something against the wall, see how it goes.

Alexis: Yeah.

Dan: But there's always that initial hurdle of having to learn enough to even be able to get something to run. And for me personally, starting with Elixir and then being able to just add Bumblebee and a pre-trained model was a way to at least get something to run, even if the output was like, well, that's not particularly interesting. It was at least, I don't know, it sat there for a while and thought about something, or learned something, and then spit something back. And that kind of repetitive loop is key to evolving your knowledge on any topic.

Katelynn: I have something to add to what you were just saying, and I agree with you on being able to jump into it. The initial motion tracking project I've been working on is a personal project, a passion project of mine since before I started software engineering, but I never touched it because it had way too many complicated parts to get to the final piece. And so when I learned about Bumblebee and found out that there was an entry point, it was like, okay, it still may take me years to get to where I want to, but at least I see an opening and I can start building. And I think that's huge.

Dan: Yeah, having a good starting point I think is the real winner there, right? Just get something to feel success around and then build on that momentum. So Katelynn, do you have any questions for Alexis, as someone who's working with Elixir developers on machine learning but not working specifically with Elixir and Bumblebee?
Katelynn: Yeah. I think my biggest question for you, Alexis, as someone who has a lot of experience with machine learning and a lot of knowledge of this: if you could talk to yourself at the beginning of your machine learning journey, what's something you would tell yourself, or what's something you wish someone had told you at the beginning of your journey?

Alexis: Honestly, I think it's that it's okay not to know everything, and that the most important thing is to know what questions to ask to figure it out. You can't know every possible method off the top of your head or know all of the intricacies of everything, but as long as you know what questions to ask and where to find resources to figure it out, that's completely okay.

Katelynn: Could you give some examples of what those questions might be? What kind of questions would you ask?

Alexis: Yeah, just learning where resources are. If you want to learn more about, let's say, fine-tuning a language model, just knowing that there are a bunch of resources, specifically on Hugging Face as well, like tutorials on how to do this that walk you through what everything means. Me personally, prior to working at Cars, I didn't have a ton of experience working with generative AI or LLMs. And then I went from not having a lot of experience, where I had mostly just read the papers and some use cases other teams had worked on, to actually building a set of POCs, like a Q&A vehicle recommendations chatbot, things like that. So I went from almost zero to actually building the POC just by knowing where to look for some of those resources, looking at those tutorials, looking at those examples, and things like that.

Dan: So I have a question for both of you. I think something that is maybe outdated common knowledge is that you have to have a lot of computing resources to do any of this. I'm curious, with the pre-trained models, or where you're fine-tuning, have things shifted in any way? Do you still need to hand Amazon a huge check, or can you get into this stuff with a little less pocket change? We'll start with Katelynn.

Katelynn: For me personally, all the machine learning that I have done is on my MacBook Pro. I have no external resources; everything is locally on my computer, and it's running just fine. When I started doing this, I had a lot of people at my company asking, do you need external drives? How's your CPU? How's your GPU? And I just shrugged and went, I have no idea. It's running fine, though. So I guess it's not that big of a deal. I would say that if people are worried about that, especially with the pre-trained models, which are sourced externally, it's really not a barrier at all, in my opinion.

Dan: Alexis, do you have similar or competing thoughts?

Alexis: I think it's definitely possible, but I also think it depends on the model, the model size, and what you're using it for. For example, we have a couple of fine-tuned models that, if we were to run them on anything other than the GPU instance we run them on for inference, would take all day to run instead of a few hours.
So things like that, I think, require additional resources. But working with smaller models locally, or maybe not running inference on hundreds of thousands or millions of elements, you could do some of that locally.

Dan: Great.

Katelynn: I do have a bit of a hot take on machine learning, I suppose.

Dan: All right, Katelynn coming in with a hot take on machine learning. Go for it.

Katelynn: I see the potential right now of it turning a little bit into Excel worksheets, in the sense that everybody is really excited about machine learning right now, so they want machine learning to do everything, and every project wants to have machine learning in it, whether it really needs it or not. And I think machine learning is exciting and is a great tool, but I don't think it's the solution to everything. I've seen a lot of companies do this with Excel worksheets, where they're like, it's my calendar, it keeps track of our finances, it does these things. And it's like, well, it wasn't really built to do that. And I see machine learning having that same potential, of people using it where they don't necessarily need it because it's so exciting.

Dan: I wonder, Alexis, if you have thoughts on that. I'm sure you do, but maybe speak specifically to how you help coach the business unit around what we actually should be doing: what are the questions this thing can actually help answer, versus things that really don't need a model to spit out an answer?

Alexis: Yeah, I think whether or not you use machine learning definitely depends on the question. At Cars, for example, we have projects where we're not throwing a model at the problem; it's something with a somewhat simpler analytics solution that we just schedule as a daily, weekly, or monthly job. We don't need to throw a machine learning model at a problem like that. But then we have things that definitely do require machine learning models. And again, we're not just throwing the newest, most hype-driven model at every single problem. It's about figuring out which one is the best fit for the data we have, the question we have, and also the requirements in production, right? If it needs to be something that's near real time, we can't be throwing a massive fine-tuned language model at it. And if it's something that doesn't require one of the super new, hyped-up models, then we shouldn't be using resources on that. It really depends on the question.

Dan: Fantastic. With the advent of ChatGPT and these really popular mainstream large language models, Alexis, do you see things drastically changing with everything becoming so accessible, or maybe some people solving the wrong problems or applying it in places where it's not a great fit?

Alexis: Yeah, I think there's still definitely a long way to go with things like ChatGPT, especially with respect to creating custom logic flows and things like that, as well as the issue of hallucinations. Those are two areas where I feel like, right now, there's still a lot more work to be done. It's pretty difficult to productionalize something using ChatGPT, or to use it at production scale, without a lot of custom flows and things like that. You can't just throw ChatGPT at it and call it a day.
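On the Elixir side, the production concerns Alexis mentions (batching, near-real-time inference, where the model actually runs) map onto Nx.Serving, which is how a Bumblebee model is typically hosted inside an application. Here is a minimal sketch, assuming a hypothetical MyApp.Vision.build_serving/0 that wraps the model-loading code from the earlier example; the module names, batch size, and timeout are illustrative, not from the episode.

defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # One shared serving process for the whole app; concurrent
      # callers are queued and batched together automatically.
      {Nx.Serving,
       serving: MyApp.Vision.build_serving(),
       name: MyApp.VisionServing,
       batch_size: 8,
       batch_timeout: 100}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end

# Elsewhere in the app (a controller, LiveView, or background job):
frame = StbImage.read_file!("frame_042.png")
Nx.Serving.batched_run(MyApp.VisionServing, frame)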
Dan: I think, Katelynn, your interest in this predates a lot of the technology's accessibility, at least for you personally. Obviously it becoming so popular, and the excitement in the Elixir community, has been a big benefit to you being able to make this happen, but do you have thoughts on the more general models and some of the things we're seeing happen, and how you see that impacting the work you're interested in?

Katelynn: Yeah, I'm definitely seeing that a lot more, not only with developers but with companies. Because we're a consultancy, when you're talking to them it's definitely at the forefront of their minds, because nobody wants to be behind. There are so many apps and websites out there that if you're behind, someone will just use another one. So I completely understand why this is at the forefront of their minds: machine learning is big, so they want to be a part of it now, so that when everyone has it, they've already been doing it for a while. And it's the same, I think, for both developers and companies. That's where I'm seeing it, personally.

Dan: Fantastic. As we near the end of our time together, any other questions, or things you think we should be talking about when it comes to machine learning, that we have not yet covered?

Alexis: Just going off what you said earlier, something a lot of people don't realize about machine learning is that the majority of the time really is about learning the data, learning the issues with it, cleaning it, and preprocessing it. The smaller portion of the time is the actual model-building piece. You can't build a model that performs really well if you have garbage data going in. So I would say, like you said earlier, the majority of the time goes to the part that people don't talk about a lot, and the actual model building and model decisions are the smaller portion of the job.

Dan: Katelynn, any other closing thoughts on machine learning and Elixir, or maybe particularly what you're most excited about in the Elixir space coming down the line?

Katelynn: I do have a question for Alexis, just because I'm curious. As an Elixir developer, I'm curious: Alexis, have you ever tried to use Elixir, or is there any urge to?

Alexis: I have not, but I would definitely be open to learning about it, especially if there are certain use cases where there are qualities of Elixir that are preferred over how we currently do it. I'm definitely open to learning new approaches and new methods.

Katelynn: I guess that's also kind of what my question is: since Elixir is earlier in its stages of machine learning, and as someone who is using a language that has been doing it for a while, are there any sticking points, things you really wish Python would let you do with machine learning, if you were going to create your own?

Alexis: There definitely are a lot of sticking points, I guess, and a lot of things are model or approach specific. So potentially it would help if there were some more generalized, unified methodology behind it. There definitely are some packages that try to get at that, but if you're working on more custom things, it's going to be more specific to each project. I feel like, especially if you're just starting out, having a more generalizable approach could be something that's helpful.
Katelynn: Cool, thanks. I know that was a hard question, but since I have you here.

Dan: Any final takes on machine learning and Elixir in the future?

Katelynn: Yeah, I think my final take is pretty similar to what I've been talking about. I just want to see people talking about it more and dipping their toes into it. And, like Alexis was saying, just getting started and answering those questions. And if nothing else, if people aren't answering the questions, asking them. Finding out what people are interested in in Elixir and what they're passionate about, starting those conversations, and then moving forward from there.

Dan: Great! As we bring this to a close, we're going to give you each a chance to plug any social media, libraries, side projects, or hustles you want to promote, or anything else you want to make sure the audience knows about you, your work, or what you're interested in with machine learning. So Alexis, we'll start with you.

Alexis: Sure. Well, if anyone is interested in the wide variety of projects that Cars Commerce is working on, there is the Cars.com tech blog, hosted on Medium, which has a bunch of pretty great articles about some of the recent projects that both I and our team have been working on.

Dan: Awesome. Sounds like a fun read. Katelynn, same question.

Katelynn: Yeah. People can reach out via the LaunchScout website; we have a contact form, and everyone is able to contact me through that. On GitHub, I'm K-Burns. And I'm also going to be at Code BEAM America in March, doing a similar version of the motion tracking talk that I did, but with a little more advanced technology.

Dan: Awesome. Conference-driven development for moving a side project forward; love to see it. All right, that's a wrap.

Outro: Elixir Wizards is a production of SmartLogic. You can find us online at smartlogic.io, and we're @SmartLogic on Twitter. Don't forget to like, subscribe, and leave a review. This episode was produced and edited by Paloma Pechenik for SmartLogic. We'll see you next week for more as we branch out from Elixir.