
Generative AI and Software Engineering Education

This post was also authored by Michael Hilton, associate teaching professor in the School of Computer Science at Carnegie Mellon University.

The initial surge of excitement and fear surrounding generative artificial intelligence (AI) is gradually evolving into a more realistic perspective. While the jury is still out on the actual return on investment and tangible improvements from generative AI, the rapid pace of change is challenging software engineering education and curricula. Educators have had to adapt to the ongoing developments in generative AI to provide a realistic perspective to their students, balancing awareness, healthy skepticism, and curiosity.

In a recent SEI webcast, researchers discussed the impact of generative AI on software engineering education. SEI and Carnegie Mellon University experts spoke about the use of generative AI in the curriculum and the classroom, discussed how faculty and students can most effectively use generative AI, and considered concerns about ethics and equity when using these tools. The panelists took questions from the audience and drew on their experience as educators to speak to the critical questions generative AI raises for software engineering education.

This blog post features an edited transcript of responses from the original webcast. Some questions and answers have been rearranged and revised for clarity.

Generative AI in the Curriculum

Ipek Ozkaya: How have you been using generative AI in your teaching? How can software engineering education take advantage of generative AI tools?

Doug Schmidt: I’ve been teaching courses on computer science, computer programming, and software engineering for decades. In the last couple of years, I’ve applied a lot of generative AI, particularly ChatGPT, in some courses I teach that focus on mobile cloud computing and microservices with Java. I use generative AI extensively in those courses to help create programming assignments and lecture material that I give to my students. I also use generative AI with the assessments that I create, including quiz questions based on my lectures and helping evaluate student programming assignments. More recently, in my role as Director of Operational Test and Evaluation in the Department of Defense, my team and I have been evaluating how to use generative AI when assessing DoD systems for effectiveness, suitability, survivability, and (when necessary) lethality.

Many activities performed by software engineers and developers are tedious, manual, and error-prone. In my teaching, research, and practice of these activities, I therefore try to identify boring and mundane activities that can be outsourced to generative AI, under close supervision and guidance from me or my TAs. For example, LLMs and various plug-ins like Copilot or CodeWhisperer are quite effective at documenting code. They’re also useful for determining build dependencies and configurations, as well as refactoring parts of a code base.
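To make the code-documentation use case concrete, here is a small, hypothetical sketch (the class name, method, and Javadoc below are illustrative examples I wrote, not output captured from Copilot, CodeWhisperer, or any other tool): given an undocumented helper method, the kind of Javadoc an LLM typically produces when asked to document it looks roughly like this.

```java
public class ClampExample {
    // Hypothetical example: this method was originally submitted with no comments.
    // The Javadoc below is the sort of documentation an LLM produces when prompted
    // with "Add Javadoc describing the parameters, return value, and edge cases."

    /**
     * Constrains {@code value} to the inclusive range [{@code low}, {@code high}].
     *
     * @param value the value to constrain
     * @param low   the lower bound of the range
     * @param high  the upper bound of the range
     * @return {@code low} if {@code value} is below the range, {@code high} if it
     *         is above the range, and {@code value} itself otherwise
     */
    static int clamp(int value, int low, int high) {
        return Math.max(low, Math.min(high, value));
    }

    public static void main(String[] args) {
        System.out.println(clamp(15, 0, 10));  // 10
        System.out.println(clamp(-3, 0, 10));  // 0
        System.out.println(clamp(7, 0, 10));   // 7
    }
}
```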

I teach many courses that use the Java platform, which is open source, so it’s easy to examine the underlying Java class implementations. However, Java method definitions are often not thoroughly documented (other than the comments above the method names and the class names), so when I review this Java source code, it’s typically complicated and hard to understand. In this case, I use tools like ChatGPT or Claude for code explanation and summarization, which help me and my students understand powerful Java frameworks that would otherwise be opaque and mysterious.

Michael Hilton: I’ve been a little more cautious than my colleague Doug. I’ve had the students do exercises while I’m present. I can therefore help answer questions and observe how they’re doing, mostly so I can learn about where they struggle, where the tools help, and where the gaps are. I do allow the use of generative AI in my classes for large projects. I just ask them to cite it, and there’s no penalty if they do. Probably around half the students end up using generative AI tools, and the other half tell me they don’t. I’ve also been doing some research around undergrads and their usage of generative AI tools in a more structured research context.

We also encourage them to use such tools heavily for learning language constructs in new programming languages—for example, if they’re not familiar with Python when they come into our course. We are starting to teach these tools in our classes because I am a firm believer that software engineering classes should prepare students for the realities of the world they’re entering. I think it would be irresponsible to teach a software engineering class at this point and pretend that generative AI doesn’t exist.

Ipek: Are there new skill sets that are becoming more important to teach?

Doug: Absolutely. Some of these skill sets are what we’ve always emphasized but sometimes get lost behind the accidental complexities of syntax and semantics in conventional third-generation programming languages, such as C, C++, and Java. The most important skill is problem solving, which involves thinking clearly about what requirements, algorithms, and data structures are needed and articulating solutions in ways that are as straightforward and unambiguous as possible. Getting students to problem solve effectively has always been key to good teaching. When students write code in conventional languages, however, they often get wrapped around the axle of pointer arithmetic, linked lists, buffer overflows, or other accidental complexities.

A second important—and much newer—skill set is learning the art of effective prompt engineering, which involves interacting with the LLMs in structured ways using prompt patterns. Prompt engineering and prompt patterns help improve the accuracy of LLMs, as opposed to having them do unexpected or unwanted things. A related skill is learning to deal with uncertainty and nondeterminism since an LLM may not generate the same results every time you ask it to do something on your behalf.

Moreover, learning to decompose the prompts provided to LLMs into smaller pieces is important. For example, when I ask ChatGPT to generate code for me it usually produces better output if I bound my request to a single method. Likewise, it’s often easier for me to determine if the generated code is correct if my prompts are tightly scoped. In contrast, if I ask ChatGPT to generate vast amounts of classes and methods, it sometimes generates strange results, and I have a hard time knowing whether what it’s produced is correct. Fortunately, many of the skills needed to work with LLMs effectively are the same principles of software design that we’ve used for years, including modularity, simplicity, and separation of concerns.
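As a hypothetical illustration of what a tightly scoped request looks like in practice (the prompt wording, class name, and code below are mine, written for illustration rather than captured from ChatGPT): bounding the request to a single, well-specified method yields a result small enough to read, test, and verify in a minute or two.

```java
// Prompt (scoped to one method): "Write a Java method that returns the median of a
// non-empty int array without modifying the caller's array. Throw
// IllegalArgumentException if the array is null or empty."

import java.util.Arrays;

public class MedianExample {
    static double median(int[] values) {
        if (values == null || values.length == 0) {
            throw new IllegalArgumentException("values must be non-empty");
        }
        int[] copy = Arrays.copyOf(values, values.length);  // don't mutate the input
        Arrays.sort(copy);
        int mid = copy.length / 2;
        return (copy.length % 2 == 1)
                ? copy[mid]
                : (copy[mid - 1] + copy[mid]) / 2.0;
    }

    public static void main(String[] args) {
        // A result this small is easy to check by hand, which is the point of
        // keeping the prompt tightly scoped.
        System.out.println(median(new int[]{3, 1, 2}));     // 2.0
        System.out.println(median(new int[]{4, 1, 3, 2}));  // 2.5
    }
}
```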

Michael: I did my PhD on continuous integration (CI), which at the time was relatively new. I went around and interviewed a bunch of people about the benefits of CI. It turns out the benefit was that developers were actually running their unit tests, because before CI, no one actually ran them. I agree with everything that Doug said. We’ve always told people to read their code and understand it, but it hasn’t really been a top-priority skill with a reason to be exercised until now. I think generative AI is going to change how we do things, especially in terms of reading, evaluating, and testing code that we didn’t write. Code inspection will become an even more valuable skill than it is now. And if code isn’t trustworthy—for example, if it doesn’t come from a colleague who I know always writes good code—we may need to look at it with some suspicion and think about it thoroughly. Things like mutation testing could become much more common as a way to evaluate code more thoroughly than we have in the past.
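For readers unfamiliar with the idea, here is a minimal, hand-rolled sketch of what mutation testing checks; real tools such as PIT automate this for Java, and the method, mutant, and checks below are hypothetical examples rather than tool output.

```java
public class MutationSketch {
    // Original code under test.
    static boolean isEligible(int age) {
        return age >= 18;
    }

    // A mutant a mutation-testing tool might generate: ">=" changed to ">".
    static boolean isEligibleMutant(int age) {
        return age > 18;
    }

    public static void main(String[] args) {
        // A weak test input (age = 30) behaves the same on both versions, so it
        // would NOT kill the mutant; the test suite looks fine but misses the
        // boundary behavior.
        System.out.println(isEligible(30) == isEligibleMutant(30));  // true: mutant survives

        // A boundary input (age = 18) distinguishes the two and kills the mutant,
        // which is exactly the kind of scrutiny generated code will increasingly need.
        System.out.println(isEligible(18) == isEligibleMutant(18));  // false: mutant killed
    }
}
```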

Ipek: Where should generative AI be introduced in the curriculum? Are there new classes (for example, prompt engineering) that now need to be part of the curriculum?

Doug: To some extent it depends on what we’re trying to use these tools for. For example, we teach a data science course at Vanderbilt that provides an introduction to generative AI, which focuses on prompt engineering, chatbots, and agents. We also teach people how transformers work, as well as how to fine-tune and build AI models. These topics are important right now because high school students entering college simply don’t have that background. In a decade, however, these students will enter college knowing this kind of material, so teaching those topics as part of computer literacy will be less important.

We need to ensure our students have solid foundations if we want them to become effective computer and data scientists, programmers, and software engineers. However, starting too early by leapfrogging over the painful—but essential—trial-and-error phase of learning to become good programmers may be trying to supercharge our students too quickly. For instance, it’s premature to have students use LLMs in our CS101 course extensively before they first master introductory programming and problem-solving skills.

I believe we should treat generative AI the same way as other important software engineering topics, such as cybersecurity or secure coding. While today we have dedicated courses on these topics, over time it’s more effective if they become integrated throughout the overall CS curricula. For example, in addition to offering a secure coding course, it’s crucial to teach students in any courses that use languages like C or C++ how to avoid buffer overflows and common dynamic memory management errors. On the other hand, while teaching prompt engineering throughout the CS curricula is desirable, there’s also value in having specialized courses that explore these topics in more detail, such as the Introduction to Generative AI Data Science course at Vanderbilt mentioned above.

People often overlook that new generative AI skills, such as prompt engineering and prompt patterns, involve more than just learning “parlor tricks” that manipulate LLMs to do your bidding. In fact, effectively utilizing generative AI in non-trivial software-reliant systems requires a comprehensive approach that goes beyond small prompts or isolated prompt patterns. This holistic approach involves considering the entire life cycle of developing nontrivial mission-critical systems in collaboration with LLMs and associated methods and tools. In much the same way that software engineering is a body of knowledge that encompasses processes, methods, and tools, prompt engineering should be considered holistically, as well. That’s where software engineering curricula and professionals have a lot to offer this brave new world of generative AI, which is still largely the Wild West, as software engineering was 50 or 60 years ago.

Michael: One of my concerns is when all you have is a hammer, everything looks like a nail. I think the tool usage should be taught where it falls in the curriculum. When you’re thinking about requirements generation from a large body of text, that clearly belongs in a software engineering class. We don’t know the answer to this yet, and we will have to discover it as an industry.

I also think there’s a big difference between what we do now and what we do in the next couple years. Most of my students right now started their college education without LLMs and are graduating with LLMs. Ten years from now, where will we be? I think those questions might have different answers.

I think humans are really bad at risk assessment and risk analysis. You’re more likely to die from a coconut falling out of a tree and hitting you on the head than from being bitten by a shark, but way more people are afraid of sharks. You’re more likely to die from sitting in a chair than from flying in an airplane, but who’s afraid to sit in a chair versus who’s afraid to fly in an airplane?

I think that by bringing in LLMs, we are adding a huge amount of risk to the software development lifecycle. I think people don’t have a good sense of probability. What does it mean to have something that’s 70 percent right or 20 percent right? I think we will need to help further educate people on risk assessment, probability, and statistics. How do you incorporate statistics into a meaningful part of your workflow and decision making? This is something a lot of experienced professionals are good at, but not something we traditionally teach at the undergraduate level.
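To make the point about compounding uncertainty concrete, here is a hypothetical back-of-the-envelope sketch (the 70 percent figure echoes the example above; the independence assumption and the code are mine, purely for illustration): even when each individual generated answer is fairly likely to be right, the chance that an entire chain of answers is right drops off quickly.

```java
public class RiskSketch {
    public static void main(String[] args) {
        double perStepAccuracy = 0.70;  // "70 percent right" per generated answer
        for (int steps : new int[]{1, 3, 5, 10}) {
            // Naively assuming each answer is independent, the probability that
            // ALL of them are correct is perStepAccuracy raised to the number of steps.
            double allCorrect = Math.pow(perStepAccuracy, steps);
            System.out.printf("%2d answers chained: P(all correct) = %.1f%%%n",
                              steps, allCorrect * 100);
        }
        // Prints roughly 70.0%, 34.3%, 16.8%, 2.8% -- the kind of compounding risk
        // that intuition alone tends to underestimate.
    }
}
```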

Equity and Generative AI

Ipek: How are students interacting with generative AI? What are some of the different usage patterns you are observing?

Doug: In my experience, students who are good programmers also often use generative AI tools effectively. If students don’t have a good mastery of problem solving and programming, they’re going to have difficulty knowing when an LLM is hallucinating and producing gobbledygook. Students who are already good programmers are thus usually more adept at learning how to apply generative AI tools and techniques because they understand what to look for when the AI starts going off the rails and hallucinating.

Michael: I am a firm believer that I want everyone in my class to be successful in software engineering, and this is something that’s very important to me. A lot of research shows a correlation between a student’s success and their sense of self-efficacy: how good they think they are. This can often be independent of their actual skill level. Research has also repeatedly found that students from underrepresented groups often report lower self-efficacy than their peers.

In some of the experiments I’ve done in my class, I have noticed a trend where it seems like the students who have lower self-efficacy often struggle with the LLMs, especially when the LLMs give them code that is wrong. There is a cognitive hurdle here: essentially you have to say, “The AI is wrong, and I am right.” Sometimes students have a hard time doing that, especially if they are from an underrepresented group. In my experience, students’ ability to overcome that inertia is not necessarily tied to their actual skills and abilities; it often seems to correlate much more with whether they look like everyone else in the classroom.

At the same time, there are students who use these tools and absolutely supercharge their ability. The tools make them much faster than they would be otherwise. I have concerns that we don’t fully understand the relationships among students’ usage patterns, their demographic groups, and important concepts like self-efficacy or actual efficacy. I’m worried about a world in which the rich get richer and the poor get poorer with these tools. I don’t think that they will have zero impact. My concern is that they will disproportionately help the students who are already ahead and will grow the gap between these students and the students who are behind, or don’t see themselves as being ahead, even if they are still really good students.

Ipek: Are there any concerns about resources and costs around including generative AI in the classroom, especially when we talk about equity?

Doug: Vanderbilt’s Introduction to Generative AI course I mentioned earlier requires students to pay $20 a month to access the ChatGPT Plus version, which is akin to paying a lab fee. In fact, it’s probably cheaper than a lab fee in many classes and is often much less expensive than the cost of college textbooks. I’m also aware that not everybody can afford $20 a month, however, so it would be great if colleges offered a program that provided funds to cover these costs. It’s also worth mentioning that unlike most other prerequisites and requirements we levy on our CS students, students don’t need a computer costing thousands of dollars to run LLMs like ChatGPT. All they need is a device with a web browser, which enables them to be as productive as other students with more powerful and costly computers for many tasks.

Michael: My first institution was a community college, so I’m well aware that students at different institutions have very different resources. When I said earlier, “The rich get richer and the poor get poorer,” I meant that figuratively in terms of self-efficacy, but I think there is a real monetary concern of the rich getting richer and the poor getting poorer in a situation like this. I don’t want to discount the fact that for some people, $20 a month is not just lying around.

I’m also very concerned about the fact that right now all these tools are relatively cheap because they’re being directly subsidized by huge VC firms, and I don’t think that will always be the case. I could see in a few years the costs going up significantly if they reflected what the actual costs of these systems were. I know institutions like Arizona State University have announced that they have made premium subscriptions available to all their students. I think we’ll see more situations like this. Textbooks are expensive, but there are things like Pell Grants that do cover textbook costs; maybe this is something that eventually will become part of financial aid models.

The Future of Software Engineering Education

Ipek: How do we address the concerns that students might take shortcuts with generative AI that become habitual and might hinder them from becoming experts?

Michael: This is the million-dollar question for me. When I was in school, everyone took a compilers class, and now lots of people aren’t taking compilers classes. Most people aren’t writing assembly language code anymore. Part of the reason is that we have, as an industry, moved above that level of abstraction. But we have been able to do that because, in my lifetime, for all of the hundreds of thousands of bugs that I have written, I have never personally encountered a case where my code was correct and it was actually the compiler that was wrong. Now, I’m sure if I were on a compilers team that would have been different, but I was writing high-level business logic code, and the compiler is essentially never wrong at this point. When compilers are wrong, it’s usually an implementation problem, not a conceptual, theoretical problem. I think there is a view that the LLM becomes like a compiler, and we just operate at that level of abstraction, but I don’t know how we get there given the guarantees of correctness that we can never have with an LLM.

Given that we are all human, we’re often going to take the path of least resistance to finding a solution. This is what programmers have prided themselves on: finding the laziest solution that gets the code to do the work for you. That’s something we value as a community, but then how do we still help people learn in a world where the answers are easily given, when, based on what we know about human psychology, that will not actually help their learning? They won’t internalize it. Just seeing a correct answer doesn’t help you learn the way struggling through and working out the answer on your own does. I think it’s really something that we as a whole industry need to wrestle with going forward.

Doug: I’m going to take a different perspective with this question. I encourage my students to use LLMs as low cost—but high fidelity—round-the-clock tutors to refine and deepen their understanding of material covered in my lectures. I screencast all my lectures and then post them on my YouTube channel for the world to enjoy. I then encourage my students to prepare for our quizzes by using tools like Glasp. Glasp is a browser plugin for Chrome that automatically generates a transcript from any YouTube video and loads the transcript into a browser running ChatGPT, which can then be prompted to answer questions about material in the video. I tell my students, “Use Glasp and ChatGPT to query my lectures and find out what kinds of things I talked about, and then quiz yourself to see if you really understood what I was presenting in class.”

More generally, teachers can use LLMs as tutors to help their students understand material in ways that would otherwise be untenable without unfettered 24/7 access to TAs or faculty. Of course, this approach is premised on LLMs being reasonably accurate at summarization, which they are if you use recent versions and give them sufficient content to work with, such as transcripts of my lectures. It’s when LLMs are asked open-ended questions without proper context that problems with hallucinations can occur, though these are becoming less common with newer LLMs, more powerful techniques such as retrieval-augmented generation (RAG), and better prompt engineering patterns. It’s heartening to see LLMs helping democratize access to knowledge by giving students insights they would otherwise be hard pressed to attain. There simply aren’t enough hours in the day for me and my TAs to answer all my students’ questions, but ChatGPT and other tools can be patient and answer them promptly.

Ipek: With the rise of generative AI, some argue that students are questioning if it’s worthwhile to pursue computer science. Do you agree with this?

Doug: I took an Uber ride in Nashville recently, and after the driver learned I taught software courses at Vanderbilt he said, “I’m a computer science student at a university in Tennessee—is it even worth being in software and development?” I told him the answer is a resounding yes for several reasons. First, we’ll ultimately need more programmers, because businesses and governments will be trying to solve much larger and more complex problems using generative AI tools. Second, there will be a lot of poorly generated code created by programmers working with these generative AI tools, which will incur lots of technical debt that humans will need to pay down.

Sometimes these generative AI tools will do a good job, but sometimes they won’t. Regardless of the quality, however, an enormous amount of new software will be created that is not going to maintain and evolve itself. People’s appetite for more interesting computing applications will also grow rapidly. Moreover, there will be a surge of demand for developers who know how to navigate generative AI tools and use them effectively in conjunction with other software tools to create business value for end users.

Michael: This is where I love to point out that there is a difference between software engineering and programming. I think how programming gets taught will necessarily have to evolve over the next few years, but I think software engineering skills are not going away. I like to talk about Jevons Paradox, an observation in economics that increases in the efficiency of resource use tend to increase, rather than decrease, total consumption of that resource. Word processors and email have made paperwork easier to generate, but this hasn’t resulted in less paperwork than there was in the 1940s. It’s resulted in a lot more paperwork than there was in the 1940s. Will programming look the same in 10 years as it did 10 years ago? Probably not, but will software engineering skills be as valuable or more valuable in the future when all these people have these large piles of code that they don’t fully understand? Absolutely.

Ipek: Are you giving thought to continuing education courses about generative AI for deployment to the existing workforce?

Doug: I think that’s one of the other low-hanging-fruit areas of focus. While our emphasis in this webcast is primarily computer science and software engineering education, there are many other non-CS professionals in universities, industry, and government who need to solve problems via computation. Historically, when these people asked software engineering and computer science teachers for help in using computation to solve their problems, we’d try to turn them into programmers. While that sometimes worked, it often wasn’t the best use of their time or of our time. Nowadays, these people may be better off learning how to become prompt engineers and using LLMs to do some portions of their computation.

For example, when I have a task requiring computation to solve, my first inclination is no longer to write a program in Java or Python. Instead, I first try to see if I can use ChatGPT to generate a result that’s accurate and efficient. The results are often quite surprising and rewarding, and they underscore the potential of applying generative AI to automate complex tasks and aid decision-making by emphasizing collaborative problem solving via natural language versus programming with traditional computer languages. I find this approach can be much more effective for non-CS professionals because they don’t necessarily want to learn how to code in third-generation programming languages, but they do know how to convey their intent succinctly and cogently via prompts to an LLM.

Michael: I’m not an expert in continuing education, so I’m not going to address that part of the question, although I think it’s important. But I will point out that you asked, “Are programmers going away?” The most commonly used programming language in the world is Excel. Imagine if every dentist office and every real estate office and every elementary school had someone who knows how to do prompt engineering and is using LLMs to do calculations for their business. These people are doing this right now, and they’re doing it in Excel. If those people start using LLMs, the number of programmers isn’t going to go down, it’s going to go up by orders of magnitude. And then the question is, How do we educate those people and teach them how to do it right with things like continuing education?

Doug: I think Michael makes a crucially important point here. Anybody who uses an LLM and becomes a more proficient prompt engineer is a programmer. They’re not programming in languages like Java, Python, and C++, but instead they’re programming in natural language via LLMs to get the results of computational processing. We need more—not fewer—people who are adept at prompt engineering. Likewise, we need sophisticated and multi-faceted software engineers who can manage all the programming that will be done by the masses, because we’re going to have a big mess if we don’t.

Additional Resources

To view the webcast in its entirety, please visit https://insights.sei.cmu.edu/library/generative-ai-and-software-engineering-education.

Written By

Douglas Schmidt (Vanderbilt University)
