
Harnessing the Power of Large Language Models For Economic and Social Good: 4 Case Studies

Computational models of natural language generation, understanding, and related tasks—collectively known as natural language processing—are not new. Following the demonstration of automatic translation of sixty Russian sentences to English in 1954, researchers predicted that machine translation would be a solved problem within five years. Yet despite early successes, most problems in natural language processing, including automatic translation, remained unsolved for more than half a century.

Despite the gradual but significant technical advances in language models over the past 50-plus years, the release of ChatGPT in November 2022 was a tipping point: for the first time, a language model entered widespread use. ChatGPT did so, in part, because of the greater accuracy of its responses relative to earlier language models and because of the emergent abilities of large language models (LLMs), which use deep neural networks (DNNs) to learn the likelihood of words appearing in the context of different sentences and paragraphs. Specifically, LLMs are capable of in-context learning—that is, adjusting how they respond based on user instructions. This ability allows LLMs to complete novel tasks for which they were not trained.

To better understand the potential uses of LLMs and their likely impact, a team of researchers in the SEI CERT Division conducted four in-depth case studies. The case studies span multiple domains and call for vastly different capabilities. In all four, we used a version of GPT-3.5 provided in the ChatGPT web-based application. This blog post, the second in a series, outlines the four case studies and explores ChatGPT's potential, limitations, and future uses. Our first post explored the underlying principles of LLMs.

Further details about the case studies, including complete model transcripts, are contained in our white paper, from which this series of posts is adapted.

Four LLM Case Studies

Data science. As the business landscape becomes increasingly data-centric, organizations are striving to incorporate data science capabilities to gain a competitive advantage. Despite the evident potential, integrating these capabilities into business lines presents significant challenges. From assembling a versatile data science team to instituting robust data science processes, organizations face steep hurdles. Maintaining quality assurance standards, ensuring the durability of deployed products, and catering to the rising demand for new data science products all add to the complexity of the task. In this case study, we create a data-driven intrusion detection system with ChatGPT.

Training and education. To produce a highly capable workforce, organizations must invest in human capital development. This investment includes delivering training and education to equip individuals with the requisite knowledge, skills, and competencies for their respective roles. However, the process of curating and delivering training materials is labor-intensive and costly. Training managers are burdened with the responsibility to create, update, and adapt content, abiding by instructional design principles while also personalizing it to suit diverse learning needs. In this case study, we create a training curriculum for data scientists in cybersecurity with ChatGPT.

Research. In research and development, the literature review process is the foundation upon which new knowledge and innovative ventures are built. To expand the horizons of knowledge, researchers must be well-versed with existing knowledge. To create cutting-edge products, designers must understand the science driving emerging technologies. However, the rapid rate of publication makes it difficult to stay informed in even relatively narrow sub-areas. To generate effective literature reviews, researchers must be systematic, comprehensive, critical, and timely. In this case study, we perform a literature review on AI safety using ChatGPT.

Strategic planning. Long-term thinking and planning are essential for sound decision making when dealing with uncertainties about the pace of technological development and the future global environment. Foresight methods are well-established tools for such long-range planning, but their implementation is challenging. To effectively deploy these methods, decision makers must assemble subject matter experts, scrutinize assumptions, and invest substantial time and financial resources in data gathering and analysis. In this case study, we identify potential applications of emerging technologies for training and education using ChatGPT.

Four Attributes Observed in LLM Case Studies

Across the case studies, we observed four attributes of ChatGPT that enhanced the quality and efficiency of products created by human users.

Knowledge—Knowledge is the information imbued during training that ChatGPT brought to bear while performing tasks. For example, when asked to create a classifier in the data science case study, ChatGPT loaded the proper Python libraries, and it used the correct syntax to fit a logistic regression model to the given data. Moreover, when asked to describe random forests in the training case study, it generated accurate and concise bullets.
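To make this concrete, the following is a minimal sketch of the kind of classifier code ChatGPT produced in the data science case study: loading scikit-learn, fitting a logistic regression model, and evaluating it. The dataset here is synthetic and the variable names are our own illustration, not taken from the case study transcripts.

```python
# Minimal sketch: fit and evaluate a logistic regression classifier,
# similar in spirit to the code ChatGPT generated in the data science
# case study. The data is synthetic, not the case study's dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit a logistic regression model to the training data
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate on the held-out test set
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

In the case study, ChatGPT reliably produced correct syntax of this kind; the human's role was to supply the actual data and verify the results.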

Creativity—Creativity is the application of existing knowledge to new problems, and the combination of disparate elements in new ways. For example, when asked to write a science fiction story in the strategic planning case study, ChatGPT provided a vivid account of how AI could transform cybersecurity training. Moreover, when asked to integrate concepts about random forests in the training case study, it created a coding exercise with an accompanying cover story.

Evaluation—Evaluation is the use of knowledge to deliver critical feedback about computer programs and text passages. For example, in the training case study, ChatGPT provided personalized feedback based on student responses. Moreover, in the research case study, it identified strengths and limitations of journal articles.

Communication—Communication is the ability to use natural language to convey information to different audiences. For example, in the data science case study, ChatGPT generated documentation explaining the code it produced. Moreover, in the training case study, it crafted responses for different formats (e.g., PowerPoint bullets versus text passages) and for different audiences (e.g., data science novices versus experts).

Table 1 shows the attributes of ChatGPT that we observed in each case study. ChatGPT’s knowledge and communication featured prominently in all case studies. This observation is consistent with the fact that ChatGPT is, at its core, a store of knowledge and a model of language production. ChatGPT’s creativity featured prominently in three case studies. LLMs’ tendency to hallucinate, or generate responses unfaithful to source content, has been viewed as a liability. When ChatGPT is asked to give creative responses, however, this characteristic is an asset. Finally, ChatGPT’s ability to evaluate content featured prominently in three case studies. This attribute is particularly powerful in training and education because it can be used to provide personalized feedback to students at scale.

Table 1: Attributes of ChatGPT Observed in Each Case Study (Data Science, Training and Education, Research, Strategic Planning)

Limitations of ChatGPT and Strategies to Overcome Them

Notwithstanding these strengths, we found that ChatGPT had limitations. Table 2 enumerates these along with strategies to overcome them. For example, ChatGPT’s training cutoff date was in 2021, meaning that its knowledge of world events and scientific developments extends only to that point. In our case studies, this cutoff led to knowledge gaps about very recent Python libraries and blind spots about recent scientific publications. To deal with the latter limitation, one can leverage extensions that allow ChatGPT to interact directly with source material. However, while ChatGPT can summarize this material, it cannot integrate the information into its existing model without fine-tuning.

Moreover, ChatGPT may produce incorrect or misleading information, which is especially pernicious because the misinformation is very convincing. For example, it may generate spurious citations that resemble real journal articles. The implication is that a knowledgeable human must check ChatGPT’s outputs. Prior research shows that humans may become overly reliant on automation. In the case of LLMs, training and education are needed to ensure that humans rely appropriately on AI.

Lastly, ChatGPT was not able to complete multi-part tasks, such as generating a large computer program or creating a course with multiple modules. To deal with this limitation, a knowledgeable human must decompose the task into simpler ones that ChatGPT can complete. This challenge is reduced, but not eliminated, by new tools like the API for Advanced Data Analytics.

Table 2: Limitations of ChatGPT and Strategies to Overcome Them



Limitation: Does not execute code in real time and thus cannot directly validate its functionality or correctness

· Leverage extensions to run ChatGPT code, including Advanced Data Analytics

· Run code manually and provide output to ChatGPT

Limitation: Does not have access to very recent programming libraries or updates to existing ones

· Leverage extensions to point ChatGPT to code repositories, such as ChatWithGit and AskTheCode

Limitation: Cannot complete large programming tasks that require planning, decomposition, and integration of sub-tasks

· Use an interactive approach in which a human decomposes complex tasks into simpler parts for ChatGPT to complete

Limitation: Interactions primarily occur through written text

· Leverage speech-to-text and text-to-speech extensions to enable spoken interactions

Limitation: Lacks deep knowledge in narrowly focused areas

· Retrain the LLM with additional examples from the targeted domain using the public API or an on-premise implementation of the LLM

Limitation: May produce incorrect or misleading information

· Retain a human in the loop

Limitation: Does not have access to very recent publications or to restricted or proprietary documents

· Leverage plug-ins like Accurate PDF and AskYourPDF that allow ChatGPT to interact directly with source material

· Retrain the LLM with additional documents using the public API or an on-premise implementation of the LLM

Integrative Themes from LLM Case Studies

From our explorations in the case studies, we noted five overarching themes concerning ChatGPT’s technical capabilities and considerations for its use.

  1. ChatGPT has remarkable range, but it is not artificial general intelligence (AGI). AGI is a hypothetical type of AI that can learn to accomplish any task that a human being can perform. We found that ChatGPT had limited ability to complete complex, multi-step tasks. In several case studies, the human needed to define narrower tasks for ChatGPT to complete.
  2. ChatGPT’s linguistic abilities are separate from the knowledge it possesses, and they have further uses. For example, ChatGPT can summarize and extract themes from source material. This capability goes beyond the types of linguistic analyses possible with existing NLP tools (e.g., latent semantic analysis). ChatGPT can also generate responses in different tones and for different audiences. Thus, different applications of ChatGPT can leverage its linguistic abilities, its world knowledge, or both.
  3. Traceability is a paramount concern with ChatGPT. Traceability refers to the ability to trace a model’s outputs back to its inputs. ChatGPT lacks this property; it does not store or recall information from specific sources when it responds. Rather, it generates responses based on patterns and structures present in the language used during training. The implication is that although most of ChatGPT’s assertions sound plausible, some are fabricated, and all must be verified.
  4. ChatGPT’s use of world knowledge mimics multiple levels of understanding. Bloom’s Taxonomy is a framework for understanding people’s mastery of increasingly complex skills and knowledge. The taxonomy begins with remembering factual knowledge and progresses through understanding, applying, analyzing, synthesizing, and evaluating. ChatGPT made contributions across all levels of understanding, underscoring the wide range of potential uses.
  5. ChatGPT can be evaluated in terms of the quality of outputs relative to humans or the speed of outputs. Due to their subjective nature, our case studies do not directly permit evaluation of the quality of outputs. However, ChatGPT dramatically increased throughput in all the case studies. Thus, although ChatGPT does not replace humans, it may allow them to focus on the most challenging and nuanced parts of a task.

Future Considerations: LLMs Augmenting Human Intelligence

Through four case studies, we have discovered powerful opportunities for LLMs to augment human intelligence. As the AI revolution unfolds, we must remain mindful of potential harms while equally recognizing and embracing the remarkable potential for societal benefit.

Additional Resources

Read the first post in this series, Harnessing the Power of Large Language Models For Economic and Social Good: Foundations.

Read the white paper on which this series of posts was based “Demonstrating the Practical Utility and Limitations of ChatGPT Through Case Studies” by Matthew Walsh, Dominic A. Ross, Clarence Worrell, and Alejandro Gomez.

Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.

OpenAI. (2023). GPT-4 technical report.

Parasuraman, R., & Riley, V. (1997). Humans and automation: Use, misuse, disuse, abuse. Human Factors, 39(2), 230–253.

Schwab, K. (2017). The Fourth Industrial Revolution. Crown Publishing, New York, NY.

Turing, A. (1950). Computing machinery and intelligence. Mind, LIX(236), 433–460.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.

The Messy Middle of Large Language Models with Jay Palat and Rachel Dzombak
