[Original] Prompting for LLMs: Translation and Commentary on "The Prompt Report: A Systematic Survey of Prompting Techniques"

处女座的程序猿, published 2024-07-01 in Shanghai


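Overview: This paper describes prompting, the technique now commonly used to give natural-language input to generative models. The authors collected roughly 1,600 papers related to prompting, analyzed them with machine-learning assistance, and extracted 58 text-based prompting techniques and 40 techniques for other modalities. The paper focuses on several families of techniques, including in-context learning, zero-shot prompting, thought generation, and ensembling, and illustrates their use on natural language processing tasks.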

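>> Background and pain points: Although prompting is widely used in generative AI systems, the field lacks unified terminology and a shared framework. Terms conflict, and there is little ontological understanding of what constitutes a prompt. The prompting literature has grown rapidly, yet only a small subset of techniques and terms is well known among practitioners.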

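>> Solution: The paper establishes a structured understanding of prompting: a vocabulary of 33 terms, a taxonomy of 58 text-only prompting techniques, and 40 techniques for other modalities (such as multilingual and multimodal settings). It also presents a meta-analysis of the entire literature on natural language prefix-prompting.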

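>> Core approach and steps:

● Follows the PRISMA systematic review process to identify 58 text-based prompting techniques and organize them into a taxonomy.

Discusses multilingual prompting techniques, such as chain-of-thought, in-context learning, and exemplar selection. The paper first summarizes the mainstream prompting techniques and groups them into the following categories: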

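In-context learning (ICL): uses exemplars in the prompt to teach the model a task, without updating its weights.

Zero-shot prompting: guides the model with instructions alone, without exemplars.

Thought generation: prompts the model to articulate its reasoning steps.

Decomposition: breaks a complex problem into simpler sub-problems.

Ensembling: uses multiple prompts and aggregates the model's predictions.

Self-criticism: prompts the model to evaluate and improve its own predictions.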

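● Explores multimodal prompting techniques, such as image prompting, audio and video prompting, and 3D prompting.

● Introduces extensions of prompting, such as agents that use external tools and methods for evaluating prompt outputs.

● Analyzes the issues prompting raises for security and alignment, along with corresponding mitigations.

● Compares and evaluates multiple prompting techniques through two case studies.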
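
Several of the text-based categories listed above (zero-shot prompting, in-context learning, thought generation, ensembling) can be sketched as plain string templates plus a vote. This is a minimal sketch and not the paper's implementation: the function names and the `Input:`/`Output:` layout are illustrative assumptions, and no model is called. The appended phrase "Let's think step by step." is the widely cited zero-shot chain-of-thought trigger.

```python
from collections import Counter
from typing import List, Tuple


def zero_shot_prompt(instruction: str, query: str) -> str:
    """Zero-shot: an instruction and the query, with no exemplars."""
    return f"{instruction}\n\nInput: {query}\nOutput:"


def few_shot_prompt(instruction: str, exemplars: List[Tuple[str, str]], query: str) -> str:
    """In-context learning: solved exemplars are placed before the query."""
    shots = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in exemplars)
    return f"{instruction}\n\n{shots}\n\nInput: {query}\nOutput:"


def chain_of_thought_prompt(instruction: str, query: str) -> str:
    """Thought generation: ask the model to spell out its reasoning."""
    return zero_shot_prompt(instruction, query) + " Let's think step by step."


def majority_vote(answers: List[str]) -> str:
    """Ensembling: aggregate answers from several prompts or samples."""
    return Counter(answers).most_common(1)[0][0]


prompt = few_shot_prompt(
    "Classify the sentiment as positive or negative.",
    [("great film", "positive"), ("awful plot", "negative")],
    "fine acting",
)
```

In practice each prompt string would be sent to a model, and `majority_vote` would be applied to the answers parsed from several responses. The parsing step is task-specific and omitted here.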

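>> Strengths:

● Establishes unified terminology and a taxonomy for the prompting field, which helps practitioners understand and apply the techniques.

● Surveys and summarizes existing prompting techniques systematically and comprehensively, providing a reference for follow-up research.

● Covers extensions of prompting to multilingual, multimodal, evaluation, and security settings.

● Uses case studies to compare how different techniques perform in practice, offering practical guidance.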

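Overall, the paper systematically organizes the prompting landscape: it lays out a taxonomy of current techniques, gives clear definitions of terms, and describes representative techniques in detail. By reviewing a large body of literature, the authors provide a fairly comprehensive survey of the field and a foundation for further work on improving prompt effectiveness.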

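"The Prompt Report: A Systematic Survey of Prompting Techniques": Translation and Commentary

Link

Paper URL: https:///pdf/2406.06608

Date

June 6, 2024

Latest version: June 17, 2024

Authors

University of Maryland, OpenAI, Stanford University, Microsoft, Vanderbilt University, Princeton University, and others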

Abstract

Generative Artificial Intelligence (GenAI) systems are being increasingly deployed across all parts of industry and research settings. Developers and end users interact with these systems through the use of prompting or prompt engineering. While prompting is a widespread and highly researched concept, there exists conflicting terminology and a poor ontological understanding of what constitutes a prompt due to the area’s nascency. This paper establishes a structured understanding of prompts, by assembling a taxonomy of prompting techniques and analyzing their use. We present a comprehensive vocabulary of 33 vocabulary terms, a taxonomy of 58 text-only prompting techniques, and 40 techniques for other modalities. We further present a meta-analysis of the entire literature on natural language prefix-prompting.


8 Conclusion

Generative AI is a novel technology, and broader understanding of models’ capabilities and limitations remains limited. Natural language is a flexible, open-ended interface, with models having few obvious affordances. The use of Generative AI therefore inherits many of the standard challenges of linguistic communication—e.g., ambiguity, the role of context, the need for course correction—while at the same time adding the challenge of communicating with an entity whose “understanding” of language may not bear any substantial relationship to human understanding. Many of the techniques described here have been called “emergent”, but it is perhaps more appropriate to say that they were discovered—the result of thorough experimentation, analogies from human reasoning, or pure serendipity.
The present work is an initial attempt to categorize the species of an unfamiliar territory. While we make every attempt to be comprehensive, there are sure to be gaps and redundancies. Our intention is to provide a taxonomy and terminology that cover a large number of existing prompt engineering techniques, and which can accommodate future methods. We discuss over 200 prompting techniques, frameworks built around them, and issues like safety and security that need to be kept in mind when using them. We also present two case studies in order to provide a clear sense of models’ capabilities and what it is like to tackle a problem in practice. Last, our stance is primarily observational, and we make no claims to the validity of the presented techniques. The field is new, and evaluation is variable and unstandardized—even the most meticulous experimentation may suffer from unanticipated shortcomings, and model outputs themselves are sensitive to meaning-preserving changes in inputs. As a result, we encourage the reader to avoid taking any claims at face value and to recognize that techniques may not transfer to other models, problems, or datasets.
To those just beginning in prompt engineering, our recommendations resemble what one would recommend in any machine learning setting: understand the problem you are trying to solve (rather than just focusing on input/output and benchmark scores), and ensure the data and metrics you are working with constitute a good representation of that problem. It is better to start with simpler approaches first, and to remain skeptical of claims about method performance. To those already engaged in prompt engineering, we hope that our taxonomy will shed light on the relationships between existing techniques. To those developing new techniques, we encourage situating new methods within our taxonomy, as well as including ecologically valid case studies and illustrations of those techniques.
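
The advice to newcomers above (check that the data and metric represent the problem, start with simple approaches, stay skeptical of performance claims) can be made concrete with a small evaluation loop. This is a minimal sketch under stated assumptions, not something taken from the paper: `toy_model` is a hypothetical keyword rule standing in for a real LLM call, and the templates, dev set, and exact-match accuracy metric are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

Dataset = List[Tuple[str, str]]  # (input text, gold label) pairs


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: a simple keyword rule."""
    text = prompt.rsplit("Text:", 1)[-1].lower()
    return "positive" if "good" in text else "negative"


def accuracy(model: Callable[[str], str], template: str, data: Dataset) -> float:
    """Exact-match accuracy of one prompt template on a labeled dev set."""
    hits = sum(
        model(template.format(text=text)).strip().lower() == label
        for text, label in data
    )
    return hits / len(data)


def compare_templates(
    model: Callable[[str], str], templates: Dict[str, str], data: Dataset
) -> Dict[str, float]:
    """Score every template on the same data so the numbers are comparable."""
    return {name: accuracy(model, tpl, data) for name, tpl in templates.items()}


# Illustrative dev set; the third item is a negation the keyword rule gets wrong.
DEV_SET: Dataset = [
    ("good movie", "positive"),
    ("bad plot", "negative"),
    ("not good at all", "negative"),
]

TEMPLATES = {
    "simple": "Text: {text}\nLabel:",
    "instructed": "Classify the sentiment as positive or negative.\nText: {text}\nLabel:",
}

scores = compare_templates(toy_model, TEMPLATES, DEV_SET)
```

Both templates score the same here because the stub ignores the instruction text. With a real model the scores would be expected to differ, and, as the conclusion cautions, a ranking found on one model or dataset may not transfer to another.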