A Preliminary Exploration: Generating Paper Summaries with a Prompt-Engineerable LLM, Following the RMT Idea
I experimented with long-text summarization by calling the GPT-3.5-turbo API. The paper used for this attempt is:
《Tree of Thoughts: Deliberate Problem Solving with Large Language Models》


Original paper abstract:
Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, “Tree of Thoughts” (ToT), which generalizes over the popular “Chain of Thought” approach to prompting language models, and enables exploration over coherent units of text (“thoughts”) that serve as intermediate steps toward problem solving. ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. Our experiments show that ToT significantly enhances language models’ problem-solving abilities on three novel tasks requiring non-trivial planning or search: Game of 24, Creative Writing, and Mini Crosswords. For instance, in Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%. Code repo with all prompts: https://github.com/ysymyth/tree-of-thought-llm.
Bullet-point summary generated by GPT-3.5-turbo:
1. Language models typically make token-level, left-to-right decisions during inference, limiting their ability to perform tasks that require exploration and strategic lookahead.
2. The Tree of Thoughts (ToT) framework enables exploration over coherent units of text, or thoughts, that serve as intermediate steps toward problem solving.
3. ToT allows language models to perform deliberate decision-making, consider multiple reasoning paths, and self-evaluate choices to decide the next course of action.
4. ToT combines the language-based capability to generate and evaluate diverse thoughts with search algorithms, such as breadth-first or depth-first search.
5. ToT significantly enhances language models' problem-solving abilities in tasks requiring non-trivial planning or search, such as Game of 24, Creative Writing, and Mini Crosswords.
6. ToT's implementation of search heuristics via LM self-evaluation and deliberation is novel and more general and flexible than existing methods, such as Chain of Thought (CoT) prompting or Self-consistency with CoT (CoT-SC).
7. The Tree of Thoughts framework enables deliberate problem-solving using language models.
8. Human problem-solving involves searching through a tree-like problem space, with heuristics guiding the search process.
9. Existing LM approaches have shortcomings in exploring different branches of the problem space and incorporating planning and search.
10. ToT addresses these shortcomings by framing problems as a search over a tree, evaluating states with LM-generated heuristics, and using search algorithms.
11. ToT is adaptable, modular, and convenient, requiring only a pre-trained LM.
12. ToT shows strong empirical performance on hard tasks, outperforming standard IO and CoT prompting.
13. The Tree of Thoughts (ToT) framework is a search-based approach that addresses shortcomings in language model (LM) approaches.
14. ToT frames problems as a search over a tree, using LM-generated heuristics to evaluate states and search algorithms to traverse the tree.
15. ToT is adaptable, modular, and convenient, and demonstrates strong empirical performance on hard tasks.
16. The ToT framework was tested on a mathematical reasoning challenge (Game of 24) and a creative writing task, showing superior performance compared to other approaches.
17. ToT outperforms input-output and chain-of-thought prompts in both tasks, and iterative refinement improves coherency scores on the creative writing task.
18. ToT offers a promising new way to use language models to solve problems requiring search or planning.
19. ToT framework is effective for solving tasks requiring search or planning.
20. ToT outperforms traditional prompts and iterative refinement improves performance.
21. ToT performs well on hard tasks in natural language processing.
22. ToT is explored on a harder search problem involving natural language in mini crosswords.
23. ToT significantly improves all metrics in mini crosswords, achieving a word-level success rate of 60%.
24. ToT's proposed Tree-of-Thought approach extends existing planning formulations.
25. ToT's Tree-of-Thought approach considers multiple feasible plans simultaneously and uses the LM for decision making.
26. ToT's self-reflection mechanism, program-guided LLM generation, and versatility make it promising for natural language processing problems.
27. ToT is a modern rendition of classical search methods and can be considered as a heuristic search algorithm.
28. ToT may not be necessary for existing tasks that GPT-4 excels at, but it could present new opportunities for real-world decision making applications.
29. ToT allows users to customize performance-cost tradeoffs, and ongoing open-source efforts should reduce costs in the future.
30. ToT improves the interpretability of model decisions and the opportunity for human alignment.
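Points 4 and 6 above can be made concrete with a small sketch of how an LM's self-evaluation might serve as a search heuristic. This is my own minimal illustration, not code from the paper; `call_llm` is a hypothetical placeholder for an actual API call:

```python
import re

def value_heuristic(state, call_llm):
    """Ask the LM itself to rate how promising a partial solution is;
    the number parsed from the reply serves as the search heuristic."""
    prompt = ("Rate from 1 to 10 how likely it is that this partial "
              "solution leads to a correct answer:\n" + state + "\nRating:")
    reply = call_llm(prompt)
    # Take the first number in the reply; default to 0 if none is found.
    match = re.search(r"\d+", reply)
    return int(match.group()) if match else 0
```

The key design point is that no hand-written evaluation function is needed: the same model that generates thoughts also scores them.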
Overall summary (generated via the GPT-4 API):
Title: Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Current language models (LMs) mainly make token-by-token, left-to-right decisions during inference, which limits their performance on tasks that require exploration and strategic lookahead. To address this challenge, the paper "Tree of Thoughts: Deliberate Problem Solving with Large Language Models" proposes a framework called Tree of Thoughts (ToT). By letting the model deliberate over and self-evaluate multiple reasoning paths, the framework improves the problem-solving ability of large language models.
The ToT framework frames a problem as a search over a tree: states are evaluated with heuristics generated by the language model itself, and the tree is traversed with search algorithms such as breadth-first or depth-first search. On tasks that require non-trivial planning or search, such as Game of 24, Creative Writing, and Mini Crosswords, it significantly improves the problem-solving ability of language models.
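The tree search described here can be sketched as a toy beam-style breadth-first search. This is a simplified illustration under my own assumptions, not the paper's implementation; `generate` and `evaluate` stand in for LM-driven thought generation and self-evaluation:

```python
def tot_bfs(root, generate, evaluate, beam_width=3, depth=2):
    """Toy breadth-first Tree-of-Thoughts search: at each level, expand
    every frontier state into new 'thoughts', score them with the
    evaluator, and keep only the best beam_width candidates."""
    frontier = [root]
    for _ in range(depth):
        candidates = [t for s in frontier for t in generate(s)]
        candidates.sort(key=evaluate, reverse=True)
        frontier = candidates[:beam_width]
    return max(frontier, key=evaluate)

# Toy usage: grow a string, preferring as many "b" characters as possible.
gen = lambda s: [s + "a", s + "b"]
score = lambda s: s.count("b")
print(tot_bfs("", gen, score, beam_width=2, depth=3))  # -> "bbb"
```

In the actual framework both callbacks would be prompts to the LM, and the beam width and depth become the user-tunable performance-cost knobs mentioned later.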
Another distinguishing feature of ToT is its novel implementation of search heuristics: by relying on the language model's own self-evaluation and deliberation, it is more general and flexible than existing methods such as Chain of Thought (CoT) prompting or Self-consistency with CoT (CoT-SC).
ToT demonstrates strong empirical performance, outperforming standard input-output and CoT prompting. On the Game of 24 mathematical reasoning challenge and the creative writing task, ToT outperforms other approaches, and iterative refinement further improves coherence scores on the creative writing task.
On a harder search problem involving natural language, Mini Crosswords, ToT significantly improves all metrics, reaching a word-level success rate of 60%. The Tree-of-Thoughts approach extends existing planning formulations by considering multiple feasible plans simultaneously and using the language model for decision making.
As a modern rendition of classical search methods, ToT can be viewed as a heuristic search algorithm. While ToT may not be necessary for existing tasks at which GPT-4 already excels, it may open up new opportunities for real-world decision-making applications.
In addition, ToT lets users customize the performance-cost tradeoff, and ongoing open-source efforts are expected to lower costs in the future. Finally, ToT improves the interpretability of model decisions and the opportunity for human alignment.
Overall, the Tree of Thoughts framework offers a promising new way to use language models to solve problems that require search or planning. With its self-reflection mechanism, program-guided LLM generation, and broad applicability, ToT holds great potential for natural language processing problems.
Approach:
As described in my previous column: https://www.bilibili.com/read/cv24103893
Reference code (with modifications):
https://github.com/cr941131/summarize-long-text-with-GPT-3.5-turbo
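The RMT-inspired procedure used here can be sketched as a chunk-and-carry loop: split the paper into pieces that fit the context window and carry a running summary ("memory") from one chunk to the next. This is a minimal sketch under my own assumptions, not the actual code from the repo above; `call_llm` is a hypothetical placeholder for a GPT-3.5-turbo API call:

```python
def chunk_text(text, chunk_size=3000):
    """Naive fixed-size character chunks; a real version would split on
    token counts and sentence boundaries instead."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def summarize_long_text(text, call_llm, chunk_size=3000):
    """RMT-style recurrent summarization: the running summary acts as a
    'memory' carried from one chunk to the next."""
    memory = ""
    for chunk in chunk_text(text, chunk_size):
        prompt = (
            "Summary of the text so far:\n" + memory
            + "\n\nNext part of the text:\n" + chunk
            + "\n\nRewrite the summary so it covers everything read so far."
        )
        memory = call_llm(prompt)
    return memory
```

The bullet-point and overall summaries above were produced along these lines: the per-chunk outputs yield the bullet points, and a final pass over the accumulated summary produces the overall abstract.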