ByteDance AI Research Unveils Reinforced Fine-Tuning (ReFT) Method to Enhance the Generalizability of Learning LLMs for Reasoning with Math Problem Solving as an Example
One effective method to improve the reasoning skills of LLMs is to employ supervised fine-tuning (SFT) with chain-of-thought (CoT) annotations. However, this approach has limitations in terms of generalization because it heavily depends on the provided CoT data. In scenarios like math problem-solving, each question in the training data typically has only one annotated reasoning…