Google DeepMind Introduces AlphaGeometry: An Olympiad-Level Artificial Intelligence System for Geometry

In a recent study, a team of researchers from Google DeepMind has introduced AlphaGeometry, an Artificial Intelligence (AI) system that can solve Olympiad geometry problems nearly as well as a human gold medallist. Proving mathematical theorems at the Olympiad level is a notable milestone for automated reasoning, especially in the challenging domain of pre-university mathematics…

This AI Paper from Johns Hopkins and Microsoft Revolutionizes Machine Translation with ALMA-R: A Smaller-Sized LLM Outperforming GPT-4

Machine translation, a crucial aspect of Natural Language Processing, has advanced significantly. Yet, a primary challenge persists: producing translations that go beyond mere adequacy to approach near perfection. Traditional methods, while effective, are often constrained by their reliance on large datasets and supervised fine-tuning (SFT), which limits the quality of the output. Recent…

UCLA Researchers Introduce Group Preference Optimization (GPO): A Machine Learning-based Alignment Framework that Steers Language Models to Preferences of Individual Groups in a Few-Shot Manner

Large Language Models (LLMs) are increasingly employed across various domains, with use cases including creative writing, chatbots, and semantic search. Many of these applications are inherently subjective and require generations that cater to different demographics, cultural and societal norms, or individual preferences. Through their large-scale training, current language models are exposed to diverse data that allows…

ByteDance AI Research Unveils Reinforced Fine-Tuning (ReFT) Method to Enhance the Generalizability of Learning LLMs for Reasoning with Math Problem Solving as an Example

One effective method to improve the reasoning skills of LLMs is to employ supervised fine-tuning (SFT) with chain-of-thought (CoT) annotations. However, this approach has limitations in terms of generalization because it heavily depends on the provided CoT data. In scenarios like math problem-solving, each question in the training data typically has only one annotated reasoning…
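To make the setup above concrete, the snippet below sketches what a CoT-annotated SFT example might look like, with a single annotated reasoning path per question; the field names and formatting are illustrative assumptions, not taken from the ReFT paper.

```python
# Illustrative sketch of supervised fine-tuning (SFT) data with chain-of-thought
# (CoT) annotations. Field names and formatting are hypothetical placeholders.

# Each training question carries exactly one annotated reasoning path,
# which is the generalization limitation the ReFT work targets.
cot_sft_data = [
    {
        "question": "Natalia sold 48 clips in April and half as many in May. "
                    "How many clips did she sell in total?",
        "cot": "She sold 48 / 2 = 24 clips in May. 48 + 24 = 72.",
        "answer": "72",
    },
]

def build_sft_example(sample: dict) -> dict:
    """Concatenate the question and its single annotated CoT into one pair.

    In standard SFT the next-token loss is computed on the target portion,
    so the model learns to reproduce this one reasoning path per question.
    """
    prompt = f"Question: {sample['question']}\nAnswer:"
    target = f" {sample['cot']} The answer is {sample['answer']}."
    return {"prompt": prompt, "target": target}

if __name__ == "__main__":
    print(build_sft_example(cot_sft_data[0]))
```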

Researchers from the University of Washington and Allen Institute for AI Present Proxy-Tuning: An Efficient Alternative to Finetuning Large Language Models

The inherent capabilities of pretrained large language models are notable, yet achieving desired behaviors often requires additional adaptation. When dealing with models whose weights are kept private, the challenge intensifies, rendering tuning either excessively costly or outright impossible. As a result, striking the right balance between customization and resource efficiency remains a persistent concern in…
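The teaser is cut off before the method itself, but the general idea of steering a large model at decoding time with a small tuned proxy and its untuned counterpart can be sketched as simple logit arithmetic in the spirit of proxy-tuning; the toy vocabulary, values, and function names below are illustrative only.

```python
# Minimal sketch of decode-time logit steering in the spirit of proxy-tuning:
# a large base model's next-token logits are shifted by the difference between
# a small tuned "expert" and its untuned counterpart, adapting the large model
# without touching its weights. All numbers here are toy values.
import numpy as np

def proxy_tuned_logits(base_logits, small_tuned_logits, small_untuned_logits):
    """Combine logits from three models sharing the same vocabulary."""
    return base_logits + (small_tuned_logits - small_untuned_logits)

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

# Toy 5-token vocabulary: the small tuned model prefers token 2, and that
# preference is transferred onto the large base model's distribution.
base = np.array([2.0, 1.0, 0.5, 0.2, 0.1])
small_tuned = np.array([0.5, 0.3, 2.5, 0.1, 0.0])
small_untuned = np.array([0.5, 0.3, 0.4, 0.1, 0.0])

steered = proxy_tuned_logits(base, small_tuned, small_untuned)
print("base distribution:   ", softmax(base).round(3))
print("steered distribution:", softmax(steered).round(3))
```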

This AI Paper from China Introduces a Groundbreaking Approach to Enhance Information Retrieval with Large Language Models Using the INTERS Dataset

Large Language Models (LLMs) have exhibited remarkable prowess across various natural language processing tasks. However, applying them to Information Retrieval (IR) tasks remains a challenge because many IR-specific concepts rarely appear in natural language. To address this, instruction tuning has emerged as a pivotal method for enhancing LLMs’ capabilities and controllability. While…

Stability AI Releases Stable Code 3B: A 3 Billion Parameter Large Language Model (LLM) that Allows Accurate and Responsive Code Completion

Stability AI has recently released a new state-of-the-art model, Stable Code 3B, designed for code completion in various programming languages, with multiple additional capabilities. The model is a follow-up to Stable Code Alpha 3B and is trained on 1.3 trillion tokens of natural language data and code spanning 18 programming languages. Compared…
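For readers who want to try the model for completion, the snippet below is a minimal sketch using Hugging Face transformers; the model id stabilityai/stable-code-3b, the loading options, and the prompt are assumptions to be checked against the official model card.

```python
# Sketch of code completion with Stable Code 3B via Hugging Face transformers.
# The repo name "stabilityai/stable-code-3b" and loading options are assumed,
# not confirmed by the article; consult the model card for exact usage.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stable-code-3b"  # assumed Hugging Face repo name
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # halves memory on supported hardware
    trust_remote_code=True,
)
model.eval()

prompt = "def fibonacci(n):\n    "
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=48, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```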

EASYTOOL: An Artificial Intelligence Framework Transforming Diverse and Lengthy Tool Documentation into a Unified and Concise Tool Instruction for Easier Tool Usage

Large Language Models (LLMs) have emerged as a transformative force in artificial intelligence, offering remarkable capabilities in processing and generating language-based responses. LLMs are being used in many applications, from automated customer service to generating creative content. However, one critical challenge that surfaces when using LLMs is their ability to utilize external tools to accomplish intricate…

Assessing Natural Language Generation (NLG) in the Age of Large Language Models: A Comprehensive Survey and Taxonomy

The Natural Language Generation (NLG) field stands at the intersection of linguistics and artificial intelligence. It focuses on the creation of human-like text by machines. Recent advancements in Large Language Models (LLMs) have revolutionized NLG, significantly enhancing the ability of systems to generate coherent and contextually relevant text. This evolving field necessitates robust evaluation methodologies…

Fireworks AI Introduces FireAttention: A Custom CUDA Kernel Optimized for Multi-Query Attention Models

Mixture-of-Experts (MoE) is an architecture based on the “divide and conquer” principle for solving complex tasks. Multiple machine learning (ML) models, called experts, each handle the inputs that match their specialization to provide the best results. As a prominent example of the architecture, Mistral AI recently released Mixtral, an open-source, high-quality MoE model that outperformed or…
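The MoE idea described above is straightforward to sketch in code: a learned gate scores each token, the top-scoring experts process it, and their outputs are combined using the gate weights. The layer below is a toy PyTorch illustration of that pattern; the sizes and top-2 routing rule are assumptions for illustration, not Mixtral's or FireAttention's actual implementation.

```python
# Toy Mixture-of-Experts layer with top-2 routing: each token is scored by a
# gate, sent to its two highest-scoring experts, and the expert outputs are
# mixed by the normalized gate weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, d_hidden=128, n_experts=4, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(n_experts)
        ])
        self.gate = nn.Linear(d_model, n_experts)  # router scores per token
        self.top_k = top_k

    def forward(self, x):                           # x: (tokens, d_model)
        scores = self.gate(x)                        # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)         # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e             # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

if __name__ == "__main__":
    layer = TinyMoE()
    tokens = torch.randn(8, 64)
    print(layer(tokens).shape)  # torch.Size([8, 64])
```

Only the selected experts run for each token, which is why MoE models can grow total parameter count without a proportional increase in per-token compute.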