FAMO: A fast optimization method for multi-task learning (MTL) that mitigates conflicting gradients using O(1) space and time

Multitask learning (MTL) trains a single model to perform multiple tasks simultaneously, leveraging shared information to improve performance. While MTL is beneficial, it also presents challenges in managing large models and optimizing across tasks. Optimizing the average loss over all tasks can lead to suboptimal performance when tasks progress unevenly, leaving some of them under-optimized. Balancing progress across tasks is therefore crucial for effective MTL.

Existing solutions to the under-optimization problem in multitask learning include gradient manipulation techniques. These methods compute a new update vector to use in place of the gradient of the average loss, so that all task losses decrease more evenly. Although these approaches improve performance, they become computationally expensive when the number of tasks or the model size is large. This is because all task gradients must be computed and stored at each iteration, which adds significant space and time complexity. In contrast, computing the gradient of the average loss is more efficient and requires far less computation per iteration.
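The cost difference can be illustrated with a small sketch. The toy tasks and the helper names (`task_losses`, `num_grad`, `per_task_then_average`, `average_then_differentiate`) are hypothetical and chosen only for illustration, and the finite-difference routine stands in for a backward pass. The plain average needs one gradient evaluation, whereas any method that inspects individual task gradients needs one evaluation and one stored vector per task.

```python
def task_losses(theta):
    # Three hypothetical toy tasks that share the two parameters (a, b).
    a, b = theta
    return [
        (a - 1.0) ** 2 + b ** 2,
        (a + 1.0) ** 2 + 3.0 * (b - 2.0) ** 2,
        0.5 * (a * b - 1.0) ** 2,
    ]


def num_grad(f, theta, eps=1e-6):
    # Central finite differences; stands in for one backward pass.
    grad = []
    for j in range(len(theta)):
        up = list(theta)
        dn = list(theta)
        up[j] += eps
        dn[j] -= eps
        grad.append((f(up) - f(dn)) / (2.0 * eps))
    return grad


def per_task_then_average(theta):
    # k gradient evaluations and k stored gradient vectors (k = number of tasks).
    k = len(task_losses(theta))
    grads = [num_grad(lambda t, i=i: task_losses(t)[i], theta) for i in range(k)]
    return [sum(g[j] for g in grads) / k for j in range(len(theta))]


def average_then_differentiate(theta):
    # One gradient evaluation of the averaged scalar loss.
    def mean_loss(t):
        values = task_losses(t)
        return sum(values) / len(values)

    return num_grad(mean_loss, theta)
```

Both functions return the same vector because differentiation is linear. Gradient manipulation methods cannot use the cheaper route, since they combine the individual task gradients in a nonlinear way.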

To overcome these limitations, a research team from the University of Texas at Austin, Salesforce AI Research and Sony AI recently published a new paper. In their work, they introduced Fast Adaptive Multitask Optimization (FAMO), a method intended to address the problem of under-optimization in multitask learning without the computational overhead associated with existing gradient manipulation techniques.

FAMO dynamically adjusts task weights to ensure balanced loss reduction across all tasks, leveraging loss history rather than computing all task gradients. Key contributions include introducing FAMO, an MTL optimizer with O(1) space and time complexity per iteration, and demonstrating its comparable or superior performance over existing methods in various MTL benchmarks with significant improvements in computational efficiency.

The proposed approach includes two main ideas: achieving balanced loss reduction across tasks and amortizing the computation over time.

  1. Balanced loss rate improvement:
    • FAMO aims to reduce all task losses as evenly as possible. It defines the improvement rate for each task based on the change in loss over time.
    • By formulating an optimization problem, FAMO seeks an update direction that maximizes the worst-case improvement rate across all tasks.
  2. Fast approximation through amortization over time:
    • Instead of solving the optimization problem at each step, FAMO performs a single step of gradient descent on a parameter representing the task weights, thereby amortizing the computation over the optimization trajectory.
    • This is achieved by updating the task weights based on the change in log losses between consecutive iterations, which approximates the required gradient.
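
The two ideas above can be written compactly. The following is a sketch in our own notation, based on our reading of the FAMO paper rather than a verbatim reproduction of it. The symbols $\ell_i$ (loss of task $i$), $\theta_t$ (model parameters), $d_t$ (update direction), $\alpha$ (parameter step size), $z_t$ (task weights) and $\alpha_z$ (weight step size) are assumptions introduced here for illustration.

```latex
% Improvement rate of task i after the update \theta_{t+1} = \theta_t - \alpha d_t
r_i(\alpha, d_t) = \frac{1}{\alpha}\,
  \frac{\ell_i(\theta_t) - \ell_i(\theta_{t+1})}{\ell_i(\theta_t)}

% Balanced improvement: maximize the worst-case rate, with a norm penalty on d_t
\max_{d_t}\; \min_{i \in \{1,\dots,k\}} r_i(\alpha, d_t) \;-\; \tfrac{1}{2}\lVert d_t \rVert^2

% For small \alpha the rate is approximately a directional derivative of the log loss
r_i(\alpha, d_t) \approx \nabla \log \ell_i(\theta_t)^{\top} d_t

% Amortization: one gradient step on the task weights per iteration,
% with the gradient estimated from the observed change in log losses
z_{t+1} = z_t - \alpha_z\,\tilde{\delta}_t, \qquad
\tilde{\delta}_t \approx \frac{1}{\alpha}
\begin{bmatrix}
  \log \ell_1(\theta_t) - \log \ell_1(\theta_{t+1}) \\
  \vdots \\
  \log \ell_k(\theta_t) - \log \ell_k(\theta_{t+1})
\end{bmatrix}
```

Because the estimate only needs the losses before and after the parameter update, no per-task gradient has to be computed or stored.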

In practice, FAMO reparameterizes the task weights to ensure they stay within a valid range and introduces regularization to focus more on recent updates. The algorithm iteratively updates the task weights and parameters based on the observed losses to find a balance between task performance and computational efficiency.
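
A minimal sketch of one such iteration on a toy two-task problem, in plain Python, might look as follows. It reflects our reading of the method and is not the authors' implementation. The toy losses, the function names (`softmax`, `losses`, `task_grads`, `famo_step`, `run`) and the values of the step sizes `alpha`, `beta` and the regularization coefficient `gamma` are all assumptions made for illustration. The task weights are kept valid by passing unconstrained logits `xi` through a softmax, and the `gamma * xi` term decays the logits so that recent updates dominate.

```python
import math


def softmax(xi):
    # Maps unconstrained logits to positive weights that sum to 1.
    m = max(xi)
    exps = [math.exp(x - m) for x in xi]
    total = sum(exps)
    return [e / total for e in exps]


def losses(theta):
    # Two hypothetical toy tasks; the +0.1 keeps losses positive so log is defined.
    a, b = theta
    l1 = (a - 1.0) ** 2 + 0.5 * b ** 2 + 0.1
    l2 = 10.0 * ((a + 1.0) ** 2 + 0.5 * (b - 1.0) ** 2) + 0.1
    return [l1, l2]


def task_grads(theta):
    # Analytic gradients of the toy losses. In a deep learning framework one
    # would instead backpropagate once through c * sum_i z_i * log(loss_i).
    a, b = theta
    return [[2.0 * (a - 1.0), b], [20.0 * (a + 1.0), 10.0 * (b - 1.0)]]


def famo_step(theta, xi, alpha=0.01, beta=0.05, gamma=0.001):
    z = softmax(xi)
    old = losses(theta)
    # Weighted log-loss gradient: grad(log l_i) = grad(l_i) / l_i.
    # c rescales the weights z_i / l_i so that they sum to 1 (our reading).
    c = 1.0 / sum(zi / li for zi, li in zip(z, old))
    w = [c * zi / li for zi, li in zip(z, old)]
    grads = task_grads(theta)
    d = [sum(w[i] * grads[i][j] for i in range(len(w))) for j in range(len(theta))]
    new_theta = [t - alpha * dj for t, dj in zip(theta, d)]
    new = losses(new_theta)
    # Observed change in log losses: large for tasks that improved a lot.
    diff = [math.log(lo) - math.log(ln) for lo, ln in zip(old, new)]
    # Softmax Jacobian applied to diff: delta_j = z_j * (diff_j - sum_i z_i diff_i).
    avg = sum(zi * di for zi, di in zip(z, diff))
    delta = [zj * (dj - avg) for zj, dj in zip(z, diff)]
    # One gradient step on the logits, with decay gamma as regularization.
    new_xi = [x - beta * (dl + gamma * x) for x, dl in zip(xi, delta)]
    return new_theta, new_xi, new


def run(steps=200):
    theta, xi = [0.0, 0.0], [0.0, 0.0]
    current = losses(theta)
    for _ in range(steps):
        theta, xi, current = famo_step(theta, xi)
    return theta, softmax(xi), current
```

In this sketch a task that improved less than the weighted average in the last step receives a larger weight in the next one. The weight update uses only loss values, so the extra cost per iteration does not grow with the size of the model.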

Overall, FAMO provides a computationally efficient approach to multitask optimization by dynamically adjusting task weights and amortizing computation over time. This results in improved performance without the need to compute every task gradient at each iteration.

To evaluate FAMO, the authors conducted empirical experiments in a range of settings. They started with a toy two-task problem and demonstrated FAMO’s ability to efficiently mitigate conflicting gradients (CG). On supervised and reinforcement learning MTL benchmarks, FAMO performed consistently well against state-of-the-art methods. Compared with methods such as Nash-MTL, it was significantly more efficient, particularly in training time. In addition, an ablation study on the regularization coefficient γ showed that FAMO is robust to its value in most settings, with exceptions such as CityScapes, where tuning γ could stabilize performance. Overall, the evaluation highlighted FAMO’s effectiveness and efficiency across multitask learning scenarios.

In summary, FAMO is a promising solution to the challenges of MTL that dynamically adjusts task weights and amortizes the computation over time. The method mitigates under-optimization without the computational overhead of existing gradient manipulation techniques. In the authors' experiments, FAMO showed consistent performance across various MTL scenarios, supporting its effectiveness and efficiency. With its balanced approach to loss reduction and its efficient optimization strategy, FAMO is a valuable contribution to multitask learning and paves the way for more scalable and effective machine learning models.

Mahmoud is a PhD student in machine learning. He also holds a bachelor’s degree in physics and a master’s degree in telecommunications and network systems. His current research areas include computer vision, stock market prediction and deep learning. He has written several scientific articles on person re-identification and on the robustness and stability of deep networks.
