Abstract
In multimodal meta-learning, previous works modulate the meta-learned network to adapt to tasks based on task embeddings extracted by a task encoder. However, these methods overlook the similarity among tasks from the same mode, and therefore cannot exploit it when generating the initialization and loss for the task-learner. In this paper, we propose a new method that leverages task similarity in multimodal meta-learning, providing a better-suited initialization and loss for the task-learner by taking task characteristics into account. In our proposed Task Embedding Adaptation (TEA), a Transformer is introduced to refine the task embeddings by encouraging information aggregation among similar tasks. The enhanced task embeddings can be used to infer the task mode more accurately and to modulate the meta-learner so that it generates a better task-specific initialization. Furthermore, we propose a Modulated Adaptive Loss module that adaptively generates a task-specific loss based on the loss network and the enhanced task embeddings obtained by TEA. In addition, we present an efficient Mixed-Loop Learning strategy that ensures the training efficiency of the task encoder and the loss network, replacing the traditional two-loop learning strategy. Extensive experiments on multimodal few-shot classification demonstrate that our method achieves state-of-the-art performance; it also performs well on regression and reinforcement learning tasks.