SpeechPrompt

Introduction

Self-supervised learning (SSL) has revolutionized the fields of computer vision (CV), natural language processing (NLP), and speech processing. A model pre-trained on a large amount of unlabeled data in a self-supervised manner learns universal representations that benefit downstream tasks.

However, to use these SSL models for downstream tasks, we usually follow the "pre-train, fine-tune" paradigm. That is, for each task we need to (1) design a downstream model, (2) fine-tune the model, and (3) store the parameters of the model. Repeating these steps for every task incurs substantial computation and storage costs.

On the other hand, the "prompting" paradigm has been widely used in the NLP field. By leveraging the knowledge of a pre-trained language model (LM), prompt tuning optimizes only a limited number of parameters for each downstream task. Prompt tuning can therefore serve a large number of downstream tasks in a unified manner, with both computation and storage efficiency.
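The storage argument above can be illustrated with a little arithmetic. The sketch below is only a back-of-the-envelope estimate under stated assumptions: fine-tuning stores one full copy of the model per task, while prompt tuning stores one shared frozen model plus a small set of prompt vectors per task. The function name `estimate_storage` and every number in the usage example (model size, prompt length, embedding dimension, task count) are hypothetical and are not taken from the SpeechPrompt work; task-specific downstream heads are ignored for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageEstimate:
    """Total number of parameters that must be stored across all tasks."""
    fine_tuning_params: int
    prompt_tuning_params: int


def estimate_storage(backbone_params: int, prompt_length: int,
                     embed_dim: int, num_tasks: int) -> StorageEstimate:
    """Compare stored parameters under the two paradigms (hypothetical helper).

    Assumption: a prompt is `prompt_length` trainable vectors of size
    `embed_dim`; other prompt designs would change the per-task count.
    """
    # Fine-tuning: every task keeps its own full copy of the model.
    fine_tuning = backbone_params * num_tasks

    # Prompt tuning: one frozen backbone shared by all tasks,
    # plus one small block of prompt parameters per task.
    prompt_params_per_task = prompt_length * embed_dim
    prompt_tuning = backbone_params + prompt_params_per_task * num_tasks

    return StorageEstimate(fine_tuning, prompt_tuning)


if __name__ == "__main__":
    # Illustrative values only, chosen to make the comparison concrete.
    est = estimate_storage(backbone_params=100_000_000, prompt_length=100,
                           embed_dim=768, num_tasks=10)
    print("fine-tuning :", est.fine_tuning_params)
    print("prompt tuning:", est.prompt_tuning_params)
```

Under these assumed numbers, the per-task cost of prompt tuning is a small fraction of the backbone size, so total storage grows slowly as tasks are added, whereas fine-tuning grows linearly with the full model size.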

However, the prompting paradigm had not previously been explored in speech processing. Recently, various spoken language models have been developed, which opens the door to applying prompt tuning to speech processing tasks ...

SpeechPrompt

Make Spoken Language Models Versatile!
