- [June 2023] SpeechGen paper is available [link]
- [May 2023] SpeechPrompt v2 code released [link]
- [March 2023] SpeechPrompt v2 paper is available [link]
- [Oct. 2022] The SpeechPrompt project website is launched
- [Oct. 2022] "SpeechPrompt" is a research topic in the JSALT workshop [website]
- [June 2022] SpeechPrompt v1 is accepted at INTERSPEECH 2022 [paper] [code]
- [March 2022] SpeechPrompt v1 is available on arXiv [link]
Self-supervised learning (SSL) has revolutionized the fields of
computer vision (CV), natural language processing (NLP), and speech processing.
By pre-training a model on a large amount of unlabeled data in a self-supervised manner,
the model can learn universal representations that benefit downstream tasks.
However, to utilize these SSL models for downstream tasks, we usually follow the "pre-train, fine-tune" paradigm. That is, we need to (1) design a downstream model, (2) fine-tune the model, and (3) store the model's parameters. This incurs substantial computation and storage costs.
On the other hand, the "prompting paradigm" has been widely adopted in NLP. By leveraging the knowledge of a pre-trained language model (LM), prompt tuning optimizes only a small number of parameters for each downstream task. As a result, prompt tuning can serve many downstream tasks in a unified manner with high computation and storage efficiency.
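The storage savings of prompt tuning over per-task fine-tuning can be made concrete with a back-of-the-envelope calculation. The sketch below uses made-up round numbers (model size, prompt length, embedding dimension, and task count are all illustrative assumptions, not figures from SpeechPrompt):

```python
# Illustrative comparison of the "pre-train, fine-tune" paradigm vs. prompt tuning.
# All sizes below are hypothetical round numbers chosen for illustration.

LM_PARAMS = 100_000_000   # size of the frozen pre-trained LM (assumed)
PROMPT_LEN = 20           # number of trainable prompt vectors per task (assumed)
EMBED_DIM = 768           # embedding dimension of the LM (assumed)
NUM_TASKS = 10            # number of downstream tasks to serve (assumed)

# Fine-tuning: each task stores its own full copy of the model.
finetune_storage = NUM_TASKS * LM_PARAMS

# Prompt tuning: one shared frozen LM plus a tiny prompt per task.
prompt_params_per_task = PROMPT_LEN * EMBED_DIM
prompt_storage = LM_PARAMS + NUM_TASKS * prompt_params_per_task

print(f"fine-tuning stores   {finetune_storage:,} parameters")
print(f"prompt tuning stores {prompt_storage:,} parameters")
print(f"trainable per task:  {prompt_params_per_task:,} vs. {LM_PARAMS:,}")
```

Under these assumptions, serving ten tasks by fine-tuning stores a billion parameters, while prompt tuning stores the single frozen model plus roughly fifteen thousand trainable parameters per task.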
However, the prompting paradigm had never been explored in speech processing before. Recently, various spoken language models have been developed, which opens the door to applying prompt tuning to speech processing tasks.
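For a spoken LM that operates on discrete speech units, prompt tuning can be pictured as prepending trainable prompt vectors to the embedded unit sequence while the LM itself stays frozen. The sketch below is a toy illustration of that idea, not the SpeechPrompt implementation; all shapes, names, and the unit sequence are hypothetical:

```python
import random

EMBED_DIM = 8    # toy embedding dimension (assumed)
PROMPT_LEN = 3   # number of trainable prompt vectors (assumed)
VOCAB = 50       # size of the discrete speech-unit vocabulary (assumed)

random.seed(0)

# Frozen component: the unit-embedding table of the pre-trained spoken LM.
embedding_table = [
    [random.random() for _ in range(EMBED_DIM)] for _ in range(VOCAB)
]

# Trainable component: the prompt vectors, the only parameters
# that prompt tuning would update during training.
prompt = [[0.0] * EMBED_DIM for _ in range(PROMPT_LEN)]

def build_input(units):
    """Prepend the prompt vectors to the embedded discrete speech units."""
    unit_embeddings = [embedding_table[u] for u in units]
    return prompt + unit_embeddings  # this sequence is fed to the frozen LM

units = [4, 17, 42, 8]  # toy discrete units, e.g. from an SSL quantizer
seq = build_input(units)
print(len(seq))  # PROMPT_LEN + len(units) = 7
```

During training, gradients flow only into `prompt`; the embedding table and the LM weights remain untouched, which is what keeps the per-task parameter count small.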