News

The paper releasing SEC4SR has been accepted by IEEE Transactions on Dependable and Secure Computing (TDSC), 2022.

About

This is the official webpage for paper SEC4SR: A Security Analysis Platform for Speaker Recognition.

In this paper, we systematically investigate transformation and adversarial training based defenses for speaker recognition systems (SRSs) and thoroughly evaluate their effectiveness using both non-adaptive and adaptive attacks under the same settings.

In summary, we make the following main contributions:

We perform the largest-scale evaluation of defenses against adversarial attacks in the speaker recognition domain, involving 23 defenses and 15 attacks. Our study provides lots of useful insights and findings that could advance research on adversarial examples in this domain and assist the maintainers of SRSs to deploy suitable defense solutions to enhance their systems.
We propose a new type of feature-level transformations dedicated for speaker recognition, called Feature Compression (FeCo). Though it solely cannot defeat white-box adaptive attacks, it is effective against black-box adaptive attacks and definitely enhances the robustness of adversarially trained models against white-box adaptive attacks.
We develop SEC4SR, the first platform for systematic and comprehensive evaluation of different adversarial attacks and defenses in the speaker recognition domain. It features mainstream SRSs, proper voice datasets, white-box and black-box attacks, techniques for mounting adaptive attacks, evaluation metrics and diverse defense solutions. We release our platform to foster further research in this direction.

Empirical Study Result

Transformation against Non-adaptive Attacks

Transformation against Adaptive Attacks

Transformation+Adversarial-Training against Adaptive Attacks

Additional Results (not appear in the paper)

Appendix C. Tuning the Parameters of Transformations

Audio Files

We provides our audio files for percetibility measurement and other purposes.

Original audios
Adversarial audios produced by non-adaptive attacks
Adversarial audios produced by adaptive attacks against different defenses:
- Time-domain defenses: [QT] [AT] [AS] [MS]
- Frequency-domain defenses: [DS] [LPF] [BPF]
- Speech compression defenses: [OPUS] [SPEEX] [AMR] [AAC-V] [AAC-C] [MP3-V] [MP3-C]
- Feature-level defense: [FeCo (EOT)]

For adversarial audios, after unzip, the directory A-B/X/X-Y/Z means the audios are crafted by the attack X with the attack paramter Y against the defense A with the defense parameter B on the speaker Z. For example, FeCo-ok-kmeans-raw-0_2-L2/FGSM/FGSM-0_002-50/1998 means the audios are crafted by FGSM attack with $\varepsilon=0.002$ and EOT_size $r=50$ against the defense FeCo with the defense parameter $cl_m=kmeans$ and $cl_r=0.2$ on the speaker 1998.

Platform: SEC4SR

To perform the above empirical study, we establish a SECurity evaluation platform FOR Speaker Recognition (SEC4SR).

Want to re-produce our experimental results, do something new with our platform, or even extend SEC4SR? Go to Code for SEC4SR for detailed instructions.