TuringBench is a benchmark environment that contains:
We've built a few resources to help you get started with the dataset. These datasets will be hosted on Hugging Face's dataset hub: data repo. To preserve the integrity of the results, we ask contributors to send their code and/or model weights to turingbench@gmail.com so that we can run the model on the test set ourselves. Because TuringBench is an ongoing effort, we expect the dataset to grow. To keep up to date with major changes to the dataset.
Send questions to turingbench@gmail.com.
The TuringBench datasets will assist researchers in building robust machine learning and deep learning models that can effectively distinguish machine-generated text from human-written text. This leaderboard is for the Turing Test scenario.
DETECTOR | F1 score |
---|---|
GROVER detector (Zellers et al. '19) | 0.5746 |
GPT-2 detector (OpenAI '19) | 0.5293 |
GLTR (Gehrmann et al. '19) | 0.3476 |
BERT (Devlin et al. '19) | 0.7944 |
RoBERTa (Liu et al. '19) | 0.5209 |
AVERAGE | 0.5534 |
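
For readers who want to sanity-check a detector locally before submitting, the following is a minimal sketch of how an F1 score can be computed for a human-versus-machine task, and of how the AVERAGE row relates to the five scores above. It rests on assumptions that this page does not state: that F1 is the binary F1 with `machine` as the positive class (the official evaluation may use a different averaging), and that AVERAGE is the unweighted mean of the listed scores. The function name `f1_score`, the label strings, and the toy labels are hypothetical and are not TuringBench data.

```python
from statistics import mean


def f1_score(y_true, y_pred, positive="machine"):
    """Binary F1 for one positive label (assumed here to be 'machine')."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels invented for illustration only (not drawn from TuringBench).
y_true = ["human", "machine", "machine", "human", "machine"]
y_pred = ["human", "machine", "human", "machine", "machine"]
toy_f1 = f1_score(y_true, y_pred)

# F1 scores copied from the leaderboard table above.
leaderboard = {
    "GROVER detector": 0.5746,
    "GPT-2 detector": 0.5293,
    "GLTR": 0.3476,
    "BERT": 0.7944,
    "RoBERTa": 0.5209,
}

# Unweighted mean of the five scores, rounded to four decimals like the table.
average_f1 = round(mean(leaderboard.values()), 4)

print(f"toy F1: {toy_f1:.4f}")
print(f"leaderboard average: {average_f1}")
```

Under the unweighted-mean assumption, the computed average agrees with the AVERAGE row of the table. Scores reported on the leaderboard itself come from the maintainers' run on the held-out test set, so a local number from a sketch like this is only a rough guide.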