TuringBench is a benchmark environment that contains datasets for two scenarios: Authorship Attribution (AA) and the Turing Test (TT). You can load them as follows:
```python
from datasets import load_dataset
import pandas as pd

# Authorship Attribution (AA) task
train = load_dataset('turingbench/TuringBench', name='AA', split='train')
train = pd.DataFrame.from_dict(train)
test = load_dataset('turingbench/TuringBench', name='AA', split='test')
test = pd.DataFrame.from_dict(test)
valid = load_dataset('turingbench/TuringBench', name='AA', split='validation')
valid = pd.DataFrame.from_dict(valid)

# Turing Test (TT) task for GPT-1
TT_gpt1 = load_dataset('turingbench/TuringBench', name='TT_gpt1', split='train')
TT_gpt1 = pd.DataFrame.from_dict(TT_gpt1)
```
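Once a split is in a DataFrame, a quick class-balance check is a useful first step before training. The snippet below is a minimal sketch on a toy stand-in for one split; the column names (`Generation`, `label`) are an assumption, so inspect `train.columns` on the real data first.

```python
import pandas as pd

# Toy stand-in for one split; the real column names are an
# assumption -- check train.columns after loading the dataset.
train = pd.DataFrame({
    "Generation": ["text a", "text b", "text c", "text d"],
    "label": ["human", "gpt1", "human", "gpt2"],
})

# Per-class counts help spot label imbalance before training a detector.
counts = train["label"].value_counts()
print(counts.to_dict())
```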
We've built a few resources to help you get started with the dataset. The datasets are hosted on Hugging Face's dataset hub. To preserve the integrity of the results, we ask contributors to submit their code and/or model weights to turingbench@gmail.com so we can run the model on the test set ourselves. Because TuringBench is an ongoing effort, we expect the dataset to grow, so check back to keep up to date with major changes.

Questions? Email us at turingbench@gmail.com.
The TuringBench datasets will assist researchers in building robust machine learning and deep learning models that can effectively distinguish machine-generated text from human-written text. This leaderboard is for the Authorship Attribution (AA) scenario.
Rank | Date | Model | Precision | Recall | F1 | Accuracy |
---|---|---|---|---|---|---|
1 | May 5, 2021 | RoBERTa (Liu et al., '19) | 0.8214 | 0.8126 | 0.8107 | 0.8173 |
2 | May 5, 2021 | BERT (Devlin et al., '18) | 0.8031 | 0.8021 | 0.7996 | 0.8078 |
3 | May 5, 2021 | BertAA (Fabien et al., '20) | 0.7796 | 0.7750 | 0.7758 | 0.7812 |
4 | May 5, 2021 | OpenAI detector | 0.7810 | 0.7812 | 0.7741 | 0.7873 |
5 | May 5, 2021 | SVM (3-grams) (Sapkota et al., '15) | 0.7124 | 0.7223 | 0.7149 | 0.7299 |
6 | May 5, 2021 | N-gram CNN (Shrestha et al., '17) | 0.6909 | 0.6832 | 0.6665 | 0.6914 |
7 | May 5, 2021 | N-gram LSTM-LSTM (Jafariakinabad, '19) | 0.6694 | 0.6824 | 0.6646 | 0.6898 |
8 | May 5, 2021 | Syntax-CNN (Zhang et al., '18) | 0.6520 | 0.6544 | 0.6480 | 0.6613 |
9 | May 5, 2021 | Random Forest | 0.5893 | 0.6053 | 0.5847 | 0.6147 |
10 | May 5, 2021 | WriteprintsRFC (Mahmood et al., '19) | 0.4578 | 0.4851 | 0.4651 | 0.4943 |
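The Precision, Recall, and F1 columns above summarize per-class performance across the candidate authors. As a minimal sketch of how such scores can be computed, the snippet below implements accuracy and macro-averaged precision/recall/F1 in plain Python on toy labels; macro-averaging is an assumption about how the leaderboard aggregates across classes.

```python
# Macro-averaged metrics, sketched by hand on toy labels.
# Macro-averaging over author classes is an assumption here.

def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_prf(y_true, y_pred):
    # Compute precision/recall/F1 per class, then average unweighted.
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Toy example: three author classes, four test texts.
y_true = ["human", "gpt1", "gpt2", "human"]
y_pred = ["human", "gpt1", "human", "human"]
p, r, f = macro_prf(y_true, y_pred)
print(accuracy(y_true, y_pred), round(p, 4), round(r, 4), round(f, 4))
```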
The TuringBench datasets will assist researchers in building robust machine learning and deep learning models that can effectively distinguish machine-generated text from human-written text. This leaderboard is for the Turing Test (TT) scenario.