cardiffnlp/twitter-roberta-base-sentiment

1年前发布 3 00

Twitter-roBERTa-base for Se...

收录时间：

2025-06-02

打开网站手机查看

cardiffnlp/twitter-roberta-base-sentiment

打开网站

Twitter-roBERTa-base for Sentiment Analysis

This is a roBERTa-base model trained on ~58M tweets and finetuned for sentiment analysis with the TweetEval benchmark. This model is suitable for English (for a similar multilingual model, see XLM-T).

Reference Paper: TweetEval (Findings of EMNLP 2020).
Git Repo: Tweeteval official repository.

Labels:
0 -> Negative;
1 -> Neutral;
2 -> Positive
New! We just released a new sentiment analysis model trained on more recent and a larger quantity of tweets.
See twitter-roberta-base-sentiment-latest and TweetNLP for more details.

Example of classification

from transformers import AutoModelForSequenceClassification from transformers import TFAutoModelForSequenceClassification from transformers import AutoTokenizer import numpy as np from scipy.special import softmax import csv import urllib.request # Preprocess text (username and link placeholders) def preprocess(text): new_text = [] for t in text.split(" "): t = '@user' if t.startswith('@') and len(t) > 1 else t t = 'http' if t.startswith('http') else t new_text.append(t) return " ".join(new_text) # Tasks: # emoji, emotion, hate, irony, offensive, sentiment # stance/abortion, stance/atheism, stance/climate, stance/feminist, stance/hillary task='sentiment' MODEL = f"cardiffnlp/twitter-roberta-base-{task}" tokenizer = AutoTokenizer.from_pretrained(MODEL) # download label mapping labels=[] mapping_link = f"https://raw.githubusercontent.com/cardiffnlp/tweeteval/main/datasets/{task}/mapping.txt" with urllib.request.urlopen(mapping_link) as f: html = f.read().decode('utf-8').split("\n") csvreader = csv.reader(html, delimiter='\t') labels = [row[1] for row in csvreader if len(row) > 1] # PT model = AutoModelForSequenceClassification.from_pretrained(MODEL) model.save_pretrained(MODEL) text = "Good night " text = preprocess(text) encoded_input = tokenizer(text, return_tensors='pt') output = model(**encoded_input) scores = output[0][0].detach().numpy() scores = softmax(scores) # # TF # model = TFAutoModelForSequenceClassification.from_pretrained(MODEL) # model.save_pretrained(MODEL) # text = "Good night " # encoded_input = tokenizer(text, return_tensors='tf') # output = model(encoded_input) # scores = output[0][0].numpy() # scores = softmax(scores) ranking = np.argsort(scores) ranking = ranking[::-1] for i in range(scores.shape[0]): l = labels[ranking[i]] s = scores[ranking[i]] print(f"{i+1}) {l} {np.round(float(s), 4)}")

Output:
1) positive 0.8466 2) neutral 0.1458 3) negative 0.0076

BibTeX entry and citation info

Please cite the reference paper if you use this model.
@inproceedings{barbieri-etal-2020-tweeteval, title = "{T}weet{E}val: Unified Benchmark and Comparative Evaluation for Tweet Classification", author = "Barbieri, Francesco and Camacho-Collados, Jose and Espinosa Anke, Luis and Neves, Leonardo", booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020", month = nov, year = "2020", address = "Online", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2020.findings-emnlp.148", doi = "10.18653/v1/2020.findings-emnlp.148", pages = "1644--1650" }

数据统计

暂无评论

您必须登录才能参与评论！

立即登录

暂无评论...

cardiffnlp/twitter-roberta-base-sentiment

Twitter-roBERTa-base for Sentiment Analysis

Example of classification

BibTeX entry and citation info

数据统计

相关导航

暂无评论

网址

GI加速器

青苹果影院

大西瓜影视

老弟影视

YY直播

笔趣阁

热门推荐