Customer Sentiment Analysis of Cola Brands Using Twitter Data

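Introduction
Coca-Cola and PepsiCo are well-known brands in the soft drink industry, and both companies are in the Fortune 500. With broad product lines in a highly competitive market, the two compete fiercely with each other and constantly fight for market share in almost every product vertical. We analyze customer sentiment toward the two companies by downloading 5,000 tweets from each company's official Twitter account and analyzing them in R. The analysis shows how customer sentiment can be read from a brand's social media engagement (in this case, Twitter).

Table of contents
Packages involved and their application
What is sentiment analysis?
Cleaning the text
Word cloud
Tweets posted over a day and over a week
Sentiment score of Twitter data
Sentiment analysis of customer tweets
Conclusion

Packages used in R
Judging from the calls in the code, the analysis relies on the tm, wordcloud2, ggplot2, lubridate, sentimentr, syuzhet, data.table and tau packages.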

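What is sentiment analysis?
Sentiment analysis is a text mining technique that gives text context and extracts information from subjective, abstract source material. It helps us understand social sentiment toward a brand's products or services from online conversations on social media platforms such as Facebook, Instagram and Twitter, or from email. Computers do not understand natural language, so to let them process it we first convert the words into a numeric format. The following sections walk through this process step by step.

Cleaning the text
We have downloaded the data set from Twitter. Because tweet text contains links, hashtags, Twitter handles and emoticons, we wrote functions in R to remove them. After this noise is removed, all text is converted to lowercase; English stop words that carry no meaning (such as articles and prepositions), punctuation and numbers are removed; and the result is converted into a document-term matrix.
Document-term matrix: a matrix containing the number of times each word occurs in each document.

library(tm)
# Helper functions that strip tweet-specific noise
removeURL <- function(x) gsub("(f|ht)tp(s?)://\\S+", "", x, perl = TRUE)       # links
removeHashTags <- function(x) gsub("#\\S+", "", x)                             # hashtags
removeTwitterHandles <- function(x) gsub("@\\S+", "", x)                       # @handles
removeSlash <- function(x) gsub("\n", " ", x)                                  # line breaks
removeEmoticons <- function(x) gsub("[^\\x01-\\x7F]", "", x, perl = TRUE)      # non-ASCII characters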
# Pepsi: build the corpus, clean it, and convert it to a document-term matrix
data_pepsi$text <- iconv(data_pepsi$text, to = "utf-8")
pepsi_corpus <- Corpus(VectorSource(data_pepsi$text))
pepsi_corpus <- tm_map(pepsi_corpus, content_transformer(tolower))
pepsi_corpus <- tm_map(pepsi_corpus, removeWords, stopwords("en"))
pepsi_corpus <- tm_map(pepsi_corpus, content_transformer(removeHashTags))
pepsi_corpus <- tm_map(pepsi_corpus, content_transformer(removeTwitterHandles))
pepsi_corpus <- tm_map(pepsi_corpus, content_transformer(removeURL))
pepsi_corpus <- tm_map(pepsi_corpus, content_transformer(removeSlash))
pepsi_corpus <- tm_map(pepsi_corpus, removePunctuation)
pepsi_corpus <- tm_map(pepsi_corpus, removeNumbers)
pepsi_corpus <- tm_map(pepsi_corpus, content_transformer(removeEmoticons))
pepsi_corpus <- tm_map(pepsi_corpus, stripWhitespace)
# Keep the cleaned text as plain strings for the later sentiment and bigram steps
pepsi_clean_df <- data.frame(text = get("content", pepsi_corpus), stringsAsFactors = FALSE)
dtm_pepsi <- DocumentTermMatrix(pepsi_corpus)
# Drop terms that are absent from more than 99.9% of the tweets
dtm_pepsi <- removeSparseTerms(dtm_pepsi, 0.999)
pepsi_df <- as.data.frame(as.matrix(dtm_pepsi))
# Coca-Cola: the same cleaning pipeline
data_cola$text <- iconv(data_cola$text, to = "utf-8")
cola_corpus <- Corpus(VectorSource(data_cola$text))
cola_corpus <- tm_map(cola_corpus, content_transformer(tolower))
cola_corpus <- tm_map(cola_corpus, removeWords, stopwords("en"))
cola_corpus <- tm_map(cola_corpus, content_transformer(removeHashTags))
cola_corpus <- tm_map(cola_corpus, content_transformer(removeTwitterHandles))
cola_corpus <- tm_map(cola_corpus, content_transformer(removeURL))
cola_corpus <- tm_map(cola_corpus, content_transformer(removeSlash))
cola_corpus <- tm_map(cola_corpus, removePunctuation)
cola_corpus <- tm_map(cola_corpus, removeNumbers)
cola_corpus <- tm_map(cola_corpus, content_transformer(removeEmoticons))
cola_corpus <- tm_map(cola_corpus, stripWhitespace)
# Keep the cleaned text as plain strings for the later sentiment and bigram steps
cola_clean_df <- data.frame(text = get("content", cola_corpus), stringsAsFactors = FALSE)
dtm_cola <- DocumentTermMatrix(cola_corpus)
# Drop terms that are absent from more than 99.9% of the tweets
dtm_cola <- removeSparseTerms(dtm_cola, 0.999)
cola_df <- as.data.frame(as.matrix(dtm_cola))
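
To see what the cleaning steps do to a single tweet, here is a minimal base-R sketch that applies the same kind of regular expressions without the tm package. The sample tweets, the `clean_tweet` helper and the tiny stop-word list are made up for illustration and are not part of the article's data or code.

```r
# Hypothetical sample tweets (not from the article's data set)
tweets <- c(
  "Loving the new flavour! https://t.co/abc123 #refresh @pepsi",
  "Not great...\nway too sweet 2day"
)

# Illustrative stop-word list; tm's stopwords("en") is much longer
stop_words <- c("the", "a", "an", "too")

clean_tweet <- function(x) {
  x <- gsub("(f|ht)tps?://\\S+", "", x, perl = TRUE)   # links
  x <- gsub("#\\S+", "", x)                            # hashtags
  x <- gsub("@\\S+", "", x)                            # Twitter handles
  x <- gsub("\n", " ", x, fixed = TRUE)                # line breaks
  x <- gsub("[^\\x01-\\x7F]", "", x, perl = TRUE)      # non-ASCII characters
  x <- tolower(x)
  x <- gsub("[[:punct:]]", "", x)                      # punctuation
  x <- gsub("[[:digit:]]", "", x)                      # numbers
  stop_pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
  x <- gsub(stop_pattern, "", x, perl = TRUE)          # stop words
  x <- gsub("\\s+", " ", x)                            # collapse whitespace
  trimws(x)
}

cleaned <- clean_tweet(tweets)
print(cleaned)
```

Links, hashtags and handles are removed before punctuation, because stripping punctuation first would delete the `#`, `@` and `://` markers that those patterns rely on.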
Word cloud
A word cloud is a representation of text data that highlights the most frequently used words by drawing them at a larger size. The technique visualizes text as an image made up of a collection of words or tags. In R it can be built with the wordcloud2 package; the code and its output follow.

library(wordcloud2)
# Word frequencies are the column sums of the document-term matrix
word_pepsi_df <- data.frame(names(pepsi_df), colSums(pepsi_df))
names(word_pepsi_df) <- c("words", "freq")
word_pepsi_df <- subset(word_pepsi_df, word_pepsi_df$freq > 0)
# Background assumed to be black: "dark" is not a CSS colour name
wordcloud2(data = word_pepsi_df, size = 1.5, color = "random-light", backgroundColor = "black")
word_cola_df <- data.frame(names(cola_df), colSums(cola_df))
names(word_cola_df) <- c("words", "freq")
word_cola_df <- subset(word_cola_df, word_cola_df$freq > 0)
wordcloud2(data = word_cola_df, size = 1.5, color = "random-light", backgroundColor = "black")
Word clouds of the Pepsi and Coca-Cola Twitter data
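
The word frequencies that drive the word cloud come straight from the document-term matrix. As a minimal sketch in base R, using three invented documents rather than the article's tweets, the matrix and its column sums can be built like this:

```r
# Three hypothetical, already-cleaned documents
docs <- c("pepsi just right", "just pepsi", "right now")

tokens <- strsplit(docs, " ", fixed = TRUE)
vocab <- sort(unique(unlist(tokens)))

# One row per document, one column per word, cells hold occurrence counts
dtm <- t(vapply(
  tokens,
  function(tk) as.integer(table(factor(tk, levels = vocab))),
  integer(length(vocab))
))
dimnames(dtm) <- list(paste0("doc", seq_along(docs)), vocab)

# Column sums give the overall frequency of each word
word_freq <- data.frame(words = vocab, freq = colSums(dtm), row.names = NULL)
word_freq <- word_freq[order(-word_freq$freq), ]

print(dtm)
print(word_freq)
```

The `word_freq` data frame has the same two-column shape (`words`, `freq`) that is passed to `wordcloud2()` in the article's code.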

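The size of a word in the cloud depends on its frequency in the tweets, so the prominent words shift as the tweets change. Words such as just, native, right and racism appear often in tweets from Pepsi customers, while words such as get and support appear more often in tweets from Coca-Cola customers.

Tweets posted over a day and over a week
Because the tweets were collected over more than a week, we can analyze the hours and weekdays on which users are most active, that is, when they post the most tweets about each brand. This can be visualized with plots from the ggplot2 library. The code used is shown below, together with its output.

library(lubridate)
library(ggplot2)
data_pepsi$Date <- as.Date(data_pepsi$created_at)
data_pepsi$hour <- hour(data_pepsi$created_at)
# weekdays() returns locale-dependent names; the levels below assume an English locale
data_pepsi$weekday <- factor(weekdays(data_pepsi$Date), levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))
ggplot(data_pepsi, aes(x = hour)) + geom_density() + theme_minimal() + ggtitle("Pepsi")
ggplot(data_pepsi, aes(x = weekday)) + geom_bar(color = "#CC79A7", fill = "#CC79A7") + theme_minimal() + ggtitle("Pepsi") + ylim(0, 1800)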
data_cola$Date <- as.Date(data_cola$created_at)
data_cola$Day <- day(data_cola$created_at)
data_cola$hour <- hour(data_cola$created_at)
# weekdays() returns locale-dependent names; the levels below assume an English locale
data_cola$weekday <- factor(weekdays(as.Date(data_cola$Date)), levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))
ggplot(data_cola, aes(x = hour)) + geom_density() + theme_minimal() + ggtitle("Coca-Cola")
ggplot(data_cola, aes(x = weekday)) + geom_bar(color = "#CC79A7", fill = "#CC79A7") + theme_minimal()
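
The hour and weekday columns can also be derived and counted with base R alone. The sketch below uses four invented timestamps and assumes they are in UTC; the `%u` format code returns the ISO weekday number, which avoids depending on the locale for day names.

```r
# Hypothetical timestamps, assumed to be in UTC
created_at <- as.POSIXct(
  c("2021-03-01 15:20:00", "2021-03-01 15:45:00",
    "2021-03-02 01:05:00", "2021-03-04 09:30:00"),
  tz = "UTC"
)

# Hour of day as an integer between 0 and 23
hour_of_day <- as.integer(format(created_at, "%H", tz = "UTC"))

# %u is the ISO weekday number: 1 = Monday, ..., 7 = Sunday
day_labels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")
weekday <- factor(
  day_labels[as.integer(format(created_at, "%u", tz = "UTC"))],
  levels = day_labels
)

tweets_per_hour <- table(hour_of_day)
tweets_per_weekday <- table(weekday)

print(tweets_per_hour)
print(tweets_per_weekday)
```

These two tables hold the same counts that the density plot and the bar chart display.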

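The hourly density plots show that both Pepsi and Coca-Cola peak around 3-4 p.m. and again around 1 a.m. A likely reason is that people tend to use social media when they are bored at work or late at night, and the plots are consistent with that.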

Distribution of tweets over the week

When the daily tweet counts are shown on a bar chart, Thursday is the day with the most tweets for Pepsi, because that is when the company released its quarterly report. For Coca-Cola, Tuesday has the fewest tweets.

Sentiment score of Twitter data
In this section we classify the tweets as positive, negative or neutral. This can be done with the sentimentr package, which assigns each lexicon word a sentiment score between -1 and +1 and averages the scores within a tweet to give the tweet's final sentiment score.

library(sentimentr)
sentiments <- sentiment_by(get_sentences(pepsi_clean_df$text))
data_pepsi$sentiment_score <- round(sentiments$ave_sentiment, 2)
# Label each tweet by the sign of its score, keeping the numeric score intact
data_pepsi$sentiment <- ifelse(data_pepsi$sentiment_score > 0, "Positive",
                               ifelse(data_pepsi$sentiment_score < 0, "Negative", "Neutral"))
data_pepsi$sentiment <- as.factor(data_pepsi$sentiment)
ggplot(data_pepsi, aes(x = sentiment)) + geom_bar(color = "steelblue", fill = "steelblue") + theme_minimal()
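
The scoring idea can be illustrated with a deliberately simplified base-R sketch: look each word up in a lexicon, average the values, and label the tweet by the sign of the average. The four-word lexicon and the sample tweets are invented, and this is not the algorithm sentimentr uses internally, which also handles negators and other valence shifters.

```r
# Toy lexicon (hypothetical values between -1 and +1)
lexicon <- c(love = 1, great = 0.5, bad = -0.75, hate = -1)

score_tweet <- function(text) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) == 0) return(0)
  values <- unname(lexicon[words])   # NA for words outside the lexicon
  values[is.na(values)] <- 0         # unknown words count as neutral
  mean(values)
}

tweets <- c("I love this great drink", "bad taste hate it", "just a can")
scores <- vapply(tweets, score_tweet, numeric(1), USE.NAMES = FALSE)

labels <- ifelse(scores > 0, "Positive",
                 ifelse(scores < 0, "Negative", "Neutral"))

print(data.frame(tweet = tweets, score = scores, label = labels))
```

Keeping the numeric score and the label in separate variables avoids overwriting the numbers with text before all three comparisons have been made.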
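Almost 75% of the tweets are positive, since both brands are quite popular with their customers.

Sentiment analysis of customer tweets
The emotion analysis of the tweets is performed with the syuzhet package, which scores each lexicon word on ten sentiment indices: anger, anticipation, disgust, fear, joy, sadness, surprise, trust, negative and positive. If we sum each word's values for every index, the emotions across all tweets can be shown as a bar chart.

library(syuzhet)
# One colour per sentiment index
cols <- c("red", "pink", "green", "orange", "yellow", "skyblue", "purple", "blue", "black", "grey")
pepsi_sentimentsdf <- get_nrc_sentiment(names(pepsi_df))
barplot(colSums(pepsi_sentimentsdf),
main = "Pepsi", col = cols, space = 0.05, horiz = FALSE, angle = 45, cex.axis = 0.75, las = 2, srt = 60, border = NA)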
cola_sentimentsdf <- get_nrc_sentiment(names(cola_df))
barplot(colSums(cola_sentimentsdf),
main = "Coca-Cola", col = cols, space = 0.05, horiz = FALSE, angle = 45, cex.axis = 0.75, las = 2, srt = 60, border = NA)
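
The summing step behind these bar charts can be sketched in base R with a toy emotion lexicon. The three lexicon entries, the reduced set of five indices and the word list are all invented for illustration; the real NRC lexicon used by syuzhet covers thousands of words and all ten indices.

```r
# Reduced, hypothetical set of indices
emotions <- c("anger", "joy", "trust", "positive", "negative")

# Toy lexicon: each word maps to the indices it is tagged with
toy_lexicon <- list(
  happy    = c("joy", "positive"),
  angry    = c("anger", "negative"),
  reliable = c("trust", "positive")
)

count_emotions <- function(words) {
  # Words missing from the lexicon yield NULL and are dropped by unlist()
  tagged <- unlist(toy_lexicon[words], use.names = FALSE)
  table(factor(tagged, levels = emotions))
}

words <- c("happy", "angry", "reliable", "cola", "happy")
emotion_counts <- count_emotions(words)
print(emotion_counts)
```

Passing `emotion_counts` to `barplot()` would give a chart of the same kind as the ones in the article, with one bar per index.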

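The bar charts display all the emotions. They make it clear that positivity dominates for both companies, which reinforces the earlier finding that most tweets are positive. Tracking how these charts change over time can serve as feedback on new products or advertising.

Most frequent words
# Keep the 15 most frequent words and order the factor levels so the bars are sorted
top_pepsi_df <- head(word_pepsi_df[order(-word_pepsi_df$freq), ], 15)
top_cola_df <- head(word_cola_df[order(-word_cola_df$freq), ], 15)
top_pepsi_df$words <- factor(top_pepsi_df$words, levels = top_pepsi_df$words[order(top_pepsi_df$freq)])
top_cola_df$words <- factor(top_cola_df$words, levels = top_cola_df$words[order(top_cola_df$freq)])
ggplot(top_pepsi_df, aes(x = freq, y = words)) + geom_bar(stat = "identity", color = "#C4961A", fill = "#C4961A") + theme_minimal() + ggtitle("Pepsi")
ggplot(top_cola_df, aes(x = freq, y = words)) + geom_bar(stat = "identity", color = "#C4961A", fill = "#C4961A") + theme_minimal() + ggtitle("Coca-Cola")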
library(data.table)
library(tau)
# Count n-grams of size ngramSize in a character vector
createNgram <- function(stringVector, ngramSize) {
  ngram <- data.table()
  ng <- textcnt(stringVector, method = "string", n = ngramSize, tolower = FALSE)
  if (ngramSize == 1) {
    ngram <- data.table(w1 = names(ng), freq = unclass(ng), length = nchar(names(ng)))
  } else {
    ngram <- data.table(w1w2 = names(ng), freq = unclass(ng), length = nchar(names(ng)))
  }
  return(ngram)
}
pepsi_bigrams_df <- createNgram(pepsi_clean_df$text, 2)
cola_bigrams_df <- createNgram(cola_clean_df$text, 2)
names(pepsi_bigrams_df) <- c("words", "freq", "length")
names(cola_bigrams_df) <- c("words", "freq", "length")
# Keep the 15 most frequent bigrams and order the factor levels so the bars are sorted
pepsi_top_bigrams <- head(pepsi_bigrams_df[order(-freq)], 15)
cola_top_bigrams <- head(cola_bigrams_df[order(-freq)], 15)
pepsi_top_bigrams$words <- factor(pepsi_top_bigrams$words, levels = pepsi_top_bigrams$words[order(pepsi_top_bigrams$freq)])
cola_top_bigrams$words <- factor(cola_top_bigrams$words, levels = cola_top_bigrams$words[order(cola_top_bigrams$freq)])
ggplot(pepsi_top_bigrams, aes(x = freq, y = words)) + geom_bar(stat = "identity", color = "#00AFBB", fill = "#00AFBB") + theme_minimal() + ggtitle("Pepsi")
ggplot(cola_top_bigrams, aes(x = freq, y = words)) + geom_bar(stat = "identity", color = "#00AFBB", fill = "#00AFBB") + theme_minimal() + ggtitle("Coca-Cola")

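Bigrams
A bigram is a pair of adjacent words, produced when a sentence is split into two-word sequences. Bigrams are useful for capturing the context of a word, because a single word usually provides no context on its own.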
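
As a minimal base-R sketch of how a sentence is split into bigrams and counted, without the tau package: the `make_bigrams` helper and the three sample texts below are hypothetical.

```r
# Pair each word with the word that follows it
make_bigrams <- function(text) {
  words <- strsplit(text, "\\s+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < 2) return(character(0))
  paste(head(words, -1), tail(words, -1))
}

# Hypothetical cleaned tweets
texts <- c("just do it", "just do more", "do it")

all_bigrams <- unlist(lapply(texts, make_bigrams))
bigram_counts <- sort(table(all_bigrams), decreasing = TRUE)

print(bigram_counts)
```

Bigrams are built within each text separately, so the last word of one tweet is never paired with the first word of the next.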

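Conclusion
We can see that companies can analyze customer sentiment from their existing social media engagement and shape their business strategy accordingly, using it to inform company decisions such as launching a product line.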

