A Sentiment Analysis of LiveJournal Posts
Project Head: Y. Pavlova
Project Participants: K. Maslinsky, V. Seneva, T. Efimova, and E. Tereshenko
This project was carried out in 2012 with the support of the HSE Academic Development Foundation in Saint Petersburg.
Sentiment analysis, or tonal analysis, is the automatic identification of the tone of a text, that is, the emotion or attitude that prevails in it (negative, positive, neutral, or complex). The researchers tested this method as a way to determine attitudes toward the theme of Islam in Russian-language blog comments. They adapted the SentiStrength software product to Russian and tested it on Russian-language data. The adaptation involved translating the English software vocabulary into Russian, recording word frequencies in LiveJournal comments, adding frequent words to the vocabulary, and coding each entry on an emotion scale from -5 to 5.
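The frequency step of the adaptation can be illustrated with a minimal sketch. It counts words in a set of comments and lists the frequent ones that are still missing from the vocabulary, so that a human coder can decide whether they need a score. The comments, the starting lexicon, the `min_count` threshold, and the function names are all hypothetical; this is not the project's actual tooling.

```python
import re
from collections import Counter

# Hypothetical toy comments; the project worked with LiveJournal comments.
comments = [
    "Это просто ужасно, очень плохо",
    "Очень хороший пост, спасибо",
    "Спасибо, очень интересно",
]

# Hypothetical starting lexicon (word -> score on the -5..5 scale),
# standing in for the translated English vocabulary.
lexicon = {"хороший": 2, "плохо": -2}


def tokenize(text):
    """Lowercase the text and return runs of letters (Cyrillic included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def frequent_missing_words(texts, known, min_count=2):
    """Return (word, count) pairs for frequent words absent from the lexicon."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    return [
        (word, count)
        for word, count in counts.most_common()
        if count >= min_count and word not in known
    ]


# Candidates for manual review and coding on the -5..5 scale.
candidates = frequent_missing_words(comments, lexicon)
print(candidates)
```

A list like this only proposes candidates. Whether a frequent word carries any emotional weight, and what score it receives, remains a decision for the coders.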
The testing was based on three samples of LiveJournal comments. Each sample contained 1,000 comments left on posts by LiveJournal’s top bloggers. Two samples, one drawn between August 15 and September 15 and one during December 2011, were provisionally labelled ‘Islamic’ because the posts contained the roots of the words ‘Islam’ and ‘Muslim’. The third, a control sample, was drawn at random from both time periods. It made it possible to determine whether the emotional tone of comments on Islamic topics differed from that of random comments.
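The sampling design can be sketched as follows: posts are kept when their text contains one of the topic roots, and fixed-size samples of comments are then drawn from the topical posts and, as a control, from all posts. The Russian roots, the data layout, and the function names are assumptions made for illustration. The project's samples held 1,000 comments each, while the toy data here uses a sample size of 3.

```python
import random

# Assumed Russian roots for 'Islam' and 'Muslim'; the exact strings
# used in the project are not given in the text.
ROOTS = ("ислам", "мусульман")


def is_topical(post_text, roots=ROOTS):
    """True if the post text contains any of the topic roots."""
    text = post_text.lower()
    return any(root in text for root in roots)


def draw_sample(comments, size, seed=0):
    """Draw a reproducible random sample of at most `size` comments."""
    if len(comments) <= size:
        return list(comments)
    return random.Random(seed).sample(comments, size)


# Hypothetical posts with placeholder comment ids.
posts = [
    {"text": "Заметки об исламе в Европе", "comments": ["c1", "c2"]},
    {"text": "Рецепт борща", "comments": ["c3"]},
    {"text": "Мусульманские праздники", "comments": ["c4", "c5", "c6"]},
]

topical_comments = [c for p in posts if is_topical(p["text"]) for c in p["comments"]]
all_comments = [c for p in posts for c in p["comments"]]

topical_sample = draw_sample(topical_comments, size=3)
control_sample = draw_sample(all_comments, size=3)
```

Matching on roots rather than full words means that inflected forms such as "исламе" or "мусульманские" are caught by a simple substring test.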
Frequency distributions obtained from the automatic text analysis showed that hardly any comments, whether in the two ‘Islamic’ samples or in the random sample, displayed obvious emotional colouring; the majority were rated 0 or 1. Human coders then coded the texts manually to verify the results. The manual coding led the researchers to conclude that, although ratings of 0 and 1 were similarly prevalent, significantly more comments were coded -3, -4, and -5 than in the automatic analysis. The biggest difference was observed in the September sample, which coincided with Uraza Bayram (a Muslim religious holiday). This result indicates that the topic had emotional resonance among LiveJournal users and that the comments were largely negative.
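A comparison of this kind can be sketched by tabulating both sets of ratings on the -5 to 5 scale and computing the share of strongly negative comments (rated -3 or lower). The ratings below are invented toy values, not the project's data, and the function names are hypothetical.

```python
from collections import Counter

SCALE = range(-5, 6)  # the -5..5 emotion scale


def rating_distribution(ratings):
    """Count how many comments received each score on the scale."""
    counts = Counter(ratings)
    return {score: counts.get(score, 0) for score in SCALE}


def strongly_negative_share(ratings, threshold=-3):
    """Fraction of comments rated at or below the threshold."""
    if not ratings:
        return 0.0
    return sum(1 for r in ratings if r <= threshold) / len(ratings)


# Invented toy ratings for the same ten comments.
automatic = [0, 0, 1, 1, 0, -1, 0, 1, 0, 0]
manual = [0, 0, 1, -3, 0, -4, 0, 1, -5, 0]

auto_dist = rating_distribution(automatic)
manual_dist = rating_distribution(manual)
auto_share = strongly_negative_share(automatic)
manual_share = strongly_negative_share(manual)
```

In this toy case both distributions are dominated by 0 and 1, yet only the manual ratings contain scores of -3 and below, which mirrors the pattern described above.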
The difference between the automatic analysis and the manual coding can be explained by the fact that the compiled dictionary does not include all of the basic words that can express emotional attitudes in blogs, especially regarding Islam. In addition, the SentiStrength software product analyses only separate words, disregarding grammar and context. Thus, the automatic method made it possible to analyse large volumes of data and obtain quick results, which helps determine the degree of social tension around a topic, an event, or a person. On the other hand, it was difficult to account for every aspect when compiling the vocabulary on which the programme relies. The researchers were also engaged in continually improving the vocabulary and verifying the data.
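Both limitations can be seen in a minimal word-level scorer. The sketch below looks up each word in a toy lexicon, sums the scores, and clips the total to the -5 to 5 scale. The lexicon entries, the summing rule, and the function names are assumptions for illustration only and do not reproduce SentiStrength's actual scoring.

```python
import re

# Hypothetical toy lexicon: word -> score on the -5..5 scale.
TOY_LEXICON = {"хороший": 2, "плохой": -2, "ужасный": -4}


def tokenize(text):
    """Lowercase the text and return runs of letters (Cyrillic included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def score_comment(text, lexicon=TOY_LEXICON):
    """Sum per-word scores and clip the result to the -5..5 scale.

    Each word is scored in isolation: word order, negation, and
    inflected forms missing from the lexicon are all ignored.
    """
    total = sum(lexicon.get(tok, 0) for tok in tokenize(text))
    return max(-5, min(5, total))


plain = score_comment("хороший пост")       # lexicon hit
negated = score_comment("не хороший пост")  # negation is not seen
inflected = score_comment("хорошая идея")   # inflected form is not in the lexicon
clipped = score_comment("ужасный и плохой")  # sum of -6 is clipped to -5
```

Here the negated comment receives the same score as the plain one, and the inflected form scores 0 because it is not in the lexicon. Errors of this kind are what manual coding can catch.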
SentiStrength is a software product used for sentiment analysis. It was developed by Mike Thelwall, Head of the Statistical Cybermetrics Research Group at the University of Wolverhampton and Research Associate of the Oxford Internet Institute in Great Britain.