The first sentiment lexicon for analyzing Internet documents in Russia
In a project supported by the Russian Humanitarian Scientific Fund, the LINIS research team has created a sentiment lexicon for analyzing Internet documents.
It is currently the only socio-politically oriented lexicon for automatic sentiment analysis in Russia.
This work gives social researchers the opportunity to analyze, at a new level, data on the attitudes of the Internet-using population toward critical social problems.
The lexicon is based on posts from the blog platform LiveJournal, which were selected using topic modeling.
The LINIS researchers built LINIS CROWD, an online crowdsourcing platform on which about 90 volunteers from 16 cities annotated the sentiment of around 20,000 posts and 8,000 words.
The lexicon's quality was tested by comparing the benchmark texts against the results obtained with the SentiStrength software.
The prediction quality proved comparable to the results that SentiStrength's developer, M. Thelwall, obtained when analyzing short English texts.
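To illustrate what a lexicon-based evaluation of this kind involves, here is a minimal Python sketch. It is not the LINIS or SentiStrength implementation: the toy English lexicon, its assumed -2..+2 scale, and the functions `score_text`, `classify` and `accuracy` are hypothetical. The sketch only loosely mimics SentiStrength's dual-scale idea (a positive strength starting at 1 and a negative strength starting at -1, each set by the strongest matching word) and then measures how often the predicted label agrees with a human benchmark label.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon (not LINIS data): word -> polarity on an assumed -2..+2 scale.
LEXICON: Dict[str, int] = {"good": 2, "support": 1, "bad": -2, "corruption": -2}


def score_text(text: str, lexicon: Dict[str, int]) -> Tuple[int, int]:
    """Return (positive, negative) strengths, loosely SentiStrength-style:
    start from the neutral values 1 and -1 and keep the strongest word in each direction."""
    pos, neg = 1, -1
    for token in text.lower().split():
        word = token.strip(".,!?;:\"'()")  # naive tokenization: drop surrounding punctuation
        value = lexicon.get(word, 0)
        if value > 0:
            pos = max(pos, 1 + value)
        elif value < 0:
            neg = min(neg, -1 + value)
    return pos, neg


def classify(pos: int, neg: int) -> int:
    """Collapse the dual scores into one label: 1 positive, -1 negative, 0 neutral."""
    if pos > -neg:
        return 1
    if -neg > pos:
        return -1
    return 0


def accuracy(texts: List[str], labels: List[int], lexicon: Dict[str, int]) -> float:
    """Share of benchmark texts whose predicted label matches the human label."""
    hits = sum(classify(*score_text(t, lexicon)) == y for t, y in zip(texts, labels))
    return hits / len(texts)


# Hypothetical benchmark: three texts with human-assigned labels.
benchmark_texts = ["Good support for the reform.", "Corruption is bad!", "The meeting took place."]
benchmark_labels = [1, -1, 0]
print(accuracy(benchmark_texts, benchmark_labels, LEXICON))  # prints 1.0
```

For Russian posts, a whitespace tokenizer like this would likely need to be replaced by lemmatization, since a highly inflected language would otherwise produce word forms that miss the lexicon entries.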
Because the lexicon is publicly available, researchers and developers can use it for benchmarking.
In the future, LINIS researchers plan to use the collected data for automatic sentiment analysis based on machine learning.
According to Sergei Koltcov, one promising avenue of research is the analysis of comments on LiveJournal posts, since this type of content often contains the most emotional vocabulary.