
The Development of a Methodology for the Network and Semantic Analysis of Blogs for Sociological Purposes

Project Head: E. Koltsova

Project Participants: L. Pivovarova, K. Maslinsky, E. Tereshenko, Y. Pavlova, and T. Efimova

This project, which studies how socially important topics are discussed in the Russian blogosphere, is being carried out with the support of the HSE Scientific Foundation as part of the contest ‘Teachers and Students 2011–2012’. The researchers collected and analysed an array of Internet data. The long-term goal of the project is to develop a new method that would raise the quality of social Internet research to the next level.

The research showed that the top blog posts are dedicated either to permanent themes associated with private and recreational issues or to variable topics on sociopolitical issues. About a third of the posts are noise that is impossible to interpret. In December 2011, public interest in protests and elections increased as the other sociopolitical topics narrowed. By April 2012, interest in protests and elections had decreased slightly, but sociopolitical themes as a whole had expanded at the expense of the noise. The research also showed that commenting communities depend to some extent on the author of the posts published in the community. Thus, communication was usually organized around people, not interests.

After examining a large number of algorithms and software products, the researchers found that the methodology for analysing large text and network data is still under development and does not yet take the form of ready-made software products with transparent, tested algorithms suitable for use in social research. The research therefore required custom software, and about ten different models and/or scripts for downloading, preprocessing, and analysing data were created.

Software

gCluto is a graphical version of the programme Cluto, academic software designed for the offline clustering of rather large texts, based on the bag-of-words approach. The programme contains 17 algorithms, including flat and hierarchical clustering and graph-based algorithms.
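To make the bag-of-words and hierarchical clustering ideas mentioned above more concrete, here is a minimal sketch in Python. It is not Cluto's or gCluto's implementation, and it is not the project's own code. The function names (bag_of_words, cosine, cluster), the average-link merging rule, and the sample posts are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Turn a text into a word-count vector; word order is discarded."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(texts, k):
    """Hierarchical (agglomerative) clustering, assuming k >= 1.

    Starts with one cluster per text and repeatedly merges the two
    clusters with the highest average pairwise similarity until only
    k clusters remain. Returns clusters as sorted lists of text indices.
    """
    vectors = [bag_of_words(t) for t in texts]
    clusters = [[i] for i in range(len(texts))]

    def avg_sim(c1, c2):
        # Average-link criterion: mean similarity over all cross pairs.
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = (
            (x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        x, y = max(pairs, key=lambda p: avg_sim(clusters[p[0]], clusters[p[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Invented sample posts: two sociopolitical, two recreational.
posts = [
    "elections protest rally in the city square",
    "protest rally after the elections",
    "weekend recipe for apple pie",
    "apple pie recipe with cinnamon",
]
print(cluster(posts, 2))  # [[0, 1], [2, 3]]
```

Stopping the merging at a fixed number of clusters gives a flat partition, while the sequence of merges itself forms the hierarchy. A real analysis of blog posts would also need stop-word removal and term weighting, which this sketch omits.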

The Stanford Topic Modeling Toolbox is a programme for topic modeling for sociologists and other researchers wanting to analyse data, mostly texts.

 
