The availability of alternative information is often said to induce social discontent and to give rise to protest forms of political participation. But does this relationship really exist, and is it universal? In contrast to previous studies, where generalized Internet use is most often a proxy for online information consumption and general political participation is a proxy for protest participation, we test the relationship specifically between online news consumption and protest participation. We explore self-reported cross-sectional data for 48 nations. The analysis provides empirical evidence that the likelihood of individual protest participation is positively associated with online news consumption. The study also shows that the magnitude of the effect varies with political context: surprisingly, despite total control of both offline and online media, autocratic countries demonstrated stronger effects of online news than hybrid regimes, where civilians usually have access to Internet media providing information that is alternative to the pro-government news agenda.
The advent of personalized medicine and wide-scale drug tests has led to the development of methods intended to automatically mine and extract information regarding drug reactions from user reviews. For medical purposes, it is often important to know demographic information on the authors of these reviews; however, existing studies usually either presuppose that this information is available or disregard the issue. We study automatic mining of demographic information from user-generated texts, comparing modern natural language processing techniques, including extensions of topic models and deep neural networks, for this problem on datasets mined from health-related web sites.
The ability of social media to rapidly disseminate judgements on ethnicity and to influence offline ethnic relations creates demand for methods of automatic monitoring of ethnicity-related online content. In this study we seek to measure the overall volume of ethnicity-related discussion in Russian-language social media and to develop an approach that would automatically detect various aspects of attitudes to ethnic groups. We develop a comprehensive list of ethnonyms and related bigrams covering 97 post-Soviet ethnic groups and retrieve all messages containing any of these words posted over a two-year period across Russian-language social media (N=2,660,222 texts). We hand-code 7,181 messages in which rare ethnicities are over-represented and train a number of classifiers to recognize different aspects of authors’ attitudes and other text features. After calculating a number of standard quality metrics, we find that we reach good quality in detecting intergroup conflict, positive intergroup contact, and overall negative and positive sentiment. Relevance to the topic of ethnicity and general attitude to an ethnic group are least well predicted, while some aspects, such as calls for violence against an ethnic group, are not sufficiently present in the data to be predicted.
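A pipeline of this kind, hand-coded messages used to train a text classifier for one aspect of attitude, can be sketched as follows. This is an illustrative toy, not the study's actual model: the four example texts, the labels, and the choice of a TF-IDF plus logistic-regression pipeline are all assumptions for demonstration.

```python
# Illustrative sketch: training a bag-of-words classifier on hand-coded
# messages to detect one aspect (here, "intergroup conflict").
# Texts and labels below are toy stand-ins for the annotated corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "clashes broke out between the two groups",
    "a joint festival celebrated both cultures",
    "tensions escalated into street fights",
    "neighbours shared a meal together",
]
labels = [1, 0, 1, 0]  # 1 = mentions intergroup conflict, 0 = does not

# TF-IDF features over unigrams and bigrams, fed to a linear classifier
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["violent clashes between the groups"])[0])
```

In practice such a classifier would be trained per aspect (conflict, positive contact, sentiment) and evaluated with the standard quality metrics the abstract mentions.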
The important role of digital inequality in hindering the development of civil society is increasingly acknowledged. Simultaneously, differences in the availability and practices of use of social network sites (SNS) may be considered major manifestations of such a digital divide. While SNS are in principle highly convenient spaces for public discussion, lack of access or domination by socially insignificant small talk may indicate underdevelopment of the public sphere. At the same time, agenda differences between regions may signal local problems. In this study we seek to find out whether a regional digital divide exists in a country as large as Russia. We start from the theory of uneven modernization of Russia and use data from its most popular SNS, “VK.com”, as a proxy for measuring digital inequality. By analyzing user activity data from a sample of 77,000 users and texts from a carefully selected subsample of 36,000 users, we conclude that the regional level explains an extremely small share of the overall variation in behavioral user data. A notable exception is attention to the topics of Islam and Ukraine. However, our data reveal that, historically, the geographical penetration of “VK.com” proceeded from the regions considered most modernized to those considered most traditional. This finding supports the theory of uneven modernization, but it also shows that digital inequality changes over time.
Packet processing increasingly involves heterogeneous requirements. We consider the well-known model of a shared-memory switch with a bounded-size buffer and generalize it in two directions. First, we consider unit-sized packets labeled with an output port and a processing requirement (i.e., packets with heterogeneous processing), with the goal of maximizing the number of transmitted packets. We analyze the performance of buffer management policies under various traffic characteristics via competitive analysis, which provides uniform guarantees across traffic patterns (Borodin and El-Yaniv 1998). We propose the Longest-Work-Drop policy and show that it is at most 2-competitive and at least -competitive. Second, we consider another generalization, posed as an open problem by Goldwasser (2010), where each unit-sized packet is labeled with an output port and an intrinsic value, and the goal is to maximize the total value of transmitted packets. We present first results in this direction and define a scheduling policy that, as we conjecture, may achieve a constant competitive ratio. We also present a comprehensive simulation study that validates our results.
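The Longest-Work-Drop idea can be sketched as a toy simulation. The semantics assumed below (per-port queues, tail eviction from the queue holding the most remaining work, one processing cycle per head-of-line packet per slot) are illustrative assumptions, not the paper's exact model.

```python
class LWDSwitch:
    """Toy shared-memory switch with a Longest-Work-Drop eviction rule."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.queues = {}  # output port -> remaining processing cycles per packet

    def occupancy(self):
        return sum(len(q) for q in self.queues.values())

    def arrive(self, port, work):
        """Admit a unit-sized packet that needs `work` processing cycles."""
        self.queues.setdefault(port, [])
        if self.occupancy() >= self.buffer_size:
            # evict the tail packet of the queue holding the most remaining work
            heaviest = max(self.queues, key=lambda p: sum(self.queues[p]))
            if self.queues[heaviest]:
                self.queues[heaviest].pop()
        self.queues[port].append(work)

    def step(self):
        """One time slot: each head-of-line packet gets one processing cycle;
        fully processed packets are transmitted."""
        sent = 0
        for q in self.queues.values():
            if q:
                q[0] -= 1
                if q[0] == 0:
                    q.pop(0)
                    sent += 1
        return sent
```

The policy's intuition is that evicting from the queue with the largest backlog of work sacrifices the packets least likely to be transmitted soon.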
With the rapid growth of online social network sites, health-related online communities and their social and behavioral implications have become increasingly important for public health. Unfortunately, online communities often become vehicles for the promotion of pernicious misinformation, in particular the claim that HIV is a myth (AIDS denialism). This study explores online users’ behavior and interactions within an AIDS-denialist community to identify and estimate the number of those who are potentially most susceptible to AIDS-denialist arguments: the “risk group” for becoming AIDS denialists. Social network analysis was used to examine the largest AIDS-denialist community (over 15,000 members) in the most popular Russian SNS, “VK.com”. In addition, content analysis was used to collect data on attitudes towards AIDS-denialist arguments and participants’ self-disclosed HIV status. Two datasets were collected to analyze friendship ties and communication interactions among community members. We identified the core of the online community, cohesive and dedicated AIDS denialists, and the risk group: users who communicate with core members and thus may be more susceptible to AIDS-denialist propaganda and health behaviors (e.g., refusing treatment). The analysis allowed us to significantly reduce the target audience for possible intervention campaigns while increasing the accuracy of determining the risk group’s composition.
Internet Studies is an interdisciplinary and multidisciplinary field of fundamental and applied research that integrates different research disciplines around a common object: the Internet. This review article defines and briefly describes the structure of Internet Studies as part of the social sciences and introduces the research agenda of the field, including its most cutting-edge research issues. The parts of this agenda related to classical sociological issues are analyzed in more detail: inequality, online communities, and social capital, as well as topics related to the study of transformations in different spheres of society: politics, public health and medicine, and education. Two main theoretical approaches within which the influence of the Internet on society is interpreted are briefly described: the theory of the network society and critical theory of the Internet and society. We conclude that present directions of Internet research intersect extensively, and that these intersections open the perspective of a more complete study of the mechanisms that mediate Internet-related social changes and connect online and offline sociality into a single space.
This paper provides the reader with a report on the 9th Russian Summer School in Information Retrieval (RuSSIR 2015).
Automatic assessment of sentiment in large text corpora is an important goal in the social sciences. This paper describes the methodology and the results of developing a system for Russian-language sentiment analysis that includes a publicly available sentiment lexicon, a publicly available test collection with sentiment markup, and a crowdsourcing website for such markup. The lexicon is aimed at detecting sentiment in user-generated content (blogs, social media) related to social and political issues. Its prototype was formed based on other dictionaries and on topic modeling performed on a large collection of blog posts. Topic modeling revealed relevant (social and political) topics and, as a result, relevant words for the lexicon prototype and relevant texts for the training collection. Each word was assessed by at least three volunteers in the context of three different texts where the word occurred, while the texts received their sentiment scores from the same volunteers as well. Both texts and words were scored from −2 (negative) to +2 (positive). Of 7,546 candidate words, 2,753 received non-neutral sentiment scores. The quality of the lexicon was assessed with the SentiStrength software by comparing human text scores with scores obtained automatically from the created lexicon. 93% of texts were classified correctly at the error level of ±1 class, which closely matches the result of SentiStrength's initial application to English-language tweets. Negative classes were much larger and better predicted. The lexicon and the text collection are publicly available at http://linis-crowd.org.
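Lexicon-based scoring of the kind SentiStrength performs can be sketched in a few lines. The tiny lexicon and the scoring rule below (net score of the strongest positive and strongest negative word found) are illustrative assumptions, not the published linis-crowd.org resource or SentiStrength's exact algorithm.

```python
# Toy lexicon mapping words to sentiment scores in [-2, +2],
# a stand-in for the crowdsourced lexicon described above.
lexicon = {"good": 2, "hope": 1, "bad": -1, "terrible": -2}

def score_text(text):
    """Score a text by combining its strongest positive and strongest
    negative lexicon hits (0 for each polarity if no hit)."""
    words = text.lower().split()
    pos = max([lexicon[w] for w in words if lexicon.get(w, 0) > 0], default=0)
    neg = min([lexicon[w] for w in words if lexicon.get(w, 0) < 0], default=0)
    return pos + neg

print(score_text("terrible news but some hope"))  # 1 + (-2) = -1
```

Comparing such automatic scores to human-assigned text scores, within a tolerance of ±1 class, is the evaluation strategy the abstract describes.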
Applicability limits of the particle-in-cell (PIC) method for the calculation of jet gas-dynamic flows under conditions where pressure varies by four or five orders of magnitude are studied. Three approaches that permit determining the real limits of model adequacy on the low-pressure side are considered. Based on the analysis of the results, it is shown that the PIC method operates adequately in the pressure range of 5–10^5 Pa, even though formally it can also operate at lower pressures.
In this work, we compare two extensions of two different topic models for the same problem of recommending full-text items: the previously developed SVD-LDA and its counterpart SVD-ARTM based on additive regularization. We show that ARTM naturally leads to an inference algorithm that has to be painstakingly developed for LDA.
Buffering architectures and policies for their efficient management constitute one of the core ingredients of a network architecture. In this work we introduce a new specification language, BASEL, that can express virtual buffering architectures and management policies representing a variety of economic models. BASEL does not require the user to implement policies in a high-level language; rather, the entire buffering architecture and its policy are reduced to several comparators and simple functions. We show examples of buffer management policies in BASEL and demonstrate empirically the impact of various settings on performance.
Purpose – The paper addresses the question of what drives the formation of latent discussion communities, if any, in the blogosphere: the topical composition of posts or their authorship? The purpose of this paper is to contribute to knowledge about the structure of co-commenting.
Design/methodology/approach – The research is based on a dataset of 17,386 full-text posts written by the top 2,000 LiveJournal bloggers and over 520,000 comments that result in about 4.5 million edges in the co-commenting network, where posts are vertices. The Louvain algorithm is used to detect communities of co-commenting. Cosine similarity and topic modeling based on latent Dirichlet allocation are applied to study topical coherence within these communities.
Findings – Bloggers unite into moderately manifest communities by commenting on roughly the same sets of posts. The graph of co-commenting is sparse and connected by a minority of active non-top commenters. Communities are centered mainly around blog authors as opinion leaders and, to a lesser extent, around a shared topic or topics.
Research limitations/implications – The research has to be replicated on other datasets with more thorough hand coding to ensure the reliability of results and to reveal average proportions of topic-centered communities.
Practical implications – Knowledge about the factors around which co-commenting communities emerge, in particular the clustered opinion leaders that often attract such communities, can be used by policy makers in marketing and/or political campaigning when individual leadership is not enough or not applicable.
Originality/value – The research contributes to the social studies of online communities. It is the first study of communities based on co-commenting that combines examination of the content of commented posts and their topics.
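The methodological core of the design above, Louvain community detection on a co-commenting graph followed by a cosine-similarity check of topical coherence, can be sketched with toy data. The graph, the bag-of-words cosine, and the post texts are all illustrative assumptions; the study itself uses LDA topic vectors rather than raw word counts.

```python
# Sketch: communities of co-commenting (posts as vertices, an edge when
# the same user commented on both posts), then a simple cosine similarity
# over bag-of-words vectors as a proxy for topical coherence.
import math
from collections import Counter

import networkx as nx
from networkx.algorithms.community import louvain_communities

G = nx.Graph()
G.add_edges_from([("p1", "p2"), ("p2", "p3"), ("p4", "p5"), ("p5", "p6")])

communities = louvain_communities(G, seed=42)

def cosine(a, b):
    """Cosine similarity of two texts under a bag-of-words model."""
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

print(len(communities), cosine("protest rally moscow", "rally in moscow"))
```

High within-community cosine similarity would indicate a topic-centered community; the study finds that author-centered communities dominate instead.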
We study topic models designed to be used for sentiment analysis, i.e., models that extract certain topics (aspects) from a corpus of documents and mine sentiment-related labels related to individual aspects. For both direct applications in sentiment analysis and other uses, it is desirable to have a good lexicon of sentiment words, preferably one that relates the words to different aspects. We have previously developed a modification for several popular sentiment-related LDA extensions that trains prior hyperparameters β for specific words. We continue this work and show how this approach leads to new aspect-specific lexicons of sentiment words based on a small set of “seed” sentiment words; the lexicons are useful by themselves and lead to improved sentiment classification.
We present a novel approach to analyzing and visualizing opinion polarisation on Twitter based on graph features of communication networks extracted from tweets. We show that opinion polarisation can be legibly observed on unimodal projections of artificially created bimodal networks, where the most popular users in retweet and mention networks are considered nodes of the second mode. For this purpose, we select a subset of top users based on their PageRank values and assign them to be the second mode in our networks, thus called pseudo-bimodal. After projecting them onto the set of “bottom” users and vice versa, we get unimodal networks with more distinct clusters and visually coherent community separation. We developed our approach on a dataset gathered during the Russian protest meetings of December 24, 2011, and tested it on another dataset by Conover used to analyze political polarisation, showing that our approach not only works well on our data but also improves the results from previous research on that phenomenon.
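The pseudo-bimodal construction can be sketched as follows: rank users by PageRank, declare the top-k users the artificial second mode, and project onto the remaining "bottom" users, linking two bottom users whenever they point at a common top user. The toy edge list, the cutoff k, and the direction of projection are assumptions for illustration.

```python
# Sketch of a pseudo-bimodal projection: top-PageRank users become the
# second mode; bottom users are linked when they share a top neighbour.
from itertools import combinations

import networkx as nx

# toy directed mention/retweet network: edge u -> v means u mentions v
G = nx.DiGraph([("a", "hub1"), ("b", "hub1"), ("c", "hub2"),
                ("d", "hub2"), ("a", "hub2")])

pr = nx.pagerank(G)
k = 2
top = set(sorted(pr, key=pr.get, reverse=True)[:k])  # pseudo second mode
bottom = set(G) - top

# unimodal projection onto the bottom users
P = nx.Graph()
P.add_nodes_from(bottom)
for t in top:
    fans = [u for u in G.predecessors(t) if u in bottom]
    P.add_edges_from(combinations(fans, 2))

print(sorted(top), sorted(P.edges()))
```

In the projected graph, clusters of bottom users who orbit different top users become visually separable, which is the basis of the polarisation visualization.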
Efficient packet classification is a core concern for network services. Traditional multi-field classification approaches, in both software and ternary content-addressable memory (TCAM), entail tradeoffs between (memory) space and (lookup) time. TCAMs cannot efficiently represent range rules, a common class of classification rules confining values of packet fields to given ranges. The exponential space growth of TCAM entries relative to the number of fields is exacerbated when multiple fields contain ranges. In this work, we present a novel approach that identifies properties of many classifiers allowing them to be implemented in linear space with worst-case guaranteed logarithmic lookup time, and that permits adding more fields, including range constraints, without impacting space and time complexities. On real-life classifiers from Cisco Systems and additional classifiers from ClassBench (with real parameters), 90–95% of rules are thus handled, and the remaining 5–10% of rules can be stored in TCAM to be processed in parallel.
In this paper, we describe the rules and results of the FactRuEval information extraction competition held in 2016 as part of the Dialogue Evaluation initiative in the run-up to Dialogue 2016. The systems were to extract information from Russian texts and competed in two named entity extraction tracks and one fact extraction track. The paper describes the tasks set before the participants and presents the scores achieved by the contending systems. Additionally, we dwell upon the scoring methods employed for evaluating the results of all three tracks and provide some preliminary analysis of the state of the art in information extraction for Russian texts. We also provide a detailed description of the composition and general organization of the annotated corpus created for the competition by volunteers using the OpenCorpora.org platform. The corpus is publicly available and is expected to evolve in the future.