Packet processing increasingly involves heterogeneous requirements. We consider the well-known model of a shared-memory switch with a bounded-size buffer and generalize it in two directions. First, we consider unit-sized packets labeled with an output port and a processing requirement (i.e., packets with heterogeneous processing), with the objective of maximizing the number of transmitted packets. We analyze the performance of buffer management policies under various characteristics via competitive analysis, which provides uniform guarantees across traffic patterns (Borodin and El-Yaniv 1998). We propose the Longest-Work-Drop policy and show that it is at most
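A minimal sketch of a drop policy in this spirit: when the buffer is full, evict the packet with the longest remaining processing work. The function name and details are illustrative, not the paper's exact algorithm.

```python
def admit_lwd(buffer, packet_work, capacity):
    """Admit a packet (given by its processing requirement) into a
    bounded buffer. When the buffer is full, drop the packet with
    the longest remaining work (illustrative Longest-Work-Drop sketch)."""
    if len(buffer) < capacity:
        buffer.append(packet_work)
        return True
    # index of the buffered packet with the most remaining work
    worst = max(range(len(buffer)), key=lambda i: buffer[i])
    if buffer[worst] > packet_work:
        buffer[worst] = packet_work  # evict it in favor of the arrival
        return True
    return False  # the arriving packet itself has the longest work

buf = []
admit_lwd(buf, 3, 2)  # admitted
admit_lwd(buf, 5, 2)  # admitted
admit_lwd(buf, 1, 2)  # buffer full: the 5-round packet is evicted
```

The intuition is that packets needing many processing rounds occupy the buffer longest, so dropping them frees capacity for more transmissions.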
This paper provides the reader with a report on the 9th Russian Summer School in Information Retrieval (RuSSIR 2015).
Automatic assessment of sentiment in large text corpora is an important goal in social sciences. This paper describes a methodology and the results of the development of a system for Russian language sentiment analysis that includes: a publicly available sentiment lexicon, a publicly available test collection with sentiment markup and a crowdsourcing website for such markup. The lexicon is aimed at detecting sentiment in user-generated content (blogs, social media) related to social and political issues. Its prototype was formed based on other dictionaries and on the topic modeling performed on a large collection of blog posts. Topic modeling revealed relevant (social and political) topics and as a result—relevant words for the lexicon prototype and relevant texts for the training collection. Each word was assessed by at least three volunteers in the context of three different texts where the word occurred, while the texts received their sentiment scores from the same volunteers as well. Both texts and words were scored from −2 (negative) to +2 (positive). Of 7,546 candidate words, 2,753 got non-neutral sentiment scores. The quality of the lexicon was assessed with SentiStrength software by comparing human text scores with the scores obtained automatically based on the created lexicon. 93% of texts were classified correctly at the error level of ±1 class, which closely matches the result of SentiStrength's initial application to English language tweets. Negative classes were much larger and better predicted. The lexicon and the text collection are publicly available at http://linis-crowd.org.
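The scoring scheme and the ±1-class evaluation can be sketched as follows; the lexicon entries here are hypothetical, and the averaging rule is a toy stand-in for SentiStrength's actual scoring.

```python
def text_score(text, lexicon):
    """Average the -2..+2 lexicon scores of the words in a text,
    rounded to the nearest class (0 when no lexicon word occurs)."""
    hits = [lexicon[w] for w in text.lower().split() if w in lexicon]
    return round(sum(hits) / len(hits)) if hits else 0

def accuracy_within_one(predicted, gold):
    """Share of texts whose predicted class is within +/-1 of the
    human score -- the kind of metric behind the 93% figure."""
    ok = sum(1 for p, g in zip(predicted, gold) if abs(p - g) <= 1)
    return ok / len(gold)

lexicon = {"good": 2, "bad": -2}          # hypothetical entries
text_score("a good day", lexicon)         # -> 2
accuracy_within_one([2, -2, 0], [1, -2, 2])
```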
Applicability limits of the particle-in-cell (PIC) method for the calculation of jet gasdynamic flows under conditions of pressure variations by four or five orders of magnitude are studied. Three approaches permitting one to determine the actual limits of model adequacy at low pressures are considered. Based on the analysis of the results, it is shown that the PIC method adequately operates in the pressure range of 5–10⁵ Pa in spite of the fact that, formally, the PIC method can operate also at lower pressures.
Buffering architectures and policies for their efficient management constitute one of the core ingredients of a network architecture. In this work we introduce a new specification language, BASEL, that makes it possible to express virtual buffering architectures and management policies representing a variety of economic models. BASEL does not require the user to implement policies in a high-level language; rather, the entire buffering architecture and its policy are reduced to several comparators and simple functions. We show examples of buffer management policies in BASEL and demonstrate empirically the impact of various settings on performance.
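The idea of reducing a policy to a comparator can be illustrated with the following sketch; the `make_policy` API is an invention for illustration, not BASEL syntax.

```python
from functools import cmp_to_key

def make_policy(cmp, capacity):
    """A bounded buffer whose admission/eviction behaviour is fully
    determined by a single packet comparator (an illustration of the
    comparator idea, not BASEL's actual language).
    cmp(a, b) < 0 means packet a is less valuable than b."""
    buffer = []
    key = cmp_to_key(cmp)

    def enqueue(pkt):
        if len(buffer) < capacity:
            buffer.append(pkt)
            return True
        worst = min(buffer, key=key)       # least valuable buffered packet
        if key(pkt) > key(worst):
            buffer.remove(worst)           # evict it for the arrival
            buffer.append(pkt)
            return True
        return False

    return enqueue, buffer

# a "prefer higher numeric value" policy expressed as one comparator
enqueue, buf = make_policy(lambda a, b: a - b, capacity=2)
```

Swapping in a different comparator (e.g. on packet deadlines or prices) yields a different economic model without rewriting the buffer logic.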
Purpose – The paper addresses the question of what drives the formation of latent discussion communities, if any, in the blogosphere: the topical composition of posts or their authorship? The purpose of this paper is to contribute to the knowledge about the structure of co-commenting.
Design/methodology/approach – The research is based on a dataset of 17,386 full-text posts written by the top 2,000 LiveJournal bloggers and over 520,000 comments that result in about 4.5 million edges in the network of co-commenting, where posts are vertices. The Louvain algorithm is used to detect communities of co-commenting. Cosine similarity and topic modeling based on latent Dirichlet allocation are applied to study topical coherence within these communities.
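Topical coherence via cosine similarity can be sketched in a few lines; the topic vectors below are hypothetical per-post topic distributions, as would come out of an LDA model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two topic-weight vectors, as used to
    gauge how topically coherent the posts of one community are."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# two posts dominated by the same topic score near 1,
# posts on disjoint topics score 0
cosine([0.9, 0.1, 0.0], [0.8, 0.2, 0.0])
```

Averaging pairwise cosine similarity inside a detected community gives one simple coherence score to compare communities by.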
Findings – Bloggers unite into moderately manifest communities by commenting on roughly the same sets of posts. The graph of co-commenting is sparse and is connected by a minority of active non-top commenters. Communities are centered mainly around blog authors as opinion leaders and, to a lesser extent, around a shared topic or topics.
Research limitations/implications – The research has to be replicated on other datasets with more thorough hand coding to ensure the reliability of results and to reveal average proportions of topic-centered communities.
Practical implications – Knowledge about factors around which co-commenting communities emerge, in particular clustered opinion leaders that often attract such communities, can be used by policy makers in marketing and/or political campaigning when individual leadership is not enough or not applicable.
Originality/value – The research contributes to the social studies of online communities. It is the first study of communities based on co-commenting that combines examination of the content of commented posts and their topics.
We present a novel approach to analyze and visualize opinion polarisation on Twitter based on graph features of communication networks extracted from tweets. We show that opinion polarisation can be legibly observed on unimodal projections of artificially created bimodal networks, where the most popular users in retweet and mention networks are considered nodes of the second mode. For this purpose, we select a subset of top users based on their PageRank values and assign them to be the second mode in our networks, thus called pseudo-bimodal. After projecting them onto the set of “bottom” users and vice versa, we get unimodal networks with more distinct clusters and visually coherent community separation. We developed our approach on a dataset gathered during the Russian protest meetings on the 24th of December, 2011 and tested it on another dataset by Conover used to analyze political polarisation, showing that our approach not only works well on our data but also improves the results from previous research on that phenomenon.
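The pseudo-bimodal projection can be sketched as follows. For compactness this toy version selects top users by degree rather than PageRank, and links two bottom users whenever they interact with a common top user; all names are illustrative.

```python
from itertools import combinations

def project_pseudo_bimodal(edges, top_k):
    """Treat the top_k highest-degree users as an artificial second
    mode and project the network onto the remaining ('bottom') users:
    two bottom users become linked if they share a top-user neighbor.
    (Degree stands in for PageRank in this simplified sketch.)"""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    top = set(sorted(degree, key=degree.get, reverse=True)[:top_k])

    reach = {}  # top user -> set of bottom users interacting with them
    for u, v in edges:
        for t, b in ((u, v), (v, u)):
            if t in top and b not in top:
                reach.setdefault(t, set()).add(b)

    projected = set()
    for bottoms in reach.values():
        projected.update(frozenset(p) for p in combinations(sorted(bottoms), 2))
    return projected
```

Clusters of bottom users around rival top users then become directly visible in the projected unimodal graph.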
Efficient packet classification is a core concern for network services. Traditional multi-field classification approaches, in both software and ternary content-addressable memory (TCAMs), entail tradeoffs between (memory) space and (lookup) time. TCAMs cannot efficiently represent range rules, a common class of classification rules confining values of packet fields to given ranges. The exponential space growth of TCAM entries relative to the number of fields is exacerbated when multiple fields contain ranges. In this work, we present a novel approach which identifies properties of many classifiers which can be implemented in linear space and with worst-case guaranteed logarithmic time and allows the addition of more fields including range constraints without impacting space and time complexities. On real-life classifiers from Cisco Systems and additional classifiers from ClassBench (with real parameters), 90–95% of rules are thus handled, and the other 5–10% of rules can be stored in TCAM to be processed in parallel.
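For a single field with non-overlapping ranges, the linear-space/logarithmic-time flavor of such guarantees can be illustrated with a sorted array and binary search; this toy sketch is not the paper's multi-field construction.

```python
import bisect

def build_range_index(rules):
    """rules: list of (low, high, action) with non-overlapping ranges.
    Sorting by left endpoint gives a linear-space structure supporting
    O(log n) lookups (a single-field illustration only)."""
    rules = sorted(rules)
    lows = [r[0] for r in rules]
    return lows, rules

def classify(index, value):
    """Binary-search the rule whose range may contain `value`."""
    lows, rules = index
    i = bisect.bisect_right(lows, value) - 1
    if i >= 0 and rules[i][0] <= value <= rules[i][1]:
        return rules[i][2]
    return None

idx = build_range_index([(0, 9, "deny"), (10, 19, "allow"), (30, 40, "log")])
classify(idx, 15)  # matches the (10, 19, "allow") rule
```

A TCAM, by contrast, would have to expand each range into multiple prefix entries, which is the space blow-up the abstract refers to.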
In this paper, we describe the rules and results of the FactRuEval information extraction competition held in 2016 as part of the Dialogue Evaluation initiative in the run-up to Dialogue 2016. The systems were to extract information from Russian texts and competed in two named entity extraction tracks and one fact extraction track. The paper describes the tasks set before the participants and presents the scores achieved by the contending systems. Additionally, we dwell upon the scoring methods employed for evaluating the results of all the three tracks and provide some preliminary analysis of the state of the art in Information Extraction for Russian texts. We also provide a detailed description of the composition and general organization of the annotated corpus created for the competition by volunteers using the OpenCorpora.org platform. The corpus is publicly available and is expected to evolve in the future.
High-mass-resolution imaging mass spectrometry promises to localize hundreds of metabolites in tissues, cell cultures, and agar plates with cellular resolution, but it is hampered by the lack of bioinformatics tools for automated metabolite identification. We report pySM, a framework for false discovery rate (FDR)-controlled metabolite annotation at the level of the molecular sum formula, for high-mass-resolution imaging mass spectrometry (https://github.com/alexandrovteam/pySM). We introduce a metabolite-signal match score and a target–decoy FDR estimate for spatial metabolomics.
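The target–decoy idea behind the FDR estimate can be sketched in a few lines; this is a simplified illustration of the general principle, not pySM's exact estimator.

```python
def fdr_estimate(target_scores, decoy_scores, threshold):
    """Target-decoy FDR estimate: decoy annotations (implausible
    sum formulas) scoring above the threshold approximate the number
    of false target annotations, so FDR ~= decoys_above / targets_above.
    A simplified sketch of the principle used in pySM."""
    targets_above = sum(1 for s in target_scores if s >= threshold)
    decoys_above = sum(1 for s in decoy_scores if s >= threshold)
    return decoys_above / targets_above if targets_above else 0.0

# with equal numbers of targets and decoys, a threshold keeping
# 2 targets and 1 decoy yields an estimated FDR of 0.5
fdr_estimate([0.9, 0.8, 0.4], [0.85, 0.2, 0.1], threshold=0.5)
```

Sweeping the threshold and picking the highest one whose estimated FDR stays under a desired level (e.g. 0.1) gives FDR-controlled annotation.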
Current social structures can be described more effectively with reference to value orientations, consumer patterns and Internet use rather than classic demographics. This approach to social stratification results in the idea of social milieus more flexible than the picture provided by rigid class categorisations. Social milieus differ in many respects; we argue that they also differ in their media diets. In the 21st century, Russia is a fundamentally fragmented society with post-industrial, industrial, rural and migrant communities showing divergent relations to state social policies as well as varying patterns of public deliberation and consumption, including media use. Social fragmentation is, thus, mirrored in the fragmentation of the media systems; moreover, one more dimension, namely media hybridisation, intervenes and influences the formation of closed-up communicative milieus based on both social patterns and digital divide. Of the several societal milieus observed by social scientists in Russia, some are seriously under-represented in the media system; and deep differences in media consumption, agenda setting, and public deliberation exist between all of them. Recently, a major value-based societal cleavage was revealed during the 2011-2012 protest rallies within the "For fair elections/white-ribbon" movement. Our research into the media consumption patterns of the participants shows a correlation between media use patterns in the post-industrial urban "public counter-sphere" (consisting of the intelligentsia, the "creative class", students and other white-collar workers) and their perceived political freedom and self-reported online political behaviour. The research is expanded through searches for echo chambers and/or opinion crossroads in Russian Facebook vs. its Russian analogue Vkontakte.
Results of an online survey with participants of the protest rallies (N=652), 11 in-depth interviews and 5 expert interviews were used to interpret the relations between self-reported media consumption dynamics and perceived political behaviour. The results show that the media diet of protest participants indicates a strong preference for several media clusters, especially social media, oppositional, and alternative-agenda media, while the consumption of traditional media and video is either plummeting or irrelevant. Facebook is flagged up as an echo chamber facilitating the protests.
A new variant of the method of probability density distribution recovery for solving topic modeling problems is described. Disadvantages of the Gibbs sampling algorithm are considered, and a modified variant, called the “granulated sampling method,” is proposed. Based on the results of statistical modeling, it is shown that the proposed algorithm is characterized by higher stability as compared to other variants of Gibbs sampling.
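For readers unfamiliar with the baseline, plain Gibbs sampling alternately resamples each variable from its conditional distribution. The sketch below runs it on a toy two-variable binary joint; the "granulated" modification itself is not reproduced here.

```python
import random

def gibbs(joint, steps, seed=0):
    """Plain Gibbs sampling over two binary variables x, y:
    alternately resample x | y and y | x from the conditionals
    implied by the joint table. Returns empirical state counts."""
    rng = random.Random(seed)
    x = y = 0
    counts = {}
    for _ in range(steps):
        # resample x given y
        p0 = joint[(0, y)] / (joint[(0, y)] + joint[(1, y)])
        x = 0 if rng.random() < p0 else 1
        # resample y given x
        p0 = joint[(x, 0)] / (joint[(x, 0)] + joint[(x, 1)])
        y = 0 if rng.random() < p0 else 1
        counts[(x, y)] = counts.get((x, y), 0) + 1
    return counts

# a joint that favors the "agreeing" states (0,0) and (1,1)
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
```

The run-to-run variability of such chains is exactly the stability issue that modified samplers aim to reduce.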
The Internet routing ecosystem is facing substantial scalability challenges on the data plane. Various “clean slate” architectures for representing forwarding tables (FIBs), such as IPv6, introduce additional constraints on efficient implementations from both lookup time and memory footprint perspectives due to significant classification width. In this work, we propose an abstraction layer able to represent IPv6 FIBs on existing IP and even MPLS infrastructure. Feasibility of the proposed representations is confirmed by an extensive simulation study on real IPv6 forwarding tables, including low-level experimental performance evaluation.
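The lookup-time side of the problem can be illustrated with the standard binary-trie representation of a FIB, where longest-prefix matching walks one bit per level; IPv6's wider addresses simply make these prefixes, and hence the walk, longer. A minimal sketch (prefixes given as bit strings, names illustrative):

```python
def build_trie(fib):
    """fib: {bit-prefix string: next hop}. Build a binary trie;
    a node stores a 'hop' entry where a prefix ends."""
    root = {}
    for prefix, hop in fib.items():
        node = root
        for bit in prefix:
            node = node.setdefault(bit, {})
        node["hop"] = hop
    return root

def lookup(root, addr_bits):
    """Longest-prefix match: walk the address bits, remembering the
    last next hop seen along the way."""
    node, best = root, None
    for bit in addr_bits:
        if "hop" in node:
            best = node["hop"]
        if bit not in node:
            return best
        node = node[bit]
    return node.get("hop", best)

trie = build_trie({"0": "A", "01": "B", "0110": "C"})
lookup(trie, "0111")  # longest matching prefix is "01" -> hop B
```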
This book constitutes the thoroughly refereed proceedings of the 8th Russian Summer School on Information Retrieval, RuSSIR 2014, held in Nizhniy Novgorod, Russia, in August 2014.
The 14 papers presented were selected from various submissions. The papers focus on visualization for information retrieval along with other topics related to information retrieval.
We consider the fundamental problem of managing a bounded size queue buffer where traffic consists of packets of varying size, each packet requires several rounds of processing before it can be transmitted out, and the goal is to maximize the throughput, i.e., total size of successfully transmitted packets. Our work addresses the tension between two conflicting algorithmic approaches: favoring packets with fewer processing requirements as opposed to packets of larger size. We present a novel model for studying such systems and study the performance of online algorithms that aim to maximize throughput.
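The tension between the two approaches can be made concrete with a toy simulator: packets are (size, remaining passes) pairs, a bounded buffer admits arrivals FIFO, and a pluggable ordering decides which packet gets the next processing round. Everything here is illustrative, not the paper's model in full.

```python
def run_queue(arrivals, capacity, prefer):
    """Toy bounded queue of [size, passes] packets. Each time step:
    admit the arriving batch (drop-tail when full), order the buffer
    by `prefer`, give the head packet one processing round, and
    transmit it when its passes reach zero. Returns total transmitted
    size, i.e. the throughput objective."""
    buffer, sent = [], 0
    for batch in arrivals:
        for size, passes in batch:
            if len(buffer) < capacity:
                buffer.append([size, passes])
        buffer.sort(key=prefer)
        if buffer:
            buffer[0][1] -= 1            # one processing round
            if buffer[0][1] == 0:
                sent += buffer.pop(0)[0]
    while buffer:                        # drain after arrivals end
        buffer.sort(key=prefer)
        buffer[0][1] -= 1
        if buffer[0][1] == 0:
            sent += buffer.pop(0)[0]
    return sent

fewest_work_first = lambda p: p[1]       # favor cheap-to-process packets
largest_size_first = lambda p: -p[0]     # favor big packets
```

Under congestion, which packet survives admission determines throughput: a buffer holding a large multi-pass packet can starve a stream of small one-pass packets, and vice versa, which is exactly the tension the abstract describes.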
Social studies of the Internet have adopted large-scale text mining for unsupervised discovery of topics related to specific subjects. A recently developed approach to topic modeling, additive regularization of topic models (ARTM), provides fast inference and more control over the topics, with a wider variety of possible regularizers than developing LDA extensions. We apply ARTM to mining ethnic-related content from the Russian-language blogosphere, introduce a new combined regularizer, and compare models derived from ARTM with LDA. We show with human evaluations that ARTM is better for mining topics on specific subjects, finding more relevant topics of higher or comparable quality.