The International Symposium on Web Algorithms (iSWAG) is dedicated to all academic and industrial researchers working on algorithmic problems related to the web. The aim of iSWAG 2015 is to cover, as completely as possible, the field of research on algorithms for solving web-related problems.
Information produced and shared on social networks constitutes a valuable source for inferring trends and events in the real world. In this paper we show how this can be exploited concretely through quantitative analysis of social content. We present an analysis of the statistical correlation between the security prices of some IT companies and the performance polarity of the same companies as expressed in tweets. Companies' performance polarities are obtained by applying Conditional Random Fields to the considered streams of tweets. An evaluation of both the classification model and the performed regression analysis is also presented.
Research paper recommender systems (RSs) aim to alleviate information overload for researchers. Existing approaches based on collaborative filtering or hybrid methods typically allow only one rating criterion (overall liking) for users to evaluate papers. We conducted a qualitative focus-group study to explore the most important criteria for rating research papers, which can be used to control paper recommendation by enabling users to set the weight of each criterion. We also investigate the effect of using different rating criteria on the user interface design and how the user can control the weight of the criteria.
We consider the ubiquitous technique of VByte compression, which represents each integer as a variable-length sequence of bytes. The low 7 bits of each byte encode a portion of the integer, and the high bit of each byte is reserved as a continuation flag. This flag is set to 1 for all bytes except the last, and the decoding of each integer is complete when a byte with a high bit of 0 is encountered. VByte decoding can be a performance bottleneck, especially when the unpredictable lengths of the encoded integers cause frequent branch mispredictions. Previous attempts to accelerate VByte decoding using SIMD vector instructions have been disappointing, prodding search engines such as Google to use more complicated but faster-to-decode formats for performance-critical code. Our decoder (MASKED VBYTE) is 2 to 4 times faster than a conventional scalar VByte decoder, making the format once again competitive with regard to speed.
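The byte layout described in this abstract can be illustrated with a minimal scalar sketch in Python. It is not the MASKED VBYTE decoder and uses no SIMD; it only mirrors the conventional format. The function names `vbyte_encode` and `vbyte_decode` are hypothetical, and the choice to emit the least significant 7-bit group first is an assumption, since the abstract does not state the byte order.

```python
def vbyte_encode(values):
    """Encode non-negative integers as VByte (assumed: low 7-bit group first)."""
    out = bytearray()
    for v in values:
        if v < 0:
            raise ValueError("VByte encodes non-negative integers only")
        # Every byte except the last carries the continuation flag (high bit = 1).
        while v >= 0x80:
            out.append((v & 0x7F) | 0x80)
            v >>= 7
        # Last byte of the integer: high bit is 0.
        out.append(v)
    return bytes(out)


def vbyte_decode(data):
    """Decode a VByte byte string back into a list of integers."""
    values = []
    current = 0
    shift = 0
    for b in data:
        # The low 7 bits hold the next portion of the integer.
        current |= (b & 0x7F) << shift
        if b & 0x80:
            # Continuation flag set: more bytes follow for this integer.
            shift += 7
        else:
            # High bit of 0 marks the end of the integer.
            values.append(current)
            current = 0
            shift = 0
    if shift:
        raise ValueError("input ends in the middle of an integer")
    return values


if __name__ == "__main__":
    encoded = vbyte_encode([1, 128, 300, 2**32])
    print(encoded.hex(), vbyte_decode(encoded))
```

The data-dependent branch on `b & 0x80` inside the decoding loop is the kind of unpredictable branch that the abstract identifies as a source of mispredictions in scalar decoders.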
Perceiving the visual layout of a web page and analyzing its DOM structure are fundamental steps in the automatic adaptation of web pages. In this paper, we present a new hybrid web page segmentation algorithm dedicated to vibro-tactile access on touch-screen devices. This hybrid algorithm combines three approaches for adapting web page contents to be presented on touch-screen devices, and for testing a new navigation paradigm. The proposed algorithm is essential to our framework, which aims to enhance the ability of Visually Impaired Persons (VIP) to navigate the Web by converting web pages into vibrating pages using a graphical vibro-tactile language. We present a comparison between automatically segmented pages (obtained by our proposed algorithm) and manually segmented pages. The objectives of this comparison are, on the one hand, to learn how users understand web layout structures based on their visual perception and, on the other hand, to explore the main differences between automatic and manual segmentation.
Approximate autocompletion facilitates and speeds up text input by offering a list of suggestions that complement the few characters typed by the user. In this paper we present a trie-based method for efficient approximate autocompletion tolerating one edit error in a client-server architecture, and we discuss different strategies to improve the efficiency of autocompletion, depending on different scenarios. We also propose a method that reduces the number of outgoing transitions tested in each node of the trie, especially in the first levels. It uses a hash-based index to generate candidate characters to be tested at each branch. We present a library (named appacolib), in fact a set of libraries in different languages, that can be used on the server, in the client browser, or both to rapidly answer approximate requests on a UTF-8 dictionary.
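The idea of trie-based autocompletion tolerating one edit error can be sketched in Python as follows. This is a plain depth-first traversal with an error budget, not the paper's method: it omits the hash-based index for pruning outgoing transitions and does not reproduce the appacolib API. The names `Trie`, `insert`, and `complete` are hypothetical, and treating the typed string as a possibly erroneous prefix (one substitution, one missing character, or one extra character) is an assumption about the intended semantics.

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.is_word = False  # True if a dictionary word ends at this node


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _collect(self, node, prefix, out):
        # Gather every dictionary word stored below `node`.
        if node.is_word:
            out.add(prefix)
        for ch, child in node.children.items():
            self._collect(child, prefix + ch, out)

    def complete(self, query, max_errors=1):
        """Return words having a prefix within `max_errors` edits of `query`."""
        results = set()

        def walk(node, i, errors, prefix):
            if i == len(query):
                # The whole query is consumed: everything below completes it.
                self._collect(node, prefix, results)
                return
            ch = query[i]
            if ch in node.children:
                # Exact match on the current character.
                walk(node.children[ch], i + 1, errors, prefix + ch)
            if errors < max_errors:
                # The user typed an extra character: skip it.
                walk(node, i + 1, errors + 1, prefix)
                for other, child in node.children.items():
                    # The user omitted a character: advance in the trie only.
                    walk(child, i, errors + 1, prefix + other)
                    if other != ch:
                        # The user typed a wrong character: substitute it.
                        walk(child, i + 1, errors + 1, prefix + other)

        walk(self.root, 0, 0, "")
        return sorted(results)


if __name__ == "__main__":
    trie = Trie(["search", "seattle", "season", "server", "web"])
    print(trie.complete("serch"))
    print(trie.complete("wab"))
```

Without pruning, the loop over `node.children` tries every outgoing transition once an error is still allowed, which is the cost that the abstract's hash-based candidate generation is meant to reduce.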
Term weighting schemes are commonly used in the information retrieval field to extract the most relevant terms of documents. The main contribution of this paper consists in defining a new term weighting scheme based on entropy. We believe that this scheme is particularly well adapted to comparing queries from e-commerce sites. These queries have their own specificities. They tend to be short, and a large proportion of them are unique queries, i.e. they have no historical record. We claim that widely used weighting schemes, such as tf-idf, are not well adapted to this kind of query. This claim is backed up by numerical experiments in which the proposed entropy-based approach is incorporated into a collaborative filtering framework. In this framework, which is well suited to e-commerce search engines, we found, on real e-commerce purchase data, that the proposed weighting scheme outperforms the tf-idf weighting scheme.