iSWAG 2015

iSWAG Symposium

The International Symposium on Web Algorithms (iSWAG) is dedicated to all academic and industrial researchers working on algorithmic problems related to the web. The aim of iSWAG 2015 is to cover as completely as possible the field of research on algorithms for solving web-related problems.

Accepted full papers

Correlation between Stock Prices and polarity of companies’ performance in Tweets: a CRF-based Approach

Ekaterina Shabunina

Information produced and shared on social networks constitutes a valuable source for inferring trends and events in the real world. In this paper we show how this can be exploited concretely through quantitative analysis of social content. We present an analysis of the statistical correlation between the security prices of some IT companies and the performance polarity of the same companies as expressed in tweets. Companies’ performance polarities are obtained by applying Conditional Random Fields to the considered streams of tweets. An evaluation of both the classification model and the performed regression analysis is also presented.


Multi-dimensional Ratings for Research Paper Recommender Systems: A Qualitative Study

Shaikhah Al-Otaibi and Julita Vassileva

Research paper recommender systems (RSs) aim to alleviate information overload for researchers. Existing approaches, whether based on collaborative filtering or hybrid, typically allow only one rating criterion (overall liking) for users to evaluate papers. We conducted a qualitative focus group study to explore the most important criteria for rating research papers, criteria that can be used to control paper recommendation by enabling users to set the weight of each criterion. We also investigate the effect of using different rating criteria on the user interface design and how the user can control the weight of the criteria.


Vectorized VByte Decoding

Jeff Plaisance, Nathan Kurz and Daniel Lemire

We consider the ubiquitous technique of VByte compression, which represents each integer as a variable-length sequence of bytes. The low 7 bits of each byte encode a portion of the integer, and the high bit of each byte is reserved as a continuation flag. This flag is set to 1 for all bytes except the last, and the decoding of each integer is complete when a byte with a high bit of 0 is encountered. VByte decoding can be a performance bottleneck, especially when the unpredictable lengths of the encoded integers cause frequent branch mispredictions. Previous attempts to accelerate VByte decoding using SIMD vector instructions have been disappointing, prodding search engines such as Google to use more complicated but faster-to-decode formats for performance-critical code. Our decoder (MASKED VBYTE) is 2 to 4 times faster than a conventional scalar VByte decoder, making the format once again competitive with regard to speed.
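The byte layout described in this abstract can be illustrated with a minimal scalar sketch in Python. This is not the MASKED VBYTE decoder itself, which relies on SIMD instructions; it is only the conventional scalar format that the paper takes as its baseline. The function names are hypothetical, and the sketch assumes that the least-significant 7-bit group is stored first, a detail the abstract does not state.

```python
def vbyte_encode(values):
    """Encode non-negative integers as VByte (assumed order: low 7-bit group first)."""
    out = bytearray()
    for v in values:
        if v < 0:
            raise ValueError("VByte encodes non-negative integers only")
        while v >= 0x80:
            # Low 7 bits of the value, high bit set to 1: more bytes follow.
            out.append((v & 0x7F) | 0x80)
            v >>= 7
        # Last byte of this integer: high bit is 0.
        out.append(v)
    return bytes(out)


def vbyte_decode(data):
    """Decode a VByte byte string back into a list of integers."""
    values = []
    current = 0
    shift = 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            # Continuation flag set: the next byte holds the next 7 bits.
            shift += 7
        else:
            # High bit is 0: this integer is complete.
            values.append(current)
            current = 0
            shift = 0
    if shift != 0:
        raise ValueError("truncated VByte stream")
    return values


if __name__ == "__main__":
    encoded = vbyte_encode([0, 127, 128, 300])
    print(encoded.hex())          # 007f8001ac02
    print(vbyte_decode(encoded))  # [0, 127, 128, 300]
```

The branch on the continuation flag inside the decoding loop is the data-dependent test that the abstract identifies as a source of branch mispredictions.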


Supervised Segmentation of Web Pages for Vibro-Tactile Access on Touch-Screen Devices

Waseem Safi, Fabrice Maurel, Jean-Marc Routoure, Pierre Beust and Gaël Dias

Perceiving the visual layout of a web page and analyzing its DOM structure is a fundamental step in the automatic adaptation of web pages. In this paper, we present a new hybrid web page segmentation algorithm dedicated to vibro-tactile access on touch-screen devices. This hybrid algorithm combines three approaches for adapting web page contents to be presented on touch-screen devices and for testing new navigation paradigms. The proposed algorithm is an essential part of our framework, which aims to enhance the ability of Visually Impaired Persons (VIP) to navigate the Web by converting web pages into vibrating pages using a graphical vibro-tactile language. We present a comparison between automatically segmented pages (obtained by our proposed algorithm) and manually segmented pages. The objectives of this comparison are, on the one hand, to learn how users understand web layout structures based on their visual perception and, on the other hand, to explore the main differences between automatic and manual segmentation.


JQuery UI like approximate autocomplete

Ibrahim Chegrane, Djamal Belazzougui and Mathieu Raffinot

Approximate autocomplete facilitates and speeds up text input by offering a list of suggestions that complete the few characters typed by the user. In this paper we present a trie-based method for efficient approximate autocomplete that tolerates one edit error in a client-server architecture, and we discuss different strategies for improving the efficiency of autocompletion depending on the scenario. We also propose a method that reduces the number of outgoing transitions tested at each node of the trie, especially in the first levels. It uses a hash-based index to generate the candidate characters to be tested at each branch. We present a library (named appacolib), in fact a set of libraries in different languages, that can be used on the server, in the client browser, or both, to rapidly answer approximate requests on a UTF-8 dictionary.
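The general idea of approximate autocomplete over a trie with one edit error can be sketched as follows. This is a plain illustrative traversal, not the appacolib implementation: it tries every outgoing transition at a node once an error is allowed, and it does not include the hash-based index that the paper proposes to prune those transitions. All class and function names are hypothetical.

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}
        self.is_word = False


def build_trie(words):
    """Insert every dictionary word into a character trie."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def _collect(node, prefix, out):
    """Add every dictionary word stored below `node` to `out`."""
    if node.is_word:
        out.add(prefix)
    for ch, child in node.children.items():
        _collect(child, prefix + ch, out)


def _search(node, prefix, query, i, errors_left, out):
    if i == len(query):
        # The whole query is consumed: everything below is a completion.
        _collect(node, prefix, out)
        return
    ch = query[i]
    child = node.children.get(ch)
    if child is not None:
        # Exact match on this character, no error spent.
        _search(child, prefix + ch, query, i + 1, errors_left, out)
    if errors_left == 0:
        return
    # The user typed an extra character: skip it in the query.
    _search(node, prefix, query, i + 1, errors_left - 1, out)
    for other, nxt in node.children.items():
        # The user omitted a character: follow the trie without advancing.
        _search(nxt, prefix + other, query, i, errors_left - 1, out)
        if other != ch:
            # The user typed a wrong character: substitute it.
            _search(nxt, prefix + other, query, i + 1, errors_left - 1, out)


def approx_autocomplete(root, query, max_errors=1):
    """Return words having a prefix within `max_errors` edits of `query`."""
    out = set()
    _search(root, "", query, 0, max_errors, out)
    return sorted(out)


if __name__ == "__main__":
    trie = build_trie(["hello", "help", "helmet", "hero", "world"])
    print(approx_autocomplete(trie, "hal"))   # ['hello', 'helmet', 'help']
    print(approx_autocomplete(trie, "wrld"))  # ['world']
```

The loop over `node.children` is where every outgoing transition is tested in this sketch; it is the step that the abstract's hash-based candidate generation aims to make cheaper.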


An entropy-based term weighting scheme and its application in e-commerce search engines

Yang Jiao, Jérémie Jakubowicz and Matthieu Cornec

Term weighting schemes are commonly used in the information retrieval field to extract the most relevant terms of documents. The main contribution of this paper is the definition of a new term weighting scheme based on entropy. We believe that this scheme is particularly well adapted to comparing queries from e-commerce sites. These queries have their own specificities: they tend to be short, and a large proportion of them are unique queries, i.e. they have no historical record. We claim that widely used weighting schemes, such as tf-idf, are not well adapted to this kind of query. This claim is backed up by numerical experiments in which the proposed entropy-based approach is incorporated into a collaborative filtering framework. In this framework, which is well suited to e-commerce search engines, we found on real e-commerce purchase data that the proposed weighting scheme outperforms the tf-idf weighting scheme.