This shows you the differences between two versions of the page.

l3sintern:research_seminar_10 [2010/10/27 14:09]
l3sintern:research_seminar_10 [2011/01/13 12:29] (current)
Line 436: Line 436:
 Speakers: Katerina Speakers: Katerina
-**Topic(s)** +**Data Cleaning ​for data integration (45 mins)**
- +
-Entity Resolution ​for data integration (45 mins): Overview of existing approaches in entity resolution+
 +Overview of the literature relevant to Entity Resolution, i.e., the task of identifying and merging data that refer to or describe the same real-world object, such as a location, a person, or a conference. Existing approaches are presented and discussed, grouped into four categories: atomic similarity methods for comparing strings, similarity methods for sets of strings, methods that exploit inner relationships, and methods related to uncertain data management.
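The first two categories named in the abstract can be illustrated with a minimal sketch. The abstract does not say which measures the talk covers, so the choice of Levenshtein distance (as an atomic string similarity) and Jaccard overlap (as a similarity for sets of strings) is an assumption, and all function names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def string_similarity(a, b):
    """Atomic similarity in [0, 1]: 1 minus the length-normalised edit distance."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def jaccard(tokens_a, tokens_b):
    """Set similarity in [0, 1]: shared tokens divided by all distinct tokens."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


# Two spellings that plausibly describe the same real-world object.
name_a = "l3s research center"
name_b = "l3s research centre"
char_level = string_similarity(name_a, name_b)
token_level = jaccard(name_a.split(), name_b.split())
```

The two measures can disagree on the same pair: a small spelling variation barely changes the character-level score but removes a whole token from the overlap, which is one reason the two families are usually discussed separately.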
 ===== Nov 12 ==== ===== Nov 12 ====
Line 447: Line 446:
 Speakers: Ivana Marenzi ​ Speakers: Ivana Marenzi ​
-**Topic(s)**+**Topic(s)** ​   ​Collaborative Web - Google Wave
-Google Wave: introductory tutorial ​+Overview and discussion about the collaboration platform ​Google Wave( 
 +Since I'm preparing a lecture on this topic for the WebScience course in two weeks, the idea is to involve the participants in a preliminary discussion on current communication and collaboration tools, describe the most relevant functionalities of Google Wave, and give possible examples.  
 +The final goal of the WebScience course lecture will be to collect students' ideas about new scenarios in which Google Wave could be useful.
 ===== Nov 19 ==== ===== Nov 19 ====
 +**organized by:** Ernesto Diaz-Aviles
 +Speaker: Zeno Gantner
 +**Learning Attribute-to-Feature Mappings for Cold-Start Recommendations**
 +Cold-start scenarios in recommender systems are
 +situations in which no prior events, like ratings or clicks, are
 +known for certain users or items. To compute predictions in
 +such cases, additional information about users (user attributes,
 +e.g. gender, age, geographical location, occupation) and items
 +(item attributes, e.g. genres, product categories, keywords) must
 +be used.
 +We describe a method that maps such entity (e.g. user or item)
 +attributes to the latent features of a matrix (or higher-
 +dimensional) factorization model. With such mappings, the
 +factors of an MF model trained by standard techniques can
 +be applied to the new-user and the new-item problem, while
 +retaining its advantages, in particular speed and predictive
 +accuracy.
 +We use the mapping concept to construct an attribute-
 +aware matrix factorization model for item recommendation
 +from implicit, positive-only feedback. Experiments on the new-
 +item problem show that this approach provides good predictive
 +accuracy, while the prediction time only grows by a constant
 +factor.
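The mapping idea in this abstract can be sketched in a few lines. The abstract does not state which form the mapping takes or how it is trained, so the linear mapping fitted by least squares (via plain stochastic gradient descent) below is an assumption chosen for brevity, and every name and the toy data are hypothetical.

```python
def dot(u, v):
    """Inner product of two equally long vectors."""
    return sum(x * y for x, y in zip(u, v))


def fit_attribute_mapping(attributes, factors, epochs=2000, learning_rate=0.1):
    """Fit one weight row per latent factor so that weights[f] . attributes ~ factor f.

    Assumption: a linear mapping trained on squared error; the talk's actual
    mapping functions and training objective may differ.
    """
    num_attributes = len(attributes[0])
    num_factors = len(factors[0])
    weights = [[0.0] * num_attributes for _ in range(num_factors)]
    for _ in range(epochs):
        for attribute_vector, factor_vector in zip(attributes, factors):
            for f in range(num_factors):
                error = dot(weights[f], attribute_vector) - factor_vector[f]
                for a in range(num_attributes):
                    weights[f][a] -= learning_rate * error * attribute_vector[a]
    return weights


def map_to_factors(weights, attribute_vector):
    """Estimate latent factors for an entity that has attributes but no events."""
    return [dot(row, attribute_vector) for row in weights]


# Toy data: item factors as they might come out of an already trained MF model.
known_item_attributes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
known_item_factors = [[1.0, 0.0], [0.0, 2.0], [1.0, 2.0]]
mapping = fit_attribute_mapping(known_item_attributes, known_item_factors)

# A new item with attributes only: map to factors, then score as in plain MF.
new_item_factors = map_to_factors(mapping, [1.0, 0.0])
user_factors = [0.5, 0.5]
score = dot(user_factors, new_item_factors)
```

Prediction for the new item is one mapping step followed by the usual inner product, which fits the abstract's remark that prediction time grows only by a constant factor.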
 +//​MyMediaLite:​ a recommender system algorithm library//
 +MyMediaLite is a lightweight,​ multi-purpose library of recommender
 +system algorithms. It addresses the two most common scenarios in
 +collaborative filtering: rating prediction (e.g. on a scale of 1 to 5
 +stars) and item prediction from implicit feedback (e.g. from clicks or
 +purchase actions).
 +The library is open source/free software, distributed under the terms
 +of the GNU General Public License.
 +===== Nov 26 ====
 **organized by:** Eelco Herder **organized by:** Eelco Herder
Line 457: Line 503:
 Speakers: Ricardo Kawase, George Papadakis Speakers: Ricardo Kawase, George Papadakis
-**Topic(s)**+**The Art of Multi-faceted Tagging ​(Ricardo)** 
 +TagMe! is a social tagging front-end for Flickr images that provides multifaceted tagging functionality: it enables users to attach tag assignments to a specific area within an image and to categorize tag assignments. Moreover, TagMe! maps tags and categories to DBpedia URIs to clearly define the meaning of freely chosen words. The experiments reveal the benefits of these additional tagging facets. For example, exploiting the facets significantly improves the performance of FolkRank-based search. 
 +Further, we demonstrate the benefits of TagMe! tagging facets for learning semantics within folksonomies. 
 +**Incorporating Context Into Real-Time Prediction of Revisitation (George)** 
 +Users frequently return to Web pages they have visited in the past for various reasons. Apart from backtracking, they revisit a number of favorite or important pages that they monitor, as well as pages that pertain to tasks recurring on an infrequent basis. In this paper, we introduce a collection of methods that facilitate revisitation by predicting the next page request based on the contextual information they incorporate. Unlike existing approaches, our methods work in real time, since they do not require any training or configuration of machine learning algorithms. We evaluate them over 
 +a large, real-world dataset, and the outcomes suggest a significant improvement over established prediction methods that do not take context into account.  
 +===== **Monday**, Nov 29, **17:00** ==== 
 +**organized by:** Gideon Zenz 
 +Speaker: Daniel Wichert 
 +**User interface for interactive and iterative search in structured data** 
 +Nowadays more and more information is stored in huge databases or other structured data formats like ontologies in OWL. To get the right information from a database, it is necessary to know a specific query language like SQL. Most users are not familiar with these languages and the systems behind them. 
 +The QUICK system developed at L3S starts a search process with a common Google-like search query and then finds the right information in an iterative way. However, the current version of QUICK has only a simple and rather inflexible user interface. 
 +The motivation of my master's thesis was to develop a new, improved user interface for the successor system, which supports different user search strategies through the most expedient design of the user interface components. 
 +To this end, I evaluated the components with a framework from Max Wilson. This evaluation framework reviews the support of user tactics and search strategies by counting the steps that are necessary to reach the user's aim. 
 +Based on this framework I optimised the search interface and compared different approaches. The result is one approach that is similar to faceted browsing and a second one that uses a 2D graph representation. In my talk I will present these results and demonstrate a prototype that shows both solutions with the possibility to switch between different representations of the current search iteration. 
 +===== **Wednesday**,​ Dec 1, **16:00** ==== 
 +Speaker: ​ Julia Preusse 
 +**Analysis of the WebUni Online Student Community** 
 +Nowadays, Online Social Networks present a huge opportunity to gather 
 +various information about topics such as communication patterns, 
 +structure of social networks and flow of information. Despite the 
 +popularity of large-scale online social networks, smaller local 
 +platforms such as WebUni Magdeburg maintain their attractiveness. We 
 +believe that this is the first study to examine the complete data of a 
 +smaller social network that exists for longer than seven years. 
 +In our study, we prove that WebUni is a scale-free small-world network 
 +based on analysis of the social network graph and the guestbook 
 +network. Surprisingly, we find that the rating network, which is 
 +based on users’ hidden ratings, is also a scale-free small-world 
 +network, even though, to the best of our knowledge, state-of-the-art 
 +theories cannot explain this fact. 
 +The WebUni database contains quantitative information on private user 
 +interactions as well as on public ones. We use these data to compute 
 +the ratio of public and private communication for outgoing and 
 +incoming interactions of a user. We observe that users tend to have a 
 +similar ratio for outgoing and incoming interactions,​ although the 
 +outgoing communication is slightly more private. Comparing the overall 
 +public and private interactions of a user, we notice that active users 
 +have a balanced ratio of public and private interaction or are more 
 +biased towards private interactions. Network newbies and inactive 
 +users, on the other hand, are biased towards either completely public or 
 +solely private interactions. 
 +To overcome friendship inflation, we make use of a well-known concept 
 +of Granovetter from sociology: the strength of ties. It enables us to 
 +measure the strength of different ties ranging from friendship, mutual 
 +rating and mutual guestbook writing to combinations of each of 
 +them. We discover that friendship is a solid foundation of a strong 
 +tie, but not sufficient. ​ Friendship paired with mutual rating and 
 +guestbook writing improves the strength of ties, whereas constraints 
 +such as mutual positive rating unexpectedly do not. We are finally 
 +able to verify that the strength of ties theory holds for WebUni. A 
 +combination of a minimum number of reciprocal guestbook postings, 
 +friendship and a minimum number of reciprocal ratings returns up to 46 
 +strong ties that satisfy Granovetter’s definition. 
-Analyzing and predicting recurrent behavior on the Web.  
 ===== Dec 17 ==== ===== Dec 17 ====
Line 471: Line 590:
-"​Evaluation of Search User Interface"+Presentation (30 min) about "​Evaluation of Search User Interfaces" ​based on the chapter 
 +The Evaluation of Search User Interfaces 
 +"What should be measured when assessing a search interface? Traditional information retrieval research focuses on evaluating the proportion of relevant documents retrieved in response to a query. In evaluating search user interfaces, this kind of measure can also be used, but is just one component within broader usability measures. Usable interfaces are defined in terms of learnability,​ efficiency, memorability,​ error reduction, and user satisfaction (Nielsen, 2003b, Shneiderman and Plaisant, 2004). However, search interfaces are usually evaluated in terms of three main aspects of usability: effectiveness,​ efficiency, and satisfaction. 
 +This presentation summarizes some major methods for evaluating user interfaces, followed by a set of guidelines about special considerations to ensure successful search usability studies and avoid common pitfalls. The presentation concludes with general recommendations for search interface evaluation."​ 
 ===== Jan 7 ==== ===== Jan 7 ====
Line 477: Line 604:
 **organized by: **  **organized by: ** 
-Speakers: ​+Speakers: ​Elena
 **Topic(s)** **Topic(s)**
 +DivQ: Diversification for Keyword Search over Structured Databases
 +Keyword queries over structured databases are notoriously
 +ambiguous. No single interpretation of a keyword query can
 +satisfy all users, and multiple interpretations may yield
 +overlapping results. This paper proposes a scheme to balance the
 +relevance and novelty of keyword search results over structured
 +databases. Firstly, we present a probabilistic model which
 +effectively ranks the possible interpretations of a keyword query
 +over structured data. Then, we introduce a scheme to diversify the
 +search results by re-ranking query interpretations,​ taking into
 +account redundancy of query results. Finally, we propose α-
 +nDCG-W and WS-recall, an adaptation of α-nDCG and S-recall
 +metrics, taking into account graded relevance of subtopics. Our
 +evaluation on two real-world datasets demonstrates that search
 +results obtained using the proposed diversification algorithms
 +better characterize possible answers available in the database than
 +the results of the initial relevance ranking.
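The novelty-aware scoring behind α-nDCG can be sketched as follows. This shows only the standard, unnormalised α-DCG with binary subtopic judgements. The graded-relevance weighting of the proposed α-nDCG-W and the normalisation by an ideal ranking are not specified in the abstract and are left out. The function name and the toy rankings are hypothetical.

```python
import math


def alpha_dcg(ranking, alpha=0.5):
    """Unnormalised alpha-DCG of a ranking.

    ranking: list of sets, one per ranked result, holding the subtopics
             that result covers (binary judgements, an assumption here).
    alpha:   penalty for repetition; each earlier occurrence of a subtopic
             multiplies its gain by (1 - alpha).
    """
    times_seen = {}
    total = 0.0
    for rank, subtopics in enumerate(ranking, start=1):
        gain = sum((1.0 - alpha) ** times_seen.get(s, 0) for s in subtopics)
        total += gain / math.log2(rank + 1)  # usual logarithmic rank discount
        for s in subtopics:
            times_seen[s] = times_seen.get(s, 0) + 1
    return total


# Same three results in two orders: one repeats subtopic "a" first,
# the other brings the novel subtopic "b" forward.
redundant_first = [{"a"}, {"a"}, {"b"}]
diverse_first = [{"a"}, {"b"}, {"a"}]
redundant_score = alpha_dcg(redundant_first)
diverse_score = alpha_dcg(diverse_first)
```

With any α above 0, the ranking that moves the novel subtopic up scores higher, which is the balance between relevance and novelty that the abstract describes.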
Line 486: Line 631:
 **organized by: ** Dimitris ​ **organized by: ** Dimitris ​
-Speakers: Julien, Marco+Speakers: ​Dimitris, ​Julien, Marco
-**Topic(s)**+==== Topic(s)==== 
 +=== Efficient Discovery of Frequent Subgraph Patterns in Uncertain Graph Databases (Dimitris) === 
 +Mining frequent subgraph patterns in graph databases is a challenging and important problem with applications in several domains. Recently, there has been growing interest in generalizing the problem to uncertain graphs, which can model the inherent uncertainty in the data of many applications. The main difficulty in solving this problem results from the large number of candidate subgraph patterns to be examined and the large number of subgraph isomorphism tests required to find the graphs that contain a given pattern. The latter becomes even more challenging when dealing with uncertain graphs. In this paper, we propose a method that uses an index of the uncertain graph database to reduce the number of comparisons needed to find frequent subgraph patterns. The proposed algorithm relies on the apriori property for enumerating candidate subgraph patterns efficiently. Then, the index is used to reduce the number of comparisons required for computing the expected support of each candidate pattern. It also enables additional optimizations with respect to scheduling and early termination that further increase the efficiency of the method. The evaluation of our approach on three real-world datasets as well as on synthetic uncertain graph databases demonstrates significant cost savings with respect to the state-of-the-art approach. 
 +=== Time-Aware Entity-Based Multi-Document Summarisation (Julien) === 
 +Automatic news multi-document summarisation has received increased attention lately to 
 +cope with the increasing amount of news articles and sources. Summarisation of  
 +news articles has the additional challenge that documents (news articles) are timestamped 
 +and often relate events that are themselves situated in time. 
 +We propose three contributions which we believe will help improve summarisation quality: 
 +  - Considering named entities in news articles 
 +  - Considering time for summarisation and for summary layout 
 +  - Considering time references in the text in addition to article timestamps 
 +For this we augment a state-of-the-art summarisation technique with named entities and 
 +time references, and adapt a state-of-the-art news event detection technique to cluster sentences 
 +to improve summarisation of news articles. 
 +This work is in progress, and I will present the general approach and ideas, as well as 
 +the current status of the work. 
 +=== Detecting Health Events on the Social Web to Enable Epidemic Intelligence (Marco) ===  
 +Content analysis and clustering of natural language documents is becoming  
 +crucial in various domains, including public health. Recent pandemics such as Swine  
 +Flu have caused concern for public health officials.  
 +Given the ever increasing pace at which infectious diseases can spread globally,  
 +officials must be prepared to react sooner and with greater epidemic 
 +intelligence gathering capabilities. There is a need to allow 
 +for information gathering from a broader range of sources, 
 +including the Web, which in turn requires more robust processing 
 +capabilities. To address this limitation, in this paper, 
 +we propose a new approach to detect public health events 
 +in an unsupervised manner. We address the problems associated 
 +with adapting an unsupervised learner to the medical 
 +domain and in doing so, propose an approach which 
 +combines aspects from different feature-based event detection 
 +methods. We evaluate our approach with a real-world 
 +dataset with respect to the quality of article clusters. Our 
 +results show that we are able to achieve a precision of 62% 
 +and a recall of 75% evaluated using manually annotated,​ 
 +real-world data.
 ===== Jan 21 ==== ===== Jan 21 ====
-**organized by:** Kerstin D.+**organized by:** 
-Speakers: ​Avaré, Ernesto, Marco+Speakers: ​
 **Topic(s)** **Topic(s)**
Line 508: Line 698:
 **Topic(s)** **Topic(s)**
 +===== tbd. ====
 +**organized by:** Kerstin D.
 +Speakers: Avaré, Ernesto
l3sintern/research_seminar_10.1288188596.txt.gz · Last modified: 2010/10/27 14:09 by marenzi