Eco System: Blog-o-Sphere Browsing

Scientific Background:

Media sharing and social networking websites have attracted many millions of users, resulting in vast collections of user-generated content. The content typically has a hidden structure and is spread over several platforms, each supporting specific media types and covering latent, interrelated topics. This fragmentation has resulted in a Social Networking Divide.

Our goal is to close this gap and support a more Open Social Networking environment, in which bloggers can more easily cope with the cognitive challenges of finding relevant information efficiently and analyzing it effectively when inundated by its volume, variety and evolution.

3-Stage Pipeline Paradigm:

Phase I: Preprocessing Stage – existing, raw blog data will be converted into a format suitable for analysis. XML parsing and database technology will be needed in this phase.

Phase II: Analysis Stage – preprocessed data will be used as input into existing tools. Data Mining technology will be needed in this phase.

Phase III: Interaction Stage – the mined knowledge will be presented visually. Information Visualization will be needed in this phase.

An important aspect of this project is the selection of a software paradigm to support the seamless interaction between the stages in the pipeline. See Figure: Software Paradigm Selection.
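One way to picture the interaction between the three stages is a small typed interface that each stage implements, so that stages can be chained or swapped independently. The following is only a minimal sketch under stated assumptions: the names Stage, PREPROCESS, ANALYZE and PRESENT are hypothetical, word counting stands in for a real data mining tool, and a text bar chart stands in for a real visualization.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class PipelineSketch {

    /** Hypothetical contract for one stage: consumes an I, produces an O. */
    interface Stage<I, O> {
        O process(I input);

        /** Chains this stage with the next one; the types must line up. */
        default <R> Stage<I, R> then(Stage<O, R> next) {
            return input -> next.process(this.process(input));
        }
    }

    // Phase I stand-in: turn raw text into lowercase word tokens.
    static final Stage<String, List<String>> PREPROCESS = raw ->
            Arrays.stream(raw.toLowerCase().split("\\W+"))
                  .filter(word -> !word.isEmpty())
                  .collect(Collectors.toList());

    // Phase II stand-in: count word frequencies (a placeholder for data mining).
    static final Stage<List<String>, Map<String, Integer>> ANALYZE = words -> {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    };

    // Phase III stand-in: render counts as a text bar chart.
    static final Stage<Map<String, Integer>, String> PRESENT = counts -> {
        StringBuilder chart = new StringBuilder();
        counts.forEach((word, n) ->
                chart.append(word).append(' ').append("*".repeat(n)).append('\n'));
        return chart.toString();
    };

    public static void main(String[] args) {
        Stage<String, String> pipeline = PREPROCESS.then(ANALYZE).then(PRESENT);
        // Prints three lines: "blogs **", "link *", "to *"
        System.out.print(pipeline.process("Blogs link to blogs"));
    }
}
```

Because each stage depends only on its input and output types, any one of them could be replaced (for example, by a wrapper around an existing mining tool) without touching the other two.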

The Big Picture:

Some Requirements:

  • A software design is required to support each phase in a flexible and extensible way. The goal is to support a framework in which different tools and/or interaction paradigms can be used.
  • The design should allow new capabilities to be added to the software pipeline without major changes to the underlying architecture. Subcomponents should be well defined, and independent.
  • The components must be implemented and tested in isolation before the entire pipeline is integrated. It is recommended that the implementation of the pipeline components be divided among team members.
  • A generic interface is required so that data can be exported into a variety of formats for analysis. In general, modules in the pipeline should be easy to extend or replace.
  • A central repository, using a MySQL database, is required where data is collected.
  • Documentation: log4j, etc.
  • Testing: JUnit
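
The requirement for a generic export interface with replaceable modules could be approached roughly as follows. This is a sketch, not a prescribed design: the Exporter interface, the DelimitedExporter class and the EXPORTERS registry are hypothetical names, and in-memory rows stand in for records that would in practice be read from the MySQL repository.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ExportSketch {

    /** Hypothetical contract: turn uniform rows into one output format. */
    interface Exporter {
        String export(List<Map<String, String>> rows);
    }

    /** Delimiter-separated output; quoting and escaping are omitted for brevity. */
    static class DelimitedExporter implements Exporter {
        private final String delimiter;

        DelimitedExporter(String delimiter) {
            this.delimiter = delimiter;
        }

        @Override
        public String export(List<Map<String, String>> rows) {
            if (rows.isEmpty()) {
                return "";
            }
            StringBuilder out = new StringBuilder();
            // The header comes from the first row's keys; rows are assumed uniform.
            out.append(String.join(delimiter, rows.get(0).keySet())).append('\n');
            for (Map<String, String> row : rows) {
                out.append(String.join(delimiter, row.values())).append('\n');
            }
            return out.toString();
        }
    }

    // Registry keyed by format name; a new format is added here, not in callers.
    static final Map<String, Exporter> EXPORTERS = new LinkedHashMap<>();
    static {
        EXPORTERS.put("csv", new DelimitedExporter(","));
        EXPORTERS.put("tsv", new DelimitedExporter("\t"));
    }

    public static void main(String[] args) {
        Map<String, String> post = new LinkedHashMap<>();
        post.put("author", "alice");
        post.put("title", "Hello");
        // Prints two lines: "author,title" and "alice,Hello"
        System.out.print(EXPORTERS.get("csv").export(List.of(post)));
    }
}
```

Since callers depend only on the Exporter interface, each implementation can be unit tested in isolation (for instance with JUnit) before the pipeline is integrated.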
