Vol.16 No.5&6 September 1, 2017
Engineering the Web in the Big Data Era
Editorial
(pp361-362)
Philipp Cimiano, Flavius Frasincar, and Daniel Schwabe
Relaxation of Keyword Pattern Graphs on RDF Data
(pp363-398)
Ananya Dass, Cem Aksoy, Aggeliki Dimitriou, and
Dimitri Theodoratos
One of the facets of the data explosion
in recent years is the growth of repositories of RDF data on the Web.
Keyword search is a popular technique for querying repositories of RDF
graph data. Recently, a number of approaches have leveraged a structural
summary of the graph data to address the typical keyword search
problems of (a) identifying relevant results among a multitude of
candidates and (b) performance scalability.
These approaches compute queries (pattern graphs) corresponding to
alternative interpretations of the keyword query and the user selects
one that matches her intention to be evaluated against the data. Though
promising, these approaches suffer from a drawback: because summaries
are approximate representations of the data, they might return empty
answers or miss results which are relevant to the user intent. In this
paper, we present a novel approach which combines the use of the
structural summary and the user feedback with a relaxation technique for
pattern graphs. We leverage pattern graph homomorphisms
to define relaxed pattern graphs that are able to extract more results
potentially of interest to the user. We introduce an operation on
pattern graphs and we prove that it is complete, that is, it can produce
all relaxed pattern graphs. To guarantee that the result pattern graphs
are as close to the initial pattern graph as possible, we devise
different metrics to measure the degree of relaxation of a pattern
graph. We design an algorithm that computes relaxed pattern graphs with
non-empty answers in relaxation order. To improve the successive
computation of relaxed pattern graphs, we suggest subquery caching and
multiquery optimization techniques adapted to the context of this
computation. Finally, we run experiments on different real datasets
which demonstrate the effectiveness of our ranking of relaxed pattern
graphs, and the efficiency of our system and optimization techniques in
computing relaxed pattern graphs and their answers.
Getting the Query Right for Crisis Informatics: Design Issues for
Web-Based Analysis Environments
(pp399-432)
Mario Barrenechea, Sahar Jambi, Ahmet A. Aydin,
Mazin Hakeem, and Ken M. Anderson
Web-based data analysis environments
are powerful platforms for exploring large data sets. To ensure that
these environments meet the needs of analysts, a human-centered design
perspective is needed. Interfaces to these platforms should provide
flexible search, support user-generated content, and enable
collaboration. We report on our efforts to design and develop a web
interface for a custom analytics platform, EPIC Analyze, which
provides interactive search over large Twitter data sets collected
during crisis events. We performed seven think-aloud sessions with
researchers who regularly analyze crisis data sets and compiled their
feedback. They identified a need for a "big picture" view of an event,
flexible querying capabilities, and user-defined coding schemes. Adding
these features allowed EPIC Analyze to meet the needs of these analysts
and enable exploratory research on crisis data. In performing this work,
we identified an opportunity to migrate the software architecture of
EPIC Analyze to one based on microservices. We report on the lessons
learned in performing this migration and the impact it had on EPIC
Analyze's capabilities. We also reflect on the benefits a microservices
approach can have on the design of data-intensive software systems like
EPIC Analyze.
Architecting Liquid Software
(pp433-470)
Andrea Gallidabino, Cesare Pautasso, Tommi Mikkonen,
Kari Systä, Jari-Pekka Voutilainen, and Antero Taivalsaari
The Liquid Software metaphor refers to
software that can operate seamlessly across multiple devices owned by
one or multiple users. Liquid software applications can take advantage
of the computing, storage and communication resources available on all
the devices owned by the user. Liquid software applications can
also dynamically migrate from one device to another, following the
user's attention and usage context. The key design goal in Liquid
Software development is to minimize the additional efforts arising from
multiple device ownership (e.g., installation, synchronization and
general maintenance of personal computers, smartphones,
tablets, home and car displays, and wearable devices), while keeping the
users in full control of their devices, applications and data. In this
paper we present the design space for Liquid Software, categorizing and
discussing the most important architectural dimensions and technical
choices. We also provide an introduction and comparison of two
frameworks implementing Liquid Software capabilities in the context of
the World Wide Web.
A Semantic Framework for Sequential
Decision Making
(pp471-504)
Patrick Philipp, Maria Maleshkova,
Achim Rettinger, and Darko Katic
Current developments in the medical
domain, not unlike many other sectors, are marked by the growing
digitalization of data, including patient records, study results, clinical guidelines
or imagery. This trend creates the opportunity for the development of
innovative decision support systems to assist physicians in making a
diagnosis or preparing a treatment plan. Similar conditions hold for the
Web, where massive amounts of raw text are to be processed and
interpreted automatically, e.g. to eventually add new information to a
knowledge base. To this end, complex tasks need to be solved, requiring
one or more interpretation algorithms (e.g. image- or natural language
processors) to be chosen and executed based on heterogeneous data. We,
therefore, propose the first approach to a semantic framework for
sequential decision making and develop the foundations of a Linked agent
who executes interpretation algorithms available as Linked APIs
\cite{speiser2011} on a data-driven, declarative basis
\cite{stadtmueller2013} by integrating structured knowledge formalized in
RDF and OWL, and having access to meta components for planning and learning
from experience. We evaluate our framework based on automatically
processing brain images, the ad-hoc
combination of surgical phase recognition algorithms and experiential
learning to optimally pipeline entity linking approaches.
Other Research Articles
Identifying the Influential bloggers: A modular approach based on Sentiment Analysis
(pp505-523)
Umar Ishfaq, Hikmat Ullah Khan, and Khalid Iqbal
The social web provides an easy and quick medium
for public communication and online social interactions. In a web log,
or blog for short, bloggers share their views by creating
and commenting on blog posts. Bloggers who influence other users in
a blogging community are known as influential bloggers.
Identification of such influential bloggers has vast applications in
advertising, online marketing and e-commerce. This paper investigates
the problem of identifying influential bloggers and presents a model
which consists of two modules: Activity and Recognition. The activity
module takes into account a blogger’s activity, and the recognition module
measures a blogger’s influence in his/her social community. The
integration of the activity and recognition modules identifies bloggers
who are both active and influential. The proposed model, MIBSA (Model to find
Influential Bloggers using Sentiment Analysis), takes into account the
existing and novel features of sentiment expressed in content generated
by a blogger. The model is evaluated against existing standard
models using real-world blogging data. The results confirm that
sentiment expressed in blog content plays an important role in measuring
a blogger’s influence and should be considered as a feature for finding
the top influential bloggers in the blogosphere.
Web Access Mining through Dynamic
Decision Trees with Markovian Features
(pp524-536)
Arpad Gellert
In this work we propose a hybrid web access
prediction method consisting of a dynamic decision tree and Markov
predictors of different orders as components. The predictions generated by the
Markov chain components are used as features within the dynamic decision
tree. Our goal is to use this hybrid technique to anticipate
and prefetch the web pages and files that users access through
browsers, thus reducing load times. We use a decision tree to select
the most predictive features from a considered feature set and based on
those selected features we generate predictions. In our application, the
feature set includes the current link, the type of the current link as
well as the predictions of different order Markov chains. The optimal
configuration of the proposed hybrid technique provides an average web
page prediction accuracy of 72.57%.
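
As a rough illustration of the idea in this abstract, the sketch below shows how predictions from Markov predictors of different orders could be collected into one feature set alongside the current link. It is not the author's implementation: the class name MarkovPredictor, the function build_features, the feature names, and the sample history are all hypothetical, and both the link-type feature and the dynamic decision tree that would consume these features are omitted.

```python
from collections import Counter, defaultdict


class MarkovPredictor:
    """Order-k Markov predictor over a sequence of visited links (order >= 1)."""

    def __init__(self, order):
        if order < 1:
            raise ValueError("order must be at least 1")
        self.order = order
        # Maps a context (tuple of the last `order` links) to next-link counts.
        self.counts = defaultdict(Counter)

    def train(self, history):
        """Count which link followed each context of length `order`."""
        for i in range(self.order, len(history)):
            context = tuple(history[i - self.order:i])
            self.counts[context][history[i]] += 1

    def predict(self, history):
        """Return the most frequent follower of the current context, or None."""
        if len(history) < self.order:
            return None
        context = tuple(history[-self.order:])
        followers = self.counts.get(context)
        if not followers:
            return None
        return followers.most_common(1)[0][0]


def build_features(history, predictors):
    """Assemble the current link and each predictor's guess into one feature dict."""
    features = {"current_link": history[-1]}
    for predictor in predictors:
        features[f"markov_order_{predictor.order}"] = predictor.predict(history)
    return features


# Hypothetical browsing history; each letter stands for a visited link.
history = ["a", "b", "c", "a", "b", "c", "a", "b"]
predictors = [MarkovPredictor(1), MarkovPredictor(2)]
for predictor in predictors:
    predictor.train(history)

features = build_features(history, predictors)
print(features)
```

In a hybrid scheme of the kind described, a feature dictionary like this one would be passed to the decision tree, which would select the most predictive entries before a page is prefetched.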