Projects
Data Analytics Dashboard
Visualizing insights from data
As a data scientist, there are moments
when you have to repeat the same analysis tasks multiple times and
present the results to your colleagues in a well-designed format.
If the input data arrives in a fixed, structured format,
building a dashboard with R Shiny can be a compact
and interesting solution. R Shiny is a popular choice among
data scientists for sharing data insights. As shown in the image,
the design and the type of analytics might vary between
applications.
Chat-log Network Analysis
Understanding user behaviour
Suppose a number of users in an IRC-based session are chatting
by sending and receiving messages. The chat log will then include the
names of the sender and receiver and a time-stamp, which indicates
the time at which a message was sent. If we use a network representation, we
have users as nodes and their interactions as edges. A small
example here depicts the connections between four users, where each
connection has a label indicating its time-stamp. The questions here
are:
- What is the definition of activity in a chat session?
- How do we measure the activity of users in an
observation time slot?
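One way to make these two questions concrete is to start from the simplest possible definition of activity, namely message counts inside an observation time slot. The following is a minimal sketch under that assumption; the record layout `(sender, receiver, timestamp)`, the sample log, and the function names `pair_frequency` and `user_activity` are hypothetical and are not taken from the cited paper.

```python
from collections import Counter

# Hypothetical chat log: one (sender, receiver, timestamp) record per message.
chat_log = [
    ("A", "B", 1),
    ("B", "A", 2),
    ("A", "C", 3),
    ("C", "D", 5),
    ("B", "A", 6),
]


def pair_frequency(log, start, end):
    """Count messages per unordered user pair with start <= timestamp <= end."""
    counts = Counter()
    for sender, receiver, ts in log:
        if start <= ts <= end:
            # frozenset ignores direction, so A->B and B->A share one key.
            counts[frozenset((sender, receiver))] += 1
    return counts


def user_activity(log, start, end):
    """Count messages each user sent or received with start <= timestamp <= end."""
    activity = Counter()
    for sender, receiver, ts in log:
        if start <= ts <= end:
            activity[sender] += 1
            activity[receiver] += 1
    return activity
```

Counting messages is only a baseline. It treats every message equally and ignores how long a pair of users stayed engaged, which is why duration-based measures are also worth considering.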
Network-analytic measures provide meaningful insights
into a networked system. In this example, they support the analysis of
users' behaviour. As shown here, within a specific observation
time window, analyzing different kinds of motifs, e.g., triangle
motifs, sheds light on the communication patterns
between three or more users in the session. However, the
obtained result may no longer be meaningful if we observe that the
interactions captured in a motif belong to different
conversation threads. So, the questions here are:
- Does it make sense at all if we look for motifs disregarding
the chat topics?
- By which criteria can we evaluate the significance of
each conversation thread?
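To illustrate what a topic-agnostic motif search looks like, the sketch below lists triangle motifs, i.e. user triples in which every pair exchanged at least one message inside the observation window. It deliberately ignores conversation threads, which is exactly the limitation the questions above point to. The function name `triangles_in_window` and the sample data are hypothetical.

```python
from itertools import combinations


def triangles_in_window(log, start, end):
    """Return user triples whose three pairs all interacted within [start, end]."""
    neighbours = {}
    for sender, receiver, ts in log:
        # Keep only messages inside the window and skip self-messages.
        if start <= ts <= end and sender != receiver:
            neighbours.setdefault(sender, set()).add(receiver)
            neighbours.setdefault(receiver, set()).add(sender)

    found = []
    # Check every triple of active users for the three required edges.
    for a, b, c in combinations(sorted(neighbours), 3):
        if b in neighbours[a] and c in neighbours[a] and c in neighbours[b]:
            found.append((a, b, c))
    return found


# Hypothetical log of (sender, receiver, timestamp) records.
sample_log = [("A", "B", 1), ("B", "C", 2), ("C", "A", 4), ("C", "D", 5)]
```

With this sample, the triple A, B, C forms a triangle over the window 1 to 5 but not over the window 1 to 3, because the message between C and A arrives at time 4. A thread-aware variant would additionally require the three edges to belong to the same conversation.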
Please cite the following paper if you are interested
in any part of this topic:
Sude Tavassoli and Katharina A. Zweig, "Analyzing the
activity of a person in a chat by combining network analysis and
fuzzy logic", in Advances in Social Networks Analysis and
Mining (ASONAM), 2015 IEEE/ACM International
Conference, pp. 1565-1568, 25-28 Aug. 2015.
Figure: (a) A time-stamped multigraph in which every connection between a
pair of vertices has a label showing its time of incidence. S_AB shows
the frequency of sending/receiving messages between the users A and B.
(b) Each edge between two vertices (i, j) has a unique time of occurrence
in an observation time period T. The dashed line between every pair of
time labels (t_{k-1}, t_k) represents the duration for which two users
interact until a new pair of users starts chatting. Δ_AB shows the
sum of these durations, during which a pair of users was engaged in
sending/receiving messages in the observation time period T (t_1 to t_10).
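The duration measure Δ_AB described in the caption can be sketched as follows. This is one possible reading of the caption, not the paper's implementation: each gap t_k - t_{k-1} between consecutive time labels is credited to the pair that sent the message at t_{k-1}, and the gaps are summed per pair. The function name `pair_durations` and the sample events are hypothetical.

```python
from collections import defaultdict


def pair_durations(log):
    """Sum, per unordered pair, the gaps credited to the pair active at t_{k-1}."""
    # Order messages by time-stamp so consecutive records give (t_{k-1}, t_k).
    ordered = sorted(log, key=lambda record: record[2])
    totals = defaultdict(int)
    for (sender, receiver, t_prev), (_, _, t_next) in zip(ordered, ordered[1:]):
        totals[frozenset((sender, receiver))] += t_next - t_prev
    return dict(totals)


# Hypothetical (sender, receiver, timestamp) records.
events = [
    ("A", "B", 1),
    ("B", "A", 2),
    ("A", "C", 4),
    ("A", "B", 7),
    ("C", "D", 10),
]
durations = pair_durations(events)
```

Under this reading, the last message in the observation period contributes no duration, because no later time label closes its interval.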
The knowledge framework proposed in paper [7] was designed using Weka.
To develop the lexicons and design the
morphological analyzer proposed in papers [6,9], we used the Xerox
finite-state tools (lexc language). All the lexicons are available on
GitHub.