NewsReader Storyteller

After reading the explanation below, follow this link to see the demo:

http://nlesc.github.io/UncertaintyVisualization/

1. Introduction

NewsReader Storyteller is a tool that visualises the event structures generated by the NewsReader software as structured stories. Following van den Akker et al., we show interlinked actor-centric and event-centric stories derived from the same dataset.

The storyline demo shows an ordering of events on a timeline to approximate stories. We define a story as a sequence of events structured according to some explanatory model in which:

  1. there is at least one climax event that forms the critical turning point in a sequence of events;
  2. there is a series of events that precede the climax event and lead to the critical situation. According to some common-sense explanatory model, these preceding events are, to some extent, preconditions for the climax event to take place;
  3. there is a series of events that follow the climax event, as a consequence of the critical situation and as a common-sense response to the climax.

Stories can be told in many ways and from many different perspectives. Storyteller visualises stories in four different ways, each focusing on different aspects:

  1. actor centric stories
  2. event centric stories
  3. climax overview of all stories
  4. source representation of the story as text

Users can make selections in these views and see the effect of the selection in the other views.

Interaction, selection and filtering

Interaction is a very important element of this demo, since it lets users get a feel for the dataset by making selections and zooming in on elements they find interesting. In fact, interaction is so important that some of the charts cannot easily be read without making selections and filtering the data interactively. This is intentional: it lets users perceive the complexity of the dataset before drawing conclusions based on subsets of it.

At the top of the page, beneath the bar showing the data file, the global filter state is shown. Users can create new filters by clicking on or selecting items or areas in the charts, which adds elements to the filter state. The data shown in the charts is always the result of applying the filter state to the complete dataset.

Clicking elements in the filter state display removes them from the filter state, thus allowing the user to easily view their selections and undo them if necessary.

Refreshing the web page is also a valid mode of interaction, since it clears all selections and shows the complete dataset again.
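The filter-state behaviour described above can be sketched as follows. This is a minimal sketch, not the demo's implementation: it assumes that each selection is stored as a labeled predicate over event records and that an event is shown only if all active filters accept it (how several selections within one view combine, as described per view in the sections that follow, would live inside individual predicates). `FilterState` and its methods are hypothetical names.

```python
from typing import Callable, Dict, List

Event = Dict
Predicate = Callable[[Event], bool]


class FilterState:
    """Hypothetical global filter state shared by all views (sketch)."""

    def __init__(self, all_events: List[Event]) -> None:
        self.all_events = all_events              # complete data set, never modified
        self.filters: Dict[str, Predicate] = {}   # label in the filter bar -> test

    def toggle(self, label: str, predicate: Predicate) -> None:
        # Selecting an item adds a filter; clicking its label again removes it.
        if label in self.filters:
            del self.filters[label]
        else:
            self.filters[label] = predicate

    def clear(self) -> None:
        # Same effect as refreshing the page: every selection is lifted.
        self.filters.clear()

    def visible_events(self) -> List[Event]:
        # Every view is recomputed from the complete data, keeping only the
        # events that pass all active filters.
        return [e for e in self.all_events
                if all(p(e) for p in self.filters.values())]
```

For example, `state.toggle("climax >= 50", lambda e: e["climax"] >= 50)` followed by `state.visible_events()` yields the events every chart would redraw. Recomputing from the full data, rather than narrowing the previous result, is what makes removing a filter a simple undo.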

2. Actor centric storyline

The metroline map at the top of the page is a visualization of the co-participation of actors in the events of the storyline. The chart lists all the major participants on the Y axis and shows the events in which they participate in a timeline, where each actor’s timeline has a different color. Events are ovals on each line. If different participants take part in the same event, the lines are bent towards the same event, showing an intersection and therefore co-participation of the actors. An event receives a descriptive label. Hovering the mouse cursor over an event will show further details.

Axis ordering

The X axis is a simple timeline stretching between the first of the events shown and the last.

The Y axis, however, is slightly more complicated. In essence it is a simple ordinal scale, but the ordering of its elements is of great importance to the legibility of the chart: if the ordering is poor, the chart will contain many more curved lines than necessary, resulting in a jumbled mess.

We solved this legibility problem by ordering the elements in such a way that the metro lines are traveling straight for as long as possible. We do this by re-ordering the elements on the Y axis in order of co-appearance on the timeline, from bottom to top.

We start by determining the first, bottom-most line. To do this, we sort the events on the chart by time and select the earliest one. We determine the actors in this event, and then loop over all events that share these actors, in order of appearance on the timeline. Every time a new co-actor is found in one of these events, it is added to the list.

Once all events have been processed in this manner, actors that appear in the same events end up next to each other, so events with similar actors are clustered together. This makes the resulting chart much easier to understand.
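The ordering procedure above can be sketched in Python. This is an illustrative reconstruction from the description, not the demo's own code: `actor_names` and `order_actors` are hypothetical names, and events are assumed to use the `time` and `actors` fields of the JSON format described in section 7. Two details are our own assumptions: an event is followed if it shares any actor already in the list (so the set of "these actors" grows as co-actors are found), and actors never reached this way start a new cluster from the earliest event in which they appear.

```python
from typing import Dict, List, Set


def actor_names(event: Dict) -> List[str]:
    # Flatten the role -> [actors] mapping, keeping first-seen order.
    names = (a for actors in event["actors"].values() for a in actors)
    return list(dict.fromkeys(names))


def order_actors(events: List[Dict]) -> List[str]:
    """Y-axis order of actors, bottom to top (sketch)."""
    # "time" is YYYYMMDD, so string order equals chronological order.
    timeline = sorted(events, key=lambda e: e["time"])
    ordered: List[str] = []
    placed: Set[str] = set()

    def place(names: List[str]) -> None:
        for name in names:
            if name not in placed:
                placed.add(name)
                ordered.append(name)

    while True:
        # Seed: the earliest event that still has an actor without a line.
        seed = next((e for e in timeline
                     if any(a not in placed for a in actor_names(e))), None)
        if seed is None:
            return ordered
        place(actor_names(seed))
        # Visit events in time order; each event that shares an actor already
        # in the list contributes its new co-actors to the top of the list.
        for event in timeline:
            names = actor_names(event)
            if any(a in placed for a in names):
                place(names)
```

Each pass of the outer loop places at least one new actor, so the procedure always terminates, and actors that keep co-occurring end up on adjacent lines.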

Filtering on actors

To the left of the metroline chart, an alphabetical list of all the actors in the dataset is shown. The length of the colored bar indicates the number of events in which the actor participates; hovering the mouse over an item shows the exact number of events.

Clicking an item in the actor list applies a filter to the dataset. In the metroline chart, this means that only the events in which the selected actor participates are shown, together with the lines of the selected actor and of the other actors that co-participate in at least one of those events. Selecting more than one actor keeps only the events in which all selected actors co-participate. If no events are shown, no events satisfy the current filters. This happens quickly, since most events have only one or two actors.
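The two rules above (intersection over the selected actors, plus the lines of co-participating actors) map directly onto set operations. A minimal sketch with hypothetical function names, assuming the `actors` role-to-list mapping of the JSON format in section 7:

```python
from typing import Dict, Iterable, List, Set


def actor_set(event: Dict) -> Set[str]:
    # All actors of an event, regardless of their role labels.
    return {a for actors in event["actors"].values() for a in actors}


def filter_by_actors(events: List[Dict], selected: Iterable[str]) -> List[Dict]:
    # Intersection: keep only events in which *all* selected actors take part.
    wanted = set(selected)
    return [e for e in events if wanted <= actor_set(e)]


def visible_lines(events: List[Dict], selected: Iterable[str]) -> Set[str]:
    # Metro lines that remain: every actor of the remaining events, i.e. the
    # selected actors plus their co-participants.
    lines: Set[str] = set()
    for event in filter_by_actors(events, selected):
        lines |= actor_set(event)
    return lines
```

With no actor selected, the wanted set is empty and every event passes; selecting two actors who never share an event returns an empty list, which is the "no events shown" case described above.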

3. Event centric storyline

The event centric view is the second view in the demo. It shows time-ordered sequences of events in different rows. Each row approximates the structure of a story: preceding events build up towards a climax event, which is followed by further events as a consequence. The size (and color) of the event bubbles represents the climax score of the event. This climax score could be based on many different properties, such as sentiment and impact, but currently it is based on the number of mentions of an event in the collection of processed documents and on the prominence of those mentions. Prominence is based on the sentence number or offset of a mention: events mentioned early in a text count more than events mentioned later. The more often an event is mentioned in prominent positions, the higher its climax score.
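The exact climax formula is not given here, so the following is only a hedged illustration of the idea that frequent, early mentions raise the score. As our own assumption, a mention in sentence n contributes 1/n, and scores are normalised so that the top event gets 100 (the normalisation mentioned for the first story row below). It reads the `mentions` and `sentence` fields from section 7 and will not reproduce the scores in the demo data; `raw_prominence` and `climax_scores` are hypothetical names.

```python
from typing import Dict, List


def raw_prominence(event: Dict) -> float:
    # Assumed weighting: a mention in sentence n contributes 1/n, so many
    # mentions early in a text give the largest raw value.
    total = 0.0
    for mention in event["mentions"]:
        n = max(int(mention["sentence"][0]), 1)   # e.g. ["3"] -> 3
        total += 1.0 / n
    return total


def climax_scores(events: List[Dict]) -> Dict[str, int]:
    # Normalise so that the most prominent event scores 100.
    raw = {e["event"]: raw_prominence(e) for e in events}
    top = max(raw.values(), default=0.0) or 1.0
    return {ev: round(100 * value / top) for ev, value in raw.items()}
```

An event mentioned in sentences 3, 1 and 6 gets a raw value of 1.5, so an event mentioned once in sentence 2 (raw value 0.5) ends up at one third of the top score.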

Events with a high score can connect to other events through bridging relations, e.g. causal relations or shared participants. Climax events (events with a high climax score) pull other events into their story through these bridging relations. Together they approximate a story, in which we expect the bubbles to grow as they approach the climax event (the biggest bubble in a row) and to shrink after it.
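One way to picture how climax events pull in other events is a greedy grouping. The sketch below is a simplification built on our own assumptions, not the NewsReader algorithm: shared participants are the only bridging relation (causal relations are ignored), each event is compared only with the participants of the climax event itself, and `participants` and `build_stories` are hypothetical names.

```python
from typing import Dict, List, Set


def participants(event: Dict) -> Set[str]:
    # All actors of an event, regardless of role.
    return {a for actors in event["actors"].values() for a in actors}


def build_stories(events: List[Dict]) -> List[List[Dict]]:
    """Greedy sketch: each climax event pulls in events sharing a participant."""
    remaining = sorted(events, key=lambda e: e["climax"], reverse=True)
    stories: List[List[Dict]] = []
    while remaining:
        climax = remaining.pop(0)          # highest-scoring unassigned event
        bridge = participants(climax)
        story, rest = [climax], []
        for event in remaining:
            # A shared participant counts as a bridging relation here.
            if participants(event) & bridge:
                story.append(event)
            else:
                rest.append(event)
        remaining = rest
        # Each story is drawn as one time-ordered row.
        stories.append(sorted(story, key=lambda e: e["time"]))
    return stories
```

Because the bridge is not transitive, an event linked only to another non-climax event stays out of that story and may seed a story of its own.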

The first row presents the story derived from the climax event with the highest score (normalised to 100). The following rows show stories built around other, lower-scoring events that are not connected to the main story. Each story is labeled with the words that name its climax event. In addition to the label, each row has a colored bar that indicates the cumulative climax score of all the events in the story. Note that the story with the highest-scoring climax event does not necessarily have the largest cumulative score: if its climax event is mentioned often but poorly connected to other events, its cumulative score may still be lower than that of other stories.
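The difference between row order and the length of the cumulative bar can be made concrete with a small sketch. It assumes the `group`, `groupName` and `climax` fields from section 7 and that the bar is the plain sum of the climax scores in a group; `story_rows` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def story_rows(events: List[Dict]) -> List[Tuple[str, int, int]]:
    """Return (groupName, top climax, cumulative climax) per story row."""
    groups: Dict[str, List[Dict]] = defaultdict(list)
    for event in events:
        groups[event["group"]].append(event)
    rows = []
    for members in groups.values():
        top = max(members, key=lambda e: e["climax"])
        total = sum(e["climax"] for e in members)   # length of the colored bar
        rows.append((top["groupName"], top["climax"], total))
    # Rows are ordered by the score of their climax event, not by the total.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```

For instance, a story consisting of a single event scoring 100 is listed above a story whose events score 80, 50 and 30, even though the latter has a cumulative score of 160.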

Filtering on stories

The user can select a story in the same way as an actor, by clicking on the index to the left. In this case, however, selecting more than one story (row) adds data to the representation. This is intentionally different from the actor centric view, where multiple selections intersect; in this context, stories by definition cannot intersect, since each event belongs to exactly one story. Since all filters are applied globally, any selection in one view is projected onto the others.
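The union semantics for stories, in contrast to the intersection used for actors, fits in a few lines. A sketch assuming that each event's story is identified by its `group` field (section 7); `filter_by_stories` is a hypothetical name:

```python
from typing import Dict, Iterable, List


def filter_by_stories(events: List[Dict], selected: Iterable[str]) -> List[Dict]:
    # Union: an event is kept if its story is any of the selected ones.
    # Each event has a single "group", so intersecting two different
    # stories would always be empty.
    wanted = set(selected)
    if not wanted:            # nothing selected: no restriction
        return list(events)
    return [e for e in events if e["group"] in wanted]
```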

4. Climax overview of all stories

The third view also plots events by time (X axis) and climax score (Y axis). In this case, however, events from the same story are not grouped in a row; instead, each event is placed on the Y axis according to its score, and its symbol indicates group membership (in this case, story membership). This shows how the events of a story are spread over climax scores and time.

Filtering on statistics

In this view, the user can make selections by dragging a region in the X/Y space with the mouse. A region thus represents a segment in time and a range of climax scores at the same time. This enables both the selection of time intervals (by dragging a full-height box between two points in time) and the selection of the most (or least) influential events. Note that selecting a single story in the event centric view logically reduces the events to those with a single symbol. Selecting an actor, however, may leave events from one or more stories.
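A dragged region amounts to two range tests, one per axis. This sketch assumes the `time` (YYYYMMDD) and `climax` fields from section 7 and inclusive bounds; `brush` and `event_date` are hypothetical names:

```python
from datetime import date
from typing import Dict, List


def event_date(event: Dict) -> date:
    # "time" is stored as YYYYMMDD, e.g. "20060901".
    t = event["time"]
    return date(int(t[:4]), int(t[4:6]), int(t[6:8]))


def brush(events: List[Dict], start: date, end: date,
          low: float, high: float) -> List[Dict]:
    # Keep events inside the dragged rectangle: a time segment on the X axis
    # and a climax-score range on the Y axis (both bounds inclusive).
    return [e for e in events
            if start <= event_date(e) <= end and low <= e["climax"] <= high]
```

A full-height box between two dates corresponds to passing the whole score range, e.g. `low=0, high=100`; a full-width box selects the most or least influential events.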

5. Textual representation of the story

Finally, at the bottom, we see snippets from the original texts that were used to derive the event data. Depending on the current selection, this view lists all the events in rows, with all the text fragments in which they are mentioned and the event word highlighted. The event labels are also given separately, with synonyms grouped together. Furthermore, the table shows the climax score, the date and the group name or story label. No selections can be made in this view.

6. Loading data files

You can use the LOAD button to load any data file in the JSON format described below. This file replaces the default file(s) in the demo. You can obtain such a JSON file by uploading so-called NAF files into the online demo that creates JSON from NAF. NAF files can be obtained by processing textual data with one of the NewsReader pipelines.

7. JSON data structure

The JSON structure contains two lists:

  1. all the events, with various properties;
  2. text snippets (sources) from the original news articles in which these events are mentioned, each with a URI that identifies the source.

Each event contains 16 data elements:

  1. event: event instance ID that is unique within one SEM-RDF data set. In the case of events extracted across different languages, the event instance is represented through a set of WordNet Interlingual Index (ILI) records.
  2. actors: list of all the actors that participate in the event, with their roles. The role labels come from different event ontologies: PropBank, FrameNet, ESO.
  3. classes (OPTIONAL): list of event ontology classes, subdivided by namespace for their respective ontologies: FrameNet (fn), ESO (eso).
  4. climax: normalised score that indicates the relevance of the event within a story.
  5. fnsuperframes (OPTIONAL): set of FrameNet frames that are parents of the classes given for an event. This can be used to create different types of groupings.
  6. topics (OPTIONAL): set of Eurovoc concepts associated with the document in which an event was detected.
  7. group: label that uniquely identifies the event group to which the event belongs. Event groups are the basis for the event-centric story visualisations.
  8. groupName: preferred label of the event with the highest climax score within the group.
  9. groupScore: highest climax score within the group of events, indicating the relevance of the story.
  10. instance: unique URI representing the event instance, persistent across SEM-RDF data sets.
  11. labels: all the different wordings used to mention the event.
  12. prefLabel: preferred label for the event, based on frequency.
  13. sentence: first sentence in which the event is mentioned.
  14. size: size indicator used to represent the event, derived from the climax score.
  15. time: date to which the event is anchored.
  16. mentions: references to the mentions of the event, where each mention is structured as follows:

      1. uri: URI of the source text in which the mention is found
      2. char: character offsets of the mention in the raw text of the source
      3. tokens: NewsReader token identifiers for the tokens that make up the mention
      4. terms: NewsReader term identifiers for the terms that make up the mention
      5. sentence: NewsReader identifier of the sentence in which the event is mentioned

The events are followed by the sources: a list of pairs, each consisting of an original text (to which the character offsets point) and the URI of that text. Mentions in the event structures can be resolved through the offsets into the text with the corresponding URI.
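Resolving a mention, as described above, amounts to looking up the source text by URI and slicing it at the character offsets. The sketch below also checks that each event carries the non-optional fields from the list above. It is a hedged illustration, not the demo's loader: `parse_timeline`, `resolve` and `snippet` are hypothetical names, and it assumes 0-based, end-exclusive offsets (this appears consistent with the `"char": ["11", "15"]` mention in the example below picking out "buys", provided the source text starts with the article title).

```python
import json
from typing import Dict, List, Tuple

# Non-optional event fields from the list above (assumed to be required).
REQUIRED = {"event", "actors", "climax", "group", "groupName", "groupScore",
            "instance", "labels", "prefLabel", "sentence", "size", "time",
            "mentions"}


def parse_timeline(raw: str) -> Tuple[List[Dict], Dict[str, str]]:
    """Return the events and a uri -> text index built from the sources."""
    timeline = json.loads(raw)["timeline"]
    for event in timeline["events"]:
        missing = REQUIRED.difference(event)
        if missing:
            raise ValueError(f"{event.get('event')}: missing {sorted(missing)}")
    texts = {s["uri"]: s["text"] for s in timeline["sources"]}
    return timeline["events"], texts


def _span(mention: Dict) -> Tuple[int, int]:
    # Offsets are stored as strings, e.g. ["103", "112"]; assumed 0-based
    # and end-exclusive.
    start, end = (int(x) for x in mention["char"])
    return start, end


def resolve(mention: Dict, texts: Dict[str, str]) -> str:
    # The mentioned word(s) in the source identified by the mention's URI.
    start, end = _span(mention)
    return texts[mention["uri"][0]][start:end]


def snippet(mention: Dict, texts: Dict[str, str], width: int = 30) -> str:
    # A context window with the event word in brackets, as in the text view.
    start, end = _span(mention)
    text = texts[mention["uri"][0]]
    return (text[max(start - width, 0):start] + "[" + text[start:end] + "]"
            + text[end:end + width])
```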

Abbreviated example of the JSON structure:

    { "timeline": {
        "events": [
          {
            "actors": {
              "eso/possession-owner_2": ["dbp:Aer_Lingus"],
              "eso/possession-theme": ["ne:the_purchase_of_new_long-haul_airliners"]
            },
            "classes": {"eso": ["Buying"], "fn": ["Commerce_buy"]},
            "climax": 38,
            "event": "ev20",
            "fnsuperframes": ["Transfer", "Eventive_affecting", "Lose_possession", "Giving"],
            "group": "038:[\"purchase\",\"buy\"]",
            "groupName": "[\"purchase\",\"buy\"]",
            "groupScore": "038",
            "instance": "http://en.wikinews.org/wiki/Aer_Lingus_buys_twelve_new_long-haul_Airbus_jets#ev20",
            "labels": ["purchase", "buy"],
            "mentions": [
              {
                "char": ["103", "112"],
                "sentence": ["3"],
                "terms": ["t19"],
                "tokens": ["w19"],
                "uri": ["http://en.wikinews.org/wiki/Aer_Lingus_buys_twelve_new_long-haul_Airbus_jets"]
              },
              {
                "char": ["11", "15"],
                "sentence": ["1"],
                "terms": ["t3"],
                "tokens": ["w3"],
                "uri": ["http://en.wikinews.org/wiki/Aer_Lingus_buys_twelve_new_long-haul_Airbus_jets"]
              },
              {
                "char": ["715", "723"],
                "sentence": ["6"],
                "terms": ["t134"],
                "tokens": ["w134"],
                "uri": ["http://en.wikinews.org/wiki/Aer_Lingus_buys_twelve_new_long-haul_Airbus_jets"]
              }
            ],
            "prefLabel": ["purchase"],
            "sentence": "1",
            "size": "9.5",
            "time": "20060901"
          },
          …… etc…..
        ],
        "sources": [
          {
            "text": "Airbus parent EADS wins £13 billion UK RAF airtanker contract__March 27, 2008__European Aeronautic Defence & Space NV (EADS), the parent company of European airframer Airbus, has won a £13 billion contract to supply the United Kingdom's Royal Air Force (RAF) with aerial refueling tankers to replace the nation's current ageing fleet.__AirTanker Ltd., an EADS-led consortium, have signed a 27-year contract with the Defense Ministry to supply 14 new Airbus A330-200 passenger airliner converted for the task. They will be owned by AirTanker, who retains commercial leasing rights to five which can carry 290 passengers plus cargo, but will fly in RAF livery. They replace existing Lockheed Tristar and Vickers VC-10 aircraft. The first aircraft will be in service by 2011 and all by 2016.__Rolls-Royce, part of the consortium, will supply engines. France's Thales will supply electronics, Wimborne, UK's Cobham will manufacture refueling equipment and Southampton, UK's VT Group will provide service management.__Last month, Northrop Grumman and EADS defeated Boeing to win a massive order for 179 tankers from the United States Air Force. Airbus has also inked recent deals with the Royal Australian Air Force, the Royal Saudi Air Force and the UAE Air Force.",
            "uri": "http://en.wikinews.org/wiki/Airbus_parent_EADS_wins_%c2%a313_billion_UK_RAF_airtanker_contract"
          },
          {
            "text": "Boeing pushes back 737 replacement development__May 24, 2008__United States airframer Boeing has announced that development of a replacement for their 737 narrowbody airliner, begun two years ago, has been pushed back several years, Boeing saying that further advancement of technology is required.__Spokeswoman Sandy Angers said that that the team formed to look at the development had been merged into the parent product development team and would cease looking at specific designs. Boeing say airlines demand performance improvements of 15-20% are required if a new airliner is to be commercially viable.__\"We've reduced our airplane-design effort and are focusing more on the technology breakthroughs,\" said Angers. \"We need technology breakthroughs in engines, aerodynamics, materials and other systems. You can't simply shrink the 787 and expect the same benefits for the narrow-body market. We've got difficult challenges.\"__One important difference is that the plastic composite used for the fuselage of the 787 would not offer as significant a weight saving on a smaller aircraft. The delivery date for the plane is now anticipated to be around 2020, and not 2015 as previously hoped. This coincides with the expected date for Airbus to deliver their A320 replacement. Industry analysts predict development of one to trigger development of the other. Boeing had hoped to have their's ready for 2012.__Boeing Commercial Airplanes CEO Scott Carson said \"We're continuing our research effort until we find the right solution. It has to be a 25-year product.\"__Since its 1967 debut 5,700 737s have been deliverd to date, with orders for 2,200 more, keeping the jet in production until at least 2014. It is Boeing's most popular airliner.",
            "uri": "http://en.wikinews.org/wiki/Boeing_pushes_back_737_replacement_development"
          },
          ……. etc…..
        ],
        "headline": "contextualEvent",
        "text": "NewsReader timeline",
        "type": "default"
      }
    }