There is a vast wealth of information available in textual form that the Semantic Web cannot yet tap into. 80% of the data on the Web and on internal corporate intranets is unstructured, so analysing and structuring this data (social analytics and next-generation analytics) is a large and growing endeavour. The goal of the 1st Workshop on Semantic Web and Information Extraction is to bring together researchers from the fields of Information Extraction and the Semantic Web and to foster inter-domain collaboration. Making sense of the large amounts of textual data now available requires help from both the Information Extraction and Semantic Web communities. The Information Extraction community specialises in mining nuggets of information from text. Such techniques could, however, be enhanced by annotated data or domain-specific resources. The Semantic Web community has already made great strides in making such resources available through the Linked Open Data cloud, and these resources are now ready for uptake by the Information Extraction community. The workshop invites contributions on three particular topics: 1) Semantic Web-driven Information Extraction, 2) Information Extraction for the Semantic Web, and 3) applications and architectures at the intersection of Semantic Web and Information Extraction.


The Semantic Web aims to add a machine-tractable, repurposable layer to complement the existing web of natural language hypertext. To realise this vision, several tasks must become automatic or semi-automatic processes: the creation of semantic annotation, the linking of Web pages to ontologies, and the creation, evolution and interrelation of ontologies. Information Extraction, a form of natural language analysis, is becoming a central technology for linking Semantic Web models with documents. Conversely, traditional Information Extraction can be enhanced by adding semantic information, which enables concept disambiguation as well as reasoning and inference over the documents. The primary goal of this workshop is to advance the understanding of the relationship between Information Extraction and the Semantic Web.

With the adoption of the Web 2.0 paradigm, these technologies face further challenges because of the inherently multi-source nature of Web 2.0 content. The rapidly increasing use of social media also brings a new set of problems in dealing with degraded text, such as text with incorrect grammar or spelling. Information Extraction now has to deal not just with isolated texts or single narratives but with large-scale repositories or sources, in one or many languages. These sources contain a multiplicity of views, opinions or commentaries on particular topics, entities or events, in very diverse styles and formats. New methods and tools therefore need to be developed to deal with the changing face of data and the changing needs of society. Furthermore, traditional platforms and architectures for Information Extraction are not necessarily able to handle the transition to more semantic forms of annotation smoothly. Language analysis tools may not require sophisticated ontology-handling mechanisms. However, the resulting lack of interoperability can be problematic when such tools and platforms are embedded in Semantic Web architectures.


Participants will come from the various research areas represented in the Semantic Web and Information Extraction communities, such as artificial intelligence, ontology population, data mining, machine learning, knowledge representation and web information systems. Some participants will probably be especially interested in particular application areas, such as the biomedical domain, government, cultural heritage or entertainment.


We welcome high-quality papers on current trends in the areas named in the following non-exhaustive list of topics. We seek application-oriented papers as well as more theoretical papers and position papers. Each submission should explicitly address one or more of the three main topics. In addition to presenting specific results, the paper should discuss the more general implications for the topics and/or subtopics that it addresses. Where feasible, contributions should include a system demonstration that illustrates the key ideas of the work and encourages interactive discussion at the workshop. There will also be an opportunity to present late-breaking work or novel ideas as a 2-minute lightning talk during the afternoon. These talks may stimulate further debate during the open discussion period.

1. Semantic Web-driven Information Extraction

  • Integrating ontologies/Linked Open Data with Language Resources

  • Enriching Information Extraction systems with Semantic Web data/technologies

  • Complex Semantic Web-driven Information Extraction tasks (e.g., relation extraction, event extraction)

  • Methods and metrics for evaluation of semantic annotations with respect to ontologies

  • Incorporating semantics into Machine Learning approaches

  • Recognition and representation of temporal information and dynamics

  • Data aggregation, consolidation and enrichment

2. Information Extraction for the Semantic Web

  • Extraction from unstructured versus semi-structured textual sources

  • Dealing with the imperfections of Information Extraction techniques in the Semantic Web setting and their impact

  • Multi-source or multilingual Information Extraction for ontology population

  • Information extraction subtasks (e.g., terminology extraction, relation extraction, coreference resolution) for the Semantic Web

  • Methods and metrics for evaluation of Information Extraction for the Semantic Web

3. Applications and Architectures

  • Ontology-based Information Extraction for specific domains and applications (e.g., business analytics, healthcare and biomedicine, cultural heritage)

  • Information Extraction for social media mining

  • Scalability of tools and resources

  • Platforms and architectures for automatic and semi-automatic semantic annotation

  • Tools and methodologies for building and managing complex processing workflows


Workshop papers submission deadline: 3 July 2013

Workshop paper acceptance notification: 2 August 2013

Workshop camera-ready copies due: 16 August 2013

Workshops: 12-13 September 2013


Submissions should explicitly address one or more of the three main workshop topics and should not exceed 8 pages, including references. In addition to presenting specific results, the paper should discuss the more general implications for the questions that it addresses. The workshop proceedings will be published online through

Abstracts for lightning talks should describe ongoing or late-breaking work concerning one or more of the three main workshop topics and should not exceed 2 pages. The abstracts will be lightly reviewed by the organising committee for appropriateness to the workshop and published on the workshop website.

All submissions must be in PDF format and must follow the RANLP template.

Contributions must be submitted through the SWAIE 2013 Workshop EasyChair page.

Please direct any questions regarding the workshop to

For more information, visit the SWAIE 2013 website.


Diana Maynard, University of Sheffield

Marieke van Erp, VU University Amsterdam

Brian Davis, DERI Galway