The N2 corpus: A semantically annotated collection of Islamist extremist stories

Mark A. Finlayson; Jeffry R. Halverson; Steven Corman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We describe the N2 (Narrative Networks) Corpus, a new language resource. The corpus is unique in three important ways. First, every text in the corpus is a story, which is in contrast to other language resources that may contain stories or story-like texts, but are not specifically curated to contain only stories. Second, the unifying theme of the corpus is material relevant to Islamist Extremists, having been produced by or often referenced by them. Third, every text in the corpus has been annotated for 14 layers of syntax and semantics, including: referring expressions and co-reference; events, time expressions, and temporal relationships; semantic roles; and word senses. In cases where analyzers were not available to do high-quality automatic annotations, layers were manually double-annotated and adjudicated by trained annotators. The corpus comprises 100 texts and 42,480 words. Most of the texts were originally in Arabic but all are provided in English translation. We explain the motivation for constructing the corpus, the process for selecting the texts, the detailed contents of the corpus itself, the rationale behind the choice of annotation layers, and the annotation procedure.

Original language	English (US)
Title of host publication	Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014
Editors	Nicoletta Calzolari, Khalid Choukri, Sara Goggi, Thierry Declerck, Joseph Mariani, Bente Maegaard, Asuncion Moreno, Jan Odijk, Helene Mazo, Stelios Piperidis, Hrafn Loftsson
Publisher	European Language Resources Association (ELRA)
Pages	896-902
Number of pages	7
ISBN (Electronic)	9782951740884
State	Published - 2014
Event	9th International Conference on Language Resources and Evaluation, LREC 2014 - Reykjavik, Iceland
Duration	May 26 2014 → May 31 2014

Publication series

Name	Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014

Other

Other	9th International Conference on Language Resources and Evaluation, LREC 2014
Country/Territory	Iceland
City	Reykjavik
Period	5/26/14 → 5/31/14

Keywords

Multi-layered annotation
Narrative corpora
Religious texts

ASJC Scopus subject areas

Linguistics and Language
Library and Information Sciences
Education
Language and Linguistics

Cite this

Finlayson, M. A., Halverson, J. R., & Corman, S. (2014). The N2 corpus: A semantically annotated collection of Islamist extremist stories. In N. Calzolari, K. Choukri, S. Goggi, T. Declerck, J. Mariani, B. Maegaard, A. Moreno, J. Odijk, H. Mazo, S. Piperidis, & H. Loftsson (Eds.), Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014 (pp. 896-902). (Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014). European Language Resources Association (ELRA).

@inproceedings{664947133e384294ba16bca670704011,

title = "The N2 corpus: A semantically annotated collection of Islamist extremist stories",

abstract = "We describe the N2 (Narrative Networks) Corpus, a new language resource. The corpus is unique in three important ways. First, every text in the corpus is a story, which is in contrast to other language resources that may contain stories or story-like texts, but are not specifically curated to contain only stories. Second, the unifying theme of the corpus is material relevant to Islamist Extremists, having been produced by or often referenced by them. Third, every text in the corpus has been annotated for 14 layers of syntax and semantics, including: referring expressions and co-reference; events, time expressions, and temporal relationships; semantic roles; and word senses. In cases where analyzers were not available to do high-quality automatic annotations, layers were manually double-annotated and adjudicated by trained annotators. The corpus comprises 100 texts and 42,480 words. Most of the texts were originally in Arabic but all are provided in English translation. We explain the motivation for constructing the corpus, the process for selecting the texts, the detailed contents of the corpus itself, the rationale behind the choice of annotation layers, and the annotation procedure.",

keywords = "Multi-layered annotation, Narrative corpora, Religious texts",

author = "Finlayson, {Mark A.} and Halverson, {Jeffry R.} and Steven Corman",

year = "2014",

language = "English (US)",

series = "Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014",

publisher = "European Language Resources Association (ELRA)",

pages = "896--902",

editor = "Nicoletta Calzolari and Khalid Choukri and Sara Goggi and Thierry Declerck and Joseph Mariani and Bente Maegaard and Asuncion Moreno and Jan Odijk and Helene Mazo and Stelios Piperidis and Hrafn Loftsson",

booktitle = "Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014",

note = "9th International Conference on Language Resources and Evaluation, LREC 2014 ; Conference date: 26-05-2014 Through 31-05-2014",

}

TY - GEN

T1 - The N2 corpus

T2 - 9th International Conference on Language Resources and Evaluation, LREC 2014

AU - Finlayson, Mark A.

AU - Halverson, Jeffry R.

AU - Corman, Steven

PY - 2014

Y1 - 2014

AB - We describe the N2 (Narrative Networks) Corpus, a new language resource. The corpus is unique in three important ways. First, every text in the corpus is a story, which is in contrast to other language resources that may contain stories or story-like texts, but are not specifically curated to contain only stories. Second, the unifying theme of the corpus is material relevant to Islamist Extremists, having been produced by or often referenced by them. Third, every text in the corpus has been annotated for 14 layers of syntax and semantics, including: referring expressions and co-reference; events, time expressions, and temporal relationships; semantic roles; and word senses. In cases where analyzers were not available to do high-quality automatic annotations, layers were manually double-annotated and adjudicated by trained annotators. The corpus comprises 100 texts and 42,480 words. Most of the texts were originally in Arabic but all are provided in English translation. We explain the motivation for constructing the corpus, the process for selecting the texts, the detailed contents of the corpus itself, the rationale behind the choice of annotation layers, and the annotation procedure.

KW - Multi-layered annotation

KW - Narrative corpora

KW - Religious texts

UR - http://www.scopus.com/inward/record.url?scp=85030218102&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85030218102&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85030218102

T3 - Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014

SP - 896

EP - 902

BT - Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014

A2 - Calzolari, Nicoletta

A2 - Choukri, Khalid

A2 - Goggi, Sara

A2 - Declerck, Thierry

A2 - Mariani, Joseph

A2 - Maegaard, Bente

A2 - Moreno, Asuncion

A2 - Odijk, Jan

A2 - Mazo, Helene

A2 - Piperidis, Stelios

A2 - Loftsson, Hrafn

PB - European Language Resources Association (ELRA)

Y2 - 26 May 2014 through 31 May 2014

ER -
