Biomedical informatics techniques for processing and analyzing Web blogs of military service members

Sergiy Konovalov, Matthew Scotch, Lori Post, Cynthia Brandt

Research output: Contribution to journal, Article, peer-review

24 Scopus citations


Introduction: Web logs ("blogs") have become a popular mechanism for people to express their daily thoughts, feelings, and emotions. Many of these expressions contain health care-related themes, both physical and mental, similar to information discussed during a clinical interview or medical consultation. Thus, some of the information contained in blogs might be important for health care research, especially in mental health, where stress-related conditions may be difficult and expensive to diagnose and where early recognition is often key to successful treatment. In the field of biomedical informatics, techniques such as information retrieval (IR) and natural language processing (NLP) are often used to unlock information contained in free-text notes. These methods might help the clinical research community better understand feelings and emotions after deployment and the burden of stress symptoms among US military service members.

Methods: In total, 90 military blog posts describing deployment situations and 60 control posts of Operation Enduring Freedom/Operation Iraqi Freedom (OEF/OIF) were collected. After "stop" word exclusion and stemming, a "bag-of-words" representation and term weighting were performed, and the most relevant words were manually selected from the high-weight words. A pilot ontology was created using Collaborative Protégé, a knowledge management application. The word lists and the ontology were then used within General Architecture for Text Engineering (GATE), an NLP framework, to create an automated pipeline for recognition and analysis of blogs related to combat exposure. An independent expert opinion was used to create a reference standard and evaluate the results of the GATE pipeline.

Results: Two dimensions of combat exposure descriptors were identified: words dealing with physical exposure and words describing the soldiers' emotional reactions to it. The GATE pipeline was able to retrieve blog texts describing combat exposure with a precision of 0.9, a recall of 0.75, and an F-score of 0.82.

Discussion: Natural language processing and automated information retrieval might provide valuable tools for retrieving and analyzing military blog posts and for uncovering military service members' emotions and experiences of combat exposure.
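The preprocessing steps named in the Methods (stop-word exclusion, stemming, bag-of-words representation, term weighting) and the reported F-score can be illustrated with a minimal Python sketch. Everything specific here is an assumption rather than the authors' implementation: the abstract does not name a stop-word list, a stemmer, or a weighting scheme, so the sketch uses a tiny hypothetical stop-word list, a crude suffix stripper, and TF-IDF weighting, and the example posts are invented. The study itself used GATE and Collaborative Protégé, which are not reproduced here.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stop-word list; the study's actual list is not given.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "we", "i", "on", "at"}


def crude_stem(word):
    """Strip a few common suffixes. A stand-in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag_of_words(text):
    """Lowercase, tokenize, drop stop words, stem, and count the terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(crude_stem(t) for t in tokens if t not in STOP_WORDS)


def tf_idf(bags):
    """Weight each term by TF-IDF (an assumed scheme, not confirmed by the abstract)."""
    n_docs = len(bags)
    # Iterating a Counter yields each distinct term once, so this is document frequency.
    doc_freq = Counter(term for bag in bags for term in bag)
    weighted = []
    for bag in bags:
        total = sum(bag.values())
        weighted.append(
            {
                term: (count / total) * math.log(n_docs / doc_freq[term])
                for term, count in bag.items()
            }
        )
    return weighted


def f_score(precision, recall):
    """Balanced F-score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Invented example posts, for illustration only.
posts = [
    "We were shelled at the checkpoint and I was shaking",
    "The convoy was shelled on the road",
    "Mail call and a quiet day in the camp",
]
weights = tf_idf([bag_of_words(p) for p in posts])

# Terms sorted by weight; candidates for the manual selection step.
ranked_terms = sorted(weights[0], key=weights[0].get, reverse=True)
print(ranked_terms)

# Reported precision 0.9 and recall 0.75 give 1.35 / 1.65, about 0.82.
print(round(f_score(0.9, 0.75), 2))
```

In this sketch, a term that appears in several posts (here the stem of "shelled") receives a lower weight than a term unique to one post, which is why a manual review of the high-weight words, as described in the Methods, is still needed to pick out the relevant vocabulary.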

Original language: English (US)
Pages (from-to): e45p.1-e45p.8
Journal: Journal of Medical Internet Research
Issue number: 4
State: Published - 2010
Externally published: Yes


  • Blogging
  • Combat disorders
  • Information storage and retrieval
  • Medical informatics
  • Military personnel

ASJC Scopus subject areas

  • Health Informatics


