| Funder | National Science Foundation (US) |
|---|---|
| Recipient Organization | University of Texas at Dallas |
| Country | United States |
| Start Date | Aug 01, 2023 |
| End Date | Jul 31, 2026 |
| Duration | 1,095 days |
| Number of Grantees | 4 |
| Roles | Principal Investigator; Co-Principal Investigator |
| Data Source | National Science Foundation (US) |
| Grant ID | 2311142 |
This project intends to revolutionize computerized data extraction for conflict scholars, security analysts, and practitioners who for decades have devoted significant resources to monitoring, understanding, and predicting armed violence, social protests, and other politically relevant events worldwide. Currently, the vast majority of conflict event data are coded by humans, at great expense, from increasingly large volumes of news reports.
This project uses recent advances in artificial intelligence and large language models to address this fundamental issue for conflict research. It builds on earlier NSF efforts that created a publicly available large language model to study inter- and intra-state conflict and armed violence, called ConfliBERT. This project expands the ConfliBERT model to multilingual settings, including Arabic and Spanish.
This will help researchers and policymakers better understand the context of local events and create a continuous data analysis process by feeding in current news stories to identify new political actors and events in real time. As the project's cyberinfrastructure develops, the research community will be empowered through training, education, and outreach with groups at local, national, and international levels, including academics and government.
In the last five years, state-of-the-art language models have revolutionized the field of natural language processing (NLP). In particular, there have been significant advances in the use of domain-specific models for understanding social processes. Our research and that of other experts in this field demonstrate that ConfliBERT outperforms prior models for coding and understanding conflict and violence from raw text (Hu et al. 2022; Haffner et al. 2023).
This project supports new NLP developments for conflict research and expands access to them for the academic and policy communities. Specifically, it builds on earlier NSF efforts that led to the development of ConfliBERT, a domain-specific language model, publicly available at Hugging Face, trained on an expert-curated corpus about conflict and political violence (Hu et al. 2022).
This project will integrate, extend, and apply ConfliBERT and our related innovations (e.g., actor detection for network construction) into a sustainable ecosystem to engineer data from text. It will expand ConfliBERT to multilingual settings including Arabic and Spanish, update the corpora in sustainable ways and retrain ConfliBERT on a continuous basis, provide new political network data, and develop language models for users to create customized datasets and applications.
All developed cyberinfrastructure is and will continue to be broadly accessible for the community of researchers, analysts, and others with interests in conflict dynamics, security studies, and international relations.
This project, funded by the NSF Office of Advanced Cyberinfrastructure, is jointly supported by the Directorate for Social, Behavioral, and Economic Sciences and the Directorate for STEM Education.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.