🧠 AI for Data Use: Dataset Extraction
This tool identifies dataset mentions in text (e.g., Demographic and Health Survey, Living Standards Measurement Study) and extracts contextual metadata for each mention, such as:
- publisher
- publication year
- reference year
- geography
- acronym
- reference population
- data description
- data type
- usage context
Usage Context Definitions
- Primary mention – the dataset is the main source of analysis or results in the study.
- Supporting mention – the dataset is used alongside other data to complement or validate findings.
- Background mention – the dataset is mentioned for context or comparison but not used in the actual analysis.
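One way to picture the extracted metadata and the three usage contexts is as a small record type. The sketch below is illustrative only: the names `UsageContext`, `DatasetMention`, and `is_analyzed` are hypothetical and are not taken from the tool's code, and the sample values are made up.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class UsageContext(Enum):
    """The three usage contexts defined above (hypothetical encoding)."""
    PRIMARY = "primary"        # main source of analysis or results
    SUPPORTING = "supporting"  # complements or validates findings
    BACKGROUND = "background"  # context or comparison only


@dataclass
class DatasetMention:
    """One dataset mention with the metadata fields listed above.

    Every field except the name is optional, since a passage may not
    state all of them.
    """
    name: str
    publisher: Optional[str] = None
    publication_year: Optional[str] = None
    reference_year: Optional[str] = None
    geography: Optional[str] = None
    acronym: Optional[str] = None
    reference_population: Optional[str] = None
    data_description: Optional[str] = None
    data_type: Optional[str] = None
    usage_context: Optional[UsageContext] = None

    def is_analyzed(self) -> bool:
        """True when the dataset is actually used in the analysis."""
        return self.usage_context in (
            UsageContext.PRIMARY,
            UsageContext.SUPPORTING,
        )


# Made-up example values, for illustration only.
mention = DatasetMention(
    name="Demographic and Health Survey",
    acronym="DHS",
    usage_context=UsageContext.PRIMARY,
)
```

Under this encoding, primary and supporting mentions count as data that was actually used, while background mentions do not.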
How to Use
- Paste or type text into the input box (left), or select one of the provided examples.
- Click 🚀 Run Extraction to process the text.
- The model will highlight all detected dataset mentions and related entities (e.g., publisher, geography, year, usage context) directly in the text.
- Below the highlights, a deduplicated relation tree will automatically appear, showing each dataset with its extracted metadata and filtered attributes.
- Click 🧠Show / Refresh Relation Tree at any time to rebuild or inspect the deduplicated metadata view.
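The deduplicated relation tree can be thought of as a grouping of (dataset, relation, value) triples. The following is a minimal sketch of that idea, not the tool's actual implementation: the function `build_relation_tree`, the triple format, and the normalization rule (case-insensitive, whitespace-collapsed matching) are all assumptions, and the attribute filtering mentioned above is not modeled.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (dataset, relation, value), assumed format


def _clean(text: str) -> str:
    """Collapse runs of whitespace into single spaces."""
    return " ".join(text.split())


def build_relation_tree(triples: Iterable[Triple]) -> Dict[str, Dict[str, List[str]]]:
    """Group triples into {dataset: {relation: [values]}} without duplicates.

    Two entries count as duplicates when they match after whitespace
    cleanup and case folding (an assumed rule). The first spelling seen
    is the one that is kept.
    """
    tree: Dict[str, Dict[str, List[str]]] = {}
    display: Dict[str, str] = {}  # normalized dataset name -> first spelling
    seen = set()
    for dataset, relation, value in triples:
        ds_key = _clean(dataset).casefold()
        name = display.setdefault(ds_key, _clean(dataset))
        branch = tree.setdefault(name, {})
        key = (ds_key, relation, _clean(value).casefold())
        if key in seen:
            continue  # same dataset, relation, and value already recorded
        seen.add(key)
        branch.setdefault(relation, []).append(_clean(value))
    return tree


# Made-up triples, for illustration only.
example = [
    ("Demographic and Health Survey", "geography", "Kenya"),
    ("demographic and health survey", "geography", "kenya"),
    ("Demographic and Health Survey", "acronym", "DHS"),
]
tree = build_relation_tree(example)
```

Here the second triple differs from the first only in letter case, so it is dropped and the dataset ends up with one `geography` value and one `acronym` value.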
Resources
- Model: https://huggingface.co/rafmacalaba/datause-extraction-v3-finetuned
- Paper (arXiv): https://arxiv.org/pdf/2502.10263
- GLiNER Repo: https://github.com/urchade/GLiNER
- Project Docs: https://worldbank.github.io/ai4data-use/docs/introduction.html