PDFDataExtractor

PDFDataExtractor is a toolkit for automatically extracting semantic information from PDF files of scientific articles. It uses a template-based architecture and can currently extract information from articles by the following publishers, with more templates under development:

| Elsevier
| Royal Society of Chemistry
| Advanced Material Families (Wiley)
| Angewandte
| Chemistry A European Journal
| American Chemical Society

This guide provides a quick tour through PDFDataExtractor concepts and functionalities.

Features

| Extract metadata from scientific PDFs, including: title, author, abstract, journal name, journal year, journal volume, journal page number, DOI, keywords, figure captions, section titles, heading, page number and references
| Chemistry-aware PDF information extraction
| Output PDF articles as plain text or JSON
| Extract articles from seven mainstream chemistry and physics publishers with high precision
| Automated publisher detection
| Automated article download from references
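
The template-based architecture and automated publisher detection listed above can be pictured with a minimal sketch. This is not PDFDataExtractor's actual API: the `Template` class, the `detect_publisher` and `extract_metadata` functions, the regular expressions and the sample text are all hypothetical, chosen only to illustrate the idea in plain Python with the standard library.

```python
import json
import re
from dataclasses import dataclass, field


@dataclass
class Template:
    """Hypothetical per-publisher template: a detection marker plus field patterns."""
    publisher: str
    marker: str  # text whose presence suggests this publisher
    patterns: dict = field(default_factory=dict)  # field name -> regex with one group


# Illustrative templates only; real publisher layouts are far more varied.
TEMPLATES = [
    Template(
        publisher="Elsevier",
        marker="Elsevier",
        patterns={"doi": r"https://doi\.org/(\S+)", "year": r"(\d{4}) Elsevier"},
    ),
    Template(
        publisher="Royal Society of Chemistry",
        marker="Royal Society of Chemistry",
        patterns={"doi": r"DOI:\s*(\S+)", "year": r"Chemistry\s+(\d{4})"},
    ),
]


def detect_publisher(text):
    """Return the first template whose marker occurs in the text, else None."""
    lowered = text.lower()
    for template in TEMPLATES:
        if template.marker.lower() in lowered:
            return template
    return None


def extract_metadata(text):
    """Apply the detected template's patterns and collect the captured fields."""
    template = detect_publisher(text)
    if template is None:
        return {"publisher": None}
    result = {"publisher": template.publisher}
    for name, pattern in template.patterns.items():
        match = re.search(pattern, text)
        result[name] = match.group(1) if match else None
    return result


# Made-up text standing in for the plain text recovered from a PDF page.
sample = "DOI: 10.0000/example\nThis journal is (c) The Royal Society of Chemistry 2020"
print(json.dumps(extract_metadata(sample), indent=2))
```

In this sketch, supporting another publisher means adding one more `Template` entry and leaving the extraction logic untouched, which is the main attraction of a template-based design.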

Developing Features

| Web services for a more user-friendly experience
| Support for more publishers

Citing

PDFDataExtractor:

The paper is currently under review. This project was financially supported by the Science and Technology Facilities Council (STFC), the Royal Academy of Engineering (RCSRF1819\7\10), and BASF.
