Resman
From LiquidPubWiki
Notes on getting data from different data sources
DBLP
Datasource
DBLP provides all its data in a single XML file, available at [1]. A very good description of the DBLP data is given in [1]. Some problems related to data quality are described in [2].
API
DBLP provides a basic API for retrieving information about people and their contributions. See this doc for a description.
Scripts and tools
(Nick, please add the info below)
Scripts for loading DBLP data are at ...
Parsers are at ...
Plans for uploading of DBLP data
(Nick, please edit as you see fit)
1. Load all authors from dblp.xml
2. .lauthors
...
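Step 1 above could be sketched roughly as follows. This is only an illustration, not the actual loading script: the function name load_author_names and the sample records are made up, and the sketch assumes that author names appear as <author> elements inside the top-level records of dblp.xml. As far as we know, the real file also relies on character entities declared in its DTD, which this sketch does not handle.

```python
import io
import xml.etree.ElementTree as ET

# Tiny hand-written stand-in for dblp.xml (hypothetical records).
SAMPLE = b"""<dblp>
<article key="journals/x/A1" mdate="2009-01-10">
  <author>Michael Meier</author>
  <author>Michael Ley</author>
  <title>First.</title>
</article>
<inproceedings key="conf/y/B2" mdate="2009-03-02">
  <author>Michael Ley</author>
  <title>Second.</title>
</inproceedings>
</dblp>"""


def load_author_names(source):
    """Stream through a DBLP-style XML file and collect distinct author names."""
    names = set()
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        # From here on the event is "end": the element is fully parsed.
        if elem.tag == "author" and elem.text:
            names.add(elem.text.strip())
        if depth == 2:
            # A top-level record is finished; drop its children to save memory.
            elem.clear()
        depth -= 1
    return names
```

Calling load_author_names(io.BytesIO(SAMPLE)) returns a set of syntactic names only; it does not resolve homonyms or assign ids.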
When reloading DBLP data (loading updates), we can use the mdate field of each record to see when it was last changed, and load only records with mdate >= the date of the last upload.
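A minimal sketch of such an incremental load is given below. It assumes that mdate is an attribute of each top-level record in YYYY-MM-DD form and that records carry a key attribute; the function name records_changed_since and the sample data are hypothetical, and the real dblp.xml would additionally need its DTD entities resolved.

```python
import io
import xml.etree.ElementTree as ET
from datetime import date

# Tiny hand-written stand-in for dblp.xml (hypothetical records).
SAMPLE = b"""<dblp>
<article key="journals/x/A1" mdate="2009-01-10">
  <author>Michael Meier</author>
  <title>First.</title>
</article>
<inproceedings key="conf/y/B2" mdate="2009-03-02">
  <author>Michael Ley</author>
  <title>Second.</title>
</inproceedings>
</dblp>"""


def records_changed_since(source, last_upload):
    """Yield (key, mdate) for top-level records with mdate >= last_upload."""
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        if depth == 2:
            # Assumption: only top-level records carry the mdate attribute.
            mdate = elem.get("mdate")
            if mdate and date.fromisoformat(mdate) >= last_upload:
                yield elem.get("key"), mdate
            elem.clear()  # free the finished record
        depth -= 1
```

For example, list(records_changed_since(io.BytesIO(SAMPLE), date(2009, 2, 1))) keeps only the second sample record. Using >= rather than > means records changed on the day of the last upload are loaded again, so the loader should tolerate duplicates.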
They also mention (page 6) an interesting algorithm for disambiguating authors. It would be nice to have this as a service some day.
They also describe [1] how to proceed when ids need to be split or merged.
Finally, there are .bht files that can be viewed as appendices to existing proceedings records, connected via url.
Notes
There is also a list of authors available at [2]. Note that it contains only the different syntactic names of authors, without ids, etc. Homonyms can be retrieved using a URL like
.../rec/pers/m/Meier 0002:Michael/xk
and you'll get
<?xml version="1.0"?>
<dblpperson name="Michael Meier">
  <homonym>m/Meier:Michael</homonym>
  <homonym>m/Meier 0004:Michael</homonym>
  <homonym>m/Meier 0003:Michael</homonym>
  <dblpkey type="person record">homepages/m/MichaelMeier2</dblpkey>
  <dblpkey>conf/edbt/LausenMS08</dblpkey>
  <dblpkey>journals/corr/abs-0812-3788</dblpkey>
</dblpperson>
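A response like the one above could be picked apart as in the following sketch. The function name parse_person and the shape of the returned dictionary are our own choices for illustration; the only assumption about DBLP is the structure of the example response itself.

```python
import xml.etree.ElementTree as ET

# The example response from above, copied verbatim.
RESPONSE = """<?xml version="1.0"?>
<dblpperson name="Michael Meier">
  <homonym>m/Meier:Michael</homonym>
  <homonym>m/Meier 0004:Michael</homonym>
  <homonym>m/Meier 0003:Michael</homonym>
  <dblpkey type="person record">homepages/m/MichaelMeier2</dblpkey>
  <dblpkey>conf/edbt/LausenMS08</dblpkey>
  <dblpkey>journals/corr/abs-0812-3788</dblpkey>
</dblpperson>"""


def parse_person(xml_text):
    """Split a dblpperson response into name, homonyms and record keys."""
    root = ET.fromstring(xml_text)
    person_records, publications = [], []
    for key in root.findall("dblpkey"):
        # Keys typed "person record" point to the person's own entry;
        # untyped keys are treated here as publication records.
        if key.get("type") == "person record":
            person_records.append(key.text)
        else:
            publications.append(key.text)
    return {
        "name": root.get("name"),
        "homonyms": [h.text for h in root.findall("homonym")],
        "person_records": person_records,
        "publications": publications,
    }
```

parse_person(RESPONSE)["homonyms"] then lists the other authors who share the name, in the order in which they appear in the response.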
SpringerLink
Link to the source, data description, example, API, link to developed xslt (see ex above)
CiteULike
Link to the source, data description, example, API, link to developed xslt (see ex above)
References
- [1] Ley, M. (2009). DBLP - Some Lessons Learned. PVLDB'2009, 2, pp. 1493-1500
- [2] Ley, M., Reuther, P. (2006). Maintaining an Online Bibliographical Database: The Problem of Data Quality. EGC'2006, Actes des sixiemes journees Extraction et Gestion de Connaissances, pp. 5-10
