Affiliation Analysis
Bibliometric studies of computer science and database publications to date mainly focus on the number of papers and citations per author or per journal. Because (commercial) bibliographic systems concentrate on journals, there has been little analysis of author affiliations in computer science and database research.
We analyze author affiliations of publications to determine the main institutions contributing research to a specific field. For instance, we determine the top affiliations by number of papers (productivity) and aggregate these numbers at varying levels of detail, e.g. cities, countries, and continents.
Author affiliations in publications are given in quite heterogeneous forms. Before any analysis of these data can be undertaken, the affiliation mentions denoting the same real-world institution have to be aligned. To this end, we investigated web-based affiliation recognition, matching, and clustering (cf. our publications).
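To illustrate what aligning affiliation mentions involves, the following minimal sketch groups differently written affiliation strings. It is not the web-based approach described in our publications: it swaps in a plain string-similarity measure from the Python standard library, and the function names, the threshold of 0.8, and the sample strings are assumptions chosen for illustration only.

```python
import re
from difflib import SequenceMatcher


def normalize(affiliation: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = affiliation.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized affiliation strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster(affiliations, threshold=0.8):
    """Greedy single-pass clustering (illustrative assumption, not our method).

    Each string joins the first cluster whose first member is similar
    enough; otherwise it starts a new cluster.
    """
    clusters = []
    for aff in affiliations:
        for group in clusters:
            if similarity(aff, group[0]) >= threshold:
                group.append(aff)
                break
        else:
            clusters.append([aff])
    return clusters


# Hypothetical affiliation mentions
mentions = [
    "University of Leipzig",
    "Univ. of Leipzig",
    "university of leipzig, germany",
    "Stanford University",
]
clusters = cluster(mentions)
```

A purely string-based measure like this fails for translated names or acronyms, which is one reason to draw on additional evidence such as web search results.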
When multiple-author papers are interpreted as collaborations, bonds within and across institutions, cities, countries, and continents become visible (see the illustration for an example).

Illustrating collaborations within and across major countries publishing database research
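Counts of this kind can be derived from per-paper author data. The sketch below is one possible way to do so, under stated assumptions: each paper is represented as a list with one country per author, each pair of distinct countries is counted once per paper, and a within-country collaboration is counted once per paper when at least two authors share a country. The function name and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def collaboration_counts(papers):
    """Count collaborations within and across countries.

    `papers` is a list of papers, each given as a list with one country
    per author (an assumed input format).
    """
    counts = Counter()
    for countries in papers:
        if len(countries) < 2:
            continue  # single-author papers are not collaborations
        # Cross-country bonds: each unordered pair once per paper
        for pair in combinations(sorted(set(countries)), 2):
            counts[pair] += 1
        # Within-country bonds: country appears for two or more authors
        for country, n in Counter(countries).items():
            if n >= 2:
                counts[(country, country)] += 1
    return counts


# Hypothetical sample data
papers = [
    ["Germany", "Germany", "USA"],
    ["USA", "China"],
    ["Germany"],
    ["USA", "USA"],
]
counts = collaboration_counts(papers)
```

The same function works at other levels of detail (institutions, cities, continents) if the per-author labels are replaced accordingly.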
Project Members
Publications
See also Citation Analysis and Semantic Content
Dataset example
The following archive provides some of our data for download. It contains:
- affiliation strings, mostly as available from ACM, though in some cases the original PDFs were also taken into account
- correspondences between affiliation strings at the institution level, i.e. ignoring departments etc.
Download: affiliationstrings.zip
Note: Other object matching datasets are available via Benchmark datasets for entity resolution.
Exemplary results of ten years of database publications
The following tables present initial results of an affiliation analysis of publications from the decade 2000–2009 that appeared in the top conferences SIGMOD and VLDB and in the journals VLDBJ and TODS. The data can also be browsed by affiliation via our publication categorizer.
Notes on table headings:
- papers: productivity of the entity in question, using total counting of papers
- frac: fractional counting (all other columns use total counting)
- affils: number of affiliations within the entity
- first and second: the years 2000–2004 and 2005–2009, respectively
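The difference between total and fractional counting can be made concrete with a small sketch. It assumes one common reading of the two schemes: under total counting an entity receives one full credit for every paper it contributes to, while under fractional counting each of a paper's n authors contributes 1/n to his or her affiliation. The exact fractional scheme behind the frac column is not specified here, so this is an illustration rather than the definitive computation; the function name and sample data are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def count_papers(papers):
    """Return (total, frac) paper counts per affiliation.

    `papers` is a list of papers, each given as a list with one
    affiliation per author (an assumed input format).
    """
    total = defaultdict(int)
    frac = defaultdict(Fraction)
    for author_affils in papers:
        share = Fraction(1, len(author_affils))  # each author's share
        for aff in author_affils:
            frac[aff] += share
        for aff in set(author_affils):
            total[aff] += 1  # one full credit per contributing entity
    return dict(total), dict(frac)


# Hypothetical sample data: two papers
papers = [
    ["Leipzig", "Leipzig", "Stanford"],
    ["Stanford"],
]
total, frac = count_papers(papers)
```

In this example Leipzig has a total count of 1 and a fractional count of 2/3, while Stanford has 2 and 4/3. The fractional counts sum to the number of papers, whereas the total counts do not.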
Data overview on continental level (subsuming Africa, Oceania, and South America into Southern Hemisphere)
Summary per five year spans and decade
Base data by venue
Base data by year
Top countries
Top authors
Countries by research papers only
Countries by industrial papers only
Countries by demo papers only