
Affiliation Analysis

Bibliometric studies of computer science and database publications have so far focused mainly on the number of papers and citations per author or per journal. Because (commercial) bibliographic systems concentrate on journals, there has been little analysis of author affiliations in computer science and database research.

We analyze the author affiliations of publications to determine the main institutions contributing research to a specific field. For instance, we determine the top affiliations by number of papers (productivity) and also aggregate the numbers at varying levels of detail, e.g. cities, countries, and continents.
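
The aggregation across levels of detail can be pictured with a small sketch. It is only an illustration: the `LOCATION` lookup, the toy `PAPERS` list, and the function name `productivity` are assumptions made for this example, not the project's actual data model. Each paper counts once per entity at the chosen level.

```python
from collections import defaultdict

# Hypothetical lookup: institution -> (city, country, continent).
LOCATION = {
    "University of Leipzig": ("Leipzig", "Germany", "Europe"),
    "ETH Zurich": ("Zurich", "Switzerland", "Europe"),
    "National University of Singapore": ("Singapore", "Singapore", "Asia"),
}

# Toy input: one list per paper, holding the institution of each author.
PAPERS = [
    ["University of Leipzig", "University of Leipzig"],
    ["University of Leipzig", "ETH Zurich"],
    ["National University of Singapore", "ETH Zurich"],
]

LEVELS = ("institution", "city", "country", "continent")


def productivity(papers, level):
    """Count papers per entity at the given level (a paper counts once per entity)."""
    idx = LEVELS.index(level)
    counts = defaultdict(int)
    for institutions in papers:
        # Map every author's institution to its entity at the requested level.
        entities = {((inst,) + LOCATION[inst])[idx] for inst in institutions}
        for entity in entities:
            counts[entity] += 1
    return dict(counts)


print(productivity(PAPERS, "country"))
print(productivity(PAPERS, "continent"))
```

Using a set per paper keeps a paper with several authors from the same country from being counted more than once for that country.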

Author affiliations in publications are given in quite heterogeneous forms. Before these data can be analyzed, the affiliation mentions that denote the same real-world institution have to be aligned. To this end, we investigated web-based affiliation recognition, matching, and clustering (cf. our publications).
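
To give a feel for the alignment problem, here is a minimal sketch that groups affiliation strings by plain string similarity. It is not the web-based approach described in the publications. It swaps in Python's standard `difflib.SequenceMatcher` as a stand-in, and the helper names and the threshold of 0.8 are assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(affiliation):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", affiliation.lower())
    return " ".join(text.split())


def similar(a, b, threshold=0.8):
    """True if the normalized strings reach the (assumed) similarity threshold."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def cluster(affiliations, threshold=0.8):
    """Greedy one-pass clustering: join the first cluster whose first member is similar."""
    clusters = []
    for aff in affiliations:
        for group in clusters:
            if similar(aff, group[0], threshold):
                group.append(aff)
                break
        else:
            clusters.append([aff])
    return clusters


mentions = [
    "Univ. of Leipzig, Germany",
    "University of Leipzig, Germany",
    "ETH Zurich",
]
for group in cluster(mentions):
    print(group)
```

Pure string similarity breaks down for abbreviations, translations, and renamed institutions, which is one reason to bring in external evidence such as web search results.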

Interpreting multi-author papers as collaborations makes bonds within and across institutions, cities, countries, and continents visible (see the illustration, for example).


Illustrating collaborations within and across major countries publishing database research
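
A sketch of how such collaboration bonds could be counted from author countries follows. The input layout (one list of countries per paper, one entry per author) and the function name are assumptions made for this example. A pair of identical countries stands for a collaboration within that country.

```python
from collections import Counter
from itertools import combinations


def collaboration_counts(papers):
    """Count papers per country pair; (c, c) marks a collaboration within country c."""
    counts = Counter()
    for author_countries in papers:
        per_country = Counter(author_countries)
        # Within-country bond: at least two authors share the country.
        for country, n_authors in per_country.items():
            if n_authors >= 2:
                counts[(country, country)] += 1
        # Cross-country bonds: every unordered pair of distinct countries.
        for pair in combinations(sorted(per_country), 2):
            counts[pair] += 1
    return counts


papers = [
    ["USA", "USA"],
    ["USA", "Germany"],
    ["USA", "USA", "China"],
    ["Germany"],  # a single-author paper adds no bond
]
for pair, n in sorted(collaboration_counts(papers).items()):
    print(pair, n)
```

The same function works at other levels of detail if the lists hold institutions, cities, or continents instead of countries.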

Project Members

Publications

Aumueller, David; Rahm, Erhard
Affiliation analysis of database publications
ACM SIGMOD Record, Vol. 40, No. 1, pp. 26-31, March 2011
2011-03-31

Aumueller, David; Rahm, Erhard
Web-based Affiliation Matching
14th International Conference on Information Quality 2009 (ICIQ’09)
2009-11

Aumueller, David
Retrieving metadata for your local scholarly papers
BTW 2009
2009-03

Aumueller, David
Towards web supported identification of top affiliations from scholarly papers
BTW 2009
2009-03

See also Citation Analysis and Semantic Content

Dataset example

The following archive provides some of our data for download. It contains:

  • affiliation strings, mostly as available from ACM, though in some cases the original PDFs were also taken into account
  • correspondences between affiliation strings at the institution level, i.e. ignoring departments etc.

Download: affiliationstrings.zip

Note: Other object matching datasets are available via Benchmark datasets for entity resolution.

Exemplary results of ten years of database publications

The following tables present initial results of an affiliation analysis of publications from the decade 2000–2009 that appeared at the top conferences SIGMOD and VLDB and in the journals VLDBJ and TODS. The results can also be browsed by affiliation via our publication categorizer.

Notes on table headings:

  • papers: productivity of the entity in question, using total counting of papers
  • frac: productivity using fractional counting (all other columns use total counting)
  • affils: number of affiliations within the entity
  • first, second: the periods 2000–2004 and 2005–2009, respectively (columns 2000_2004 and 2005_2009 in some tables)
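
The difference between total and fractional counting can be shown with a short sketch. The exact weighting behind the frac column is not spelled out here, so the example assumes the common convention that each of a paper's n authors carries 1/n of the paper to his or her entity. The function name and the toy input are likewise assumptions.

```python
from collections import defaultdict
from fractions import Fraction


def count_papers(papers):
    """Return (total, fractional) counts per entity.

    papers: one list per paper, holding the entity (e.g. country) of each author.
    Assumption: fractional counting gives every author slot a share of 1/n.
    """
    total = defaultdict(int)
    frac = defaultdict(Fraction)
    for author_entities in papers:
        share = Fraction(1, len(author_entities))
        for entity in set(author_entities):
            total[entity] += 1      # total counting: one full paper per entity
        for entity in author_entities:
            frac[entity] += share   # fractional counting: 1/n per author
    return dict(total), dict(frac)


papers = [["USA", "USA", "Germany"], ["Germany"], ["USA", "China"]]
total, frac = count_papers(papers)
for entity in sorted(total):
    print(entity, total[entity], float(frac[entity]))
```

Under this convention the fractional counts add up to the number of papers, whereas total counts can exceed it because a multi-entity paper is counted in full for each entity.
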
continent      affils  papers  frac  2000_2004  2005_2009  research  industrial  demo  vldb  sigmod  vldbj  tods
North America     284    1983  1767        830       1153      1397         258   328   870     771    188   154
Europe            217     642   504        257        385       436          46   160   319     167     88    68
Asia               95     513   390        188        325       407          29    77   219     188     60    46
S.H.               28      79    51         21         58        63           3    13    36      12     17    14

Data overview at the continental level (subsuming Africa, Oceania, and South America under Southern Hemisphere, S.H.)


period       papers  vldb  sigmod  vldbj  tods  research  conf_res  industrial  demo  conf  journal
first half     1120   534     416     99    71       757       587         151   212   950      170
second half    1596   701     564    190   141      1147       816         160   289  1265      331
decade         2716  1235     980    289   212      1904      1403         311   501  2215      501

Summary per five-year span and for the decade


venue   papers  2000_2004  2005_2009  research  industrial  demo
vldb      1235        534        701       805         180   250
sigmod     980        416        564       598         131   251
vldbj      289         99        190       289           0     0
tods       212         71        141       212           0     0

Base data by venue


year  papers  vldb  sigmod  vldbj  tods  research  conf_res  industrial  demo  conf  journal
2000     188    86      76     14    12       121        95          31    36   162       26
2001     203    92      76     23    12       138       103          28    37   168       35
2002     212   106      74     21    11       143       111          32    37   180       32
2003     225   111      79     20    15       163       128          17    45   190       35
2004     292   139     111     21    21       192       150          43    57   250       42
2005     293   133     108     24    28       202       150          38    53   241       52
2006     276   126      94     20    36       197       141          26    53   220       56
2007     317   139     127     25    26       226       175          27    64   266       51
2008     360   146     123     64    27       270       179          30    60   269       91
2009     350   157     112     57    24       252       171          39    59   269       81

Base data by year


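country          affils  papers  frac  2000_2004  2005_2009  research  industrial  demo  vldb  sigmod  vldbj  tods
USA                 260    1868  1631        787       1081      1316         247   305   816     733    179   140
Germany              69     243   184        108        135       147          23    73   129      68     26    20
Canada               23     228   136         95        133       160          23    45   104      83     18    23
China                29     211   151         49        162       176           3    32    81      75     31    24
Singapore             5     116    75         34         82       102           1    13    44      49     17     6
France               32      88    58         50         38        56           6    26    49      18     16     5
Italy                28      88    58         37         51        57           3    28    36      24     12    16
India                20      87    61         51         36        56          13    18    44      34      3     6
Switzerland          10      67    50         15         52        41           6    20    40      18      6     3
Australia            17      59    40         12         47        46           3    10    28      10     13     8
United Kingdom       11      56    35         13         43        46           2     8    26      15      6     9
Israel                9      55    41         17         38        43           1    11    24      15     10     6
Korea                11      50    35         22         28        43           6     1    22      21      4     3
Greece               10      48    32         17         31        44           0     4    14      15     16     3
Denmark               5      34    21         15         19        27           3     4    20       4      5     5
The Netherlands      10      33    24         15         18        26           2     5    16       8      6     3
Japan                13      25    19         17          8        17           4     4     ?       ?      2     3
Austria               5      15    10          8          7         8           2     5     ?       ?      0     3
Belgium               4      13     8          2         11        12           0     1     3       2      0     8
Spain                 9      10     6          5          5         8           1     1     3       2      3     2
? = the source runs the vldb and sigmod counts for Japan (119) and Austria (111) together, so they cannot be split unambiguously.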

Top countries


author                 papers  2000_2004  2005_2009  research  industrial  demo
Surajit Chaudhuri          50         22         28        38           4     8
Divesh Srivastava          49         25         24        39           2     8
Nick Koudas                44         13         31        36           2     6
Jiawei Han                 41         16         25        34           0     7
H. V. Jagadish             38         16         22        33           0     5
Beng Chin Ooi              36         15         21        32           0     4
Alon Halevy                36         18         18        30           3     3
Minos Garofalakis          36         19         17        35           0     1
Raghu Ramakrishnan         35          8         27        32           2     1
Yufei Tao                  34         13         21        34           0     0
Philip S. Yu               33         10         23        28           4     1
Kian-Lee Tan               32         14         18        27           0     5
Jeffrey Naughton           32         18         14        31           1     0
Dimitris Papadias          32         16         16        32           0     0
Gerhard Weikum             29          9         20        17           0    12
Laks V. S. Lakshmanan      28         21          7        23           0     5
Jennifer Widom             27         18          9        22           1     4
Samuel Madden              27          6         21        22           1     4
Dan Suciu                  27         13         14        25           0     2
Elke A. Rundensteiner      27         12         15        10           0    17

Top authors


country         affils  papers  frac  first  second
USA                180    1316  1132    545     771
China               24     176   124     37     139
Canada              15     160    93     61      99
Germany             50     147   109     69      78
Singapore            4     102    64     24      78
Italy               23      57    36     22      35
France              24      56    37     32      24
India               14      56    38     30      26
Australia           14      46    31      5      41
United Kingdom      10      46    29     10      36

Countries by research papers only


country  affils  papers  frac  first  second
USA         115     247   231    118     129
Canada       10      23    16     14       9
Germany      22      23    17     11      12
India        10      13     9      7       6
France        8       6     4      2       4

Countries by industrial papers only


country  affils  papers  frac  first  second
USA         116     305   267    124     181
Germany      35      73    58     28      45
Canada       11      45    27     20      25
China        19      32    23     12      20
Italy        16      28    20     13      15

Countries by demo papers only