Methodology for Global AI Talent Tracker

MacroPolo’s Global AI Talent Tracker yields insights into the balance and flow of AI research talent across countries, but these insights comprise just a few pieces of a much more complex puzzle. Completing that puzzle will require further efforts by researchers, but we hope our effort has opened the door much wider. 

This study is centered on answering these main questions:

  1. Where do top-tier AI researchers work today?
  2. What path did they take to get there?
  3. Where do top-tier AI researchers come from?

Below we explain our methodology and the choices we made in collecting and analyzing the data. 

Why focus on top-tier AI researchers?

There is a robust debate about what type of talent is most important to increase national or institutional AI capabilities. While some argue that countries should prioritize cultivating a large workforce of relatively lower-skilled AI engineers, others contend that it’s more important to prioritize developing and attracting elite researchers. 

This study isn’t intended to address that debate or take a position. We chose to focus on top-tier AI researchers because we believe this cohort is the most likely to lead the way on new areas of important and potentially breakthrough research, as well as to apply AI to highly complex real-world problems.

Why did we choose NeurIPS?

NeurIPS is generally recognized as one of the top conferences in AI, and perhaps the top conference outright. The research presented at NeurIPS has a specific focus on theoretical advances in neural networks and deep learning, two of the subfields that have driven many of the recent advances in AI. 

NeurIPS is among the top two AI conferences in both the number of papers submitted and the selectivity of papers accepted. Given its popularity and selectivity, we used a random sample of papers accepted at the 2019 conference, representative of all papers accepted, and tracked the authors of those papers. 


How we collected the author data

Given that NeurIPS 2019 had a total of 1,428 accepted papers, gathering granular educational and career information on all researchers would be very time-consuming and costly. We therefore selected a random sample of 175 papers, which contained a total of 675 authors. Sampling at the paper level has two advantages: it keeps our sample representative of the quality of papers accepted at the conference, and it allows us to make estimates at both the author level and the paper level. 

This sample yields a confidence level of 95% and a margin of error of 7% for estimates made about the entire population of 1,428 papers. For estimates regarding subpopulations, such as the post-graduation employment of international students in the United States, the confidence level decreases marginally and the margin of error increases.
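
For reference, the reported figures are consistent with a standard finite-population calculation for a proportion. The sketch below is ours rather than MacroPolo’s, and assumes a 95% confidence level, a worst-case proportion of 0.5, and a finite population correction:

```python
import math
import random

# Hypothetical reconstruction of the paper-level sampling and margin of error.
# The study does not publish its exact formula; this sketch assumes a simple
# random sample and a finite population correction.

N_PAPERS = 1428    # accepted papers at NeurIPS 2019
SAMPLE_SIZE = 175  # papers drawn for this study
Z_95 = 1.96        # z-score for a 95% confidence level

# Draw a simple random sample of paper IDs (IDs here are placeholders).
sampled_ids = random.sample(range(N_PAPERS), SAMPLE_SIZE)

# Margin of error for a proportion, with finite population correction.
p = 0.5  # worst-case assumption when the true proportion is unknown
standard_error = math.sqrt(p * (1 - p) / SAMPLE_SIZE)
fpc = math.sqrt((N_PAPERS - SAMPLE_SIZE) / (N_PAPERS - 1))
margin_of_error = Z_95 * standard_error * fpc

print(f"Margin of error: {margin_of_error:.1%}")  # ~6.9%, in line with the ~7% above
```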

For the NeurIPS 2019 Oral Presentations (acceptance rate of 0.5%), there were 36 papers and 131 authors. We gathered the same career and educational data for these authors to serve as a proxy for the “most elite” (approximately top 0.5%) AI researchers. Given the smaller population, we were able to collect data for all 131 authors, yielding a true population statistic with zero margin of error. 

How we coded the author data:

For all authors in our sample, we used LinkedIn, personal websites, and other publicly available sources to gather the following information:

  1. Undergraduate university and country
  2. Graduate university and country
  3. Current institutional affiliation and country
  4. The country where the headquarters of the author’s current affiliation is located (e.g., a researcher working at Google in Toronto would have their current country designated as “Canada” and headquarters country designated as “USA”)
  5. Whether the researcher is currently a graduate student
  6. Institution type: private sector vs. academia
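
As an illustration, a coded author record might be structured as follows. The field names are ours and purely hypothetical; the study does not publish its internal schema:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical schema for one coded author; field names are illustrative only.
@dataclass
class AuthorRecord:
    undergrad_university: Optional[str]  # (1) undergraduate university...
    undergrad_country: Optional[str]     #     ...and its country
    grad_university: Optional[str]       # (2) graduate university...
    grad_country: Optional[str]          #     ...and its country
    current_affiliation: str             # (3) current institution
    current_country: str                 # (3) country where the author works
    hq_country: str                      # (4) country of the affiliation's headquarters
    is_grad_student: bool                # (5) currently a graduate student?
    institution_type: str                # (6) "private sector" or "academia"

# The Google-in-Toronto example from the text:
example = AuthorRecord(
    undergrad_university=None, undergrad_country=None,
    grad_university=None, grad_country=None,
    current_affiliation="Google", current_country="Canada",
    hq_country="USA", is_grad_student=False,
    institution_type="private sector",
)
```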

Current country & HQ country:

We coded these two fields separately because “HQ country” is a better measure of a country’s AI capabilities, while “current country” is a measure of the geographic distribution of labor. In other words, the flag that a particular entity or firm flies is a better proxy for that country’s capabilities than where the entity’s labor happens to be located. 

The gap between results derived from the “HQ country” and “current country” metrics is small, meaning the large majority of these researchers work in the country where their institution is headquartered. In the small number of cases where the two diverge, we chose the metric that corresponds to the variable of interest (capability vs. geography).

For authors who have changed their affiliation since publishing the paper in our sample, we used their updated affiliation rather than the affiliation listed on the paper.

Multiple institutional affiliations:

If an author lists multiple affiliations on a paper, we used the affiliation associated with the contact email address listed on the paper. (This applies to all statistics in the study except institutional rankings.)
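
A minimal sketch of this tie-breaking rule is below. The domain-to-institution mapping is illustrative only, not the study’s actual lookup table:

```python
# Hypothetical helper: resolve an author's primary affiliation from their
# contact email when multiple affiliations are listed on the paper.
DOMAIN_TO_INSTITUTION = {
    "cs.stanford.edu": "Stanford University",
    "google.com": "Google",
    "tsinghua.edu.cn": "Tsinghua University",
}

def primary_affiliation(listed_affiliations: list[str], contact_email: str) -> str:
    """Return the affiliation matching the contact email's domain,
    falling back to the first listed affiliation if there is no match."""
    domain = contact_email.split("@")[-1].lower()
    matched = DOMAIN_TO_INSTITUTION.get(domain)
    return matched if matched in listed_affiliations else listed_affiliations[0]

# An author listing both Stanford and Google, with a Stanford contact email:
print(primary_affiliation(["Stanford University", "Google"], "author@cs.stanford.edu"))
# -> Stanford University
```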

Institution rankings:

For our ranking of the top 25 institutions, we gathered data for the entire population of 1,428 accepted papers and used a “fractional count” method to assign credit to institutions. In a fractional count, each paper is given a value of 1, and that value is divided equally among its authors. Authors with multiple affiliations have their share of that paper divided equally among their institutions. 

For example, consider a paper co-authored by two researchers: one affiliated solely with Tsinghua University and the other holding dual affiliations with Stanford University and Google. Tsinghua would be credited with a count of 0.5, while Stanford and Google would each receive 0.25. We adopted the fractional count method for this metric because it reflects each institution’s role in generating a research paper.
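
The sketch below implements the fractional-count rule as described above and reproduces the worked example; the input structure is ours, for illustration:

```python
from collections import defaultdict

def fractional_counts(papers):
    """papers: one entry per paper; each paper is a list of authors, and each
    author is a list of that author's institutional affiliations."""
    credit = defaultdict(float)
    for authors in papers:
        author_share = 1.0 / len(authors)  # each paper carries a total credit of 1
        for affiliations in authors:
            for institution in affiliations:
                credit[institution] += author_share / len(affiliations)
    return dict(credit)

# The worked example: one author at Tsinghua, one with dual Stanford/Google affiliations.
paper = [["Tsinghua University"], ["Stanford University", "Google"]]
print(fractional_counts([paper]))
# -> {'Tsinghua University': 0.5, 'Stanford University': 0.25, 'Google': 0.25}
```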

Graduate school and country: Master’s vs. PhD

When coding the institution and country affiliation for an author’s graduate school, we used the highest degree they have pursued. For example, if an author received a Master’s from Tsinghua University in China and is pursuing a PhD at Stanford University, we coded Stanford University as their graduate institutional affiliation and “USA” as their graduate country affiliation.
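
A sketch of this precedence rule, with degree labels and an input structure that are ours for illustration:

```python
# Hypothetical ranking used to pick the highest degree pursued.
DEGREE_RANK = {"Bachelor's": 0, "Master's": 1, "PhD": 2}

def graduate_affiliation(degrees):
    """degrees: (degree, university, country) tuples; returns the
    (university, country) of the highest degree pursued."""
    _, university, country = max(degrees, key=lambda d: DEGREE_RANK[d[0]])
    return university, country

# The example from the text: a Master's from Tsinghua, a PhD in progress at Stanford.
print(graduate_affiliation([
    ("Master's", "Tsinghua University", "China"),
    ("PhD", "Stanford University", "USA"),
]))
# -> ('Stanford University', 'USA')
```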

Regional categorizations included in data:

Asia: China (includes Hong Kong), India, Iran, Mongolia, Malaysia, Japan, Pakistan, Philippines, Vietnam, Russia, South Korea, Singapore, Taiwan

Europe: Austria, Belgium, Croatia, Czech Republic, Denmark, Finland, France, Germany, Greece, Ireland, Italy, Netherlands, Poland, Romania, Spain, Sweden, Switzerland, Yugoslavia

United Kingdom: England, Scotland, Wales, Northern Ireland

Note that some countries in Asia and Europe are not included in the lists above because no authors from those countries appeared in our sample. Where a country or region has fewer than 10 author affiliations in our sample, it is counted under “Others”.
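
A sketch of this bucketing rule; the counts below are invented purely to show the mechanics:

```python
from collections import Counter

def bucket_small_countries(counts, threshold=10):
    """Group countries (or regions) with fewer than `threshold`
    author affiliations under 'Others'."""
    bucketed = Counter()
    for country, n in counts.items():
        bucketed["Others" if n < threshold else country] += n
    return bucketed

# Illustrative counts only (not the study's actual figures).
example = Counter({"Country A": 180, "Country B": 95, "Country C": 4, "Country D": 2})
print(bucket_small_countries(example))
# -> Counter({'Country A': 180, 'Country B': 95, 'Others': 6})
```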

Comparable Research Literature and Further Reading:

Our study emphasizes the depth and granularity of information on the AI researchers in our dataset, while other comparable studies have emphasized greater breadth by looking at the current affiliations of authors at multiple conferences. These studies can both expand our understanding of different tiers of AI research and act as cross-checks on the validity of the results above. 

  1. A 2019 study by Gleb Chuvpilo looked at the researchers behind papers at NeurIPS and ICML, another top machine learning conference. The study used the fractional count method and compiled lists of the top regions, countries, and institutions represented at the conferences. Despite drawing on data from both ICML and NeurIPS, and using fractional counts as opposed to author totals, the results from Chuvpilo’s study mapped closely onto ours, all falling well within the margin of error for our sample. 
  2. An earlier report by JF Gagne, founder of Element AI, looked at publications across 21 conferences in 2018, analyzing the gender breakdown, graduate school, and current institutional affiliations of the papers’ authors. This larger dataset captured more of the middle tier of research papers than our study, making it not directly comparable to our results. But one metric from the study looked exclusively at “high-impact” papers (top 18%, based on a combination of conference selection and citations). The country shares for the “high-impact” papers also map very closely onto our results, again all falling well within the margin of error.

Citation-based metrics vs. Publication-based metrics:

One main methodological divide among metrics assessing AI research capabilities is between metrics based on citations and those based on conference publications. While both methods can bring valuable insights, in this study we have opted for conference publications. We believe that the acceptance metric for a large and selective conference like NeurIPS strikes a good balance between the quantity and the quality of research papers considered.

Citation counts can provide a measure of quality, but we believe they are more susceptible to irregularities and behaviors that do not necessarily reflect the quality and importance of the research. Examples of this include outsized citation counts for survey papers (“easy cite”); outsized counts for papers on a highly specific topic (“only cite”); gaming of citation counts via “citation cartels”; and cultural biases that affect the visibility of research by different groups. 

In general, studies based on citation counts tend to credit China with a substantially larger share of global AI research, particularly as the citation threshold for inclusion decreases and the body of papers in the dataset grows very large (above 100,000).

Metrics based on conference acceptances are also subject to some irregularities, including those arising from biases held by the reviewers of papers. Some of these biases can be muted by a double-blind review process, but such mechanisms remain imperfect. (NeurIPS 2019 was predominantly double-blind, with the exception of single-blind review for senior area chairs and program chairs.) The geographic location of the conference can also bias who attends, though attending the conference is not a requirement for a paper’s acceptance.

While acknowledging the limitations of a conference-based approach, on balance we believe that it captures a large and meaningful sample of the researchers that are driving forward the fields of AI and machine learning and making an impact on private sector companies.

Future research:

Future studies could add greatly to our understanding of AI talent. The term “artificial intelligence” encompasses a wide array of techniques and sub-fields, such as computer vision, natural language processing, robotics, and reinforcement learning. Crafting better science and technology policy will require a more granular understanding of the strengths and weaknesses in each sub-field and how they compare across countries. This is a project we hope to engage in going forward.

Credits

Research Design: Matt Sheehan, Ishan Banerjee

Data Lead: Ishan Banerjee

Page Design: Annie Cantara, Young Kim

Development: Chris Roche

For questions or comments, please email Matt Sheehan (msheehan@paulsoninstitute.org) or Damien Ma (dma@paulsoninstitute.org).