Methodology - MacroPolo

Methodology

For questions or comments about the project, you can contact Matt Sheehan (msheehan@paulsoninstitute.org) or Joy Dantong Ma (jma@paulsoninstitute.org).

Acknowledgment

MacroPolo would like to thank Jeffrey Ding, a D.Phil Researcher at Oxford’s Governance of AI Program (GovAI), for his generous support and involvement in this project. GovAI strives to help humanity capture the benefits and mitigate the risks of AI through research on the political challenges arising from AI.

Disclosure: Matt Sheehan, one of the product managers for ChinAI, previously did contract work for Dr. Kai-Fu Lee, founder of the venture capital firm Sinovation Ventures. Sinovation Ventures is one of the dozens of investors represented on The Companies relationship map. Matt is no longer employed by Dr. Lee, and his contributions to ChinAI were conceived and executed independently of that past work.

The Data

All information for The Data was drawn from publicly available sources, namely a detailed examination of WeChat Wallet features, the third-party partner companies that provide many of the services, and WeChat’s own privacy policy. In analyzing this data environment, it’s important to note three caveats.

First, the functions presented in the above section are not universal across all users’ WeChat Wallets. Different users in different markets often have access to slightly different functionality within the app. The functions presented here are intended as a representative sample of some of the most common and most interesting services available to users.

Second, though all of these services can be accessed through the WeChat Wallet, not all of the data generated are accessible by Tencent, the parent company of WeChat. Tencent has access to data generated by buttons in the top half of the wallet: payment functions at the top, as well as buttons under the “Powered by Tencent” label. For the buttons listed as “Powered by third-party operator,” the data generated belong to these third-party companies (Mobike, Meituan-Dianping, etc.), who are in principle not obligated to share the data with Tencent.

Finally, though this feature describes the data generated within WeChat, that is not a guarantee that such data were put to productive use by Tencent or the third-party providers. Tencent has recently been criticized by investors and observers for failing to effectively monetize the vast troves of data it has generated. From this perspective, the underutilized data could signal either of two things: the inability of a Chinese tech giant to adapt to the AI age, or a vast untapped resource that will drive the company to greater heights and profitability.

The Talent

To map China’s AI talent, we used three main sources. First, we analyzed the change in the rankings of the top 15 universities in AI research from 2013-2014 to 2017-2018. We used CSRankings (Emery D. Berger), which ranks top computer science institutions around the world based on their publication counts in the most prestigious publication venues in various computer science research areas. Through consultations with experts in the AI community, we selected three conferences as proxies for top publication venues in AI: the Conference on Neural Information Processing Systems (NIPS), the International Conference on Machine Learning (ICML), and the AAAI Conference on Artificial Intelligence (AAAI).
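The rank comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the institution names and publication counts are hypothetical placeholders, and ranking by a simple publication total is a simplification, since CSRankings applies its own scoring that this sketch does not reproduce.

```python
# Hypothetical publication counts for two periods; NOT CSRankings data.
counts_2013_2014 = {"University A": 40, "University B": 35, "University C": 20}
counts_2017_2018 = {"University A": 50, "University B": 70,
                    "University C": 30, "University D": 45}


def rank_institutions(pub_counts, top_n=15):
    """Return {institution: rank} for the top_n institutions by count."""
    # Sort by count (descending), then by name so ties are deterministic.
    ordered = sorted(pub_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {name: i + 1 for i, (name, _) in enumerate(ordered[:top_n])}


def rank_changes(earlier, later, top_n=15):
    """Map each institution to (rank_before, rank_after, places_gained).

    None marks an institution that was outside the top_n in that period,
    in which case places_gained is also None.
    """
    before = rank_institutions(earlier, top_n)
    after = rank_institutions(later, top_n)
    changes = {}
    for name in sorted(set(before) | set(after)):
        b, a = before.get(name), after.get(name)
        gained = b - a if b is not None and a is not None else None
        changes[name] = (b, a, gained)
    return changes


# top_n=3 keeps the toy example small; the project compared the top 15.
changes = rank_changes(counts_2013_2014, counts_2017_2018, top_n=3)
for name, (b, a, gained) in changes.items():
    print(name, b, a, gained)
```

A positive value in the last position means the institution moved up between the two periods. Institutions that enter or leave the top group are kept in the output rather than dropped, so that new entrants remain visible.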

Second, we focused on an even more selective AI publication channel: papers accepted for oral presentations at the NIPS conference in 2017 (around 1% acceptance rate). Taking these publications to represent the “best-in-class” research on AI, we scanned public profiles of researchers involved with these publications to determine two key data points: 1) where these researchers currently work; 2) where the researchers did their undergraduate studies. For the corresponding figures, we included the top 10 countries where researchers were currently based, as well as the top 10 countries where researchers completed their undergraduate studies. We dropped researchers from the sample in cases where we could not find publicly available data points. (Note: For researchers working at companies, we took that company’s headquarters as the “current country affiliation.”)
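As a rough sketch of the tallying step above, the snippet below counts researchers by current country and by undergraduate country. The records, field names, and the company-to-headquarters mapping are all hypothetical. It also assumes that a researcher missing either data point is dropped from both tallies, which is one possible reading of the rule described above.

```python
from collections import Counter

# Hypothetical mapping from company name to headquarters country.
COMPANY_HQ = {"Company X": "Country A"}

# Hypothetical researcher records; None marks a data point not found publicly.
researchers = [
    {"employer": "Company X", "employer_country": "Country B",
     "undergrad_country": "Country B"},
    {"employer": "University Y", "employer_country": "Country A",
     "undergrad_country": "Country A"},
    {"employer": "University Z", "employer_country": "Country B",
     "undergrad_country": None},
    {"employer": "University Z", "employer_country": "Country B",
     "undergrad_country": "Country B"},
]


def current_country(record):
    """Use the headquarters country for known companies, else the employer's country."""
    return COMPANY_HQ.get(record.get("employer"), record.get("employer_country"))


def tally(records, top_n=10):
    """Return (top current countries, top undergraduate countries, number dropped)."""
    complete = []
    for r in records:
        cur, ug = current_country(r), r.get("undergrad_country")
        if cur and ug:  # assumption: drop anyone missing either data point
            complete.append((cur, ug))
    by_current = Counter(c for c, _ in complete).most_common(top_n)
    by_undergrad = Counter(u for _, u in complete).most_common(top_n)
    return by_current, by_undergrad, len(records) - len(complete)


by_current, by_undergrad, dropped = tally(researchers)
print(by_current, by_undergrad, dropped)
```

Note that the first record is counted under the company's headquarters country rather than the country of the office where the researcher works, mirroring the affiliation rule in the note above.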

Third, we examined the flows over time of a Chinese AI talent pool: Microsoft Research Asia (MSRA) Fellowship awardees, a cohort that includes some of the most promising Chinese AI researchers. MSRA fellows are current PhD students at universities in Asia who receive both a scholarship for their studies and an internship opportunity at MSRA in Beijing. Past MSRA fellows have gone on to build China’s top AI startups (e.g. Xu Li, Co-Founder and CEO of SenseTime; Chang Huang, Co-Founder and VP of Horizon Robotics), lead AI research at China’s top tech giants and universities (e.g. Changhu Wang, director of ByteDance AI lab; Xuanzhe Liu, Associate Professor at Peking University), and take on leading roles at US tech giants (e.g. Chongyang Ma, senior research engineer at Snap; Wei Wu, Principal Applied Scientist Lead in Microsoft XiaoIce Team).

While MSRA fellows include PhD students from across the Asia-Pacific region, including Singapore and Australia, we focus on the career trajectories of the 2009-2010 cohort of MSRA fellows who received PhD funding at universities based in China, Taiwan, and Hong Kong. Our sample includes 33 of the 35 MSRA fellows in this cohort, based on publicly available data. To highlight the flows of AI talent, we selected three data points: 1) the location where they received their PhD (all were students in China, Taiwan, and Hong Kong); 2) the location of their first job after completing the PhD; and 3) the location of their current job.

Some MSRA fellows work for US companies in China (e.g. Xulian Peng works for Microsoft in Beijing) or, conversely, work for Chinese companies while based in the United States (e.g. Yu-Chen Sun works for Alibaba but is based out of Seattle). For the purpose of the visualization, we used each fellow’s current work location rather than the location of their company’s headquarters. A list of past MSRA Fellows can be found here.
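The three data points per fellow lend themselves to a simple count of transitions between stages. The sketch below is illustrative only: the records, field names, and location labels are hypothetical, and locations are where the person works, not where the employer is headquartered.

```python
from collections import Counter

# Hypothetical fellows; locations are work locations, not company headquarters.
fellows = [
    {"phd": "Location P", "first_job": "Location P", "current_job": "Location Q"},
    {"phd": "Location P", "first_job": "Location Q", "current_job": "Location Q"},
    {"phd": "Location P", "first_job": "Location P", "current_job": "Location P"},
]


def talent_flows(records):
    """Count (origin, destination) pairs for each of the two career transitions."""
    phd_to_first = Counter((r["phd"], r["first_job"]) for r in records)
    first_to_current = Counter((r["first_job"], r["current_job"]) for r in records)
    return phd_to_first, first_to_current


phd_to_first, first_to_current = talent_flows(fellows)
for (src, dst), n in phd_to_first.items():
    print(f"PhD in {src} -> first job in {dst}: {n}")
for (src, dst), n in first_to_current.items():
    print(f"first job in {src} -> current job in {dst}: {n}")
```

Pair counts of this kind are the usual input for a flow diagram, with one band per (origin, destination) pair at each stage.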

The Companies

Information presented in this section was collected from commercial databases, media coverage, industry reports, corporate information, and regulatory filings, drawing on both Chinese and English sources. By sorting through this information, we curated a list of AI companies, including both industry giants and leading startups, and highlighted major investors in these companies. While not an exhaustive list of all the important players, this sampling of companies constitutes a core part of the corporate AI ecosystem in China.

Among AI startups, we further tagged the companies according to their leading application areas: autonomous vehicles, facial recognition, voice and speech, robotics and automation, semiconductors, healthcare, and fintech and business intelligence. For each application area, we selected 3-5 major startups. The relationship map connecting these companies aims to reflect the dynamics of interconnectedness and competition in China’s AI scene.

The Plan

The cities and projects highlighted in this section represent a small sample of AI-related projects deployed in many of China’s largest cities and provincial capitals. The projects highlighted here are not a complete list of AI efforts in these cities, or even the most high-profile projects in each place. Instead, they were chosen to represent the diversity of initiatives currently underway across the country. The information presented is drawn from local government AI promotion plans, public documents on government tenders and procurements, news reports, and interviews.

We have coded the projects according to four categories: Subsidies, Surveillance, Infrastructure Adaptation, and Education. There is some overlap between these categories (some subsidies purchase surveillance equipment, or go toward corporate research), but we believe categorizing the projects sheds light on the different ways the plan is being implemented, and the relative distribution of these activities.

Surveillance technology—primarily, facial recognition systems—is the single most common application of AI being implemented by local governments today, in part because there already exist hundreds of ready-to-order products that meet local police demand for monitoring the population. The Subsidies category includes projects offering a financial reward for certain AI activities (often in the form of cash rebates or free land use), or direct government investment in a project.

In the Infrastructure Adaptation category, we’ve included projects in which local institutions either alter public infrastructure to accommodate AI applications (for example, setting aside land for autonomous vehicle testing), or procure AI products for use at public institutions (such as buying AI speech platforms for courtrooms). Finally, the Education category includes the establishment of new AI-related programs or facilities at Chinese universities.
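Because the four categories can overlap, one way to represent the coding is to attach a set of categories to each project and then count how often each category appears. The sketch below assumes that representation. The project records are hypothetical, and only the four category names come from the text above.

```python
from collections import Counter

CATEGORIES = ("Subsidies", "Surveillance", "Infrastructure Adaptation", "Education")

# Hypothetical coded projects; a project may carry more than one category.
projects = [
    {"city": "City 1", "categories": {"Surveillance"}},
    {"city": "City 2", "categories": {"Subsidies", "Surveillance"}},
    {"city": "City 3", "categories": {"Education"}},
    {"city": "City 4", "categories": {"Infrastructure Adaptation"}},
]


def category_distribution(records):
    """Return {category: (project_count, share_of_projects)} for the four categories."""
    counts = Counter()
    for r in records:
        unknown = set(r["categories"]) - set(CATEGORIES)
        if unknown:
            raise ValueError(f"unrecognized categories: {sorted(unknown)}")
        counts.update(set(r["categories"]))
    total = len(records)
    return {c: (counts[c], counts[c] / total if total else 0.0) for c in CATEGORIES}


distribution = category_distribution(projects)
for category, (n, share) in distribution.items():
    print(f"{category}: {n} projects ({share:.0%})")
```

Since a single project can fall under several categories, the shares can sum to more than 100%.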

Transparency and Data Sharing

In the spirit of transparency and encouraging further research in this space, we are sharing much of the data that we gathered in the process of creating ChinAI. Certain areas of the dataset have been omitted or coded in order to protect individual privacy. You can access the public repository through our GitHub profile here. Use of this data is limited to non-commercial research purposes. MacroPolo reserves the right to modify its data-sharing settings.