Big Data and Social Science
Big Data and Social Science: A Practical Guide to Methods and Tools,
Ian Foster, Rayid Ghani, Ron S. Jarmin, Frauke Kreuter, and Julia Lane (editors),
Chapman and Hall/CRC Press, 2016. $49.95 hardback ISBN 9781498751407.

Download the book flyer and use the included promo code to save 20% when you order from CRC Press.

Background

The world has changed for empirical social scientists. The new types of “big data” have generated an entire new research field—that of data science. That world is dominated by computer scientists who have generated new ways of creating and collecting data, developed new analytical and statistical techniques, and provided new ways of visualizing and presenting information. These new sources of data and techniques have the potential to transform the way applied social science is done.

Research has certainly changed. Researchers draw on data that are “found” rather than “made” by federal agencies; those publishing in leading academic journals are much less likely today to draw on preprocessed survey data.

The way in which data are used has also changed for both government agencies and businesses. Chief data officers are becoming as common in federal and state governments as chief economists were decades ago, and in cities like New York and Chicago, mayoral offices of data analytics have the ability to provide rapid answers to important policy questions. But since federal, state, and local agencies lack the capacity to do such analysis themselves, they must make these data available either to consultants or to the research community. Businesses are also learning that making effective use of their data assets can have an impact on their bottom line.

And the jobs have changed. The new job title of “data scientist” is highlighted in job advertisements on CareerBuilder.com and Burning-glass.com—in the same category as statisticians, economists, and other quantitative social scientists if starting salaries are useful indicators.

The goal of this book is to provide social scientists with an understanding of the key elements of this new science, its value, and the opportunities for doing better work. The goal is also to identify the many ways in which the analytical toolkits possessed by social scientists can be brought to bear to enhance the generalizability of the work done by computer scientists.

We take a pragmatic approach, drawing on our experience of working with data. Most social scientists set out to solve a real- world social or economic problem: they frame the problem, identify the data, do the analysis, and then draw inferences. At all points, of course, the social scientist needs to consider the ethical ramifications of their work, particularly respecting privacy and confidentiality. The book follows the same structure. We chose a particular problem—the link between research investments and innovation— because that is a major social science policy issue, and one in which social scientists have been addressing using big data techniques. While the example is specific and intended to show how abstract concepts apply in practice, the approach is completely generalizable. The web scraping, linkage, classification, and text analysis methods on display here are canonical in nature. The inference and privacy and confidentiality issues are no different than in any other study involving human subjects, and the communication of results through visualization is similarly generalizable.