Adapting Social Science to a Data-Driven World

Data science is an inherently social tool; let’s use it as such.

Emily Burns
The Startup


Photo by h heyerlein on Unsplash

You’ve heard it before: this is the era of data. The convergence of big data and technology is evident both in the boom of data science roles across the job market and in companies’ growing ability to tap into human-tech interactions and online behavior.

This transition into the data era isn’t exclusive to Silicon Valley and its tech gods. Data is everywhere. The demand for statistics-savvy employees and machine learning specialists cuts across fields, touching everything from oncology research to the auto industry.

Too General a Field

As the reach of data science expands, the call for specialization is growing louder. The catch-all ‘data scientist’ title no longer aptly describes the increasingly diverse people holding the position. Let’s take a look at a few common expectations of the role:

  • Interdisciplinary knowledge of statistics, computer science, mathematics, and business
  • Ability to turn machine learning output into actionable insight that delivers business or academic value
  • Familiarity with subtopics such as big data, data mining, data visualization, and specific domain knowledge

Sounds a bit broad, doesn’t it? This generality can breed confusion about role requirements, unrealistic and overwhelming training demands, and unfocused problem solving in the workplace. Explicit data science specializations would allow for more focused work and clearer title expectations. For an excellent breakdown of the topic, see Thomas Nield’s popular post.

Enter Social Data Science

Social data science is one such specialization emerging from the data revolution, and it may sound counterintuitive at first. Data science is driven by computer science and mathematics expertise, while social science fields (think Psychology, Economics, Sociology, Public Health, etc.) have historically been associated with “soft” skills focused on understanding human behavior and societal trends. But although data science and social science seem to have opposing methodologies, we have to remember that data itself is inherently social. Much of it is the product of measuring human behavior, the central theme of social science. These fields are not mutually exclusive.

Social Data Science: Data science for social phenomena. Photo by Brooke Cagle on Unsplash

Throughout the data revolution, the first question asked has often been what data science can do for social science. Social data science asks the follow-up question: how can we integrate our knowledge of society, computer science, and statistics into a cohesive field to better interpret the behavioral data all around us? This specialization recognizes that to fully understand the complexity of a technological society, we must likewise capitalize on the evolving, interdisciplinary methods now at our disposal.

“…Social data science offers a new science of studying social data combining theories, concepts, and methods from the social sciences and the computational sciences.” — University of Oxford, Social Data Science MSc

Social data science is not just an idealistic hypothetical. Students can complete undergraduate, graduate, or postdoctoral work in the specialization at academic institutions such as the University of Copenhagen and the University of Helsinki. In the States, social data science also occasionally moonlights as computational social science. These curricula interweave quantitative and qualitative methods, applying social theory to machine learning models that predict complex human behavior, health outcomes, and social hierarchies.

A Revolution Still in Progress

Despite the innovative work and positive impact of the pioneers these interdisciplinary environments cultivate, social data science has yet to reach mainstream status at many universities. Indeed, even the preceding question (what can data science do for social science?) still lacks a full answer. Mathematics and computational modeling are often dreaded electives in social science curricula. Even basic data analytic skills such as programming fluency, effective pipelining, and data transformation standards are not as widespread among students and research staff as one would hope.

Why is this lack of integration occurring if data is irrevocably intertwined with the social world?

The social sciences can be slow to change: the complications of working with human subjects and a lingering nostalgia for abstract, non-operationalizable ‘Grand Theory’ sometimes leave the field behind in a constantly evolving technological world. And in fields founded on human-to-human interaction, where classically “soft” skills are exalted, technology can feel like a competing interest to newcomers with less technical savvy or enthusiasm.

In a nutshell, technical people tend to pursue technical degrees; social people tend to pursue social degrees. Interdisciplinary unicorns (like data scientists) actively fostered in traditionally separate fields are still, well, unicorns. Being two things at once isn’t easy.

Okay, but does this lack of integration actually have a negative impact on the social sciences?

The short answer is yes. In my own ongoing time in the social sciences, it hasn’t been uncommon to see hypotheses built and tested on data that had already been analyzed, while advanced statistics and good data practices are relegated to a handful of applied courses or learned on the fly to keep an in-progress study moving. This bottom-up approach of formulating hypotheses in hindsight invites poor reproducibility and generalizability (and isn’t exactly statistically sound): a pattern first spotted in a dataset will naturally look confirmed when it is tested on that same dataset. Take a look at Cassie Kozyrkov’s helpful post for more insight here.
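To make the “hindsight hypothesis” problem concrete, here is a minimal Python sketch. It illustrates the general idea rather than any method from a particular study, and it assumes a hypothetical study in which 20 measures and an outcome are all pure random noise, so there is no real effect to find. One half of the participants is used to pick whichever measure looks most strongly correlated with the outcome; that single hypothesis is then tested on the untouched second half. The function names `pearson_r` and `hindsight_vs_holdout` and all parameter values are illustrative assumptions.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def hindsight_vs_holdout(n_participants=200, n_measures=20, seed=0):
    """Pick the 'best' measure on one half of pure-noise data, then re-test it on the other half."""
    rng = random.Random(seed)
    # Hypothetical data: every measure and the outcome are independent noise,
    # so any correlation we find is a false pattern.
    measures = [[rng.gauss(0, 1) for _ in range(n_participants)] for _ in range(n_measures)]
    outcome = [rng.gauss(0, 1) for _ in range(n_participants)]
    half = n_participants // 2

    # Exploration: browse the data and keep whichever measure looks strongest.
    explore = [pearson_r(m[:half], outcome[:half]) for m in measures]
    best = max(range(n_measures), key=lambda i: abs(explore[i]))

    # Confirmation: test that single, pre-chosen hypothesis on data it has never seen.
    confirm = pearson_r(measures[best][half:], outcome[half:])
    return best, explore[best], confirm


if __name__ == "__main__":
    best, r_explore, r_confirm = hindsight_vs_holdout()
    print(f"measure {best}: exploration r = {r_explore:.2f}, held-out r = {r_confirm:.2f}")
```

Across repeated runs, the correlation that looked most promising during exploration tends to shrink toward zero on the held-out half, which is what we would expect when the “finding” was noise all along. Splitting data into exploratory and confirmatory sets (or preregistering hypotheses before looking at the data) is one common safeguard.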

Remember, even if we think we have an inherent understanding of psychological phenomena because we ourselves are human, it doesn’t mean we’re utilizing the right tools to measure them. And we can’t know we’re using the wrong tools if we’re never exposed to the right ones to begin with.

“Human beings are poor examiners, subject to superstition, bias, prejudice, and a PROFOUND tendency to see what they want to see rather than what is really there.” — M. Scott Peck

Without tools that minimize bias, our susceptibility to drawing conclusions where there are none (a tendency known as apophenia) is further compounded. This is bad news for anyone analyzing random, uncorrelated data, and even worse news for those in Psychology (or any other social science field), where study findings can shape public policy and societal norms, accurate or not.
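Apophenia is easy to demonstrate. The sketch below is a hypothetical simulation using only Python’s standard library: it generates 40 variables of pure noise for 30 imaginary participants and counts how many of the 780 pairwise correlations clear an approximate p < .05 threshold (|r| of roughly 0.36, derived here with the Fisher z approximation rather than an exact t-test). The variable count, sample size, and function names are illustrative assumptions.

```python
import itertools
import math
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def count_spurious_correlations(n_vars=40, n_obs=30, seed=0):
    """Count 'significant' pairwise correlations among variables that are pure noise."""
    rng = random.Random(seed)
    # Each variable is independent standard-normal noise: no true relationships exist.
    data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    # Approximate two-tailed p < .05 cutoff for |r| via the Fisher z transformation.
    r_crit = math.tanh(1.96 / math.sqrt(n_obs - 3))
    pairs = list(itertools.combinations(range(n_vars), 2))
    hits = sum(1 for i, j in pairs if abs(pearson_r(data[i], data[j])) > r_crit)
    return hits, len(pairs)


if __name__ == "__main__":
    hits, total = count_spurious_correlations()
    print(f"{hits} of {total} noise-only correlations look 'significant'")
```

Since none of these variables are related, every “significant” pair is a false positive; at a 5% threshold we would expect around 39 of the 780 comparisons to look meaningful by chance alone. Correcting for multiple comparisons (for example, with a Bonferroni adjustment) is one standard way to rein this in.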

So… what does this all mean?

As social scientists with a direct line to societal impact, we shouldn’t be satisfied with “good enough” methodology. Integrating data science means pursuing better statistics and, consequently, better answers. It’s time social fields fully embrace the data revolution and hold those within them to data science standards of study design and testing. In a global economy on the fast track of technological change and cultural inclusivity, we can’t afford to interpret current human behavior with outdated analytics and stagnating models alone.

Apophenia: beware patterns of noise. Photo by Kyre Song on Unsplash

This is not to say that integrating data science is or will be the end-all algorithm for predicting human behavior or sidestepping biased results. There are still many evolutions, tech or otherwise, to be had. Nor is it to discount those interested in social phenomena outside of research, where data science principles don’t necessarily factor into day-to-day work. It is, however, to emphasize the need for a concretely defined, data-driven approach to social modeling: one that keeps pace with the growth other fields are already capitalizing on, and that clearly distinguishes academic research from sheer intuition.

For Those Already in the Social Science Field

If you’re hesitant to take on another area of study, know that exposure to data science doesn’t have to mean exhaustive training. As discussed, there is a difference between working within the social data science field and using data science principles to improve social science.

To take advantage of the growing range of statistical models and data sources without pursuing a data science degree, consider the following:

Establish your network

  • Actively participate in statistics- and computer science-focused courses, conventions, and workshops
  • Seek exposure to interdisciplinary, data science-driven environments in both work and study spaces
  • Connect with others already in the field at local data science meetups or on online platforms like LinkedIn

Establish sound data pipelines

  • As the age-old saying goes, “Garbage in, garbage out.” Fancier models don’t mean much if data sources aren’t properly handled
  • If your team works with a statistician, establish clear communication and expectations between project contributors
  • Don’t gloss over basic data analytic skills like data cleaning and transformation; just because they don’t come with the hot “data science” label doesn’t mean they aren’t crucial
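As a small example of what unglamorous data cleaning can look like, here is a Python sketch using only the standard library. The column names (`participant_id`, `age`, `condition`), the plausible age range, and the sample rows are all hypothetical. The point is the pattern: trim stray whitespace, normalize inconsistent labels, reject missing or impossible values, and drop duplicate entries before any modeling happens.

```python
import csv
import io

# Hypothetical raw survey export with typical problems: stray spaces,
# inconsistent capitalization, missing and non-numeric ages, an impossible age,
# and a duplicated participant.
RAW = """participant_id,age,condition
p01, 24 ,Control
p02,,treatment
p03,thirty,Treatment
p01, 24 ,Control
p04,131,control
p05,19, TREATMENT
"""


def clean_rows(raw_csv, min_age=18, max_age=100):
    """Return cleaned participant records plus a count of dropped rows."""
    seen_ids = set()
    cleaned, dropped = [], 0
    for row in csv.DictReader(io.StringIO(raw_csv)):
        pid = row["participant_id"].strip()
        condition = row["condition"].strip().lower()  # "Control" and " TREATMENT" -> lowercase labels
        try:
            age = int(row["age"].strip())
        except ValueError:
            dropped += 1  # missing or non-numeric age
            continue
        if not (min_age <= age <= max_age) or pid in seen_ids:
            dropped += 1  # implausible value or duplicate participant
            continue
        seen_ids.add(pid)
        cleaned.append({"participant_id": pid, "age": age, "condition": condition})
    return cleaned, dropped


if __name__ == "__main__":
    rows, n_dropped = clean_rows(RAW)
    print(rows)
    print(f"dropped {n_dropped} rows")
```

For larger datasets, a library such as pandas would make this more convenient, but the checks themselves stay the same. Reporting how many rows were dropped (here, four of six) is worth the effort, since silently discarding data can bias results too.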

The data era has also ushered in a multitude of data science boot camps, MOOCs, and online tutorials, most of which are digestible and affordable for those seeking self-study beyond what their workplace offers. Functional learning has never been easier. If your immediate environment is lacking in data science resources, there are many ways to build your own. For an initial taste of what these resources offer specific to social science, you can start here, here, and here.

In Conclusion

Our technological world will continue to evolve with or without us. Staying relevant and accurate depends on a lifelong learner’s mindset, no matter the field. Social data science, along with the incorporation of novel algorithms and data resources into the social sciences more broadly, offers adaptive approaches to better model and test our ever-changing societal structures. With such advances, we can capitalize on the unquestionable opportunities of the current era and recognize data for the social instrument it is.


Data Analyst with a cognitive neuropsychology background and an interest in UX 👾 | AGI and new media art enthusiast 🧠🌱| https://www.linkedin.com/in/emiburns/