Seminars

Since the Spring 2023 semester, we have hosted brownbag seminars with the goal of bringing together scholars and graduate students from many disciplines. We welcome any scholar or researcher with an interest in computational social science methods to attend. Speakers include faculty and graduate students from across the Madison research community, especially from Journalism & Mass Communication, Life Science Communication, Communication Arts, Political Science, and Computer Sciences, as well as researchers from other universities and industry.

Coffee and snacks may be provided for our in-person attendees.

2024 Spring Brownbag Seminars

April 19: Decreased Open-source Innovation by Censorship

  • Speaker: Ouyang Rongxin (Ph.D. student in the Department of Communications and New Media, National University of Singapore)
  • Content: Technical innovations are key drivers of development, among which open-source innovations are increasingly pivotal. However, the effects of censorship on open-source innovations remain largely unexplored, with competing explanations. To estimate these effects, we exploited a natural experiment in 2013 using Difference-in-Differences in conjunction with Synthetic Control on a comprehensive GitHub dataset. Results from two months show that the suspension of GitHub in China significantly hampers open-source software innovation, demonstrating censorship’s capacity to hinder information flow and impede technological progress among even the most tech-savvy communities. These effects are corroborated by suppressed communications, reduced contributions, and several robustness tests. Our study provides the first causal evidence of censorship’s negative impact on innovation and theorizes the mechanism as a plausible explanation for the extension of transnational orders. In this talk, we will share, for the first time, the results obtained from four-year data using an improved design and a comparative analysis.
  • Talk is held on Friday (10:00 am – 11:30 am, CDT, sign up for Zoom invite: link)

April 12: A Turing test of whether AI chatbots are behaviorally similar to humans

  • Speaker: Yutong Xie (Ph.D. candidate in the School of Information, University of Michigan)
  • Content: As AI interacts with humans on an increasing array of tasks, it is important to understand how it behaves. Since much of AI programming is proprietary, developing methods of assessing AI by observing its behaviors is essential. We develop a Turing test to assess the behavioral and personality traits exhibited by AI. Beyond administering a personality test, we have ChatGPT variants play games that are benchmarks for assessing traits: trust, fairness, risk-aversion, altruism, and cooperation. Their behaviors fall within the distribution of behaviors of humans and exhibit patterns consistent with learning. When deviating from mean and modal human behaviors, they are more cooperative and altruistic. This is a step in developing assessments of AI as it increasingly influences human experiences.
  • Talk is held on Friday (11:00 am – 12:30 pm, CDT, Zoom recording available)

2023 Fall Brownbag Seminars

Nov 8: Leveraging Large Language Models for Opinion Prediction in Nationally Representative Surveys

  • Speaker: Junsol Kim (Ph.D. student in the Department of Sociology, University of Chicago)
  • Content: Large language models (LLMs) have begun transforming research practices in social sciences. In this talk, we will highlight the capacity of LLMs to recover “unmeasured public opinions” in nationally representative surveys, focusing on the General Social Survey (GSS). Our study used LLMs to predict missing responses of 68,000 Americans to 3,110 GSS questions from 1972 to 2021. Specifically, we fine-tuned LLMs to predict each survey participant’s missing responses by learning individuals’ latent beliefs as “digital doubles.” The model displayed impressive accuracy in recovering missing opinion trajectories over five decades (AUC=0.86, r=0.98), including the rising support for same-sex marriage. Additionally, we will discuss the limitations of using LLMs for public opinion prediction, including practical constraints, socio-demographic representation, and ethical concerns.
  • Talk is held on Wednesday (10:00 am – 11:30 am, CST)

Nov 13: Crossing Digital Borders: Computational Understandings of Multilingual Cross-Platform Information Flows

  • Speaker: Dr. Yingdan Lu (Assistant Professor of Communication Studies in the School of Communication at Northwestern University)
  • Content: Rapid advancements in technology and globalization have facilitated the flow of information, ideas, and knowledge across national and geographical boundaries. Increasingly, however, countries are imposing barriers on the transnational flow of information. This talk features two papers aimed at understanding whether and how information traverses national borders despite these barriers. The first paper employs a semi-automated system, combining deep learning and human annotation, to reveal the flow of information from the world into Chinese social media during the global spread of Covid-19. The second paper utilizes several large language models to recognize major narratives about the Russo-Ukrainian War on Chinese social media and examine the origins of these narratives across Chinese, Russian, Ukrainian, and U.S. media ecosystems. Both papers provide content-based computational frameworks for identifying multilingual, cross-country, and cross-platform digital communication, shedding light on the consequences of information control for the global information ecosystem.
  • Talk is held on Monday (9:00 am – 10:30 am, CST, Zoom recording: link)

2023 Spring Brownbag Seminars

February 3: Computational Content Analysis for Social Science Research 101

  • Speaker: Zening Duan (Ph.D. student in Mass Communication, UW-Madison)
  • Content: This talk will provide an introduction to the fundamental concepts of content analysis in the computational world, including key steps for designing, implementing, and evaluating a research project. The speaker will also present an overview of the seminar’s content for the spring semester, highlighting topics such as cross-platform data request and validation, social bots, crowdsourcing, and various computational methods including word embeddings, machine learning, computer vision, and network analysis. Attendees can expect to gain a comprehensive understanding of the concepts and practical considerations involved in conducting multimodal content analysis using computational methods.
  • Talk is held on Friday (3:30 pm – 4:30 pm, MCRC Room 5011 + Zoom)
  • Materials (Google doc)

February 17: Fantastic Bots and How to Find Them

  • Speaker: Kai-Cheng Yang (Ph.D. candidate in Informatics, Indiana University Bloomington)
  • Content: Social bots have become an important component of online social media. Deceptive bots, in particular, can manipulate online discussions of critical issues ranging from elections to public health, threatening the constructive exchange of information. In this talk, I will first introduce Botometer, a machine learning-based bot detection tool I built, and show how people can use it in daily life and research. I will then present the malicious bot activities I found, focusing on an example of bots spreading health misinformation during the COVID-19 pandemic. I will also cover my findings on how humans perceive social bots and how bots can serve as research instruments. Finally, I will conclude with an outlook on the future directions in light of the recent advancements of generative machine learning models.
  • Talk is held on Friday (3:30 pm – 5:00 pm, MCRC Room 5011 + Zoom)
  • Materials (Zoom recording)

March 2: Get the Straws for Your Bricks: Implementation and Ethics of Automated Online Data Collection

  • Speaker: Anqi Shao (Ph.D. student in Mass Communication, UW-Madison)
  • Content: The past decade has witnessed a burgeoning empirical literature leveraging big data for social science research. While large-scale datasets have been pushing computational social science approaches forward, the challenge of data availability has become a pressing issue. Given the increasing amount of data available online, automated web scraping has become a valuable method for collecting data for social science research because of its high efficiency and scalability compared to manual collection. However, ethical concerns, such as data privacy and accessibility for research, present impediments to academic research on these online data. The speaker will introduce techniques for automated data collection, from static HTML parsing to using APIs for social media platforms, and address ethical concerns and considerations for communication researchers.
  • Talk is held on Thursday (3:30 pm – 5:00 pm, MCRC Room 5011 + Zoom)
  • Materials (Zoom recording, Python script: Retrieving conversations from Twitter API)

March 24: Computational Modeling of Science Communication through Crowdsourcing and NLP

  • Speaker: Jiaxin Pei (Ph.D. candidate in Information, University of Michigan)
  • Content: Science communication is the process of communicating scientific findings from research articles to general audiences. Recent advances in NLP enable large-scale analysis of news and social media data to answer key research questions in science communication. In this talk, I will introduce two of my previous studies on science communication: (1) how journalists and scientists communicate certainty and uncertainty in scientific findings, and (2) what factors affect the information change in science communication across different types of media. Along with the two studies, I will also introduce how to integrate crowdsourcing and NLP into computational communication research.
  • Talk is held on Friday (3:30 pm – 5:00 pm, Zoom only)
  • Materials (References)

April 7: Measuring Competition in a Large Online Community for Data Science

  • Speaker: Prof. Marlon Twyman (USC Annenberg School for Communication and Journalism)
  • Content: In recent years, competition has served as a core interaction in many online communities. People now typically earn rewards for producing content and engaging with others online. The data science platform, Kaggle, contains a rich online community where people participate in data science challenges and share computational notebooks. This talk describes two studies that demonstrate how community members compete for attention and status in Kaggle. The first study investigates the relationship between a content producer’s status in the community and the reception of their content. The other study illustrates how collaboration and social network positioning contribute to positive performance in data science competitions. These studies employ multiple statistical techniques, including social network analysis and propensity score matching, to generate new insights explaining participation in competitive online communities.
  • Talk is held on Friday (3:30 pm – 5:00 pm, Zoom only)
  • Materials (link will soon be available)

April 14: Ethical and Legal Challenges to Computational Research

  • Speaker: Prof. Josephine Lukito (Moody College of Communication, UT-Austin)
  • Content: The accessibility of big data has made it possible for social scientists to study vast quantities of digital and digitized data from millions of people, potentially shaping online and offline social infrastructure. And yet, researchers are also challenged by the ever-changing policies surrounding digital media research. In this talk, the speaker will highlight key legal and ethical challenges to three steps in the practice of research: data collection, data analysis, and data sharing. This talk will include advice for protecting the researcher and the users being studied, and will highlight new approaches to data collection, analysis, and sharing.
  • Talk is held on Friday (3:30 pm – 5:00 pm, MCRC Room 5011 + Zoom)
  • Materials (link will soon be available)

Vec-tionaries: Development of a word embedding-based optimization approach to extracting moral appeals from text (rescheduled to the fall semester)

  • Speaker: Zening Duan (Ph.D. student in Mass Communication, UW-Madison)
  • Content: Social science researchers often study latent content features within messages, such as moral appeals. Measuring latent features is a non-trivial challenge. For example, human coding cannot easily scale up to process large volumes of messages or reach conventional intercoder reliability even after repeated training. The rise of computational content analysis has popularized dictionaries as a low-cost, quick-to-use measurement strategy; that said, dictionaries suffer from known shortcomings, such as insensitivity to context-specific applications. The speaker will introduce a novel computational tool for measuring latent message features, called VecOpt, that integrates information from validated dictionaries with word embeddings through a nonlinear optimization model. The same approach can be transferred to other latent message features and help advance studies of message influence on communicative processes.