The Value of Content Analyses of Reddit Communities: An Example Regarding Ex-Vanderbilt Nurse RaDonda Vaught
Event Type
Oral Presentations
Time
Thursday, June 9th, 2:00pm - 2:30pm EDT
Reddit is a social news website built around user-submitted content (links, text posts, images, and videos). Communities on Reddit, known as “subreddits,” may offer unique and valuable insight into users’ opinions and self-reported experiences. While any social media platform could offer such insights, we chose Reddit because it encourages anonymity, which typically makes users more willing to discuss sensitive matters. Here, we demonstrate the value of Reddit-derived insights for healthcare-related research through a content analysis of discussion about RaDonda Vaught, an ex-Vanderbilt nurse who was found guilty of negligent homicide after a medication error that resulted in the death of a patient.

Using PRAW
PRAW, the Python Reddit API Wrapper, is a package for interacting with Reddit’s application programming interface (API). While PRAW is accessible to many researchers, it has a few prerequisites: basic Python programming knowledge, familiarity with Reddit, and authentication credentials, which are easily obtained. With PRAW, one can retrieve submissions (also known as posts) from a particular community, sorted in a variety of ways (hot, new, rising, top, controversial, etc.). One can also retrieve the comments on a submission, search for keywords in various communities, and obtain information about Reddit users.
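As a minimal sketch of the workflow just described, the snippet below builds a PRAW client and lists submissions from one community under a chosen sort order. The environment variable names, the user agent string, and the helper names `make_reddit` and `list_submissions` are assumptions made for illustration; they are not taken from the authors' code.

```python
import os

# Sort orders mentioned in the text; each is a listing method on a PRAW subreddit.
SORTS = {"hot", "new", "rising", "top", "controversial"}


def make_reddit():
    """Build a read-only PRAW client.

    The environment variable names and user agent are hypothetical placeholders.
    """
    import praw  # third-party package: pip install praw

    return praw.Reddit(
        client_id=os.environ["REDDIT_CLIENT_ID"],
        client_secret=os.environ["REDDIT_CLIENT_SECRET"],
        user_agent="reddit-content-analysis-demo (by u/your_username)",
    )


def list_submissions(reddit, subreddit_name, sort="hot", limit=10):
    """Return (title, upvotes, comment count) for submissions in one community."""
    if sort not in SORTS:
        raise ValueError(f"unsupported sort order: {sort!r}")
    listing = getattr(reddit.subreddit(subreddit_name), sort)
    return [(s.title, s.score, s.num_comments) for s in listing(limit=limit)]
```

With valid credentials in place, `list_submissions(make_reddit(), "nursing", sort="new", limit=5)` would return the five newest r/nursing submissions.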

Data Collection and Preprocessing
PRAW supports a number of different search strategies. In our example, a keyword search for “RaDonda Vaught” yielded submissions from 25 separate communities on Reddit. Each community had a varying number of related submissions, with comments tied to each one. The most popular community in which RaDonda Vaught was discussed was r/nursing, which describes itself as a forum dedicated to discussing the “topics of concern to the nurses of Reddit.” For each submission, we obtained the number of upvotes, number of comments, time created, post link, and post text. For each comment under a submission, we obtained the number of upvotes, time created, controversial score, whether or not the comment was gilded, the number of reports, the removal reason (if the comment was removed), total awards, flair, and the comment text. Flair indicates how a user chooses to be identified in a given community (e.g., in r/nursing, flairs include RN, BN, MSN, and nursing student, among others).
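The collection step above could look roughly like the following sketch. It assumes the keyword search is run across all of Reddit through the special `all` subreddit, which the abstract does not state, and the function names are hypothetical. The attribute names are the fields Reddit's API has exposed through PRAW for submissions and comments; the less common comment fields are read with a default of `None` in case a field is absent.

```python
def submission_row(submission):
    """Flatten one PRAW submission into a plain dictionary."""
    return {
        "submission_id": submission.id,
        "subreddit": submission.subreddit.display_name,
        "upvotes": submission.score,
        "num_comments": submission.num_comments,
        "created_utc": submission.created_utc,
        "link": submission.url,
        "text": submission.selftext,
    }


def comment_row(comment, submission_id):
    """Flatten one PRAW comment; optional fields fall back to None."""
    return {
        "submission_id": submission_id,
        "upvotes": comment.score,
        "created_utc": comment.created_utc,
        "controversiality": getattr(comment, "controversiality", None),
        "gilded": getattr(comment, "gilded", None),
        "num_reports": getattr(comment, "num_reports", None),
        "removal_reason": getattr(comment, "removal_reason", None),
        "total_awards": getattr(comment, "total_awards_received", None),
        "flair": getattr(comment, "author_flair_text", None),
        "text": comment.body,
    }


def collect_keyword_search(reddit, query, limit=None):
    """Search Reddit for a keyword and gather submission and comment rows.

    Searching r/all is an assumption made for this sketch.
    """
    submissions, comments = [], []
    for submission in reddit.subreddit("all").search(query, limit=limit):
        submissions.append(submission_row(submission))
        # Drop unresolved "load more comments" stubs so only real comments remain.
        submission.comments.replace_more(limit=0)
        for comment in submission.comments.list():
            comments.append(comment_row(comment, submission.id))
    return submissions, comments
```

The number of distinct communities in the result can then be checked with `len({row["subreddit"] for row in submissions})`.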

Expected Results:
We anticipate differences in the way communities discuss RaDonda Vaught. For example, subreddits dedicated to law may differ from subreddits dedicated to nursing and the medical community. We also anticipate that users whose flair indicates they are a nurse will be more sympathetic than users without such a flair. Data analysis is currently underway, and we hope to discuss preliminary results at the conference. However, we believe the main contribution of this presentation to be our methodology.
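One way to set up the flair-based comparison is to bucket comments by flair before any coding of sympathy takes place. The sketch below is only an illustration: the token list is taken from the r/nursing examples mentioned earlier, the group labels and function names are hypothetical, and each row is assumed to be a dictionary with `subreddit` and `flair` keys. It does not measure sympathy itself, which would require a separate qualitative or quantitative coding step.

```python
import re
from collections import Counter

# Hypothetical credential tokens, based on the r/nursing flair examples in the text.
NURSE_TOKENS = {"RN", "BN", "MSN"}


def flair_group(flair):
    """Assign a comment's flair text to a coarse group."""
    if not flair:
        return "no flair"
    tokens = set(re.findall(r"[A-Za-z]+", flair.upper()))
    if "STUDENT" in tokens:
        return "student"
    if tokens & NURSE_TOKENS:
        return "nurse"
    return "other"


def count_by_subreddit_and_group(rows):
    """Count comments per (subreddit, flair group) pair."""
    return Counter((row["subreddit"], flair_group(row.get("flair"))) for row in rows)
```

Matching on whole tokens rather than substrings avoids counting a flair such as "Intern" as a nurse simply because it contains the letters "rn".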

The case of RaDonda Vaught, who was recently found guilty in the death of a patient as a result of a medication error, is a topic of interest to the human factors community. By leveraging social media data from Reddit, we hope to provide rapid insights, access to users' thoughts, opinions, and beliefs that they may consider private and sensitive, and data that would otherwise be difficult to obtain through conventional means. The use of social media for health-related research has grown alongside social media itself, and we wish to emphasize its value and encourage similar lines of research.

Presentation Overview:
During our presentation, we will break down the steps needed to obtain social media data from Reddit. This will include gaining access to Reddit’s API, collecting data, data cleaning, and the qualitative and quantitative methods that can be leveraged to understand and visualize the data.
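As a rough illustration of the data cleaning step, the sketch below drops comments whose text is empty or is one of Reddit's `[deleted]` / `[removed]` placeholders, collapses whitespace, and converts the Unix timestamp to a timezone-aware datetime. The row layout (dictionaries with `text` and `created_utc` keys) and the function name are assumptions for this example, not the authors' pipeline.

```python
from datetime import datetime, timezone

# Placeholder bodies Reddit shows for deleted or moderator-removed comments.
PLACEHOLDERS = {"[deleted]", "[removed]"}


def clean_comments(rows):
    """Return cleaned copies of comment rows, skipping unusable ones."""
    cleaned = []
    for row in rows:
        # Collapse runs of whitespace and newlines into single spaces.
        text = " ".join((row.get("text") or "").split())
        if not text or text in PLACEHOLDERS:
            continue
        created = datetime.fromtimestamp(row["created_utc"], tz=timezone.utc)
        cleaned.append({**row, "text": text, "created": created})
    return cleaned
```

The cleaned rows can then be passed to whichever qualitative coding or quantitative text analysis tools a project uses.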