
We see here that a simple graph is plotted when a user clicks on the "Age Difference" category. It's easy to see how relationship satisfaction changes as the age difference grows along the x-axis. Unfortunately, this visualization only lets users examine one dimension at a time. If a user wanted to see how gender and age difference together affected the average relationship satisfaction rating, this visualization would be ineffective.
New Design
We wanted to create a visualization where users could select any combination of categories. The previous design of plotting the averages of every sub-component no longer worked because it would have cluttered the graph with too many variables. The alternative solution was a single bar graph that displays the average rating for all participants who fit the chosen criteria.

As a user selects criteria across categories, the bar slides back and forth to reflect the change in the average rating. This visualization is effective because the information we care about most is summed up in a single, simple display. There is little for the user to interpret, so the information can be understood quickly and easily. We chose to present the categories as dropdown menus because we wanted the options to stay concise (as opposed to listing every possible option as radio buttons). We also avoided a free-text search bar, since dropdowns restrict input to values that actually exist in our dataset.
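Conceptually, the bar's value is just a filtered average. Below is a minimal sketch of that logic, assuming the data is a parsed CSV and an "Any" option means a category is left unfiltered; the function and column names are illustrative, not our exact code:

```javascript
// Sketch: compute the average satisfaction rating for the rows that
// match the user's dropdown selections. `data` is the parsed CSV and
// `selections` maps a column name to the chosen value, with "Any"
// meaning that category is left unfiltered. Names are illustrative.
function averageSatisfaction(data, selections) {
  const matching = data.filter(row =>
    Object.entries(selections).every(
      ([column, value]) => value === "Any" || row[column] === value
    )
  );
  if (matching.length === 0) return null; // nobody fits the criteria
  const total = matching.reduce((sum, row) => sum + +row.satisfaction, 0);
  return total / matching.length;
}
```

Each dropdown change re-runs this computation and re-renders the bar at the new average.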
Additional Design Choices
During our data exploration, we discovered that many parts of the dataset were incomplete because participants had the option to refuse a question. Additionally, it was difficult to compute meaningful averages for certain combinations of categories because the sample population was not evenly distributed across certain attributes. For example, an overwhelming majority of the participants identified as "white", so there was far less data for the other races. We wanted a way to display this information so that users would know how reliable an average might be.

This design includes stick figures to represent the number of participants who fit the chosen criteria. We chose this technique because it gave a nice visual representation of the accuracy of the data.
However, the actual implementation of the interactive visualization proved much harder in D3 than we had imagined. Instead of using stick figures to represent the population size, we chose to display 400 squares, where each square represents around 10 people. The squares fill up to indicate how many people are represented by the chosen criteria. This design choice also ended up being better than the initial stick-figure one because it lets a user see what percentage of the study population is represented (how many of the 400 squares are filled). Out of the 4,000 total participants, approximately 3,000 answered the relationship satisfaction rating question. Thus, the default display shows three-quarters of the squares filled in, with an average satisfaction rating of 4.46.
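A sketch of how the 400-square grid might be drawn with D3's selection join; the grid dimensions, colors, and the "#population-grid" selector are assumptions for illustration:

```javascript
// Sketch: draw a 20x20 grid of 400 squares, then fill one square for
// every ~10 participants who match the current criteria. Sizes,
// colors, and the "#population-grid" selector are illustrative.
const COLS = 20, SIZE = 14, PAD = 2;
const squares = d3.select("#population-grid")
  .selectAll("rect")
  .data(d3.range(400))
  .join("rect")
  .attr("x", i => (i % COLS) * (SIZE + PAD))
  .attr("y", i => Math.floor(i / COLS) * (SIZE + PAD))
  .attr("width", SIZE)
  .attr("height", SIZE);

function updateGrid(matchingCount) {
  const filled = Math.round(matchingCount / 10); // ~10 people per square
  squares.attr("fill", i => (i < filled ? "steelblue" : "#eee"));
}
```

With 3,000 respondents at roughly 10 people per square, the default view fills 300 of the 400 squares.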
Development Process
Our overall development process was split into four main parts: (1) Data Wrangling, (2) Data Exploration, (3) Visualization Design, and (4) Visualization Implementation.
Data Wrangling
The first part of the development process involved a lot of data wrangling. Our dataset came from the "How Couples Meet and Stay Together" study by Professor Michael J. Rosenfeld of Stanford University, Professor Reuben J. Thomas of City College of New York, and Maja Falcon of Stanford University. The dataset contained an extremely large number of categories (the study spans five waves), so we went through and picked out the ones that were most relevant or interesting. Most of the data was encoded as numeric values, so we had to manually convert these codes to their actual representations (Figure 1). All four of us were involved in this process and spent around 4 hours total on this step.

Figure 1: We used this database to find mappings from the numeric values to their actual representations. In this example of the political party affiliation category, the dataset contained the values 1-3, which represented republican, other, and democrat, respectively.
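In code, this conversion amounts to one lookup table per category. A minimal sketch is below; only the political-party entries come from Figure 1, and the key names are hypothetical:

```javascript
// Sketch: one lookup table per category, mapping the dataset's numeric
// codes to their actual representations. Only the political-party
// entries come from Figure 1; the key names are illustrative.
const CODEBOOK = {
  party_affiliation: { 1: "republican", 2: "other", 3: "democrat" },
  // ...one table for each category we kept
};

// Translate a single coded value, falling back to a label for refusals
// and other missing data.
function decode(category, code) {
  const table = CODEBOOK[category] || {};
  return table[code] !== undefined ? table[code] : "refused/missing";
}
```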
Data Exploration
The next part involved some exploratory analysis of the data we chose in the previous part. Although we had narrowed down our initial dataset in step one, there were still many categories to choose from. We set out to explore which variables most affected the ratings of relationship satisfaction. As a result, the primary question we chose to focus on for this interactive visualization was "How do different qualities affect the ratings of relationship satisfaction?".
We used Tableau to create static visualizations to explore the different variables and their effect on relationship satisfaction (Figure 2.1). The original 'quality of relationship' category rated satisfaction on a 1-5 scale where 1 was 'excellent' and 5 was 'very poor', so we inverted the scale to represent 'excellent' as 5 and 'very poor' as 1. Since we were computing the average quality of relationship for each sub-component of a category, the inverted scale better reflected the computed average (we normally expect a higher average to mean greater satisfaction).
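Concretely, the inversion maps a raw rating r to 6 - r, so 1 becomes 5, 2 becomes 4, and so on. A one-line sketch, with illustrative column names:

```javascript
// Sketch: invert the original 1-5 scale so that 5 now means
// "excellent" and 1 means "very poor". Column names are illustrative.
data.forEach(row => { row.satisfaction = 6 - +row.relationship_quality; });
```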

Figure 2.1: Several variables are charted here. We see the average rating of relationship satisfaction for each sub-component of Household Income, Race/Ethnicity, and Age. The size of each circle represents the sample size for that sub-component (larger circles indicate averages computed from more participants, and therefore more reliable ones). One interesting trend is a steady decrease in relationship satisfaction with age until the mid-50s, after which satisfaction rises steadily to an all-time high at 75+ years.
We plotted a similar graph for several different categories to see whether there was much variance in relationship satisfaction among their sub-components. The ones we ended up selecting were Age, Race/Ethnicity, Gender, Household Income, Religion, Partner's Race, and How Mother/Father Feels About Partner. During our exploration, we also tried combining multiple categories to see how the average ratings would change when multiple factors were taken into account (Figure 2.2, Figure 2.3).

Figure 2.2: This graph plots the average relationship satisfaction by parents' approval. A large number of participants had parents who approved of their partners, and that group reported the highest average rating (4.5817). There is a significant decrease in relationship satisfaction among participants whose parents were either neutral (3.8842) or disapproved (3.6216).

Figure 2.3: We plot the same graph once more with gender included as another variable. We see a similar distribution of ratings by approval, with slight differences between genders. Females tended to report higher relationship satisfaction than males, even when their parents disapproved.
There was a lot of interesting data around the effect of multiple variables on relationship satisfaction, so we chose to focus on that for our interactive visualization. Katherine was the main driver for this part of the process and dedicated around 5 hours to this step.
Visualization Design
This part of the process involved designing the interactive visualization we wanted to present. Refer to the Design Rationale section above for a complete overview of the design process. All four of us were involved, dedicating around 4 hours total to this step (design decisions were made throughout the entire process).
Visualization Implementation
The final step of the process was the implementation itself. Initially, we split off individually to experiment with D3 on our own. Once we had a basic grasp of how D3 worked, we delegated tasks to each team member so we could implement different parts of our visualization. The work was split into figuring out how to (1) display the different categories and dropdown menus, (2) display the bar, (3) calculate an average rating, (4) correctly display the average rating, (5) fill in the bar correctly, and (6) fill in the squares to represent the number of participants who fit the criteria; a sketch of how these pieces fit together follows below.
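Under that task division, the pieces connect through a single update handler. A sketch, assuming the averageSatisfaction and updateGrid helpers from the earlier sketches; all selectors, class names, and scale ranges are illustrative:

```javascript
// Sketch: re-run the whole pipeline whenever any dropdown changes,
// reusing the averageSatisfaction and updateGrid sketches above.
// Selectors, scale ranges, and class names are illustrative.
const x = d3.scaleLinear().domain([1, 5]).range([0, 600]);

d3.selectAll("select.category").on("change", () => {
  // Task (1): read the current value of every dropdown.
  const selections = {};
  d3.selectAll("select.category").each(function () {
    selections[this.name] = this.value;
  });

  // Tasks (2)-(5): recompute the average and slide the bar to it.
  const avg = averageSatisfaction(data, selections);
  d3.select("rect.average-bar")
    .transition().duration(500)
    .attr("width", avg === null ? 0 : x(avg));

  // Task (6): refill the population squares for the new criteria.
  const count = data.filter(row =>
    Object.entries(selections).every(
      ([col, val]) => val === "Any" || row[col] === val
    )
  ).length;
  updateGrid(count);
});
```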
Although much of our time went into the direct implementation of the above tasks, we also spent considerable time importing the data and formatting it correctly so that it would be ready to use, as well as debugging and learning how to use D3 effectively. In total, we spent around 30 hours on this step (although the actual number of person-hours was much greater, since we worked in parallel most of the time).
Summary
Our main roles were as follows: Katherine Choi (Coordinator), Alex Banh (Data Lead), Howard Yang (Tech Lead), Justin Tran (UX Lead). In total, this entire assignment took around 40 hours of elapsed time (over 100 person-hours once each member's contribution is counted), with the implementation taking the most time.