Author: krushton

StreetSavvy

StreetSavvy is a mobile web-based mapping tool that aims to improve the pedestrian experience by helping users make informed decisions about which route to walk.

We created StreetSavvy on the premise that existing mapping tools do not offer enough information relevant to pedestrians who are concerned about safety. In standard Google Maps, users who request walking directions between two locations are presented with a choice of three routes, but Google provides no information to help them choose between these routes aside from the time it takes to get from point A to point B.

That may be enough information for some situations, but as we learned through our research, what is defined as “safe” or “not safe” is extremely specific to the individual and changes frequently depending on the situation. Unfamiliarity with an area, the time of day, the weather, the location of certain businesses, how many people you are with: all of these factors may influence whether a particular situation feels safe or unsafe.

Finding a route home

StreetSavvy was developed as a master’s final project, and it won the 2014 Chen Awards in the “Enhancing User Experience” category.

StreetSavvy team

View Project Report (PDF) or Demo Video

Key Screens

work-streetsavvy-04

work-streetsavvy-05  

 

Visualizing Reddit Data in IPython Notebook

In my last semester at the I School I took an introductory course in data analysis using Python. I was pretty unfamiliar with statistics prior to the course and am still very much an amateur data scientist, but the course gave me just enough skills to be brave (perhaps foolishly so) in the face of an unruly data set.

Since the end of the semester I’ve found myself going back to these tools frequently, because they let me follow up, using public datasets and a few lines of code, on the random questions, curiosities, and whims I get from time to time.

What is r/ProgressPics?

For this post I pulled data from a subreddit called Progress Pics, a place where people who are working on some form of body transformation go to post before-and-after pictures of themselves. I’ve been trying to improve my fitness lately, so I’ve been frequenting the sub on and off for about six months.

It caught my eye as a potential source of interesting analysis because, unlike most subreddits, the community at Progress Pics has a post title format that encourages the use of structured data. Posters who are providing pictures are instructed to include their gender, age, height, start weight, and end weight in the post title, in a format that looks something like: F/23/5’5″ [189lbs > 169lbs = 20lbs].

The more structure you add to a blob of text, the easier it is to understand programmatically, so this data seemed like a good opportunity for analysis.

Method

This analysis was done in IPython Notebook, using the data analysis packages pandas, NumPy, and matplotlib. I used the Reddit API to pull in as many posts from the subreddit as I could before the API complained, and ended up with a dataset of about 1600 posts. The Reddit API provided post metadata including title, number of comments, number of votes, date/time, and more. Then, after some initial data cleanup, I used a series of regular expressions to extract the poster’s gender, age, height, start weight, and end weight from the post title.
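As a rough sketch of what this kind of title parsing could look like: the actual regexes used for the analysis are not shown in this post, so the pattern below, the `parse_title` helper, and the field names are all assumptions made for illustration. It accepts both straight and typographic foot/inch marks and returns `None` when a title does not follow the format.

```python
import re

# Hypothetical pattern for titles like: F/23/5’5″ [189lbs > 169lbs = 20lbs]
# (an illustration only; not the regexes used in the original analysis).
TITLE_RE = re.compile(
    r"(?P<gender>[MF])\s*/\s*"
    r"(?P<age>\d{1,2})\s*/\s*"
    r"(?P<feet>\d)\s*['’′]\s*"
    r'(?P<inches>\d{1,2})?\s*["”″]?'
    r".*?"
    r"(?P<start>\d{2,3}(?:\.\d+)?)\s*(?:lbs?)?\s*>\s*"
    r"(?P<end>\d{2,3}(?:\.\d+)?)\s*(?:lbs?)?",
    re.IGNORECASE,
)


def parse_title(title):
    """Return the structured fields from a post title, or None if it does not match."""
    m = TITLE_RE.search(title)
    if m is None:
        return None
    start, end = float(m["start"]), float(m["end"])
    return {
        "gender": m["gender"].upper(),
        "age": int(m["age"]),
        # Height in inches; a missing inches part is treated as 0.
        "height_in": int(m["feet"]) * 12 + int(m["inches"] or 0),
        "start_lbs": start,
        "end_lbs": end,
        # Pounds change counts both losses and gains.
        "change_lbs": abs(start - end),
    }


print(parse_title("F/23/5’5″ [189lbs > 169lbs = 20lbs]"))
print(parse_title("Just a photo"))
```

Titles that return `None` are the ones that would be dropped from the data set.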

After being filtered through these regexes, about half (863) of the posts had good data for all of these metrics. I dropped the rest, since 863 posts is still a sufficiently large sample.

About the Data Set

So, who are the posters of r/ProgressPics? Some quick facts:

  • Gender: 362 female (42%), 501 male (58%)
  • Age: Average age 24, range 15-54
  • Height: Average height 5’6″, range 4’11″ – 6’8″
  • Pounds change (lost or gained): Average 47 pounds, range 0-215
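Quick facts like these can be computed in a few lines once the titles are parsed. The sketch below uses only the standard library rather than pandas, and the rows are made up for illustration (they are not from the real data set); the field names are assumptions. Rows with a missing value are dropped first, mirroring the filtering step described above.

```python
from collections import Counter
from statistics import mean

# Made-up rows for illustration only; not taken from the real data set.
posts = [
    {"gender": "F", "age": 23, "height_in": 65, "start_lbs": 189, "end_lbs": 169},
    {"gender": "M", "age": 31, "height_in": 72, "start_lbs": 250, "end_lbs": 200},
    {"gender": "M", "age": 19, "height_in": 70, "start_lbs": 140, "end_lbs": 165},
    {"gender": "F", "age": None, "height_in": 64, "start_lbs": 150, "end_lbs": 140},
]


def is_complete(post):
    """True when every field was successfully extracted."""
    return all(value is not None for value in post.values())


def summarize(posts):
    """Drop incomplete rows, then compute counts, averages, and ranges."""
    rows = [p for p in posts if is_complete(p)]
    ages = [p["age"] for p in rows]
    changes = [abs(p["start_lbs"] - p["end_lbs"]) for p in rows]
    genders = Counter(p["gender"] for p in rows)
    return {
        "n": len(rows),
        "gender_share": {g: round(100 * c / len(rows)) for g, c in genders.items()},
        "mean_age": mean(ages),
        "age_range": (min(ages), max(ages)),
        "mean_change": mean(changes),
        "change_range": (min(changes), max(changes)),
    }


summary = summarize(posts)
print(summary)
```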

Descriptive Statistics

Here are some random findings from the data provided by Reddit and scraped from the title:

  1. Almost no one has Reddit Gold:
    gold
  2. About 7% of posts are NSFW. I don’t have a chart of it, but more women than men post NSFW posts.
    nsfw
  3. The vast majority of pictures are posted on Imgur:
    imgur
  4. The age demographic is pretty representative of Reddit as a whole:
    age_histogram
  5. Here we see the height of posters broken out by gender. The huge jump is at 72″, or 6′, which probably indicates some fibbing on the part of the 5’11″ males:
    height
  6. This chart shows a histogram of start and end weight, which really helps visualize the weight lost!
    weight_histogram
  7. For this analysis I was very interested in the influence of gender on voting and commenting behavior. It seemed that female posters get way more votes and comments than male posters, and the data bears this out:
    score_comments_gender
  8. A scatterplot demonstrates the relationship between gender, scores, and pounds lost/gained. While male posters hover around the lower range of scores regardless of pounds lost, and some women fare about the same, a select few women climb out of the fray with 1500+ points. This chart also shows that almost no women report weight gain.
    score_scatterplot

Significant Relationships

What conclusions can we draw from a correlation analysis of this data? With a sample of more than 800 posts, a correlation coefficient doesn’t have to be very large to be statistically significant.
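To make the point about sample size concrete, here is a small sketch that converts a Pearson coefficient r and a sample size n into an approximate two-sided p-value. It relies on the standard t statistic for a correlation, t = r·sqrt((n − 2) / (1 − r²)), and, as a simplifying assumption, treats t as normally distributed, which is reasonable for samples in the hundreds but rough for small ones. The function names are mine, not part of the original notebook.

```python
import math


def correlation_p_value(r, n):
    """Approximate two-sided p-value for a Pearson r from n paired observations.

    Uses a normal approximation to the t distribution, so it is only a
    rough guide for small n.
    """
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))


def min_significant_r(n, z=1.96):
    """Smallest |r| that reaches the given z threshold (1.96 is about p = 0.05)."""
    return z / math.sqrt(n - 2 + z * z)


for n in (30, 863):
    print(n, round(min_significant_r(n), 3), round(correlation_p_value(0.1, n), 4))
```

Under this approximation, a coefficient of roughly 0.07 is already enough to clear the 0.05 threshold with 863 posts, while the same coefficient would be far from significant in a sample of 30.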

Assorted findings from looking at correlations:

  • Unsurprisingly, there is a strong positive correlation between pounds change and score – in general, people who lose more receive more upvotes and more comments.
  • There is also a positive relationship between age and pounds change, possibly because older people have put on more weight over time and thus have more to lose.
  • For men, age is positively correlated with final BMI: older men tend to have a higher final BMI than younger men. There is no such relationship for women.
  • There is a weak positive correlation between age and number of upvotes for men. Older men tend to receive more votes than younger men.
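Findings like the ones above come from computing a correlation separately for each gender. A minimal standard-library sketch of that idea follows; the original analysis used pandas, so this is a stand-in rather than the code that produced the results, and the rows and field names are made up for illustration.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_by_group(rows, group_key, x_key, y_key):
    """Compute Pearson r between x_key and y_key within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append((row[x_key], row[y_key]))
    return {
        group: pearson([x for x, _ in pairs], [y for _, y in pairs])
        for group, pairs in groups.items()
    }


# Made-up rows for illustration only; not taken from the real data set.
rows = [
    {"gender": "M", "age": 20, "final_bmi": 24},
    {"gender": "M", "age": 30, "final_bmi": 27},
    {"gender": "M", "age": 40, "final_bmi": 30},
    {"gender": "F", "age": 20, "final_bmi": 22},
    {"gender": "F", "age": 30, "final_bmi": 24},
    {"gender": "F", "age": 40, "final_bmi": 22},
]

result = correlation_by_group(rows, "gender", "age", "final_bmi")
print(result)
```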

 

Gender, Final BMI, and Score

One particular area worth exploring is the relationship between gender, final BMI, number of comments, and score.

This was ultimately the question that piqued my interest in this analysis. Anecdotally speaking, it seemed that a certain subset of posters were receiving an inordinate number of votes and comments when compared with the number of pounds lost. In other words, it seemed like people were voting based on the current size of the person posting rather than the size of the accomplishment. Furthermore, this effect seemed to be particularly strong when the poster was female.

The data supports these impressions:

  • While there is a weak inverse correlation between final BMI and score for all posters, this relationship is strong for female posters. In other words, as a female poster’s BMI goes down, the score the post receives goes up.
  • When looking at voting behavior by final BMI, there is an interesting pattern: downvotes are highest at the lower end of the bell curve. It turns out that the number of downvotes is also inversely correlated with BMI.
    upvotes_downvotes
  • When we group posters into standard BMI categories, the effects of gender and body size on score are even more striking. For men, posters who are considered normal or overweight receive approximately the same average score. It’s also “better” on r/ProgressPics for a man to be considered obese than underweight. For women, the situation is reversed.
    mean_score_gender
  • A final point of interest is the number of comments a post receives. While the number of comments is related to the score, people also tend to comment on things that they don’t like (“controversial” posts in Reddit land). As the following chart shows, posts by underweight women draw a flurry of comments.
    comments_per_post
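For reference, final BMI can be derived from the end weight and height in the post title, and the standard adult categories use the cut-offs 18.5, 25, and 30. The sketch below shows that calculation and a mean score per gender and category; the helper names and the sample rows are made up for illustration and are not the original notebook code or data.

```python
from collections import defaultdict


def bmi(weight_lbs, height_in):
    """Body mass index from pounds and inches (703 is the imperial conversion factor)."""
    return 703 * weight_lbs / height_in ** 2


def bmi_category(value):
    """Standard adult BMI categories."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal"
    if value < 30:
        return "overweight"
    return "obese"


def mean_score_by_gender_and_category(posts):
    """Average post score for each (gender, BMI category) pair."""
    scores = defaultdict(list)
    for gender, end_lbs, height_in, score in posts:
        key = (gender, bmi_category(bmi(end_lbs, height_in)))
        scores[key].append(score)
    return {key: sum(values) / len(values) for key, values in scores.items()}


# Made-up rows (gender, end weight, height in inches, score); illustration only.
sample = [
    ("F", 120, 66, 900),
    ("F", 125, 66, 700),
    ("M", 200, 70, 150),
]

means = mean_score_by_gender_and_category(sample)
print(round(bmi(169, 65), 1), bmi_category(bmi(169, 65)))
print(means)
```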

 

Data

If you’d like to perform your own analysis of this data, click here to download it as a CSV.

 

 

Fuzzy Logic

Fuzzy Logic is a web-enabled teddy bear that transforms the physical world into an interactive learning environment. This project was a group effort for a course in Interactive Device Design in Berkeley’s CITRIS Invention Lab. For my part of the project I was responsible for designing and developing the Fuzzy Logic Android app, which let the user configure the device and also acted as the communication hub between the bear and the web.

You can read more about our project and process here:

Fuzzy Logic


Domestic Scene 1 and 2

Domestic Scene 1 and 2 was an installation created for Art 133 – Advanced Sculpture. It was displayed in UC Berkeley’s Worth-Ryder Gallery for two weeks in November 2013.

Domestic Scene 1

A cardboard bathroom, where the “mirror” was a 10″ Android tablet showing the feed from its front-facing camera. The tablet ran a custom application that covertly took a photo whenever it detected a face.

Domestic Scene 2

A monitor in the back room of the gallery that displayed the most recent “surveillance” captured by Domestic Scene 1.

domesticscene

 

Memex

In my Art 133 – Advanced Sculpture class we were tasked with creating a self-portrait. I called my submission Memex, after the hypothetical proto-internet Vannevar Bush described in his 1945 article “As We May Think”.

Memex is a visualization of one month of my Chrome browser history. The links between sites are captured by a D3 force-directed graph, which I then cut into wood with a laser cutter to create a 3′ x 3′ wall hanging.
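As a rough idea of how browser history can be turned into graph data, the sketch below builds the nodes-and-links structure commonly fed to a D3 force layout. How links were actually defined for Memex is not described here, so treating two consecutively visited sites as linked is purely an assumption, as are the function name and the example URLs.

```python
from collections import Counter
from urllib.parse import urlparse


def build_graph(urls):
    """Turn an ordered list of visited URLs into nodes and weighted links.

    Assumption for illustration: a link exists between two sites when one
    was visited directly after the other.
    """
    hosts = [urlparse(url).netloc for url in urls]
    edges = Counter()
    for source, target in zip(hosts, hosts[1:]):
        if source != target:  # ignore navigation within the same site
            edges[(source, target)] += 1
    return {
        "nodes": [{"id": host} for host in sorted(set(hosts))],
        "links": [
            {"source": source, "target": target, "value": count}
            for (source, target), count in sorted(edges.items())
        ],
    }


# Placeholder history for illustration only.
history = [
    "https://a.example/x",
    "https://b.example/",
    "https://a.example/y",
    "https://b.example/z",
]

graph = build_graph(history)
print(graph)
```

The resulting dictionary can be written out with `json.dumps` and loaded by the visualization code.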

View Visualization (warning: takes a few seconds to load)

memex (1)

288 Cities

One of the hardest decisions we face is choosing where to live, and often there is simply not enough data available to make an informed decision. To attempt to make this process easier, 288 Cities aggregates facts about the most populous cities in the United States (the top 288, to be precise). Slider controls enable the user to narrow down the results based on economic, geographical, political, and other important factors.
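The slider behavior boils down to keeping only the cities whose values fall inside every selected range. A minimal sketch of that filter follows; the field names, city names, and numbers are all hypothetical placeholders, and the actual site may implement this differently.

```python
def filter_cities(cities, ranges):
    """Keep cities whose value for every field lies within its (low, high) range."""
    return [
        city
        for city in cities
        if all(low <= city[field] <= high for field, (low, high) in ranges.items())
    ]


# Hypothetical records for illustration only.
cities = [
    {"name": "City A", "population": 900_000, "median_rent": 2400},
    {"name": "City B", "population": 250_000, "median_rent": 1100},
    {"name": "City C", "population": 120_000, "median_rent": 950},
]

# Each slider contributes one (low, high) pair.
selected = filter_cities(
    cities,
    {"population": (200_000, 1_000_000), "median_rent": (0, 1500)},
)
print([city["name"] for city in selected])
```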

Key Screens

The default view maps all results across the country
The card view shows one card for each of the results with photos linked from Flickr
The table view shows all of the data in raw form
Clicking on a card or table row drills down to the data for a single city

Gif Cubby

Gif Cubby is a web app for storing links to animated gifs. The site features tagging, searching, user accounts, sharing (to Pinterest and Facebook), and Facebook authentication via OAuth. It also has a companion Chrome extension for rapidly posting gifs to the site from anywhere on the web.

main page

Code Map

Code Map shows job search result counts in 100 U.S. cities for 200+ programming languages. The data was obtained by cross-referencing Wikipedia’s List of Programming Languages with Indeed.com search result counts for the 100 largest U.S. cities. The result is a heatmap of programming language popularity across the country.
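The cross-referencing step amounts to looking up a count for every (language, city) pair and then scaling the counts for display. In the sketch below, `fetch_count` is a hypothetical callable standing in for whatever queried Indeed.com; the stub data, the city names, and the choice to scale each language relative to its busiest city are assumptions for illustration.

```python
def build_counts(languages, cities, fetch_count):
    """Look up a job count for every (language, city) pair."""
    return {
        city: {language: fetch_count(language, city) for language in languages}
        for city in cities
    }


def heat_for_language(counts, language):
    """Scale one language's counts to 0..1 relative to its busiest city."""
    values = {city: by_language[language] for city, by_language in counts.items()}
    peak = max(values.values()) or 1  # avoid dividing by zero when there are no results
    return {city: count / peak for city, count in values.items()}


# Stub data for illustration only; not real search result counts.
FAKE_COUNTS = {
    ("Python", "City A"): 120,
    ("Python", "City B"): 60,
    ("Go", "City A"): 0,
    ("Go", "City B"): 0,
}


def fake_fetch(language, city):
    """Stand-in for a real search query."""
    return FAKE_COUNTS[(language, city)]


counts = build_counts(["Python", "Go"], ["City A", "City B"], fake_fetch)
print(heat_for_language(counts, "Python"))
print(heat_for_language(counts, "Go"))
```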

Code Map

SF Street Art Map

This project was created for a web development lab course. When planning the project, my group wanted to take advantage of one of the many sources of open data made available by the city of San Francisco. We pulled data from several sources into Google Fusion Tables to create the multi-layered SF Street Art Map. Viewers can map the location of graffiti reports and recognized murals along with traffic, transit, and income layers to investigate the (sometimes tenuous) lines between art and crime.

Key Screens

Street art map main page. Map viewers can toggle different layers and visualizations with the filters panel
Clicking one of the mural markers shows a popup image of the mural
Clicking one of the markers shows a popup with more information