This report summarizes work done as part of the Visualizing Large Data Sets PFUG under Rice University's VIGRE program. VIGRE is a program of Vertically Integrated Grants for Research and Education in the Mathematical Sciences under the direction of the National Science Foundation. A PFUG is a group of Postdocs, Faculty, Undergraduates and Graduate students formed around the study of a common problem. This module presents an exploratory analysis of large data sets, specifically data related to the housing crisis.

Introduction

The US housing crisis has undermined the world economy in wide-reaching and poorly understood ways. Although there is a lot of speculation over the causes and the effects of the housing crisis, most of these ideas come from opinionated blogs or news articles that do not list their sources. This lack of data becomes perilous as the US government invests trillions of dollars based on untested hypotheses concerning the crisis. Our PFUG's focus is to compile, clean, and analyze data pertaining to the housing crisis to get a clearer picture of what is actually going on.

Overview and Motivation

Real Estate Bubble : Around 2006, house prices rose far above their true value. Eventually, prices became so high that it was difficult for current owners to afford their houses. As foreclosure rates increased, house prices began to plummet. This has significantly affected the global economy.

Little Public Organized Data : There is a lot of speculation over the causes and the effects of the housing crisis. Unfortunately, most of these ideas come from opinionated blogs or news articles that don’t list their sources. Therefore, it is difficult to collect reliable information.

Government Expenditures : The government has already spent millions of dollars to aid those affected by the housing crisis. With so little public data about the crisis, we are left wondering what data the government is using.

Still Unfolding : It is important to realize that the housing crisis is ongoing. This allows us to track its progression and, hopefully, make predictions for the coming years.

Large Data Sets : The housing crisis serves as an ideal model for visualizing large data sets. Most data sets we collect cover multiple years, counties, and variables.

Problems with Large Data

Hard To Find : All of the data we have collected come from multiple sources. Currently, there is no central repository where data can be found.

Licenses and Fees : Some of the data sets have licenses that do not allow us to reproduce or publish any of our findings. Also, many of the data sets cost large amounts of money to purchase.

Size : Some data sets were as large as 10 GB. To work around this problem, we extracted only the parts of the data sets we needed, without having to download them completely.
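
One general way to pull only part of a very large file is to stream it row by row and stop reading once the needed rows have been collected. The rest of the file is then never read. The sketch below is a hedged illustration of that idea and not the PFUG's actual method. It assumes a CSV source with a header row. The function name `extract_rows`, the column names `state`, `county`, and `year`, and the sample rows are all hypothetical.

```python
import csv
import io
from itertools import islice
from typing import Callable, Iterator, Optional, TextIO


def extract_rows(
    stream: TextIO,
    columns: list[str],
    keep: Callable[[dict[str, str]], bool],
    limit: Optional[int] = None,
) -> list[dict[str, str]]:
    """Hypothetical helper: lazily read a CSV stream, keeping matching rows.

    Rows are parsed one at a time, so memory use stays small even for a
    multi-gigabyte file. If `limit` is given, reading stops as soon as that
    many matching rows have been collected; the rest of the stream is unread.
    """
    reader = csv.DictReader(stream)
    # Generator: filter rows and keep only the requested columns, lazily.
    matches: Iterator[dict[str, str]] = (
        {col: row[col] for col in columns} for row in reader if keep(row)
    )
    # islice(..., None) means "no limit"; otherwise stop after `limit` rows.
    return list(islice(matches, limit))


# Made-up sample standing in for a large file (hypothetical columns).
sample = io.StringIO(
    "state,county,year\n"
    "TX,Harris,2006\n"
    "CA,Alameda,2006\n"
    "TX,Dallas,2007\n"
)
texas = extract_rows(sample, ["county", "year"], lambda r: r["state"] == "TX")
print(texas)
```

For a remote file, the same function could be given an HTTP response from `urllib.request.urlopen` wrapped in `io.TextIOWrapper`. Closing the connection after `limit` rows would then avoid downloading most of the remainder. This assumes the server streams the file rather than requiring a full download first.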

Dirty : Most of the data sets we find are what we call “dirty.” They are usually unorganized and practically unreadable.
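
Much of cleaning such data comes down to turning inconsistent text cells into typed values. As a hedged illustration, the sketch below assumes numeric cells with currency symbols, thousands separators, stray whitespace, accounting-style negatives, and placeholder strings such as `N/A`. The function `clean_number` and the set of missing-value markers are hypothetical and are not taken from the PFUG's data sets.

```python
import math
import re
from typing import Optional

# Hypothetical markers treated as "missing"; each real data set needs its own list.
MISSING = {"", "na", "n/a", "null", "-", "--"}

# Characters stripped before parsing: whitespace, "$", ",", and "%".
# Note: a percent sign is removed, but the value is NOT divided by 100.
_JUNK = re.compile(r"[\s$,%]")


def clean_number(raw: Optional[str]) -> Optional[float]:
    """Convert a messy numeric cell to a float, or None if missing/unparseable."""
    if raw is None:
        return None
    text = raw.strip()
    if text.lower() in MISSING:
        return None
    # Accounting-style negatives, e.g. "(2,500)" -> -2500.0.
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    text = _JUNK.sub("", text)
    try:
        value = float(text)
    except ValueError:
        return None
    # float() accepts "nan" and "inf"; treat those as unparseable too.
    if not math.isfinite(value):
        return None
    return -value if negative else value


cells = [" $1,234 ", "N/A", "(2,500)", "12.5%", "abc", None]
print([clean_number(c) for c in cells])  # [1234.0, None, -2500.0, 12.5, None, None]
```

Returning `None` rather than raising keeps a single bad cell from aborting a pass over millions of rows. The missing cells can then be counted or inspected separately.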

Data Sets

To view our most current data sets and work, please visit our PFUG's website: http://github.com/hadley/data-housing-crisis . Some of our major data sets include...

Source: OpenStax, The art of the pfug. OpenStax CNX. Jun 05, 2013. Download for free at http://cnx.org/content/col10523/1.34