
Making Sense of Census Data With Google Earth

mikemuch writes "Imran Haque has developed a mashup of Google Earth with data from the U.S. Census Bureau, called gCensus. The app uses the XML format known as KML (Keyhole Markup Language), which can create shapes and colors on the maps displayed by GE. Haque had to build custom code libraries (which he's made available as open source) that could generate KML for the project. He also had to extract the relevant data from the highly counter-intuitive Census Bureau files and store them in a database that could handle geographic data. gCensus lets you do stuff like create colorful overlays on maps showing population ages, race, and family size distributions."
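For readers who have not seen KML before, here is a minimal sketch of the kind of output such a library has to produce: one shaded polygon per region, with its fill color derived from a data value. It uses only Python's standard library. The function names (`kml_color`, `ramp`, `choropleth_kml`), the white-to-red ramp, and the example tract are hypothetical illustrations, not part of Haque's open-source libraries.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def kml_color(r, g, b, alpha=0xC0):
    """KML writes colors as aabbggrr hex: alpha first, then blue, green, red."""
    return f"{alpha:02x}{b:02x}{g:02x}{r:02x}"

def ramp(value, lo, hi):
    """Map value onto a white-to-red ramp; values outside [lo, hi] are clamped."""
    t = 0.0 if hi == lo else min(1.0, max(0.0, (value - lo) / (hi - lo)))
    fade = round(255 * (1 - t))  # green and blue fade out as the value rises
    return kml_color(255, fade, fade)

def choropleth_kml(regions, lo, hi):
    """regions: iterable of (name, value, outer_ring), outer_ring = [(lon, lat), ...]."""
    kml = ET.Element("kml", xmlns=KML_NS)
    doc = ET.SubElement(kml, "Document")
    for name, value, ring in regions:
        pm = ET.SubElement(doc, "Placemark")
        ET.SubElement(pm, "name").text = name
        ET.SubElement(pm, "description").text = f"value: {value}"
        poly_style = ET.SubElement(ET.SubElement(pm, "Style"), "PolyStyle")
        ET.SubElement(poly_style, "color").text = ramp(value, lo, hi)
        # A KML LinearRing must be closed: repeat the first vertex at the end.
        closed = list(ring) if ring[0] == ring[-1] else list(ring) + [ring[0]]
        boundary = ET.SubElement(ET.SubElement(pm, "Polygon"), "outerBoundaryIs")
        coords = ET.SubElement(ET.SubElement(boundary, "LinearRing"), "coordinates")
        coords.text = " ".join(f"{lon},{lat},0" for lon, lat in closed)
    return kml

# Hypothetical example: one triangular "tract" shaded by median age.
root = choropleth_kml(
    [("Tract A", 40.0, [(-122.0, 37.0), (-121.9, 37.0), (-121.9, 37.1)])], 20.0, 60.0)
print(ET.tostring(root, encoding="unicode"))
```

Saving that string as a `.kml` file should give something Google Earth can open. Note the byte order in `kml_color`: red comes last, which is an easy thing to get backwards.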
  • Exciting! (Score:3, Interesting)

    by Prysorra ( 1040518 ) on Monday March 12, 2007 @10:31AM (#18316283)
    Imagine what applications you could cook up with this...

    Perhaps there's a way to fuse the presentation possibilities with Gapminder?
    • Re:Exciting! (Score:4, Interesting)

      by the_greywolf ( 311406 ) on Monday March 12, 2007 @12:34PM (#18317823) Homepage

      Incidentally, I worked on a project that used the same data to a very interesting end. (Fortunately, my NDA has expired by now.)

      In our project, we needed to know exactly where, and on which FM channels, we could get away with putting in an FM translator station, and what coverage it would need. We imported the census data into Access (which was surprisingly straightforward), and used the Manifold System initially, and later MapInfo, to visualize the areas we could sneak into without interfering with existing (and future) stations.

      We developed several Access apps that allowed us to build a list of new sites to set up translator stations: over 17,000, if memory serves. In the March 2003 filing, a couple of Perl scripts pushed 4,221 applications to the FCC, nearly all of which were automatically generated.

      Over the course of the next 5 months, we went over each application by hand (again, using our custom software, combined with software originally designed for this purpose), while a couple of us developed our own series of scripts and programs to perform interference studies in MapInfo.

      In the August filing, we withdrew several applications and improved nearly 2,000 others.

      I left in March 2004, but last I heard, they had nearly 2,000 construction permits, and had actually built 5 or 6 of them. A few of the other permits have been sold for upwards of $500,000.

      All in all, I can't say I really approve of their methods, but I can't really say I'm not proud of what we achieved, either.

  • not even m$ (Score:3, Funny)

    by mastershake_phd ( 1050150 ) on Monday March 12, 2007 @10:43AM (#18316427) Homepage
    The Census Bureau has meticulously documented its data files--in a 635 page PDF file.

    Wow, now that's a file format.
    • by fotbr ( 855184 )
      Have you looked up Adobe's documentation on PDF? It makes the Census Bureau's documentation look small.
      • Amen. I had to read the ~1000 page (plus appendices) Adobe PDF Reference document for a recent project. I wonder if any other file format descriptions are as large?
    • Call me paranoid, but it's probably to keep us from doing it again.

  • by skoaldipper ( 752281 ) on Monday March 12, 2007 @10:45AM (#18316471)
    The Census is as important as voting. Special interest groups representing minority organizations work closely with state and local governments when they draw up political districts. What an awesome tool to hold those officials accountable and give other groups a voice: open access for everyone.
    • Census forms can request a lot of demographic data: business type and income, number of rooms in the house, recent remodeling, etc. I've even had follow-up surveys by census employees at my location asking for additional information; I politely refuse. I have zero interest in Google aggregating this information, making it easier to mine the data, and making it easier for intrusive advertisers or the local tax assessor.

    • I see all the major parties of government investing LOTS of time and money in this system so they can more efficiently find and knock on my door. They'll know expected race, income, etc. before ever getting to my block. Now if only someone would combine that with a phone-to-address directory, they could do it all by phone! Recording begins: "Hello Mr voter, we understand you are elderly, white, and poor, and that your wife passed away a few years ago. You seem to have an excellent credit rating and the re
    • I agree that the special interest groups are working with state and local governments to re-district for their own interests. I think that we disagree on what defines a special interest group, though. It's not the minorities or minority groups that scare me -- it's the majorities, such as the Republicans in Texas who redistricted well after the census. Make no mistake, Democrats gerrymander when they have control of the government, but they redistrict at the appropriate time -- after the census results are
  • A lot of this has already been done, although the site hasn't updated since google changed their API's:
    • Re: (Score:2, Informative)

      "A lot of this has already been done, although the site hasn't updated since google changed their API's:"

      He mentions in the article that it uses Google Maps rather than Google Earth, and the API for Google Earth apparently allows more to be done with the visualisation of the data.
    • by ihaque ( 1074767 )
      The sibling poster is right: that is a different site, which uses Google Maps instead of Google Earth. It's not quite as powerful a visualization tool, as Google Maps doesn't let you shade different regions or give them variable heights. --Imran Haque
  • Others (Score:2, Interesting)

    by adickerson0 ( 884626 )
    I tend to use the Free AnalyGIS mash-up for Google Maps. For those who don't know much about AnalyGIS, I suggest checking out their web site. It is a pay service but the best of the bunch IMHO.
    • The census project seems to be a vertically integrated lump. I can't help thinking that leveraging existing FOSS tools would get a more flexible result. Say, load the census data into Postgres and use Geoserver's KML output to apply styles. The automated binning of different tables could be implemented using vendor-specific parameters as supported by Web Map Services (WMS). It should be blazing fast, as Postgres geometries and the Geoserver WMS already have some hefty optimisation built in.
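A minimal sketch of the request side of that idea, assuming a GeoServer instance with the census tables published as a WMS layer: the standard WMS 1.1.1 GetMap parameters, plus GeoServer's KML output format and its `cql_filter` vendor parameter to pick out one bin of a table. The server URL, layer name (`census:tracts`), and attribute (`median_age`) are hypothetical, and the styling itself would still live in the layer's SLD on the server.

```python
from urllib.parse import urlencode

def kml_getmap_url(base_url, layer, bbox, cql_filter=None):
    """Build a WMS 1.1.1 GetMap URL that asks GeoServer for KML instead of an image."""
    params = {
        "service": "WMS",
        "version": "1.1.1",
        "request": "GetMap",
        "layers": layer,
        "styles": "",  # empty = the layer's default style
        "srs": "EPSG:4326",
        "bbox": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "width": 1024,
        "height": 1024,
        "format": "application/vnd.google-earth.kml+xml",
    }
    if cql_filter:
        params["cql_filter"] = cql_filter  # GeoServer vendor parameter, not core WMS
    return base_url + "?" + urlencode(params)

# Hypothetical usage: tracts in the contiguous US with a median age above 40.
url = kml_getmap_url("http://localhost:8080/geoserver/wms", "census:tracts",
                     (-125.0, 24.0, -66.0, 50.0), "median_age > 40")
print(url)
```

Varying only the filter (or another vendor parameter) per request is one way the "automated binning" could be pushed onto the server rather than into custom KML-writing code.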


  • by mac123 ( 25118 )
    We are currently down while we upgrade to Google Maps API version 2. We hope to have this finished soon, thank you.
  • I wonder if when using this mash-up some areas will start showing up as giant sinkholes?
  • by jthill ( 303417 ) on Monday March 12, 2007 @11:18AM (#18316877)

    The real goal of the project is to democratize information by making data (such as political and environmental data) that's currently publicly accessible in name only, truly accessible to the people.

    I'd like to see maps of the disparities between exit-poll and actual vote tally numbers, one map per election. This will make it possible, and not just "possible": once someone has putatively done the work, it'll be easy to check, because the raw data are available from trustworthy sources (cue cynicism in 3) so anyone can redo the map to check for distortions.

    This makes whole classes of questions easier for mere mortals to answer, and simultaneously makes their answers easier for mere mortals to understand. It's huge.

    • by Anonymous Coward
      Read this book and understand how useful GIS and statistics are.
    • by dbrutus ( 71639 )
      Exit polls are wrong, have been wrong for decades, and their wrongness is skewed in a partisan fashion to the detriment of Republicans. Everybody corrects for this wrongness in different ways, which is a great source of uncertainty, because how you correct for the fact that a certain portion of the right won't talk to pollsters (and that portion is significantly larger than its twin on the left) is a black art.

      Polling is too often GIGO.
  • I knew from the headline that the story summary would include the dreaded M word from 2006: mashup. The word that conveys the work and technical details involved is "integration". Yes, it's four syllables. Can you say it? I knew you could.

    When I go to work, I don't make something meant to absorb butter and gravy.

    • Re: (Score:3, Insightful)

      The word that conveys the work and technical details involved is "integration". Yes, it's four syllables.


      Or to put it another way, DAMN YOU KIDS! STAY OFF OF MY LAWN!

      (It actually irritated me too, until I realized that my irritation was a symptom of my "over 30" age, and then promptly got over it)

      • (It actually irritated me too, until I realized that my irritation was a symptom of my "over 30" age, and then promptly got over it)

        This is the second time in about a week I've referenced my age on Slashdot, but...I'm 21, and I think "mashup" is just as moronic, overused, and meaningless as "meme". It's a fad, hopefully, and it will go away in favor of more precise words (like "overlay" in this case), or existing words like "combination" that mean the same damn thing. I have no problem with new useful word

        • Meme is actually a very useful word. It was coined by the Oxford zoologist and neo-Darwinist theorist Richard Dawkins to express the idea that natural selection applies not just to living things but to anything that replicates, in this case ideas. In one chapter of his book "The Selfish Gene" he explores the parallels between the evolution of species and the evolution of ideas: a meme is an idea that replicates in the environment of human culture and language. The term may be misused and overused - b
          • Yes, I agree that "meme" can be useful in its original context, just as every serious field has its own set of jargon. But it certainly has become as meaningless as "thing" or "idea" to the general web-browsing public.
    • by Otto ( 17870 ) on Monday March 12, 2007 @11:38AM (#18317151) Homepage Journal

      When I go to work, I don't make something meant to absorb butter and gravy.

      You might, if you worked in Idaho.

  • I hope that Google is bidding to support automation, data gathering, and analysis for the 2010 census. I'm sure there must be contracts going out now to support the process. So far, I think that Google would be the most competent at managing the volume of data while keeping the granularity.

    Also, with Google's support, I think that census data could be published far more quickly than the 2000 census was.
    • I don't think that there's any bidding for data collection: the Census Bureau is already setting up pilot data collection sites for 2008, and it looks like they're doing all the hiring. I'm not sure if they're allowed to outsource data collection under law.

      As for the process... The process is already pretty automated; I'm not sure if the Census Bureau has used Computer-Assisted Personal Interviewing (CAPI) for the decennial census before, but they're issuing some sort of handheld device to their field
  • I fear that this will enable unscrupulous realtors, etc. to make use of income data and geographic positioning to clearly delineate which areas are "bad" and therefore make further attempts to exploit those areas.
    • Of course it will enable unscrupulous realtors, politicians, decision-makers, and corporations to make unethical use of this tool. At the same time, it will also give the public easy and intuitive access to aspects of everyday life that were conveniently kept hidden from them...
    • I was involved with a guy who wanted to map this type of data years ago. It would have been a lot easier using Google Maps than the proprietary GIS mapping he was using at the time. Actually, the project involved crime data, one of the types of information that home buyers find very interesting. The real estate companies thought it was a bad idea. It overlapped too tightly with income levels and social issues. They were concerned that they would get in trouble for not telling buyers about informat
      • by dbrutus ( 71639 )
        That's only a temporary 'fix' as purposefully degrading and delaying data in this fashion eventually becomes illegal if you do it badly enough, long enough. And on the down side, when a neighborhood goes "good" that delay will hold down housing values in the new "hip" up and coming area. It also will hold down tax assessments.
    • I fear that this will enable unscrupulous realtors, etc. to make use of income data and geographic positioning to clearly delineate which areas are "bad" and therefore make further attempts to exploit those areas.

      Realtors already do that, using proprietary products that produce the same kind of information. (They also can, and sometimes do, provide nice, useful area profiles to customers with the same tools. Information is neither good nor bad; it is the application that can be either...)

      What's new about this

    • by ksheff ( 2406 )
      I would think they would use it to steer their clients away from the "bad" area and give the sellers in the "good" area even more reason to jack their prices up.
  • Given that this uses Google Earth, a 3D tool (unlike Google Maps), it might be useful to be able to do things like play with the elevation to represent different types of data (doable, I'm sure, if not that useful), or alternatively (and more practically) plot bar graphs on the areas they represent: height would represent one value, colour another, and you could even segment the graphs vertically. I did a quick previs of what this might look like, if anyone is interested.
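A minimal sketch of the bar-graph idea in KML, assuming one extruded square column per area. In KML, a `Polygon` with `extrude` set to 1 and `altitudeMode` set to `relativeToGround` is drawn as a prism from the ground up to the polygon's altitude, so height can carry one value and the `PolyStyle` color another. The helper name, the fixed half-width in degrees, and the scaling of 10 m per 100 residents are hypothetical choices for illustration.

```python
import xml.etree.ElementTree as ET

def column_placemark(name, lon, lat, height_m, color, half_width_deg=0.01):
    """One extruded square column centred on (lon, lat).

    height_m carries one variable; color (KML aabbggrr hex) carries another.
    """
    pm = ET.Element("Placemark")
    ET.SubElement(pm, "name").text = name
    poly_style = ET.SubElement(ET.SubElement(pm, "Style"), "PolyStyle")
    ET.SubElement(poly_style, "color").text = color
    poly = ET.SubElement(pm, "Polygon")
    ET.SubElement(poly, "extrude").text = "1"  # draw walls down to the ground
    ET.SubElement(poly, "altitudeMode").text = "relativeToGround"
    d = half_width_deg
    corners = [(lon - d, lat - d), (lon + d, lat - d), (lon + d, lat + d),
               (lon - d, lat + d), (lon - d, lat - d)]  # closed ring
    ring = ET.SubElement(ET.SubElement(poly, "outerBoundaryIs"), "LinearRing")
    # Every vertex sits at the column's height; extrusion fills in the sides.
    ET.SubElement(ring, "coordinates").text = " ".join(
        f"{x},{y},{height_m}" for x, y in corners)
    return pm

# Hypothetical usage: 10 m of height per 100 residents, opaque red.
population = 50_000
bar = column_placemark("Tract 1", -122.0, 37.0, population // 10, "ff0000ff")
print(ET.tostring(bar, encoding="unicode"))
```

The placemarks still need to be wrapped in a `kml`/`Document` element before Google Earth will load them.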
  • "The government are very keen on amassing statistics. They collect them, add them, raise them to the nth power, take the cube root and prepare wonderful diagrams. But you must never forget that every one of these figures comes in the first instance from the village watchman, who just puts down what he damn pleases."
    -Sir Josiah Stamp, Inland Revenue Department of England, 1896-1919
    • Actually, this is not at all how it's done anymore. While it still isn't perfect, it's much more scientific and accurate than the quote suggests. Also, I think that quote is actually much older than you indicated in your post. I seem to recall a very similar quote, attributed to someone from the 1600s, posted outside the economics department where I went to grad school.
  • but can it color bewb^H^H^H^Hgender.
  • ... it's just very complicated. The amount of information recorded is mind-boggling and not readily accessible mainly due to its size. Also, many of the types of information recorded need to be handled in certain ways - I've seen many beginning GIS students bungle up census data by misinterpreting Hispanic data and unemployment data because they don't understand how it is tabulated.
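The comment does not say which mistakes it has in mind, so the following assumes two common ones, sketched with made-up numbers for a single hypothetical tract. First, Hispanic origin is asked as a separate question from race, so Hispanics can be of any race and their share must not be added to the race shares. Second, the unemployment rate is taken over the civilian labor force, not the total population. The dictionary keys and function names are hypothetical.

```python
def pct(part, whole):
    return 100.0 * part / whole

# Hypothetical counts for one tract, for illustration only.
TRACT = {
    "total_pop": 5000,
    "race": {"white": 3200, "black": 900, "all_other_races": 900},  # sums to total_pop
    "hispanic_any_race": 1100,  # separate question: overlaps the race counts
    "civilian_labor_force": 2400,
    "unemployed": 180,
}

def race_and_origin_shares(t):
    # Race categories are exhaustive, so their shares sum to 100%...
    if sum(t["race"].values()) != t["total_pop"]:
        raise ValueError("race categories should add up to the total population")
    shares = {name: pct(n, t["total_pop"]) for name, n in t["race"].items()}
    # ...while Hispanic origin is reported alongside them, never added to them.
    shares["hispanic_any_race"] = pct(t["hispanic_any_race"], t["total_pop"])
    return shares

def unemployment_rate(t):
    # Denominator is the civilian labor force, not the total population.
    return pct(t["unemployed"], t["civilian_labor_force"])

print(race_and_origin_shares(TRACT))
print(unemployment_rate(TRACT))
```

Adding the 22% Hispanic share to the race shares would push the total past 100%, and dividing the 180 unemployed by all 5,000 residents would report 3.6% instead of 7.5%.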
  • From his site, it looks like he's entering data by hand, which is time-consuming, so he has very little of the census data up.
    Seems like it shouldn't be too much of a problem to have something spider the Census site, rip the PDFs off it, parse the data out of those PDFs, and then enter it into whatever format he's using.
    Almost certainly less time-consuming to code something to do that than to enter everything by hand.


    He's also short on space... once again google could come t
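The spidering step, at least, needs nothing beyond the standard library. A minimal sketch, assuming the tables are linked from ordinary HTML index pages: it collects the absolute URLs of `.pdf` links on one page. Fetching the pages (for example with `urllib.request.urlopen`) and extracting tables from the PDFs, which would need a third-party PDF library, are left out. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class PdfLinkCollector(HTMLParser):
    """Collect absolute URLs of <a href="...pdf"> links from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".pdf"):
            self.pdf_links.append(urljoin(self.base_url, href))

def find_pdf_links(html, base_url):
    parser = PdfLinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.pdf_links

# Hypothetical page and base URL, for illustration only.
page = '<a href="tables/ca.pdf">CA</a> <a href="index.html">home</a> <A HREF="/x/NY.PDF">NY</A>'
print(find_pdf_links(page, "https://example.gov/census/"))
```

A real crawler would also need to follow index links, respect the site's robots.txt, and pace its requests.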
  • My company is looking for a passionate developer who wants to work on a very similar project with Google Earth and overlaying data that originates from a Drupal/MySQL install. This project could lead to a product that would leave the developer with residuals on future sales. The development work would be paid for, though on a flat/retainer basis. If you're interested, e-mail me. chris [at] THANKS!
  • gCensus is a great website; it spurred me to create a similar project in Ruby. I'm writing Ruby code to pull Census boundary and data files and parse them into KML files for Google Earth display (height adds a great dimensionality) and Google Maps. I'm posting my progress here.
  • by Randym ( 25779 ) on Wednesday March 14, 2007 @05:51PM (#18354765)
    ...such as using easily accessible GIS to examine voting patterns and election districts to catch gerrymandering and, potentially, election fraud.

    You do realize, don't you, that *every time the political maps are redrawn* (i.e. every ten years) that gerrymandering is extensively used by every local state legislature? Gerrymandering is, after all, the process of redrawing the Districts so as to maximize partisan advantage. You don't need this tool to catch gerrymandering -- it's ubiquitous!

    Rant aside, this app could certainly be a useful tool. The ideal (nonpartisan) political map would be drawn so that the *sum of the sides* of *all* the districts is a *minimum*, while the population of *each* district is within 1% of the average District population size for that state. (The Supreme Court has held that *some* variance in the population of districts is OK; I think that >1%, though, is *too much*.)

    For example, say a state has 6.5 million people and 10 Congressional districts. Then each District must contain 650,000 +/- 1% (i.e. 1% here equals 6,500 people) and the sum of the sides of all the districts together must be a minimum. This leads to roundish districts and no possibility of gerrymandering (which, because of the tortuous way districts are drawn, tends to *maximize* the length of the sides of districts).

    The 'drawback' of this method, of course, is that only population -- and not historical voting patterns -- is taken into account, thus making it impossible to ensure that all the Democrats or Republicans or minorities are concentrated into just a few districts, as is done now by partisan legislatures. On the plus side, this would make more Congressional districts *unsafe* -- Congresspeople would actually have to get out there and *earn* their seats, instead of just sitting back and taking it for granted that their particular seat is safe because of the way that the Districts are drawn.

    However, this scheme is unimplementable at the present time, due to the recent reauthorization of the Voting Rights Act (for the next 25 years), which actually *requires* gerrymandering when it comes to creating so-called "minority majority" Districts. (This ensures that minorities have adequate representation in Congress, rather than having their power diluted by being split among a number of other districts.)

    For example, here in Michigan, blacks constitute about 12% of the population, but are highly concentrated in Detroit. If the above Voting Rights Act stricture was *not* in place, unscrupulous politicians could redraw the Congressional Districts in such a way (for example as long thin areas that had one end in Detroit and the other end outstate) as to ensure that blacks had *no* majority districts, and were a minority in *every* District that they were in -- clearly a violation of the Equal Protection clause of the Constitution.

    So my logical plan for nonpartisan redistricting is -- unfortunately -- unlikely to come to fruition anytime soon -- for Congressional Districts anyway. However, since the case that was adjudicated by the Supreme Court *only* addressed issues of *Federal* redistricting, it might be possible for individual states to implement this plan as a way of making elections more contested and, hence, more democratic.
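The arithmetic in the 6.5-million-person example above can be checked mechanically. A minimal sketch, assuming districts are given as polygons in a planar coordinate system (e.g. kilometres after projection) together with their populations; it only scores a proposed plan and does not search for the minimum-perimeter plan, which is a much harder optimization problem. Function names are hypothetical. Because every interior boundary is shared by two districts, the sum of perimeters equals the fixed state border plus twice the total interior boundary, so minimizing it amounts to minimizing the length of interior boundaries.

```python
import math

def population_bounds(state_pop, n_districts, tolerance=0.01):
    """Allowed population window around the ideal (average) district size."""
    ideal = state_pop / n_districts
    slack = ideal * tolerance
    return ideal - slack, ideal + slack

def perimeter(ring):
    """Perimeter of a closed polygon [(x, y), ...]; the last vertex joins the first."""
    return sum(math.dist(ring[i], ring[(i + 1) % len(ring)]) for i in range(len(ring)))

def score_plan(districts, state_pop, tolerance=0.01):
    """districts: list of (population, ring) pairs.

    Returns (balanced, total_perimeter): whether every district is inside the
    population window, and the sum of all perimeters (lower = more compact).
    """
    lo, hi = population_bounds(state_pop, len(districts), tolerance)
    balanced = all(lo <= pop <= hi for pop, _ in districts)
    total = sum(perimeter(ring) for _, ring in districts)
    return balanced, total

# Worked example from the comment: 6.5 million people, 10 districts, 1% tolerance.
lo, hi = population_bounds(6_500_000, 10)
print(f"each district must hold {lo:,.0f} to {hi:,.0f} people")
# each district must hold 643,500 to 656,500 people
```

For the compactness side, a 1 x 1 square district has perimeter 4, while a 4 x 0.25 sliver of the same area has perimeter 8.5, which is why stretched, gerrymandered shapes score worse.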
