I just got through reading an excellent post by Bret Taylor, ex-Googler and creator of FriendFeed, about the need for open data sets. He makes a compelling argument on how difficult and expensive it is to get any type of meaningful data that can really be used to make interesting web applications. I experienced this first-hand in the creation of eppraisal.com – getting good quality real estate data was not cheap or easy.
I think all of these barriers to data are holding back innovation at a scale that few people realize. The most important part of an environment that encourages innovation is low barriers to entry. The moment a contract and lawyers are involved, you inherently restrict the set of people who can work on a problem to well-funded companies with a profitable product. Likewise, companies that sell data have to protect their investments, so permitted uses for the data are almost always explicitly enumerated in contracts. The entire system is designed to restrict the data to be used in product categories that already exist.
Interesting, but how does this apply to Africa?
Depending on how you look at it, this is a great opportunity or a serious problem. For instance, it’s a problem for us on the Ushahidi project because it is difficult to get some of the detailed mapping data that we need in a usable format. However, if you’re an enterprising businessman, you would realize how much un-digitized data there is in Africa and start creating data sets and licensing them out.
Of course, licensing that data out puts us all in the same quandary that Bret outlines in his post: because the data is not open and free, the barriers to entry are high(er), and only larger organizations with access to a lot of resources can utilize it. A catch-22 if ever there was one.
It only makes sense to give up data, or to collect data and give it away for free, if the relative cost of doing that for each person is minimal. Anytime you need to use a lot of resources to collect data, you deserve to charge a fair market price for it. So, while I’d love to have more free data available, I know that the challenges to getting there are quite steep.
A few sources of open and free data:
Twine – Misc.
OpenStreetMap – Geographical data
Freebase – Open shared database
OpenTick – Financial data
Numbrary – Numbers
DBpedia – Structured data from Wikipedia
Swivel – Misc and nice visuals
Jigsaw – Business contacts
InfoChimps – Misc free data sets
NumberZoom – Phone numbers