
The Internet as software: repurposing APIs for online research


This post was freshly written after another three-day escapade I had the pleasure of making at the Digital Methods Initiative at the University of Amsterdam. The title comes from a remark by Bernhard Rieder stating that the Internet is becoming more and more like a piece of software and less and less like a documentary system. As a consequence, it is becoming less and less user-readable: one needs to dig deeper than HTML pages in order to see the mechanics of the Internet. This remark constitutes the building block of the present blog post, in which I will present APIs as a resource for doing online research. The post is largely based on the paper presentations, talks and workshops at the 2012 DMI winter school, where this year’s sexy theme was interfaces for the cloud.

Let us start with the various conceptions of the Web that were successively put forward to describe it:

  1. The Web as a navigational space: the web is something where you can navigate from one page to another. It has a topology constituted of websites and pages linked together by hyperlinks. As a user, you would surf from one page to another in this document space. This conception of the web triggered various spatial conceptions of the Web as an alternative and mappable space, i.e. cyberspace.
  2. The Web as a platform: that was the big promise of the Web 2.0 age: the web is not only a succession of static pages, but a succession of platforms you can build upon. The emphasis here is on user participation, where one can add pages, create links between pages, add content to platforms, and so on.
  3. The end of the Web? A lot has been said about the decline of the Web: other information distribution models, e.g. based on applications, offer ways to find information or access a website without navigating between pages or starting from a browser.

Another way to look at the end of the Web is by looking at how the hyperlink evolved in its functions and structure. On this matter, Anne Helmond’s presentation clearly described the fact that hyperlinks are no longer manual links: linking as a manual practice, with webmasters connecting one page to another, is no longer at the centre of the stage. On the contrary, a further algorithmisation of the hyperlink is taking place with the increasing presence of social buttons or widgets, which pre-configure hyperlinking practices: if the act of creating a link between two pages (by liking or digging) still keeps its share of user participation, it is encapsulated in applications that take care of the destination of the link. To put it otherwise, borrowing from one of Olivier Ertzscheid’s blog posts: the like will kill the link.

Another way to look at the algorithmisation of the Web is by taking a look at the increasing place of APIs. The invention of the API is not new (Bernhard Rieder, in his archaeology of APIs, pointed at the Google SOAP API); however, early APIs were mostly business-to-business solutions: for this reason they had strong requirements on integrity, which made them not very handy. By contrast, the web APIs developed online today are much lighter and look like “APIs for the masses” compared with those previous versions.

In the same talk, Bernhard laid out various conceptions of APIs and the kinds of research perspective they could bring:

From this perspective, as Esther Weltevrede clearly showed in her presentation (from collaborative work with Noortje Marres), APIs appear as the nice and polite way of doing research, versus the “punkish” way that scraping is. Scraping raises various obvious legal implications, but also technical issues, as detailed by Dick Hall, Infochimps business development manager, interviewed by Audrey Watters: the acceptable use of a website (i.e. the number of times a scraper can visit it) is defined by its Terms of Service, but these differ for every service and may change unilaterally from time to time, making scraping more difficult. With these difficulties in mind, APIs and scraping nonetheless call for new possibilities to reach a real-time sociology.
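To make the contrast concrete, here is a minimal sketch of what the “polite” end of scraping looks like in practice: checking robots.txt and pacing requests so as to stay within a site’s acceptable use. The target URL, user-agent string and delay are placeholders, not a recipe for any particular service.

```python
# A minimal sketch of "polite" scraping: honour robots.txt and
# rate-limit requests. BASE, the user-agent and DELAY are placeholders.
import time
import urllib.robotparser

import requests  # third-party: pip install requests

BASE = "http://example.org"
DELAY = 10  # seconds between requests; tune to the site's Terms of Service

rp = urllib.robotparser.RobotFileParser(BASE + "/robots.txt")
rp.read()

def polite_get(path):
    url = BASE + path
    if not rp.can_fetch("my-research-bot", url):
        raise PermissionError("robots.txt disallows fetching " + url)
    response = requests.get(url, headers={"User-Agent": "my-research-bot"})
    time.sleep(DELAY)  # stay well under the site's tolerated visit rate
    return response.text

html = polite_get("/some/page.html")
```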

If APIs provide cleaner access to data, they are not free of all criticism: as boyd and Crawford note in their paper about big data, the representativeness of the resulting data set is not always easy to know. Getting data from an API does not give you access to the full data set (the “firehose”), but usually to a very small sample of it. For Twitter research, one has the choice between what they call a spritzer (roughly 1% of public tweets) and a gardenhose (roughly 10% of public tweets).
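As an illustration, here is a sketch of how one would have read the spritzer sample circa 2012, when Twitter’s streaming endpoint still accepted basic authentication. Both the URL and the auth scheme have since changed, so treat it purely as a period piece.

```python
# A sketch of reading Twitter's ~1% "spritzer" sample stream as it
# worked circa 2012 (basic-auth streaming endpoint; the URL and auth
# scheme have since changed, so this is illustrative only).
import json

import requests

STREAM_URL = "https://stream.twitter.com/1/statuses/sample.json"

with requests.get(STREAM_URL, auth=("username", "password"), stream=True) as r:
    for line in r.iter_lines():
        if not line:
            continue  # skip keep-alive newlines
        tweet = json.loads(line)
        print(tweet.get("text"))
```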

In parallel with research practices, scraping can be a way of accessing data in cases of data scarcity. For instance, in the wake of the Fukushima power plant accident, numerous developers and hackers started scraping the official monitoring websites, as these were not releasing data in a structured format: scraping was then a way to aggregate information that was distributed across many websites, but also to publish structured data feeds (like the one set up by Marian Steinbach, generating a CSV every 10 minutes from the official SPEEDI monitoring data) that could eventually be used to create maps and other visualizations. If accessing the data does not create an immediate empowerment situation, let us just say that the black box was opened without asking for the key: can we talk about tactical scraping?
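The mechanics of such a feed are simple; here is a toy sketch of the pattern, not Steinbach’s actual pipeline. The URL and the HTML structure are hypothetical stand-ins.

```python
# A sketch of "tactical scraping": periodically pull readings from an
# official monitoring page and append them to a structured CSV feed.
# The URL and table markup are hypothetical.
import csv
import time
from datetime import datetime, timezone

import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4

MONITOR_URL = "http://example.go.jp/monitoring.html"  # hypothetical

def scrape_once():
    soup = BeautifulSoup(requests.get(MONITOR_URL).text, "html.parser")
    rows = []
    for row in soup.select("table.readings tr"):  # hypothetical markup
        cols = [td.get_text(strip=True) for td in row.find_all("td")]
        if cols:
            rows.append([datetime.now(timezone.utc).isoformat()] + cols)
    with open("readings.csv", "a", newline="") as f:
        csv.writer(f).writerows(rows)

while True:
    scrape_once()
    time.sleep(600)  # regenerate the feed every 10 minutes
```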

  • APIs to study the evolution of the Web: this is probably the least common use of APIs: they can act as pieces of evidence while investigating the Web itself. What can APIs tell us about the contemporary web in terms of properties, modus operandi and uses? That was the underlying question behind the various projects of the last DMI winter school.

Some project members took a look under the Facebook API rug: a first project tried to identify possible information gaps in Facebook by cross-checking the FB API with other, more engaged websites like opensecrets.org or factcheck.org; Mitt Romney’s FB page was taken as an example of a cross-sourced FB page. Following the same motivation, a second project tried to see the differences between the elements available in the Facebook application (on the user side) and the data you access through the API (on the developer side), in order to develop various “validity checking” possibilities.
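The developer side of that comparison amounts to a single Graph API call; here is a hedged sketch. Circa 2012 basic public page data needed no access token, and the field names below are an assumption about what one would cross-check.

```python
# A sketch of pulling a public page's developer-side data to compare
# against the rendered page or an external source like opensecrets.org.
# The field list is an assumption; no access token was needed circa 2012.
import requests

page = "mittromney"  # example page slug from the project
api_data = requests.get("https://graph.facebook.com/" + page).json()

for field in ("name", "category", "likes", "website"):
    print(field, "->", api_data.get(field))
```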

Based on Jean Tinguely’s machines, a third project aimed to show APIs for themselves and in all their nakedness: they used IFTTT (standing for If This Then That), the ready-made API platform that promises to “put the internet to work for you”, to let various APIs play with each other and joyfully intertwine. After they set up a profile for Jean Tinguely in various online apps and services, they happily let the APIs talk to each other and create snowball actions: if Twitter message, then (empty) picture on Flickr; if email, then Flickr; and my favorite, if Twitter then Twitter. Funnily enough, according to one of the project members, one of the hardest parts was not adding content but keeping the APIs working among themselves.
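The “if Twitter then Twitter” loop has a very simple shape once you strip away the platform; here is a toy sketch of that shape, where both API calls are hypothetical in-memory stand-ins rather than real Twitter endpoints.

```python
# A toy sketch of the "if twitter then twitter" machine: watch a feed
# and echo each new message back, so trigger and action are the same API.
# fetch_latest() and post_message() are hypothetical stand-ins.
import itertools
import time

feed = itertools.count()  # stand-in for the account's real message feed

def fetch_latest(user):
    # Hypothetical stand-in for reading the newest status via the API.
    n = next(feed)
    return n, f"message {n}"

def post_message(text):
    # Hypothetical stand-in for posting back through the same API.
    print("reposting:", text)

seen = set()
for _ in range(3):  # a few turns of the machine
    status_id, text = fetch_latest("jeantinguely")
    if status_id not in seen:
        seen.add(status_id)
        post_message(text)  # the repost itself becomes the next trigger
    time.sleep(1)
```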

Finally, the last project tried to track the trackers present online in different spheres: to do so, the Firefox plugin Ghostery was repurposed in order to get a list of common online trackers, e.g. widgets, social buttons, analytics, etc. An ad hoc tool was built by Koen Martens and Erik Borra to identify the trackers present in lists of URLs. We tried to compare the trackedness of various spheres, e.g. comparing national spheres, adult entertainment websites vs. children’s entertainment websites, technology blogs vs. news blogs, etc. The results were converted to GDF and visualized with Gephi.
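The core idea is easy to sketch, though this is not Martens and Borra’s tool: scan the HTML of each URL for known tracker signatures and write a site–tracker graph in GDF. The pattern list here is a tiny hypothetical excerpt of what Ghostery’s database provides.

```python
# A sketch of tracker detection: match known tracker signatures in the
# HTML of a list of URLs, then export a site-tracker graph as GDF for
# Gephi. The pattern list is a tiny hypothetical excerpt.
import requests

TRACKER_PATTERNS = {
    "Google Analytics": "google-analytics.com/ga.js",
    "Facebook Connect": "connect.facebook.net",
    "Twitter Button": "platform.twitter.com/widgets.js",
}

urls = ["http://example.com", "http://example.org"]
edges = []
for url in urls:
    html = requests.get(url).text
    for tracker, pattern in TRACKER_PATTERNS.items():
        if pattern in html:
            edges.append((url, tracker))

with open("trackers.gdf", "w") as f:
    f.write("nodedef>name VARCHAR\n")
    for node in {n for edge in edges for n in edge}:
        f.write(node + "\n")
    f.write("edgedef>node1 VARCHAR,node2 VARCHAR\n")
    for a, b in edges:
        f.write(f"{a},{b}\n")
```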

And one last project, added by Bernhard Rieder, who scraped the ProgrammableWeb mashup repository and created this graph: one can see which APIs are most widely used and which ones are combined together. Here again, made with Gephi.
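The graph behind such a visualization is a simple co-occurrence structure; here is a sketch under the assumption that the mashup-to-APIs records have already been scraped (the sample records below are made up).

```python
# A sketch of an API co-occurrence graph: given (hypothetical,
# pre-scraped) mashup -> APIs records, count usage and pairwise
# combinations, then export a weighted GDF for Gephi.
from collections import Counter
from itertools import combinations

mashups = {  # stand-in for the scraped ProgrammableWeb repository
    "mashup-a": ["google-maps", "twitter"],
    "mashup-b": ["google-maps", "flickr"],
    "mashup-c": ["twitter", "flickr", "google-maps"],
}

usage = Counter()
pairs = Counter()
for apis in mashups.values():
    usage.update(apis)                          # node weight: API usage
    pairs.update(combinations(sorted(apis), 2)) # edge weight: co-use

with open("apis.gdf", "w") as f:
    f.write("nodedef>name VARCHAR,weight INTEGER\n")
    for api, n in usage.items():
        f.write(f"{api},{n}\n")
    f.write("edgedef>node1 VARCHAR,node2 VARCHAR,weight INTEGER\n")
    for (a, b), n in pairs.items():
        f.write(f"{a},{b},{n}\n")
```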

 

Update: Anne Helmond, PhD candidate at the University of Amsterdam and member of the Digital Methods Initiative, published her great introduction to API critique based on the presentation she gave at the winter school, as well as a summary of collaborative notes about this workshop.

