
The Internet as software: repurposing APIs for online research


This post was freshly written after another three-day escapade I had the pleasure of making at the Digital Methods Initiative at the University of Amsterdam. The title comes from a remark by Bernhard Rieder stating that the Internet is becoming more and more like a piece of software and less and less like a documentary system. As a consequence, it is becoming less and less user-readable: one needs to dig deeper than HTML pages in order to see the mechanics of the Internet. This remark constitutes the building block of the present blog post, in which I will present APIs as a resource for doing online research. The post is largely based on the paper presentations, talks and workshops at the 2012 DMI winter school, where this year’s sexy theme was interfaces for the cloud.

Let us start with the various conceptions of the Web that were successively put forward to describe it:

  1. The Web as a navigational space: the web is something where you can navigate from one page to another. It has a topology constituted of websites and pages linked together by hyperlinks. As a user, you would surf from one page to another in this document space. This conception of the web triggered various spatial conceptions of the Web as an alternative and mappable space, i.e. cyberspace.
  2. The Web as a platform: that was the big promise of the Web 2.0 age: the web is not only a succession of static pages, but a succession of platforms you can build upon. The emphasis here is on user participation, where one can add pages, create links between pages, add content to platforms, and so on.
  3. The end of the Web? A lot has been said about the decline of the Web: other information distribution models, e.g. based on applications, offer ways to find information or access a website without navigating between pages or starting from a browser.

Another way to look at the end of the Web is by looking at how the hyperlink evolved in its functions and structure. On this matter, Anne Helmond’s presentation clearly described the fact that hyperlinks are no longer manual links: linking as a manual practice, with webmasters connecting one page to another, is no longer at the centre of the stage. On the contrary, a further algorithmisation of the hyperlink is taking place with the increasing presence of social buttons or widgets, which pre-configure hyperlinking practices: if the act of creating a link between two pages (by liking or digging) still keeps its share of user participation, it is encapsulated in applications that take care of the destination of the link. To put it otherwise, borrowing from one of Olivier Ertzscheid’s blog posts: the like will kill the link.

Another way to look at the algorithmisation of the Web is by taking a look at the increasing place of APIs. The invention of the API is not new (Bernhard Rieder, in his archaeology of APIs, pointed at the Google SOAP API); however, early APIs were mostly business-to-business solutions: for this reason they had strong requirements on integrity, which made them not very handy. By contrast, the web APIs developed online today are much lighter and look like “APIs for the masses” compared with those previous versions.

In the same talk, Bernhard laid out various conceptions of APIs and the kinds of research perspective they could bring:

From this perspective, as Esther Weltevrede clearly showed in her presentation (from collaborative work with Noortje Marres), APIs appear as the nice and polite way of doing research, versus the “punkish” way that scraping is. Scraping raises various obvious legal implications, but also technical issues, as detailed by Dick Hall, Infochimps business development manager, interviewed by Audrey Watters: the acceptable use of a website (i.e. the number of times a scraper can visit it) is defined by its Terms of Service, but these differ for every service and may change unilaterally from time to time, making scraping more difficult. With these difficulties in mind, APIs and scraping nonetheless call for new possibilities to reach a real-time sociology.
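To make the contrast concrete, here is a minimal sketch of what the “polite” end of scraping looks like in practice: checking robots.txt and pacing requests so as to stay within a site’s acceptable use. The target URL, user-agent string and delay are placeholders, not a recipe for any particular service.

```python
# A minimal sketch of "polite" scraping: honour robots.txt and
# rate-limit requests. BASE, the user-agent and DELAY are placeholders.
import time
import urllib.robotparser

import requests  # third-party: pip install requests

BASE = "http://example.org"
DELAY = 10  # seconds between requests; tune to the site's Terms of Service

rp = urllib.robotparser.RobotFileParser(BASE + "/robots.txt")
rp.read()

def polite_get(path):
    url = BASE + path
    if not rp.can_fetch("my-research-bot", url):
        raise PermissionError("robots.txt disallows fetching " + url)
    response = requests.get(url, headers={"User-Agent": "my-research-bot"})
    time.sleep(DELAY)  # stay well under the site's tolerated visit rate
    return response.text

html = polite_get("/some/page.html")
```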

If APIs provide cleaner access to data, they are not free of all criticism: as boyd and Crawford note in their paper about big data, the representativeness of the resulting data set is not always easy to know. Getting data from an API does not give you access to the full data set (the “firehose”), but usually to a very small sample of it. For Twitter research, one has the choice between what they call a spritzer (roughly 1% of public tweets) and a gardenhose (roughly 10% of public tweets).
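As an illustration, here is a sketch of how one would have read the spritzer sample circa 2012, when Twitter’s streaming endpoint still accepted basic authentication. Both the URL and the auth scheme have since changed, so treat it purely as a period piece.

```python
# A sketch of reading Twitter's ~1% "spritzer" sample stream as it
# worked circa 2012 (basic-auth streaming endpoint; the URL and auth
# scheme have since changed, so this is illustrative only).
import json

import requests

STREAM_URL = "https://stream.twitter.com/1/statuses/sample.json"

with requests.get(STREAM_URL, auth=("username", "password"), stream=True) as r:
    for line in r.iter_lines():
        if not line:
            continue  # skip keep-alive newlines
        tweet = json.loads(line)
        print(tweet.get("text"))
```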

In parallel with research practices, scraping can be a way of accessing data in cases of data scarcity. For instance, in the wake of the Fukushima power plant accident, numerous developers and hackers started scraping the official monitoring websites, as these were not releasing data in a structured format: scraping was then a way to aggregate information that was distributed across many websites, but also to publish structured data feeds (like the one set up by Marian Steinbach, generating a CSV every 10 minutes from the official SPEEDI monitoring data) that could eventually be used to create maps and other visualizations. If accessing the data does not create an immediate empowerment situation, let us just say that the black box was opened without asking for the key: can we talk about tactical scraping?
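The mechanics of such a feed are simple; here is a toy sketch of the pattern, not Steinbach’s actual pipeline. The URL and the HTML structure are hypothetical stand-ins.

```python
# A sketch of "tactical scraping": periodically pull readings from an
# official monitoring page and append them to a structured CSV feed.
# The URL and table markup are hypothetical.
import csv
import time
from datetime import datetime, timezone

import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4

MONITOR_URL = "http://example.go.jp/monitoring.html"  # hypothetical

def scrape_once():
    soup = BeautifulSoup(requests.get(MONITOR_URL).text, "html.parser")
    rows = []
    for row in soup.select("table.readings tr"):  # hypothetical markup
        cols = [td.get_text(strip=True) for td in row.find_all("td")]
        if cols:
            rows.append([datetime.now(timezone.utc).isoformat()] + cols)
    with open("readings.csv", "a", newline="") as f:
        csv.writer(f).writerows(rows)

while True:
    scrape_once()
    time.sleep(600)  # regenerate the feed every 10 minutes
```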

  • APIs to study the evolution of the Web: this is probably the least common use of APIs: they can act as pieces of evidence while investigating the Web itself. What can APIs tell us about the contemporary web in terms of properties, modus operandi and uses? That was the underlying question behind the various projects of the last DMI winter school.

Some project members took a look under the Facebook API rug: a first project tried to identify possible information gaps in Facebook by cross-checking the FB API with other, more engaged websites like opensecrets.org or factcheck.org; Mitt Romney’s FB page was taken as an example of a cross-sourced FB page. Following the same motivation, a second project tried to see the differences between the elements available in the Facebook application (on the user side) and the data you access through the API (on the developer side), in order to develop various “validity checking” possibilities.
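The developer side of that comparison amounts to a single Graph API call; here is a hedged sketch. Circa 2012 basic public page data needed no access token, and the field names below are an assumption about what one would cross-check.

```python
# A sketch of pulling a public page's developer-side data to compare
# against the rendered page or an external source like opensecrets.org.
# The field list is an assumption; no access token was needed circa 2012.
import requests

page = "mittromney"  # example page slug from the project
api_data = requests.get("https://graph.facebook.com/" + page).json()

for field in ("name", "category", "likes", "website"):
    print(field, "->", api_data.get(field))
```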

Based on Jean Tinguely’s machines, a third project aimed to show APIs for themselves and in all their nakedness: they used IFTTT (standing for If This Then That), the ready-made API platform that promises to “put the internet to work for you”, to let various APIs play with each other and joyfully intertwine. After they set up a profile for Jean Tinguely in various online apps and services, they happily let the APIs talk to each other and create snowball actions: if Twitter message, then (empty) picture on Flickr; if email, then Flickr; and my favorite, if Twitter then Twitter. Funnily enough, according to one of the project members, one of the hardest parts was not adding content but keeping the APIs working among themselves.
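The “if Twitter then Twitter” loop has a very simple shape once you strip away the platform; here is a toy sketch of that shape, where both API calls are hypothetical in-memory stand-ins rather than real Twitter endpoints.

```python
# A toy sketch of the "if twitter then twitter" machine: watch a feed
# and echo each new message back, so trigger and action are the same API.
# fetch_latest() and post_message() are hypothetical stand-ins.
import itertools
import time

feed = itertools.count()  # stand-in for the account's real message feed

def fetch_latest(user):
    # Hypothetical stand-in for reading the newest status via the API.
    n = next(feed)
    return n, f"message {n}"

def post_message(text):
    # Hypothetical stand-in for posting back through the same API.
    print("reposting:", text)

seen = set()
for _ in range(3):  # a few turns of the machine
    status_id, text = fetch_latest("jeantinguely")
    if status_id not in seen:
        seen.add(status_id)
        post_message(text)  # the repost itself becomes the next trigger
    time.sleep(1)
```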

Finally, the last project tried to track the trackers present online in different spheres: to do so, the Firefox plugin Ghostery was repurposed in order to get a list of common online trackers, e.g. widgets, social buttons, analytics, etc. An ad hoc tool was built by Koen Martens and Erik Borra to identify the trackers present in lists of URLs. We tried to compare the trackedness of various spheres, e.g. comparing national spheres, adult entertainment websites vs. children’s entertainment websites, technology blogs vs. news blogs, etc. The results were converted to GDF and visualized with Gephi.
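The core idea is easy to sketch, though this is not Martens and Borra’s tool: scan the HTML of each URL for known tracker signatures and write a site–tracker graph in GDF. The pattern list here is a tiny hypothetical excerpt of what Ghostery’s database provides.

```python
# A sketch of tracker detection: match known tracker signatures in the
# HTML of a list of URLs, then export a site-tracker graph as GDF for
# Gephi. The pattern list is a tiny hypothetical excerpt.
import requests

TRACKER_PATTERNS = {
    "Google Analytics": "google-analytics.com/ga.js",
    "Facebook Connect": "connect.facebook.net",
    "Twitter Button": "platform.twitter.com/widgets.js",
}

urls = ["http://example.com", "http://example.org"]
edges = []
for url in urls:
    html = requests.get(url).text
    for tracker, pattern in TRACKER_PATTERNS.items():
        if pattern in html:
            edges.append((url, tracker))

with open("trackers.gdf", "w") as f:
    f.write("nodedef>name VARCHAR\n")
    for node in {n for edge in edges for n in edge}:
        f.write(node + "\n")
    f.write("edgedef>node1 VARCHAR,node2 VARCHAR\n")
    for a, b in edges:
        f.write(f"{a},{b}\n")
```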

And one last project, added by Bernhard Rieder, who scraped the ProgrammableWeb mashup repository and created this graph: one can see which APIs are most widely used and which ones are combined together. Here again, made with Gephi.
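The graph behind such a visualization is a simple co-occurrence structure; here is a sketch under the assumption that the mashup-to-APIs records have already been scraped (the sample records below are made up).

```python
# A sketch of an API co-occurrence graph: given (hypothetical,
# pre-scraped) mashup -> APIs records, count usage and pairwise
# combinations, then export a weighted GDF for Gephi.
from collections import Counter
from itertools import combinations

mashups = {  # stand-in for the scraped ProgrammableWeb repository
    "mashup-a": ["google-maps", "twitter"],
    "mashup-b": ["google-maps", "flickr"],
    "mashup-c": ["twitter", "flickr", "google-maps"],
}

usage = Counter()
pairs = Counter()
for apis in mashups.values():
    usage.update(apis)                          # node weight: API usage
    pairs.update(combinations(sorted(apis), 2)) # edge weight: co-use

with open("apis.gdf", "w") as f:
    f.write("nodedef>name VARCHAR,weight INTEGER\n")
    for api, n in usage.items():
        f.write(f"{api},{n}\n")
    f.write("edgedef>node1 VARCHAR,node2 VARCHAR,weight INTEGER\n")
    for (a, b), n in pairs.items():
        f.write(f"{a},{b},{n}\n")
```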

 

Update: Anne Helmond, PhD candidate at the University of Amsterdam and member of the Digital Methods Initiative, published her great introduction to API critique based on the presentation she gave at the winter school, as well as a summary of collaborative notes about this workshop.

