http://www.outskirtspress.com/successfulreadingstrategies/

Sunday, February 20, 2011

What is Web 2.0?

Ideas, technologies and implications for education
by Paul Anderson

Executive summary
Within 15 years the Web has grown from a group work tool for scientists at CERN into a global
information space with more than a billion users. Currently, it is both returning to its roots as a
read/write tool and also entering a new, more social and participatory phase. These trends have led to a
feeling that the Web is entering a ‘second phase’—a new, ‘improved’ Web version 2.0. But how
justified is this perception?
This TechWatch report was commissioned to investigate the substance behind the hyperbole
surrounding ‘Web 2.0’ and to report on the implications this may have for the UK Higher and Further
Education sector, with a special focus on collection and preservation activities within libraries. The
report argues that by separating out the discussion of Web technologies (ongoing Web development
overseen by the W3C), from the more recent applications and services (social software), and attempts
to understand the manifestations and adoption of these services (the ‘big ideas’), decision makers will
find it easier to understand and act on the strategic implications of ‘Web 2.0’. Indeed, analysing the
composition and interplay of these strands provides a useful framework for understanding its
significance.
The report establishes that Web 2.0 is more than a set of ‘cool’ and new technologies and services,
important though some of these are. It has, at its heart, a set of at least six powerful ideas that are
changing the way some people interact. Secondly, it is also important to acknowledge that these ideas
are not necessarily the preserve of ‘Web 2.0’, but are, in fact, direct or indirect reflections of the power
of the network: the strange effects and topologies at the micro and macro level that a billion Internet
users produce. This might well be why Sir Tim Berners-Lee, the creator of the World Wide Web,
maintains that Web 2.0 is really just an extension of the original ideals of the Web that does not
warrant a special moniker. However, business concerns are increasingly shaping the way in which we
are being led to think and potentially act on the Web and this has implications for the control of public
and private data. Indeed, Tim O’Reilly’s original attempt to articulate the key ideas behind Web 2.0
was focused on a desire to be able to benchmark and therefore identify a set of new, innovative
companies that were potentially ripe for investment. The UK HE sector should debate whether this is a
long-term issue, and delineating the Web from Web 2.0 may help us to do that.
As with other aspects of university life the library has not escaped considerable discussion about the
potential change afforded by the introduction of Web 2.0 and social media. One of the key objectives
of the report is to examine some of the work in this area and to tease out some of the key elements of
ongoing discussions. For example, the report argues that there needs to be a distinction between
concerns around quality of service and ‘user-centred change’ and the services and applications that are
being driven by Web 2.0 ideas. This is particularly important for library collection and preservation
activities and some of the key questions for libraries are: is the content produced by Web 2.0 services
sufficiently or fundamentally different to that of previous Web content and, in particular, do its
characteristics make it harder to collect and preserve? Are there areas where further work is needed by
researchers and library specialists? The report examines these questions in the light of the six big ideas
as well as the key Web services and applications, in order to review the potential impact of Web 2.0
on library services and preservation activities.
JISC Technology and Standards Watch, Feb. 2007 Web 2.0
CONTENTS
Introduction
1. Web 2.0 or Web 1.0?: a tale of two Tims
2. Key Web 2.0 services/applications
2.1 Blogs
2.2 Wikis
2.3 Tagging and social bookmarking
2.4 Multimedia sharing
2.5 Audio blogging and podcasting
2.6 RSS and syndication
2.7 Newer Web 2.0 services and applications
3. The big ideas behind Web 2.0
3.1 Individual production and User Generated Content
3.2 Harnessing the power of the crowd
3.3 Data on an epic scale
3.4 Architecture of Participation
3.5 Network effects, power laws and the Long Tail
3.6 Open-ness
4. Technology and standards
4.1 Ajax
4.2 Alternatives to Ajax
4.3 SOAP vs REST
4.4 Micro-formats
4.5 Open APIs
5. Educational and institutional issues
5.1 Teaching and Learning
5.2 Scholarly Research
5.3 Academic Publishing
5.4 Libraries, repositories and archiving
6. Looking ahead - the Future of Web 2.0
6.1 Web 2.0 and Semantic Web
6.2 The emerging field of Web Science
6.3 The continued development of the Web as platform
6.4 Trust, privacy, security and social networks
6.5 Web 2.0 and SOA
6.6 Technology Bubble 2.0?
6.7 And Web 3.0?
Conclusion
About the Author
Appendix A: Recommendations & points for further debate
References
Introduction
At the end of 2006, Time magazine’s Person of the Year was ‘You’. On the cover of the magazine,
underneath the title of the award, was a picture of a PC with a mirror in place of the screen, reflecting
not only the face of the reader, but also the general feeling that 2006 was the year of the Web - a new,
improved, 'second version', 'user generated' Web. But how accurate is our perception of so-called 'Web
2.0'? Is there real substance behind the hyperbole? Is it a publishing revolution or is it a social
revolution? Is it actually a revolution at all? And what will it mean for education, a sector that is
already feeling the effects of the demands of Internet-related change?
In this TechWatch report I argue for the distinction between Web technologies (ongoing Web
development overseen by the W3C), the more recent applications and services that are emerging as a
result of this ongoing technological development (social software), and attempts to understand the
manifestations and adoption of these newer applications and services. I start with a brief discussion of
the historical context, with Sir Tim Berners-Lee and his vision for a single, global, collaborative
information space and contrast this story of the technology with the ideas of Tim O'Reilly, who has
attempted to understand the ways in which knowledge about the technologies, and the adoption of the
technologies, can be used to make predictions about technology markets.
Media coverage of Web 2.0 concentrates on the common applications/services such as blogs, video
sharing, social networking and podcasting—a more socially connected Web in which people can
contribute as much as they can consume. In section two I provide a brief introduction to some of these
services, many of them built on the technologies and open standards that have been around since the
earliest days of the Web, and show how they have been refined, and in some cases concatenated, to
provide a technological foundation for delivering services to the user through the browser window
(based on the key idea of the Web, rather than the desktop, as the technology platform). But is this
Web 2.0? Indeed, it can be argued that these applications and services are really just early
manifestations of ongoing Web technology development. If we look at Web 2.0 as it was originally
articulated we can see that it is, in fact, an umbrella term that attempts to express explicitly the
framework of ideas that underpin attempts to understand the manifestations of these newer Web
services within the context of the technologies that have produced them.
In section three I articulate six 'big' ideas, based on concepts originally outlined by Tim O’Reilly,
which can help us to explain and understand why Web 2.0 has had such a huge impact. In short, these
are ideas about building something more than a global information space; something with much more
of a social angle to it. Collaboration, contribution and community are the order of the day and there is
a sense in which some think that a new 'social fabric' is being constructed before our eyes. These ideas,
though, need technology in order to be realised as the functioning Web-based services and
applications that we are using.
Education and educational institutions will have their own special issues with regard to Web 2.0
services and technologies and in section five I look at some of these issues. By special request,
particular attention has been given to libraries and preservation and the issues that present themselves
for those tasked with preserving some of the material produced by these services and applications.
Finally, I look to the future. What are the technologies that will affect the next phase of the Web’s
development: what one might call, rather reluctantly, Web 3.0?
1. 'Web 2.0' or 'Web 1.0'?: a tale of two Tims
Web 2.0 is a slippery character to pin down. Is it a revolution in the way we use the Web? Is it another
technology 'bubble'? It rather depends on who you ask. A Web technologist will give quite a different
answer to a marketing student or an economics professor.
The short answer, for many people, is to make a reference to a group of technologies which have
become deeply associated with the term: blogs, wikis, podcasts, RSS feeds etc., which facilitate a
more socially connected Web where everyone is able to add to and edit the information space. The
longer answer is rather more complicated and pulls in economics, technology and new ideas about the
connected society. To some, though, it is simply a time to invest in technology again—a time of
renewed exuberance after the dot-com bust.
For the inventor of the Web, Sir Tim Berners-Lee, there is a tremendous sense of déjà vu about all
this. When asked in an interview for a podcast, published on IBM’s website, whether Web 2.0 was
different to what might be called Web 1.0 because the former is all about connecting people, he
replied:
"Totally not. Web 1.0 was all about connecting people. It was an interactive space, and I think Web
2.0 is of course a piece of jargon, nobody even knows what it means. If Web 2.0 for you is blogs and
wikis, then that is people to people. But that was what the Web was supposed to be all along. And in
fact, you know, this 'Web 2.0', it means using the standards which have been produced by all these
people working on Web 1.0."¹
Laningham (ed.), developerWorks Interviews, 22nd August, 2006.
To understand Sir Tim’s attitude one needs to look back at the history of the development of the Web,
which is explored in his book Weaving the Web (1999). His original vision was very much of a
collaborative workspace where everything was linked to everything in a ‘single, global information
space’ (p. 5), and, crucially for this discussion, the assumption was that ‘everyone would be able to
edit in this space’ (IBM podcast, 12:20 minutes). The first development was Enquire, a rudimentary
project management tool, developed while Berners-Lee was working at CERN, which allowed pages
of notes to be linked together and edited. A series of further technological and software developments
led to the creation of the World Wide Web and a browser or Web client that could view and edit pages
of marked-up information (HTML). However, during a series of ports to other machines from the
original development computer, the ability to edit through the Web client was not included in order to
speed up the process of adoption within CERN (Berners-Lee, 1999). This attitude to the ‘edit’ function
continued through subsequent Web browser developments such as ViolaWWW and Mosaic (which
became the Netscape browser). Crucially, this left people thinking of the Web as a medium in which a
relatively small number of people published and most browsed, but it is probably more accurate to
picture it as a fork in the road of the technology's development, one which has meant that the original
pathway has only recently been rejoined.
The term ‘Web 2.0’ was officially coined in 2004 by Dale Dougherty, a vice-president of O’Reilly
Media Inc. (the company famous for its technology-related conferences and high quality books)
during a team discussion on a potential future conference about the Web (O’Reilly, 2005a). The team
wanted to capture the feeling that despite the dot-com boom and subsequent bust, the Web was ‘more
important than ever, with exciting new applications and sites popping up with surprising regularity’
(O’Reilly, 2005a, p. 1). It was also noted, at the same meeting, that companies that had survived the
dot-com firestorms of the late 90s now appeared to be stronger and have a number of things in
common. Thus it is important to note that the term was not coined in an attempt to capture the essence
of an identified group of technologies, but an attempt to capture something far more amorphous.
¹ A transcript of the podcast is available at:
http://www-128.ibm.com/developerworks/podcast/dwi/cmint082206.txt [last accessed 17/01/07].
The second Tim in the story, Tim O’Reilly himself, the founder of the company, then followed up this
discussion with a now famous paper, What is Web 2.0: Design Patterns and Business Models for the
Next Generation of Software, outlining in detail what the company thought they meant by the term. It
is important to note that this paper was an attempt to make explicit certain features that could be used
to identify a particular set of innovative companies, including business characteristics, such as the fact
that they have control over unique, hard-to-recreate data sources (something that could become
increasingly significant for H&FE), or that they have lightweight business models. The paper did,
however, identify certain features that have come to be associated with ‘social software’ technologies,
such as participation, user as contributor, harnessing the power of the crowd, rich user experiences
etc., but it should be noted that these do not constitute a de facto Web (r)evolution. As Tim Berners-
Lee has pointed out, the ability to implement this technology is all based on so-called ‘Web 1.0’
standards, as we shall see in section four, and that, in fact, it’s just taken longer for it to be
implemented than was initially anticipated. From this perspective, ‘Web 2.0’ should not therefore be
held up in opposition to ‘Web 1.0’, but should be seen as a consequence of a more fully implemented
Web.
This distinction is key to understanding where the boundaries are between ‘the Web’, as a set of
technologies, and ‘Web 2.0’ – the attempt to conceptualise the significance of a set of outcomes that
are enabled by those Web technologies. Understanding this distinction helps us to think more clearly
about the issues that are thrown up by both the technologies and the results of the technologies, and
this helps us to better understand why something might be classed as ‘Web 2.0’ or not. In order to be
able to discuss and address the Web 2.0 issues that face higher education we need to have these
conceptual tools in order to identify why something might be significant and whether or not we should
act on it.
For example, Tim O'Reilly, in his original article, identifies what he considers to be features of
successful ‘Web 1.0’ companies and the ‘most interesting’ of the new applications. He does this in
order to develop a set of concepts by which to benchmark whether or not a company is Web 1.0 or
Web 2.0. This is important to him because he is concerned that ‘the Web 2.0 meme has become so
widespread that companies are now pasting it on as a marketing buzzword, with no real understanding
of just what it means’ (O’Reilly, 2005a, p.1). In order to express some of the concepts which were
behind the original O’Reilly discussions of Web 2.0 he lists and describes seven principles: The Web
as platform, Harnessing collective intelligence, Data is the next 'Intel inside', End of the software
release cycle, Lightweight programming models, Software above the level of single device, and Rich
user experiences. In this report I have adapted some of O'Reilly's seven principles, partly to avoid
ambiguity (for example, I use ‘harnessing the 'power of the crowd'’, rather than ‘collective
intelligence’ as I believe this more accurately describes the articulation of the concept in its original
form), and partly to provide the conceptual tools that people involved in HE practice and decision
making have expressed a need for.
Well-known or education-based blogs:
http://radar.oreilly.com/
http://www.techcrunch.com/
http://www.instapundit.com/
http://blogs.warwick.ac.uk/
*
http://jiscdigitisation.typepad.com/jisc_digitisation_program/
Software:
*
http://wordpress.org/
*
http://www.sixapart.com/typepad/
http://www.blogger.com/start
http://radio.userland.com/
http://www.bblog.com/
Blog search services:
http://technorati.com/
http://www.gnosh.org/
http://blogsearch.google.com/
http://www.weblogs.com/about.html
2. Key Web 2.0 services/applications
There are a number of Web-based services and applications that demonstrate the foundations of the
Web 2.0 concept, and they are already being used to a certain extent in education. These are not really
technologies as such, but services (or user processes) built using the building blocks of the
technologies and open standards that underpin the Internet and the Web. These include blogs, wikis,
multimedia sharing services, content syndication, podcasting and content tagging services. Many of
these applications of Web technology are relatively mature, having been in use for a number of years,
although new features and capabilities are being added on a regular basis. It is worth noting that many
of these newer technologies are concatenations, i.e. they make use of existing services. In the first part
of this section we introduce and review these well-known and commonly used services with a view to
providing a common grounding for later discussion.
NB * indicates an open source or other, similar, community or public-spirited project.
2.1 Blogs
The term web-log, or blog, was coined by Jorn Barger in 1997 and refers to a simple webpage
consisting of brief paragraphs of opinion, information, personal diary entries, or links, called posts,
arranged chronologically with the most recent first, in the style of an online journal (Doctorow et al.,
2002). Most blogs also allow visitors to add a comment below a blog entry.
This posting and commenting process contributes to the nature of blogging (as an exchange of views)
in what Yale University law professor, Yochai Benkler, calls a ‘weighted conversation’ between a
primary author and a group of secondary comment contributors, who communicate to an unlimited
number of readers. It also contributes to blogging's sense of immediacy, since ‘blogs enable
individuals to write to their Web pages in journalism time – that is hourly, daily, weekly – whereas the
Web page culture that preceded it tended to be slower moving: less an equivalent of reportage than of
the essay’ (Benkler, 2006, p. 217).
Each post is usually ‘tagged’ with a keyword or two, allowing the subject of the post to be categorised
within the system so that when the post becomes old it can be filed into a standard, theme-based menu
system². Clicking on a post’s description, or tag (which is displayed below the post), will take you to a
list of other posts by the same author on the blogging software’s system that use the same tag.
Linking is also an important aspect of blogging as it deepens the conversational nature of the
blogosphere (see below) and its sense of immediacy. It also helps to facilitate retrieval and referencing
of information on different blogs but some of these are not without inherent problems:
• The permalink is a permanent URI which is generated by the blogging system and is applied
to a particular post. If the item is moved within the database, e.g. for archiving, the permalink
stays the same. Crucially, if the post is renamed, or if the content is changed in any way, the
permalink will still remain unchanged: i.e. there is no version control, and using a permalink
does not guarantee the content of a post.
• Trackback (or pingback) allows a blogger (A) to notify another blogger (B) that they have
referenced or commented on one of blogger B’s posts. When blog B receives notification from
blog A that a trackback has been created, blog B’s system automatically creates a record of the
permalink of the referring post. Trackback only works when it is enabled on both the referring
and the referred blogs. Some bloggers deliberately disable trackback as it can be a route in for
spammers.
• The blogroll is a list of links to other blogs that a particular blogger likes or finds useful. It is
similar to a blog ‘bookmark’ or ‘favourites’ list.
Blog software also facilitates syndication, in which information about the blog entries, for example,
the headline, is made available to other software via RSS and, increasingly, Atom. This content is then
aggregated into feeds, and a variety of blog aggregators and specialist blog reading tools can make use
of these feeds (see Table 1 for some key examples).
The large number of people engaged in blogging has given rise to its own term – blogosphere – to
express the sense of a whole ‘world’ of bloggers operating in their own environment. As technology
has become more sophisticated, bloggers have begun to incorporate multimedia into their blogs and
there are now photo-blogs, video blogs (vlogs), and, increasingly, bloggers can upload material
directly from their mobile phones (mob-blogging). For more on the reasons why people blog, the style
and manner of their blogging and the subject areas that are covered, see Nardi et al., 2004.
² Blog content is regularly filed so that only the latest content is available from the homepage. This means that
returning to a blog’s homepage after several weeks or months to find a particular piece of content is potentially a
hit and miss affair. The development of the permalink was an attempt to counter this, but has its own inherent
problems.
Examples of wikis:
http://wiki.oss-watch.ac.uk/
*
http://wiki.cetis.ac.uk/CETIS_Wiki
*
http://en.wikipedia.org/wiki/Main_Page
*
http://www.ch.ic.ac.uk/wiki/index.php/Main_Page
http://www.wikihow.com
Software:
http://meta.wikimedia.org/wiki/MediaWiki
*
http://www.socialtext.com/products/overview
http://www.twiki.org/
http://uniwakka.sourceforge.net/HomePage
Online notes on using wikis in education:
http://www.wikiineducation.com/display/ikiw/Home
*
2.2 Wikis
A wiki³ is a webpage or set of webpages that can be easily edited by anyone who is allowed access
(Ebersbach et al., 2006). Wikipedia’s popular success has meant that the concept of the wiki, as a
collaborative tool that facilitates the production of a group work, is widely understood. Wiki pages have
an edit button displayed on the screen and the user can click on this to access an easy-to-use online
editing tool to change or even delete the contents of the page in question. Simple, hypertext-style linking
between pages is used to create a navigable set of pages.
Unlike blogs, wikis generally have a history function, which allows previous versions to be examined,
and a rollback function, which restores previous versions. Proponents of the power of wikis cite the
ease of use (even playfulness) of the tools, their extreme flexibility and open access as some of the many
reasons why they are useful for group working (Ebersbach et al., 2006; Lamb, 2004).
There are undeniably problems for systems that allow such a level of openness, and Wikipedia itself
has suffered from problems of malicious editing and vandalism (Stvilia et al., 2005). However, there
are also those who argue that acts of vandalism and mistakes are rectified quite quickly by the
self-moderation processes at work. Alternatively, restricting access to registered users only, is often used
for professional, work group wikis (Cych, 2006).
³ Ebersbach et al. traces this from the Hawaiian word, wikiwiki, meaning 'quick' or 'hurry' from Ward
Cunningham's concept of the wikiwikiWeb, in 1995.
Examples of tagging services:
http://www.connotea.org/
http://www.citeulike.org/
*
http://www.librarything.com/
http://del.icio.us/
http://www.sitebar.org
http://www.furl.net/index.jsp
http://www.stumbleupon.com/
http://www.blinklist.com/
http://www.digg.com/
http://www.rawsugar.com
http://del.icio.us/elearningfocus/web2.0
*
2.3 Tagging and social bookmarking
A tag is a keyword that is added to a digital object (e.g. a website, picture or video clip) to describe it,
but not as part of a formal classification system. One of the first large-scale applications of tagging
was seen with the introduction of Joshua Schachter’s del.icio.us website, which launched the ‘social
bookmarking’ phenomenon.
Social bookmarking systems share a number of common features (Millen et al., 2005): They allow
users to create lists of ‘bookmarks’ or ‘favourites’, to store these centrally on a remote service (rather
than within the client browser) and to share them with other users of the system (the ‘social’ aspect).
These bookmarks can also be tagged with keywords, and an important difference from the ‘folder’-
based categorisation used in traditional, browser-based bookmark lists is that a bookmark can belong
in more than one category. Using tags, a photo of a tree could be categorised with both ‘tree’ and
‘larch’, for example.
The concept of tagging has been widened far beyond website bookmarking, and services like Flickr
(photos), YouTube (video) and Odeo (podcasts) allow a variety of digital artefacts to be socially
tagged. For example, the BBC’s Shared Tags⁴ project is an experimental service that allows members
of the public to tag BBC News online items. A particularly important example within the context of
higher education is Richard Cameron’s CiteULike⁵, a free service to help academics to store, organise
and share the academic papers they are reading. When you see a paper on the Web that interests you,
you click a button and add it to your personal library. CiteULike automatically extracts the citation
details, so you don’t have to type them in. This tool was used during the research for this report.
The idea of tagging has been expanded to include what are called tag clouds: groups of tags (tag sets)
from a number of different users of a tagging service, which collates information about the frequency
with which particular tags are used. This frequency information is often displayed graphically as a
‘cloud’ in which tags with higher frequency of use are displayed in larger text.
Large organisations are beginning to explore the potential of these new tools and their concepts for
knowledge management across the enterprise. For example, IBM is investigating social bookmarking
through their intranet-based DogEar tool (Millen et al., 2005). In education, JISC's e-Learning Focus
service has set up a del.icio.us account at: http://del.icio.us/elearningfocus [last accessed 07/02/07].
Folksonomy versus collabulary
One outcome from the practice of tagging has been the rise of the ‘folksonomy’. Unfortunately, the
term has not been used consistently and there is confusion about its application. More will be said
about this in the section on network effects, but for now it is sufficient to note that there is a distinction
between a folksonomy (a collection of tags created by an individual for their own personal use) and a
collabulary (a collective vocabulary).
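The ideas in this section (one item carrying several tags, an individual's own tag set, and a tag cloud built from collective usage) can be illustrated with a minimal Python sketch. It is not taken from any of the services named above: the sample data, the function names and the linear font-size scaling are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: (user, item, tag) triples as a tagging service might store them.
TAGGINGS = [
    ("alice", "photo-001", "tree"),
    ("alice", "photo-001", "larch"),
    ("bob", "photo-001", "tree"),
    ("bob", "photo-002", "forest"),
    ("carol", "photo-002", "tree"),
]


def items_by_tag(taggings):
    """Index items under every tag applied to them, so one item can sit in many categories."""
    index = defaultdict(set)
    for _user, item, tag in taggings:
        index[tag].add(item)
    return index


def personal_tags(taggings, user):
    """The tag set one individual has built up for their own use."""
    return sorted({tag for who, _item, tag in taggings if who == user})


def tag_frequencies(taggings):
    """Collective view: how often each tag is used across all users."""
    return Counter(tag for _user, _item, tag in taggings)


def cloud_sizes(frequencies, min_pt=10, max_pt=30):
    """Map tag frequency linearly onto a font size (assumed scaling, in points)."""
    if not frequencies:
        return {}
    low, high = min(frequencies.values()), max(frequencies.values())
    span = max(high - low, 1)  # avoid dividing by zero when all counts are equal
    return {
        tag: min_pt + (count - low) * (max_pt - min_pt) // span
        for tag, count in frequencies.items()
    }


if __name__ == "__main__":
    print(sorted(items_by_tag(TAGGINGS)["tree"]))   # ['photo-001', 'photo-002']
    print(personal_tags(TAGGINGS, "alice"))          # ['larch', 'tree']
    print(cloud_sizes(tag_frequencies(TAGGINGS)))    # 'tree' gets the largest size
```

Unlike a folder hierarchy, the index places photo-001 under both 'tree' and 'larch', and the cloud sizes come purely from counting how often each tag is used across users.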
⁴ http://backstage.bbc.co.uk/prototypes/archives/2005/05/bbc_shared_tags.html [last accessed 16/01/07].
⁵ http://www.citeulike.org/ [last accessed 16/01/07].
Well known photo sharing services:
http://www.flickr.com/
http://www.ourpictures.com/
http://www.snapfish.com/
http://www.fotki.com/
Well known video sharing services:
http://www.youtube.com/
http://www.getdemocracy.com/broadcast/
*
http://eyespot.com/
http://ourmedia.org/
*
http://vsocial.com
http://www.videojug.com/
Well known podcasting sites:
http://www.apple.com/itunes/store/podcasts.html
http://btpodshow.com/
http://www.audblog.com/
http://odeo.com/
http://www.ourmedia.org/
*
http://connect.educause.edu/
*
http://juicereceiver.sourceforge.net/index.php
http://www.impala.ac.uk/
*
http://www.law.dept.shef.ac.uk/podcasts/
*
2.4 Multimedia sharing
One of the biggest growth areas has been amongst services
that facilitate the storage and sharing of multimedia
content. Well known examples include YouTube (video),
Flickr (photographs) and Odeo (podcasts). These popular
services take the idea of the ‘writeable’ Web (where users
are not just consumers but contribute actively to the
production of Web content) and enable it on a massive
scale. Literally millions of people now participate in the
sharing and exchange of these forms of media by producing
their own podcasts, videos and photos. This development
has only been made possible through the widespread
adoption of high quality, but relatively low cost digital
media technology such as hand-held video cameras.
2.5 Audio blogging and podcasting
Podcasts are audio recordings, usually in MP3 format, of talks, interviews and lectures, which can be
played either on a desktop computer or on a wide range of handheld MP3 devices. Originally called
audio blogs they have their roots in efforts to add audio streams to early blogs (Felix and Stolarz,
2006). Once standards had settled down and Apple introduced the commercially successful iPod MP3
player and its associated iTunes software, the process started to become known as podcasting⁶. This
term is not without some controversy since it implies that only the Apple iPod will play these files,
whereas, in actual fact, any MP3 player or PC with the requisite software can be used. A more recent
development is the introduction of video podcasts (sometimes shortened to vidcast or vodcast): the
online delivery of video-on-demand clips that can be played on a PC, or again on a suitable handheld
player (the more recent versions of the Apple iPod for example, provide for video playing).
A podcast is made by creating an MP3 format audio file (using a voice recorder or similar device),
uploading the file to a host server, and then making the world aware of its existence through the use of
RSS (see next section). This process (known as enclosure) adds a URL link to the audio file, as well as
directions to the audio file’s location on the host server, into the RSS file (Patterson, 2006).
Podcast listeners subscribe to the RSS feeds and receive information about new podcasts as they
become available. Distribution is therefore relatively simple. The harder part, as those who listen to a
lot of podcasts know, is to produce a good quality audio file. Podcasting is becoming increasingly
used in education (Brittain et al., 2006; Ractham and Zhang, 2006) and recently there have been
moves to establish a UK HE podcasting community⁷.
2.6 RSS and syndication
RSS is a family of formats which allow users to find out about updates to the content of RSS-enabled
websites, blogs or podcasts without actually having to go and visit the site. Instead, information from
the website (typically, a new story's title and synopsis, along with the originating website’s name) is
collected within a feed (which uses the RSS format) and ‘piped’ to the user in a process known as
syndication.
[6] Coined by Ben Hammersley in a Guardian article on 12th February 2004:
http://technology.guardian.co.uk/online/story/0,3605,1145689,00.html [last accessed 14/02/07].
[7]
In order to be able to use a feed a prospective user must install a software tool known as an
aggregator or feed reader, onto their computer desktop. Once this has been done, the user must decide which RSS
feeds they want to receive and then subscribe to them. The client software will then periodically check
for updates to the RSS feed and keep the user informed of any changes.
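The checking step that an aggregator performs can be sketched in a few lines of Python. This is only a minimal illustration, assuming an RSS 2.0 feed whose items each carry a link: the sample feed, the function name and the example.org addresses are hypothetical, and a real aggregator would download the feed over the network on a timer rather than read it from a string.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up feed standing in for what a website would publish.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example site</title>
    <item><title>First story</title><link>http://example.org/1</link></item>
    <item><title>Second story</title><link>http://example.org/2</link></item>
  </channel>
</rss>"""


def unseen_items(feed_text, seen_links):
    """Return titles of items not reported before, and remember their links."""
    root = ET.fromstring(feed_text)
    fresh_titles = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link and link not in seen_links:
            seen_links.add(link)
            fresh_titles.append(item.findtext("title", default=""))
    return fresh_titles


seen = set()
first_check = unseen_items(SAMPLE_FEED, seen)   # both stories are new
second_check = unseen_items(SAMPLE_FEED, seen)  # the feed has not changed
```

Keeping a record of what has already been shown is what lets the software report only the changes.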
Illustration 1: Example of an RSS feed aggregation tool (NetNewsWire).
Technically, RSS is an XML-based data format for websites to exchange files
that contain publishing information and summaries of the site’s contents.
Indeed, in its earliest incarnation, RSS was understood to stand for Rich Site
Summary (Doctorow, 2002). For a variety of historical reasons there are a
number of RSS formats (RSS 0.91, RSS 0.92, RSS 1.0, RSS 2.0) and there are
some issues of incompatibility [8]. It is worth noting that RSS 2.0 is not simply a
later version of RSS 1.0, but is a different format. As it has become more widely used for blog content
syndication, in later versions RSS became known as Really Simple Syndication [9]. A lot of blogging
tools now create and publish these RSS feeds automatically and webpages and blogs frequently
display small RSS icons and links to allow a quick process of registering to get a feed from the site
(see above, right).
In 2003 a new syndication system was proposed and developed under the name Atom in order to clear
up some of the inconsistencies between RSS versions and the problems with the way they
interoperate. This consists of two standards: the Atom Syndication Format, an XML language used for
Web feeds, and the Atom Publishing Protocol (APP), a HTTP-based protocol for creating and
updating Web resources. There is considerable discussion between proponents of RSS and Atom as to
which is the best way forward for syndication. The two most important differences between the two
are, firstly, that the development of Atom is taking place through a formal and open standards process
within the IETF [10], and, secondly, that with Atom the actual content of the feed item’s encoding
(known as the payload container) is more clearly defined. Atom can also support the enclosure of
more than one podcast file at a time (see podcasting section) and so multiple file formats of the same
podcast can be syndicated at the same time [11].
[8] See: http://blogs.law.harvard.edu/tech/rssVersionHistory for a history of the versions [last accessed 14/02/07].
[9] See RSS Advisory Board service: http://www.rssboard.org/ [last accessed 14/02/07].
[10] The Internet Engineering Task Force.
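To make the point about multiple enclosures more concrete, the sketch below builds a single Atom entry that offers the same podcast in two file formats, each as a link element with rel set to "enclosure". The helper name, the identifier, the date and the example.org addresses are hypothetical; only the Atom namespace and the general shape of the entry follow the Atom Syndication Format.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialise without a namespace prefix


def atom_entry_with_enclosures(entry_id, title, updated, files):
    """Build one Atom <entry> carrying an enclosure link per (href, type) pair."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    for href, media_type in files:
        ET.SubElement(
            entry,
            f"{{{ATOM_NS}}}link",
            {"rel": "enclosure", "href": href, "type": media_type},
        )
    return entry


# Hypothetical episode published in two formats at illustrative addresses.
entry = atom_entry_with_enclosures(
    "tag:example.org,2007:episode-1",
    "Week 1",
    "2007-02-14T09:00:00Z",
    [
        ("http://example.org/audio/week1.mp3", "audio/mpeg"),
        ("http://example.org/audio/week1.ogg", "audio/ogg"),
    ],
)
entry_text = ET.tostring(entry, encoding="unicode")
```

A feed reader that understands both formats can then pick whichever file suits the listener's device.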
2.7 Newer Web 2.0 services and applications
As we have seen, there are a number of technology services that are often posited as representing the
Web 2.0 concept in some way. In recent months, however, there has been an explosion of new ideas,
applications and start-up companies working on ways to extend existing services. Some of these are
likely to become more important than others, and some are certainly likely to be more relevant to
education than others. There is such a deluge of new services that it is often difficult to keep track of
what’s ‘out there’ or to make sense of what each provides. I suggest there are two ways of helping
with this process. Firstly, to make sense of what the service is trying to do in the context of the overall
Web 2.0 ‘big ideas’ presented in section three. Secondly, as new services become available they can
be categorised roughly in terms of what they attempt to do, e.g. aggregate user data, construct a social
network etc.
In Table 1 I make a first attempt at such a categorisation process based on a small range of some of the
newer services. Such a table is only the beginning of the process and can only be a snapshot as this is a
fluid market with new tools and start-up companies being announced on almost a daily basis (see, for
example, TechCrunch’s regular updates [12] on start-ups and new ideas; or eConsultant’s Web 2.0
directory which recently listed over 1,200 services in fifty categories ranging from blogging to Wifi) [13].
[11] More technical detail of the Atom standard can be found at: http://www.ietf.org/html.charters/atompubcharter.html
and http://www-128.ibm.com/developerworks/xml/library/x-atom10.html [last accessed 14/02/07].
[12] TechCrunch is a blog dedicated to profiling and reviewing new Internet products and companies:
www.techcrunch.com
[13] http://www.econsultant.com/web2/
Table 1: Newer Web 2.0 services
Columns: Categorisation (based on what they attempt to do) | Explanation and indicative links to the big ideas of Web 2.0 (see section 3 for more detail) | Examples of service

Categorisation: Social Networking
Explanation: Professional and social networking sites that facilitate meeting people, finding like minds, sharing content. Uses ideas from harnessing the power of the crowd, network effect and individual production/user generated content.
Examples of service:
Professional networking:
http://www.siphs.com/aboutus.jsp
https://www.linkedin.com/
http://www.zoominfo.com/
Social networking:
www.myspace.com
www.facebook.com
http://fo.rtuito.us/
http://www.spock.com/ (test beta only)
http://www.flock.com/
http://www.bebo.com/

Categorisation: Aggregation services
Explanation: Gather information from diverse sources across the Web and publish in one place. Includes news and RSS feed aggregators and tools that create a single webpage with all your feeds and email in one place. Uses ideas from individual production/user generated content.
Examples of service:
http://www.techmeme.com/
http://www.google.co.uk/nwshp?hl=en
http://www.blogbridge.com/
http://www.suprglu.com/
http://www.netvibes.com/
Explanation: Collect and aggregate user data, user ‘attention’ (what you look at) and intentions. Uses ideas from the architecture of participation, data on epic scale and power of the crowd.
Examples of service:
http://www.attentiontrust.org/
http://www.digg.com/

Categorisation: Data 'mash-ups'
Explanation: Web services that pull together data from different sources to create a new service (i.e. aggregation and recombination). Uses, for example, ideas from data on epic scale and openness of data.
Examples of service:
http://www.housingmaps.com/
http://darwin.zoology.gla.ac.uk/~rpage/ispecies/
http://www.rrove.com/set/item/59/top-11-us-universities
http://www.blears.net/weather/ (world weather from BBC RSS feed)

Categorisation: Tracking and filtering content
Explanation: Services that keep track of, filter, analyse and allow search of the growing amounts of Web 2.0 content from blogs, multimedia sharing services etc. Uses ideas from e.g. data on epic scale.
Examples of service:
http://technorati.com/about/
http://www.digg.com/
http://www.blogpulse.com
http://cloudalicio.us/about/

Categorisation: Collaborating
Explanation: Collaborative reference works (like Wikipedia) that are built using wiki-like software tools. Uses ideas from harnessing the power of the crowd.
Examples of service:
http://www.squidoo.com/
http://wikia.com/wiki/Wikia
Explanation: Collaborative, Web-based project and work group productivity tools. Uses architecture of participation.
Examples of service:
http://vyew.com/always-on/collaboration/
http://www.systemone.at/en/technology/overview#
http://www.37signals.com/

Categorisation: Replicate office-style software in the browser
Explanation: Web-based desktop application/document tools. Replicate desktop applications. Based on technological developments.
Examples of service:
http://www.google.com/google-d-s/tour1.html
http://www.stikkit.com/
http://www.backpackit.com/tour

Categorisation: Source ideas or work from the crowd
Explanation: Seek ideas, solutions to problems or get tasks completed by outsourcing to users of the Web. Uses the idea of power of the crowd.
Examples of service:
http://www.mturk.com/mturk/welcome
http://www.innocentive.com/
3. The big ideas behind Web 2.0
As outlined in section one, there is considerable speculation as to what Web 2.0 might be, and
it is inevitable that some of this would become confused as various people vie for attention in
the ongoing conversation. What I have tried to do in this section is to uncover what I believe
are the core ideas and to show, where possible, points at which various strands of related
thought start to be developed. I also try to raise some questions about how closely these
strands are related to some kind of evidence base. By looking at the history of, for example,
network theory, it is possible to see how assumptions made about the rate at which networks
grow could have contributed to the last technology boom and bust. This is important, not only
for avoiding a similar situation in the future, but also, for getting a more realistic
understanding of the role that Web 2.0 might play within education.
In this section I put forward six 'big' ideas, based on concepts originally outlined by Tim
O’Reilly, that can help us to explain and understand why Web 2.0 has had such a huge
impact. In short these are ideas about building something more than a global information
space; something with much more of a social angle to it. Collaboration, contribution and
community are the order of the day and there is a sense in which some think that a new 'social
fabric' is being constructed before our eyes. However, it is also important to acknowledge that
these ideas are not necessarily the preserve of 'Web 2.0', but are, in fact, direct or indirect
reflections of the power of the network: the strange effects and topologies at the micro and
macro level that a billion Internet users produce.
Key Idea
1 Individual production and User Generated Content
2 Harness the power of the crowd
3 Data on an epic scale
4 Architecture of Participation
5 Network Effects
6 Openness
3.1 Individual production and User Generated Content
'I have always imagined the information space as something to which everyone has
immediate and intuitive access, and not just to browse, but to
create.'
Tim Berners-Lee, 1999, p. 169
'We don't hate the media, we become the media'
Jello Biafra (Eric Boucher), 2001 [14]
In the 1980s the punk rock adage of "I can do that" led to thousands of young people forming
local bands and writing their own fanzines. Today’s generation are pressing ‘record’ on their
video cameras and hitting their mouse keys. With a few clicks of the mouse a user can upload
a video or photo from their digital camera and into their own media space, tag it with suitable
keywords and make the content available to their friends or the world in general. In parallel,
individuals are setting up and writing blogs and working together to create information
through the use of wikis. What these tools have done is to lower the barrier to entry,
following in the same footsteps as the 1980s self-publishing revolution sparked by the
introduction of the office laser printer and desktop publishing software pioneered by Apple
(Hertzfeld, 2005). There has been an out-pouring of production on the Web.
[14] From the spoken recording Become the media (Alternative Tentacles, 2001) available online at:
http://www.alternativetentacles.com/product.php?product=380 [last accessed 12/01/07].
Much of recent media attention concerning the rise of the Web 2.0 phenomenon has focused
on what’s been given the rather ugly moniker of user generated content (UGC). Alternatives
to this phrase include content self-publishing, personal publishing (Downes, 2004) and ‘self
expression’.
Media interest in this arises, in part, because the media itself is undergoing a period of
profound change as the true implications of the Web become apparent, in particular the new capability of
the viewers, or as the journalist Dan Gillmor (2004) describes them, the former audience, to
contribute materials for programmes, newspapers and websites. The widespread adoption of
cheap, fairly high quality digital cameras, videos, mobile and smartphones has
contributed to a rise in what’s sometimes called ‘citizen journalism’ or ‘witness
contributions’, in which newspapers and TV programmes make use of viewers’ clips of news
events. Many media organisations are undertaking major reviews of how they generate
content and investing in facilities to allow the public to have more of a role in newsgathering.
For example, The Sun newspaper now provides a single mobile phone number for members
of the public to submit copy and photos, and in South Korea the OhmyNews service has an
army of 40,000 citizen journalists edited by 50 professionals (Anderson, 2006). Meanwhile,
the BBC is working on a Creative Archive which will allow users to view and make use of
old, archived TV material, possibly ‘mashing-up’ their own versions of TV content. Many
commentators think we are entering a new era in which news is more of a ‘conversation’ and
this kind of change in people’s perception of who has the authority to ‘say’ and ‘know’ is
surely set to be a challenge within education.
So why do people engage in peer production like this? Chris Anderson (2006) says: ‘the
motives to create are not the same in the head as they are in the tail’ (see section 3.5.4).
People are driven by monetary motives at the head, but the coin of the realm at the lower end
of the tail is reputation’ (p. 73). We are living in more of an exposure culture, where ‘getting
noticed is everything’ (Tim Wu, Professor of Law, in Anderson, 2006, p. 74).
To some commentators the increasing propensity for individuals to engage in the creation and
manipulation of information and digital artefacts is a major positive benefit. There are, of
course, those who worry about where this might take us. The Chief Scientist at Xerox, John
Seely Brown, worries about the loss of the structure and authority of an edited newspaper as
an institution in which a process of selection and reflection takes place (Brown and Duguid,
2000). The RSS feed is organised temporally, but what is the more important news? A
designed newspaper has a headline, an ‘above the fold’ story, and the editors have selected
the news based on lots of factors. There are also those who are sceptical over the true scale of
actual participation in all this. Over 10 million of the 13 million blogs in Blogger, a major
blog provider, are inactive according to Charles Mann (2006) who thinks that: ‘The huge
mass of dead blogs is one reason to maintain a healthy scepticism about the vast growth of the
blogosphere’ (p. 12).
3.2 Harnessing the power of the crowd
The term ‘harnessing collective intelligence’ as used by Tim O'Reilly has several problems
associated with it: firstly, what kind of ‘intelligence’ are we referring to? If we equate
‘information’ to ‘intelligence’ then many of his examples stand up to scrutiny. However, if
your understanding of ‘intelligence’ more naturally focuses on the idea of having or showing
some kind of intellectual ability, then the phrase becomes more problematic. O’Reilly
acknowledges this inherently by bringing in the concept of ‘the wisdom of crowds’ (WoC),
but this, in turn, brings its own set of problems (see below).
Related to this is the problem of what we mean by ‘collective intelligence’. Again, the WoC
ideas are drafted in by O’Reilly to try to help with this, but there is a critical gap between the
explication of ‘wisdom of crowds’ in its original form, as expressed by James Surowiecki,
and its application to Web 2.0 issues, that should give us cause to pause for thought.
3.2.1 The Wisdom of Crowds
The Wisdom of Crowds is the title of a book written by James Surowiecki, a columnist for the
New Yorker. In it, he outlines three different types of problem (which he calls cognition, coordination
and co-operation), and demonstrates how they can be solved more effectively by
groups operating according to specific conditions, than even the most intelligent individual
member of that group. It is important to note that although Surowiecki provides caveats on
the limitations to his ideas, the book's subtitle (‘why the many are smarter than the few and
how collective wisdom shapes business, economies, societies, and nations’) tends to gloss
over some of the subtleties of his arguments. The book has been very influential on Web 2.0-
style thinking, and several writers have adapted Surowiecki’s ideas to fit their observations on
Web and Internet-based activities.
An example of one of the ways in which WoC has been adapted for Web 2.0 is provided by
Tim O’Reilly in his original paper (2005a). He uses the example of Cloudmark, a
collaborative spam filtering system, which aggregates ‘the individual decisions of email users
about what is and is not spam, outperforming systems that rely on analysis of the messages
themselves’ (p. 2). What this kind of system demonstrates is what Surowiecki would describe
as a type of cognitive decision making process, or what fans of the TV show Who wants to be
a millionaire would call ‘ask the audience’. It is the idea that, by acting independently, but
collectively, the ‘crowd’ is more likely to come up with ‘the right answer’, in certain
situations, than any one individual. The Cloudmark system implements an architecture of
participation to harness this type of distributed human intelligence.
This is a fairly unproblematic application of Surowiecki’s ideas to the Internet, but some of
the wider claims are potentially more difficult to reconcile. Whilst a detailed examination of
the issue is beyond the scope of this report, it is important to note that some examples that
supposedly demonstrate the connective forces of WoC to Web 2.0 are really closer to
collaborative production or crowdsourcing (see below) than collective ‘wisdom’. As
Surowiecki does not use the Web to demonstrate his concepts (although he has gone on
record as saying that ‘the Web is 'structurally congenial' to the wisdom of crowds’ [15]) it is
difficult to objectively establish how far it should be used for understanding Web 2.0 and
therefore used as an accurate tool for benchmarking how ‘Web 2.0’ a company might be.
However, regardless of this, the way in which WoC is generally understood reinforces a
powerful zeitgeist and may therefore discourage a deep level of critical thinking. In fact, one
of the interesting things about the power of this idea is the implication it may have for the
traditional ways in which universities are perceived to accumulate status as ‘knowers’ and
how knowledge can legitimately be seen to be ‘acquired’.
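The ‘ask the audience’ idea behind the collaborative spam filtering example earlier in this section can be sketched as a simple vote over independent judgements. This is not a description of how Cloudmark actually works: the function name, the message identifiers and the simple majority threshold are all assumptions made purely for illustration.

```python
def crowd_verdict(reports, threshold=0.5):
    """Label a message as spam when more than `threshold` of its reporters say so.

    `reports` is a list of (message_id, is_spam) pairs, one per independent
    user decision.
    """
    votes = {}
    for message_id, is_spam in reports:
        votes.setdefault(message_id, []).append(is_spam)
    # sum() counts the True votes; dividing by the total gives the spam share.
    return {
        message_id: sum(decisions) / len(decisions) > threshold
        for message_id, decisions in votes.items()
    }


# Hypothetical, independently made decisions by individual email users.
reports = [
    ("msg-1", True), ("msg-1", True), ("msg-1", False),
    ("msg-2", False), ("msg-2", False), ("msg-2", True),
]
verdicts = crowd_verdict(reports)
```

No single user needs to be right every time; the sketch relies only on the aggregate of many separate decisions.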
3.2.2 Crowdsourcing: the rise of the amateur
The term crowdsourcing was coined by Wired journalist Jeff Howe to conceptualise a process
of Web-based out-sourcing for the procurement of media content, small tasks, even solutions
to scientific problems from the crowd gathered on the Internet. At its simplest level,
crowdsourcing builds on the popularity of multimedia sharing websites such as Flickr and
YouTube to create a second generation of websites where UGC is made available for re-use.
ShutterStock, iStockphoto and Fotolia are examples of Web-based, stock photo or video
agencies that act as intermediaries between amateur content producers and anyone wanting to
use their material. These amateur producers are often content with little or no fee for their
work, taking pride, instead, from the inherent seal of approval that comes with being
‘chosen’.
[15] http://www.msnbc.msn.com/id/12015774/site/newsweek/page/2/ [last accessed 14/02/07].
This type of crowdsourcing has been chipping away at the edges of the creative professions
for a while now. Photographers in particular have started to feel the pinch as websites make it
increasingly difficult for professionals to find a market for their work. Whilst the quality of
the images may vary considerably (it is often only good enough for low-end brochures and
websites) purchasers are often not able to see the poor quality or just don't care.
At the other end of the spectrum Howe demonstrates how, over the last five years or so,
companies such as InnoCentive and YourEncore have been using their websites to match
independent scientists and amateur or retired researchers with their clients’ R&D
challenges. The individual who comes up with the solution to a particular
unsolved R&D problem receives a ‘prize’ that runs to tens of thousands of dollars.
More recently, Canadian start-up company Cambrian House has taken the crowdsourcing
model and experimented with open source software-type development models to create a
model that is more closely aligned to the WoC ideal. In the Cambrian House model, members
of the crowd suggest ideas that are then voted on (again, by ‘the crowd’) in order to decide
which ones should go forward for development. This model not only sources ideas and
innovations from the crowd, but also uses them to select the idea that will be the most
successful, accepting that, collectively, the decision of the crowd will be stronger than any
one individual's decision.
3.2.3 Folksonomies: individuals acting individually yet producing a collective result.
The term folksonomy is generally acknowledged to have been coined by Thomas Vander Wal,
whose ideas on what a folksonomy is stem, in part, from his experience of building taxonomy
systems in commercial environments and finding that successful retrieval was often poor
because users could not ‘guess’ the ‘right’ keyword to use. He has, however, expressed
concern in the recent past about the way the term has been mis-applied and his definition,
taken from a recent blog posting, attempted to clarify some of the issues:
'Folksonomy is the result of personal free tagging of information and objects (anything
with a URL) for one's own retrival [sic]. The tagging is done in a social environment
(shared and open to others). The act of tagging is done by the person consuming the
information.' [my italics].
Vander Wal, 2005, blog entry.
Although folksonomy tagging is done in a social environment (shared and open) Vander Wal
emphasises that it is not collaborative and it is not a form of categorisation. He makes the
point that tagging done by one person on behalf of another ('in the Internet space' is implied
here) is not folksonomy [16] and that the value of a folksonomy is derived from people using
their own vocabulary in order to add explicit meaning to the information or object they are
consuming (either as a user or producer): 'The people are not so much categorizing as
providing a means to connect items and to provide their meaning in their own understanding.'
(Vander Wal, 2005). By aggregating the results of folksonomy production it is possible to see
how additional value can be created.
[16] he describes this as 'social tagging'
Vander Wal states that the value of a folksonomy is derived from three key data elements: the
person tagging, the object being tagged (as an entity), and the tag being attached to that
object. From these three data elements you only need two in order to find the third. He
provides an example from del.icio.us which demonstrates that if you know the object's URL
(i.e. a webpage) and have a tag for that webpage, you can find other individuals that use the
same tag on that particular object (sometimes known as 'pivot browsing'). This can then
potentially lead to finding another person who has similar interests or shares a similar
vocabulary, and this is one of Vander Wal's key points concerning what he considers to be the
value of folksonomy over taxonomy: that groups of people with a similar vocabulary can
function as a kind of 'human filter' for each other.
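The three data elements, and the way that knowing any two leads to the third, can be sketched as a small collection of (person, object, tag) records. The names, URLs and tags below are hypothetical, and this is only one simple way of modelling the idea; it is not a description of how del.icio.us stores its data.

```python
from collections import namedtuple

# One record per act of tagging: who tagged what with which word.
Tagging = namedtuple("Tagging", ["person", "obj", "tag"])

TAGGINGS = [
    Tagging("alice", "http://example.org/page", "web2.0"),
    Tagging("bob", "http://example.org/page", "web2.0"),
    Tagging("carol", "http://example.org/page", "reading"),
    Tagging("bob", "http://example.org/other", "web2.0"),
]


def find(taggings, person=None, obj=None, tag=None):
    """Return the taggings that match every element that was supplied."""
    wanted = {"person": person, "obj": obj, "tag": tag}
    return [
        t for t in taggings
        if all(value is None or getattr(t, name) == value
               for name, value in wanted.items())
    ]


def people_using(taggings, obj, tag):
    """Pivot from an object and a tag to the people who applied that tag to it."""
    return sorted({t.person for t in find(taggings, obj=obj, tag=tag)})


same_vocabulary = people_using(TAGGINGS, "http://example.org/page", "web2.0")
bobs_objects = sorted({t.obj for t in find(TAGGINGS, person="bob", tag="web2.0")})
```

Starting from a page and a tag, the first lookup surfaces the people who share that vocabulary; starting from a person and a tag, the second surfaces what else they have marked in the same way.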
Another key feature of folksonomy is that tags are generated again and again, so that it is
possible to make sense of emerging trends of interest. It is the large number of people
contributing that leads to opportunities to discern contextual information when the tags are
aggregated (Owen et al., 2006), a wisdom of crowds-type scenario. One author describes such
unconstrained tagging, in the overall context of the development of hypertext, as 'feral
hypertext': 'These links are not paths cleared by the professional trail-blazers Vannevar Bush
dreamed of, they are more like sheep paths in the mountains, paths that have formed over
time as many animals and people just happened to use them' (Walker, 2005, p. 3).
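Aggregating tags to reveal trends of interest amounts, at its simplest, to counting how often each tag has been applied. The sketch below assumes tagging records in the form of (person, object, tag) triples; the sample data is invented for illustration.

```python
from collections import Counter


def tag_counts(taggings):
    """Count how many times each tag has been applied across all users."""
    return Counter(tag for _person, _obj, tag in taggings)


# Hypothetical tagging activity by several people.
sample = [
    ("alice", "http://example.org/a", "web2.0"),
    ("bob", "http://example.org/a", "web2.0"),
    ("carol", "http://example.org/b", "web2.0"),
    ("alice", "http://example.org/b", "podcast"),
    ("dave", "http://example.org/c", "podcast"),
    ("erin", "http://example.org/c", "rss"),
]
counts = tag_counts(sample)
most_used = counts.most_common(1)[0]
```

Counts of this kind are what sit behind the familiar 'tag cloud' displays, where frequently used tags are shown more prominently.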
3.3 Data on an epic scale
‘Information gently but relentlessly drizzles down on us in an invisible, impalpable electric
rain’
von Baeyer, 2003, p.3
In the Information Age we generate and make use of ever-increasing amounts of data. Some
commentators fear that this datafication is causing us to drown. Many Web 2.0 companies
feel that they offer a way out of this, and in the emerging Web 2.0 universe, data, and lots of
it, is profoundly important. Von Baeyer’s invisible rain is captured by Web 2.0 companies
and turned into mighty rivers of information. Rivers that can be fished.
In his original piece on the emergence of Web 2.0, Tim O’Reilly (2005a) discusses the role
that data and its management has played with companies like Google, arguing that for those
services, ‘the value of the software is proportional to the scale and dynamism of the data it
helps to manage’ (p. 3). These are companies that have database management and networking
as core competencies and who have developed the ability to collect and manage this data on
an epic scale.
A recent article in Wired magazine emphasised the staggering scale of the data processing
and collection efforts of Google when it reported on the company’s plans to build a huge new
server farm in Oregon, USA, near cheap hydro-electric power supplies once used to smelt
aluminium (Gilder, 2006). Google now has a total database measured in hundreds of petabytes [17]
which is swelled each day by terabytes of new information. This is the network effect
working at full tilt.
Much of this is collected indirectly from users and aggregated as a side effect of the ordinary
use of major Internet services and applications such as Google, Amazon and Ebay. In a sense
these services are ‘learning’ every time they are used. As one example, Amazon will record
your book buying choices, combine this with millions of other choices and then mine and sift
this data to help provide targeted recommendations. Anderson (2006) calls these companies
long tail aggregators who ‘tap consumer wisdom collectively by watching what millions of
them do’ (p. 57).
This data is also made available to developers, who can recombine it in new ways. Lashing
together applications that take rivulets of information from a variety of Web 2.0 sources has
its own term: a mash-up. As an early, oft-quoted example, Paul Rademacher’s
HousingMaps.com combined Google Maps (an online mapping service) with the USA-based
CraigsList of flats available for rent. These kinds of mash-ups are facilitated by what are
known as ‘open APIs’–Application Programming Interfaces (see section 4.5).
Much as these services have made life easier on the Web (who can imagine life without
Google now?) there is a darker side. Who owns this data? Increasingly, data is seen as
something – a resource – that can be repurposed, reformatted and reused. But what are the
privacy implications? Google’s mission is ‘to organise the world’s information’ and in part
this means yours. There is a tension here. Some argue that a key component of Web 2.0 is the
process of freeing data, in a process of exposure and reformatting, through techniques like
open APIs and mash-ups (Miller, 2005, p. 1). Others are not so sure. Tim O’Reilly makes a
telling point: ‘the race is on to own certain classes of core data: location, identity, calendaring
of public events, product identifiers and namespaces’ (2005a, p. 3). Brown and Duguid
(2000) argue that the mass dis-intermediation of the Web is actually leading to centralization.
[17] 10 to the power of 15 (a million billion).
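The mash-up idea discussed earlier in this section, where data from two separate sources is recombined into something new, can be reduced to a simple join. The sketch below is not how HousingMaps.com is implemented: the listings, addresses and coordinates are invented, and in practice each source would be reached through its own open API rather than held in local variables.

```python
def mash_up(listings, coordinates):
    """Attach a map position to every listing whose address can be located."""
    combined = []
    for listing in listings:
        position = coordinates.get(listing["address"])
        if position is not None:
            latitude, longitude = position
            combined.append({**listing, "lat": latitude, "lon": longitude})
    return combined


# Source 1 (hypothetical): flats advertised for rent.
listings = [
    {"address": "1 Example Street", "rent": 900},
    {"address": "2 Sample Road", "rent": 1100},
    {"address": "3 Unknown Lane", "rent": 750},
]
# Source 2 (hypothetical): positions returned by a mapping service.
coordinates = {
    "1 Example Street": (51.50, -0.12),
    "2 Sample Road": (51.52, -0.10),
}
mapped_flats = mash_up(listings, coordinates)
```

Neither source was designed with the other in mind; the new service exists only because both expose their data in a form that can be combined.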
3.4 Architecture of Participation
This is a subtle concept, expressing something more than, and indeed building on, the ideas of
collaboration and user production/generated content. The key to understanding it is to give
equal weight to both words [18]: this is about architecture as much as participation, and at the
most basic level, this means that the way a service is actually designed can improve and
facilitate mass user participation (i.e. low barriers to use).
At a more sophisticated level, the architecture of participation occurs when, through normal
use of an application or service, the service itself gets better. To the user, this appears to be a
side effect of using the service, but in fact, the system has been designed to take the user
interactions and utilise them to improve itself (e.g. Google search).
It is described in Tim O’Reilly’s original paper (2005a) in an attempt to explain the
importance of the decentralised way in which BitTorrent works i.e. that it is the network of
downloaders that provides both the bandwidth and data to other users so that the more people
participate, the more resources are available to other users on the network. O’Reilly
concludes: ‘BitTorrent thus demonstrates a key Web 2.0 principle: the service automatically
gets better the more people use it. There’s an implicit ‘architecture of participation’, a built-in
ethic of cooperation, in which the service acts primarily as an intelligent broker, connecting
the edges to each other and harnessing the power of the users themselves.’ (p. 2).
3.4.1 Participation and openness.
This concept pre-dates discussions about Web 2.0, having its roots in open source software
development communities. Such communities organise themselves so that there are lowered
barriers to participation and a real market for new ideas and suggestions that are adopted by
popular acclamation (O’Reilly, 2003). The same argument applies to Web-based services.
The most successful seem to be, the argument goes, those that encourage mass participation
and provide an architecture (ease-of-use, handy tools etc.) that lowers the barriers to
participation. As a Web 2.0 concept, this idea of opening up goes beyond the open source
software idea of opening up code to developers, to opening up content production to all users
and exposing data for re-use and combination in so-called ‘mash-ups’.
[18] Indeed, Chris Anderson, in The Long Tail, seems to get a little confused, equating the architecture of
participation to a simple blurring of the lines between producers and consumers.
3.5 Network effects, power laws and the Long Tail
‘Think deeply about the way the internet works, and build systems and applications that use it
more richly, freed from the constraints of PC-era thinking, and you're well on your way.’
Tim O'Reilly, O’Reilly Radar, 10th Dec 2006.
The Web is a network of interlinked nodes (HTML documents linked by hypertext) and is
itself built upon the technologies and protocols of the Internet (TCP/IP, routers, servers etc.)
which form a telecommunications network. There are over a billion people online and as
these technologies mature and we become aware of their size and scale, the implications of
working with these kinds of networks are beginning to be explored in detail. Understanding
the topology of the Web and the Internet, its shape and interconnectedness, becomes
important.
There are two key concepts which have a bearing on a discussion of the implications of Web
2.0. The first is to do with the size of the Internet or Web as a network, or, more precisely, the
economic and social implications of adding new users to a service based on the Internet. This
is known as the Network Effect. The second concept is the power law and its implications for
the Web, and this leads us into a discussion of the Long Tail phenomenon. At the heart of
Tim O’Reilly’s comment about the importance of the Internet as a network is the belief that
understanding these effects and the sheer scale of the network involved, and working ‘with
the grain’, will help to define who the Web 2.0 winners and losers will be.
3.5.1 The Network Effect
The Network Effect is a general economic term used to describe the increase in value to the
existing users of a service in which there is some form of interaction with others, as more and
more people start to use it (Klemperer, 2006; Liebowitz and Margolis, 1994). It is most
commonly used when describing the extent of the increase in usefulness of a telecoms system
as more and more users join. When a new telephone user joins the network, not only do they
as an individual benefit, but the existing users also benefit indirectly since they can now ring
a new number and speak to someone they couldn’t speak to before19. Such discussions are not
confined to telecoms and are, for example, widely referred to in relation to technology
products and their markets. There is an obvious parallel with the development of social
software technologies such as MySpace: as a new person joins a social networking site,
other users of the site also benefit. Once the Network Effect begins to build and people
become aware of the increase in a service’s popularity, a product often takes off very rapidly
in a marketplace.
However, this can also lead to people becoming ‘locked in’ to a product. A widely cited
example is the great commercial success of Microsoft Office. As more and more people made
use of Office (because other people did, which meant that they could share documents with an
increasingly large number of people), so it became much harder to switch to another product
as this would decrease the number of people one could share a document with.
19
There are many subtleties to network effects and interested readers are pointed to: http://oz.stern.nyu.edu/io/network.html [last accessed 15/01/07].
One of the implications of the network effect and subsequent lock-in to technology products
is that an inferior product can sometimes be widely, or even universally, adopted, and the
early momentum that developed behind VHS as a video format (over Betamax) is an example
that is often cited. Although economists provide much nuanced argument as to the details of
this (Liebowitz and Margolis, 1994) it is a powerful driver within technology marketing as it
is believed that a new product is more likely to be successful in the long-term if it gains
traction and momentum through early adoption. This has led to intense competition at the
early adopter phase of the innovation demand curve (Farrel and Klemperer, 2006) where
social phenomena such as ‘word of mouth’ and ‘tipping point’ and the human tendency to
‘herd’ with others play an important role (Klemperer, 2006).
As the Internet is, at heart, a telecommunications network, it is therefore subject to the
network effect. In Web 2.0, new software services are being made available which, due to
their social nature, rely a great deal on the network effect for their adoption. Indeed, it could
be argued that their raison d'être is the network effect: why join MySpace unless it is to have
access to as many other young people as possible in order to find new friends with shared
interests? Educationalists should bear this in mind when reviewing new or proposed Web 2.0
services and their potential role in educational settings. As one lecturer recently found out, it
is easier to join with the herd and discuss this week’s coursework online within Facebook (a
popular social networking site) than to try and get the students to move across to the
institutional VLE. There are also implications for those involved in the framing of technology
standards (Farrel and Klemperer, 2006), where the need for interoperability is important in
order to avoid forms of lock-in.
3.5.2 How big is the network effect?: the problem with Metcalfe's Law
How big is the network effect? Can we put a finger on the scale of its operation? The scale of
the effect is important because this may have a bearing on the way the architectures of Web-based
systems are designed and, in part, because discussions of the business models for
new technologies developed on the basis of Web 2.0 ideas see these network effects
as important.
It is popularly believed that Robert Metcalfe (the inventor of Ethernet) proposed, in the early
1970s, a network effect argument whereby growth in the value of a telecommunications
network, such as the Internet, is proportional to n (the number of users) squared (i.e. n^2)20.
Metcalfe’s original idea was simply to conceptualise the notion that although the costs of a
telecoms network rise linearly (a straight line on the graph), the ‘value’ to customers rises by
n^2 and therefore at some point there is a cross-over at which value will easily surpass costs,
which means that a critical mass has been achieved.
Although this was originally intended as a rough empirical formulation rather than a hard
physical law it was subsequently described as such (‘Metcalfe’s Law’) in 1993 by George
Gilder, a technology journalist, who was influential during the dot-com boom of the 1990s.
However, recent research work has undermined this and subsequent theories that built on top
of it. Briscoe et al. (2006) argue that these formulations are actually incorrect and that: ‘the
value of a network of size n grows in proportion to n log(n)’ (p. 2). A growth of this scale,
whilst large, is much more modest than that attributed to Metcalfe. Briscoe et al. further argue
that: ‘much of the difference between the artificial values of the dot-com era and the genuine
value created by the Internet can be explained by the difference between the Metcalfe-fuelled
optimism of n^2 and the more sober reality of n log(n)’ (p. 2).
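To give a feel for the gap between the two formulations, the following Python sketch compares n^2 with n log(n) for a few network sizes. It is illustrative only: the function names are invented for this example, ‘value’ is in arbitrary units, and the use of the natural logarithm is an assumption (a different base would only change the result by a constant factor).

```python
import math


def metcalfe_value(n: int) -> float:
    """Value under the formulation attributed to Metcalfe: proportional to n squared."""
    return float(n * n)


def briscoe_value(n: int) -> float:
    """Value under the n log(n) formulation (natural logarithm assumed)."""
    return n * math.log(n) if n > 1 else 0.0


def pairwise_connections(n: int) -> int:
    """Each of n users can reach (n - 1) others, giving n(n - 1) possible calls."""
    return n * (n - 1)


if __name__ == "__main__":
    for n in (10, 1_000, 1_000_000):
        ratio = metcalfe_value(n) / briscoe_value(n)
        print(f"n={n:>9,}  n^2={metcalfe_value(n):.3e}  "
              f"n log n={briscoe_value(n):.3e}  ratio={ratio:,.0f}")
```

Under these assumptions the ratio between the two estimates grows from single figures at n = 10 to tens of thousands at n = 1,000,000, which gives some sense of how far apart valuations based on the two formulas could drift.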
20
A communications network with n users means that each can make (n-1) connections (i.e. place calls
to n-1 others by telephone), therefore the total value, it is argued, is n(n-1), which is roughly n^2.
It is important to appreciate how deeply entrenched Metcalfe’s ideas have become. Long after
the boom and bust the idea that there are ‘special effects’ at work on the Internet driven by the
scale and topology21 of the network remains powerful, and indeed the formula is considered
by sociologists to be one of the defining characteristics of the information technology
revolution or paradigm (Castells, 2000)22. In terms of Web 2.0 this will matter again if
commentators’ fears of an emerging technology ‘Bubble 2.0’ are founded.
So why is the network effect likely to be proportional to n log(n)? The key to understanding
this is to be aware that the term ‘value’ has been identified by Briscoe et al. as a rather
nebulous term. What does it mean to say that the value (to me) of the telecommunications
network has increased when one new person becomes a new subscriber to the telephone
system or another website is added to the Web? To understand this we must delve into the
shape of the Web and become aware of the role of power laws operating on it.
3.5.3 What shape is the Web?: the role of Power Laws
In addition to the physical network effects of the telecoms-based Internet, there are also Web-specific
network effects at work due to the linking that takes place between pieces of Web
content: every time users make contributions through blogs or use services that aggregate
data, the network effect deepens. This network effect is driving the continual improvement of
Web 2.0 services and applications as part of the architecture of participation.
In the previous section we saw how Briscoe et al. had made the argument that the size of the
Network Effect was proportional to n log(n) rather than Metcalfe’s n^2. They argue that this is
quantitatively justified by thinking about the role of ‘value’ in the network: adding a new
person to the network does not provide each and every other person on the network with a
single unit of additional value. The additional value varies depending on what use an existing
individual might make of the new one (as an example, some of your email contacts are many
times more useful to you than the rest). As this relative value is dictated by a power law
distribution, with a long tail, it can be shown mathematically that the network effect is
proportional to n log(n) rather than n^2.
A power law distribution is represented by a continuously decreasing curve that is
characterised by ‘a very small number of very high-yield events (like the number of words
that have an enormously high probability of appearing in a randomly chosen sentence, like
'the' or 'to') and a very large number of events that have a very low probability of appearing
(like the probability that the word 'probability' or 'blogosphere' will appear in a randomly
chosen sentence)’ (Benkler, 2006). Such power law distributions have very long ‘tails’ as the
amplitude of a power law approaches, but never quite reaches zero, as the curve stretches out
to infinity23. This is the Long Tail referred to by Chris Anderson (see below).
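One way to see where n log(n) might come from is to assume, as a simplification, that the value a user gets from their k-th most useful contact falls off as 1/k (a power law with exponent -1). The Python sketch below sums those contributions across the network. The 1/k weighting and the function names are assumptions made for illustration and are not figures taken from Briscoe et al.

```python
import math


def value_per_user(n: int) -> float:
    """Value one user derives from the other n - 1 users, weighted 1, 1/2, 1/3, ..."""
    return sum(1.0 / k for k in range(1, n))


def network_value(n: int) -> float:
    """Total value: each of the n users receives the same harmonic sum."""
    return n * value_per_user(n)


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        total = network_value(n)
        # If the total grows like n log(n), this quotient should settle near 1.
        print(f"n={n:>9,}  total={total:.4e}  "
              f"total / (n ln n) = {total / (n * math.log(n)):.3f}")
```

Because the sum 1 + 1/2 + ... + 1/(n-1) grows roughly like the logarithm of n, the total stays close to n log(n) and far below n^2, which is the shape of the argument summarised above.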
21
the ‘shape’ and ‘connectedness’ of the network
22
Although there is, I believe, an error on page 71, where he describes the formula as n to the power of
(n-1).
23
Formally, a power law is an unequal distribution of the form y = ax^k, where a is a constant for large
values of x, and k is the power to which x is raised – the exponent. In the graph the kth ranked item will
measure a frequency of about 1/kth of the first.
Figure 1: The Long Tail
The history of research on network effects and Web topology shows that the network effect
formula is not the only facet of life on the Internet and the Web that follows a power law
distribution. In fact, the shape of the Web (the way in which hypertext materials are linked)
and the connection patterns of Internet routers themselves also follow a power law
distribution.
3.5.4 The Long Tail
The Long Tail is the title of a book by Wired Editor, Chris Anderson (2006). In it, Anderson
sets out to demonstrate the economic and social implications of the fact that the distribution
of many facets of life on the Web is unequal and follows a power law. It transpires that not
only do the physical interconnectedness of the Internet and the virtual interconnectedness of
hypertext links follow a power law distribution, but also that many facets of the actual
interaction that comes about through using tools that utilise these follow such a
distribution pattern.
To help explain this concept in the context of retailing on the Web, Anderson provides an
example from the process of selling music albums. If one maps the
number of albums sold in a particular week – the frequency – against the name of the album,
it will be possible to see that the left hand side of the graph is dominated by huge sales of the
popular, chart-listed albums receiving radio air-play. Often, but not always, these will be the
newest albums. As one moves towards the right of the graph sales drop off dramatically,
roughly according to the power law curve described above (i.e. the second highest seller will
sell half the number of albums of the first). The curve continues falling away to the right,
following the 1/n rule, but, and this is the crucial point outlined by Chris Anderson, only if
there is no artificial barrier to people buying less popular albums. Artificial barriers include
things like physical shelf space, which is limited and expensive, which means that only the
most popular albums, or those receiving the most promotion, are stocked in shops. In a digital
environment, there is no real limit to ‘virtual’ shelf space, so there is also no real limit to the
number of albums that can be ‘stocked’. Up until now, the presence of artificial barriers has
cloaked the extent of the long tail.
Towards the end of the long tail the sales become smaller and smaller, in fact, tending
towards zero. However, what economists have noticed is that for sales of albums, books and
other artefacts, even the most unpopular items do have some sales. These are the niches at the
far end of the tail. What has excited economists and business analysts is that the total sales at
the lower reaches of the tail, although the items are individually unpopular, add up to a
substantial amount (the area under the graph). According to Anderson, in traditional retail,
new albums account for 63% of sales [in 2005], but online that percentage is reversed (36%
of sales). It is therefore obvious how Amazon has used the long tail to astonishing effect.
Wikipedia, too, is an excellent demonstrator of the concept as it contains tens of thousands
more entries than any published, book-based encyclopaedia could ever hope to collate.
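The ‘area under the graph’ point can be made concrete with a small Python sketch. It assumes sales follow the 1/n rule exactly and uses an invented catalogue of one million titles; the function names, the catalogue size and the shelf sizes are all hypothetical and are not taken from Anderson’s data.

```python
def sales_by_rank(catalogue_size: int, top_seller_sales: float = 1000.0) -> list[float]:
    """Sales for each rank under the 1/n rule: rank r sells top_seller_sales / r."""
    return [top_seller_sales / rank for rank in range(1, catalogue_size + 1)]


def tail_share(sales: list[float], shelf_space: int) -> float:
    """Fraction of total sales coming from items beyond the first `shelf_space` ranks."""
    total = sum(sales)
    return sum(sales[shelf_space:]) / total if total else 0.0


if __name__ == "__main__":
    sales = sales_by_rank(catalogue_size=1_000_000)
    for shelf in (1_000, 10_000, 100_000):
        print(f"shelf of {shelf:>7,} titles: tail share = {tail_share(sales, shelf):.1%}")
```

Under these assumed numbers, a shop able to stock only the top 1,000 of a million titles would miss a little under half of all sales, even though every item in the tail sells very few copies on its own.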
3.5.5 The Implications of Web topology
Why does this matter? What are the implications of these two topological ‘rules’ with regard
to the developing Web 2.0 agenda? Understanding the shape of the Web and the implications
of power law distribution has important implications in general for making use of the Web
and the development of Internet-based technologies. It also has ramifications for debates
about the role and direction of Web 2.0 technologies, in which social connections between
people are a key part of the mix.
Firstly, there are implications from the development of the long tail. Chris Anderson argues
that we are moving towards a culture and economy where the huge number of people
participating in the niches in the tail really matters. Specialism and niche interests,
personalisation and fragmentation are all potentially driven by the march rightwards on the
graph. One of the forces driving this is the ‘democratization’ of the tools of production—the
number of albums released in 2005 increased by 36% but 300,000 free tracks, many of which
were produced by amateurs, were uploaded to MySpace, demonstrating the fact that ‘We are
starting to shift from being passive consumers to active producers’ (Anderson, 2006, p. 63)
and developing towards a culture which writer Doc Searls24 calls producerism.
Secondly, what does topology tell us about the shape of what might be called our
‘information environment’? How does this impact on the diffusion of new knowledge and the
sociology of new content creation? In the Web 2.0 era in which blogs and wikis are an
important part of the mix, much is made of the Internet ‘conversation’ afforded, particularly
by the rise of the blogosphere. What does our emerging knowledge on the shape of the Web
(its topology) tell us about the state of this conversation? Does the blogosphere actually work
as a coherent Internet-based cultural conversation? Or is it, as some fear, a case of when
everyone can speak, no-one can be heard25, in which an uncontrolled mish-mash of
conversations reduces the Web to mush.
These are the kinds of questions that Yochai Benkler attempts to tackle in his book, The
Wealth of Networks (2006). He argues that we need an analysis of the blogosphere because it
is an increasingly important tool in the dissemination of new ideas and because blogs form
powerful social community-building tools. To some, this may sound like history repeating
itself with echoes, for example, of past debates about Web portals concentrating power and
debate in much the same way as ‘old’ media. But in fact, it is quite different.
Benkler’s point is that the topology of the Web and the links and connections that form the
conversation within the blogosphere is such that the system forms a kind of active filtration
process. This means that although individually most blogs should be taken with a pinch of
salt, collectively, they provide a mechanism ‘for topically related and interest-based clusters
to form a peer-reviewed system of filtering, accreditation, and salience generation’ (p. 252).
He believes that this is proving more than an equal to mainstream media and that while
the Internet, Web and blogosphere may not be a communications utopia, they are a considerable
improvement, from the point of view of political, cultural and public engagement and
understanding, on traditional mass media.
24
Doc Searls blog: http://doc.weblogs.com/2006/01/15 [last accessed 14/02/07].
25
What Benkler (2006) calls the Babel objection (p.10)
Such an analysis has been made possible through a deepening understanding of the structure
of information on the Web. Although the deeper subtleties of Benkler's arguments are beyond
the scope of this report, and whilst you might not agree with the conclusions of his analysis as
summarised here, it is wise to be aware of the context of these debates and the importance of
the Web’s topology to their discussion.
3.6 Openness
The development of the Web has seen a wide range of legal, regulatory, political and cultural
developments surrounding the control, access and rights of digital content. However, the Web
has also always had a strong tradition of working in an open fashion and this is also a
powerful force in Web 2.0: working with open standards, using open source software, making
use of free data, re-using data and working in a spirit of open innovation. An important
technology in the development of Web 2.0 has been the open source Firefox browser and its
system of extensible plug-ins which allow experimentation. Readers with an interest in
exploring open source in general are referred to the JISC-funded OSSWatch service hosted at
the University of Oxford26.
3.6.1 Expose the Data
In general, Web 2.0 places an emphasis on making use of the information in the vast
databases that the services help to populate. There is a parallel trend towards opening the
stores of data that have been collected by public sector agencies using taxpayers' money.
Readers will no doubt be aware of the wide-ranging debate within the academic and
publishing communities over open access to scientific and humanities research and the role of
journals in this regard, and this is not unconnected to moves within Higher Education and the
research community to expose experimental data (Frey, 2006).
However, the apparent drive towards openness has to be tempered by the ‘epic scale of data’
that is being collected and aggregated, in non-standard ways, by commercial companies.
There needs to be continual focus on open data exchange and the adoption of open standards.
As Tim O’Reilly said when speaking to the Open Business forum (2006a): ‘The real lesson is
that the power may not actually be in the data itself but rather in the control of access to that
data. Google doesn’t have any raw data that the Web itself doesn’t have, but they have added
intelligence to that data which makes it easier to find things.’
The sharing of data is an issue within Web 2.0. Lawrence Lessig recently noted the difference
between 'true' sharing and 'fake' sharing, using YouTube (now owned by Google) as an example: ‘But
never does the system give users an easy way to actually get the content someone else has
uploaded’ (Lessig, 2006). Other services are more forgiving, for example, Backpack and
WordPress both allow user data to be exported as an XML text file.
3.6.2 Open APIs.
For this discussion see the technology section.
26
http://www.oss-watch.ac.uk [last accessed 14/02/07].
3.6.3 IPR
Web 2.0, like open source software, is starting to have an effect on intellectual property rights
(IPR) and how they are perceived. One obvious example is the role of copyright. As Chris
Anderson points out, many of the ‘creators’ arriving at the far end of the tail, who do not rely on
being paid for their content, are choosing to give up some of their copyright protections. At
the same time the scale and reach of Web 2.0 aggregators means that such systems may be
republishing material for which the process of assigning the rights has been obscured: the
Times Higher recently reported how UK academics had unwittingly stumbled across their
own scholarly outputs available for sale on Amazon for a few dollars. Other examples include
the uploading of copyright protected material to YouTube and other services.
4. Technology and standards
‘The goal? To help us more easily develop the next generation of Web applications that are
every bit as good as or better than desktop PC applications.’
Dion Hinchcliffe, blog post, 11th Sept. 2006.
One of the key drivers of the development of Web 2.0 is the emergence of a new generation
of Web-related technologies and standards. This has been underpinned by the powerful,
though not particularly new, idea of the Web as platform27. Whereas in the past, software
applications ran on the user’s machine, handled by a desktop operating system such as
MacOS, Windows or Linux, under the Web as platform umbrella, software services are run
within the actual window of the browser, communicating with the network and remote
servers.
One consequence of the Web as platform is that there is less emphasis on the software (as a
package: licensed and distributed) and far more on an application providing a service. The
corollary of this is that there is much less emphasis on the release of software and, indeed,
many well known Web 2.0 services remain in a kind of ‘perpetual beta’.
So why has the idea of the Web as platform become more feasible now? The answer is that
browser technology has moved on to a new stage in its development with the introduction of
what are known as Rich Internet Applications (RIA)28. Currently the main technology for
delivering RIAs is Ajax, but there are some alternatives which are mainly based on Flash
technology.
N.B. Tim O’Reilly’s conceptualisation of Web technology with respect to Web 2.0 has since
moved on to the idea of the network as platform. This is especially important for another one
of his key ideas: software above the level of a single device. O’Reilly cites iTunes and TiVo
as exemplars of this approach as, although not Web applications themselves, they leverage the
Web as part of their infrastructure.
4.1 Ajax
The delivery of Web 2.0 applications and services has been driven by the widespread
adoption of one particular group of technologies which are referred to as Ajax –
Asynchronous Javascript + XML – a term first coined by Jesse James Garrett (Johnson, 2005;
Garrett, 2005). As a term, Ajax attempts to capture both an approach to working with the
Web and the use of a specific range of technologies.
One of the big frustrations for users of traditional HTML-based websites is the time spent
waiting for pages to reload and refresh after the user has chosen an option or clicked on a
hypertext link. Several attempts have been made over the years to improve the dynamism of
webpages through individual techniques such as Javascript, hidden frames, Dynamic HTML
(DHTML), CSS and Microsoft’s XMLHttpRequest ActiveX tool. However, it is really only
with the introduction of Ajax that this has come together successfully. With Ajax, only small
amounts of information pass to and from the server once the page has first been loaded. This
allows a portion of a webpage to be dynamically reloaded in real-time and creates the
impression of richer, more 'natural' applications with the kind of responsive interfaces that are
commonly found in desktop applications (Google calendar is a good example of this).
27
This idea was pioneered by Netscape, the company that developed one of the first successful Web
browsers back in the 1990s, but eventually succumbed to competition from Microsoft, who had a
vested interest in maintaining the status quo. This ‘competition’ was not without considerable
controversy (see, for example, Auletta, 2001, for further details). O’Reilly (2005a) argues that the next
phase will be between Windows/the desktop paradigm ‘the pinnacle of proprietary control’ and the
open platform of the Web, and that ‘battle is no longer unequal, a platform versus a single application,
but platform versus platform, with the question being which platform, and more profoundly, which
architecture, and which business model, is better suited to the opportunity ahead’ (p. 2).
28
For an example of the sophistication and power of these types of interfaces see the Flex demo at:
http://examples.adobe.com/flex2/inproduct/sdk/dashboard/dashboard.html [last accessed 14/02/07].
Although Ajax is a group of technologies (see sidebar), the core is the Ajax engine, which acts as
an intermediary, sitting within the client’s browser and facilitating asynchronous communication
with the server of smaller items of information. So, if a webpage contains a lot of text, plus, as a
side-bar, a graph of the current stock price of the company being written about, this graph can be
asynchronously updated in real-time without the whole page being reloaded every few seconds. The
Ajax engine processes every action that would normally result in a trip back to the server for a page
reload, before making any really necessary referrals back to the server.
Ajax relies heavily on JavaScript and XML being accurately and efficiently handled by the browser.
The need for browsers to adhere to existing standards is therefore becoming an important issue
(Johnson, 2005). There is also an emerging debate with regard to the adoption of emerging standards.
For example there is a debate over standards for the user interface for Ajax-style applications.
Mozilla, for example, is committed to the XML User Interface (XUL) standard29 whereas Microsoft
are standing by their Extensible Application Markup Language (XAML)30.
The Ajax technologies:
- HTML/XHTML (a standards-based way of presenting information within the browser)
- CSS
- Document Object Model (DOM) (a way of dynamically controlling the document)
- XML (data interchange and manipulation)
- XSLT (data interchange and manipulation)
- XMLHttpRequest31 (asynchronous data retrieval from the server)
- Javascript (or ECMA script)
A detailed overview of Ajax and its application in Web 2.0 services is provided by the Open
Ajax group: http://www.openajax.org/whitepaper.html [last accessed 14/02/07].
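The partial-update idea behind the Ajax engine can be modelled in a few lines. In a real application this would be JavaScript running in the browser and using XMLHttpRequest. The Python sketch below is only a stand-in with no network involved, and the page regions, the ‘update’ element and the function names are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page state held by the browser: region id -> rendered content.
page = {
    "article": "A long article about the company ...",
    "stock-chart": "price=101.2",
}


def server_fragment(region_id: str, content: str) -> str:
    """Stand-in for the server: return a small XML payload for one region only."""
    update = ET.Element("update", attrib={"region": region_id})
    update.text = content
    return ET.tostring(update, encoding="unicode")


def apply_fragment(current_page: dict[str, str], payload: str) -> None:
    """Stand-in for the Ajax engine: parse the payload and patch one region in place."""
    update = ET.fromstring(payload)
    current_page[update.attrib["region"]] = update.text or ""


if __name__ == "__main__":
    payload = server_fragment("stock-chart", "price=102.7")
    apply_fragment(page, payload)
    print(payload)              # a small XML message rather than a whole page
    print(page["stock-chart"])  # the one region that was refreshed
    print(page["article"])      # the rest of the page is left alone
```

The design point being illustrated is simply that the message carries one region’s worth of data, so the long article never has to be sent or redrawn again.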
4.2 Alternatives to Ajax
There are alternatives to Ajax, the most important of which make use of Flash—the
ubiquitous graphics plug-in from Macromedia (now Adobe) that first appeared in the 1990s.
It allowed sophisticated, but quick-to-download, vector graphics and animation to be
displayed in the browser window. Flash requires a browser plug-in to work, although within
only a few years of its launch 99% of computers had the necessary addition to support it.
Flash is still being used to deliver compelling content within the browser (in fact the Flash
video player is beginning to take off because YouTube have adopted it). It has been used as
the basis of other RIA development tools, including Adobe’s Flex and OpenLaszlo.
Developers in HE/FE might be particularly keen on OpenLaszlo as it uses an open source
model: OpenLaszlo programs are written in XML and JavaScript and then transparently
compiled to both Flash and non-proprietary Dynamic HTML.
29
A mark-up language for user interface graphics. See: http://www.xulplanet.com/
30
http://msdn2.microsoft.com/en-us/library/ms752059.aspx [last accessed 14/02/07].
31
XMLHttpRequest object is implemented in most popular Web browsers and presents a simple
interface that allows data to be transferred from the client to the server and vice versa, while the user
continues to interact with the webpage. See: http://www.javaworld.com/javaworld/jw-10-2005/jw-1017-ajax.html?page=2
[last accessed 14/02/07].
As well as these Flash-based systems there are several emerging technologies which focus on
displaying rich graphics within the browser window. These include Microsoft’s WPF/E32,
XBAP, and the related XAML33 (all of which feature heavily in the Vista operating system);
Mozilla’s XUL; and Ethan Nicholas’s proposed, minimalist Java Browser Edition
(Hinchcliffe, 2006).
The introduction of these alternative RIA technologies is not without controversy and debate
amongst developers. Some of these solutions require the addition of a plug-in to the browsers
and make use of core technology that is proprietary. There is also some concern that the
approach taken by these products is ‘breaking the model of the web’ (Hinchcliffe, 2006 p. 1).
4.3 SOAP vs REST: A Web architecture debate
‘At the heart of REST is the idea that the web works precisely because it uses a small number
of verbs applied to a large number of nouns.’
McGrath, 2006.
A further strand in the development of Web technology is the use of what are called
lightweight or simplified programming models, which facilitate the creation of loosely
coupled34 systems. This flexibility is a source of debate since the lightweight ‘ideal’ is often
viewed in contrast to the production of more robust Web Services which use what are seen as
the ‘heavyweight’ and rather formal techniques of SOAP and WS-*. This debate is focused as
much on issues of genre and style of programming practice and development techniques as it
is on the mandating of any particular technology, although the use of scripting languages such
as Perl, Python, PHP and Ruby, along with technologies such as RSS, Atom and JSON35, is
one of the favourite ways of (lightweight) working.
32
Windows Presentation Foundation is the graphical subsystem feature of .NET Framework 3.0
33
Extensible Application Markup Language (XAML: pronounced "Zammel")
34
Loosely coupled entities make few assumptions about each other, limit dependencies and employ
communications techniques that allow for flexibility and for one end to change without affecting the
other.
35
JavaScript Object Notation, see Johnson (2005) for details
Without going into this in too much depth, readers should be aware that these discussions
about style within the Web development community are crystallising around two main
approaches: REST and SOAP. This can be seen in a wider context of a generalised, on-going
debate within technology circles over simplicity vs. sophistication. REST stands for
Representational State Transfer, an architectural idea and set of principles first introduced by
Roy Fielding (Costello, 2005). It is not a standard, but describes an approach for a
client/server, stateless architecture whose most obvious manifestation is the Web and which
provides a simple communications interface using XML and HTTP. Every resource is
identified by a URI and the use of HTTP lets you communicate your intentions through GET,
POST, PUT, and DELETE command requests. SOAP and WS-*, on the other hand, are more
formal and use messaging, complex protocols and Web Services Description Language
(WSDL).
One way of visualising the ensuing debate is provided by Sean McGrath. He describes the
Web as an enormous information space, littered with nouns (that can be located with URIs)
and a small number of verbs (GET, POST etc). Where SOAP is more of a Verb Noun system,
he argues that SOAP/WSDL allows the creation of too many (irregular) verbs (McGrath,
2006). There is considerable debate between communities of developers over these issues.
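The ‘few verbs, many nouns’ idea can be sketched as a toy, in-memory model in Python. This is an illustration rather than a real Web service: the ResourceStore class, its handle method and the choice of status codes for each case are assumptions of this sketch, and the album URIs are invented sample data.

```python
from typing import Optional


class ResourceStore:
    """Toy in-memory 'server': URIs are the nouns, HTTP methods are the verbs."""

    def __init__(self) -> None:
        self._resources: dict[str, str] = {}
        self._next_id = 1  # used to name resources created with POST

    def handle(self, method: str, uri: str, body: Optional[str] = None) -> tuple[int, str]:
        """Dispatch one request and return (status code, response body)."""
        if method == "GET":
            if uri in self._resources:
                return 200, self._resources[uri]
            return 404, ""
        if method == "PUT":  # create or replace the resource at exactly this URI
            created = uri not in self._resources
            self._resources[uri] = body or ""
            return (201 if created else 200), ""
        if method == "POST":  # add a new child resource under a collection URI
            child = f"{uri.rstrip('/')}/item-{self._next_id}"
            self._next_id += 1
            self._resources[child] = body or ""
            return 201, child
        if method == "DELETE":
            if self._resources.pop(uri, None) is None:
                return 404, ""
            return 204, ""
        return 405, ""  # any other verb is not part of the small, fixed set


if __name__ == "__main__":
    store = ResourceStore()
    print(store.handle("PUT", "/albums/1", "Kind of Blue"))
    print(store.handle("GET", "/albums/1"))
    print(store.handle("DELETE", "/albums/1"))
    print(store.handle("GET", "/albums/1"))
```

Adding a new kind of resource here means adding new URIs, not new verbs, which is the contrast McGrath draws with a Verb Noun style where each service defines its own operations.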
4.4 Microformats
Microformats are widely used by Web developers to embed semi-structured semantic
information (i.e. some level of ‘meaning’) within an XHTML webpage (Khare, 2006).
Information based on open data formats (a microformat) is buried within certain XHTML
tags (such as ‘div’) or attributes (such as ‘class’, ‘rel’ or ‘rev’). The information is not used
by the browser for display or layout purposes but it can be picked up by applications such as
search engines36.
An example of a microformat is the hCard format which allows personal or organisational
contact information based on the vCard standard to be embedded in a webpage37. Proponents
argue that microformats will have significant benefits for the development of the Web
because they will allow bloggers or website owners to embed information that services and
applications can make use of without the need to go and visit the application’s website and
add the data.
Of course, to a certain extent, Web search engines already do this when they crawl a website
or blog and index the content for other people to locate. Microformats provide additional
information for these kinds of services. As an example, provision of information in the
hListing microformat (which is for small ads) on a blog would allow a small ads service (such
as Craigslist) to automatically find your listing. Future versions of the Firefox browser
(possibly version 3) are likely to incorporate functionality that makes use of microformats in
order to automatically move such data into one’s chosen applications or online services (for
example moving any contact information buried in a webpage into Gmail contacts list)–a
process described as being more ‘information broker’ than browsing (Wagner, 2007). An
illustration from Mozilla shows clearly how this vision fits with the Web as Platform idea
The use of microformats is not without its detractors and debates around this subject tend to
be centred around whether they a) help or hinder the process of moving Web content towards
36.37. Proponents38:
36
See: http://microformats.org/about/ [last accessed 14/02/07].
37
hCard in a webpage, see the tutorial at:
last accessed 14/02/07].
See: http://microformats.org/wiki/hcard. For those interested in the detail of an implementation of ahttp://usabletype.com/weblog/2005/usable-microformats/ [both
38
fundamentalTypes/informationBroker.jpg_large.jpg
JISC Technology and Standards Watch, Feb. 2007 Web 2.0
http://people.mozilla.com/~faaborg/files/20061213-[last accessed 14/02/07].
31
the Semantic Web vision (they are sometimes referred to as the ‘lowercase semantic web’
(Khare and Celik, 2006) and b) have bearing on the on-going and wide-ranging discussions
over the merits or otherwise of the use of lightweight (REST etc.) or heavyweight (SOA etc.)
approaches and solutions.
39)
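As a rough illustration of how an application might pick up hCard data that the browser itself ignores, the following sketch scans a fragment of XHTML for elements whose class attribute carries hCard property names. The sample markup, the person and the HCardParser class are invented for illustration, only a handful of hCard properties are handled, and a real consumer would need to cover far more of the format.

```python
from html.parser import HTMLParser

# A small, illustrative subset of hCard property names.
HCARD_PROPS = {"fn", "org", "email", "tel", "url"}


class HCardParser(HTMLParser):
    """Collect hCard properties from elements nested inside class="vcard"."""

    def __init__(self):
        super().__init__()
        self.cards = []    # one dict per hCard found
        self._stack = []   # per open element: "vcard", a property name, or None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "vcard" in classes:
            self.cards.append({})
            self._stack.append("vcard")
            return
        prop = None
        if "vcard" in self._stack:   # only look for properties inside a card
            prop = next((c for c in classes if c in HCARD_PROPS), None)
        self._stack.append(prop)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute text to the nearest enclosing property element, if any.
        prop = next((p for p in reversed(self._stack) if p and p != "vcard"), None)
        text = data.strip()
        if prop and text:
            card = self.cards[-1]
            card[prop] = (card.get(prop, "") + " " + text).strip()


page = """
<div class="vcard">
  <span class="fn">Jane Example</span>,
  <span class="org">Example University</span>
  <a class="email" href="mailto:jane@example.org">jane@example.org</a>
</div>
"""

parser = HCardParser()
parser.feed(page)
cards = parser.cards
print(cards)
```

Because the contact details are carried in ordinary class attributes, the page still renders normally in a browser that knows nothing about hCard.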
4.5 Open APIs
“When I hear the word open used for services and APIs, I cringe, Just because something's
available on the Internet, is it 'open'?”
Brian Behlendorf40, in: Prodromou, 2006, p. 4.
An Application Programming Interface (API) provides a mechanism for programmers to
make use of the functionality of a set of modules without having access to the source code.
An API that doesn’t require the programmer to license or pay royalties is often described as
open. Such ‘open’ APIs have helped Web 2.0 services develop rapidly and have facilitated
the creation of mash-ups of data from various sources.
One way of finding out what APIs are available is to look at the Programmable Web website
(http://programmableweb.com/), which keeps track of the number of APIs and what people
are doing with them (it recently registered over three hundred). One of the key examples is
the Google Maps API, which allows Web developers to embed maps within their own sites
(http://www.google.com/apis/maps/). Programmable Web claims that over 50% of data mashups
use Google Maps. Amazon has also started to allow access to its database through
Amazon Web Services (AWS41) API.
However, there has been considerable debate over what constitutes ‘openness’. Increasingly
the discussions have moved beyond the parameters of open source software per se and into
discussing what open means in the context of a Web-based service like Google (O’Reilly,
2006b). Some argue that for a service it is the data rather than the software that needs to be
open and there are those that hold that to be truly open the user’s data should be able to be
moved or taken back by the user at will. Tim Bray, an inventor of XML, argues that a service
claiming to be open must agree that: ‘Any data that you give us, we’ll let you take away
again, without withholding anything, or encoding it in a proprietary format, or claiming any
intellectual-property [sic] rights whatsoever.’ 42
39 For more on this debate see Brian Kelly: http://www.ariadne.ac.uk/issue44/web-focus/#8 [last accessed 14/02/07].
40 One of the founding members of the Apache Group, which became the Apache Software Foundation.
41 http://www.amazon.com/AWS-home-page-Money/b/ref=sc_fe_l_1_3435361_1/002-3264884-9188051?ie=UTF8&node=3435361&no=3435361&me=A36L942TSJ2AJA [last accessed 14/02/07].
42 http://www.tbray.org/ongoing/When/200x/2006/07/28/Open-Data [last accessed 14/02/07].
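The mash-up idea mentioned in section 4.5 can be sketched in a few lines. The example below is purely illustrative: the two JSON payloads stand in for responses from two separate open APIs (a small ads feed and a geocoding service), and their field names, postcodes and coordinates are invented rather than taken from any real service.

```python
import json

# Hypothetical responses from two independent services.
listings_feed = """[
  {"id": 1, "title": "Bicycle", "postcode": "AB1 2CD"},
  {"id": 2, "title": "Desk",    "postcode": "EF3 4GH"},
  {"id": 3, "title": "Lamp",    "postcode": "ZZ9 9ZZ"}
]"""
geocoder_feed = """{
  "AB1 2CD": {"lat": 52.95, "lon": -1.15},
  "EF3 4GH": {"lat": 51.75, "lon": -1.26}
}"""


def mash_up(listings_json, geocoder_json):
    """Join listings to coordinates so they could be plotted on a map."""
    listings = json.loads(listings_json)
    locations = json.loads(geocoder_json)
    combined = []
    for item in listings:
        place = locations.get(item["postcode"])
        if place is not None:          # skip listings that cannot be located
            combined.append({"title": item["title"], **place})
    return combined


markers = mash_up(listings_feed, geocoder_feed)
print(markers)
```

The join only works because both services expose their data in a format that a third party can read, which is the sense of ‘open’ that the debate above turns on.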
5. Educational and Institutional Issues
There is significant debate over the alleged advantages and disadvantages of incorporating
social software into mainstream education. This is compounded by the fact that there is very
little reliable, original pedagogic research and evaluation evidence and that to date, much of
the actual experimentation using social software within higher education has focused on
particular specialist subject areas or research domains (Fountain, 2005). Indeed, JISC recently
announced an open call to investigate the ways that this technology is being used by staff and
students and identify opportunities for integration with existing institutional IT systems43. In
this section we review some examples of preliminary activity in four areas: learning and
teaching, scholarly research, academic publishing, and libraries.
5.1 Teaching and learning
One of the most in-depth reviews undertaken in the UK of the potential impact of social
software on education has been carried out by the Nesta-funded FutureLab. Their recent
report, Social Software and Learning (Owen et al., 2006), reviews the emerging technologies
and discusses them in the context of parallel, developing trends in education. These trends
tend towards more open, personalised approaches in which the formal nature of human
knowledge is under debate and where, within schools and colleges, there is a greater emphasis
on lifelong learning and supporting the development of young people’s skills in creativity and
innovation.
Within higher education, wikis have been used at the University of Arizona's Learning
Technologies Centre to help students on an information studies course who were enrolled
remotely from across the USA. These students worked together to build a wiki-based glossary
of technical terms they learned while on the course (Glogoff, 2006). At the State University
of New York, the Geneseo Collaborative Writing Project deploys wikis for students to work
together to interpret texts, author articles and essays, share ideas, and improve their research
and communication skills collectively44. Using wikis in this way provides the opportunity for
students to reflect and comment on either their own work or that of others. Wiki-style
technology has also been used in a tool developed at Oxford University to support teachers
with ‘design for learning’45.
Bryan Alexander (2006) describes social bookmarking experiments in some American
educational research establishments and cites Harvard’s H2O as an exemplar project46.
Alexander also believes that wikis can be useful writing tools that aid composition practice,
and that blogs are particularly useful for allowing students to follow stories over a period of
time and reviewing the changing nature of how they are commented on by various voices. In
these scenarios, education is more like a conversation and learning content is something you
perform some kind of operation on rather than ‘just’ reading it.
43 http://www.jisc.ac.uk/fundingopportunities/funding_calls/2007/01/web_2_use.aspx [last accessed 14/02/07].
44 http://node51.cit.geneseo.edu/WIKKI_TEST/mediawiki/index.php/Main_Page [last accessed 14/02/07].
45 http://phoebe-app.conted.ox.ac.uk/cgi-bin/trac.cgi/wiki/WikiStart [last accessed 14/02/07].
46 H2O provides for shared playlists (shared lists of readings, blog postings, podcasts and other content), which can be tagged and subscribed to as RSS feeds. Playlists can be compiled by anyone and are published under the Creative Commons. See: http://h2o.law.harvard.edu/index.jsp [last accessed 14/01/07].
In the UK, Warwick University has provided easy-to-use blogging facilities to allow staff and
students to create their own personal pages. The intention is that the system will have a
variety of education-related uses such as developing essay plans, creating photo galleries and
recording personal development47.
But these developments are not without debate. Apart from concerns around learner attention
(in an ‘always-on’ environment), identity, and the emerging digital divide between those with
access to the necessary equipment and skills and those without, there are other, specific,
tensions. While some experts focus on the idea of ‘self production’ to argue that learners find
the process of learning more compelling when they are producers as much as consumers48,
others argue that the majority of learners are not interested in accessing, manipulating and
broadcasting material. Indeed, there is serious concern that ‘techno-centric’ assumptions will
obscure the fact that many young people are so lacking in motivation to engage with
education that once these new technologies are integrated into the education environment,
they will lose their initial attraction.
It is beyond the limited scope of a TechWatch report to do real justice to the wide-ranging
debate over the pedagogical issues, but it is perhaps important to point out some of the
implications that these issues will have for education in the same way as other sectors:
• there is a lack of understanding of students’ different learning modes as well as the
‘social dimension’ of social software. In particular, more work is required in order to
understand the social dimension and this will require us to really ‘get inside the heads
of people who are using these new environments for social interaction’
(Kukulska-Hulme, 2006, 16:50).
• Web 2.0 both provides tools to solve technical problems and presents issues that raise
questions. If students arrive at colleges and universities steeped in a more socially
networked Web, perhaps firmly entrenched in their own peer and mentoring
communities through systems like MySpace, how will education handle challenges to
established ideas about hierarchy and the production and authentication of
knowledge?
• How will this affect education’s own efforts to work in a more collaborative fashion
and provide institutional tools to do so? How will it handle issues such as privacy and
plagiarism when students are developing new social ways of interacting and
working? How will it deal with debates over shared authorship and assessment, the
need to always forge some kind of online consensus, and issues around students'
skills in this kind of shared and often non-linear manner of working, especially
amongst science/engineering students (Fountain, 2005)?
One area where this is already having an impact is the development of Virtual Learning
Environments (VLEs). Proponents of institutional VLEs argue that they have the advantage
of any corporate system in that they reflect the organisational reality. In the educational
environment this means that the VLE connects the user to university resources, regulations,
help, and individual, specific content such as modules and assessment. The argument is that
as the system holds this kind of data there is the potential to tailor the interface and the
learning environment (such as type of learning resources, complexity of material etc.) to the
individual, particularly where e-learning is taking place, although so far relatively little use
has been made of, for example, usage statistics of VLEs or tailored content to substantiate
these claims.
However, others now question whether the idea of a Virtual Learning Environment (VLE)
even makes sense in the Web 2.0 world. One Humanities lecturer is reported as having said:
“I found out all my students were looking at the material in the VLE but going straight to
Facebook49 to use the discussion tools and discuss the material and the lectures. I thought I
might as well join them and ask them questions in their preferred space.”50
47 http://www2.warwick.ac.uk/services/its/elab/services/webtools/blogs/about/
48 Cych (2006) cites the work of Steven Heppel and his ideas on symmetry and participation to argue that social software technologies help to develop this kind of collaborative production.
Partly in response to these concerns, there has been research and discussion devoted to the
development of a more personalised version of the VLE concept, the Personal Learning
Environment (PLE), which makes use of the technologies being developed in order to bring in
social software and e-portfolios (Wilson, 2006).
5.2 Scholarly Research
Tim Berners-Lee’s original work to develop the Web was in the context of creating a
collaborative environment for his fellow scientists at CERN. In an age when interdisciplinary
research, cutting across institutional and geographical boundaries, is of increasing
relevance, simple Web tools that provide collaborative working environments are starting to
be used. The open nature of Web 2.0, its easy-to-use support for collaboration and
communities of practice, its ability to handle metadata in a lightweight manner and the
non-linear nature of some of the technology (what Ted Nelson once called intertwingled51) are
all attractive in the research environment (Rzepa, 2006) and there are four specific technology
areas which have seen uptake and development:
Firstly, folksonomies are starting to be used in scientific research environments. One example
is the CombeChem work at Southampton University which involved the development of a
formal ontology for laboratory work which was derived from a folksonomy based on
established working practices within the laboratory52. However, there is, to put it mildly,
some debate about the role and applicability of folksonomies within formal knowledge
management environments, not least because of the lack of semantic distinction between the
use of tags. A recent JISC report, Terminology services and technology (Tudhope et al., 2006),
reviewed some of the characteristics of ‘social tagging’ systems and the report notes that
‘Few evaluative, systematic studies from professional circles in knowledge organisations,
information science or semantic web communities have appeared to date’ (p. 39). Issues
raised by the JISC report include the obvious lack of any control over the vocabulary at even
the most basic level (for example, word forms – plural or singular – and use of numbers and
transliteration), and the report goes on to highlight shortcomings related to the absence of
rules in the tagging process, for example, on the granularity or specificity of tags. The main
recommendation of the report is that social tagging should not replace indexing and other
knowledge organisation efforts within HE/FE. There are also specific recommendations (see
pages 40–43) which are beyond the scope of this report.
Some researchers are, however, beginning to investigate whether it could be fruitful to
combine socially created tags with existing, formal ontologies (Al-Khalifa and Davis, 2006).
Tagging does provide for the marking up of objects in environments where controlled
indexing is not taking place, and as the tagging process is strongly 'user-centric', such tagging
can reflect topicality and change very quickly. We are also now starting to see folksonomies
being developed alongside expert vocabularies as a way of enabling comparative study e.g. of
the meaning-making process around artworks53. We are also beginning to see compromise
solutions known as collabulary in which a group of domain users and experts collaborate on a
shared vocabulary with the help of classification specialists.
49 http://www.facebook.com/ A popular social networking site
50 Comment by attendee at ALT-C, 2006 (anonymous). Taken with thanks from private notes made by Lawrie Phipps at JISC ALT-C stand.
51 to express the deep inter-connectedness and complexity of knowledge.
52 see: http://www.combechem.org/tour.php?tourpage=onto.html [last accessed 14/01/07].
53 see the Steve museum project: http://www.steve.museum/ [last accessed 12/01/07].
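The vocabulary-control problems raised in the JISC report discussed above (plural and singular word forms, spacing and punctuation variants), and the idea of combining free tags with a formal vocabulary, can be sketched as follows. This is a deliberately naive illustration: the normalisation rules are crude assumptions that would mishandle many English words, and the CONTROLLED_TERMS table is a made-up stand-in for a real thesaurus or ontology.

```python
import re
from collections import Counter

# Hypothetical controlled vocabulary: normalised key -> preferred term.
CONTROLLED_TERMS = {
    "wiki": "Wikis",
    "folksonomy": "Folksonomies",
    "web2.0": "Web 2.0",
}


def normalise(tag):
    """Collapse case, spacing/punctuation variants and simple plurals."""
    tag = re.sub(r"[\s_\-]+", "", tag.strip().lower())
    if len(tag) > 4 and tag.endswith("ies"):
        return tag[:-3] + "y"                      # folksonomies -> folksonomy
    if len(tag) > 3 and tag.endswith("s") and not tag.endswith("ss"):
        return tag[:-1]                            # wikis -> wiki
    return tag


def reconcile(tags):
    """Count tags that map onto the controlled vocabulary and those that do not."""
    matched, unmatched = Counter(), Counter()
    for tag in tags:
        key = normalise(tag)
        if key in CONTROLLED_TERMS:
            matched[CONTROLLED_TERMS[key]] += 1
        else:
            unmatched[key] += 1
    return matched, unmatched


matched, unmatched = reconcile(
    ["Wikis", "wiki", "Web 2.0", "web_2.0", "folksonomies", "chemistry"]
)
print(matched, unmatched)
```

Tags that fall into the unmatched group are the interesting ones in a collabulary-style process, since they are candidates for review by domain users and classification specialists.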
Secondly, although evidence is only anecdotal, blogging seems to be becoming more popular
with researchers of all disciplines in order to engage in peer debate, share early results or seek
help on experimental issues (Skipper, 2006). However, it has had no serious review of its use
in higher education (Placing, 2005). Butler (2005) argues that blogging tends to be used by
younger researchers and that many of these make use of anonymous names to avoid being
tracked back to their institutions. Some disciplines are so fast-moving, or of sufficient public
interest, that this kind of quick publishing is required (Butler cites climate change as one
example).
There has also been a trend towards collective blogs (Varmazis, 2006) such as ScienceBlogs54
and RealClimate55, in which working scientists communicate with each other and the public,
as well as blog-like, peer-reviewed sites such as Nature Protocols56. These tools provide
considerable scope to widen the audience for scientific papers and to assist in the process of
public understanding of science and research (Amsen, 2006). Indeed, Alison Ashlin and
Richard Ladle (2006) argue that scientists need to get involved in the debates that are
generated across the blogosphere where science discussions take place. These tools also have
the potential to facilitate communication between researchers and practitioners who have left
the university environment.
Thirdly, social tagging and bookmarking have also found a role in science (Lund, 2006). An
example of this approach is CiteULike57, a free service to help academics share, store, and
organise the academic papers they are reading.
Finally, there have also been developments in scientific data mash-ups and the use of Web
Services to link together different collections of experimental data (Swan, 2006). Examples
include AntBase58 and AntWeb, which use Web Services to bring together data on 12,000 ant
species, and the USA-based water and environmental observatories project (Liu et al., 2007).
This corresponds to moves in recent years to open up experimental data and provide it to
other researchers as part of the process of publication (Frey, 2006) and the Murray-Rust
Research Group is particularly well known for this59. The E-bank project is also looking at
integrating research experiment datasets into digital libraries60.
However, opinion is divided over the extent to which social software tools are being used by
the research community. Declan Butler, for a recent article in Nature (2005), conducted
interviews with researchers working across science disciplines and concluded that social
software applications are not being used as widely as they should in research, and that too
many researchers see the formal publication of journal and other papers as the main means of
communication with each other.
5.3 Academic publishing
Speed of communication in fast-moving disciplines is also a benefit offered to academic
publishing, where social software technologies increasingly ‘form a part of the spectrum of
legitimate, accepted and trusted communication mechanisms’ (Swan, 2006, p. 10). Indeed, in
the long run, the Web may become the first stage to publish work, with only the best and most
durable material being published in paper books and journals, and some of this may introduce
a beneficial informality to research (Swan, 2006).
54 http://www.scienceblogs.com/channel/about.php [last accessed 14/01/07].
55 http://www.realclimate.org/index.php/archives/2004/12/about/ [last accessed 14/01/07].
56 http://www.nature.com/nprot/prelaunch/index.html [last accessed 14/01/07].
57 http://www.citeulike.org/
58 http://www.antbase.org/
59 See: http://wwmm.ch.cam.ac.uk/wikis/wwmm/index.php/Main_Page [last accessed 14/02/07].
60 http://www.ukoln.ac.uk/projects/ebank-uk/ [last accessed 14/02/07].
Such developments are obviously closely tied up with the Open Access debate and the need
to free data in order to provide other researchers with access to that data: these datasets will
need to be open access before they can be mashed. Those involved in the more formal
publishing of research information are actively working on projects that make use of Web 2.0
technologies and ideas. For example, Nature is working on two developments: Open Text
Mining Interface (OTMI) and Connotea, a system which helps researchers organize and share
their references61.
Some publishers are also experimenting with new methods of a more open peer reviewing
process (Rogers, 2006). Once again, Nature is devoting resources to a system where authors
can choose a 'pre-print' option that posts a paper on the site for anyone to comment on, whilst
in the meantime the usual peer-reviewing processes are going on behind the scenes. Another
website, arXiv62, has also been providing pre-publication papers for colleagues to comment
on. In addition, the SPIRE project63 provides a peer-to-peer system for research
dissemination.
5.4 Libraries, repositories and archiving
As with other aspects of university life the library has not escaped considerable discussion
about the potential change afforded by the introduction of Web 2.0 and social media (Stanley,
2006). Berube (2007) provides a very readable summary of some of the implications for
libraries and there have been debates about how these technologies may change the library, a
process sometimes referred to as ‘Library 2.0’, a term coined by Mike Casey (Miller, 2006).
Proponents argue that new technologies will allow libraries to serve their users in better ways,
emphasise user participation and creativity, and allow them to reach out to new audiences and
to make more efficient use of existing resources. Perhaps the library can also become a place
for the production of knowledge, allowing users to produce as well as consume? Others worry
that the label is a diversion from the age-old task of librarianship.
However, what is interesting about many of these debates is that they are very broad,
sometimes contradictory, and much of the discussion can often be seen in the context of the
wider public debate concerning the operation of public services in a modern, technology-rich
environment in which user expectations have rapidly changed (Crawford, 2006), rather than
Web 2.0 per se. For example, comparison has been made between Amazon’s book delivery
mechanisms and the inter-library loans process (Dempsey, 2006). People worry that library
users expect the level of customer service for inter-library loans to be comparable to
Amazon's, and while this is obviously an important aspect of what Amazon provides, it is not
one of its Web 2.0 features.
61 See: http://blogs.nature.com/wp/nascent/2006/04/web_20_in_science.html for further details [last accessed 14/02/07].
62 http://arxiv.org/
63 http://spire.conted.ox.ac.uk/cgi-bin/trac.cgi [last accessed 28/01/07].
64 available at: http://www.aadl.org/cat/seek/record=1028781 [last accessed 28/01/07] (you will need to scroll to the bottom of the page). See also LibraryThing: http://www.librarything.com/ [last accessed 28/01/07].
65 available at: http://www.blyberg.net/2006/08/18/go-go-google-gadget/ [last accessed 28/01/07].
This is not to say that there is no genuinely Web 2.0-style thinking going on within the
Library 2.0 debate (for example, in the USA, the Ann Arbor public library online catalogue
utilises borrowers’ data to produce an Amazon-style, ‘readers who borrowed this book, also
borrowed’ display feature64 and John Blyberg’s Go Go Google Gadget65, which uses data
mash-ups to provide a personalised Google homepage with library data streams showing
popular lendings, items you have checked out, etc.), only that it might be helpful for
librarians, in terms of thinking about the future of libraries, to separate out the Web 2.0 ideas,
services and applications from the technology and more general concerns about ‘user-centred
change’. How, for example, might libraries take part of the ethos of the long tail (everything
has a value that goes beyond how many times it is requested) and not only learn from the way
Amazon has applied it, but perhaps even better it?
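A ‘readers who borrowed this book, also borrowed’ display of the kind mentioned above can be approximated by simple co-occurrence counting over loan records. The sketch below is not how any particular library catalogue works: the loan data, reader identifiers and the also_borrowed function are invented, and a production system would need to address privacy and anonymisation of borrowers’ data.

```python
from collections import Counter

# Hypothetical loan history: reader id -> set of titles borrowed.
loans = {
    "reader_1": {"Book A", "Book B", "Book C"},
    "reader_2": {"Book A", "Book B"},
    "reader_3": {"Book B", "Book C"},
    "reader_4": {"Book A", "Book D"},
}


def also_borrowed(loans, title, top_n=3):
    """Rank other titles by how many borrowers of `title` also borrowed them."""
    counts = Counter()
    for borrowed in loans.values():
        if title in borrowed:
            counts.update(other for other in borrowed if other != title)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ranked[:top_n]]


print(also_borrowed(loans, "Book A"))
```

Because every loan contributes to the counts, rarely requested items can still surface as suggestions, which is one way a catalogue might put the ‘long tail’ idea to work.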
This idea is not without precedent, especially in areas where traditional library skills and
processes can be mapped to the development of Web 2.0-style applications and services, and
information retrieval (IR) is an interesting case in point. Mark Hepworth (2007) argues that
tagging is a form of indexing, blog trackbacking is similar to citation analysis, blog-rolling
echoes chaining and RSS syndication feeds can be considered a form of ‘alerting’—all
recognised concepts within discussions of IR. This is not to say that they are necessarily the
same: whereas traditional IR normally works with an index based on a closed collection of
documents, Web searching involves a different type of problem with an enormous scale of
documents/pages, a dynamic document base, huge variety of subject domains and other
factors (Levene, 2006). However, we can say that the thinking and discussion that has taken
place within IR both in traditional systems and more recently in the context of the Web in
general (Gudiva, 1997) will have some bearing on an understanding of Web 2.0 services and
applications. It may even be the case that Web 2.0 ideas and applications can contribute
solutions to some of the recognised existing problems within IR with regard to user behaviour
and usability issues (Hepworth, 2007), and even that the newer Web technologies such as
RIA may be harnessed to help the user or learner to organise and view data or information
more effectively.
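Hepworth’s comparison of tagging with indexing can be made concrete with a toy inverted index built from user-assigned tags. This is only a sketch: the items, tags and function names are hypothetical, and it ignores the scale, dynamism and ranking issues that distinguish Web search from traditional IR.

```python
from collections import defaultdict

# Hypothetical user-tagged items: item -> tags assigned by users.
tagged_items = {
    "paper-17": ["Web2.0", "libraries"],
    "blog-post-4": ["libraries", "RSS"],
    "photo-set-9": ["web2.0", "conference"],
}


def build_index(tagged_items):
    """Build an inverted index: tag -> set of items carrying that tag."""
    index = defaultdict(set)
    for item, tags in tagged_items.items():
        for tag in tags:
            index[tag.lower()].add(item)
    return index


def search(index, *tags):
    """Return the items that carry every one of the given tags."""
    sets = [index.get(tag.lower(), set()) for tag in tags]
    return set.intersection(*sets) if sets else set()


index = build_index(tagged_items)
print(search(index, "libraries"))
```

The structure is the same as a conventional index; what differs is that the vocabulary comes from users rather than from a trained indexer.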
Another reason why it may be important to think about the ideas behind Web 2.0 is in the
issue of the archiving and preservation of content generated by Web 2.0-style applications
and services.
5.4.1 Collecting and preserving the Web
‘The goal of a digital preservation system is that the information it contains remains
accessible to users over a long period of time.’
Rosenthal, 2005, section 2.
‘The most threatened documents in modern archives are usually not the oldest, but the
newest.’
Brown and Duguid, 2000 p. 200
The Web is an increasingly important part of our cultural space and for this reason the
archiving of material and the provision of a ‘cultural memory’ is seen as a fundamental
component of library work (Tuck, 2007), and there has been considerable discussion, debate
and research work undertaken in this area (Tuck, 2005a; Lyman, 2002). At the British Library
it is the policy that ‘the longer term aim is to consider web-sites [sic] as just another format to
collect within an overall collection development policy’ (Tuck, 2005a). However, there are
many issues to consider with regard to the archiving and preservation of digital information
and artefacts in general, and there are also issues which are particularly pertinent to the
archiving and preservation of the Web (Masanès, 2006). Currently, the only large-scale
preservation effort for the open Web is the Internet Archive66, although there are a number of
small-scale initiatives that focus on particular areas of content (e.g. the UK Web Archive
Consortium, which focuses on medical, Welsh, cultural and political materials of
significance67).
66 http://www.archive.org/index.php
Within the UK, the UK Web Archiving Consortium (UKWAC) is engaging with the
technical, standards and IPR related issues for collection and archiving of large scale parts of
the UK Web infrastructure (Tuck, 2005b). This work has included the initial use of archiving
software developed in Australia (Pandas), the development of a Web harvesting management
system (Web Curator Tool) and investigation work into the longer-term adoption of new
standards, such as the emerging WARC storage format for Web archiving (Beresford, 2007).
There have also been a number of reports considering the issue of preservation of the Web. In
2003, for example, JISC and the Wellcome Trust prepared a report on general technical and
legal issues (Day, 2003) and UKOLN recently developed a general roadmap for the
development of digital repositories, which should be considered when reviewing the
difficulties of preserving newer Web material (Heery, 2006).
The Day report (2003) outlined two phases to the process of preserving Web content:
collection and archiving. Collection encompasses automatic harvesting (using crawler
technologies); selective preservation, which uses mirror-sites to replicate complete websites
periodically; and asking content owners to deposit their material on a regular basis. Secondly,
there is the process of archiving where a respected institution creates a record of the material
collected and provides access for future users.
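The automatic harvesting step described above can be sketched as a breadth-first crawl that starts from one page and follows links. To keep the example self-contained it runs against a made-up in-memory site rather than the live Web: the PAGES dictionary, the example.org URLs, the user-agent name and the robots.txt rules are all assumptions. The rules are included because a well-behaved harvester would normally consult a site's robots.txt file before fetching pages.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> element in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch, robots_lines, agent="example-archiver"):
    """Harvest pages on one site, following links and obeying robots rules."""
    robots = RobotFileParser()
    robots.parse(robots_lines)
    site = urlparse(start_url).netloc
    seen, queue, harvested = {start_url}, deque([start_url]), []
    while queue:
        url = queue.popleft()
        if not robots.can_fetch(agent, url):
            continue                       # excluded by the site's rules
        html = fetch(url)
        if html is None:
            continue                       # page could not be retrieved
        harvested.append(url)
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            link = urljoin(url, href)
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)
    return harvested


# Hypothetical site used in place of real HTTP requests.
PAGES = {
    "http://example.org/": '<a href="/about.html">About</a> <a href="/private/notes.html">Notes</a>',
    "http://example.org/about.html": '<a href="/">Home</a> <a href="http://other.example/">Elsewhere</a>',
    "http://example.org/private/notes.html": "internal notes",
    "http://example.org/orphan.html": "no other page links here",
}
ROBOTS = ["User-agent: *", "Disallow: /private/"]

harvested = crawl("http://example.org/", PAGES.get, ROBOTS)
print(harvested)
```

Note that the page no other page links to is never harvested, and neither is the excluded directory: a link-following crawler only ever sees the part of a site that is both linked and permitted.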
However, part of the problem for the process of preservation is that the Web has a number of
issues associated with it which make it a non-trivial problem to develop archiving solutions
(Masanès, 2006; Day, 2003; Lyman, 2002; Kelly 2002). For example:
5.4.1.1 The Web is transient.
The Web is growing very rapidly, is highly distributed but also tightly interconnected (by
hyperlinks) and on a global scale. This makes the overall topology of the Web transient and it
becomes extremely difficult to know what’s ‘out there’—its true scope. In addition, the
average life span of webpages is short: 44 days in Lyman (2002, p. 38) and 75 days in Day
(2006, p. 177). Dealing with this ephemerality is difficult, especially when combined with the
fact that the Web can be considered an active publishing system (Masanès, 2006) in that
content changes frequently and can be combined and aggregated with content from other
information systems.
5.4.1.2 Web technologies are not always conducive to traditional archiving practices.
Problems with archiving the Web are inherently caught up with technology issues. At a very
basic level, as with all digital content, Web content is deeply entangled with or dependent on
technology, protocols and formats. For example, the average page contains links to five
sourced objects such as embedded images or sound files with various formats: GIF, JPEG,
PNG, MPEG etc. (Lyman, 2002). These protocols and formats evolve rapidly and content that
doesn't migrate will quickly become obsolete. In addition, information is always presented
within the context of a graphical look and feel which ‘evokes’ a user experience (Lyman,
2002) and content may even be said to exhibit a ‘behaviour’ (Day, 2006). This varies
according, in part, to the particular browser/plug-in versions in use and it is often argued that
preservation should attempt to retain this context. It is the difference between what Clay
Shirky calls ‘preserving the bits’ and ‘preserving essence’68. With this in mind, how do we go
about migrating not only the data but also the manner in which it was presented?
67 A consortium of Wellcome, British Library and National Library of Wales: http://info.webarchive.org.uk/index.html
However, technology issues also go much deeper. Web content's cardinality69 (an important
concept in preservation) is not simple. A webpage’s cardinality might be considered to be
one, as it is served by a single Web server and its location is provided by the unique identifier,
the URL. Masanès (2006) argues this means that, in archiving terms, it is more like a work of
art than a book and is subject to similar vulnerabilities, as the server can be removed or
updated at any time. However, this is further complicated by the fact that a webpage's
cardinality can be considered one and it can be many, at the same time. A large, perhaps
almost unlimited, number of visitors can obtain a ‘copy’ of the page for display within their
browser (an instantiation) and the actual details of the page that is served may well vary each
time70. This complex cardinality is an issue for preservation in that it means that a webpage
permanently depends on its unique source (i.e. the publisher’s server) to exist.
In addition, the way HTTP works poses problems for archiving as it provides information on
a request-by-request basis, file by file. It cannot, unlike FTP, be asked to provide a list of the
whole set of files on a server or directory. This means that there is an extra layer of effort
involved as the extent of a website has to be uncovered before it can be archived. This
problem can be extrapolated to the whole of the Web.
The main method for gathering this information about the extent of a website, either for
search engine indexing or for archiving, is to follow the paths of links from one page to
another (so-called ‘crawling’) and there are two main issues with this:
• Websites can issue ‘politeness’ notices (in robots.txt files on the server) using the
  Robots Exclusion Protocol (Levene, 2006). These notices issue instructions about the
  manner in which crawling can be carried out and might, for example, restrict which
  parts of a site can be visited or impose conditions as to how often a crawl can be
  carried out.
• Robot crawlers may not actually reach all parts of the Web and this leaves some
  pages or even whole websites un-archived. There are two main reasons for this:
    o some websites are never linked to anything else
    o a large proportion of the Web cannot be reached by crawling as the content is
      kept behind password-protected front-ends or is buried in databases in what
      is known as the ‘deep’, ‘hidden’ or ‘invisible’ Web (Levene, 2006). Levene
      estimates that the size of this hidden Web is perhaps 400 to 550 times the
      extent of standard webpages.
Content in the 'hidden Web' needs a specific set of user interactions in order to access it and
such access is difficult to automate. Some, limited, headway has been made with this problem
by attempting to replicate these human actions with software agents that can detect HTML
forms and learn how to fill them in, using what are known as hidden Web agents (Masanès,
2006). One alternative requires direct collaboration with a site’s owner, who agrees to expose
the full list of files to an archive process through a protocol such as OAI-PMH[71]. Another
alternative, which saves the site’s owner from setting up a protocol and which is useful for
websites that offer a database gateway which holds metadata about a document collection, is
to extract (deep mine) the metadata directly from the database and archive it, together with the
documents, in an open format. In effect, the database has been replaced, at the archive, by an
XML file. This is the approach being facilitated by the deepArc tool that is being developed
by the Bibliothèque Nationale de France as part of the International Internet Preservation
Consortium (IIPC)[72].
[68] See: http://discuss.longnow.org/viewtopic.php?t=39 and
http://video.google.com/videoplay?docid=4000153761832846346&q=longnow.org&pl=true
[last accessed 28/01/07].
[69] In simple terms the number of instances (or copies) of each work that are available to deal/work
with. In the traditional case of a book, a number of copies, maybe 2,000 of each edition are published,
printed and distributed (each of which is the same in terms of content). There is no need for an archive
to use a particular one of these copies in order to preserve a representation of that edition. In this
instance, the book's cardinality would be 2,000.
[70] A simple example: Many website homepages graphically display the current time and date. If we
take a copy of that page then it is unique on the date and at the time shown, but will not be the same on
the next visit.
5.4.1.3 Legal issues pertaining to preservation and archiving are complex
Day (2003) argues that another major problem that relates to Web archiving is its legal basis.
In particular, there are considerable intellectual property issues involved in preserving
databases (as opposed to documents) which are compounded by general legal issues
surrounding copyright, lack of legal deposit mechanisms, liability issues relating to data
protection, content liability and defamation that pose problems for the collection and
archiving of content.
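As an illustration of the two crawling issues described in section 5.4.1.2, the following
minimal Python sketch follows links across a small, made-up site while honouring a robots.txt
notice. The site map, the robots.txt content and the agent name archive-bot are assumptions
invented for the example. Only the standard library's urllib.robotparser is used and no network
access takes place. It is a sketch of the general technique, not of any particular archiving tool.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory website: each path maps to the paths it links to.
SITE = {
    "/": ["/about", "/news"],
    "/about": ["/"],
    "/news": ["/news/2007", "/private/drafts"],
    "/news/2007": [],
    "/private/drafts": [],
    "/orphan": [],  # exists on the server, but no page links to it
}

# Hypothetical 'politeness' notice, as it might appear in a robots.txt file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""


def crawl(site, robots_txt, start="/", agent="archive-bot"):
    """Breadth-first crawl that only discovers pages by following links."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    seen = {start}
    queue = deque([start])
    archived = []
    while queue:
        path = queue.popleft()
        if not rules.can_fetch(agent, path):
            continue  # the site's notice excludes this part of the site
        archived.append(path)
        for link in site.get(path, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    # The requested delay (in seconds) between requests, if the notice sets one.
    return archived, rules.crawl_delay(agent)


archived, delay = crawl(SITE, ROBOTS_TXT)
print("archived:", archived)
print("requested delay between requests:", delay)
```

In this toy run /private/drafts is skipped because of the Disallow rule, and /orphan is never
discovered because no page links to it, which corresponds to the first of the two reasons given
for content remaining un-archived.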
5.4.2 Preserving content produced through Web 2.0 services and applications.
As we have seen, there are considerable issues around the long-term preservation of the Web,
but how do these issues change with the introduction of Web 2.0 ideas and services?
Material produced through Web 2.0 services and applications is clearly dynamic, consisting
of blog postings, data mash-ups, ever-changing wiki pages and personal data that have been
uploaded to social networking sites. Some would argue that much of this content is of limited
value and does not warrant significant preservation efforts. On the other hand, Web 2.0
material is still part of the Web and others argue that since the Web is playing a major role in
academic research, scientific outputs and learning resources there is a strong case for
preserving at least some of it (Day, 2003) and a clear argument is now developing for the
preservation of blogs and wikis (Swan, 2006). Blogs in particular clearly form part of a
conversation that is increasingly part of our culture. From the point of view of education,
increasingly, published academic research will make reference to Web 2.0-type material, for
example, a peer group wiki focused on an experiment.
There are two key questions one can ask of Web 2.0 with regard to preservation. Firstly, to
what extent does Web 2.0 content form part of the hidden Web? Most Web-based archiving
tools make use of crawler technology and the issue here is whether the Web is evolving
towards an information architecture that ‘resists traditional crawling techniques’ (Masanès,
2006, p. 128). Getting at the underlying data that is being used in a wide variety of Web 2.0
applications is a major problem: many Web 2.0 services and mash-ups use layered APIs
which sit on top of very large dynamic databases. Unfortunately, technology to allow the
preservation of data from a dynamic database is only just beginning to be developed[73]. This
might involve the development of some kind of ‘wayback machine’ that reconstructs a
database’s state at a specific time (Rosenthal, 2006).
In addition, the APIs used by many of the Web 2.0 systems are often described as open, but
they are, in fact, proprietary and subject to change; much of Web 2.0 is in perpetual beta and
preservation mechanisms that make use of these interfaces would need to be able to handle this
kind of change.
[71] Open Archives Initiative Protocol for Metadata Harvesting
[72] http://netpreserve.org/about/index.php [last accessed 14/02/07].
[73] Peter Buneman at the University of Edinburgh has begun to develop the basic concept. See the
references section for a selection of his work.
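The idea of a ‘wayback machine’ for databases, mentioned above, can be illustrated with a
deliberately simple sketch: if every change is logged with a timestamp, the state at any earlier
moment can be rebuilt by replaying the log up to that moment. The class and method names below
(ChangeLog, record, remove, state_at) and the sample keys are hypothetical, and this is not the
approach of Rosenthal or Buneman, whose designs are not described here.

```python
class ChangeLog:
    """Append-only log of changes from which an earlier state can be rebuilt."""

    _DELETED = object()  # internal marker meaning "this key was removed"

    def __init__(self):
        self._entries = []  # (timestamp, key, value) tuples in time order

    def record(self, timestamp, key, value):
        """Log that `key` was set to `value` at `timestamp`."""
        if self._entries and timestamp < self._entries[-1][0]:
            raise ValueError("changes must be recorded in time order")
        self._entries.append((timestamp, key, value))

    def remove(self, timestamp, key):
        """Log that `key` was deleted at `timestamp`."""
        self.record(timestamp, key, self._DELETED)

    def state_at(self, timestamp):
        """Replay the log and return the contents as they stood at `timestamp`."""
        state = {}
        for entry_time, key, value in self._entries:
            if entry_time > timestamp:
                break  # later changes did not exist yet at the requested time
            if value is self._DELETED:
                state.pop(key, None)
            else:
                state[key] = value
        return state


# Made-up history of a wiki page and a blog post (timestamps are plain integers).
log = ChangeLog()
log.record(1, "wiki/Experiment", "first draft")
log.record(5, "wiki/Experiment", "revised text")
log.record(7, "blog/post-1", "hello")
log.remove(9, "blog/post-1")

print(log.state_at(6))
print(log.state_at(8))
print(log.state_at(10))
```

Replaying a full log is slow for a very large database, so a practical system would presumably
need periodic snapshots or indexing as well; the sketch only shows why retaining the history of
changes, rather than a single crawl, is what makes a past state recoverable.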
Secondly, how important is it to capture the graphical essence of Web 2.0 content and is this
technically possible? Many Web 2.0 services utilise a strong graphical look and feel in order

See: http://www.podcasting.blog-city.com/tags/?/ukhepodnet [last accessed 10/02/07].
