Friday, June 01, 2007

Notes from JavaOne 2007 - The Semantic Web (BOF-6746)

This post is a tidied-up version of the notes I took at Henry Story's BOF (BOF-6746) at JavaOne 2007, variously entitled "Web 3.0: This is the Semantic Web" or "Developing Web 3.0". They are best read alongside the presentation. Henry moves fast...

Definitions
"PC era" - "The Desktop" (1980 - 1990)
"Web 1.0" - "The World Wide Web" (1990 - 2000)
"Web 2.0" - "The Social Web" (2000 - 2010)
"Web 3.0" - "The Data Web" -> "The Semantic Web" (2010 - 2020)
"Web 4.0" - "The NetOS" -> "The Intelligent Web" (2020 - 2030)

[DEMO] "Web 2.7" - Freebase. This is a semantic wiki. It is all structured data. You can create classes of things. If I create a new film, it creates a new film object along with the information about it. I can also create my own classes. In effect, I am creating a database on the web.

Pic: "It's Dog Simple - It's not complicated. Reality is complicated."

Web Architecture 101
  • URI (encompasses URLs and URNs) - "Uniform Resource Identifier". This identifies a Resource. You might as well use URIs as they are
  • REST - "Representational State Transfer". Entering a URL in a web browser performs an HTTP "GET", and a Representation of the Resource identified by the URL is returned. A resource can have any number of representations.
  • Caching - The Web can cache Representations. If I call the same URL I get the same result.
  • Relations - REST consists of relations. RDBMS has limitations: relations are local, not universal. We want to webify the DB. If you add URIs to the Subject (e.g. "Tim"), Relation (e.g. the column name - "Name") and the Object (e.g. the row id'd by the Primary Key) we can get an Object with a "GET". You simply click on the URL to get its meaning. Relations are now universal.
  • RDF - "Resource Description Framework" - URIs exist to define Subjects, Relations/Properties and Objects
  • RDF with namespaces - Simplifies the URIs with @prefix. E.g. @prefix foaf:
  • Syntax (how the URI strings combine to identify Objects) versus Semantics (how URI strings relate to the world - what they map to)
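As a rough illustration of "webifying the DB", here is a minimal Python sketch that turns one relational row into (Subject, Relation, Object) triples. It follows one common reading of the idea, with the row as the Subject and each column as a Relation. The example.org URIs, the column names and the `row_to_triples` helper are all assumptions made for illustration; none of them come from the talk.

```python
# Hypothetical base URIs: example.org is a placeholder, not from the talk.
BASE = "http://example.org/people/"
VOCAB = "http://example.org/schema/"


def row_to_triples(table_row, primary_key):
    """Turn one relational row into (subject, relation, object) triples."""
    # The primary key becomes part of a globally unique subject URI.
    subject = BASE + str(table_row[primary_key])
    triples = []
    for column, value in table_row.items():
        if column == primary_key:
            continue  # the key is already encoded in the subject URI
        # Each column name becomes a relation URI instead of a local name.
        triples.append((subject, VOCAB + column, value))
    return triples


row = {"id": 42, "name": "Tim", "city": "London"}
triples = row_to_triples(row, "id")
for triple in triples:
    print(triple)
```

Because the subject and the relations are now URIs rather than table-local names, the same statements can be understood outside the database that produced them.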
Advantages - Simplicity.
  • URIs are the only way to identify resources worldwide.
  • REST is the most scalable and simplest way to set up a universal info space.
  • RDF - you can't do it with less than a triple (Subject, Relation/Property, Object), it is syntax independent, and the data is clickable (i.e. you can follow the link)
FOAF
FOAF is a simple ontology for describing friend-of-a-friend relationships (available today on blogs). It is an example of a semantic dataset.

N3
Another way to write down RDF data
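To make the FOAF and N3 notes more concrete, the following Python sketch writes a couple of FOAF statements out in N3-style text using an @prefix declaration. The FOAF namespace URI and the `foaf:name` / `foaf:knows` properties are real; the people URIs and the `shorten` and `to_n3` helpers are hypothetical, and the serializer is deliberately naive (it only handles plain string literals and simple local names).

```python
FOAF = "http://xmlns.com/foaf/0.1/"
PREFIXES = {"foaf": FOAF}


def shorten(uri, prefixes):
    """Abbreviate a URI with a known prefix, else wrap it in <...>."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return prefix + ":" + uri[len(namespace):]
    return "<" + uri + ">"


def to_n3(triples, prefixes):
    """Serialize triples as N3-style lines (naive: strings and URIs only)."""
    lines = ["@prefix %s: <%s> ." % (p, ns) for p, ns in prefixes.items()]
    for subject, relation, obj in triples:
        if obj.startswith("http://"):
            obj_text = shorten(obj, prefixes)  # object is another resource
        else:
            escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
            obj_text = '"' + escaped + '"'  # object is a plain literal
        lines.append("%s %s %s ." % (
            shorten(subject, prefixes), shorten(relation, prefixes), obj_text))
    return "\n".join(lines)


# Hypothetical people URIs, used only for illustration.
TIM = "http://example.org/people/tim#me"
HENRY = "http://example.org/people/henry#me"

foaf_triples = [(TIM, FOAF + "name", "Tim"), (TIM, FOAF + "knows", HENRY)]
n3_text = to_n3(foaf_triples, PREFIXES)
print(n3_text)
```

In practice an RDF library would do this serialization; the point here is only to show how the prefix shortens the URIs while the underlying triples stay the same.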

OWL
Web Ontology Language - a set of resources which define things like classes, properties, and the set of relations required to do something with an object in programming

DOAP
Description of a project (another ontology - we can use classes from different libraries. It is all in one uniform information space). This is being used today to describe OSS projects. Once information is exposed in this format it can be scraped and aggregated.

Tools
There are ~500 Semantic tools, 50% in Java
  • DOAP integration with NetBeans
  • Protege - lets you define ontologies
  • TopBraid Composer - define ontologies and instance data
  • @RDF annotations in Java - there is a java.net project
  • Baetle - Bug And Defect Tracking Language
    • Uses - once you can track bug info from all OSS projects you can create bug hierarchies (e.g. "this bug from NB depends on this bug from Apache")
  • SPARQL - Semantic web query language. This is being standardised in the W3C. It looks a lot like SQL. If you have RDF data stored in a repository, you can then put a SPARQL endpoint in front of that. The data comes back as lists of RDF triples. You can query more than one repository and then compare all the results by URI - the same URI means it is the same item. Therefore you don't need to refactor multiple databases together. You can get results back as simple XML (or JSON).
  • RDF databases - The right way to do it is to publish data at URLs. You link all this information to see what new information you can get. To be meaningful, you need to know where the individual pieces of data came from to allow preferences of different data resources. "Reasoning". You can publish RDF onto a web page. Or you can publish it in an RDF database. There is a new class of DB to store this - "Triple Stores". These are more optimised for big lists of triples than an RDBMS
  • Semantic Wiki
  • Semantic Desktop - if your data is spread all over your desktop and the web, the only way to keep track of it is to use URLs
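The point made under SPARQL, that results from several repositories can be compared by URI without refactoring the databases together, can be sketched in Python as below. This is an in-memory stand-in and not SPARQL itself: `match` mimics a single triple pattern with wildcards, and `merge_by_subject` groups statements from different sources under the same subject URI. The bug URIs, the vocabulary and both helper names are hypothetical.

```python
def match(triples, subject=None, relation=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (s, r, o) for (s, r, o) in triples
        if (subject is None or s == subject)
        and (relation is None or r == relation)
        and (obj is None or o == obj)
    ]


def merge_by_subject(*repositories):
    """Group (relation, object) pairs from all repositories by subject URI."""
    merged = {}
    for repository in repositories:
        for s, r, o in repository:
            # The same URI means the same item, whichever repository said it.
            merged.setdefault(s, set()).add((r, o))
    return merged


# Hypothetical URIs and vocabulary, used only for illustration.
VOCAB = "http://example.org/bugs/schema/"
BUG_1 = "http://example.org/bugs/1"
BUG_2 = "http://example.org/bugs/2"

repo_a = [(BUG_1, VOCAB + "title", "Crash on startup")]
repo_b = [(BUG_1, VOCAB + "status", "open"),
          (BUG_2, VOCAB + "status", "closed")]

merged = merge_by_subject(repo_a, repo_b)
open_bugs = match(repo_b, relation=VOCAB + "status", obj="open")
print(sorted(merged[BUG_1]))
print(open_bugs)
```

Neither repository needed to know about the other's schema; sharing the subject URI is enough to line the statements up.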
Read More
http://blogs.sun.com/bblfish