Designing and maintaining a Web site is a difficult task. It is far
easier to find inconsistent information on the Internet than a
well-maintained site. Our goal is to study and construct the tools
that are necessary to design, produce, and maintain complex and
coherent Web sites. Most efforts in this domain concern the
syntactic structure of Web sites, leading to XML. But only a small
part of the semantics can be handled by syntactic constraints, and we
would like to address all the semantics of a Web site.
After introducing the current possibilities for representing the semantics
of Web pages and, more generally, Web sites (using HTML and XML), we
present our main objective: supporting designers and webmasters
in specifying and semantically verifying Web sites. This
problem of adding semantics is increasingly addressed by the different
working groups of the W3C (see RDF [W3C-RDF1999,W3C-RDF-Schema1999], XML
Schema [W3C-XML-Schema2000]...) and also by the ontological approach
developed by AI researchers (SHOE [Heflin, Hendler, and Luke1999], On2broker
[Fensel et al.1998b]). But the main motivation of such work is to improve
information retrieval by providing better indexing.
Our motivation is slightly different, as we want to help in designing,
specifying, and checking Web sites. Very little work addresses the
semantic verification of Web pages. Two examples are WebMaster
[van Harmelen and van der Meer1999,van Harmelen and Fensel1999] and the work of
PCR99, which uses attribute grammars.
Our approach is inspired by previous work on the semantics of
programming languages. We draw a parallel between the syntax of
programming languages and the structure of Web sites (or
semi-structured documents), and between the semantics of programs and
the semantics of Web sites, applying notions of types and
semantic rules to documents on the Web. To achieve this goal, we have
used the Centaur system (a generic programming environment generator,
http://www.inria.fr/croap/centaur/centaur.html) and its semantics
specification formalism Typol to construct a prototype of a Web site
verification system by means of inference rules using natural semantics
[Despeyroux1987,Kahn1987,Despeyroux1988,Borras et al.1988].
We illustrate this method by applying it to two example Web sites:
a thematic directory (like Yahoo) and an institutional site. The use
of natural semantics clearly shows the difference between syntactic
checking (for example, verifying a page against a DTD, as in an XML
validator), which is context free, and a semantic computation, which is
context dependent. The thematic directory example shows that
external resources (thesauri, ontologies) can be used.
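
The contrast between context-free checking and context-dependent
computation can be sketched in a few lines of Python. This is an
illustration only, not the Typol rules of the prototype: the Node type,
the ALLOWED_CHILDREN content model, the NARROWER thesaurus, and both
checking functions are hypothetical names introduced for this example.
The syntactic check looks at each element in isolation, whereas the
semantic check carries an environment (the topic of the enclosing
category) down the tree, much as an inference rule is evaluated under
an environment.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A minimal document tree node (hypothetical, for illustration)."""
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


# Hypothetical DTD-like content model: allowed child tags for each tag.
ALLOWED_CHILDREN = {
    "site": {"category"},
    "category": {"category", "entry"},
    "entry": set(),
}

# Hypothetical thesaurus (external resource): term -> narrower terms.
NARROWER = {
    "science": {"physics", "biology"},
    "physics": {"optics"},
}


def check_syntax(node):
    """Context-free check: a node is judged from its own tag and children only."""
    allowed = ALLOWED_CHILDREN.get(node.tag)
    if allowed is None:
        return False
    return all(c.tag in allowed and check_syntax(c) for c in node.children)


def check_semantics(node, parent_topic=None):
    """Context-dependent check: the verdict depends on the inherited topic."""
    errors = []
    topic = parent_topic
    if node.tag == "category":
        topic = node.attrs.get("topic")
        # A sub-category must be a narrower term of its enclosing category.
        if parent_topic is not None and topic not in NARROWER.get(parent_topic, set()):
            errors.append(f"{topic!r} is not a narrower term of {parent_topic!r}")
    for child in node.children:
        errors.extend(check_semantics(child, topic))
    return errors


# A small thematic directory: structurally valid, semantically inconsistent.
site = Node("site", children=[
    Node("category", {"topic": "science"}, [
        Node("category", {"topic": "physics"}, [
            Node("category", {"topic": "optics"}, [Node("entry")]),
        ]),
        Node("category", {"topic": "cooking"}),
    ]),
])

syntax_ok = check_syntax(site)
semantic_errors = check_semantics(site)
```

In this sketch the directory passes the structural check, yet the
semantic check reports that "cooking" is misplaced under "science", an
error that no content model alone could express.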