by Thierry Despeyroux - Ontologies are heavily used in the context of the Semantic Web to formalize human knowledge. Ontology engineering is now an important activity, and specialized software is being developed to help manage huge ontologies. The development of ontologies and of information systems can be compared to the development of programs. In this paper we draw a parallel between ontologies and types in programming languages, and we use a small example to show that an ontology can be seen as a type system. When an ontology evolves, studying the impact of this evolution on the semantic annotations that use this ontology can be viewed as a type-checking process. The next step should be to import notions used in the types community, such as overloading, polymorphism and type parameters, to improve existing ontology definition languages or create more powerful ones.
Published at the IADIS WWW/Internet 2008 conference, 13-15/10/2008, Freiburg, Germany, 2008.
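The type-checking view described in this abstract can be illustrated with a minimal sketch. It is not the paper's own example: the toy ontology, the class and property names (`Researcher`, `Team`, `Project`, `memberOf`) and the helper functions are all hypothetical. Each property is given a domain and a range, annotations are triples, and re-checking the same annotations after the ontology evolves reports those that no longer type-check.

```python
# Hypothetical toy ontology: a class hierarchy and property signatures.
SUBCLASS_OF = {"Researcher": "Person", "Person": "Thing", "Team": "Thing"}
PROPERTIES = {"memberOf": ("Person", "Team")}  # property -> (domain, range)

# Hypothetical annotations: declared instance types and triples.
TYPES = {"alice": "Researcher", "axis": "Team"}
ANNOTATIONS = [("alice", "memberOf", "axis")]


def is_a(cls, ancestor, subclass_of):
    """Return True if cls equals ancestor or is one of its descendants."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = subclass_of.get(cls)  # walk up; None once past the root
    return False


def check(annotations, types, subclass_of, properties):
    """Type-check each triple against the domain and range of its property."""
    errors = []
    for subj, prop, obj in annotations:
        if prop not in properties:
            errors.append(f"unknown property {prop}")
            continue
        domain, rng = properties[prop]
        if not is_a(types.get(subj), domain, subclass_of):
            errors.append(f"{subj} is not a {domain} (domain of {prop})")
        if not is_a(types.get(obj), rng, subclass_of):
            errors.append(f"{obj} is not a {rng} (range of {prop})")
    return errors


# The annotations are well typed with respect to the original ontology.
errors_before = check(ANNOTATIONS, TYPES, SUBCLASS_OF, PROPERTIES)

# Assumed evolution: memberOf now ranges over a new class Project.
SUBCLASS_V2 = dict(SUBCLASS_OF, Project="Thing")
PROPERTIES_V2 = {"memberOf": ("Person", "Project")}
errors_after = check(ANNOTATIONS, TYPES, SUBCLASS_V2, PROPERTIES_V2)

print(errors_before)
print(errors_after)
```

In this sketch the existing annotation passes against the first ontology and is flagged once the range of `memberOf` changes, which is the kind of impact analysis the abstract compares to type checking.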
by Thierry Despeyroux, Eduardo Fraschini, Anne-Marie Vercoustre - We address named entity extraction, with the goal of mining a set of reports to extract a list of partners. Starting from an initial list, we use a first set of documents to identify sentence patterns, which are then validated by supervised learning on annotated documents to measure their effectiveness before being applied to the full set of documents to be explored. This approach is inspired by the one used for data extraction from semi-structured documents (wrappers) and requires neither specific linguistic resources nor large test collections. Since our document collection evolves annually, we also expect our extraction to improve over time.
Published at the 7ièmes Journées francophones Extraction et Gestion des Connaissances (EGC 2007), 23/01/2007, Namur, Belgium, 2007.
by Thierry Despeyroux, Mounir Fegas, Yves Lechevallier, Anne-Marie Vercoustre - This paper presents a new representation model for the classification of XML documents. Our approach can take into account either the structure alone or both the structure and the content of these documents. The idea is to represent a document by the set of sub-paths of the XML tree whose length lies between n and m, two values fixed a priori. These paths are then treated as simple words to which standard classification methods, for example K-means, can be applied. We evaluate our method on two collections: the INEX collection and the INRIA activity reports. When the classes are known a priori, we use a set of measures that are well known in information retrieval. When they are not known, we propose a qualitative analysis of the results based on the words (paths) most characteristic of the generated classes.
Published at the 6èmes journées Extraction et Gestion des Connaissances (EGC 2006), Revue des Nouvelles Technologies de l'Information (RNTI-E-6), Lille, France, 17-20 January 2006.
Paper in pdf
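The path-based representation described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it covers the structure-only case, it assumes a sub-path is any downward chain of element tags and that its length is the number of elements in the chain, and the function name `tag_subpaths` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_subpaths(xml_text, n, m):
    """Count the downward tag paths containing between n and m elements."""
    root = ET.fromstring(xml_text)
    counts = Counter()

    def visit(node, ancestors):
        chain = ancestors + [node.tag]
        # Every sub-path ending at this node is a suffix of the
        # root-to-node chain, so enumerate the suffixes of allowed length.
        for length in range(n, min(m, len(chain)) + 1):
            counts["/".join(chain[-length:])] += 1
        for child in node:
            visit(child, chain)

    visit(root, [])
    return counts


# Hypothetical document: each path becomes a "word" with a frequency.
doc = "<report><team><name/><member/></team><results><name/></results></report>"
paths = tag_subpaths(doc, 1, 2)
for path, count in sorted(paths.items()):
    print(path, count)
```

Each document then reduces to a bag of path "words", so the resulting counts could be turned into vectors and passed to a standard clustering method such as K-means.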
by Thierry Despeyroux, Yves Lechevallier, Brigitte Trousse, Anne-Marie Vercoustre - This paper presents some experiments in clustering homogeneous XML documents to validate an existing classification or more generally an organisational structure. Our approach integrates techniques for extracting knowledge from documents with unsupervised classification (clustering) of documents. We focus on the feature selection used for representing documents and its impact on the emerging classification. We mix the selection of structured features with fine textual selection based on syntactic characteristics. We illustrate and evaluate this approach with a collection of Inria activity reports for the year 2003. The objective is to cluster projects into larger groups (Themes), based on the keywords or different chapters of these activity reports. We then compare the results of clustering using different feature selections, with the official theme structure used by Inria.
Published at the 5th International Conference on Knowledge Management, Know Center, Graz, Austria, 29 June - 1 July 2005.
Paper in pdf (updated biblio)
by Thierry Despeyroux, Yves Lechevallier, Brigitte Trousse, Anne-Marie Vercoustre - This paper presents several experiments in classifying XML documents with a homogeneous structure, in order to explain and validate a pre-existing organisational presentation. The problem concerns the choice of the elements and words used for the classification and its impact on the induced typology. To this end we combine a structural selection based on the nature of the XML elements with a linguistic selection based on a syntactic typing of the words. We illustrate these principles on the collection of 2003 activity reports of Inria research teams, looking for groupings of teams (Themes) based on the content of different parts of these reports. We compare our first results with Inria's official research themes.
Published at the 5èmes journées d'Extraction et de Gestion des Connaissances (EGC 2005), Université René Descartes, Paris, 19-21 January 2005.
Paper in pdf (updated biblio)
by Thierry Despeyroux - As Web sites are now ordinary products, it is necessary to make the notion of Web site quality explicit. The quality of a site may be linked to its ease of access and also to other criteria, such as whether the site is up to date and coherent. This last quality is difficult to ensure because sites may be updated very frequently, may have many authors, and may be partially generated; in this context proof-reading is very difficult. The same piece of information may occur in several places, and also in data or meta-data, leading to the need for consistency checking. In this paper we draw a parallel between programs and Web sites. We present some examples of semantic constraints that one would like to specify (constraints between the meaning of categories and sub-categories in a thematic directory, consistency between the organization chart and the rest of the site in an academic site). We briefly present Natural Semantics, a way to specify the semantics of programming languages that inspires our work. Then we propose a specification language for semantic constraints in Web sites that, in conjunction with the well-known "make" program, makes it possible to generate site verification tools by compiling the specification into Prolog code. We apply our method to a large XML document, the scientific part of our institute's activity report, tracking errors and inconsistencies and also constructing indicators that can be used by the management of the institute.
Published at the 13th World Wide Web Conference (WWW2004), New York City, 17-22 May 2004.
by Thierry Despeyroux and Brigitte Trousse - The amount of information accessible on the Web is phenomenal, and searching for relevant and coherent information is becoming a challenge. The main goal of the Semantic Web is to facilitate and mechanize information retrieval by formalizing material that has so far been mostly textual. Our approach is different: we are interested in the construction of sites and want to help check and maintain their semantic coherence, using techniques usually employed to define the formal semantics of programming languages.
Published at the "journées scientifiques" of the action spécifique Web Sémantique, CNRS, Paris, 10-11 October 2002
by Thierry Despeyroux and Brigitte Trousse - Many tools already exist to help in creating Web sites, but they are more concerned with the external appearance of the sites than with their content. How can we help in designing and maintaining semantically coherent sites? Using software engineering techniques, and more precisely Natural Semantics, a framework coming from the world of programming language semantics, we propose a way of specifying and verifying the semantics of a Web site during its lifetime.
Published at the AACE WebNet 2001 Conference, Orlando, Florida, October 2001
by Thierry Despeyroux and Brigitte Trousse - Much of the effort in the world of the Web aims to facilitate data representation and data mining. This is done most of the time by a syntactic formalisation of knowledge or information using languages such as XML or RDF, within a hypertext structure. We claim that this is not sufficient and that we need to provide a way of specifying semantic (global) constraints over Web sites in order to mechanically perform some verifications and proof-reading during the lifetime of the site, using software engineering techniques.
Published at the ACM HYPERTEXT 2001 Conference, Århus, Denmark, August 2001
Available under ACM copyright in the ACM Digital Library.
by Thierry Despeyroux and Brigitte Trousse - The huge amount of information and knowledge available on the Web makes this information increasingly difficult to manage. Two different ways are commonly explored: giving a syntactical structure to Web sites, and annotating their content to facilitate Web mining. In this paper we explore a different approach inherited from software engineering: specifying the semantics of Web sites, allowing semantic verifications that will help both the design and the maintenance of Web sites. To achieve this goal, we have experimented with the application of Natural Semantics (traditionally used to specify the semantics of programming languages) to Web site specification and verification.
Published at the RIAO'2000 Conference, Paris, France, June 2000
Available as dvi, postscript and HTML. You can also see the slides (HTML) used at the RIAO'2000 conference.
by Thierry Despeyroux and André Hirschowitz - Draft - In this paper, we give a new, categorical definition for first-order and higher-order abstract syntax (that we call functional abstract syntax), and we state the corresponding induction, separation and recursion principles. We also present a practical application of functional abstract syntax by means of syntactical editing.
Available as dvi and postscript
This is the manual for AS, an abstract syntax specification formalism. The main features of this formalism are modularity and support for second-order abstract syntaxes. AS is the first formalism from the CLF (Computer Languages Factory), a forthcoming set of tools and specification formalisms for quick prototyping and complete implementation of the syntax and semantics of computer languages. This version of AS may be used under the Centaur system for first-order features only. The second-order features will be useful only once a higher-order version of Typol is distributed.
Available as Inria report RT-0197 (gzipped PostScript), or HTML (latest version).
Prolog has great potential as a high-level programming language or as a specification language. However, the Prolog programmer would be happier if it were easier to isolate the pure logical parts of a program from its low-level or non-logical ones. This is the case in particular if one wants to derive a real tool from a logic program by adding some error recovery. This paper presents a clean way of adding error recovery to a pure prototype. This method has been used for generating Prolog code from a high-level specification language that has a built-in error recovery mechanism.
Published at the INAP '95 Conference, Hino, Tokyo, Japan, October 1995
Available as dvi, postscript and HTML. You can also get a copy of the slides (postscript) used at the INAP'95 conference.