  • Data and Knowledge Engineering HP

    This is the elaboration of the home project. Its aim was to collect all comedy films from DBPedia and Wikidata, subject to the condition that each film has at least one director born in 1970 or later.
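    The selection condition can be stated as a small predicate. The sketch below is illustrative only; the helper name and the assumption that a film record carries its directors' birth years are not taken from the project:

    ```python
    def film_qualifies(director_birth_years):
        """Return True if at least one director was born in 1970 or later."""
        return any(year >= 1970 for year in director_birth_years)

    # A film with directors born in 1965 and 1972 qualifies; one directed
    # only by someone born in 1950 does not.
    print(film_qualifies([1965, 1972]))  # True
    print(film_qualifies([1950]))        # False
    ```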

    Preparation

    This project was developed for Python 3.7 and is based on a Flask server. Before the server can be started and all operations executed, a few preparations are necessary.

    Install the included requirements:

    $ pip3.7 install -r requirements.txt

    Get started

    This project can be started with the following command:

    $ python3.7 app.py

    This starts the service on http://127.0.0.1:5000/.

    Swagger Documentation

    The whole service was documented with Swagger. This makes it possible to interactively explore the routes and generate new data. The start page will look like this:

    [Screenshot: Swagger UI start page]

    Here you will find the interactive elements. How exactly the data is generated will be explained later.

    Submission 1

    This section shows how to create the data in wikidata.txt and dbpedia.txt. The exact strategies are given in the presentation.

    DBPedia

    To generate the data for DBPedia, the routes must be called in the following order:

    1. /dbpedia/groundtruth -> This will create a list of all result dicts in /static/dbpedia/dbpedia_groundtruth.txt
    2. /dbpedia/n3 -> This will create all triples in /static/dbpedia/dbpedia.txt
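    The actual query behind /dbpedia/groundtruth is not shown here; as an illustration, a DBPedia SPARQL query implementing the film condition might look like the following. The exact property choices and the helper are assumptions, not the project's real code:

    ```python
    # Illustrative only: a SPARQL query for comedy films on DBPedia with at
    # least one director born in 1970 or later.
    DBPEDIA_QUERY = """
    PREFIX dbo:  <http://dbpedia.org/ontology/>
    PREFIX dct:  <http://purl.org/dc/terms/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT DISTINCT ?film ?director WHERE {
      ?film a dbo:Film ;
            dct:subject/rdfs:label ?subject ;
            dbo:director ?director .
      ?director dbo:birthDate ?birth .
      FILTER(CONTAINS(LCASE(STR(?subject)), "comedy"))
      FILTER(YEAR(?birth) >= 1970)
    }
    """

    def build_query(limit=None):
        """Append an optional LIMIT clause for incremental testing."""
        return DBPEDIA_QUERY + (f"LIMIT {limit}" if limit else "")

    print(build_query(10))
    ```

    Running it with a small LIMIT first is a cheap way to sanity-check the filters before fetching the full result set.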

    Wikidata

    To generate the data for Wikidata, the routes must be called in the following order:

    1. /wikidata/groundtruth/advanced -> This will create a list of all result dicts in /static/wikidata/wikidata_groundtruth.txt
      • Be careful: this can take 2 to 3 minutes.
      • Alternatively you can use /wikidata/groundtruth. However, the resulting data set is slightly smaller.
    2. /wikidata/n3 -> This will create all triples in /static/wikidata/wikidata.txt
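    The */n3 routes turn the ground-truth result dicts into triple lines. A minimal sketch of such a serialization, assuming (hypothetically) that each result dict has the keys 's', 'p', and 'o':

    ```python
    def dicts_to_n3(results):
        """Serialize result dicts of the form
        {'s': subject_uri, 'p': predicate_uri, 'o': object_uri_or_literal}
        into N3/N-Triples-style lines."""
        lines = []
        for r in results:
            obj = r["o"]
            # URIs are angle-bracketed; everything else becomes a quoted literal.
            obj_term = f"<{obj}>" if obj.startswith("http") else f'"{obj}"'
            lines.append(f'<{r["s"]}> <{r["p"]}> {obj_term} .')
        return "\n".join(lines)

    triples = dicts_to_n3([
        {"s": "http://www.wikidata.org/entity/Q42",
         "p": "http://www.wikidata.org/prop/direct/P57",
         "o": "http://www.wikidata.org/entity/Q123"},
    ])
    print(triples)
    ```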

    Submission 2

    This submission describes how the two vocabularies are mapped onto each other to create a common knowledge base. The table can be taken from static/mapping/mapping.csv. The mapping looks like this:

    | Variable           | Wikidata                                     | DBPedia                                             |
    | ------------------ | -------------------------------------------- | --------------------------------------------------- |
    | Title              | http://www.w3.org/2000/01/rdf-schema#label   | http://xmlns.com/foaf/0.1/name                      |
    | Director           | http://www.wikidata.org/prop/direct/P57      | http://dbpedia.org/ontology/director                |
    | Author             | http://www.wikidata.org/prop/direct/P58      | http://dbpedia.org/ontology/author                  |
    | Cast Member        | http://www.wikidata.org/prop/direct/P161     | http://dbpedia.org/ontology/starring                |
    | Date published     | http://www.wikidata.org/prop/direct/P577     | http://dbpedia.org/ontology/releaseDate             |
    | Genre              | http://www.wikidata.org/prop/direct/P136     | http://purl.org/dc/terms/subject                    |
    | Genre              | http://www.wikidata.org/prop/direct/P136     | http://dbpedia.org/ontology/genre                   |
    | Duration           | http://www.wikidata.org/prop/direct/P2047    | http://dbpedia.org/ontology/runtime                 |
    | Description        | http://schema.org/description                | http://dbpedia.org/ontology/abstract                |
    | Production company | http://www.wikidata.org/prop/direct/P272     | http://dbpedia.org/ontology/distributor             |
    | Production company | http://www.wikidata.org/prop/direct/P272     | http://dbpedia.org/property/productionCompanies     |

    As you can see, there are double mappings for the Genre and Production Company properties. The */groundtruth/info routes can be used to decide which of the duplicated properties can be removed: the one that yields the weaker (less complete) data should always be dropped.
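    The mapping table can be applied mechanically by replacing each DBPedia predicate with its Wikidata counterpart. A sketch, assuming mapping.csv has the three columns shown above (the helper name and the inlined two-row excerpt are illustrative):

    ```python
    import csv
    import io

    # Two-row stand-in for static/mapping/mapping.csv.
    MAPPING_CSV = """Variable,Wikidata,DBPedia
    Director,http://www.wikidata.org/prop/direct/P57,http://dbpedia.org/ontology/director
    Author,http://www.wikidata.org/prop/direct/P58,http://dbpedia.org/ontology/author
    """

    def load_mapping(csv_text):
        """Map each DBPedia predicate URI to its Wikidata counterpart."""
        reader = csv.DictReader(io.StringIO(csv_text))
        return {row["DBPedia"]: row["Wikidata"] for row in reader}

    mapping = load_mapping(MAPPING_CSV)
    print(mapping["http://dbpedia.org/ontology/director"])
    # http://www.wikidata.org/prop/direct/P57
    ```

    With such a dictionary, every DBPedia triple can be rewritten into the common (Wikidata-based) vocabulary in a single pass.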

    Submission 3

    In this part, matches between the two data sets are searched for. This happens on two levels. On the first level, owl:sameAs is used directly, which identifies records in DBPedia that are explicitly linked to Wikidata. On the second level, the titles and directors of the movies are compared. Since the directors cannot be compared directly, their owl:sameAs links are resolved first, so that the director URIs from DBPedia can be mapped to Wikidata. The overlaps can then be created via /coreferences and viewed under /static/coreferences/coreferences.txt.
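    The two-level strategy above can be sketched as follows. The record layout, the sample URIs, and the helper name are assumptions for illustration, not the project's actual implementation:

    ```python
    def find_coreferences(db_films, wd_films, same_as, director_same_as):
        """Two-level matching between DBPedia and Wikidata film records.

        db_films / wd_films: dicts mapping film URI -> {'title': str, 'director': uri}
        same_as:             dict mapping DBPedia film URI -> Wikidata film URI
        director_same_as:    dict mapping DBPedia director URI -> Wikidata director URI
        """
        matches = {}
        for db_uri, db_rec in db_films.items():
            # Level 1: an explicit owl:sameAs link between the films.
            if db_uri in same_as and same_as[db_uri] in wd_films:
                matches[db_uri] = same_as[db_uri]
                continue
            # Level 2: same title and same director, with the DBPedia
            # director URI resolved to Wikidata via owl:sameAs first.
            db_director = director_same_as.get(db_rec["director"])
            for wd_uri, wd_rec in wd_films.items():
                if (db_rec["title"].lower() == wd_rec["title"].lower()
                        and db_director == wd_rec["director"]):
                    matches[db_uri] = wd_uri
                    break
        return matches

    # Hypothetical sample data: one film pair that only matches on level 2.
    matches = find_coreferences(
        db_films={"http://dbpedia.org/resource/Some_Film":
                  {"title": "Some Film",
                   "director": "http://dbpedia.org/resource/Jane_Doe"}},
        wd_films={"http://www.wikidata.org/entity/Q100":
                  {"title": "some film",
                   "director": "http://www.wikidata.org/entity/Q200"}},
        same_as={},
        director_same_as={"http://dbpedia.org/resource/Jane_Doe":
                          "http://www.wikidata.org/entity/Q200"},
    )
    print(matches)
    ```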

    Presentation

    The remaining points will be discussed during the presentation.