Data and Knowledge Engineering HP
This is the write-up of the home project. The aim of this project was to collect all comedy films from DBPedia and Wikidata. The condition was that each film has at least one director born in 1970 or later.
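The project's own extraction queries live in its code; purely as an illustration, here is a minimal sketch of what the Wikidata side of such a selection could look like, using SPARQLWrapper against the public endpoint. The query shape and the agent string are assumptions, not the project's actual query.

from SPARQLWrapper import SPARQLWrapper, JSON

# Illustrative only: the project's real query may differ.
sparql = SPARQLWrapper("https://query.wikidata.org/sparql", agent="dke-hp-example/0.1")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
SELECT DISTINCT ?film ?director WHERE {
  ?film wdt:P31 wd:Q11424 ;     # instance of: film
        wdt:P136 wd:Q157443 ;   # genre: comedy film
        wdt:P57 ?director .     # director
  ?director wdt:P569 ?birth .   # director's date of birth
  FILTER(YEAR(?birth) >= 1970)  # born in 1970 or later
}
""")
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["film"]["value"], row["director"]["value"])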
Preparation
This project was developed for Python 3.7 and is based on a Flask server. Before the server can be started and all operations executed, a few preparations have to be made.
Install the enclosed requirements as well:
$ pip3.7 install -r requirements.txt
Get started
This project can be started with the following command:
$ python3.7 app.py
This starts the service on http://127.0.0.1:5000/.
Swagger Documentation
The entire service is documented with Swagger. This makes it possible to explore the routes interactively and to generate new data. The start page will look like this:
Here you will find the interactive elements. How exactly the data is generated is explained later.
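The README does not pin down how the Swagger UI is wired up; one common way to get such an interactive UI with Flask is flasgger. The following is a minimal sketch, with a hypothetical route and docstring, not the project's actual app.py:

from flask import Flask
from flasgger import Swagger

app = Flask(__name__)
Swagger(app)  # serves the interactive UI, by default under /apidocs

@app.route("/dbpedia/groundtruth")
def dbpedia_groundtruth():
    """Create the DBPedia ground truth.
    ---
    responses:
      200:
        description: Result dicts written to /static/dbpedia/dbpedia_groundtruth.txt
    """
    return "done"

if __name__ == "__main__":
    app.run()  # http://127.0.0.1:5000/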
Submission 1
This section shows how to create the data in wikidata.txt and dbpedia.txt. The exact strategies are given in the presentation.
DBPedia
To generate the data for DBPedia, the routes must be called in the following order (a call sketch follows after the list):
- /dbpedia/groundtruth -> This will create a list of all result dicts in /static/dbpedia/dbpedia_groundtruth.txt
- /dbpedia/n3 -> This will create all triples in static/dbpedia/dbpedia.txt
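For instance, the two routes can be triggered in order from a small script, assuming they respond to plain GET requests (which is how Swagger exposes them):

import requests

BASE = "http://127.0.0.1:5000"

# Order matters: the ground truth has to exist before the triples are built.
for route in ("/dbpedia/groundtruth", "/dbpedia/n3"):
    response = requests.get(BASE + route)
    response.raise_for_status()
    print(route, "->", response.status_code)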
Wikidata
To generate the data for Wikidata, the routes must be called in the following order (a quick way to inspect the generated triples follows after the list):
- /wikidata/groundtruth/advanced -> This will create a list of all result dicts in /static/wikidata/wikidata_groundtruth.txt
  - Careful, this can take 2 to 3 minutes.
  - Alternatively you can use /wikidata/groundtruth. However, its result set is slightly smaller.
- /wikidata/n3 -> This will create all triples in static/wikidata/wikidata.txt
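The route name suggests that the triples are serialized as N3; under that assumption they can be sanity-checked with rdflib (the format is an assumption, adjust it if the file looks different):

from rdflib import Graph

g = Graph()
# The /wikidata/n3 route name suggests N3 serialization; adjust format if needed.
g.parse("static/wikidata/wikidata.txt", format="n3")

print(len(g), "triples loaded")
for s, p, o in list(g)[:5]:  # show a small sample
    print(s, p, o)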
Submission 2
This submission specifies how the vocabulary is put together to create a common knowledge base.
The table can be taken from static/mapping/mapping.csv.
The mapping will look like this:
As you can see, there are double mappings for the Genre and Production Company properties.
The */groundtruth/info routes can be used to decide which of the duplicated properties can be removed. The weaker of the two should always be removed (a small sketch for spotting such double mappings follows below).
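As an illustration, the double mappings can be spotted directly in the CSV. The column names used here are hypothetical; check the real header in static/mapping/mapping.csv:

import csv
from collections import defaultdict

# Hypothetical column names; see the actual header in static/mapping/mapping.csv.
with open("static/mapping/mapping.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Group the rows by the common (target) property to spot double mappings,
# e.g. two source properties that are both mapped to Genre.
by_target = defaultdict(list)
for row in rows:
    by_target[row["common_property"]].append(row["source_property"])

for target, sources in by_target.items():
    if len(sources) > 1:
        print("double mapping for", target, "->", sources)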
Submission 3
In this part the matching records are searched for. This is done on two levels; a rough sketch of both follows below.
On the first level, owl:sameAs is used directly. This makes it possible to recognize linked data pointing from DBPedia to Wikidata.
On the second level, the titles and directors of the movies are compared. Since the directors cannot be compared directly, owl:sameAs is resolved and inspected again; this way the URIs of the directors in DBPedia can be resolved to Wikidata.
The overlaps can then be created under /coreferences and viewed under /static/coreferences/coreferences.txt.
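A minimal sketch of the two levels, not the project's actual implementation (the serialization format, the dict keys, and the matching rule are assumptions):

from rdflib import Graph
from rdflib.namespace import OWL

# Format is an assumption, as above.
dbpedia = Graph().parse("static/dbpedia/dbpedia.txt", format="n3")

# Level 1: direct owl:sameAs links from DBPedia resources to Wikidata URIs.
same_as = {
    str(s): str(o)
    for s, o in dbpedia.subject_objects(OWL.sameAs)
    if "wikidata.org" in str(o)
}

# Level 2 (sketch): for films still unmatched, compare titles and directors.
# The dict keys are placeholders for whatever the result dicts actually contain.
def is_match(db_film, wd_film):
    same_title = db_film["title"] == wd_film["title"]
    # Directors cannot be compared directly; map the DBPedia director URI
    # to its Wikidata counterpart via owl:sameAs first.
    same_director = same_as.get(db_film["director"]) == wd_film["director"]
    return same_title and same_director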
Presentation
The remaining points will be discussed during the presentation.