Automated reasoning homework 2017
Revision as of 3 November 2017, 10:19
Automated reasoning homework: the second part of a two-phase project
The goals of this homework are:
- to learn about representing knowledge and rules in logic
- to experiment with an automated reasoner
- to experiment with integrating external knowledge bases with your rdf-style data
The main concrete task is to add some common-sense rules to the rdf-style output of the first phase of the project, and then to pose and prove some example queries using an automated reasoner. This task is required for passing the lab.
The secondary (optional) task is to integrate an external knowledge base such as dbpedia, conceptnet or yago with your rdf-style output, so that queries can be answered with the help of the external knowledge base.
The third, extra fancy (entirely optional) task is to process restricted English queries: translate them into Otter queries and output only a short English-style answer.
As in the previous lab, for the lab defence/grading you will have to give a short presentation of your results, with examples you manage to handle and examples you cannot.
Administrative
Work should be submitted to git at the latest one day before the deadline.
The groups are the same as in the first phase.
What you have to do
First of all, by default we will use the Otter prover. Download it and try out some of the example problems in the Otter zip file. Useful examples for this lab are given below, in the section "Examples to use as a starting point". Have a look at the Otter manual as well. Newer reasoners are also allowed, but using them is likely to be more complex.
Suppose you have a triplet like
[http://en.wikipedia.org/wiki/Barack_Obama, http://en.wikipedia.org/wiki/china], id:type http://conceptnet5.media.mit.edu/web/c/en/person
where the list [ http://en.wikipedia.org/wiki/Barack_Obama, http://en.wikipedia.org/wiki/china ] means that we do not know which interpretation is the correct one (the original word was something like he, this, she, it, ...).
First, pick just one interpretation like
http://en.wikipedia.org/wiki/Barack_Obama, id:type http://conceptnet5.media.mit.edu/web/c/en/person
and then
- represent it in otter format
- add common-sense rules about the domain of geography (an ontology: like a taxonomy, but it can be a bit more complex)
- pose a question in otter format
- run otter to find the answer
If you manage to do that, you will pass.
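A minimal Python sketch of representing the triple and some geography rules, assuming the rdf(Subject,Predicate,Object) wrapping predicate of the course examples (as in rdf(X,"id:type","mortal")). The helper names and the small geography taxonomy are hypothetical. The rules use an uppercase variable X, which presumes Prolog-style variables are switched on (set(prolog_style_variables)) as the mortal query example suggests; check this against the prefix of obama1.txt.

```python
from typing import Dict, List


def quote(term: str) -> str:
    """Wrap a term in double quotes, as in rdf(X,"id:type","mortal").
    Assumption: embedded double quotes are simply dropped."""
    return '"' + term.replace('"', "") + '"'


def fact(subj: str, pred: str, obj: str) -> str:
    """A ground rdf triple as an Otter clause."""
    return f"rdf({quote(subj)},{quote(pred)},{quote(obj)})."


def subclass_rule(sub: str, sup: str, type_pred: str = "id:type") -> str:
    """'Every sub is a sup' as the clause -rdf(X,type,sub) | rdf(X,type,sup)."""
    return (f"-rdf(X,{quote(type_pred)},{quote(sub)}) | "
            f"rdf(X,{quote(type_pred)},{quote(sup)}).")


# Hypothetical geography taxonomy fragment: class -> direct superclass.
GEO_TAXONOMY: Dict[str, str] = {
    "city": "settlement",
    "settlement": "place",
    "country": "place",
    "lake": "water_body",
    "water_body": "place",
}

# One interpretation picked from the ambiguous list.
triple = ("http://en.wikipedia.org/wiki/Barack_Obama", "id:type",
          "http://conceptnet5.media.mit.edu/web/c/en/person")

clauses: List[str] = [fact(*triple)]
clauses += [subclass_rule(c, p) for c, p in GEO_TAXONOMY.items()]
print("\n".join(clauses))
```

The printed clauses go between the list(...) line and end_of_list. of the input file; a question is then added as a negated clause with $ans, in the same way as the mortal query in the examples section.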
Next, a slightly fancier task you may take up (not required for passing, but it earns extra points):
Run a loop that tries out the different interpretations of he, this, she, it, ..., and for each interpretation in the list:
- represent it in otter format
- add common-sense rules about the domain of geography
- check whether otter derives a contradiction: if it does, the interpretation is probably wrong, since it conflicts with the common-sense rules.
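The interpretation loop can be sketched in Python as follows. Everything here is an assumption for illustration: the parsing of the "[s1, s2], pred obj" line, the helper names, the Otter prefix (copy the real one from obama1.txt) and the two example rules that make "china is a person" contradictory. Otter gets its input on stdin, as in otter < myinput.txt, and the output is checked for the word PROOF; the runner parameter lets you test the loop without Otter installed.

```python
import re
import subprocess
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]


def interpretations(line: str) -> List[Triple]:
    """Expand '[s1, s2, ...], pred obj' into one triple per candidate subject.
    Assumption: this is the first-lab output format shown above."""
    m = re.fullmatch(r"\s*\[(.*?)\]\s*,\s*(\S+)\s+(\S+)\s*", line)
    if m is None:  # unambiguous line: 'subj, pred obj'
        subj, rest = line.split(",", 1)
        pred, obj = rest.split()
        return [(subj.strip(), pred, obj)]
    subjects = [s.strip() for s in m.group(1).split(",") if s.strip()]
    return [(s, m.group(2), m.group(3)) for s in subjects]


def run_otter(text: str) -> str:
    """Like `otter < input > output`, but through stdin/stdout pipes."""
    return subprocess.run(["otter"], input=text, capture_output=True,
                          text=True).stdout


def consistent(line: str, prefix: str, rules: List[str],
               runner: Callable[[str], str] = run_otter) -> List[Triple]:
    """Keep the interpretations for which Otter reports no PROOF."""
    kept = []
    for s, p, o in interpretations(line):
        clauses = [f'rdf("{s}","{p}","{o}").', *rules]
        text = "\n".join([prefix, *clauses, "end_of_list."]) + "\n"
        if "PROOF" not in runner(text):  # no contradiction found
            kept.append((s, p, o))
    return kept


PERSON = "http://conceptnet5.media.mit.edu/web/c/en/person"
CHINA = "http://en.wikipedia.org/wiki/china"
# Hypothetical prefix: copy the real one from obama1.txt instead.
PREFIX = "set(auto).\nset(prolog_style_variables).\nlist(usable)."
# Hypothetical common-sense rules: china is a country; no country is a person.
RULES = [
    f'rdf("{CHINA}","id:type","country").',
    f'-rdf(X,"id:type","country") | -rdf(X,"id:type","{PERSON}").',
]
```

A missing PROOF may also mean that Otter hit a time or memory limit before finding one, so a kept interpretation is "not refuted" rather than proven consistent.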
Integrating with the external knowledge base
Instead of relying only on your own handmade rules and on the facts stated in your processed English text, you could use dbpedia, conceptnet or yago to obtain a large number of real-world facts and a set of relevant rules automatically.
This is what conceptnet, wordnet and dbpedia are really good for: it is the main reason these databases/systems exist.
This is not required to pass the lab, but if you manage to do it (i.e. use some of these systems to create relevant rules and facts automatically), we'll award you more points!
How to integrate dbpedia or conceptnet?
- First, download the large textual representation in JSON or some other easily processable format: investigate the different download options and find relevant ones that you can understand and work with.
- Second, write a program that converts this representation into a data/rule file in Otter syntax, using the same conventions and wrapping predicates as you used while processing the English text.
- When you create the final query file for Otter, add the relevant rows from the generated data/rule file to the Otter query file. Since Otter cannot cope with a huge input file, first select the relevant fact/rule rows. What are the relevant rows? A simple approach is to take the rows that contain URIs occurring in your own data/rule set, created while processing the English text.
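A sketch of the conversion and filtering steps for a ConceptNet-style dump, assuming tab-separated rows of the form edge URI, relation, start node, end node, JSON metadata (check this against the dump you actually download). The BASE prefix, the relation mapping and the function name are hypothetical; adapt them to the conventions of your own rdf output.

```python
from typing import Dict, Iterable, List, Set

# Assumption: your own data refers to conceptnet nodes with this prefix,
# as in http://conceptnet5.media.mit.edu/web/c/en/person
BASE = "http://conceptnet5.media.mit.edu/web"
# Hypothetical mapping from conceptnet relations to your own predicates.
RELATIONS: Dict[str, str] = {"/r/IsA": "id:type"}


def kb_to_otter(lines: Iterable[str], own_uris: Set[str]) -> List[str]:
    """Convert tab-separated assertion rows to rdf clauses, keeping only rows
    that mention a URI already present in your own data.
    Assumed row layout: edge uri, relation, start node, end node, json."""
    clauses = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 4:
            continue  # skip malformed rows
        rel, start, end = cols[1], BASE + cols[2], BASE + cols[3]
        if start not in own_uris and end not in own_uris:
            continue  # not relevant to our text
        pred = RELATIONS.get(rel, rel)
        clauses.append(f'rdf("{start}","{pred}","{end}").')
    return clauses
```

Passing an open file (or gzip.open(path, "rt", encoding="utf-8") for a compressed dump) streams the rows one at a time, so the whole dump never has to fit in memory. Real node URIs may carry extra path segments, so you may need to normalize them before matching.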
Optional extra fancy task
Instead of posing Otter-format questions and printing semi-raw results from Otter, you could let the user pose restricted English questions such as "Is Tallinn a city?", "What is Puise?", "What is the population of Tallinn?" or "Where is lake X?", using a simple, limited form with a structure like "is X a Y", "what is X", "what is X of Y", "where is X", etc., and translate these into Otter queries. The resulting answer could then be printed as a plain fact such as "Yes", "Farm", "460.000" or "Estonia".
Again, this is not required to pass the lab, but if you manage to do it we'll award you more points!
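A minimal pattern-based translator for the restricted question forms above, as a Python sketch. The predicate names (id:type, id:location, and the attribute word such as population used directly as a predicate), the lower-casing of names and the uppercase answer variable A are all assumptions; a real version would map names like Tallinn to the URIs in your own data.

```python
import re
from typing import Optional

# Hypothetical predicate names: use whatever your rdf output actually uses.
TYPE = "id:type"
LOCATION = "id:location"


def q(term: str) -> str:
    """Quote a name as an Otter constant (assumption: lower-cased plain names)."""
    return '"' + term.strip().lower() + '"'


def to_otter(question: str) -> Optional[str]:
    """Translate a restricted English question into a negated Otter query.
    Returns None when the question matches none of the supported forms."""
    s = question.strip().rstrip("?").strip()
    # Yes/no form "is X a Y": no variable, a PROOF means "Yes".
    m = re.fullmatch(r"is (.+?) an? (.+)", s, flags=re.IGNORECASE)
    if m:
        return f"-rdf({q(m[1])},{q(TYPE)},{q(m[2])})."
    # "what is the Y of X" must be tried before the more general "what is X".
    m = re.fullmatch(r"what is the (.+?) of (.+)", s, flags=re.IGNORECASE)
    if m:
        return f"-rdf({q(m[2])},{q(m[1])},A) | $ans(A)."
    m = re.fullmatch(r"what is (.+)", s, flags=re.IGNORECASE)
    if m:
        return f"-rdf({q(m[1])},{q(TYPE)},A) | $ans(A)."
    m = re.fullmatch(r"where is (.+)", s, flags=re.IGNORECASE)
    if m:
        return f"-rdf({q(m[1])},{q(LOCATION)},A) | $ans(A)."
    return None
```

A yes/no question has no $ans literal: the answer is "Yes" if Otter finds a PROOF. For the other forms, the value bound in $ans(A) is the short answer to print.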
Examples to use as a starting point
The following examples are in Otter format: the first two use the clause syntax and are similar to what you are expected to do in the third lab. The last one demonstrates formula syntax, calculation and (a kind of) probabilities.
- Derive simple information about Obama
- Ask a query about Obama
- Output of the first lab converted for the second lab, with a few rules added: Otter can derive that Hyatt has type hotel and that Obama is rich. Here is the Otter output of this example.
- A question posed: find a mortal in the text. We just add -rdf(X,"id:type","mortal") | $ans(X). to the facts (i.e. a negated query with a variable, where $ans captures a suitable value for X).
- Probabilities and web scraping: determine the type of objects using information from a web page and probability-like measures.
- Example with weather, dates and trust
How to create the Otter file and use the output
- Split an example Otter file such as obama1.txt into the initial part, containing the strategy selection etc., and the final part, containing end_of_list.
- Create the rdf data in Otter format automatically from the output of your parser in the previous lab.
- If you pose a question, add the negated question in Otter format.
- Compile the full input file myinput.txt by concatenating these parts:
  - the prefix part,
  - the automatically created rdf data in Otter format,
  - the (optional) question in Otter format, in negated form,
  - the common-sense rules and facts: either static handmade rules or rules/facts created automatically from conceptnet/wordnet/dbpedia,
  - the short final part, i.e. end_of_list.
 
- Run otter as: otter < myinput.txt > myoutput.txt
- Process myoutput.txt: filter out the derived facts and/or determine whether the output contains the word PROOF.
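The steps above can be scripted end to end. The following Python sketch splits an example file, concatenates the parts in the listed order, runs Otter with redirected input and output, and pulls answers out of the proof. The function names are hypothetical, and the answer extraction assumes Prolog-style (uppercase) variables and answers without a ')' character; compare it against a real myoutput.txt before relying on it.

```python
import re
import subprocess
from pathlib import Path
from typing import List, Optional, Tuple


def split_example(text: str) -> Tuple[str, str]:
    """Split an example file like obama1.txt into the prefix (up to and
    including the last list(...) line) and the final end_of_list part.
    Assumption: the example's own clauses sit between those two lines;
    raises ValueError if either line is missing."""
    lines = text.splitlines()
    start = max(i for i, ln in enumerate(lines) if ln.strip().startswith("list("))
    end = max(i for i, ln in enumerate(lines) if ln.strip().startswith("end_of_list"))
    return "\n".join(lines[:start + 1]), "\n".join(lines[end:])


def assemble(prefix: str, rdf_facts: List[str], rules: List[str],
             question: Optional[str] = None, suffix: str = "end_of_list.") -> str:
    """Concatenate prefix, rdf data, optional negated question, rules, suffix."""
    parts = [prefix, *rdf_facts]
    if question:
        parts.append(question)  # already negated, e.g. -rdf(...) | $ans(X).
    parts += rules
    parts.append(suffix)
    return "\n".join(parts) + "\n"


def run_otter(inp: str = "myinput.txt", out: str = "myoutput.txt") -> str:
    """Equivalent of: otter < myinput.txt > myoutput.txt"""
    with open(inp) as fin, open(out, "w") as fout:
        subprocess.run(["otter"], stdin=fin, stdout=fout)
    return Path(out).read_text()


def answers(output: str) -> List[str]:
    """Values captured by $ans(...) after the PROOF marker; empty if no PROOF.
    Bare uppercase variables (from echoed input clauses) are skipped."""
    if "PROOF" not in output:
        return []
    proof = output.split("PROOF", 1)[1]
    found = re.findall(r"\$ans\(([^)]*)\)", proof, flags=re.IGNORECASE)
    values = [v.strip('"') for v in found if not re.fullmatch(r"[A-Z_]\w*", v)]
    return sorted(set(values))
```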