Datasets:

AGDLI

The ArCo, GVP and DBpedia Linking Initiative (AGDLI) is a research activity within the project "SMARTOUR: intelligent platforms for tourism", funded by the Italian Ministry of University and Research. The initiative aims to link ArCo’s cultural entities to the well-known Getty Vocabulary Program and DBpedia ontologies, with the main goal of providing a semantically rich representation of Italian cultural heritage for tourism-related, knowledge-based applications.

resource page

Multiple Knowledge GraphDB (MKGDB)

MKGDB is a large-scale graph database created by combining the taxonomy backbones extracted from five existing knowledge graphs: ConceptNet, DBpedia, WebIsAGraph, WordNet and the Wikipedia category hierarchy. Thanks to the versatility of the Neo4j graph database management technology, MKGDB is intended to support the development of open-domain natural language processing applications that rely on knowledge bases, such as information extraction, hypernymy discovery, topic clustering, and others. The resource consists of a large hypernymy graph with more than 37 million nodes and more than 81 million hypernymy relations.
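To make the idea of a merged hypernymy graph concrete, the following is a minimal in-memory sketch in Python. It is not the MKGDB schema or its Neo4j interface: the edge triples, the function names `build_graph` and `hypernyms`, and the source labels attached to each edge are all assumptions chosen for illustration. It only shows how is-a edges coming from several sources might be merged and then traversed to collect the direct and transitive hypernyms of a term.

```python
from collections import defaultdict, deque

# Toy (child, parent, source) triples, invented for illustration only.
# They are NOT taken from MKGDB.
EDGES = [
    ("dog", "canine", "WordNet"),
    ("canine", "mammal", "WordNet"),
    ("dog", "pet", "ConceptNet"),
    ("dog", "canine", "DBpedia"),
    ("mammal", "animal", "WordNet"),
]


def build_graph(edges):
    """Merge is-a edges from several sources into one adjacency map.

    The result maps each term to {hypernym: set of sources asserting the edge},
    so an edge found in two sources is stored once with both labels.
    """
    graph = defaultdict(lambda: defaultdict(set))
    for child, parent, source in edges:
        graph[child][parent].add(source)
    return graph


def hypernyms(graph, term):
    """Return direct and transitive hypernyms of `term` in breadth-first order."""
    found = []
    visited = {term}          # guards against cycles in noisy is-a data
    queue = deque([term])
    while queue:
        node = queue.popleft()
        # .get avoids creating empty entries for terms with no hypernyms
        for parent in graph.get(node, {}):
            if parent not in visited:
                visited.add(parent)
                found.append(parent)
                queue.append(parent)
    return found


graph = build_graph(EDGES)
print(hypernyms(graph, "dog"))
print(sorted(graph["dog"]["canine"]))
```

At the scale of tens of millions of nodes, a traversal like this would normally be delegated to the graph database rather than run in memory; the sketch only illustrates the shape of the data and of the query.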

resource page

WebIsAGraph

WebIsAGraph is a very large hypernymy graph compiled from a dataset of is-a relationships extracted from CommonCrawl. We provide the resource together with a Neo4j plugin that enables efficient searching and querying over such a large graph.

resource page

Wiki-MID

Wiki-MID is a LOD-compliant multi-domain interests dataset for training and testing recommender systems. Our English dataset includes an average of 90 multi-domain preferences per user on music, books, movies, celebrities, sport, politics and much more, for about half a million Twitter users tracked over six months in 2017. Preferences are either extracted from the messages of users who use Spotify, Goodreads and other similar content-sharing platforms, or induced from their "topical" friends, i.e., followees representing an interest rather than a social relation between peers. In addition, preferred items are matched with the Wikipedia articles describing them. This unique feature of our dataset provides a means to categorize preferred items by exploiting available semantic resources linked to Wikipedia, such as the Wikipedia Category Graph, DBpedia, BabelNet and others.