TY - GEN
T1 - Flint
T2 - 11th International Conference on Extending Database Technology, EDBT 2008
AU - Blanco, Lorenzo
AU - Crescenzi, Valter
AU - Merialdo, Paolo
AU - Papotti, Paolo
N1 - Copyright 2008 Elsevier B.V., All rights reserved.
PY - 2008
Y1 - 2008
AB - Several Web sites deliver a large number of pages, each publishing data about one instance of some real-world entity, such as an athlete, a stock quote, or a book. Even though it is easy for a human reader to recognize these instances, current search engines are unaware of them. Semantic Web technologies aim at making such instances recognizable to machines; however, so far they have been of little help in this respect, because semantic publishing is very limited. We have developed a system, called FLINT, for automatically searching, collecting, and indexing Web pages that publish data representing an instance of a given conceptual entity. FLINT takes as input a small set of labeled sample pages: it automatically infers a description of the underlying conceptual entity and then searches the Web for other pages containing data representing the same entity. FLINT automatically extracts data from the collected pages and stores them in a semi-structured, self-describing database, such as Google Base. The collected pages can also be used to populate a custom search engine; to this end, we rely on the facilities provided by Google Co-op.
UR - http://www.scopus.com/inward/record.url?scp=43349106811&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=43349106811&partnerID=8YFLogxK
U2 - 10.1145/1353343.1353435
DO - 10.1145/1353343.1353435
M3 - Conference contribution
AN - SCOPUS:43349106811
SN - 9781595939265
T3 - Advances in Database Technology - EDBT 2008 - 11th International Conference on Extending Database Technology, Proceedings
SP - 720
EP - 724
BT - Advances in Database Technology - EDBT 2008 - 11th International Conference on Extending Database Technology, Proceedings
Y2 - 25 March 2008 through 29 March 2008
ER -