Flint: Google-basing the Web

Lorenzo Blanco; Valter Crescenzi; Paolo Merialdo; Paolo Papotti

doi:10.1145/1353343.1353435

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Scopus citations

Abstract

Several Web sites deliver a large number of pages, each publishing data about one instance of some real-world entity, such as an athlete, a stock quote, or a book. Even though it is easy for a human reader to recognize these instances, current search engines are unaware of them. Technologies for the Semantic Web aim at achieving this goal; however, so far they have been of little help in this respect, as semantic publishing is very limited. We have developed a system, called FLINT, for automatically searching, collecting and indexing Web pages that publish data representing an instance of a certain conceptual entity. FLINT takes as input a small set of labeled sample pages: it automatically infers a description of the underlying conceptual entity and then searches the Web for other pages containing data representing the same entity. FLINT automatically extracts data from the collected pages and stores them into a semi-structured self-describing database, such as Google Base. Also, the collected pages can be used to populate a custom search engine; to this end we rely on the facilities provided by Google Co-op.

Original language	English (US)
Title of host publication	Advances in Database Technology - EDBT 2008 - 11th International Conference on Extending Database Technology, Proceedings
Pages	720-724
Number of pages	5
DOIs	https://doi.org/10.1145/1353343.1353435
State	Published - 2008
Externally published	Yes
Event	11th International Conference on Extending Database Technology, EDBT 2008 - Nantes, France. Duration: Mar 25 2008 → Mar 29 2008

Publication series

Name	Advances in Database Technology - EDBT 2008 - 11th International Conference on Extending Database Technology, Proceedings

Other

Other	11th International Conference on Extending Database Technology, EDBT 2008
Country/Territory	France
City	Nantes
Period	3/25/08 → 3/29/08

ASJC Scopus subject areas

Hardware and Architecture
Information Systems
Software

Cite this

Blanco, L., Crescenzi, V., Merialdo, P., & Papotti, P. (2008). Flint: Google-basing the Web. In Advances in Database Technology - EDBT 2008 - 11th International Conference on Extending Database Technology, Proceedings (pp. 720-724). (Advances in Database Technology - EDBT 2008 - 11th International Conference on Extending Database Technology, Proceedings). https://doi.org/10.1145/1353343.1353435

Flint: Google-basing the Web. / Blanco, Lorenzo; Crescenzi, Valter; Merialdo, Paolo; Papotti, Paolo.
Advances in Database Technology - EDBT 2008 - 11th International Conference on Extending Database Technology, Proceedings. 2008. p. 720-724 (Advances in Database Technology - EDBT 2008 - 11th International Conference on Extending Database Technology, Proceedings).

Blanco, L, Crescenzi, V, Merialdo, P & Papotti, P 2008, Flint: Google-basing the Web. in Advances in Database Technology - EDBT 2008 - 11th International Conference on Extending Database Technology, Proceedings. Advances in Database Technology - EDBT 2008 - 11th International Conference on Extending Database Technology, Proceedings, pp. 720-724, 11th International Conference on Extending Database Technology, EDBT 2008, Nantes, France, 3/25/08. https://doi.org/10.1145/1353343.1353435

Blanco L, Crescenzi V, Merialdo P, Papotti P. Flint: Google-basing the Web. In Advances in Database Technology - EDBT 2008 - 11th International Conference on Extending Database Technology, Proceedings. 2008. p. 720-724. (Advances in Database Technology - EDBT 2008 - 11th International Conference on Extending Database Technology, Proceedings). doi: 10.1145/1353343.1353435

@inproceedings{c07a03d7e9be403a949b481e057fe167,

title = "Flint: Google-basing the Web",

abstract = "Several Web sites deliver a large number of pages, each publishing data about one instance of some real-world entity, such as an athlete, a stock quote, or a book. Even though it is easy for a human reader to recognize these instances, current search engines are unaware of them. Technologies for the Semantic Web aim at achieving this goal; however, so far they have been of little help in this respect, as semantic publishing is very limited. We have developed a system, called FLINT, for automatically searching, collecting and indexing Web pages that publish data representing an instance of a certain conceptual entity. FLINT takes as input a small set of labeled sample pages: it automatically infers a description of the underlying conceptual entity and then searches the Web for other pages containing data representing the same entity. FLINT automatically extracts data from the collected pages and stores them into a semi-structured self-describing database, such as Google Base. Also, the collected pages can be used to populate a custom search engine; to this end we rely on the facilities provided by Google Co-op.",

author = "Lorenzo Blanco and Valter Crescenzi and Paolo Merialdo and Paolo Papotti",

year = "2008",

doi = "10.1145/1353343.1353435",

language = "English (US)",

isbn = "9781595939265",

series = "Advances in Database Technology - EDBT 2008 - 11th International Conference on Extending Database Technology, Proceedings",

pages = "720--724",

booktitle = "Advances in Database Technology - EDBT 2008 - 11th International Conference on Extending Database Technology, Proceedings",

}

TY - CONF

T1 - Flint: Google-basing the Web

T2 - 11th International Conference on Extending Database Technology, EDBT 2008

AU - Blanco, Lorenzo

AU - Crescenzi, Valter

AU - Merialdo, Paolo

AU - Papotti, Paolo

PY - 2008

Y1 - 2008

AB - Several Web sites deliver a large number of pages, each publishing data about one instance of some real-world entity, such as an athlete, a stock quote, or a book. Even though it is easy for a human reader to recognize these instances, current search engines are unaware of them. Technologies for the Semantic Web aim at achieving this goal; however, so far they have been of little help in this respect, as semantic publishing is very limited. We have developed a system, called FLINT, for automatically searching, collecting and indexing Web pages that publish data representing an instance of a certain conceptual entity. FLINT takes as input a small set of labeled sample pages: it automatically infers a description of the underlying conceptual entity and then searches the Web for other pages containing data representing the same entity. FLINT automatically extracts data from the collected pages and stores them into a semi-structured self-describing database, such as Google Base. Also, the collected pages can be used to populate a custom search engine; to this end we rely on the facilities provided by Google Co-op.

UR - http://www.scopus.com/inward/record.url?scp=43349106811&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=43349106811&partnerID=8YFLogxK

U2 - 10.1145/1353343.1353435

DO - 10.1145/1353343.1353435

M3 - Conference contribution

AN - SCOPUS:43349106811

SN - 9781595939265

T3 - Advances in Database Technology - EDBT 2008 - 11th International Conference on Extending Database Technology, Proceedings

SP - 720

EP - 724

BT - Advances in Database Technology - EDBT 2008 - 11th International Conference on Extending Database Technology, Proceedings

Y2 - 25 March 2008 through 29 March 2008

ER -
