Faves for this Web page

  • rynoshark - Sep 15 2008 | algorithms

    Nice idea! Redot from greg.

    Quoted: What I particularly like about this paper is that they take a very hard problem and find a beautifully simple solution. Rather than taking on the brutal task of tearing apart the page layout to discard ads, navigation, and other goo, they just noted that the most important part of the page tends to be natural language text. By starting all their signatures at stopwords, they naturally focus the algorithm on the important parts of the page. Very cool.

  • glinden - Aug 07 2008 | blogs

    Quoted: Martin Theobald, Jonathan Siddharth, and Andreas Paepcke from Stanford University have a cute idea in their SIGIR 2008 paper, "SpotSigs: Robust and Efficient Near Duplicate Detection in Large Web Collections" (PDF). They focus near duplicate detection on the important parts of a web page by using the next few words after a stop word as a signature.
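
The idea quoted above (anchor each signature at a stopword and take the next few words) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the stopword list, the `chain_length` parameter, the function names, and the use of Jaccard similarity to compare signature sets are all assumptions made for the example.

```python
import re

# Assumed stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "is", "of", "to", "and", "that", "in", "by"}


def spot_signatures(text, chain_length=2):
    """Collect one signature per stopword occurrence: the stopword plus
    the next `chain_length` tokens that follow it (hypothetical helper)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    signatures = set()
    for i, token in enumerate(tokens):
        if token not in STOPWORDS:
            continue
        chain = tokens[i + 1:i + 1 + chain_length]
        # Skip stopwords too close to the end to have a full chain.
        if len(chain) == chain_length:
            signatures.add(":".join([token] + chain))
    return signatures


def jaccard(a, b):
    """Jaccard similarity of two signature sets (assumed comparison measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


article = "The quick fox jumps over a lazy dog in the park"
# Navigation and ad text rarely contains stopwords, so it yields no signatures.
page_a = "Home Login Subscribe " + article
page_b = "Buy cheap widgets now " + article

sigs_a = spot_signatures(page_a)
sigs_b = spot_signatures(page_b)
print(sorted(sigs_a))
print(jaccard(sigs_a, sigs_b))
```

In this toy case the two pages differ only in stopword-free page chrome, so they produce identical signature sets even though no layout parsing was done.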
