Faves for this Web page
- rynoshark - Sep 15 2008 | algorithms
Nice idea! Redot from greg.
Quoted: What I particularly like about this paper is that they take a very hard problem and find a beautifully simple solution. Rather than taking on the brutal task of tearing apart the page layout to discard ads, navigation, and other goo, they just noted that the most important part of the page tends to be natural language text. By starting all their signatures at stopwords, they naturally focus the algorithm on the important parts of the page. Very cool.
Quoted: Martin Theobald, Jonathan Siddharth, and Andreas Paepcke from Stanford University have a cute idea in their SIGIR 2008 paper, "SpotSigs: Robust and Efficient Near Duplicate Detection in Large Web Collections" (PDF). They focus near duplicate detection on the important parts of a web page by using the next few words after a stop word as a signature.
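To make the quoted idea concrete, here is a minimal sketch in Python of signatures anchored at stop words. It is not the authors' implementation: the stop word list, the `chain_length` parameter, the tokenizer, and the use of plain set-based Jaccard similarity are all assumptions made for illustration, and the paper's own definitions and parameters may differ.

```python
import re

# Hypothetical, tiny stop word list chosen for illustration only.
STOPWORDS = frozenset({"a", "an", "the", "is", "was", "there",
                       "of", "to", "and", "in", "on", "by"})


def spot_signatures(text, chain_length=2, stopwords=STOPWORDS):
    """Return the set of (stopword, word, word, ...) chains found in text.

    Each signature starts at a stop word and is followed by the next
    `chain_length` non-stop-word tokens. Incomplete chains are dropped.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    signatures = set()
    for i, token in enumerate(tokens):
        if token not in stopwords:
            continue
        chain = []
        for following in tokens[i + 1:]:
            if following in stopwords:
                continue  # skip stop words inside the chain
            chain.append(following)
            if len(chain) == chain_length:
                break
        if len(chain) == chain_length:
            signatures.add((token, *chain))
    return signatures


def jaccard(a, b):
    """Set-based Jaccard similarity; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


article = "The quick fox jumped over a lazy dog."
# Navigation text rarely contains stop words, so it adds no signatures.
framed = "Home | Login | Share " + article + " Subscribe | Print"
print(jaccard(spot_signatures(article), spot_signatures(framed)))
```

In this toy example the navigation labels around the article contain no stop words, so both versions of the page produce the same signatures, which is the effect the quote describes: the signatures concentrate on natural language text without any layout analysis.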
