I had to look up "splog" (a spam blog, which the author uses to promote affiliated Web sites, according to Wikipedia) and double-check that "scrobble", the term that Last.
SVM (support vector machine), because the cycle of creation and deletion of spam blogs is very rapid compared to normal blogs.
A spam blog is often composed of parts of other blogs and news articles.
However, some normal blogs may still be extracted as spam blog clusters, since all blog pages have certain high-frequency terms generated by blog sites, e.
Step 1: For initialization, spam blogs in the spam seed are appended to an initial spam blog list, SBlog.
You may have already encountered a spam blog, though they often look exactly like the real thing: there's an area at the end of each "post" for Comments, an Archived Blog section by month, a Recent Posts section, and some even include a BlogRoll so you can see who has viewed the blog.
Performing linguistic analysis on blogs is plagued by two additional problems: (1) the presence of spam blogs and spam comments and (2) extraneous noncontent including blog rolls, link rolls, advertisements, and sidebars.
Over the past year (Kolari 2007) we have developed techniques to detect spam blogs as they fit the overall architecture (figure 5), arrived at through our discussions with practitioners.
Once collected, the data must be cleaned to remove spurious spam blogs, or splogs (Kolari et al.
Spam blogs, for example, often form communities whose structural properties are very unlike those of naturally occurring blogs.
Such spam blogs, or "splogs," typically collect money from advertisers as users click on their links, and some visitors purchase the promoted products.
The rise of spam blogs, or splogs, can be attributed in part to the lack of clarity around online copyright and absence of a working marketplace for user-generated content.