<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>I Can Has Linux? &#187; headlines</title>
	<atom:link href="http://icanhaslinux.com/category/headlines/feed/" rel="self" type="application/rss+xml" />
	<link>http://icanhaslinux.com</link>
	<description>Invisible Patent Infringement!</description>
	<lastBuildDate>Tue, 08 Jun 2010 23:47:57 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Digg.com Ubuntu popular headline analysis</title>
		<link>http://icanhaslinux.com/2007/09/08/diggcom-ubuntu-popular-headline-analysis/</link>
		<comments>http://icanhaslinux.com/2007/09/08/diggcom-ubuntu-popular-headline-analysis/#comments</comments>
		<pubDate>Sat, 08 Sep 2007 18:08:36 +0000</pubDate>
		<dc:creator>LightningCrash</dc:creator>
				<category><![CDATA[digg]]></category>
		<category><![CDATA[headlines]]></category>
		<category><![CDATA[ubuntu]]></category>

		<guid isPermaLink="false">http://icanhaslinux.com/2007/09/08/diggcom-ubuntu-popular-headline-analysis/</guid>
		<description><![CDATA[I was curious what the most popular keywords were in the Ubuntu headlines, since some of them seemed identical. So I saved the top 10 pages of results for the search term Ubuntu, sorted by Most Diggs. With all of the pages in a directory, I cut out the headlines and stripped [...]]]></description>
			<content:encoded><![CDATA[<p>I was curious what the most popular keywords were in the Ubuntu headlines, since some of them seemed identical.<br />
So I saved the top 10 pages of results for the search term Ubuntu, sorted by Most Diggs.<br />
With all of the pages in a directory, I cut out the headlines and stripped the HTML with the following command:</p>
<p><code>$ cat *.html|grep news-body|sed -e 's/&lt;[^&lt;&gt;]*&gt;//g'  &gt; diggubuntuheadlines.txt</code></p>
<p>Now I have a list of headlines. Unfortunately, the search also returns articles that merely mention Ubuntu somewhere other than the headline, so I dropped the lines that didn&#8217;t contain Ubuntu.</p>
<p><code>$ grep -i ubuntu diggubuntuheadlines.txt &gt; diggubuntuheadlines2.txt  </code></p>
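<p>For anyone who would rather not chain grep and sed, the two steps above can be sketched roughly in Python. This is not what was run for this post: the function name <code>extract_headlines</code> and its <code>marker</code> and <code>keyword</code> parameters are made up for illustration, and it assumes, as the shell version does, that each headline sits on a single line containing <code>news-body</code>. Unlike sed, it also trims surrounding whitespace.</p>
<pre>
```python
import re

# Same pattern as the sed expression: anything between angle brackets.
TAG = re.compile(r"<[^<>]*>")


def extract_headlines(html_lines, marker="news-body", keyword="ubuntu"):
    """Keep lines containing marker, strip tags, keep those mentioning keyword."""
    headlines = []
    for line in html_lines:
        # Equivalent of `grep news-body`: skip lines without the marker.
        if marker not in line:
            continue
        # Equivalent of the sed substitution, plus whitespace trimming.
        text = TAG.sub("", line).strip()
        # Equivalent of `grep -i ubuntu`: case-insensitive keyword check.
        if keyword.lower() in text.lower():
            headlines.append(text)
    return headlines
```
</pre>
<p>Feeding it the lines of every saved page and writing the result out one headline per line should give roughly the same file as the two commands above.</p>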
<p>Now I want to pull out a list of the unique words in the file along with the number of occurrences of each word, sorted by count in descending order.  Thanks to <a href="http://www.perlmonks.org/?node_id=457784" target="_blank">this short Perl script posted</a> by planetscape, I have a solution.</p>
<p>I paste the contents into a file, change the first line to read <code>#!/usr/bin/perl</code>, save it, then <code>chmod +x</code> the file.</p>
<p>Next I pipe the filtered headlines into the script, and save the output.</p>
<p><code>$ cat diggubuntuheadlines2.txt | ./countwords.pl &gt; diggheadlinecount.txt</code></p>
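<p>The Perl script itself is only linked, not shown, so here is a minimal Python sketch of the same idea rather than planetscape's code. It assumes a word is any run of ASCII letters after lowercasing (which would explain entries like s, t and doesn in the list below) and that ties are ordered reverse-alphabetically, as the list appears to be. <code>count_words</code> is a hypothetical name.</p>
<pre>
```python
import re
from collections import Counter


def count_words(text):
    """Return (word, count) pairs, most frequent first."""
    # Lowercase, then treat every run of ASCII letters as a word,
    # so "doesn't" yields "doesn" and "t".
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    # Sort by count, then by word, both descending. The tie order is an
    # assumption chosen to resemble the listing in this post.
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
```
</pre>
<p>Calling <code>count_words(open("diggubuntuheadlines2.txt").read())</code> and printing each pair would give a listing in the same shape as the one below. Whether the numbers match exactly depends on how the original script splits words.</p>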
<p>Well, I guess that&#8217;s enough foreplay, what&#8217;s the verdict?</p>
<blockquote><p>117    ubuntu<br />
25    to<br />
22    linux<br />
20    windows<br />
19    a<br />
14    in<br />
14    dell<br />
12    with<br />
12    on<br />
12    for<br />
11    the<br />
9    and<br />
8    install<br />
7    vista<br />
7    of<br />
7    how<br />
6    your<br />
6    you<br />
6    from<br />
5    released<br />
5    pcs<br />
5    out<br />
5    new<br />
5    is<br />
5    guide<br />
5    feisty<br />
5    by<br />
4    without<br />
4    what<br />
4    users<br />
4    than<br />
4    s<br />
4    has<br />
4    free<br />
4    best<br />
3    xp<br />
3    video<br />
3    ultimate<br />
3    time<br />
3    switching<br />
3    should<br />
3    running<br />
3    run<br />
3    over<br />
3    os<br />
3    official<br />
3    mythtv<br />
3    more<br />
3    microsoft<br />
3    media<br />
3    logo<br />
3    like<br />
3    know<br />
3    installing<br />
3    get<br />
3    fawn<br />
3    fast<br />
3    edition<br />
3    edgy<br />
3    dock<br />
3    boot<br />
3    based<br />
3    as<br />
3    anything<br />
3    about<br />
2    x<br />
2    world<br />
2    will<br />
2    way<br />
2    vs<br />
2    vote<br />
2    using<br />
2    up<br />
2    tutorial<br />
2    top<br />
2    this<br />
2    there<br />
2    t<br />
2    support<br />
2    studio<br />
2    stickers<br />
2    side<br />
2    shuttleworth<br />
2    review<br />
2    read<br />
2    powered<br />
2    pic<br />
2    pc<br />
2    password<br />
2    osx<br />
2    online<br />
2    one<br />
2    officially<br />
2    now<br />
2    need<br />
2    multimedia<br />
2    mount<br />
2    mce<br />
2    mark<br />
2    make<br />
2    magazine<br />
2    looks<br />
2    look<br />
2    laptop<br />
2    it<br />
2    installed<br />
2    gifting<br />
2    full<br />
2    eye<br />
2    ever<br />
2    dual<br />
2    distribution<br />
2    desktop<br />
2    days<br />
2    core<br />
2    completely<br />
2    compiz<br />
2    cheap<br />
2    center<br />
2    cd<br />
2    candy<br />
2    breezy<br />
2    box<br />
2    books<br />
2    beryl<br />
2    be<br />
2    are<br />
2    applications<br />
2    almost<br />
1    year<br />
1    xps<br />
1    xorg<br />
1    xgl<br />
1    write<br />
1    writabable<br />
1    wpics<br />
1    would<br />
1    working<br />
1    wireless<br />
1    winxp<br />
1    wins<br />
1    wine<br />
1    why<br />
1    whole<br />
1    while<br />
1    wga<br />
1    wep<br />
1    welcome<br />
1    web<br />
1    weapons<br />
1    we<br />
1    was<br />
1    warranty<br />
1    warcraft<br />
1    want<br />
1    wall<br />
1    voted<br />
1    vmware<br />
1    virus<br />
1    victorious<br />
1    versus<br />
1    validates<br />
1    uses<br />
1    user<br />
1    useful<br />
1    us<br />
1    unmount<br />
1    ui<br />
1    ugly<br />
1    tweaks<br />
1    tweaking<br />
1    tutorials<br />
1    try<br />
1    truth<br />
1    triple<br />
1    tricks<br />
1    transparent<br />
1    transform<br />
1    today<br />
1    tips<br />
1    tier<br />
1    thursday<br />
1    thinks<br />
1    things<br />
1    their<br />
1    ten<br />
1    technical<br />
1    tad<br />
1    system<br />
1    switches<br />
1    switch<br />
1    supported<br />
1    super<br />
1    sun<br />
1    strip<br />
1    story<br />
1    still<br />
1    sticker<br />
1    steps<br />
1    stable<br />
1    squad<br />
1    spread<br />
1    spotted<br />
1    spiffing<br />
1    software<br />
1    smoke<br />
1    single<br />
1    simple<br />
1    shrink<br />
1    shirt<br />
1    shift<br />
1    shell<br />
1    server<br />
1    searched<br />
1    seamless<br />
1    screwup<br />
1    screenshots<br />
1    screen<br />
1    satanic<br />
1    root<br />
1    rom<br />
1    rising<br />
1    right<br />
1    reviewit<br />
1    repository<br />
1    reported<br />
1    release<br />
1    redesign<br />
1    really<br />
1    readable<br />
1    ran<br />
1    ram<br />
1    quietly<br />
1    purchase<br />
1    progress<br />
1    products<br />
1    preview<br />
1    prettier<br />
1    preinstalled<br />
1    prebuilt<br />
1    pre<br />
1    posters<br />
1    possibly<br />
1    popularity<br />
1    popular<br />
1    pm<br />
1    player<br />
1    picture<br />
1    physics<br />
1    photoshop<br />
1    performance<br />
1    perfectly<br />
1    partition<br />
1    part<br />
1    parliament<br />
1    or<br />
1    onto<br />
1    office<br />
1    offers<br />
1    offering<br />
1    ntfs<br />
1    nrg<br />
1    notebooks<br />
1    not<br />
1    non<br />
1    next<br />
1    network<br />
1    n<br />
1    mod<br />
1    million<br />
1    might<br />
1    mdf<br />
1    mcgee<br />
1    mcdonalds<br />
1    marketplace<br />
1    manufacturers<br />
1    makes<br />
1    macbook<br />
1    mac<br />
1    looking<br />
1    links<br />
1    lifehacker<br />
1    life<br />
1    less<br />
1    just<br />
1    issue<br />
1    iso<br />
1    introducing<br />
1    internet<br />
1    interface<br />
1    instlux<br />
1    installer<br />
1    installation<br />
1    insane<br />
1    inaccurate<br />
1    impressed<br />
1    immediately<br />
1    images<br />
1    image<br />
1    if<br />
1    i<br />
1    hungry<br />
1    howto<br />
1    house<br />
1    hours<br />
1    hot<br />
1    holy<br />
1    hippo<br />
1    heron<br />
1    hell<br />
1    hardy<br />
1    happen<br />
1    guy<br />
1    gui<br />
1    growing<br />
1    great<br />
1    gnu<br />
1    gnome<br />
1    glass<br />
1    girl<br />
1    getting<br />
1    gets<br />
1    genuine<br />
1    fusion<br />
1    french<br />
1    forces<br />
1    followup<br />
1    fixed<br />
1    first<br />
1    firefox<br />
1    finally<br />
1    few<br />
1    father<br />
1    faster<br />
1    fantastic<br />
1    extended<br />
1    explains<br />
1    explained<br />
1    expensive<br />
1    expect<br />
1    existing<br />
1    excellent<br />
1    exactly<br />
1    everything<br />
1    everyone<br />
1    engine<br />
1    embargo<br />
1    eft<br />
1    easyubuntu<br />
1    easy<br />
1    easier<br />
1    dvddecrypter<br />
1    dvd<br />
1    dualview<br />
1    drops<br />
1    drivers<br />
1    download<br />
1    door<br />
1    doesn<br />
1    does<br />
1    do<br />
1    disturbing<br />
1    distributing<br />
1    dismissed<br />
1    diggers<br />
1    demo<br />
1    debian<br />
1    customs<br />
1    customization<br />
1    cst<br />
1    cs<br />
1    cracking<br />
1    could<br />
1    converts<br />
1    controls<br />
1    confirmed<br />
1    conf<br />
1    computers<br />
1    complete<br />
1    comparison<br />
1    community<br />
1    commercial<br />
1    coming<br />
1    com<br />
1    colors<br />
1    click<br />
1    cleartext<br />
1    cleaning<br />
1    circle<br />
1    choose<br />
1    card<br />
1    canonical<br />
1    building<br />
1    build<br />
1    bug<br />
1    booting<br />
1    black<br />
1    bittorrent<br />
1    billboard<br />
1    better<br />
1    been<br />
1    beautiful<br />
1    basics<br />
1    badger<br />
1    awesome<br />
1    award<br />
1    available<br />
1    at<br />
1    artwork<br />
1    arrives<br />
1    arrived<br />
1    april<br />
1    apps<br />
1    any<br />
1    an<br />
1    american<br />
1    amd<br />
1    amazing<br />
1    alumni<br />
1    after<br />
1    advantages<br />
1    administrator</p></blockquote>
<p>No surprises here, but it may be helpful when you go to write your next Digg headline. <img src='http://icanhaslinux.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Until next time</p>
<p>-LightningCrash</p>
]]></content:encoded>
			<wfw:commentRss>http://icanhaslinux.com/2007/09/08/diggcom-ubuntu-popular-headline-analysis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
