
Digg hijacked?

Wednesday, September 19th, 2007

Today when checking the page, I noticed that a few of my Digg buttons pointed not to the Digg links for my site, but to bizlead or some such nonsense. It seems to be pretty common: a number of commenters on the story reported the same thing.

Is this some sort of XSS bug that’s being exploited? I don’t know. It will be interesting to see how it pans out, though. Maybe there are some DB problems on the Digg end.

Either way, I’ve deactivated the Digg button on my posts until further notice. This isn’t a big deal, as the wonderful SU users make up about 90% of my traffic anyway. Digg, not so much.

I’ll post more about this later when I figure out what’s going on.

Edit: Looks like the offending Digg submission is gone. I’ll consider turning the Digg button back on later.

Until next time!

-LightningCrash

Analysis of 300 Digg top stories

Monday, September 10th, 2007

I wrote earlier about analyzing Ubuntu headlines on Digg, but this time I decided to pull the top 20 pages of stories from the last year and run those through the same word counter. 300 stories later, here is how often each word appears in the headlines:

70 the
45 a
42 to
34 of
32 digg
25 you
25 s
22 in
22 and
21 pic
21 on
19 this
19 for
18 your
18 iphone
17 is
16 new
14 video
14 from
14 ever
14 apple
13 it
12 picture
12 i
12 google
12 best
12 amazing
11 t
10 how
10 d
9 with
9 website
9 firefox
8 why
8 what
8 vista
8 pics
8 not
8 free
8 at
7 like
7 kevin
7 have
7 buy
6 windows
6 should
6 one
6 most
6 james
6 gets
6 c
6 by
5 will
5 we
5 users
5 steve
5 right
5 photos
5 photo
5 my
5 make
5 mac
5 kim
5 jobs
5 ipod
5 if
5 dvd
5 drm
5 computer
5 as
5 an
5 all
4 worst
4 work
4 without
4 water
4 under
4 time
4 thing
4 that
4 system
4 so
4 shows
4 see
4 rose
4 riaa
4 pictures
4 photoshop
4 people
4 out
4 or
4 microsoft
4 launches
4 itunes
4 hacked
4 get
4 coolest
4 can
4 button
4 be
4 awesome
4 are
3 youtube
3 xp
3 world
3 woman
3 ve
3 up
3 unveils
3 tv
3 touch
3 store
3 something
3 sites
3 sign
3 show
3 seen
3 section
3 secret
3 save
3 re
3 pc
3 pay
3 page
3 over
3 other
3 old
3 no
3 net
3 needs
3 nbc
3 missing
3 me
3 list
3 linux
3 letter
3 laptop
3 key
3 internet
3 images
3 hd
3 got
3 good
3 geek
3 file
3 face
3 f
3 do
3 desktop
3 design
3 day
3 comment
3 comcast
3 colbert
3 cellphone
3 bill
3 b
3 anything
3 any
3 announces
3 almost
3 access
3 about
2 years
2 year
2 yahoo
2 would
2 worlds
2 wi
2 while
2 web
2 was
2 warning
2 wallpapers
2 w
2 use
2 ur
2 ultimate
2 two
2 tutorials
2 turn
2 tries
2 trick
2 traffic
2 totally
2 top
2 today
2 think
2 they
2 take
2 strangest
2 stop
2 stephen
2 stealing
2 steal
2 station
2 start
2 squad
2 space
2 some
2 site
2 shirt
2 search
2 screwed
2 screen
2 runs
2 revolt
2 results
2 responds
2 porn
2 plus
2 please
2 pirate
2 phone
2 perhaps
2 per
2 path
2 password
2 owned
2 own
2 open
2 online
2 officially
2 official
2 off
2 nsfw
2 now
2 nokia
2 nightmare
2 neighbors
2 need
2 myspace
2 music
2 mozilla
2 maps
2 makes
2 made
2 m
2 love
2 loses
2 look
2 logo
2 live
2 line
2 kill
2 kid
2 just
2 its
2 into
2 inside
2 idiot
2 high
2 hate
2 has
2 happens
2 had
2 hack
2 gmail
2 girl
2 gates
2 fun
2 found
2 first
2 fire
2 fiasco
2 fi
2 features
2 feature
2 every
2 effect
2 ebay
2 e
2 don
2 does
2 digging
2 desk
2 default
2 dear
2 cuts
2 customer
2 css
2 cracked
2 cover
2 could
2 cool
2 convert
2 connection
2 comic
2 com
2 color
2 cnet
2 clock
2 click
2 class
2 cheap
2 card
2 car
2 cake
2 but
2 business
2 building
2 build
2 browser
2 blue
2 blocked
2 billboard
2 been
2 back
2 around
2 anti
2 announced
2 animation
2 alex
2 again
2 ads
2 across

Some of the top items, when read in succession, are almost headlines in themselves!

Maybe next time I’ll pull a few thousand headlines. That sounds like a good project for tomorrow.

Edit: Trimmed off entries with only one result.
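
A trim like that is a one-liner when the counter's output is in "count word" form. Here is a minimal sketch, assuming that format. The function name trim_singletons is made up for illustration and is not necessarily how the list above was trimmed.

```shell
# Hypothetical helper: read "count word" lines on stdin and keep only
# the lines whose count (the first field) is greater than 1.
trim_singletons() {
    awk '$1 > 1'
}
```

It reads from stdin, so it can sit at the end of a pipeline or take a saved count file through a redirect.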

Digg.com Ubuntu popular headline analysis

Saturday, September 8th, 2007

I was curious which keywords were most popular in the Ubuntu headlines, since some of them seemed nearly identical.
So I saved the top 10 pages of results for the search term Ubuntu, sorted by Most Diggs.
With all of the pages in a directory, I cut out the headlines and stripped the HTML with the following command:

$ cat *.html|grep news-body|sed -e 's/<[^<>]*>//g' > diggubuntuheadlines.txt

Now I have a list of each headline. Unfortunately, though, this also returns headlines from articles that just mention Ubuntu, so I killed the lines that didn’t have Ubuntu.

$ grep -i ubuntu diggubuntuheadlines.txt > diggubuntuheadlines2.txt

Now I want to pull out a list of the unique words in the file and the number of occurrences of each word, sorted by count in descending order. Thanks to this short Perl script posted by planetscape, I have a solution.

I paste the contents into a file, change the first line to read #!/usr/bin/perl, save it as countwords.pl, then chmod +x the file.

Next I pipe the headline file into the script, and save the output.

$ cat diggubuntuheadlines2.txt | ./countwords.pl > diggheadlinecount.txt
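
The Perl script itself isn't reproduced in this post. As a rough stand-in, here is a sketch of the same idea using only standard shell tools. The function name count_words is made up for illustration, and it assumes words are split on anything that isn't a letter or digit (which would explain fragments like "s" and "t" showing up in the counts), so it may not match countwords.pl exactly.

```shell
# Hypothetical stand-in for countwords.pl (not the original script).
# Reads text on stdin and prints "count word" lines, highest count first,
# with ties in reverse alphabetical order.
# Steps: lowercase everything, turn each run of non-alphanumerics into a
# single newline, drop empty lines, count identical words, strip the
# padding uniq adds, then sort by count.
count_words() {
    tr '[:upper:]' '[:lower:]' |
        tr -cs 'a-z0-9' '\n' |
        grep -v '^$' |
        LC_ALL=C sort |
        uniq -c |
        awk '{print $1, $2}' |
        LC_ALL=C sort -k1,1nr -k2,2r
}
```

With the function defined in the current shell, the equivalent of the command above would be count_words < diggubuntuheadlines2.txt > diggheadlinecount.txt.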

Well, I guess that’s enough foreplay. What’s the verdict?

117 ubuntu
25 to
22 linux
20 windows
19 a
14 in
14 dell
12 with
12 on
12 for
11 the
9 and
8 install
7 vista
7 of
7 how
6 your
6 you
6 from
5 released
5 pcs
5 out
5 new
5 is
5 guide
5 feisty
5 by
4 without
4 what
4 users
4 than
4 s
4 has
4 free
4 best
3 xp
3 video
3 ultimate
3 time
3 switching
3 should
3 running
3 run
3 over
3 os
3 official
3 mythtv
3 more
3 microsoft
3 media
3 logo
3 like
3 know
3 installing
3 get
3 fawn
3 fast
3 edition
3 edgy
3 dock
3 boot
3 based
3 as
3 anything
3 about
2 x
2 world
2 will
2 way
2 vs
2 vote
2 using
2 up
2 tutorial
2 top
2 this
2 there
2 t
2 support
2 studio
2 stickers
2 side
2 shuttleworth
2 review
2 read
2 powered
2 pic
2 pc
2 password
2 osx
2 online
2 one
2 officially
2 now
2 need
2 multimedia
2 mount
2 mce
2 mark
2 make
2 magazine
2 looks
2 look
2 laptop
2 it
2 installed
2 gifting
2 full
2 eye
2 ever
2 dual
2 distribution
2 desktop
2 days
2 core
2 completely
2 compiz
2 cheap
2 center
2 cd
2 candy
2 breezy
2 box
2 books
2 beryl
2 be
2 are
2 applications
2 almost
1 year
1 xps
1 xorg
1 xgl
1 write
1 writabable
1 wpics
1 would
1 working
1 wireless
1 winxp
1 wins
1 wine
1 why
1 whole
1 while
1 wga
1 wep
1 welcome
1 web
1 weapons
1 we
1 was
1 warranty
1 warcraft
1 want
1 wall
1 voted
1 vmware
1 virus
1 victorious
1 versus
1 validates
1 uses
1 user
1 useful
1 us
1 unmount
1 ui
1 ugly
1 tweaks
1 tweaking
1 tutorials
1 try
1 truth
1 triple
1 tricks
1 transparent
1 transform
1 today
1 tips
1 tier
1 thursday
1 thinks
1 things
1 their
1 ten
1 technical
1 tad
1 system
1 switches
1 switch
1 supported
1 super
1 sun
1 strip
1 story
1 still
1 sticker
1 steps
1 stable
1 squad
1 spread
1 spotted
1 spiffing
1 software
1 smoke
1 single
1 simple
1 shrink
1 shirt
1 shift
1 shell
1 server
1 searched
1 seamless
1 screwup
1 screenshots
1 screen
1 satanic
1 root
1 rom
1 rising
1 right
1 reviewit
1 repository
1 reported
1 release
1 redesign
1 really
1 readable
1 ran
1 ram
1 quietly
1 purchase
1 progress
1 products
1 preview
1 prettier
1 preinstalled
1 prebuilt
1 pre
1 posters
1 possibly
1 popularity
1 popular
1 pm
1 player
1 picture
1 physics
1 photoshop
1 performance
1 perfectly
1 partition
1 part
1 parliament
1 or
1 onto
1 office
1 offers
1 offering
1 ntfs
1 nrg
1 notebooks
1 not
1 non
1 next
1 network
1 n
1 mod
1 million
1 might
1 mdf
1 mcgee
1 mcdonalds
1 marketplace
1 manufacturers
1 makes
1 macbook
1 mac
1 looking
1 links
1 lifehacker
1 life
1 less
1 just
1 issue
1 iso
1 introducing
1 internet
1 interface
1 instlux
1 installer
1 installation
1 insane
1 inaccurate
1 impressed
1 immediately
1 images
1 image
1 if
1 i
1 hungry
1 howto
1 house
1 hours
1 hot
1 holy
1 hippo
1 heron
1 hell
1 hardy
1 happen
1 guy
1 gui
1 growing
1 great
1 gnu
1 gnome
1 glass
1 girl
1 getting
1 gets
1 genuine
1 fusion
1 french
1 forces
1 followup
1 fixed
1 first
1 firefox
1 finally
1 few
1 father
1 faster
1 fantastic
1 extended
1 explains
1 explained
1 expensive
1 expect
1 existing
1 excellent
1 exactly
1 everything
1 everyone
1 engine
1 embargo
1 eft
1 easyubuntu
1 easy
1 easier
1 dvddecrypter
1 dvd
1 dualview
1 drops
1 drivers
1 download
1 door
1 doesn
1 does
1 do
1 disturbing
1 distributing
1 dismissed
1 diggers
1 demo
1 debian
1 customs
1 customization
1 cst
1 cs
1 cracking
1 could
1 converts
1 controls
1 confirmed
1 conf
1 computers
1 complete
1 comparison
1 community
1 commercial
1 coming
1 com
1 colors
1 click
1 cleartext
1 cleaning
1 circle
1 choose
1 card
1 canonical
1 building
1 build
1 bug
1 booting
1 black
1 bittorrent
1 billboard
1 better
1 been
1 beautiful
1 basics
1 badger
1 awesome
1 award
1 available
1 at
1 artwork
1 arrives
1 arrived
1 april
1 apps
1 any
1 an
1 american
1 amd
1 amazing
1 alumni
1 after
1 advantages
1 administrator

No surprises here, but it may be helpful when you go to write your next Digg headline. :)

Until next time

-LightningCrash