EuroGOV

Language distribution of the EuroGOV collection, as identified by [[textcat]] (provided by UVA):

      5 bulgarian-iso8859_5
 197881 czech-iso8859_2
  31379 danish
 139902 dutch
 310236 english
  15854 estonian
 514887 finnish
 195725 french
 468783 german
   6812 greek-iso8859-7
 324058 hungarian
    126 icelandic
  71848 irish
  79157 italian
 240316 latvian
  10575 lithuanian
    123 norwegian
  59318 polish
 155388 portuguese
  10852 romanian
     92 russian-iso8859_5
   7726 russian-koi8_r
  15007 russian-windows1251
    296 scots
     87 scots_gaelic
  98476 slovak-ascii
  76054 slovak-windows1250
  47491 spanish
 187562 swedish
 323152 unknown
    333 welsh

INTERNAL GLASGOW UNI ONLY:

Sorting anchor texts:

gzip -dc data-anchor.gz | cut -d ' ' -f 2- | sort -t- -k 1,1 -k 2n,2n -k 3n,3n > data-anchors-sorted && gzip data-anchors-sorted
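
As a rough illustration of what the sort keys in the command above do, the sketch below applies the same `sort` options to a few made-up lines. It assumes each line (after the `cut` step) starts with a three-part, hyphen-separated identifier followed by the anchor text. The sample identifiers and the function name `sort_anchors` are hypothetical, and `LC_ALL=C` is added here only to make the ordering of the first key reproducible across locales.

```shell
# Hypothetical helper: same sort keys as the pipeline above.
#   -t-        use '-' as the field separator
#   -k 1,1     first field compared as text
#   -k 2n,2n   second field compared numerically
#   -k 3n,3n   third field compared numerically (leading digits only)
sort_anchors() {
  LC_ALL=C sort -t- -k 1,1 -k 2n,2n -k 3n,3n
}

# Made-up sample lines: "<prefix>-<number>-<number> <anchor text>"
printf '%s\n' \
  'Euk-10-2 contact us' \
  'Euk-2-15 home page' \
  'Ede-3-1 startseite' \
  'Euk-2-7 news' |
  sort_anchors
```

With numeric keys, `Euk-2-7` sorts before `Euk-2-15`, and both sort before `Euk-10-2`. A plain text sort would put `Euk-10-2` first among the three.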