Size of each domain:
at 10065
be 69011
cy 1972
cz 324496
de 444794
dk 2144
ee 16768
es 35168
eu 374484
fi 661559
fr 156450
gr 303
hu 330822
ie 12754
it 89836
lt 10765
lu 8521
lv 317404
mt 13991
nl 149949
pl 66885
pt 147445
ru 104659
se 102457
si 12434
sk 58020
uk 66345
Language distribution of the EuroGOV collection, as reported by [[textcat]] (provided by UVA):
5 bulgarian-iso8859_5
197881 czech-iso8859_2
31379 danish
139902 dutch
310236 english
15854 estonian
514887 finnish
195725 french
468783 german
6812 greek-iso8859-7
324058 hungarian
126 icelandic
71848 irish
79157 italian
240316 latvian
10575 lithuanian
123 norwegian
59318 polish
155388 portuguese
10852 romanian
92 russian-iso8859_5
7726 russian-koi8_r
15007 russian-windows1251
296 scots
87 scots_gaelic
98476 slovak-ascii
76054 slovak-windows1250
47491 spanish
187562 swedish
323152 unknown
333 welsh
INTERNAL GLASGOW UNI ONLY:
Sorting anchor texts:
gzip -dc data-anchor.gz | cut -d ' ' -f 2- | sort -t- -k 1,1 -k 2n,2n -k 3n,3n > data-anchors-sorted && gzip data-anchors-sorted
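The sort keys in the command above can be tried out on a few lines of made-up data. This is only a sketch: it assumes each line starts with one field that `cut` discards, followed by a hyphenated identifier and the anchor text. The identifiers and the `sort_anchors` helper name are invented for illustration and are not taken from the real anchor file.

```shell
# Same cut/sort stages as the command above, wrapped in a hypothetical helper.
sort_anchors() {
  # Drop the first space-separated field, then sort on the '-'-separated
  # identifier: part 1 as text, parts 2 and 3 as numbers.
  cut -d ' ' -f 2- | sort -t- -k 1,1 -k 2n,2n -k 3n,3n
}

# Invented sample lines (assumed layout: "<dropped-field> <id> <anchor text>").
sample='x Euk-10-5 home
x Eat-2-30 contact
x Eat-2-4 news
x Eat-10-1 about'

sorted=$(printf '%s\n' "$sample" | sort_anchors)
printf '%s\n' "$sorted"
```

With the numeric keys, `Eat-2-4` sorts before `Eat-2-30`, and both sort before `Eat-10-1`. A plain text sort would put `Eat-10-1` first and `Eat-2-30` ahead of `Eat-2-4`.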