Collections of text and corpora
- BNC (British National Corpus)
A 100-million-word corpus of written and spoken British English, tagged for part of speech.
- LDC (Linguistic Data Consortium)
The LDC supports language-related research by creating and sharing data, tools and standards.
- OTA (Oxford Text Archive)
One of the best-known collection points for corpora and electronic texts.
- The ECI Multilingual Corpus
The European Corpus Initiative (ECI) was founded to oversee the acquisition and preparation of a
large multilingual corpus, and supports existing and projected national and international efforts to
carefully design, collect and publish large-scale multilingual written and spoken corpora.
- Project Gutenberg
This is the official site of the project, with many links to FTP sites.