How to download a large corpus of written Dutch
The Corpus of Contemporary Dutch (Corpus Hedendaags Nederlands, CHN) is accessible only through its online search interface; for copyright reasons, it cannot be downloaded. For non-commercial use, however, there is a good alternative: the SoNaR corpus contains over 500 million words and can be downloaded. Its texts have been annotated with machine-generated lemmas, named entities and POS tags.
To download the SoNaR corpus, you need a separate account with the IvdNT.