Most of these frequency lists come from written sources that are known to employ a wide variety of vocabulary: newspapers and magazines. And the top of the list usually reflects spoken language word frequency accurately as well.
Except in Finnish. The problem with Finnish is that it comes in at least three variants:
First is the so-called "book language," which only politicians and news anchors actually speak. Otherwise, you'll see it only in written form in newspapers, magazines and novels.
Then there is the "standard spoken language," which is fairly close to the book language except for the variation in personal pronouns and the way verbs are conjugated. Some slangy expressions might be included, too. Listen to Finnish teachers speak: this is probably the diction they will use in a classroom. It's not as stiff-sounding as the book language, yet it still retains an air of authority and a subtle indication of the speaker's level of education.
On top of that we have all the regional dialects.
In English, people might pronounce the word "I" differently, but it will be spelled the same way regardless of where you are from. That's why predictive text works really well in English. In Finnish, the "I" can look like this: minä, mie, mää, or mä, all depending on the dialect or register the speaker is using. And in personal, written communication between friends and family members, people tend to use their dialects or some spoken variant of the language.
The problem with predictive text in Finnish is that no single dictionary is able to include all of these variants. Even if the dictionary memory were large enough to hold every word, the result would simply be a mess: instead of offering you several different words as alternatives, the dictionary would offer you five dialect versions of the same word, just because they differ by a single letter. And as a South Karelian dialect speaker, I really don't need to see Savo dialect options pop up as my alternatives.
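To make the crowding problem concrete, here is a minimal sketch of a frequency-ranked suggestion list. It assumes the simplest possible model, where the dictionary offers the most frequent words that start with whatever has been typed so far; real phone dictionaries are more involved. The four forms of "I" come from the examples above, but the other words and every frequency count are invented for illustration.

```python
def suggest(freqs, prefix, k=3):
    """Return the k most frequent dictionary words that start with prefix."""
    matches = [(word, count) for word, count in freqs.items()
               if word.startswith(prefix)]
    # Highest count first; alphabetical order breaks ties.
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return [word for word, _ in matches[:k]]

# Hypothetical counts, not taken from any real frequency list.
book_only = {"minä": 900, "mutta": 800, "mikä": 500, "myös": 400}
dialect_forms = {"mä": 850, "mää": 820, "mie": 810}
with_dialects = {**book_only, **dialect_forms}

print(suggest(book_only, "m"))      # three different words
print(suggest(with_dialects, "m"))  # three spellings of the same word
```

With the book language alone, the three slots go to three different words. Once the dialect forms are merged in, all three slots are taken by spellings of "I," and the other words are pushed off the list.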
In the past I would never use predictive text: it did not recognize words from either the standard spoken language or my dialect, and even worse, the dictionary would throw words at me that were not even close to what I wanted. I'd input, say, "Hello!" and the output would be "Closet!"
Why would anyone write in dialect, by the way? The answer is economy. Some Finnish words are damned long, so people simply cut the endings off when they speak. Also, in a country where most text messaging and phone calls are handled with a pay-as-you-go plan, you can save money by cutting as many characters as possible to keep your story within the character count of one text message. Dialects already do this to make speech economical.
A lot has changed in the past ten years, and the predictive outputs are really good these days. The dictionaries include standard spoken language pronouns, and even dialect pronouns. I actually enjoy using predictive text now, as it understands me better than it did ten years ago.
There is, however, one catch. As soon as I start using the predictive texting method, I stop using my dialect. There are two reasons for this. First, the frequency dictionary will most likely give me a "book language" version of anything except some pronouns. Second, it will give me that word in a split second.
So now I need to weigh two options. I can type shorter dialect words, which saves me both money and time because there is less to type. Or I can input only three characters and immediately get the word I want, except that it is not in my dialect but in the standard form that everyone understands. That is because the suggestion comes from a frequency list that has been lifted from written language.
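The trade-off can be counted in keystrokes. The sketch below assumes a simple model: you type letters until the word you want becomes the top suggestion, then spend one more keystroke to accept it, and a word the dictionary does not know has to be typed in full. The tiny dictionary and its counts are invented, and "kakskyt" stands in for a common spoken shortening of "kaksikymmentä" (twenty); a real dictionary with more competing words would give different numbers.

```python
def keystrokes_with_prediction(word, freqs):
    """Count keystrokes needed to enter word with prefix-based prediction.

    Type letters until word is the most frequent match, then add one
    keystroke to accept it. If it never becomes the top suggestion
    (for example, because it is not in the dictionary), type it in full.
    """
    for n in range(1, len(word)):
        prefix = word[:n]
        candidates = [w for w in freqs if w.startswith(prefix)]
        best = max(candidates, key=lambda w: freqs[w], default=None)
        if best == word:
            return n + 1  # n letters typed plus one keystroke to accept
    return len(word)

# Hypothetical dictionary built from written language only.
freqs = {"kala": 120, "kakku": 80, "kaksikymmentä": 50}

print(keystrokes_with_prediction("kaksikymmentä", freqs))  # book form
print(keystrokes_with_prediction("kakskyt", freqs))        # spoken form
```

In this toy setup the long book form costs fewer keystrokes than the short spoken form, simply because only the book form is in the dictionary.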
To save even more time, I may simply begin to write in the book language without even attempting my dialect version, just because I know that the dictionary will definitely get the book language form.
Everyone will understand me, but it is no longer me writing the message; it's some very uptight person who talks like a robot! Yet because it is so much faster to compose a message with a good predictive method, I will most likely opt to use the frequency list dictionary and lose my voice. It's just less of a hassle that way.
Will there be a time when Finns, too lazy to type out words and content to accept whatever the dictionary gives us, begin to speak a very proper version of Finnish?
I suppose that would be a day when learners of Finnish would rejoice: finally, the language taught in the textbooks matches what they hear on the streets.