Everyone should visit the site -- an ongoing, massive online research project undertaken by "an American and Brazilian research team."

Also, you should read the blog -- which reports their fascinating results and takes a "big data" approach to

  • vocabulary acquisition,

  • vocabulary size (of native speakers and foreign learners of English),

  • vocabulary growth over life-span, and

  • the relation between reading and vocabulary size.

VESSEL 5000 (Faser)

Vocabulary ESL 5000-word (Free Shared Resource)

A terrible web video where I explain the project
... I'll improve it later.

Right now, ESL teaching is very much limping because of the lack of a really good free vocabulary source.

This source would include the following:
  1. large -- the first 5,000 or so words (which James Milton suggests are needed for about 90% of printed text)
  2. corpus-based -- mine is based on Brigham Young's Corpus of Contemporary American English -- you can download the list for free here ...
  3. in public domain -- as long as it's non-profit, BYU is okay with use of their COCA corpus -- and everything else will be Creative Commons (I'll work that out with them)
  4. bilingual AND monolingual (I'll be using a workaround on the to do this ...

Basically, my plan is this:

  • take the first 5,000 words,

  • write an example sentence (or "paradigm sentence") for each,

  • come up with simple line-drawing illustrations for each paradigm sentence

  • make it available to everyone

Here's my prototype -- an example of three using my HORRIBLE illustrations. (I'm using the three-sided flashcards on to do both bilingual & monolingual in the same deck.)
  • NOTE: on, type an "h" to see the hint (which is the English-only prompt for the word)
  • NOTE: on, type an "a" to hear the card read

Here's a terrible web video of me demonstrating and talking through the deck.

In addition, the cards would be distributed in WORD and PDF files -- as well as the Excel file with all the paradigm sentences.
Corpus of Contemporary American English (COCA) text analyzer
My very early intro page to Brigham Young's Corpus of Contemporary American English (COCA) analyzer.

Words, Words, Everywhere! (Michael Graves)
I really like this. It's a 2009 PDF for a presentation where University of Minnesota prof Michael Graves introduces the key concepts in vocabulary instruction. This is how he outlines the presentation:

The Landscape of American English
  • The Number of Words in the Lexicon
  • The Number of Words Students Know
  • The Frequency Distribution of the Lexicon
Tools and Approaches to Selecting Words
  • Word Lists
  • Recommendations for Identifying and Teaching Words
  • Testing Students' Word Knowledge

As far as how large a vocabulary is needed to be ready for college, he reports three different estimates. Nagy says 40,000. Graves thinks it's closer to 50,000.

The First 4,000 Words Website
Seward, Incorporated has created a for-pay interactive website to support students in learning the items in Michael Graves's "First 4,000 Words." MPS schools are going to pilot this resource in four classrooms. I suspect that this could be a powerful missing element in our ELD programming.

4,000 Words as an MS Word document

The 4,000 Words as a PDF document
Michael Graves's 4,000-word list. Note: This list is freely available on the Internet ... so I don't think I'm ripping off his intellectual property. Oddly enough, the field of vocabulary study has an interesting "creative commons" feel about it ... with scholars tending to publish resources more openly than in other fields of English language scholarship. (But I might be wrong; I'll look a little closer when I have time.)
Michael Graves's IRA podcast on "Vocabulary Instruction"

This is an MP3, a sound file.

The MAP Test DesCartes RIT-Level Vocabulary resource

This is interesting. You guys might know about it. It's from the NWEA website. (They're the folks who developed and market the MAP tests.) It identifies the content-related words that MAP would use at each RIT level (scoring level) of the test.

As you guys know, I certainly think that a year's progress on the MAP Reading is a reasonable goal for each student -- and a reasonable expectation that the students and their parents should have of the school and teachers. This makes it blindingly obvious that vocabulary growth has to be one of the engines of this reading growth.
The full-on Paul Nation Test of Vocabulary Size (I think). 140-item multiple choice test to determine your vocabulary size. This is probably the most reliable. One interesting feature -- it allows you to select among some major languages (Chinese, Spanish, etc.) for the stems. So you're presented with the word in English, but some test-takers will have the choice of seeing the multiple choice options in their home language.

A very no-frills site with short tests at varying levels (1000-words, 2000-words, 3000-words and up). It looks like the tests are based on Paul Nation's Vocabulary Size test -- but the tests are much shorter. The same tests are also here.

Paul Nation's page on David Paul's "Language Teaching Professionals" website.
This is an incredibly rich resource -- including a downloadable zip file of various vocabulary measurement and teaching resources that Paul Nation has either developed or endorses.

A series of 50-item quizzes designed to test vocabulary size over the "Oxford 3000." (This is from a Chinese website where they do those heroic/obsessive feats of vocabulary memorization.)

John's score -- 57,954 words! Uh huh. That's one of them.

VCheck at
(John's score -- read the small print and weep ... top 1% of all test takers ... note: this 13,286 just means "of the TOEFL/TOEIC words they test" ... it's not a measure of my vocab size.)
This is just for fun -- it's interesting -- a five-minute diagnostic for college-level vocabulary ... but it mixes self-assessment ("Do you know this word?") with fake words ... and stops the test if you claim to know too many made-up words.


"Vocabulary Size, Text Coverage, and Word Lists" (Nation & Waring)
A short, possibly older article -- but a strong introduction to the whole vocabulary issue. The authors address three key questions:
  • How many words are needed to do the things an [English Learner] needs to do?
  • How much vocabulary and how should it be learned?
  • What vocabulary does a learner need?

This is a good place to start.
Myths about Teaching and Learning Second Language Vocabulary: What Does the Research Say? (Folse, 2004)

Read it here
This article actually kicked off my recent (and very salubrious) vocabulary obsession. Folse uses empirical results to critique what he calls the eight harmful myths of L2 vocabulary learning.

These eight myths are:

(1) Vocabulary is not as important in learning a foreign language as grammar or other areas.

(2) It is not good to use lists of words when learning vocabulary.

(3) Vocabulary should be presented in semantic sets.

(4) The use of translations is a poor way to learn new vocabulary.

(5) Guessing words from context is as productive for foreign language learners as it is for first language learners.

(6) The best vocabulary learners make use of one or two really good specific vocabulary learning styles.

(7) Foreign language learners should use a monolingual dictionary.

(8) Vocabulary is covered enough in our curricula and courses.

I love the Folse research. But check out this handout from Folse's U of Central Florida SLA class ... or rather read this ... which is my revision of it ... which is maybe a little easier to follow. (No disrespect to Brother Folse ... his handout was made to accompany a presentation ... not to stand alone.)

Penny Ur (2012) on "Vocabulary Teaching: Some Insights from the Research" -- which largely echoes what Folse says up above and agrees with everything I've been saying. Penny Ur ... there's a name from the past ... PDF version of the article
Gary Wolf's article from Wired Magazine on Piotr Wozniak's SuperMemo software, an algorithm-based learning system.
Piotr Wozniak is working with the cognitive principle that the sweet spot to review information is right before it fades -- when it requires an effort to pull it into memory. The article describes his software system (earlier versions available as freeware), which uses user ratings and algorithms to present information right at that key moment. All over the world, lots of folks are using Wozniak's software (and other algorithm-based memory software like Anki) for accelerated language learning.
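
To make the "review it right before it fades" idea concrete, here is a minimal Python sketch of a spaced-repetition scheduler. It is loosely modeled on the early, publicly described SM-2 scheme; it is NOT what current SuperMemo or Anki actually run, and the names (`Card`, `review`, `ease`) are my own illustrative assumptions. After each review you rate your recall from 0 (blank) to 5 (easy), and the card's next interval stretches or resets accordingly.

```python
from dataclasses import dataclass


@dataclass
class Card:
    """One flashcard plus its scheduling state (hypothetical structure)."""
    word: str
    interval_days: int = 0   # days to wait before the next review
    repetitions: int = 0     # successful reviews in a row
    ease: float = 2.5        # growth factor for the interval


def review(card: Card, quality: int) -> Card:
    """Update a card after a review rated 0 (forgot) to 5 (easy).

    Simplified SM-2-style rule, for illustration only.
    """
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")

    if quality < 3:
        # Recall failed: start the card over with a short interval.
        card.repetitions = 0
        card.interval_days = 1
        return card

    if card.repetitions == 0:
        card.interval_days = 1
    elif card.repetitions == 1:
        card.interval_days = 6
    else:
        # Each later success stretches the gap by the ease factor.
        card.interval_days = round(card.interval_days * card.ease)
    card.repetitions += 1

    # Easy answers raise the ease factor, hard ones lower it (floor of 1.3).
    penalty = (5 - quality) * (0.08 + (5 - quality) * 0.02)
    card.ease = max(1.3, card.ease + 0.1 - penalty)
    return card


card = Card("healthy")
for rating in (5, 5, 5):
    review(card, rating)
    print(card.word, "-> next review in", card.interval_days, "day(s)")
# Three perfect reviews give intervals of 1, 6, and 16 days.
```

The point of the design is that a word you keep getting right shows up less and less often, while a word you miss drops back to daily review.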

42-language simultaneous translator

The nicetranslator webpage will translate words or phrases simultaneously into 42 languages (and it might soon be more).

An example of the output (for the word "healthy")

James Milton (2009) from //Measuring Second Language Vocabulary Acquisition// on vocabulary's effect on gist AND using bilingual word lists ...

Cut and paste or type a text in and this cool site will process it and give you a helpful vocabulary list with many useful bells and whistles!
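
For anyone curious what a tool like this does under the hood, here is a rough Python sketch of the basic idea: split a text into words, check each against a list of words the learner already knows, and report both the coverage (the share of running words that are known) and the leftover words to study. This is only my guess at the general approach, not that site's actual method; the function name `vocabulary_profile` and the tiny `known` list are made up for illustration. In practice the known-word list would be something like the first few thousand words of a frequency list.

```python
import re
from collections import Counter


def vocabulary_profile(text, known_words):
    """Return (coverage, unknown_words) for a text.

    coverage      -- fraction of running words found in known_words
    unknown_words -- words not in known_words, most frequent first
    """
    # Lowercase the text and pull out simple word tokens (apostrophes allowed).
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(tokens)
    total = len(tokens)

    covered = sum(n for word, n in counts.items() if word in known_words)
    coverage = covered / total if total else 0.0

    unknown = sorted(
        (word for word in counts if word not in known_words),
        key=lambda word: (-counts[word], word),
    )
    return coverage, unknown


# Hypothetical known-word list, far smaller than a real one.
known = {"the", "cat", "sat", "on"}
coverage, to_learn = vocabulary_profile("The cat sat on the mat. The cat slept.", known)
print(f"coverage: {coverage:.0%}")   # coverage: 78%
print("words to learn:", to_learn)   # words to learn: ['mat', 'slept']
```

This sketch counts raw word forms only; a real profiler would also group word families ("sleep," "slept," "sleeping") before counting.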