Poll: Is “Data” Singular or Plural?

What's your take on this grammar conundrum?
We often have internal discussions (ahem) about whether "data" should be used as a singular or plural noun. For example, when writing or speaking, do you say, "The data is dirty," or "The data are dirty"?
I've seen both variations widely used, but I'm curious what our readers think. Share your take, please!
6 Responses »
Since "data" comes from Latin, "datum" is the Latin singular form of the word "data".
According to the Collins dictionary, the English word "data" is a plural noun, so "all the data are..." should be the correct form. ("datum" is not used that much anymore...).
Don't mix "data" with "information".
http://www.wsu.edu/~brians/errors/data.html
http://www.googlefight.com/index.php?lang=fr_FR&word1=%22all+data+is%22&word2=%22all+data+are%22
Oliver M. is technically correct, and if we were speaking Latin I would agree with the proscription. But we speak English where usage rules. The singular "datum" is so rare that its use could confuse some. The use of "data" as a singular is almost always a clipped form of "database" and not actually a use of "data" as singular. So I voted that "data" can be singular or plural.
Andi is thinking along the lines of our internal discussions. Yes, "datum" is technically the singular form. But how often do you hear that in everyday use? Perhaps the question would have been better phrased, "How do you USE data - as a singular or plural?" That is, as you personally write about data, how do you handle the verb immediately following the noun?
I treat the word 'data' in the same way as 'sheep' or 'fish'. I can talk about one data item or a data list so I believe the word can be used both ways. It just depends on the context.
Going with the thread here, I would tend to agree that datum as a singular and data as a plural is an archaic usage. Indeed, in my experience, those who make an issue of it tend to be folks who are being pedantic for its own sake and not for any real issue of clarity. The way I almost exclusively see "data" being used is not in the singular or plural exactly but in the sense of being a mass noun, much like "sand". For most practical uses "data" ceases to be useful when it is only a single data point. This is where I suspect the pluralist crowd is coming from, that what we think of as data only takes on its form when there is more than one point in the set.
This is, however, consistent with the usage of a mass noun. A grain of sand is not "sand" in most people's thinking. It is a speck, a fragment, maybe even a rock... You would, however, still say that the grain of sand is -made- of sand. Similarly with data, most people would use the phrase "a data point" to refer to a single piece. The word "datum", while technically correct, is not typically used in this manner. Datum has more often come to carry a different and (perhaps more importantly) independently useful meaning. Datum usually refers to a benchmark of some sort, the specific value that other data are measured against. If I say my house is 28 feet high, virtually everyone reading that understands that I do not mean it is that far above sea level (itself another important datum) but rather that the top of the house is 28 feet above the ground that the house sits on. A piece of data is often itself only relevant to some other stated or implied datum (an assertion that some theoretical physicists enjoy taking to absurd lengths).
On the other hand, to speak of "various data" refers not to individual data points but carries a strong implication of multiple data sets. Again, this is similar to "sand". To speak of the sands found on a given beach implies that there are various layers or regions of types of sand. To say there is only one sand found on a particular beach is to say that the sand is of a homogeneous nature; it is understood that it does not mean that there is a single solitary sand grain on the beach. If one said that "the only data available suggests that", the implication is that the data is from a single or at least limited source.
So, since classifying data as a mass noun is consistent with actual usage, is well understood, and increases rather than decreases clarity (albeit via nuance), and since the technically correct traditional usage (i.e., datum) is rarely encountered and has a separate useful meaning, I strongly promote that it should be considered "standard" modern usage of the term.
Data is a collective noun. We don't say "that stack of chairs are heavy", and we shouldn't say "the data from that application are corrupt." It's so annoying to hear someone say "data are" -- it's ignorance of English grammar.