Typology of Number Systems:
the database

As part of the 'Language and Number' project (a branch of the NWO Horizon project 'Knowledge and Culture'), we are building a typological database that describes cross-linguistic variation in number-related constructions: their meaning, morphology and syntax. Most prominently, we want to find out which points or intervals on the number line languages treat as grammatically special, and why.

Studies of number cognition show that the core knowledge system of number consists of two subsystems: the Approximate Number System (ANS) and the Object Tracking System (OTS). The former works with large sets, while the latter is restricted to sets of up to four elements. Although the relation between these two systems and natural language is crucial for many theories of number, the question of whether natural language grammar systematically reflects this distinction is still open. If the answer is positive, we expect typological data to reveal reliable tendencies in how languages treat different numerosities, in particular a strong split between numbers up to 4 ('OTS domain') and above 4 ('ANS domain').

Although some information on number constructions in different languages appears in various existing databases (most prominently, the World Atlas of Language Structures), information on the relation between morphosyntax and the number line has not yet been collected for the vast majority of languages.

To fill this gap, we are building the "Typology of Number Systems" database. The software we are using is designed to be compatible with the Typological Database System. We apply traditional methods of typological data collection (extracting information from existing descriptions, questionnaires, etc.), and we are also working on a pipeline that would make data collection (semi-)automatic.


Basic facts
  • 29 genetically and geographically different languages have been described
  • 432 constructions in the database
  • 75% of numeral constructions are number-sensitive, i.e. have restrictions on the quantities they can describe
  • 95% of cardinal constructions show quantity restrictions
  • 35% of multiplicative constructions ('n times') show quantity restrictions
  • 40% of all constructions involve numerals with no (clear) derivational source
  • 55% of all constructions involve numerals derived from cardinals
  • 3% of all constructions involve numerals derived from multiplicatives
  • The patterns of number-sensitivity vary across different types of numerals: some resemble the ANS/OTS pattern, others don't

The framework
As a unit of description, we use a refined notion of 'construction' as a unique combination of the values of morphosyntactic parameters. In Russian, for example, similar meanings have to be expressed quite differently depending on the quantity of items involved:
'One girl left'
Odna               devochka     ushla
1.CARD-FEM.SG.NOM  girl-SG.NOM  leave-PST.SG.FEM

'Six girls left'
Shest'  devochek     ushli
6.CARD  girl-PL.GEN  leave-PST.PL

Quantity '1' requires the numeral to agree with the noun in gender and case, the noun to be in the singular, and the verb to agree in the singular as well. With 6, the numeral doesn't agree with the noun, and both the noun and the verb are in the plural. The nouns are also in different cases (nominative vs. genitive). Under our notion of construction, these are two different constructions, and each of them is defined for different points or intervals on the number line.
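
The notion of construction described above can be sketched as a small data structure. This is only an illustration: the class name `Construction`, the parameter names (`numeral_agreement`, `noun_number`, and so on) and the helper functions are hypothetical and are not the database's actual schema. The intervals cover only the two quantities attested in the Russian examples; the real ranges are not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Construction:
    """A unique combination of morphosyntactic parameter values."""
    language: str
    # (parameter, value) pairs; names and values are illustrative only.
    parameters: frozenset
    # Closed intervals (low, high) on the number line where the construction is used.
    intervals: tuple

    def covers(self, quantity: int) -> bool:
        """True if the construction is defined for this quantity."""
        return any(low <= quantity <= high for low, high in self.intervals)


def make_construction(language, parameters, intervals):
    """Build a Construction from a plain dict and a list of intervals."""
    return Construction(language, frozenset(parameters.items()), tuple(intervals))


# The two Russian constructions from the examples above.
# Only the attested quantities (1 and 6) are recorded; this is an assumption
# made for the sketch, not a claim about the full Russian system.
russian_one = make_construction(
    "Russian",
    {"numeral_agreement": "gender+case", "noun_number": "SG",
     "noun_case": "NOM", "verb_number": "SG"},
    [(1, 1)],
)
russian_six = make_construction(
    "Russian",
    {"numeral_agreement": "none", "noun_number": "PL",
     "noun_case": "GEN", "verb_number": "PL"},
    [(6, 6)],
)


def constructions_for(quantity, inventory):
    """Return every construction in the inventory defined for the quantity."""
    return [c for c in inventory if c.covers(quantity)]


inventory = [russian_one, russian_six]
```

Because two constructions count as the same only when all parameter values match, `russian_one` and `russian_six` compare as different objects, and `constructions_for(1, inventory)` returns only the first of them.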

This set-up is meant to make it easy to formulate generalisations about the grammatical salience of different regions of the number line. It also has the welcome side-effect of being a potential source of many other typological generalisations about number-related constructions. Some research questions the database can help with are the following:

  • What are the universals, tendencies and rarities in the sources of morphological derivation for different groups of numerals?
  • Which combinations of uses of multifunctional morphology in the numeric domain are usual, rare or impossible?
  • How do languages treat relatively recent notions such as zero, fractions, and so on? Are there reliable tendencies?
  • What kinds of numeric meanings are always, often, sometimes or never grammatically encoded in natural language? Are there any implicational universals to observe (e.g. if a language has multiplicative numerals, it has ordinals)?
  • And many more
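
A question such as the one about multiplicatives and ordinals could, in principle, be checked mechanically once each language's numeral types are recorded. The sketch below is a minimal illustration under stated assumptions: the function `violations` and the inventories `toy_inventories` are hypothetical placeholders and do not come from the database.

```python
def violations(inventories, antecedent, consequent):
    """Return the languages that have `antecedent` but lack `consequent`.

    An empty result means the implication 'antecedent -> consequent'
    holds for every language in the sample.
    """
    return sorted(
        language
        for language, numeral_types in inventories.items()
        if antecedent in numeral_types and consequent not in numeral_types
    )


# Hypothetical placeholder inventories, NOT data from the database.
toy_inventories = {
    "Language A": {"cardinal", "ordinal", "multiplicative"},
    "Language B": {"cardinal", "ordinal"},
    "Language C": {"cardinal", "multiplicative"},
}

counterexamples = violations(toy_inventories, "multiplicative", "ordinal")
```

In this toy sample only "Language C" has multiplicatives without ordinals, so it is the single counterexample to the candidate universal.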

Some results
Annebeth Buis & Lisa Bylinina
Building an automatic pipeline for typological research: a case study
To be presented at the Dialogue conference. RSUH, Moscow, 1–4 June 2016

Lisa Bylinina & Ellen van Drie
Typological rarities in the domain of numeral derivation
Talk given at the 11th International Symposium of Cognition, Logic and Communication. Number: Cognitive, Semantic and Crosslinguistic Approaches. University of Latvia, Riga, 10–11 December 2015

Ellen van Drie (2015)
The Morphological Derivation of Numerals
BA thesis. Utrecht University

Lisa Bylinina, Natalia Ivlieva, Alexander Podobryaev and Yasutada Sudo (2015)
A Non-Superlative Semantics for Ordinals and the Syntax of Comparison Classes [draft version]
Manuscript

Lisa Bylinina, Natalia Ivlieva, Alexander Podobryaev and Yasutada Sudo (2015)
An In Situ Semantics for Ordinals. [draft version]
In: Thuy Bui, Deniz Özyıldız (eds.), NELS 45: Proceedings of the Forty-Fifth Annual Meeting of the North East Linguistic Society: Volume 1, pp. 135-145.


The group
Prof. Dr. Sjef Barbiers
Meertens Instituut / Utrecht University
sjef.Barbiers@meertens.knaw.nl

Sjef Barbiers is one of the main applicants of the NWO Horizon project Knowledge and Culture. He is the supervisor of the two Language and Number projects that are part of the Knowledge and Culture project. In addition to the postdoc project described here, there is a PhD project on the acquisition of cardinals and ordinals in Dutch and English, carried out by Caitlin Meyer (UvA; supervisors Weerman and Barbiers). The research questions in the two projects are partly based on Barbiers, S. (2007). Indefinite numerals ONE and MANY and the cause of ordinal suppletion. Lingua, 117 (5), 859-880.
Dr. Lisa Bylinina
Meertens Instituut

Worked on the architecture and theoretical design of the database, closely supervising the team of research assistants
Sofia Popova
University of Amsterdam

Description to be added

Former research assistants
Robyn Berghoff
robynberghoff@gmail.com
Dates of internship: 09/11/2015 – 29/01/2016

'Second-year Research Master student at Utrecht University. Have so far worked on Turkish, Navajo and Hausa.'

Ginger Haasbroek
gingerhaasbroek@gmail.com
Dates of internship: 02/11/2015 – 31/01/2016

'A BA in Philosophy (minor in Linguistics), still debating which MA to do. Currently working on Hawaiian.'

Ruby Sleeman
rubysleeman@gmail.com
Dates of internship: 01/09/2015 – 01/12/2015

'I am a research master student at Leiden University (expected graduation year: 2016). In the database project, my main focus was on making sure that all the data in the different sections of the database are properly aligned and representative, by using grammars and collaborating with some experts external to the project. I worked on several languages, including Dutch (my native tongue), Hausa, and Basque.'

Ilaria E. Colombo
ilariaelena.nabi@gmail.com
Dates of internship: 01/09/2015 – 01/12/2015

'Research MA Linguistics, Universiteit van Amsterdam (completed in September 2015). My major focus in the project has been on unifying and systematizing the data; I have been working on several languages, including Thai, Turkish, and Hungarian.'

Daniel Foster
daniel.foster@student.uva.nl
Dates of internship: 06/05/2015 – 01/08/2015

'Currently I am writing my master's thesis on variation and change in address terms in Colombian Spanish using data from Twitter here at the UvA. I am also in the process of applying to PhD programs in the US. On the project I assisted with adding Spanish, Welsh, Irish, Korean, Panjabi, and Magahi and I wrote the glossing conventions document.'

Ellen van Drie
e.g.a.vandrie@gmail.com
Dates of internship: 20/04/2015 – 24/07/2015

'During my internship at the Meertens Institute I studied the morphological derivations of numerals in the languages Dutch, English, Hungarian, Bulgarian and Adyghe - I wrote my BA thesis on this topic. Currently I am doing a Master's degree in Linguistics at the Radboud University in Nijmegen.'

Annebeth Buis
annebethbuis@gmail.com
Dates of internship: 31/08/2015 – 22/11/2015

Worked on the pipeline for automatic data collection.

Sophie Ruthven
sophieruthven0@gmail.com
Dates of internship: 20/07/2015 – 25/09/2015

Imke Kruitwagen
imkekruitwagen@gmail.com
Dates of internship: 09/11/2015 – 29/01/2016

'Linguistics student at Utrecht University and currently working on Greek and Hindi.'

Kees van Doorn
kees.vandoorn@gmail.com
Dates of internship: 20/07/2015 – 31/08/2015