Bowern-2021-752 #1047
Cool that we have this list! I'll volunteer to review, and I suggest we have at least three reviewers. If @chrzyki can moderate, that would be cool ;)
Thank you @CarolinHu, very nice to have this! I'll happily moderate, and it is very good to have @LinguList on board as a reviewer for this. I'll also have a look at the list in more detail. @AnnikaTjuka, do you have time to do a review as well? Thank you!
@CarolinHu, @chrzyki, here's the output of
|
What this makes me think: the list is obviously a collection of different datasets, and for this reason some glosses look like fine-grained translations but rather reflect the choices made in a given dataset (alive vs. live/be alive, with alive occurring only in some languages in the North of Australia: https://huntergatherer.la.utexas.edu/lexical/feature/597). So I suggest doing this a bit differently, which is also easier for the reviewers: split the list into its three original lists (which is easy, as you have indicated them anyway). Then we review each list separately. AND: we will have fewer mergers, but we will still allow mergers such as ash/ashes, since we cannot assume these mean different concepts.
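To illustrate the split suggested above, here is a minimal sketch that groups the rows of a tab-separated concept list by a category column. The column names (`ID`, `ENGLISH`, `CATEGORY`), the function name `split_by_category`, and the sample rows are placeholders invented for illustration; the real file may use a different header.

```python
import csv
import io
from collections import defaultdict

# Invented placeholder rows; the column names are assumptions for illustration.
SAMPLE_TSV = (
    "ID\tENGLISH\tCATEGORY\n"
    "Example-1\tconcept one\tBasic Vocabulary\n"
    "Example-2\tconcept two\tFlora Fauna Vocabulary\n"
    "Example-3\tconcept three\tCulture Vocabulary\n"
    "Example-4\tconcept four\tBasic Vocabulary\n"
)


def split_by_category(tsv_text, column="CATEGORY"):
    """Group the rows of a tab-separated concept list by one column."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        groups[row[column]].append(row)
    return dict(groups)


sublists = split_by_category(SAMPLE_TSV)
for category, rows in sublists.items():
    print(category, len(rows))
```

Each resulting group could then be written to its own file and reviewed on its own.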
BTW: left/left hand and right/right hand are common glosses for "left" and "right" in many questionnaires.
BTW: we should (I include myself, as I keep forgetting it!) more often use the |
Last argument for splitting the lists: the basic vocab will be rather nicely linked, but the rest won't, so we can also later decide if it is not too special for now.
Ah, yes, thanks for pointing that out. Taking care of the mergers with splitting the lists is a good thing - I didn't consider the mergers when I originally suggested doing this as one list. Apologies, @CarolinHu.
I added a concepticon check with my initial commit (at least for the double mappings), I just like to hide the output behind a collapsible markdown so it takes up less space when scrolling down to the other comments :)
Didn't see that, sorry, but it shows you are thinking well beyond the way I had thought so far, very good!
I finished reviewing lists A and C. I'll do B tomorrow.
Bowern-2021-207a-66 66 far 409 quality Basic Vocabulary 1406 FAR
Bowern-2021-207a-67 67 fat/grease 410 body Basic Vocabulary 323 FAT (ORGANIC SUBSTANCE)
Bowern-2021-207a-68 68 father 411 kinship Basic Vocabulary 1217 FATHER
Bowern-2021-207a-69 69 fear 412 mental Basic Vocabulary 781 FEAR (FRIGHT)
In other Bowern lists this is mapped to FEAR (BE AFRAID).
The Spanish gloss indicated only a noun (similar comments will appear a number of times now, so this is maybe a general question of what to make of Spanish/Portuguese glosses if available?)
Ah ok, this is a bit tricky. I guess the English glosses were used for the Australian and North American native languages, and the Spanish glosses for the South American ones. I would therefore follow the English glosses to keep it consistent with other Bowern lists. But @chrzyki or @LinguList, what do you think?
Yes, I agree with @AnnikaTjuka here, keeping it consistent is more important I'd say.
Bowern-2021-207a-73 73 fish 415 fauna Basic Vocabulary 227 FISH
Bowern-2021-207a-74 74 flow 416 motion Basic Vocabulary 2003 FLOW
Bowern-2021-207a-75 75 flower 278 flora-fauna Basic Vocabulary 239 FLOWER
Bowern-2021-207a-76 76 flower 417 environment Basic Vocabulary 239 FLOWER
I suspect that one of them refers to the noun and the other to the verb, but I'm not sure which one.
Just a few more comments on list B. Thanks for adding those lists, @CarolinHu!
Thank you all very much for reviewing! I resolved all unproblematic conversations, so now there are only the critical cases left to decide, especially the issue of dealing with differences in the extent of the English and Spanish/Portuguese glosses.
Yes, let us handle these lists pragmatically for now: in case of doubt, we follow the English glosses. Examples can be mentioned in the NOTE. In this way, one can see to what degree this could feed back into the original database!
Thanks for taking care of the comments!
concepticondata/conceptlists.tsv
@@ -359,3 +359,6 @@ Anonby-2018-1500 Anonby, Erik and Asadi, Ashraf and Nikravan, Pegah 2018 1500 a | |||
Duong-2020-100 Duong, Thu Hang and Nguyen, Thu Quynh and Nguyen, Van Loi 2020 100 basic, questionnaire English La Chí languages in Vietnam https://evols.library.manoa.hawaii.edu/bitstream/10524/52466/1/JSEALS_Special_Publication_6_Anthropologyoflanguage.pdf Duong2020 The current list was used to investigate some basic characteristics of the La Chí language spoken in Bản Díu, comparing it with La Chí varieties in Bản Máy and Bản Phùng The concepts are claimed to be generally based on the 100-item list of Swadesh (presumably, the [one that dates back to 1955](:ref:Swadesh-1955-100)). The data for the analysis come from the materials collected by Nguyễn Văn Lợi, Nguyễn Thu Quỳnh and Dương Thu Hằng during their field trips in the respective regions. 136-138 | |||
Wittmann-1973-200 Wittmann, Henri Gontran 1973 200 basic, areal French Mauritian, Haitian Wittmann1973 This list, based on [Swadesh’s 200 item list](:ref:Swadesh-1952-200), was used for lexicostatistical analysis of the relatedness between Caribbean Creoles (represented by Haitian in the list) and Mascarene Creoles (represented by Mauritian). The chosen words of both Creole languages were compared among each other and with French with regards to their homosemantic similarity, yielding a greater relatedness of each Creole to French than to each other. Our mappings generally follow the English concepts in Swadesh; however, the French translations may diverge from them in some cases. 94-98 | |||
Gerardi-2021-244 Ferraz Gerardi, Fabrício and Reichert, Stanislav 2021 244 basic, areal English Tupi-Guarani languages https://doi.org/10.1075/dia.18032.fer Gerardi2021b This 244-item list was the basis of a pilot study on the classification of 38 Tupían languages. It was later expanded to [414 concepts](:ref:Gerardi-2021-414) as part of the TULED database. 30-35 | |||
Bowern-2021-207a Bowern, Claire and Epps, Patience and Hill, Jane and Hunley, Keith 2021 a 207 basic, areal English Languages spoken by hunter-gatherers in Australia, North America and South America https://huntergatherer.la.utexas.edu/lexical Bowern2021 This list is one of three representing the current state of the database “Languages of hunter-gatherers and their neighbors” which includes lexical as well as grammatical data for phylogenetic research. The lexical items were assigned to three main categories (Basic Vocabulary, Flora Fauna Vocabulary, and Culture Vocabulary) and several subcategories; this list contains the items assigned to “Basic Vocabulary”. |
Short note: the a, b, c suffix is only needed when concept lists have the same length; since the three lists all differ in length, a, b, c can be dropped.
Turns out I never submitted this review (?) But I hope it is easy to quickly change the identifiers to go without a, b, c here.
I changed two more mappings as discussed in the comments. (Side note: Thanks for the note on the list suffixes, I wasn't quite sure about that. My reference for adding them was the Carling 2017 lists, which all have different numbers of concepts and list suffixes as well, so in the future these might need an edit as well?)
Oh, that escaped my attention. Well, in fact it is NOT important, it is just an identifier. It is maybe even practical, as it indicates that these belong together, so let us maybe just leave it as is.
I have thought this over again: our tutorial gives no indication of our practice. So afterwards we should open an issue and mention that for sublists of a larger concept list, a disambiguation with a, b, and c is recommended, while a disambiguation is required when two lists end up having the same identifier. This means the current practice in this issue is correct, as it is optional.
It is also probably easier. Imagine three sublists: Author-Year-100 vs. Author-Year-200 vs. Author-Year-100. Should we name them Author-Year-100a, Author-Year-200, and Author-Year-100b, or Author-Year-100a, Author-Year-200b, and Author-Year-100c?
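The two naming options can be compared with a small sketch. The function `disambiguate` and its `always_suffix` flag are hypothetical helpers written for this comment, not part of any Concepticon tooling; the identifiers are the placeholder ones from the example above.

```python
from collections import Counter
from string import ascii_lowercase


def disambiguate(base_ids, always_suffix=False):
    """Append a, b, c ... to concept list identifiers.

    always_suffix=False: only identifiers that collide get a suffix
    (the required case). always_suffix=True: every sublist gets a
    suffix in order (the optional, recommended case).
    Hypothetical helper; supports at most 26 identifiers.
    """
    counts = Counter(base_ids)
    seen = Counter()
    result = []
    for position, base in enumerate(base_ids):
        if always_suffix:
            result.append(base + ascii_lowercase[position])
        elif counts[base] > 1:
            result.append(base + ascii_lowercase[seen[base]])
            seen[base] += 1
        else:
            result.append(base)
    return result


sublists = ["Author-Year-100", "Author-Year-200", "Author-Year-100"]
only_collisions = disambiguate(sublists)
every_sublist = disambiguate(sublists, always_suffix=True)
print(only_collisions)
print(every_sublist)
```

Suffixing every sublist keeps the letters in one sequence, which makes it visible that the lists belong together.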
BTW: have we given all lists a list alias in conceptlists.tsv? I'd propose using the name of the database, or "Hunter-Gatherer Database", as that is what everyone calls it.
Lists now have an alias (in the right position, I hope, otherwise I will have to make an additional PR. I don't trust the web interface too much :) Is this ready for merging now?
Yes, please merge, and many thanks for the great work (and thanks also for your patience with my requests).
No harm done, thanks for the thorough review!
Pull request checklist
Additional information
https://huntergatherer.la.utexas.edu/lexical
The database from issue #42. @chrzyki would you have time to moderate/take a look?
When in doubt about a word class, I checked whether a Spanish gloss was available. The Flora Fauna & Culture lists did not allow for many mappings as the concepts there were very specific. Also, a number of concepts occurred multiple times, either in different lists or in the same list, but with different semantic subcategories; if there was no specification or disambiguation, I mapped all of them, resulting in some double mappings:
concepticon check
1100 44 Bowern-2021-752-43 43 cook
1100 595 Bowern-2021-752-594 594 cook food
1208 284 Bowern-2021-752-283 283 cat
1208 584 Bowern-2021-752-583 583 cat
1318 290 Bowern-2021-752-289 289 chicken
1318 585 Bowern-2021-752-584 584 chicken
137 295 Bowern-2021-752-294 294 coca
137 593 Bowern-2021-752-592 592 coca
137 594 Bowern-2021-752-593 593 coca
1392 113 Bowern-2021-752-112 112 louse
1392 114 Bowern-2021-752-113 113 louse
1392 391 Bowern-2021-752-390 390 louse
1920 55 Bowern-2021-752-54 54 dream
1920 610 Bowern-2021-752-609 609 dream
1920 611 Bowern-2021-752-610 610 dream
1920 612 Bowern-2021-752-611 611 dreaming
227 74 Bowern-2021-752-73 73 fish
227 330 Bowern-2021-752-329 329 fish (generic)
239 76 Bowern-2021-752-75 75 flower
239 77 Bowern-2021-752-76 76 flower
239 335 Bowern-2021-752-334 334 flower
348 332 Bowern-2021-752-331 331 fish poison, barbasco
348 633 Bowern-2021-752-632 632 fish poison
379 31 Bowern-2021-752-30 30 blunt
379 58 Bowern-2021-752-57 57 dull/blunt
646 16 Bowern-2021-752-15 15 ash
646 18 Bowern-2021-752-17 17 ashes
832 236 Bowern-2021-752-235 235 beans
832 569 Bowern-2021-752-568 568 beans
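For readers who want to reproduce this kind of report by hand, here is a minimal sketch that finds Concepticon IDs mapped by more than one row of a list. It is not the implementation behind `concepticon check`; the function name `find_double_mappings` and the column names are assumptions, and the sample rows are taken from the Basic Vocabulary excerpt reviewed above.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the reviewed excerpt; the header names are assumed.
ROWS_TSV = (
    "ID\tENGLISH\tCONCEPTICON_ID\tCONCEPTICON_GLOSS\n"
    "Bowern-2021-207a-66\tfar\t1406\tFAR\n"
    "Bowern-2021-207a-73\tfish\t227\tFISH\n"
    "Bowern-2021-207a-75\tflower\t239\tFLOWER\n"
    "Bowern-2021-207a-76\tflower\t239\tFLOWER\n"
)


def find_double_mappings(tsv_text):
    """Return {concepticon_id: [row IDs]} for IDs used by more than one row."""
    by_concept = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if row["CONCEPTICON_ID"]:  # skip unmapped rows
            by_concept[row["CONCEPTICON_ID"]].append(row["ID"])
    return {cid: ids for cid, ids in by_concept.items() if len(ids) > 1}


doubles = find_double_mappings(ROWS_TSV)
for concepticon_id, row_ids in doubles.items():
    print(concepticon_id, row_ids)
```

On these four rows only the two flower entries are reported, since both map to 239.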