Add Snodgrass Concept List #1393
Hi @LinguList, |
@alzkuc, very nice start :-) We should now review the concept list together. My suggestion is that @katjabocklage has a look, since she was doing some mapping. I would also ask Riccardo to join in and will add him here later on. Regarding the original data by Snodgrass, I wonder if you have by any chance also added additional information from the paper? They share ratings and other information. Have these also been digitized, or did you restrict this to the concepts? |
As you see, @alzkuc, the code check shows there is no problem with the procedure. This is already very fine. |
@AnnikaTjuka, the list is also interesting for NoRaRe (in the long run). Do you want to look into this list and the mappings as well? |
@alzkuc, I already have a small request. If you check the DIFF view, where you can compare what has been modified, you will see that conceptlists.tsv has been changed in many places where no new data were added, because " markers were added to the left and right of cells (probably automatically, by your editor). If you could revert these cases, we would see only what you added, which would be very useful.
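One way to undo editor-added quote markers without touching cell contents is to parse the file as quoted TSV and write it back with quoting disabled. The following is only a sketch: the function name and the sample column names are hypothetical, and it assumes that no cell legitimately needs quoting (no tabs or line breaks inside cells).

```python
import csv
import io


def rewrite_tsv_without_quotes(text: str) -> str:
    """Parse TSV text that may carry editor-added quotes and emit it unquoted.

    Hypothetical helper, not part of the Concepticon tooling.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t", quotechar='"')
    out = io.StringIO()
    # QUOTE_NONE with no quotechar writes cells verbatim. If a cell contains
    # a tab or a line break, csv.Error is raised instead of silently
    # changing the data.
    writer = csv.writer(
        out,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,
        quotechar=None,
        lineterminator="\n",
    )
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Illustrative input only; the real conceptlists.tsv has different columns.
sample = '"ID"\t"NAME"\n"Snodgrass-1980-260"\t"Snodgrass"\n'
print(rewrite_tsv_without_quotes(sample))
```

Checking the result in the DIFF view afterwards should show only the rows that were actually added.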
I only restricted it to concepts now, but I will happily add other information. There are multiple options - AoA, name agreement, familiarity. Which ones should I add? |
That must have happened automatically. I will revert those cases :) |
For the sake of comparing different values later, and also for the sake of learning how to display values in CSVW (that is, the JSON format), it would be useful to add the major variables. But I just checked the paper and saw that there are very rich variables in the concept list, and it would be beautiful to reflect at least a few of them, also for the sake of training. Thus, on pages 178 to 181 the authors discuss certain ranks from the Battig-Montague norms. Here, I would recommend seeing if one can trace the original article, and otherwise also adding the ranks plus the semantic field that they assign, in one column each, to the current concept list. We can discuss some examples of the format before actually doing so, but maybe it is also clear what I have in mind? |
E.g. (BMCN = Battig Montague Norms)
|
In the context of NoRaRe, one would later add more semantics to the categories here. |
They use norms for frequency (Kučera-Francis) and age of acquisition (Carroll-White), see page 205 ff. In these cases, I'd ask you to quickly check, @alzkuc, whether these are taken from other sources (I remember having heard of Kučera-Francis) and therefore might already be available in other contexts (e.g. NoRaRe). For the norms on Name agreement, Image agreement, Familiarity and Complexity, my suggestion would be to check whether these were measured by Snodgrass; if so, they should be included and also briefly described (later, in NoRaRe, where we add the information on the data in a second step, we will look at them more closely). |
All in all, this turns out to be very interesting, so my gut feeling is that these image naming norms have never been discussed from this semantic perspective and in this detail. With this in mind: if we find concepts not reflected in Concepticon, and they are used in other object naming studies, we may want to add them to concepticon in an additional step. |
that sounds good! yes, it is clear. I agree. :-) @LinguList |
I will have a look at my distant relative's work haha ;-) |
:-)
UPDATE @LinguList: Name agreement, Familiarity, Image Agreement and Complexity were all measured by Snodgrass & Vanderwart in that study. I think we should therefore add all of them to the list. Not all of the concepts are linked to BMCN, only 189 out of 260. The Kučera-Francis frequency norms (highly cited, but by now somewhat outdated) are available for 240 out of 260 of the words, and AoA (Carroll-White) is available for 89 out of 260. Should we include those? |
That is very cool info, and important! |
I would say we do not link them. Because we may have more recent information from other sources. But it shows that psychologists always have the same dilemma: they cannot link up everything, since every study uses another word, and one cannot provide data for all of them. |
For BMCN, I would like to know if we can get the data in general. I consider these norms quite important, since they add another layer of "semantic domain" (as Thanasis was mentioning in his talk) to the concepts. And it is always better to have these things defined by somebody else rather than defining them oneself (and being criticized for it). |
So adding the norms seems useful. I'd suggest checking whether this can be run through Transkribus to avoid typing. |
yes, that seems to be the case! |
okay, I'll take a look at that today and will keep you updated :-) |
Two points (see detailed comments). In general, the list proves its importance, since we already find naming differences now. One can even simply check how people "call" the Snodgrass images TODAY to find language change in action over the past 30 years, I guess...
With the requested changes, we only need to check how to integrate numbers in CSVW (you can label them as "string" in the metadata first; I can then provide suggestions on how to make sure they are treated as integers), and then we can merge, in my opinion.
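To make the two-step idea concrete (declare the numbers as "string" first, then switch to "integer"), here is a minimal sketch of a CSVW metadata description built in Python. The file name, the column names (in particular `BMCN_RANK`) and both helper functions are assumptions for illustration; only the general CSVW layout (`tableSchema`, `columns`, `datatype`) follows the W3C vocabulary.

```python
import json


def column(name: str, datatype: str = "string") -> dict:
    """Describe one CSVW column (hypothetical helper)."""
    return {"name": name, "datatype": datatype}


# Step 1: declare everything as "string" so that the list validates at all.
metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "Snodgrass-1980-260.tsv",  # assumed file name
    "dialect": {"delimiter": "\t"},
    "tableSchema": {
        "columns": [
            column("ID"),
            column("ENGLISH"),
            column("BMCN_RANK"),  # hypothetical column for the ranks
        ]
    },
}


def set_datatype(meta: dict, name: str, datatype: str) -> None:
    """Change the declared datatype of the column called `name`."""
    for col in meta["tableSchema"]["columns"]:
        if col["name"] == name:
            col["datatype"] = datatype
            return
    raise KeyError(name)


# Step 2: once the values are checked, treat the ranks as integers.
set_datatype(metadata, "BMCN_RANK", "integer")
print(json.dumps(metadata, indent=2))
```

Starting with "string" keeps the first version of the list mergeable even if some cells are empty or non-numeric; the stricter datatype can follow once the data are clean.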
Snodgrass-1980-260-101 101 31 PAN Frying pan
Snodgrass-1980-260-102 102 Garbage can
Snodgrass-1980-260-103 103 3089 GIRAFFE Giraffe
Snodgrass-1980-260-104 104 476 CUP Glass
I would unmap "Glass" here, even if I support the idea. The problem is that we have in the European context a distinction between Tasse and Glas as containers, and I consider it useful to maybe add GLASS (CONTAINER FOR LIQUID) later as a concept to Concepticon.
Sounds good, I will map it as a new concept and add it to Concepticon! Definition as per Cambridge? "a small container for drinks made of glass or similar material with a flat base and usually with no handle"? @LinguList
Snodgrass-1980-260-141 141 478 LIP Lips
Snodgrass-1980-260-142 142 2397 LOBSTER Lobster
Snodgrass-1980-260-143 143 1596 LOCK Lock
Snodgrass-1980-260-144 144 867 GLOVE Mitten
This is a specific type of glove that we would call Fäustling in German, right? So I suggest to unmap, since we already have Glove above. So we leave it unmapped for now. Here, the images come in handy, as they tell us what we are dealing with.
okay, I will unmap this one. :)
done
This could be done, but I just checked: so far, there is no other
concept list with this concept. But we only add new concepts if we KNOW
that there are at least two or even more concept lists with them, so my
tip is to keep this in mind for now but not act ;-)
So from my viewpoint, we would then only wait for the numbers. If you experience problems in digitization here, let me know tomorrow, so I can see if we can somehow help you. If @Risious comes to the office, we should also ask him to have a look at the mappings, as he'll also contribute new concept lists and should already learn the extended workflow you used here. |
@LinguList I have added an issue for this; tagged under: object naming. |
sounds good, I agree. I am keeping track of those that came up in this list but are not in Concepticon, and once (if) they come up in the other object naming lists, we can discuss whether we add them :-) @LinguList |
I have approved the changes now. Would you prefer to merge already and consider adding the numerical data later in a new PR, or should we wait? |
What do you think? I'd be happy to get to see it in Concepticon already, if it is okay :-) But it is your call! @LinguList |
Well in Concepticon.clld.org it will only appear in the next release (which may be in 2025), but in the concepticon-master, you could then already officially download it. |
I'll send you code to compute coverage with Swadesh now. |
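For readers following along: the code mentioned here is not part of this thread. Purely as an illustration of what "coverage" could mean, the sketch below counts how many concepts of a reference list are also mapped in another list, comparing Concepticon IDs as sets. The function name is hypothetical, and the reference IDs are toy values, not a real Swadesh list.

```python
def coverage(list_ids, reference_ids):
    """Return (shared, total, ratio) for two collections of Concepticon IDs.

    Empty strings stand for unmapped rows and are ignored.
    Hypothetical helper, not the code referred to in the conversation.
    """
    mapped = {i for i in list_ids if i}
    reference = {i for i in reference_ids if i}
    shared = mapped & reference
    ratio = len(shared) / len(reference) if reference else 0.0
    return len(shared), len(reference), ratio


# IDs taken from the rows quoted in the review (row 102 is unmapped).
snodgrass_ids = ["31", "", "3089", "476"]
# Toy reference set, NOT an actual Swadesh list.
toy_reference_ids = ["31", "476", "9999"]

shared, total, ratio = coverage(snodgrass_ids, toy_reference_ids)
print(f"{shared} of {total} reference concepts covered ({ratio:.0%})")
```

Working on sets of IDs rather than on English glosses avoids counting "Glass" and "CUP" style mismatches as agreement or disagreement by accident.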
okay, sure! that would be good. :-) And I will add the norms later together with the next lists. |
great, thank you! @LinguList |
Pull request checklist
concepticon notlinked --gloss "NEW_GLOSS"
Additional information
...