Add Snodgrass Concept List #1393

Merged (5 commits) on Jul 3, 2024

alzkuc
Collaborator

@alzkuc alzkuc commented Jul 1, 2024

Pull request checklist

  • add new concept list
  • add new metadata
  • add new Concepticon concept sets
    • checked whether the new concept(s) can be applied to existing lists with
      concepticon notlinked --gloss "NEW_GLOSS"
  • add new Concepticon concept relations
  • refine existing Concepticon concept set mappings
  • refine Concepticon glosses
  • refine Concepticon concept relations
  • refine Concepticon concept definitions
  • retire data

Additional information

...

@alzkuc
Collaborator Author

alzkuc commented Jul 1, 2024

Hi @LinguList,
I have added the Object Naming Dataset from Snodgrass and Vanderwart as we discussed. Let me know what you think. Thank you! :-)

@LinguList
Contributor

@alzkuc, very nice start :-) We should now review the concept list together. My suggestion is that @katjabocklage has a look, since she was doing some mapping. I would also ask Riccardo to join in and will add him here later on. Regarding the original data by Snodgrass, I wonder whether you have also added further information from the paper. They provide ratings and other information: have these also been digitized, or did you restrict yourself to the concepts?

@LinguList
Contributor

As you see, @alzkuc, the code check shows that there is no problem with the procedure. This is already very good.

@LinguList
Contributor

@AnnikaTjuka, the list is also interesting for NoRaRe (in the long run). Do you want to look into this list and the mappings as well?

Contributor

@LinguList LinguList left a comment


@alzkuc, I already have a small request. If you check the DIFF view, where you can compare what has been modified, you will see that conceptlists.tsv has been modified in many places where no new data were added, because " markers were added to the left and right of cells (probably automatically, by your editor). If you could revert these cases, we would see only what you added, which would be very useful.

@alzkuc
Collaborator Author

alzkuc commented Jul 1, 2024

> @alzkuc, very nice start :-) We should now review the concept list together. My suggestion is that @katjabocklage has a look, since she was doing some mapping. I would also ask Riccardo to join in and will add him here later on. Regarding the original data by Snodgrass, I wonder if you have by any chance also added additional information from the paper? They share ratings and other information, have they also been digitized, or did you restrict this to the concepts?

So far I have restricted it to the concepts, but I will happily add other information. There are multiple options: AoA (age of acquisition), name agreement, familiarity. Which ones should I add?

@alzkuc
Collaborator Author

alzkuc commented Jul 1, 2024

> @alzkuc, I would already have a small request. If you check at the DIFF view where you can compare what has been modified, you will see that conceptlists.tsv has been modified in many points where no new data were added, because you add " markers to the right and left of cells (probably automatically, through your editor). If you could revert these cases, we'd only see what you added, this would be very useful.

That must have happened automatically. I will revert those cases :)

@LinguList
Contributor

For the sake of comparing different values later, and also for the sake of learning how to display values in CSVW (that is, the JSON metadata format), it would be useful to add the major variables.

But I just checked the paper and saw that the concept list comes with very rich variables, and it would be beautiful to reflect at least a few of them, also for the sake of training. For example, on pages 178 to 181 the authors discuss ranks taken from Battig and Montague. Here, I would recommend checking whether one can trace the original article, and otherwise adding the ranks and the semantic field they assign to the current concept list, in one column each. We can discuss some examples of the format before actually doing so, but maybe it is already clear what I have in mind?

@LinguList
Contributor

For example (BMCN = Battig Montague Norms):

ENGLISH     SEMANTIC_FIELD        BMCN_RANK
Alligator   Four-footed animal    63
Bear        Four-footed animal    9
Bowl        Kitchen utensil       9
Broom       Kitchen utensil       98
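A minimal sketch of how such extra columns could be read back with the ranks as numbers rather than text, in Python (chosen here only for illustration). It assumes a tab-separated layout with the three column names from the example, and treats an empty BMCN_RANK cell as missing, since not every concept is necessarily covered by the norms. `read_bmcn_columns` is a hypothetical helper, not part of any Concepticon tooling.

```python
import csv
import io

# The four example rows from the comment above, as tab-separated text.
SAMPLE = (
    "ENGLISH\tSEMANTIC_FIELD\tBMCN_RANK\n"
    "Alligator\tFour-footed animal\t63\n"
    "Bear\tFour-footed animal\t9\n"
    "Bowl\tKitchen utensil\t9\n"
    "Broom\tKitchen utensil\t98\n"
)


def read_bmcn_columns(handle):
    """Read a TSV with a BMCN_RANK column, converting ranks to int.

    Empty rank cells become None instead of raising an error.
    """
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        rank = (row.get("BMCN_RANK") or "").strip()
        row["BMCN_RANK"] = int(rank) if rank else None
        rows.append(row)
    return rows


rows = read_bmcn_columns(io.StringIO(SAMPLE))

# Group the English glosses by the semantic field assigned in the norms.
by_field = {}
for row in rows:
    by_field.setdefault(row["SEMANTIC_FIELD"], []).append(row["ENGLISH"])
print(by_field)
```

Grouping by field like this is also a quick way to check that the semantic field labels were typed consistently.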

@LinguList
Contributor

In the context of NoRaRe, one would later add more semantics to the categories here.

@LinguList
Contributor

They use norms for frequency (Kučera-Francis) and age of acquisition (Carroll-White), see page 205 ff. In these cases, I'd ask you to quickly check, @alzkuc, whether the norms are taken from other sources (I remember having heard of Kučera-Francis) and might therefore already be available in other contexts (e.g. NoRaRe). For the norms on name agreement, image agreement, familiarity and complexity, my suggestion would be to check whether these were measured by Snodgrass; if so, they should be included and also briefly described (later, in NoRaRe, where we add the information on the data in a second step, we will look at them more closely).

@LinguList
Contributor

All in all, this turns out to be very interesting, and my gut feeling is that these image naming norms have never been discussed from this semantic perspective and in this detail. With this in mind: if we find concepts that are not reflected in Concepticon but are used in other object naming studies, we may want to add them to Concepticon in an additional step.

@alzkuc alzkuc closed this Jul 1, 2024
@alzkuc alzkuc reopened this Jul 1, 2024
@alzkuc
Collaborator Author

alzkuc commented Jul 1, 2024

> For the sake of comparing later different values, and also for the sake of learning how to display values in CSVW (that is, the JSON format), it would be useful to add the major variables.
>
> But I just checked the paper and saw that there are very rich variables in the concept list, and it would be beautiful to reflect at least a few of them, also for the sake of training. Thus, page 178 till 181 the author discusses certain ranks from some Montague. Here, I would recommend to see if one can trace the original article, and otherwise also add the ranks plus the semantic field that they assign in one column each to the current concept list. We can discuss some examples of the format before doing actually so, but maybe it is also clear, what I have in mind?

that sounds good! yes, it is clear. I agree. :-) @LinguList

@alzkuc
Collaborator Author

alzkuc commented Jul 1, 2024

> They use norms for frequency (Kučera-Francis) and age of acquisition (Carroll-White). Page 205, ff, in these cases, I'd ask you to quickly check, @alzkuc, if these are taken from other sources (I remember having heard of Kučera-Francis) and therefore might also be already available in other contexts (e.g. NoRaRe). For the norms on Name agreement, Image agreement, Familiarity and Complexity, my suggestion would be to check if these are as thus measured by Snodgrass, so they should be included then, and also quickly described (later, in NoRaRe, where we add the information on the data in a second step, we will look at them more closely).

I will have a look at my distant relative's work haha ;-)

@LinguList
Contributor

LinguList commented Jul 1, 2024 via email

@alzkuc
Collaborator Author

alzkuc commented Jul 1, 2024

> They use norms for frequency (Kučera-Francis) and age of acquisition (Carroll-White). Page 205, ff, in these cases, I'd ask you to quickly check, @alzkuc, if these are taken from other sources (I remember having heard of Kučera-Francis) and therefore might also be already available in other contexts (e.g. NoRaRe). For the norms on Name agreement, Image agreement, Familiarity and Complexity, my suggestion would be to check if these are as thus measured by Snodgrass, so they should be included then, and also quickly described (later, in NoRaRe, where we add the information on the data in a second step, we will look at them more closely).

UPDATE @LinguList: Name agreement, familiarity, image agreement and complexity were all measured by Snodgrass & Vanderwart in that study. I think we should therefore add all of them to the list.
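If name agreement is added, its column description will need to say how it was computed. As we understand the paper, Snodgrass and Vanderwart report it both as the percentage of participants giving the most common name and as an information statistic H = sum over names of p_i * log2(1 / p_i), where p_i is the share of participants giving name i; this should be checked against their method section before the description is written. The sketch below computes both values for one picture. `name_agreement` is a hypothetical helper, and the responses are made-up toy data, not figures from the study.

```python
import math
from collections import Counter


def name_agreement(responses):
    """Return (H, percent agreement) for the names given to one picture.

    H = sum over distinct names of p * log2(1 / p), where p is the share of
    responses giving that name; H is 0.0 when everyone used the same name.
    Percent agreement is the share of the most frequent name.
    """
    counts = Counter(r.strip().lower() for r in responses if r.strip())
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no usable responses")
    h = sum((c / total) * math.log2(total / c) for c in counts.values())
    top = counts.most_common(1)[0][1]
    return h, 100 * top / total


# Toy responses for illustration only; not data from the study.
h, agreement = name_agreement(["glass"] * 8 + ["cup"] * 2)
print(f"H = {h:.2f}, agreement = {agreement:.0f}%")
```

This prints `H = 0.72, agreement = 80%`: a higher H means the names given for a picture are more spread out.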

Not all of the concepts are linked to BMCN, only 189 out of 260. The Kučera-Francis frequency norms (highly cited, but currently somewhat outdated) are available for 240 out of 260 of the words, and AoA (Carroll-White) is available for 89 out of 260. Should we include those?
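The shares behind these counts can be made explicit with a small sketch. `column_coverage` is a hypothetical helper that counts non-empty cells in one column of a list loaded as dictionaries (for example with `csv.DictReader`); the loop simply turns the counts reported above into percentages.

```python
TOTAL = 260  # concepts in the Snodgrass list

# Counts as reported in this thread.
REPORTED = {
    "BMCN rank": 189,
    "Kučera-Francis frequency": 240,
    "Carroll-White AoA": 89,
}


def column_coverage(rows, column):
    """Return (filled, total): rows whose `column` cell is non-empty."""
    filled = sum(1 for row in rows if (row.get(column) or "").strip())
    return filled, len(rows)


def share(filled, total):
    """Fraction of filled cells; 0.0 for an empty list."""
    return filled / total if total else 0.0


for name, filled in REPORTED.items():
    print(f"{name}: {filled}/{TOTAL} = {share(filled, TOTAL):.1%}")
```

With the reported counts this prints 72.7% for BMCN, 92.3% for Kučera-Francis and 34.2% for Carroll-White AoA.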

@LinguList
Contributor

> Not all of the concepts are linked to BMCN, only 189 out of 260. The Kučera-Francis frequency norms (highly cited, but currently somewhat outdated) are available for 240 out of 260 of the words, and AoA (Carroll-White) is available for 89 out of 260. Should we include those?

That is very cool info, and important!

@LinguList
Contributor

I would say we do not link them, because we may have more recent information from other sources. But it shows that psychologists always face the same dilemma: they cannot link up everything, since every study uses different words, and one cannot provide data for all of them.

@LinguList
Contributor

For BMCN, I would like to know whether we can get the data in general. I consider these norms quite important, since they add another layer of "semantic domain" (as Thanasis was mentioning in his talk) to the concepts. And it is always better to have these things defined by somebody else than to define them oneself (and be criticized for it).

@LinguList
Contributor

So adding the norms seems useful. I'd suggest checking whether the pages can be run through Transkribus to avoid typing them by hand.

@alzkuc
Collaborator Author

alzkuc commented Jul 2, 2024

> I would say we do not link them. Because we may have more recent information from other sources. But it shows that psychologists always have the same dilemma: they cannot link up everything, since every study uses another word, and one cannot provide data for all of them.

yes, that seems to be the case!

@alzkuc
Collaborator Author

alzkuc commented Jul 2, 2024

> For BMCN, I would like to know if we can get the data in general. I consider these norms as quite important, since they add another layer of "semantic domain" (as Thanasis was mentioning in his talk) to the concepts. And it is always better to have these things defined by somebody rather than defining things oneself (and being criticized for it).

okay, I'll take a look at that today and will keep you updated :-)

Contributor

@LinguList LinguList left a comment


Two points (see detailed comments). In general, the list proves its importance, since we already find naming differences. One could even simply check what people "call" the Snodgrass images TODAY to find language change in action over the past four decades, I guess...

Once the requested changes are made, we only need to check how to integrate numbers in CSVW (you can label them as "string" in the metadata first; I can then suggest how to make sure they are treated as integers). After that, we can merge, in my opinion.
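The two-step plan for numbers can be sketched as the column descriptions in CSVW metadata ("CSV on the Web", a W3C recommendation), where each column under `tableSchema` carries a `name` and a `datatype`; `string` and `integer` are both built-in CSVW datatypes, and by default an empty cell is read as null, so missing ranks need no placeholder. This is a generic sketch: the column names other than BMCN_RANK and the helpers `column` and `retype` are assumptions for illustration, and the exact layout of Concepticon's metadata files may differ.

```python
import copy
import json


def column(name, datatype="string"):
    """Return a minimal CSVW column description."""
    return {"name": name, "datatype": datatype}


# First pass, as suggested: the numeric column is typed as a plain string.
first_pass = {
    "tableSchema": {
        "columns": [column("ID"), column("ENGLISH"), column("BMCN_RANK")]
    }
}


def retype(schema, name, datatype):
    """Return a copy of `schema` in which column `name` has `datatype`."""
    updated = copy.deepcopy(schema)
    for col in updated["tableSchema"]["columns"]:
        if col["name"] == name:
            col["datatype"] = datatype
    return updated


# Second pass: declare the rank as a CSVW integer.
second_pass = retype(first_pass, "BMCN_RANK", "integer")
print(json.dumps(second_pass["tableSchema"]["columns"][-1]))
# {"name": "BMCN_RANK", "datatype": "integer"}
```

Keeping the first pass as a string means a stray non-numeric cell cannot make validation fail before the numbers have been checked.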

Snodgrass-1980-260-101   101   31     PAN       Frying pan
Snodgrass-1980-260-102   102                    Garbage can
Snodgrass-1980-260-103   103   3089   GIRAFFE   Giraffe
Snodgrass-1980-260-104   104   476    CUP       Glass
Contributor


I would unmap "Glass" here, even if I support the idea. The problem is that in the European context we distinguish between Tasse and Glas (German for "cup" and "glass") as containers, and I consider it useful to perhaps add GLASS (CONTAINER FOR LIQUID) to Concepticon later as a new concept.

Collaborator Author


Sounds good, I will map it as a new concept and add it to Concepticon! Should I use the Cambridge Dictionary definition, "a small container for drinks made of glass or similar material with a flat base and usually with no handle"? @LinguList

Snodgrass-1980-260-141   141   478    LIP       Lips
Snodgrass-1980-260-142   142   2397   LOBSTER   Lobster
Snodgrass-1980-260-143   143   1596   LOCK      Lock
Snodgrass-1980-260-144   144   867    GLOVE     Mitten
Contributor


This is a specific type of glove that we would call Fäustling in German, right? Since we already have Glove above, I suggest unmapping it and leaving it unmapped for now. Here, the images come in handy, as they tell us what we are dealing with.

Collaborator Author


okay, I will unmap this one. :)

Collaborator Author


done
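The unmapping done in this review (Glass and Mitten) amounts to clearing the two mapping cells of a row while keeping the row itself. A sketch, assuming rows are dictionaries keyed by the Concepticon column names CONCEPTICON_ID and CONCEPTICON_GLOSS (the review view shows the values but not the headers); `unmap` is a hypothetical helper, and in practice the edit is simply made in the TSV file.

```python
def unmap(rows, english):
    """Clear the Concepticon mapping of every row whose ENGLISH cell matches.

    Returns the number of rows changed.
    """
    changed = 0
    for row in rows:
        if row["ENGLISH"] == english:
            row["CONCEPTICON_ID"] = ""
            row["CONCEPTICON_GLOSS"] = ""
            changed += 1
    return changed


# The two rows discussed in this review, with their original mappings.
rows = [
    {"ID": "Snodgrass-1980-260-104", "NUMBER": "104", "ENGLISH": "Glass",
     "CONCEPTICON_ID": "476", "CONCEPTICON_GLOSS": "CUP"},
    {"ID": "Snodgrass-1980-260-144", "NUMBER": "144", "ENGLISH": "Mitten",
     "CONCEPTICON_ID": "867", "CONCEPTICON_GLOSS": "GLOVE"},
]

for gloss in ("Glass", "Mitten"):
    unmap(rows, gloss)

# Rows without a mapping are the candidates to track for future concept sets.
unmapped = [row["ID"] for row in rows if not row["CONCEPTICON_ID"]]
print(unmapped)
```

Listing the unmapped IDs afterwards gives the set of concepts to keep an eye on in other object naming lists.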

@LinguList
Contributor

LinguList commented Jul 2, 2024 via email

@LinguList
Contributor

So from my viewpoint, we would then only wait for the numbers. If you run into problems with the digitization, let me know tomorrow, so I can see whether we can help you. If @Risious comes to the office, we should also ask him to look at the mappings, since he will also contribute new concept lists and should learn the extended workflow you used here.

@alzkuc
Collaborator Author

alzkuc commented Jul 3, 2024

> For BMCN, I would like to know if we can get the data in general. I consider these norms as quite important, since they add another layer of "semantic domain" (as Thanasis was mentioning in his talk) to the concepts. And it is always better to have these things defined by somebody rather than defining things oneself (and being criticized for it).

@LinguList I have opened an issue for this, tagged "object naming".

@alzkuc
Collaborator Author

alzkuc commented Jul 3, 2024

> This could be done, but I just checked: so far, there is no other concept list with this concept. But we only add new concepts if we KNOW that there are at least two or even more concept lists with them, so my tip is to keep this in mind for now but not act ;-)

sounds good, I agree. I am keeping track of the concepts that came up in this list but are not in Concepticon, and once (if) they come up in the other object naming lists, we can discuss whether we add them :-) @LinguList

@LinguList
Contributor

I have approved the changes now. Would you prefer to merge already and add the numerical data later in a new PR, or should we wait?

@alzkuc
Collaborator Author

alzkuc commented Jul 3, 2024

> I have approved the changes now. Would you prefer to merge already and consider adding the numeral data later in a new PR, or should we wait?

What do you think? I'd be happy to get to see it in Concepticon already, if it is okay :-) But it is your call! @LinguList

@LinguList
Contributor

Well, on Concepticon.clld.org it will only appear with the next release (which may be in 2025), but you can already officially download it from concepticon-master.

@LinguList LinguList merged commit 1d3b913 into concepticon:master Jul 3, 2024
1 check passed
@LinguList
Contributor

I'll send you code to compute coverage with Swadesh now.
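The coverage code itself is not part of this thread, so the following is only a guess at the basic idea: the share of a reference list's concepts that also occur in the new list. It compares Concepticon glosses for readability, although comparing Concepticon IDs would be safer in practice. `coverage` is a hypothetical helper, and the reference set is toy data, NOT the actual Swadesh list.

```python
def coverage(concept_list, reference):
    """Return (share, shared): how much of `reference` occurs in `concept_list`."""
    ref = set(reference)
    if not ref:
        return 0.0, []
    shared = sorted(ref & set(concept_list))
    return len(shared) / len(ref), shared


# Glosses taken from the rows shown in this PR.
snodgrass_sample = {"PAN", "GIRAFFE", "LIP", "LOBSTER", "LOCK"}
# Toy reference set for illustration; NOT the actual Swadesh list.
toy_reference = {"LIP", "LOCK", "DOG", "FISH"}

ratio, shared = coverage(snodgrass_sample, toy_reference)
print(f"{ratio:.0%} of the reference concepts are covered: {shared}")
```

Dividing by the size of the reference list (not the new list) answers the question "how much of Swadesh does this list contain?"; swapping the arguments answers the opposite question.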

@alzkuc
Collaborator Author

alzkuc commented Jul 3, 2024

> Well in Concepticon.clld.org it will only appear in the next release (which may be in 2025), but in the concepticon-master, you could then already officially download it.

okay, sure! that would be good. :-) And I will add the norms later together with the next lists.

@alzkuc
Collaborator Author

alzkuc commented Jul 3, 2024

> I'll send you code to compute coverage with Swadesh now.

great, thank you! @LinguList

2 participants