How to train on coco dataset ? #284
Comments
@Tingwei-Jen For now, I think you will have to write a script to convert your annotations to the VOC format in order to train.
If it helps, I created a script today that converted my own dataset format to VOC, and it wasn't too hard to do in Python. You only need a few parts of the VOC format. An example XML file of mine is this:
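(A pared-down VOC-style annotation, with made-up filename and coordinates for illustration, carrying only the fields darkflow's parser actually reads, looks roughly like this:)

```xml
<annotation>
    <filename>example.jpg</filename>
    <size>
        <width>640</width>
        <height>480</height>
    </size>
    <object>
        <name>dog</name>
        <bndbox>
            <xmin>10</xmin>
            <ymin>20</ymin>
            <xmax>100</xmax>
            <ymax>200</ymax>
        </bndbox>
    </object>
</annotation>
```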
If you compare that to the VOC dataset you'll notice that VOC contains a lot of extra fluff that darkflow doesn't need. Whilst I won't be making a conversion script (as it varies with your original dataset), I can probably help you make yours in Python. Look up ElementTree and the XML modules in Python, super helpful! Good luck!
@jubjamie Great, I'm glad you got it to work. What I had in mind for the ideal setup would be a configuration file that specifies where in the XML file the important information can be found (i.e. which fields hold the filename, image size, labels, and bounding boxes). That way, a configuration file could be created for VOC, COCO, and any other annotation format someone would want to train from, and darkflow's parsing system could remain format-agnostic. Just an idea, and not sure if it'll ever get implemented, but I thought I would put it out there.
I kind of get what you mean. The initial format of the data wouldn't really matter as long as it ended up in the darkflow format, and you don't really need much info. The tricky bit is identifying the key info in these random datasets. If it were all in XML you could just specify the original XML field and copy it across to the darkflow XML; that part is quite easy. For ANY dataset, though, you need to understand the original data layout first. If you are starting from scratch, someone on another issue found this: Very interesting stuff!
I wasn't thinking it would auto-detect the data from the configuration file, although that would be cool. The configuration file (something like a small .json file) would just tell darkflow where each attribute lives in the annotation files.
I'm confused. What would the .json file do or contain?
The .json file would contain something along these lines
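(Something like the following, sketching the VOC case; the key names and path syntax here are illustrative, not an implemented darkflow feature:)

```json
{
  "annotation_format": "xml",
  "filename": "filename",
  "image_width": "size/width",
  "image_height": "size/height",
  "object": "object",
  "object_label": "name",
  "box_xmin": "bndbox/xmin",
  "box_ymin": "bndbox/ymin",
  "box_xmax": "bndbox/xmax",
  "box_ymax": "bndbox/ymax"
}
```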
The above is an example for VOC, showing the darkflow parser how to find the attributes it needs (i.e. how to traverse the XML annotation files). If the annotation files are in JSON, the same configuration style should still work for traversing the nested dictionaries. If someone wanted to use a different annotation format, all they would need to change is the .json file to tell darkflow how to find the necessary values in their format.

Maybe this is a bad idea, but I thought it would make things a little more versatile between format types, without having one canonical format (i.e. VOC) that everything must be converted to (it keeps things simpler by avoiding conversion). No XML file is returned; this is not a conversion script. The current code would be replaced by code that reads the .json configuration file and traverses the annotation files as it specifies, loading everything into memory like it already does. The parsing script is currently hardcoded for VOC (it only reads the VOC annotation format); this method would let it read any JSON or XML annotation format, provided the configuration file properly tells it how to traverse the files. Does that make sense?
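The parsing scheme described above can be sketched in a few lines of Python. Everything here is hypothetical, not darkflow's actual API: the mapping keys, the function name, and the record layout are illustrative, assuming ElementTree-style path lookups for each configured field.

```python
# Sketch of a mapping-driven (format-agnostic) annotation parser.
# MAPPING plays the role of the proposed .json configuration file:
# it maps logical field names to paths inside each annotation file.
import xml.etree.ElementTree as ET

# Hypothetical mapping for VOC-style XML (would be loaded from .json).
MAPPING = {
    "filename": "filename",
    "width": "size/width",
    "height": "size/height",
    "objects": "object",        # repeated element, one per box
    "label": "name",
    "xmin": "bndbox/xmin",
    "ymin": "bndbox/ymin",
    "xmax": "bndbox/xmax",
    "ymax": "bndbox/ymax",
}

def parse_annotation(xml_text, mapping):
    """Parse one annotation straight into memory; no file is written."""
    root = ET.fromstring(xml_text)
    record = {
        "filename": root.findtext(mapping["filename"]),
        "width": int(root.findtext(mapping["width"])),
        "height": int(root.findtext(mapping["height"])),
        "boxes": [],
    }
    # Only the mapping knows the layout, so swapping annotation
    # formats means swapping the mapping, not the parser.
    for obj in root.findall(mapping["objects"]):
        record["boxes"].append({
            "label": obj.findtext(mapping["label"]),
            "xmin": int(obj.findtext(mapping["xmin"])),
            "ymin": int(obj.findtext(mapping["ymin"])),
            "xmax": int(obj.findtext(mapping["xmax"])),
            "ymax": int(obj.findtext(mapping["ymax"])),
        })
    return record
```

A JSON-annotation variant would walk nested dicts with the same kind of path strings instead of calling ElementTree.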
I see what you mean. I was thinking of doing it as a kind of API where you just pass the file and the mappings and it generates the corrected files for you. I guess you could do it with a .json file that you load in as a translator, too. However, considering that your images also need to be put in a certain place (and be named correctly), it might be better in the long run to have something that creates the files, so that you can share them or reuse them without having to run the conversion every time you train.
@jubjamie Your method should work too, although I think it's a little more complex, and I still don't think we're quite on the same page. I'm not proposing any method that uses conversion. Right now darkflow parses the XML annotation files each time you train, with a script that can only traverse VOC annotation XML. I'm proposing that a configuration file specify how darkflow should parse the annotation files instead. It would be no slower than the current method, and it would be faster than your proposed method, since there is no intermediate step. In your method (if I understand it correctly), your original annotation files are parsed and saved in VOC format, and then darkflow parses the VOC files and loads the data into memory. My method removes the middle step: the original annotation files are parsed directly into memory by darkflow, as specified by the .json configuration file. Does this make sense? 😄 If you wanted to share your annotation files, all you would have to do is share them along with your small .json configuration file that tells darkflow how to parse them. That's it.
Yes, I did understand your method. I just know that it would be quicker for me to implement one that makes a file.
@abagshaw Thanks a lot, I will find another solution to this problem 😄
I know it's an old post, but you can convert those JSON annotations to XML using something like this:
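(A minimal sketch of such a converter, assuming the public COCO JSON schema for the input; the output field layout is a guess at a VOC-style file, not darkflow's spec, and `coco_to_voc` is a made-up name:)

```python
# Group COCO-style annotations per image and emit one VOC-style XML
# string per image. Load the input dict with json.load(open(path)).
import xml.etree.ElementTree as ET

def coco_to_voc(coco):
    """Yield (image filename, VOC-style XML string) pairs."""
    images = {img["id"]: img for img in coco["images"]}
    cats = {c["id"]: c["name"] for c in coco["categories"]}
    per_image = {}
    for ann in coco["annotations"]:
        per_image.setdefault(ann["image_id"], []).append(ann)
    for img_id, anns in per_image.items():
        img = images[img_id]
        root = ET.Element("annotation")
        ET.SubElement(root, "filename").text = img["file_name"]
        size = ET.SubElement(root, "size")
        ET.SubElement(size, "width").text = str(img["width"])
        ET.SubElement(size, "height").text = str(img["height"])
        for ann in anns:
            x, y, w, h = ann["bbox"]  # COCO bbox is [x, y, width, height]
            obj = ET.SubElement(root, "object")
            ET.SubElement(obj, "name").text = cats[ann["category_id"]]
            box = ET.SubElement(obj, "bndbox")
            ET.SubElement(box, "xmin").text = str(int(x))
            ET.SubElement(box, "ymin").text = str(int(y))
            ET.SubElement(box, "xmax").text = str(int(x + w))
            ET.SubElement(box, "ymax").text = str(int(y + h))
        yield img["file_name"], ET.tostring(root, encoding="unicode")
```

Note the corner conversion: COCO stores boxes as (x, y, width, height) while VOC stores (xmin, ymin, xmax, ymax).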
hi,
I can train on the Pascal VOC dataset, but I don't know how to train on COCO.
The annotation formats are different: one is XML, the other is JSON.
Does darkflow only read the XML annotation format?
Any suggestions?