Update README.md
entitize committed Sep 29, 2020
1 parent 508ac00 commit 040c233
Showing 1 changed file with 1 addition and 3 deletions.
4 changes: 1 addition & 3 deletions README.md
@@ -36,7 +36,7 @@ Download the comment data from [here](https://drive.google.com/file/d/14iroKftRk
Download the images [here](https://drive.google.com/drive/folders/1jU7qgDqU1je9Y0PMKJ_f31yXRo5uWGFm?usp=sharing).

**Option 2:**
- The `*.tsv` dataset files have an `image_url` column which contain the image urls. You can use the URLs to download the images. Only multimodal images
+ The `*.tsv` dataset files have an `image_url` column which contain the image urls. You can use the URLs to download the images.
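
As a rough illustration of using the `image_url` column directly, here is a minimal Python sketch. It is not the repository's `image_downloader.py`. The function names, the output file naming scheme, and the assumption that the file is a plain tab-separated table with a header row are all hypothetical.

```python
import csv
import os
import urllib.request


def read_image_urls(tsv_path):
    """Return the non-empty values of the `image_url` column, in file order."""
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        # Assumption: plain tab-separated rows with a header line, no quoting.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [row["image_url"] for row in reader if row.get("image_url")]


def download_images(tsv_path, out_dir, fetch=urllib.request.urlretrieve):
    """Try to download every URL into out_dir; return the URLs that failed."""
    os.makedirs(out_dir, exist_ok=True)
    failed = []
    for index, url in enumerate(read_image_urls(tsv_path)):
        # Hypothetical naming scheme: position in the URL list plus ".jpg".
        target = os.path.join(out_dir, f"{index}.jpg")
        try:
            fetch(url, target)
        except (OSError, ValueError):
            # Network errors and malformed URLs are recorded, not fatal.
            failed.append(url)
    return failed
```

Calling `download_images("some_file.tsv", "images")` would attempt each URL in turn and return the ones that could not be fetched. The `fetch` parameter exists only so the download step can be swapped out, for example in a dry run.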

For convenience, we have provided a script which will download the images for you. Please follow the instructions if you would like to use the attached script.

@@ -59,8 +59,6 @@ $ python image_downloader.py file_name

Please note that results in the paper are based on multimodal samples only (samples that have both text and image). In our paper, only samples that have both image and text were used for the baseline experiments and error analysis. Thus, if you would like to compare against the results in the paper, use the samples in the `multimodal_only_samples` folder.

- Please ignore the first four columns in the `.tsv` files.

If there are `Unnamed`... columns, you can ignore or get rid of them. Use the `clean_title` column to get filtered text data.
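
One possible way to drop the `Unnamed` columns and pull out the filtered text is sketched below, using only the Python standard library. The function name `load_clean_rows` is hypothetical, and the sketch assumes a plain tab-separated file with a header row that includes `clean_title`.

```python
import csv


def load_clean_rows(tsv_path):
    """Read a .tsv file and drop any column whose name starts with 'Unnamed'."""
    rows = []
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        # Assumption: plain tab-separated rows with a header line, no quoting.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            rows.append({
                name: value
                for name, value in row.items()
                # Skip empty or missing header names as well as 'Unnamed...' ones.
                if name and not name.startswith("Unnamed")
            })
    return rows
```

With the rows loaded, the filtered text can be collected with `[row["clean_title"] for row in load_clean_rows(path)]`.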

`comments.tsv` consists of comments made by Reddit users on submissions in the entire released dataset. Use the `submission_id` column to identify which submission the comment is associated with. Note that one submission can have zero, one, or multiple comments.
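
The `submission_id` lookup described above can be sketched as a simple grouping step. This is only an illustration: the function name `group_comments` is hypothetical, and the sketch assumes `comments.tsv` is tab-separated with a header row and standard CSV-style quoting for any multi-line comment text.

```python
import csv
from collections import defaultdict


def group_comments(comments_path):
    """Map each submission_id to the list of comment rows that reference it."""
    grouped = defaultdict(list)
    with open(comments_path, newline="", encoding="utf-8") as handle:
        # Assumption: tab-separated with a header line and default quoting.
        for row in csv.DictReader(handle, delimiter="\t"):
            grouped[row["submission_id"]].append(row)
    # Return a plain dict so that lookups of unknown ids do not create entries.
    return dict(grouped)
```

Because a submission can have no comments at all, look up ids with `grouped.get(submission_id, [])` rather than indexing directly.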
