Copyright of the data remains with its original owners. Do not use this data for commercial purposes.
- Download the PDFs here: https://f000.backblazeb2.com/file/malay-dataset/crawler/academia/academia-pdf.zip
- Download the text extracted from the PDFs with Tika here: https://f000.backblazeb2.com/file/malay-dataset/crawler/academia/academia-pdf.json
- 12-09-2020, https://f000.backblazeb2.com/file/malay-dataset/crawler/academia/academia.edu.zip
- 15-09-2020, https://f000.backblazeb2.com/file/malay-dataset/crawler/academia/academia.edu-v2.zip
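
A minimal sketch of fetching and inspecting the Tika-extracted JSON file listed above, using only the Python standard library. The layout of the JSON is not documented here, so the code only reports the top-level type and size rather than assuming any fields. The helper names (`download`, `load_json`, `summarize`, `main`) and the local file name are hypothetical, chosen for illustration.

```python
import json
import urllib.request
from pathlib import Path

# URL copied from the download list above.
TEXT_URL = (
    "https://f000.backblazeb2.com/file/malay-dataset/"
    "crawler/academia/academia-pdf.json"
)


def download(url: str, dest: Path) -> Path:
    """Download `url` to `dest`, skipping the request if `dest` already exists."""
    if not dest.exists():
        urllib.request.urlretrieve(url, str(dest))
    return dest


def load_json(path: Path):
    """Parse a UTF-8 JSON file and return the decoded object."""
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def summarize(data) -> str:
    """Describe the top-level structure without assuming any schema."""
    size = len(data) if isinstance(data, (list, dict, str)) else "n/a"
    return f"{type(data).__name__} with {size} entries"


def main() -> None:
    # Assumption: the file is saved under this name in the working directory.
    path = download(TEXT_URL, Path("academia-pdf.json"))
    print(summarize(load_json(path)))
```

Calling `main()` downloads the file once and prints a one-line summary. Check the actual structure before relying on any particular keys.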
@misc{Malay-Dataset,
note = {We gather Bahasa Malaysia corpus!, Crawling Academia.edu},
author = {Husein, Zolkepli},
title = {Malay-Dataset},
year = {2018},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/huseinzol05/malay-dataset/tree/master/crawl/pdf}}
}