Just a small course project.
The data can be downloaded from here.
Folders:
- data: put all the datasets here, listed as follows:
  - kddcup.data_10_percent_corrected: the main dataset (for now)
  - kddcup.names: the names file from the dataset
- dmproject: main code folder
- tools: some small code files for things like analysing the dataset

Notice: the data/ folder is ignored in the .gitignore file because it's really large.
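
A rough sketch of how the two files in data/ could be read together. It assumes that kddcup.names starts with one line of comma-separated class labels followed by one `name: type.` line per feature, and that each data row ends with a label carrying a trailing period. The function names `parse_names` and `read_records` are made up for illustration and are not part of dmproject. The demo at the bottom uses a tiny inline sample instead of the real files.

```python
import csv
import io


def parse_names(names_text):
    """Return (labels, columns) parsed from the text of a kddcup.names file."""
    lines = [ln.strip() for ln in names_text.splitlines() if ln.strip()]
    # First line: comma-separated class labels, terminated by a period.
    labels = [lab.strip() for lab in lines[0].rstrip(".").split(",")]
    # Remaining lines: "<feature name>: <type>."
    columns = []
    for ln in lines[1:]:
        name, kind = ln.rstrip(".").split(":", 1)
        columns.append((name.strip(), kind.strip()))
    return labels, columns


def read_records(data_file, columns):
    """Yield (features, label) pairs from an open data file."""
    for row in csv.reader(data_file):
        if not row:
            continue
        features = {}
        for (name, kind), value in zip(columns, row):
            # Continuous features become floats; everything else stays a string.
            features[name] = float(value) if kind == "continuous" else value
        # The last field is the class label with a trailing period.
        yield features, row[-1].rstrip(".")


# Tiny made-up sample in the assumed format (not real dataset rows).
sample_names = "normal,smurf.\nduration: continuous.\nprotocol_type: symbolic.\n"
sample_data = "0,tcp,normal.\n2,udp,smurf.\n"

labels, columns = parse_names(sample_names)
records = list(read_records(io.StringIO(sample_data), columns))
```

For the real dataset, the same functions would be called with the contents of data/kddcup.names and an open handle on data/kddcup.data_10_percent_corrected.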
Algorithms:
- KNN method.
- Hierarchical clustering method.
- Other density-based method? (maybe)
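
To make the first item concrete, here is a minimal KNN classifier sketch using only the standard library. It is an assumption about how the method might be applied (Euclidean distance, majority vote, numeric features only), not the project's actual implementation. The name `knn_predict` and the default `k=3` are illustrative.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and return the most common label among them.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Sorting every distance is fine for a toy example but slow on a dataset of this size, so a real run would probably need a library implementation or an index structure.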
Pre-processing methods:
- normalization methods?
- feature selection? (maybe)
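
One candidate for the normalization step is min-max scaling, sketched below. Whether this or another scheme (such as z-score) gets used is still open, so treat it as an illustration only. The function name `min_max_normalize` is hypothetical, and the input is assumed to be a non-empty list of equal-length numeric rows.

```python
def min_max_normalize(rows):
    """Scale every column of `rows` into the range [0, 1]."""
    # Transpose to columns so each feature's min and max can be found.
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        normalized.append([
            # A constant column has no spread, so map it to 0.0.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return normalized
```

Scaling matters for distance-based methods like KNN because features with large raw ranges would otherwise dominate the distance.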