Merlin: HugeCTR

v3.0

HugeCTR is a GPU-accelerated recommender framework designed to distribute training across multiple GPUs and nodes and estimate Click-Through Rates (CTRs). HugeCTR supports model-parallel embedding tables and data-parallel neural networks and their variants such as Wide and Deep Learning (WDL), Deep Cross Network (DCN), DeepFM, and Deep Learning Recommendation Model (DLRM). HugeCTR is a component of NVIDIA Merlin Open Beta, which is used to build large-scale deep learning recommender systems. For additional information, see HugeCTR User Guide.
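The hybrid parallelism described above can be sketched in miniature: embedding rows are sharded across GPUs (model parallelism), while each GPU would hold a full replica of the dense network (data parallelism). The sketch below is plain illustrative Python, not HugeCTR code; the device count, table size, and round-robin sharding scheme are made up for the example.

```python
# Toy illustration of a model-parallel embedding table (NOT HugeCTR code).
# Rows are sharded round-robin across hypothetical devices; in HugeCTR the
# dense network would additionally be replicated on every device.

NUM_GPUS = 4       # hypothetical device count
VOCAB_SIZE = 1000  # hypothetical embedding table size

# Each "GPU" owns a shard of the embedding table.
shards = {gpu: {} for gpu in range(NUM_GPUS)}
for row in range(VOCAB_SIZE):
    shards[row % NUM_GPUS][row] = [row]  # toy 1-d "embedding vector"

def lookup(row):
    # A lookup is routed to the device that owns the row; in a real
    # model-parallel embedding, results are then exchanged so every
    # device sees the embeddings for the full batch.
    return shards[row % NUM_GPUS][row]

print(lookup(42))  # [42]
```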

Design Goals:

  • Fast: HugeCTR is a speed-of-light CTR model framework that can outperform general-purpose deep learning frameworks such as TensorFlow (TF).
  • Efficient: HugeCTR provides the essentials so that you can efficiently train your CTR model.
  • Easy: Regardless of whether you are a data scientist or machine learning practitioner, we've made it easy for anybody to use HugeCTR.


Core Features

HugeCTR supports a wide variety of features. To learn about our latest enhancements, see our release notes.

Getting Started

If you'd like to quickly train a model using the Python interface, follow these steps:

  1. Start an NGC container with your local host directory mounted (/your/host/dir) by running the following command:

    docker run --runtime=nvidia --rm -v /your/host/dir:/your/container/dir -w /your/container/dir -u $(id -u):$(id -g) -it nvcr.io/nvidia/merlin/merlin-training:0.5
    

    NOTE: The contents of /your/host/dir on the host appear inside the container as /your/container/dir, which is also your starting directory.

  2. Activate the merlin conda environment by running the following command:

    source activate merlin
    
  3. Inside the container, copy the DCN configuration file to your mounted directory (/your/container/dir).

    This configuration file specifies the DCN model architecture and its optimizer. When you use the Python interface, the solver clause within the configuration file is ignored; the solver is configured in Python instead.

  4. Generate a synthetic dataset based on the configuration file by running the following command:

    ./data_generator --config-file dcn.json --voc-size-array 39884,39043,17289,7420,20263,3,7120,1543,39884,39043,17289,7420,20263,3,7120,1543,63,63,39884,39043,17289,7420,20263,3,7120,1543 --distribution powerlaw --alpha -1.2
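The --distribution powerlaw flag makes the generated category IDs long-tailed: a few IDs appear very often and most appear rarely, as in real click logs. One common reading of the alpha parameter, sketched below in plain Python (an illustration, not the data generator's actual sampling code), is that an ID's frequency is proportional to its rank raised to alpha:

```python
# Illustrative sketch of a power-law frequency profile (alpha = -1.2).
# This mimics the long-tailed ID frequencies that a power-law
# distribution produces; it is not the data generator's own code.

def powerlaw_weights(voc_size, alpha=-1.2):
    # Weight of the ID at rank r (1-based) is proportional to r ** alpha.
    return [(rank + 1) ** alpha for rank in range(voc_size)]

weights = powerlaw_weights(5)
# With a negative alpha, the head of the distribution dominates:
# weights decrease monotonically with rank.
```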
    
  5. Write a simple Python script that uses the hugectr module, as shown here:

    # train.py
    import sys
    import hugectr
    from mpi4py import MPI  # importing mpi4py initializes MPI, which HugeCTR expects

    def train(json_config_file):
        # Solver: global batch sizes and the active GPUs (one list per node).
        solver = hugectr.CreateSolver(batchsize_eval = 16384,
                                      batchsize = 16384,
                                      vvgpu = [[0,1,2,3,4,5,6,7]],
                                      repeat_dataset = True)
        # Data reader: file lists for the training and evaluation data.
        reader = hugectr.DataReaderParams(data_reader_type = hugectr.DataReaderType_t.Norm,
                                          source = ["./criteo_data/file_list.txt"],
                                          eval_source = "./criteo_data/file_list_test.txt",
                                          check_type = hugectr.Check_t.Sum)
        optimizer = hugectr.CreateOptimizer(optimizer_type = hugectr.Optimizer_t.Adam,
                                            update_type = hugectr.Update_t.Global)
        # Build the model graph from the DCN JSON configuration and train it.
        model = hugectr.Model(solver, reader, optimizer)
        model.construct_from_json(graph_config_file = json_config_file, include_dense_network = True)
        model.compile()
        model.summary()
        model.fit(max_iter = 12000, display = 200, eval_interval = 1000, snapshot = 10000, snapshot_prefix = "dcn")

    if __name__ == "__main__":
        json_config_file = sys.argv[1]
        train(json_config_file)
    
    

    NOTE: Update the vvgpu (the active GPUs), batchsize, and batchsize_eval parameters according to your GPU system.
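As a rough sizing aid when adjusting those parameters: the global batchsize is divided across the active GPUs listed in vvgpu, so it helps to keep it divisible by the total GPU count. The helper below is a hypothetical illustration of that relationship, not part of the hugectr API:

```python
# Hypothetical helper: how the global batch relates to the vvgpu layout.
def per_gpu_batch(batchsize, vvgpu):
    # vvgpu is a list of per-node GPU lists, e.g. [[0,1,2,3,4,5,6,7]].
    num_gpus = sum(len(node_gpus) for node_gpus in vvgpu)
    assert batchsize % num_gpus == 0, "keep batchsize divisible by the GPU count"
    return batchsize // num_gpus

print(per_gpu_batch(16384, [[0, 1, 2, 3, 4, 5, 6, 7]]))  # 2048
```

For example, moving the script above to a single-GPU machine (vvgpu = [[0]]) leaves any batchsize valid, while an 8-GPU layout needs a multiple of 8.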

  6. Train the model by running the following command:

    python train.py dcn.json
    

For additional information, see the HugeCTR User Guide.

Support and Feedback

If you encounter any issues or have questions, please file an issue in this repository so that we can provide you with the necessary resolutions and answers. To help us advance the Merlin/HugeCTR roadmap, we encourage you to share the details of your recommender system pipeline through this survey.
