Issues: hpcaitech/ColossalAI
Issues list
[PROPOSAL]: Publish the reproduction results on some mainstream tasks (both image and text).
enhancement
New feature or request
#2831
opened Feb 20, 2023 by
flymark2010
updated Feb 21, 2023
1 task
[DOC]: Running train_prompts.sh with the colossalai_zero2 strategy shows an error: "The IPv6 network addresses of (i-7840df31, 37340) cannot be retrieved (gai error: -2 - Name or service not known)"
documentation
Improvements or additions to documentation
#2841
opened Feb 21, 2023 by
xiaodahe
updated Feb 28, 2023
[DOC]: Lack of clear examples/docs for multiple nodes with multiple GPUs
documentation
Improvements or additions to documentation
#2921
opened Feb 27, 2023 by
binmakeswell
updated Mar 2, 2023
[BUG]: Starting the titan example is too slow
bug
Something isn't working
#3039
opened Mar 7, 2023 by
joan126
updated Mar 8, 2023
[BUG]: NCCL error during GPT single-node multi-GPU training
bug
Something isn't working
#3137
opened Mar 14, 2023 by
tianxin1860
updated Mar 15, 2023
[BUG]: _all_gather_func = dist._all_gather_base \ AttributeError: module 'torch.distributed' has no attribute '_all_gather_base'
bug
Something isn't working
#3142
opened Mar 16, 2023 by
rabeisabigfool
updated Mar 16, 2023
[FEATURE]: integrate tracer
enhancement
New feature or request
#3160
opened Mar 17, 2023 by
YuliangLiu0306
updated Mar 17, 2023
5 tasks
[BUG]: gpt2 pipeline error
bug
Something isn't working
#3164
opened Mar 17, 2023 by
Xu-Kai
updated Mar 17, 2023
[BUG]: Stable diffusion v1 killed
bug
Something isn't working
#3144
opened Mar 16, 2023 by
FrankieDong
updated Mar 20, 2023
[BUG]: Multi-GPU training issue with the stablediffusion 2 example (2x 2080 Ti).
bug
Something isn't working
#3158
opened Mar 17, 2023 by
HaodiFan
updated Mar 20, 2023
[BUG]: Dreambooth inference
bug
Something isn't working
#3145
opened Mar 16, 2023 by
FrankieDong
updated Mar 21, 2023
[BUG]: InterleavedPipelineSchedule fails to run
bug
Something isn't working
#3187
opened Mar 21, 2023 by
KimmiShi
updated Mar 22, 2023
[FEATURE]: Any plan to support train_dreambooth_colossalai with train_text_encoder?
enhancement
New feature or request
#2309
opened Jan 4, 2023 by
vonchenplus
updated Mar 22, 2023
[BUG]: This happened when I ran Colossal after building the environment according to the instructions
bug
Something isn't working
#3175
opened Mar 19, 2023 by
dong49
updated Mar 23, 2023
[BUG]: The end-of-sequence token is not recognized
bug
Something isn't working
#3226
opened Mar 23, 2023 by
withyou1771
updated Mar 24, 2023
[DOC]: Problem with the CUDA version
documentation
Improvements or additions to documentation
#3243
opened Mar 25, 2023 by
GreenMountain-XY
updated Mar 25, 2023
[BUG]: RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss.
bug
Something isn't working
#3251
opened Mar 27, 2023 by
ybsbbw
updated Mar 27, 2023
[FEATURE]: Integrate the latest analyzer into the automatic parallelism module
enhancement
New feature or request
#3278
opened Mar 28, 2023 by
YuliangLiu0306
updated Mar 28, 2023
4 tasks
[BUG]: When running the GPT2-Gemini example, GPU memory usage is the same for single-GPU and multi-GPU parallel training?
bug
Something isn't working
#3281
opened Mar 28, 2023 by
zhangyuanscall
updated Mar 29, 2023
[BUG]: GPT2-10b Error at self.cpu_shard = torch.empty(self.shard_size, dtype=self.dtype, pin_memory=self.pin_memory)
bug
Something isn't working
#3322
opened Mar 29, 2023 by
zhangyuanscall
updated Mar 29, 2023
[BUG]: Stage 3: forward pass of the Actor after sequence generation
bug
Something isn't working
#3340
opened Mar 30, 2023 by
xyyintel
updated Mar 30, 2023
[DOC]: How to start it once installed?
documentation
Improvements or additions to documentation
#3328
opened Mar 29, 2023 by
SoftologyPro
updated Mar 30, 2023
[BUG]: How to save model when utilizing pipeline parallel?
bug
Something isn't working
#3252
opened Mar 27, 2023 by
Phoenix1327
updated Mar 30, 2023
[BUG]: RuntimeError: Error(s) in loading state_dict for GPT2LMHeadModel:
bug
Something isn't working
#3271
opened Mar 28, 2023 by
zy86603465
updated Mar 30, 2023