Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement Request] Integrate Plato into Sedna as a backend for supporting federated learning - Phase one #116

Merged
merged 2 commits into from
Sep 8, 2021

Conversation

XinYao1994
Copy link
Contributor

@XinYao1994 XinYao1994 commented Jul 7, 2021

This is a PR for integrating Plato into Sedna to support federated learning (#50). @li-ch @baochunli @jaypume

  • We currently support Fedavg and Mistnet in Sedna via Plato.
  • We add one example for each federated learning job.

Finished

  • A tool to automatically generate Plato configuration file based on the CRD.
  • Replace the communication libs with Sedna build-in libs. S3 and asyncio are used.
  • Examples and demo presentation
    • CV: Yolo-v5 demo
  • Trainer and Estimator
  • datasource is from sedna dataset

@kubeedge-bot
Copy link
Collaborator

Welcome @XinYao1994! It looks like this is your first PR to kubeedge/sedna 🎉

@kubeedge-bot kubeedge-bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jul 7, 2021
@kubeedge-bot kubeedge-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 7, 2021
@@ -0,0 +1,8 @@
import sedna.service.server
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

useless import

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@@ -0,0 +1,68 @@
clients:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need to discuss whether this yaml file needs to be generated by the developer or just for plato compatibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a discussion with @jaypume, this Plato configuration file will be automatically generated based on the CRD.

uvicorn~=0.14.0 # BSD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems scary, since here add all dependences for all cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated


from plato.servers import mistnet

class MistnetServer(mistnet.Server):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to work directly with import plato

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but we want to keep transparency when the users are using Sedna. So we package the Plato server within the Sedna lib. Sedna's communication libs (transmitter) are under development @jaypume. After that, we will replace the communication libs with Sedna build-in libs.

@@ -130,3 +130,23 @@ def parse(self, *args, **kwargs):
return
self.x = pd.concat(x_data)
self.y = pd.concat(y_data)

# import os
# os.environ['config_file'] = '/home/work/client.yml'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is recommended to clean up unwanted codes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@@ -43,6 +42,7 @@ def __init__(self, estimator, aggregation="FedAvg"):
super(FederatedLearning, self).__init__(
estimator=estimator, config=config)
self.aggregation = ClassFactory.get_cls(ClassType.FL_AGG, aggregation)
self.transmitter = ClassFactory.get_cls(ClassType.TRANSMITTER, transmitter)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_transmitter doesn't seem to be registered in sedna.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is under development. @jaypume
In the current stage, we enable Plato in Sedna via directly call its server and client.


@abstractmethod
def compress(self): # 传输的内容可能有:weights,压缩后的weights, 特征向量,蒸馏后的数据
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code comments are best described in English.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jie Pu has updated it.

@@ -21,6 +21,58 @@
os.environ['BACKEND_TYPE'] = 'KERAS'


import torch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like the Pytorch framework used, but keras is defined in the Env variable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

surface_defect_detection is using Keras, and we will perform it in a similar way.

# self.fedavg_server = fedavg.Server(model=model)
pass

def aggregate0(self, weights, size=0):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is naming aggregate0 more recommended than aggregate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

from .base import BaseAggregation


class MistNet(BaseAggregation):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it needs to be called by the registration factory, maybe we can add the registration wrap with ClassFactory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jie Pu has updated it.

@@ -241,3 +268,17 @@ async def client_info(self, request: Request):
if client_id:
return server.get_client(client_id)
return WSClientInfoList(clients=server.client_list)

import os
os.environ['config_file'] = '/home/work/server.yml'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The import module can be add on the top.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@XinYao1994 XinYao1994 force-pushed the main branch 2 times, most recently from 8fcbfee to 4639049 Compare July 8, 2021 01:36
@kubeedge-bot kubeedge-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 24, 2021
@jaypume jaypume force-pushed the main branch 2 times, most recently from bc9291c to 8aa9a9e Compare July 27, 2021 02:18
@kubeedge-bot kubeedge-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 27, 2021
@jaypume jaypume closed this Jul 27, 2021
@kubeedge-bot kubeedge-bot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jul 27, 2021
@jaypume
Copy link
Member

jaypume commented Jul 27, 2021

/reopen

@kubeedge-bot
Copy link
Collaborator

@jaypume: Failed to re-open PR: state cannot be changed. There are no new commits on the XinYao1994:main branch.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jaypume
Copy link
Member

jaypume commented Jul 27, 2021

/reopen

@kubeedge-bot kubeedge-bot reopened this Jul 27, 2021
@kubeedge-bot
Copy link
Collaborator

@jaypume: Reopened this PR.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kubeedge-bot kubeedge-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jul 27, 2021
Create data interface for ```EDGE1_NODE```.

```shell
mkdir -p /data

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete mkdir -p /data

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Create data interface for ```EDGE2_NODE```.

```shell
mkdir -p /data

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete mkdir -p /data

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

create the directory `./data/COCO` in the host of `$CLOUD_NODE` for storing test data.
```bash
mkdir ./data
mkdir ./data/COCO

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any directory ?

Suggested change
mkdir ./data/COCO
mkdir -p /data/COCO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated


```bash
# on the cloud side
mkdir -p /model

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mkdir -p /model
mkdir /model

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest sync with mkdir -p

```bash
# on the cloud side
mkdir -p /model
mkdir -p /pretrained

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mkdir -p /pretrained
mkdir /pretrained

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

metadata:
name: "yolo-v5-model"
spec:
url: "/model/yolov5"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not /model ?

Copy link
Contributor Author

@XinYao1994 XinYao1994 Sep 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a file, replaced with /model/yolov5.pth

metadata:
name: "yolo-v5-pretrained-model"
spec:
url: "/pretrained/yolov5"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not /pretrained/yolov5.pth

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

jaypume and others added 2 commits September 8, 2021 21:06
1. add transmitter, client_choose, aggregation interface to Lib.
2. add example of how to use new added interface.

Signed-off-by: Jie Pu <pujie2@huawei.com>
Signed-off-by: XinYao1994 <xyao@cs.hku.hk>

update
Signed-off-by: XinYao1994 <xyao@cs.hku.hk>
@jaypume
Copy link
Member

jaypume commented Sep 8, 2021

/lgtm

@kubeedge-bot kubeedge-bot added the lgtm Indicates that a PR is ready to be merged. label Sep 8, 2021
@llhuii
Copy link

llhuii commented Sep 8, 2021

/approve
Thank you for this great job, @XinYao1994

@kubeedge-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: llhuii, luosiqi, MooreZheng

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubeedge-bot kubeedge-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 8, 2021
@kubeedge-bot kubeedge-bot merged commit e110ac7 into kubeedge:main Sep 8, 2021
@XinYao1994
Copy link
Contributor Author

/approve
Thank you for this great job, @XinYao1994

Thanks for the suggestions and help from all committee members.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants