
chunkserver: add crc #377

Merged: 1 commit merged into opencurve:master on Jun 25, 2021

Conversation

@SeanHai (Contributor) commented Jun 3, 2021

What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:

What is changed and how it works?

What's Changed:

How it Works:

Side effects(Breaking backward compatibility? Performance regression?):

Check List

  • Relevant documentation/comments is changed or added
  • I acknowledge that all my contributions will be made under the project's license

@SeanHai SeanHai force-pushed the curve_discard_flatten branch 5 times, most recently from e18dc81 to 05215b8 Compare June 6, 2021 07:11
// whether the current copyset is being scanned
optional bool scaning = 8;
// timestamp of the last successful scan (seconds)
optional uint64 lastScan = 9;
Contributor:

@Wine93 has modified this to lastScanSec

Contributor Author:

done

// scan status
bool scaning_;
// last scan time
uint64_t lastScan_;
Contributor:

add its unit to the variable name

Contributor Author:

done

Peer peer;
LogicPoolID poolId = conf.logicalpoolid();
CopysetID copysetId = conf.copysetid();
int ret = 0;
// if the copyset has a conf change in progress, skip the scan and retry later
if (ret = copyset->GetConfChange(&type,
&confTemp, &peer) != 0) {
if (ret = copyset->GetConfChange(&type, &tmpConf, &peer) != 0) {
Contributor:

Because != has higher precedence than =, copyset->GetConfChange(&type, &tmpConf, &peer) != 0 will be evaluated first and its result assigned to ret. Is this the order you expect?

Contributor Author:

fixed
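
For reference, the parenthesized form the reviewer is pointing at could look like the sketch below (names follow the snippet above; this is illustrative, not the exact patch):

// Parenthesize the assignment so ret receives GetConfChange()'s return value,
// and the comparison against 0 happens on that value afterwards.
if ((ret = copyset->GetConfChange(&type, &tmpConf, &peer)) != 0) {
    // conf change in progress: skip this scan and retry later
    return;
}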

<< conf.configchangeitem().address()
<< "to copyset "
<< ToGroupIdStr(poolId, copysetId);
scanMan_->Enqueue(poolId, copysetId);
}
}
break;

case curve::mds::heartbeat::CANCEL_SCAN_PEER:
{
// todo Abnormal scenario
Contributor:

@baijiaruo What kind of scenarios should be handled here?

const butil::IOBuf& data = cntl_->request_attachment();
if (0 == Propose(request_, &data)) {
doneGuard.release();
if (nullptr == cntl_) {
Contributor:

you can simplify this to

if (0 == Propose(request_, cntl_ ? &cntl_->request_attachment() : nullptr)) {
    doneGuard.release();
}

Contributor Author:

done

request.offset(), request.size(), readBuffer);
SendScanMapToLeader();
return;
if (CSErrorCode::Success == ret) {
Contributor:

forgot to free readBuffer

Contributor Author:

done

scanMap->set_crc(crc);
scanMap->set_offset(request.offset());
scanMap->set_len(request.size());
// send rcp to leader
Contributor:

typo: rpc

Contributor Author:

done

brpc::Channel channel;
if (channel.Init(peer_.addr, NULL) != 0) {
LOG(ERROR) << "Fail to init channel to chunkserver for send scanmap: "
<< peer_;
Contributor:

You forgot to free scanMap. By the way, FollowScanMapRequest::mutable_scanmap() can allocate a scanmap and return a pointer to it; its memory is managed by FollowScanMapRequest, so you don't have to new or delete it.

Contributor Author:

protobuf's set_allocated_xxx() will take care of deleting the object, as long as you do not call release_*().

Contributor:

You are right, but if the channel init fails, the function returns directly here and scanMap is left undeleted. Alternatively, you could move that code to after the channel init (below brpc::Channel channel).
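
A minimal sketch of the ownership pattern described above, assuming the ScanMap fields shown earlier in this review:

FollowScanMapRequest scanMapRequest;
// mutable_scanmap() lazily allocates the ScanMap submessage and returns a
// pointer owned by scanMapRequest, so no explicit new or delete is needed.
ScanMap* scanMap = scanMapRequest.mutable_scanmap();
scanMap->set_crc(crc);
scanMap->set_offset(request.offset());
scanMap->set_len(request.size());
// scanMapRequest frees the submessage when it goes out of scope.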


stub.FollowScanMap(&cntl, &scanMapRequest, &scanMapResponse, nullptr);
if (cntl.Failed()) {
if (cntl.ErrorCode() == EHOSTDOWN ||
Contributor:

Should we add a retry policy on timeout?

ScanMap LScanMap = job->localMap;
ScanMap FScanMap1 = job->repMap[0];
ScanMap FScanMap2 = job->repMap[1];
if (!(LScanMap.logicalpoolid() == FScanMap1.logicalpoolid() &&
Contributor:

It's better to define an operator== for ScanMap.

Contributor Author:

use the protobuf MessageDifferencer::Equals
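
A short sketch of the suggested comparison using protobuf's MessageDifferencer (variable names follow the snippet above):

#include <google/protobuf/util/message_differencer.h>

using google::protobuf::util::MessageDifferencer;

// Field-by-field comparison of the leader's scanmap against both followers'.
bool equal = MessageDifferencer::Equals(LScanMap, FScanMap1) &&
             MessageDifferencer::Equals(LScanMap, FScanMap2);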

@SeanHai SeanHai force-pushed the curve_discard_flatten branch 2 times, most recently from d890401 to 39db517 Compare June 8, 2021 03:37
@@ -43,8 +43,8 @@ enum CHUNK_OP_TYPE {
CHUNK_OP_CREATE_CLONE = 5; // 创建clone chunk
CHUNK_OP_RECOVER = 6; // 恢复clone chunk
CHUNK_OP_PASTE = 7; // paste chunk 内部请求
CHUNK_OP_SCAN = 8; // scan oprequest
CHUNK_OP_UNKNOWN = 9; // 未知 Op
CHUNK_OP_UNKNOWN = 8; // 未知 Op
Contributor:

Please write these comments in English.

Contributor Author:

done

@@ -72,6 +72,7 @@ struct HeartbeatOptions {
uint32_t intervalSec;
uint32_t timeout;
CopysetNodeManager* copysetNodeManager;
ScanManager* scanManager;
Contributor:

why not unique ptr?

Contributor Author:

The scanManager is also used in ScanService, and since scanManager is a member variable of ChunkServer there is no risk of a memory leak.

@@ -66,8 +66,8 @@ void ChunkOpRequest::Process() {
* If propose succeeds, the request has been successfully handed to raft,
* so done_ must not be called; we only need to return early if propose fails.
*/
const butil::IOBuf& data = cntl_->request_attachment();
if (0 == Propose(request_, &data)) {
if (0 == Propose(request_, cntl_ ? &cntl_->request_attachment() :
Contributor:

This process makes a memory copy.

Contributor Author:

The original code has a bug when cntl_ is nullptr, and the change does not add an extra memory copy.

uint32_t crc = 0;
char *readBuffer = nullptr;
size_t size = request_->size();
readBuffer = new(std::nothrow)char[size];
Contributor:

Why not reuse ReadChunkRequest::ReadChunk(), or turn ReadChunk into a common function?

Contributor Author:

Because ReadChunkRequest::ReadChunk() does more than just read data; for example, it sets the response, but there is no response in ScanChunkRequest::ApplyFromLog. We also need to calculate the crc from the read buffer, so void ReadChunk() would need large modifications and the logic would become unclear.

<< " offset: " << request.offset()
<< " read len :" << size
<< " datastore return: " << ret;
} else if (CSErrorCode::InternalError == ret) {
Contributor:

FATAL will make the chunkserver exit. Under what circumstances should this be fatal?

Contributor Author:

When a read fails in the datastore.

bool done = false;
auto job = GetJob(key);
if (nullptr == job) {
Contributor:

LOG here

Contributor Author:

done

}

void ScanManager::GenScanJob() {
void ScanManager::GenScanJob(ScanKey key) {
Contributor:

GenScanJobsWithoutLock?

Contributor Author:

Do you mean changing the function name to 'GenScanJobsWithoutLock'?


while (!done) {
switch (job_.type) {
switch (job->type) {
Contributor:

Encapsulate this switch as ScanType GenScanJob(ScanKey key)?

Contributor Author:

GenScanJob now mainly consists of this switch. Why encapsulate another function to do the same thing?

// check there are three scanmap
if (job->repMap.size() != 2 || job->localMap.logicalpoolid() == 0) {
LOG(ERROR) << "Compare scanmap failed,"
<< " there are not three scanmaps."
Contributor:

"there are not three scanmaps" -> "The information of the replication group is not collected completely"?

Contributor Author:

Yes, but I found a bug in the design and will improve this later.

@SeanHai SeanHai force-pushed the curve_discard_flatten branch 4 times, most recently from af0bbbf to 248a3f5 Compare June 10, 2021 06:00
@SeanHai SeanHai force-pushed the curve_discard_flatten branch 4 times, most recently from d311f3e to e4e5ff4 Compare June 11, 2021 04:17
@@ -62,6 +62,8 @@ message ChunkRequest {
optional string location = 11; // for CreateCloneChunk
optional string cloneFileSource = 12; // for write/read
optional uint64 cloneFileOffset = 13; // for write/read
optional uint64 sendScanMapTimeoutMs = 14; // for scan chunk
optional uint32 sendScanMapRetryTimes= 15; // for scan chunk
Contributor:

Do those two items correspond to

copyset.scan_timeout_ms=1000
copyset.scan_retry_times=3

in the configuration? If so, why not read them directly from the configuration?

Contributor Author:

The leader reads them from the config and sets them in the chunkOpRequest pushed to raft; the follower then uses them in onApplyFromLog when it sends the scanmap rpc to the leader.

Contributor:

Yep, I think the follower can just use those configurations from the config file rather than from the rpc.


public:
brpc::Controller* cntl_;
FollowScanMapResponse *response_;
Contributor:

The order of the member variables differs from the constructor's init list.

Contributor Author:

OK, I'll keep them consistent.

@@ -52,6 +53,35 @@ class ChunkClosure : public braft::Closure {
std::shared_ptr<ChunkOpRequest> request_;
};

class ScanChunkClosure : public google::protobuf::Closure {
public:
explicit ScanChunkClosure(ChunkRequest *request, ChunkResponse *response) :
Contributor:

No need for explicit here, because it has two parameters.

Contributor Author:

done


class SendScanMapClosure : public google::protobuf::Closure {
public:
explicit SendScanMapClosure(FollowScanMapResponse *response,
Contributor:

ditto

Contributor Author:

done

copyset.scan_interval_sec=5
copyset.scan_size_byte=131072
copyset.scan_timeout_ms=1000
copyset.scan_retry_times=3
Contributor:

What is the purpose of chunkserver.conf.0, chunkserver.conf.1, and chunkserver.conf.2?

Contributor Author:

CI uses these configs to start the chunkservers.

::google::protobuf::Closure *done) {
scanManager_->SetScanJobType(ScanType::BuildMap);
scanManager_->GenScanJob();
::google::protobuf::Closure *done) {
Contributor:

done is not run in this function.

Contributor Author:

fixed

// construct scanmap of this scan unit
ScanMapKey key(job->currentChunkId, currentOffset);
std::shared_ptr<ScanMapUnit> scanmap = std::make_shared<ScanMapUnit>();
job->maps.emplace(key, scanmap);
Contributor:

job->maps is accessed and modified concurrently, so it should be protected by a lock.

Contributor Author:

Yes, I will add a lock to protect job->maps.
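
A minimal sketch of that locking, assuming a hypothetical mutex member on the job (the review later mentions a job->mapsLock with WRLock semantics; a plain std::mutex is used here for illustration):

#include <memory>
#include <mutex>

// Guard every access to job->maps with the same lock so the WaitMap loop
// and ScanChunkReqProcess cannot race on the container.
{
    std::lock_guard<std::mutex> guard(job->mapsMutex);  // mapsMutex is an assumed member
    ScanMapKey key(job->currentChunkId, currentOffset);
    job->maps.emplace(key, std::make_shared<ScanMapUnit>());
}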

return;
void ScanManager::DealFollowerScanMap(const FollowScanMapRequest &request,
FollowScanMapResponse *response) {
ScanMap scanMap = request.scanmap();
Contributor:

Define it as a const reference to avoid a memory copy.

Contributor Author:

done

case ScanType::WaitMap:
{
auto iter = job->maps.begin();
while (iter != job->maps.end()) {
Contributor:

This while loop may run concurrently with ScanChunkReqProcess.

Contributor Author:

Yes, I will add a lock to protect job->maps.

job->type = ScanType::Finish;
} else {
job->type = ScanType::NewMap;
job->iter++;
Contributor:

job->iter is incremented twice, because isCurrentJobFinish also increments it.

Contributor Author:

fixed.

if (nullptr != job) {
auto iter = job->maps.find(mapKey);
if (iter != job->maps.end()) {
iter->second->followerMap.emplace_back(scanMap);
Contributor:

followerMap should also be protected by a lock.

Contributor Author:

ScanManager::DealFollowerScanMap on the leader processes the followers' rpcs sequentially, so followerMap should not have concurrency issues here.

Contributor:

I don't think so: followers can send their CRC results concurrently, and the RPC requests can also be processed in different threads. But after you added job->mapsLock.WRLock(), this guarantees the followers' results are processed sequentially.

@SeanHai SeanHai force-pushed the curve_discard_flatten branch 2 times, most recently from 6be38e2 to 4068927 Compare June 15, 2021 01:30

<< " cntl errorCode: " << cntl_->ErrorCode()
<< " cntl error: " << cntl_->ErrorText();
}
// todo retry policy
Contributor:

Is the fix for this todo missing here?

Contributor Author:

Added a retry policy after failure.
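
A sketch of the kind of bounded retry the author describes, assuming hypothetical retryTimes_ and timeoutMs_ members on the closure (the FollowScanMap call mirrors the snippet later in this review):

void SendScanMapClosure::Run() {
    if (cntl_->Failed() && retryTimes_ > 0) {
        // Retry: spend one retry, reset the controller, and re-issue the rpc
        // with this closure so Run() fires again when it completes.
        --retryTimes_;
        cntl_->Reset();
        cntl_->set_timeout_ms(timeoutMs_);
        ScanService_Stub stub(channel_);
        stub.FollowScanMap(cntl_, request_, response_, this);
        return;
    }
    if (cntl_->Failed()) {
        LOG(ERROR) << "Send scanmap to leader rpc failed after retry,"
                   << " cntl error: " << cntl_->ErrorText();
    }
    Free();  // release owned resources and delete this closure
}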


@wu-hanqing (Contributor):

BTW, what will happen if the copyset's leader is transferred when the copyset CRC task is running?

@SeanHai SeanHai force-pushed the curve_discard_flatten branch 3 times, most recently from 6ad77fa to 360fe7a Compare June 15, 2021 07:59
@SeanHai SeanHai force-pushed the curve_discard_flatten branch 2 times, most recently from 19e13ec to daf7f81 Compare June 16, 2021 12:28
@SeanHai SeanHai force-pushed the curve_discard_flatten branch 7 times, most recently from 37814cf to b315995 Compare June 23, 2021 04:48

switch (response_->status()) {
case CHUNK_OP_STATUS_CHUNK_NOTEXIST:
LOG(ERROR) << "scan chunk failed, read chunk not exist"
Contributor:

request_->DebugString()

Contributor Author:

done

<< " offset: " << request_->offset()
<< " read len :" << request_->size();
break;
default:
Contributor:

indentation

Contributor Author:

done

}

void SendScanMapClosure::Free() {
std::unique_ptr<SendScanMapClosure> selfGuard(this);
Contributor:

why not use delete?

Contributor Author:

I think there is not much difference here.
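
For context, the two options discussed here behave the same in the simple case; a minimal sketch:

void SendScanMapClosure::Free() {
    // Equivalent to `delete this;` at the end of the function, but the
    // unique_ptr also frees the closure on any early return added later.
    std::unique_ptr<SendScanMapClosure> selfGuard(this);
    // The owned channel is released the same way (see the snippet below).
    std::unique_ptr<brpc::Channel> channelGuard(channel_);
}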

}
}

void SendScanMapClosure::Free() {
Contributor:

void SendScanMapClosure::Guard()

Contributor Author:

done

std::unique_ptr<brpc::Channel> channelGuard(channel_);
}

void SendScanMapClosure::Run() {
Contributor:

Guard()?

Contributor Author:

Changed SendScanMapClosure::Free() to SendScanMapClosure::Guard(), which will be called before exit.

// init scan model
ScanManagerOptions scanOpts;
InitScanOptions(&conf, &scanOpts);
scanOpts.copysetNodeManager = copysetNodeManager_;
Contributor:

copysetNodeManager_ = &CopysetNodeManager::GetInstance();
There is no need to pass copysetNodeManager as an option.

Contributor Author:

Yes, but if ScanManager::Init() used CopysetNodeManager::GetInstance() directly, it would be hard to mock this static function in unit tests; we would probably need another class to wrap CopysetNodeManager. So passing copysetNodeManager in the options is also fine.

if (job_.waitingNum == 0) {
job_.type = ScanType::CompareMap;
}
job->iter = job->chunkMap.find(job->currentChunkId);
Contributor:

else

Contributor Author:

done

@SeanHai SeanHai force-pushed the curve_discard_flatten branch 2 times, most recently from c95aa36 to bfd3016 Compare June 23, 2021 11:24
@@ -92,6 +92,16 @@ copyset.check_retrytimes=3
copyset.finishload_margin=2000
# internal sleep interval for the loop that checks whether copyset loading has finished
copyset.check_loadmargin_interval_ms=1000
# scan copyset interval
copyset.scan_interval_sec=5
# the size each scan 4MB
Contributor:

Why change this default value to 4MB?

Contributor Author:

Scanning takes a long time with a 128KB scan size, and tests show similar time costs for 128KB and 4MB.

ScanService_Stub stub(channel_);
stub.FollowScanMap(cntl_, request_, response_, this);
} else {
LOG(ERROR) << " Send scanmap to leader rpc failed after retry,"
Contributor:

BTW, what happens when all retried RPC requests fail? Will it wait forever?

Contributor Author:

It will not. The leader sends the next request once the previous one finishes or times out.

@SeanHai SeanHai force-pushed the curve_discard_flatten branch 2 times, most recently from 70b91e6 to 7538216 Compare June 25, 2021 01:47
@ilixiaocui ilixiaocui merged commit 2ebe8da into opencurve:master Jun 25, 2021