User data migration #1207

Open
Gozala opened this issue Nov 30, 2023 · 7 comments

@Gozala
Contributor

Gozala commented Nov 30, 2023

We should give users a way to migrate their data from the old system to the new one. This implies moving store and upload records. All the stored CARs should already be in the system, so users will not need to download and reupload anything; store/add would just succeed right away.

We currently store the records for each upload as a CAR, regardless of what the user sent us; see:

https://github.com/web3-storage/web3.storage/blob/main/packages/api/src/upload.js#L51-L53
https://github.com/web3-storage/web3.storage/blob/main/packages/api/src/car.js#L160-L173

We do not store the CAR CID in the DB, but we encode it in the backupURLs and should be able to derive it from them.
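A minimal sketch of that derivation, under loudly stated assumptions: it assumes the last path segment of a backup URL is the unprefixed base32 multihash of the CAR followed by `.car`, which is not confirmed in this thread and should be checked against the linked `car.js`. The function name `carCidFromBackupUrl` and the sample URL shape are hypothetical. The only fixed parts are the CIDv1 layout (version byte, varint codec, multihash) and the CAR multicodec `0x0202`.

```typescript
const ALPHABET = "abcdefghijklmnopqrstuvwxyz234567";

// Decode unpadded lowercase RFC 4648 base32 into bytes.
function base32Decode(text: string): Uint8Array {
  const out: number[] = [];
  let bits = 0;
  let buffer = 0;
  for (let i = 0; i < text.length; i++) {
    const value = ALPHABET.indexOf(text.charAt(i).toLowerCase());
    if (value < 0) throw new Error("invalid base32 character: " + text.charAt(i));
    buffer = (buffer << 5) | value;
    bits += 5;
    if (bits >= 8) {
      bits -= 8;
      out.push((buffer >> bits) & 0xff);
      buffer &= (1 << bits) - 1; // keep only the bits not yet emitted
    }
  }
  return Uint8Array.from(out);
}

// Encode bytes as unpadded lowercase RFC 4648 base32.
function base32Encode(bytes: Uint8Array): string {
  let out = "";
  let bits = 0;
  let buffer = 0;
  for (let i = 0; i < bytes.length; i++) {
    buffer = (buffer << 8) | bytes[i];
    bits += 8;
    while (bits >= 5) {
      bits -= 5;
      out += ALPHABET.charAt((buffer >> bits) & 31);
      buffer &= (1 << bits) - 1;
    }
  }
  if (bits > 0) out += ALPHABET.charAt((buffer << (5 - bits)) & 31);
  return out;
}

// ASSUMPTION: the URL ends in "<base32 multihash>.car".
function carCidFromBackupUrl(url: string): string {
  const path = url.replace(/[?#].*$/, "");
  const fileName = path.slice(path.lastIndexOf("/") + 1);
  if (fileName.slice(-4) !== ".car") throw new Error("not a .car backup URL");
  const multihash = base32Decode(fileName.slice(0, -4));
  // CIDv1 = version (0x01) + varint codec (0x0202 -> 0x82 0x04) + multihash.
  const cid = new Uint8Array(3 + multihash.length);
  cid.set([0x01, 0x82, 0x04], 0);
  cid.set(multihash, 3);
  return "b" + base32Encode(cid); // "b" is the multibase prefix for base32
}
```

If the assumption about the key layout holds, a sha2-256 multihash yields a CID with the familiar `bagbaiera` prefix.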

Unfortunately, our API does not return backupURLs either, so it would be impossible to do everything on the client side. We could, however, amend the old API with an additional shards field that returns the CAR CIDs. If we do that, we could then have a client script that fetches all of a user's uploads using their JWT token and derives store/add and upload/add invocations from them.

Note that in the old system uploads had names, but we don't have those in the new system, not yet anyway. However, we could stick names into invocation facts (like we do with space names) in order to retain this information in case we'll need it in the future.
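To make the client-script idea concrete, here is a rough sketch of the derivation step only. Everything about the legacy response shape is an assumption: the `shards` field does not exist yet, and the per-shard `size` is hypothetical. The output is a list of plain descriptors, not real signed UCAN invocations, and `planMigration` is an illustrative name. Upload names are carried along as facts on the upload/add entry, as suggested above.

```typescript
// ASSUMPTION: shape of a legacy upload once a `shards` field is added.
interface LegacyShard { cid: string; size: number }
interface LegacyUpload { cid: string; name?: string; shards: LegacyShard[] }

// Plain description of an invocation to issue; not a signed UCAN.
interface InvocationPlan {
  can: "store/add" | "upload/add";
  with: string; // space DID
  nb: { [key: string]: unknown };
  facts: { [key: string]: unknown }[];
}

function planMigration(space: string, uploads: LegacyUpload[]): InvocationPlan[] {
  const plans: InvocationPlan[] = [];
  const seen: { [cid: string]: boolean } = {};
  for (const upload of uploads) {
    for (const shard of upload.shards) {
      // A CAR shared by several uploads only needs one store/add.
      if (seen[shard.cid]) continue;
      seen[shard.cid] = true;
      plans.push({
        can: "store/add",
        with: space,
        nb: { link: shard.cid, size: shard.size },
        facts: [],
      });
    }
    plans.push({
      can: "upload/add",
      with: space,
      nb: { root: upload.cid, shards: upload.shards.map((s) => s.cid) },
      // Keep the legacy name as a fact so it is not lost.
      facts: upload.name ? [{ name: upload.name }] : [],
    });
  }
  return plans;
}
```

Emitting each upload's store/add entries before its upload/add keeps the ordering safe if the plan is executed sequentially.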

@Gozala
Contributor Author

Gozala commented Nov 30, 2023

We should also align all exports with #1018.

@Gozala
Contributor Author

Gozala commented Nov 30, 2023

Since we have to do some work on the backend anyway, I wonder if it would be better to instead implement just an /export API endpoint that returns a CAR representing the user's space, as per #1018.

On the w3up side we can implement store/import and upload/import capabilities, to which the user could then pass the exported CAR in order to import all the entries from the legacy system.

@Gozala
Contributor Author

Gozala commented Nov 30, 2023

Yet another option would be to implement a /migrate endpoint that takes a UCAN delegation with store/* and upload/* capabilities for the user's space. It could then create the export and do the import itself, without having to send users entries that they would then need to upload.
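As a rough illustration of the first check such an endpoint might do, the sketch below tests whether a list of delegated capabilities covers what a migration would need. It assumes the migration invokes store/add and upload/add, and that wildcard abilities match by prefix; the `Capability` shape and the `canMigrate` name are hypothetical, and a real implementation would rely on proper UCAN validation (signatures, expiry, proof chains) rather than this string matching.

```typescript
interface Capability { can: string; with: string }

// True when a granted ability such as "store/*" or "*" covers the required one.
function allows(granted: string, required: string): boolean {
  if (granted === "*" || granted === required) return true;
  if (granted.slice(-2) === "/*") {
    // "store/*" covers anything starting with "store/".
    return required.indexOf(granted.slice(0, -1)) === 0;
  }
  return false;
}

// ASSUMPTION: migration needs exactly these two abilities on the space.
const REQUIRED = ["store/add", "upload/add"];

function canMigrate(space: string, delegated: Capability[]): boolean {
  return REQUIRED.every((required) =>
    delegated.some((cap) => cap.with === space && allows(cap.can, required))
  );
}
```

A delegation carrying store/* and upload/* for the space passes this check, while one carrying only store/* does not.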

@vasco-santos
Contributor

vasco-santos commented Nov 30, 2023

> All the stored CARs should already be in the system, so users will not need to download and reupload anything; store/add would just succeed right away.

This is mostly true if we are still using carpark-prod-0 at that point in time. However, I think we also had old content in older buckets in S3. Therefore, we may need to consider a migration tool that queues things to eventually move them into different buckets or similar. Moving things within the same AWS region will be easy and inexpensive. But maybe we also want to put it in CF only.

I think this needs more detail on requirements and some close work with the bucket migration. For instance, this may be a good reason to punt on changing the bucket until the migration is done.

@vasco-santos
Contributor

> We do not store the CAR CID in the DB, but we encode it in the backupURLs and should be able to derive it from them.

I think we do not have the backup URL in all places. This was even problematic for the migration, but once we have migrated all CARs to R2, we will probably have a mapping of rootCid to CARs that we can rely on.

@vasco-santos
Contributor

> Note that in the old system uploads had names, but we don't have those in the new system, not yet anyway. However, we could stick names into invocation facts (like we do with space names) in order to retain this information in case we'll need it in the future.

It would be great to give users that info as a dump. And note that this kind of metadata should now be stored at the client level.

@vasco-santos
Contributor

Finally, I think we should definitely create a backend migration tool. (It may be worth checking the S3-to-R2 migration tool.) We could then monitor progress and keep things stable instead of going through spikes.

We could just queue things and keep users updated. This queue could even send things to Filecoin as well for renewals (given there will be no direct trigger), though that depends on some cross-team decisions first.
