Current structure might hit limits #16

rickvdbosch · 2020-09-29T16:08:59Z

In the current entity setup, the repos a user contributed to, and the PRs for each of those repos, are serialized into a JSON string and stored in one Table Storage column. The maximum size of a string value in one Table Storage column is 64 KiB:

String values may be up to 64 KiB in size. Note that the maximum number of characters supported is about 32 K or less.
Source: Understanding the Table service data model - Property types

Because of this limit, the current structure might be insufficient for (very) active users.
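
To make the limit concrete, here is a minimal sketch that checks whether a serialized payload would still fit in one string column. It assumes the string is measured as UTF-16 (two bytes per character for most text), which is why 64 KiB comes down to roughly 32 K characters. The entity shape and the helper names (`sample_repos`, `fits_in_one_column`) are hypothetical and not taken from the project.

```python
import json

# 64 KiB limit for a single string property in Table Storage.
MAX_STRING_BYTES = 64 * 1024


def utf16_size(value: str) -> int:
    """Size in bytes of a string when encoded as UTF-16 (without a BOM)."""
    return len(value.encode("utf-16-le"))


def fits_in_one_column(payload: object) -> bool:
    """Serialize the payload to JSON and compare its size to the column limit."""
    serialized = json.dumps(payload)
    return utf16_size(serialized) <= MAX_STRING_BYTES


def sample_repos(pr_count: int) -> list:
    """Build a hypothetical payload: one repo with `pr_count` pull requests."""
    base = "https://github.com/Layla-P/HacktoberfestProject/pull"
    return [
        {
            "owner": "Layla-P",
            "name": "HacktoberfestProject",
            "prs": [{"id": i, "url": f"{base}/{i}"} for i in range(pr_count)],
        }
    ]


if __name__ == "__main__":
    for count in (10, 1000):
        print(count, "PRs fit in one column:", fits_in_one_column(sample_repos(count)))
```

A check like this could run before writing the entity, so an oversized payload is detected in the application instead of failing at the storage call.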

I propose to implement an alternative structure to make sure we can accommodate even the most active GitHub users. Is that OK?

Layla-P · 2020-09-30T08:35:58Z

@CrypticEngima Are you able to look into this?

CrypticEnigma00 · 2020-09-30T14:45:27Z

@rickvdbosch Currently we are not expecting usage to hit those limits, but if we do start getting that amount of traffic we can look at refactoring this. In preparation for that, could you please describe the changes you propose to make? (I'm personally interested, as this is the first time I'm using Table Storage.)

rickvdbosch · 2020-09-30T15:59:40Z

@CrypticEngima For the way I'm used to working with Table Storage, you could take a look at my TableStorageRepository for reference. Might be interesting.

For the entities, I would think about the following:

Table	Partition Key	Row Key
Users	"Users"	Username
Repositories	Username	Reponame
PullRequests	Username + Reponame	PrId

There's a downside here since you need to do multiple queries to get all information. But with proper partitioning that shouldn't be a big / an actual issue.
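
The multiple-query pattern can be sketched with an in-memory stand-in for the tables. This is only an illustration under assumptions: the `Table` class is a toy model and not a Table Storage client, the Users table is left out for brevity, and the `+` separator in the PullRequests partition key is a hypothetical choice for "Username + Reponame".

```python
from collections import defaultdict


class Table:
    """Toy stand-in for one table: entities addressed by (PartitionKey, RowKey)."""

    def __init__(self):
        self._partitions = defaultdict(dict)

    def upsert(self, partition_key, row_key, **properties):
        """Insert or overwrite the entity at (partition_key, row_key)."""
        self._partitions[partition_key][row_key] = properties

    def query_partition(self, partition_key):
        """Return all (row_key, properties) pairs in one partition, sorted by RowKey."""
        return sorted(self._partitions.get(partition_key, {}).items())


repositories = Table()   # PartitionKey: username, RowKey: repo name
pull_requests = Table()  # PartitionKey: username + repo name, RowKey: PR id

repositories.upsert("rickvdbosch", "HacktoberfestProject")
pull_requests.upsert(
    "rickvdbosch+HacktoberfestProject",
    "19",
    title="Get user from table storage based on GitHub info",
)


def prs_for_user(username):
    """Collect PR ids per repo: one query for the repos, then one query per repo."""
    result = {}
    for repo_name, _ in repositories.query_partition(username):
        partition = f"{username}+{repo_name}"
        result[repo_name] = [
            row_key for row_key, _ in pull_requests.query_partition(partition)
        ]
    return result


if __name__ == "__main__":
    print(prs_for_user("rickvdbosch"))
```

Each lookup targets a single partition, which is what keeps the extra round trips cheap.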

CrypticEnigma00 · 2020-09-30T16:14:03Z

@rickvdbosch Thank you so much for sharing that information. I can most certainly see the benefits of this structure. I do have one overriding question about the format you suggest here, though:

Does this format not turn a key-value store into a basic relational database?

Maybe I'm misunderstanding the usage of 'NoSQL'-style storage; I'm so used to using relational databases.

rickvdbosch · 2020-09-30T17:14:49Z

Well, the current structure does the same, but only by serializing data instead of having it in separate tables. 😁

Looking at this from an API perspective, there are some clear entry points to be seen.

1. Get repos per user
2. Get PRs for a repo (of a user)

This would validate the structure, since you're going to need to call 1 before calling 2. Because we use MVC, we might be inclined to think we need all the data at once for our model.

Come to think of it, maybe the user table is not even needed. It doesn't store anything other than the username... right? So having the username as the PK of the repos table eliminates that one. And to be honest I'm not entirely sure about the repositories table either.

That would solve the issue entirely 🤓

rickvdbosch · 2020-09-30T19:54:38Z

So I took the time to play a game of tennis, and CrypticEngima's comment and the relaxation gave me some new insights. Nothing in this comment is meant as criticism, only to get us to the best solution. So here goes:

The current solution

The proposal in my earlier comment in this thread was based on an existing model, which actually seems set up with a relational model in mind. But I think we might need to take a step back in defining the model.

Requirements

What we should do first is define what data we actually need to store. The Users table, for instance, can be removed since the only thing we store in it is the username. That's something we can store elsewhere.
Next we need to take a look at the levels at which we want to retrieve that data. Because if we always get all repositories and the PRs the user has for those repos, the model can be brought back to only one table. That's the cool thing about Table Storage: it's so fast and cheap that it's not bad to store things multiple times. Normalization is not that important anymore.

Proposal (beware, based on assumptions above)

	PartitionKey	RowKey	Column	Column
Structure	Username	{owner}:{reponame}:{prId}	Url	Title
Example	`rickvdbosch`	`Layla-P:HacktoberfestProject:19`	`https://github.com/Layla-P/HacktoberfestProject/pull/19`	`Get user from table storage based on GitHub info`

This enables us to get all information for a specific user by querying the entire partition for that user. The combined RowKey is unique and can be parsed into three separate values: owner, reponame and PrId.
As a sidestep: we can generate the URL based on that information. So to be efficient we could remove Url too. But if it's simpler to keep it, then we should.
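
A minimal sketch of how the combined RowKey could be built, parsed back into its three parts, and used to generate the URL instead of storing it. The function names are hypothetical, and the sketch assumes that neither the owner nor the repo name contains a `:`.

```python
from typing import NamedTuple


class PullRequestKey(NamedTuple):
    owner: str
    repo_name: str
    pr_id: int


def make_row_key(owner: str, repo_name: str, pr_id: int) -> str:
    """Combine the three parts into one RowKey: {owner}:{reponame}:{prId}."""
    for part in (owner, repo_name):
        if ":" in part:
            raise ValueError(f"{part!r} must not contain the ':' separator")
    return f"{owner}:{repo_name}:{pr_id}"


def parse_row_key(row_key: str) -> PullRequestKey:
    """Split a RowKey back into owner, repo name and PR id."""
    owner, repo_name, pr_id = row_key.split(":")
    return PullRequestKey(owner, repo_name, int(pr_id))


def pr_url(key: PullRequestKey) -> str:
    """Generate the pull request URL from the key parts."""
    return f"https://github.com/{key.owner}/{key.repo_name}/pull/{key.pr_id}"


if __name__ == "__main__":
    key = parse_row_key("Layla-P:HacktoberfestProject:19")
    print(key)
    print(pr_url(key))
```

With this in place only Title would have to be stored as a separate column, at the cost of parsing the RowKey on every read.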

Input or ideas?

Any ideas @Layla-P and @CrypticEngima?

CrypticEnigma00 · 2020-09-30T20:05:30Z

@rickvdbosch Thanks, I see the way you're thinking about this now, and yes, it's a big change from the way you think about data in a relational database. I think I need to investigate the 'NoSQL' style further to understand it better, but this information has been a real eye-opener.

CrypticEnigma00 assigned rickvdbosch Oct 1, 2020

rickvdbosch mentioned this issue Oct 1, 2020

Implemented storing and getting data #23

Merged

Layla-P closed this as completed in #23 Oct 2, 2020
