Current structure might hit limits #16
Comments
@CrypticEngima Are you able to look into this?
@rickvdbosch Currently we are not expecting usage to hit those limits, but if we do start getting that amount of traffic we can look at refactoring this. In preparation for that, could you please describe the changes you propose to make? (I'm personally interested, as this is the first time I'm using Table Storage.)
@CrypticEngima As for the way I'm used to working with Table Storage, you could take a look at my TableStorageRepository for reference. Might be interesting. For the entities, I would think about the following:
There's a downside here, since you need to do multiple queries to get all the information. But with proper partitioning that shouldn't be a real issue.
@rickvdbosch Thank you so much for sharing that information. I can most certainly see the benefits of this structure. I have one overriding question about the format you suggest, though: does it not turn a key-value store into a basic relational database? Maybe I'm misunderstanding the usage of 'NoSQL'-style storage; I'm so used to using relational databases.
Well, the current structure does the same, but only by serializing data instead of having it in separate tables. 😁 Looking at this from an API perspective, there are some clear entry points to be seen.
This would validate the structure, since you're going to need to call 1 before calling 2. Us using MVC might drive us to think we'd need all the data at once for our model. Come to think of it, maybe the user table is not even needed. It doesn't store anything other than the username... right? So having the username as the PK of the repos table eliminates that one. And to be honest, I'm not entirely sure about the repositories table either. That would solve the issue entirely 🤓
So I took the time to play a game of tennis, and CrypticEngima's comment and the relaxation gave me some new insights. Nothing in this comment is meant as criticism, only to get us to the best solution. So here goes:

The current solution
The proposal in my earlier comment in this thread was based on an existing model, which actually seems set up with a relational model in mind. But I think we might need to take a step back in defining the model.

Requirements
What we should do first is define what data we actually need to store. The user table, for instance, can be removed since the only thing we store is the username. That's something we can store elsewhere.

Proposal (beware, based on assumptions above)
This enables us to get all information for a specific user by querying the entire partition for that user. The current combined RowKey is unique and can be parsed into three different columns.

Input or ideas?
Any ideas @Layla-P and @CrypticEngima?
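
For readers following along, the idea of one partition per user with a combined, parseable RowKey could be sketched roughly as below. This is only an illustration, not the proposal's actual layout: the three RowKey components (repository owner, repository name, PR number), the `|` separator, and the names `PullRequestKey`, `make_row_key` and `parse_row_key` are all assumptions made up for this example.

```python
from typing import NamedTuple

# Assumption: "|" as separator. Table Storage keys may not contain
# "/", "\", "#" or "?", so those are avoided here.
SEPARATOR = "|"


class PullRequestKey(NamedTuple):
    """Hypothetical three-part key; the real components may differ."""
    repo_owner: str
    repo_name: str
    pr_number: int


def make_row_key(key: PullRequestKey) -> str:
    """Combine the three parts into a single RowKey string."""
    for part in (key.repo_owner, key.repo_name):
        if SEPARATOR in part:
            raise ValueError(f"component contains separator: {part!r}")
    return SEPARATOR.join([key.repo_owner, key.repo_name, str(key.pr_number)])


def parse_row_key(row_key: str) -> PullRequestKey:
    """Split a combined RowKey back into its three parts."""
    parts = row_key.split(SEPARATOR)
    if len(parts) != 3:
        raise ValueError(f"unexpected RowKey format: {row_key!r}")
    owner, name, number = parts
    return PullRequestKey(owner, name, int(number))


def make_entity(username: str, key: PullRequestKey) -> dict:
    """One small entity per PR; the username is the PartitionKey."""
    return {"PartitionKey": username, "RowKey": make_row_key(key)}
```

With a layout like this, each PR becomes its own small entity, so no single column grows with the user's activity, and fetching everything for one user is a query on a single PartitionKey.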
@rickvdbosch Thanks, I see the way you're thinking about this now, and yes, it's a big change from the way you think about data in a relational database. I think I need to investigate 'NoSQL'-style storage further to understand it better, but this information has been a real eye-opener.
In the current entity setup, the repos a user contributed to, together with the user's PRs for each repo, are serialized into a JSON string and stored in one Table Storage column. The maximum length of one column in Table Storage is 64 KiB:
Because of this limit, the current structure might be insufficient for (very) active users.
I propose to implement an alternative structure to make sure we can accommodate even the most active GitHub users. Is that OK?
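
To get a feel for when the 64 KiB limit would bite, a rough size check could look like the sketch below. It assumes string values are stored UTF-16 encoded (as I understand the Table Storage data model), so 64 KiB leaves room for roughly 32K characters. The helper names and the payload shape are made up for illustration; the real serialized model will differ.

```python
import json

# 64 KiB limit for a single string column, as stated in the issue.
MAX_STRING_COLUMN_BYTES = 64 * 1024


def utf16_size(value: str) -> int:
    """Size in bytes when encoded as UTF-16 (assumed storage encoding)."""
    return len(value.encode("utf-16-le"))


def fits_in_one_column(payload: object) -> bool:
    """Check whether the JSON-serialized payload fits in one string column.

    Note: json.dumps escapes non-ASCII characters by default, so another
    serializer may produce a slightly different size.
    """
    serialized = json.dumps(payload)
    return utf16_size(serialized) <= MAX_STRING_COLUMN_BYTES


# Hypothetical payloads: a casual contributor versus a very active one.
casual = {"repos": [{"name": "owner/repo", "prs": [1, 2, 3]}]}
active = {"repos": [{"name": "owner/repo", "prs": list(range(20000))}]}
```

Here `fits_in_one_column(casual)` is true while `fits_in_one_column(active)` is false, which is the failure mode this issue is about.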