figure out the API for tabyl() and helpers #101
Comments
Right now helpers only pertain to the 2-way crosstab. Would they also work on a single-var For the 3-way tabyl (a list of 2-way crosstabs), I think we'd expect the user to add the helpers the way they would with a 2-way, but using
We can think some more about whether this is feasible. I think it is, but I don't know how to do it. We'd have to either beef up the Perhaps we could make a Maybe @chrishaid has thoughts?
One thought for dealing with different calculations (rows vs cols vs both) is to signal which has been used with an attribute.
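As an illustration of the attribute idea, here is a minimal base R sketch. The helper names `add_totals_row()` and `has_totals_row()` and the attribute name `"totals"` are assumptions made for this example, not janitor functions.

```r
# Hypothetical helper: append a totals row and record that fact in an attribute.
add_totals_row <- function(dat) {
  num_cols <- vapply(dat, is.numeric, logical(1))
  totals <- dat[1, , drop = FALSE]
  totals[1, !num_cols] <- "Total"                              # label column(s)
  totals[1, num_cols] <- colSums(dat[, num_cols, drop = FALSE]) # column sums
  out <- rbind(dat, totals)
  attr(out, "totals") <- "row"   # signal to later steps that a totals row exists
  out
}

# Hypothetical check that a later step could use before calculating anything.
has_totals_row <- function(dat) identical(attr(dat, "totals"), "row")

ct <- data.frame(cyl = c("4", "6"), auto = c(3, 4), manual = c(8, 3),
                 stringsAsFactors = FALSE)
res <- add_totals_row(ct)
has_totals_row(ct)    # FALSE
has_totals_row(res)   # TRUE
```

A later percentage step could consult the attribute and leave the last row out of its denominators.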
Clever ideas, both of you: attributes, or calling The back-end to the latter (the list) sounds more complicated to write, and the user interface will be simpler without it. I haven't used attributes before, but it looks like it could be pretty unobtrusive to add a "totals" attribute. After you've added a totals row or column, it's IMO unlikely you're doing much more with that data.frame than printing it anyway, so most users shouldn't even notice. Then we'd have:
Two more challenges:
I realized a problem with the totals attribute: even if it successfully passes info about totals row/col to subsequent functions like What if we went bigger and attached the original tabyl data.frame to itself as an attribute? Then the steps can truly go in any order, getting closer to an MS Excel PivotTable. And it embraces the seeming oddness of the steps not mattering; we'll put that aspect front and center in the examples. So you could then say:
With those function calls coming in any order, because they can all see the underlying tabyl data. I think to make the implementation easier, we'll also want attributes of what steps have been attached. E.g., Would it ruin using adorn_totals or adorn_percentages on non-tabyl inputs? I like having that option. Maybe in that case you first call a function At that point is there a reason to extend A grander vision, but at least it feels like a more coherent API. One more possible issue would be that attaching the data.frame to itself seems inefficient memory-wise - but the result of a call to tabyl is always going to be relatively small, thankfully. Probably good to have a simple function
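A rough base R sketch of attaching the raw counts to the table itself. The function names `as_counts_table()` and `row_percentages()` and the attribute name `"core"` are hypothetical, chosen for this example only.

```r
# Hypothetical constructor: store a copy of the raw counts on the object.
as_counts_table <- function(dat) {
  attr(dat, "core") <- dat
  dat
}

# Hypothetical step: always computes from the stored counts, never from
# whatever values are currently displayed, so earlier steps cannot distort it.
row_percentages <- function(dat) {
  core <- attr(dat, "core")
  if (is.null(core)) stop("no stored counts; call as_counts_table() first")
  num_cols <- vapply(core, is.numeric, logical(1))
  out <- core
  out[num_cols] <- core[num_cols] / rowSums(core[num_cols])
  attr(out, "core") <- core   # carry the raw counts forward
  out
}

ct <- data.frame(cyl = c("4", "6"), auto = c(3, 4), manual = c(8, 3),
                 stringsAsFactors = FALSE)
x <- as_counts_table(ct)
pct <- row_percentages(x)
again <- row_percentages(pct)   # same result: still derived from the raw counts
```

Because each step reads the `"core"` attribute, applying a step twice or after another step gives the same numbers, which is the order-independence described above.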
I played around a bit with this... the evaluation is tricky, as if you allow these adornment helpers to get called in any order, the number of permutations quickly gets unwieldy. E.g., does I see two options:
I have the start of this approach working.
That is easier to program, I think, since say, The latter is easier to implement. And as long as the error messages give the correct reordering, sacrificing the ability to apply these in any order seems minor. Maybe I will think about the theory of this - is there a sensible hierarchy of the order in which the steps should apply? And then give it a shot.
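One way the "fixed order plus helpful errors" approach might look in base R. The `"steps"` attribute and the function names are hypothetical, and the hierarchy shown (calculate percentages first, format them last) is only one possible ordering picked for illustration.

```r
# Hypothetical bookkeeping: append a step name to a "steps" attribute.
record_step <- function(dat, step) {
  attr(dat, "steps") <- c(attr(dat, "steps"), step)
  dat
}

calc_row_percentages <- function(dat) {
  if ("formatted" %in% attr(dat, "steps")) {
    stop("calc_row_percentages() must be called before format_percentages()",
         call. = FALSE)
  }
  num_cols <- vapply(dat, is.numeric, logical(1))
  dat[num_cols] <- dat[num_cols] / rowSums(dat[num_cols])
  record_step(dat, "percentages")
}

format_percentages <- function(dat, digits = 1) {
  if (!"percentages" %in% attr(dat, "steps")) {
    stop("call calc_row_percentages() before format_percentages()",
         call. = FALSE)
  }
  num_cols <- vapply(dat, is.numeric, logical(1))
  dat[num_cols] <- lapply(dat[num_cols],
                          function(x) paste0(round(100 * x, digits), "%"))
  record_step(dat, "formatted")
}

ct <- data.frame(cyl = c("4", "6"), auto = c(3, 4), manual = c(8, 3),
                 stringsAsFactors = FALSE)
out <- format_percentages(calc_row_percentages(ct))
out$auto   # "27.3%" "57.1%"
```

Calling `format_percentages(ct)` directly stops with a message naming the step that has to come first, so the user learns the correct order from the error.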
@chrishaid and @rgknight I welcome thoughts on that last big comment. And on this smaller one: I'm thinking maybe some imagery that captures the idea of an underlying counts table masked by lots of layered adornment attributes... something like light, or shadows, or makeup or disguise? It should describe an unadorned Should
I think you're on the right track with "%" format as an option to
Here's a couple of ideas
Totals as an option
Formatting as an option
Or maybe there's even a utility somewhere to format like Excel, e.g., "0%,-0%,0%" would do the rounding, too, or you could use the
Formatting as a function
I think I don't like this one because you have to match the format to the metric by position.
Conclusion
After playing with these options I think I like creating a
The sprintf formats are hard to understand, though, so maybe we also provide some named shortcuts for folks.
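To make the "named shortcuts" idea concrete, a small base R sketch. The shortcut names `whole` and `one_dec` and the function `format_pct()` are invented for this example; only `sprintf()` itself is a real base R function.

```r
# Hypothetical named shortcuts that map to sprintf() format strings.
pct_formats <- c(
  whole   = "%.0f%%",
  one_dec = "%.1f%%"
)

# Accept either a shortcut name or a raw sprintf() format.
format_pct <- function(x, format = "one_dec") {
  fmt <- if (format %in% names(pct_formats)) pct_formats[[format]] else format
  sprintf(fmt, 100 * x)
}

format_pct(0.2727)             # "27.3%"
format_pct(0.2727, "whole")    # "27%"
format_pct(0.2727, "%.2f%%")   # "27.27%"
```

Users who know `sprintf()` can pass a format directly; everyone else can stick to the named shortcuts.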
Feeling so close to ready to merge in a huge PR that implements tabyl 1.0. What else could it be called: a 1-way, 2-way, or 3-way contingency/count table that has its values attached as an attribute? Is there something better than
I'll buy a beverage of your choice for anyone who comes up with a better noun/verb than "tabyl" that I end up implementing.
Still open to better names than I think the API is pretty set. My velocity on janitor is pretty slow, and I've implemented the best approach I saw as well as I could. Time to let it into the wild.
This is scattered across other issues, consolidating it here.
For 1.0 I think we are headed toward:
Main function:
- `crosstab()` into `tabyl()`
- `tabyl(a_vector)`
- `tabyl`; two = a df currently called `crosstab`; three = list of `crosstab` dfs
Helper functions:
It would be nice to have them all as modules with a common prefix. The base input is a `tabyl(df, col1, col2)`. Then you can do everything to it that you currently can with `adorn_crosstab`.
I think the obstacle to modularity is that the ordering of the steps is tricky. A function that calculates percentages on a data.frame without a totals row/col will give nonsense values on one with totals. It could analyze the numeric values to decide whether the right and bottom vectors are totals, but that seems dangerous and heavyweight; it could also take a user flag. Neither of those is as nice as the current `adorn_crosstab`, which gets to do everything at once and so applies the right steps and logic.
Similar issue with adding % signs, I think.
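The ordering problem can be reproduced in a few lines of base R. The data and the helper `col_pct()` are made up for this example.

```r
counts <- data.frame(auto = c(3, 4), manual = c(8, 3))

# Add a totals row by hand.
with_totals <- rbind(counts, colSums(counts))

# Naive column percentages: every value divided by its column sum.
col_pct <- function(dat) as.data.frame(lapply(dat, function(x) x / sum(x)))

col_pct(counts)$auto        # 3/7, 4/7: correct
col_pct(with_totals)$auto   # 3/14, 4/14, 7/14: the totals row doubled the denominator
```

The second result is the "nonsense" case: the function cannot tell that the last row is a total unless it inspects the values, receives a flag, or reads some marker left by the totals step.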
This is the genesis for the master helper function `adorn_crosstab`. But then sometimes you don't want % signs, you want numeric values you can plot or do calculations with. Thus `ns_to_percents` being exported.
Maybe ... a cleaner API to `adorn_crosstab` (call it `adorn_tabyl`?) that covers all helpers, so they don't need to be called individually? Could be a lot of possible arguments, but hey this is for getting fussy formatting right. Then don't export the sub-helpers if their functionality can be fully accessed, in any combination, through the master helper function?
Perhaps there is a way to implement the modularity in some clever specific order.