SYSTEMS ARCHITECTURE

Mark Janssen edited this page Sep 6, 2019 · 56 revisions

Systems architecture means designing a system to provide not only general-purpose computing, but also natural constructs for building complex systems. This requires mastery of epistemology -- the study of how we know things. When programmers write software, they are approaching a megalith, of sorts: grokking the whole substructure of a problem. Once they have grokked it, they arrive at the perfect architecture for their problem, one that embodies the underlying wisdom which created it. Until then, they must "refactor mercilessly" (an eXtreme Programming maxim).

Good architecture reduces the complexity and size of your code base, partly by simply moving oft-used functions to the Operating System. The reduction follows a log function, much as the game of 20 questions cuts the time needed to find one answer among millions of animals. The approximate formula is:

100 log2(LOC), where LOC counts uncommented, undocumented lines of "unarchitected" code (like Assembly; simple subroutines are allowed, up to a small stack of parameters)

The reasoning of the equation is that you can divide your monolithic code in two until you get to the cognitive structures which created it in the first place. Engineers approach the problem from the other direction, but architects approach from the top. Cognitive structures like a data table to hold records of a consistent form, an integer variable to hold the precise numbers, a query language using NLP, etc. -- these high-level ideas are re-useable and can be placed outside the application in the larger data ecosystem.
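The halving argument can be made concrete with a minimal sketch. The function name `halvings` is hypothetical, and the sketch only counts splits; it assumes nothing about what each split means for real code. It shows why both 20 questions and the formula grow with log2 of the size.

```python
def halvings(n: int) -> int:
    """Count how many times n items must be split in two until one remains."""
    steps = 0
    while n > 1:
        n = (n + 1) // 2  # keep the larger half, as a worst case
        steps += 1
    return steps


# About a million candidates (animals, or lines of code) need about 20 splits.
print(halvings(1_000_000))
print(halvings(500))
```

The count equals log2 of the size, rounded up, which is the quantity the formula scales by 100.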

The equation shows that you actually add code to small code bases, in anticipation of the generalized constructs you'll be using and to make explicit the otherwise hidden conceptual constructs you rely on when writing programs. This scaffolding, things like named functions or an object definition, costs a marginal amount of code. For example, 500 LOC becomes roughly 896 LOC of well-architected code; the cross-over point is around 1000 LOC. As I said, this number assumes you have completely grokked your problem domain in order to architect the code. If you aren't near this number, you probably haven't understood your problem domain completely, or you're simply too lazy to rewrite your code.
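As a rough check of these numbers, here is a small sketch that evaluates 100 log2(LOC) and searches for the cross-over point by bisection. The names `architected_loc` and `crossover`, and the search bounds, are assumptions made for illustration.

```python
import math

SCALE = 100  # the constant from the formula 100 log2(LOC)


def architected_loc(raw_loc: float) -> float:
    """Estimated size of well-architected, undocumented code."""
    return SCALE * math.log2(raw_loc)


def crossover(lo: float = 2.0, hi: float = 10_000.0, tol: float = 1e-6) -> float:
    """Find where the estimate equals the raw size, by bisection.

    Below this point the formula adds code; above it, the formula removes code.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if architected_loc(mid) > mid:
            lo = mid  # estimate still larger than the input: move right
        else:
            hi = mid
    return (lo + hi) / 2


print(round(architected_loc(500), 1))  # a bit under 900
print(round(crossover()))              # close to 1000
```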

This formula is for undocumented, uncommented LOC counts. For documented code, multiply the result by 2 to arrive at the number of lines you should have. That gives a healthy ratio of code to comments/docs: about one to one. If you have fewer comments/docs, you probably have bad code. If you have more than that, you probably have bad code.
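The doubling rule can be sketched the same way. The name `documented_target` is hypothetical, and the sample sizes are only illustrations of a small and a very large code base.

```python
import math


def documented_target(raw_loc: float) -> float:
    """Target line count including comments/docs at a one-to-one ratio."""
    undocumented = 100 * math.log2(raw_loc)  # the base formula
    return 2 * undocumented                  # one line of docs per line of code


for raw in (500, 25_000_000):
    print(raw, round(documented_target(raw)))
```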

But if your poorly architected code-base is 25M LOC, you should be able to reduce that to about 5,000 lines of documented code. That's the difference between architecture and engineering -- two entirely different specializations, but each must use the other to make good software. These numbers assume your OS is helping with your engineering, giving you record-locking, network scheduling, or whatever services would be used by many apps. As a general rule,

Pretty cool, huh? The magical constant there is the approximate value of the substructures that precede the mind itself: the cellular or acoustic structures that underpin words. I'm probably underestimating it due to my knowledge of epistemology*, but in no case would it be over 1000. In other words, elite programmer-architects rearrange their customers' MINDS about the problem (or hardware engineers' minds), because they probably (certainly?) haven't grokked it either.

Here's another piece of architecting wisdom. No function or object should need more than 4 parameters (except such meta programs as compilers). If your function or object wants more, you haven't architected it properly. A set of functions for drawing polygons, for example, should be generalized to take a list of (x,y) points, or the points should be made a separate structure that is passed all at once. If you actually make a program that breaks this rule, it is YOU who is out of harmony with the universe.
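A minimal sketch of the polygon example, assuming a hypothetical `polygon_edges` helper: instead of one function per shape with a coordinate parameter for every corner, a single function takes the list of (x,y) points.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Edge = Tuple[Point, Point]


def polygon_edges(points: List[Point]) -> List[Edge]:
    """Return the edges of a closed polygon, given its vertices in order."""
    if len(points) < 3:
        raise ValueError("a polygon needs at least three points")
    count = len(points)
    # Pair each vertex with the next one; the last vertex wraps to the first.
    return [(points[i], points[(i + 1) % count]) for i in range(count)]


# One parameter serves a triangle, a square, or any other polygon.
triangle = [(0, 0), (4, 0), (0, 3)]
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(polygon_edges(triangle))
print(len(polygon_edges(square)))
```

A drawing routine would then loop over the returned edges, so adding a new shape means adding data rather than another parameter list.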

Lastly, most code uses re-useable constructs -- that is what the formula above is about. The estimated amount is about 50% of your code. Most mathematical formulas, for example, relate to something in the real world, which allows generalizing functions that might otherwise be embedded in your own code.


(*) The actual term is "epistemics" as written on the wikiwikiweb -- the understanding of knowledge from its root: data. Interestingly, with OOP alone, the formula is about 1000 log2(LOC). With epistemics, it is 100 log2(LOC), because you go deeper to the core.