Over the past week or so we have discussed the philosophy and reasons behind the philosophy in our software development methodology. Today we are going to dig into the philosophy and give a little detail without getting too specific. We will examine the specifics in later posts.
The database server is the most powerful machine on the
network, and it is that way for a reason. Even more compelling is the fast that Set
Based Coding is orders of magnitude faster than procedural coding. Imagine having the task of picking all the
rocks out of a pile that has a diameter smaller than a quarter. With procedural coding you pick up each rock,
examine it and keep or toss it relative to is diameter and how it compares to
your test quarter. With Set Based
coding, you grab a box, drill several quarter sized holes in it, put the rocks
in the box and shake. Out the bottom
drop all the rocks that meet the criteria.
Entities, Attributes and Relationships
Database Entities are no more than its constituent
tables. These entities have attributes
that correspond to the columns in the tables and the relationships are the
constraints between the entities that say ‘you can only put this type of thing
here’ so we don’t end up adding alternators where tires should go. Defining these things is the biggest failure
developers make. When defining entities
thoroughly describe it. Even more
important, thoroughly define its relationships.
Start with Boyce-Codd’s
3rd Normal Form (3NF) or their Boyce-Codd Normal Form (BCNF) and
decide if you want to de-normalize or super-normalize. One of the tricks is to avoid duplication of
data, make your tables tall and skinny.
Put all information that has a one-to-one relationship in the same
table. Don’t save calculations, usually
(save a birthdate, calculate an age, but save lifetime sales amount instead of
recalculating every time there is a sale).
Put things that can be picked from a list in a table by themselves and
relate it back: a car has a color, there are many colors. The colors table says
blue is identified with 7 so the car’s Color ID attribute is 7. But you can’t show the 7 to the user. More on that later. Know the difference between an Entity and a
lookup table. The Color table would be a
lookup table. The Engine table would be
an entity table and relate back to the Car table through a linking table. There
are many engines, there are many cars but any one car only has one engine. This information is held in the linking
table. More on this later as well.
We all know the guy who can whip out the part number for
every alternator from every Chevy ever built. These people however are few and
far between and they should be. That is a rare and wonderful skill that should
be reserved for baseball statistics and er, alternators. For the rest of us, let’s use English. Take Social Security numbers. They first three digits tell us what state we
were in when the number was issued. It’s the same with area and zip codes. This is all tribal knowledge that really is
just a stand in for something else like state or region. The way to avoid this (and this is happening
with phone numbers) is to avoid tribal knowledge altogether and assign that
number to what it really represents and talk about that thing, not the code. So really we don’t need to know phone numbers
at all, we need to know how to get in touch with John or Sally.
…or whatever language you and your users prefer, but use
it. Don’t use codes, don’t shorten
words, and don’t be afraid of vowels.
Make your table, column, object, property and method names descriptive
and representative of what they contain or do.
If the data itself is text, make the column/property wide enough to
contain real, meaningful information.
The main reason for hiding primary keys is discourage tribal
knowledge. Frequently the code for
something becomes more important than the thing it represents. Then the users will start discussing things
in terms of the primary key, particularly if it houses extrinsic data, as a
surrogate for the thing it represents.
So we have a bin for a part in the rear of an auto parts store with the
part number on the shelf below it. When
that part is discontinued, that shelf will sit empty for large amounts of time,
even years. See the discussion about Ross below.
Using Stored Procedures gives you several things. First is security. If you only use stored procedures you can
lock everyone out of the tables and thus avoid a dependency injection
attack. Second, all the database code is
in one place and it doesn’t require a deployment to change the logic. Third, following the tenets in this book, as
much of the work as possible should be done on the database side. This is the database side.
Custom Built Users and Groups
Not relying on built in models will confuse potential
hackers and will guard against unknown frailties in the security model that
could be found at a later time
Identifying the User’s Session
Once a user logs in, save the session information in the
session and use it to authenticate the user based on his or her session
later. Make sure to check the time the
user has been logged in and to log them out after an appropriate time of inactivity
Always add state to the database.
Less Frequent Re-Indexing
If you never update or delete records from your database,
you never have to re-index it.
Access to Historical data
If you add state, you automatically have access to
historical data and with a simple UserID and DateChanged field, you can tell
who changed what, when and what the old values were.
Set based code should be used as much as possible and should
replace procedural code. This means that
the code in your application should run on the Database Server and the code in
the middle tier
should be as small and light as possible.
For a discussion of Set Based v. Procedural coding go here. This is a discussion of procedural coding in
TSQL but it applies to ALL kinds of procedural programming.
The middle tier is only a procedural or object based
representation of the database. We’ve given examples of how and why set based
coding is faster than procedural code.
Given that, write as little procedural, middle tier code as possible,
only making the calls to the database to do the hard work.
Like the middle tier the user interface should be as small
and with as little code as possible. The
only logic should be validating the data typed in by the user.
Not to get ahead of ourselves, but the ultimate goal here is
to define a framework, and automate the
production of software as much as possible.
We know this is possible because we have done it.
Tomorrow, we will start examining the individual layers in this n-tier architecture we have defined above.