# Tuesday, June 23, 2015

The past few years, I've heard a lot about something called NoSQL. Some people really love it. Those who love it talk about its lack of ceremony, the speed with which you can develop, the speed with which it reads and writes, and its scalability. It all sounds so awesome!

But I grew up on relational databases. My first computer language was FoxPro, which included a relational database and supported a powerful version of SQL. From there, I graduated to SQL Server and I've dabbled occasionally in Microsoft Access. I've even worked with Oracle and MySQL and, as a developer, I find them intuitive. Should I abandon the databases with which I am familiar and travel to this brave, new world of NoSQL? Is NoSQL the best solution for every project? For some projects? How do I know?

Let's start with a definition for NoSQL. This is harder than you might think, because NoSQL databases are basically defined by what they are not. The only real definition is that they are not SQL databases. They tend not to have pre-defined schemas; they tend not to enforce relationships; they tend to be able to store hierarchical data; and, of course, they tend not to support the SQL language (although some support syntaxes similar to SQL, such as LINQ). These are broad definitions and mostly address things that NoSQL databases don't do. There is no standard language, syntax, storage mechanism, or API for addressing NoSQL databases.

For purposes of this article, I'll define SQL databases as those in which the database engine provides the following features:

  1. Supports SQL
  2. Enforces pre-defined schemas
  3. Enforces referential integrity among related, normalized tables

This includes database engines supported by large companies, such as Microsoft SQL Server and Oracle, as well as Open Source databases, such as MySQL.

I'll lump all other persistent storage technologies as NoSQL databases. This includes MongoDB, RavenDB, Azure table storage, and DocumentDB.

So when should we choose good old SQL databases and when should we use this newfangled NoSQL thing?

Let's start with SQL databases. They have a few advantages:

SQL databases

Advantages of SQL DBs

First, they are relational, so they make it easy to normalize your database into a set of related tables. This almost always saves disk space and often makes your data more consistent (e.g., you can change the name of a product in one table and it changes throughout your entire application). Databases like this also allow you to create persistent relationships between these tables, and these relationships enforce referential integrity, ensuring that you are not left with orphaned records (Who wants an order line without a corresponding order?). You can set up cascading deletes or force users to create and delete records in an order that never leaves the data in an inconsistent state.
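Here is a minimal sketch of referential integrity in action, using Python's built-in sqlite3 module. The table and column names (orders, order_lines) are my own illustrative assumptions, and SQLite only enforces foreign keys once the foreign_keys pragma is switched on.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE order_lines (
        id INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
        product TEXT NOT NULL
    )
    """
)

conn.execute("INSERT INTO orders (id, customer) VALUES (1, 'Alice')")
conn.execute("INSERT INTO order_lines (order_id, product) VALUES (1, 'Monitor')")

# An order line that points at a non-existent order is rejected.
try:
    conn.execute("INSERT INTO order_lines (order_id, product) VALUES (99, 'Hard drive')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the order cascades to its lines, so no orphans are left behind.
conn.execute("DELETE FROM orders WHERE id = 1")
remaining_lines = conn.execute("SELECT COUNT(*) FROM order_lines").fetchone()[0]
```

The database, not the application, refuses the orphaned order line and cleans up the child rows when the parent order is deleted.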

The SQL language itself is very flexible, allowing users to create either pre-defined or ad-hoc queries against a relational database. This makes SQL databases great for reporting.
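As a small illustration of an ad-hoc query, here is a sketch that again uses sqlite3. The sales table and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sales table, invented for this example.
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Monitor", "East", 200.0),
        ("Monitor", "West", 150.0),
        ("Hard drive", "East", 80.0),
    ],
)

# A question nobody planned for when the table was designed:
# total sales per product, largest first.
totals = conn.execute(
    "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY SUM(amount) DESC"
).fetchall()
```

No new code paths or indexes were needed to answer the question; the query itself is the report.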

The schema in a SQL database helps catch errors in almost the same way that a compiler or a unit test does. If you want to capture a customer's last name and you create a “LastName” column, but one time you accidentally misspell it as “LastNmae”, the database will catch this and throw an exception, which should be early and obvious enough for you to fix the error.
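The misspelled-column scenario can be reproduced in a few lines. This sketch assumes a hypothetical customers table in an in-memory SQLite database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with the correctly spelled column.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, LastName TEXT)")

# The misspelled column name is rejected before any data is written.
try:
    conn.execute("INSERT INTO customers (LastNmae) VALUES ('Smith')")
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)
```

The insert fails immediately, and the error message names the offending column.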

Disadvantages of SQL DBs

But these features come at a price. There is overhead in enforcing database schemas and referential integrity. As a result, saving to a SQL database tends to be slower than saving to a store that skips these checks.

Also, when developers build an application intended for human interaction, they almost never structure the application's objects in the way that they normalize the data in their relational database. An entire class of Object Relational Mapper (ORM) software exists simply to deal with this mismatch. It requires code, time, and CPU cycles to map between objects in an application and data in a database.
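To make the mismatch concrete, here is a hand-rolled sketch of the mapping work an ORM would otherwise do. The Order and OrderLine classes and the two tables are hypothetical names chosen for illustration.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class OrderLine:
    product: str
    quantity: int


@dataclass
class Order:
    id: int
    customer: str
    lines: list = field(default_factory=list)


def save_order(conn, order):
    """Split one object graph into rows for two normalized tables."""
    conn.execute("INSERT INTO orders (id, customer) VALUES (?, ?)", (order.id, order.customer))
    conn.executemany(
        "INSERT INTO order_lines (order_id, product, quantity) VALUES (?, ?, ?)",
        [(order.id, line.product, line.quantity) for line in order.lines],
    )


def load_order(conn, order_id):
    """Reassemble the object graph from the two tables."""
    row = conn.execute("SELECT id, customer FROM orders WHERE id = ?", (order_id,)).fetchone()
    line_rows = conn.execute(
        "SELECT product, quantity FROM order_lines WHERE order_id = ? ORDER BY id",
        (order_id,),
    ).fetchall()
    return Order(row[0], row[1], [OrderLine(p, q) for p, q in line_rows])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute(
    "CREATE TABLE order_lines (id INTEGER PRIMARY KEY, order_id INTEGER, product TEXT, quantity INTEGER)"
)

original = Order(1, "Alice", [OrderLine("Monitor", 2), OrderLine("Hard drive", 1)])
save_order(conn, original)
restored = load_order(conn, 1)
```

One object in the application becomes three rows in two tables, and every read has to stitch them back together.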

NoSQL databases

Advantages of NoSQL DBs

Because NoSQL databases don't need to enforce schemas or relationships, they tend to perform faster than their SQL cousins.

Database development tends to be faster because developers and DBAs are not required to pre-define the columns in each table.

The lack of database relationship enforcement also makes it easier to move parts of a database to another server, which makes it easier to support very large data sets. Relational databases can move across servers, but it tends to be more difficult because of their need to enforce referential integrity.

The lack of schema also adds flexibility, especially if you are capturing data in which different objects may have different properties. For example, a product catalogue table may contain some items (such as computer monitors) for which diagonal size in inches is an important property, and other items (such as hard drives) for which capacity in GB is an important property. Mapping these disparate needs to a relational database table would add complexity to your data model.
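The catalogue example might look like this in a schema-less store. I'm modeling the store as a plain Python list of dictionaries, which is an assumption for illustration rather than the API of any particular NoSQL product; the property names are also made up.

```python
# A schema-less "collection" modeled as a plain list of dictionaries.
catalog = [
    {"sku": "MON-1", "name": "24-inch monitor", "diagonal_inches": 24},
    {"sku": "HDD-1", "name": "2 TB hard drive", "capacity_gb": 2000},
]

# Each document carries only the properties that make sense for it.
monitor_sizes = [p["diagonal_inches"] for p in catalog if "diagonal_inches" in p]
drive_capacities = [p["capacity_gb"] for p in catalog if "capacity_gb" in p]
```

In a relational table, the same data would need either nullable columns for every possible property or a separate table per product type.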

Finally, it is possible to serialize and de-serialize objects in the same format that they are used in an application's user interface. This eliminates the need for an ORM, which makes applications simpler and faster.
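As a rough sketch of this idea, here is an order stored and retrieved as a single JSON document, using only Python's json module. The shape of the order is a made-up example.

```python
import json

# The object as the application uses it: one order with its lines nested inside.
order = {
    "id": 1,
    "customer": "Alice",
    "lines": [
        {"product": "Monitor", "quantity": 2},
        {"product": "Hard drive", "quantity": 1},
    ],
}

stored = json.dumps(order)     # what a document store would persist
restored = json.loads(stored)  # what the application gets back
```

The document that comes back has the same shape as the one that went in, so there is no mapping layer to write.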

Disadvantages of NoSQL DBs

When reading data, NoSQL databases tend to be very fast, as long as you are looking up rows by an index or key. If you want to look up a row by any other property or filter your data by a property, this often requires a full table scan, which is very slow. Some NoSQL databases allow you to create indexes on non-key properties, which speeds up such searches but slows down data writes, giving back one of the advantages of NoSQL.
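The trade-off can be sketched with an in-memory key-value store. The dictionary below stands in for a NoSQL table, and the customer records and function names are hypothetical.

```python
from collections import defaultdict

# Key-value store keyed by customer id.
customers = {
    1: {"name": "Alice", "city": "Detroit"},
    2: {"name": "Bob", "city": "Chicago"},
    3: {"name": "Carol", "city": "Detroit"},
}

# Lookup by key: a single hash lookup, regardless of how large the store is.
alice = customers[1]


def scan_by_city(city):
    """Filter by a non-key property: every record must be examined."""
    return sorted(key for key, doc in customers.items() if doc["city"] == city)


# Secondary index on city: fast reads, but it must be maintained on every write.
city_index = defaultdict(set)
for key, doc in customers.items():
    city_index[doc["city"]].add(key)


def add_customer(key, doc):
    customers[key] = doc              # the write itself
    city_index[doc["city"]].add(key)  # extra work on every write


add_customer(4, {"name": "Dave", "city": "Detroit"})
detroit_via_scan = scan_by_city("Detroit")
detroit_via_index = sorted(city_index["Detroit"])
```

Both approaches return the same customers; the index answers without touching every record, at the cost of an extra update on each write.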

Other factors

It's worth looking at the cost of any database solution. For example, Azure provides both SQL and NoSQL databases as a service. If we compare the cost of Azure SQL Database with Azure table storage (a NoSQL option), we can see that the price of table storage is far less than the cost of Azure SQL Database. Table storage might not be the answer for your application, but it's worth examining whether some of your data can work with Azure table storage.


As with most questions facing IT developers, architects and managers, there is no clear-cut answer to whether to use SQL or NoSQL databases. SQL databases tend to be better when ad-hoc reporting is required, while NoSQL databases tend to shine when saving and retrieving transactional data from a user application. Many applications take advantage of the features of both database types by creating a NoSQL database with which their application interacts; then transforming and regularly copying this data into a relational database, which can be queried and reported on.
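Here is a minimal sketch of that hybrid approach: documents as the application might store them, flattened into a relational table for reporting. The document shape and table name are assumptions for illustration, and a real pipeline would run on a schedule rather than inline.

```python
import sqlite3

# Documents as the application might store them in a NoSQL database.
documents = [
    {"id": 1, "customer": "Alice", "lines": [{"product": "Monitor", "quantity": 2}]},
    {
        "id": 2,
        "customer": "Bob",
        "lines": [
            {"product": "Monitor", "quantity": 1},
            {"product": "Hard drive", "quantity": 3},
        ],
    },
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_lines (order_id INTEGER, customer TEXT, product TEXT, quantity INTEGER)"
)

# Transform: flatten each nested document into one row per order line.
rows = [
    (doc["id"], doc["customer"], line["product"], line["quantity"])
    for doc in documents
    for line in doc["lines"]
]
conn.executemany("INSERT INTO order_lines VALUES (?, ?, ?, ?)", rows)

# Report: units sold per product, answered with ordinary SQL.
units_by_product = conn.execute(
    "SELECT product, SUM(quantity) FROM order_lines GROUP BY product ORDER BY product"
).fetchall()
```

The application keeps its fast document reads and writes, while reports run against the relational copy.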

There are many options for your persistent storage needs. Choose the right one for your application.

Tuesday, June 23, 2015 2:42:00 PM (GMT Daylight Time, UTC+01:00)
# Tuesday, January 7, 2014
Tuesday, January 7, 2014 11:25:31 AM (GMT Standard Time, UTC+00:00)
# Monday, January 6, 2014
Monday, January 6, 2014 6:25:00 PM (GMT Standard Time, UTC+00:00)
# Monday, November 25, 2013
Monday, November 25, 2013 9:46:00 PM (GMT Standard Time, UTC+00:00)
# Monday, November 19, 2012
Monday, November 19, 2012 3:31:00 PM (GMT Standard Time, UTC+00:00)
# Monday, June 4, 2012
Monday, June 4, 2012 3:01:00 PM (GMT Daylight Time, UTC+01:00)
# Friday, June 1, 2012

Here is Steve Smith’s presentation on Common Design Patterns at the May 2012 Great Lakes Area .NET User Group (GANG).

Friday, June 1, 2012 4:00:00 PM (GMT Daylight Time, UTC+01:00)
# Wednesday, July 7, 2010


It’s no secret that software developers, managers and analysts do a poor job estimating projects. Few IT projects finish within their estimated time, and far more go over the original estimate than under it.

Steve McConnell knows how difficult estimation can be. His 2006 book, Software Estimation, is subtitled Demystifying the Black Art.

Developing reasonable estimates of software projects may not be a black art, but it does cause problems and most people fail at it for a variety of reasons.

McConnell refers to estimation as an art, not because it has no basis in science, but because formulas don't tell the whole story. Experience and difficult-to-measure inputs are required to generate a complete estimate. And even then, you may still get it wrong.

When McConnell lists sources of estimation error (subjectivity, missing tasks, unwarranted optimism, excess precision), it's startling how many of those factors I have experienced or contributed to in my own career.

The author provides various methods for creating an estimate and guidance on improving the accuracy of your estimates. Among his advice is:

  • Base your estimates on something you can measure - preferably historical data on similar projects in your own organization. Estimates based on measurable data are far superior to those based on subjective criteria.
  • Estimates are never precise (they're not called "exactimates"). Present estimates as ranges and don't include more significant digits than your inputs can justify.
  • If possible, get effort estimates from those who will actually perform the work. Developers vary in how quickly they can accomplish a given task - sometimes that variance is in orders of magnitude.

If part of your job includes estimating software projects, this is an essential book to guide you. As with most of McConnell’s books, I recommend it.

Wednesday, July 7, 2010 3:50:10 PM (GMT Daylight Time, UTC+01:00)
# Monday, May 17, 2010

Episode 88

In this interview, Microsoft Product Unit Manager Cameron Skinner describes the architecture tools that his team built into Visual Studio 2010.

Monday, May 17, 2010 10:55:01 AM (GMT Daylight Time, UTC+01:00)
# Wednesday, May 5, 2010

Microsoft Product Unit Manager Cameron Skinner came to the Midwest to show off the Architecture features of Visual Studio 2010. He began his tour in the Detroit area, speaking at local companies in the afternoon and at the Great Lakes Area .NET User Group (GANG) in the evening. I recorded two of his presentations, which are available here.

Here is the presentation at GANG

Part 1:

Part 2:

Part 3:

Part 4:

Part 5:

Here is the presentation at a Detroit-area company.

Part 1:

Part 2:

Part 3:

Part 4:

Wednesday, May 5, 2010 3:04:35 AM (GMT Daylight Time, UTC+01:00)
# Friday, April 30, 2010

I began reading Agile Principles, Patterns and Practices in C# by Robert C Martin and Micah Martin after a friend recommended the chapters on pair programming. My friend was right, of course. The Martins not only described pair programming but included an entertaining script of two developers pairing on a programming problem.

But, as I dove deeper into this book, I found a wealth of other information.

The book begins with a section on agile development, defining some basic terms, concepts, and recommended practices. It follows with a detailed section on good design practice. This second section is the most interesting, as it describes the famous SOLID principles. SOLID is an acronym for a set of good design practices:

S=Single Responsibility Principle: Each class should serve only one purpose and have only one reason to change.
O=Open-Closed Principle: Classes should be open for extension but closed for modification.
L=Liskov Substitution Principle: It should always be possible to substitute a derived class for its base class without breaking the code that uses the base class.
I=Interface Segregation Principle: Interfaces implemented by a class are defined by the client objects that use that class; a class should implement a separate interface for each client that calls it.
D=Dependency Inversion Principle: To maintain flexibility, you should write code that depends on abstractions, such as interfaces, rather than on concrete implementations.
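To illustrate the last of these, here is a short sketch of the Dependency Inversion Principle. The book's examples are in C#; this one is in Python, and the class names (MessageSender, EmailSender, SmsSender, Notifier) are my own inventions rather than anything taken from the book.

```python
from abc import ABC, abstractmethod


class MessageSender(ABC):
    """The abstraction that high-level code depends on."""

    @abstractmethod
    def send(self, text: str) -> str:
        ...


class EmailSender(MessageSender):
    def send(self, text: str) -> str:
        return f"email: {text}"


class SmsSender(MessageSender):
    def send(self, text: str) -> str:
        return f"sms: {text}"


class Notifier:
    """Depends on the MessageSender abstraction, not on a concrete sender."""

    def __init__(self, sender: MessageSender):
        self._sender = sender

    def notify(self, text: str) -> str:
        return self._sender.send(text)


# Either concrete sender can be supplied without changing Notifier.
email_result = Notifier(EmailSender()).notify("build passed")
sms_result = Notifier(SmsSender()).notify("build passed")
```

Because Notifier works with any MessageSender, the example also touches on Liskov substitution: each derived class can stand in wherever the base type is expected.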

Next, the authors present an overview of Unified Modeling Language (UML), a graphical language used to describe software designs and requirements. Common UML diagrams and shapes are described and the authors offer opinions on which ones are most useful and when to best use them.

The last half of the book is a case study of a Payroll System in which the authors use examples to illustrate the concepts introduced in the first half of the book.

Although C# is included in the title, the book does not focus on C# and almost none of the concepts are specific to any particular language. All the code examples are in C#, which makes it a bit more accessible if that is your strongest language.

The book is filled with lots of information and good advice. For example, the authors recommend an iterative approach to writing software and a test-first approach to development, and they encourage developers to refactor their code frequently.

Whether you read all of Agile Principles, Patterns and Practices in C# or pick through the sections of interest, you will benefit from this book.

Friday, April 30, 2010 7:41:42 PM (GMT Daylight Time, UTC+01:00)
# Wednesday, February 3, 2010

Episode 68

James Bender, Mike Wood and Chris Woodruff created NPlus1.org to assist software architects, lead developers and those aspiring to these roles. In this interview, James and Mike discuss the goals and accomplishments of NPlus1.

Wednesday, February 3, 2010 12:47:35 PM (GMT Standard Time, UTC+00:00)
# Monday, December 21, 2009

Complexity is the Enemy! 

This is the message driven home repeatedly by Roger Sessions in his book Simple Architectures for Complex Enterprises.

Sessions recommends tackling a complex enterprise architecture by identifying the subcomponents of a complex system and dividing that system into autonomous subsystems. He refers to these subsystems as Autonomous Business Capabilities (ABCs) and to the process of dividing them as Simple Iterative Partitions (SIP).

Before describing how to approach this process, Sessions presents a mathematical proof that subdividing a complex system into a set of subsystems reduces the complexity of the system as a whole. This seems intuitive to many of us, but the mathematics allows us to be more forceful in our commitment to this process. The mathematics is relatively simple (nothing beyond high school math) and he even recommends training team members in this mathematics before beginning any SIP.
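Here is a rough sketch of the intuition, not necessarily the book's exact argument. It assumes that complexity can be measured by the number of states a system can be in, and that each of 12 independent variables has 6 possible states; both numbers are chosen purely for illustration.

```python
def states(variables, states_per_variable=6):
    """Number of distinct states for a system of independent variables."""
    return states_per_variable ** variables


# One system holding all 12 variables.
whole = states(12)

# The same 12 variables split into two autonomous subsystems of 6 each.
# Because the subsystems do not interact, their state counts add
# instead of multiplying.
partitioned = states(6) + states(6)

reduction_factor = whole // partitioned
```

Under these assumptions, the single system has 2,176,782,336 states, while the two partitions together have 93,312, a reduction by a factor of 23,328.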

A large part of an Enterprise Architect's job is to define the optimal way to partition the complex system. By applying mathematics to the model, the architect removes the emotions that so often dictate how a project is broken up.

The process of splitting a complex system into appropriate subsystems isn't overwhelming, but it is critical to managing complexity. According to Sessions, each ABC should contain only elements that relate to one another; and the elements of one ABC should not relate directly to or communicate directly with any element in another ABC. Once partitioned, each ABC should be roughly the same size, although it is possible to split a subsystem further into sub-subsystems. It is also critical that communication between each subsystem take place only at a few clearly-defined points.

If this sounds like a recipe for Service Oriented Architecture, this is no coincidence. Sessions concludes his book with recommendations on moving from business partitions (ABCs) to software partitions, which he describes as "fortresses". These software partitions follow many of the same rules as ABCs created with the SIP, so making this transition is straightforward.

This is a good book for anyone who aspires to be an Architect (Enterprise or otherwise) and wants to apply a systematic approach to managing complexity.

Monday, December 21, 2009 7:45:02 PM (GMT Standard Time, UTC+00:00)
# Thursday, July 30, 2009

NPlus1 is an organization designed to assist architects and lead developers and those aspiring to these roles.

The organization began last year with the launch of the NPlus1.org web site. This site features articles written by and for architects; links to screencasts and other resources; and announcements of upcoming events.

Recently, NPlus1 decided to expand its reach by organizing events of its own. One of these events - the Architecture Summit - takes place Friday, July 31 at the Microsoft office in Southfield, MI. This event will feature three topics: "Introduction to Object Oriented Programming"; "Software Patterns"; and "How I Learned to Love Dependency Injection". The first topic is optional, as it is aimed at those who are new to Object Oriented Programming and, therefore, might struggle with the concepts presented in the other two presentations.

I will be delivering the first two presentations (Intro to OOP and Software Patterns) while James Bender will deliver the Dependency Injection presentation.

It is not too late to register for this event and you can do so at https://www.clicktoattend.com/invitation.aspx?code=139245

Thursday, July 30, 2009 4:00:45 PM (GMT Daylight Time, UTC+01:00)