A few decades ago, life was simple: we believed that Data Structures + Algorithms = Programs. Then along came object-oriented programming, which urged people to think in terms of entities and what they are capable of doing rather than how they are coded. Before I go any further, I want to make it clear that I am actually a believer in OOP; it is just that it has some unintended consequences. As a mental model, OOP makes things easier to understand (and hopefully more manageable), but computationally it can get very expensive. Let me demonstrate the problem with an example.
Let us say that you are trying to generate a report of the team roster for some sport. At a very simplistic level, there exists a Person class, which represents an individual's bio-data, and a Team object that contains, amongst other things, a list of the people who are part of that team. As part of generating the report, we need to show the names of the players in the team, their photos and their dates of birth. Not surprisingly, the Person class is capable of providing all these pieces of information for a given instance of the object (for those of you who just joined the OOP party, each player's details will be available as a distinct object). So the "generate report" operation of the Team class is no more than iterating over all its constituent players (aggregation, in OOP speak) and invoking the appropriate operations of the Person class. So far, so good.
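To make the setup concrete, here is a minimal Python sketch of the two classes, assuming the data already sits in memory (the storage question comes next). The class names follow the text; the field names `name`, `photo_url` and `date_of_birth`, the method `generate_report`, and the sample player are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Person:
    """Hypothetical bio-data holder: one instance per player."""
    name: str
    photo_url: str
    date_of_birth: date


@dataclass
class Team:
    """A team aggregates Person objects."""
    name: str
    players: list[Person] = field(default_factory=list)

    def generate_report(self) -> list[str]:
        # Iterate over the aggregated players and ask each one for its details.
        return [
            f"{p.name} | {p.photo_url} | {p.date_of_birth.isoformat()}"
            for p in self.players
        ]


# Usage with placeholder data.
team = Team("Example XI", [Person("Alice", "https://example.com/alice.jpg", date(1990, 5, 17))])
report = team.generate_report()
```

Nothing here is expensive yet; the cost only appears once each getter has to go to a data store to find its answer.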
The problem in question revolves around how the Person class actually answers your questions. Typically, you would imagine the details of a person to be stored in some sort of lookup table: either highly structured, in an RDBMS with one column per piece of data such as the photo, or quasi-structured, as a blob in a hash table keyed by the individual's name, with the value being some sort of serialized representation of the various details. The steps involved in fetching a simple piece of information such as "give me the URL of anomalizer's photo" are as follows:
SELECT name, age, photo_url FROM person WHERE person_id IN (708, 124, 2, 6342)
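For the quasi-structured variant, a single lookup might be sketched as below. This assumes a Python dict stands in for the hash table and JSON for the serialization format; both are illustrative choices, and the stored values are placeholders. Note that answering a one-field question still means deserializing the whole blob.

```python
import json

# Hypothetical quasi-structured store: a hash table keyed by a person's name,
# whose value is a JSON-serialized blob of all that person's details.
# The values below are placeholders.
person_store: dict[str, str] = {
    "anomalizer": json.dumps({
        "photo_url": "https://example.com/photos/anomalizer.jpg",
        "date_of_birth": "1970-01-01",
    }),
}


def get_photo_url(name: str) -> str:
    # Step 1: look up the blob by key.
    blob = person_store[name]
    # Step 2: deserialize the entire blob, even though we need one field.
    details = json.loads(blob)
    # Step 3: extract the single field and return it.
    return details["photo_url"]


url = get_photo_url("anomalizer")
```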
Strictly speaking, there really isn't a problem unless you want to believe there is one. Things are fine if you plan to perform this operation very sparingly. But if you have to do it millions of times each day, you will have a seriously bogged-down system on your hands. Let us assume that the data is stored in a traditional RDBMS. Nothing in the argument changes if it is instead available as XML returned by a web service: ultimately, that service will be accessing some data store (though not necessarily an RDBMS), and even pretending that a web service can answer the question without consulting anyone else is acceptable for the remainder of the discussion.
Most object designs start off with a flurry of getters and setters being defined for each attribute of the object. The most naive implementation would involve opening a connection to the DB, running a SELECT statement that does some sort of lookup (hopefully on a primary key), identifying a row, and then extracting the field in question and returning it. The question is what happens when I want the photo, name and age of the same person. In a highly granular getter/setter world, there is no way for me to pass a list of the attributes I am interested in and get them all back at once. The problem becomes more pronounced in strongly typed languages, since there is no reasonable way to return a collection of attributes whose types are not known in advance: think of what happens if I want the name, age, names of children and current net worth in pounds sterling returned in one call. The return type will have to be a collection of the most generic data type the language supports (which is somewhat unreasonable in an OO world). I will not tread into the area of caching, since it brings with it a whole lot of problems such as cache coherency and cleanup of unused resources. The other problem, of course, is the large number of connections being made to the data source. This is a relatively easier problem to solve, by drawing persistent connections from a connection pool, but it exists nonetheless.
The next dimension of this problem is getting information for multiple people. Even if we had the best implementation for fetching arbitrary attributes of a person (at the cost of design elegance), we would still have to visit the data source as many times as there are people in the team. An object of type Person will not offer a way to get the details of multiple people in a single operation, since that is not a reflection of a person's attributes. At best, you would have to resort to a class static method if you had to support such an operation. Even that would look very clunky, since it now takes two lists as input: one with a list of people and another with a list of attributes.
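Such a static method might look like the following sketch, again using an in-memory `sqlite3` database as a stand-in. The method name `fetch_many`, the `ALLOWED_COLUMNS` whitelist and the sample rows are hypothetical. It issues a single `IN (...)` query like the SELECT statement shown earlier, and it displays both awkward traits described above: it takes two lists, and its return type falls back to a generic `dict[str, Any]` per person.

```python
import sqlite3
from typing import Any

# Hypothetical table mirroring the columns in the SELECT statement.
ALLOWED_COLUMNS = {"name", "age", "photo_url"}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, age INTEGER, photo_url TEXT)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?)",
    [
        (708, "Alice", 31, "https://example.com/708.jpg"),
        (124, "Bob", 27, "https://example.com/124.jpg"),
        (2, "Carol", 24, "https://example.com/2.jpg"),
    ],
)


class Person:
    @staticmethod
    def fetch_many(person_ids: list[int], attributes: list[str]) -> dict[int, dict[str, Any]]:
        """Fetch several attributes for several people in one query."""
        if not person_ids:
            return {}
        # Column names cannot be bound as parameters, so whitelist them.
        unknown = set(attributes) - ALLOWED_COLUMNS
        if unknown:
            raise ValueError(f"unknown attributes: {sorted(unknown)}")
        columns = ", ".join(["person_id", *attributes])
        placeholders = ", ".join("?" for _ in person_ids)
        sql = f"SELECT {columns} FROM person WHERE person_id IN ({placeholders})"
        result: dict[int, dict[str, Any]] = {}
        for row in conn.execute(sql, person_ids):
            # Map the requested attribute names onto the returned values.
            result[row[0]] = dict(zip(attributes, row[1:]))
        return result


details = Person.fetch_many([708, 124, 2, 6342], ["name", "age", "photo_url"])
```

One round trip now serves the whole roster, and IDs with no matching row (6342 here) are simply absent from the result. The price is an API that no longer looks like anything a single Person would naturally do.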