I think it's common for vendors and consultancies to push their own technologies as solutions to their clients' problems. Even individual consultants are often bullish about the skills they've acquired. Together, these behaviours make it difficult (if not impossible) to find and implement the optimal solution. Of course, an optimal solution isn't always what a client wants: delivering a sub-optimal solution that still leaves your client better off than their nearest competitor may be enough. Still, I feel that recognising that an optimal solution does exist is an important step towards better architectures.
Adam Smith - the economist - wrote about the division of labour in his magnum opus, The Wealth of Nations. To paraphrase: he describes a pin factory and the difference in productivity between two scenarios - sharing every responsibility across a group of employees, versus assigning each employee a specific task. There are many conclusions one can draw, but the one that stands out to me is that performance gains can be made by choosing the right tool for the job (RTFJ).
Databases. Great for relational data storage and retrieval. They're designed to meet ACID requirements, and even with all the log shipping and replication in the world, they don't scale quite as nicely as other technologies do. In my book, that would be reason enough to keep business logic out of the database. However, certain cases might call for a gargantuan set of data to be worked on at once, and then it might be prudent to "bring the computation to the data".
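To illustrate "bringing the computation to the data", here's a minimal sketch using Python's built-in sqlite3 module with a hypothetical trades table (the table and column names are mine, purely for illustration). The point generalises to any database: an aggregate pushed into SQL ships one number over the boundary, rather than every row.

```python
import sqlite3

# Hypothetical example: an in-memory table of trades.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (symbol TEXT, qty INTEGER, price REAL)")
conn.executemany("INSERT INTO trades VALUES (?, ?, ?)",
                 [("ABC", 10, 1.5), ("ABC", 20, 2.0), ("XYZ", 5, 3.0)])

# Pulling every row out and summing client-side moves all the data:
rows = conn.execute("SELECT qty, price FROM trades WHERE symbol = 'ABC'").fetchall()
client_side = sum(q * p for q, p in rows)

# Pushing the aggregation into SQL brings the computation to the data;
# only a single number crosses the boundary:
(db_side,) = conn.execute(
    "SELECT SUM(qty * price) FROM trades WHERE symbol = 'ABC'").fetchone()

assert client_side == db_side
```

With an in-memory toy table the difference is invisible; with a gargantuan remote table, the second query avoids shipping (and deserialising) every row.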
Grids and high-performance computing. Great for compute-intensive operations. However, they're distributed by nature, and that generally makes things more difficult. They usually offer only a subset of the common operating system constructs we're used to - conceptually, anyway: spinning up a new thread locally is the "equivalent" of starting a new process on a remote machine. There's also the problem of moving data. (De)serialisation is computationally intensive - optimisations can take the form of sharing metadata of common versions (e.g. .NET assemblies and binary serialisation), which brings new problems of managing versioning across the environment.
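The round-trip that moving data implies can be sketched in a few lines - here with Python's pickle as a stand-in for whatever binary serialiser your grid uses, and a made-up payload. Both ends pay CPU for the conversion, and both ends must agree on the format, which is exactly where the versioning headache comes from.

```python
import pickle

# Hypothetical payload for a remote compute task (names are illustrative).
payload = {"task": "price", "points": list(range(1000))}

# To move data between processes or machines it must be serialised...
wire_bytes = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)

# ...and deserialised on the other side. Both steps cost CPU, and both
# ends must share compatible type metadata (the versioning problem).
restored = pickle.loads(wire_bytes)

assert restored == payload
```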
Whatever you're doing, always make "efficient" use of your CPU. Use asynchronous patterns, such as callbacks, for non-CPU tasks (e.g. waiting on I/O). Thread.Sleep() and spinning in a tight loop are generally evil (though I'm sure there exists a case where each is specifically wonderful).
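As a sketch of the idea (in Python's asyncio, standing in for .NET's asynchronous patterns): hand non-CPU waits to an event loop instead of blocking a thread, and independent waits overlap for free. asyncio.sleep here is a placeholder for a real I/O wait such as a network call.

```python
import asyncio

async def fetch(name, delay):
    # await hands the wait to the event loop; no thread blocks or spins.
    await asyncio.sleep(delay)   # stands in for a non-CPU wait (network, disk)
    return f"{name} done"

async def main():
    # Both waits overlap: total wall time is ~0.02s, not ~0.04s.
    return await asyncio.gather(fetch("a", 0.02), fetch("b", 0.02))

print(asyncio.run(main()))  # ['a done', 'b done']
```

Contrast with calling time.sleep(0.02) twice on one thread, which would pin that thread for the full duration while doing no useful work.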
Distribute only what you have to. If your constraint is virtual ("addressable") memory then it might be OK just to have multiple processes on the same machine with lots of physical memory, talking to each other via some non-network IPC mechanism.
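A minimal sketch of that same-machine setup, using Python's multiprocessing module as a stand-in: two processes on one box talking over an OS pipe, no network involved. The worker and the payload are purely illustrative.

```python
from multiprocessing import Process, Pipe

def worker(conn):
    n = conn.recv()      # receive work from the parent process
    conn.send(n * n)     # send the result back over the pipe
    conn.close()

def run_demo():
    # Each process gets its own address space, but IPC stays on-box:
    # a pipe, not a socket to another machine.
    parent_end, child_end = Pipe()
    p = Process(target=worker, args=(child_end,))
    p.start()
    parent_end.send(12)
    result = parent_end.recv()
    p.join()
    return result

if __name__ == "__main__":
    print(run_demo())  # 144
```

Each process gets a fresh virtual address space, which is the point: you relieve the per-process memory constraint without paying the cost of distribution across machines.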
Cache hits should be quick. Cache misses (generally) shouldn't result in multiple simultaneous requests to insert fresh data into the cache. The tricky bit is not making any "threads" wait while the cache data is inserted. That ties my previous point in with the next:
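One common way to get the "no duplicate inserts" property is a single-flight cache: on a miss, exactly one thread computes the value while concurrent misses for the same key wait on it rather than all hammering the expensive backend. A minimal Python sketch (the class and loader are mine; a fully non-blocking version, where waiters do other work instead of blocking, would need futures or an async design):

```python
import threading

class SingleFlightCache:
    def __init__(self, loader):
        self._loader = loader          # expensive function: key -> value
        self._values = {}
        self._locks = {}               # one lock per in-flight key
        self._guard = threading.Lock()

    def get(self, key):
        with self._guard:
            if key in self._values:    # cache hit: quick path
                return self._values[key]
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:                     # misses for this key serialise here
            with self._guard:
                if key in self._values:  # another thread filled it meanwhile
                    return self._values[key]
            value = self._loader(key)    # expensive operation, done once
            with self._guard:
                self._values[key] = value
                return value

calls = []
cache = SingleFlightCache(lambda k: calls.append(k) or k * 2)
threads = [threading.Thread(target=cache.get, args=(21,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(cache.get(21), len(calls))  # 42 1
```

Eight threads miss at roughly the same time, yet the loader runs once - the double-check under the per-key lock is what prevents the stampede.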
DRY. Don't repeat yourself. This goes for operations as well as for copy-and-pasted boilerplate code. If a cache can give you the result of an expensive operation you've already computed, for less cost, then consider caching. In-memory, disk, distributed and bespoke caches exist. Each will have a