The other day I was chatting with an ETL developer and he said he always pushes queries into the database instead of dragging data across the network. I thought “Hmm, I remember talking about those topics when I was a DBA.”
I'd like to share those thoughts with you now.
Most databases, used for business intelligence, are not overly saturated with transactional type queries. This leaves resources (platform power and memory) available in the database environment that could be used for complex queries. Queries that may join multiple tables together for an ETL process or program. That said, writing that query properly will be very important.
The important thing about writing a good query is to understand the indexing and the data types. I wrote a blog a couple of years ago called VARCHAR(255). I was amazed that many attributes in database tables are defaulted to this data type. These attributes are not good candidates for indexing, nor will a query be quick and efficient on attributes with textual content of VARCHAR(255).
That said, if you are writing a complex query that joins multiple tables, that has many rows of data, then consider pushing that query to the database using the techniques that work best. For example, SAS has magic ways when using Teradata.
If you are querying for specific indexed rows that may join one or two tables, consider letting the ETL server bear the burden of the process.
Balancing the workload is an ART. Those that do it every day are worth their weight in gold.
So, am I balanced? NEVER! But, hopefully you have an ARTIST who can help optimize your workloads for those times when the database is queried the most. If not, then monitor those queries and look for the ones that are taking the most resources – start your analysis there.