"Compose" here doesn't mean "to write". It means to build up a database query in...

BinaryIdiot · on Feb 6, 2016

Oh I know what compose means. Having written a ton of stored procedures for large scale web applications I've never found the need to dynamically build such queries. Instead I focus on what I need a specific query to do, provide the options to a stored procedure or function (depending on the RDMS) and that's it. Much like unix with simple programs that do very niche things I look at procedures much the same way.

Don't get me wrong I see the simplicity in it from the ORM side I've just never had a good experience with ORM tools. Having worked directly against many RDMS and key value stores at least for me I don't see much value in them.

But anything that helps get a product out the door more power to ya, in my opinion.

barrkel · on Feb 6, 2016

What do you do when you want e.g. user controlled sorting and filtering?

I just implemented this recently in my job, but it was even tougher than that; I was joining dynamically created tables, and the number of joins depended on criteria. I ended up building my own AST to represent filters, and compiling to SQL. Every extra table join hurts performance - typically filters are implented as joins against a filtered sorted limited subquery returning ids, so fewer different table joins means less data to pull in when evaluating filters even when all the data is needed for the results page.

Point being, I had to compose complex dynamic queries and SQL was not a particularly pleasant target language to target directly. Abstractions (like asts) help with composition.

mborch · on Feb 6, 2016

You can use a CASE-statement for each possible filter and apply it only if the filter value(s) are non-trivial.

    -- Parameter :filter
    --
    AND CASE WHEN :filter IS NOT NULL THEN
        table.column = :filter
    ELSE TRUE END

This has zero cost in the case where :filter is unset.

barrkel · on Feb 6, 2016

The filters are predicates, not values.

BinaryIdiot · on Feb 6, 2016

> What do you do when you want e.g. user controlled sorting and filtering?

Sorting is easy. The stored procedure just accepts a sorting flag, add it into your query and your done. Filtering though, I could think of a dozen types of things you may want to filter on and I still may not hit your use case so it's hard to say how to do it based on what data is stored in the database, how it's stored and which RDMS you're using.

I've primarily done this type of work in MS SQL where I've built some pretty complex queries but I was always able to replace dynamically generated SQL with SQL that could generate a cacheable query plan.

Not saying there won't be edge cases and I don't know how your filtering worked so I don't have any type of satisfying answer for you. I would just be surprised if it wasn't doable somehow (I've seen some crazy shit in SQL queries :D)

barrkel · on Feb 6, 2016

The job I have in mind was basically implementing a subset of Excel's sorting and filtering capabilities into a paginated table view on a SPA web page.

Sorting has a couple of wrinkles. When Excel sorts rows, it changes the in-memory order; its sort is also stable. That means if you sort by column C, then B, then A, you're effectively doing "order by a, b, c". Seems simple enough. But that's not the only wrinkle in this problem.

Not all sortable columns are in the same table. If you do something naive like "select x.* , y.* from x join y [...] where [...] order by x.a, y.b limit 10 offset 20000", you're doing too much work - you're sorting a wide result set. It's normally better to do your pagination in a subquery and join against it, like this:

    select x.*, y.* from x join y [...] join (
      select x.id from x join y [...] where [...]
      order by x.a, y.b limit 10 offset 20000
    ) f on f.id = x.id
    order by x.a, y.b

Depending on the body of the where clause, you want to join the minimum set of tables to include just enough data to evaluate the page of ids - every join has a cost.

The body of the where clause is another story, it's pretty much entirely arbitrary. The user may choose to filter on any column, just like Excel autofilter. Creating good indexes up front isn't feasible - you don't know how many columns to include in any one index, they take time to create and they're not free in disk storage either. And the filtered columns span multiple tables; the final result set may contain several hundred columns across all these tables.

So what I did is take the criteria - the filter clause, which might look something like this:

    (and (or (eq (field 'x.f1') "foo")
             (eq (field 'y.f2') "bar"))
         (between (field 'm.timestamp') "date1" "date2"))

And turn it into a where clause. I analyze the filter tree to determine the minimum set of joins required to access all the fields needed by the filter and sorting criteria. I can then put that into a subquery to join against.

There are more wrinkles. To give a good experience, we need to populate a dropdown of potential filter values, like Excel autofilter. Potential filter values need to be filtered themselves - we only display filter values for that are potentially visible given the current filter. They also need to be searchable, because there may be millions of them. And they may include joins, like when you want to filter by tags or some other 1:N style relation, like row:tags. There are a bunch of different shapes to the different queries, but there are common composable underlyers to them.

All the above runs on mostly unindexed tables - we're relying on pure query compute performance within the designed data limits (less than 3 million rows). Sub-millisecond query response times aren't the goal here; we're supporting the user doing reasonably complex data manipulation operations with worst case performance measured in single-digit seconds.

(This is all in MySQL, because reasons that include business model constraints.)

mhd · on Feb 6, 2016

So you don't even build dynamic queries in the stored procedures themselves (e.g. with string concatenations and EXECUTE)?

That's where I usually get fed up with stored procedures, as pseudo-ADA ain't a nice language for list and string manipulation. I'd rather do that in a language better suited, and that quickly leads to query builders…