[LONG] reasons why "advanced" expressions are needed (Was: [Newbie] Advanced expressions)

From: Giulio Cesare Solaroli (slrgcs..bn-italy.com)
Date: Tue Sep 30 2003 - 13:18:19 EDT

  • Next message: Andrus Adamchik: "Re: DataContext delegate?"

    On Monday, Sep 29, 2003, at 18:52 Europe/Rome, Andrus Adamchik wrote:

    > Hi,
    >
    > I would really like to see many things that you suggested in Cayenne
    > (and don't really object to the reset :-)). Per your discussion with
    > Mike, in any possible implementation, the main issue to address is
    > cross-db compatibility. This can be done in Cayenne, since Query +
    > Expression are abstractions translated to SQL using
    > org.objectstyle.cayenne.access.QueryTranslator. Subclass of
    > QueryTranslator can be returned by DbAdapter, so e.g. EXISTS can be
    > substituted by WHERE count(1) > 0. Default implementation should
    > normally use the least common denominator.
    >
    > On Monday, September 29, 2003, at 05:41 AM, Giulio Cesare Solaroli
    > wrote:
    >> In EOF we have written a few qualifiers (in Cayenne they are called
    >> expressions, if I get it right) like:
    >> - empty relationship;
    >> - not empty relationship;
    >> - no match for value;
    >
    > I guess we need to define expression semantics first, and then see how
    > the SQL translation can be done. Something like Expression.IS_EMPTY,
    > Expression.IS_NOT_EMPTY?
    >
    >
    >> - match all values;
    >
    > IN Expression already takes care of that.

    Here are a few examples I hope can illustrate how a naive
    implementation of the above expressions can lead to differences between
    the common logic and the achieved results.

    Let me try to use a simplified auction scenario; everything is fantasy
    here, so don't take it too seriously. It is just for illustration
    purposes.

      EMPTY RELATIONSHIP EXPRESSION
    -------------------------------
    Suppose we have a "Item" entity and "Bid"; "Bid" has a to-one
    relationship to "Item" ("to_item") and "Item" has a to-many
    relationship with "Bid" ("to_bids").

    Now, suppose we want to find all the items that have no bids; What I
    would like to do is write something like this:

            expression = EmptyRelationshipExpression("to_bids");
            query = new SelectQuery(Item.class, expression);
            result = dataContext.performQuery(query);

    This should generate the following SQL statement:

          select t0.id_item, t0....
          from item t0
            where not exists (select t1.id_bid
                             from t1 bid
                             where t0.id_item = t1.id_item)

    This problem could be much simpler if a simple to-one relationship (or
    list of to-one relationships) had to be followed in order to define the
    expression. In these situations, a "simple" noMatchExp would be
    perfectly fine, but I think that delegating to the user the choice of
    which expression to use is plain wrong. To the user the semantic of the
    two expressions are just the same; the problem is that, due to how the
    entities are related to each other (and how the tables are arranged on
    the DB) two differente solutions needs to be used. But this is a task
    that the code should take care of, as it has all the elements to
    correctly choose the right way to build the SQL statement.

      NOT EMPTY RELATIONSHIP
    ------------------------
    Almost like the empty relationship qualifier, but in the SQL statement
    the where clause looks like "where exists (....)".

      MATCH ALL VALUES
    ------------------
    > IN Expression already takes care of that.

    I have not checked deeply what SQL will be generated by the IN
    Expression, but I guess it will fail to produce the "right" statements
    if a to-many relationship is involved.

    Example: let's take the structure described above and add a "User"
    entity.

    Item <-->> Bid <<--> User

    The "Bid" has a to-one relationship to the "User" entity ("to_user");
    the "User" entity has a to-many relationship to the "Bid" entity
    ("to_bids").

    Now, suppose I am interested in the following expression: "all items
    that have bids done by User_A and User_B"

    The code I would like to write is something like this:

            users = new Vector();
            users.add(User_A);
            users.add(User_B);
            expression = MatchAllValuesExpression("to_bids.to_user", users);
            query = new SelectQuery(Item.class, expression);
            result = dataContext.performQuery(query);

    This should generate the following SQL statement:

          select t0.id_item, t0....
          from item t0, bid t1, bid t2, user t3, user t4
            where t0.id_item = t1.id_item
            and t1.id_user = t3.id_user
            and t3.id_user = <id User_A>
            and t0.id_item = t2.id_item
            and t2.id_user = t3.id_user
            and t3.id_user = <id User_B>

    As you can see, in oder to achieve the same semantic that the
    expression above implies, here we need to "duplicate" the "to_bids"
    relationship, in order to generate a set of distinct joins for every
    value of the values array.

    The problem gets a little bit tougher if many to-many relationships are
    involved in the path to the information I want to check.
    Let's add one more entity, "Appraisal", with two to-one relationship to
    the "User" entity, one for the author of the appraisal, "to_author",
    and the other to the subject of the appraisal "to_subject", both with
    the relevant inverse to-many relationships ("to_published_appraisal"
    and "to_about_appraisal").

    | |<- to_item ---| |--- to_user ->| |<------- to_author
    ----------| |------- to_subject ------>| |
    | Item | | Bid | | User |
             | Appraisal | | User |
    | |-- to_bids ->>| |<<- to_bids --| |--
    to_published_appraisal -->| |<<-- to_about_appraisal --|
    |

    I hope this "sketch" makes the arrangement clear enough.

    Now, I want get the "items with bids done by users that have received
    an appraisal posted by User_A and User_B"
    (this could sound quite artificial, and it may even be, but this same
    condition happened to us solving a real problem, otherwise I would have
    never found this problem in the first place).

    The code looks quite the same:

            users = new Vector();
            users.add(User_A);
            users.add(User_B);
            expression =
    MatchAllValuesExpression("to_bids.to_user.to_about_appraisal.to_author",
      users);
            query = new SelectQuery(Item.class, expression);
            result = dataContext.performQuery(query);

    This COULD (and beware, could, as there is an hidden ambiguity in the
    following solution) generate the following SQL statement:

          select t0.id_item, t0....
          from item t0, bid t1, bid t2, user t3, user t4, appraisal t5,
    appraisal t6, user t7, user t8
            where t0.id_item = t1.id_item
            and t1.id_user = t3.id_user
            and t3.id_user = t5.id_subject
            and t5.id_author = t7.id_user
            and t7.id_user = <id User_A>
            and t0.id_item = t2.id_item
            and t2.id_user = t4.id_user
            and t4.id_user = t6.id_subject
            and t6.id_author = t8.id_user
            and t8.id_user = <id User_B>

    Here the whole relationships followed by the expression have been
    duplicated. But the results will not be quite right; can you see why?

    The problem is that the above query finds "items with bids done by
    users that have received an appraisal posted by User_A AND items with
    bids done by users that have received an appraisal posted by User_B".
    This is a completely different question!

    How should the right SQL statement look like? (I leave the same table
    aliases as in the previous query to simplify the comparison)

          select t0.id_item, t0....
          from item t0, bid t1, user t3, appraisal t5, appraisal t6, user
    t7, user t8
            where t0.id_item = t1.id_item
            and t1.id_user = t3.id_user
            and t3.id_user = t5.id_subject
            and t5.id_author = t7.id_user
            and t7.id_user = <id User_A>
            and t3.id_user = t6.id_subject
            and t6.id_author = t8.id_user
            and t8.id_user = <id User_B>

    Can you spot the difference? Basically, you should avoid to split the
    to-many relationships until the Appraisal entity.
    This means that the original code does not give enough information to
    the library to decide how to behave. This because both the above
    queries are reasonable and can be useful in different contextes.

    Now the code could be changed to add the new information:

            users = new Vector();
            users.add(User_A);
            users.add(User_B);
            expression =
    MatchAllValuesExpression("to_bids.to_user.to_about_appraisal.to_author",
      users);
            expression.dontSplitUpTo("to_bids.touser");
            query = new SelectQuery(Item.class, expression);
            result = dataContext.performQuery(query);

    This syntax is quite random, but should give an idea of what is needed.
    In this case, the default implementation of the expression could try to
    split the relationship on the first to-many relationship found
    following the given path. If the dontSplitUpTo method is invoiced, the
    expression will split the first to-many relationship found AFTER the
    path passed in the method call.

    Have I been clear enough? Can you see my point?

      NO MATCH FOR VALUE
    --------------------
    Here the situation si almost like the "empty relationship" situation,
    but instead of having a simple joing inside of the not exists where
    clause, you can have any expression.

    I avoid further description and examples as this post is getting too
    long, but I could explain further if needed.

    >> [...]

    I hope this LONG message have shown you my point.

    Now the main question is: which is the best place, given Cayenne
    current architecture, to place the logic to handle these cases?
    I will be very pleased to "migrate" our current EOF/Objective-C
    implementation into Cayenne if this is achievable with a reasonable
    effort.

    Thank you very much for you patience.

    Giulio Cesare Solaroli



    This archive was generated by hypermail 2.0.0 : Tue Sep 30 2003 - 13:18:03 EDT