[Still LONG] Re: [LONG] reasons why "advanced" expressions are needed

From: Andrus Adamchik (andru..bjectstyle.org)
Date: Thu Oct 02 2003 - 02:35:34 EDT

Next message: Giulio Cesare Solaroli: "Re: [Still LONG] Re: [LONG] reasons why "advanced" expressions are needed"

Previous message: Giulio Cesare Solaroli: "Re: I am missing Expression architecture"
Next in thread: Giulio Cesare Solaroli: "Re: [Still LONG] Re: [LONG] reasons why "advanced" expressions are needed"
Reply: Giulio Cesare Solaroli: "Re: [Still LONG] Re: [LONG] reasons why "advanced" expressions are needed"

On Tuesday, September 30, 2003, at 01:18 PM, Giulio Cesare Solaroli
wrote:
> I hope this LONG message have shown you my point.

It sure did. I perfectly understand the deficiencies of the Cayenne
expression API in this respect (and of "vanilla" EOF, for that matter). I
am glad you are not just pointing them out, but also offering an O/R
solution. I am aware of the SQL solutions; we just needed a push from
somebody to actually start bringing this to Cayenne :-).

> Now the main question is: which is the best place, given Cayenne
> current architecture, to place the logic to handle these cases?
> I will be very pleased to "migrate" our current EOF/Objective-C
> implementation into Cayenne if this is achievable with a reasonable
> effort.

CAYENNE API

1. Expressions (org.objectstyle.cayenne.exp)

As you noticed, the Expression class simply defines the semantics of an
expression and doesn't contain any processing logic. SQL generation is
done by the access layer. In-memory evaluation is done by
ExpressionEval (called from Expression.eval(), still incomplete). Since
creating expressions directly is rather counter-intuitive (at least in
their current form), ExpressionFactory static methods are used instead.
Their names follow (or ideally should :-) ) the "common logic".
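
To make that separation concrete, here is a minimal, self-contained
sketch. It is not Cayenne code: ExpressionSketch, Node, Type and the
map-based eval() are hypothetical names made up for illustration, and
only the factory method name matchExp echoes the existing API. The node
holds a type and operands and nothing else; the factory hides the node
assembly; the evaluator is one possible "processor" living outside the
expression.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; this is an illustration, not the Cayenne API.
class ExpressionSketch {

    /** Node types, loosely mirroring the idea of typed expressions. */
    enum Type { EQUAL_TO, AND, OBJ_PATH }

    /** Pure description of an expression: a type plus operands, no logic. */
    static final class Node {
        final Type type;
        final List<Object> operands;

        Node(Type type, Object... operands) {
            this.type = type;
            this.operands = Collections.unmodifiableList(Arrays.asList(operands));
        }
    }

    /** Factory methods with readable names, so callers never build nodes by hand. */
    static Node matchExp(String path, Object value) {
        return new Node(Type.EQUAL_TO, new Node(Type.OBJ_PATH, path), value);
    }

    static Node andExp(Node left, Node right) {
        return new Node(Type.AND, left, right);
    }

    /** One possible processor: in-memory evaluation against a map of property values. */
    static boolean eval(Node node, Map<String, Object> object) {
        switch (node.type) {
            case EQUAL_TO: {
                Node path = (Node) node.operands.get(0);
                Object actual = object.get((String) path.operands.get(0));
                Object expected = node.operands.get(1);
                return expected == null ? actual == null : expected.equals(actual);
            }
            case AND:
                return eval((Node) node.operands.get(0), object)
                        && eval((Node) node.operands.get(1), object);
            default:
                throw new IllegalArgumentException("Not a boolean expression: " + node.type);
        }
    }

    public static void main(String[] args) {
        // Made-up sample object; property names are illustrative only.
        Map<String, Object> artist = new HashMap<>();
        artist.put("artistName", "Dali");
        artist.put("country", "ES");

        Node qualifier = andExp(matchExp("artistName", "Dali"), matchExp("country", "ES"));
        System.out.println(eval(qualifier, artist)); // prints "true"
    }
}
```

A SQL translator would walk the same Node tree and emit text instead of
a boolean, which is why the tree itself can stay free of logic.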

Also note that some of the currently defined expression types (mostly
aggregate functions and things like ALL, EXISTS, etc.) are not used or
supported in Cayenne. We may reuse some of them where it makes sense in
the new API described below.

2. QueryTranslator/QualifierTranslator

These access layer classes define algorithms for SQL translation. There
is a default implementation, which can be (optionally) customized by
each DbAdapter (e.g. if some database doesn't support feature X, or
implements it differently).
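
The "default plus per-adapter customization" arrangement can be
sketched roughly like this. The class names below borrow the words
"translator" and "adapter" but are hypothetical stand-ins, not the
actual Cayenne classes or signatures, and the ILIKE-capable database is
an assumption picked only to have a "feature X" to vary.

```java
// Hypothetical stand-ins for illustration; not the real Cayenne classes.
class TranslatorSketch {

    /** Default algorithm: portable case-insensitive LIKE using UPPER(). */
    static class QualifierTranslator {
        String likeIgnoreCase(String column) {
            return "UPPER(" + column + ") LIKE UPPER(?)";
        }
    }

    /** An adapter hands out the translator to use for its database. */
    static class DbAdapter {
        QualifierTranslator getQualifierTranslator() {
            return new QualifierTranslator();
        }
    }

    /** Assumed database with a native ILIKE operator: overrides one step only. */
    static class IlikeAdapter extends DbAdapter {
        @Override
        QualifierTranslator getQualifierTranslator() {
            return new QualifierTranslator() {
                @Override
                String likeIgnoreCase(String column) {
                    return column + " ILIKE ?";
                }
            };
        }
    }

    public static void main(String[] args) {
        // Same call site, different SQL depending on the adapter in use.
        System.out.println(new DbAdapter().getQualifierTranslator().likeIgnoreCase("NAME"));
        System.out.println(new IlikeAdapter().getQualifierTranslator().likeIgnoreCase("NAME"));
    }
}
```

The point of the design is that callers only ever ask the adapter for a
translator, so a database-specific quirk stays inside one subclass.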

IMPLEMENTATION IDEAS/NOTES

0. Preparation.

+1 from me on the idea of starting by creating tests for all the cases
we plan to cover.... Writing user documentation upfront is another thing
that will help us here.... Document the cases that absolutely require
EXISTS support in the database and will blow up without it.

1. Unit Tests

Though Cayenne is not using DBUnit, it has an extensive testing
framework of its own, and it should be easy to write all relevant
tests. A few hints:

   - Test cases are located under cayenne/src/tests/java.
   - Ant scripts will run all classes matching "*Tst" as unit tests, so
     adding a new test suite is as simple as creating an XYZTst class in
     an appropriate package.
   - Subclass CayenneTestCase to get access to the Cayenne stack during
     testing.
   - The test DataMap is located under src/tests/resources/test-resources.
     The DB schema is dropped and recreated on each run (but not on each
     test, of course).

There were some proposals to use DBUnit in the past, but they didn't get
too far, so we have to use our own API to create test data sets.

2. Expression Semantics.

The fact that Expressions are abstract and do not have any processing
logic should allow us to define the semantics regardless of how they may
eventually be translated to SQL... as long as there is enough info/hints
collected in the expression.

From what I can tell from the cases provided (if there are more, please
bring them on), we are dealing with a general problem of matching a
*collection* of values against a relationship path (with the special
cases being: matching an attribute value instead of a relationship,
matching a to-one relationship, matching against a single-object
collection and, finally, matching against a wildcard value, e.g.
"relationship not empty"). The additional logical operation applied
after the match is "none", "any" or "all". Since collections can consist
of either DataObjects or scalar values, we may introduce another
variable into the equation - match type (e.g. use ">" instead of "="),
but let's not do it just yet, or my head will explode :-). This is how
it can possibly be defined using ExpressionFactory:

[please let me know if this analysis attempt fails to cover any of the
cases we are trying to solve]

// Wildcards...

   /**
    * Qualifier for not empty relationship. Creates unary expression of
    * type EXISTS.
    * Translated to "FK is not null" for to-one, and "EXISTS" for to-many.
    */
   public static Expression hasAnyExp(String relationshipPath);

   /**
    * Qualifier for an empty relationship. Creates unary expression of
    * type NOT_EXISTS.
    * Translated to "FK is null" for to-one, and "NOT EXISTS" for to-many.
    */
   public static Expression hasNoneExp(String relationshipPath);

// Single objects... some existing API can be reused...

   /** Don't think current implementation has to change. */
   public static Expression matchExp(String relationshipPath, Object value);

   /**
    * Redo translator for the existing expression to use NOT EXISTS for
    * to-many.
    */
   public static Expression noMatchExp(String relationshipPath, Object value);

// Collections...
// additionally have to handle "doNotSplit"... internally I suggest
// making OBJ_PATH expression a binary expression containing two operands
// to describe the full path and the doNotSplit part....
// This can be easily made backwards compatible, assuming "doNotSplit
// is fullPath"

   /**
    * Qualifier for matching all of collection values. Creates binary
    * expression of type EQUAL_TO with second parameter being a collection.
    * Translated to the list of joins taking split policy into account.
    * Should blow during execution if used with to-one relationship and
    * collection with size > 1. (??)
    */
   public static Expression hasAllOfExp(String relationshipPath,
         String doNotSplitPath, Collection values);

   /**
    * Qualifier for matching any of collection values. Creates binary
    * expression of type IN with second parameter being a collection.
    * Notes:
    * - Is this any different from our current "inExp", other than
    *   in doNotSplitPath?
    * - Matching on a compound PK DataObject will definitely prevent
    *   using IN, instead a group of OR statements will be needed:
    *   ((PK1 = v1 AND PK2 = v2) OR ...)
    */
   public static Expression hasAnyOfExp(String relationshipPath,
         String doNotSplitPath, Collection values);

   /**
    * Qualifier for matching none of collection values. Creates binary
    * expression of type NOT_IN with second parameter being a collection.
    * Translated to NOT EXISTS with correct joins for to-many, or NOT IN
    * for to-one, or NOT ((PK1 = v1 AND PK2 = v2) OR ...) for compound PK
    * DataObjects.
    */
   public static Expression hasNoneOfExp(String relationshipPath,
         String doNotSplitPath, Collection values);
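
As a sanity check of the "none" / "any" / "all" semantics described
above, here is a small in-memory sketch over plain Java collections. It
is only my reading of the intended meaning, not an implementation
proposal: CollectionMatchSketch and its methods are hypothetical,
"related" stands for the objects reached through the relationship path,
and doNotSplit is deliberately left out. Note the assumption that an
empty values collection makes "all" and "none" trivially true and "any"
false; that edge case is not settled by the spec above.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;

// Hypothetical in-memory model of the proposed qualifiers; not Cayenne code.
class CollectionMatchSketch {

    /** Wildcard: the relationship is not empty. */
    static boolean hasAny(Collection<?> related) {
        return !related.isEmpty();
    }

    /** Wildcard: the relationship is empty. */
    static boolean hasNone(Collection<?> related) {
        return related.isEmpty();
    }

    /** "all": every requested value is found among the related objects. */
    static boolean hasAllOf(Collection<?> related, Collection<?> values) {
        return new HashSet<Object>(related).containsAll(values);
    }

    /** "any": at least one requested value is found among the related objects. */
    static boolean hasAnyOf(Collection<?> related, Collection<?> values) {
        return !Collections.disjoint(related, values);
    }

    /** "none": no requested value is found among the related objects. */
    static boolean hasNoneOf(Collection<?> related, Collection<?> values) {
        return Collections.disjoint(related, values);
    }

    public static void main(String[] args) {
        // Made-up data: the values reached through some to-many relationship.
        List<String> related = Arrays.asList("red", "green");

        System.out.println(hasAllOf(related, Arrays.asList("red", "green")));  // true
        System.out.println(hasAllOf(related, Arrays.asList("red", "blue")));   // false
        System.out.println(hasAnyOf(related, Arrays.asList("red", "blue")));   // true
        System.out.println(hasNoneOf(related, Arrays.asList("blue")));         // true
    }
}
```

With a to-one relationship "related" has at most one element, so
hasAllOf with two or more distinct values can never be true; that is
the case flagged with "(??)" in the hasAllOfExp comment.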

3. SQL Translation

The SQL translators are arguably the messiest part of our current
codebase. Anyway, I can identify the following pieces needed in the
translator:

1. I like Scott's idea about having support for EXISTS and subqueries.
I've been toying with this idea from day one of Cayenne, but never
got around to actually implementing it.... Doing it independently might
well give us the building blocks for the spec above.

2. Add support for "doNotSplit" OBJ_PATH and DB_PATH. This shouldn't
be too hard to do (in a backwards compatible way too - current policy
is "doNotSplit = fullPath").

3. With (1) and (2) as prerequisites, implement support for all the new
expressions... This may turn out to be less work than it seems now, esp.
after all the translator refactoring that might be needed by (1), and
with fresh eyes looking at the translator's messy flow :-)
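
To give an idea of the SQL these pieces would have to emit, here is a
rough sketch of the fragments mentioned in the spec comments: the FK
null checks for to-one, the correlated EXISTS for to-many, and the OR
group needed for compound PK matches. SqlFragmentSketch is a
hypothetical helper, the table, column and alias names are made up, and
the hard-coded "t1" subquery alias is a simplification a real
translator could not afford.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the Cayenne translators.
class SqlFragmentSketch {

    /** "has any" on a to-one relationship: the FK column must be set. */
    static String hasAnyToOne(String fkColumn) {
        return fkColumn + " IS NOT NULL";
    }

    /** "has none" on a to-one relationship: the FK column must be null. */
    static String hasNoneToOne(String fkColumn) {
        return fkColumn + " IS NULL";
    }

    /** "has any" on a to-many relationship: correlated EXISTS subquery. */
    static String hasAnyToMany(String rootAlias, String rootPk,
                               String targetTable, String targetFk) {
        return "EXISTS (SELECT 1 FROM " + targetTable + " t1 WHERE t1."
                + targetFk + " = " + rootAlias + "." + rootPk + ")";
    }

    /** "has none" on a to-many relationship: negated EXISTS. */
    static String hasNoneToMany(String rootAlias, String rootPk,
                                String targetTable, String targetFk) {
        return "NOT " + hasAnyToMany(rootAlias, rootPk, targetTable, targetFk);
    }

    /**
     * Compound PK match for objectCount objects, with bind placeholders:
     * ((PK1 = ? AND PK2 = ?) OR (PK1 = ? AND PK2 = ?) ...)
     */
    static String compoundPkMatch(List<String> pkColumns, int objectCount) {
        if (pkColumns.isEmpty() || objectCount < 1) {
            throw new IllegalArgumentException("Need at least one PK column and one object");
        }
        List<String> columnTests = new ArrayList<>();
        for (String column : pkColumns) {
            columnTests.add(column + " = ?");
        }
        String oneObject = "(" + String.join(" AND ", columnTests) + ")";

        List<String> groups = new ArrayList<>();
        for (int i = 0; i < objectCount; i++) {
            groups.add(oneObject);
        }
        return "(" + String.join(" OR ", groups) + ")";
    }

    public static void main(String[] args) {
        // prints: EXISTS (SELECT 1 FROM PAINTING t1 WHERE t1.ARTIST_ID = t0.ARTIST_ID)
        System.out.println(hasAnyToMany("t0", "ARTIST_ID", "PAINTING", "ARTIST_ID"));
        // prints: ((PK1 = ? AND PK2 = ?) OR (PK1 = ? AND PK2 = ?))
        System.out.println(compoundPkMatch(Arrays.asList("PK1", "PK2"), 2));
    }
}
```

The "none of" case for compound PKs is then just NOT in front of the
same OR group, which matches the hasNoneOfExp comment.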

I suspect we will need a separate discussion of SQL Translators... I
don't know if it is time for a redesign contest :-/ Maybe it is....
It'll definitely be a good thing to discuss alternative translator
implementations once we stumble on any problems while extending them
(if we don't...oh well). Current test cases should give us a good
cushion in case we have to seriously redo them.

Andrus


This archive was generated by hypermail 2.0.0 : Thu Oct 02 2003 - 02:34:54 EDT