Re: Idea for index/limit support

From: Gili (cowwo..bs.darktech.org)
Date: Tue Sep 06 2005 - 10:59:28 EDT

  • Next message: Andrus Adamchik: "Re: Idea for index/limit support"

            Michael, to the best of my knowledge Hibernate provides index/limit
    support in a database-agnostic way without fetching all the results, so
    this *is* possible. Even if not all databases support this feature we
    should support it for those who do (which are the majority). The
    performance benefit is huge.

            If Cayenne wants to know how many results there are, issuing a "select
    count(*)" query should be enough and far more efficient than actually
    retrieving a list of all PKs.

            Lastly, I don't think 100,000 records is a lot. I mean, this is what
    databases are all about: massive amounts of data. We should be designing
    for this kind of thing, as it is not the exceptional case. Currently
    I've got 2,464 records in one of my tables and my application is just at
    its birth stages. In another year I wouldn't be surprised if I reach or
    surpass 100,000 records.

    Gili

    Gentry, Michael (Contractor) wrote:
    > Cayenne (or another ORM) will always need to know how many records are
    > in the relationship. It doesn't matter if it fetches the entire object
    > or the primary key, it still needs to fetch all of them. There is no
    > database-agnostic way of fetching N records starting at record 300, so
    > Cayenne/etc needs to have them all (or at least the PKs) in order to
    > function. If it only fetched the PKs, it can still do 1 SELECT passing
    > the DB N PKs (I'd like to buy a vowel there), which is at least fairly
    > efficient. If you have 100,000 records in a relationship, you might
    > have bigger issues, too ... :-)
    >
    > /dev/mrg
    >
    >
    > -----Original Message-----
    > From: Gili [mailto:cowwo..bs.darktech.org]
    > Sent: Tuesday, September 06, 2005 10:41 AM
    > To: cayenne-deve..bjectstyle.org
    > Subject: Re: Idea for index/limit support
    >
    >
    >
    > Right, this is related to pagination. All I'm saying is that if
    > I have
    > 100,000 results (not necessarily blobs) and I only want to pick up the
    > first five results, I should be able to do just that. Hitting all
    > 100,000 results when I know in advance I will not read them is a huge
    > waste of both memory and time.
    >
    > Regarding my suggestion, the default would be the current
    > behavior
    > (i.e. retrieve all values) but I'm saying it would be nice to have the
    > option to tweak this further. To my understanding, even with Pagination,
    >
    > Cayenne will retrieve all primary keys from a result. I saw an open JIRA
    >
    > issue for this I think.
    >
    > Gili
    >
    > Gentry, Michael (Contractor) wrote:
    >
    >>I really can't imagine my to-many faults not returning everything in
    >
    > the
    >
    >>relationship. I certainly wouldn't want it to only fault in what I
    >
    > try
    >
    >>to read (that would require far too many SELECTs and hurt
    >
    > performance).
    >
    >>And it's probably too arbitrary to pick N records for it to read in at
    >
    > a
    >
    >>time.
    >>
    >>Given your example, it still seems like you are still having your BLOB
    >>data inside a table with other data instead of isolated. So, when you
    >>fault your to-many relationship, you are getting all of the BLOB data
    >
    > (N
    >
    >>rows) in addition to the other attributes. This is almost always
    >
    > going
    >
    >>to be a lose situation using ORM.
    >>
    >>All that being said, perhaps a paginated query could help you out?
    >>
    >>
    >
    > http://www.objectstyle.org/cayenne/userguide/perform/paged-queries.html
    >
    >>/dev/mrg
    >>
    >>-----Original Message-----
    >>From: Gili [mailto:cowwo..bs.darktech.org]
    >>Sent: Monday, September 05, 2005 3:29 AM
    >>To: cayenne-deve..bjectstyle.org
    >>Subject: Idea for index/limit support
    >>
    >>
    >>
    >> Aside from the obvious idea that queries should be able to set
    >>index/limit, I'm thinking we should modify the behavior of
    >
    > relationship
    >
    >>getting methods. Currently they return a List from xxxArray
    >>relationships. I'm thinking we should implement the List interface,
    >
    > and
    >
    >>provide special index/limit/fetchSize methods in our implementation.
    >>This way old code continues working as is and new code that wishes to
    >>optimize performance does something like this:
    >>
    >> List images = dataObject.getImageArray();
    >> CayenneList typedImages = (CayenneList) images;
    >>
    >> // (optional) configure List before using it
    >> typedImages.setFetchSize(10);
    >> typedImages.setMinIndex(15);
    >> typedImages.setMaxIndex(30);
    >>
    >> // use list like we always did
    >> typedImages.get(20);
    >>
    >> now... min/max indexes are hints to the List implementation as
    >>to what
    >>index/limit values to issue the query with. If a user tries to get()
    >
    > an
    >
    >>index outside this range we will issue a second query but this is
    >>unlikely to happen. The typical usage scenerio would be if the above
    >
    > had
    >
    >>100,000 rows in the result we'd see a noticable performance benefit of
    >
    >
    >>only retrieving 15 rows (index 15 - 30 inclusive). And best of all,
    >
    > this
    >
    >>change is completely backwards compatible with older releases. What do
    >
    >
    >>you think?
    >>
    >>Gili
    >
    >

    -- 
    http://www.desktopbeautifier.com/
    



    This archive was generated by hypermail 2.0.0 : Tue Sep 06 2005 - 10:59:28 EDT