Author Topic: object cache for larger number of cached object handles  (Read 1116 times)

mtsm

  • Newbie
  • *
  • Posts: 13
  • Karma: +0/-0
object cache for larger number of cached object handles
« on: June 11, 2013, 02:35:50 AM »
Hi,

I have been using the object cache you supplied a while back with one of the pyxll releases.

Lately I have had to deal with sheets that have a sizeable number of cached objects - of the order of a few hundred - and I observe the following
problem: whenever I update a single cell, an hourglass cursor appears for anywhere from a few seconds up to a few minutes (with about 1000 cached objects).

Can you explain why this happens by any chance? What would be the solution for this?

I am using the object cache as is, i.e. as delivered in your software. Also, I am using the win32 extensions, so all the COM stuff works.

In general the whole object cache thing works like a charm, except for this big cache issue.

BTW, two more details:

1. I know from experience that 1000 or so objects is not a big deal at all in Excel. Depending on how the object cache is built, this
can be dealt with without any issues.

2. I also noticed the following, though I am not 100% certain. When I update any single cell, even with just a literal, I get the hourglass latency
issue described above. However, when I update a selection of cells using Ctrl-Enter, the latency issue seems to be less pronounced.
Again, can you explain that by any chance?

Thanks so much,
mtsm

Tony

  • Administrator
  • Hero Member
  • *****
  • Posts: 568
  • Karma: +13/-1
Re: object cache for larger number of cached object handles
« Reply #1 on: June 11, 2013, 07:52:50 AM »
Hi,

I suspect that the callback the example uses to determine whether the cache needs updating is what's causing the slowdown. It's not very sophisticated: it simply loops through all the cells it knows about and tests whether they're in the range that has changed (see ObjectCacheWorkbookEventHandler.OnSheetChange). In the inner function (check_cell) it constructs an Excel Range object for each item in the cache and checks it against the range that's changed. This could well be quite slow, as it calls Excel via COM each time to do that.

It could be faster to get the coordinates of the range that's being changed once, at the start of the event handling, and check those against each cell in the cache in pure Python, without any COM calls. You would also want to store the coordinates of each cell in the cache instead of (or as well as) the address, so you can compare them with the range without having to determine them each time.
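To sketch what I mean (illustrative names, not from the example code): read the changed range's bounds once, keep each cached cell's coordinates, and do the containment test in plain Python:

```python
import re

def address_to_coords(address):
    # Parse an A1-style address like "$C$12" into (row, col) numbers.
    m = re.match(r"\$?([A-Za-z]+)\$?(\d+)", address)
    col_letters, row = m.group(1).upper(), int(m.group(2))
    col = 0
    for ch in col_letters:
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return row, col

def cells_in_range(cached_coords, top, left, bottom, right):
    # cached_coords: iterable of (row, col) for each cached cell.
    # Pure-Python containment test - no COM call per cached cell.
    return [(r, c) for (r, c) in cached_coords
            if top <= r <= bottom and left <= c <= right]
```

The bounds (top, left, bottom, right) could be read once from the changed Range via its Row, Column, Rows.Count and Columns.Count properties; after that there are no COM calls per cached cell.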

Before trying to optimise anything, however, you could try commenting out that method to confirm it's the cause of the slowdown. You might also want to put some logging in to see how many times it's being called; it may be called more often than you would expect if there are a lot of dependencies within the sheet causing it to fire over and over as the sheet gets updated.

Another check worth doing would be to leave the method on the class but comment out its last two lines, so the event handler is still called on each change but the cache isn't actually checked (the method does nothing). If the sheet then runs at normal speed, that would confirm the check itself is the problem.

Please can you let me know whether it does turn out to be that OnSheetChange method, and I will look at how to improve the example.

Best regards,
Tony

jchen

  • Newbie
  • *
  • Posts: 4
  • Karma: +0/-0
Re: object cache for larger number of cached object handles
« Reply #2 on: December 14, 2013, 08:28:34 AM »
I also encountered the same issue, and I can confirm that it's caused by the COM calls. I'm wondering how to "get the coordinates for the range that's being changed once at the start of the event handling", per your suggestion in the prior reply. Thanks

dblob

  • Newbie
  • *
  • Posts: 16
  • Karma: +0/-0
Re: object cache for larger number of cached object handles
« Reply #3 on: July 31, 2014, 11:00:24 PM »
Tony,

I have a diff that, I think, fixes this problem.  Not sure how best to send it, so I'll copy the basics below.  The idea is to build a single range containing all the cells that hold cached objects, then intersect that with the range that changed.  I think this works - it made changing sheets with 100+ objects nearly instant for me.

Best,
Dave



My new ObjectCache.delete_all:
    def delete_all(self, workbook, sheet=None, cell_filter=None):
        """
        Deletes all references in the cache for a workbook (and
        worksheet, if given). If cell_filter is not None, only the
        cells returned by cell_filter are deleted.
        """
        wb_cache = self.__cells.get(workbook)
        if wb_cache is not None:
            if sheet is not None:
                sheets = [sheet]
            else:
                sheets = list(wb_cache.keys())

            for sheet in sheets:
                ws_cache = wb_cache.get(sheet)
                if ws_cache is not None:
                    # copy the keys: self.delete mutates ws_cache
                    object_cells = list(ws_cache.keys())

                    if cell_filter:
                        object_cells = cell_filter(object_cells)

                    for cell in object_cells:
                        self.delete(workbook, sheet, cell)


And my new ObjectCacheWorkbookEventHandler.OnSheetChange:

    def OnSheetChange(self, sheet, range):
        # delete all the cells from the cache where the cell is in range
        # and the current value is not the cached object id

        def create_range(sheet, addresses):
            # group the addresses into chunks of 32 and build up the
            # full range with Union, keeping each address string short
            addr_chunks = izip_longest(*[iter(addresses)] * 32)

            result = None
            for addr_chunk in addr_chunks:
                objcell_list = ",".join([addr for addr in addr_chunk if addr is not None])
                newrange = sheet.Range(objcell_list)

                if result:
                    result = sheet.Application.Union(result, newrange)
                else:
                    result = newrange

            return result

        def cell_filter(cell_list):
            # check which of these cells are in the range that's changed

            # BLOB: Cache this?
            objcell_range = create_range(sheet, cell_list)
            intersection = sheet.Application.Intersect(objcell_range, range)

            if intersection is None:
                return []

            affected_cells = [cell.Address for cell in intersection.Cells]
            return affected_cells

        if self.__cache is not None:
            self.__cache.delete_all(str(self.Name), str(sheet.Name), cell_filter=cell_filter)


I also add "from itertools import izip_longest" at the top of the module.
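For anyone wondering about the izip_longest line: it groups the address list into chunks of 32, presumably because a single very long address string passed to Range() can fail. The `*[iter(addresses)] * 32` trick repeats one shared iterator, so each tuple pulls the next 32 items, padded with None. In Python 3 the same idiom uses zip_longest:

```python
from itertools import zip_longest  # izip_longest in Python 2

def chunks(items, size):
    # One shared iterator repeated `size` times: zip_longest pulls
    # `size` items per tuple, padding the last tuple with None.
    return zip_longest(*[iter(items)] * size)

addresses = ["A1", "B2", "C3", "D4", "E5"]
grouped = [tuple(a for a in chunk if a is not None)
           for chunk in chunks(addresses, 2)]
# grouped == [("A1", "B2"), ("C3", "D4"), ("E5",)]
```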

-Dave

Tony

  • Administrator
  • Hero Member
  • *****
  • Posts: 568
  • Karma: +13/-1
Re: object cache for larger number of cached object handles
« Reply #4 on: August 03, 2014, 06:41:16 PM »
Hi Dave,

Great, thanks for this. I've started putting more examples on GitHub here: https://github.com/pyxll/pyxll-examples

I've been thinking about moving the examples that come in the download to there too, so it seems like now would be a good time to do that, and then I can incorporate your changes there.

Best regards,
Tony

dblob

  • Newbie
  • *
  • Posts: 16
  • Karma: +0/-0
Re: object cache for larger number of cached object handles
« Reply #5 on: August 04, 2014, 07:25:38 PM »
That code has a bug - I've rewritten it, but I can't quite tell whether your call to Find() was intentional - i.e. whether you were checking if the objID ends up anywhere in the final changed range.  Range.Find() is an expensive call, and it's made once per object.  I still think it can be eliminated with Intersect.

My logic is roughly:

changed_objid_cells = intersect(objid_cells, changed_range)
for cell in changed_objid_cells:
    if cell.value != cached_objid(cell):
        delete_cache(cell)

I think this works even for objects that have moved, because on the move the call that created the object happens again, resulting in a new object and a new cache entry.
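In plain Python, with a dict standing in for the cache and the intersection already computed (names are just illustrative), that logic looks like:

```python
def prune_cache(cache, changed_objid_cells, current_values):
    # cache: {address: cached_objid}
    # current_values: {address: value currently shown in that cell}
    # Delete entries whose cell no longer shows the cached object id.
    for addr in changed_objid_cells:
        if current_values.get(addr) != cache.get(addr):
            cache.pop(addr, None)
    return cache
```

So a cell that still shows its cached id survives, while a cell overwritten with anything else gets its cache entry dropped.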

At any rate, maybe it's better to talk privately and then post/release when we have something we both agree works.

« Last Edit: August 04, 2014, 07:31:59 PM by dblob »