DELETING rows of the internal table within the LOOP. Is it a Taboo? A big NO NO?

All these years, I was told that DELETE statement within a LOOP is always prohibited. If you delete any rows inside a LOOP, it is guaranteed that the Quality Reviewer would give you a point and ask you to modify your code to delete the entries outside the LOOP.

After going through Naimesh’s post on performance on DELETE, I wondered how would deleting the entries of the table on which we are looping would behave in two scenarios. This is a very practical scenario where we need to DELETE entries of the LOOPING table based on some checks (For simplicity, I am checking a field of the same table, but in real projects, you might check another table fields and then delete the driver LOOPing table).

Since most of the developers use STANDARD INTERNAL tables, I am analysing only STANDARD internal tables (SORTED/HASHED Tables will have to wait for some other day).. ūüėõ

The two scenario DELETEs we are talking today are:
1) Parallel Cursor Technique with DELETE w.r.t INDEX within the LOOP.
2) Parallel Cursor Technique with marking the row to be DELETED and then DELETING all at one shot outside the LOOP.

Interested in SAP ABAP for HANA? Check our popular series ABAP on SAP HANA Tutorials.

What does your logical mind say?

Let me be honest. Since my training days, almost a decade ago, we were told that DELETE within LOOP is never acceptable. So, I always took the second route. Mark by Parallel Cursor technique and DELETE outside the LOOP.

Later on, as I started understanding the concept of Index and Table scan, I started believing that, DELETING within LOOP in parallel cursor with the target INDEX known, would be better in the view of Performance. Also, I thought that if we mark and DELETE outside loop with WHERE clause, the whole table needs to be scanned for the item to be deleted and it would affect the performance negatively. So I began to believe, DELETING within LOOP should be acceptable too.

But, never before did I try to do the quantitative analysis. Finally yesterday, after months of procrastination, I thought of wearing the statistician’s hat. I spent almost a day playing with the numbers. I tried to increase my sample size as much as I could. I ran the data multiple times in multiple hours of the day (as system can have variable load in different hours of the day) to get the best average figure for the analysis.

Here is the code snippet which I used for doing this comparison. I used the same set of data for both DELETE statements so that I could compare the two in neutral venue. ūüôā

PARAMETERS p_row TYPE i DEFAULT 1000.
DATA:
      i_cdpos      TYPE STANDARD TABLE OF cdpos INITIAL SIZE 0,
      i_cdpos1     TYPE STANDARD TABLE OF cdpos INITIAL SIZE 0,
      ls_cdpos     TYPE cdpos,
      ls_cdpos1    TYPE cdpos,
      lv_sta_time  TYPE timestampl,
      lv_end_time  TYPE timestampl,
      lv_diff_w    TYPE p DECIMALS 5,
      lv_no        TYPE i.
FIELD-SYMBOLS:
      <lfs_cdpos>  TYPE cdpos.

* Get entries from Change Log Item table
SELECT * UP TO p_row ROWS FROM cdpos
  INTO TABLE i_cdpos.

* Sorting for BINARY Search
SORT i_cdpos BY objectclas.
* Keeping a copy in another table
i_cdpos1[] = i_cdpos[].

Have questions in SAP HANA? Check this: SAP HANA from Space Level.

1) Parallel Cursor Technique with DELETE w.r.t INDEX within the LOOP.

* Start time
GET TIME STAMP FIELD lv_sta_time.
* Scenario1: Parallel Cursor with DELETE within LOOP (The driver table rows are deleted)
LOOP AT i_cdpos INTO ls_cdpos.
  READ TABLE i_cdpos TRANSPORTING NO FIELDS
                     WITH KEY objectclas = ls_cdpos-objectclas
                     BINARY SEARCH.
    IF sy-subrc EQ 0.
      LOOP AT i_cdpos INTO ls_cdpos1 FROM sy-tabix.
        IF ls_cdpos-objectclas NE ls_cdpos1-objectclas.
          EXIT.
        ELSEIF ls_cdpos1-chngind EQ 'I'.
*   DELETING with INDEX
          DELETE i_cdpos INDEX sy-tabix.
        ENDIF.
      ENDLOOP.
    ENDIF.
ENDLOOP.
* End time
GET TIME STAMP FIELD lv_end_time.
* Time taken for LOOP and DELETE
lv_diff_w = lv_end_time - lv_sta_time.
WRITE: /(25) 'DELETE Inside Loop', lv_diff_w.
* Entries left in table
lv_no = lines( i_cdpos ).
* Number of entries DELETED
lv_no = p_row - lv_no.
WRITE:/(25) 'No of entries deleted', lv_no

2) Parallel Cursor Technique with marking the row to be DELETED and then DELETING outside the LOOP.

CLEAR: lv_no, lv_sta_time, lv_end_time, lv_diff_w, ls_cdpos.
* Start time
GET TIME STAMP FIELD lv_sta_time.
* Parallel Cursor with MARKING of rows to be DELETED
* Actual DELETE outside of the LOOP
LOOP AT i_cdpos1 INTO ls_cdpos.
  READ TABLE i_cdpos1 TRANSPORTING NO FIELDS
                      WITH KEY objectclas = ls_cdpos-objectclas
                      BINARY SEARCH.
    IF sy-subrc EQ 0.
      LOOP AT i_cdpos1 ASSIGNING <lfs_cdpos> FROM sy-tabix.
        IF ls_cdpos-objectclas NE <lfs_cdpos>-objectclas.
          EXIT.
        ELSEIF <lfs_cdpos>-chngind EQ 'I'.
* MARKING the ROW to be DELETED.
* Outside the LOOP, objectclas field would be used to identify
          CLEAR <lfs_cdpos>-objectclas.
        ENDIF.

      ENDLOOP.
    ENDIF.
ENDLOOP.
* DELETE the MARKED rows at one shot
DELETE i_cdpos1 WHERE objectclas IS INITIAL.

* End time
GET TIME STAMP FIELD lv_end_time.
* Time taken for LOOP and DELETE
lv_diff_w = lv_end_time - lv_sta_time.
WRITE: /(25) 'DELETE Outside Loop', lv_diff_w.
* Entries left in table (this should be same as above)
lv_no = lines( i_cdpos1 ).
* Number of entries DELETED
lv_no = p_row - lv_no.
WRITE:/(25) 'No of entries deleted', lv_no.

The time taken is for the complete activity set i.e. complete LOOP and DELETE in both scenarios.

Scared of programming in OOPs ABAP? Check this: Just 4 Versions of the same program to understand OOPs ABAP

Below is the output data which I collected after numerous runs.

Table 1. Multiple DELETE within LOOP comparison with respect to DELETE outside LOOP
Graph 1. Chart showing tables with rows 13K to 30K has better performance while DELETE within LOOP

The first glance of the data shows; for table rows with entries less than 10,000; DELETE outside LOOP works better. Look at column 6 (% change). This column is the percentage change between DELETE outside LOOP and DELETE within LOOP. Reference is DELETE outside LOOP.

Are you an ABAPer? A must read for you. SAP ABAP on HANA. Part I: Create your first program in HANA Studio.

How to interpret column 6.
Negative number means, DELETE outside LOOP is better.

If you go through the above table 1 or graph 1, it clearly depicts that DELETE inside LOOP using INDEX and PARALLEL Cursor is clear winner from 13,000 to 30,000 records in internal table. But again after 35,000 rows in internal table, DELETE outside LOOP is the winner.

So, it made me conclude (wrongly) that:
i) For small internal tables, less than 10,000 rows, one DELETE outside LOOP was good.
ii) For medium tables 10K to 30K row entries, DELETING using index within LOOP in PARALLEL cursor was good.
iii) For large tables 35K and greater, one DELETE statement outside LOOP has better performance.

Please note, I have mentioned I wrongly concluded. The above three statements are NOT completely correct.

I was still not convinced with above result. I was wondering, if there is any other factor which is influencing the result other than the number of entries.

Then I decided to run the program in a different client. I ran for the same numbers of records starting from 1000 till 40,000.

Table 2. Multiple DELETE within LOOP showing negative improvement over DELETE outside LOOP

Table 2 shows the result of running the same program in different client. The output does not tally with Table 1 results at all. Here one DELETE statement outside the LOOP is clear winner. The age old funda passed on by seniors/mentors holds good here. DELETE inside LOOP is never advisable.

Graph 2. Charts showing % improvement of DELETE within LOOP even after following Parallel Cursor Technique is NEGATIVE

Look at the graph, the performance of DELETE within LOOP is below zero (i.e not better than DELETE outside LOOP)

[adToAppearHereLink]

Also Read: Lazy and Smart ABAPers.

But, I wanted to know the reason. WHY? Why is that in the other client, there was different pattern and in this client DELETE outside LOOP is better?

Then, I happened to notice the ‚ÄėNo of Rows Deleted‚Äô column 2. I began to develop my theory and started doubting, is this one of the several factors which is influencing the overall performance.

Referring to Table 1
Out of 1000 enteries in table, 430 were deleted. So by the time the last entry was deleted, i.e. 430th entry, there were 1000-429 = 571 rows in the table. So, when 430th was deleted, indexes were regenerated for 570 rows in the internal table.

Which method impacts the performance more?

a) Regenerating the indexes every time the internal table row is deleted in LOOP; or
b) Regenerating the indexes when it encounters the DELETE statement only once outside the LOOP.

Depending on the cost of the regeneration of indexes, probably the performance depends.

SAP Help document says: When deleting rows in an internal table, costs are incurred for updating all existing table keys and table indexes.

Food for thought.
In Table 2, line with 30,000 records in internal table. Which method would have better performance.
a) Deleting 1931 rows in LOOP and regenerating the index in every loop for maximum of 29999 rows (first delete) and minimum of 28069 records (last delete)? 

When the first row is deleted, there are 29,999 rows in internal table to be re-indexed. When the last row i.e. 1931th is deleted, there are 28,069 rows in internal table to be re-indexed. In total, the internal table is re-indexed 1931 times in LOOP.

Also Read: Google Map API in SAP.

[ad3ToAppearHere]

You might also like: SAP Adobe Forms for Beginners.

b) Deleting 1931 rows at one shot outside the LOOP and re-indexing?

DELETING outside LOOP seems superior even though in the PARALLEL cursor we are DELETING using INDEX. But, may be the cost of updating the existing table keys and table indexes downgrades the advantage of PARALLEL CURSOR DELETE using INDEX.

I am not sure, how correct the above analysis is. But, one thing is clear, internal table re-indexing cost should be considered before deciding which route to go.

We can safely conclude that performance of the two scenarios is dependent on:
i) The number of entries in the table
ii) The number of entries to be deleted in the table
iii) The total number of entries left after delete statement
iv) Total number of times the index of the internal table need to be regenerated

And for now, DELETE outside LOOP seems to have an Upper Hand.. (I am stating something which might attract lots of criticism/debate from Parallel Cursor Technique lovers).

I would like to request experts and mentors to throw more light on this topic. If you have better explanation to this topic, please let us know. Also please point us to the right source/link which are authentic and where we can get more answers and better understanding.

Please share your experience and thoughts so that we might learn from your experience. Did you every try to do a similar analysis? What were your results?

 

Image source: kognitio.com.