Calculating the overlap of date/time intervals

24

3 ways of calculating length of overlapping of two integer intervalsIn this post, we tackle the problem of calculating the length of overlap of two date/time intervals. For example, we might have two events — each lasting several days — and need to calculate the number of shared days between both events. In other words, we need to calculate the length of the events’ overlap.

Such tasks are typical in clinical trials programming, health (and non-health) insurance claim processing, project management, planning and scheduling, etc.

The length of the overlap can be measured not only in days, but also in any other units of the date/time dimension: hours, minutes, seconds, weeks, months, years and so on. Moreover, the date/time application is just one special case of a broader task of calculating the overlap length of two integer intervals. An integer interval [x .. y] is a set of all consecutive integer numbers beginning with x and ending with y (boundaries included and x ≤ y).

Instead of suggesting a single “best” way of solving this problem, I offer three different strategies. This allows you to compare and weigh their pros and cons and decide for yourself which approach is most suitable for your circumstances. All three solutions presented in this blog post are applicable to the date/time use case, as well as its superset of integer intervals’ overlap length calculation.

Problem description

Suppose we have two events A [A1 .. A2] and B [B1 .. B2]. Event A lasts from date A1 until date A2 (A1 ≤ A2), and event B lasts from date B1 until date B2 (B1 ≤ B2).  Both starting and ending dates are included in the corresponding event intervals.

We need to calculate the overlap between these two events, defined as the number of days that belong to both events A and B.

Overlapping intervals: sample data

Before solving the problem, let’s create sample data to which we are going to apply our possible solutions.

The following data step creates data set EVENTS representing such a data sample:

data EVENTS;
   input A1 A2 B1 B2;
   format A1 A2 B1 B2 date9.;
   informat A1 A2 B1 B2 mmddyy10.;
   lines;
01/02/2022 01/05/2022 01/06/2022 01/10/2022
01/22/2022 01/30/2022 01/16/2022 01/18/2022
01/02/2022 01/05/2022 01/03/2022 01/10/2022
01/02/2022 01/05/2022 01/03/2022 01/04/2022
01/10/2022 01/15/2022 01/06/2022 01/14/2022
01/01/2022 01/05/2022 01/05/2022 01/09/2022
01/07/2022 01/13/2022 01/10/2022 01/13/2022
;

Now let’s go over the following three distinct possible solutions, each with its own merit and downside. At the end, you decide which solution you like the most.

Solution 1: Brute force

In general, brute force algorithms solve problems by exhaustive iteration through all possible choices until a solution is found. It rarely results in a clever, efficient solution. However, it is a straightforward, valid and important algorithm design strategy that is applicable to a wide variety of problems.

For integer intervals overlap calculation, a brute force approach would involve the following steps:

  1. Determine the earliest date in the definitions of our two intervals
  2. Determine the latest date in the definitions of our two intervals
  3. Create an iterative loop with index variable ranging from the earliest to the latest date
  4. Within this loop, increment a counter by one if index variable falls within both intervals
  5. At the end of this loop, the counter will equal the sought value of the overlap. Zero value indicates that the two intervals do not overlap.

Here is the code implementation of the described above overlap length calculation algorithm:

data RESULTS;
   set EVENTS;
   OVERLAP = 0;
   do i=min(A1,B1) to max(A2,B2);
      if (A1<=i<=A2) and (B1<=i<=B2) then OVERLAP + 1;
   end;
run;

As you can see, the code implementation of the brute force solution is quite simple. However, it is hardly efficient as it involves extensive looping.

Solution 2: Exhaustive logic

Exhaustive logic algorithms solve problems by splitting them into several smaller problems (logical units or classes). This way, instead of working on a problem in its entirety, you can work separately on smaller and simpler problems that are easier to comprehend, digest and solve.

Here is an illustration of various arrangements of integer intervals representing separate logical units (numbered 1...5) when calculating the intervals overlaps:

Different arrangements of date/time integer intervals

The following SAS code demonstrates an implementation of the exhaustive logic algorithm (each IF-THEN statement corresponds to the numbered logical unit in the above figure):

data RESULTS;
   set EVENTS;
   if A1<=B1 and A2>=B2    then OVERLAP = B2 - B1 + 1; else
   if A1>=B1 and A2<=B2    then OVERLAP = A2 - A1 + 1; else
   if A1<=B1<=A2 and B2>A2 then OVERLAP = A2 - B1 + 1; else
   if B1<=A1<=B2 and A2>B2 then OVERLAP = B2 - A1 + 1; else
   if B1>A2 or A1>B2       then OVERLAP = 0;
run;

Notice that we need to add 1 to each of the two days difference. That is because the number of days spanned is always 1 less than the number of days contained in an interval. Here is an illustration:

Days spanned vs. number of days

Solution 3: Holistic Math

Finally, a holistic math method looks at the problem in its entirety and condenses it to a comprehensive mathematical formula. While it is usually not the most obvious approach, and often requires some brain stretching and internalization, the effort is often rewarded by achieving the best possible solution.

The exhaustive logic solution may help in better understanding the problem and arriving at the generalized mathematical formula.

If you dwell on the exhaustive logic solution for a while, you may come to realization that all the variety of the logical units boils down to the following simple formula:

Overlap = min(A2, B2) - max(A1, B1) + 1.

In other words, the overlap of two integer intervals is a difference between the minimum value of the two upper boundaries and the maximum value of the two lower boundaries, plus 1.

Positive OVERLAP value represents actual overlap days, while zero or negative value means that intervals do not overlap at all. The higher absolute value of the negative result – the farther apart the intervals are. Therefore, zero and negative values can serve as a measure of the gap between the intervals. If we don’t care about measuring the gap, we can just set all negative values to zero.

The following SAS code implements the described approach:

data RESULTS;
   set EVENTS;
   OVERLAP = min(A2,B2) - max(A1,B1) + 1;
   if OVERLAP<0 then OVERLAP = 0;
run;

We can also zero out the negative values of variable x using max(0, x) expression that results in the following formula for the date ranges overlap calculation:

Overlap = max(0, min(A2, B2) - max(A1, B1) + 1).

This all-in-one formula reduces code implementation to a single statement:

data RESULTS;
   set EVENTS;
   OVERLAP = max(0, min(A2,B2)-max(A1,B1)+1);
run;

As you can see, this in the most concise, elegant, and at the same time the most efficient solution with no iterative looping and complicated exhaustive (and exhausting) logic.

Output

Here is the output data table RESULTS showing our sample intervals (columns A1, A2, B1 and B2) and the resulting column OVERLAP (all three solutions described above should produce identical OVERLAP values):

Output data set with calculated days overlaps

Questions? Thoughts? Comments?

Which of the presented here three methods do you like most? Which one would you prefer under different circumstances? Can you come up with yet another way of calculating date intervals overlaps? Do you have questions, concerns, comments? Please share with us below.

Additional resources

Share

About Author

Leonid Batkhan

Leonid Batkhan is a long-time SAS consultant and blogger. Currently, he is a Lead Applications Developer at F.N.B. Corporation. He holds a Ph.D. in Computer Science and Automatic Control Systems and has been a SAS user for more than 25 years. From 1995 to 2021 he worked as a Data Management and Business Intelligence consultant at SAS Institute. During his career, Leonid has successfully implemented dozens of SAS applications and projects in various industries. All posts by Leonid Batkhan >>>

24 Comments

  1. Hello Leonid! I hope this message finds you well.

    I recently found this blog posting. I'm finding it very useful, but I'd like to ask you a technical question.

    I have a list of housing transactions that I need to find the number of overlapping days between all pairs of transactions. Each transaction has a listing date and a sale date, but they are in one column each. So for example:

    House Listing Date Selling Date
    1 1/1/2024 6/1/2024
    2 2/26/2024 8/20/2024
    3 2/13/2024 4/1/2024
    4 5/14/2024 6/16/2024
    5 3/24/2024 8/24/2024

    I need to find the number of overlapping days between House 1 and House 2, then House 1 and House 3, etc., subsequently with each House.

    Your approach in SAS doesn't match what I have, so I was wondering how to implement it?

    I would appreciate any help.

    Thanks! Looking forward to hearing back from you.

    AJ

  2. Leonid,
    Thank you for posting this solution.
    I'm comparing between "Confidence Intervals" (CI) from survey estimates.
    I removed the "+1" to get the correct answer.
    I also calculated the range as a proportion of the widest CI as follows:

    overlap=(MIN(ucl_a,ucl_b)-MAX(lcl_a,lcl_b))/(MAX(RANGE(lcl_a,ucl_a),RANGE(lcl_b,ucl_b)));

  3. I just have a question if we need to understand if 3 or more dates overlap mutually what is the formula we can use?
    Eg :
    1 - 1.30
    1.10 - 1.15
    1.20 - 1.25

    • Leonid Batkhan

      Hi Vivek,
      If you want to determine the overlap length of 3 or more intervals you can just generalize the last formula in this blog, which will be this formula:

      Overlap = max(0, min(A2, B2, C2, ...) - max(A1, B1, C1, ...) + 1)

      where A2, B2, C2, ... are ending points (upper boundaries) of your intervals, and A1, B1, C1, ... are their beginning points (lower boundaries).

  4. Bocheng Jing on

    Hi Leonid,

    Thank you so much for this post. It is definitely inspirational.
    However, I am encountering an issue where I need to identify at least three medication overlapping (I don't need to calculate the overlapping period, but I need to flag it). The overlapping requires medications A, B, and C are all overlapped -- hard task to tackle.

    Would you have any clues or ideas on how to solve this case?

    Thank you.
    Bocheng

    • Leonid Batkhan

      Hi Bocheng,the problem you are facing is quite different. It seems to be rather complex combinatorial problem. In a simplified example, to check if three medications are overlapping, I would make sure that the END of each medication interval is GREATER OR EQUAL than the BEGINNINGs of the other two medications. Consider the following code:

      data A;
         input B1 E1 B2 E2 B3 E3 @;
         informat B1--E3 mmddyy10.;
         format B1--E3 mmddyy10.;
         datalines;
      05/02/2022 05/20/2022
      05/15/2022 05/20/2022
      05/14/2022 05/19/2022
      05/02/2022 05/14/2022
      05/15/2022 05/20/2022
      05/14/2022 05/19/2022
      ;
      
      data B;
         set A;
         FLAG = (B1<=E1>=B2 & B1<=E2>=B3 & B1<=E3>=B2);
      run;
      
      

      • Bocheng Jing on

        Hi Leonid,
        Thank you for your response. The sample answer you wrote above was helpful. However, in my case, the overlapping was a bit different. Say a patient was on three medication: antidepressants, antiepileptic, opioids. He has several fills (several start dates and end dates) for each medication. However, the fills for each medication were not in equal length, making it difficult to examine overlap on three medications. Here is an example:

        Medication            Start_dt     End_dt
        CNS_antidepressants   8/29/2013    9/28/2013
        CNS_antidepressants   10/3/2013    11/2/2013
        CNS_antidepressants   11/4/2013    12/4/2013
        CNS_antidepressants   12/7/2013    1/6/2014
        CNS_antidepressants   1/22/2014    2/21/2014
        CNS_antidepressants   2/16/2014    3/18/2014
        CNS_antidepressants   3/19/2014    4/18/2014
        CNS_antidepressants   4/16/2014    5/16/2014
        CNS_antidepressants   5/7/2014     6/6/2014
        CNS_antidepressants   5/11/2014    6/10/2014
        CNS_antidepressants   6/4/2014     7/4/2014
        CNS_antidepressants   6/29/2014    7/29/2014
        CNS_antidepressants   7/27/2014    8/26/2014
        CNS_antiepileptic     8/29/2013    9/28/2013
        CNS_antiepileptic     10/1/2013    10/31/2013
        CNS_antiepileptic     10/26/2013   11/25/2013
        CNS_antiepileptic     12/7/2013    1/6/2014
        CNS_antiepileptic     1/12/2014    2/11/2014
        CNS_antiepileptic     2/16/2014    3/18/2014
        CNS_antiepileptic     3/30/2014    4/29/2014
        CNS_antiepileptic     5/13/2014    6/12/2014
        CNS_antiepileptic     6/7/2014     7/7/2014
        CNS_opioid            9/6/2013     10/6/2013
        CNS_opioid            10/3/2013    11/2/2013
        CNS_opioid            2/7/2014     2/14/2014
        

        As you can see, the first fill of antidepressants and antiepileptic both started on 8/29/2013 and ended on 9/28/2013 (these two were overlapped), and the first opioid started 9/6/2013 and ended 10/6/2013, which overlapped majority of the antidepressants and antiepileptic. Therefore, this patient is flagged as "having at least three medication overlapped".

        I guess the coding should have a nested loop to examine each pair of the medications' overlapping. In this case, it should be antidepressants & antiepileptic, antidepressants & opioid, and antiepileptic & opioid. Then if all pairs were flagged, then this patient was flagged.

        • Viranika Ramya on

          Thank you for the great topic. Do you have a solution for the above scenario, I have a similar case about the Provider contracts on overlapping dates. I would greatly appreciate anybody could provide a solution for the nested loop coding to examine each overlapping dates.
          Would greatly appreciate your time if you could provide a code solution.

          Thanks,
          Viranika

  5. Arlen Harmoning on

    Thank you for another inspiring post. Your ability to concisely and expertly explain and solve relevant programming issues with your favorite programming tool sets you apart. Since we first met way back during a Y2K project, I’ve benefited from you sharing your SAS knowledge in such an unselfish way.

  6. Deborah Summons on

    Solution 3 is my favorite. It is elegant and seems obvious after you explain it. Brain stretched me.

    Your presentation and explanations are excellent 👌

    Deb

  7. This is great stuff, Leonid. I can see how this would be relevant in healthcare claims data analysis. Especially for determining readmits and episodes.

Leave A Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Back to Top