In this post, we tackle the problem of calculating the length of overlap of two date/time intervals. For example, we might have two events — each lasting several days — and need to calculate the number of shared days between both events. In other words, we need to calculate the length of the events’ overlap.
Such tasks are typical in clinical trials programming, health (and non-health) insurance claim processing, project management, planning and scheduling, etc.
The length of the overlap can be measured not only in days, but also in any other units of the date/time dimension: hours, minutes, seconds, weeks, months, years and so on. Moreover, the date/time application is just one special case of a broader task of calculating the overlap length of two integer intervals. An integer interval [x .. y] is a set of all consecutive integer numbers beginning with x and ending with y (boundaries included and x ≤ y).
Instead of suggesting a single “best” way of solving this problem, I offer three different strategies. This allows you to compare and weigh their pros and cons and decide for yourself which approach is most suitable for your circumstances. All three solutions presented in this blog post are applicable to the date/time use case, as well as its superset of integer intervals’ overlap length calculation.
Problem description
Suppose we have two events A [A1 .. A2] and B [B1 .. B2]. Event A lasts from date A1 until date A2 (A1 ≤ A2), and event B lasts from date B1 until date B2 (B1 ≤ B2). Both starting and ending dates are included in the corresponding event intervals.
We need to calculate the overlap between these two events, defined as the number of days that belong to both events A and B.
Overlapping intervals: sample data
Before solving the problem, let’s create sample data to which we are going to apply our possible solutions.
The following data step creates data set EVENTS representing such a data sample:
data EVENTS; input A1 A2 B1 B2; format A1 A2 B1 B2 date9.; informat A1 A2 B1 B2 mmddyy10.; lines; 01/02/2022 01/05/2022 01/06/2022 01/10/2022 01/22/2022 01/30/2022 01/16/2022 01/18/2022 01/02/2022 01/05/2022 01/03/2022 01/10/2022 01/02/2022 01/05/2022 01/03/2022 01/04/2022 01/10/2022 01/15/2022 01/06/2022 01/14/2022 01/01/2022 01/05/2022 01/05/2022 01/09/2022 01/07/2022 01/13/2022 01/10/2022 01/13/2022 ; |
Now let’s go over the following three distinct possible solutions, each with its own merit and downside. At the end, you decide which solution you like the most.
Solution 1: Brute force
In general, brute force algorithms solve problems by exhaustive iteration through all possible choices until a solution is found. It rarely results in a clever, efficient solution. However, it is a straightforward, valid and important algorithm design strategy that is applicable to a wide variety of problems.
For integer intervals overlap calculation, a brute force approach would involve the following steps:
- Determine the earliest date in the definitions of our two intervals
- Determine the latest date in the definitions of our two intervals
- Create an iterative loop with index variable ranging from the earliest to the latest date
- Within this loop, increment a counter by one if index variable falls within both intervals
- At the end of this loop, the counter will equal the sought value of the overlap. Zero value indicates that the two intervals do not overlap.
Here is the code implementation of the described above overlap length calculation algorithm:
data RESULTS; set EVENTS; OVERLAP = 0; do i=min(A1,B1) to max(A2,B2); if (A1<=i<=A2) and (B1<=i<=B2) then OVERLAP + 1; end; run; |
As you can see, the code implementation of the brute force solution is quite simple. However, it is hardly efficient as it involves extensive looping.
Solution 2: Exhaustive logic
Exhaustive logic algorithms solve problems by splitting them into several smaller problems (logical units or classes). This way, instead of working on a problem in its entirety, you can work separately on smaller and simpler problems that are easier to comprehend, digest and solve.
Here is an illustration of various arrangements of integer intervals representing separate logical units (numbered 1...5) when calculating the intervals overlaps:
The following SAS code demonstrates an implementation of the exhaustive logic algorithm (each IF-THEN statement corresponds to the numbered logical unit in the above figure):
data RESULTS; set EVENTS; if A1<=B1 and A2>=B2 then OVERLAP = B2 - B1 + 1; else if A1>=B1 and A2<=B2 then OVERLAP = A2 - A1 + 1; else if A1<=B1<=A2 and B2>A2 then OVERLAP = A2 - B1 + 1; else if B1<=A1<=B2 and A2>B2 then OVERLAP = B2 - A1 + 1; else if B1>A2 or A1>B2 then OVERLAP = 0; run; |
Notice that we need to add 1 to each of the two days difference. That is because the number of days spanned is always 1 less than the number of days contained in an interval. Here is an illustration:
Solution 3: Holistic Math
Finally, a holistic math method looks at the problem in its entirety and condenses it to a comprehensive mathematical formula. While it is usually not the most obvious approach, and often requires some brain stretching and internalization, the effort is often rewarded by achieving the best possible solution.
The exhaustive logic solution may help in better understanding the problem and arriving at the generalized mathematical formula.
If you dwell on the exhaustive logic solution for a while, you may come to realization that all the variety of the logical units boils down to the following simple formula:
In other words, the overlap of two integer intervals is a difference between the minimum value of the two upper boundaries and the maximum value of the two lower boundaries, plus 1.
Positive OVERLAP value represents actual overlap days, while zero or negative value means that intervals do not overlap at all. The higher absolute value of the negative result – the farther apart the intervals are. Therefore, zero and negative values can serve as a measure of the gap between the intervals. If we don’t care about measuring the gap, we can just set all negative values to zero.
The following SAS code implements the described approach:
data RESULTS; set EVENTS; OVERLAP = min(A2,B2) - max(A1,B1) + 1; if OVERLAP<0 then OVERLAP = 0; run; |
We can also zero out the negative values of variable x using max(0, x) expression that results in the following formula for the date ranges overlap calculation:
This all-in-one formula reduces code implementation to a single statement:
data RESULTS; set EVENTS; OVERLAP = max(0, min(A2,B2)-max(A1,B1)+1); run; |
As you can see, this in the most concise, elegant, and at the same time the most efficient solution with no iterative looping and complicated exhaustive (and exhausting) logic.
Output
Here is the output data table RESULTS showing our sample intervals (columns A1, A2, B1 and B2) and the resulting column OVERLAP (all three solutions described above should produce identical OVERLAP values):
Questions? Thoughts? Comments?
Which of the presented here three methods do you like most? Which one would you prefer under different circumstances? Can you come up with yet another way of calculating date intervals overlaps? Do you have questions, concerns, comments? Please share with us below.
24 Comments
Hello Leonid! I hope this message finds you well.
I recently found this blog posting. I'm finding it very useful, but I'd like to ask you a technical question.
I have a list of housing transactions that I need to find the number of overlapping days between all pairs of transactions. Each transaction has a listing date and a sale date, but they are in one column each. So for example:
House Listing Date Selling Date
1 1/1/2024 6/1/2024
2 2/26/2024 8/20/2024
3 2/13/2024 4/1/2024
4 5/14/2024 6/16/2024
5 3/24/2024 8/24/2024
I need to find the number of overlapping days between House 1 and House 2, then House 1 and House 3, etc., subsequently with each House.
Your approach in SAS doesn't match what I have, so I was wondering how to implement it?
I would appreciate any help.
Thanks! Looking forward to hearing back from you.
AJ
Leonid,
Thank you for posting this solution.
I'm comparing between "Confidence Intervals" (CI) from survey estimates.
I removed the "+1" to get the correct answer.
I also calculated the range as a proportion of the widest CI as follows:
overlap=(MIN(ucl_a,ucl_b)-MAX(lcl_a,lcl_b))/(MAX(RANGE(lcl_a,ucl_a),RANGE(lcl_b,ucl_b)));
Great expansion. Thank you for sharing this, Carlos!
I just have a question if we need to understand if 3 or more dates overlap mutually what is the formula we can use?
Eg :
1 - 1.30
1.10 - 1.15
1.20 - 1.25
Hi Vivek,
If you want to determine the overlap length of 3 or more intervals you can just generalize the last formula in this blog, which will be this formula:
Overlap = max(0, min(A2, B2, C2, ...) - max(A1, B1, C1, ...) + 1)
where A2, B2, C2, ... are ending points (upper boundaries) of your intervals, and A1, B1, C1, ... are their beginning points (lower boundaries).
Hi Leonid,
Thank you so much for this post. It is definitely inspirational.
However, I am encountering an issue where I need to identify at least three medication overlapping (I don't need to calculate the overlapping period, but I need to flag it). The overlapping requires medications A, B, and C are all overlapped -- hard task to tackle.
Would you have any clues or ideas on how to solve this case?
Thank you.
Bocheng
Hi Bocheng,the problem you are facing is quite different. It seems to be rather complex combinatorial problem. In a simplified example, to check if three medications are overlapping, I would make sure that the END of each medication interval is GREATER OR EQUAL than the BEGINNINGs of the other two medications. Consider the following code:
Hi Leonid,
Thank you for your response. The sample answer you wrote above was helpful. However, in my case, the overlapping was a bit different. Say a patient was on three medication: antidepressants, antiepileptic, opioids. He has several fills (several start dates and end dates) for each medication. However, the fills for each medication were not in equal length, making it difficult to examine overlap on three medications. Here is an example:
As you can see, the first fill of antidepressants and antiepileptic both started on 8/29/2013 and ended on 9/28/2013 (these two were overlapped), and the first opioid started 9/6/2013 and ended 10/6/2013, which overlapped majority of the antidepressants and antiepileptic. Therefore, this patient is flagged as "having at least three medication overlapped".
I guess the coding should have a nested loop to examine each pair of the medications' overlapping. In this case, it should be antidepressants & antiepileptic, antidepressants & opioid, and antiepileptic & opioid. Then if all pairs were flagged, then this patient was flagged.
You are welcome! I think you are on the right track 🙂 .
Thank you for the great topic. Do you have a solution for the above scenario, I have a similar case about the Provider contracts on overlapping dates. I would greatly appreciate anybody could provide a solution for the nested loop coding to examine each overlapping dates.
Would greatly appreciate your time if you could provide a code solution.
Thanks,
Viranika
Dear Viranika, until you formulate your problem you should not expect a solution.
Thank you for another inspiring post. Your ability to concisely and expertly explain and solve relevant programming issues with your favorite programming tool sets you apart. Since we first met way back during a Y2K project, I’ve benefited from you sharing your SAS knowledge in such an unselfish way.
I am glad to hear from you, Arlen. Thank you for the Y2K memories and for such kind words!
Thank you for continuing your support of our team and sharing your love of SAS Leonid!
Thank you, Gary, for your appreciation. It's very inspiring to get such an acknowledgement from the Director of SAS Federal Consulting Team.
Solution 3 is my favorite. It is elegant and seems obvious after you explain it. Brain stretched me.
Your presentation and explanations are excellent 👌
Deb
Thank you, Deb! Great pick.
Great solutions to pick from!
Another great blog Leonid and of course very useful and informative.
Thank you!
Thank you, Mona! I am glad you find it useful.
Hi Leonid,
Many congrats, this is a great article to kick off the year 2022.
Thanks, Daniel. Happy New Year!
This is great stuff, Leonid. I can see how this would be relevant in healthcare claims data analysis. Especially for determining readmits and episodes.
Thank you, John, for your feedback!