I finished something recently, that I am very proud of - I literally had to hold on to my chair at my desk to not jump up and shout Wooo-Hooo! I tore into a cursor and obliterated it!! :) I wanted to talk about the process of the transformation - just thought I would record my thoughts so that hopefully they will help someone.
I work in the health care industry, and the cursor is used to calculate the number of active patients for each day.The dataset has an admit date and a discharge date, along with some other things. Due to the way the transactional database is set up, there are important conditions to make sure you get the correct insurance, etc. for past dates and such. The total number of active patients can change due to late data entry and/or mistakes, therefore the entire summary table is re-built each night.
The query looks similar to this (although I changed it to be able to post it). As a general note, we are pulling from Source System A; but some of the agencies have transitioned to Source System B. We need to capture the data when they were in Source System A before the date they transitioned to Source System B.
declare @Now datetime declare DateCursor cursor for select Date_Range.Date from EDS_ETL..Date_Range order by Date_Range.Date open DateCursor fetch next from DateCursor into @Now while @@FETCH_STATUS = 0 begin -------------------------------------------------------------------- -- insert statement goes here select a.FacilityID ,i.InsuranceType ,@Now as Census_Date -- I kept in some of the logic: -- when: if the Facility is not a Home Health agency (i.e. is Hospice) (fh.ServiceLineID <> 1) -- use the date they transitioned to Source System B (fx.StartDate) -- if @Now is before the transition date then count them -- when: if the Medical Record number does not exist in Source System B (pd.LegacyMRNumber is null) -- then that patient has not transitioned, therefore count them -- else: dont count them ,ISNULL(SUM(CASE WHEN fh.ServiceLineID <> 1 AND fx.StartDate IS NOT NULL AND fx.StartDate >= @Now THEN 1 WHEN pd.LegacyMRNumber IS NULL THEN 1 ELSE 0 END ),0) AS Census_Count FROM dbo.SourceSystemA_ADMISSIONS a INNER JOIN dbo.SourceSystemA_PATIENTS b ON a.PATIENT_ID = b.PATIENT_ID LEFT JOIN dbo.SourceSystemB_PatientDetail pd ON b.MedicalRecordNumber = pd.LegacyMRNumber INNER JOIN dbo.SourceSystemA_INSURANCE i on a.PATIENT_ID = i.PATIENT_ID
AND a.ADMITID = i.ADMITID LEFT JOIN dbo.FacilityHierarchy fh ON b.FacilityID = fh.FacilityID LEFT JOIN dbo.FacilityXSource fx ON fx.SourceID = 9
AND b.FacilityID = fx.FacilityID where -- Find admission as of @Now a.ADMIT_DATE <= @Now AND (a.DISCHARGE_DATE is null OR a.DISCHARGE_DATE >= @Now) AND a.ADMIT_DATE is not null -- Find active insurance date as of @Now AND b.STATUS_DATE <= @Now AND (b.STATUS_END_DATE is null OR b.STATUS_END_DATE >= @Now) -- Some branch_id's are null in the insurance table AND b.BRANCH_ID is not null group by a.FacilityID ,i.InsuranceType -------------------------------------------------------------------- fetch next from DateCursor into @Now end close DateCursor deallocate DateCursor
That isnt a bad looking cursor, but it is bad because it is a cursor that doesnt have to be. This came to my attention because I have multiple reports that tell me when my SSIS packages are running and how long they take, etc. When I was creating the reports, I didnt know what I wanted, so THANK YOU to Jaime Thompson who inspired my own SSRS reports (check out the SSIS Reporting Pack for more info)! So, in reviewing my reports, I tried to go for the lowest hanging fruit, and fix them. The SSIS package that ran the cursor was the worst offender.
In looking at the query, I noticed that what the cursor did was essentially make a "row" for each date within a patient's term of care. Therefore, the cursor was used to loop through and count each day they are an active patient. That led me to my thinking - there has to be a way of doing this better.
I started with the query itself. I trimmed down the query as much as possible (that isnt shown, the query within the cursor above has already been trimmed). Then, I fought here and there, but eventually ended up making that query a sub-query within the FROM, and using a date table to create a row for each day within the patient's care. When I startedat my current job, the data warehouse already had a structure in place for handling the hierarchy of dates (Day, Week, Month, Quarter, Year, Rolling 12 month, Rolling 3 month, etc). I will share that in another blog post, but for now, this date table has 1 record for each day of the year. Therefore, the join between my sub-query and the date table includes BETWEEN. After that, it was relatively easy to tweak it until it was completed. Here is the final output:
select i.FacilityID , i.InsuranceType , D.DAY_DATE , D.DAY_ID , CensusCount = SUM( -- explanation of the logic: -- when: if the Facility is not a Home Health agency (i.e. is Hospice) (fh.ServiceLineID <> 1) -- use the date they transitioned to Source System B (fx.StartDate) -- if the D.DAY_DATE is before the transition date, then count them -- when: if the Medical Record number does not exist in Source System B (pd.LegacyMRNumber is null) -- that means that the patient has not transitioned, therefore count them -- when: if the Medical Record number does exist in Source System B (pd.LegacyMRNumber is null) -- and the D.DAY_DATE is befor the patient started their care in the Source System B, then count them -- else: dont count them case WHEN fh.ServiceLineID <> 1 AND fx.StartDate IS NOT NULL AND D.DAY_DATE < fx.StartDate THEN 1 WHEN x.LegacyMRNumber IS NULL THEN 1 WHEN x.LegacyMRNumber is not null and x.StartOfPatientCareDate > D.DAY_DATE THEN 1 ELSE 0 END ) FROM ( select a.FacilityID ,a.PatientID ,a.AdmitID ,a.AdmitDate ,a.DischargeDate ,pd.LegacyMRNumber ,pd.StartOfPatientCareDate FROM dbo.SourceSystemA_ADMISSIONS a INNER JOIN dbo.SourceSystemA_PATIENTS b ON a.PATIENT_ID = b.PATIENT_ID LEFT JOIN dbo.SourceSystemB_PatientDetail pd ON b.MedicalRecordNumber = pd.LegacyMRNumber where a.ADMIT_DATE is not null ) AS x INNER JOIN LU_DAY D on D.DAY_DATE BETWEEN x.AdmitDate AND ISNULL(x.DischargeDate,DATEADD(d,1,getdate())) INNER JOIN dbo.SourceSystemA_INSURANCE i on x.PATIENT_ID = i.PATIENT_ID AND x.ADMITID = i.ADMITID AND D.DAY_DATE >= i.InsuranceStartDate AND (i.InsuranceEndDate IS NULL OR D.DAY_DATE <= i.InsuranceEndDate) AND i.FacilityID IS NOT null LEFT JOIN dbo.FacilityHierarchy fh ON x.FacilityID = fh.FacilityID LEFT JOIN dbo.FacilityXSource fx ON fx.SourceID = 9 -- this indicates Source System B, so we will get only the transition start date for that AND x.FacilityID = fx.FacilityID group by i.FacilityID , i.InsuranceType , D.DAY_DATE , D.DAY_ID
The end result is still being tested, but everything has come out right so far. I am really excited that actually worked! The process used to take approx 60-90 minutes (depending on what else is being processed on the server at that time), but now takes less than 5 minutes.
When I told my co-workers, they thought it was great, but no-one really got my excitement. To them, they were happy I fixed something. To me, I did something that validated, yet again, that I LOVE what I do. With that, I do have a confession - if you are reading this blog post, I know you get it, so hopefully you walk away excited too. :)
Good luck to your tuning projects and talk to you soon!
P.S. If you copy and paste the code above into an editor, it might not line up correctly. I had to change the tabs/spaces to make it look lined up in this blog post.