R SUR regression with systemfit gets error "LU computationally singular: ratio of extreme..."; I can work around it but am still concerned about error margins
Before I get into the problem: I have seen a previous question about this error that has been answered, and it gave me an idea for a workaround to get the regressions together for presentation. However, I am still somewhat worried about differences in the standard errors, since I intend to include confidence intervals. I also want to know whether what I am trying to do is actually possible or whether I have overlooked something on my end, and I would like to understand the function/package better in any event.
The code I was trying to run was essentially as follows, on a dataset of nearly 22,000 anonymized patients with cost and demographic data, with costs divided up by category:
```r
systemfit::systemfit(
  formula = list(
    TotalCost = Cost_Total_After - Cost_Total_Before ~ gender + AgeGroup + ethnicity + Other_Demographics + Whether_has_certain_medical_condition,
    Cost.EmergencyMed = Cost_EM_After - Cost_EM_Before ~ gender + AgeGroup + ethnicity + Other_Demographics + Whether_has_certain_medical_condition,
    Cost.GeneralMed = Cost_GM_After - Cost_GM_Before ~ gender + AgeGroup + ethnicity + Other_Demographics + Whether_has_certain_medical_condition,
    Cost.MentalHealth = Cost_MH_After - Cost_MH_Before ~ gender + AgeGroup + ethnicity + Other_Demographics + Whether_has_certain_medical_condition,
    Cost.NeurologyAndPain = Cost_NeuroPain_After - Cost_NeuroPain_Before ~ gender + AgeGroup + ethnicity + Other_Demographics + Whether_has_certain_medical_condition,
    Cost.PhysRehab = Cost_PR_After - Cost_PR_Before ~ gender + AgeGroup + ethnicity + Other_Demographics + Whether_has_certain_medical_condition,
    Cost.SensoryHealth = Cost_SensoryH_After - Cost_SensoryH_Before ~ gender + AgeGroup + ethnicity + Other_Demographics + Whether_has_certain_medical_condition,
    Cost.SpecialtyMedicine = Cost_SpecialtyMed_After - Cost_SpecialtyMed_Before ~ gender + AgeGroup + ethnicity + Other_Demographics + Whether_has_certain_medical_condition,
    Cost.Surgery = Cost_Surgery_After - Cost_Surgery_Before ~ gender + AgeGroup + ethnicity + Other_Demographics + Whether_has_certain_medical_condition
  ),
  method = "SUR",
  data = droplevels(patientCost)
)
```
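All nine formulas share one right-hand side, so the list could also be built programmatically, which makes it easy to drop or add an equation (for example, leaving out TotalCost). This is a minimal sketch assuming every outcome is stored as a `<prefix>_After` / `<prefix>_Before` column pair, as in the code above; the `rhs` string and `prefixes` vector are hypothetical helpers, not part of systemfit.

```r
# Shared right-hand side used by every equation in the system
rhs <- "gender + AgeGroup + ethnicity + Other_Demographics + Whether_has_certain_medical_condition"

# Hypothetical mapping: equation name -> column prefix
# (assumes columns are named <prefix>_After and <prefix>_Before)
prefixes <- c(
  TotalCost              = "Cost_Total",
  Cost.EmergencyMed      = "Cost_EM",
  Cost.GeneralMed        = "Cost_GM",
  Cost.MentalHealth      = "Cost_MH",
  Cost.NeurologyAndPain  = "Cost_NeuroPain",
  Cost.PhysRehab         = "Cost_PR",
  Cost.SensoryHealth     = "Cost_SensoryH",
  Cost.SpecialtyMedicine = "Cost_SpecialtyMed",
  Cost.Surgery           = "Cost_Surgery"
)

# One formula per equation: (After - Before) ~ shared predictors.
# lapply() over a named vector keeps the equation names on the list.
eqs <- lapply(prefixes, function(p) {
  as.formula(paste0(p, "_After - ", p, "_Before ~ ", rhs))
})
```

The list could then be passed as `systemfit::systemfit(formula = eqs, method = "SUR", data = droplevels(patientCost))`, and `eqs[names(eqs) != "TotalCost"]` gives the same system without the total equation.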
The resulting error is:

```
Error in .solve.dgC.lu(as(a, "dgCMatrix"), b = b, tol = tol) :
  LU computationally singular: ratio of extreme entries in |diag(U)| = 4.666e-20
```
For additional context, I have run a SUR almost identical to this one in which the dependent variables were based on the number of healthcare encounters instead of costs, and that SUR completed without any error. The only difference I can think of is that those numbers are lower. One hypothesis I had was that something similar to multicollinearity was throwing it off, because the pre- and post-period category costs add up to the ones in the TotalCost equation. (This was a genuine problem in earlier attempts, which I have since corrected.)
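If the category costs do add up to the totals, that alone could explain the error. With identical regressors in every equation, the OLS residuals of the TotalCost equation are exactly the sum of the category residuals, so the cross-equation residual covariance matrix that the SUR step has to invert is singular. The sketch below reproduces this on simulated data; all variable names are hypothetical, it uses three categories instead of eight, and whether it applies to the real data depends on how the totals are actually defined.

```r
set.seed(1)
n <- 500

# Hypothetical shared predictors standing in for the demographic factors
x1 <- rnorm(n)
g  <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
X  <- model.matrix(~ x1 + g)

# Three hypothetical category cost differences, plus a total that is exactly their sum
y_cat <- sapply(1:3, function(j) 10 * j + 5 * x1 + rnorm(n, sd = 20))
Y     <- cbind(Total = rowSums(y_cat), y_cat)

# Equation-by-equation OLS residuals (the first step of a two-step SUR)
resid <- Y - X %*% qr.solve(X, Y)

# The Total residual equals the sum of the category residuals, up to rounding
max(abs(resid[, 1] - rowSums(resid[, 2:4])))

# Cross-equation residual covariance: 4 x 4 but only rank 3, so it cannot be inverted
Sigma <- crossprod(resid) / (n - ncol(X))
qr(Sigma)$rank
rcond(Sigma)  # reciprocal condition number, essentially zero
```

On the real data, a comparable check would be to fit each equation with `lm()`, bind the residuals column-wise into a matrix (say, a hypothetical `residual_matrix`), and compare `qr(cov(residual_matrix))$rank` with the number of equations.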
Since the predictors are identical in every equation of the SUR, each equation is effectively an OLS regression, so I can work around the immediate problem by fitting the SUR without the TotalCost formula and adding a separate basic `lm(Cost_Total_After - Cost_Total_Before ~ ...)` alongside it.
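The workaround rests on a standard result: when every equation in a SUR system has exactly the same regressors, feasible GLS gives the same coefficient estimates as fitting each equation separately by OLS. Here is a base-R sketch of a textbook two-step FGLS estimator (not systemfit's internal code) on hypothetical simulated data with two equations, checking that result:

```r
set.seed(2)
n <- 300

# Hypothetical shared regressors, identical in both equations
x1 <- rnorm(n)
g  <- factor(sample(c("a", "b"), n, replace = TRUE))
X  <- model.matrix(~ x1 + g)
k  <- ncol(X)

# Two hypothetical outcomes whose errors are correlated (rho = 0.6)
e <- matrix(rnorm(2 * n), n, 2) %*% chol(matrix(c(1, 0.6, 0.6, 1), 2))
Y <- cbind(y1 = 1 + 2 * x1 + e[, 1], y2 = -1 + 0.5 * x1 + e[, 2])

# Step 1: equation-by-equation OLS and the cross-equation residual covariance
B_ols <- qr.solve(X, Y)            # k x 2 coefficient matrix
R     <- Y - X %*% B_ols
Sigma <- crossprod(R) / (n - k)

# Step 2: GLS on the stacked system, weight matrix W = Sigma^{-1} (x) I_n
Z <- kronecker(diag(2), X)         # block-diagonal design, same X in each block
W <- kronecker(solve(Sigma), diag(n))
y <- c(Y)                          # y1 stacked on top of y2
beta_sur <- solve(crossprod(Z, W %*% Z), crossprod(Z, W %*% y))

# Largest difference between SUR and OLS coefficients (rounding error only)
max(abs(c(beta_sur) - c(B_ols)))
```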
I would still like more insight into what may be going wrong and how I can improve this going forward. I am also somewhat concerned that the confidence intervals may be affected: would they be meaningfully thrown off by this strategy, and if so, how should I deal with it?
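On the confidence-interval question, the same algebra applies to the coefficient covariance. With identical regressors X and stacked design Z = I ⊗ X, the FGLS covariance (Z'(Σ^{-1} ⊗ I)Z)^{-1} simplifies to Σ ⊗ (X'X)^{-1}, so each equation's block is σ_ii (X'X)^{-1}, which is what `lm()` reports when σ_ii is estimated with the same n - k divisor. A base-R sketch checking this numerically on hypothetical simulated data:

```r
set.seed(3)
n <- 300

# Hypothetical shared regressors and two outcomes with correlated errors
x1 <- rnorm(n)
g  <- factor(sample(c("a", "b"), n, replace = TRUE))
X  <- model.matrix(~ x1 + g)
k  <- ncol(X)
e  <- matrix(rnorm(2 * n), n, 2) %*% chol(matrix(c(1, 0.6, 0.6, 1), 2))
Y  <- cbind(y1 = 1 + 2 * x1 + e[, 1], y2 = -1 + 0.5 * x1 + e[, 2])

# Residual covariance from equation-by-equation OLS, divisor n - k
R     <- Y - X %*% qr.solve(X, Y)
Sigma <- crossprod(R) / (n - k)

# FGLS coefficient covariance (Z' W Z)^{-1} for the stacked system
Z     <- kronecker(diag(2), X)
W     <- kronecker(solve(Sigma), diag(n))
V_sur <- solve(crossprod(Z, W %*% Z))

# Standard errors for equation 1: SUR block versus a plain lm() fit
se_sur_eq1 <- sqrt(diag(V_sur))[1:k]
se_lm_eq1  <- sqrt(diag(vcov(lm(Y[, 1] ~ x1 + g))))
cbind(sur = se_sur_eq1, lm = unname(se_lm_eq1))
```

Whether systemfit's printed standard errors match `lm()` exactly depends on how it estimates the residual covariance (the `methodResidCov` setting of `systemfit.control()`), so that is the first place to look if small differences appear. What a separate `lm()` fit does not give you is the covariance between TotalCost coefficients and coefficients in the other equations, which only matters for tests that combine coefficients across equations.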
If this whole thing is presented confusingly, I apologize; I am new to actually participating on the site and am still figuring things out. I can't think of a way to create sample data close enough to mine that you could use it to reproduce the problem. If you want to try on your own: most of the demographic variables, including `AgeGroup` and `ethnicity`, are categorical variables in factor or character format, though at least one is a logical yes/no variable, as is `Whether_has_certain_medical_condition` (not its actual name).
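For anyone trying to reproduce this, a synthetic dataset with the same shape as described above can be simulated: factor and character demographics, logical yes/no indicators, and right-skewed before/after costs per category. This is only a sketch; the level labels, the gamma cost distribution, the sample size, treating `Other_Demographics` as logical, and the assumption that totals are sums of the categories are all hypothetical choices, not properties of the real data.

```r
set.seed(42)
n <- 2000  # hypothetical size, smaller than the real ~22,000 rows

# Hypothetical demographics: factors, a character column, and logical yes/no columns
patientCost <- data.frame(
  gender             = factor(sample(c("F", "M"), n, replace = TRUE)),
  AgeGroup           = factor(sample(c("18-34", "35-54", "55+"), n, replace = TRUE)),
  ethnicity          = sample(c("A", "B", "C", "D"), n, replace = TRUE),
  Other_Demographics = sample(c(TRUE, FALSE), n, replace = TRUE),
  Whether_has_certain_medical_condition = sample(c(TRUE, FALSE), n, replace = TRUE),
  stringsAsFactors   = FALSE
)

# Right-skewed, non-negative before/after costs for each category
cats <- c("EM", "GM", "MH", "NeuroPain", "PR", "SensoryH", "SpecialtyMed", "Surgery")
for (cc in cats) {
  patientCost[[paste0("Cost_", cc, "_Before")]] <- rgamma(n, shape = 0.5, scale = 2000)
  patientCost[[paste0("Cost_", cc, "_After")]]  <- rgamma(n, shape = 0.5, scale = 2000)
}

# Assumption: totals are the sum of the category costs
patientCost$Cost_Total_Before <- rowSums(patientCost[paste0("Cost_", cats, "_Before")])
patientCost$Cost_Total_After  <- rowSums(patientCost[paste0("Cost_", cats, "_After")])
```

`dput(head(patientCost))` is the usual way to paste a small, reproducible sample of a data frame like this into a question.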
If you need any more information, I can try to help answer that for you.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow