Introduction
Cox regression, also called proportional hazards regression, is a method for investigating the effect of several variables on the time a specified event takes to happen, and it can be carried out in SPSS. It is therefore a method for time-to-event data analysis, used when researchers want to analyse how long it takes for an event to occur. For example, the Cox proportional hazards model is widely used in medical research to investigate the association between the survival time of patients and one or more predictor variables. In this way, Cox regression relates the time until the event to the independent variables over a specified follow-up period.
What is the Cox regression analysis method?
Cox regression is a semi-parametric survival model and a regression method. Regression is a statistical method for investigating the relationship between a dependent variable and the explanatory variables. The explanatory variables are also known as covariates or independent variables, and they may influence the dependent variable in the data set. Cox regression estimates the simultaneous effect of several factors on survival, so the effect of each factor is adjusted for the others. Both univariate and multivariate models can be fitted. The Cox regression model is written as:
h(t) = h0(t) × exp(β1X1 + β2X2 + ... + βpXp)
Where:
- t is the survival time;
- h(t) is the hazard function, determined by a set of p independent variables X1, X2, ..., Xp measured on each subject;
- β1, β2, ..., βp are the regression coefficients (also called parameters), which quantify the statistical relationship between the p covariates and survival;
- h0(t) is the baseline hazard. It corresponds to the value of the hazard if all the covariates X1, ..., Xp are equal to zero.
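To make the model h(t) = h0(t) × exp(β1X1 + ... + βpXp) concrete, the following is a minimal Python sketch that evaluates the hazard for one subject. The function name, the baseline hazard value and the two coefficients are hypothetical and chosen only for illustration; they do not come from the example data.

```python
import math

def hazard(baseline_hazard, coefficients, covariates):
    """Return h(t) = h0(t) * exp(b1*x1 + ... + bp*xp) for one subject."""
    if len(coefficients) != len(covariates):
        raise ValueError("need exactly one coefficient per covariate")
    linear_predictor = sum(b * x for b, x in zip(coefficients, covariates))
    return baseline_hazard * math.exp(linear_predictor)

# Hypothetical values, for illustration only.
h0 = 0.02             # assumed baseline hazard at some fixed time t
betas = [0.5, -0.3]   # assumed coefficients for two covariates

reference = hazard(h0, betas, [0, 0])  # all covariates zero: the baseline hazard
exposed = hazard(h0, betas, [1, 0])    # first covariate raised by one unit

# The ratio of the two hazards depends only on the coefficient, not on h0.
hazard_ratio = exposed / reference
```

With all covariates at zero the function returns the baseline hazard itself, and the ratio of the two hazards equals exp(0.5) whatever value h0 takes. This is why the baseline hazard does not need to be specified in order to compare groups.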
In Cox regression, the event variable must be coded as a binary variable: 1 if the event occurred and 0 if it did not, in which case the observation is censored. In the example, the event is death. An observation with an unknown censoring status is treated as missing, and the subject is removed from the analysis. The survival time is a continuous variable; in the example, it is the time from surgery to the event or to censoring. The predictors can be categorical or continuous. Estimation can be computationally demanding: the exact method for handling tied event times, which calculates the probability of every possible ordering of the tied events, is the most time-consuming option.
In the example, a multivariate Cox regression model is fitted to identify the risk factors associated with lower 3-year survival. Age, gender, simplified cancer stage, forced expiratory volume (FEV1), neoadjuvant therapy and decortication procedure are included in the model. The results are interpreted through the regression coefficients. A positive regression coefficient indicates an increased hazard of death, and a negative coefficient indicates a lower hazard. In the example, the regression coefficients for age greater than 70, female gender, decortication, cancer stage above 1 and neoadjuvant therapy are positive, meaning that each of these characteristics is associated with a higher hazard of death.
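The coding rules described above (event coded as 1, no event coded as 0, subjects with unknown status removed) can be sketched in Python as follows. The field names "months", "status" and "age", the status labels and the three records are hypothetical; this illustrates the preparation step and is not the procedure SPSS applies internally.

```python
def prepare_records(raw_records):
    """Code the event as 1/0 and drop subjects whose status is unknown."""
    prepared = []
    for record in raw_records:
        status = record["status"]
        if status not in ("dead", "alive"):
            # Unknown censoring status: treated as missing, subject removed.
            continue
        prepared.append({
            "months": float(record["months"]),          # survival time (continuous)
            "event": 1 if status == "dead" else 0,      # 1 = death, 0 = censored
            "age_over_70": 1 if record["age"] > 70 else 0,  # categorical predictor
        })
    return prepared

# Hypothetical records, for illustration only.
raw = [
    {"months": 12, "status": "dead", "age": 74},
    {"months": 36, "status": "alive", "age": 61},
    {"months": 20, "status": None, "age": 68},
]

clean = prepare_records(raw)  # the third subject is dropped
```

Age is dichotomised here at 70 only to mirror the example; a continuous predictor could equally be kept as it is.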
How to interpret the regression coefficients
Regression coefficients are always presented with their standard error (SE), which measures the uncertainty of the coefficient estimate. The hazard ratio (HR) is obtained by exponentiating the regression coefficient and gives the effect size of the predictor. In the example, the age variable has an HR of 1.793, which means that the hazard, or risk of death, in patients above 70 years is about 1.8 times that of the younger patients.
A 95% confidence interval (CI) should be reported with each HR. It means that if the estimation process were repeated an infinite number of times, 95% of the calculated intervals would contain the true parameter value. The association between survival and the tested variable is statistically significant if the interval does not contain the value 1.
The level of significance is set before the statistical analysis begins, and it is commonly set at 0.05. The researchers state their research hypotheses after identifying the dependent and independent variables in the data set. The null hypothesis is that there is no association between the independent variables and survival over the follow-up period; the alternative hypothesis is that there is such an association. When the P value is lower than the threshold, the null hypothesis of no difference in survival between the groups is rejected in favour of the alternative hypothesis. In the example, the significance level is set at 0.05. Age above 70, decortication and pathological stage were significantly associated with survival according to the P value and the 95% CI. In contrast, female gender, FEV1 of 80 or above and neoadjuvant therapy had P values higher than 0.05, so their association with survival was not statistically significant.
From the regression analysis, it can be concluded that age above 70 years, simplified cancer stage above 1 and decortication are significantly associated with an increased hazard of death.
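The link between a regression coefficient, its standard error, the hazard ratio, the 95% confidence interval and the P value can be sketched in Python as follows. The sketch assumes a normal (Wald-type) approximation, which is an assumption made here rather than something stated in the example. The coefficient is back-calculated from the HR of 1.793 quoted above, while the standard error of 0.25 is hypothetical because the text does not report one.

```python
import math

def summarize_coefficient(beta, se, z_crit=1.96):
    """Return (HR, CI lower, CI upper, two-sided P) for one coefficient.

    Assumes the estimate is approximately normal (Wald-type approximation).
    """
    hr = math.exp(beta)                   # hazard ratio = exp(coefficient)
    lower = math.exp(beta - z_crit * se)  # CI is built on the log scale,
    upper = math.exp(beta + z_crit * se)  # then exponentiated
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return hr, lower, upper, p_value

beta_age = math.log(1.793)  # coefficient implied by the HR of 1.793 in the text
se_age = 0.25               # hypothetical standard error, not from the example

hr, lower, upper, p = summarize_coefficient(beta_age, se_age)

# Significant at the 0.05 level when the 95% CI excludes 1.
significant = not (lower <= 1.0 <= upper)
```

Under this approximation the two criteria agree: the 95% interval excludes 1 exactly when the two-sided P value falls below 0.05, up to the rounding of the 1.96 multiplier.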
Conclusion
The Cox regression model is widely used for analysing the relationship between a time-to-event outcome and one or more independent variables. It is applied both formally and through SPSS software, where data analysis and interpretation are easier. The P values and confidence intervals must be reviewed before drawing a final conclusion from a Cox regression. Researchers can conduct both univariate and multivariate analyses with the Cox regression technique. It offers a systematic way to interpret time-to-event data, which in turn helps in testing the research hypothesis and drawing the final conclusions of the research.