In its most basic form, ordinary least squares (OLS) regression describes the linear relationship between a dependent variable and one or more independent variables by fitting a line or plane to a collection of data points. Most introductory treatments of regression analysis use examples with dependent and independent variables measured at the interval level (e.g., income in dollars and years of formal education). But many researchers work with variables measured at the categorical (nominal) level.
OLS regression should not be used when the dependent variable is categorical. In that situation, one cannot assume a linear relationship between the independent and dependent variables. With a dichotomous outcome, for example, the relationship between an independent variable and the probability of the outcome is usually better described by an S-shaped curve than by a straight line. Although OLS is remarkably flexible and can be used to estimate some curvilinear relationships, it is not suited to describing relationships involving categorical dependent variables. Researchers should instead use logistic regression, probit, or another model designed for limited dependent variables.
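The contrast between a straight line and an S-curve can be sketched in a few lines of Python. This is only an illustration: the coefficients below are invented, not estimated from any data, and the function names are hypothetical.

```python
import math

# Hypothetical coefficients, chosen only for illustration (not estimated from data).
LINEAR_A, LINEAR_B = -0.25, 0.10   # straight-line model of a 0/1 outcome
LOGIT_A, LOGIT_B = -4.0, 0.5       # logistic (S-curve) model


def linear_probability(x):
    """Straight-line prediction; nothing keeps it between 0 and 1."""
    return LINEAR_A + LINEAR_B * x


def logistic_probability(x):
    """Logistic prediction; always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(LOGIT_A + LOGIT_B * x)))


if __name__ == "__main__":
    for x in (0, 4, 8, 12, 16):
        print(x, round(linear_probability(x), 3), round(logistic_probability(x), 3))
```

With these made-up numbers, the straight line predicts a "probability" below 0 at x = 0 and above 1 at x = 16, while the logistic curve stays inside the 0 to 1 range at every value of x.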
OLS can accommodate categorical independent variables, such as gender, region, or marital status. This is done through the use of dichotomous dummy variables representing the categories of the independent variable. Standard practice is to code a dummy variable 1 if a case falls into a specific category and 0 if it does not. Gender, for instance, would yield two dummy variables: Female (1 = female, 0 = otherwise) and Male (1 = male, 0 = otherwise). Region might be transformed into four dummy variables: Northeast (1 = Northeast, 0 = otherwise), Midwest (1 = Midwest, 0 = otherwise), South (1 = South, 0 = otherwise), and West (1 = West, 0 = otherwise).
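The coding scheme just described can be sketched as follows. The helper name make_dummies and the five example cases are hypothetical and serve only to show the 1/0 coding.

```python
def make_dummies(values, categories):
    """Return one 0/1 dummy column per category, keyed by category name."""
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
    }


# Hypothetical region values for five cases.
regions = ["South", "West", "Northeast", "South", "Midwest"]
dummies = make_dummies(regions, ["Northeast", "Midwest", "South", "West"])

for name, column in dummies.items():
    print(name, column)
```

Each case receives a 1 on exactly one of the four dummy variables and a 0 on the other three.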
One of the dummy variables must be excluded from the regression analysis to avoid perfect collinearity: because every case falls into exactly one category, the full set of dummy variables always sums to 1 and therefore duplicates the information carried by the regression constant. An important rule of thumb is that if the original independent variable has C categories, C-1 dummy variables should be included in the regression analysis. The excluded dummy variable serves as the baseline for interpreting the effects of the included dummy variables. The choice of which variable to exclude is primarily substantive rather than statistical: which baseline will provide the most sensible or interesting interpretation of the results?
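A minimal sketch of the C-1 rule, using hypothetical data: with all four region dummies present, the dummy columns add up to the constant column, which is the perfect collinearity the rule avoids. Choosing Northeast as the baseline here is arbitrary and is an assumption made only for the example.

```python
# Hypothetical region values for five cases.
regions = ["South", "West", "Northeast", "South", "Midwest"]
categories = ["Northeast", "Midwest", "South", "West"]


def dummy_column(values, category):
    """Code 1 if the case falls in the category, 0 otherwise."""
    return [1 if value == category else 0 for value in values]


all_dummies = {c: dummy_column(regions, c) for c in categories}

# The regression constant is a column of ones.
constant = [1] * len(regions)

# Summing all C dummies case by case reproduces the constant exactly.
row_sums = [sum(row) for row in zip(*all_dummies.values())]
print(row_sums == constant)

# Dropping one dummy (the baseline) leaves C-1 columns that no longer sum to the constant.
baseline = "Northeast"  # arbitrary choice for illustration
included = {c: col for c, col in all_dummies.items() if c != baseline}
print(sorted(included))
```

In practice the baseline would be chosen on substantive grounds, as the paragraph above notes, rather than arbitrarily.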
Take as an example a multiple regression analysis examining the influence of education and gender on annual income:
Annual Income = a + b1 Education + b2 Male
In this example, the dependent variable, Annual Income, is measured at the interval level in dollars. Similarly, Education is measured at the interval level as years of formal education. Because gender is a categorical variable, it is represented by a dummy variable for males, where 1 = male and 0 = otherwise (female). Because the dummy variable for females was left out of the equation, female is the excluded, or baseline, category for gender. The coefficient for Male (b2 in the equation above) shows the average difference in income between men and women, holding education constant. Suppose this regression analysis yielded the following results:
Annual Income = 31,000 + 1,650 Education + 4,500 Male
The coefficient of 4,500 for the dummy variable Male indicates that, on average, annual income for men exceeds that for women by $4,500, holding education constant. Had the researcher included Female in the equation instead of Male, the b2 coefficient would have been -4,500, showing that, on average, women had an annual income $4,500 less than men. The constant would also have changed, from 31,000 to 35,500, because it would then describe the male baseline rather than the female one. The specific statement of results is different, but the substantive conclusion about gender differences in income is the same.
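The equivalence of the two codings can be checked with a short sketch. The first function uses the fitted equation given above; the second is derived from it by switching the baseline to male, so its constant of 35,500 (31,000 + 4,500) is a derived value rather than a separately reported result. The function names are hypothetical.

```python
def income_male_coded(education, male):
    """Fitted equation from the text: female is the baseline category."""
    return 31_000 + 1_650 * education + 4_500 * male


def income_female_coded(education, female):
    """Same model with male as the baseline; the constant absorbs the 4,500."""
    return 35_500 + 1_650 * education - 4_500 * female


if __name__ == "__main__":
    years = 12  # hypothetical years of formal education
    print(income_male_coded(years, male=0), income_female_coded(years, female=1))
    print(income_male_coded(years, male=1), income_female_coded(years, female=0))
```

For a person with 12 years of education, both codings predict $50,800 for a woman and $55,300 for a man. Only the way the $4,500 gap is expressed changes.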