Basic Econometrics Research Report Group Assignment
This assignment uses data from the BUPA health insurance call centre. Each observation corresponds to one call to the call centre. The variables describe characteristics of the call (e.g. the length of the call, the amount of silence in the call), characteristics of the customer (e.g. state of residence, family type, number of adults and children), and measures of performance (e.g. net promoter score, sentiment score of the customer). In this assignment we are interested in predicting the net promoter score and the length of the call.
Please use the dataset CallCentre.dta and the associated information file CC_DEFINITIONS_.XLSX to answer these questions. Use the software program STATA 15, available through RMIT MyDesktop, for all data analysis.
This is a group assignment: you may work alone or with up to three other students (a maximum group size of four). All group members will receive the same mark for the assignment. You must submit an electronic copy of your assignment in Canvas in PDF, DOC or DOCX format. Hard copies will not be accepted.
Show your tables and calculations, and answer the questions in full sentences. Please make sure your tables of results are neatly formatted, not just copied and pasted from STATA, and that you write your answers in clear sentences. You should write no more than 1000 words in total for this assignment (not including tables and calculations). The numbers of words, tables, graphs and calculations given in parentheses after each question are a guide.
1. Calculate descriptive statistics using the 'summarize' command for the variables net_promoter_score, total_silence, total_silence_weighted, agent_to_cust_index and agent_crosstalk_weighted, and present the results in a table. Comment on what the descriptive statistics tell us about these variables. Graph a scatter plot of net_promoter_score against agent_crosstalk_weighted and describe the relationship between these two variables. (100 words, 1 table, 1 graph)
2. Estimate a multiple linear regression with net_promoter_score as the dependent variable and total_silence_weighted, agent_to_cust_index and agent_crosstalk_weighted as the explanatory (independent) variables. Predict the change in net_promoter_score associated with a 0.1 increase in total_silence_weighted and a 0.01 increase in agent_crosstalk_weighted. Assuming this is the correct model specification, can we be confident that total_silence_weighted has a negative effect? [Hint: consider the t-statistic and p-value] (50 words, 1 table, 2 calculations)
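The arithmetic behind a predicted change in a linear regression can be sketched as follows. This is only an illustration in Python of the calculation itself: the coefficient values are hypothetical placeholders, not estimates from CallCentre.dta, and the sketch treats the two changes as separate calculations, which is one reading of the question. The regression must still be estimated in STATA.

```python
def predicted_change(coefficient: float, delta_x: float) -> float:
    """Change in the predicted dependent variable when one explanatory
    variable changes by delta_x and all other variables are held fixed."""
    return coefficient * delta_x


# Hypothetical coefficients for illustration only.
# Replace them with the estimates from your own regression output.
b_total_silence_weighted = -2.0
b_agent_crosstalk_weighted = 5.0

change_silence = predicted_change(b_total_silence_weighted, 0.1)
change_crosstalk = predicted_change(b_agent_crosstalk_weighted, 0.01)

print(f"0.1 increase in total_silence_weighted: {change_silence:+.4f}")
print(f"0.01 increase in agent_crosstalk_weighted: {change_crosstalk:+.4f}")
```

In a linear model the predicted change is simply the coefficient multiplied by the size of the change, so the same calculation can be done by hand and shown in the report.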
3. Add dummy variables to the regression to control for all of the potential effects of State and Package. Make sure the base category is customers with the "HOSPITAL AND EXTRAS" package in NSW. Carefully interpret the estimated coefficient on the package1 dummy variable you have included. Why is this NOT a very important result? [Hint: Use the variable labels to include and interpret the correct variables, and consider the descriptive statistics of the dummy variables to judge their importance] (50 words, 1 table)
4. Include a quadratic specification of the variable "sentiment_score_cust" in the model along with the existing explanatory variables. Calculate and interpret the marginal effect of a 1-point change in "sentiment_score_cust" when sentiment_score_cust = 1 and when sentiment_score_cust = 4. (50 words, 1 table, 2 calculations)
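For a quadratic specification, the effect of the variable depends on the level at which it is evaluated. The Python sketch below shows the two usual calculations: the derivative-based marginal effect, b1 + 2*b2*x, and the exact change in the prediction when x rises by one unit, b1 + b2*(2x + 1). The coefficients b_linear and b_squared are hypothetical placeholders, not estimates from CallCentre.dta, and which of the two calculations is expected should be checked against the course notes.

```python
def marginal_effect(b_linear: float, b_squared: float, x: float) -> float:
    """Derivative of b_linear*x + b_squared*x**2 with respect to x."""
    return b_linear + 2.0 * b_squared * x


def unit_change_effect(b_linear: float, b_squared: float, x: float) -> float:
    """Exact change in the prediction when x increases from x to x + 1.

    b_linear*(x+1) + b_squared*(x+1)**2 - b_linear*x - b_squared*x**2
    simplifies to b_linear + b_squared*(2*x + 1).
    """
    return b_linear + b_squared * (2.0 * x + 1.0)


# Hypothetical coefficients for illustration only.
b_linear = 1.5    # coefficient on sentiment_score_cust
b_squared = -0.1  # coefficient on its square

for x in (1.0, 4.0):
    me = marginal_effect(b_linear, b_squared, x)
    uc = unit_change_effect(b_linear, b_squared, x)
    print(f"x = {x}: marginal effect = {me:.2f}, one-unit change = {uc:.2f}")
```

With a negative coefficient on the squared term, as in this made-up example, the effect shrinks as the variable increases.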
5. Explain the conditional mean independence assumption and assess its relevance with respect to the explanatory variable "sentiment_score_cust". [Hint: Think about factors that may be included in the error term of the regression, such as the customer's experience with the company (positive or negative) and the general attitude of the customer towards call centre conversations (positive or negative), and consider whether these may be correlated with sentiment_score_cust] (100 words)
6. As agent time is a cost to its business, BUPA may also be interested in predicting lcall_duration (the natural log of call_duration). Design a regression model to predict lcall_duration. Choose the explanatory variables to include, and whether to include them as dummies, logs, polynomials or interactions as you feel appropriate. Present the descriptive statistics and your final regression model in tables. Discuss the statistical significance of the explanatory variables in your model. Discuss how you have designed your model with reference to the "Gauss Markov" assumptions and whether these assumptions are likely to be met. Interpret the results for THREE of your explanatory variables that you consider to be the key drivers of lcall_duration (i.e. the length of the call). Do NOT include the variables net_promoter_score, nps_group3, sentiment_score_cust, call_duration or call_durationsq in your model. (400 words, 2 tables, 3 calculations)
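Because the dependent variable is in logs, a coefficient on an explanatory variable in levels is usually read as an approximate percentage change in call_duration. The Python sketch below compares the approximation, 100*b, with the exact figure, 100*(exp(b) - 1), for a one-unit change. The coefficient value is a hypothetical placeholder, not an estimate from CallCentre.dta.

```python
import math


def approx_percent_change(coefficient: float, delta_x: float = 1.0) -> float:
    """Approximate percentage change in the level of the dependent variable
    for a change of delta_x in a regressor, in a log-level model."""
    return 100.0 * coefficient * delta_x


def exact_percent_change(coefficient: float, delta_x: float = 1.0) -> float:
    """Exact percentage change implied by the log-level model."""
    return 100.0 * (math.exp(coefficient * delta_x) - 1.0)


# Hypothetical coefficient for illustration only.
b = 0.05

print(f"approximate: {approx_percent_change(b):.2f}%")
print(f"exact:       {exact_percent_change(b):.2f}%")
```

The two figures are close when the coefficient is small and diverge as it grows, so the exact formula is the safer choice for large coefficients such as those on some dummy variables.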