For this assignment, you will be provided a database backup for a database called FinanceDB. You will have to restore this backup to your own version of SQL Server. All of the questions in this assignment relate to the FinanceDB database unless specifically stated otherwise.
Most questions will require you to write a query using t-SQL. Queries should be copied into your answer document and formatted so that they are easily readable. Your queries should follow the style guide used by SQL Services, which you should all be familiar with by now. Some questions require you to create a visualisation using PowerBI. For these questions, you can submit a screenshot of the visualisation as part of your answer.
Some questions a intentionally ambiguous. In the real world, you would be able to sit down with your users to clarify exactly what they want. Obviously you can't do this here. So, if you believe that there are more than one way to interpret a question, you should consider all reasonable alternatives and either: a) choose the most reasonable interpretation and justify your choice or, b) provide all interpretations with a brief comment as to their differences. You should always justify your decisions. Where required, please provide concise explanations for your decisions.
Part A - FinanceDB Reporting Queries
You are to work on a series of questions that help the Finance team understand their regional sales performance. Use the FinanceDB database to answer each of the following questions with an SQL query.
Section A
A database backup for FinanceDB is available from here. Download this backup and restore it to your SQL Server instance. Once it has restored, provide a screenshot that shows the FinanceDB database in your SQL Server Object Explorer.
Section B
All t-SQL follows the appropriate style as per all class examples.
Section A - Query Writing
This section has a series of questions which will require you to a) write a t-sql query, b) produce some basic visualisations using PowerBI and c) provide brief answers to short answer questions.
You should include all t-sql
Query one:
This company has a presence in 5 countries across 5 industries (Segments) within each country. Calculate the total sales per year and the total profit per year for each Country / Segment. Note that profit can be calculated:
1A: Run this query and include a screenshot of the results.
1B: Produce one or more visualisations using PowerBI to display this information.
Based on your visualisations, which region performed the best? Which region performed the worst?
Query two
2A:
Each sales person has a yearly sales KPI. This is their yearly sales target which they are expected to meet. I'd like you to use this information to calculate a yearly sales KPI for each Country and Segment:
Include your t-sql below.
2B:
Once you have calculated this KPI, calculate the yearly performance against the KPI (i.e. if the KPI for Mexico, Midmarket is $100,000 and the total sales was $110,000, then the yearly performance would be 110%). Include your t-sql below.
2B: Produce one or more visualisations in PowerBI to show this information.
Query three
3A: A lot of information about sales performance is lost when it is aggregated yearly. Change your query from (Query Two 2B) to calculate the month-by-month total sales performances and plot these data in PowerBI.
3B: What general conclusions can you draw from this visualisation? Justify your reasoning.
Query Four
Finally, the company wants to reward the best performing sales people. But they don't really know what they mean by "best performing".
4A: Explain how could you rank & compare each salesperson's performance?
4B Create a query & one or more visualisations that allows the company to explore the performance of their salespeople. Include the t-sql and a screenshot of the visualisations below.
4C Using your results, which salespeople do you believe are the "top 10 best performers"?
Section B - Query Performance and indexing
Question B
Run the following query and review the execution plan:
select
year(so.SalesOrderDate) as SalesYear,
c.CountryName,
s.SegmentName,
sp.FirstName,
sp.LastName,
p.ProductName,
count(*) as TotalProductSales,
sum(case when sli.PromotionID = 0 then 0 else 1 end) as TotalPromotionalSales
from SalesOrderLineItem sli
inner join Product p on p.ProductID = sli.ProductID
inner join SalesOrder so on so.SalesOrderID = sli.SalesOrderID
inner join SalesRegion sr on sr.SalesRegionID = so.SalesRegionID
inner join SalesPerson sp on sp.SalesPersonID = sr.SalesPersonID
inner join Region r on r.RegionID = sr.RegionID
inner join Segment s on s.SegmentID = r.SegmentID
inner join Country c on c.CountryID = r.CountryID
where year(so.SalesOrderDate) > 2012
group by
year(so.SalesOrderDate),
c.CountryName,
s.SegmentName,
sp.FirstName,
sp.LastName,
p.ProductName
;
B1A: What are the most expensive operations in this query execution plan? Include the relative cost of each operation you identify.
B1B: What is a clustered index scan? Why can this be a problem for performance? When would it not be a major concern?
B1C: Design an index to remove the clustered index scan on SalesOrderLineItem. Include the t-sql you used to create the index.
B1D: After creating your index, review the execution plan again. Did this index substantially reduce the relative execution cost of querying data from SalesOrderLineItems?
B1E: Describe what indexes are used for and when they improve query performance.
B1F: In what situations would you limit the number of indexes you have on a table and why.
B1G: Explain whether you would keep the index you created in B1C.
Question B2
Review the following query:
with monthly_sales_info as (
select
sales_info.SalesMonth,
c.CountryName,
s.SegmentName,
sales_info.PromotionRate,
sales_info.TotalMonthlySales
from Region r
inner join Country c on c.CountryID = r.CountryID
inner join Segment s on s.SegmentID = r.SegmentID
inner join SalesRegion sr on sr.RegionID = r.RegionID
left join (
select
so.SalesRegionID,
so.SalesMonth,
sum(case when sli.PromotionID = 0 then 0.0 else 1.0 end) / count(*) as PromotionRate,
sum(SalePrice) as TotalMonthlySales
from SalesOrder so
inner join SalesOrderLineItem sli on sli.SalesOrderID = so.SalesOrderID
group by
so.SalesRegionID,
so.SalesMonth
) sales_info on sales_info.SalesRegionID = sr.SalesRegionID
)
select *
from monthly_sales_info
where SalesMonth >= '2016-01-01';
B2A: In simple terms, explain the business question which this query is addressing.
B2B: A developer has suggested creating the following index to improve the query:
create index idx_promotions on SalesOrderLineItem (PromotionID, SalesOrderID);
- Review the execution plan before creating the index. What part of the execution plan do you think the developer is trying to improve? Include a screenshot of this part of the execution plan
- Create the index and review the execution plan again. Has the index improved this part of the execution plan? Explain why
- Drop this index and create a suitable index to improve the execution of this query. Include a screenshot of the new execution plan.
- Has your index improved the part of the execution plan that you expected it to? (i.e. has it substantially decreased the execution cost of this part of the plan?). If so, why? If not, what has it done?
B2C
Have a careful look at the results from the query above. Notice that there is a row for each Country / Segment every month. Adjust this query so that it only returns the Country / Segment with the highest TotalMonthlySales in each month. You should get 12 rows.
Note that there are a few different ways that you could write this query and get the correct result.
- 3 marks will be given for the correct solution.
- 2 marks will be awarded for a simple, elegant approach.
Include your query below and a screenshot of the results.
Section C - Query Refactoring
This section has one question which requires you to first understand and then refactor a badly performing query. You should try to simplify this query as much as possible, balancing readability and performance. You should investigate potential indexes to improve the performance of this query.
Note that this query is very similar to a real query that I had to refactor for a client a number of years ago. It's not always easy... Take your time and try to break it down into small pieces. Aim to understand all the small parts and then combine them back up to create the big picture.
While I was creating this question, the original query took ~40 seconds to run on my laptop. After my changes, I was able to get my revised query down to < 2 seconds.
You should include all t-sql (copy and paste and then format it so that it is easy to read) and screenshots of relevant parts of the execution plans where appropriate
To maximise their future profits, the Marketing Team need to be able to track the margin (profitability) and discount on all orders in real-time . They have had a business analyst attempt to write a SQL query which tracks information about every order. An example of the output of the query is shown below for you:
Question CA:
Review the query execution plan and clearly describe why this query will not scale well.
Question CB:
Rewrite this query so that it is scalable. Include your t-sql code below.
Question CC:
Run both the original query and your version of the query. Review the execution plans of both queries. Make any additional changes that will improve the performance of this query.
- Explain simply how has the execution plan changed from the original query to your query?
- Make any additional changes (for example indexing) that you think would help. Include the t-sql for these changes below
- Run both queries together and include a screenshot that shows the relative costs of both queries
- Include a screenshot of the execution plan of your query after all changes have been applied.
Attachment:- SQL Query.rar