**Assignment 2**

**Marks**

• This assignment is marked out of 100 marks and worth 10% of the assessment for this unit.

**Due date**

**Assignment Data and Questions:**

• In order to answer the questions, you will need to refer to the Excel data file provided in the **Assignment** section on Moodle.

**Submission of Answers:**

1. You must submit your assignment online via the upload link in the Assignment section. There are 8 questions and, for each question, you can attach a Word/PDF/PNG/JPEG file OR type your answers in the text box.

2. The link is set up using a Quiz tool, but it is an Assignment, not a Quiz.

**Presentation**

1. Remember, **how** you present your answers is important.

2. Always provide evidence and make appropriate references to tables and charts produced.

3. You must also draw borders and label all graphs appropriately.

4. You must use an appropriate font size. Font size 12 – 14 is desirable. Any appropriate font type is acceptable.

5. Please spell-check your answers before submitting.

6. Please note that marks will be deducted for poor presentation.

7. You can choose to use an image for Excel output, formulas, etc. To capture an image (e.g., Excel output or a formula) currently displayed on your screen, use the Windows Snipping Tool (available via the START button/All Programs/Accessories) and paste it into the desired location in your Word document.

**Late submissions:**

1. If the assignment is submitted late, an in-semester special consideration form MUST be completed. Otherwise, a penalty of 5% may apply for EACH DAY late (weekdays and weekends).

2. Extensions will only be granted for substantive reasons at the discretion of the Lecturer (Chief Examiner - CE).

3. You must submit ALL special consideration applications through Student and Education Business Services (SEBS) at the following link:

4. No extensions will be granted if your request does not contain supporting documents. Submitting a request **does not** guarantee an extension.

__Questions on Flight Data__

In this assignment, we look at flights that departed from three major airports in New York City (NYC) in 2013.

• John F. Kennedy International Airport (JFK)

• LaGuardia Airport (LGA)

• Newark International Airport (EWR)

Note:

For Question 1, do not refer to the Excel data set provided. You are required to use the data set provided for questions 2-8 ONLY, unless otherwise indicated.

**Question 1: [16 marks]**

We wish to investigate the level of service provided at these three airports. The level of service depends on departure and arrival delays, among other things. Assume that the length of departure delays for flights from the NYC airports in 2013 follows a normal distribution with mean 12.64 minutes and standard deviation 40.21 minutes. (Note: This question does not refer to the Excel data set provided)

(a) What is the maximum length of departure delays for 20% of flights that departed from the NYC airports in 2013? Give your answer correct to one decimal place. Remember to define the variable of interest and state the distribution. Show all working.

(4 marks)

(b) Given the number of flights that depart from all three major airports in NYC, departure delays are inevitable. We categorise departures that are delayed by more than an hour as “long delays”.

i. What is the likelihood that a flight chosen at random experiences a “long delay”?

(3 marks)

ii. If there were 336,800 flights that departed from the NYC airports in 2013, how many flights are expected to have “long delays”? Show all working.

(2 marks)

(c) A random sample of 72 flights was taken from flights that departed from the NYC airports in 2013. Calculate the probability that the average delay of this sample is not more than 20 minutes. Show all working.

(4 marks)

(d) If the distribution of delay times is right skewed (instead of normally distributed) with the same mean and standard deviation, will your answers to

• part (b) change or remain unchanged? Briefly explain.

• part (c) change or remain unchanged? Briefly explain.

**No calculation is required for this question.**

(3 marks)
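The normal model stated in Question 1 can be explored with Python's standard-library `statistics.NormalDist`. The sketch below is only one possible way to check hand working; the helper names (`percentile`, `prob_longer_than`, `sample_mean_dist`) are illustrative and not part of the assignment, and written working is still required.

```python
from math import sqrt
from statistics import NormalDist

# Parameters stated in Question 1 (minutes).
MEAN_DELAY = 12.64
SD_DELAY = 40.21

delay = NormalDist(mu=MEAN_DELAY, sigma=SD_DELAY)

def percentile(p):
    """Delay value x such that P(X <= x) = p."""
    return delay.inv_cdf(p)

def prob_longer_than(minutes):
    """P(X > minutes) for a single flight."""
    return 1 - delay.cdf(minutes)

def sample_mean_dist(n):
    """Sampling distribution of the mean delay for a sample of n flights."""
    return NormalDist(mu=MEAN_DELAY, sigma=SD_DELAY / sqrt(n))

if __name__ == "__main__":
    print(round(percentile(0.20), 1))
    print(round(prob_longer_than(60), 4))
    print(round(sample_mean_dist(72).cdf(20), 4))
```

Note that `sample_mean_dist` divides the standard deviation by the square root of the sample size, which is why a probability about a sample average differs from a probability about a single flight.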

__Instructions for Questions 2 – 8__

Data on 72 randomly selected flights departing from the three major NYC airports in 2013 is contained in the file **A2_Data.xlsx**.

The data set contains different variables and these variables are described below:

**year, month, day** – Date of departure.

**dep_time, arr_time** – Actual departure and arrival times (format HHMM or HMM), local time zone.

**sched_dep_time, sched_arr_time** – Scheduled departure and arrival times (format HHMM or HMM), local time zone.

**dep_delay, arr_delay** – Departure and arrival delays, in minutes. Negative times represent early departures/arrivals.

**carrier** – Two letter carrier abbreviation.

**flight** – Flight number.

**tailnum** – Plane tail number.

**origin, dest** – Origin and destination. See airports for additional metadata.

**air_time** – Amount of time spent in the air, in minutes.

**distance** – Distance between airports, in miles.

**delay status** – “Yes” for **dep_delay** > 0 and “No” otherwise.
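Two of the conventions above, the HHMM/HMM time format and the rule behind **delay status**, can be sketched in a few lines of Python. The function names `split_hhmm` and `delay_status` are hypothetical helpers for illustration only; they are not columns or functions supplied with the data file.

```python
def split_hhmm(value):
    """Split a time stored as HHMM or HMM (e.g. 517 or 1745) into (hour, minute)."""
    hour, minute = divmod(int(value), 100)
    if not (0 <= hour <= 24 and 0 <= minute < 60):
        raise ValueError(f"not a valid HHMM time: {value!r}")
    return hour, minute

def delay_status(dep_delay):
    """Return "Yes" when dep_delay > 0 and "No" otherwise."""
    return "Yes" if dep_delay > 0 else "No"

print(split_hhmm(517))    # (5, 17)
print(split_hhmm(1745))   # (17, 45)
print(delay_status(-3))   # No
print(delay_status(12))   # Yes
```

A departure delay of exactly 0 minutes is classed as “No”, because the rule requires a strictly positive delay.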

**Answer the following questions (2-8) using the data set provided in A2_Data.xlsx, unless otherwise indicated.**

**Question 2: [10 marks]**

(a) Calculate the 90% confidence interval estimate of the average departure delay times for flights that departed from the NYC airports in 2013. Remember to define the variable of interest and state the distribution. Show all working.

(7 marks)

(b) Assume that we want to be more confident that the true average is contained in the interval. Hence, we increase the confidence level to 95%. Without any calculation, explain what would happen to the confidence interval width and estimation precision if the confidence level is increased while all other factors remain the same.

(3 marks)
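The structure of a confidence interval for a mean (point estimate plus or minus a critical value times the standard error) can be sketched as follows. This is an illustration under stated assumptions: it uses a normal critical value as a large-sample approximation, whereas the assignment may expect the t distribution, which Python's standard library does not provide. The sample statistics are hypothetical and are NOT taken from A2_Data.xlsx.

```python
from math import sqrt
from statistics import NormalDist

def approx_mean_interval(sample_mean, sample_sd, n, confidence):
    """Large-sample (z-based) confidence interval for a population mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sample_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample statistics, NOT taken from A2_Data.xlsx.
low90, high90 = approx_mean_interval(10.0, 30.0, 72, 0.90)
low95, high95 = approx_mean_interval(10.0, 30.0, 72, 0.95)

print(round(high90 - low90, 2), round(high95 - low95, 2))
```

Only the critical value changes between the two calls; the sample mean, standard deviation and sample size are held fixed.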

**Question 3: [7 marks]**

We want to estimate the **proportion** of flights that departed from the NYC airports in 2013 which are delayed. There are two ways we can do this. We can either obtain a point estimate or calculate an interval estimate. Provide estimates using both methods. Use a 99% confidence level. Show all working, define variables and state the distribution as needed.

(7 marks)
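The two kinds of estimate mentioned in Question 3 can be sketched with the usual normal-approximation interval for a proportion. The function name `proportion_interval` and the count of 30 delayed flights are hypothetical; the real count must come from A2_Data.xlsx.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence):
    """Return (point estimate, lower limit, upper limit) for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - margin, p_hat + margin

# Hypothetical count of delayed flights, NOT taken from A2_Data.xlsx.
p_hat, low, high = proportion_interval(30, 72, 0.99)
print(round(p_hat, 3), round(low, 3), round(high, 3))
```

The point estimate is the single value `p_hat`; the interval estimate is the pair `(low, high)` centred on it.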

**Question 4: [8 marks]**

NYC airport management claims that the average departure delay time for flights that departed from major NYC airports in 2013 is less than 15 minutes. Test the claim at the 1% level of significance using the **critical value** approach. Show all working. (Hint: you need to obtain the sample statistics required from the data and state all values to two decimal places)

(8 marks)

**Question 5: [11 marks]**

(a) We now wish to test if more than 30% of flights that departed from the NYC airports in 2013 are delayed. Use a **p-value** approach at the 10% level of significance. Show all working. (Hint: you need to obtain the sample statistic using the data and keep the value to three decimal places)

(8 marks)

(b) Define a Type I error and explain it in the context of the hypothesis test in (a).

(3 marks)
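The mechanics of an upper-tail test for a proportion using the p-value approach can be sketched as below. The function name `upper_tail_proportion_test` and the count of 30 delayed flights are hypothetical assumptions for illustration; the sample proportion for the assignment must be obtained from A2_Data.xlsx.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_proportion_test(successes, n, p0):
    """z statistic and p-value for H0: p = p0 against H1: p > p0."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)  # uses the hypothesised proportion
    z = (p_hat - p0) / standard_error
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

# Hypothetical count of delayed flights, NOT taken from A2_Data.xlsx.
z, p_value = upper_tail_proportion_test(30, 72, 0.30)
alpha = 0.10
print(round(z, 3), round(p_value, 4), "reject H0" if p_value < alpha else "do not reject H0")
```

The standard error is computed from the hypothesised value `p0` rather than from the sample proportion, because the test statistic is evaluated under the null hypothesis.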

**Question 6: [15 marks]**

(a) Generate a pivot table of frequency broken down by the variable **origin** (ROWS) and the variable **delay status** (COLUMNS) using the data set provided in A2_Data.xlsx. Please label your table appropriately.

(2 marks)

(b) We want to know whether delay status is dependent on the origin of the flight. Based on the information obtained from the table constructed in (a), conduct an appropriate hypothesis test using the 10% level of significance. Use the **critical value** approach. Show your working and keep three decimal places in the calculations.

(8 marks)

(c) What is the p-value of the test you conducted above? State the Microsoft Excel function used to find this value.

(2 marks)

(d) Define a Type II error and explain it in the context of the hypothesis test in (b).

(3 marks)
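A test of independence on a table of counts compares observed frequencies with the frequencies expected under independence. The sketch below assumes a table with three origins and two delay statuses, so that there are (3 - 1)(2 - 1) = 2 degrees of freedom; for exactly 2 degrees of freedom the chi-square upper-tail probability has the closed form exp(-x/2), which the last two helpers rely on. The function names and the counts are hypothetical and are NOT taken from A2_Data.xlsx.

```python
from math import exp, log

def chi_square_statistic(observed):
    """Chi-square statistic for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    return stat

def chi_square_df2_p_value(stat):
    """Upper-tail probability; valid ONLY for 2 degrees of freedom."""
    return exp(-stat / 2)

def chi_square_df2_critical(alpha):
    """Critical value; valid ONLY for 2 degrees of freedom."""
    return -2 * log(alpha)

# Hypothetical counts (rows: EWR, JFK, LGA; columns: No, Yes).
table = [[14, 10], [12, 12], [16, 8]]
stat = chi_square_statistic(table)
print(round(stat, 3), round(chi_square_df2_critical(0.10), 3), round(chi_square_df2_p_value(stat), 3))
```

For a table with different dimensions the degrees of freedom change and the closed-form helpers no longer apply; a chi-square table or a spreadsheet function would be needed instead.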

**Question 7: [17 marks]**

In this question, we look at the relationship between numerical variables and how this affects the level of service. We suspect there is a correlation between departure and arrival delays.

In order to investigate this, we refer to a visualisation and the model showing the relationship between the two variables.

(a) Generate a **scatter plot** of arrival delays on departure delays using the data set provided in A2_Data.xlsx. Label your graph appropriately. Add a trendline to the plot. You are not required to add the equation or indicate R² on the trendline. Describe the relationship between the two variables based on the **scatterplot ONLY**.

(3 marks)

(b) Generate a regression output of arrival delays on departure delays.

(2 marks)

(c) Interpret the coefficient of determination for this regression output.

(2 marks)

(d) Is the linear relationship between arrival delays and departure delays significant? Use the **p-value** approach and a 1% level of significance.

(3 marks)

(e) How long is the predicted arrival delay if the flight is delayed by 1.5 hours at departure? Is this a valid prediction? Using all relevant available information from parts (a) to (d), discuss the likely validity of this prediction.

(7 marks)
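The quantities asked for in Question 7 (fitted line, coefficient of determination, and a prediction) can be sketched with a hand-written least-squares fit. This is a minimal illustration, not a substitute for the Excel regression output the question requires; the function name `fit_line` and the five data points are hypothetical and are NOT taken from A2_Data.xlsx.

```python
def fit_line(x, y):
    """Least-squares fit of y on x; returns (intercept, slope, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared

# Hypothetical delays in minutes, NOT taken from A2_Data.xlsx.
dep = [-5, 0, 10, 30, 60]
arr = [-12, -4, 5, 28, 55]
intercept, slope, r2 = fit_line(dep, arr)

# Both delay variables are recorded in minutes, so 1.5 hours is entered as 90.
predicted = intercept + slope * 90

# Check whether 90 minutes lies inside the observed range of departure delays.
in_range = min(dep) <= 90 <= max(dep)
print(round(predicted, 1), round(r2, 3), in_range)
```

With these made-up points, 90 minutes falls outside the observed range, so the prediction would be an extrapolation; whether that holds for the real sample depends on the values in the data file.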

**Question 8: [16 marks]**

(a) We consider other variables that affect **arrival delays**, other than departure delays. We now consider the variable **distance**. Please answer the following questions **in part (a)** using the data set provided in A2_Data.xlsx.

i. Generate a regression output of arrival delays on departure delays and distance.

(2 marks)

ii. Write down the estimated regression equation. Define the variables used.

(2 marks)

iii. At the 1% level of significance, test if there is a significant linear relationship between arrival delays and the two variables, departure delays and distance. Use the **p-value** approach.

(4 marks)

iv. At the 5% level of significance, discuss whether the variable distance contributes to the regression. Use the **p-value** approach.

(2 marks)

(b) This is a general concept question and is not related to the data set provided. “Correlation does not imply causality” is an important concept when referring to regression. Briefly describe this concept. Provide one example of how it may occur. **Your answer in this section does not need to be related to the data set provided for the assignment**. Your answer **must not exceed 100 words.**

(6 marks)
