Friday, May 22, 2020

Mining Patterns for Career path based on Innate Talents - Free Essay Example

Date added: 2019/02/05. Category: Career. Essay level: High school. Tags: Career Path Essay.

Abstract

Selecting an appropriate career path is one of the most important decisions in an individual’s life. People end up in a profession that they neither enjoy nor can leave, for reasons such as financial situation, family pressure, being the single source of income, the cost of education and the availability of vast career opportunities. Thus, a student may select a wrong career option, and the consequence of this wrong decision could be job dissatisfaction.

The ultimate motive behind this research is to identify the most suitable career path that fits an individual’s personality and working environment, resulting in a positive outcome such as job satisfaction, by using an appropriate data mining technique and the validated Holland’s theory, which is one of the most popular models used for career personality tests. Apart from this, three other factors will be obtained. Thus, for finding the Intersection, four factors are considered: personality traits, interests, market trends and pay scales. The proposed system would help students to select an appropriate career path based on their personality traits by matching their “three-letter code” with the employees’ codes.

I. INTRODUCTION

In today’s world, career recommendation for college students is a herculean task. Awareness of career options among students is very low. Some students don’t know their abilities. Some students choose a career because a friend has chosen the same, or because their guardian forces them to opt for a career without knowing their actual interests, strengths and abilities in a particular area. Some parents force their children to fulfil a dream that the parents themselves had in their childhood.
Thus, students suffer for their whole lives. To help students in such situations, people have started career counselling organizations. These provide guidance regarding careers but do not analyse the abilities of the students, so they leave students to choose a career on their own. Here the same problem occurs: students don’t know their actual interests, abilities and strengths. To overcome this situation, this project aims at evaluating patterns by applying data mining techniques to employees’ data, which would help students to select an appropriate career path based on their personality traits, their interests, market trends and pay scales.

II. RELATED WORK

There are various websites and web applications on the internet which help students find a suitable career path. But most of those systems use personality traits as the only factor to predict the career, which might result in an inconsistent answer. Similarly, there are a few sites that suggest a career based only on the interests of the students. These systems did not consider market trends and pay scales to increase job satisfaction. None of the systems has considered all four factors, namely personality traits, interests, market trends and pay scales. Also, the course suggestions provided by such systems are very general. For example, the results of a few systems were a group of courses such as data analyst, accountant, law etc. If a student gets such a recommendation, he/she might again get confused, as the suggested courses belong to different streams.

The paper by [1] Elakia, Gayathri, Aarthi and Naren J suggests suitable career options for high school students based on every student’s interests, skills, likes, hobbies etc. They have considered “discipline” as an important factor in continuing higher studies and pursuing one’s career; hence the chance of a student becoming violent in future is predicted.
The main objective of the paper by [2] Avinsh Kumar, Akshat Gawankar, Kunal Borge and Mr Nilesh M Patil is to provide an overview of the data mining algorithms that have been used to predict student profile and personality. They have created an online survey system that helps students make career choices and understand their personality traits. Another paper, by [3] Gentaneh Berie Tarekegn and Dr. Vuda Sreenivasarao, attempts to use data mining techniques to analyse students’ entrance exam results to predict students’ placement into departments. The paper by [4] Nikita Gorad, Ishani Zalte, Aishwarya Nandi and Deepali Nayak recommends a career option to the student based on their personality traits, interests and capacity to take up the course. In the paper by [5] Lokesh S. Katore, Bhakti S. Ratnaparkhi and Dr. Jayant S. Umale, a career recommendation system is developed which recommends a career to students based on their personality traits. The paper by [6] Ms. Roshani Ade and Dr. P.R. Deshmukh suggests an incremental ensemble of classifiers in which the hypotheses from a number of classifiers were tested and the final result was determined using the ‘majority voting rule’.

III. OVERVIEW

The basic idea of this research is to acquire data from employees and to evaluate patterns from that data. From the evaluated patterns, a career can be suggested to students. For evaluating patterns from the employees’ data, four factors are considered: personality traits, interests, market trends and pay scales.

Figure 1: Four factors

1. Personality traits: Holland’s six personality types are considered here as the personality traits. According to Holland’s theory of career choice, most people are one of six personality types: Realistic, Investigative, Artistic, Social, Enterprising and Conventional. Thus, using these personality types, different careers will be classified.
[7] Here, 42 questions are asked for evaluating personality traits. The “three-letter code” is formed from the three personality types with the highest scores among these six. This “three-letter code” is then matched with some already defined professions; if there is a match between the profession and the code, the “P-E fit” field is set to “Yes”, otherwise “No”. Thus, the first factor, named “P-E fit”, is evaluated.

2. Interest: Interest in this context means asking employees whether they are doing an interest-based job or not. If “yes”, we consider their data for pattern evaluation; if “no”, we simply ignore those entries. We aim to suggest careers on the basis of the employees’ data, and if an employee is not satisfied with his/her job, that job is not the right match for him/her; they are doing something they are not even interested in, so it cannot reasonably be suggested to students. It is therefore mandatory that we verify the data which we are going to use for suggesting a career path to students. Thus, the second factor, named “Interest based”, is evaluated.

3. Market trend: Top trending jobs from the market are taken into consideration. The labour market is changing rapidly. No one can be sure of what will happen in the future, but some trends in the labour market do give clues about what is likely to happen. When making decisions about your education or career, it is important to understand these trends and to make good choices based on this information. [11] For the purposes of this research, it is assumed that “Travel agent” is not a trending job, as the internet has turned vacationers into their own travel agents. Websites such as Kayak and Expedia, and web applications such as MakeMyTrip, Trivago and TripAdvisor, enable travellers to book flights, cruises and hotel rooms with ease. Hence, travel agents are assumed to be no longer needed.
So, if there is a travel agent in the responses, the “Trending job” field is set to “No”, otherwise “Yes”. Thus, the third factor, named “Trending job”, is evaluated.

4. Pay scale: A pay scale (also known as a salary structure) is a system that determines how much an employee is to be paid as a wage or salary, based on one or more factors such as the employee’s level, rank or status within the employer’s organization, the length of time that the employee has been employed, and the difficulty of the specific work performed. [8] For evaluating the fourth factor, named “Pays well”, we have assumed that 10,000 should be the minimum salary for any employee working in any field; if the salary is less than 10,000, the “Pays well” field is set to “No”, otherwise “Yes”.

IV. IMPLEMENTATION

Figure 2: Implementation steps

Step 1. Data collection (using Google Forms spreadsheet): The first step of implementation was to collect data from employees working in different fields. For this purpose, an online survey was conducted using Google Forms. The survey consisted of 42 questions on personality traits and two more questions about interest and income. The data was collected from employees working in various job sectors, such as State Bank of India (Modasa), Union Bank (Gandhinagar), Travel Infoline (Ahmedabad), Institute for Photography Excellence (Ahmedabad), inifd (Gandhinagar), District court (Gandhinagar), Rajshree Studio (Idar), Torrent Pharmaceuticals Limited (Mehsana) and Nootan Vidyalaya (Kadi). As this was a Google Form, I shared the link with all my friends and family members and asked them to fill it in and forward it in their groups.

Figure 3: Google form sample

Step 2. Downloaded as MS Excel: The responses were downloaded as an MS Excel (.xlsx) file.

Figure 4: Raw dataset

Step 3. Pre-processing (in Excel): The data obtained from the survey then had to be pre-processed and consolidated in MS Excel into a common format as required by the system. Based on the answers given by the employees, a three-letter code was generated for each individual. For example, with a code of RIA you would most resemble the Realistic type, somewhat less resemble the Investigative type, and resemble the Artistic type even less. The types that are not in your code are the types you resemble least of all. Most people, and most jobs, are some combination of two or three of the Holland interest areas. [9] Using this data, “P-E fit”, “Interest based”, “Trending job” and “Pays well” were determined, and then “Intersection” was calculated by considering all four factors. If the values of all four factors are “Yes”, the value of the “Intersection” field is “Yes”, otherwise “No”. Thus, the target attribute, named “Intersection”, is evaluated.

Figure 5: Pre-processed dataset

Step 4. DM Tool (RStudio): RStudio is an open source tool, used here for applying data mining algorithms to the data collected from the users. It is an “integrated development environment (IDE)” that helps you develop programs in R. In other words, R is a “programming language” while RStudio is a “platform” for using R. You can use R without using RStudio, but you can’t use RStudio without using R, so R comes first. [10]

Step 5. DM Algorithm: Data mining is all about extracting patterns from an organization’s stored or warehoused data. These patterns can be used to gain insight into aspects of the organization’s operations, and to predict outcomes for future situations as an aid to decision-making. [4]

A. Decision tree algorithm: A decision tree is a structure that includes a root node, branches, and leaf nodes. Each internal node denotes a test on an attribute, each branch denotes the outcome of a test, and each leaf node holds a class label.
The topmost node in the tree is the root node. [22]

1. ID3: In decision tree learning, ID3 (Iterative Dichotomiser 3) is an algorithm invented by Ross Quinlan, used to generate a decision tree from a dataset. ID3 is the precursor to the C4.5 algorithm, and is typically used in the machine learning and natural language processing domains. [12]

2. C4.5: C4.5 is an algorithm developed by Ross Quinlan that is used to generate a decision tree. C4.5 is an extension of Quinlan’s earlier ID3 algorithm. The decision trees generated by C4.5 can be used for classification, and for this reason C4.5 is often referred to as a statistical classifier. The authors of the Weka machine learning software described the C4.5 algorithm as a landmark decision tree program that is probably the machine learning workhorse most widely used in practice to date. It became quite popular after ranking #1 in the pre-eminent paper “Top 10 Algorithms in Data Mining”, published by Springer in 2008.

Improvements from the ID3 algorithm: C4.5 made a number of improvements to ID3. Some of these are:
Handling both continuous and discrete attributes: In order to handle continuous attributes, C4.5 creates a threshold and then splits the list into those whose attribute value is above the threshold and those that are less than or equal to it.
Handling training data with missing attribute values: C4.5 allows attribute values to be marked as “?” for missing. Missing attribute values are simply not used in gain and entropy calculations.
Handling attributes with differing costs.
Pruning trees after creation: C4.5 goes back through the tree once it has been created and attempts to remove branches that do not help by replacing them with leaf nodes. [13]

3. C5.0: C5.0 is widely used as a decision tree method. It provides a set of rules which is easy to understand. The C5.0 algorithm takes noise and missing data into account. The problems of overfitting and error pruning are addressed by the C5.0 algorithm.
In classification, the C5.0 classifier can anticipate which attributes are relevant and which are not. [4]

Improvements in the C5.0 algorithm: C5.0 offers a number of improvements on C4.5. Some of these are:
Speed: C5.0 is significantly faster than C4.5.
Memory usage: C5.0 is more memory efficient than C4.5.
Smaller decision trees: C5.0 gets similar results to C4.5 with considerably smaller decision trees.
Support for boosting: Boosting improves the trees and gives them more accuracy.
Weighting: C5.0 allows you to weight different cases and misclassification types.
Winnowing: A C5.0 option automatically winnows the attributes to remove those that may be unhelpful. [14]

Boosted C5.0: Adaptive boosting involves making several models that “vote” on how to classify an example. To do this, you need to add the ‘trials’ parameter to the code. The ‘trials’ parameter sets the upper limit on the number of models R will build if necessary. [15]

4. CART: Classification and Regression Trees (CART) split attributes based on values that minimize a loss function, such as the sum of squared errors. [16] CART is a non-parametric decision tree learning technique that produces either classification or regression trees, depending on whether the dependent variable is categorical or numeric, respectively. Decision trees are formed by a collection of rules based on variables in the modelling data set:
Rules based on the variables’ values are selected to get the best split to differentiate observations based on the dependent variable.
Once a rule is selected and splits a node into two, the same process is applied to each child node (i.e. it is a recursive procedure).
Splitting stops when CART detects that no further gain can be made, or when some pre-set stopping rules are met. (Alternatively, the data are split as much as possible and then the tree is later pruned.)
Each branch of the tree ends in a terminal node.
Each observation falls into one and exactly one terminal node, and each terminal node is uniquely defined by a set of rules. [17]

5. Random Forest: Random forests or random decision forests are an ensemble learning method for classification, regression and other tasks, which operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees. Random decision forests correct for decision trees’ habit of overfitting to their training set. [18] Random Forest is a variation on bagging of decision trees in which the attributes available for building a tree at each decision point are reduced to a random sub-sample. This further increases the variance of the trees, and more trees are required. [16]

6. Ctree: The name of this algorithm stands for “Conditional Inference Tree”. It is a statistics-based approach that uses non-parametric tests as splitting criteria, corrected for multiple testing to avoid overfitting. This approach results in unbiased predictor selection and does not require pruning. [19] Ctree is a non-parametric class of regression trees embedding tree-structured regression models into a well-defined theory of conditional inference procedures. It is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored as well as multivariate response variables and arbitrary measurement scales of the covariates. [20]

B. Neural Network: An Artificial Neural Network, often just called a neural network, is a mathematical model inspired by biological neural networks. Neural networks are used to model complex relationships between inputs and outputs or to find patterns in data. [21] A neural network is a model characterized by an activation function, which is used by interconnected information processing units to transform input into output. A neural network has always been compared to the human nervous system.
Information is passed through interconnected units, analogous to the passage of information through neurons in humans. The first layer of the neural network receives the raw input, processes it and passes the processed information to the hidden layers. The hidden layers pass the information to the last layer, which produces the output. The advantage of a neural network is that it is adaptive in nature. It learns from the information provided, i.e. it trains itself on data with a known outcome and optimizes its weights for a better prediction in situations with an unknown outcome. [23]

C. Naïve Bayes: The Naive Bayesian classifier is based on Bayes’ theorem with independence assumptions between predictors. A Naive Bayesian model is easy to build, with no complicated iterative parameter estimation, which makes it particularly useful for very large datasets. Despite its simplicity, the Naive Bayesian classifier often does surprisingly well and is widely used because it often outperforms more sophisticated classification methods. [24]

V. RESULTS OF IMPLEMENTATION

The dataset was then used to derive the results, using the various packages available in R for generating decision trees. The C5.0 algorithm applied to the dataset had an accuracy of 100%. The output after plotting the decision tree is shown in Figure 6. This tree is generated by considering all four factors, namely personality traits, interests, market trends and pay scales.

Figure 6: decision tree with all the four factors

We have also generated trees considering one factor at a time while applying the C5.0 algorithm to the dataset. The decision trees for “Trending job”, “P-E fit”, “Interest based” and “Pays well” are shown in Figure 7, Figure 8, Figure 9 and Figure 10 respectively.
Figure 7: decision tree with “Trending job”
Figure 8: decision tree with “P-E fit”
Figure 9: decision tree with “Interest based”
Figure 10: decision tree with “Pays well”

From the following graph, we can get a clear idea of the comparison of the five methods.

Figure 11: accuracy with various factors

Thus, this graph shows that all four factors are important for selecting a student’s career, namely personality traits, interests, market trends and pay scales.

VII. CONCLUSION

This work has discussed Holland’s theory and various data mining techniques in relation to observations indicating that some students have difficulty in determining a suitable career. As this affects their performance, productivity and satisfaction, it is critically important to understand how to find a career that fits their personality. The results generated from the employees’ data can be useful for evaluating patterns in order to determine a suitable career path for students based on the four factors, namely personality traits, interests, market trends and pay scales.

REFERENCES

[1] Elakia, Gayathri, Aarthi and Naren J, “Application of Data Mining in Educational Database for Predicting Behavioural Patterns of the Students”, IJCSIT, 2014
[2] Avinsh Kumar, Akshat Gawankar, Kunal Borge and Mr Nilesh M Patil, “Student Profile Personality Prediction using Data Mining Algorithms”, IJARIIE, 2017
[3] Gentaneh Berie Tarekegn and Dr. Vuda Sreenivasarao, “Application of Data Mining Techniques to Predict Students Placement in to Departments”, IJRSCSE, 2016
[4] Nikita Gorad, Ishani Zalte, Aishwarya Nandi and Deepali Nayak, “Career Counselling using Data Mining”, IJESC, April 2017
[5] Lokesh S. Katore, Bhakti S. Ratnaparkhi and Dr. Jayant S. Umale, “Novel Professional career prediction and recommendation method for individual through analytics on personal traits using C4.5 algorithm”, IEEE, 2015
[6] Ms. Roshani Ade and Dr. P.R. Deshmukh, “An incremental ensemble of classifiers as a technique for prediction of student’s career choice”, IEEE, 2014
[7] https://www.careerkey.org/choose-a-career/hollands-theory-of-career-choice.html#.WpEoWKhuY2x
[8] https://en.wikipedia.org/wiki/Pay_scale
[9] https://www.nhes.nh.gov/elmi/career/documents/holland-code-sparks.pdf
[10] https://www.quora.com/What-is-the-difference-between-R-and-RStudio
[11] https://www.employmentcrossing.com/article/900012648/Important-Labor-Market-Trends-and-Career-Planning/
[12] https://en.wikipedia.org/wiki/ID3_algorithm
[13] https://en.wikipedia.org/wiki/C4.5_algorithm
[14] https://en.wikipedia.org/wiki/C4.5_algorithm#Improvements_in_C5.0.2FSee5_algorithm
[15] https://educationalresearchtechniques.com/2016/05/25/3838/
[16] https://machinelearningmastery.com/non-linear-classification-in-r-with-decision-trees/
[17] https://en.wikipedia.org/wiki/Predictive_analytics#Classification_and_regression_trees_.28CART.29
[18] https://en.wikipedia.org/wiki/Random_forest
[19] https://wiki2.org/en/Decision_tree_learning
[20] Torsten Hothorn, Kurt Hornik and Achim Zeileis, “ctree: Conditional Inference Trees”
[21] https://www.vskills.in/certification/tutorial/data-mining-and-warehousing/neural-networks-and-data-mining/
[22] https://www.tutorialspoint.com/data_mining/dm_dti.htm
[23] https://www.analyticsvidhya.com/blog/2017/09/creating-visualizing-neural-network-in-r/
[24] https://www.saedsayad.com/naive_bayesian.htm
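
The pre-processing rules described in the Overview and Implementation sections (three-letter code, four Yes/No factors, Intersection) can be sketched in a few lines of Python. This is only an illustrative sketch and not the Excel workflow used in the essay: the function names, the sample scores and the profession-to-code table are hypothetical, and "P-E fit" is assumed here to mean an exact match between the employee's code and the code defined for their profession. The 10,000 salary threshold and the "Travel agent" rule are taken from the text.

```python
HOLLAND_TYPES = "RIASEC"  # Realistic, Investigative, Artistic, Social, Enterprising, Conventional


def three_letter_code(scores):
    """Return the three Holland letters with the highest scores, highest first.

    Ties are broken by RIASEC order because sorted() is stable (an assumption;
    the essay does not say how ties are handled).
    """
    ranked = sorted(HOLLAND_TYPES, key=lambda letter: -scores[letter])
    return "".join(ranked[:3])


def yes_no(flag):
    return "Yes" if flag else "No"


def evaluate(response, profession_codes, min_salary=10_000,
             non_trending=("Travel agent",)):
    """Compute the four factors and the Intersection for one survey response."""
    code = three_letter_code(response["scores"])
    factors = {
        # Assumption: exact match against a hypothetical profession table.
        "P-E fit": yes_no(profession_codes.get(response["job"]) == code),
        "Interest based": yes_no(response["interest_based"]),
        "Trending job": yes_no(response["job"] not in non_trending),
        "Pays well": yes_no(response["salary"] >= min_salary),
    }
    # Intersection is "Yes" only when all four factors are "Yes".
    factors["Intersection"] = yes_no(all(v == "Yes" for v in factors.values()))
    return code, factors


# Hypothetical data, for illustration only.
profession_codes = {"Teacher": "ISR", "Travel agent": "ECS"}
teacher = {
    "job": "Teacher",
    "scores": {"R": 5, "I": 7, "A": 2, "S": 6, "E": 1, "C": 3},
    "interest_based": True,
    "salary": 25_000,
}
agent = {
    "job": "Travel agent",
    "scores": {"R": 1, "I": 2, "A": 3, "S": 5, "E": 7, "C": 6},
    "interest_based": True,
    "salary": 8_000,
}

for person in (teacher, agent):
    print(person["job"], *evaluate(person, profession_codes))
```

Keeping each factor as a separate Yes/No field mirrors the columns of the pre-processed dataset, so the resulting rows could be passed to a decision tree learner with "Intersection" as the target attribute.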

Thursday, May 7, 2020

Bank Marketing - 2517 Words

I. Introduction

Within our society, financial institutions are becoming more abundant. Along with this growth, the field of marketing financial services has also grown in size and scope, with new entrants every day. The relatively stable banking environment is being altered by innovation, opportunism, and government intervention. This era, marked by the government's luminous hand of deregulation (defined as the act of removing regulations or restrictions from a specific entity), has expanded consumer options to the extent that commercial banking must now become an aggressively competing member of the financial services industry. In this new era, important marketing areas such as regulation, environment, product, …

In making decisions concerning a bank's strategy, the institution must take into account several initial areas that may act as restraints on implementation. They are the economic and cultural environment, the competitive banking atmosphere (extremely competitive), the marketing strategy to be implemented, and the pricing/promotion that goes into the marketing function. (McMahon, 1986).

III. Environment

The foundation of a successful bank marketing scheme lies in a strong understanding of the environment in which the institution is located. The most important environmental variables are those operating outside the bank, known as the external environment, and those within the organization, known as the internal environment. (McMahon, 1986). The external environment includes components such as legislation, prime rate, competition in the market, local business practices, and technology. Most importantly, an analysis of the external environment must examine the effect of the national and regional economy as it pertains to the marketing situation. First, bank managers need to examine certain forms of legislation, such as the government's present deregulation process. (Hodges and Tillman, 1968).
Due to stiff pressure from customers and the business community alike, bank regulators, federal and state, are releasing their tight grip over the previously controlled industry. The government's laissez-faire approach to the situation has enabled states to…

Wednesday, May 6, 2020

Final Exam Ec315 Free Essays

PART I. HYPOTHESIS TESTING

PROBLEM 1

A certain brand of fluorescent light tube was advertised as having an effective life span before burning out of 4000 hours. A random sample of 84 bulbs was burned out with a mean illumination life span of 1870 hours and with a sample standard deviation of 90 hours. Construct a 95% confidence interval based on this sample and be sure to interpret this interval.

Answer

Since the population standard deviation is unknown, the t distribution can be used to construct the confidence interval. The 95% confidence interval is given by

$$\left( \bar{X} - t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}},\ \ \bar{X} + t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} \right)$$

Details: Confidence Interval Estimate for the Mean

Data
Sample Standard Deviation: 90
Sample Mean: 1870
Sample Size: 84
Confidence Level: 95%

Intermediate Calculations
Standard Error of the Mean: 9.819805061
Degrees of Freedom: 83
t Value: 1.988959743
Interval Half Width: 19.53119695

Confidence Interval
Interval Lower Limit: 1850.47
Interval Upper Limit: 1889.53

Interpretation: we can be 95% confident that the true mean life span lies between 1850.47 and 1889.53 hours. The advertised 4000 hours lies far outside this interval, so the sample does not support the advertised claim.

PROBLEM 2

Given the following data from two independent data sets, conduct a one-tail hypothesis test to determine if the means are statistically equal using alpha = 0.05. Do NOT do a confidence interval.

n1 = 35, n2 = 30
xbar1 = 32, xbar2 = 25
s1 = 7, s2 = 6

Answer

H0: µ1 = µ2
H1: µ1 > µ2 (upper-tail test)

The test statistic used is

$$t = \frac{\bar{X}_1 - \bar{X}_2}{S}\sqrt{\frac{n_1 n_2}{n_1 + n_2}} \sim t_{n_1 + n_2 - 2}, \quad \text{where } S^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

Decision rule: Reject the null hypothesis if the calculated value of the test statistic is greater than the critical value.

Details: t Test for Differences in Two Means

Data
Hypothesized Difference: 0
Level of Significance: 0.05
Population 1 Sample: Sample Size 35, Sample Mean 32, Sample Standard Deviation 7
Population 2 Sample: Sample Size 30, Sample Mean 25, Sample Standard Deviation 6

Intermediate Calculations
Population 1 Sample Degrees of Freedom: 34
Population 2 Sample Degrees of Freedom: 29
Total Degrees of Freedom: 63
Pooled Variance: 43.01587
Difference in Sample Means: 7
t Test Statistic: 4.289648

Upper-Tail Test
Upper Critical Value: 1.669402
p-Value: 3.14E-05
Reject the null hypothesis

Conclusion: Reject the null hypothesis. The sample provides enough evidence to support the claim that the mean of population 1 is greater than the mean of population 2.

PROBLEM 3

A test was conducted to determine whether the gender of a display model affected the likelihood that consumers would prefer a new product. A survey of consumers at a trade show which used a male spokesperson determined that 120 of 300 customers preferred the product, while 92 of 280 customers preferred the product when it was shown by a female spokesperson. Do the samples provide sufficient evidence to indicate that the gender of the salesperson affects the likelihood of the product being favorably regarded by consumers? Evaluate with a two-tail, alpha = 0.01 test. Do NOT do a confidence interval.

Answer

H0: There is no significant gender-wise difference in the proportion of customers who preferred the product.
H1: There is a significant gender-wise difference in the proportion of customers who preferred the product.

The test statistic used is the Z test

$$Z = \frac{p_1 - p_2}{\sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \quad \text{where } p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$

Decision rule: Reject the null hypothesis if the absolute value of the calculated test statistic is greater than the upper critical value.

Details: Z Test for Differences in Two Proportions

Data
Hypothesized Difference: 0
Level of Significance: 0.01
Group 1 (Male): Number of Successes 120, Sample Size 300
Group 2 (Female): Number of Successes 92, Sample Size 280

Intermediate Calculations
Group 1 Proportion: 0.4
Group 2 Proportion: 0.328571429
Difference in Two Proportions: 0.071428571
Average Proportion: 0.365517241
Z Test Statistic: 1.784981685

Two-Tail Test
Lower Critical Value: -2.575829304
Upper Critical Value: 2.575829304
p-Value: 0.074264288
Do not reject the null hypothesis

Conclusion: Fail to reject the null hypothesis. The sample does not provide enough evidence to support the claim that there is a significant gender-wise difference in the proportion of customers who preferred the product.

PROBLEM 4

Assuming that the population variances are equal for Male and Female GPAs, test the following sample data to see if Male and Female PhD candidate GPAs (means) are equal. Conduct a two-tail hypothesis test at α = 0.01 to determine whether the sample means are different. Do NOT do a confidence interval.

| | Male GPAs | Female GPAs |
|---|---|---|
| Sample Size | 12 | 13 |
| Sample Mean | 2.8 | 4.95 |
| Sample Standard Dev | 0.25 | 0.8 |

Answer

H0: There is no significant difference in the mean GPA of males and females.
H1: There is a significant difference in the mean GPA of males and females.

The test statistic used is the independent-samples t test

$$t = \frac{\bar{X}_1 - \bar{X}_2}{S}\sqrt{\frac{n_1 n_2}{n_1 + n_2}} \sim t_{n_1 + n_2 - 2}, \quad \text{where } S^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

Decision rule: Reject the null hypothesis if the absolute value of the calculated test statistic is greater than the upper critical value.

Details: t Test for Differences in Two Means

Data
Hypothesized Difference: 0
Level of Significance: 0.01
Population 1 Sample (Male): Sample Size 12, Sample Mean 2.8, Sample Standard Deviation 0.25
Population 2 Sample (Female): Sample Size 13, Sample Mean 4.95, Sample Standard Deviation 0.8

Intermediate Calculations
Population 1 Sample Degrees of Freedom: 11
Population 2 Sample Degrees of Freedom: 12
Total Degrees of Freedom: 23
Pooled Variance: 0.363804
Difference in Sample Means: -2.15
t Test Statistic: -8.90424

Two-Tail Test
Lower Critical Value: -2.80734
Upper Critical Value: 2.807336
p-Value: 0.0000
Reject the null hypothesis

Conclusion: Reject the null hypothesis. The sample provides enough evidence to support the claim that there is a significant difference in the mean GPA of males and females.
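The intermediate figures in Problems 1 to 4 can be re-derived with a few lines of arithmetic. The following is a minimal sketch, not the tool the original answers were produced with (those appear to come from a spreadsheet add-in). The function names are hypothetical, and the t critical value for Problem 1 is copied from the worked answer rather than computed, because the Python standard library has no t distribution.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(mean, sd, n, t_crit):
    """Return (lower, upper) for mean +/- t_crit * sd / sqrt(n)."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


def pooled_t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, pooled_variance, df) for the equal-variance two-sample t test."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, pooled_var, df


def two_proportion_z(x1, n1, x2, n2):
    """Return (z, two_tail_p) for the pooled two-proportion Z test."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    z = (p1 - p2) / math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    two_tail_p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, two_tail_p


# Problem 1: t critical value (83 df, 95%) taken from the worked answer.
ci_low, ci_high = mean_confidence_interval(1870, 90, 84, 1.988959743)

# Problem 2: upper-tail test on two independent samples.
t2, var2, df2 = pooled_t_statistic(32, 7, 35, 25, 6, 30)

# Problem 3: 120 of 300 versus 92 of 280.
z3, p3 = two_proportion_z(120, 300, 92, 280)

# Problem 4: male versus female GPA samples.
t4, var4, df4 = pooled_t_statistic(2.8, 0.25, 12, 4.95, 0.8, 13)

print(f"Problem 1 CI: ({ci_low:.2f}, {ci_high:.2f})")
print(f"Problem 2: t = {t2:.6f}, pooled variance = {var2:.5f}, df = {df2}")
print(f"Problem 3: Z = {z3:.6f}, two-tail p = {p3:.6f}")
print(f"Problem 4: t = {t4:.5f}, pooled variance = {var4:.6f}, df = {df4}")
```

Each statistic is then compared with the critical value quoted in the corresponding answer. Computing the critical values themselves would need a statistics package with a t distribution.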
PART II. REGRESSION ANALYSIS

PROBLEM 5

You wish to run the regression model (less intercept and coefficients) shown below:

VOTE = URBAN + INCOME + EDUCATE

Given the Excel spreadsheet below for annual data from 1970 to 2006 (with the data for row 5 through row 35 not shown), complete all necessary entries in the Excel Regression window shown below the data.

| Row | A | B | C | D | E |
|---|---|---|---|---|---|
| 1 | YEAR | VOTE | URBAN | INCOME | EDUCATE |
| 2 | 1970 | 49.0 | 62.0 | 7488 | 4.3 |
| 3 | 1971 | 58.3 | 65.2 | 7635 | 8.3 |
| 4 | 1972 | 45.2 | 75.0 | 7879 | 4.5 |
| ... | | | | | |
| 36 | 2004 | 50.1 | 92.1 | 15321 | 4.9 |
| 37 | 2005 | 67.7 | 94.0 | 15643 | 4.7 |
| 38 | 2006 | 54.2 | 95.6 | 16001 | 5.1 |

Regression window entries

Input Y Range: B1:B38 (VOTE, the dependent variable)
Input X Range: C1:E38 (URBAN, INCOME, EDUCATE)
Labels: checked, because row 1 holds the column names
Confidence Level: 95%
Remaining options in the window: Constant is Zero; Output options (Output Range, New Worksheet Ply, New Workbook); Residuals (Residuals, Standardized Residuals, Residual Plots, Line Fit Plots); Normal Probability (Normal Probability Plots)

PROBLEM 6

Use the following regression output to answer the questions below. A real estate investor has devised a model to estimate home prices in a new suburban development. Data for a random sample of 100 homes were gathered on the selling price of the home ($ thousands), the home size (square feet), the lot size (thousands of square feet), and the number of bedrooms. The following multiple regression output was generated:

Regression Statistics
Multiple R: 0.8647
R Square: 0.7222
Adjusted R Square: 0.6888
Standard Error: 16.0389
Observations: 100

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | -24.888 | 38.3735 | -0.7021 | 0.2154 |
| X1 (Square Feet) | 0.2323 | 0.0184 | 9.3122 | 0.0000 |
| X2 (Lot Size) | 11.2589 | 1.7120 | 4.3256 | 0.0001 |
| X3 (Bedrooms) | 15.2356 | 6.8905 | 3.2158 | 0.1589 |

a. Why is the coefficient for BEDROOMS a positive number?
The selling price increases when the number of bedrooms increases, holding the other variables fixed. Thus the relationship is positive.

b. Which is the most statistically significant variable? What evidence shows this?
The most statistically significant variable is the one with the smallest p-value. Here it is Square Feet, with a p-value of 0.0000 and the largest t statistic (9.3122).

c. Which is the least statistically significant variable? What evidence shows this?
The least statistically significant variable is the one with the largest p-value. Here it is Bedrooms, with a p-value of 0.1589.

d. For a 0.05 level of significance, should any variable be dropped from this model? Why or why not?
The variable Bedrooms can be dropped from the model, as its p-value (0.1589) is greater than 0.05.

e. Interpret the value of R squared. How does this value differ from the adjusted R squared?
R² measures model adequacy. Here R² indicates that 72.22% of the variability in selling price can be explained by the model. Adjusted R² (0.6888) is a modification of R² that adjusts for the number of explanatory terms in the model. Unlike R², the adjusted R² increases only if a new term improves the model more than would be expected by chance.

f. Predict the sales price of a 1134-square-foot home with a lot size of 15,400 square feet and 2 bedrooms.
Lot size is measured in thousands of square feet, so 15,400 square feet enters the model as 15.4.
Selling Price = -24.888 + 0.2323(1134) + 11.2589(15.4) + 15.2356(2) = 442.398
Since selling price is in $ thousands, the predicted price is about $442,400.

PART III. SPECIFIC KNOWLEDGE SHORT-ANSWER QUESTIONS

PROBLEM 7

Define autocorrelation in the following terms:

a. In what type of regression is it likely to occur?
Regressions involving time series data.

b. What is bad about autocorrelation in a regression?
The coefficient estimates remain unbiased, but the estimated standard errors are unreliable (typically understated when the autocorrelation is positive), so t tests and confidence intervals can be misleading.

c. What method is used to determine if it exists? (Think of the statistical test to be used.)
The Durbin-Watson statistic is used to detect autocorrelation in a regression.

d. If found in a regression, how is it eliminated?
Appropriate transformations, such as first differencing, can be adopted to eliminate autocorrelation.

PROBLEM 8

Define multicollinearity in the following terms:

a) In what type of regression is it likely to occur?
Multicollinearity occurs in multiple regressions when two or more independent variables are highly correlated.

b) Why is multicollinearity in a regression a difficulty to be resolved?
Multicollinearity in regression models is an unacceptably high level of intercorrelation among the independent variables, such that their individual effects cannot be separated. Under multicollinearity, estimates are unbiased, but assessments of the relative strength of the explanatory variables and their joint effect are unreliable.

c) How can multicollinearity be determined in a regression?
Multicollinearity refers to excessive correlation among the predictor variables. When correlation is excessive (some use the rule of thumb of r > 0.90), standard errors of the b and beta coefficients become large, making it difficult or impossible to assess the relative importance of the predictor variables. The measures tolerance and VIF are commonly used to measure multicollinearity. Tolerance is 1 – R² for the regression of that independent variable on all the other independent variables, ignoring the dependent variable. There will be as many tolerance coefficients as there are independent variables. The higher the intercorrelation of the independent variables, the more the tolerance will approach zero. As a rule of thumb, if tolerance is less than 0.20, a problem with multicollinearity is indicated. When tolerance is close to 0, there is high multicollinearity of that variable with the other independent variables, and the b and beta coefficients will be unstable. The greater the multicollinearity, the lower the tolerance and the larger the standard errors of the regression coefficients.

d) If multicollinearity is found in a regression, how is it eliminated?
Multicollinearity occurs because two (or more) variables are related: they measure essentially the same thing. If one of the variables doesn't seem logically essential to your model, removing it may reduce or eliminate multicollinearity.
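The tolerance rule of thumb described in Problem 8 can be illustrated with a small calculation. This is a minimal sketch under stated assumptions: the data values are hypothetical and are not taken from the exam, the function names are invented for illustration, and the example covers only the two-predictor case, where the R² from regressing one predictor on the other equals their squared Pearson correlation. VIF is computed as the reciprocal of tolerance.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def tolerance_and_vif(r_squared):
    """Tolerance is 1 - R^2; VIF is its reciprocal."""
    tolerance = 1.0 - r_squared
    return tolerance, 1.0 / tolerance


# Hypothetical predictor values, invented only to show the calculation.
square_feet = [1000, 1200, 1500, 1800, 2000, 2400]
bedrooms = [2, 2, 3, 3, 4, 5]

r = pearson_r(square_feet, bedrooms)
# With exactly two predictors, R^2 of one regressed on the other is r^2.
tolerance, vif = tolerance_and_vif(r ** 2)
flagged = tolerance < 0.20  # rule of thumb quoted in the answer to 8(c)

print(f"r = {r:.4f}, tolerance = {tolerance:.4f}, VIF = {vif:.2f}, flagged = {flagged}")
```

With three or more predictors, the R² for each variable has to come from a multiple regression of that variable on all the others, which calls for a linear algebra or statistics library rather than a single correlation.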