PLAN
A. How to write a research paper
B. Basic panel data regression analysis using Stata
A. How to write a research paper
I. Outline of a research project
1. Introduction
2. Literature review
3. Theory: hypotheses to be tested
4. Data and summary statistics
5. Results: description and interpretation
6. Conclusions and implications
7. Referencing
1. Introduction
The introduction states the problem you are analyzing, its background, and its importance. It also provides a roadmap of what will be covered in the rest of the paper.
a. Make the reader aware of the GENERAL area of study.
Example: “How enterprises are financed is an important question due to the fact that financial capital is essential to the formation and subsequent operation of an enterprise.” [Du, J., Guariglia, A., and A. Newman (2009). “Does social capital affect the financing decisions of Chinese SMEs?” Mimeograph]
b. Then narrow the focus to the SPECIFIC area of research.
Example: “In recent years, empirical work on capital structure has been extended to the developing economies' context.” (Du et al., 2009)
c. Make clear what GAP in the literature you aim to fill.
Example: “We contribute to the literature in two main ways. First, we examine the capital structure determinants of Chinese SMEs using a large dataset representative of firm activity across the whole country. To the best of our knowledge, no empirical study has been conducted to test the applicability of existing capital structure theories to SMEs in the Chinese context. Second, we consider social capital as a determinant of capital structure. Although previous work has found a relationship between social capital and firm performance, no work has been conducted on the relationship between social capital and capital structure in the Chinese context.” (Du et al., 2009)
d. Clearly define the research question that you are addressing
Example: “Analyzing the role of social capital as a determinant of capital structure of Chinese SMEs.” (Du et al., 2009).
e. Explain why this research question is worthy of investigation
Example: “It is new and may help to explain why traditional theories of capital structure cannot be applied in the Chinese context.” (Du et al., 2009).
f. Provide an outline of what comes next
Example: “The remainder of the paper is organized as follows. In Section 2, we describe a theoretical background about capital structure theories. In Section 3, we present our baseline specification, and discuss our estimation methodology. Section 4 describes our data and presents some descriptive statistics. In section 5, we illustrate and discuss our results. Finally, in section 6, we provide some conclusions.” (Du et al., 2009).
Note: in the introduction, refer only to the few key papers that triggered your interest and that you are building upon. Do not provide a full literature review in the introduction; that is the purpose of the next section.
2. Literature review
It should present a focused and carefully structured outline of what others (academics and researchers) have done in your topic or problem area.
You need to organise the literature properly.
Example:
Empirical studies on capital structure that focused on the US and the UK
Empirical studies on capital structure that focused on developing and transition countries
Empirical studies on capital structure that focused on China
Be selective. DO NOT:
simply provide a list of as many articles and names of scholars as possible.
offer a reference to each and every piece of literature in the area.
3. Theory: hypotheses to be tested
This section is aimed at:
deriving and/or motivating your empirical work;
clarifying your ideas in the reader's mind.
State the hypotheses that you are testing
Example:
H1: There will be a positive relationship between firm size and short-term leverage (STL), long-term leverage (LTL), and total leverage (TL).
H2: There will be a positive relationship between firm age and STL, LTL and TL.
H3: There will be a positive relationship between asset structure and LTL and a negative relationship with STL and TL.
H4: There will be a negative relationship between profitability and STL, LTL and TL.
H5: There will be a positive relationship between social capital and STL, LTL and TL. (Du et al., 2009)
You will need to explain the intuition behind why you expect each of these hypotheses to hold.
4. Data and summary statistics
•Describe your data: what they consist of and where they come from. This description should be longer if you are using a novel data set and shorter if the data are well known.
•Descriptive statistics can provide preliminary evidence on the hypotheses you are testing.
Example:
Variable | (1) Foreigncap = 0% | (2) 0% < Foreigncap < 50% | (3) 50% ≤ Foreigncap < 100% | (4) Foreigncap = 100%
ROA      | 0.037               | 0.056                     | 0.060                       | 0.046
ROS      | 0.025               | 0.042                     | 0.047                       | 0.032
PROD     | 0.058               | 0.112                     | 0.185                       | 0.091
TFP      | 0.027               | 0.033                     | 0.037                       | 0.028
Notes: Foreigncap is the fraction of the firm's capital paid in by foreign investors. ROA is the firm's return on assets, given by its net income over its total assets. ROS is the firm's return on sales, given by its net income over its total sales. PROD is labor productivity, i.e. the ratio of the firm's net income to its number of employees. TFP is total factor productivity.
“We can see that ROA, ROS, PROD, and TFP, all increase with the degree of foreign ownership, but decline for those observations that are 100% foreign owned. This suggests that joint-ventures perform better than foreign owned and purely domestic firms, and may reflect the fact that both the domestic and the foreign parties of a joint-venture bring in attributes essential to achieving high performance.” [Greenaway, D., Guariglia, A., and Z. Yu (2009). “The more the better? Foreign ownership and corporate performance in China”. Leverhulme Centre for Research on Globalization and Economic Policy, Research Paper 09/05.]
5. Results: description and interpretation
Discuss the results at length and provide interpretations.
Example:
First, state the result: “The employment of the centrally affiliated firms is not influenced by financial factors, whereas both local affiliated and non-affiliated firms are subject to financial constraints to their employment. Any percent increase in cash flow can significantly induce 0.18 and 0.17 percent increases in employment in locally affiliated and non-affiliated private firms respectively.”
Then, provide the interpretation: “This provides evidence that affiliation to high level of government does mitigate the adverse effects of financial factors to firms' employment, and suggests that political connections could give firms better access to bank loans.” [Chen, M. and A. Guariglia (2009). “Do Financial Factors Affect Firms' Employment? Evidence from Chinese Manufacturing Firms.” Mimeograph]
•Stress/discuss the original results; spend little time on standard results.
•Link your results to the hypotheses you developed in the previous section.
•Provide various tests of the robustness of your results (e.g. check whether your results hold for different regions or different industrial groups).
6. Conclusions and implications
•Provide a summary of what you did in the paper
•Show what you have added to the literature
•If possible and relevant, provide a discussion of policy implications
Example:
“As for the policy implications that arise from this study, policy makers in China need to recognize the importance of improving the ability of privately-owned SMEs to access bank financing, especially in the long-term. This might be done through the development of effective credit-rating and guarantee schemes.
Informal financing mechanisms based on social capital might have supported the growth of Chinese SMEs until the present day, but are arguably not appropriate if China is to develop world-class private enterprises able to compete globally. The development of effective financing mechanisms is especially important in times of economic crisis as we are experiencing today, when informal credit on offer to SMEs has dried up.” (Du et al., 2009)
•Say something about future research possibilities
Example:
“Our research could be extended in several directions. First, it would be interesting to see whether our results hold for other investment models, such as the error-correction or the Euler-equation model. Second, other proxies for investment opportunities could be developed in the context of small businesses.
Third, it would be interesting to assess whether our results hold for different countries, characterized by different degrees of financial development. These extensions are in the agenda for future research.” [D'Espallier, B. and A. Guariglia (2009). “Does the investment opportunities bias affect the investment-cash flow sensitivities of unlisted SMEs?” Mimeograph]
7. Referencing
Providing full and accurate references to your sources is a very important part of presenting your work. There are two aspects of this:
o citations that point to references (e.g. Keynes (1936), p. 383);
o the bibliography, which contains information about the references themselves.
Here are some rules:
i. You must always put direct quotations from other people's work, published or unpublished, in inverted commas: “ ”. Failure to do so is a serious academic offence.
Always follow a quotation with the relevant citation. Example:
Many commentators believe that policy makers are pragmatic and not much influenced by ideas. Keynes disagreed: “Practical men … are usually the slaves of some defunct economist. Madmen in authority, who hear voices in the air, are distilling their frenzy from some academic scribbler of a few years back.” (Keynes, 1936, p. 383)
The citation (Keynes, 1936, in the example) should point to exactly one reference in the bibliography, which appears at the end of your paper.
ii. Citations should also appear when you refer to the work of others without direct quotation.
Example:
… In their model of commodity prices, Deaton and Laroque (1992) postulate the existence of a single threshold price, above which stocks of the commodity have been driven to zero. …
In this example, the citation Deaton and Laroque (1992) alerts the reader to the source of the work being discussed.
iii. The bibliography is a list of references that appears at the end of your paper. The following information should always be included: author; date of publication; title of the work. For a book you should also include the edition, place of publication and publisher. For an article you should include the journal or book in which the article appears as well as page numbers and, if possible, the volume number.
For unpublished works, you will have to use your discretion but always make clear the origin of the work (i.e. from where it can be obtained). List the references in alphabetical order by author.
Examples:
Deaton, A. S. and G. Laroque (1992), “On the behaviour of commodity prices.” Review of Economic Studies, vol. 59, pp. 1–23.
Keynes, J. M. (1936), The General Theory of Employment, Interest and Money. London: Macmillan.
Krugman, P. (1999), “Thinking about the liquidity trap.” (unpublished) URL:, December 1999.
Symeonidis, G. (1999), “Price and non-price competition with endogenous market structure.” (unpublished) University of Essex Economics Discussion Paper Series, No. 501, August 1999.
Notice that the Krugman (1999) reference is to a paper available on the World Wide Web. In this case it is conventional to provide the URL (i.e. the address) between angle brackets, < >.
iv. You do have discretion in terms of how you present your citations and bibliography. That is, you are not required rigidly to adhere to the style outlined above.
v. You may come across non-standard cases which do not fit into the above categories, in which case try to be as systematic as you can. For instance, if there is no author such as for a newspaper article, give the reference by title.
Example:
The Economist (2000), “The ECB heads for turbulence.” January 29 2000, pp. 105–6.
vi. Two important rules:
o For every citation, there must be exactly one reference in the bibliography.
o For every reference in the bibliography, there must be at least one citation.
Never include references in the bibliography that are not cited in your paper.
II. Last words
Make sure that you spell-check the final version of your dissertation before you submit it.
Make sure that you re-read the final draft of your dissertation at least seven times before you hand it in. If you do not do so, it is likely that many errors and inaccuracies will remain in it and you will lose marks. You will be surprised at how many errors you will find each time you re-read your draft.
B. Basic panel data regression analysis using Stata
1. Definition
2. Pooled Ordinary Least Squares (OLS)
3. Fixed effects
4. Arellano and Bond (1991) GMM estimator
1. Definition
Panel data (longitudinal data): pooling of observations on a cross section of economic agents over several time periods.
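If you also work outside Stata, a small Python sketch (standard library only) makes the definition concrete. It stores a hypothetical panel in "long" format, with one row per firm-year. It then computes the number of groups and the minimum, average and maximum number of observations per group. These correspond to the "Number of groups" and "Obs per group" lines in the Stata panel output shown later. The firm identifiers and years are made up. Firm 2 is not observed in 2003, so the panel is unbalanced.

```python
from collections import Counter

# Hypothetical firm-year observations: (firm id, year). Illustrative only.
observations = [
    (1, 2001), (1, 2002), (1, 2003),
    (2, 2001), (2, 2002),
    (3, 2001), (3, 2002), (3, 2003),
]

def panel_summary(obs):
    """Number of observations, number of groups (firms) and min/avg/max obs per group."""
    per_group = Counter(firm for firm, _year in obs)
    counts = list(per_group.values())
    return {
        "n_obs": len(obs),
        "n_groups": len(counts),
        "min": min(counts),
        "avg": len(obs) / len(counts),
        "max": max(counts),
    }

summary = panel_summary(observations)
print(summary)
```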
2. Pooled OLS regression
Suppose you want to run the following regression:
$I_{it}/K_{i(t-1)} = a_0 + a_1 CF_{it}/K_{i(t-1)} + v_i + v_t + e_{it}$
where $i$ indexes firms and $t$ indexes time;
$I$ denotes the firm's investment, $K$ its capital stock, and $CF$ its cash flow.
The error term in the equation above has three components:
o $v_i$: firm-specific component that includes all firm characteristics that do not vary over time but affect investment. An example is the attitude of the firm's managers towards risk (i.e. whether they are risk lovers or risk averse). Another example is managerial quality.
o $v_t$: time-specific component, accounted for by including time dummies in the regression.
o $e_{it}$: idiosyncratic component of the error term.
o To estimate your investment equation by OLS, you would type the following command (ik denotes $I_{it}/K_{i(t-1)}$ and cfk denotes $CF_{it}/K_{i(t-1)}$):
reg ik cfk, robust
o The “robust” option gives standard errors that are robust to heteroskedasticity.
o To include time dummies as well, first type tab year, gen(y1). This creates one dummy per year, y11 to y18. The regression uses y12-y18, leaving y11 out as the base year. The command and output look as follows:
reg ik cfk y12-y18, robust
Linear regression Number of obs = 50792
F( 8, 50783) = 17.03
Prob > F = 0.0000
R-squared = 0.4084
Root MSE = 566.22
------------------------------------------------------------------------------
| Robust
ik | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cfk | -.8851148 .0909891 -9.73 0.000 -1.063454 -.7067753
y12 | -21.02766 12.14177 -1.73 0.083 -44.82565 2.770336
y13 | 2.974283 3.123081 0.95 0.341 -3.14699 9.095555
y14 | 2.07409 2.603769 0.80 0.426 -3.029325 7.177504
y15 | -34.49331 14.08698 -2.45 0.014 -62.10394 -6.882679
y16 | 2.856203 4.645294 0.61 0.539 -6.248622 11.96103
y17 | .4052324 4.083605 0.10 0.921 -7.598676 8.409141
y18 | 6.429211 3.698681 1.74 0.082 -.8202434 13.67867
_cons | 42.25229 3.247488 13.01 0.000 35.88718 48.6174
------------------------------------------------------------------------------
How to read this table
A coefficient is statistically significant at the 5% level if:
o its t-statistic is above 1.96 or below -1.96
o its p-value is below 0.05
A coefficient is statistically significant at the 10% level if:
o its t-statistic is above 1.65 or below -1.65
o its p-value is below 0.10
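The t-statistic and p-value rules above are two views of the same two-sided test. As a sketch, the Python code below (standard library only) converts a t-statistic into a two-sided p-value using the standard normal distribution. That approximation is an assumption, but a reasonable one here because the regression uses about 50,000 observations. Stata itself uses the t distribution, so its p-values can differ slightly in the third decimal.

```python
import math

def two_sided_p(t_stat):
    """Two-sided p-value from the standard normal distribution (large-sample approximation)."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))

def significance(t_stat):
    """Strictest of the two conventional levels (5%, 10%) at which t_stat is significant."""
    p = two_sided_p(t_stat)
    if p < 0.05:
        return "5%"
    if p < 0.10:
        return "10%"
    return "not significant"

# t-statistics copied from the pooled OLS output above
for name, t in [("cfk", -9.73), ("y12", -1.73), ("y13", 0.95)]:
    print(name, round(two_sided_p(t), 3), significance(t))
```

For example, y12 (t = -1.73) is significant at the 10% level but not at the 5% level, in line with its p-value of 0.083 in the table.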
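You can include a lagged dependent variable in your equation. This leads to a dynamic model.
In this case, the command is: reg ik l.ik cfk y12-y18, robust
The l. lag operator requires the data to have been declared as a panel first, e.g. with xtset number year, where number identifies the firm.
This is equivalent to estimating the equation:
$I_{it}/K_{i(t-1)} = a_0 + a_1 I_{i(t-1)}/K_{i(t-2)} + a_2 CF_{it}/K_{i(t-1)} + v_i + v_t + e_{it}$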
. reg ik l.ik cfk y12-y18, robust
Linear regression Number of obs = 44189
F( 8, 44180) = 12.66
Prob > F = 0.0000
R-squared = 0.5145
Root MSE = 478.16
------------------------------------------------------------------------------
| Robust
ik | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ik |
L1. | -.0054137 .002202 -2.46 0.014 -.0097298 -.0010977
cfk | -.870642 .1004969 -8.66 0.000 -1.067618 -.6736662
y12 | (dropped)
y13 | 10.92086 3.821735 2.86 0.004 3.430196 18.41153
y14 | 9.873364 3.350431 2.95 0.003 3.306459 16.44027
y15 | -26.49544 14.21423 -1.86 0.062 -54.35558 1.364707
y16 | 9.551217 5.129863 1.86 0.063 -.5034045 19.60584
y17 | 8.486034 4.634428 1.83 0.067 -.5975283 17.5696
y18 | 14.35822 4.277006 3.36 0.001 5.975212 22.74123
_cons | 33.56754 4.443573 7.55 0.000 24.85806 42.27702
------------------------------------------------------------------------------
You can also run your regression for various sub-groups of firms, for instance for exporters (expdum is a dummy equal to 1 for exporters, and 0 otherwise):
reg ik cfk y12-y18 if expdum==1, robust
or for state-owned firms etc. (soek is a dummy equal to 1 for state-owned firms, and 0 otherwise)
reg ik cfk y12-y18 if soek==1, robust
Problems with OLS:
o it does not take into account the $v_i$ component of the error term (unobserved heterogeneity)
o it does not take into account the fact that cash flow and investment may be contemporaneously determined, i.e. that cash flow may be endogenous.
o In a dynamic setting, $I_{i(t-1)}/K_{i(t-2)}$ is correlated with the $v_i$ component of the error term. This leads to inconsistent (upward biased) estimates of the lagged dependent variable coefficient.
3. Fixed effects estimator (also called within groups estimator)
It accounts for the $v_i$ component of the error term by expressing each variable as a deviation from its firm-specific mean over time. In other words, it controls for “unobserved heterogeneity”. The equation it estimates is:
$I_{it}/K_{i(t-1)} - (I_i/K_i)^* = a_1\left[CF_{it}/K_{i(t-1)} - (CF_i/K_i)^*\right] + (v_t - v_t^*) + (e_{it} - e_i^*)$
where * denotes the mean over time for firm $i$. For example, since the sum below has $T-1$ terms:
$(I_i/K_i)^* = \frac{1}{T-1}\left[I_{i2}/K_{i1} + I_{i3}/K_{i2} + \cdots + I_{iT}/K_{i(T-1)}\right]$
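Here is a minimal Python sketch of the within transformation for a single firm, using hypothetical I/K values. Subtracting the firm's time mean from every observation removes any component that is constant over time. Adding a firm effect v_i (here an arbitrary 3.0) to every observation therefore leaves the transformed series unchanged, which is why $v_i$ drops out of the equation above.

```python
def within_transform(series):
    """Subtract the firm's time mean (the * mean in the text) from each observation."""
    mean = sum(series) / len(series)
    return [x - mean for x in series]

ik = [0.10, 0.25, 0.16, 0.05]        # hypothetical I/K ratios for one firm
v_i = 3.0                            # hypothetical time-invariant firm effect
with_effect = [x + v_i for x in ik]

demeaned = within_transform(ik)
demeaned_with_effect = within_transform(with_effect)
print(demeaned)
print(demeaned_with_effect)          # same values, up to floating-point rounding
```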
The command will be:
xtreg ik cfk y12-y18, fe
The output for the static model would look as follows:
Fixed-effects (within) regression Number of obs = 50792
Group variable: number Number of groups = 6529
R-sq: within = 0.4205 Obs per group: min = 4
between = 0.3286 avg = 7.8
overall = 0.4084 max = 8
F(8,44255) = 4013.54
corr(u_i, Xb) = -0.0592 Prob > F = 0.0000
------------------------------------------------------------------------------
ik | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cfk | -.9094596 .0050779 -179.10 0.000 -.9194123 -.8995069
y12 | -20.73012 10.50903 -1.97 0.049 -41.32801 -.1322387
y13 | 2.968461 10.51051 0.28 0.778 -17.63232 23.56924
y14 | 2.508857 10.50884 0.24 0.811 -18.08866 23.10638
y15 | -34.47516 10.53183 -3.27 0.001 -55.11774 -13.83258
y16 | 3.200465 10.50785 0.30 0.761 -17.3951 23.79603
y17 | .8482673 10.5075 0.08 0.936 -19.74662 21.44315
y18 | 7.108059 10.50874 0.68 0.499 -13.48926 27.70537
_cons | 43.03448 7.879796 5.46 0.000 27.58995 58.47902
-------------+----------------------------------------------------------------
sigma_u | 222.82496
sigma_e | 559.72247
rho | .13680205 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(6528, 44255) = 1.18 Prob > F = 0.0000
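Rho is the fraction of the variance of the composite error term ($v_i + e_{it}$) that is due to the firm-specific component $v_i$ (labelled u_i in the Stata output): $\rho = \sigma_u^2/(\sigma_u^2 + \sigma_e^2)$. In this case it is about 14%.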
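Rho can be recomputed from the two standard deviations reported in the output. The short Python sketch below applies the formula to the rounded sigma_u and sigma_e printed by Stata and recovers the reported value.

```python
def rho(sigma_u, sigma_e):
    """Share of the composite error variance (u_i + e_it) due to the firm effect u_i."""
    return sigma_u ** 2 / (sigma_u ** 2 + sigma_e ** 2)

# sigma_u and sigma_e copied from the fixed-effects output above
print(round(rho(222.82496, 559.72247), 4))   # prints 0.1368
```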
Problem with the fixed effects estimator: it still does not account for the possible endogeneity of cash flow.
Also, if you estimate a dynamic model, the demeaned lagged dependent variable $[I_{i(t-1)}/K_{i(t-2)} - (I_i/K_i)^*]$ will be correlated with $(e_{it} - e_i^*)$, because $(I_i/K_i)^*$ is correlated with $e_i^*$. This leads to inconsistent (downward biased) estimates of the lagged dependent variable coefficient. The problem is most serious when the number of time periods is small.
4. Arellano and Bond (1991) Generalized Method of Moments (GMM) estimator
Suppose we wish to estimate the following dynamic model:
$I_{it}/K_{i(t-1)} = a_0 + a_1 I_{i(t-1)}/K_{i(t-2)} + a_2 CF_{it}/K_{i(t-1)} + v_i + v_t + e_{it}$
Among the estimators considered here, the GMM estimator is preferred, as it accounts both for unobserved heterogeneity and for the possible endogeneity of the regressors.
o It accounts for unobserved heterogeneity by estimating the equation in first differences, which eliminates $v_i$:
$(I_{it}/K_{i(t-1)}) - (I_{i(t-1)}/K_{i(t-2)}) = a_1\left[(I_{i(t-1)}/K_{i(t-2)}) - (I_{i(t-2)}/K_{i(t-3)})\right] + a_2\left[(CF_{it}/K_{i(t-1)}) - (CF_{i(t-1)}/K_{i(t-2)})\right] + (v_t - v_{t-1}) + (e_{it} - e_{i(t-1)})$
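o It accounts for endogeneity by instrumenting the endogenous regressors with their own values lagged two or more times.
Instruments have to be lagged at least twice to ensure that they are not correlated with the idiosyncratic component of the differenced error term, $(e_{it} - e_{i(t-1)})$. A variable dated $t-1$ may be correlated with $e_{i(t-1)}$. Variables dated $t-2$ or earlier are not, provided the $e_{it}$ are not serially correlated.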
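The first-difference transformation can be sketched in the same way as the within transformation, with hypothetical numbers in standard-library Python. Differencing one firm's series removes a time-invariant effect v_i and costs one observation per firm.

```python
def first_difference(series):
    """Return x_t - x_(t-1) for t = 2..T; the first observation is lost."""
    return [curr - prev for prev, curr in zip(series, series[1:])]

ik = [0.10, 0.25, 0.16, 0.05]        # hypothetical I/K ratios for one firm
v_i = 3.0                            # hypothetical time-invariant firm effect
diff_plain = first_difference(ik)
diff_with_effect = first_difference([x + v_i for x in ik])
print(diff_plain)
print(diff_with_effect)              # same values, up to floating-point rounding
```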
Example:
o t=4
$(I_{i4}/K_{i3}) - (I_{i3}/K_{i2}) = a_1\left[(I_{i3}/K_{i2}) - (I_{i2}/K_{i1})\right] + a_2\left[(CF_{i4}/K_{i3}) - (CF_{i3}/K_{i2})\right] + (v_4 - v_3) + (e_{i4} - e_{i3})$
$I_{i2}/K_{i1}$ is a valid instrument, since it is highly correlated with $(I_{i3}/K_{i2}) - (I_{i2}/K_{i1})$ and uncorrelated with $(e_{i4} - e_{i3})$.
o t=5
$(I_{i5}/K_{i4}) - (I_{i4}/K_{i3}) = a_1\left[(I_{i4}/K_{i3}) - (I_{i3}/K_{i2})\right] + a_2\left[(CF_{i5}/K_{i4}) - (CF_{i4}/K_{i3})\right] + (v_5 - v_4) + (e_{i5} - e_{i4})$
$I_{i3}/K_{i2}$ and $I_{i2}/K_{i1}$ are valid instruments, since they are highly correlated with $(I_{i4}/K_{i3}) - (I_{i3}/K_{i2})$ and uncorrelated with $(e_{i5} - e_{i4})$.
o t = T: $I_{i2}/K_{i1}, I_{i3}/K_{i2}, \ldots, I_{i(T-2)}/K_{i(T-3)}$ are valid instruments.
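In the example, each later equation gains one more usable lag. That pattern can be written as a small Python function. It follows the indexing used above, in which the first available ratio is $I_{i2}/K_{i1}$. The number of years, T = 8, is a hypothetical choice. The function lists the dates of the lagged levels of I/K only. On our reading, the same dates would apply to the lagged levels of CF/K when cash flow is treated as endogenous.

```python
def valid_instrument_dates(t, first_date=2):
    """Dates s such that the level I_is/K_i(s-1) is a valid instrument for the
    first-differenced equation at date t: first_date <= s <= t - 2.
    first_date=2 follows the text, where the first available ratio is I_i2/K_i1."""
    return list(range(first_date, t - 1))

T = 8  # hypothetical number of years
for t in range(4, T + 1):
    print(f"t={t}: instruments dated", valid_instrument_dates(t))
```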
The command to be used to estimate a model with GMM is xtabond2. Before using it, you need to install it on your computer by typing:
ssc install xtabond2
The command to estimate your investment equation should read:
xtabond2 ik l.ik cfk y12-y18 , gmm(cfk ik, laglimits(2 2)) iv(y11-y18 ) noleveleq robust small nomata
Instruments used in the estimation are cfk and ik lagged two periods ($cfk_{t-2}$, $ik_{t-2}$).
[Note: laglimits(x y) indicates that the most recent instrument is lagged x times and the oldest one is lagged y times.]
The time dummies (not lagged) are also included in the instrument set.
The output will look as follows:
Arellano-Bond dynamic panel-data estimation, one-step difference GMM results
------------------------------------------------------------------------------
Group variable: number Number of obs = 37593
Time variable : year Number of groups = 6529
Number of instruments = 18 Obs per group: min = 1
F(7, 6528) = 3.98 avg = 5.76
Prob > F = 0.000 max = 6
------------------------------------------------------------------------------
| Robust
| Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ik |
L1. | -.001168 .0005246 -2.23 0.026 -.0021964 -.0001396
cfk | .0979312 .055524 1.76 0.078 -.0109139 .2067763
y13 | 4.150606 2.939667 1.41 0.158 -1.612105 9.913316
y14 | -14.72795 18.90624 -0.78 0.436 -51.79038 22.33448
y15 | -36.94477 13.79228 -2.68 0.007 -63.98215 -9.907388
y16 | .7523111 4.252736 0.18 0.860 -7.584444 9.089066
y17 | -3.451998 3.622561 -0.95 0.341 -10.5534 3.649408
y18 | -5.579046 3.673513 -1.52 0.129 -12.78033 1.622242
------------------------------------------------------------------------------
Hansen test of overid. restrictions: chi2(10) = 16.16 Prob > chi2 = 0.095
Arellano-Bond test for AR(1) in first differences: z = -1.46 Pr > z = 0.143
Arellano-Bond test for AR(2) in first differences: z = -0.82 Pr > z = 0.412
------------------------------------------------------------------------------
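Unless the collapse suboption is used, xtabond2 creates a separate instrument for each available lag in each year, so the instrument count grows quickly with the lag range. The Python sketch below is our own reconstruction of the count, not xtabond2's code. It rests on assumptions chosen to match the output above:
o ik and cfk are observed in years 1 to 8;
o the differenced equation covers years 3 to 8, consistent with a maximum of 6 observations per group;
o the differenced time dummies contribute 6 linearly independent instruments.
Under these assumptions it reproduces the 18 instruments reported above. It also reproduces the 36 reported for laglimits(2 4) in the next estimation.

```python
def gmm_instrument_count(first_year, last_year, first_eq_year, lag_min, lag_max, n_gmm_vars):
    """Count uncollapsed GMM-style instruments for the first-differenced equation:
    one column per variable, per equation year, per available lag in [lag_min, lag_max]."""
    per_var = 0
    for t in range(first_eq_year, last_year + 1):
        for lag in range(lag_min, lag_max + 1):
            if t - lag >= first_year:        # the lagged level must exist in the data
                per_var += 1
    return per_var * n_gmm_vars

# Assumptions (hypothetical, chosen to match the output): years 1..8 observed,
# differenced equation for years 3..8, two GMM variables (cfk and ik),
# and 6 instruments coming from the differenced time dummies.
TIME_DUMMY_INSTRUMENTS = 6
n_22 = gmm_instrument_count(1, 8, 3, 2, 2, 2) + TIME_DUMMY_INSTRUMENTS
n_24 = gmm_instrument_count(1, 8, 3, 2, 4, 2) + TIME_DUMMY_INSTRUMENTS
print("laglimits(2 2):", n_22, "instruments")
print("laglimits(2 4):", n_24, "instruments")
```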
You can use instruments lagged two, three, and four times ($cfk_{t-2}, cfk_{t-3}, cfk_{t-4}$; $ik_{t-2}, ik_{t-3}, ik_{t-4}$) instead of only twice. In this case, the command looks like:
xtabond2 ik l.ik cfk y12-y18 , gmm(cfk ik , laglimits(2 4)) iv(y11-y18 ) noleveleq robust small nomata
The output will be:
Arellano-Bond dynamic panel-data estimation, one-step difference GMM results
------------------------------------------------------------------------------
Group variable: number Number of obs = 37593
Time variable : year Number of groups = 6529
Number of instruments = 36 Obs per group: min = 1
F(7, 6528) = 6.28 avg = 5.76
Prob > F = 0.000 max = 6
------------------------------------------------------------------------------
| Robust
| Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ik |
L1. | -.0027787 .0013407 -2.07 0.038 -.005407 -.0001504
cfk | -.300748 .2566093 -1.17 0.241 -.8037862 .2022902
y13 | 6.520779 3.304253 1.97 0.048 .0433607 12.9982
y14 | -5.01961 13.56572 -0.37 0.711 -31.61287 21.57365
y15 | -32.96227 14.00683 -2.35 0.019 -60.42025 -5.504294
y16 | 4.055528 4.972904 0.82 0.415 -5.692992 13.80405
y17 | 1.124406 4.707044 0.24 0.811 -8.102941 10.35175
y18 | 2.390772 6.206069 0.39 0.700 -9.775155 14.5567
------------------------------------------------------------------------------
Hansen test of overid. restrictions: chi2(28) = 46.22 Prob > chi2 = 0.017
Arellano-Bond test for AR(1) in first differences: z = -1.73 Pr > z = 0.084
Arellano-Bond test for AR(2) in first differences: z = -0.55 Pr > z = 0.584
------------------------------------------------------------------------------
o Although more instruments would be valid in this setting ($cfk_{t-5}$, $ik_{t-5}$, $cfk_{t-6}$, $ik_{t-6}$, etc.), it is not a good idea to use too many instruments. Too many instruments lead to an overfitting bias and can weaken the Hansen test described below.
o Note: do not put lagged variables (e.g. l.ik) in the gmm(…) option; list the variables themselves and let laglimits() set which lags are used.
o The estimated coefficient on the lagged dependent variable should fall between the upward biased OLS coefficient and the downward biased fixed-effects coefficient.
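The claim in the last bullet can be checked with a small Monte Carlo sketch in Python (standard library only). It simulates a hypothetical panel of 2,000 firms over 8 periods. The model has a firm effect, a true coefficient of 0.5 on the lagged dependent variable, and no other regressors. The sketch compares three estimators:
o pooled OLS;
o the within-groups estimator;
o the Anderson-Hsiao estimator, a just-identified IV regression on the first-differenced equation that uses the level lagged twice as the instrument.
The Anderson-Hsiao estimator is a simpler relative of Arellano-Bond GMM swapped in for illustration; it is not what xtabond2 computes.

```python
import random

def simulate_panel(n_firms=2000, n_periods=8, rho=0.5, burn_in=50, seed=12345):
    """Simulate y_it = rho * y_i(t-1) + eta_i + e_it with eta_i, e_it ~ N(0, 1).
    Returns one list of n_periods + 1 observations per firm (after a burn-in)."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_firms):
        eta = rng.gauss(0.0, 1.0)            # time-invariant firm effect (v_i in the text)
        y, series = 0.0, []
        for t in range(burn_in + n_periods + 1):
            y = rho * y + eta + rng.gauss(0.0, 1.0)
            if t >= burn_in:
                series.append(y)
        panel.append(series)
    return panel

def ols_slope(xs, ys):
    """Slope from an OLS regression of ys on xs with an intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def pooled_ols(panel):
    """Regress y_t on y_(t-1), ignoring the firm effect."""
    xs = [v for s in panel for v in s[:-1]]
    ys = [v for s in panel for v in s[1:]]
    return ols_slope(xs, ys)

def within_groups(panel):
    """Demean y_t and y_(t-1) within each firm, then run OLS (fixed effects)."""
    xs, ys = [], []
    for s in panel:
        lag, cur = s[:-1], s[1:]
        mean_lag, mean_cur = sum(lag) / len(lag), sum(cur) / len(cur)
        xs += [v - mean_lag for v in lag]
        ys += [v - mean_cur for v in cur]
    return ols_slope(xs, ys)

def anderson_hsiao(panel):
    """IV on the first-differenced equation: instrument (y_(t-1) - y_(t-2)) with y_(t-2)."""
    num = den = 0.0
    for s in panel:
        for t in range(2, len(s)):
            num += s[t - 2] * (s[t] - s[t - 1])
            den += s[t - 2] * (s[t - 1] - s[t - 2])
    return num / den

panel = simulate_panel()
estimates = {"pooled OLS": pooled_ols(panel),
             "within groups": within_groups(panel),
             "Anderson-Hsiao IV": anderson_hsiao(panel)}
for name, value in estimates.items():
    print(f"{name:>18}: {value:.3f}")   # the true coefficient is 0.5
```

With this design, pooled OLS should come out well above 0.5, within groups well below it, and the IV estimate close to 0.5. The exact numbers depend on the random seed.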
The Sargan and m2 tests
In order to evaluate whether your model is correctly specified and whether your instruments are valid, two criteria are frequently used: the J statistic and the test for second-order serial correlation of the residuals in the differenced equation (m2).
o The former is the Sargan/Hansen test for overidentifying restrictions. If the model is correctly specified, the variables in the instrument set should be uncorrelated with the idiosyncratic component of the error term, $e_{it}$; this is the null hypothesis of the test.
o The m2 test checks for second-order serial correlation in the first-differenced residuals. It provides a further check on the specification of the model and on the legitimacy of variables dated t-2 as instruments. First-order correlation (m1, reported as AR(1)) is expected by construction, since $e_{i(t-1)}$ appears in both $(e_{it} - e_{i(t-1)})$ and $(e_{i(t-1)} - e_{i(t-2)})$.
o In order for the instruments to be acceptable, the p-values for the Sargan test and the m2 test should both be greater than 0.05.
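The decision rule in the last bullet can be written as a one-line check. As a sketch, the Python code below applies it to the p-values reported in the two xtabond2 outputs above, using the Hansen test reported there as the overidentification test.

```python
def instruments_acceptable(hansen_p, ar2_p, level=0.05):
    """Rule of thumb from the text: both p-values must exceed the chosen level."""
    return hansen_p > level and ar2_p > level

# p-values copied from the two xtabond2 outputs above
first = instruments_acceptable(hansen_p=0.095, ar2_p=0.412)   # laglimits(2 2)
second = instruments_acceptable(hansen_p=0.017, ar2_p=0.584)  # laglimits(2 4)
print(first, second)
```

By this rule, the specification with laglimits(2 2) passes both tests. The one with laglimits(2 4) fails the Hansen test (p = 0.017), which fits the earlier warning about using too many instruments.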
More help
Baum (2006), Chapters 4.1, 9.1.1, 9.1.2, 9.3
help reg
help xtreg
help xtabond2
Last point: Dealing with outliers
Before running regressions, you should drop the outliers for all regression variables.
Outliers are extreme observations, and leaving them in the sample can bias the results. Typically, we deal with this problem by dropping observations below the 1st percentile and above the 99th percentile for all regression variables.
Do NOT drop outliers for variables that you do not use in regressions.
The code for dropping outliers is:
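foreach var of varlist ik cfk wkk1 invwkk1 assetsgr srgrowth collateral leverage empg prod expratio {
    * compute the 1st and 99th percentiles of the current variable
    egen per99`var' = pctile(`var'), p(99)
    egen per1`var' = pctile(`var'), p(1)
    * drop observations outside the 1st-99th percentile range (missing values are kept)
    quietly drop if `var' < per1`var'
    quietly drop if `var' > per99`var' & `var' != .
    * remove the helper variables before moving to the next variable
    drop per1`var' per99`var'
}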
“ik cfk wkk1 invwkk1 assetsgr srgrowth collateral leverage empg prod expratio” are the variables I had in my regressions. Please replace them with the variables that you have in your own regressions.
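To see exactly what the loop does to one variable, here is a Python sketch (standard library only). It computes the 1st and 99th percentiles with what we believe is Stata's default percentile rule; check the pctile entry of the Stata manual before relying on it. It then drops values outside that range and keeps missing values, represented here by None, as the & `var' != . condition does. Unlike the Stata loop, which drops whole observations and trims the variables one after another, it handles a single hypothetical variable.

```python
import math

def stata_pctile(values, p):
    """p-th percentile (0 < p < 100) of the non-missing values, using what we believe
    is Stata's default rule: with P = n*p/100 = i + h, take x_(i+1) if h > 0,
    else the average of x_(i) and x_(i+1) (1-based order statistics)."""
    xs = sorted(v for v in values if v is not None)
    pos = len(xs) * p / 100.0
    i = math.floor(pos)
    if pos > i:
        return xs[i]                      # x_(i+1) in 1-based notation
    return (xs[i - 1] + xs[i]) / 2.0      # average of x_(i) and x_(i+1)

def trim_1_99(values):
    """Drop values below the 1st or above the 99th percentile; keep missing (None)."""
    lo, hi = stata_pctile(values, 1), stata_pctile(values, 99)
    return [v for v in values if v is None or lo <= v <= hi]

data = list(range(1, 101)) + [None]   # hypothetical variable with one missing value
trimmed = trim_1_99(data)
print(len(trimmed), min(v for v in trimmed if v is not None), max(v for v in trimmed if v is not None))
```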
And finally…
Data can be found at the site:
The main dataset is called “panelasiafinforstudents”. It contains data for China and other East Asian countries up to 2009.
The variables that you will need to use are all clearly labelled.
Note, however, that the dataset contains several variables that you will not understand. Just ignore them.
If your dissertation focuses only on China, type: keep if country=="China"
The source of your data is Thomson Financial's Worldscope database.