What Is a Clinical Study?

A clinical study involves research using human volunteers (also called participants) that is intended to add to medical knowledge. There are two main types of clinical studies: clinical trials (also called interventional studies) and observational studies. ClinicalTrials.gov includes both interventional and observational studies.

  • Clinical Trials

    In a clinical trial, participants receive specific interventions according to the research plan or protocol created by the investigators. These interventions may be medical products, such as drugs or devices; procedures; or changes to participants' behavior, such as diet. Clinical trials may compare a new medical approach to a standard one that is already available, to a placebo that contains no active ingredients, or to no intervention. Some clinical trials compare interventions that are already available to each other. When a new product or approach is being studied, it is not usually known whether it will be helpful, harmful, or no different than available alternatives (including no intervention). The investigators try to determine the safety and efficacy of the intervention by measuring certain outcomes in the participants. For example, investigators may give a drug or treatment to participants who have high blood pressure to see whether their blood pressure decreases.

    Clinical trials used in drug development are sometimes described by phase. These phases are defined by the Food and Drug Administration (FDA).

    Some people who are not eligible to participate in a clinical trial may be able to get experimental drugs or devices outside of a clinical trial through an Expanded Access Program. See more information on expanded access from the National Library of Medicine.

  • Observational Studies

    In an observational study, investigators assess health outcomes in groups of participants according to a research plan or protocol. Participants may receive interventions (which can include medical products such as drugs or devices) or procedures as part of their routine medical care, but participants are not assigned to specific interventions by the investigator (as in a clinical trial). For example, investigators may observe a group of older adults to learn more about the effects of different lifestyles on cardiac health.

Who Conducts Clinical Studies?

Every clinical study is led by a principal investigator, who is often a medical doctor. Clinical studies also have a research team that may include doctors, nurses, social workers, and other health care professionals.

Clinical studies can be sponsored, or funded, by pharmaceutical companies, academic medical centers, voluntary groups, and other organizations, in addition to Federal agencies such as the National Institutes of Health, the U.S. Department of Defense, and the U.S. Department of Veterans Affairs. Doctors, other health care providers, and other individuals can also sponsor clinical research.

Where Are Clinical Studies Conducted?

Clinical studies can take place in many locations, including hospitals, universities, doctors' offices, and community clinics. The location depends on who is conducting the study.

How Long Do Clinical Studies Last?

The length of a clinical study varies, depending on what is being studied. Participants are told how long the study will last before they enroll.

Reasons for Conducting Clinical Studies

In general, clinical studies are designed to add to medical knowledge related to the treatment, diagnosis, and prevention of diseases or conditions. Some common reasons for conducting clinical studies include:

  • Evaluating one or more interventions (for example, drugs, medical devices, approaches to surgery or radiation therapy) for treating a disease, syndrome, or condition

  • Finding ways to prevent the initial development or recurrence of a disease or condition. These can include medicines, vaccines, or lifestyle changes, among other approaches.

  • Evaluating one or more interventions aimed at identifying or diagnosing a particular disease or condition

  • Examining methods for identifying a condition or the risk factors for that condition

  • Exploring and measuring ways to improve the comfort and quality of life through supportive care for people with a chronic illness

Participating in Clinical Studies

A clinical study is conducted according to a research plan known as the protocol. The protocol is designed to answer specific research questions and safeguard the health of participants. It contains the following information:

  • The reason for conducting the study

  • Who may participate in the study (the eligibility criteria)

  • The number of participants needed

  • The schedule of tests, procedures, or drugs and their dosages

  • The length of the study

  • What information will be gathered about the participants

Who Can Participate in a Clinical Study?

Clinical studies have standards outlining who can participate. These standards are called eligibility criteria and are listed in the protocol. Some research studies seek participants who have the illnesses or conditions that will be studied, other studies are looking for healthy participants, and some studies are limited to a predetermined group of people who are asked by researchers to enroll.

Eligibility. The factors that allow someone to participate in a clinical study are called inclusion criteria, and the factors that disqualify someone from participating are called exclusion criteria. They are based on characteristics such as age, gender, the type and stage of a disease, previous treatment history, and other medical conditions.

How Are Participants Protected?

Informed consent is a process used by researchers to provide potential and enrolled participants with information about a clinical study. This information helps people decide whether they want to enroll or continue to participate in the study. The informed consent process is intended to protect participants and should provide enough information for a person to understand the risks of, potential benefits of, and alternatives to the study. In addition to the informed consent document, the process may involve recruitment materials, verbal instructions, question-and-answer sessions, and activities to measure participant understanding. In general, a person must sign an informed consent document before joining a study to show that he or she was given information on the risks, potential benefits, and alternatives and that he or she understands it. Signing the document and providing consent is not a contract. Participants may withdraw from a study at any time, even if the study is not over. See the Questions to Ask section on this page for questions to ask a health care provider or researcher about participating in a clinical study.

Institutional review boards. Each federally supported or conducted clinical study and each study of a drug, biological product, or medical device regulated by FDA must be reviewed, approved, and monitored by an institutional review board (IRB). An IRB is made up of doctors, researchers, and members of the community. Its role is to make sure that the study is ethical and that the rights and welfare of participants are protected. This includes making sure that research risks are minimized and are reasonable in relation to any potential benefits, among other responsibilities. The IRB also reviews the informed consent document.

In addition to being monitored by an IRB, some clinical studies are also monitored by data monitoring committees (also called data safety and monitoring boards).

Various Federal agencies, including the Office for Human Research Protections (OHRP) and FDA, have the authority to determine whether sponsors of certain clinical studies are adequately protecting research participants.

Relationship to Usual Health Care

Typically, participants continue to see their usual health care providers while enrolled in a clinical study. While most clinical studies provide participants with medical products or interventions related to the illness or condition being studied, they do not provide extended or complete health care. By having his or her usual health care provider work with the research team, a participant can make sure that the study protocol will not conflict with other medications or treatments that he or she receives.

Considerations for Participation

Participating in a clinical study contributes to medical knowledge. The results of these studies can make a difference in the care of future patients by providing information about the benefits and risks of therapeutic, preventative, or diagnostic products or interventions.

Clinical trials provide the basis for the development and marketing of new drugs, biological products, and medical devices. Sometimes, the safety and the effectiveness of the experimental approach or use may not be fully known at the time of the trial. Some trials may provide participants with the prospect of receiving direct medical benefits, while others do not. Most trials involve some risk of harm or injury to the participant, although it may not be greater than the risks related to routine medical care or disease progression. (For trials approved by IRBs, the IRB has decided that the risks of participation have been minimized and are reasonable in relation to anticipated benefits.) Many trials require participants to undergo additional procedures, tests, and assessments based on the study protocol. These requirements will be described in the informed consent document. A potential participant should also discuss these issues with members of the research team and with his or her usual health care provider.

Questions to Ask

Anyone interested in participating in a clinical study should know as much as possible about the study and feel comfortable asking the research team questions about the study, the related procedures, and any expenses. The following questions may be helpful during such a discussion. Answers to some of these questions are provided in the informed consent document. Many of the questions are specific to clinical trials, but some also apply to observational studies.

  • What is being studied?

  • Why do researchers believe the intervention being tested might be effective? Why might it not be effective? Has it been tested before?

  • What are the possible interventions that I might receive during the trial?

  • How will it be determined which interventions I receive (for example, by chance)?

  • Who will know which intervention I receive during the trial? Will I know? Will members of the research team know?

  • How do the possible risks, side effects, and benefits of this trial compare with those of my current treatment?

  • What will I have to do?

  • What tests and procedures are involved?

  • How often will I have to visit the hospital or clinic?

  • Will hospitalization be required?

  • How long will the study last?

  • Who will pay for my participation?

  • Will I be reimbursed for other expenses?

  • What type of long-term follow-up care is part of this trial?

  • If I benefit from the intervention, will I be allowed to continue receiving it after the trial ends?

  • Will results of the study be provided to me?

  • Who will oversee my medical care while I am participating in the trial?

  • What are my options if I am injured during the study?

Attribution, Non-Commercial, No Derivatives


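Chapter 7: Designing Cross-Sectional and Cohort Studies

Cross-sectional studies

Strengths and weaknesses of cross-sectional studies

- Cross-sectional studies measure prevalence rather than incidence, so caution is needed when drawing inferences about the cause, prognosis, or natural history of a disease.

- A factor associated with the prevalence of a disease may be a cause of that disease, but it may also simply be associated with the course of the disease.

Serial surveys

Cohort studies

Prospective cohort studies

Strengths and weaknesses of prospective cohort studies

Retrospective cohort studies

Strengths and weaknesses of retrospective cohort studies

Multiple-cohort studies and external controls

Statistical approach to cohort studies

Issues related to existing cohort studies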

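Table 7.3 Strategies for minimizing losses during follow-up

During enrollment

1. Exclude those likely to be lost

    a) Those planning to move

    b) Those whose willingness to respond is uncertain

    c) Those in ill health or with a fatal disease unrelated to the research question

2. Obtain information that allows future tracking

    a) The subject's address, telephone number, and e-mail address

    b) Resident registration number or health insurance card number

    c) Name, address, telephone number, and e-mail address of one or two relatives or friends who do not live with the subject

    d) Name, e-mail, address, and telephone number of the primary physician

During follow-up

1. Contact subjects periodically to collect information, provide results, and show interest

    a) By telephone: calls may be needed on weekends and evenings

    b) By mail: repeated mailings by e-mail, post, or return cards

    c) Other: newsletters, gift certificates

2. Find those who cannot be reached by telephone or mail

    a) Contact friends, relatives, or the primary physician

    b) Obtain the new address from the post office

    c) Find the address through public sources such as telephone directories, the Internet, and credit bureaus

    d) For subjects with health insurance coverage, collect hospital discharge records through the Social Security Administration

    e) Determine vital status through the Ministry of Health and Welfare or the national death records of Statistics Korea

At all times

1. Treat study subjects with appreciation, kindness, and respect, and help them understand the research question so that they will want to take part as partners in making the study successful.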


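5. Getting Ready to Estimate Sample Size: Hypotheses and Underlying Principles


Characteristics of a good hypothesis

Simple versus complex

Specific versus vague

In-advance versus after-the-fact

Null and alternative hypotheses

Underlying statistical principles


Sides of the alternative hypothesis

Type of statistical test

Additional points


Multiple and post hoc hypotheses

Primary and secondary hypotheses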


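1. Determining sample size is an important step in both analytic and descriptive studies. The sample size should be estimated early in the design process so that appropriate modifications can still be made.

2. Analytic studies and experiments need a hypothesis that states the anticipated association between the main predictor and outcome variables, so that statistical tests can be performed. Purely descriptive studies make no comparisons and therefore do not need a hypothesis.

3. A good hypothesis is specific (it states how the population will be sampled and how the variables will be measured), simple (only one predictor variable and one outcome variable), and formulated in advance.

4. The null hypothesis assumes that the predictor and outcome variables are not associated, and it is the basis for tests of statistical significance. The alternative hypothesis assumes that they are associated. Statistical tests attempt to reject the null hypothesis of no association in favor of the alternative hypothesis that an association exists.

5. An alternative hypothesis is either one-sided (only one direction of the association is tested) or two-sided (both directions are tested). One-sided hypotheses should be used only in the unusual case where only one direction of the association is clinically or biologically meaningful.

6. For analytic studies and experiments, the sample size is the number of subjects needed to detect an association of a given effect size and variability, with specified probabilities of making type I (false-positive) and type II (false-negative) errors. The maximum probability of making a type I error is called α, and the maximum probability of making a type II error is called β. The quantity 1 − β is called power: the chance of finding an association of the given effect size or greater in the sample when one actually exists in the population.

7. It is often desirable to state more than one hypothesis in advance. However, the investigator should designate a single primary hypothesis and base the sample size estimate on it. Interpreting the results of testing multiple hypotheses in the sample, including unanticipated findings that emerge from the data, rests on a judgment about the prior probability that those findings represent real phenomena in the population.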
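The relationship between α, β, and power in point 6 can be illustrated with a small calculation. The sketch below is only an approximation under stated assumptions: it compares two means with equal group sizes, uses a two-sided test, and replaces the t distribution with the normal distribution. The function name `power_two_means` and the example numbers are hypothetical and do not come from the book.

```python
from math import sqrt
from statistics import NormalDist


def power_two_means(effect_size, sd, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided comparison of two means.

    Hypothetical helper: normal approximation, equal group sizes.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    standardized = effect_size / sd  # standardized effect size E/S
    # Expected test statistic under the alternative minus the critical value.
    # The tiny chance of rejecting in the wrong direction is ignored.
    return z.cdf(standardized * sqrt(n_per_group / 2) - z_alpha)


# Assumed example: a 5-unit difference, standard deviation 10, 63 per group.
power = power_two_means(effect_size=5.0, sd=10.0, n_per_group=63)
beta = 1 - power
print(f"power = {power:.2f}, beta = {beta:.2f}")
```

With these assumed inputs the power comes out at roughly 0.80, which corresponds to β of about 0.20. Lowering α or shrinking the effect size reduces the power for the same number of subjects.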

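6. Estimating Sample Size and Power: Applications and Examples

Sample size techniques for analytic studies and experiments

The t test

The chi-squared test


Other considerations and special issues


Categorical variables

Survival analysis



Multivariate adjustment and other special statistical analyses

Equivalence and non-inferiority trials

Sample size techniques for descriptive studies


Dichotomous variables

What to do when the sample size is fixed

Strategies for minimizing sample size and maximizing power

Use continuous variables

Use paired measurements

A brief technical note

Use more precise variables

Use unequal group sizes

Use a more common outcome

How to estimate sample size when there is insufficient information

Common errors to avoid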


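1. When estimating the sample size for an analytic study, take the following steps:

a) State the null and alternative hypotheses, and specify whether the test is one-sided or two-sided.

b) Select a statistical test that could be used to analyze the data, based on the types of predictor and outcome variables.

c) Set α and β, taking into account how important it is to avoid type I and type II errors.

2. Other considerations when calculating the sample size for an analytic study include adjusting for potential dropouts, strategies for handling categorical variables, survival analysis, clustered samples, multivariate adjustment, and equivalence studies.

3. For descriptive studies, which do not need a hypothesis, the sample size is estimated in the following steps:

a) Estimate the proportion of subjects with a dichotomous outcome, or the standard deviation of a continuous outcome.

b) Specify the desired precision (the width of the confidence interval).

c) Specify the confidence level (for example, 95%).

4. When the sample size is predetermined, work backward to estimate the detectable effect size or, less commonly, the power.

5. The required sample size can be minimized by using continuous variables, more precise measurements, paired measurements, unequal group sizes, and more common outcomes.

6. When there is not enough information to estimate the sample size, review the literature in related areas and consult colleagues in order to choose an effect size that is clinically meaningful.

7. Errors to avoid include: estimating the sample size too late; misinterpreting proportions expressed as percentages as continuous outcomes; failing to account for missing subjects and data; and failing to address clustered and paired data appropriately.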
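The steps in points 1 to 4 can be sketched in code. This is a rough illustration rather than the book's tables: it uses the normal approximation instead of the t distribution, so its results may be slightly smaller than published sample size tables. It assumes equal group sizes and a two-sided test, and every function name and example value is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def n_per_group_means(effect_size, sd, alpha=0.05, beta=0.20):
    """Point 1: subjects per group to compare two means (two-sided test)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(1 - beta)
    return ceil(2 * (z_alpha + z_beta) ** 2 / (effect_size / sd) ** 2)


def inflate_for_dropout(n, dropout_rate):
    """Point 2: enroll extra subjects to offset the expected dropouts."""
    return ceil(n / (1 - dropout_rate))


def n_for_proportion(p, total_width, confidence=0.95):
    """Point 3: subjects needed so that the confidence interval around an
    expected proportion p has the requested total width."""
    z_conf = Z.inv_cdf(1 - (1 - confidence) / 2)
    return ceil(4 * z_conf ** 2 * p * (1 - p) / total_width ** 2)


def detectable_effect(n_per_group, sd, alpha=0.05, beta=0.20):
    """Point 4: smallest difference in means detectable with a fixed n."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(1 - beta)
    return (z_alpha + z_beta) * sd * sqrt(2 / n_per_group)


# Assumed example inputs, chosen only for illustration.
n_analytic = n_per_group_means(effect_size=5.0, sd=10.0)
n_enrolled = inflate_for_dropout(n_analytic, dropout_rate=0.20)
n_descriptive = n_for_proportion(p=0.20, total_width=0.10)
smallest_effect = detectable_effect(n_per_group=100, sd=10.0)

print(n_analytic, n_enrolled, n_descriptive, round(smallest_effect, 2))
```

Under these assumptions the analytic study needs 63 subjects per group (79 after allowing for 20% dropout), the descriptive study needs 246 subjects, and a fixed sample of 100 per group can detect a difference of about 4 units. The functions take α, β, and the confidence level as parameters so that the trade-offs in point 1c and point 3c can be explored directly.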


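Designing Clinical Research, 4th edition, published by the Biostatistics department at UCSF.

I only skimmed this book in the past, so starting today I plan to summarize it a little at a time whenever I can.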

ref: http://www.dcr-4.net

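Chapter 1: All About Clinical Research

The structure of research

- Research question

- Background and significance

- Design

- Study subjects

- Variables

- Statistical issues

How research works

: The goal of clinical research is to draw conclusions from the findings obtained in studies of natural phenomena

1) Internal validity: the degree to which the investigator draws correct conclusions from the actual findings of the study

2) External validity: the degree to which those conclusions can be appropriately applied to people and events outside the study (generalizability)

: It is important to design the study so that random error (chance) and systematic error (bias), which threaten the inference process, can be controlled

- Designing the study

- Implementing the study

- Causal inference

- The errors of research


- Study plan

- Trade-offs

Chapter 2: Research Questions and the Study Plan

Origins of a research question

- Thorough understanding of existing research

- Openness to new ideas and techniques

- Always use your imagination

- Finding a mentor

Characteristics of a good research question (FINER)

- Feasible

- Interesting

- Novel

- Ethical

- Relevant

Developing the research question and study plan

: The research question should be written up early as a one-page study outline. The outline describes in detail the number of subjects needed, how the subjects will be selected, what will be measured, and so on.

- Problems and approaches

- Primary and secondary questions

Translational research

- Translating from laboratory research to clinical research

- Translating from clinical research to population research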



The most important things to cover in an introduction are as follows.

  • establish the context, background and/or importance of the topic
  • indicate an issue, problem, or controversy in the field of study
  • define the topic or key terms
  • state the purpose of the essay/writing
  • provide an overview of the coverage and/or structure of the writing

A research paper should generally be kept short, but several elements must be included. They are as follows.
  • establishing the context, background and/or importance of the topic
  • giving a brief synopsis of the relevant literature
  • indicating a problem, controversy or a knowledge gap in the field of study
  • establishing the desirability of the research
  • listing the research questions or hypotheses
  • providing a synopsis of the research method(s)
  • explaining the significance or value of the study
  • defining certain key terms
  • providing an overview of the dissertation or report structure
  • explaining reasons for the writer’s personal interest in the topic

Establishing the importance of the topic for the world or society

X is fundamental to …

X has a pivotal role in …

X is frequently prescribed for …

X is fast becoming a key instrument in …

X plays a vital role in the metabolism of …

X plays a critical role in the maintenance of …

Xs have emerged as powerful platforms for …

X is essential for a wide range of technologies.

X can play an important role in addressing the issue of …

Xs are the most potent anti-inflammatory agents known.

There is evidence that X plays a crucial role in regulating …

X is a common condition which has considerable impact on …

In the new global economy, X has become a central issue for …

Evidence suggests that X is among the most important factors for …

X is important for a wide range of scientific and industrial processes.

Xs are one of the most widely used groups of antibacterial agents and …

There is a growing body of literature that recognises the importance of …

X is an important component in the climate system, and plays a key role in Y.

In the history of development economics, X has been thought of as a key factor in …

Xs are one of the most widely used groups of Y and have been extensively used for …

Establishing the importance of the topic for the discipline

A key aspect of X is …

X is of interest because …

X is a classic problem in …

A primary concern of X is …

X is a dominant feature of …

X is an important aspect of …

X is a fundamental property of …

The concepts of X and Y are central to …

X is at the heart of our understanding of …

Investigating X is a continuing concern within …

X is a major area of interest within the field of …

X has been studied by many researchers using …

X has been an object of research since the 1960s.

X has been the subject of many classic studies in …

X has been instrumental in our understanding of …

The theory of X provides a useful account of how …

Central to the entire discipline of X is the concept of …

X is an increasingly important area in applied linguistics.

The issue of X has received considerable critical attention.

X has long been a question of great interest in a wide range of fields.

Establishing the importance of the topic (time frame given)

Recently, there has been renewed interest in …

Traditionally, Xs have subscribed to the belief that …

One of the most important events of the 1970s was …

In recent years, there has been an increasing interest in …

Recent developments in X have heightened the need for …

The last two decades have seen a growing trend towards …

Recently, researchers have shown an increased interest in …

Over the past century, there has been a dramatic increase in …

Recent trends in X have led to a proliferation of studies that …

X proved an important literary genre in the early Y community.

The past decade has seen the rapid development of X in many …

Since it was reported in 2005, X has been attracting a lot of interest.

Recently, a considerable literature has grown up around the theme of …

Recent developments in the field of X have led to a renewed interest in …

The past thirty years have seen increasingly rapid advances in the field of …

The changes experienced by X over the past decade remain unprecedented.

In light of recent events in X, it is becoming extremely difficult to ignore the existence of …

Synopsis of literature

Recent evidence suggests that …

Previous studies have reported …

Several studies have documented …

Studies of X show the importance of …

Several attempts have been made to …

A number of researchers have reported …

Previous research comparing X and Y has found …

Existing research recognizes the critical role played by …

Recently investigators have examined the effects of X on Y.

Surveys such as that conducted by Smith (1988) showed that …

Factors found to be influencing X have been explored in several studies.

A considerable amount of literature has been published on X. These studies …

In the past two decades, a number of researchers have sought to determine …

The first serious discussions and analyses of X emerged during the 1970s with …

There have been a number of longitudinal studies involving X that have reported …

Xs were reported in the first studies of Y (e.g., Smith, 1977; Smith and Patel, 1977).

What we know about X is largely based upon empirical studies that investigate how …

Smith (1984: 217) shows how, in the past, research into X was mainly concerned with …

Highlighting a problem

One of the main obstacles …

One of the greatest challenges …

A key issue is the safe disposal of …

The main disadvantage of X is that …

X is associated with increased risk of …

X is a common disorder characterised by …

It is now well established that X can impair …

X is a common, chronic disease of childhood.

X has led to the declines in the populations of …

X is a growing public health concern worldwide.

X is one of the most frequently stated problems with …

The main challenge faced by many experiments is the …

Lack of X has existed as a health problem for many years.

X is a major public health problem, and the main cause of …

Xs are one of the most rapidly declining groups of insects in …

X is the leading cause of death in western-industrialised countries.

Despite its long clinical success, X has a number of problems in use.

Exposure to X has been shown to be related to adverse effects in …

There is increasing concern that some Xs are being disadvantaged …

There is an urgent need to address the safety problems caused by …


However,

X may cause …

X is limited by …

X suffers from …

X is too expensive to be used for …

X has accentuated the problem of …

the performance of X is limited by …

X could be a contributing factor to …

the synthesis of X remains a major challenge.

X can be extremely harmful to human beings.

research has consistently shown that X lacks …

a major problem with this kind of application is …

the determination of X is technically challenging.

current methods of X have proven to be unreliable.

these rapid changes are having a serious effect on …

X can be adversely affected under certain conditions.

observations have indicated a serious decline in the population of …

Highlighting a controversy in the field of study

A much debated question is whether …

One major issue in early X research concerned …

To date there has been little agreement on what …

The issue has grown in importance in light of recent …

In the literature on X, the relative importance of Y is debated.

One observer has already drawn attention to the paradox in …

Questions have been raised about the use of animal subjects in …

In many Xs, a debate is taking place between Ys and Zs concerning …

Debate continues about the best strategies for the management of …

This concept has recently been challenged by X studies demonstrating …

The debate about X has gained fresh prominence with many arguing that …

Scholars have long debated the impact of X on the creation and diffusion of …

More recently, literature has emerged that offers contradictory findings about …

One of the most significant current discussions in legal and moral philosophy is …

One major theoretical issue that has dominated the field for many years concerns …

The controversy about scientific evidence for X has raged unabated for over a century.

The issue of X has been a controversial and much disputed subject within the field of …

The causes of X have been the subject of intense debate within the scientific community.

In the literature on X, the relative importance of Y has been subject to considerable discussion.

General reference to previous research or scholarship: highlighting paucity of research

There is little published data on …

No previous study has investigated X.

The use of X has not been investigated.

There has been no detailed investigation of …

There has been little quantitative analysis of …

Data about the efficacy and safety of X are limited.

Up to now, far too little attention has been paid to …

A search of the literature revealed few studies which …

The impact of X on Y is understudied, particularly for …

Few studies have investigated X in any systematic way …

In addition, no research has been found that surveyed …

So far, however, there has been little discussion about …

So far, very little attention has been paid to the role of X.

Surprisingly, the effects of X have not been closely examined.

In contrast to X, there is much less information about effects of …

A systematic understanding of how X contributes to Y is still lacking.

Despite the importance of X, there remains a paucity of evidence on …

There have been no controlled studies which compare differences in …

To date, the problem has received scant attention in the research literature.

To date, there are few studies that have investigated the association between …




Although some research has been carried out on X,

no single study exists which …

no studies have been found which …

no controlled studies have been reported.

only two studies have attempted to investigate …

there have been few empirical investigations into …

there is still very little scientific understanding of …

the mechanism by which … has not been established.

Highlighting inadequacies of previous studies

Previous studies of X have not dealt with …

Researchers have not treated X in much detail.

Such expositions are unsatisfactory because they …

Most studies in the field of X have only focused on …

Such approaches, however, have failed to address …

Previous published studies are limited to local surveys.

Half of the studies evaluated failed to specify whether …

The research to date has tended to focus on X rather than Y.

Previously published studies on the effect of X are not consistent.

Smith’s analysis does not take account of …, nor does she examine …

The existing accounts fail to resolve the contradiction between X and Y.

Most studies in X have only been carried out in a small number of areas.

However, much of the research up to now has been descriptive in nature …

The generalisability of much published research on this issue is problematic.

Research on the subject has been mostly restricted to limited comparisons of …

However, few writers have been able to draw on any systematic research into …

Short-term studies such as these do not necessarily show subtle changes over time …

Although extensive research has been carried out on X, no single study exists which …

However, these results were based upon data from over 30 years ago and it is unclear if …

The experimental data are rather controversial, and there is no general agreement about …

Highlighting a knowledge gap in the field of study

Very little is known about X in …

… much less is known about X.

It is still not known whether …

The nature of X remains unclear.

What is not yet clear is the impact of X on …

The response of X to Y is not fully understood.

Causal factors leading to X remain speculative.

To date, there has been no reliable evidence that …

The neurobiological basis of this X is poorly understood.

Little is known about X and it is not clear what factors …

Much uncertainty still exists about the relationship between …

To date, studies investigating X have produced equivocal results.

The evidence that X and Y are associated with Z is weak and inconclusive.

This indicates a need to understand the various perceptions of X that exist among …

Some studies have shown the beneficial effects of …, but others showed a deterioration in …


However,

very little is known about X in …

few studies have investigated …

the nature of X remains unclear.

much less is known about how …

the use of X has not been investigated.

far too little attention has been paid to …

the behaviour of X has not yet been investigated.

the evidence for this relationship is inconclusive …

much uncertainty still exists about the relation between …

there have been no controlled studies which compare differences in …

Apart from Smith (2014), there is a general lack of research in …

Despite this, very few studies have investigated the impact of X on …

Despite the importance of X, there remains a paucity of evidence on …

Several studies have produced estimates of X (Smith, 2002; Jones, 2003), but there is still insufficient data for …

Indicating the focus, aim, argument of a short paper

In this paper, I argue that …

This paper attempts to show that …

The central thesis of this paper is that …

In the pages that follow, it will be argued that …

In this essay, I attempt to defend the view that …

The aim of this essay is to explore the relationship between …

The purpose of this paper is to review recent research into the …

This paper 

 argues that …

gives an account of …

discusses the case of …

analyses the impact of …

attempts to show that …

contests the claim that …

provides an overview of …

reviews the evidence for …

reports on a study which …

traces the development of …

explores the ways in which …

assesses the significance of …

highlights the importance of …

considers the implications of …

critically examines the view that …

proposes a new methodology for …

examines the relationship between …

compares the different ways in which …

investigates the factors that determine …

describes the design and implementation of …

Stating the purpose of research

The specific objective of this study was to …

An objective of this study was to investigate …

This thesis will examine the way in which the …

This study set out to investigate the usefulness of …

This dissertation seeks to explain the development of …

This case study seeks to examine the changing nature of …

The objectives of this research are to determine whether …

This prospective study was designed to investigate the use of …

This research examines the emerging role of X in the context of …

This study systematically reviews the data for …, aiming to provide …

Drawing upon two strands of research into X, this study attempts to …

This thesis intends to determine the extent to which … and whether …

This dissertation aims to unravel some of the mysteries surrounding …

This study therefore set out to assess the effect of X …, and the effect of …

The main aim of this study is to investigate the differences between X and Y.

Part of the aim of this project is to develop software that is compatible with …

There are two primary aims of this study: 1. To investigate … 2. To ascertain …

This study seeks to obtain data which will help to address these research gaps.

One purpose of this study was to assess the extent to which these factors were …

The purpose of this investigation is to explore the relationship between X and Y.

Synopsis of the research design, method, source(s) of data

Data for this study were collected using …

Five works will be examined, all of which …

This investigation takes the form of a case-study of the …

This study was exploratory and interpretative in nature.

This study uses a qualitative case study approach to investigate …

The research data in this thesis is drawn from four main sources: …

The approach to empirical research adopted for this study was one of …

This dissertation follows a case-study design, with in-depth analysis of …

By employing qualitative modes of enquiry, I attempt to illuminate the …

Qualitative and quantitative research designs were adopted to provide …

Both qualitative and quantitative methods were used in this investigation.

A holistic approach is utilised, integrating X, Y and Z material to establish …

The study was conducted in the form of a survey, with data being gathered via …

The methodological approach taken in this study is a mixed methodology based on …

A combination of quantitative and qualitative approaches was used in the data analysis.

Indicating significance

This research sheds new light on …

This study provides new insights into …

The study offers some important insights into …

The present study fills a gap in the literature by …

Understanding the link between X and Y will help …

This is the first study to undertake a longitudinal analysis of …

The present research explores, for the first time, the effects of …

The findings should make an important contribution to the field of ….

This study provides an exciting opportunity to advance our knowledge of …

This study aims to contribute to this growing area of research by exploring …

This project provided an important opportunity to advance the understanding of …

Therefore, this study makes a major contribution to research on X by demonstrating …

There are several important areas where this study makes an original contribution to …

Indicating limitations

The thesis does not engage with …

This study is unable to encompass the entire …

It is beyond the scope of this study to examine the …

A full discussion of X lies beyond the scope of this study.

The reader should bear in mind that the study is based on …

Another potential problem is that the scope of my thesis may be too broad.

Due to practical constraints, this paper cannot provide a comprehensive review of…

Giving reasons for personal interest*

I became interested in Xs after reading …

I have worked closely with X for many years and …

My personal experience of X has prompted this research.

My main reason for choosing this topic is personal interest.

It is my experience of working with X that has driven this research.

This project was conceived during my time working for X. As a medical advisor, I witnessed …

* sometimes found in the humanities, and the applied human sciences

Outlining the structure

This paper begins by … It will then go on to …

The first section of this paper will examine…

My thesis is composed of four themed chapters.

The essay has been organised in the following way.

The remaining part of the paper proceeds as follows: …

The main issues addressed in this paper are: a), b) and c).

This paper first gives a brief overview of the recent history of X.

This paper has been divided into four parts. The first part deals with …

The overall structure of the study takes the form of six chapters, including …

Chapter Two begins by laying out the theoretical dimensions of the research, and looks at how …

The third chapter is concerned with the methodology used for this study.

The fourth section presents the findings of the research, focusing on the three key themes that …

Chapter 6 analyses the results of interviews and focus group discussions undertaken during …

Explaining Keywords (refer to Defining Terms)

Throughout this paper, the term X will refer to …

According to Smith (2002), X can be defined as follows: ‘ … ’

In this article, the abbreviation XYZ will be used to refer to …

Throughout this dissertation, the term X will be used to refer to …

The term X is a relatively new name for …, commonly referred to as …

While a variety of definitions of the term X have been suggested, this paper will use the definition first suggested by Smith (1968) who saw it as …



License: Attribution, NonCommercial, NoDerivatives


First let me say that I am a huge huge fan of Freesurfer.  It makes my life easier in so many ways by 1) creating surfaces that we can display fMRI results on; 2) giving beautiful cortical and subcortical segmentations for use in the upcoming (soon, really) pediatric Atlas that I’ve been working on; 3) providing measures of brain volume and cortical thickness.  If I had two small complaints about Freesurfer, they’re exactly what you would expect: it’s slow (24 hours per subject on my SUPER Mac) and the Group Analysis tools aren’t always easy to interact with.

Well I’ve posted previously about how to run Freesurfer jobs in parallel so that if you have an 8-core Mac you can process 8 subjects simultaneously.  Today I’m going to show you how you can use AFNI’s tools, like 3dttest++, to get the same information out of cortical thickness measures as you do using the Freesurfer tools!  That’s right, by the end you will see how the two packages give you identical results like these (forgive the tilt and colors being a bit off):

Let’s start by saying you obviously need AFNI and Freesurfer installed on your system.  I also find it very useful to make a SUMA folder for your fsaverage subject.  This will come in handy later for visualizing the results in SUMA:

cd $SUBJECTS_DIR/fsaverage

@SUMA_Make_Spec_FS -sid fsaverage

You’ll also need to process all of your participants through the Freesurfer pipeline, preferably with the -qcache option added onto the end:

cd /path/to/subjects/datafiles

for aSubject in Subject01 Subject02 Subject03

do

    recon-all -s "$aSubject" -i "$aSubject/inputNifti.nii.gz" -all -qcache

done

Of course you can batch these using Parallel if you want to speed things up.  Once all of the processing is complete, you will find that each subject has their surface data stored in the aptly named ‘surf’ folder.  If you are doing cortical thickness measurements, you’ll want to locate the lh.thickness and rh.thickness files, which are the unsmoothed thickness measurements.  If you used @SUMA_Make_Spec_FS, these are converted to GIFTI datasets by the script, and of course there are the std.141.?h.thickness.niml.dset files representing the standard mesh, again without any blurring.

At this point you have a choice: you can use AFNI/SUMA’s SurfSmooth to smooth your thickness files, or you can instead use the files that Freesurfer has been kind enough to resample to the fsaverage brain at different levels of smoothness!  Those files are called lh.thickness.fwhm10.fsaverage.mgh and rh.thickness.fwhm10.fsaverage.mgh (where fwhm ranges from 0 to 25 in increments of 5).  If we want to convert these to GIFTI datasets that AFNI/SUMA can use, we simply need a built-in Freesurfer tool (the same one used by @SUMA_Make_Spec_FS) called mris_convert.

mris_convert -c ./Subject01/surf/lh.thickness.fwhm10.fsaverage.mgh \

$SUBJECTS_DIR/fsaverage/surf/lh.white \


If you do this for each hemisphere and each subject, you will end up with a folder full of thickness files already smoothed to some FWHM. You can then use ANY of the AFNI tools to perform group analysis!

3dttest++ -prefix lh.Group1_vs_Group2.gii  \

-setA Group1/*.lh*.gii \

-setB Group2/*.lh*.gii

You can then view the results in SUMA:

suma -spec /usr/local/freesurfer/subjects/fsaverage/SUMA/fsaverage_lh.spec

And then load up the surface controller and Load Dset for the output of your t-test, correlation, mixed model, the sky is the limit.  And as you can see from the above figure, the results coming out of AFNI’s tools are nearly identical to those processed directly in Freesurfer’s mri_glmfit (or qdec).

If you’re wondering why you might want to go through this effort to get identical output, beyond just the ease of using AFNI tools and the speed improvement, I’ll remind you that you can use any AFNI tool now with your Freesurfer data including 3dMVM and 3dLME!


Hello AFNI/SUMA gurus! 
I wanted to know if there was a suggested or validated way to use a combination of suma and afni (with assistance from R perhaps) to conduct a cortical thickness analysis of Freesurfer derived segmentations yielding vertex-wise cortical thickness. My specific problem is that I want to conduct a vertex-wise RM-ANOVA on cortical thicknesses and neither Freesurfer's longitudinal analysis nor surfstat offers this type of analysis. 
Thank you for your advice, 

Hi Tim, 

So you want to run 3dANOVA2 or 3 on the cortical thickness 
measures across surface nodes? If so, those programs should 
take .niml.dset formatted thickness measurement files directly. 
If they are on standard mesh surfaces, they should be ready to go. 

- rick

One thing to consider is that the thickness.niml.dset files that @SUMA_Make_Spec_FS outputs are not smoothed. I believe you could use SurfSmooth to fix this. Alternatively, within the Freesurfer subject's surf directory there are files that are already smoothed and resampled to the fsaverage. To use these within the AFNI toolchain, you can use mris_convert: 

mris_convert -c ./Subject01/surf/lh.thickness.fwhm10.fsaverage.mgh \

$SUBJECTS_DIR/fsaverage/surf/lh.white \


And then use any of the AFNI tools (like 3dANOVA2/3) on the resulting GIFTI files and view the results on fsaverage: 

suma -spec /usr/local/freesurfer/subjects/fsaverage/SUMA/fsaverage_lh.spec

In my tests, the statistics on these converted datasets give you identical results to those in Freesurfer's own tools. 

It turns out I have a related follow-up question. I just started using suma this last week. How would I make the average volume spec file to load the results onto? What files in the fsaverage directory should I use? 

If you use the method I talked about with mris_convert, then you can use the fsaverage as your spec to overlay results onto.  

cd $SUBJECTS_DIR/fsaverage

@SUMA_Make_Spec_FS -sid fsaverage

suma -spec $SUBJECTS_DIR/fsaverage/SUMA/fsaverage_both.spec

This will give you SUMA displays that look almost identical to the Freesurfer views (like in qdec or Freeview or tksurfer).

Hi Peter, 
Thanks again for your help. I am still running into issues as I cannot seem to open the .gii file generated as a -bucket file from the 3dANOVA in suma. Do I need to convert it first or am I missing a fundamental command to view gii files on the fsaverage I generated? 

Hi Tim, 

Just to make sure, you're getting all of the Left Hemisphere files and running an ANOVA on those and then the same steps for the right hemisphere? You should then have two outputs from 3dANOVA, one being something like lh.bucket.gii and the other being rh.bucket.gii.  

Once you open SUMA with the fsaverage_both.spec (or either the fsaverage_lh.spec/fsaverage_rh.spec), right click on the brain and press Control+s to open the Surface controller to select your bucket file. You may have to change the default wildcards in the open dialog (I would change it from the defaults to *), then you should be able to just open your output GIFTI (.gii) file from 3dANOVA. Select the appropriate sub-bricks as you would in AFNI and adjust the slider.  

If you're still confused, maybe tell us your 3dANOVA command and the steps you're using! 


Hi Peter, 
Thanks again. It was the open dialog box. It would only detect .dset files, so I typed in the resulting ANOVA file name and it opened just fine. 






# AFNI Installation

Here are the steps:

  1. Install XQuartz from xquartz.org (Allows GUIs to run from Unix shells; the "X" symbol that pops up in your dock when you first run AFNI)

  2. Install XCode from the Apple Store

  3. Install Homebrew (a package manager for Mac) using the following command: 

    For bash: 

    ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" 

    For tcsh: 

    curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install | ruby

  4. Use homebrew to get the following: 

    The GNU Compiler Collection (GCC) with: 

    brew install gcc --with-all-languages --without-multilib 

    Pyqt, which you'll need for the .py scripts in AFNI: 

    brew install pyqt 

    GLib, low-level libraries that take care of the little things under the hood: 

    brew install glib

  5. Link libgomp to the correct location using the following: 

    ln -s /usr/local/Cellar/gcc/5.3.0/lib/gcc/5/libgomp.1.dylib /usr/local/lib/libgomp 

    Note that the GCC version is continually being updated, so this command may change; adjust the version part of the path (for example, 5.2.0 versus 5.3.0) to match what is installed. Look inside /usr/local/Cellar/gcc to see which path exists. If you link the wrong path, rerun the command with the correct path and the -sf option.

  6. Download the latest AFNI package here, or type the following into your terminal (this example fetches the macOS 10.7 build; replace it with whichever build you want to download):  


curl -O http://afni.nimh.nih.gov/pub/dist/tgz/macosx_10.7_Intel_64.tgz

tar -xzf macosx_10.7_Intel_64.tgz

mv macosx_10.7_Intel_64 abin

rm macosx_10.7_Intel_64.tgz

Paste the following commands from the AFNI install webpage into your terminal (for tcsh shell):

echo 'set path = (/usr/local/bin $path $HOME/abin)' >> .cshrc

echo 'setenv DYLD_FALLBACK_LIBRARY_PATH $HOME/abin' >> .cshrc

echo 'setenv PYTHONPATH /usr/local/lib/python2.y/site-packages' >> .cshrc

source .cshrc


# FreeSurfer Installation

FreeSurfer System Requirements

Operating System: Linux, Mac OS X, Windows (via VirtualBox)

Processor Speed: 2GHz at least

RAM: 8GB recommended (minimum)

Graphics card: 3D graphics card with its own graphics memory & accelerated OpenGL drivers

Size of installation package: 8.5GB

Typical size of a processed subject: 370MB

Tutorial dataset size: 18GB

Other requirements: Matlab (only needed to run FS-FAST, the fMRI analysis stream)


Linux: Installing FreeSurfer on Linux systems involves simply extracting the contents of the .tar.gz file somewhere on your machine. Installing into the directory /usr/local is recommended. For example:

$> tar -C /usr/local -xzvf freesurfer-Linux-centos6_x86_64-stable-pub-v5.3.0.tar.gz

Mac: Installing FreeSurfer on Mac systems involves simply double clicking the .dmg file and clicking through the steps. The default installation location is in the /Applications directory. For more detailed instructions, please see the following step-by-step.

Setup & Configuration

To begin using FreeSurfer, you need to open a terminal window, define an environment variable called FREESURFER_HOME which is set to the location where FreeSurfer was installed, and then source the setup script. Sourcing FreeSurfer needs to be done every time you open a new terminal window. Or, you can add the two lines below to your default setup file (.bashrc or .cshrc) and FreeSurfer will be sourced automatically every time you open a new window.


# bash

> export FREESURFER_HOME=/usr/local/freesurfer

> source $FREESURFER_HOME/SetUpFreeSurfer.sh

# tcsh

> setenv FREESURFER_HOME /usr/local/freesurfer

> source $FREESURFER_HOME/SetUpFreeSurfer.csh


# bash (Mac, default install location)

> export FREESURFER_HOME=/Applications/freesurfer

> source $FREESURFER_HOME/SetUpFreeSurfer.sh

If done correctly, you should see output similar to this:

Setting up environment for FreeSurfer/FS-FAST (and FSL)

FREESURFER_HOME /usr/local/freesurfer

FSFAST_HOME     /usr/local/freesurfer/fsfast


SUBJECTS_DIR    /usr/local/freesurfer/subjects

MNI_DIR         /usr/local/freesurfer/mni


A license key must be obtained to make the FreeSurfer tools operational. Obtaining a license is free and comes in the form of a license.txt file. Once you obtain the license.txt key file, copy it to your FreeSurfer installation directory. This is also the location defined by the FREESURFER_HOME environment variable.

Follow this link to obtain a license key.

Test your FreeSurfer Installation

FreeSurfer comes with two sample data files (sample-001.mgz and sample-002.mgz) as well as a fully recon-ed subject named bert. These data files can be used to test that your FreeSurfer installation was done properly. To test your installation, please try the following examples:

Example 1: Convert the sample-001.mgz to nifti format.

> cd $FREESURFER_HOME/subjects

> mri_convert sample-001.mgz sample-001.nii.gz


reading from sample-001.mgz...

TR=7.25, TE=3.22, TI=600.00, flip angle=7.00

i_ras = (-0, -1, -0)

j_ras = (-0, 0, -1)

k_ras = (-1, 0, 0)

writing to sample-001.nii.gz...

Example 2: View the output volumes, surfaces and subcortical segmentation of fully recon-ed subject bert.

> cd $FREESURFER_HOME/subjects

> freeview -v \

    bert/mri/T1.mgz \

    bert/mri/wm.mgz \

    bert/mri/brainmask.mgz \

    bert/mri/aseg.mgz:colormap=lut:opacity=0.2 \

    -f \

    bert/surf/lh.white:edgecolor=blue \

    bert/surf/lh.pial:edgecolor=red \

    bert/surf/rh.white:edgecolor=blue


The freeview command above will open the freeview GUI, which should look similar to the image below.


. use http://www.ats.ucla.edu/stat/stata/dae/manova, clear

. summarize difficulty useful importance

    Variable |        Obs        Mean    Std. Dev.       Min        Max


  difficulty |         33    5.715152    2.017598        2.4      10.25

      useful |         33     16.3303    3.292461       11.9       24.3

  importance |         33    6.475758    3.985131         .2       18.8

. tabulate group

      group |      Freq.     Percent        Cum.


  treatment |         11       33.33       33.33

  control_1 |         11       33.33       66.67

  control_2 |         11       33.33      100.00


      Total |         33      100.00

. tabstat difficulty useful importance, by(group)

Summary statistics: mean

  by categories of: group 

    group |  diffic~y    useful  import~e


treatment |  6.190909  18.11818  8.681818

control_1 |  5.581818  15.52727  5.109091

control_2 |  5.372727  15.34545  5.636364


    Total |  5.715152   16.3303  6.475758


. correlate useful difficulty importance


             |   useful diffic~y import~e


      useful |   1.0000

  difficulty |   0.0978   1.0000

  importance |  -0.3411   0.1978   1.0000

. manova difficulty useful importance = group

                       Number of obs =         33

                       W = Wilks' lambda      L = Lawley-Hotelling trace

                       P = Pillai's trace     R = Roy's largest root

              Source | Statistic        df    F(df1,     df2) =   F   Prob>F


               group |W   0.5258         2      6.0     56.0     3.54 0.0049 e

                     |P   0.4767                6.0     58.0     3.02 0.0122 a

                     |L   0.8972                6.0     54.0     4.04 0.0021 a

                     |R   0.8920                3.0     29.0     8.62 0.0003 u


            Residual |                  30


               Total |                  32


                       e = exact, a = approximate, u = upper bound on F
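The F statistic on the Wilks' lambda row can be reproduced by hand. The sketch below is an arithmetic check in Python, not part of the Stata session. It assumes the standard exact transform for a hypothesis with 2 degrees of freedom, F = ((1 - sqrt(L)) / sqrt(L)) * (m / p) with m = residual df - p + 1 and degrees of freedom (2p, 2m); the function name is made up for illustration.

```python
import math

def wilks_f_two_df(wilks_lambda, n_outcomes, resid_df):
    """Exact F transform of Wilks' lambda when the hypothesis has 2 df."""
    root = math.sqrt(wilks_lambda)
    m = resid_df - n_outcomes + 1
    f_stat = (1 - root) / root * m / n_outcomes
    return f_stat, 2 * n_outcomes, 2 * m

# Values read off the manova table: lambda = 0.5258, 3 outcomes, residual df = 30
f_stat, df1, df2 = wilks_f_two_df(0.5258, n_outcomes=3, resid_df=30)
print(f"F({df1}, {df2}) = {f_stat:.2f}")  # prints "F(6, 56) = 3.54"
```

The result agrees with the row marked "e" (exact) in the table.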

The MANOVA shows that the groups differ overall. The next step is a post hoc test to find where the differences lie. To set this up, first run manovatest, showorder to see the order of the columns in the design matrix. 

. manovatest, showorder

 Order of columns in the design matrix
      1: (group==1)
      2: (group==2)
      3: (group==3)
      4: _cons

. matrix c1=(2,-1,-1,0)

. manovatest, test(c1)

 Test constraint
 (1)    2*1.group - 2.group - 3.group = 0

                       W = Wilks' lambda      L = Lawley-Hotelling trace
                       P = Pillai's trace     R = Roy's largest root

              Source | Statistic        df    F(df1,     df2) =   F   Prob>F
          manovatest |W   0.5290         1      3.0     28.0     8.31 0.0004 e
                     |P   0.4710                3.0     28.0     8.31 0.0004 e
                     |L   0.8904                3.0     28.0     8.31 0.0004 e
                     |R   0.8904                3.0     28.0     8.31 0.0004 e
            Residual |                  30
                       e = exact, a = approximate, u = upper bound on F

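The test above compares the treatment group (group 1) with the other two groups. The contrast is defined with matrix, following the column order shown by showorder, and then tested with manovatest. The result shows a significant difference. The test below compares control group 1 with control group 2; it shows no significant difference. 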

. matrix c2=(0,1,-1,0)

. manovatest, test(c2)

 Test constraint
 (1)    2.group - 3.group = 0

                       W = Wilks' lambda      L = Lawley-Hotelling trace
                       P = Pillai's trace     R = Roy's largest root

              Source | Statistic        df    F(df1,     df2) =   F   Prob>F
          manovatest |W   0.9932         1      3.0     28.0     0.06 0.9785 e
                     |P   0.0068                3.0     28.0     0.06 0.9785 e
                     |L   0.0068                3.0     28.0     0.06 0.9785 e
                     |R   0.0068                3.0     28.0     0.06 0.9785 e
            Residual |                  30
                       e = exact, a = approximate, u = upper bound on F
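Both contrasts have a single hypothesis degree of freedom, which is why all four statistics give the same exact F. As a rough check (Python arithmetic, not Stata output), the sketch below assumes the one-df transform F = ((1 - L) / L) * (m / p) with m = residual df - p + 1; the helper name and the labels are illustrative.

```python
def wilks_f_one_df(wilks_lambda, n_outcomes, resid_df):
    """Exact F transform of Wilks' lambda when the hypothesis has 1 df."""
    m = resid_df - n_outcomes + 1
    f_stat = (1 - wilks_lambda) / wilks_lambda * m / n_outcomes
    return f_stat, n_outcomes, m

# Wilks' lambda values read off the two manovatest tables
contrasts = {"treatment vs controls (c1)": 0.5290,
             "control_1 vs control_2 (c2)": 0.9932}

results = {}
for label, lam in contrasts.items():
    f_stat, df1, df2 = wilks_f_one_df(lam, n_outcomes=3, resid_df=30)
    results[label] = f_stat
    print(f"{label}: F({df1}, {df2}) = {f_stat:.2f}")
```

The two values round to 8.31 and 0.06, matching the tables.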

Next, use the margins command to obtain the adjusted predicted values for each group. Repeat it for every variable of interest. 

. margins group, predict(equation(difficulty))

Adjusted predictions                            Number of obs     =         33

Expression   : Linear prediction, predict(equation(difficulty))


             |            Delta-method

             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]


       group |

  treatment  |   6.190909   .6186184    10.01   0.000     4.927522    7.454296

  control_1  |   5.581818   .6186184     9.02   0.000     4.318431    6.845206

  control_2  |   5.372727   .6186184     8.69   0.000      4.10934    6.636115


. margins group, predict(equation(useful))

Adjusted predictions                            Number of obs     =         33

Expression   : Linear prediction, predict(equation(useful))


             |            Delta-method

             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]


       group |

  treatment  |   18.11818   .9438243    19.20   0.000     16.19064    20.04573

  control_1  |   15.52727   .9438243    16.45   0.000     13.59973    17.45482

  control_2  |   15.34545   .9438243    16.26   0.000     13.41791      17.273


. margins group, predict(equation(importance))

Adjusted predictions                            Number of obs     =         33

Expression   : Linear prediction, predict(equation(importance))


             |            Delta-method

             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]


       group |

  treatment  |   8.681818   1.136676     7.64   0.000     6.360415    11.00322

  control_1  |   5.109091   1.136676     4.49   0.000     2.787688    7.430494

  control_2  |   5.636364   1.136676     4.96   0.000     3.314961    7.957766
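The t statistic and confidence interval in each row follow directly from the margin and its standard error. A small check in Python for the treatment row of the difficulty table, assuming the interval uses Student's t with the 30 residual degrees of freedom; the critical value is hard-coded from a t table rather than computed.

```python
# Two-sided 95% critical value of Student's t with 30 df (table value, assumed)
T_CRIT_30DF = 2.042272

margin, std_err = 6.190909, 0.6186184  # treatment row, difficulty

t_stat = margin / std_err
ci_low = margin - T_CRIT_30DF * std_err
ci_high = margin + T_CRIT_30DF * std_err

print(f"t = {t_stat:.2f}, 95% CI = [{ci_low:.4f}, {ci_high:.4f}]")
```

The same arithmetic applies to the other rows; only the margin and standard error change.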


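The estimates above show that control groups 1 and 2 are similar to each other, while the treatment group differs from both. The output below tests, variable by variable, whether each control group differs significantly from the treatment group. 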

. margins, dydx(group) predict(equation(difficulty))

Conditional marginal effects                    Number of obs     =         33

Expression   : Linear prediction, predict(equation(difficulty))

dy/dx w.r.t. : 2.group 3.group


             |            Delta-method

             |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]


       group |

  control_1  |  -.6090908   .8748585    -0.70   0.492     -2.39579    1.177609

  control_2  |  -.8181818   .8748585    -0.94   0.357    -2.604881    .9685176


Note: dy/dx for factor levels is the discrete change from the base level.
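These discrete changes are differences between the group margins reported earlier. The sketch below (Python, for illustration only) rebuilds them for difficulty. It assumes the three groups are independent and share the same standard error, so the standard error of a difference is sqrt(2) times the standard error of a single margin.

```python
import math

# Margins and common standard error from the difficulty margins table
margins = {"treatment": 6.190909, "control_1": 5.581818, "control_2": 5.372727}
se_margin = 0.6186184

# Standard error of a difference of two independent margins with equal SE
se_diff = math.sqrt(2) * se_margin

effects = {}
for group in ("control_1", "control_2"):
    diff = margins[group] - margins["treatment"]  # change from the base level
    effects[group] = (diff, diff / se_diff)
    print(f"{group}: dy/dx = {diff:.4f}, SE = {se_diff:.4f}, t = {diff / se_diff:.2f}")
```

The differences, the standard error of about 0.8749, and the t values of about -0.70 and -0.94 line up with the table.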

. margins, dydx(group) predict(equation(useful))

Conditional marginal effects                    Number of obs     =         33

Expression   : Linear prediction, predict(equation(useful))

dy/dx w.r.t. : 2.group 3.group


             |            Delta-method

             |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]


       group |

  control_1  |  -2.590909   1.334769    -1.94   0.062    -5.316871    .1350535

  control_2  |  -2.772727   1.334769    -2.08   0.046     -5.49869    -.046765


Note: dy/dx for factor levels is the discrete change from the base level.

. margins, dydx(group) predict(equation(importance))

Conditional marginal effects                    Number of obs     =         33

Expression   : Linear prediction, predict(equation(importance))

dy/dx w.r.t. : 2.group 3.group


             |            Delta-method

             |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]


       group |

  control_1  |  -3.572727   1.607503    -2.22   0.034    -6.855686    -.289768

  control_2  |  -3.045454   1.607503    -1.89   0.068    -6.328414    .2375048


Note: dy/dx for factor levels is the discrete change from the base level.

. foreach vname in difficulty useful importance {

  2.   anova `vname' group

  3. }

                         Number of obs =         33    R-squared     =  0.0305

                         Root MSE      =    2.05173    Adj R-squared = -0.0341

                  Source | Partial SS         df         MS        F    Prob>F


                   Model |  3.9751512          2   1.9875756      0.47  0.6282


                   group |  3.9751512          2   1.9875756      0.47  0.6282


                Residual |  126.28728         30   4.2095759  


                   Total |  130.26243         32   4.0707009  

                         Number of obs =         33    R-squared     =  0.1526

                         Root MSE      =    3.13031    Adj R-squared =  0.0961

                  Source | Partial SS         df         MS        F    Prob>F


                   Model |  52.924238          2   26.462119      2.70  0.0835


                   group |  52.924238          2   26.462119      2.70  0.0835


                Residual |  293.96544         30   9.7988481  


                   Total |  346.88968         32   10.840303  

                         Number of obs =         33    R-squared     =  0.1610

                         Root MSE      =    3.76993    Adj R-squared =  0.1051

                  Source | Partial SS         df         MS        F    Prob>F


                   Model |  81.829694          2   40.914847      2.88  0.0718


                   group |  81.829694          2   40.914847      2.88  0.0718


                Residual |   426.3709         30   14.212363  


                   Total |  508.20059         32   15.881268  
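Each of the three ANOVA tables can be verified from its sums of squares. The following Python check assumes the usual one-way definitions: F is the model mean square over the residual mean square, and R-squared is the model sum of squares over the total. The dictionary layout is only for illustration.

```python
MODEL_DF, RESID_DF = 2, 30

# (model SS, residual SS) copied from the three anova tables
sums_of_squares = {
    "difficulty": (3.9751512, 126.28728),
    "useful": (52.924238, 293.96544),
    "importance": (81.829694, 426.3709),
}

anova_check = {}
for outcome, (ss_model, ss_resid) in sums_of_squares.items():
    f_stat = (ss_model / MODEL_DF) / (ss_resid / RESID_DF)
    r_squared = ss_model / (ss_model + ss_resid)
    anova_check[outcome] = (f_stat, r_squared)
    print(f"{outcome}: F = {f_stat:.2f}, R-squared = {r_squared:.4f}")
```

None of the univariate F tests reaches the 0.05 level (the Prob>F column reads 0.6282, 0.0835 and 0.0718), even though the multivariate test does.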


Regression with Random Intercepts

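To illustrate the xtmixed command, consider a county-level dataset on the 2004 US presidential election. In this election George W. Bush won with 50.7% of the vote, John Kerry received 48.3%, and Ralph Nader 0.4%. A striking feature of the result is its geographic distribution. Kerry won on the West Coast, in the Northeast, and around the Great Lakes, and Bush took everywhere else. Bush won most rural areas and Kerry won urban areas. The dataset contains the 2004 results along with data on related factors. Counties are grouped into nine census divisions (cendiv). The variables include total votes cast (votes), the percentage of votes for Bush (bush), the log of population density, which indicates how rural a county is (logdens), the percentage of the population that is minority (minority), and the percentage of adults with college education (colled). 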

. use "/Users/leetaey/Desktop/sws12/election_2004i.dta"

(US counties -- 2004 election (Robinson 2005))

. describe

Contains data from /Users/leetaey/Desktop/sws12/election_2004i.dta

  obs:         3,054                          US counties -- 2004 election (Robinson 2005)

 vars:            11                          2 Jul 2012 06:11

 size:       219,888                          

----------------------------------------------------------------------------------------------------------------

              storage   display    value

variable name   type    format     label      variable label


fips            long    %9.0g                 FIPS code

state           str20   %20s                  State name

state2          str2    %9s                   State 2-letter abbreviation

region          byte    %9.0g      region     Region (4)

cendiv          byte    %15.0g     division   Census division (9)

county          str24   %24s                  County name

votes           float   %9.0g                 Total # of votes cast, 2004

bush            float   %9.0g                 % votes for GW Bush, 2004

logdens         float   %9.0g                 log10(people per square mile)

minority        float   %9.0g                 % population minority

colled          float   %9.0g                 % adults >25 w/4+ years college


Sorted by: fips

. graph twoway scatter bush logdens, msymbol(Oh) || lfit bush logdens, lwidth(medthick)

The plot shows that the higher the population density (the more urban the county), the lower Bush's vote share. 

. graph twoway scatter bush logdens [fw=votes], msymbol(Oh) || lfit bush logdens, lwidth(medthick) || , xlabel(-1 "0.1" 0 "1" 1 "10" 2 "100" 3 "1,000" 4 "10,000", grid) legend(off) xtitle("Population per square mile") ytitle("Percent vote for GW Bush")

This graph weights each county by its number of votes cast, which stands in for population size.

To test this pattern statistically, we run a regression that also includes the college-education and minority percentages. 

. regress bush logdens minority colled

      Source |       SS           df       MS      Number of obs   =     3,041

-------------+----------------------------------   F(3, 3037)      =    345.39

       Model |  122345.617         3  40781.8725   Prob > F        =    0.0000

    Residual |  358593.826     3,037  118.075017   R-squared       =    0.2544

-------------+----------------------------------   Adj R-squared   =    0.2537

       Total |  480939.443     3,040  158.203764   Root MSE        =    10.866


        bush |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]


     logdens |  -5.457462   .3031091   -18.00   0.000    -6.051781   -4.863142

    minority |   -.251151   .0125261   -20.05   0.000    -.2757115   -.2265905

      colled |  -.1811345   .0334151    -5.42   0.000     -.246653    -.115616

       _cons |   75.78636   .5739508   132.04   0.000     74.66099    76.91173
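To read the coefficients concretely, the fitted equation can be evaluated for a single county. The sketch below is in Python and uses the estimates from the table; the county values (logdens = 2, i.e. 100 people per square mile, 10% minority, 20% college) are hypothetical and chosen only for illustration.

```python
# OLS estimates copied from the regress output
COEF = {"_cons": 75.78636, "logdens": -5.457462,
        "minority": -0.251151, "colled": -0.1811345}

def predict_bush_share(logdens, minority, colled):
    """Linear prediction of the Bush vote percentage for one county."""
    return (COEF["_cons"] + COEF["logdens"] * logdens
            + COEF["minority"] * minority + COEF["colled"] * colled)

# Hypothetical county, not a row of the dataset
predicted = predict_bush_share(logdens=2.0, minority=10.0, colled=20.0)
print(f"Predicted Bush share: {predicted:.2f}%")  # prints 58.74%

# A tenfold increase in density raises logdens by 1 (log10 scale)
denser = predict_bush_share(logdens=3.0, minority=10.0, colled=20.0)
print(f"Change for tenfold density: {denser - predicted:.2f} points")
```

Because logdens is on a log10 scale, the logdens coefficient is the predicted change in Bush's share for a tenfold increase in population density, holding the other two predictors fixed.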


All of the predictors have a significant effect on the outcome. Below, the mixed-effects analysis is run with fixed effects only. 

. xtmixed bush logdens minority colled

Mixed-effects ML regression                     Number of obs     =      3,041

                                                Wald chi2(3)      =    1037.53

Log likelihood = -11567.783                     Prob > chi2       =     0.0000


        bush |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]


     logdens |  -5.457462   .3029097   -18.02   0.000    -6.051154    -4.86377

    minority |   -.251151   .0125179   -20.06   0.000    -.2756856   -.2266164

      colled |  -.1811345   .0333931    -5.42   0.000    -.2465838   -.1156852

       _cons |   75.78636   .5735732   132.13   0.000     74.66217    76.91054



  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]


                sd(Residual) |   10.85908   .1392419      10.58958    11.13545


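Maximum likelihood (ML) is the default estimation method for mixed models. Restricted maximum likelihood (REML) can be selected instead when needed. A fixed-effects-only model like this one, however, is limited in what it can say about regional patterns in vote share. Examining those patterns requires a model that goes beyond ordinary fixed effects. 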

. xtmixed bush logdens minority colled || cendiv:

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log likelihood =  -11339.79  
Iteration 1:   log likelihood =  -11339.79  (backed up)

Computing standard errors:

Mixed-effects ML regression                     Number of obs     =      3,041
Group variable: cendiv                          Number of groups  =          9

                                                Obs per group:
                                                              min =         67
                                                              avg =      337.9
                                                              max =        616

                                                Wald chi2(3)      =    1161.96
Log likelihood =  -11339.79                     Prob > chi2       =     0.0000

        bush |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
     logdens |   -4.52417   .3621775   -12.49   0.000    -5.234025   -3.814316
    minority |  -.3645394   .0129918   -28.06   0.000    -.3900029   -.3390758
      colled |  -.0583942   .0357717    -1.63   0.103    -.1285053     .011717
       _cons |   72.09305   2.294039    31.43   0.000     67.59682    76.58929

  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
cendiv: Identity             |
                   sd(_cons) |   6.617135   1.600467      4.119006    10.63035
                sd(Residual) |   10.00339   .1284657      9.754742    10.25837

LR test vs. linear model: chibar2(01) = 455.99        Prob >= chibar2 = 0.0000
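
Two quantities in this output can be re-derived by hand, which helps in reading it. The Python sketch below recomputes the likelihood-ratio statistic from the two printed log likelihoods and then computes the intraclass correlation implied by sd(_cons) and sd(Residual). The intraclass correlation is not part of the Stata output, and the variable names are illustrative.

```python
import math

ll_linear = -11567.783   # log likelihood, fixed-effects-only model
ll_mixed = -11339.79     # log likelihood, random-intercept model

# Likelihood-ratio statistic comparing the two nested models.
lr = 2 * (ll_mixed - ll_linear)

# chibar2(01) is a 50:50 mixture of chi2(0) and chi2(1), so the p-value
# is half the upper tail of chi2(1): P(chi2_1 > x) = erfc(sqrt(x / 2)).
p_value = 0.5 * math.erfc(math.sqrt(lr / 2))

sd_cons = 6.617135       # sd(_cons): between-division standard deviation
sd_resid = 10.00339      # sd(Residual): within-division standard deviation

# Intraclass correlation: share of unexplained variance between divisions.
icc = sd_cons**2 / (sd_cons**2 + sd_resid**2)

print(f"LR statistic = {lr:.2f}")
print(f"p-value      = {p_value:.3g}")
print(f"ICC          = {icc:.3f}")
```

The statistic reproduces the printed chibar2(01) of 455.99. The intraclass correlation comes out at roughly 0.30, meaning that about 30% of the variance left after the fixed effects lies between census divisions.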

xtmixed does not display the random effects themselves. Instead, after fitting the model with xtmixed, we can obtain linear predictions of them with predict.

. predict randint0, reffects

. graph hbar (mean) randint0, over(cendiv) ytitle("Random intercepts by census division")

Attribution, NonCommercial, NoDerivatives

Sensitivity Analysis 

  • Generally, an assessment of how systematic or random errors affect how well an effect estimate represents the actual effect (the validity of the effect estimate).

  • Misclassification error is a primary inhibitor of validity and can be difficult to correct for.

  • Executed by adjusting model parameters over a reasonable range, and observing the results.

Sensitivity Analysis Applied to Misclassification: Example

  • You: "The relative risk (RR) of coronary heart disease (CHD) for second-hand (passive) smoke exposure in non-smokers is between 1.15 and 1.3"

  • They: "Maybe some of your 'non-smokers' are actually smokers, and your RR is too high."

  • You: “I can do a sensitivity analysis. Assume 5% of my exposed ‘non-smoker’ cases are actually misclassified smokers. In that case, the RR of CHD in active smokers would have to be 7.0 to account entirely for the difference in RR. But since the RR of CHD in smokers is 2.0, you’re wrong.”
  • They: “Try 10%.”

. episensi 610 610 410 640, study(cs) dseca(c(1)) dspca(c(1)) dsenc(c(1)) dspnc(c(1)) 

Se|Cases   : Constant(1)
Sp|Cases   : Constant(1)
Se|No-Cases: Constant(1)
Sp|No-Cases: Constant(1)

Observed Risk Ratio [95% Conf. Interval]= 1.23 [1.14, 1.32]

Deterministic sensitivity analysis for misclassification of the exposure
   External adjusted Risk Ratio = 1.23
   Percent bias =  -0%

. episensi 610 610 410 640, study(cs) dseca(c(1)) dspca(c(.95)) dsenc(c(1)) dspnc(c(1)) 

Se|Cases   : Constant(1)
Sp|Cases   : Constant(.95)
Se|No-Cases: Constant(1)
Sp|No-Cases: Constant(1)

Observed Risk Ratio [95% Conf. Interval]= 1.23 [1.14, 1.32]

Deterministic sensitivity analysis for misclassification of the exposure
   External adjusted Risk Ratio = 1.17
   Percent bias =   5%

. episensi 610 610 410 640, study(cs) dseca(c(1)) dspca(c(.90)) dsenc(c(1)) dspnc(c(1)) 

Se|Cases   : Constant(1)
Sp|Cases   : Constant(.9)
Se|No-Cases: Constant(1)
Sp|No-Cases: Constant(1)

Observed Risk Ratio [95% Conf. Interval]= 1.23 [1.14, 1.32]

Deterministic sensitivity analysis for misclassification of the exposure
   External adjusted Risk Ratio = 1.11
   Percent bias =  11%
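
The adjusted risk ratios above can be reproduced with the standard back-correction for exposure misclassification. The Python sketch below is an assumption about what the command is doing, inferred only from the fact that it matches the printed numbers. It reads the four counts as exposed cases, unexposed cases, exposed non-cases and unexposed non-cases, and it takes percent bias to be (observed - adjusted) / adjusted. The function name `corrected_rr` and its parameters are hypothetical.

```python
def corrected_rr(a, b, c, d, se_cases=1.0, sp_cases=1.0,
                 se_noncases=1.0, sp_noncases=1.0):
    """Risk ratio after back-correcting exposure misclassification.

    a, b = exposed and unexposed cases;
    c, d = exposed and unexposed non-cases (assumed argument order).
    """
    def true_exposed(observed_exposed, total, se, sp):
        # observed = se * true + (1 - sp) * (total - true), solved for true
        return (observed_exposed - (1 - sp) * total) / (se + sp - 1)

    A = true_exposed(a, a + b, se_cases, sp_cases)
    B = (a + b) - A
    C = true_exposed(c, c + d, se_noncases, sp_noncases)
    D = (c + d) - C
    return (A / (A + C)) / (B / (B + D))


observed = corrected_rr(610, 610, 410, 640)
for sp in (1.0, 0.95, 0.90):
    adjusted = corrected_rr(610, 610, 410, 640, sp_cases=sp)
    bias = 100 * (observed - adjusted) / adjusted
    print(f"Sp|Cases = {sp:.2f}: adjusted RR = {adjusted:.2f}, "
          f"percent bias = {bias:.0f}%")
```

Lowering the specificity among cases moves some observed exposed cases back into the unexposed group. This is why the adjusted risk ratio falls from 1.23 to 1.17 and then to 1.11.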

Sensitivity Analysis Applied to Vaccine Effectiveness: Example


Meta-analysis

  • A “quantitative approach for systematically assessing the results of previous research in order to arrive at conclusions about the body of research” (Petitti).

  • Unit of analysis is the study, rather than a group or individual. Study selection is similar to the selection of subjects in a study. 

  • In a meta-analysis of the relationship between major depression and socioeconomic class, only 51 studies were chosen out of the 743 found.

. input ecase econtrol ucase ucontrol

         ecase   econtrol      ucase   ucontrol
  1. 610 610 410 640
  2. 620 615 400 680
  3. 330 340 195 290
  4. 1222 1195 831 1201
  5. end

. metan ecase econtrol ucase ucontrol

           Study     |     RR    [95% Conf. Interval]     % Weight
1                    |  1.280       1.165     1.407         22.07
2                    |  1.355       1.232     1.491         21.38
3                    |  1.225       1.072     1.399         11.33
4                    |  1.236       1.158     1.320         45.22
M-H pooled RR        |  1.270       1.215     1.328        100.00

  Heterogeneity chi-squared =   2.75 (d.f. = 3) p = 0.433
  I-squared (variation in RR attributable to heterogeneity) =   0.0%

  Test of RR=1 : z=  10.61 p = 0.000
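
The pooled estimate and the weights in this table follow from the Mantel-Haenszel formula for risk ratios. The Python sketch below recomputes them from the four input rows. It assumes each row is read as events and non-events in the first group, followed by events and non-events in the second group. The variable names are illustrative.

```python
# Rows follow the input order: ecase, econtrol, ucase, ucontrol.
studies = [
    (610, 610, 410, 640),
    (620, 615, 400, 680),
    (330, 340, 195, 290),
    (1222, 1195, 831, 1201),
]

numerators, denominators, study_rr = [], [], []
for a, b, c, d in studies:
    n = a + b + c + d
    numerators.append(a * (c + d) / n)
    denominators.append(c * (a + b) / n)   # Mantel-Haenszel weight
    study_rr.append((a / (a + b)) / (c / (c + d)))

# Pooled RR is the ratio of the summed numerators to the summed weights.
pooled_rr = sum(numerators) / sum(denominators)
weights = [100 * w / sum(denominators) for w in denominators]

for i, (rr, w) in enumerate(zip(study_rr, weights), start=1):
    print(f"study {i}: RR = {rr:.3f}, weight = {w:.2f}%")
print(f"M-H pooled RR = {pooled_rr:.3f}")
```

The largest study (row 4) carries about 45% of the weight, so the pooled estimate of 1.270 sits closest to that study's RR.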

Meta-analysis Styles

  • Some prefer the Mantel-Haenszel method, which weights each study's estimate by the power of that study (the “fixed-effects model”). 
  • Others prefer a “random-effects model”, which takes into account both within-study variance and between-study variance. 
  • The random-effects model is more conservative. The real difference is in generalizability: fixed-effects results are limited to the included studies, while random-effects results can be applied to a hypothetical “population of studies.” 
  • Neither model is advisable when the direction of the effect is not consistent across studies. 
  • There is still some debate about the effectiveness of meta-analysis, given the differences in participant selection and data collection methods across studies. 
  • EPI 810 tip: Prospective meta-analysis can be used as an agreement across research groups to avoid some of these pitfalls.
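
The contrast between the two models can be made concrete with the same four studies. The Python sketch below is an illustration under stated assumptions, not the method used in the lecture. It uses inverse-variance weights on the log risk ratio for the fixed-effect estimate (a swap for Mantel-Haenszel weighting) and the DerSimonian-Laird estimator for the between-study variance. All names are hypothetical.

```python
import math

studies = [
    (610, 610, 410, 640),
    (620, 615, 400, 680),
    (330, 340, 195, 290),
    (1222, 1195, 831, 1201),
]

log_rr, var = [], []
for a, b, c, d in studies:
    log_rr.append(math.log((a / (a + b)) / (c / (c + d))))
    # Within-study variance of the log risk ratio.
    var.append(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))


def pool(weights):
    """Weighted average of the study log risk ratios."""
    return sum(w * y for w, y in zip(weights, log_rr)) / sum(weights)


# Fixed effect: weights use the within-study variance only.
w_fixed = [1 / v for v in var]
fixed = pool(w_fixed)

# Cochran's Q and the DerSimonian-Laird between-study variance tau^2.
q = sum(w * (y - fixed) ** 2 for w, y in zip(w_fixed, log_rr))
df = len(studies) - 1
c_term = sum(w_fixed) - sum(w * w for w in w_fixed) / sum(w_fixed)
tau2 = max(0.0, (q - df) / c_term)

# Random effects: add tau^2 to every study's variance before weighting.
w_random = [1 / (v + tau2) for v in var]
random_effects = pool(w_random)

print(f"Q = {q:.2f}, tau^2 = {tau2:.4f}")
print(f"fixed RR = {math.exp(fixed):.3f}, "
      f"random RR = {math.exp(random_effects):.3f}")
```

For these data Q falls below its degrees of freedom, so tau^2 is truncated to zero and the two estimates coincide. That is consistent with the I-squared of 0.0% reported above. With more heterogeneous studies, tau^2 would be positive and the random-effects weights would be more nearly equal across studies.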

Szklo and Nieto's Epidemiology: Beyond the Basics, EPI 811 Lecture Note.
