<?xml version="1.0" encoding="utf-8"?>
<document xmlns="http://cnx.rice.edu/cnxml" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:bib="http://bibtexml.sf.net/" xmlns:q="http://cnx.rice.edu/qml/1.0" id="id35498277" module-id="" cnxml-version="0.6">
<title>Monte Carlo Simulations of Allele Frequencies</title>
<metadata xmlns:md="http://cnx.rice.edu/mdml/0.4">
  <!-- WARNING! The 'metadata' section is read only. Do not edit below.
       Changes to the metadata section in the source will not be saved. -->
  <md:content-id>m13540</md:content-id>
  <md:title>Monte Carlo Simulations of Allele Frequencies</md:title>
  <md:version>1.4</md:version>
  <md:created>2006/03/22 15:11:52 US/Central</md:created>
  <md:revised>2009/03/17 11:06:38.414 GMT-5</md:revised>
  <md:authorlist>
    <md:author id="qnguyen">
        <md:firstname>Quoclinh</md:firstname>
        <md:surname>Nguyen</md:surname>
        <md:fullname>Quoclinh Nguyen</md:fullname>
        <md:email>qnguyen5@ucmerced.edu</md:email>
    </md:author>
    <md:author id="masakatsu_w">
        <md:firstname>Masakatsu</md:firstname>
        <md:surname>Watanabe</md:surname>
        <md:fullname>Masakatsu Watanab e</md:fullname>
        <md:email>mwatanabe@ucmerced.edu</md:email>
    </md:author>
  </md:authorlist>
  <md:maintainerlist>
    <md:maintainer id="qnguyen">
        <md:firstname>Quoclinh</md:firstname>
        <md:surname>Nguyen</md:surname>
        <md:fullname>Quoclinh Nguyen</md:fullname>
        <md:email>qnguyen5@ucmerced.edu</md:email>
    </md:maintainer>
    <md:maintainer id="masakatsu_w">
        <md:firstname>Masakatsu</md:firstname>
        <md:surname>Watanabe</md:surname>
        <md:fullname>Masakatsu Watanab e</md:fullname>
        <md:email>mwatanabe@ucmerced.edu</md:email>
    </md:maintainer>
  </md:maintainerlist>
  <md:license href="http://creativecommons.org/licenses/by/2.0/"/>
  <md:licensorlist>
    <md:licensor id="qnguyen">
        <md:firstname>Quoclinh</md:firstname>
        <md:surname>Nguyen</md:surname>
        <md:fullname>Quoclinh Nguyen</md:fullname>
        <md:email>qnguyen5@ucmerced.edu</md:email>
    </md:licensor>
    <md:licensor id="masakatsu_w">
        <md:firstname>Masakatsu</md:firstname>
        <md:surname>Watanabe</md:surname>
        <md:fullname>Masakatsu Watanab e</md:fullname>
        <md:email>mwatanabe@ucmerced.edu</md:email>
    </md:licensor>
  </md:licensorlist>
  <md:keywordlist>
    <md:keyword>Allele, computational biology, modeling, python</md:keyword>
  </md:keywordlist>
  <md:subjectlist>
    <md:subject>Science and Technology</md:subject>
  </md:subjectlist>
  <md:abstract>In this assignment you will apply Monte Carlo computer simulations to study the effects of genetic drift and selection pressure on a single locus with two alleles: “A” and “a”.  You will be running these simulations on a computer using a simple simulation program written in Python (see appendix if this is not on your computer).  Note that you do not need to know how to program in Python in order to do this lab.  Each exercise will involve editing one or two lines at the top of the file and then running the simulation for a few minutes and analyzing the output</md:abstract>
  <md:language>en</md:language>
  <!-- WARNING! The 'metadata' section is read only. Do not edit above.
       Changes to the metadata section in the source will not be saved. -->
</metadata>
<content>
<para id="element-16"><term>BIS 1 COMPUTER ASSIGNMENT #2</term></para><section id="id35500104">
<title>Monte Carlo Simulations of Allele Frequencies</title>
<para id="id35595284">IMPORTANT: You must generate your own answers
to all questions.</para>
<para id="id35389270">In this assignment you will apply Monte Carlo
computer simulations to study the effects of genetic drift and
selection pressure on a single locus with two alleles: “A” and “a”.
You will be running these simulations on a computer using a simple
simulation program written in Python (see appendix if this is not
on your computer). Note that you do not need to know how to program
in Python in order to do this lab. Each exercise will involve
editing one or two lines at the top of the file and then running
the simulation for a few minutes and analyzing the output.</para>
<para id="id35498003">The simulation software carries out the
simulation shown in the following flowchart. The initial population
is 100% heterozygous (Aa). Each generation the population is
exactly replaced by new individuals and all of the old individuals
die off. In the simulations in Part II, in which some part of the
population is eliminated by selection, the population is restored
to the original population size in the next generation. This might
seem artificial, but it actually is not a bad model of species that
produce a huge number of offspring only a fraction of which survive
to reproduce due to limitations in food supply or space.</para>
<figure id="id35432474"><media id="id16976523" alt=""><image src="bis1-ws3-2.png" mime-type="image/png"/></media>
</figure>
<para id="id35392627">Figure 1 
<emphasis>Flowchart for simulation program. The program will
simulate 100 different populations for each set of conditions you
specify and simulate each population for 125
generations.</emphasis></para>
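<para id="sketch-generation-lead">The generation loop described above can be sketched in a few lines of Python. This is only an illustration, not the code of the Population Allele Simulation program. It assumes random mating in which every offspring draws two alleles at random from the gene pool of its parents' generation, it is written for Python 3, and the names <code>next_generation</code> and <code>allele_a_frequency</code> are made up for this sketch.</para>
<code id="sketch-generation-code" display="block"><![CDATA[
```python
import random

def next_generation(population, rng):
    """Build a new population of the same size by random mating."""
    # Pool every allele carried by the current generation.
    gene_pool = [allele for genotype in population for allele in genotype]
    # Each offspring receives two alleles drawn at random from the pool.
    return [(rng.choice(gene_pool), rng.choice(gene_pool)) for _ in population]

def allele_a_frequency(population):
    """Fraction of all alleles in the population that are 'A'."""
    count_a = sum(genotype.count('A') for genotype in population)
    return count_a / (2 * len(population))

rng = random.Random(1)            # fixed seed so the run can be repeated
population = [('A', 'a')] * 100   # the initial population is 100% heterozygous
for generation in range(125):
    # The old individuals are replaced completely by the new ones.
    population = next_generation(population, rng)
print(allele_a_frequency(population))
```
]]></code>
<para id="sketch-generation-note">Running the loop again with a different seed gives a different final frequency. That run-to-run variation is the genetic drift studied in Part I.</para>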
<para id="id35596292">In most cases the output will be a histogram
showing the distribution of allele A frequencies across the set of
simulated populations. For example, the following histogram is the
result of simulating 100 separate populations. The ranges on the
x-axis give the allele A frequency in percent, and the length of
each bar shows how many populations had an allele A frequency in
that range. So, of the 100 simulated populations, 25 ended up with
an allele A frequency between 50 and 60 percent, while just 1
population had an allele A frequency below 10 percent.</para>
<para id="id35389434">
<figure id="id35595118">
<media id="id5144825" alt=""><image src="Graphic2.png" mime-type="image/png"/></media>
</figure>
</para>
<para id="id35393573">Figure 2 
<emphasis>sample output</emphasis></para>
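<para id="sketch-histogram-lead">The way such a histogram is built can be sketched as follows. The frequencies below are made up for illustration, the function name <code>histogram_counts</code> is hypothetical, and the choice to put a value on a boundary into the higher range is an assumption. The simulation program may handle boundaries differently.</para>
<code id="sketch-histogram-code" display="block"><![CDATA[
```python
def histogram_counts(frequencies, bins=10):
    """Count how many populations fall into each allele A frequency range."""
    counts = [0] * bins
    for frequency in frequencies:
        # A frequency of exactly 1.0 is placed in the last range.
        index = min(int(frequency * bins), bins - 1)
        counts[index] += 1
    return counts

# Made-up final allele A frequencies for eight populations.
final_frequencies = [0.05, 0.42, 0.48, 0.51, 0.55, 0.58, 0.63, 1.0]
counts = histogram_counts(final_frequencies)
for i, count in enumerate(counts):
    # One star per population; the 50-60% row gets three stars here.
    print("%3d-%3d%%  %s" % (i * 10, (i + 1) * 10, "*" * count))
```
]]></code>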
<para id="id35393230">General instructions for using the simulation
software:</para>
<para id="id35393260">Step 1: Go to 
<link url="http://bioinformatics.ucmerced.edu/resources/biological_sciences_1">
http://bioinformatics.ucmerced.edu/resources/biological_sciences_1</link>. Then download
the “Population Allele Simulation” file for Assignment 2 to your
Desktop.</para>
<para id="id35389106">Step 2: Start the Python interpreter:
Start::Programs::Python2.4::IDLE(Python GUI). This will start a
Python Shell window. (This location might vary slightly for
different versions of Python.) If you don’t have Python on your
computer, please refer to Appendix 1 of this assignment to install
Python.</para>
<para id="id35500351">Step 3: In the File menu of the Python Shell,
select Open, navigate to the Desktop, and open the file called
Population Allele Simulation (which you downloaded in Step 1). This
will open a new window showing the simulation program.</para>
<para id="id35389598">Note: You may see a Python icon next to your
saved file. If so, just double-click the icon to run the program.</para>
<figure id="id35497426">
<media id="id4509245" alt=""><image src="Graphic3.png" mime-type="image/png"/></media>
</figure>
<para id="id35595336">Figure 3 
<emphasis>python icon</emphasis></para>
<para id="element-66">Step 4: Two windows will then open: a console window and a
graphical user interface (GUI) window.</para><para id="id35390362"><figure id="id35391046"><media id="id12875047" alt=""><image src="bis1-ws3-1.png" mime-type="image/png"/></media>
	</figure></para>
<para id="id35499830">Figure 4. 
<emphasis>This Python program opens console and graphical user
interface windows.</emphasis></para>
<para id="id35392765">Step 5: Now select your BIS 1 section from
the menu bar.
<emphasis>This is very important. Otherwise, the “Run Simulation”
button in the dotted circle of the above figure cannot be turned
on.</emphasis></para>
<para id="id35389489">Step 6: You can run a simulation by clicking
the Run Simulation button in the GUI window. The simulation will now
start running. In the Python console window, you will see a series
of numbers printing out on the first line. This is just an indicator
of the number of populations completed. Since we’re running 100
populations, you’ll be done shortly after this number reaches
90.</para>
</section>
<section id="id35497756">
<title>Part I: Modeling Genetic Drift</title>
<para id="id35392841">In this set of experiments, you will be
testing the effect of population size on genetic drift. This
will involve running a number of simulations with different
population sizes without selection or bottlenecks. Since genetic
drift arises from random fluctuations in allele frequencies, it’s
not surprising that the size of the population is a key parameter
in the rate of genetic drift.</para>
<para id="id35500433">Step 1: Open the Population Allele Simulation
as described above. Check to make sure that the simulation type is
set to “Drift”.</para>
<para id="id35498028">Step 2: Set the population size to 1000.
Next, run the simulation and record in the table below the mean
(Ave) and standard deviation (SD) for each generation. These values
are printed on the Python GUI, for example:</para>
<para id="id35500569">
<code>Number of generation: 125</code>
</para>
<para id="id35390719">
<code>Mean: 0.51</code>
</para>
<para id="id35641580">
<code>Standard deviation: 0.24</code>
</para>
<para id="id35599987">Be sure to record the data for every
generation listed in the table, including the final one (generation 125).
<emphasis>You can check the histograms, means, and standard
deviations of the different generations by sliding the “Generation”
scale on the GUI.</emphasis></para>
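<para id="sketch-stats-lead">For reference, the mean and standard deviation of a set of allele A frequencies can be computed as in the short sketch below. The five frequencies are made up, and the use of the population standard deviation (<code>statistics.pstdev</code>) is an assumption, since the assignment does not say which form the simulation program reports.</para>
<code id="sketch-stats-code" display="block"><![CDATA[
```python
import statistics

# Made-up allele A frequencies of five populations at one generation.
frequencies = [0.40, 0.55, 0.50, 0.65, 0.45]

mean = statistics.mean(frequencies)
# pstdev divides by the number of populations; statistics.stdev would
# divide by one less. Which of the two the program reports is not stated.
sd = statistics.pstdev(frequencies)

print("Mean: %.2f" % mean)
print("Standard deviation: %.2f" % sd)
```
]]></code>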
<para id="id35392259">Step 3: Run five more simulations with
different population sizes (500, 250, 100, 50, and 25), and
fill in the table below with the means and standard deviations of the
allele distributions at each of the generations printed. If you
feel like it, you can run some additional simulations with other
population sizes, but note that populations over 1000 will start to
take a lot of computer time. Note: After you finish the simulation
for each population size, be sure to save a histogram of each
generation by selecting Print:Save as a Postscript in the menu.
You will then have a record of your results and histograms. To
print these PostScript files, please read Appendix 2 or 3.</para>
<table id="id35602203" summary="">
<tgroup cols="13">
<colspec colnum="1" colname="c1"/>
<colspec colnum="2" colname="c2"/>
<colspec colnum="3" colname="c3"/>
<colspec colnum="4" colname="c4"/>
<colspec colnum="5" colname="c5"/>
<colspec colnum="6" colname="c6"/>
<colspec colnum="7" colname="c7"/>
<colspec colnum="8" colname="c8"/>
<colspec colnum="9" colname="c9"/>
<colspec colnum="10" colname="c10"/>
<colspec colnum="11" colname="c11"/>
<colspec colnum="12" colname="c12"/>
<colspec colnum="13" colname="c13"/>
<tbody>
<row>
<entry>Pop</entry>
<entrytbl namest="c2" nameend="c13" cols="6">
<colspec colnum="1" colname="c1"/>
<colspec colnum="2" colname="c2"/>
<colspec colnum="3" colname="c3"/>
<colspec colnum="4" colname="c4"/>
<colspec colnum="5" colname="c5"/>
<colspec colnum="6" colname="c6"/>
<tbody>
<row>
<entry namest="c1" nameend="c6">Generation</entry>
</row>
<row>
<entry>0</entry>
<entry>25</entry>
<entry>50</entry>
<entry>75</entry>
<entry>100</entry>
<entry>125</entry>
</row>
</tbody>
</entrytbl>
</row>
<row>
<entry/>
<entry>Ave</entry>
<entry>SD</entry>
<entry>Ave</entry>
<entry>SD</entry>
<entry>Ave</entry>
<entry>SD</entry>
<entry>Ave</entry>
<entry>SD</entry>
<entry>Ave</entry>
<entry>SD</entry>
<entry>Ave</entry>
<entry>SD</entry>
</row>
<row>
<entry>1000</entry>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
</row>
<row>
<entry>500</entry>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
</row>
<row>
<entry>250</entry>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
</row>
<row>
<entry>100</entry>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
</row>
<row>
<entry>50</entry>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
</row>
<row>
<entry>25</entry>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
</row>
<row>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
</row>
<row>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
<entry/>
</row>
</tbody>
</tgroup>
</table>
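<para id="sketch-drift-table-lead">The experiment behind this table can be sketched in Python as well. The sketch below is not the lab program. It assumes random mating, tracks only the number of A alleles instead of individual genotypes, and uses 20 populations per size so that it runs quickly. The function name <code>drift_snapshots</code> is hypothetical. Snapshots are taken every 25 generations to match the table columns.</para>
<code id="sketch-drift-table-code" display="block"><![CDATA[
```python
import random
import statistics

def drift_snapshots(pop_size, generations, every, rng):
    """Follow one population; return allele A frequency at each snapshot."""
    copies = 2 * pop_size          # each diploid individual carries 2 alleles
    count_a = pop_size             # 100% Aa start: half of all alleles are A
    snapshots = [count_a / copies]
    for generation in range(1, generations + 1):
        p = count_a / copies
        # Draw every allele of the next generation at random from the pool.
        count_a = sum(1 for _ in range(copies) if rng.random() < p)
        if generation % every == 0:
            snapshots.append(count_a / copies)
    return snapshots

rng = random.Random(2)
results = {}
for pop_size in (100, 50, 25):
    # 20 populations per size keeps this sketch fast; the lab program uses 100.
    runs = [drift_snapshots(pop_size, 125, 25, rng) for _ in range(20)]
    # Regroup so that each entry holds one generation across all populations.
    by_generation = list(zip(*runs))
    results[pop_size] = [(statistics.mean(f), statistics.pstdev(f))
                         for f in by_generation]
    print(pop_size, ["%.2f/%.2f" % pair for pair in results[pop_size]])
```
]]></code>
<para id="sketch-drift-table-note">Each printed row lists Ave/SD pairs for generations 0, 25, 50, 75, 100, and 125, in the same order as the table columns.</para>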
</section>
<section id="id35642596">
<title>Part II. Selection Pressure</title>
<para id="id35642608">In the genetic drift simulations above, all
genotypes (AA, Aa, aA, and aa) had the same probability of
surviving and reproducing. In this set of simulations you will
adjust the “fitness” of different genotypes and observe the effects
on the allele distributions. Note that in this study the population
is set to 500.</para>
<para id="id35642628">Step 1: Start the simulation program using the
same steps as above.</para>
<para id="id35642644">Step 2: Now set the simulation type to
“Selection”.</para>
<para id="id35642657">Step 3: First run a “control” simulation with
all fitnesses set to 1.0 (the default). Record the results in the table
below.</para>
<para id="id35642676">Step 4: Set the survival rates to slightly
disfavor the homozygous recessive genotype aa and record the
results in the table below.</para>
<para id="id35642694">
<code># Set survival rates for different genotypes</code>
</para>
<para id="id35642702">
<code>
selection={('A','A'):1.,('A','a'):1.,('a','A'):1.,('a','a'):0.98}</code>
</para>
<para id="id35642711">Also run this simulation with the aa fitness
set to 0.95.</para>
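<para id="sketch-selection-lead">One way such a table of survival rates could act on a population is sketched below. This is an assumption about the mechanism, not the lab program's actual code: each individual survives with the probability listed for its genotype, and the survivors then produce a full-size next generation by random mating. The function names <code>survivors</code> and <code>refill</code> are hypothetical.</para>
<code id="sketch-selection-code" display="block"><![CDATA[
```python
import random

# Survival rates from Step 4: genotype aa is slightly disfavored.
selection = {('A','A'):1., ('A','a'):1., ('a','A'):1., ('a','a'):0.98}

def survivors(population, selection, rng):
    """Keep each individual with the survival probability of its genotype."""
    # rng.random() is always below 1.0, so a rate of 1.0 means certain survival.
    return [genotype for genotype in population
            if rng.random() < selection[genotype]]

def refill(parents, pop_size, rng):
    """Restore the original population size by random mating of survivors."""
    gene_pool = [allele for genotype in parents for allele in genotype]
    return [(rng.choice(gene_pool), rng.choice(gene_pool))
            for _ in range(pop_size)]

rng = random.Random(3)
population = [('A', 'a')] * 500    # Part II uses a population of 500
for generation in range(125):
    population = refill(survivors(population, selection, rng), 500, rng)

count_a = sum(genotype.count('A') for genotype in population)
print("Allele A frequency: %.2f" % (count_a / 1000))
```
]]></code>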
<para id="id35642716">Step 5: Set the survival rates to slightly
disfavor both homozygous genotypes and record the results in the
table below.</para>
<para id="id35642737">
<code># Set survival rates for different genotypes</code>
</para>
<para id="id35642746">
<code>
selection={('A','A'):.95,('A','a'):1.,('a','A'):1.,('a','a'):0.95}</code>
</para>
<para id="id35642754">Step 6: Choose four other fitness
combinations, run simulations on them, and record the results in
the table.</para>
<para id="id35642773">Results table for Selection Simulations (Part
II)</para>
<para id="id35642782">
<table id="id35642802" summary="">
<tgroup cols="3">
<colspec colnum="1" colname="c1"/>
<colspec colnum="2" colname="c2"/>
<colspec colnum="3" colname="c3"/>
<tbody>
<row>
<entry>Allele Fitnesses</entry>
<entry>Allele A frequency: Mean</entry>
<entry>Allele A frequency: Standard Deviation</entry>
</row>
<row>
<entry>AA=1; Aa=aA=1; aa=1 (control)</entry>
<entry/>
<entry/>
</row>
<row>
<entry>AA=1; Aa=aA=1; aa=0.98</entry>
<entry/>
<entry/>
</row>
<row>
<entry>AA=1; Aa=aA=1; aa=0.95</entry>
<entry/>
<entry/>
</row>
<row>
<entry>AA=0.95; Aa=aA=1; aa=0.95</entry>
<entry/>
<entry/>
</row>
<row>
<entry>AA=___; Aa=aA=___; aa=___</entry>
<entry/>
<entry/>
</row>
<row>
<entry>AA=___; Aa=aA=___; aa=___</entry>
<entry/>
<entry/>
</row>
<row>
<entry>AA=___; Aa=aA=___; aa=___</entry>
<entry/>
<entry/>
</row>
<row>
<entry>AA=___; Aa=aA=___; aa=___</entry>
<entry/>
<entry/>
</row>
</tbody>
</tgroup>
</table>
</para>
<section id="id35644285">
<title>Appendix 1 – Python</title>
<para id="id35644301">To do this assignment, you need to have
Python on your computer. Python is an interpreted, interactive,
object-oriented programming language. If Python is not yet installed
on your computer, please follow the instructions below to install
it. Installing Python is simple:</para>
<para id="id35644335">1 If you don’t have Python on your computer,
go to 
<link url="http://www.python.org/download/">
http://www.python.org/download/</link></para>
<para id="id35644360">2 Select appropriate Python software for your
computer.</para>
<para id="id35644368">For Windows users, select “Python
2.4.2 Windows installer (Windows Binary version)”.</para>
<para id="id35644391">For Mac users, select “Python 2.3 OS X 10.2
installer”.</para>
<para id="id35644406">3 Save the installer file on your local
machine, and then double-click python-2.4.2.msi (for Windows users)
or Macpython-OSX-2.3-1.dmg (for Mac users).</para>
<para id="id35644434">For Windows users: if double-clicking the MSI
file doesn’t work, go to 
<link url="http://www.python.org/2.4.2/">
http://www.python.org/2.4.2/</link>. Then follow the instructions
in the “Download the release” section.</para>
<para id="id35644455">4 Done! You are ready to go.</para>
<para id="id35644465">As a general computer language, Python
combines remarkable programming power with very clear syntax. If
you want to learn more about Python programming, the book
“How to Think Like a Computer Scientist: Learning with
Python” is available for free on the web. You can go to 
<link url="http://greenteapress.com/thinkpython/html/">
http://greenteapress.com/thinkpython/html/</link> and read this
web-book. It is also available as a PDF file for download at: 
<link url="http://greenteapress.com/thinkpython/thinkCSpy.pdf">
http://greenteapress.com/thinkpython/thinkCSpy.pdf</link>.</para>
</section>
<section id="id35644521">
<title>Appendix 2 – Printing Postscript files at UC Merced Library
computer labs</title>
<para id="id35644534">In this assignment, you have saved all
histograms in a postscript format. Postscript is a programming
language optimized for printing graphics and text. To print these
files at the UCM library computer lab, please follow these
steps:</para>
<list id="id35644542" list-type="enumerated">
<item>Double-click the postscript file.</item>
<item>You may then see the following “Open With” window. If so,
select “ArcPress extension for Arc/Info” and press OK. (If you do
not see this window, go to Step 3.)</item>
</list>
<figure id="id35644568">
<media id="id7298277" alt=""><image src="Graphic4.png" mime-type="image/png"/></media>
</figure>
<para id="id35644592">Figure 5 
<emphasis>how to postscript</emphasis></para>
<list id="id35644613" list-type="enumerated">
<item>You will then see the “Print” window. Select “Postscript”
from the Driver menu, and then press “Print..” to print the
file.</item>
</list>
<para id="id35644633">
<figure id="id35644646"><media id="id4047447" alt=""><image src="bis1-ws3-3.png" mime-type="image/png"/></media>
</figure>

</para>
<para id="id35644864">Figure 6 
<emphasis>how to postscript</emphasis></para>
</section>
<section id="id35644882">
<title>Appendix 3 – Printing and Viewing Postscript files from your
computer</title>
<para id="id35644895">You may need a PostScript interpreter
on your computer to view or print PostScript files from your
computer. Installing a PostScript interpreter is
simple:</para>
<para id="id35644912">For Windows users,</para>
<list id="id35644917" list-type="enumerated">
<item>Go to 
<link url="http://www.rops.org/download/rops65c.exe">
http://www.rops.org/download/rops65c.exe</link> and save the
executable file on your desktop.</item>
<item>Double-click Rops executable to install the program.</item>
<item>Done! You are ready to go.</item>
</list>
</section><exercise id="element-32"><problem id="id17874875">
		<para id="element-287">
			In Part I, what do your results qualitatively (i.e. non-mathematically) indicate about the rate of genetic drift in populations of different sizes?
		</para>
	</problem>

	<solution id="id4008837">
		<para id="element-14">
			Insert Solution Text Here
		</para>
	</solution>
</exercise><exercise id="element-725"><problem id="id17379816">
		<para id="element-901">
			Look at the histograms you saved in Part I.  Do they seem to follow a Gaussian distribution (i.e. a bell-shaped curve)?
		</para>
	</problem>

	<solution id="id15042907">
		<para id="element-713">
			Insert Solution Text Here
		</para>
	</solution>
</exercise><exercise id="element-551"><problem id="id16892648">
		<para id="element-30">
			On a piece of graph paper (or using Excel on the computer), graph the mean and standard deviation of the allele frequencies versus the population size.  Does the mean allele frequency change with the size of the population?  Does the standard deviation?  Is there a simple mathematical relationship you can see?  (Feel free to run more simulations to test this.)
		</para>
	</problem>

	<solution id="id4151024">
		<para id="element-668">
			Insert Solution Text Here
		</para>
	</solution>
</exercise><exercise id="element-512"><problem id="id4136614">
		<para id="element-152">
			In Part II, do your simulation results make intuitive sense? (explain in a few words why or why not)
		</para>
	</problem>

	<solution id="id7371560">
		<para id="element-929">
			Insert Solution Text Here
		</para>
	</solution>
</exercise><exercise id="element-383"><problem id="id4433845">
		<para id="element-45">
			How big an effect was caused by changing the survival rate of genotype “aa” from 0.98 to 0.95?  How would you expect this to change if you had a much larger population (your simulation was of a population of 500 individuals)?
		</para>
	</problem>

	<solution id="id4297657">
		<para id="element-670">
			Insert Solution Text Here
		</para>
	</solution>
</exercise>
</section>
</content>
</document>

