Perl Exercises
read_and_write
Read from an input file and write its content to an output file. Assume the two filenames are provided on the command line.
Hint: you need to know about filehandles before attempting this exercise.
$ cat input.txt
George Washington
John Adams
Thomas Jefferson
James Madison
James Monroe
$ perl read_and_write.pl input.txt output.txt
$ cat output.txt
George Washington
John Adams
Thomas Jefferson
James Madison
James Monroe
See a sample answer here.
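As a small illustration of the filehandle hint, the sketch below copies lines from one filehandle to another. It is not the sample answer: the sub name `copy_lines` and the two-line sample text are hypothetical, and in-memory filehandles (opened on scalar references) stand in for real files so the snippet runs on its own.

```perl
use strict;
use warnings;

# Copy every line from one filehandle to another and
# return the number of lines copied. (Hypothetical helper.)
sub copy_lines {
    my ($in, $out) = @_;
    my $count = 0;
    while (my $line = <$in>) {
        print {$out} $line;
        $count++;
    }
    return $count;
}

# Assumption: in-memory filehandles replace real files here,
# purely to keep the sketch self-contained.
my $source = "George Washington\nJohn Adams\n";
my $copy   = '';
open(my $in,  '<', \$source) or die "Cannot open input: $!";
open(my $out, '>', \$copy)   or die "Cannot open output: $!";
my $copied = copy_lines($in, $out);
close($in);
close($out) or die "Cannot close output: $!";
print "Copied $copied lines\n";
```

For the exercise itself, the same three-argument `open` would presumably take the filenames from `@ARGV` instead of scalar references, for example `open(my $in, '<', $ARGV[0])`.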
count_first_name
Read an input file that contains several names (assume one name per line, with the first name and the last name separated by a space). Count the number of times each first name appears in the input. Print the result to STDOUT, with the names sorted alphabetically.
Hint: you need to know about filehandles, arrays, hashes, and sorting before attempting this exercise.
$ cat input.txt
George Washington
John Adams
Thomas Jefferson
James Madison
James Monroe
$ perl count_first_name.pl input.txt
George appeared 1 times
James appeared 2 times
John appeared 1 times
Thomas appeared 1 times
See a sample answer here.
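The hash-and-sort idea from the hint can be sketched as follows. This is only an illustration under stated assumptions: the sub name `count_first_words` and the three sample names are hypothetical, and the lines come from an array rather than a file.

```perl
use strict;
use warnings;

# Count how often each first word occurs in a list of lines.
# (Hypothetical helper; returns a hash of word => count.)
sub count_first_words {
    my (@lines) = @_;
    my %count;
    for my $line (@lines) {
        # split ' ' splits on whitespace and skips leading blanks
        my ($first) = split ' ', $line;
        next unless defined $first;   # ignore empty lines
        $count{$first}++;
    }
    return %count;
}

# Assumption: sample data invented for this sketch.
my @names = ("Ada Lovelace", "Grace Hopper", "Ada Yonath");
my %seen  = count_first_words(@names);

# sort keys gives the default (string) ordering
for my $name (sort keys %seen) {
    print "$name appeared $seen{$name} times\n";
}
```

A hash fits here because each first name is a natural key, and `sort keys` yields the alphabetical order the exercise asks for.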
unwrap_fasta
Read a fasta file, unwrap the sequences (i.e., remove all extra line breaks), and save the result to an output file.
Hint: learn about the $/ variable (input record separator) before attempting this exercise.
$ cat coding.fasta
>NP_414542
ATGAAACGCATTAGCACCACCATT
accaccaccatcaccattaccacaggta
ACGGTGCGGGCTGA
>NP_414617
ATGACTCACATCGTTCGCTTTA
TCGGTCTACTA
ctactaaacgcatcttctttgcgcggta
GACGAGTGAGCGGCATCCAGCATTAA
$ perl unwrap_fasta.pl coding.fasta unwrap.fasta
$ cat unwrap.fasta
>NP_414542
ATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGA
>NP_414617
ATGACTCACATCGTTCGCTTTATCGGTCTACTACTACTAAACGCATCTTCTTTGCGCGGTAGACGAGTGAGCGGCATCCAGCATTAA
See a sample answer here.
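To show what the `$/` hint is pointing at, here is a minimal sketch that sets the input record separator to ">" so that each read returns one whole FASTA-like record. The sub name `read_records` and the sample text are hypothetical, the text is read from an in-memory filehandle, and the sequence is upper-cased only because the expected output above appears in upper case.

```perl
use strict;
use warnings;

# Read FASTA-like text record by record by setting $/ to ">".
# Returns a list of [header, sequence] pairs. (Hypothetical helper.)
sub read_records {
    my ($fh) = @_;
    local $/ = ">";              # each read now ends at the next ">"
    my @records;
    while (my $chunk = <$fh>) {
        chomp $chunk;            # chomp strips the trailing ">" (the current $/)
        next if $chunk eq '';    # nothing precedes the first ">"
        my ($header, @seq_lines) = split /\n/, $chunk;
        push @records, [ $header, uc(join('', @seq_lines)) ];
    }
    return @records;
}

# Assumption: sample text invented for this sketch.
my $text = ">id1\nACG\ntga\n>id2\nTT\nAA\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory text: $!";
my @records = read_records($fh);
close($fh);

print ">$_->[0]\n$_->[1]\n" for @records;
```

Using `local` restores the previous value of `$/` when the sub returns, so other reads in the same program keep their usual line-by-line behaviour.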
display_codon
Read a fasta file that contains protein-coding sequences. Re-format the sequences to show codons (10 codons per line) in the output file.
Hint: learn about the substr function or regular expressions before attempting this exercise.
$ cat coding.fasta
>NP_414542
ATGAAACGCATTAGCACCACCATT
accaccaccatcaccattaccacaggta
ACGGTGCGGGCTGA
>NP_414617
ATGACTCACATCGTTCGCTTTA
TCGGTCTACTA
ctactaaacgcatcttctttgcgcggta
GACGAGTGAGCGGCATCCAGCATTAA
$ perl display_codon_1.pl coding.fasta codon.fasta
$ cat codon.fasta
>NP_414542
ATG AAA CGC ATT AGC ACC ACC ATT ACC ACC
ACC ATC ACC ATT ACC ACA GGT AAC GGT GCG
GGC TGA
>NP_414617
ATG ACT CAC ATC GTT CGC TTT ATC GGT CTA
CTA CTA CTA AAC GCA TCT TCT TTG CGC GGT
AGA CGA GTG AGC GGC ATC CAG CAT TAA
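The `substr` approach mentioned in the hint might look like the sketch below, which cuts an already unwrapped sequence into three-letter pieces and groups them into lines. The sub name `codon_lines`, its parameters, and the short test sequence are hypothetical; reading and writing the fasta files is left out.

```perl
use strict;
use warnings;

# Split a sequence into codons with substr, then join them
# into lines of $per_line codons each. (Hypothetical helper.)
sub codon_lines {
    my ($seq, $per_line) = @_;
    my @codons;
    for (my $i = 0; $i < length($seq); $i += 3) {
        push @codons, substr($seq, $i, 3);
    }
    my @lines;
    # splice removes up to $per_line codons from the front each time
    while (my @chunk = splice(@codons, 0, $per_line)) {
        push @lines, join(' ', @chunk);
    }
    return @lines;
}

# Assumption: a short made-up sequence, 3 codons per line for brevity.
print "$_\n" for codon_lines('ATGAAACGCTGA', 3);
```

A regular expression is a possible alternative to the `substr` loop: `my @codons = $seq =~ /(.{1,3})/g;` collects the same pieces in one statement.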
count_EcoRI_site
Read a fasta file and count the number of EcoRI restriction sites in each sequence.
Hint: learn about regular expressions before attempting this exercise.
$ cat EcoRI.fasta
>Seq_1
nnnGAA
TTCnnnGAATTCnnnGaattCnnn
>Seq_2
nnnGAATTCnnngAa
TtCnnn
$ perl count_EcoRI_site.pl EcoRI.fasta
Seq_1 has 3 EcoRI sites
Seq_2 has 2 EcoRI sites
See a sample answer here.
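Counting pattern matches with a regular expression can be sketched as follows. The sub name `count_matches` and the motif `AGCT` are hypothetical choices for illustration; the `/g` modifier finds every non-overlapping match and `/i` ignores case.

```perl
use strict;
use warnings;

# Count non-overlapping, case-insensitive matches of a pattern.
# (Hypothetical helper.)
sub count_matches {
    my ($text, $pattern) = @_;
    # Assigning the match list to () and then to a scalar
    # yields the number of matches.
    my $count = () = $text =~ /$pattern/gi;
    return $count;
}

# Assumption: made-up sequence and motif for this sketch.
my $hits = count_matches('nnAGCTnnagctnnAgCtnn', 'AGCT');
print "Found $hits matches\n";
```

Judging by the expected output above, a site that is interrupted by whitespace in the input still counts, so the sequence would presumably need its line breaks and spaces removed before matching.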