Perl Exercises

read_and_write

Read from an input file and write its content to an output file. Assume the two filenames are provided on the command line.

Hint: learn about filehandles before attempting this exercise.

$ cat input.txt 
George Washington
John Adams
Thomas Jefferson
James Madison
James Monroe
$ perl read_and_write.pl input.txt output.txt
$ cat output.txt 
George Washington
John Adams
Thomas Jefferson
James Madison
James Monroe

See a sample answer here.
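
A minimal sketch of the filehandle pattern the hint refers to, not the wiki's sample answer. The helper name `copy_file` is hypothetical; the script copies only when exactly two filenames are passed on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy every line of $in_path to $out_path.
sub copy_file {
    my ($in_path, $out_path) = @_;
    # Three-argument open with lexical filehandles; die with the OS error on failure.
    open(my $in,  '<', $in_path)  or die "Cannot read $in_path: $!";
    open(my $out, '>', $out_path) or die "Cannot write $out_path: $!";
    while (my $line = <$in>) {
        print {$out} $line;
    }
    close($in);
    close($out) or die "Cannot close $out_path: $!";
    return;
}

# Run only when both filenames are given on the command line.
if (@ARGV == 2) {
    copy_file(@ARGV);
}
```

Saved as read_and_write.pl, this would be invoked as in the transcript above: `perl read_and_write.pl input.txt output.txt`.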

parse_first_name

Read from an input file (containing a list of full names) and write the first names to an output file. Assume the two filenames are provided on the command line.

Hint: learn about filehandles and arrays (or regular expressions) before attempting this exercise.

$ cat input.txt 
George Washington
John Adams
Thomas Jefferson
James Madison
James Monroe
$ perl parse_first_name.pl input.txt first_name.txt
$ cat first_name.txt 
George
John
Thomas
James
James

See a sample answer here.
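
The array route mentioned in the hint could look roughly like the sketch below, which uses `split` to take the first whitespace-delimited field of each line. The helper name `first_name` is hypothetical, and the demonstration runs on in-memory data rather than a real input file.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first whitespace-delimited field of a line.
sub first_name {
    my ($line) = @_;
    chomp $line;
    # split ' ' skips leading whitespace and splits on runs of whitespace.
    my ($first) = split ' ', $line;
    return $first;
}

# Demonstration on in-memory data; a real script would read lines from a filehandle.
my @names = ("George Washington\n", "John Adams\n");
print first_name($_), "\n" for @names;
```

This prints `George` and `John`, one per line. A regular expression such as `/^(\S+)/` would be an equally reasonable choice.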

count_first_name

Read an input file that contains several names (assume one name per line, with the first name and the last name separated by a space). Count the number of times each first name appears in the input. Print the result to STDOUT, with the names sorted alphabetically.

Hint: learn about filehandles, arrays, hashes, and sorting before attempting this exercise.

$ cat input.txt 
George Washington
John Adams
Thomas Jefferson
James Madison
James Monroe
$ perl count_first_name.pl input.txt 
George appeared 1 times
James appeared 2 times
John appeared 1 times
Thomas appeared 1 times

See a sample answer here.
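
One way to combine a hash and `sort` for this kind of tally is sketched below. The helper name `count_first_names` is hypothetical, and the input is an in-memory list standing in for lines read from a file.

```perl
use strict;
use warnings;

# Hypothetical helper: tally first names from a list of "First Last" lines.
sub count_first_names {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        my ($first) = split ' ', $line;
        next unless defined $first;   # skip blank lines
        $count{$first}++;
    }
    return %count;
}

my %count = count_first_names("James Madison\n", "George Washington\n", "James Monroe\n");
# sort without a comparator orders the keys as strings (alphabetically for plain names).
for my $name (sort keys %count) {
    print "$name appeared $count{$name} times\n";
}
```

With this input the sketch prints `George appeared 1 times` followed by `James appeared 2 times`.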

write_lines_to_files

Obtain an input file and an output directory from the command line. Produce one output file for each line in the input file, using the line number (i.e., 1, 2, 3, etc.) as the filename.

Hint: learn about filehandles and directory operations before attempting this exercise.

$ cat input.txt 
George Washington
John Adams
Thomas Jefferson
James Madison
James Monroe
$ perl write_lines_to_files.pl input.txt output/
$ head output/*
==> output/1 <==
George Washington
 
==> output/2 <==
John Adams
 
==> output/3 <==
Thomas Jefferson
 
==> output/4 <==
James Madison
 
==> output/5 <==
James Monroe

See a sample answer here.
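
A rough sketch of the per-line output pattern follows. The helper name `split_lines` is hypothetical, and the demonstration uses an in-memory filehandle and a temporary directory (via the core `File::Temp` module) in place of real command-line arguments.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: write each line read from $in to "$out_dir/<line number>".
# Returns the number of files written.
sub split_lines {
    my ($in, $out_dir) = @_;
    unless (-d $out_dir) {
        mkdir $out_dir or die "Cannot create $out_dir: $!";
    }
    my $n = 0;
    while (my $line = <$in>) {
        $n++;
        open(my $out, '>', "$out_dir/$n") or die "Cannot write $out_dir/$n: $!";
        print {$out} $line;
        close($out) or die "Cannot close $out_dir/$n: $!";
    }
    return $n;
}

# Demonstration: in-memory input and a temporary directory that is removed on exit.
my $text = "George Washington\nJohn Adams\n";
open(my $in, '<', \$text) or die "Cannot open in-memory file: $!";
my $dir = tempdir(CLEANUP => 1);
my $written = split_lines($in, $dir);
close($in);
print "Wrote $written files\n";
```

The sketch keeps its own counter; Perl's built-in `$.` variable (current input line number) would serve the same purpose.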

combine_files

The opposite of write_lines_to_files: obtain an input directory and an output file from the command line, read the files in the input directory, and write their content to a single output file. Exclude hidden files in the input directory.

Hint: learn about filehandles, directory operations, and pattern matching before attempting this exercise.

$ head output/*
==> output/1 <==
George Washington
 
==> output/2 <==
John Adams
 
==> output/3 <==
Thomas Jefferson
 
==> output/4 <==
James Madison
 
==> output/5 <==
James Monroe
$ perl combine_files.pl output/ combined.txt
$ cat combined.txt 
George Washington
John Adams
Thomas Jefferson
James Madison
James Monroe

See a sample answer here.
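
The directory-reading and hidden-file filtering steps might be sketched as below. The helper names `visible_files` and `combine` are hypothetical. The sketch assumes that "hidden" means a name starting with a dot and that a plain string sort gives the wanted order, which holds for files named 1 through 9 but would need a numeric sort for longer numbered runs.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: list regular, non-hidden files in $dir, sorted by name.
sub visible_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    # Drop names starting with "." (this also removes "." and "..").
    my @names = grep { !/^\./ && -f "$dir/$_" } readdir($dh);
    closedir($dh);
    @names = sort @names;
    return @names;
}

# Hypothetical helper: write the content of each visible file to $out_path.
sub combine {
    my ($dir, $out_path) = @_;
    open(my $out, '>', $out_path) or die "Cannot write $out_path: $!";
    for my $name (visible_files($dir)) {
        open(my $in, '<', "$dir/$name") or die "Cannot read $dir/$name: $!";
        while (my $line = <$in>) {
            print {$out} $line;
        }
        close($in);
    }
    close($out) or die "Cannot close $out_path: $!";
    return;
}

# Demonstration in temporary directories: two visible files and one hidden file.
my $dir = tempdir(CLEANUP => 1);
my %content = (1 => "George Washington\n", 2 => "John Adams\n", '.hidden' => "skip me\n");
for my $name (keys %content) {
    open(my $fh, '>', "$dir/$name") or die "Cannot write $dir/$name: $!";
    print {$fh} $content{$name};
    close($fh);
}
my $out_dir = tempdir(CLEANUP => 1);
combine($dir, "$out_dir/combined.txt");
open(my $result, '<', "$out_dir/combined.txt") or die "Cannot read result: $!";
print while <$result>;
close($result);
```

The demonstration prints the two visible lines and omits the hidden file's content. Writing the output file outside the input directory avoids the output being picked up as an input.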

unwrap_fasta

Read a FASTA file, unwrap the sequences (i.e., remove all extra line breaks), and save the result to an output file. As the example shows, the output sequences are also converted to uppercase.

Hint: learn about the $/ variable (input record separator) before attempting this exercise.

$ cat coding.fasta 
>NP_414542
ATGAAACGCATTAGCACCACCATT
accaccaccatcaccattaccacaggta
ACGGTGCGGGCTGA
>NP_414617
ATGACTCACATCGTTCGCTTTA
TCGGTCTACTA
ctactaaacgcatcttctttgcgcggta
GACGAGTGAGCGGCATCCAGCATTAA
$ perl unwrap_fasta.pl coding.fasta unwrap.fasta
$ cat unwrap.fasta 
>NP_414542
ATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGA
>NP_414617
ATGACTCACATCGTTCGCTTTATCGGTCTACTACTACTAAACGCATCTTCTTTGCGCGGTAGACGAGTGAGCGGCATCCAGCATTAA

See a sample answer here.
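
The `$/` technique from the hint can be sketched as follows: setting the input record separator to `>` makes each read return one whole FASTA record. The helper name `unwrap_fasta` is hypothetical, the input is an in-memory string, and the sketch assumes that `>` never appears inside a header or sequence line.

```perl
use strict;
use warnings;

# Hypothetical helper: read FASTA records from $in and return them unwrapped,
# each as a ">header\nSEQUENCE\n" string with the sequence in uppercase.
sub unwrap_fasta {
    my ($in) = @_;
    local $/ = '>';              # one record per read: text up to the next ">"
    my @records;
    while (my $record = <$in>) {
        chomp $record;           # removes a trailing ">" (the current $/)
        next if $record eq '';   # the first read is the empty text before the first ">"
        my ($header, @seq_lines) = split /\n/, $record;
        push @records, ">$header\n" . uc(join('', @seq_lines)) . "\n";
    }
    return @records;
}

my $fasta = ">NP_414542\nATGAAA\ncgcatt\n>NP_414617\nATGACT\n";
open(my $in, '<', \$fasta) or die "Cannot open in-memory file: $!";
print unwrap_fasta($in);
close($in);
```

Using `local` restores the default record separator when the subroutine returns, so later line-by-line reads elsewhere in a script are unaffected.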

display_codon

Read a FASTA file that contains protein-coding sequences. Reformat the sequences to show codons (10 codons per line) in the output file.

Hint: learn about the substr function or regular expressions before attempting this exercise.

$ cat coding.fasta 
>NP_414542
ATGAAACGCATTAGCACCACCATT
accaccaccatcaccattaccacaggta
ACGGTGCGGGCTGA
>NP_414617
ATGACTCACATCGTTCGCTTTA
TCGGTCTACTA
ctactaaacgcatcttctttgcgcggta
GACGAGTGAGCGGCATCCAGCATTAA
$ perl display_codon_1.pl coding.fasta codon.fasta 
$ cat codon.fasta 
>NP_414542
ATG AAA CGC ATT AGC ACC ACC ATT ACC ACC
ACC ATC ACC ATT ACC ACA GGT AAC GGT GCG
GGC TGA 
>NP_414617
ATG ACT CAC ATC GTT CGC TTT ATC GGT CTA
CTA CTA CTA AAC GCA TCT TCT TTG CGC GGT
AGA CGA GTG AGC GGC ATC CAG CAT TAA

See sample answer 1 here and sample answer 2 here.
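
The regular-expression route could be sketched as below, operating on a sequence that has already been unwrapped. The helper name `format_codons` and its optional second parameter are hypothetical; the demonstration string is the start of the first sequence in the example.

```perl
use strict;
use warnings;

# Hypothetical helper: format an unwrapped sequence as space-separated codons,
# $per_line codons per line (10 when not specified).
sub format_codons {
    my ($seq, $per_line) = @_;
    $per_line = 10 unless defined $per_line;
    my $upper = uc $seq;
    # A global match in list context returns every captured group of up to 3 bases;
    # 1 or 2 leftover bases at the end form a short final group.
    my @codons = $upper =~ /(.{1,3})/g;
    my @lines;
    while (my @chunk = splice(@codons, 0, $per_line)) {
        push @lines, join(' ', @chunk);
    }
    return join("\n", @lines) . "\n";
}

print format_codons('ATGAAACGCATTAGCACCACCATTACCACCACCATC');
```

For this 12-codon input the sketch prints ten codons on the first line and the remaining two (`ACC ATC`) on the second. A `substr`-based loop stepping three characters at a time would be an equivalent alternative.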

count_EcoRI_site

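Read a FASTA file and count the number of EcoRI restriction sites (GAATTC) in each sequence. As the example shows, a site may span a line break and may appear in upper, lower, or mixed case.

Hint: learn about regular expressions before attempting this exercise.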

$ cat EcoRI.fasta 
>Seq_1
nnnGAA
TTCnnnGAATTCnnnGaattCnnn
>Seq_2
nnnGAATTCnnngAa
TtCnnn
$ perl count_EcoRI_site.pl EcoRI.fasta 
Seq_1 has 3 EcoRI sites
Seq_2 has 2 EcoRI sites

See a sample answer here.
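
Counting matches with a regular expression might be sketched as follows. The helper name `count_ecori` is hypothetical; the sketch strips whitespace first so that sites split across lines are found, and uses the `/i` modifier for case-insensitive matching.

```perl
use strict;
use warnings;

# Hypothetical helper: count GAATTC occurrences in a sequence,
# ignoring case and any line breaks inside the sequence.
sub count_ecori {
    my ($seq) = @_;
    $seq =~ s/\s+//g;                       # join wrapped lines first
    # The empty-list assignment forces list context, then yields the match count.
    my $count = () = $seq =~ /GAATTC/gi;
    return $count;
}

print count_ecori("nnnGAA\nTTCnnnGAATTCnnnGaattCnnn\n"), "\n";
```

For the Seq_1 sequence from the example this prints `3`. Because GAATTC cannot overlap with itself, a non-overlapping global match finds every site.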
