Finding motifs and position of motif in FASTA file - Perl

Can someone help me with this Perl code? When I run it, nothing happens. There are no errors or anything, which is weird to me. It opens and reads the file just fine. I believe the problem is in the while loop or the foreach loop, because I honestly don't think I understand them. I'm very new at this, with a pretty shit teacher.

Instructions: Declare a scalar variable called motif and make it AAA. Declare an array variable called locations, which is where the locations of the motif will be stored. Place the gene in a scalar variable. Now search for that motif in the amborella gene. The code should print the position of the motif and the motif found. You will need to write a while loop that searches for the motif and includes push, pos, and -length commands in order to save and report locations. Then you will need a foreach loop to print the locations and the motif. (If it only reports locations in the first line of the gene, remember that is because the gene is in a scalar variable that will only read the first line. That is acceptable.)
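The first-line limitation comes from `my $gene = <GENEFILE>;`: reading a filehandle in scalar context returns just one line (and if amborella.txt starts with a FASTA header, that one line is the header rather than sequence). A minimal sketch of collecting the whole sequence into one scalar instead, assuming a standard FASTA layout where header lines start with `>`. The sample text is hypothetical and is read through an in-memory filehandle so the example runs without amborella.txt:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical FASTA content for illustration only; the real program would
# open amborella.txt instead of this in-memory string.
my $fasta = ">hypothetical_header\nAAGTAA\nACAAAT\n";
open(my $fh, '<', \$fasta) or die "Can't open in-memory file: $!";

my $gene = '';
while (my $line = <$fh>) {
    chomp $line;                # drop the trailing newline
    next if $line =~ /^>/;      # skip FASTA header lines
    $gene .= $line;             # append each sequence line to one scalar
}
close $fh;

print "$gene\n";                # prints "AAGTAAACAAAT"
```

Joining the lines also lets the search find a motif that spans a line break: in this sample, the `AAA` at offset 4 straddles the first and second sequence lines.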

My code so far:

#!/usr/bin/perl
use warnings;
use strict;

#Declare a scalar variable called motif and make it AAA.
my $motif = "AAA";

#Declare an array variable called locations, which is where the
#locations of the motif will be stored.
my @locations = ();
my $foundMotif = "";
my $position = ();

#Place the gene in a scalar variable.
my $geneFileName = 'amborella.txt';
open(GENEFILE, $geneFileName) or die "Can't read file!";
my $gene = <GENEFILE>;

#Now search for that motif in the amborella gene.
#The code should print the position of the motif and the motif
#found. You will need to write a while loop that searches for the
#motif and includes push, pos, and -length commands in order to
#save and report locations.

while ($foundMotif =~ m/AAA/g) {
    $position = (pos($foundMotif) - 3);
    push(@locations, $position);
}

#Then you will need a foreach loop to print the locations and the motif.
foreach $position (@locations) {
    print "\n Found motif: ", $motif, "\n at position: ", $position;
}

#close the file
close GENEFILE;

exit;


Solution 1:[1]

Your program is fine, it's a simple mix-up.

You are matching against an empty string.

while($foundMotif =~ m/AAA/g) {
  $position = (pos($foundMotif)-3);
  push (@locations, $position);
}

You're looking for AAA in $foundMotif, but that is an empty string because you just declared it further up. Your gene string (disclaimer: I know nothing about bioinformatics) is in $gene. That's what you need to match against.
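This also explains why the program is silent rather than failing: a `while` loop whose condition is false the first time it is tested never runs its body, and a failed match is not an error, so `use warnings` has nothing to report. A small sketch that isolates just that part of the question's code (the `$matches` counter is added only for illustration):

```perl
use strict;
use warnings;

my $foundMotif = "";   # empty, exactly as declared in the question's code
my $matches    = 0;    # hypothetical counter, added only to show the effect

# The condition is false on the very first test, so the body never runs.
while ($foundMotif =~ m/AAA/g) {
    $matches++;
}

print "matches: $matches\n";   # prints "matches: 0"
```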


Let's go through it step by step. I've simplified your code and put in an example string. I'm aware that isn't what genes look like, but that doesn't matter here. This version already includes the fixes.

use strict;
use warnings;

my $motif = "AAA";

my @locations  = ();

# ... skip reading the file
my $gene = "ABAABAAABAAAAB\n";

while ($gene =~ m/$motif/g) {                     # 1, 2
    my $position = (pos($gene) - length($motif)); # 3, 4
    push(@locations, $position);
}

foreach my $position (@locations) {   # needs its own "my" under use strict
    print "\n Found motif: ", $motif, "\n at position: ", $position;
}

If you run this, the code now produces meaningful output.

 Found motif: AAA
 at position: 5
 Found motif: AAA
 at position: 9
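The example string contains `AAAA` starting at offset 9, so `AAA` also starts at offset 10, yet only 9 is reported. With `/g`, each search resumes where the previous match ended, so overlapping occurrences are skipped. If overlapping hits are wanted (the exercise does not say either way), one common approach is a zero-width lookahead. A sketch, assuming the same example string:

```perl
use strict;
use warnings;

my $motif = "AAA";
my $gene  = "ABAABAAABAAAAB\n";   # same example string as above
my @locations;

# (?=...) is a zero-width lookahead: it matches at a position without
# consuming characters, so the next /g attempt starts one character later
# and overlapping occurrences are found too.
while ($gene =~ m/(?=$motif)/g) {
    # For a zero-length match, pos() already is the start of the
    # occurrence, so there is no length to subtract.
    push @locations, pos($gene);
}

print "Found $motif at: @locations\n";   # prints "Found AAA at: 5 9 10"
```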

I've made four changes:

  1. You need to search in $gene, not in $foundMotif.
  2. Use $motif in the pattern. Otherwise the variable is meaningless; using it makes your program dynamic, because changing $motif changes what is searched for.
  3. Again, you need to call pos() on $gene.
  4. To keep it dynamic, use length($motif) instead of hard-coding 3.
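As an aside beyond the exercise (which explicitly asks for `pos` and `length`), Perl also records where the last successful match started in the special array `@-`, so `$-[0]` gives the same offset without the subtraction. A sketch using the same example string:

```perl
use strict;
use warnings;

my $motif = "AAA";
my $gene  = "ABAABAAABAAAAB\n";
my @locations;

while ($gene =~ m/$motif/g) {
    # $-[0] is the offset where the last successful match started,
    # i.e. the same value as pos($gene) - length($motif).
    push @locations, $-[0];
}

print "@locations\n";   # prints "5 9"
```

Both versions report 0-based offsets; add 1 if you want positions counted from 1, as sequence coordinates usually are in biology.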

You don't need the $foundMotif variable at all. The $position variable can be lexical to the block it's in. That means it is a new variable each time the loop runs, which is simply good practice. In Perl, you should always use the smallest possible scope for variables and declare them only when you need them, not in advance.

Since this is a learning exercise, it makes sense to iterate over the array separately. In a real-life program, if you don't need the positions later on, you could drop both the array and the foreach loop and print each position directly inside the while loop.
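A sketch of that real-life variant, reusing the example string from above: each position is printed inside the `while` loop as soon as it is found, so neither `@locations` nor the `foreach` loop is needed.

```perl
use strict;
use warnings;

my $motif = "AAA";
my $gene  = "ABAABAAABAAAAB\n";

# Print each position as soon as it is found; no array or second loop
# is needed when the positions are not reused later.
while ($gene =~ m/$motif/g) {
    my $position = pos($gene) - length($motif);
    print "Found motif: $motif at position: $position\n";
}
```

With the example string this prints two lines, for positions 5 and 9.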

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1