regex - For a set of sequences, store each sequence in a hash with its open reading frames as values
As a follow-up to my earlier question (finding multiple matches in a nucleotide sequence):
I want to add each ORF (as atg...tag or atg...taa) to a hash, so that each sequence is a key and has its ORFs attached as values. This is what I have so far:
    #!/usr/bin/perl
    use warnings;
    use strict;

    my @file = qw(
        atgccccccccccccctagatgaaaaaaaaaataaatgaaaaatagatgccccccccccccccc
        atgcgcgctatatatgcgcgggctaatatat
        atatgaggtcgtagctagcaaacacaaataaa
    );
    my %hash;

    foreach (@file) {
        # In list context, /g returns every captured match in the sequence
        my @match = ($_ =~ /(atg\w+?ta[ag])/g);
        # make %hash with the sequence as key and the ORFs as values...
    }
Can anyone help me?
Building on your code (I've changed the nucleotide sequences to make the start and stop codons easier to see, but it will work the same way with your sequences), I've stored the matched sequences in an array within a hash of arrays, as follows:
    #!/usr/bin/perl
    use warnings;
    use strict;
    use Data::Dumper;

    my @file = qw(
        atgcgcgcgcgcgcgtaaatgatatatatatatag
        atgccccccccctaagggggggggatgttttttttttttag
        atatgaggggatagaaaatttttttctttct
    );
    my %hash;

    foreach my $sequence (@file) {
        # Attach the ORFs found in this sequence to this sequence's own key
        push @{ $hash{$sequence} }, $sequence =~ /(atg\w+?ta[ag])/g;
    }

    # Hash of arrays: sequence => [ orf, orf, ... ]
    foreach my $key (sort keys %hash) {
        foreach my $orf (@{ $hash{$key} }) {
            print "sequence:$key contains orfs: $orf\n";
        }
    }

Output:

    sequence:atatgaggggatagaaaatttttttctttct contains orfs: atgaggggatag
    sequence:atgccccccccctaagggggggggatgttttttttttttag contains orfs: atgccccccccctaa
    sequence:atgccccccccctaagggggggggatgttttttttttttag contains orfs: atgttttttttttttag
    sequence:atgcgcgcgcgcgcgtaaatgatatatatatatag contains orfs: atgcgcgcgcgcgcgtaa
    sequence:atgcgcgcgcgcgcgtaaatgatatatatatatag contains orfs: atgatatatatatatag
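If the lookup is needed in more than one place, the same hash-of-arrays layout can be wrapped in a subroutine. The following is only a minimal sketch: the name find_orfs and the variable %orfs_for are hypothetical and not part of the code above, and it assumes that a sequence with no match should still appear as a key with an empty array. It uses Data::Dumper to inspect the resulting structure.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# find_orfs is a hypothetical helper name, not part of the original code.
# It returns a reference to a hash of arrays: sequence => [ match, match, ... ].
sub find_orfs {
    my @sequences = @_;
    my %orfs_for;
    for my $sequence (@sequences) {
        # In list context, /g returns every captured match in this sequence.
        my @found = $sequence =~ /(atg\w+?ta[ag])/g;
        # Assumption: keep the key even when nothing matched (empty array).
        $orfs_for{$sequence} = \@found;
    }
    return \%orfs_for;
}

my $orfs_for = find_orfs(qw(
    atgcgcgcgcgcgcgtaaatgatatatatatatag
    atgccccccccctaagggggggggatgttttttttttttag
    atatgaggggatagaaaatttttttctttct
));

$Data::Dumper::Sortkeys = 1;    # print keys in a stable, sorted order
print Dumper($orfs_for);
```

Keeping the matches and their source sequence together inside one loop iteration is what guarantees that each ORF ends up under the sequence it came from. The pattern is used exactly as posted, so it does not check that a match is a whole number of codons long.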