[Rosalind] BA1B - Find the Most Frequent Words in a String


김쿼드 2022. 8. 2. 18:01

BA1B - Find the Most Frequent Words in a String

*Problem link: https://rosalind.info/problems/ba1b/

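This is the second problem. Given a string and an integer k, we need to find and return the k-mers that appear most often. It is still manageable at this point, so let's take a quick look.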
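To make "k-mer" concrete before getting into the solution, here is a minimal sketch that lists every k-mer of a short string. The helper name `kmers` is hypothetical and only used for illustration.

```python
def kmers(text, k):
    """Return every length-k substring of text, in order of start position."""
    # A full k-mer fits at len(text) - k + 1 start positions.
    return [text[i:i + k] for i in range(len(text) - k + 1)]


print(kmers("ACGTT", 3))  # ['ACG', 'CGT', 'GTT']
```

The most frequent k-mers are simply the entries of this list that occur the most times.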

import pandas as pd
# header=None: the file has no header row, so the first line is read as data
file = pd.read_table("../../Downloads/rosalind_ba1b.txt", header=None)

We start by loading the file. Change the file path to match your own setup.

sequence = file[0][0]
number = file[0][1]

From the file variable, pick out the sequence and number that will go into the function as input.
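If you would rather not pull in pandas for a two-line file, the same two values can be read with plain file I/O. This is only a sketch under the assumption that the dataset has the DNA string on the first line and k on the second; `read_dataset` is a hypothetical helper name, not something from the original post.

```python
def read_dataset(path):
    """Read a two-line dataset: the DNA string, then the integer k."""
    with open(path) as handle:
        # Strip whitespace and skip any blank lines.
        lines = [line.strip() for line in handle if line.strip()]
    # Assumption: line 1 is the sequence, line 2 is k.
    return lines[0], int(lines[1])
```

Used this way, k is already an int, so the function no longer needs to convert it on every use.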

def FrequentWords(Text, Number) :
    k = int(Number)  # Number may arrive as a string when read from the file
    tempdict = {}
    # len(Text) - k + 1 start positions, so the last k-mer is counted too
    for i in range(len(Text) - k + 1):
        kmer = Text[i : i+k]
        if kmer not in tempdict :
            tempdict[kmer] = 1
        else :
            tempdict[kmer] += 1

    # find the highest count once, after the loop has finished
    max_value = max(tempdict.values())
    max_pattern = {key for key, value in tempdict.items() if value == max_value}

    return list(max_pattern)

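I wrote the function with the following logic.

1. Create an empty dictionary

2. Run a loop and check whether each string (k-mer) is in the dictionary

3. Add the key or update its value depending on the result

4. Find the max value and return the patterns that have it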

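First, create an empty dictionary, then write a for loop over the start positions from 0 up to len(Text) - Number, the last position where a full k-mer still fits. On each iteration, use not in to check whether the sliced Text (that is, the k-mer string) is already in the dictionary. If the slice is not in the dictionary yet, add it as a new key; if it is already there, increase its value by 1. After that, return the keys with the highest value as a list.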
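The same counting logic can also be sketched with `collections.Counter` from the standard library, which handles the "new key or increment" step for you. This is an alternative take, not the version used in this post; the name `frequent_words_counter` is made up for the example, and the result is sorted only to make the output order predictable.

```python
from collections import Counter


def frequent_words_counter(text, k):
    """Return the most frequent k-mers in text, sorted alphabetically."""
    k = int(k)
    # Count every k-mer; there are len(text) - k + 1 start positions.
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    if not counts:
        # text is shorter than k, so there is nothing to count.
        return []
    max_count = max(counts.values())
    return sorted(kmer for kmer, n in counts.items() if n == max_count)


print(frequent_words_counter("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4))
# ['CATG', 'GCAT']
```

In this test string both CATG and GCAT appear three times, so both are returned.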

tmp = FrequentWords(sequence, number)
tmp
['CAGGTATGGTACCG', 'ACAGGTATGGTACC', 'GATTAGGGGTACCG', 'AGATTAGGGGTACC']

This problem was manageable too. Let's move on to the next one.
