
Keras: Implementing Addition with a Sequence-to-Sequence Model

카카오그래놀라 2020. 11. 12. 22:22

Sequence to sequence learning for performing number addition

 

 

Author: Smerity and others
Date created: 2015/08/17
Last modified: 2020/04/17
Description: A model that learns to add strings of numbers, e.g. "535+61" -> "596".

 

- Keras

- Github

- Colab

 

Introduction

In this example, we train a model to add two numbers that are given as strings.

 

Example:

  • Input: "535+61" # a number given as a string
  • Output: "596" # a number given as a string

The input may optionally be reversed, which was shown to improve performance on many tasks in:

- "Learning to Execute" and "Sequence to Sequence Learning with Neural Networks"

In theory, reversing the sequence order introduces shorter-term dependencies between source and target for this problem.
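To make the optional reversal concrete, here is a minimal sketch of how a query could be padded and then reversed. The helper name `make_query` is hypothetical and is not part of the original example; the padding width assumes DIGITS = 3.

```python
DIGITS = 3
MAXLEN = DIGITS + 1 + DIGITS  # two operands plus the '+' sign


def make_query(a, b, reverse=True):
    """Hypothetical helper: format 'a+b', right-pad to MAXLEN, optionally reverse."""
    q = "{}+{}".format(a, b)
    q = q + " " * (MAXLEN - len(q))
    return q[::-1] if reverse else q


print(repr(make_query(535, 61, reverse=False)))  # '535+61 '
print(repr(make_query(535, 61)))                 # ' 16+535'
```

Note that only the question is reversed; the answer string ("596") is left in its normal order.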

 

 

 

 

Results:

Two digits (reversed):

  • One layer LSTM (128 HN), 5k training examples = 99% train/test accuracy in 55 epochs

Three digits (reversed):

  • One layer LSTM (128 HN), 50k training examples = 99% train/test accuracy in 100 epochs

Four digits (reversed):

  • One layer LSTM (128 HN), 400k training examples = 99% train/test accuracy in 20 epochs

Five digits (reversed):

  • One layer LSTM (128 HN), 550k training examples = 99% train/test accuracy in 30 epochs

 

Loading libraries

from tensorflow import keras
from tensorflow.keras import layers
import numpy as np

# Parameters for the model and the dataset
TRAINING_SIZE = 50000
DIGITS = 3
REVERSE = True

# The maximum input length includes the '+' sign, hence the extra 1.
# e.g. '135+345': 3 digits + plus sign + 3 digits = 7 characters.
MAXLEN = DIGITS + 1 + DIGITS

 

Preparing the dataset

class CharacterTable:
    """Given a set of characters:
    + Encode them to a one-hot integer representation
    + Decode the one-hot or integer representation to their character output
    + Decode a vector of probabilities to their character output
    """

    def __init__(self, chars):
        """Initialize character table.
        # Arguments
            chars: Characters that can appear in the input,
                e.g. "0", "1", ..., "9", "+", " ".
        """
        # Remove duplicates and sort.
        self.chars = sorted(set(chars))
        # Map each character to its index.
        self.char_indices = dict((c, i) for i, c in enumerate(self.chars))
        # Map each index back to its character.
        self.indices_char = dict((i, c) for i, c in enumerate(self.chars))

    def encode(self, C, num_rows):
        """One-hot encode given string C.
        # Arguments
            C: String to be encoded.
            num_rows: Number of rows in the returned one-hot encoding. This is
                used to keep the # of rows for each data the same.
        """
        x = np.zeros((num_rows, len(self.chars)))
        for i, c in enumerate(C):
            x[i, self.char_indices[c]] = 1
        return x

    def decode(self, x, calc_argmax=True):
        """Decode the given vector or 2D array to their character output.
        # Arguments
            x: A vector or a 2D array of probabilities or one-hot representations;
                or a vector of character indices (used with `calc_argmax=False`).
            calc_argmax: Whether to find the character index with maximum
                probability, defaults to `True`.
        """
        if calc_argmax:
            x = x.argmax(axis=-1)
        return "".join(self.indices_char[i] for i in x)


# All the numbers, plus sign and space for padding.
chars = "0123456789+ "
ctable = CharacterTable(chars)

questions = []
expected = []
seen = set()
print("Generating data...")
while len(questions) < TRAINING_SIZE:
    f = lambda: int(
        "".join(
            np.random.choice(list("0123456789"))
            for i in range(np.random.randint(1, DIGITS + 1))
        )
    )
    a, b = f(), f()
    # Skip any addition questions we've already seen
    # Also skip any such that x+y == y+x (hence the sorting).
    key = tuple(sorted((a, b)))
    if key in seen:
        continue
    seen.add(key)
    # Pad the data with spaces such that it is always MAXLEN.
    q = "{}+{}".format(a, b)
    query = q + " " * (MAXLEN - len(q))
    ans = str(a + b)
    # Answers can be of maximum size DIGITS + 1.
    ans += " " * (DIGITS + 1 - len(ans))
    if REVERSE:
        # Reverse the query, e.g., '12+345 ' becomes ' 543+21'. (Note the
        # space used for padding.)
        query = query[::-1]
    questions.append(query)
    expected.append(ans)
print("Total questions:", len(questions))

# Generating data...
# Total questions: 50000
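
To see what the character table does to one padded query, here is a small standalone sketch of the same encode/decode round trip. It uses plain functions instead of the `CharacterTable` class so that it runs on its own; the sample string is just an illustration.

```python
import numpy as np

chars = sorted(set("0123456789+ "))
char_indices = {c: i for i, c in enumerate(chars)}
indices_char = {i: c for i, c in enumerate(chars)}


def encode(text, num_rows):
    """One-hot encode `text` into a (num_rows, len(chars)) array."""
    x = np.zeros((num_rows, len(chars)))
    for i, c in enumerate(text):
        x[i, char_indices[c]] = 1
    return x


def decode(x):
    """Map each row back to the character with the highest score."""
    return "".join(indices_char[int(i)] for i in x.argmax(axis=-1))


onehot = encode(" 16+535", 7)
print(onehot.shape)          # (7, 12)
print(repr(decode(onehot)))  # ' 16+535'
```

One detail worth noticing: the space character sorts first, so it gets index 0. A row that was never filled in is all zeros, its argmax is 0, and it therefore decodes to a space.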

 

Vectorize the data.

print("Vectorization...")
# Use the built-in bool: the np.bool alias was removed in NumPy 1.24.
x = np.zeros((len(questions), MAXLEN, len(chars)), dtype=bool)
y = np.zeros((len(questions), DIGITS + 1, len(chars)), dtype=bool)
for i, sentence in enumerate(questions):
    x[i] = ctable.encode(sentence, MAXLEN)
for i, sentence in enumerate(expected):
    y[i] = ctable.encode(sentence, DIGITS + 1)

# Shuffle (x, y) in unison as the later parts of x will almost all be larger
# digits.
indices = np.arange(len(y))
np.random.shuffle(indices)
x = x[indices]
y = y[indices]

# Explicitly set apart 10% for validation data that we never train over.
split_at = len(x) - len(x) // 10
(x_train, x_val) = x[:split_at], x[split_at:]
(y_train, y_val) = y[:split_at], y[split_at:]

print("Training Data:")
print(x_train.shape) # (45000, 7, 12)
print(y_train.shape) # (45000, 4, 12)

print("Validation Data:")
print(x_val.shape) # (5000, 7, 12)
print(y_val.shape) # (5000, 4, 12)
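
The shuffle-then-split step can be tried on toy arrays. This is only a sketch: the arrays below are stand-ins rather than the real one-hot data, and the fixed seed is there purely to make the illustration reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, for illustration only

# Toy stand-ins: row i of y is derived from row i of x.
x = np.arange(20).reshape(10, 2)
y = x.sum(axis=1)

# Shuffle one index array and apply it to both x and y.
indices = np.arange(len(y))
rng.shuffle(indices)
x, y = x[indices], y[indices]

# The pairing survives because both arrays were reordered identically.
assert np.array_equal(x.sum(axis=1), y)

# Hold out the last 10% as validation data.
split_at = len(x) - len(x) // 10
x_train, x_val = x[:split_at], x[split_at:]
print(x_train.shape, x_val.shape)  # (9, 2) (1, 2)
```

Shuffling a shared index array, rather than shuffling x and y separately, is what keeps each question aligned with its answer.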

 

Building the model

print("Build model...")
num_layers = 1  # Try adding more LSTM layers!

model = keras.Sequential()

# "Encode" the input sequence using a LSTM, producing an output of size 128.
# If your input sequences have variable length,
# use input_shape=(None, num_feature).

model.add(layers.LSTM(128, input_shape=(MAXLEN, len(chars))))

# As the decoder RNN's input, repeatedly provide with the last output of
# RNN for each time step. Repeat 'DIGITS + 1' times as that's the maximum
# length of output, e.g., when DIGITS=3, max output is 999+999=1998.

model.add(layers.RepeatVector(DIGITS + 1))

# The decoder RNN could be multiple layers stacked or a single layer.

for _ in range(num_layers):
    # With return_sequences=True, the layer returns not only the last output
    # but every output so far, shaped (num_samples, timesteps, output_dim).
    # The Dense layer below is applied at each timestep, so the timestep
    # dimension has to be kept.
    
    model.add(layers.LSTM(128, return_sequences=True))

# Apply a dense layer to every temporal slice of the input.
# For each step of the output sequence, decide which character to choose.

model.add(layers.Dense(len(chars), activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()

# Build model...
# Model: "sequential"
# _________________________________________________________________
# Layer (type)                 Output Shape              Param #   
# =================================================================
# lstm (LSTM)                  (None, 128)               72192     
# _________________________________________________________________
# repeat_vector (RepeatVector) (None, 4, 128)            0         
# _________________________________________________________________
# lstm_1 (LSTM)                (None, 4, 128)            131584    
# _________________________________________________________________
# dense (Dense)                (None, 4, 12)             1548      
# =================================================================
# Total params: 205,324
# Trainable params: 205,324
# Non-trainable params: 0
# _________________________________________________________________
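
The parameter counts in the summary can be reproduced by hand. The sketch below assumes the standard LSTM formulation (four gates, each with input weights, recurrent weights, and one bias vector); the function names are hypothetical helpers, not Keras APIs.

```python
def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * ((input_dim + units + 1) * units)


def dense_params(input_dim, units):
    """Weight matrix plus bias vector."""
    return input_dim * units + units


num_chars = 12  # len("0123456789+ ")
hidden = 128

encoder = lstm_params(num_chars, hidden)  # lstm
decoder = lstm_params(hidden, hidden)     # lstm_1
output = dense_params(hidden, num_chars)  # dense

print(encoder, decoder, output, encoder + decoder + output)
# 72192 131584 1548 205324
```

RepeatVector has no weights of its own, which is why it shows 0 parameters: it only copies the encoder's 128-dimensional output DIGITS + 1 times.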

 

Training the model

epochs = 30
batch_size = 32

# Train the model one epoch at a time and, after each epoch,
# show predictions on the validation dataset.

for epoch in range(1, epochs):
    print()
    print("Iteration", epoch)
    model.fit(
        x_train,
        y_train,
        batch_size=batch_size,
        epochs=1,
        validation_data=(x_val, y_val),
    )
    # Look at 10 randomly chosen samples from the validation set.
    for i in range(10):
        ind = np.random.randint(0, len(x_val))
        rowx, rowy = x_val[np.array([ind])], y_val[np.array([ind])]
        preds = np.argmax(model.predict(rowx), axis=-1)
        q = ctable.decode(rowx[0])
        correct = ctable.decode(rowy[0])
        guess = ctable.decode(preds[0], calc_argmax=False)
        print("Q", q[::-1] if REVERSE else q, end=" ")
        print("T", correct, end=" ")
        if correct == guess: 
            print("☑ " + guess)
        else:
            print("☒ " + guess)

 

The original example reports 99+% validation accuracy after about 30 epochs; the run logged below reached 98.0% at iteration 29.

# Iteration 1
# 1407/1407 [==============================] - 8s 6ms/step
# loss: 1.7622 - accuracy: 0.3571 - val_loss: 1.5618 - val_accuracy: 0.4175
#
# Q 99+580  T 679  ☒ 905 
# Q 800+652 T 1452 ☒ 1311
# Q 900+0   T 900  ☒ 909 
# Q 26+12   T 38   ☒ 22  
# Q 8+397   T 405  ☒ 903 
# Q 14+478  T 492  ☒ 441 
# Q 59+589  T 648  ☒ 551 
# Q 653+77  T 730  ☒ 601 
# Q 10+35   T 45   ☒ 11  
# Q 51+185  T 236  ☒ 211 

# ...

# Iteration 29
# 1407/1407 [==============================] - 8s 6ms/step
# loss: 0.0262 - accuracy: 0.9929 - val_loss: 0.0643 - val_accuracy: 0.9804
#
# Q 128+86  T 214  ☑ 214 
# Q 20+494  T 514  ☑ 514 
# Q 34+896  T 930  ☑ 930 
# Q 372+15  T 387  ☑ 387 
# Q 466+63  T 529  ☑ 529 
# Q 327+9   T 336  ☑ 336 
# Q 458+85  T 543  ☑ 543 
# Q 134+431 T 565  ☑ 565 
# Q 807+289 T 1096 ☑ 1096
# Q 100+60  T 160  ☑ 160
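
When reading these numbers, keep in mind that, as far as I understand, the `accuracy` Keras reports here is averaged over individual output characters, not over whole answers. That is an assumption worth checking against the Keras metrics documentation. The sketch below, with made-up toy strings and hypothetical helper names, shows how the two measures can differ.

```python
def char_accuracy(targets, guesses):
    """Fraction of individual character positions that match."""
    pairs = [(t, g) for tgt, gs in zip(targets, guesses) for t, g in zip(tgt, gs)]
    return sum(t == g for t, g in pairs) / len(pairs)


def exact_match(targets, guesses):
    """Fraction of answers that match in every position."""
    return sum(t == g for t, g in zip(targets, guesses)) / len(targets)


# Toy data, padded to DIGITS + 1 = 4 characters like the real answers.
targets = ["679 ", "1452", "900 ", "38  "]
guesses = ["679 ", "1452", "909 ", "38  "]

print(char_accuracy(targets, guesses))  # 0.9375
print(exact_match(targets, guesses))    # 0.75
```

The ☑/☒ marks printed during training correspond to the stricter whole-answer comparison, since they test `correct == guess`.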