Development days 19-25: testing the migration away from Tistory


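To move my blog off Tistory, I tried several approaches.

1. Converting the image files

First I took a backup of the Tistory blog, then converted the images in the backup folders from jpg and png to WebP.

I wrote up the details in an earlier post.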

https://act2.tistory.com/132

Converting all the images in a Tistory blog backup to WebP in one go

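The conversion itself is covered in the linked post. As a rough sketch of what such a batch conversion can look like, assuming the third-party Pillow library is installed; the function names and the quality value are illustrative choices, not the code from that post:

```python
from pathlib import Path

IMAGE_SUFFIXES = {'.jpg', '.jpeg', '.png'}


def webp_target(path: Path) -> Path:
    """Return the .webp path that corresponds to a jpg/png file."""
    return path.with_suffix('.webp')


def convert_tree(base_dir: Path, quality: int = 80) -> int:
    """Convert every jpg/png under base_dir to WebP; return how many were converted."""
    from PIL import Image  # third-party dependency: pip install Pillow

    converted = 0
    # sorted() materializes the file list first, so the new .webp files
    # written during the loop are not picked up by the walk.
    for src in sorted(base_dir.rglob('*')):
        if src.is_file() and src.suffix.lower() in IMAGE_SUFFIXES:
            with Image.open(src) as img:
                img.save(webp_target(src), 'WEBP', quality=quality)
            converted += 1
    return converted
```

Calling convert_tree(Path('.')) from the backup folder writes a .webp file next to each original and leaves the originals in place, so they can be deleted once the result has been checked.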

2. Updating the image URLs

Next, I edited the image URLs in each HTML file so that they point at the converted files.

from pathlib import Path
import re

def update_img_src_to_webp(html_file):
    with open(html_file, 'r', encoding='utf-8') as file:
        content = file.read()

    # Rewrite src attributes that end in .jpg/.jpeg/.png so they end in .webp.
    # [^"]* keeps each match inside a single quoted attribute value.
    updated_content = re.sub(r'(src="[^"]*?)\.(?:jpe?g|png)(")', r'\1.webp\2', content, flags=re.IGNORECASE)

    return updated_content

# Backup folder path (the current directory here)
base_dir = Path('.')

# Walk the HTML files in each post folder and rewrite them in place
for folder in base_dir.iterdir():
    if folder.is_dir():
        for file in folder.iterdir():
            if file.suffix.lower() == '.html':
                updated_content = update_img_src_to_webp(file)

                # Write the updated content back to the same file
                with open(file, 'w', encoding='utf-8') as updated_file:
                    updated_file.write(updated_content)

                print(f"Updated: {file}")

print("HTML files updated")
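Before running a rewrite like this over thousands of files, it helps to try the substitution on a sample string. The sketch below is a standalone check, not part of the script; it uses a pattern with [^"]* so that a match cannot run past the closing quote of the src value (a lazy .*? can slide into a later attribute such as alt="x.png"). The sample HTML is made up for illustration.

```python
import re

# [^"]* keeps the match inside one quoted attribute value
IMG_SRC = re.compile(r'(src="[^"]*?)\.(?:jpe?g|png)(")', re.IGNORECASE)


def to_webp_src(html: str) -> str:
    """Rewrite src="...jpg|jpeg|png" to src="...webp"; leave everything else alone."""
    return IMG_SRC.sub(r'\1.webp\2', html)


sample = '<img src="./img/a.JPG" alt="x.png"><img src="./img/b.gif">'
print(to_webp_src(sample))
# prints: <img src="./img/a.webp" alt="x.png"><img src="./img/b.gif">
```

Only the first src changes: the alt text and the gif are left untouched.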

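3. Uploading to Blogger (Google's blog service)

Blogger has a "post by email" feature.

So after converting the images in the Tistory backup folder to WebP and updating the image URLs in the HTML, I published the posts to the blog by email.

Blogger let me publish at most 100 posts per day, and posting by email was subject to the same limit.

The code is below.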

import os
import time
from pathlib import Path
from bs4 import BeautifulSoup
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from email.mime.base import MIMEBase
from email import encoders
import smtplib

def send_email(subject, body, to_email, from_email, password, attachments):
    msg = MIMEMultipart()
    msg['From'] = from_email
    msg['To'] = to_email
    msg['Subject'] = subject

    # Add the plain-text body; pass a str and let MIMEText encode it as UTF-8
    text_part = MIMEText(body, 'plain', 'utf-8')
    msg.attach(text_part)

    # Debug: print the first 100 characters of the body
    print(f"Email body: {body[:100]}...")

    # Attach files
    for attachment in attachments:
        try:
            with open(attachment, 'rb') as file:
                part = MIMEBase('application', 'octet-stream')
                part.set_payload(file.read())
                encoders.encode_base64(part)
                # Passing filename as a keyword lets the email package quote and encode it
                part.add_header('Content-Disposition', 'attachment', filename=os.path.basename(attachment))
                msg.attach(part)
        except FileNotFoundError:
            print(f"Attachment not found: {attachment}")

    try:
        with smtplib.SMTP('smtp.gmail.com', 587) as server:
            server.starttls()
            server.login(from_email, password)
            server.send_message(msg)
            print(f"Email '{subject}' sent successfully.")
    except smtplib.SMTPAuthenticationError:
        print("Authentication error: check the email address or password.")
    except smtplib.SMTPException as e:
        print(f"SMTP error: {e}")
    except Exception as e:
        print(f"Error while sending email: {e}")

def prepare_email_content(base_dir):
    base_dir = Path(base_dir)
    emails = []

    if not base_dir.exists() or not base_dir.is_dir():
        print(f"Directory does not exist: {base_dir}")
        return emails

    for folder in sorted(base_dir.iterdir()):
        if folder.is_dir():
            html_files = list(folder.glob("*.html"))
            img_folder = folder / "img"

            if not html_files or not img_folder.exists():
                print(f"Skipping folder {folder}: HTML file or img folder missing.")
                continue

            html_file = html_files[0]  # Use the first HTML file

            try:
                with open(html_file, 'r', encoding='utf-8') as file:
                    html_content = file.read()
            except UnicodeDecodeError:
                print(f"File encoding error: {html_file}")
                continue

            soup = BeautifulSoup(html_content, 'html.parser')
            text_content = soup.get_text(separator='\n', strip=True)

            # Add '#end' to the email body
            email_body = f"{text_content}\n#end"

            # Debug: print the first 100 characters of the body
            print(f"Prepared email body for {html_file.stem}: {email_body[:100]}...")

            emails.append({
                "title": html_file.stem,
                "body": email_body,
                "images": list(img_folder.glob("*"))  # Attach all files in the img folder
            })

    return emails

def post_to_blogger(base_dir, to_email, from_email, password):
    emails = prepare_email_content(base_dir)

    for email in emails:
        subject = email['title']
        body = email['body']
        attachments = [str(img) for img in email['images']]

        # send_email reports success or failure itself, so no extra message here
        send_email(subject, body, to_email, from_email, password, attachments)

        # Wait 2 minutes between emails to avoid being flagged as spam
        time.sleep(120)

if __name__ == "__main__":
    base_dir = "."  # Replace with your backup folder path
    to_email = "abc.sitename@blogger.com"  # Replace with the recipient's Blogger email address
    from_email = "my-email@gmail.com"  # Replace with your Gmail address
    password = "abcdefghijklmn"  # Replace with your Gmail app password

    post_to_blogger(base_dir, to_email, from_email, password)

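Set to_email, from_email, and password at the end of the script to your own values. The password is a Gmail app password (see the comment in the code), not your regular account password; ask GPT how to create one and follow its steps in your Google account settings.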
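As written, the script sends every post in one run, while the blog accepted only about 100 posts a day. One way to stay under that cap is to remember which posts have already gone out and send only the next 100 on each run. The sketch below is an assumption about how that could be done, not part of the original script; the state file name sent_posts.json and the helper names are hypothetical.

```python
import json
from pathlib import Path

DAILY_LIMIT = 100                     # cap observed in this post, not an official figure
STATE_FILE = Path('sent_posts.json')  # hypothetical file that remembers sent titles


def load_sent(state_file: Path = STATE_FILE) -> set:
    """Load the set of titles that were already sent (empty on the first run)."""
    if state_file.exists():
        return set(json.loads(state_file.read_text(encoding='utf-8')))
    return set()


def save_sent(sent: set, state_file: Path = STATE_FILE) -> None:
    """Persist the set of sent titles so the next run can resume."""
    state_file.write_text(json.dumps(sorted(sent), ensure_ascii=False), encoding='utf-8')


def next_batch(titles: list, sent: set, limit: int = DAILY_LIMIT) -> list:
    """Return up to `limit` titles that have not been sent yet, keeping their order."""
    return [title for title in titles if title not in sent][:limit]
```

In post_to_blogger, the list of prepared emails could be filtered through next_batch, with save_sent called after each send, so that running the script once a day works through the backlog 100 posts at a time.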

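The Yahopet site I run has close to 7,000 posts, so at 100 a day that works out to about 70 days. Ages.

So I looked for another way.

4. Migrating to Hashnode

Migrating from Tistory to Blogger by email would take at least two months, which was frustrating.

While looking for alternatives, I came across a platform called Hashnode.

Hashnode posts are written in Markdown, and helpfully it can also import Markdown files.

So this time I generated Markdown files directly from the live website, compressed them into a single zip archive, and uploaded that to Hashnode.

Below is the code that builds the Markdown files from the website and then the zip archives.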

import requests
from bs4 import BeautifulSoup
import os
import zipfile
import time
from datetime import datetime
import uuid

def get_page_content(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'}
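    try:
        # A timeout keeps one stalled request from hanging the whole crawl
        response = requests.get(url, headers=headers, timeout=10)
    except requests.RequestException:
        return None  # treat network errors like a missing page
    if response.status_code == 200:
        return response.text
    return None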

def html_to_markdown(html_content):
    soup = BeautifulSoup(html_content, 'html.parser')
    title = soup.find('h1').text.strip() if soup.find('h1') else 'Untitled'
    content = soup.find('div', class_='entry-content') or soup.find('article') or soup.find('main')
    if not content:
        return f"---\ntitle: \"{title}\"\ndate: \"{datetime.now().isoformat()}\"\nslug: \"{title.lower().replace(' ', '-')}\"\nimages: []\n---\n\nCould not find the post body.\n\n"

    images = content.find_all('img')
    image_urls = [img['src'] for img in images if 'src' in img.attrs]

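    markdown_content = ''
    for element in content.find_all(['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'img']):
        if element.name == 'img':
            if 'src' in element.attrs:
                markdown_content += f"![Image]({element['src']})\n\n"
            continue
        text = element.get_text().strip()
        if not text:
            continue  # skip empty paragraphs
        if element.name != 'p':
            # h1..h6 become '#'..'######' so heading levels survive the conversion
            text = '#' * int(element.name[1]) + ' ' + text
        markdown_content += text + '\n\n'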

    return f"""---
title: "{title}"
date: "{datetime.now().isoformat()}"
slug: "{title.lower().replace(' ', '-')}"
images: {image_urls}
---

{markdown_content.strip()}"""

def save_markdown(content, filename):
    with open(filename, 'w', encoding='utf-8') as f:
        f.write(content)

def create_zip(directory, zip_filename):
    with zipfile.ZipFile(zip_filename, 'w', zipfile.ZIP_DEFLATED) as zipf:
        for root, dirs, files in os.walk(directory):
            for file in files:
                zipf.write(os.path.join(root, file), file)

def generate_unique_filename(base_name, extension):
    return f"{base_name}_{uuid.uuid4().hex[:8]}.{extension}"

# Main script
base_url = "https://abc.tistory.com"
output_dir = "markdown_output"
os.makedirs(output_dir, exist_ok=True)

page_number = 1
empty_page_count = 0
max_empty_pages = 10
pages_per_zip = 100
current_zip_pages = 0

while True:
    url = f"{base_url}/{page_number}"
    content = get_page_content(url)

    if not content:
        print(f"No content found at page {page_number}.")
        empty_page_count += 1

        if empty_page_count >= max_empty_pages:
            print(f"Found {max_empty_pages} consecutive empty pages. Stopping the crawl.")
            break
    else:
        empty_page_count = 0
        markdown_content = html_to_markdown(content)
        save_markdown(markdown_content, os.path.join(output_dir, f"page_{page_number}.md"))
        print(f"Page {page_number} done")

        current_zip_pages += 1

        if current_zip_pages == pages_per_zip:
            zip_filename = generate_unique_filename("markdown_archive", "zip")
            create_zip(output_dir, zip_filename)
            print(f"Created {zip_filename}")

            # Delete the files that were just archived
            for file in os.listdir(output_dir):
                os.remove(os.path.join(output_dir, file))

            current_zip_pages = 0

    page_number += 1
    time.sleep(1)

# Zip whatever pages remain at the end
if current_zip_pages > 0:
    zip_filename = generate_unique_filename("markdown_archive", "zip")
    create_zip(output_dir, zip_filename)
    print(f"Created {zip_filename}")

print("All done.")

I read the site from the first post onward, built one zip archive per 100 pages, and imported each archive into Hashnode.
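One fragile spot in a script like this is the front matter: a title that contains a double quote or a colon can produce a header block the importer cannot parse. The sketch below shows one way to quote the values safely, relying on the fact that a JSON string literal is also a valid YAML double-quoted scalar. The helper names are hypothetical, and the field names simply mirror the script above; whether Hashnode's importer reads every one of them is not something verified here.

```python
import json
import re


def yaml_quote(value: str) -> str:
    """Quote a string for YAML front matter by emitting it as a JSON string."""
    return json.dumps(value, ensure_ascii=False)


def make_slug(title: str) -> str:
    """Lowercase the title and replace each run of non-word characters with '-'."""
    slug = re.sub(r'[^\w]+', '-', title.lower())
    return slug.strip('-') or 'untitled'


def front_matter(title: str, date_iso: str, image_urls: list) -> str:
    """Build the front matter block with every value safely quoted."""
    lines = [
        '---',
        f'title: {yaml_quote(title)}',
        f'date: {yaml_quote(date_iso)}',
        f'slug: {yaml_quote(make_slug(title))}',
        f'images: {json.dumps(image_urls, ensure_ascii=False)}',
        '---',
    ]
    return '\n'.join(lines)
```

Because \w matches Hangul in Python 3, Korean titles keep their characters in the slug and only spaces and punctuation turn into hyphens.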

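Moving from Tistory to Blogger would have taken at least two months; with Hashnode the time drops dramatically.

Even so, zipping the files in batches of 100 and uploading each archive by hand still takes a fair amount of time.

Hashnode is best thought of as a platform built for developers, and after reading through the service carefully today I am not sure it matches the character of my site. General users can apparently use it too, but I am a little worried.

Still, the back end and front end are separate, you can build full-stack on it, and you can shape the various services to your own taste. Of the services I have seen, it seems to be the best.

I am busy moving the files over now, and once that is done I plan to try out the other features. Above all, I like that its SEO is strong.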

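5. The last remaining test: migrating to WordPress

I like Hashnode, but its main audience and the character of my site do not match, so as a final test I am going to try moving the posts to WordPress.

I have used WordPress before, and the biggest problem was that the cost rises as the image storage grows.

This time, though, the images are already converted to WebP, and when writing posts I will link to images rather than upload them, so uploading roughly the size of the original backup folder should be enough to use the service without trouble.

Blogger, Hashnode, WordPress...

While moving the posts I looked back over what I had written, and I realized that I had not known their value myself.

The value of posts written one stitch at a time... only now do I see it.

Moving services has the drawback of breaking the site's URL structure, but the upside is better SEO.

Whether I choose Hashnode or WordPress, the posts that were locked inside Tistory should finally get to see the light.

On the day the migration is complete, I will wrap up this dev log.

If I choose Hashnode, I will probably keep studying back-end and front-end development and keep developing the site; if I choose WordPress, the center of gravity will shift toward content.

I hope the last remaining test, the move to WordPress, goes well.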
