

🌳AI & Quantum Computing Bootcamp 2024✨/AI Lecture Revision

[4] 241104 Data Preprocessing, Word Cloud, ChatGPT API [Goorm All-In-One Pass! AI Project Master - 4th Session, Day 4]

yjyuwisely 2024. 11. 4. 11:53

241104 Mon 4th class

I organized the key points to remember from what we learned today.


https://rowan-sail-868.notion.site/da602ba0748c4e2cb12b443178b16507

 

Text Data Analysis Basics | Notion (class materials, rowan-sail-868.notion.site)

https://ldjwj.github.io/CHATGPT_AI_CLASS/01_TextPre_V10.html

 

01_TextPre_V10 (ldjwj.github.io)


A non-CS major with no graduate degree who kept improving for 4 years and took 1st place in two competitions.

Study with focus.

Even if you do only one thing, do it properly.

Read papers, enter competitions, study on your own; that is how you grow.

It is not too late.

Don't drift back and forth or chase after the crowd.

Large companies have a lot of documents.

Taking 2-3 courses doesn't by itself make you more knowledgeable.

Beginner courses are all similar.

Don't end up with nothing changed and nothing gained after a year of study.

You need to know a lot.

The essential libraries really are essential.

Work hard.


01. What is Natural Language Processing (NLP)?

  • Natural language processing (NLP) is the technology that lets computers understand and process human language.
  • NLP covers a wide range of methods for analyzing text and speech data, extracting meaning from it, or generating new text.

03. Understanding the basic terms of natural language processing

  • Tokenization: splitting text into smaller units.
  • Morphological analysis: breaking words into morphemes and tagging their parts of speech.
  • Stemming (stem extraction): extracting the base form of a word.
  • Part-of-speech (POS) tagging: identifying each word's part of speech.
  • Named entity recognition (NER): identifying named entities such as people and places.
  • Word embedding: converting words into vectors so that meaning is represented numerically.
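The example further down covers tokenization and POS tagging with NLTK; as a complement, here is a minimal sketch of stemming and NER. The sample sentence and the extra nltk.download calls are my own illustration, not part of the class notebook.

import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer
from nltk import pos_tag, ne_chunk

# Resources needed for tokenization, tagging, and NER (download once)
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')

sentence = "Samsung Electronics is expanding its semiconductor business in Austin."
tokens = word_tokenize(sentence)

# Stemming: reduce each token to a crude base form
stemmer = PorterStemmer()
print("Stems:", [stemmer.stem(t) for t in tokens])

# NER: chunk POS-tagged tokens into entities such as ORGANIZATION and GPE
print(ne_chunk(pos_tag(tokens)))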

https://ldjwj.github.io/CLASS_PY_LIB_LEVELUP/06_DATA_ANALYSIS/01_%ED%85%8D%EC%8A%A4%ED%8A%B8%EB%8D%B0%EC%9D%B4%ED%84%B0%EB%B6%84%EC%84%9D1_%EB%B9%88%EB%8F%84%EB%B6%84%EC%84%9D_V10.html

 

텍스트데이터분석1_빈도분석_V10 (Text Data Analysis 1: Frequency Analysis, ldjwj.github.io)


Frequency analysis

It is all covered in the LLM book.

import nltk
print(nltk.__version__)

3.8.1

!pip install nltk

Requirement already satisfied: nltk in /usr/local/lib/python3.10/dist-packages (3.8.1)
Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from nltk) (8.1.7)
Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from nltk) (1.4.2)
Requirement already satisfied: regex>=2021.8.3 in /usr/local/lib/python3.10/dist-packages (from nltk) (2024.9.11)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from nltk) (4.66.6)

import nltk
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag

# Download the NLTK data (run only once)
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
True

 

# Example sentence
sentence = "Natural language processing makes it possible for computers to understand human language."

# Tokenization
tokens = word_tokenize(sentence)
print("Tokenization result:", tokens)

# POS tagging
tagged_tokens = pos_tag(tokens)
print("POS tagging result:", tagged_tokens)

Tokenization result: ['Natural', 'language', 'processing', 'makes', 'it', 'possible', 'for', 'computers', 'to', 'understand', 'human', 'language', '.']
POS tagging result: [('Natural', 'JJ'), ('language', 'NN'), ('processing', 'NN'), ('makes', 'VBZ'), ('it', 'PRP'), ('possible', 'JJ'), ('for', 'IN'), ('computers', 'NNS'), ('to', 'TO'), ('understand', 'VB'), ('human', 'JJ'), ('language', 'NN'), ('.', '.')]
 
re: regular expressions
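The re module is used below to strip punctuation before counting words; a quick illustration of that exact pattern (the sample string is my own):

import re

text = "Hello, world!! NLP: tokens & counts (2024)."
# Keep only word characters and whitespace
cleaned = re.sub(r'[^\w\s]', '', text)
print(cleaned)  # Hello world NLP tokens  counts 2024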

F4

Stopword handling

Stop words (exclusion words) are words on a stop list that are filtered out before or after natural language processing because they carry little importance. There is no single universal stopword list used by every NLP tool, there is no agreed rule for identifying stop words, and in fact not every tool even uses such a list.
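The code below imports NLTK's built-in stopword list but then defines Korean stop words by hand; for reference, this is roughly how the built-in English list would be used (the sample sentence is mine):

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download('stopwords')
nltk.download('punkt')

english_stopwords = set(stopwords.words('english'))

sentence = "This is a simple example of removing stop words from a sentence."
tokens = word_tokenize(sentence.lower())
filtered = [t for t in tokens if t.isalpha() and t not in english_stopwords]
print(filtered)  # ['simple', 'example', 'removing', 'stop', 'words', 'sentence']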


import os  # on Linux, a folder is called a directory
from collections import Counter  # for counting word frequencies
import re
from nltk.corpus import stopwords  # built-in English stopword list
import matplotlib.pyplot as plt

# Paths of the text files
file_paths = [
    "01_다른경쟁사와간단비교.txt",
    "02_기업리서치관련정리.txt",
    "03_생성AI분석.txt"
]

for file_path in file_paths:
  with open(file_path, 'r', encoding='utf-8') as file:  # 'r' = read-only
    text = file.read()
    print("File name:", file_path)
    print("File contents:", text)

# Add Korean stop words
# Manually defined Korean stopword list
korean_stopwords = {
    '의', '가', '이', '은', '들', '는', '좀', '잘', '걍', '과', '도', '를', '으로',
    '자', '에', '와', '한', '하다', '에서', '것', '및', '위해', '그', '되다'
}

additional_stopwords = {'강점', '약점', '경쟁사'}  # words not useful for the analysis
korean_stopwords.update(additional_stopwords)

# Per-file text processing and word-frequency counting
def process_text(text):
    # Preprocessing: lowercase, remove special characters, drop stop words
    text = text.lower()
    text = re.sub(r'[^\w\s]', '', text)  # remove special characters
    words = text.split()  # split on whitespace
    words = [word for word in words if word not in korean_stopwords and len(word) > 1]  # not a stop word, longer than 1 character
    return words

# Store the frequency-analysis results
word_frequencies = []

for file_path in file_paths:
    with open(file_path, 'r', encoding='utf-8') as file:
        text = file.read()
        words = process_text(text)
        word_freq = Counter(words)  # frequency count
        word_frequencies.append(word_freq)  # append

# Print the 10 most frequent words for each file
for i, freq in enumerate(word_frequencies):
    print(f"\nTop 10 words in file {i+1}:")
    print(freq.most_common(10))

# Visualization: top-10 word frequencies per file
for i, freq in enumerate(word_frequencies):
    common_words = freq.most_common(10)
    words, counts = zip(*common_words)

    plt.figure(figsize=(10, 5))
    plt.bar(words, counts)
    plt.title(f'File {i+1}: top 10 word frequencies')
    plt.xticks(rotation=45)
    plt.show()
Top 10 words in file 1:
[('제품', 11), ('글로벌', 10), ('시장', 7), ('기술', 5), ('반도체', 5), ('삼성전자', 4), ('스마트폰', 4), ('디스플레이', 4), ('강력한', 4), ('a사', 4)]

Top 10 words in file 2:
[('있습니다', 19), ('반도체', 11), ('글로벌', 11), ('사업', 8), ('디스플레이', 7), ('스마트폰', 7), ('삼성전자는', 6), ('다양한', 6), ('특히', 6), ('기술', 5)]

Top 10 words in file 3:
[('있습니다', 20), ('글로벌', 12), ('삼성전자는', 11), ('있으며', 11), ('반도체', 10), ('스마트폰', 6), ('시장', 6), ('기술', 6), ('디스플레이', 5), ('대한', 5)]

3-5 [Exercise 1] Add more stop words and check the final graphs.
# Add Korean stop words
# Manually defined Korean stopword list
korean_stopwords = { '있습니다', '있으며', '대한',
    '의', '가', '이', '은', '들', '는', '좀', '잘', '걍', '과', '도', '를', '으로',
    '자', '에', '와', '한', '하다', '에서', '것', '및', '위해', '그', '되다'
}

additional_stopwords = {'강점', '약점', '경쟁사'}  # words not useful for the analysis
korean_stopwords.update(additional_stopwords)

# Per-file text processing and word-frequency counting
def process_text(text):
    # Preprocessing: lowercase, remove special characters, drop stop words
    text = text.lower()
    text = re.sub(r'[^\w\s]', '', text)  # remove special characters
    words = text.split()  # split on whitespace
    words = [word for word in words if word not in korean_stopwords and len(word) > 1]  # not a stop word, longer than 1 character
    return words

# Store the frequency-analysis results
word_frequencies = []

for file_path in file_paths:
    with open(file_path, 'r', encoding='utf-8') as file:
        text = file.read()
        words = process_text(text)
        word_freq = Counter(words)  # frequency count
        word_frequencies.append(word_freq)  # append

# Print the 10 most frequent words for each file
for i, freq in enumerate(word_frequencies):
    print(f"\nTop 10 words in file {i+1}:")
    print(freq.most_common(10))

# Visualization: top-10 word frequencies per file
for i, freq in enumerate(word_frequencies):
    common_words = freq.most_common(10)
    words, counts = zip(*common_words)

    plt.figure(figsize=(10, 5))
    plt.bar(words, counts)
    plt.title(f'File {i+1}: top 10 word frequencies')
    plt.xticks(rotation=45)
    plt.show()
Top 10 words in file 1:
[('제품', 11), ('글로벌', 10), ('시장', 7), ('기술', 5), ('반도체', 5), ('삼성전자', 4), ('스마트폰', 4), ('디스플레이', 4), ('강력한', 4), ('a사', 4)]

Top 10 words in file 2:
[('반도체', 11), ('글로벌', 11), ('사업', 8), ('디스플레이', 7), ('스마트폰', 7), ('삼성전자는', 6), ('다양한', 6), ('특히', 6), ('기술', 5), ('네트워크', 5)]

Top 10 words in file 3:
[('글로벌', 12), ('삼성전자는', 11), ('반도체', 10), ('스마트폰', 6), ('시장', 6), ('기술', 6), ('디스플레이', 5), ('모바일', 5), ('다양한', 5), ('특히', 5)]

3-6 [Level-up Exercise 1] Create a stopword file, load it, and check the final graphs.

Post the answer wrapped as a code block:

```
code
```
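One possible approach to 3-6 (my own sketch, not the official solution): keep the stop words in a plain text file, one per line, load them into a set, and reuse the same process_text() and plotting code as above. The file name stopwords_ko.txt is hypothetical.

# 1) Create the stopword file (in practice you would edit it by hand)
with open('stopwords_ko.txt', 'w', encoding='utf-8') as f:
    f.write('\n'.join(['있습니다', '있으며', '대한', '의', '가', '이', '은', '는',
                       '강점', '약점', '경쟁사']))

# 2) Load it back into a set
with open('stopwords_ko.txt', 'r', encoding='utf-8') as f:
    korean_stopwords = {line.strip() for line in f if line.strip()}

print(len(korean_stopwords), "stop words loaded")

# 3) Reuse process_text() and the plotting loop from the cells above unchanged.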

names = ['Alice', 'Bob', 'Charlie']
ages = [25, 30, 35]
cities = ['New York', 'London', 'Paris']

for idx, i in enumerate(ages):
  print(idx, i)

0 25
1 30
2 35

person_info = list(zip(names, ages, cities))
print(person_info)

[('Alice', 25, 'New York'), ('Bob', 30, 'London'), ('Charlie', 35, 'Paris')]

 


https://ldjwj.github.io/CLASS_PY_LIB_START/PYLIB_03_02_alice_extreme_V11_2411.html

 

unit01_02_alice_extreme_V11_2411 (ldjwj.github.io)

set: a set removes duplicates

## Check the set
s2 = set([1,2,3,4,5,1,2])
s2
{1, 2, 3, 4, 5}

### Add stop words
from wordcloud import STOPWORDS  # STOPWORDS comes from the wordcloud package
x_words = set(STOPWORDS)
x_words.add("said")
x_words
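The class notebook goes on to build a word cloud from the Alice text; a minimal sketch of that idea (the sample text and figure size are my own choices):

from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

x_words = set(STOPWORDS)
x_words.add("said")

# Any English text works here; in class it was the Alice novel
sample_text = ("Alice was beginning to get very tired of sitting by her sister "
               "on the bank, and of having nothing to do.")

wc = WordCloud(width=800, height=400, background_color='white',
               stopwords=x_words).generate(sample_text)

plt.figure(figsize=(10, 5))
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.show()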

Lexical dispersion is a measure of how frequently a word appears across the parts of a corpus.

Result: (lexical dispersion plot shown in the class notebook)
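A minimal sketch of drawing a lexical dispersion plot with NLTK (the Gutenberg corpus and the target words are my own example, not the class data):

import nltk
from nltk.text import Text
from nltk.corpus import gutenberg

nltk.download('gutenberg')

# Where do selected words occur across Alice in Wonderland? (needs matplotlib)
alice = Text(gutenberg.words('carroll-alice.txt'))
alice.dispersion_plot(['Alice', 'Queen', 'Rabbit', 'Hatter'])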


https://openai.com/index/openai-api/

https://platform.openai.com/docs/overview

https://rowan-sail-868.notion.site/ChatGPT-API-1337d480b59380ec918bfe4d6c0f6c41

 

Getting Started with the ChatGPT API | Notion: 1-1 What is the ChatGPT API? (rowan-sail-868.notion.site)

Topped up $5 of API credit.
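A minimal sketch of calling the ChatGPT (OpenAI) API with the current openai Python SDK; the model name and prompt are placeholders, and the API key is read from an environment variable rather than hard-coded:

# pip install openai
import os
from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment; never hard-code the key
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use whatever model the class specifies
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain tokenization in one sentence."},
    ],
)
print(response.choices[0].message.content)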


https://velog.io/@sy508011/%EC%8A%A4%ED%8A%B8%EB%A6%BC%EB%A6%BF-Github%EB%A1%9C-%EC%97%B0%EB%8F%99%ED%95%B4%EC%84%9C-%ED%8E%98%EC%9D%B4%EC%A7%80-Deploy%ED%95%98%EA%B8%B0

 

[Streamlit] Deploying a page by linking with GitHub: deploy an app by integrating with GitHub (velog.io)
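For reference, a minimal Streamlit app of the kind deployed this way; the file names app.py and requirements.txt follow the usual Streamlit Community Cloud convention (my own sketch, not from the class):

# app.py  (run locally with: streamlit run app.py)
# For Streamlit Community Cloud: push this file plus a requirements.txt
# containing "streamlit" to a GitHub repo, then deploy it from the dashboard.
import streamlit as st

st.title("Word Frequency Demo")

text = st.text_area("Paste some text:",
                    "Natural language processing makes computers understand language.")
if st.button("Count words"):
    words = text.lower().split()
    counts = {w: words.count(w) for w in set(words)}
    st.write(dict(sorted(counts.items(), key=lambda kv: kv[1], reverse=True)))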
