A Joyful AI Research Journey🌳😊
The regex pattern \b\w+\b with examples
yjyuwisely 2023. 9. 9. 21:57
Let's break down the regex pattern \b\w+\b and explain it with examples.
1. \w
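The \w metacharacter matches any word character. In ASCII mode this is equivalent to the character set [a-zA-Z0-9_]; some engines (Python 3 with str patterns, for example) also match Unicode letters and digits by default. The ASCII set includes: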
- Uppercase letters: A to Z
- Lowercase letters: a to z
- Digits: 0 to 9
- Underscore: _
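As a quick check of the character class above, here is a minimal sketch using Python's built-in re module. The re.ASCII flag is used on the assumption that we want \w to mean exactly [a-zA-Z0-9_]; the variable names are only illustrative.

```python
import re

# re.ASCII restricts \w to exactly [a-zA-Z0-9_].
ascii_word = re.compile(r"\w", re.ASCII)

samples = ["A", "z", "7", "_", "-", " ", "!"]
is_word_char = {ch: bool(ascii_word.fullmatch(ch)) for ch in samples}
print(is_word_char)

# Without re.ASCII, Python 3 str patterns also treat Unicode letters
# as word characters. "\u00e9" is the single character e-acute.
unicode_hit = bool(re.fullmatch(r"\w", "\u00e9"))
ascii_hit = bool(ascii_word.fullmatch("\u00e9"))
print(unicode_hit, ascii_hit)
```

The hyphen, space, and exclamation mark come back False, which is what later makes them act as word separators.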
2. \w+
The + is a quantifier that means "one or more" of the preceding character or group. So, \w+ matches one or more word characters. Here are some examples:
- apple: This is matched by \w+ because it contains one or more word characters.
- a: This is also matched by \w+ because it's a single word character.
- a_b: This is matched as well because underscores are part of the word character class.
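The three examples above can be checked with a short sketch in Python's re module. fullmatch is used here (an illustrative choice) so that the whole string has to consist of word characters; the empty string is added to show that + needs at least one character.

```python
import re

one_or_more = re.compile(r"\w+")

candidates = ["apple", "a", "a_b", ""]
full_matches = {s: bool(one_or_more.fullmatch(s)) for s in candidates}
print(full_matches)

# + is greedy: starting at index 0, it keeps consuming word characters
# and stops at the first non-word character (the space).
first_run = one_or_more.match("apple pie").group()
print(first_run)
```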
3. \b
The \b is a word boundary assertion. It matches no characters; it asserts a position that has a word character on one side and a non-word character, or the start or end of the string, on the other. This makes it useful for telling whole words apart from substrings within words.
For instance:
- In the string apple pie, there's a word boundary between e and the space, and another between the space and p (plus one at the very start and one at the very end of the string).
- In the string applepie, there's no word boundary between e and p because both are word characters.
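To see where the boundaries actually fall, one option is to list the positions at which \b matches. This is a small sketch with Python's re module; boundary_positions is a hypothetical helper written for this illustration, not a library function.

```python
import re

def boundary_positions(text):
    # \b is zero-width, so every match is empty; only its index matters.
    return [m.start() for m in re.finditer(r"\b", text)]

spaced = boundary_positions("apple pie")
joined = boundary_positions("applepie")
print(spaced)
print(joined)

# \bpie\b needs a boundary on both sides of "pie", so it does not match
# the "pie" buried inside "applepie".
pie_inside = re.search(r"\bpie\b", "applepie")
pie_alone = re.search(r"\bpie\b", "apple pie")
print(pie_inside, pie_alone.span())
```

In apple pie the indices 5 and 6 are the two sides of the space; in applepie only the start and end of the string count as boundaries.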
Putting it together: \b\w+\b
This pattern ensures that we're matching whole words, and not just substrings inside of words.
Examples:
- In the string apple pie, \b\w+\b matches apple and pie separately.
- In the string applepie, \b\w+\b matches the entire string as a single word.
- In the string an apple, \b\w+\b matches an and apple separately.
- In the string apple-pie, \b\w+\b matches apple and pie separately because the hyphen is not a word character, so there's a word boundary on both sides of it.
Using \b\w+\b ensures that words are matched as distinct tokens, even if they are adjacent to non-word characters like punctuation or spaces.
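The four example strings can be run through the full pattern with re.findall. This is a minimal sketch using Python's re module; the variable names are illustrative.

```python
import re

whole_word = re.compile(r"\b\w+\b")

examples = ["apple pie", "applepie", "an apple", "apple-pie"]
tokens = {text: whole_word.findall(text) for text in examples}

for text, found in tokens.items():
    print(f"{text!r} -> {found}")
```

Each string is split only where a non-word character (a space or a hyphen) creates a boundary, so applepie stays a single token.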