Take the guesswork out of using regular expressions. With more than 140 practical recipes, this cookbook provides everything you need to solve a wide range of real-world problems. Novices will learn basic skills and tools, and programmers and experienced users will find a wealth of detail. Each recipe provides samples you can use right away.

This revised edition covers the regular expression flavors used by C#, Java, JavaScript, Perl, PHP, Python, Ruby, and VB.NET. You’ll learn powerful new tricks, avoid flavor-specific gotchas, and save valuable time with this huge library of practical solutions.

Learn regular expression basics through a detailed tutorial
Use code listings to implement regular expressions with your language of choice
Understand how regular expressions differ from language to language
Handle common user input with recipes for validation and formatting
Find and manipulate words, special characters, and lines of text
Detect integers, floating-point numbers, and other numerical formats
Parse source code and process log files
Use regular expressions in URLs, paths, and IP addresses
Manipulate HTML, XML, and data exchange formats
Discover little-known regular expression tricks and techniques

About the Authors

Jan Goyvaerts runs Just Great Software, where he designs and develops some of the most popular regular expression software. His products include RegexBuddy, the world's only regular expression editor that emulates the peculiarities of 15 regular expression flavors, and PowerGREP, the most feature-rich grep tool for Microsoft Windows.

Steven Levithan works at Facebook as a JavaScript engineer. He has enjoyed programming for nearly 15 years, working in Tokyo, Washington D.C., Baghdad, and Silicon Valley. Steven is a leading JavaScript regular expression expert, and has created a variety of open source regular expression tools including RegexPal and the XRegExp library.
Table of Contents

Chapter 1 Introduction to Regular Expressions
  Regular Expressions Defined
  Search and Replace with Regular Expressions
  Tools for Working with Regular Expressions

Chapter 2 Basic Regular Expression Skills
  Match Literal Text
  Match Nonprintable Characters
  Match One of Many Characters
  Match Any Character
  Match Something at the Start and/or the End of a Line
  Match Whole Words
  Unicode Code Points, Categories, Blocks, and Scripts
  Match One of Several Alternatives
  Group and Capture Parts of the Match
  Match Previously Matched Text Again
  Capture and Name Parts of the Match
  Repeat Part of the Regex a Certain Number of Times
  Choose Minimal or Maximal Repetition
  Eliminate Needless Backtracking
  Prevent Runaway Repetition
  Test for a Match Without Adding It to the Overall Match
  Match One of Two Alternatives Based on a Condition
  Add Comments to a Regular Expression
  Insert Literal Text into the Replacement Text
  Insert the Regex Match into the Replacement Text
  Insert Part of the Regex Match into the Replacement Text
  Insert Match Context into the Replacement Text

Chapter 3 Programming with Regular Expressions
  Programming Languages and Regex Flavors
  Literal Regular Expressions in Source Code
  Import the Regular Expression Library
  Create Regular Expression Objects
  Set Regular Expression Options
  Test If a Match Can Be Found Within a Subject String
  Test Whether a Regex Matches the Subject String Entirely
  Retrieve the Matched Text
  Determine the Position and Length of the Match
  Retrieve Part of the Matched Text
  Retrieve a List of All Matches
  Iterate over All Matches
  Validate Matches in Procedural Code
  Find a Match Within Another Match
  Replace All Matches
  Replace Matches Reusing Parts of the Match
  Replace Matches with Replacements Generated in Code
  Replace All Matches Within the Matches of Another Regex
  Replace All Matches Between the Matches of Another Regex
  Split a String
  Split a String, Keeping the Regex Matches
  Search Line by Line
  Construct a Parser

Chapter 4 Validation and Formatting
  Validate Email Addresses
  Validate and Format North American Phone Numbers
  Validate International Phone Numbers
  Validate Traditional Date Formats
  Validate Traditional Date Formats, Excluding Invalid Dates
  Validate Traditional Time Formats
  Validate ISO 8601 Dates and Times
  Limit Input to Alphanumeric Characters
  Limit the Length of Text
  Limit the Number of Lines in Text
  Validate Affirmative Responses
  Validate Social Security Numbers
  Validate ISBNs
  Validate ZIP Codes
  Validate Canadian Postal Codes
  Validate U.K. Postcodes
  Find Addresses with Post Office Boxes
  Reformat Names From “FirstName LastName” to “LastName, FirstName”
  Validate Password Complexity
  Validate Credit Card Numbers
  European VAT Numbers

Chapter 5 Words, Lines, and Special Characters
  Find a Specific Word
  Find Any of Multiple Words
  Find Similar Words
  Find All Except a Specific Word
  Find Any Word Not Followed by a Specific Word
  Find Any Word Not Preceded by a Specific Word
  Find Words Near Each Other
  Find Repeated Words
  Remove Duplicate Lines
  Match Complete Lines That Contain a Word
  Match Complete Lines That Do Not Contain a Word
  Trim Leading and Trailing Whitespace
  Replace Repeated Whitespace with a Single Space
  Escape Regular Expression Metacharacters

Chapter 6 Numbers
  Integer Numbers
  Hexadecimal Numbers
  Binary Numbers
  Octal Numbers
  Decimal Numbers
  Strip Leading Zeros
  Numbers Within a Certain Range
  Hexadecimal Numbers Within a Certain Range
  Integer Numbers with Separators
  Floating-Point Numbers
  Numbers with Thousand Separators
  Add Thousand Separators to Numbers
  Roman Numerals

Chapter 7 Source Code and Log Files
  Keywords
  Identifiers
  Numeric Constants
  Operators
  Single-Line Comments
  Multiline Comments
  All Comments
  Strings
  Strings with Escapes
  Regex Literals
  Here Documents
  Common Log Format
  Combined Log Format
  Broken Links Reported in Web Logs

Chapter 8 URLs, Paths, and Internet Addresses
  Validating URLs
  Finding URLs Within Full Text
  Finding Quoted URLs in Full Text
  Finding URLs with Parentheses in Full Text
  Turn URLs into Links
  Validating URNs
  Validating Generic URLs
  Extracting the Scheme from a URL
  Extracting the User from a URL
  Extracting the Host from a URL
  Extracting the Port from a URL
  Extracting the Path from a URL
  Extracting the Query from a URL
  Extracting the Fragment from a URL
  Validating Domain Names
  Matching IPv4 Addresses
  Matching IPv6 Addresses
  Validate Windows Paths
  Split Windows Paths into Their Parts
  Extract the Drive Letter from a Windows Path
  Extract the Server and Share from a UNC Path
  Extract the Folder from a Windows Path
  Extract the Filename from a Windows Path
  Extract the File Extension from a Windows Path
  Strip Invalid Characters from Filenames

Chapter 9 Markup and Data Formats
  Processing Markup and Data Formats with Regular Expressions
  Find XML-Style Tags
  Replace Tags with
  Remove All XML-Style Tags Except and
  Match XML Names
  Convert Plain Text to HTML by Adding and Tags
  Decode XML Entities
  Find a Specific Attribute in XML-Style Tags
  Add a cellspacing Attribute to Tags That Do Not Already Include It
  Remove XML-Style Comments
  Find Words Within XML-Style Comments
  Change the Delimiter Used in CSV Files
  Extract CSV Fields from a Specific Column
  Match INI Section Headers
  Match INI Section Blocks
  Match INI Name-Value Pairs

Colophon