Cullen Programming

JAVA Scan and Change Facility


User's Guide

JSCAN is a fast search and change facility that lets you scan files for a particular string of characters or for matches against a Regular Expression. Optionally, each match can be automatically replaced with a new string of characters. Each line that contains a found or replaced string is logged with the filename, line number and the contents of the matching text. Files that contained matched strings can be viewed immediately from the display panel using the supplied fullscreen editor.

JScan is written in 100% pure JAVA. It will operate on Windows, Linux, UNIX or any other platform that supports the JAVA Runtime Environment (JRE).

The Command Interface

The graphical user command interface allows you to:

Specify search criteria and files to be scanned.
Apply Regular Expressions for both filename and text-within-file searching.
Control scan execution with Start and Stop buttons.
View all results in a logging area.
Follow the progress of file scans.
Perform recursive subdirectory searches.
Optionally skip to the next file after a first hit is recorded.
Optionally skip over non-text files, thereby reducing scan time.
Optionally perform a CASE-INSENSITIVE scan.
Search for string matches of full or partial filenames.
Edit a matched file while the search continues.

Its multi-threaded design collects candidate files and searches them concurrently.

Specify Search Directory Pathname
This is the part of the directory tree that you wish to search: the top-level folder from which you want the scan to start.
 
Symbolic links are ignored to prevent recursive looping.
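
The directory walk described above can be sketched in JAVA as follows. This is only an illustration and is not taken from the JSCAN source: the class name TreeWalkSketch and the method collectFiles are hypothetical, and the sketch assumes that the standard Files.walkFileTree call, which does not follow symbolic links unless asked to, is a fair stand-in for the way JSCAN ignores links.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not JSCAN's actual implementation.
class TreeWalkSketch {

    // Collects regular files under 'start'. walkFileTree does not follow
    // symbolic links by default, so a link that points back up the tree
    // cannot cause a loop.
    static List<Path> collectFiles(Path start, boolean recurse) throws IOException {
        List<Path> found = new ArrayList<>();
        Files.walkFileTree(start, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                // Without recursion, only the starting directory is entered.
                if (!recurse && !dir.equals(start)) {
                    return FileVisitResult.SKIP_SUBTREE;
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // A symbolic link is reported as a link, not as a regular file,
                // so it is skipped here.
                if (attrs.isRegularFile()) {
                    found.add(file);
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return found;
    }

    public static void main(String[] args) throws IOException {
        Path start = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : collectFiles(start, true)) {
            System.out.println(p);
        }
    }
}
```

Passing false for the second argument corresponds to leaving the Subdirectory checkbox unmarked in this sketch.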


The search pathname must end either with a string pattern or, if it names a directory, with a "/".

/home/gjcullen/mydirectory/
/home/
/home/gjcullen/
/
/home/gjcullen/mydirectory/nextfolder/subfolder/nextsubfolder/  etc


Specify Search FileName or Regular Expression Pattern
This is the pattern that will be applied against each file in the directories searched to find matches.

Refer to the summary of regular-expression constructs below.

If you leave the filename blank then all files with any number of qualifiers will match.

.html
.txt
MyCode.*
[ABC]*.txt
N{2}.java
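
How such filename patterns behave can be tried with the standard java.util.regex classes. The sketch below is hypothetical (the class FileNameFilterSketch and the method nameMatches are not part of JSCAN). It assumes that a pattern may match anywhere in the name, which fits the "full or partial filenames" feature but is not confirmed by this guide, and it treats a blank pattern as matching every file.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; not JSCAN's actual implementation.
class FileNameFilterSketch {

    // Returns true when the pattern is blank (match everything) or when the
    // pattern is found anywhere in the file name. Using find() instead of
    // matches() is an assumption about partial-name matching.
    static boolean nameMatches(String pattern, String fileName) {
        if (pattern == null || pattern.trim().isEmpty()) {
            return true;
        }
        return Pattern.compile(pattern).matcher(fileName).find();
    }

    public static void main(String[] args) {
        String[] names = {"index.html", "notes.txt", "MyCode.java", "NN.java"};
        for (String name : names) {
            System.out.println(name + " matches .txt? " + nameMatches(".txt", name));
        }
    }
}
```

Note that in a Regular Expression the leading "." of .txt matches any single character, not only a literal period.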

Case-Insensitive Checkbox
This will cause the search to ignore the alphabetic case of the argument.
Upper or Lower case characters will match.

Text-Only Checkbox
Search will skip non-text files such as graphics, music, video, etc.
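
This guide does not say how JSCAN decides that a file is not text. One common heuristic, shown here purely as an assumption, is to treat a file as binary if a NUL byte appears near its start. The class TextCheckSketch and the method looksLikeText are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch; the NUL-byte test is an assumed heuristic,
// not JSCAN's documented behaviour.
class TextCheckSketch {

    // Inspects the bytes returned by one read of up to 4096 bytes and
    // reports false if any of them is zero. An empty file counts as text.
    static boolean looksLikeText(Path file) throws IOException {
        byte[] buf = new byte[4096];
        try (InputStream in = Files.newInputStream(file)) {
            int n = in.read(buf);
            for (int i = 0; i < n; i++) {
                if (buf[i] == 0) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java TextCheckSketch.java <file>");
            return;
        }
        System.out.println(looksLikeText(Paths.get(args[0])));
    }
}
```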

First-Hit Checkbox
This will cause the search to move on to the next file after the first match in the file currently being scanned.


Hidden Files Checkbox

This will include "hidden" files in the search.  The default is to exclude hidden files.

Subdirectory Checkbox
This will cause a recursive search of all subdirectories under the current directory for the pattern.

Just FILENAME Checkbox
This will match the search string against filenames rather than file contents. Results will contain a list of the full pathnames of files whose names contain the matched string.

Text String or Regular Expression  search argument
This combination list box contains the search argument: the string of data that you wish to scan for in each of the target files. The string can contain LEADING, EMBEDDED or TRAILING spaces.
The search argument is SPACE-SENSITIVE and, unless the Case-Insensitive checkbox is marked, CASE-SENSITIVE.

You can also search for a hexadecimal string (the argument must start with X and all letters must be capitalized).

abc123
Geoffrey J. Cullen
#pqrs.7
XFED2
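
A hexadecimal argument such as XFED2 can be thought of as a sequence of bytes to look for. The following sketch shows one way to do that; it is not JSCAN's code. The names HexSearchSketch, parseHexArgument and indexOf are hypothetical, and the requirement that the digits come in pairs is an assumption, since this guide does not say how an odd number of digits is handled.

```java
// Hypothetical sketch; not JSCAN's actual implementation.
class HexSearchSketch {

    // Converts "XFED2" to the bytes FE D2. Assumes a leading X followed by
    // an even number of upper-case hex digits.
    static byte[] parseHexArgument(String arg) {
        if (!arg.matches("X(?:[0-9A-F]{2})+")) {
            throw new IllegalArgumentException("not a hex argument: " + arg);
        }
        byte[] out = new byte[(arg.length() - 1) / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(arg.substring(1 + 2 * i, 3 + 2 * i), 16);
        }
        return out;
    }

    // Returns the offset of the first occurrence of 'needle' in 'data', or -1.
    static int indexOf(byte[] data, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= data.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (data[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = {0x00, (byte) 0xFE, (byte) 0xD2, 0x01};
        System.out.println(indexOf(data, parseHexArgument("XFED2"))); // prints 1
    }
}
```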

This field may contain a Clear Text Pattern or  a Regular Expression to be applied to each line of the file.  

Such as:    Cullen       or     C....n    or   C.*n

(See below for a discussion of Regular Expression constructs)

Consult any JAVA or PERL regular-expression reference for further options and examples.
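
Applying a search argument to each line of a file can be sketched as below. This is an illustration only: the class LineScanSketch, the method scan and the "name:line: text" log format are hypothetical. The two boolean parameters stand in for the Case-Insensitive and First-Hit checkboxes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; not JSCAN's actual implementation.
class LineScanSketch {

    // Returns one log entry per matching line: file name, 1-based line
    // number and the text of the line.
    static List<String> scan(String fileName, List<String> lines, String argument,
                             boolean caseInsensitive, boolean firstHitOnly) {
        int flags = caseInsensitive ? Pattern.CASE_INSENSITIVE : 0;
        Pattern pattern = Pattern.compile(argument, flags);
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (pattern.matcher(line).find()) {
                hits.add(fileName + ":" + (i + 1) + ": " + line);
                if (firstHitOnly) {
                    break; // move on to the next file after the first hit
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Geoffrey J. Cullen", "nothing here", "cullen again");
        for (String hit : scan("demo.txt", lines, "C.*n", true, false)) {
            System.out.println(hit);
        }
    }
}
```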

The combo box will hold up to 100 of your prior search arguments used during the session. Enter a new string or select a previously used string.

ChangeTo  String
This is a string of clear text data, or a Regular Expression, that you supply when you want to change all matching strings in the target files to a new string. The new string can be of a different length.  This option is disabled when the "Just FILENAME" checkbox is marked.

If a Regular Expression is used, the text that it matches is the text replaced by the ChangeTo string.

Hex Strings can also be matched and changed.  Use a leading X followed immediately by a valid hex argument.


Examples:

Search argument          ChangeTo string
xyq456                   abc123
John J. Cullen           Mary Todd
RE: <a.*>                <p class="narrative">
XFE2C                    XAD12
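
The effect of a change on a single line can be sketched with the standard Matcher.replaceAll call. The class ChangeToSketch and the method change are hypothetical, and treating the ChangeTo string as literal text (via Matcher.quoteReplacement) is an assumption rather than documented JSCAN behaviour.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not JSCAN's actual implementation.
class ChangeToSketch {

    // Replaces every match of 'regex' in 'line' with 'changeTo'.
    // quoteReplacement keeps '$' and '\' in changeTo from being treated
    // as group references.
    static String change(String line, String regex, String changeTo) {
        return Pattern.compile(regex)
                      .matcher(line)
                      .replaceAll(Matcher.quoteReplacement(changeTo));
    }

    public static void main(String[] args) {
        System.out.println(change("xyq456 and xyq456", "xyq456", "abc123"));
        // The greedy .* runs to the LAST '>' on the line.
        System.out.println(change("<a href=\"x\">link</a> tail", "<a.*>", "<p class=\"narrative\">"));
    }
}
```

Because .* is greedy, the pattern <a.*> replaces everything from the first <a to the last > on the line. A reluctant form such as <a.*?> would stop at the first >.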

Results Panel
This is the area where all matching files appear. Each entry includes the file name, its pathname, the line number of the occurrence in the file, and the text of the line itself showing the match.

Messages Area
The file currently being scanned appears here. When the scan completes, this area shows:
The elapsed scan time.
The number of files scanned.
The number of files that had matches.
The total number of matches.

Edit pulldown
Used to Page UP/Page Down, Select All or Copy/Paste Clipboard functions.

View pulldown
Choose any occurrence in the results panel by clicking on the line. Then select EDIT in the View pulldown to edit the file in which the occurrence appears.
Or just double-click on a result line and the file in which it was found will be opened for editing.
The editor will function both during and after scan execution. 

Text Editing
Select any match using your left mouse button to highlight the line.  Then double-click and the text file will be brought into a text editor for full view.  The matched line will be pointed to by "prefix arrows" in the editor window.  You can edit files while the scan continues to run.

Performance
The speed of the scanner will be a function of the number of files scanned and the number of hits on each line of the files scanned.  The more hits the greater the total scan time.

Summary of regular-expression constructs

Construct Matches
 
Characters
x The character x
\\ The backslash character
\0n The character with octal value 0n (0 <= n <= 7)
\0nn The character with octal value 0nn (0 <= n <= 7)
\0mnn The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh The character with hexadecimal value 0xhh
\uhhhh The character with hexadecimal value 0xhhhh
\t The tab character ('\u0009')
\n The newline (line feed) character ('\u000A')
\r The carriage-return character ('\u000D')
\f The form-feed character ('\u000C')
\a The alert (bell) character ('\u0007')
\e The escape character ('\u001B')
\cx The control character corresponding to x
 
Character classes
[abc] a, b, or c (simple class)
[^abc] Any character except a, b, or c (negation)
[a-zA-Z] a through z or A through Z, inclusive (range)
[a-d[m-p]] a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]] d, e, or f (intersection)
[a-z&&[^bc]] a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] a through z, and not m through p: [a-lq-z](subtraction)
 
Predefined character classes
. Any character (may or may not match line terminators)
\d A digit: [0-9]
\D A non-digit: [^0-9]
\s A whitespace character: [ \t\n\x0B\f\r]
\S A non-whitespace character: [^\s]
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]
 
POSIX character classes (US-ASCII only)
\p{Lower} A lower-case alphabetic character: [a-z]
\p{Upper} An upper-case alphabetic character:[A-Z]
\p{ASCII} All ASCII:[\x00-\x7F]
\p{Alpha} An alphabetic character:[\p{Lower}\p{Upper}]
\p{Digit} A decimal digit: [0-9]
\p{Alnum} An alphanumeric character:[\p{Alpha}\p{Digit}]
\p{Punct} Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph} A visible character: [\p{Alnum}\p{Punct}]
\p{Print} A printable character: [\p{Graph}\x20]
\p{Blank} A space or a tab: [ \t]
\p{Cntrl} A control character: [\x00-\x1F\x7F]
\p{XDigit} A hexadecimal digit: [0-9a-fA-F]
\p{Space} A whitespace character: [ \t\n\x0B\f\r]
 
java.lang.Character classes (simple java character type)
\p{javaLowerCase} Equivalent to java.lang.Character.isLowerCase()
\p{javaUpperCase} Equivalent to java.lang.Character.isUpperCase()
\p{javaWhitespace} Equivalent to java.lang.Character.isWhitespace()
\p{javaMirrored} Equivalent to java.lang.Character.isMirrored()
 
Classes for Unicode blocks and categories
\p{InGreek} A character in the Greek block (simple block)
\p{Lu} An uppercase letter (simple category)
\p{Sc} A currency symbol
\P{InGreek} Any character except one in the Greek block (negation)
[\p{L}&&[^\p{Lu}]]  Any letter except an uppercase letter (subtraction)
 
Boundary matchers
^ The beginning of a line
$ The end of a line
\b A word boundary
\B A non-word boundary
\A The beginning of the input
\G The end of the previous match
\Z The end of the input but for the final terminator, if any
\z The end of the input
 
Greedy quantifiers
X? X, once or not at all
X* X, zero or more times
X+ X, one or more times
X{n} X, exactly n times
X{n,} X, at least n times
X{n,m} X, at least n but not more than m times
 
Reluctant quantifiers
X?? X, once or not at all
X*? X, zero or more times
X+? X, one or more times
X{n}? X, exactly n times
X{n,}? X, at least n times
X{n,m}? X, at least n but not more than m times
 
Possessive quantifiers
X?+ X, once or not at all
X*+ X, zero or more times
X++ X, one or more times
X{n}+ X, exactly n times
X{n,}+ X, at least n times
X{n,m}+ X, at least n but not more than m times
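
The three kinds of quantifier differ in how much text they take and whether they give any back. A small sketch using the standard java.util.regex classes makes the difference visible; the class QuantifierSketch and the method firstMatch are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only; class and method names are hypothetical.
class QuantifierSketch {

    // Returns the text of the first match, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        System.out.println(firstMatch(".*foo", input));  // greedy: xfooxxxxxxfoo
        System.out.println(firstMatch(".*?foo", input)); // reluctant: xfoo
        System.out.println(firstMatch(".*+foo", input)); // possessive: null
    }
}
```

The possessive form fails because .*+ consumes the whole input and never gives characters back for the trailing foo to match.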
 
Logical operators
XY X followed by Y
X|Y Either X or Y
(X) X, as a capturing group
 
Back references
\n Whatever the nth capturing group matched
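
A typical use of a back reference is to find a word that is repeated immediately. The sketch below uses \1 to refer to the first capturing group; the class BackReferenceSketch and the method findRepeat are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only; class and method names are hypothetical.
class BackReferenceSketch {

    // (\w+) captures a word; \1 then requires the same word again.
    private static final Pattern REPEAT = Pattern.compile("\\b(\\w+) \\1\\b");

    // Returns the first word that is immediately repeated, or null.
    static String findRepeat(String line) {
        Matcher m = REPEAT.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(findRepeat("this is is a test")); // prints is
    }
}
```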
 
Quotation
\ Nothing, but quotes the following character
\Q Nothing, but quotes all characters until \E
\E Nothing, but ends quoting started by \Q
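
Quoting matters when a search string contains characters such as "." that have a special meaning. The sketch below compares an unquoted search with a quoted one using the standard Pattern.quote call, which has the same effect as wrapping the text in \Q and \E. The class QuotingSketch and its methods are hypothetical names.

```java
import java.util.regex.Pattern;

// Illustration only; class and method names are hypothetical.
class QuotingSketch {

    // Treats 'argument' as a Regular Expression.
    static boolean regexFind(String argument, String line) {
        return Pattern.compile(argument).matcher(line).find();
    }

    // Treats 'argument' as literal text by quoting it first.
    static boolean literalFind(String argument, String line) {
        return Pattern.compile(Pattern.quote(argument)).matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(regexFind("#pqrs.7", "id #pqrsX7"));   // true: '.' matches X
        System.out.println(literalFind("#pqrs.7", "id #pqrsX7")); // false: '.' is literal
        System.out.println(literalFind("#pqrs.7", "id #pqrs.7")); // true
    }
}
```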
 
Special constructs (non-capturing)
(?:X) X, as a non-capturing group
(?idmsux-idmsux)  Nothing, but turns match flags on - off
(?idmsux-idmsux:X)   X, as a non-capturing group with the given flags on - off
(?=X) X, via zero-width positive lookahead
(?!X) X, via zero-width negative lookahead
(?<=X) X, via zero-width positive lookbehind
(?<!X) X, via zero-width negative lookbehind
(?>X) X, as an independent, non-capturing group
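
Lookahead and lookbehind test the surrounding text without making it part of the match. The following sketch shows one of each; the class LookaroundSketch and its methods are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only; class and method names are hypothetical.
class LookaroundSketch {

    // Returns the text of the first match, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Returns the start offset of the first match, or -1 if there is none.
    static int firstStart(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        // Positive lookbehind: digits preceded by '$'; the '$' is not matched.
        System.out.println(firstMatch("(?<=\\$)\\d+", "cost $45 now")); // prints 45
        // Negative lookahead: "foo" not followed by "bar".
        System.out.println(firstStart("foo(?!bar)", "foobar foobaz"));  // prints 7
    }
}
```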

Backslashes, escapes, and quoting

The backslash character ('\') serves to introduce escaped constructs, as defined in the table above, as well as to quote characters that otherwise would be interpreted as unescaped constructs. Thus the expression \\ matches a single backslash and \{ matches a left brace.

It is an error to use a backslash prior to any alphabetic character that does not denote an escaped construct; these are reserved for future extensions to the regular-expression language. A backslash may be used prior to a non-alphabetic character regardless of whether that character is part of an unescaped construct.

Backslashes within string literals in Java source code are interpreted as required by the Java Language Specification as either Unicode escapes or other character escapes. It is therefore necessary to double backslashes in string literals that represent regular expressions to protect them from interpretation by the Java bytecode compiler. The string literal "\b", for example, matches a single backspace character when interpreted as a regular expression, while "\\b" matches a word boundary. The string literal "\(hello\)" is illegal and leads to a compile-time error; in order to match the string (hello) the string literal "\\(hello\\)" must be used.
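
The doubling rule applies only to Regular Expressions written inside Java source code, not to text typed into a JSCAN field. A short sketch of the cases mentioned above follows; the class EscapeSketch and its constants are hypothetical names.

```java
import java.util.regex.Pattern;

// Illustration only; class and constant names are hypothetical.
class EscapeSketch {

    // Source text "\\b" is the two characters \b: a word boundary.
    static final Pattern WORD = Pattern.compile("\\bcat\\b");

    // Source text "\\(" is the two characters \(: a literal parenthesis.
    static final Pattern PARENS = Pattern.compile("\\(hello\\)");

    // Source text "\b" is the single backspace character U+0008.
    static final Pattern BACKSPACE = Pattern.compile("\b");

    public static void main(String[] args) {
        System.out.println(WORD.matcher("a cat sat").find());     // true
        System.out.println(WORD.matcher("concatenate").find());   // false
        System.out.println(PARENS.matcher("(hello)").matches());  // true
        System.out.println(BACKSPACE.matcher("\b").matches());    // true
    }
}
```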

Character Classes

Character classes may appear within other character classes, and may be composed by the union operator (implicit) and the intersection operator (&&). The union operator denotes a class that contains every character that is in at least one of its operand classes. The intersection operator denotes a class that contains every character that is in both of its operand classes.

The precedence of character-class operators is as follows, from highest to lowest:

1     Literal escape     \x
2     Grouping [...]
3     Range a-z
4     Union [a-e][i-u]
5     Intersection [a-z&&[aeiou]]

Note that a different set of metacharacters is in effect inside a character class than outside a character class. For instance, the regular expression . loses its special meaning inside a character class, while the expression - becomes a range-forming metacharacter.
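
The class operators described above can be tried directly with the standard Pattern.matches call. The class CharClassSketch and the method matches are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

// Illustration only; class and method names are hypothetical.
class CharClassSketch {

    // True when the whole input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[a-d[m-p]]", "n"));   // union: true
        System.out.println(matches("[a-z&&[def]]", "e")); // intersection: true
        System.out.println(matches("[a-z&&[^bc]]", "b")); // subtraction: false
        System.out.println(matches("[.]", "x"));          // '.' is literal here: false
        System.out.println(matches("[.]", "."));          // true
    }
}
```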

Line terminators

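A line terminator is a one- or two-character sequence that marks the end of a line of the input character sequence. The following are recognized as line terminators:

A newline (line feed) character ('\n')
A carriage-return character followed immediately by a newline character ("\r\n")
A standalone carriage-return character ('\r')
A next-line character ('\u0085')
A line-separator character ('\u2028')
A paragraph-separator character ('\u2029')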


