JSCAN is a fast search and change facility
that lets you scan files for a particular string of characters or for matches
against a Regular Expression.
Optionally each match can be automatically replaced with a new string
of characters.
Each line that contains a found or replaced string is logged with the
filename, line number and the contents of the matching text.
The files that contained matched strings can be viewed immediately from the display panel using the supplied fullscreen editor.
JScan is written in 100% pure Java. It will operate on Windows, Linux, UNIX or any platform that supports the Java Runtime Environment (JRE).
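As a rough illustration of the logging described above (filename, line number and the text of the matching line), the following minimal sketch scans a list of lines for a literal string. It is not JScan's source code; the class name `LineScanSketch`, its methods and the `name:line: text` output layout are assumptions made for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not JScan's source code.
class LineScanSketch {

    // Returns one "name:lineNumber: text" entry per line that contains the needle.
    static List<String> scanLines(String fileName, List<String> lines, String needle) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).contains(needle)) {
                // Line numbers are reported 1-based.
                hits.add(fileName + ":" + (i + 1) + ": " + lines.get(i));
            }
        }
        return hits;
    }

    // Convenience wrapper: reads a file (assumed to be UTF-8 text) and scans it.
    static List<String> scanFile(Path file, String needle) {
        try {
            List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
            return scanLines(file.toString(), lines, needle);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("alpha", "abc123 here", "omega");
        // Prints: demo.txt:2: abc123 here
        scanLines("demo.txt", lines, "abc123").forEach(System.out::println);
    }
}
```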
The Command Interface
The graphical user command interface allows you to:
Specify search criteria and files to be scanned.
Apply Regular Expressions for both filename and text-within-file searching.
Control the scan execution with Start and Stop buttons.
View all results in a logging area to which they are posted.
See the progress of file scans.
Perform recursive subdirectory searches.
Optionally skip to the next file after a first hit is recorded.
Optionally skip over non-text files, thereby reducing scan time.
Optionally perform a CASE-INSENSITIVE scan.
Search for string matches of full or partial filenames.
Edit a matched file while the searching continues.
Its multi-threaded design collects candidate files and searches them concurrently.
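The last point, collecting candidate files while searching them concurrently, can be pictured as a producer and a consumer sharing a queue. The sketch below is only one possible arrangement and is not taken from JScan; the class `ConcurrentScanSketch`, the in-memory map standing in for files on disk, and the sentinel value are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: one thread collects candidate names, another searches them.
class ConcurrentScanSketch {

    // Unique sentinel object telling the searcher that no more candidates will arrive.
    private static final String DONE = new String("<done>");

    // 'files' maps a file name to its contents (an in-memory stand-in for real files).
    static List<String> scan(Map<String, String> files, String needle) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> hits = new ArrayList<>(); // written only by the searcher thread

        Thread collector = new Thread(() -> {
            for (String name : files.keySet()) {
                queue.add(name);
            }
            queue.add(DONE);
        });

        Thread searcher = new Thread(() -> {
            try {
                // Compare by reference so a real file name can never equal the sentinel.
                for (String name = queue.take(); name != DONE; name = queue.take()) {
                    if (files.get(name).contains(needle)) {
                        hits.add(name);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        collector.start();
        searcher.start();
        try {
            collector.join();
            searcher.join(); // join() makes the searcher's writes to 'hits' visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("a.txt", "hello world");
        files.put("b.txt", "nothing");
        System.out.println(scan(files, "world"));
    }
}
```

The searcher can start on the first candidate as soon as it is queued, without waiting for the whole directory tree to be collected.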
Specify Search Directory Pathname
This is the part of the directory tree that you wish to search: the top-level folder from which you want the search to start scanning.
Symbolic links are ignored to prevent recursive looping.
If you leave the filename blank then all files with any number of qualifiers will match.
The search pattern must end in a string pattern or, if it names a directory, end with a "/".
/home/gjcullen/mydirectory/
/home/
/home/gjcullen/
/
/home/gjcullen/mydirectory/nextfolder/subfolder/nextsubfolder/
etc
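The directory traversal described above (start at a top-level folder, optionally recurse, ignore symbolic links) could be sketched with the standard `java.nio.file` API as follows. This is an assumption about one reasonable approach, not a description of how JScan is implemented; `DirectoryWalkSketch`, `collect` and `demo` are hypothetical names. `Files.walk` does not follow symbolic links unless asked to, and the `NOFOLLOW_LINKS` check drops the links themselves.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of collecting candidate files under a search directory.
class DirectoryWalkSketch {

    // Lists regular files under root; depth 1 means "this directory only".
    static List<Path> collect(Path root, boolean recurse) {
        int depth = recurse ? Integer.MAX_VALUE : 1;
        try (Stream<Path> paths = Files.walk(root, depth)) {
            return paths
                    .filter(p -> Files.isRegularFile(p, LinkOption.NOFOLLOW_LINKS))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a small temporary tree (root/a.txt, root/sub/b.txt) and returns
    // { files found without recursion, files found with recursion }.
    static int[] demo() {
        try {
            Path root = Files.createTempDirectory("jscan-walk");
            Path sub = Files.createDirectory(root.resolve("sub"));
            Path a = Files.createFile(root.resolve("a.txt"));
            Path b = Files.createFile(sub.resolve("b.txt"));
            int flat = collect(root, false).size();
            int deep = collect(root, true).size();
            Files.delete(b);
            Files.delete(sub);
            Files.delete(a);
            Files.delete(root);
            return new int[] {flat, deep};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = demo();
        System.out.println("top level: " + counts[0] + ", recursive: " + counts[1]);
    }
}
```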
Specify Search FileName or Regular Expression Pattern
This is the pattern that will be applied against each file in the directories searched to find matches.
Refer to the Regular Expression constructs below.
If you leave the filename blank then all files with any number of qualifiers will match.
.html
.txt
MyCode.*
[ABC]*.txt
N{2}.java
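To see how filename patterns like those above behave as regular expressions, here is a small sketch using `java.util.regex`. It assumes the pattern may match anywhere in the filename (a partial match via `Matcher.find()`) and that a blank pattern selects every file; whether JScan applies patterns in exactly this way is not confirmed here, and `FileNameFilterSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of filtering filenames with a regular expression.
class FileNameFilterSketch {

    static List<String> filter(List<String> names, String regex, boolean ignoreCase) {
        if (regex.isEmpty()) {
            return new ArrayList<>(names); // blank pattern: every file matches
        }
        Pattern p = Pattern.compile(regex, ignoreCase ? Pattern.CASE_INSENSITIVE : 0);
        List<String> out = new ArrayList<>();
        for (String name : names) {
            if (p.matcher(name).find()) { // partial match anywhere in the name
                out.add(name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("index.html", "MyCode.java", "NN.java", "notes.txt");
        System.out.println(filter(names, ".html", false));
        System.out.println(filter(names, "N{2}.java", false));
        System.out.println(filter(names, "mycode.*", true));
    }
}
```

Note that in a regular expression the leading "." of ".html" matches any single character, not only a literal dot.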
Case-Insensitive Checkbox
This will cause the search to ignore the alphabetic case of the argument. Upper or lower case characters will match.
Text-Only Checkbox
The search will skip non-text files such as graphics, music, video, etc.
First-Hit Checkbox
This will cause the search to continue with the next file after the first matching occurrence in the file currently being scanned.
Hidden Files Checkbox
This will cause the search to include "hidden" files. The default is to exclude hidden files.
Subdirectory Checkbox
This will cause a recursive search of all subdirectories under the current directory for the pattern.
Just FILENAME Checkbox
This will compare the search argument against filenames only, matching files that contain the string in their names.
Results will contain a list of full pathnames of files that match.
Text String or Regular Expression search argument
This combination list box contains the search argument: the string of data that you wish to scan for in each of the target files. The string can contain LEADING, EMBEDDED or TRAILING spaces.
The search argument is CASE-SENSITIVE and SPACE-SENSITIVE. You can also search for a hexadecimal string (the argument must start with X and all letters must be capitalized).
abc123
Geoffrey J. Cullen
#pqrs.7
XFED2
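As an illustration of the hexadecimal form of the search argument, the sketch below turns an argument such as XFED2 into bytes and looks for them in a byte array. How JScan actually interprets hex arguments is not documented here; the class `HexArgumentSketch`, the requirement of an even number of hex digits, and the byte-wise search are assumptions made for the example.

```java
import java.util.Arrays;

// Hypothetical sketch of handling an "X"-prefixed hexadecimal search argument.
class HexArgumentSketch {

    // Converts e.g. "XFED2" into the bytes 0xFE 0xD2.
    // Assumption: a leading X followed by an even number of upper-case hex digits.
    static byte[] parse(String arg) {
        if (!arg.matches("X([0-9A-F]{2})+")) {
            throw new IllegalArgumentException("not a hex argument: " + arg);
        }
        byte[] out = new byte[(arg.length() - 1) / 2];
        for (int i = 0; i < out.length; i++) {
            String pair = arg.substring(1 + 2 * i, 3 + 2 * i);
            out[i] = (byte) Integer.parseInt(pair, 16);
        }
        return out;
    }

    // Returns the index of the first occurrence of needle in data, or -1.
    static int indexOf(byte[] data, byte[] needle) {
        for (int i = 0; i + needle.length <= data.length; i++) {
            int j = 0;
            while (j < needle.length && data[i + j] == needle[j]) {
                j++;
            }
            if (j == needle.length) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = {0x01, (byte) 0xFE, (byte) 0xD2, 0x02};
        System.out.println(Arrays.toString(parse("XFED2")));
        System.out.println(indexOf(data, parse("XFED2")));
    }
}
```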
This field may contain a Clear Text Pattern or a Regular Expression to be applied to each line of the file.
Such as: Cullen or C....n or C.*n
(See below for a discussion of Regular Expression constructs)
Consult Java or Perl regular expression documentation for syntax options and examples.
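The three sample arguments above can be tried directly against a line of text with `java.util.regex`. This is a minimal sketch for experimentation only; `RegexLineSketch` and `firstMatch` are hypothetical names, not part of JScan.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: apply a search argument to one line of text.
class RegexLineSketch {

    // Returns the first text matched by the regex in the line, or null if none.
    static String firstMatch(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("Cullen", "Geoffrey J. Cullen")); // clear text
        System.out.println(firstMatch("C....n", "Geoffrey J. Cullen")); // C, any four characters, n
        System.out.println(firstMatch("C.*n", "Cullen and Cohen"));     // C, anything, up to the last n
    }
}
```

Because `.*` is greedy, the third pattern matches from the first C to the last n on the line rather than stopping at the first n.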
The combo box will hold up to 100 of your prior search arguments used during the session. Enter a new string or select a previously used string.
ChangeTo String
This is a string of clear text data, or a Regular Expression, that you supply when you want to change all matching strings in the target files to a new string. The new string can be a different length.
This option is disabled when the "Just FILENAME" checkbox is marked.
If a Regular Expression is used, the text it matches is what gets replaced by the ChangeTo text string.
Hex strings can also be matched and changed. Use a leading X followed immediately by a valid hex argument.
Example.
xyq456 | abc123
John J. Cullen | Mary Todd
RE: <a.*> | <p class="narrative">
XFE2C | XAD12
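The replacement behaviour described above, where whatever the Regular Expression matches is replaced by the ChangeTo text, can be sketched with `Matcher.replaceAll`. This example assumes the ChangeTo text is inserted literally (hence `Matcher.quoteReplacement`); that choice, and the name `ChangeToSketch`, are assumptions rather than documented JScan behaviour.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of replacing every regex match on a line with new text.
class ChangeToSketch {

    static String change(String line, String regex, String changeTo) {
        // quoteReplacement keeps "$" and "\" in the new text from being
        // interpreted as group references.
        return Pattern.compile(regex)
                .matcher(line)
                .replaceAll(Matcher.quoteReplacement(changeTo));
    }

    public static void main(String[] args) {
        System.out.println(change("<a href=\"x\">text", "<a.*>", "<p class=\"narrative\">"));
        System.out.println(change("John J. Cullen was here", "John J\\. Cullen", "Mary Todd"));
    }
}
```

Keep in mind that `<a.*>` is greedy: on a line such as `<a href="x">text</a>` it matches through the last `>` and the whole span is replaced.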
Results Panel
This is the area where all matching files appear. Each entry includes the file name, its pathname, the line number of the occurrence in the file and the text of the line itself showing the match.
Messages Area
The file currently being scanned will appear here.
When the scan is completed this area will show:
The elapsed scan time.
The number of files scanned.
The number of files that had matches.
The total number of matches.
Edit pulldown
Used for Page Up/Page Down, Select All and Copy/Paste clipboard functions.
View pulldown
Choose any occurrence in the results panel by clicking on the line. Then select EDIT in the View pulldown to edit the file in which the occurrence appears.
Or just double-click on a result line and the file in which it was found will be opened in the editor.
The editor will function both during and after scan execution.
Text Editing
Select any match using your left mouse button to highlight the line. Then double-click and the text file will be brought into a text editor for full view. The matched line will be pointed to by "prefix arrows" in the editor window. You can edit files while the scan continues to run.
Performance
The speed of the scanner will be a function of the number of files
scanned and the number of hits on each line of the files scanned.
The more hits the greater the total scan time.
Construct | Matches
Characters
x | The character x
\\ | The backslash character
\0n | The character with octal value 0n (0 <= n <= 7)
\0nn | The character with octal value 0nn (0 <= n <= 7)
\0mnn | The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh | The character with hexadecimal value 0xhh
\uhhhh | The character with hexadecimal value 0xhhhh
\t | The tab character ('\u0009')
\n | The newline (line feed) character ('\u000A')
\r | The carriage-return character ('\u000D')
\f | The form-feed character ('\u000C')
\a | The alert (bell) character ('\u0007')
\e | The escape character ('\u001B')
\cx | The control character corresponding to x
Character classes
[abc] | a, b, or c (simple class)
[^abc] | Any character except a, b, or c (negation)
[a-zA-Z] | a through z or A through Z, inclusive (range)
[a-d[m-p]] | a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]] | d, e, or f (intersection)
[a-z&&[^bc]] | a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] | a through z, and not m through p: [a-lq-z] (subtraction)
Predefined character classes
. | Any character (may or may not match line terminators)
\d | A digit: [0-9]
\D | A non-digit: [^0-9]
\s | A whitespace character: [ \t\n\x0B\f\r]
\S | A non-whitespace character: [^\s]
\w | A word character: [a-zA-Z_0-9]
\W | A non-word character: [^\w]
POSIX character classes (US-ASCII only)
\p{Lower} | A lower-case alphabetic character: [a-z]
\p{Upper} | An upper-case alphabetic character: [A-Z]
\p{ASCII} | All ASCII: [\x00-\x7F]
\p{Alpha} | An alphabetic character: [\p{Lower}\p{Upper}]
\p{Digit} | A decimal digit: [0-9]
\p{Alnum} | An alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Punct} | Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph} | A visible character: [\p{Alnum}\p{Punct}]
\p{Print} | A printable character: [\p{Graph}\x20]
\p{Blank} | A space or a tab: [ \t]
\p{Cntrl} | A control character: [\x00-\x1F\x7F]
\p{XDigit} | A hexadecimal digit: [0-9a-fA-F]
\p{Space} | A whitespace character: [ \t\n\x0B\f\r]
java.lang.Character classes (simple java character type)
\p{javaLowerCase} | Equivalent to java.lang.Character.isLowerCase()
\p{javaUpperCase} | Equivalent to java.lang.Character.isUpperCase()
\p{javaWhitespace} | Equivalent to java.lang.Character.isWhitespace()
\p{javaMirrored} | Equivalent to java.lang.Character.isMirrored()
Classes for Unicode blocks and categories
\p{InGreek} | A character in the Greek block (simple block)
\p{Lu} | An uppercase letter (simple category)
\p{Sc} | A currency symbol
\P{InGreek} | Any character except one in the Greek block (negation)
[\p{L}&&[^\p{Lu}]] | Any letter except an uppercase letter (subtraction)
Boundary matchers
^ | The beginning of a line
$ | The end of a line
\b | A word boundary
\B | A non-word boundary
\A | The beginning of the input
\G | The end of the previous match
\Z | The end of the input but for the final terminator, if any
\z | The end of the input
Greedy quantifiers
X? | X, once or not at all
X* | X, zero or more times
X+ | X, one or more times
X{n} | X, exactly n times
X{n,} | X, at least n times
X{n,m} | X, at least n but not more than m times
Reluctant quantifiers
X?? | X, once or not at all
X*? | X, zero or more times
X+? | X, one or more times
X{n}? | X, exactly n times
X{n,}? | X, at least n times
X{n,m}? | X, at least n but not more than m times
Possessive quantifiers
X?+ | X, once or not at all
X*+ | X, zero or more times
X++ | X, one or more times
X{n}+ | X, exactly n times
X{n,}+ | X, at least n times
X{n,m}+ | X, at least n but not more than m times
Logical operators
XY | X followed by Y
X|Y | Either X or Y
(X) | X, as a capturing group
Back references
\n | Whatever the nth capturing group matched
Quotation
\ | Nothing, but quotes the following character
\Q | Nothing, but quotes all characters until \E
\E | Nothing, but ends quoting started by \Q
Special constructs (non-capturing)
(?:X) | X, as a non-capturing group
(?idmsux-idmsux) | Nothing, but turns match flags on - off
(?idmsux-idmsux:X) | X, as a non-capturing group with the given flags on - off
(?=X) | X, via zero-width positive lookahead
(?!X) | X, via zero-width negative lookahead
(?<=X) | X, via zero-width positive lookbehind
(?<!X) | X, via zero-width negative lookbehind
(?>X) | X, as an independent, non-capturing group
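The greedy, reluctant and possessive quantifiers in the table above carry the same descriptions but behave differently when the rest of the pattern still has to match. The following sketch, using only `java.util.regex`, makes the difference visible and also shows a positive lookahead; `QuantifierSketch` and `allMatches` are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch comparing quantifier families on the same input.
class QuantifierSketch {

    // Collects every successive match of the regex in the input.
    static List<String> allMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: .* takes everything, then backs off just enough for "foo".
        System.out.println(allMatches(".*foo", input));   // [xfooxxxxxxfoo]
        // Reluctant: .*? takes as little as possible before each "foo".
        System.out.println(allMatches(".*?foo", input));  // [xfoo, xxxxxxfoo]
        // Possessive: .*+ takes everything and never backs off, so "foo" cannot match.
        System.out.println(allMatches(".*+foo", input));  // []
        // Positive lookahead: the ".txt" must follow but is not part of the match.
        System.out.println(allMatches("\\w+(?=\\.txt)", "notes.txt and todo.txt")); // [notes, todo]
    }
}
```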
The backslash character ('\')
serves to introduce escaped constructs, as defined in the table above,
as well as to quote characters that otherwise would be interpreted as
unescaped constructs. Thus the expression \\
matches a single backslash and \{ matches a left
brace.
It is an error to use a
backslash prior to any alphabetic character that does not denote an
escaped construct; these are reserved for future extensions to the
regular-expression language. A backslash may be used prior to a
non-alphabetic character regardless of whether that character is part
of an unescaped construct.
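The escaping rules in the two paragraphs above can be checked with a few lines of Java. In this sketch the regular expressions are written as Java string literals, so each regex backslash appears doubled in the source. `EscapeSketch` and its methods are hypothetical names, and `\y` is used only as an example of an alphabetic escape that `java.util.regex` does not define.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical sketch of backslash quoting in regular expressions.
class EscapeSketch {

    // True if the regex compiles, false if it is rejected as a syntax error.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // True if the whole string s is matched by the regex.
    static boolean matchesWhole(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("\\\\", "\\")); // regex \\ matches one backslash
        System.out.println(matchesWhole("\\{", "{"));   // regex \{ matches a left brace
        System.out.println(compiles("\\."));            // backslash before a non-alphabetic character
        System.out.println(compiles("\\y"));            // backslash before an undefined alphabetic escape
    }
}
```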
Backslashes within string
literals in Java source code are interpreted as required by the Java
Language Specification as either Unicode
escapes or other character
escapes. It is therefore necessary to double backslashes in
string literals that represent regular expressions to protect them from
interpretation by the Java bytecode compiler. The string literal "\b",
for example, matches a single backspace character when interpreted as a
regular expression, while "\\b" matches a word
boundary. The string literal "\(hello\)" is
illegal and leads to a compile-time error; in order to match the string
(hello) the string literal "\\(hello\\)"
must be used.
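The doubling of backslashes in Java source code, as described above, is easiest to see side by side. This is a small sketch with hypothetical names (`StringLiteralSketch` and its helper methods); it applies only when a pattern is written inside Java source, not when it is typed into an input field.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of regular expressions written as Java string literals.
class StringLiteralSketch {

    // The literal "\\b" is the two-character regex \b, a word boundary.
    static boolean hasWord(String word, String text) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text).find();
    }

    // The literal "\b" is a single backspace character (code 8), not a word boundary.
    static boolean isBackspace(String s) {
        return Pattern.matches("\b", s);
    }

    // The literal "\\(hello\\)" is the regex \(hello\), which matches "(hello)" literally.
    static boolean isParenHello(String s) {
        return Pattern.matches("\\(hello\\)", s);
    }

    public static void main(String[] args) {
        System.out.println(hasWord("cat", "a cat sat"));
        System.out.println(hasWord("cat", "concat"));
        System.out.println(isBackspace(String.valueOf((char) 8)));
        System.out.println(isParenHello("(hello)"));
    }
}
```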
Character classes may appear
within other character classes, and may be composed by the union
operator (implicit) and the intersection operator (&&).
The union operator denotes a class that contains every character that
is in at least one of its operand classes. The intersection operator
denotes a class that contains every character that is in both of its
operand classes.
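The union and intersection operators described above can be tried on a handful of characters. The sketch below keeps only the characters of a string that a given character class accepts; `ClassOperatorSketch` and `keep` are hypothetical names for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of character-class union, intersection and subtraction.
class ClassOperatorSketch {

    // Returns the characters of input that the character class matches, in order.
    static String keep(String charClass, String input) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder sb = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(keep("[a-d[m-p]]", "abcdefmnpz"));   // union
        System.out.println(keep("[a-z&&[def]]", "abcdefmnpz")); // intersection
        System.out.println(keep("[a-z&&[^bc]]", "abcd"));       // subtraction
    }
}
```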
The precedence of
character-class operators is as follows, from highest to lowest:
1 | Literal escape | \x
2 | Grouping | [...]
3 | Range | a-z
4 | Union | [a-e][i-u]
5 | Intersection | [a-z&&[aeiou]]
Note that a different set of
metacharacters is in effect inside a character class than outside a
character class. For instance, the regular expression .
loses its special meaning inside a character class, while the
expression - becomes a range-forming
metacharacter.
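A short check makes the different roles of . and - inside and outside a character class concrete. This is a minimal sketch with hypothetical names (`MetacharSketch`, `matchesWhole`), using only `Pattern.matches`.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: metacharacters inside versus outside a character class.
class MetacharSketch {

    // True if the whole string s is matched by the regex.
    static boolean matchesWhole(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole(".", "x"));     // outside a class: any character
        System.out.println(matchesWhole("[.]", "x"));   // inside a class: only a literal dot
        System.out.println(matchesWhole("[.]", "."));
        System.out.println(matchesWhole("[a-c]", "b")); // inside a class: - forms a range
        System.out.println(matchesWhole("a-c", "b"));   // outside a class: - is a literal hyphen
        System.out.println(matchesWhole("a-c", "a-c"));
    }
}
```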
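A line terminator
is a one- or two-character sequence that marks the end of a line of the
input character sequence. The following are recognized as line
terminators:
A newline (line feed) character ('\n'),
A carriage-return character followed immediately by a newline character ("\r\n"),
A standalone carriage-return character ('\r'),
A next-line character ('\u0085'),
A line-separator character ('\u2028'), or
A paragraph-separator character ('\u2029').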
Copyright © Cullen Programming 1987, 2016
All Rights Reserved