REPSTR -- string replace utility for programmers

Copyright (C) 2007 by Laszlo Menczel (menczel at mailbox dot hu)

This is free software distributed under the GNU General Public Licence (GPL) version 2.

REPSTR is distributed in the hope that it will be useful, but with NO WARRANTY expressed or implied.

Summary

REPSTR is a powerful string replace utility for Linux and Win9x/ME/2000/XP. It has many features (those marked by '*' are optional):

replaces hundreds of strings in many files in a single operation
processes files specified by extension(s) or file globbing patterns
uses a single (command line) replacement rule or a rule table file
checks the rule table for recursive (destructive) replacement rules
* processing hidden files
* recursive processing of files in subdirectories
* case insensitive replacement
* whole word replacement (word delimiters can be specified)
* excluding comments (comment delimiters can be specified)
* logging the operations and/or errors
* creating backup files with specified extension
* batch mode (for scripts)

Limitation: string patterns spanning two or more lines cannot be replaced.

Contents

1. Introduction

2. Usage

   2.1. Command line options
     -a      -B      -C      -h      -H      -l      -n      -r      -s      -W      -y

   2.2. Optional command line arguments
     -b app
     -c comm-delim
     -d sep
     -u rule-file
     -w del

   2.3. Source arguments
     -e ext-list
     -f src-file
     -g glob-patt

   2.4. Rule arguments
     -t table
     -x rule

3. Syntax limitations

4. Numerical limits

5. Notes

6. Compiling REPSTR

1. Introduction

REPSTR is a small utility to automate the replacement of multiple strings in multiple text files. This job occurs quite frequently during program development. Maybe you want to use the code of someone else, but you are not satisfied with the names of variables and/or functions in the code. Or you write a huge library and later realize that the names conflict with those in another library you must also use.

Manual replacement is tedious and error-prone. When you have only a couple of strings to replace, a quick bash script using SED will be sufficient. However, when you have to replace, let's say, 200 strings in 100 files, it becomes cumbersome. This is why I have written REPSTR. BTW, I did not want to reinvent the wheel: I searched the net for a suitable freeware string replace utility, but none of them had all the functionality I consider essential.

You should save pairs of strings (the string to replace and the replacement string, separated by whitespace) into a text file. REPSTR will read this file, sort the string pairs according to the length of the string to replace (descending order), check the list for recursive replacements, and then replace all occurrences in the specified source files in a line-by-line fashion.

Note: The above is the default behaviour which may be modified by specifying options on the command line.

2. Usage


  repstr [OPTIONS] [OPT_ARGS] RULE_ARG SOURCE_ARG

Arguments shown between brackets are optional. Arguments may be specified in any convenient order (unless otherwise noted).

2.1. Command line options

-a

Ignore case when matching strings to be replaced.
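
As an illustration only (this is not REPSTR's code, and the function name is made up for the example), case insensitive replacement of a literal string can be sketched in Python like this:

```python
import re

def replace_ignore_case(line, old, new):
    """Replace every occurrence of the literal string 'old', ignoring case."""
    # re.escape() makes 'old' a literal pattern; the lambda prevents
    # backslashes in 'new' from being interpreted as group references.
    return re.sub(re.escape(old), lambda m: new, line, flags=re.IGNORECASE)

print(replace_ignore_case("Foo FOO foo", "foo", "bar"))
```

Note that the replacement string is inserted as given: the case of the matched text is not carried over.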

-B

Replacement will be done in batch mode (no screen output).

-C

Strings occurring within C-style comments will *not* be replaced. Any text between '/*' and '*/', or any text following '//' up to the end of line is considered to be a comment. See also '-c comm-delim' below.
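
The following Python sketch shows one way comment skipping can work when the text is processed line by line. It is an illustration under assumptions, not REPSTR's implementation: the function and parameter names are invented, string literals are not treated specially, and a plain substring replace stands in for the real matching.

```python
def replace_outside_comments(lines, old, new,
                             single="//", start="/*", end="*/"):
    """Replace 'old' by 'new' only in text that is outside comments."""
    out = []
    in_block = False              # True while inside a multi-line comment
    for line in lines:
        pieces = []
        i = 0
        while i < len(line):
            if in_block:
                j = line.find(end, i)
                if j < 0:         # comment continues on the next line
                    pieces.append(line[i:])
                    i = len(line)
                else:             # copy the comment text unchanged
                    pieces.append(line[i:j + len(end)])
                    i = j + len(end)
                    in_block = False
            else:
                s = line.find(single, i)
                b = line.find(start, i)
                if s >= 0 and (b < 0 or s < b):
                    # end-of-line comment: the rest of the line is untouched
                    pieces.append(line[i:s].replace(old, new))
                    pieces.append(line[s:])
                    i = len(line)
                elif b >= 0:
                    pieces.append(line[i:b].replace(old, new))
                    pieces.append(start)
                    i = b + len(start)
                    in_block = True
                else:
                    pieces.append(line[i:].replace(old, new))
                    i = len(line)
        out.append("".join(pieces))
    return out

source = ["foo(); /* foo", "foo */ foo(); // foo"]
print(replace_outside_comments(source, "foo", "bar"))
```

The 'in_block' flag is carried from one line to the next, which is what allows multi-line comments to be skipped even though the text itself is handled one line at a time.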

-h
--help

Displays a short help text and quits. '-h' should be the first argument specified. The rest of the command line is ignored.

-H

REPSTR will process hidden files matching the search pattern specified. By default hidden files are not processed. If you specify a single file, this switch has no effect.

-l

(This is a lower case 'L'.) REPSTR will write its messages to a logfile called 'repstr.log' (in the current directory).

-n

REPSTR will use the rule table w/o sorting. The default behaviour is that strings in the rule table are sorted into descending order according to length (see Note 4 in Section 5).

-r

REPSTR will not check for recursive replacement patterns before performing its operations. See Note 3 in Section 5.

-s

REPSTR will recursively process files in subdirectories. The default is that only the files in the current directory are processed.

-W

REPSTR will perform whole word replacement using the default set of word delimiter characters, i.e. whitespace and newline. Substrings within longer words will *not* be replaced. See also '-w del' below.

-y

Allow empty replacement strings in the rules (results in the removal of the single string specified as rule). If you do not specify this switch, empty strings in the rules will generate error messages and the program aborts.

2.2. Optional command line arguments

WARNING: If any argument string contains character(s) which have a special meaning for the shell (*, ?, $, etc.), you should double-quote the string. Beware!

-b app

Create backup files by appending the string 'app' to the names of the original files.

-c comm-delim

'comm-delim' should be a comma-separated list of strings which define your own comment delimiter sequences. The format of the string is one of the following:


  single-delim
  start-delim,end-delim
  single-delim,start-delim,end-delim

where 'single-delim' is a delimiter which is effective to the end of line (like '//' in C), 'start-delim' and 'end-delim' are the start and end delimiters of (possibly) multi-line comments (like '/*' and '*/' in C).

The maximum length of comment delimiter strings is 4. 'start-delim' and 'end-delim' may be the same, but both should be different from 'single-delim'.
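
To make the three accepted formats concrete, here is a small Python sketch that splits a 'comm-delim' string and applies the two constraints above. The function name and the error messages are made up for the example; REPSTR's own parsing may differ in details.

```python
def parse_comm_delim(spec, max_len=4):
    """Return (single, start, end); parts that are not given are None."""
    parts = spec.split(",")
    if len(parts) == 1:
        single, start, end = parts[0], None, None
    elif len(parts) == 2:
        single = None
        start, end = parts
    elif len(parts) == 3:
        single, start, end = parts
    else:
        raise ValueError("expected 1, 2 or 3 comma-separated delimiters")
    for delim in (single, start, end):
        if delim is not None and not 1 <= len(delim) <= max_len:
            raise ValueError("bad delimiter length: %r" % delim)
    # start and end may be equal, but neither may equal the single delimiter
    if single is not None and single in (start, end):
        raise ValueError("single-delim must differ from start/end delimiters")
    return single, start, end

print(parse_comm_delim("#"))
print(parse_comm_delim("(*,*)"))
print(parse_comm_delim("--,{-,-}"))
```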

-d sep

'sep' is the character separating the two strings in the rule file. Without this option the default separator (any number of spaces and tabs) is used.

-u rule-file
--check-rules rule-file

REPSTR will check the rules listed in 'rule-file' and report (in the log file) any recursive replacement rules detected. It returns zero if the rules are OK (not recursive), non-zero otherwise. '-u' and the name of the rule table file must be the first two arguments on the command line; any other arguments are ignored.

-w del

If this argument is present, the old strings will be replaced by the new ones only if a whole word is matched (i.e. substrings matched in longer words will *not* be replaced). Words are assumed to be delimited by the characters listed in 'del'. If the option '-W' is also specified, whitespace characters are considered to be delimiters as well.
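
Whole word matching with a set of delimiter characters can be sketched in Python as follows. This is an illustration, not the actual REPSTR code: the function name is invented, and treating the start and end of a line as word boundaries is an assumption of the sketch.

```python
def replace_whole_words(line, old, new, delims=" \t\n"):
    """Replace 'old' only where it is bounded by delimiters on both sides."""
    if not old:
        raise ValueError("the string to replace must not be empty")
    out = []
    i = 0
    n = len(old)
    while i < len(line):
        if line.startswith(old, i):
            before_ok = i == 0 or line[i - 1] in delims
            after_ok = i + n == len(line) or line[i + n] in delims
            if before_ok and after_ok:
                out.append(new)
                i += n
                continue
        out.append(line[i])   # no whole-word match here, copy one character
        i += 1
    return "".join(out)

text = "foo foobar (foo)"
print(replace_whole_words(text, "foo", "bar"))                 # whitespace only
print(replace_whole_words(text, "foo", "bar", " \t\n()"))      # plus parentheses
```

With whitespace as the only delimiter, the 'foo' inside the parentheses is left alone because '(' and ')' do not count as word boundaries. Adding them to the delimiter set changes that.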

2.3. Source arguments

These arguments specify the source file(s) to process. Use only one of them!

-e ext-list

'ext-list' is a comma-separated list of filename extensions to use in the search for the files to process. For example: 'c,h,txt'.

-f src-file

This specifies the name of a single file to process. It may contain an absolute or a relative (to the current directory) path.

-g glob-patt

'glob-patt' is a comma-separated list of search patterns for filename globbing. The patterns should use the normal wildcard conventions. For example: '*.c,foo_?.h'. File search is started in the current working directory.
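
The Python sketch below illustrates how a list of extensions or glob patterns selects file names, and how hidden files are left out by default. It only models the selection step on a list of names. The function names are invented, and "hidden" is assumed to mean a name starting with a dot (the Linux convention); on Windows hidden files are marked by a file attribute instead.

```python
import fnmatch

def patterns_from_ext_list(ext_list):
    """Turn an extension list like 'c,h,txt' into glob patterns."""
    return ["*." + ext for ext in ext_list.split(",")]

def select_files(names, patterns, include_hidden=False):
    """Return the names that match at least one of the glob patterns."""
    selected = []
    for name in names:
        if name.startswith(".") and not include_hidden:
            continue          # skipped unless hidden files are requested
        if any(fnmatch.fnmatchcase(name, patt) for patt in patterns):
            selected.append(name)
    return selected

names = ["main.c", "foo_1.h", "foo_12.h", ".hidden.c", "notes.txt"]
print(select_files(names, "*.c,foo_?.h".split(",")))
print(select_files(names, patterns_from_ext_list("c,h,txt")))
```

In the first call 'foo_12.h' is not selected because '?' matches exactly one character.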

2.4. Rule arguments

These arguments specify replacement rule(s). Use only one of them!

-t table

'table' is the file containing replacement rules (one rule per line). Each rule consists of a pair of strings separated by whitespace (default) or by the character specified in the optional argument '-d sep'. Instances of the first string are replaced by the second.
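
As an illustration of the rule format, the following Python sketch splits one line of a rule table into its two strings, either at whitespace or at a user-specified separator character. The function name is hypothetical and the error handling is an assumption; it is not taken from the REPSTR source.

```python
def parse_rule_line(line, sep=None):
    """Split one rule line into the pair (old, new).

    sep=None models the default separator (a run of whitespace);
    a single character models the '-d sep' argument.
    """
    line = line.rstrip("\n")
    parts = line.split() if sep is None else line.split(sep)
    if len(parts) != 2:
        raise ValueError("expected 2 strings, got %d: %r" % (len(parts), line))
    return parts[0], parts[1]

print(parse_rule_line("old_name \t new_name\n"))
print(parse_rule_line("old name:new name", sep=":"))
```

The second call shows why a separator character is useful: with ':' as the separator the rule strings themselves may contain spaces.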

-x rule

A single string replacement rule in the form "old,new". The limitation in this case is that the rule strings cannot contain whitespace or commas.

3. Syntax limitations

Some option and argument combinations are not allowed (REPSTR aborts with an error message if any of these combinations occurs):


  '-t table' and '-x rule' together
  '-f src-file', '-e ext-list' and '-g glob-patt' together in any combination
  '-C' and '-c comm-delim' together

Some options are ignored when specified in a certain combination:


  '-H' and '-s' are ignored if only a single file is specified
  '-d' and '-r' are ignored if '-x rule' is specified

Some combinations of rules and options/argument values may yield unexpected results. For example, if you specify '-w xyz' and some rule strings contain 'x', 'y' and/or 'z', the program will probably make a mess of your text. Try to avoid such situations.

4. Numerical limits

* strings in rules must be at least 2 characters long
* strings in rules must not be longer than 63 characters
* lines in the source files must not be longer than 255 characters
* filenames must not be longer than 255 characters
* absolute paths must not be longer than 512 characters
* maximum 8 file search patterns may be specified
* maximum 64 characters may be specified as word delimiters
* comment delimiter strings must not be longer than 4 characters
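
A rule can be checked against the first two limits before it is used. The Python sketch below is only an illustration: the names are made up, it assumes that the length limits apply to both strings of a rule, and it models the '-y' switch with a flag that permits an empty replacement string.

```python
MIN_RULE_LEN = 2     # limits taken from the list above
MAX_RULE_LEN = 63

def check_rule(old, new, allow_empty_new=False):
    """Return a list of problems found in one rule (empty list = OK)."""
    problems = []
    if not MIN_RULE_LEN <= len(old) <= MAX_RULE_LEN:
        problems.append("bad length of string to replace: %d" % len(old))
    if new == "":
        if not allow_empty_new:
            problems.append("empty replacement string")
    elif not MIN_RULE_LEN <= len(new) <= MAX_RULE_LEN:
        problems.append("bad length of replacement string: %d" % len(new))
    return problems

print(check_rule("foo", "bar"))
print(check_rule("x", "yy"))
print(check_rule("foo", ""))
```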

5. Notes

Note 1.

Messages and statistics are written to the logfile 'repstr.log'. In normal mode (no '-B' switch) messages are also displayed on the screen.

Note 2.

Replacement is carried out according to the length of the strings to replace, i.e. longest first (unless you specify the '-n' switch which suppresses rule sorting). This avoids the destruction of a string pattern by a previous replacement. Consider the following replacement rules:


  foo bar
  foobar whatever

If we first replace 'foo' with 'bar', the string 'foobar' is converted to 'barbar' and the second replacement will fail. If we proceed from longest string to shortest, there is no such problem. The problem may also be solved by specifying '-w del' or '-W' to match only whole words.
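
The effect of the rule order can be reproduced with a few lines of Python. The sketch assumes that the rules are applied one after the other to the text, which is a simplified model and not the actual REPSTR code.

```python
def apply_rules(text, rules):
    """Apply each (old, new) rule in turn to the whole text."""
    for old, new in rules:
        text = text.replace(old, new)
    return text

rules = [("foo", "bar"), ("foobar", "whatever")]

# Table order: 'foo' is replaced first and destroys 'foobar'.
print(apply_rules("foobar", rules))        # barbar

# Longest string first: 'foobar' is handled before 'foo'.
by_length = sorted(rules, key=lambda rule: len(rule[0]), reverse=True)
print(apply_rules("foobar", by_length))    # whatever
```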

Note 3.

The program checks for recursive replacement rules like this:


  foobar whatever
  what nothing

The problem here is that after 'foobar' is replaced by 'whatever', a new instance of the 'what' pattern is created and it will be later replaced by 'nothing'. So finally the string 'foobar' is converted to 'nothingever' which is probably not what you want. Recursive replacement is considered a fatal error and the program is aborted (unless you specify the '-r' option on the command line).

See also Note 4 below.
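
A simple way to detect such rules is to look for replacement strings that contain the search string of some other rule. The Python sketch below does this. It is a deliberately conservative illustration that ignores the order of the rules, and it should not be taken as a description of the check REPSTR actually performs.

```python
def find_recursive_rules(rules):
    """Return pairs (producer, consumer) where the replacement string of
    'producer' contains the string to replace of 'consumer'."""
    hits = []
    for i, producer in enumerate(rules):
        for j, consumer in enumerate(rules):
            if i != j and consumer[0] in producer[1]:
                hits.append((producer, consumer))
    return hits

rules = [("foobar", "whatever"), ("what", "nothing")]
for producer, consumer in find_recursive_rules(rules):
    print("%r creates a new match for %r" % (producer, consumer))
```

A rule is not compared with itself here, so a rule like 'curs' -> '#curs#' is not reported by this sketch.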

Note 4.

Sorting the string replacement table can be suppressed by the command line switch '-n'. This may be useful in cases where correct replacement depends on the order of rules in the table. For example, using the following rules w/o sorting eliminates the recursive replacement problem outlined in Note 3 above:


  foobar      whatever
  whatever    ###
  what        nothing
  ###         whatever
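
The table above can be traced with a short Python sketch. It assumes that the rules are applied one after the other, in table order, to the same text; this is a simplified model of unsorted processing and not the REPSTR code itself.

```python
def apply_in_order(text, rules):
    """Apply the rules strictly in the order given (no sorting)."""
    for old, new in rules:
        text = text.replace(old, new)
    return text

rules = [
    ("foobar", "whatever"),
    ("whatever", "###"),      # park every 'whatever' in a placeholder
    ("what", "nothing"),      # now only genuine 'what' strings are left
    ("###", "whatever"),      # restore the parked strings
]

print(apply_in_order("foobar what whatever", rules))   # whatever nothing whatever
```

The placeholder '###' must of course be a string that does not occur anywhere in the source files.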

Note 5.

String replacement proceeds line by line. For this reason, patterns spanning two or more lines cannot be replaced by REPSTR. This is IMO not a serious limitation in programming projects.

Note 6.

If you specify delimiters for both word matching and comment skipping, the latter takes precedence.

6. Compiling REPSTR

The program currently can be compiled and run under Win9x/ME/2000/XP and Linux. For the Win32 build you will need the MinGW development system.

The program depends on the library 'libmutil.a'. This is a library of useful utilities written by me during the last ten years. Among other things it contains a much better 'atof' and 'atoi' (both functions have proper error handling and reporting, which is missing from the primitive Glibc equivalents), several string and file system utilities, command line parsing, reading config files, an easy to use filename globbing module, etc. Check it out, you may find it quite useful [archive] [manual]. I have included copies of 'libmutil.a' in the build directories and the 'mutil.h' header in the source directory for building REPSTR.

To build the program, change to the appropriate build directory ('build-mingw' or 'build-linux') and run the make script 'mk(.bat)'. Under Win32 you will have to create an environment variable called 'MINGW' and set it to the install path of your MinGW system (e.g. 'c:\mingw').

I suggest that any time you change/recompile the program, you should first copy the REPSTR binary to the appropriate 'bin/xxx' directory by running the script 'save(.bat)'. Then you should run the test scripts 'test01(.bat)' to 'test13(.bat)' included in the 'bin/linux' and 'bin/win32' directories. The input file is 'base.txt' (in the directory 'testdata'); it contains a number of patterns enclosed between '|' characters. The rule table files for the tests are also located in 'testdata'. The processed files are created in the 'bin/linux' or 'bin/win32' directory. In these files some or all of the '|xxx|' patterns (depending on the options used) are replaced by the same patterns with '#' added at both ends (e.g. '|curs|' becomes '|#curs#|').

Bug reports, questions and feature requests should be sent to the e-mail address specified at the start of this document.

Final note: Back up your files before you run REPSTR on them! I have made an effort to keep the program (relatively) bug-free, but nobody is perfect. I hope that it performs as intended and will be useful, but remember that this is GPLed software with no warranty (but no price tag either :-)

Enjoy!

TODO


- Ensure that glob patterns do not contain a path.

- Add more checking for nonsense or confusing option/rule combinations.

- Allow the use of a user-specified escape character in explicit rules
  so that spaces and commas can be included as well.