REPSTR is a small utility that automates the replacement of multiple strings in multiple text files. This job comes up quite frequently during program development. Maybe you want to use someone else's code, but you are not satisfied with the names of its variables and/or functions. Or you write a huge library and later realize that its names conflict with those in another library you must also use.
Manual replacement is tedious and error-prone. When you have only a couple of strings to replace, a quick bash script using SED will be sufficient. However, when you have to replace, say, 200 strings in 100 files, it becomes cumbersome. This is why I have written REPSTR. BTW, I did not want to reinvent the wheel: I searched the net for a suitable freeware string replace utility, but none of them had all the functionality I consider essential.
You should save pairs of strings (the string to replace and the replacement string, separated by whitespace) into a text file. REPSTR will read this file, sort the string pairs according to the length of the string to replace (descending order), check the list for recursive replacements, and then replace all occurrences in the specified source files in a line-by-line fashion.
Note: The above is the default behaviour which may be modified by specifying options on the command line.
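The default behaviour described above can be illustrated with a short Python sketch. It is not taken from the REPSTR source: the function names are made up for illustration, and it assumes that rules are applied one after another to each line, which is how the notes later in this document describe the process.

```python
def load_rules(text, sep=None):
    """Parse rule lines into (old, new) pairs.

    sep=None splits on runs of spaces and tabs, like the default separator.
    """
    rules = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skipping blank lines is an assumption of this sketch
        old, new = line.split(sep, 1)
        rules.append((old, new.strip()))
    return rules


def sort_rules(rules):
    """Longest string-to-replace first; ties keep their order in the file."""
    return sorted(rules, key=lambda rule: len(rule[0]), reverse=True)


def replace_lines(lines, rules):
    """Apply every rule, in order, to each line independently."""
    result = []
    for line in lines:
        for old, new in rules:
            line = line.replace(old, new)
        result.append(line)
    return result


rules = sort_rules(load_rules("init setup\ninit_all setup_all\n"))
print(replace_lines(["init and init_all"], rules))
```

Because 'init_all' is longer than 'init', it is handled first, so the example line becomes 'setup and setup_all'.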
repstr [OPTIONS] [OPT_ARGS] RULE_ARG SOURCE_ARG
Arguments shown between brackets are optional. Arguments may be specified in any convenient order (unless otherwise noted).
Ignore case when matching strings to be replaced.
-B  Replacement will be done in batch mode (no screen output).
-C  Strings occurring within C-style comments will *not* be replaced. Any text between '/*' and '*/', or any text following '//' up to the end of line is considered to be a comment. See also '-c comm-delim' below.
-h  Displays a short help text and quits. '-h' should be the first argument specified. The rest of the command line is ignored.
-H  REPSTR will process hidden files matching the search pattern specified. By default hidden files are not processed. If you specify a single file, this switch has no effect.
-l  (This is a lower case 'L'.) REPSTR will write its messages to a logfile called 'repstr.log' (in the current directory).
-n  REPSTR will use the rule table without sorting. The default behaviour is that strings in the rule table are sorted into descending order according to length (see Note 4 in Section 5).
-r  REPSTR will not check for recursive replacement patterns before performing its operations. See notes in Section 3.
-s  REPSTR will recursively process files in subdirectories. The default is that only the files in the current directory are processed.
-W  REPSTR will perform whole word replacement using the default set of word delimiter characters, i.e. whitespace and newline. Substrings within longer words will *not* be replaced. See also '-w delim' below.
-y  Allow empty replacement strings in the rules (this removes the string to replace from the text). If you do not specify this switch, empty strings in the rules will generate error messages and the program aborts.
WARNING: If any argument string contains character(s) which have a special meaning for the shell (*, ?, $, etc.), you should double-quote the string. Beware!
-b app  Create backup files by appending the string 'app' to the names of the original files.
-c comm-delim  'comm-delim' should be a comma-separated list of strings which define your own comment delimiter sequences. The format of the string is one of the following:
    single-delim
    start-delim,end-delim
    single-delim,start-delim,end-delim
where 'single-delim' is a delimiter which is effective to the end of line (like '//' in C), 'start-delim' and 'end-delim' are the start and end delimiters of (possibly) multi-line comments (like '/*' and '*/' in C).
The maximum length of comment delimiter strings is 4. 'start-delim' and 'end-delim' may be the same, but both should be different from 'single-delim'.
-d sep  'sep' is the character separating the two strings in the rule file. Without this option the default separator (any number of spaces and tabs) is used.
-u rule-file  REPSTR will check the rules listed in 'rule-file' and report (in the log file) any recursive replacement rules detected. Returns zero if the rules are OK (not recursive), non-zero otherwise. '-u' and the name of the rule table file must be the first two arguments on the command line; any other arguments are ignored.
-w delim  If this argument is present, the old strings will be replaced by the new ones only if a whole word is matched (i.e. substrings matched in longer words will *not* be replaced). Words are assumed to be delimited by the characters listed in 'delim'. If the option '-W' is also specified, whitespace characters are considered to be delimiters as well.
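To make the whole word rule concrete, here is a minimal Python sketch. It is an illustration only, not REPSTR's actual code: the function name and the 'delims' parameter are hypothetical, and the sketch assumes that the start and the end of a line also count as word boundaries.

```python
def replace_whole_words(line, old, new, delims=" \t"):
    """Replace `old` only where it is bounded by a delimiter or a line edge."""
    if not old:
        return line  # nothing to match; also avoids an endless loop
    out = []
    i = 0
    while i < len(line):
        j = i + len(old)
        if (line.startswith(old, i)
                and (i == 0 or line[i - 1] in delims)
                and (j == len(line) or line[j] in delims)):
            out.append(new)       # whole word matched: emit the replacement
            i = j
        else:
            out.append(line[i])   # copy one character and move on
            i += 1
    return "".join(out)


print(replace_whole_words("foo foobar (foo)", "foo", "bar"))
print(replace_whole_words("foo foobar (foo)", "foo", "bar", delims=" \t()"))
```

With whitespace as the only delimiter, just the first 'foo' is replaced. Adding '(' and ')' to the delimiter set also replaces the 'foo' between the parentheses, while 'foobar' is left alone in both cases.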
These arguments specify the source file(s) to process. Use only one of them!
-e ext-list  'ext-list' is a comma-separated list of filename extensions to use in the search for the files to process. For example 'c,h,txt'.
-f src-file  This specifies the name of a single file to process. It may contain an absolute or a relative (to the current directory) path.
-g glob-patt  'glob-patt' is a comma-separated list of search patterns for filename globbing. The patterns should use the normal wildcard conventions. For example '*.c,foo_?.h'. File search is started in the current working directory.
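The following Python sketch shows one way an extension list or a glob pattern list could be turned into a file selection. It is a guess at the general idea, not REPSTR's implementation: the helper names are hypothetical, and treating names that start with a dot as "hidden" is a Unix-style assumption.

```python
import fnmatch


def patterns_from_ext_list(ext_list):
    """Turn 'c,h,txt' into ['*.c', '*.h', '*.txt']."""
    return ["*." + ext for ext in ext_list.split(",")]


def select_files(names, patterns, hidden=False):
    """Keep the names matching any pattern; skip dot-files unless hidden=True."""
    return [name for name in names
            if (hidden or not name.startswith("."))
            and any(fnmatch.fnmatchcase(name, patt) for patt in patterns)]


names = ["a.c", "b.h", ".x.c", "foo_1.h", "foo_12.h", "n.txt"]
print(select_files(names, patterns_from_ext_list("c,h")))
print(select_files(names, "*.c,foo_?.h".split(",")))
print(select_files(names, "*.c,foo_?.h".split(","), hidden=True))
```

The '?' wildcard matches exactly one character, so 'foo_?.h' selects 'foo_1.h' but not 'foo_12.h'.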
These arguments specify replacement rule(s). Use only one of them!
-t table  'table' is the file containing replacement rules (one rule per line). Each rule consists of a pair of strings separated by whitespace (default) or by the character specified in the optional argument '-d sep'. Instances of the first string are replaced by the second.
-x rule  A single string replacement rule in the form "old,new". The limitation in this case is that the rule strings cannot contain whitespace or commas.
Some option and argument combinations are not allowed (REPSTR aborts with an error message if any of these combinations occurs):
'-t table' and '-x rule' together
'-f file', '-e ext-list' and '-g glob-patt' together in any combination
'-C' and '-c delim' together
Some options are ignored when specified in a certain combination:
'-H' and '-s' are ignored if only a single file is specified
'-d' and '-r' are ignored if '-x rule' is specified
Some combinations of rules and options/argument values may yield unexpected results. For example, if you specify '-w xyz' and some rule strings contain 'x', 'y' and/or 'z', the program will probably make a mess of your text. Try to avoid such situations.
Messages and statistics are written to the logfile 'repstr.log'. In normal mode (no '-B' switch) messages are also displayed on the screen.
Note 2. Replacement is carried out according to the length of the strings to replace, i.e. longest first (unless you specify the '-n' switch which suppresses rule sorting). This avoids the destruction of a string pattern by a previous replacement. Consider the following replacement rules:
    foo bar
    foobar whatever
If we first replace 'foo' with 'bar', the string 'foobar' is converted to 'barbar' and the second replacement will fail. If we proceed from longest string to shortest, there is no such problem. The problem may also be solved by specifying '-w del' or '-W' to match only whole words.
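The effect of the rule order in Note 2 can be reproduced with a few lines of Python. This is only a model of sequential replacement ('apply_rules' is a name made up for this sketch), not REPSTR's own code.

```python
def apply_rules(text, rules):
    """Apply each (old, new) rule in turn to the whole text."""
    for old, new in rules:
        text = text.replace(old, new)
    return text


rules = [("foo", "bar"), ("foobar", "whatever")]
longest_first = sorted(rules, key=lambda rule: len(rule[0]), reverse=True)

print(apply_rules("foobar", rules))          # table order: prints barbar
print(apply_rules("foobar", longest_first))  # longest first: prints whatever
```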
Note 3. The program checks for recursive replacement rules like this:
    foobar whatever
    what nothing
The problem here is that after 'foobar' is replaced by 'whatever', a new instance of the 'what' pattern is created and it will be later replaced by 'nothing'. So finally the string 'foobar' is converted to 'nothingever' which is probably not what you want. Recursive replacement is considered a fatal error and the program is aborted (unless you specify the '-r' option on the command line).
See also Note 4 below.
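A recursion check of this kind could look like the Python sketch below. The exact test REPSTR performs is not documented here, so this is an assumption: a rule is flagged when its replacement string contains the search string of any rule that is applied after it.

```python
def find_recursive(rules):
    """Return (earlier_rule, later_rule) pairs where the later rule would
    match text produced by the earlier one."""
    hits = []
    for i, (old_i, new_i) in enumerate(rules):
        for old_j, new_j in rules[i + 1:]:
            if old_j in new_i:
                hits.append(((old_i, new_i), (old_j, new_j)))
    return hits


rules = [("foobar", "whatever"), ("what", "nothing")]
print(find_recursive(rules))
```

For the two rules of Note 3 the sketch reports one conflict, because 'whatever' contains 'what'. An empty result would mean the table is safe under this simplified test.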
Note 4. Sorting the string replacement table can be suppressed by the command line switch '-n'. This may be useful in cases where correct replacement depends on the order of rules in the table. For example, using the following rules without sorting eliminates the recursive replacement problem outlined in Note 3 above:
    foobar whatever
    whatever ###
    what nothing
    ### whatever
Note 5. String replacement proceeds line by line. For this reason, patterns spanning two or more lines cannot be replaced by REPSTR. This is IMO not a serious limitation in programming projects.
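The unsorted table from Note 4 can be traced step by step with the Python sketch below. It reads the table as four rules applied in the order given, and it assumes that '###' does not occur in the original text; the code only models the idea and is not part of REPSTR.

```python
rules = [
    ("foobar", "whatever"),
    ("whatever", "###"),   # park every 'whatever' under a placeholder
    ("what", "nothing"),   # now 'what' can no longer hit 'whatever'
    ("###", "whatever"),   # restore the parked strings
]

text = "foobar what whatever"
trace = [text]
for old, new in rules:
    text = text.replace(old, new)
    trace.append(text)

for step in trace:
    print(step)
```

The last line printed is 'whatever nothing whatever': 'foobar' has become 'whatever', 'what' has become 'nothing', and the original 'whatever' is unchanged.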
Note 6. If you specify delimiters for both word matching and comment skipping, the latter takes precedence.
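Comment skipping can be pictured as cutting each line into comment and non-comment pieces and replacing only in the latter. The Python sketch below shows that idea with the C-style delimiters as defaults. It is an assumption about the approach, not REPSTR's code: the function names are hypothetical, and string literals containing delimiters are not handled.

```python
def split_comments(line, in_block=False, single="//", start="/*", end="*/"):
    """Split one line into (text, is_comment) pieces.

    Returns (pieces, in_block); in_block tells the caller whether a
    multi-line comment is still open at the end of the line.
    """
    pieces = []
    i = 0
    while i < len(line):
        if in_block:
            k = line.find(end, i)
            stop = len(line) if k < 0 else k + len(end)
            pieces.append((line[i:stop], True))
            in_block = k < 0
            i = stop
            continue
        s = line.find(single, i)
        b = line.find(start, i)
        if s < 0 and b < 0:
            pieces.append((line[i:], False))   # no comment on the rest
            break
        if b < 0 or 0 <= s < b:
            if s > i:
                pieces.append((line[i:s], False))
            pieces.append((line[s:], True))    # comment runs to end of line
            break
        if b > i:
            pieces.append((line[i:b], False))
        # Search for the end after the start delimiter, so that identical
        # start and end delimiters are not matched on the same characters.
        k = line.find(end, b + len(start))
        stop = len(line) if k < 0 else k + len(end)
        pieces.append((line[b:stop], True))
        in_block = k < 0
        i = stop
    return pieces, in_block


def replace_outside_comments(line, old, new, in_block=False):
    """Replace `old` by `new` only in the non-comment pieces of the line."""
    pieces, in_block = split_comments(line, in_block)
    text = "".join(t if is_comment else t.replace(old, new)
                   for t, is_comment in pieces)
    return text, in_block


print(replace_outside_comments("foo /* foo */ foo // foo", "foo", "bar"))
```

The caller passes the returned 'in_block' flag on to the next line, which is how a comment spanning several lines stays protected even though processing is line by line.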
The program currently can be compiled and run under Win9x/ME/2000/XP and Linux. For the Win32 build you will need the MinGW development system.
The program depends on the library 'libmutil.a'. This is a library of useful utilities written by me during the last ten years. Among other things it contains a much better 'atof' and 'atoi' (both functions have proper error handling and reporting, which is missing from the primitive Glibc equivalents), several string and file system utilities, command line parsing, reading config files, an easy to use filename globbing module, etc. Check it out, you may find it quite useful [archive] [manual]. I have included copies of 'libmutil.a' in the build directories and the 'mutil.h' header in the source directory for building REPSTR.
To build the program, change to the appropriate build directory ('build-mingw' or 'build-linux') and run the make script 'mk(.bat)'. Under Win32 you will have to create an environment variable called 'MINGW' and set it to the install path of your MinGW system (e.g. 'c:\mingw').
I suggest that any time you change/recompile the program, you should first copy the REPSTR binary to the appropriate 'bin/xxx' directory by running the script 'save(.bat)'. Then you should run the test scripts 'test01(.bat)' to 'test13(.bat)' included in the 'bin/linux' and 'bin/win32' directories. The input file is 'base.txt' (in the directory 'testdata'); it contains a number of patterns enclosed between '|' characters. The rule table files for the tests are also located in 'testdata'. The processed files are created in the 'bin/linux' or 'bin/win32' directory. In these files some or all of the '|xxx|' patterns (depending on the options used) are replaced by the same patterns with '#' added at both ends (e.g. '|curs|' becomes '|#curs#|').
Bug reports, questions and feature requests should be sent to the e-mail address specified at the start of this document.
Final note: Back up your files before you run REPSTR on them! I have made an effort to keep the program (relatively) bug-free, but nobody is perfect. I hope that it performs as intended and will be useful, but remember that this is GPLed software with no warranty (but no price tag either :-)
Enjoy!
- Ensure that glob patterns do not contain a path.
- Add more checking for nonsense or confusing option/rule combinations.
- Allow the use of a user-specified escape character in explicit rules so that spaces and commas can be included as well.