Unix Code and Scripts

Below are some of the Unix commands and examples that I have found useful for campaign management purposes. For detailed syntax of the many commands available you can type man <command_name> in a Unix window. Similarly, the Wikipedia list of Unix commands provides detailed information for each command, including a description, syntax and examples.


CHMOD

One of the first things that you need to understand in the Unix environment is the CHMOD command. This command allows for the change of permissions to files and folders in Unix directories. When you view a file in the Unix environment you will see details like the following:

19883 -rwxrwxrwx   1 <Username> <Role/Group>      34 Dec 11 16:30 Filename.sql

The rwx components stand for read, write and execute respectively.

  • Read – lets you read the file
  • Write – lets you update and save the file
  • Execute – lets you execute a file (or script – .sh extension)

These flags dictate what permissions the file's owner (user), the file's group, and everyone else have over the file or directory in question. In the example above, the first rwx group applies to the owner, the second to the group, and the third to everyone else.

In order to change permissions for a file you can use the following numeric values:

#   Permission              rwx
7   Read, Write, Execute    111
6   Read, Write             110
5   Read and Execute        101
4   Read Only               100
3   Write and Execute       011
2   Write Only              010
1   Execute Only            001
0   None                    000

So to change permissions for the example above you could type:

chmod 624 Filename.sql

This would set read and write permissions (but not execute) for the user, write-only for the group, and read-only for everyone else. The Wikipedia link at the top of the page can provide more detail if necessary.
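The same permissions can also be set symbolically, using u (user), g (group) and o (others) with + or - to add or remove flags. A minimal sketch (the filename is just an illustration):

```shell
# Create a file, then adjust its permissions symbolically
touch myscript.sh
chmod u+x myscript.sh    # add execute permission for the owner
chmod go-w myscript.sh   # remove write permission for group and others
ls -l myscript.sh        # the first column shows the resulting rwx flags
```

With a default umask this leaves the file owner-executable, the same result you would get from the numeric form chmod 744 myscript.sh.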

Warning: You will often find that issues creating or appending to files using scripts and campaign management tools are permissions-based. Check the permissions before anything else, and ensure that both you and the role executing the script have sufficient permissions.

 


Directory space available (% used, etc.)

A very quick command to check whether problems with file creation are caused by a lack of space in the directory's filesystem:

df -k


While we’re on the topic of efficiency problems…

A couple of commands that I used a lot:

mpstat 5

This will run statistics against each of the available CPUs and tell you if there are any bottlenecks. The 5 in this example means that it will refresh every 5 seconds. If you have a lot of users, or some very intensive queries, it is a good idea to run this command to see how hard they are hitting your system.

Similarly…

glance

This gives you much richer information about system performance. If it doesn't work, ask your technology department if they can enable it for you.


Find which Chordiant Marketing Directory processes are running

ps -ef | grep vantage


Select a string from one file and create a new file of all records containing the string

cat <filename.txt> | grep <string_to_search_for> > <new_filename.txt>


Select using a list of strings from one file and create a new file of all records containing at least one of the strings

cat <filename.txt> | egrep -e "<string_to_search_for1>|<string_to_search_for2>|<string_to_search_for3>" > <new_filename.txt>


Replacing a set position in a row with something else and output as a new file

Output only positions 1 through 7, then ***, then all characters from position 11 to the end of the line. This technique can be used to mask account or credit card numbers.

cat <filename.txt> | awk '{print substr($0,1,7) "***" substr($0,11)}' > <new_filename.txt>
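To sanity-check the masking, here it is run against a single made-up 13-character record (the number is invented):

```shell
# Mask positions 8-10 of a made-up record
printf '1234567890123\n' > sample.txt
awk '{print substr($0,1,7) "***" substr($0,11)}' sample.txt
# prints: 1234567***123
```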


Global change within a file using vi

vi <filename.txt>

Then press : (colon) to enter command mode and type:

1,$s/<origtext>/<newtext>/g
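The same global change can be made without opening an editor at all: sed applies the substitution to every line and writes the result to a new file (a sketch with throwaway names):

```shell
# Replace every occurrence of "old" with "new", writing to a new file
printf 'old text with old words\n' > infile.txt
sed 's/old/new/g' infile.txt > outfile.txt
cat outfile.txt
# prints: new text with new words
```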


Determine the maximum character row length for a file

Good for checking the maximum record length before sending a file to an agency

cat <filename.txt> | awk '{ if (length($0) > max) max = length($0) } END { print max }'


Create a new file in a specific format from a file that only contains a customer_id

The bit in quotes (",25/02/2015,campcode,1,1,1,1") could contain whatever hard-coded values you want to append to your list of customer ids. Keep the hard-coded values inside one pair of double quotes: nested single quotes would terminate the shell quoting around the awk program.

cat filename.txt | awk '{print substr($0,1,10) ",25/02/2015,campcode,1,1,1,1"}' > newfilename.txt


Grep Search

Search for a string within any file in a directory. Start by looping through all files in the directory.

for i in `ls -1`
do grep -l <string_to_search_for> $i
done

Note: in this case ls -1  creates a list of filenames which is then used in the search.
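The loop above works, but grep accepts a list of files directly, so the same search can be a one-liner. With -l it prints only the names of the files that contain the string (the file names here are invented):

```shell
cd "$(mktemp -d)"
printf 'hello world\n' > a.txt
printf 'goodbye\n' > b.txt
grep -l hello *    # print only the file(s) containing the string
# prints: a.txt
```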


Checking users in a UNIX user group

The fourth field of /etc/passwd holds the user's primary group id, so this lists every user whose primary group has GID 2001 (substitute your own group's GID):

awk -F: '$4 == 2001 {print $1}' /etc/passwd


AWK Search and Replace

Searches through a delimited file (the delimiter in this case is a hash, #) and replaces any empty values in a given field with a string (in this case "NA").
Replace $4 with the field position that you want.

awk -F# 'BEGIN {OFS="#"} {if ($4=="") $4 = "NA"} {print $0}' filename
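For example, a record whose fourth hash-delimited field is empty gets the placeholder filled in:

```shell
# The fourth field of this made-up record is empty
printf 'a#b#c#\n' > recs.txt
awk -F# 'BEGIN {OFS="#"} {if ($4=="") $4 = "NA"} {print $0}' recs.txt
# prints: a#b#c#NA
```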


Count the total number of records containing a particular string

cat <filename.txt> | egrep -e "<string>" | awk '{ print $1 }' | wc -l


Count the total number of records that match at least one of a number of string parameters

cat filename.txt | egrep -e "RB00001|RB00003|RB00004" | awk '{ print $1 }' | wc -l

Can be used to count the number of campaign codes output in a given file


Use CAT to open filename.txt and pipe that information to a new file (filename2.txt) but limit it to the first 500 characters of each record from the first file

cat <filename.txt> | awk '{print substr($0,1,500)}' > <filename2.txt>


Use CAT to open filename.txt and pipe that information to a new file (filename2.txt) but only the information after the first 500 characters

cat <filename_to_check> | awk '{ if (length($0) > 500) print substr($0,501) }' > <output_new_filename>


UNIX Shell script checker

Runs a shell script in trace mode, printing each command as it is executed, which is useful for debugging:

sh -x <script_name>


Prints each record in the file filename.txt where the first field contains the string <STRING>

awk '$1 ~ /<STRING>/ { print $0 }' <filename.txt>


Looks for <STRING> in the entire record of filename.txt and prints the first field and the last field for each input record containing a match

awk '/<STRING>/ { print $1, $NF }' <filename.txt>


Open a file and group by and count the nth field in a file

cat filename.txt | awk -F, '{ print $1 }' | sort | uniq -c

Quick Tip: This command is a very simple way to count the number of lead types in a server-side file; just make sure that you update the field number (i.e. $1) to correspond to your campaign code field in the file
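A tiny worked example, using invented campaign codes in the first comma-delimited field:

```shell
# Three made-up records, two sharing the code RB001
printf 'RB001,x\nRB002,y\nRB001,z\n' > leads.txt
cat leads.txt | awk -F, '{ print $1 }' | sort | uniq -c
# prints a count next to each distinct code: 2 for RB001, 1 for RB002
```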

 


Print the length of the longest input line.

awk '{ if (length($0) > max) max = length($0) }
END { print max }' <filename.txt>


Prints every line that has at least one field

awk 'NF > 0' <filename.txt>


Count lines in a file

awk 'END { print NR }' <filename.txt>



Check file format

head Filename.txt
or
more Filename.txt


Look through a file for a specific string and counts the number of records containing the string.

cut -f 1 Filename.txt | grep 'R21' | wc -l


Finding a log file based on a string within it. This command searches every file in the current directory whose name ends in log.

 grep BS228 *log


UNIX Split command

This command is a very easy way to automatically split a file into multiple files of a given number of rows.
In the case of the example below it will create files of 100,000 rows.
If the original file is 250,000 rows you would get 2 files of 100,000 rows and a third file of 50,000 rows.
The files will all be named with the prefix "newfile_part_" followed by a generated two-letter suffix (aa, ab, ac and so on).

split -l  100000 filename.txt newfile_part_
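For example, splitting a 250-row file (generated with seq just for illustration) into 100-row chunks:

```shell
cd "$(mktemp -d)"
seq 250 > filename.txt                   # 250 numbered rows
split -l 100 filename.txt newfile_part_
ls newfile_part_*                        # newfile_part_aa  newfile_part_ab  newfile_part_ac
wc -l newfile_part_ac                    # the last chunk holds the remaining 50 rows
```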


UNIX Send Email

Email the contents of a file (note: the file is sent as the message body, not as a true attachment)

mailx -s "SUBJECT" YourEmail@hotmail.com < <filename.txt>

Email with Subject and message (no attachment)

echo "email body... " | mailx -s "email subject" YourEmail@hotmail.com


Pipes AllLeads.txt into an awk program, then sorts and counts on campaign code and treatment code.

The following command can be used to create a summary file of the number of leads in a particular file:

cat AllLeads.txt | awk '{print "#" substr($1,11,7) "#" ($3)}' | sort | uniq -c > Leads_Summary.txt


Example of using multiple commands to clean a file

Split a file into separate brands

cat Filename.txt | grep ":Brand1:" > filename_Brand1.txt

Create a file of just Customer_Id and Email address

cat filename_Brand1.txt | awk '{print ($1,$8)}' > filename_Brand1_2Fields.txt

From this new file only output where the record length is greater than 14, send to a new file

cat filename_Brand1_2Fields.txt | awk 'length($0) > 14' > filename_Brand1_Email.txt

Delete First Temporary File

rm filename_Brand1_2Fields.txt

Clean up the file so that only CIN and Email address remain (i.e. remove "  :")

cat filename_Brand1_Email.txt | awk '{print substr($0,1,10) substr($0,13,60)}' > Brand1_RequiredInfo.txt

Remove Second Temporary File

rm filename_Brand1_Email.txt

Pad email field with spaces

cat Brand1_RequiredInfo.txt | awk '{ printf "%-70s %s\n", $1, $2 }' > Brand1_Padded_Email.txt

Remove Third Temporary File

rm Brand1_RequiredInfo.txt

Use this file and format it appropriately for whatever system is picking it up

cat Brand1_Padded_Email.txt | awk '{print substr($0,1,10) "2006-09-01EMAIL " substr($0,11,60) "                    00000001.50"}' > Final_Brand1.txt

Quick Tip: If you work through your process one line at a time, testing and eye-balling the results, you can then put all of your commands together into a single script. A script ending in .sh can be called in the Unix environment, giving you a powerful and efficient way to process files directly on the Unix server. This is a great time saver if you have repeat jobs each week or month that need to be automated. Most campaign management software will then allow you to schedule your script to run directly from the tool, and adding a few checks (e.g. polling for the file to be written to a given directory) will allow you to completely automate load scripts as a business user.


A Unix File Matching Routine

The following process will remove duplicate records, remove spaces, remove quotes, sort the files and then match the records

Remove duplicates in files

cat Large_File.txt | awk '{print $1}' | sort | uniq > removed_dupes_L.txt
cat Small_File.txt | awk '{print $1}' | sort | uniq > removed_dupes_S.txt

Change a file to upper case

perl -e 'while (<>) { print uc($_); } warn "Changed $. lines to upper case\n"' Small_File.txt > File_s1.txt

Remove all quotes in a file

perl -e 'while (<>) { s/"//g; print $_; } warn "Removed all quotes from $. lines\n"' File_s1.txt > File_s2.txt

Remove all spaces in a file

perl -e 'while (<>) { s/ //g; print $_; } warn "Removed all spaces from $. lines\n"' File_s2.txt > File_s3.txt

Collect: Collect the information to match on, in this case Title, (1st letter of Forename), Surname, Add1, Postcode. (Do this for both files.)

nawk -F# '{print ($1) substr($2,1,1)($3)($4)($9)}' File_s3.txt > File_s4.txt

Sort: Sort Both Files

sort -u File_s4.txt > File_s5.txt

Compare: Check if the same records are in both files and output as File_join1.txt

join -t"," File_s5.txt File_b3.txt > File_join1.txt

Repeat above collect, sort and compare checks for as many variations as you want.
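One thing to watch with join: both input files must already be sorted on the join field, or matching records can be silently skipped. A minimal illustration with two tiny comma-delimited files (contents invented):

```shell
cd "$(mktemp -d)"
printf 'alpha,1\nbravo,2\n' > left.txt      # already sorted on field 1
printf 'alpha,x\ncharlie,y\n' > right.txt   # also sorted on field 1
join -t"," left.txt right.txt               # only keys present in both survive
# prints: alpha,1,x
```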


UNIX Delete a Unix File using Inode Number

At one stage I had a file sitting in the Unix directory that had funny characters in the name, and as a result it was impossible to delete the file using the standard rm command (because you couldn’t type the file name). Here is a technique that allows you to delete a file without having to type the file name in the command.

Remove file by an inode number, but first find out the file inode number:

ls -il

Output: 19883 -rw-rw-r--   1 <Username> <Role>      34 Dec 11 16:30 Filename.sql

Syntax: Find and remove file using find command, type the command as follows:

find . -inum <InodeNumber> -exec rm -i {} \;

Example: File removal, and (yes/no) prompt to delete it

find . -inum 19883 -exec rm -i {} \;
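Put together, the whole routine looks like this, run here against a deliberately awkward file name (-i is dropped so it runs without prompting):

```shell
cd "$(mktemp -d)"
touch 'bad name.txt'                      # a file with a space in its name
inode=$(ls -i 'bad name.txt' | awk '{print $1}')
find . -inum "$inode" -exec rm {} \;      # delete by inode, no name typed into rm
ls                                        # the file is gone
```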

