Warning! This version is now obsolete!
Check out the new and improved version (using only Bash built-ins) here!
Here is a command-line (bash) script that uses [cci lang="bash"]sed[/cci] to split the segments of a URI into usable variables. It also validates the given URI: malformed strings produce the text "ERROR", which can be handled accordingly:
[cc lang="bash" nowrap="true"]
# Assembling a sample URI (including an injection attack)
uri_1='http://user:pass@www.example.com:19741/dir1/dir2/file.php'
uri_2='?param=some_value&array[0]=123&param2=`cat /etc/passwd`'
uri_3='#bottom-left'
uri="$uri_1$uri_2$uri_3"
# Parse URI (uri.sed relies on GNU sed extensions such as the T command)
op=$(echo "$uri" | sed -nrf "uri.sed")
# Handle invalid URI
[[ $op == 'ERROR' ]] && { echo "Invalid URI!"; exit 1; }
# Execute assignments
eval "$op"
# ...work with URI components...
[/cc]
Notice the [cci_bash]"uri.sed"[/cci_bash] file passed to [cci_bash]sed[/cci_bash]? It does the actual URI parsing: its regular expression rules turn the given URI into bash code which, when executed, creates the final variables to play with:
[cc lang="text" nowrap="true"]
# initialize
s/[\r\n]+//g; s/`/%60/g; s/"/%22/g; T begin; :begin
# scheme, address, path, query, fragment
s/^(([a-z]+):\/\/)?(([^:\/]+(:[^@\/]*)?@)?[^:\/?]+(:[0-9]+)?)(\/[^?]*)?(\?[^#]*)?(#.*)?$/\
uri_scheme="\2"; uri_address="\3"; uri_path="\7"; uri_query="\8"; uri_fragment="\9"/i
T error
# user, pass, host, port
s/uri_address="(([a-z0-9_.+=-]+)(:([^@]*))?@)?([a-z0-9.-]*)(:([0-9]*))?"/\0; \
uri_user="\2"; uri_pass="\4"; uri_host="\5"; uri_port="\7"/i; T error
# path parts
h; s/.*uri_path="([^"]+)".*/uri_parts=(); \1/
s/\/+([^/]+)/uri_parts[$[${#uri_parts[*]}]]="\1"; /ig; x; G
# query args
h; s/.*uri_query="([^"]+)".*/uri_args=(); \1/
s/[?&]+([^=& ]+)(=([^&]*))?/uri_args[$[${#uri_args[*]}]]="\1"; uri_arg_\1="\3"; /ig
x; G
s/\n\ +//g; s/\n//g; p; q
# failure
:error; c ERROR
[/cc]
After the successful execution of this piece of code the following variables will exist in the running environment:
[cc lang="bash"]
uri_scheme="http"
uri_address="user:pass@www.example.com:19741"
uri_user="user"
uri_pass="pass"
uri_host="www.example.com"
uri_port="19741"
uri_path="/dir1/dir2/file.php"
uri_parts[0]="dir1"
uri_parts[1]="dir2"
uri_parts[2]="file.php"
uri_query="?param=some_value&array[0]=123&param2=%60cat /etc/passwd%60"
uri_args[0]="param"
uri_args[1]="array[0]"
uri_args[2]="param2"
uri_arg_param="some_value"
uri_arg_array[0]="123"
uri_arg_param2="%60cat /etc/passwd%60"
uri_fragment="#bottom-left"
[/cc]
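The generate-then-eval idea behind the parser can be seen in miniature below. This is only a sketch under assumptions of my own: the [cci_bash]to_assignments[/cci_bash] function and the [cci_bash]demo_host[/cci_bash]/[cci_bash]demo_port[/cci_bash] names are hypothetical, it handles nothing more than a [cci_bash]host:port[/cci_bash] pair, and it uses [cci_bash]sed -E[/cci_bash] with the portable [cci_bash]t[/cci_bash] command instead of the [cci_bash]T[/cci_bash] command used by the full script.

```shell
# Hypothetical miniature of the technique: sed rewrites "host:port" into
# shell assignments, or into the text ERROR when the input does not match.
to_assignments() {
  printf '%s\n' "$1" | sed -E '
    s/^([A-Za-z0-9.-]+):([0-9]+)$/demo_host="\1"; demo_port="\2"/
    t done
    s/.*/ERROR/
    :done
  '
}

# Generate the assignments
code=$(to_assignments 'www.example.com:19741')

# Handle invalid input before anything is evaluated
[ "$code" = 'ERROR' ] && { echo "Invalid input!"; exit 1; }

# Execute the assignments; the strict character whitelist in the
# regular expression is what keeps this eval harmless here
eval "$code"

echo "$demo_host $demo_port"
```

Because the pattern only accepts letters, digits, dots and dashes for the host and digits for the port, anything else (a backtick, for instance) ends up as ERROR and never reaches [cci_bash]eval[/cci_bash].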
Feel free to play around with it and tell me if you find any problems. It is only a first effort and could certainly be improved. Cheers!
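One thing worth checking when playing around: the initialization step of the sed script encodes backticks and double quotes, but inside a double-quoted string [cci_bash]eval[/cci_bash] also expands [cci_bash]$(...)[/cci_bash] and interprets backslashes. The sketch below shows one possible way to encode those two characters as well. The [cci_bash]sanitize[/cci_bash] function and the [cci_bash]demo_value[/cci_bash] variable are hypothetical names, and this is an assumption about how the script could be hardened, not something the original does.

```shell
# Hypothetical helper: percent-encode every character that stays active
# inside a double-quoted string (backslash, backtick, double quote, dollar).
sanitize() {
  printf '%s\n' "$1" | sed -e 's/\\/%5C/g' \
                           -e 's/`/%60/g' \
                           -e 's/"/%22/g' \
                           -e 's/\$/%24/g'
}

# A payload that tries both command substitution styles
payload='x=$(id)&y=`id`&z="q"'
safe=$(sanitize "$payload")

# The encoded text can now sit inside double quotes under eval
eval "demo_value=\"$safe\""

echo "$demo_value"
```

After encoding, nothing in the value triggers an expansion, so the assignment stores the text as-is.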
[Edit] Changed parsing of the query args to permit parsing of arguments that have no value assigned to them (e.g. ...?arg_with_no_value&...)
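To see how an optional value can be matched, here is a small sketch of the same kind of expression run on its own. The [cci_bash]list_args[/cci_bash] function is a hypothetical name, the output format is made up for illustration, and the name pattern here excludes [cci_bash]&[/cci_bash] (an assumption on my part) so that an argument without a value in the middle of the query does not swallow the one that follows it.

```shell
# Hypothetical demo: print every query argument as name=[value];
# where the value part is optional and prints as an empty pair of brackets.
list_args() {
  printf '%s\n' "$1" | sed -E 's/[?&]+([^=& ]+)(=([^&]*))?/\1=[\3];/g'
}

list_args '?a=1&arg_with_no_value&b=2'
```

The whole [cci_bash](=([^&]*))?[/cci_bash] group is optional, so when it does not match, the back-reference to the value is simply empty.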
[Update] Moved the [cci_bash]sed[/cci_bash] instructions into a separate file for modularization.
[...] for this post? 3 February 2010. URI parsing using Bash built-in features. A bit of background: A while ago I posted an article describing how one could parse complete URIs in Bash using the sed program. Since then, I have [...]
@all: Dan's comment reminded me that I made a big improvement on this parser a while back so I hurried and posted a new article about it. Check it out!