
Vista - Parsing space delimited records

 
Old 10-29-2008   #1 (permalink)
M1iS


 
 

Parsing space delimited records

I’m trying to parse Amazon S3 server logs, which are space delimited.
However, the date fields are in the following form:

[28/Oct/2008:21:44:21 +0000]

When I try to use the following code to split the record on the spaces, it
also splits the date field:

string[] fields = record.Split(' ');

What can I do to get around this?

Scott


Old 10-29-2008   #2 (permalink)
Stanimir Stoyanov


 
 

Re: Parsing space delimited records

Hi Scott,

I personally would use Regular Expressions to split the words in a smart
way. Below is a sample console application to demonstrate it. The regular
expression \[.*\]\s*|.+ means that it can select from two alternatives:

a) Text wrapped inside [ and ]
b) Any other text (your actual server log)

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        // Alternative a): a bracketed block plus any trailing whitespace.
        // Alternative b): whatever else is left on the line.
        string expr = @"\[.*\]\s*|.+";
        string line = "[28/Oct/2008:21:44:21 +0000] Test with p~nctuat!ion word goes here!";

        Regex regex = new Regex(expr);

        foreach (Match m in regex.Matches(line))
        {
            string value = m.Value.Trim();

            if (value.StartsWith("[") && value.EndsWith("]"))
            {
                // This is the timestamp
                Console.WriteLine("TEST: time = " + value);
            }
            else
            {
                // This is the actual log text
                Console.WriteLine("TEST: word = " + value);
            }
        }

        // Keep the console window open until a key is pressed
        Console.Read();
    }
}
Old 10-30-2008   #3 (permalink)
M1iS


 
 

Re: Parsing space delimited records

I was hoping to avoid taking the time to create a regular expression as there
are 17 fields per S3 record. It took me a while but here is what I ended up
with:

(.*?)(\s+)(.*?)(\s+)(\[.*?\])(\s+)((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))(?![\d])(\s+)(.*?)(\s+)(.*?)(\s+)(.*?)(\s+)(.*)(\s+)(".*?")(\s+)(.*?)(\s+)(.*?)(\s+)(\d+|-)(\s+)(\d+|-)(\s+)(\d+|-)(\s+)(\d+|-)(\s+)(".*?")(\s+)(".*?")

Yuck, I'd rather be doing about a million other things, but oh well,
problem solved.
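
A minimal sketch of how a pattern like this might be applied, written with balanced parentheses around the IP address part. The class name S3GroupParser and the hard-coded sample record are only illustrative. Because the whitespace separators are captured as well, the 17 fields land in the odd-numbered groups 1, 3, ..., 33.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class S3GroupParser
{
    // The pattern from the post, split over several literals for readability.
    // Inside a verbatim string a double quote is written as "".
    const string Pattern =
        @"(.*?)(\s+)(.*?)(\s+)(\[.*?\])(\s+)" +
        @"((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}" +
        @"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))(?![\d])(\s+)" +
        @"(.*?)(\s+)(.*?)(\s+)(.*?)(\s+)(.*)(\s+)("".*?"")(\s+)" +
        @"(.*?)(\s+)(.*?)(\s+)(\d+|-)(\s+)(\d+|-)(\s+)(\d+|-)(\s+)(\d+|-)(\s+)" +
        @"("".*?"")(\s+)("".*?"")";

    static readonly Regex RecordRegex = new Regex(Pattern);

    // Returns the fields of one record, or an empty list if it does not match.
    public static List<string> Parse(string record)
    {
        List<string> fields = new List<string>();
        Match m = RecordRegex.Match(record);
        if (!m.Success) return fields;

        // Odd-numbered groups are fields; even-numbered groups are whitespace.
        for (int i = 1; i < m.Groups.Count; i += 2)
            fields.Add(m.Groups[i].Value);
        return fields;
    }
}

class Program
{
    static void Main()
    {
        string record =
            "4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 testBucket " +
            "[28/Oct/2008:21:44:21 +0000] 127.0.0.1 " +
            "4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 " +
            "AAE9C2CCFFE5E6DB REST.GET.ACL - " +
            "\"GET /?acl HTTP/1.1\" 200 - 556 - 488 - \"-\" \"-\"";

        foreach (string field in S3GroupParser.Parse(record))
            Console.WriteLine(field);
    }
}
```

Naming the groups, for example (?<bucket>.*?), would avoid the index arithmetic, at the cost of an even longer pattern.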


Old 10-30-2008   #4 (permalink)
Stanimir Stoyanov


 
 

Re: Parsing space delimited records

I am sure there is a *more* elegant solution to the problem. Can you post a
sample log output, and do you want to get the individual words out of the
log?

E.g. if the log line is
[28/Oct/2008:21:44:21 +0000] Test with p~nctuat!ion word goes here!
would you like to have the timestamp, "Test", "with", etc as separate
matches? If so, you could split the text using string.Split() once you have
the actual log text (see my previous code example for the 'log text' case).

--
Stanimir Stoyanov
http://stoyanoff.info
Old 10-30-2008   #5 (permalink)
Mark S. Milley


 
 

Re: Parsing space delimited records

Unless you're somehow married to the format, just drop the time zone:

string[] fields = record.Replace(' +0000','',Split(' ');

Old 10-30-2008   #6 (permalink)
Mark S. Milley


 
 

Re: Parsing space delimited records

Er, make that:

string[] fields = record.Replace(" +0000", "").Split(' ');
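
A quick check of this approach against the first S3 record posted in this thread, as a sketch only. It assumes the offset is always the literal " +0000"; the class name NaiveSplit is made up for the example. The quoted request line contains spaces as well, so the split still returns more than 17 items.

```csharp
using System;

static class NaiveSplit
{
    // Drops the literal " +0000" offset, then splits on single spaces.
    public static string[] Split(string record)
    {
        return record.Replace(" +0000", "").Split(' ');
    }
}

class Program
{
    static void Main()
    {
        string record =
            "4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 testBucket " +
            "[28/Oct/2008:21:44:21 +0000] 127.0.0.1 " +
            "4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 " +
            "AAE9C2CCFFE5E6DB REST.GET.ACL - " +
            "\"GET /?acl HTTP/1.1\" 200 - 556 - 488 - \"-\" \"-\"";

        string[] fields = NaiveSplit.Split(record);

        // 19 for this record, not 17: the timestamp is now one item,
        // but "GET /?acl HTTP/1.1" is still cut into three.
        Console.WriteLine(fields.Length);
        foreach (string field in fields)
            Console.WriteLine(field);
    }
}
```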
Old 10-30-2008   #7 (permalink)
Jesse Houwing


 
 

Re: Parsing space delimited records

Hello Stanimir,

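If you do a Regex.Match with the following regex:

^((\[(?<result>[^\]]*)\]|(?<result>[^ ]*))([ ]|$))*

you should get a Match object with one named group, "result", whose captures
are the individual fields, with the bracketed date kept as a single capture.
Quoted fields that contain spaces are still split on those spaces, so they
need an alternative of their own to end up with exactly 17 captures.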
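
A sketch of the single-match idea, extended with one more alternative for quoted fields. That alternative, the use of [^ ]+ instead of [^ ]*, and the class name CaptureParser are additions for illustration, not part of the pattern as posted. .NET lets the same group name appear in several alternatives, and every successful iteration adds one entry to that group's Captures collection.

```csharp
using System;
using System.Text.RegularExpressions;

static class CaptureParser
{
    // One iteration = one field (bracketed, quoted, or plain) followed by
    // spaces or the end of the record. Brackets and quotes are not captured.
    static readonly Regex RecordRegex = new Regex(
        @"^(?:(?:\[(?<result>[^\]]*)\]|""(?<result>[^""]*)""|(?<result>[^ ]+))(?: +|$))*$");

    // Returns the captured fields, or an empty array if the record does not match.
    public static string[] Parse(string record)
    {
        Match m = RecordRegex.Match(record);
        if (!m.Success) return new string[0];

        CaptureCollection captures = m.Groups["result"].Captures;
        string[] fields = new string[captures.Count];
        for (int i = 0; i < captures.Count; i++)
            fields[i] = captures[i].Value;
        return fields;
    }
}

class Program
{
    static void Main()
    {
        string record =
            "4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 testBucket " +
            "[28/Oct/2008:21:44:21 +0000] 127.0.0.1 " +
            "4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 " +
            "AAE9C2CCFFE5E6DB REST.GET.ACL - " +
            "\"GET /?acl HTTP/1.1\" 200 - 556 - 488 - \"-\" \"-\"";

        string[] fields = CaptureParser.Parse(record);
        Console.WriteLine(fields.Length); // 17 for this record
        foreach (string field in fields)
            Console.WriteLine(field);
    }
}
```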

You should also be able to use the Log parser class that the IIS team once
published... but I cannot find a link at the moment...

Jesse
--
Jesse Houwing
jesse.houwing at sogeti.nl


Old 10-30-2008   #8 (permalink)
M1iS


 
 

Re: Parsing space delimited records

Below is an example of what is in a log file. I'm just trying to read the
logs and dump the fields into a database.

4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 testBucket [28/Oct/2008:21:44:21 +0000] 127.0.0.1 4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 AAE9C2CCFFE5E6DB REST.GET.ACL - "GET /?acl HTTP/1.1" 200 - 556 - 488 - "-" "-"
4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 testBucket [28/Oct/2008:21:44:24 +0000] 127.0.0.1 4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 66FB31B05AFA84E9 REST.GET.LOGGING_STATUS - "GET /?logging HTTP/1.1" 200 - 244 - 171 - "-" "-"
4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 testBucket [28/Oct/2008:21:44:56 +0000] 127.0.0.1 4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 40AC4747CFF7ACFD REST.GET.BUCKET - "GET / HTTP/1.1" 200 - 1298 - 15 12 "-" "Amazon S3 CSharp Library"
4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 testBucket [28/Oct/2008:21:44:56 +0000] 127.0.0.1 4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 5938B6855868E040 REST.HEAD.BUCKET - "HEAD / HTTP/1.1" 200 - 1298 - 642 473 "-" "Amazon S3 CSharp Library"
4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 testBucket [28/Oct/2008:21:45:33 +0000] 127.0.0.1 4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 16F565F75362B5A8 REST.HEAD.BUCKET - "HEAD / HTTP/1.1" 200 - 1298 - 508 293 "-" "Amazon S3 CSharp Library"
4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 testBucket [28/Oct/2008:21:45:33 +0000] 127.0.0.1 4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 D61C9201C46617CF REST.PUT.OBJECT testFile.zip "PUT /testFile.zip HTTP/1.1" 200 - - 17428 334 11 "-" "Amazon S3 CSharp Library"
4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 testBucket [28/Oct/2008:21:45:34 +0000] 127.0.0.1 4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 B2FEB30917A1F050 REST.GET.BUCKET - "GET / HTTP/1.1" 200 - 1634 - 181 15 "-" "Amazon S3 CSharp Library"
4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 testBucket [28/Oct/2008:21:45:34 +0000] 127.0.0.1 4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 B41FCF38CD590562 REST.HEAD.BUCKET - "HEAD / HTTP/1.1" 200 - 1634 - 15 13 "-" "Amazon S3 CSharp Library"
4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 testBucket [28/Oct/2008:21:46:11 +0000] 127.0.0.1 4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 C42BF5C887E61F18 REST.HEAD.BUCKET - "HEAD / HTTP/1.1" 200 - 1634 - 476 299 "-" "Amazon S3 CSharp Library"
4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 testBucket [28/Oct/2008:21:46:12 +0000] 127.0.0.1 4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 A590228971F16081 REST.PUT.OBJECT testFile.zip "PUT /testFile.zip HTTP/1.1" 200 - - 1487163 20298 48 "-" "Amazon S3 CSharp Library"
4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 testBucket [28/Oct/2008:21:46:32 +0000] 127.0.0.1 4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 6528418F2CCABB59 REST.HEAD.BUCKET - "HEAD / HTTP/1.1" 200 - 1969 - 312 309 "-" "Amazon S3 CSharp Library"
4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 testBucket [28/Oct/2008:21:46:33 +0000] 127.0.0.1 4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 EE65B98BD633E32C REST.GET.BUCKET - "GET / HTTP/1.1" 200 - 1969 - 16 14 "-" "Amazon S3 CSharp Library"

Old 10-31-2008   #9 (permalink)
Stanimir Stoyanov


 
 

Re: Parsing space delimited records

One of the following regular expressions might fit better:

\[.*\]|\"[^\"]*\"|[^\s-]+

or

\[.*\]|\"[^\"]*\"|[^\s]+

The difference is that the first omits single dashes as found on some rows
(in between figures), e.g.
200 - 1634 - 181 15
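
A minimal sketch of how the second expression could be used with Regex.Matches to turn one record into a list of fields for a database insert. The class name Tokenizer and the sample record are only illustrative. The second variant is used here on the assumption that the "-" placeholders should be kept, so that every field stays at a fixed index.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class Tokenizer
{
    // Bracketed block, or quoted string, or a run of non-whitespace characters.
    // Inside a verbatim string a double quote is written as "".
    static readonly Regex TokenRegex = new Regex(@"\[.*\]|""[^""]*""|[^\s]+");

    // Returns every token of the record in order of appearance.
    public static List<string> Tokenize(string record)
    {
        List<string> tokens = new List<string>();
        foreach (Match m in TokenRegex.Matches(record))
            tokens.Add(m.Value);
        return tokens;
    }
}

class Program
{
    static void Main()
    {
        string record =
            "4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 testBucket " +
            "[28/Oct/2008:21:44:21 +0000] 127.0.0.1 " +
            "4c54d3704f3a82592af823f518a6443186e92168fe07cdcdb20cfc2a21655887 " +
            "AAE9C2CCFFE5E6DB REST.GET.ACL - " +
            "\"GET /?acl HTTP/1.1\" 200 - 556 - 488 - \"-\" \"-\"";

        List<string> tokens = Tokenizer.Tokenize(record);
        Console.WriteLine(tokens.Count); // 17 for this record
        foreach (string token in tokens)
            Console.WriteLine(token);
    }
}
```

The brackets and quotes remain part of the matched values here; Trim('[', ']') or Trim('"') can strip them before the values are stored.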

--
Stanimir Stoyanov
http://stoyanoff.info