EEVblog Electronics Community Forum

Products => Computers => Programming => Topic started by: DiTBho on January 26, 2021, 01:09:07 am

Title: how to parse time-stamp lines?
Post by: DiTBho on January 26, 2021, 01:09:07 am
I have a file with a lot of lines like 2021-01-17--18-41-57,2021-01-17--18-43-55, reporting the time-stamp of each event, and I need to parse them and to calculate the time between them

for instance
A=convert-to-sec(2021-01-17--18-41-57)
B=convert-to-sec(2021-01-17--18-43-55)
C=abs(B-A)=?

how would you do it?
Title: Re: how to parse time-stamp lines?
Post by: nightfire on January 26, 2021, 01:26:23 am
Depending on the system you are working on, one option is to convert the timestamps to "Unix time". Under Unix/Linux, the date command can often convert a date to Unix time in seconds and back (GNU date has flags for this); from there, you can simply do the arithmetic in seconds.

Unix time: seconds elapsed since 00:00:00 UTC on 1 January 1970 (also called the epoch)
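
As a rough illustration of the idea (a Python sketch rather than the date command; it assumes the timestamps are in UTC, and to_unix_seconds is a made-up helper name): once both timestamps are epoch seconds, the interval is a plain subtraction.

```python
import calendar
import time

FORMAT = "%Y-%m-%d--%H-%M-%S"  # layout of the timestamps in the question

def to_unix_seconds(stamp):
    """Convert a timestamp string to seconds since the epoch, assuming it is UTC."""
    parsed = time.strptime(stamp, FORMAT)  # struct_time, no timezone applied yet
    return calendar.timegm(parsed)         # interpret the fields as UTC

a = to_unix_seconds("2021-01-17--18-41-57")
b = to_unix_seconds("2021-01-17--18-43-55")
print(abs(b - a))  # 118 (1 minute 58 seconds)
```

If the timestamps are in local time instead, time.mktime() would take the place of calendar.timegm().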
Title: Re: how to parse time-stamp lines?
Post by: langwadt on January 26, 2021, 01:30:22 am
https://pubs.opengroup.org/onlinepubs/007904975/functions/strptime.html
Title: Re: how to parse time-stamp lines?
Post by: JohnnyMalaria on January 26, 2021, 01:45:23 am
What programming language?
Title: Re: how to parse time-stamp lines?
Post by: Nominal Animal on January 26, 2021, 03:37:06 am
Quote from: DiTBho
A=convert-to-sec(2021-01-17--18-41-57)
B=convert-to-sec(2021-01-17--18-43-55)

I'd use
Code: [Select]
/**
 * Parse timestamp (expressed in local time)
 * @src     - String to parse
 * @tm_to   - Pointer to store parsed struct tm, may be NULL
 * @time_to - Pointer to store parsed time_t, may be NULL
 * @return  - Pointer to first unparsed character if successful, NULL with errno set if error
*/
const char *parse_timestamp(const char *src, struct tm *tm_to, time_t *time_to)
{
    struct tm  temp_tm = { .tm_isdst = -1 };
    time_t  temp_time;
    const char *end;

    errno = EINVAL;
    if (!src)
        return NULL;
    end = strptime(src, "%Y-%m-%d--%H-%M-%S", &temp_tm);
    if (!end)
        return NULL;

    temp_time = mktime(&temp_tm);
    if (temp_time == (time_t)-1)
        return NULL;

    if (tm_to)
        *tm_to = temp_tm;

    if (time_to)
        *time_to = temp_time;

    return end;
}
noting that it assumes the timestamps are in local time.  For the A-B case, I'd use difftime() if I wanted to be portable, but in Linux/POSIX, time_t is an integer type, so you can just subtract two time_t values.

If the timestamps are in UTC, you have two options:

1. Use the GNU extension timegm() (https://www.man7.org/linux/man-pages/man3/timegm.3.html) instead of mktime().

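2. Temporarily switch to UTC by running
Code: [Select]
/* Copy the old value first: setenv() may invalidate the pointer getenv() returned. */
const char *tz = getenv("TZ");
char *oldtz = tz ? strdup(tz) : NULL;
setenv("TZ", "UTC0", 1);  /* POSIX TZ string for UTC; a bare ":" is implementation-defined */
tzset();
so that local time (in the current process) is UTC.  To revert back, use
Code: [Select]
if (oldtz)
    setenv("TZ", oldtz, 1);
else
    unsetenv("TZ");
free(oldtz);
tzset();
but remember that this affects all threads in the current process.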

The latter is a bit more work, but would keep the code portable across POSIXy systems (Linux, Mac, BSDs).

Here is an example program:
Code: [Select]
// SPDX-License-Identifier: CC0-1.0
#define  _POSIX_C_SOURCE  200809L
#define  _XOPEN_SOURCE
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <time.h>
#include <errno.h>

/**
 * Parse timestamp (expressed in local time)
 * @src     - String to parse
 * @tm_to   - Pointer to store parsed struct tm, may be NULL
 * @time_to - Pointer to store parsed time_t, may be NULL
 * @return  - Pointer to first unparsed character if successful, NULL with errno set if error
*/
const char *parse_timestamp(const char *src, struct tm *tm_to, time_t *time_to)
{
    struct tm  temp_tm = { .tm_isdst = -1 };
    time_t  temp_time;
    const char *end;

    errno = EINVAL;
    if (!src)
        return NULL;
    end = strptime(src, "%t%Y-%m-%d--%H-%M-%S%t", &temp_tm);
    if (!end)
        return NULL;

    temp_time = mktime(&temp_tm);
    if (temp_time == (time_t)-1)
        return NULL;

    if (tm_to)
        *tm_to = temp_tm;

    if (time_to)
        *time_to = temp_time;

    return end;
}

int usage(const char *arg0)
{
    fprintf(stderr, "\n");
    fprintf(stderr, "Usage: %s [ -h | --help ]\n", arg0);
    fprintf(stderr, "       %s [ -u ] FILE [ FILE ... ]\n", arg0);
    fprintf(stderr, "Where:\n");
    fprintf(stderr, "       -u      UTC timestamps\n");
    fprintf(stderr, "       FILE    File name(s) to read\n");
    fprintf(stderr, "\n");
    return EXIT_SUCCESS;
}

int main(int argc, char *argv[])
{
    const char *arg0 = (argc > 0 && argv && argv[0] && argv[0][0]) ? argv[0] : "(this)";
    int         use_utc = 0;
    int         opt, arg;

    char       *line_ptr = NULL;
    size_t      line_max = 0;
    ssize_t     line_len;

    if (argc < 2)
        return usage(arg0);
    if (!strcmp(argv[1], "--help"))
        return usage(arg0);

    while ((opt = getopt(argc, argv, "hu")) != -1) {
        switch (opt) {
        case 'h':
            return usage(arg0);

        case 'u':
            use_utc = 1;
            break;

        default:
            /* getopt() has already printed an error message. */
            return EXIT_FAILURE;
        }
    }
    if (optind >= argc) {
        fprintf(stderr, "No files specified.\n");
        return EXIT_FAILURE;
    }

    if (use_utc)
        setenv("TZ", "UTC0", 1);  /* POSIX TZ string for UTC; a bare ":" is implementation-defined */
    tzset();

    for (arg = optind; arg < argc; arg++) {
        const char *inname = argv[arg];
        long  linenum = 0;
        FILE *in;

        if (!strcmp(inname, "-")) {
            in = stdin;
            inname = "Standard input";
        } else {
            in = fopen(inname, "r");
            if (!in) {
                fprintf(stderr, "%s: %s.\n", inname, strerror(errno));
                return EXIT_FAILURE;
            }
        }

        while (1) {
            const char *ptr;
            time_t      when;

            line_len = getline(&line_ptr, &line_max, in);
            if (line_len == -1)
                break;
            linenum++;

            /* For ease of printing the lines, trim at CR and LF. */
            line_ptr[strcspn(line_ptr, "\r\n")] = '\0';

            ptr = parse_timestamp(line_ptr, NULL, &when);
            if (!ptr) {
                fflush(stdout);
                fprintf(stderr, "%s: Line %ld: Cannot parse timestamp from \"%s\".\n",
                                inname, linenum, line_ptr);
                continue;
            }

            printf("%s: Line %ld: %ld \"%s\"\n", inname, linenum, (long)when, ptr);
        }
        fflush(stdout);

        if (!feof(in) || ferror(in)) {
            fprintf(stderr, "%s: Read error.\n", argv[arg]);
            fclose(in);
            return EXIT_FAILURE;
        }
        if (in != stdin && fclose(in)) {
            fprintf(stderr, "%s: Delayed read error.\n", argv[arg]);
            return EXIT_FAILURE;
        }
    }

    return EXIT_SUCCESS;
}
The %t specifier in strptime() refers to arbitrary whitespace, i.e. the above parse_timestamp() version consumes whitespace before and after the timestamp, but does not require there to be any whitespace.  Compile and run using e.g.
    gcc -Wall -Wextra -O2 example.c -o ex
    printf '2021-01-17--18-41-57 event1\n2021-01-17--18-43-55 event2\n' | ./ex -
Run ./ex --help to see the usage.  The program will print the Unix timestamp in seconds (converted to long) and the rest of each input line, and complain if a line cannot be parsed.
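
For comparison, here is a rough Python sketch of the same per-line loop. It goes one step further than the C program and also prints the interval to the previous event, since that is what the original question asks for. It assumes each line starts with a zero-filled timestamp followed by free text, and it uses naive datetimes, so intervals across a DST change would be off. The names parse_line and intervals are made up for this example.

```python
import datetime

FORMAT = "%Y-%m-%d--%H-%M-%S"
STAMP_LEN = len("2021-01-17--18-41-57")  # fixed width, since fields are zero filled

def parse_line(line):
    """Split a line into (naive datetime, rest of line); raises ValueError if malformed."""
    line = line.strip()
    when = datetime.datetime.strptime(line[:STAMP_LEN], FORMAT)
    return when, line[STAMP_LEN:].strip()

def intervals(lines):
    """Yield (seconds since previous event, rest of line) for each parsable line."""
    previous = None
    for line in lines:
        try:
            when, rest = parse_line(line)
        except ValueError:
            continue  # a real tool would report the bad line, as the C program does
        delta = 0.0 if previous is None else (when - previous).total_seconds()
        previous = when
        yield delta, rest

sample = ["2021-01-17--18-41-57 event1\n", "garbage\n", "2021-01-17--18-43-55 event2\n"]
for delta, rest in intervals(sample):
    print(delta, rest)
```

With the sample lines this prints "0.0 event1" and "118.0 event2"; the unparsable line is skipped.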
Title: Re: how to parse time-stamp lines?
Post by: DiTBho on January 26, 2021, 09:55:34 am
Quote from: JohnnyMalaria
What programming language?

Logs are on a Windows machine, but I can (and I'd prefer to) share them with a Linux box through CIFS.
I can use C or Python  :D
Title: Re: how to parse time-stamp lines?
Post by: DiTBho on January 26, 2021, 10:17:37 am
Quote from: Nominal Animal
noting that it assumes the timestamps are in local time.

Yes, I'm not sure. The time-stamps come from two different sensors: one is a circuit that filters the z-frame { longitude, latitude, time } from a GPS, the other is a camera installed on a kind of gorilla-pod that can be attached to trees. Several sensors can trigger the camera to take a photo, and you don't know (well, I don't know) which one is the source (does it come from the GPS this time? or from the camera? mumble ... !?!), but when a source writes a couple of events, they are guaranteed to come from the same circuit and to appear in the same format "year - month - day -- hour - minute - sec".

This remote stuff will probably monitor white leopards and other stuff in a remote corner of Siberia.

p.s.
thanks for your help  :D
Title: Re: how to parse time-stamp lines?
Post by: Nominal Animal on January 26, 2021, 11:04:22 am
It's even simpler in Python 3:
Code: [Select]
import datetime

def parse_timestamp(timestamp, UTC=False):
    parsed = datetime.datetime.strptime(timestamp, "%Y-%m-%d--%H-%M-%S")
    if UTC:
        # Label the fields as UTC; astimezone(utc) would first treat them as local time.
        return parsed.replace(tzinfo=datetime.timezone.utc)
    else:
        # Treat the fields as local time and attach the local timezone.
        return parsed.astimezone()

if __name__ == '__main__':
    A = parse_timestamp("2021-01-17--18-41-57", False)
    B = parse_timestamp("2021-01-17--18-43-55", False)
    print("A=%s (%s), B=%s (%s), |A-B|=%s" % (A, A.tzinfo, B, B.tzinfo, abs(A-B)))

    A = parse_timestamp("2021-01-17--18-41-57", True)
    B = parse_timestamp("2021-01-17--18-43-55", True)
    print("A=%s (%s), B=%s (%s), |A-B|=%s" % (A, A.tzinfo, B, B.tzinfo, abs(A-B)))
If you omit the .astimezone() stuff, the datetime object (https://docs.python.org/3/library/datetime.html) returned will be naive (non-timezone-aware).
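
A small sketch of what naive versus aware means in practice (an illustration only, reusing the two timestamps from the question): naive values subtract fine among themselves, but Python refuses to mix them with aware ones.

```python
import datetime

FORMAT = "%Y-%m-%d--%H-%M-%S"
utc = datetime.timezone.utc

naive_a = datetime.datetime.strptime("2021-01-17--18-41-57", FORMAT)
naive_b = datetime.datetime.strptime("2021-01-17--18-43-55", FORMAT)
print(naive_a.tzinfo)                       # None: no timezone attached
print((naive_b - naive_a).total_seconds())  # 118.0

aware_b = naive_b.replace(tzinfo=utc)       # label the same fields as UTC
try:
    aware_b - naive_a
    mixing_allowed = True
except TypeError:
    mixing_allowed = False                  # naive and aware cannot be subtracted
print(mixing_allowed)                       # False
```

So if both timestamps of a pair always come from the same source, naive values are enough for the difference, as long as no DST change falls between them.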
Title: Re: how to parse time-stamp lines?
Post by: Syntax Error on January 26, 2021, 05:16:44 pm
Are your timestamps always zero filled? I assume your parser is happy with both 2021-01-01 and 2021-1-1? Also, is it y-m-d all of the time? I've seen dates go back to a default y-d-m format after someone did a 'reset'. I might be tempted to regex the input string first.

My favoured date format is the ISO 8601 style: 24-hour time with a UTC designator or an explicit offset. It permits timezones and second fractions too, and is unambiguous.
2020-02-29T15:10Z
2020-02-29T15:10:50+12:00
2020-02-29T15:10:50.0001Z
These are stored low level as timer 'ticks'.
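
A minimal sketch of reading two of these formats in Python, assuming Python 3.7 or later (where %z accepts both a trailing Z and a +HH:MM offset). parse_iso is a made-up name, it does not handle the fractional-seconds variant, and epoch seconds are only one possible choice of 'tick'.

```python
import datetime

def parse_iso(stamp):
    """Parse 'YYYY-MM-DDTHH:MM:SS' followed by 'Z' or a '+HH:MM' offset (Python 3.7+)."""
    return datetime.datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S%z")

local = parse_iso("2020-02-29T15:10:50+12:00")
zulu = parse_iso("2020-02-29T15:10:50Z")

print(local.astimezone(datetime.timezone.utc))  # 2020-02-29 03:10:50+00:00
print((zulu - local).total_seconds())           # 43200.0, i.e. twelve hours
print(int(zulu.timestamp()))                    # seconds since the epoch, one possible 'tick' unit
```

Because the offset is part of the string, both values are timezone-aware and can be compared directly.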
Title: Re: how to parse time-stamp lines?
Post by: DiTBho on January 26, 2021, 07:28:40 pm
Quote from: Syntax Error
Are your timestamps always zero filled?

There is a spec; it guarantees that they are always zero filled.
Title: Re: how to parse time-stamp lines?
Post by: Syntax Error on January 26, 2021, 09:12:00 pm
Quote from: DiTBho
There is a spec; it guarantees that they are always zero filled.

Even so, validate their inputs. Someone somewhere down the line is bound to interpret the spec creatively (i.e. wrongly). A crude regex in Pythonese for validating the format:
Code: [Select]
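import re
timestamp = "2021-01-17--18-41-57"  # renamed from str, which shadows the builtin
# Raw string, so \d reaches the regex engine without an invalid-escape warning.
x = re.search(r"^20(\d{2}-){3}(-\d{2}){3}", timestamp)
# x is None on FAIL, so test it before calling group(). Now log the fail so they cannot blame us for their garbage data!
print(x.group(0) if x else "FAIL: %r" % timestamp)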

On any platform, date processing is a deceptively simple task that divides the programmers from the code hacks. Time zones, daylight saving time and leap years are all in there to trip the unsuspecting coder. Some tricky examples of date processing come to mind: an overlapping 30 hour clock for a radio station (with DST compensation), and a file backup regime for a global bank. By global, I mean 24 hour operation in every timezone. So it was possible to delete an expired backup file that had been created tomorrow - go figure? Date processing often requires a bit of 'back to the future' thinking. ???
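
A slightly stricter take on the validation above, as a sketch only (validate is a made-up name): the regex checks the shape, including zero filling, which Python's strptime() does not enforce on its own, and strptime() then rejects impossible dates such as 29 February in a non-leap year.

```python
import datetime
import re

# re.ASCII restricts \d to 0-9; fullmatch() rejects leading or trailing garbage.
SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}--\d{2}-\d{2}-\d{2}", re.ASCII)
FORMAT = "%Y-%m-%d--%H-%M-%S"

def validate(stamp):
    """Return a datetime if stamp is well formed and a real calendar date, else None."""
    if not SHAPE.fullmatch(stamp):
        return None  # wrong shape, e.g. fields not zero filled
    try:
        return datetime.datetime.strptime(stamp, FORMAT)
    except ValueError:
        return None  # right shape, but an impossible date or time

for stamp in ("2021-01-17--18-41-57", "2021-1-17--18-41-57",
              "2021-02-29--00-00-00", "2020-02-29--00-00-00"):
    print(stamp, "->", validate(stamp))
```

Only the first and last samples validate: the second is not zero filled, and 2021 was not a leap year.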