
ali_asadzadeh (Topic starter)

  • Super Contributor
  • Posts: 1900
  • Country: ca
Unicode printf and scanf from SIM800C
« on: October 14, 2019, 09:06:47 am »
Hi,
I'm trying to printf and scanf Persian strings from SMS messages received from a SIM800C module.
For example, a sample SMS text in Unicode could look like this:

Quote
"0627064A06310627064606330644"

So I want to scanf it to get some strings and some numbers out of it, and I also want to be able to print the output to the console with printf.

Here is some code I wrote to take every 4 characters of the SMS and convert them to a wchar_t:

Code: [Select]
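#include <stdlib.h>
#include <string.h>
#include <wchar.h>

wchar_t mystring[500];
char input[300] = "0627064A06310627064606330644064A002006390632064A0632060C000A0631064506320020064A06A906280627063106450635063106410020062806310627064A00200634064506270020003A0020003400370035003800390038";

/* Convert every group of 4 hex digits in input[] to one wchar_t. */
void convert_sms(void)
{
    size_t i, j = 0;
    size_t len = strlen(input);   /* only the hex text, not the whole 300-byte buffer */
    char txt[5] = {0};            /* 4 hex digits + terminating nul */

    for (i = 0; i + 4 <= len; i += 4) {
        memcpy(txt, input + i, 4);                       /* copy one quad */
        mystring[j++] = (wchar_t)strtol(txt, NULL, 16);  /* parse it as hex */
    }
    mystring[j] = L'\0';          /* terminate the wide string */
}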

But if I use printf or sscanf on the mystring variable, I get wrong, invalid results:

Code: [Select]
printf("%ls\n",mystring);
Quote
    : 475898
This is the output of printf, which is clearly invalid!

Any idea how I can use scanf and printf to extract the parameters from the SMS?

ASiDesigner, Stands for Application specific intelligent devices
I'm a Digital Expert from 8-bits to 64-bits
 

Nominal Animal

  • Super Contributor
  • Posts: 6227
  • Country: fi
Re: Unicode printf and scanf from SIM800C
« Reply #1 on: October 14, 2019, 11:42:12 am »
You need to tell your C library about the current locale, and use wide output (wprintf()); some C libraries don't handle wide strings in printf() very well.  Here's a complete example program that explicitly uses sscanf() and wprintf():
Code: [Select]
#define  _POSIX_C_SOURCE  200809L
#include <stdlib.h>
#include <limits.h>
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Convert unicode to wide character */
wchar_t  unicode_to_wide(const int code)
{
    return (wchar_t)code;
}

/* Convert a string consisting of 4-hex-digit Unicode code points
   to a wide string. Returns the number of wide characters,
   or -1 if an error occurs.
*/
int hex4_to_wide(wchar_t *dst, int max, const char *src)
{
    int  len = 0;
    unsigned int  c;  /* %x stores into an unsigned int */
    int           n;

    /* NULL source? */
    if (!src)
        return -1;

    while (*src) {

        /* Convert up to four hex digits. */
        n = 0;
        if (sscanf(src, "%4x%n", &c, &n) < 1 || n < 1)
            return -1;

        /* Verify valid code point, 1..65535. */
        if (c < 1 || c > 65535)
            return -1;

        /* Convert unicode to wide character */
        if (len < max)
            dst[len] = unicode_to_wide(c);

        len++;
        src += n;
    }

    if (len < max)
        dst[len] = L'\0';
    else
    if (max > 0)
        dst[max-1] = L'\0';

    return len;
}

int main(int argc, char *argv[])
{
    wchar_t  *ws = NULL;
    int       wsmax = 0;
    int       arg, len, tmp;

    if (!setlocale(LC_ALL, ""))
        fprintf(stderr, "Warning: Your C library does not support your current locale.\n");

    if (fwide(stdout, 1) < 1)
        fprintf(stderr, "Warning: Your C library does not support wide character output.\n");

    if (argc < 2) {   /* no hex strings given */
        fprintf(stderr, "\n");
        fprintf(stderr, "Usage: %s HEX-STRING ...\n", argv[0]);
        fprintf(stderr, "\n");
        return EXIT_FAILURE;
    }

    for (arg = 1; arg < argc; arg++) {

        len = hex4_to_wide(ws, wsmax, argv[arg]);
        if (len < 1) {
            fprintf(stderr, "%s: Not a hex string.\n", argv[arg]);
            return EXIT_FAILURE;
        }

        if (len >= wsmax) {
            free(ws);
            ws = calloc(len + 1, sizeof ws[0]);
            if (!ws) {
                fprintf(stderr, "Out of memory.\n");
                return EXIT_FAILURE;
            }
            wsmax = len + 1;

            tmp = hex4_to_wide(ws, wsmax, argv[arg]);
            if (tmp != len) {
                fprintf(stderr, "%s: BUG: String modified while processing.\n", argv[arg]);
                return EXIT_FAILURE;
            }
        }

        wprintf(L"%s = %ls\n", argv[arg], ws);
        fflush(stdout);
    }

    return EXIT_SUCCESS;
}
Note that this assumes the C library uses Unicode code points for wide character codes.  I can fix that by using POSIX.1 iconv() with conversion from UCS2-LE or UCS4-LE to WCHAR_T, but I omitted it because it requires POSIX.1-2001 support from the C library.

Supply one or more hex-quad-strings as command-line arguments.  For example,
    gcc -Wall -O2 above.c -o example
    ./example 0627064A06310627064606330644
outputs
    0627064A06310627064606330644 = ايرانسل
in an en_GB.utf8 locale (western UTF-8).  Note that because this locale is western, both strings are written left-to-right (the first glyph, U+0627, is leftmost).
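For C libraries whose wchar_t values are not Unicode code points, the iconv() alternative mentioned in the note above might look roughly like this. It is a sketch under assumptions: a POSIX iconv() that understands the encoding names "UCS-2BE" and "WCHAR_T" (glibc and GNU libiconv do; other implementations may spell them differently, and Keil's library has no iconv() at all), plus a 512-byte scratch buffer chosen arbitrarily. The hex digits are decoded to bytes in the order they appear in the SMS text, which makes them big-endian, hence UCS-2BE rather than the little-endian names in the note. hex_ucs2be_to_wide() is a hypothetical name.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: turn SIM800C-style hex text (4 hex digits per
   UCS-2 character, most significant byte first) into a nul-terminated
   wide string via iconv(). dstmax counts wide characters including the
   terminator. Returns the number of wide characters, or -1 on error. */
long hex_ucs2be_to_wide(wchar_t *dst, size_t dstmax, const char *src)
{
    unsigned char bytes[512];          /* arbitrary limit for this sketch */
    size_t nbytes = 0, len = strlen(src);

    if (dstmax < 1 || len % 4 != 0 || len / 2 > sizeof bytes)
        return -1;
    for (size_t i = 0; i < len; i += 2) {
        int hi = hexval(src[i]), lo = hexval(src[i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        bytes[nbytes++] = (unsigned char)(hi * 16 + lo);
    }

    /* Encoding names are not standardized; glibc and GNU libiconv accept these. */
    iconv_t cd = iconv_open("WCHAR_T", "UCS-2BE");
    if (cd == (iconv_t)-1)
        return -1;

    char *in = (char *)bytes;
    char *out = (char *)dst;
    size_t inleft = nbytes;
    size_t outleft = (dstmax - 1) * sizeof (wchar_t);  /* keep room for L'\0' */
    size_t rc = iconv(cd, &in, &inleft, &out, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;                     /* invalid input or buffer too small */

    size_t n = (dstmax - 1) - outleft / sizeof (wchar_t);
    dst[n] = L'\0';
    return (long)n;
}
```

The resulting wide string can then be printed with wprintf(L"%ls\n", ...) exactly as in the program above.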

However, since you should consider using UTF-8 everywhere, it would make more sense to write a function that converts a hex quad string (encoded in ASCII or any ASCII-compatible character set like UTF-8, and easily modified to support any fixed one-byte character set) into a UTF-8 string.  The resulting UTF-8 string is never longer than the original hex string.  Would you like to see how much simpler/better that would be?  I can write it to work well on both microcontrollers and full-fledged OSes.  :)
« Last Edit: October 14, 2019, 11:49:32 am by Nominal Animal »
 

SiliconWizard

  • Super Contributor
  • Posts: 14431
  • Country: fr
Re: Unicode printf and scanf from SIM800C
« Reply #2 on: October 14, 2019, 03:01:43 pm »
My advice here would be to roll your own conversion function instead of trying to twist printf() until it spits out what you expect, probably in non-standard ways.

You didn't mention what target system this is going to run on, but any embedded target without a full POSIX-compliant OS will NOT give you access to any locale anyway, so you're on your own or will have to use specific API functions. Tell us more about the target system.
 

ali_asadzadeh (Topic starter)

  • Super Contributor
  • Posts: 1900
  • Country: ca
Re: Unicode printf and scanf from SIM800C
« Reply #3 on: October 15, 2019, 06:37:12 am »
Thanks Nominal Animal :-+ :-+
Yes, I would like that. Also, I'm using ARM Keil as the compiler; I should check whether it supports this.
 

ali_asadzadeh (Topic starter)

  • Super Contributor
  • Posts: 1900
  • Country: ca
Re: Unicode printf and scanf from SIM800C
« Reply #4 on: October 15, 2019, 06:59:44 am »
I have ported your code to Keil, but no luck with the wprintf function.

Although it does seem to pass these checks (neither warning is printed):

Quote
"Warning: Your C library does not support your current locale.\n"
"Warning: Your C library does not support wide character output.\n"

Here is the modified code

Code: [Select]

#include <math.h>
#include "arm_math.h"
#include <stdlib.h>
#include <wchar.h>
#include <string.h>
#include <stdio.h>
#include <limits.h>
#include <locale.h>
 
 /* Convert unicode to wide character */
wchar_t  unicode_to_wide(const int code)
{
    return (wchar_t)code;
}

/* Convert a string consisting of 4-hex-digit Unicode code points
   to a wide string. Returns the number of wide characters,
   or -1 if an error occurs.
*/
int hex4_to_wide(wchar_t *dst, int max, char *src)
{
    int  len = 0;
    int  c, n;

    /* NULL source? */
    if (!src)
        return -1;

    while (*src) {

        /* Convert up to four hex digits. */
        n = 0;
        if (sscanf(src, "%04x%n", &c, &n) < 1 || n < 1)
            return -1;

        /* Verify valid code point, 1..65535. */
        if (c < 1 || c > 65535)
            return -1;

        /* Convert unicode to wide character */
        if (len < max)
            dst[len] = unicode_to_wide(c);

        len++;
        src += n;
    }

    if (len < max)
        dst[len] = L'\0';
    else
    if (max > 0)
        dst[max-1] = L'\0';

    return len;
}

int32_t main(void)
{
    int argc = 10;
    char argv[200] = {"0627064A06310627064606330644064A002006390632064A0632060C000A0631064506320020064A06A906280627063106450635063106410020062806310627064A00200634064506270020003A0020003400370035003800390038"};
    wchar_t  *ws = NULL;
    int       wsmax = 0;
    static int       arg, len, tmp;

    if (!setlocale(LC_ALL, ""))
        fprintf(stderr, "Warning: Your C library does not support your current locale.\n");

    if (fwide(stdout, 1) < 1)
        fprintf(stderr, "Warning: Your C library does not support wide character output.\n");

    if (argc < 1) {
        printf("\n");
        printf( "Usage: %s HEX-STRING ...\n", argv);
        printf( "\n");
        return EXIT_FAILURE;
    }


    for (arg = 1; arg < argc; arg++) {

        len = hex4_to_wide(ws, wsmax, argv + arg);
        if (len < 1) {
            printf("%s: Not a hex string.\n", argv + arg);
            return EXIT_FAILURE;
        }

        if (len >= wsmax) {
            free(ws);
            ws = calloc(sizeof ws[0], len + 1);
            if (!ws) {
                fprintf(stderr, "Out of memory.\n");
                return EXIT_FAILURE;
            }
            wsmax = len + 1;

            tmp = hex4_to_wide(ws, wsmax, argv + arg);
            if (tmp != len) {
                printf("%s: BUG: String modified while processing.\n", argv + arg);
                return EXIT_FAILURE;
            }
        }

        wprintf(L"%s = %ls\n", argv[arg], ws);
        fflush(stdout);
    }
}

And here is the output

Quote
= 8
 = 8
 = cdbc 

 = 
    : 475898
 = 8
 = 8

 = cdbc 

 = 
    : 475898
 = 8



I have attached the Keil simulation project; you can see the output in the simulator's output console.
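A likely cause of the garbled output above: in the port, argv is one char array rather than an array of strings, so argv + arg starts the conversion 1, 2, 3, ... characters into the hex text. Only the starts at arg = 4 and arg = 8 stay aligned on the 4-digit quads, which is consistent with exactly two output lines containing the expected ": 475898" tail; the other starts decode misaligned garbage. In addition, argv[arg] is a single char passed to a %s conversion in wprintf(), which is undefined behaviour. A minimal sketch that converts the whole message once and uses only wide output follows. It assumes wchar_t holds Unicode code points, the names hex4_decode() and print_sms() are hypothetical, and whether Persian glyphs actually appear still depends on the C library's locale and wide-output support.

```c
#include <wchar.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: decode consecutive 4-hex-digit quads from src into
   dst, which has room for max wide characters including the terminator.
   Assumes wchar_t values are Unicode code points.
   Returns the number of characters, or -1 on bad input or a short buffer. */
int hex4_decode(wchar_t *dst, int max, const char *src)
{
    int len = 0;

    if (max < 1)
        return -1;
    while (*src) {
        int c = 0;
        for (int k = 0; k < 4; k++) {
            int d = hexval(*src);   /* a nul here means a truncated quad */
            if (d < 0)
                return -1;
            c = c * 16 + d;
            src++;
        }
        if (len >= max - 1)
            return -1;              /* no room left for this character */
        dst[len++] = (wchar_t)c;
    }
    dst[len] = L'\0';
    return len;
}

/* Convert the whole SMS once, then print it using wide output only.
   Call setlocale(LC_ALL, "") once at startup before using this. */
int print_sms(const char *hex)
{
    static wchar_t ws[256];

    if (hex4_decode(ws, (int)(sizeof ws / sizeof ws[0]), hex) < 0)
        return -1;
    return wprintf(L"%ls\n", ws) < 0 ? -1 : 0;
}
```

Because a misaligned or truncated string is rejected instead of decoded, the argv + arg problem would have shown up here as an error rather than as garbage.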

 

Nominal Animal

  • Super Contributor
  • Posts: 6227
  • Country: fi
Re: Unicode printf and scanf from SIM800C
« Reply #5 on: October 15, 2019, 04:11:52 pm »
Here's the standard C version:
Code: [Select]
#include <stdlib.h>
#include <stdio.h>

/* Convert a hex digit to decimal value.
   Returns -1 if the character is not a hex digit.
*/
static inline signed char hexdigit(const char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Convert a string of hex quads to UTF-8.
   Returns the number of bytes in the result,
   or -1 if an error occurs.
*/
int hex4_utf8(char *dst, char *src)
{
    int  d, c, n = 0;

    if (!src)
        return -1;

    while (*src) {
        d = hexdigit(*(src++));
        if (d < 0)
            return -1;
        else
            c = d << 12;

        d = hexdigit(*(src++));
        if (d < 0)
            return -1;
        else
            c += d << 8;

        d = hexdigit(*(src++));
        if (d < 0)
            return -1;
        else
            c += d << 4;

        d = hexdigit(*(src++));
        if (d < 0)
            return -1;
        else
            c += d;

        if (c < 128) {
            if (dst) {
                dst[n++] = c;
            } else
                n++;
        } else
        if (c < 2048) {
            if (dst) {
                dst[n++] = 192 + (c >> 6);
                dst[n++] = 128 + (c & 63);
            } else
                n += 2;
        } else {
            if (dst) {
                dst[n++] = 224 + (c >> 12);
                dst[n++] = 128 + ((c >> 6) & 63);
                dst[n++] = 128 + (c & 63);
            } else
                n += 3;
        }
    }

    if (dst)
        dst[n] = '\0';

    return n;
}

int main(int argc, char *argv[])
{
    char *dst = NULL;
    int   max = 0;
    int   arg, len, tmp;

    for (arg = 1; arg < argc; arg++) {

        len = hex4_utf8(dst, argv[arg]);
        if (len < 0) {
            fprintf(stderr, "%s: Not a hex string.\n", argv[arg]);
            return EXIT_FAILURE;
        }

        if (len >= max) {
            free(dst);
            dst = malloc(len + 1);
            if (!dst) {
                fprintf(stderr, "Out of memory.\n");
                return EXIT_FAILURE;
            }
            max = len + 1;  /* remember the size so the buffer is reused */
            tmp = hex4_utf8(dst, argv[arg]);
            if (tmp != len) {
                fprintf(stderr, "%s: BUG: String modified unexpectedly.\n", argv[arg]);
                return EXIT_FAILURE;
            }
        }

        printf("%s = %s\n", argv[arg], dst);
    }

    return EXIT_SUCCESS;
}
You only need the hex4_utf8(destination, source) and hexdigit(character) functions.  They do not need any library support, and will work even in freestanding mode (without the C library, no <stdlib.h> or <stdio.h> needed).  Only the example main() uses <stdlib.h> and <stdio.h>.

The destination string will have at most 3/4 of the bytes of the source string (excluding the nul end-of-string; each quad of hex digits converts to a one-, two-, or three-byte UTF-8 character), and it can do the conversion in place.
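Why the in-place conversion is safe: quad k occupies src[4k] to src[4k+3], and at most 3k bytes have been written before it, so its own one to three bytes land at or before dst[3k+2], which is always behind src[4k+4] where the next read starts. The sketch below restates the same conversion with separate read and write indexes and an assert() of that invariant; the name hex4_utf8_inplace() and the UTF8_MAX_BYTES() sizing macro are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizing macro: at most 3 UTF-8 bytes per 4 hex digits, plus the nul. */
#define UTF8_MAX_BYTES(hexlen) ((hexlen) / 4 * 3 + 1)

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Convert hex quads in buf to UTF-8 in the same buffer.
   Returns the number of UTF-8 bytes, or -1 on a non-hex or truncated
   quad (buf is then partially overwritten). */
int hex4_utf8_inplace(char *buf)
{
    size_t r = 0, w = 0;    /* read and write indexes into buf */

    while (buf[r]) {
        int c = 0;
        for (int k = 0; k < 4; k++, r++) {
            int d = hexval(buf[r]);
            if (d < 0)
                return -1;
            c = c * 16 + d;
        }
        /* The whole quad is consumed; up to 3 bytes fit behind r. */
        assert(w + 3 <= r);
        if (c < 0x80) {
            buf[w++] = (char)c;
        } else if (c < 0x800) {
            buf[w++] = (char)(0xC0 | (c >> 6));
            buf[w++] = (char)(0x80 | (c & 0x3F));
        } else {
            buf[w++] = (char)(0xE0 | (c >> 12));
            buf[w++] = (char)(0x80 | ((c >> 6) & 0x3F));
            buf[w++] = (char)(0x80 | (c & 0x3F));
        }
    }
    buf[w] = '\0';
    return (int)w;
}
```

For the 28-digit sample "0627064A06310627064606330644", UTF8_MAX_BYTES() gives 22 bytes, while the actual result is 14 bytes because every character is a two-byte sequence.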

If you do not need the number of bytes in the result, and you never call hex4_utf8() with a NULL destination, I suggest you try something like this in Keil:
Code: [Select]
#include <stdio.h>

static inline signed char hexdigit(const char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

int hex4_utf8(char *dst, char *src)
{
    int  d, c;

    while (*src) {
        d = hexdigit(*(src++));
        if (d < 0)
            return -1;
        else
            c = d << 12;

        d = hexdigit(*(src++));
        if (d < 0)
            return -1;
        else
            c += d << 8;

        d = hexdigit(*(src++));
        if (d < 0)
            return -1;
        else
            c += d << 4;

        d = hexdigit(*(src++));
        if (d < 0)
            return -1;
        else
            c += d;

        if (c < 128) {
            *(dst++) = c;
        } else
        if (c < 2048) {
            *(dst++) = 192 + (c >> 6);
            *(dst++) = 128 + (c & 63);
        } else {
            *(dst++) = 224 + (c >> 12);
            *(dst++) = 128 + ((c >> 6) & 63);
            *(dst++) = 128 + (c & 63);
        }
    }

    *dst = '\0';
    return 0;
}

void main(void)
{
    char msg[] = "0627064A06310627064606330644";
    int  n;
    if (hex4_utf8(msg, msg) < 0) {
        printf("Failed!\n");
    } else {
        printf("%s\n", msg);
        for (n = 0; msg[n] != '\0'; n++)
            printf("msg[%d] = %02x\n", n, (unsigned int)(unsigned char)msg[n]);  /* no sign extension where char is signed */
    }
    while (1) ;
}
It should output
    ايرانسل
    msg[0] = d8
    msg[1] = a7
    msg[2] = d9
    msg[3] = 8a
    msg[4] = d8
    msg[5] = b1
    msg[6] = d8
    msg[7] = a7
    msg[8] = d9
    msg[9] = 86
    msg[10] = d8
    msg[11] = b3
    msg[12] = d9
    msg[13] = 84
However, I don't have the Keil C compiler, and I'm too lazy to install anything, so you'll have to test it yourself.  I have only verified that the hexdigit() and hex4_utf8() functions work as expected.
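The first post also asked how to pull numbers out of the SMS with scanf. Once the text is UTF-8 (for example after hex4_utf8()), every ASCII character such as ':' or a digit is still a single byte, and all bytes of multi-byte sequences are 0x80 or above, so the ordinary narrow string functions work on it directly. A sketch, assuming the number follows the last ':' as in the sample message from the first post; extract_code() is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse the decimal number that follows the last ':'
   in a nul-terminated UTF-8 SMS text. Searching for the ASCII ':' byte is
   safe because UTF-8 multi-byte sequences only use bytes 0x80 and above.
   Returns 1 and stores the number in *code, or returns 0 if none is found. */
int extract_code(const char *utf8, unsigned long *code)
{
    const char *colon = strrchr(utf8, ':');

    if (!colon)
        return 0;
    /* %lu skips the blank(s) after the colon by itself. */
    return sscanf(colon + 1, "%lu", code) == 1;
}
```

On the target, the flow would be hex4_utf8(msg, msg) on the text read from the SIM800C, followed by extract_code(msg, &code); for the long sample message in the first post, that should yield 475898.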
« Last Edit: October 15, 2019, 04:13:44 pm by Nominal Animal »
 

ali_asadzadeh (Topic starter)

  • Super Contributor
  • Posts: 1900
  • Country: ca
Re: Unicode printf and scanf from SIM800C
« Reply #6 on: October 16, 2019, 06:45:26 am »
Big thumbs up Nominal Animal :-+
I will test it and report back here.
 

