Here is my foo, with memset included, on a PIC12F609:
_memset:
;__Lib_CString.c,77 ::
;__Lib_CString.c,80 ::
0x0009 0x1283 BCF STATUS, 5
0x000A 0x084B MOVF FARG_memset_p1, 0
0x000B 0x00F2 MOVWF R2
;__Lib_CString.c,81 ::
L_memset20:
0x000C 0x084D MOVF FARG_memset_n, 0
0x000D 0x00F0 MOVWF R0
0x000E 0x084E MOVF FARG_memset_n+1, 0
0x000F 0x00F1 MOVWF R0+1
0x0010 0x3001 MOVLW 1
0x0011 0x02CD SUBWF FARG_memset_n, 1
0x0012 0x1C03 BTFSS STATUS, 0
0x0013 0x03CE DECF FARG_memset_n+1, 1
0x0014 0x0870 MOVF R0, 0
0x0015 0x0471 IORWF R0+1, 0
0x0016 0x1903 BTFSC STATUS, 2
0x0017 0x281E GOTO L_memset21
;__Lib_CString.c,82 ::
0x0018 0x0872 MOVF R2, 0
0x0019 0x0084 MOVWF FSR
0x001A 0x084C MOVF FARG_memset_character, 0
0x001B 0x0080 MOVWF INDF
0x001C 0x0AF2 INCF R2, 1
0x001D 0x280C GOTO L_memset20
L_memset21:
;__Lib_CString.c,83 ::
0x001E 0x084B MOVF FARG_memset_p1, 0
0x001F 0x00F0 MOVWF R0
;__Lib_CString.c,84 ::
L_end_memset:
0x0020 0x0008 RETURN
; end of _memset
_foo:
;MyProject.c,1 :: void foo(unsigned long *p) {
;MyProject.c,2 :: memset(p, 0, 4);
0x0021 0x1283 BCF STATUS, 5
0x0022 0x084A MOVF FARG_foo_p, 0
0x0023 0x00CB MOVWF FARG_memset_p1
0x0024 0x01CC CLRF FARG_memset_character
0x0025 0x3004 MOVLW 4
0x0026 0x00CD MOVWF FARG_memset_n
0x0027 0x3000 MOVLW 0
0x0028 0x00CE MOVWF FARG_memset_n+1
0x0029 0x2009 CALL _memset
;MyProject.c,3 :: }
L_end_foo:
0x002A 0x0008 RETURN
; end of _foo
It seems to have no trouble at all, and that's about as primitive as a chip can be.
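For anyone who doesn't read PIC assembly, the _memset loop in the listing corresponds roughly to the C below. This is only a sketch reconstructed from the listing, not the actual source of __Lib_CString.c; the function name memset_sketch is made up, and the parameter names are taken from the FARG_memset_* labels.

```c
#include <stddef.h>

/* Hypothetical reconstruction of the loop shown in the listing.
   Not the real library source. */
void *memset_sketch(void *p1, char character, unsigned int n)
{
    char *pp = p1;            /* working pointer, R2 in the listing */

    /* The listing copies n, decrements it, then tests the old value
       for zero, which is what a post-decrement test does. */
    while (n--) {
        *pp++ = character;    /* store through FSR/INDF, then INCF R2 */
    }

    return p1;                /* the original pointer is returned in R0 */
}
```

Calling memset_sketch(p, 0, 4) writes four zero bytes one at a time, which is exactly what the call from _foo ends up doing.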
Edit: Yes, it can be optimized if you are setting a 32-bit unsigned integer and your processor can clear it that way. memcpy and strcpy can be optimized that way too.
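To make the comparison concrete, here is a small sketch of the two ways of zeroing the value. The function names are made up for illustration, and I use sizeof *p instead of the literal 4 so the example is also correct on hosts where unsigned long is wider than 32 bits. Whether a compiler turns the memset call into plain stores depends on the compiler and target.

```c
#include <string.h>

/* Hypothetical helper: clears the value through the library call,
   as foo does in the listing. */
void clear_with_memset(unsigned long *p)
{
    memset(p, 0, sizeof *p);
}

/* Hypothetical helper: clears the value with a plain assignment,
   which the compiler can lower to whatever stores suit the target. */
void clear_directly(unsigned long *p)
{
    *p = 0;
}
```

Both leave *p equal to 0, since an unsigned integer with all bits zero has the value zero; the difference is only in the code generated.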
Still, memcpy, memset, strcpy, etc. are not part of the C language itself, implementation aside; they are library functions.