Discussion:
UNZIP for Z80 CP/M updated to support Deflate compression
Tony Nicholson
2020-09-09 01:09:34 UTC
I've successfully applied Martin's recent updates to UNZIP
for CP/M (Z80 only) to remedy a few more bugs and implement an
UnDeflate enhancement.

You'll find updated source-code and a pre-built CP/M binary
at my GitHub at

https://github.com/agn453/UNZIP-CPM-Z80

You can download both the source and binary directly from

https://github.com/agn453/UNZIP-CPM-Z80/blob/master/unzip/unzip152.lbr

I've only tested it on a few .ZIP files and it seems to work.

The later version of UNZIP with Z-System support (ZCPR3) will
be updated soon (I'll need to manually apply the context
differences).

You can report problems by replying here or use the GitHub Issues
tracking feature.

regards
Tony
Martin
2020-09-09 04:36:38 UTC
Post by Tony Nicholson
I've successfully applied Martin's recent updates to UNZIP
for CP/M (Z80 only) to remedy a few more bugs and implement an
UnDeflate enhancement.
Thanks, Tony!

The file handling code in UNZIP 1.8 is quite different.
I hope you find an easy way to fit my adaptions.

Martin
Tony Nicholson
2020-09-09 06:31:21 UTC
I've just fixed some string line terminations following a
"call ilprt" ( changing \r\n\0 to CR,LF,0 ) in the new
UNZIP V1.5-2. GitHub has been updated - so if you fetched
a copy of the new version prior to this message, please
fetch another copy.

Tony
Martin
2020-09-09 17:41:04 UTC
Thanks for doing all the collecting and publishing!

Please let me make a clarification of its history...

The Hitech-C port was done by *ME* as a precursor of the Z80 translation.
I started with "degzip_portable.c" from Keir Fraser's github repository.

"degzip_portable.c" was a modification of Fraser's own "degzip_gnu.c"
made by "phx" for "vbcc on Amiga 68k".

The original code was memory based, both the compressed and the
uncompressed data had to be stored in memory.

Without my modifications it was unsuitable for porting to CP/M!

You can find the complete source, posted a few weeks ago, in the thread:
"Public Domain "gunzip" for CP/M (C source for Hitech-C)".

Martin
Lawrence Nelson
2020-10-12 22:11:48 UTC
Martin & Tony

Many thanks for making UNZIP152 available with support for the Deflate compression algorithm. I downloaded it from GitHub and tried it on zip files compressed with four variants of Deflate (max, normal, fast and super fast). It extracted all of the files with no problem. However, the CRC algorithm in UNZIP152 is god-awful slow, which makes using unzip on CP/M a painful process. It appears that the CRC algorithm runs after decompression and takes forever.

On a related UNZIP topic, there is a Z-System version of UNZIP by Simon Cran that is derived from UNZIP12. Simon cleaned up the code and improved the look of the screen, as well as making it a true Z-System app. Since it is derived from UNZIP12, it will contain the same bugs, and of course it can't do Deflate. Simon calls it UNZIPZ02. Have you guys ever looked at this code? Just wondering.

Lars
Martin
2020-10-13 04:44:17 UTC
Yes, in my post at: Fri, 05 Jun 2020 05:59:46 +0200.

If it is run on a non-Z-System machine, it does not work.
I have also posted a fix for that.

Deflate support:
A "job" for the Z-system fans out here?


Martin
Russell Marks
2020-10-13 17:34:43 UTC
[...]
Post by Lawrence Nelson
Post by Tony Nicholson
https://github.com/agn453/UNZIP-CPM-Z80
[...]
Post by Lawrence Nelson
Many thanks for making UNZIP152 available with support for the deflate
compression algorithm. Downloaded it from GITHUB and tried it out.
Tried it out on zip files that were compressed with four variants of
Deflate (max, normal, fast and super fast). Able to extract all files
no problem. However the CRC algorithm in UNZIP152 is God awful slow.
Makes using unzip on CP/M a painful process. It appears that the CRC
algorithm runs after decompression and takes forever.
While I do agree that unzip is slow on deflate files as might be
expected (it's great to have support now though, personally I was
amazed that this had happened), I'm not sure about the CRC point
there. The bulk of the CRC code seems to be called on a per-byte basis
during decompression. Or perhaps that's what you meant. Anyway, a
32-bit CRC is probably never going to be very fast on a Z80, I would
guess.

I have to say - maybe directing this more at Tony/Martin? - just
looking at the code generally my first thought on optimising for speed
is that "readbits" is understandably a bit of a hotspot and could
stand a revision or two (some of this being run per bit!). Here's a
quick go at that which doesn't use stack at all in the basic per-bit
part and which I reckon is about 30% quicker:

(I haven't tested this much yet, but it worked on a simple
deflate-only test zip.)

-------------------- unzip152-readbits.diff --------------------
--- UNZIP152.Z80 2020-10-13 16:02:05.906861502 +0100
+++ unzip152-new.z80 2020-10-13 17:27:12.692912388 +0100
@@ -490,36 +490,42 @@
getcode:
ld a,(codesize)
readbits:
- ld hl,8000h
-bitlp: push af
- push hl
-getbit: ld hl,bleft
- dec (hl)
+ ;saving bc/de may not be required, callers seem to save it?
+ push bc
+ push de
+ ld b,a
+ ld de,8000h
+ ld hl,bleft
+getbit: dec (hl)
jp m,readbt
- dec hl
+ dec hl ;point to bitbuf
rr (hl)
- pop hl
- rr h
- rr l
- jr c,bitex
- pop af
- dec a
- jr nz,bitlp
-finbit: srl h
- rr l
- jr nc,finbit
- jr bitret
-bitex: pop af
-bitret: ld a,l
+ rr d
+ rr e
+ jr c,bitret
+ inc hl ;point back to bleft, faster than ld hl,NN
+ djnz getbit
+finbit: srl d
+ rr e
+ jp nc,finbit ;jp likely faster in this case
+bitret: ex de,hl ;return in HL and A
+ ld a,l
+ pop de
+ pop bc
ret
;
-readbt: push hl
+;should be worth having this per-byte reading slower (more stack use)
+;to keep the per-bit reading above faster. Or is exx an option?
+readbt: push bc
+ push de
call getbyte
- pop hl
- ld (hl),8
- dec hl
+ pop de
+ pop bc
+ ld hl,bitbuf
ld (hl),a
- jr getbit
+ inc hl ;point to bleft
+ ld (hl),8
+ jp getbit ;jp for speed
;
scanfn: ld a,(de)
cp '.'
@@ -2237,6 +2243,7 @@
ds 24
mtchfcb:
ds 11
+;note that as indicated above, bitbuf must be the byte before bleft
bitbuf: ds 1
vars:
bleft: ds 1
-------------------- unzip152-readbits.diff --------------------


In case tabs prove to be an issue (and for readability), here's the
revised readbits in full:

-------------------- readbits.txt --------------------
readbits:
;saving bc/de may not be required, callers seem to save it?
push bc
push de
ld b,a
ld de,8000h
ld hl,bleft
getbit: dec (hl)
jp m,readbt
dec hl ;point to bitbuf
rr (hl)
rr d
rr e
jr c,bitret
inc hl ;point back to bleft, faster than ld hl,NN
djnz getbit
finbit: srl d
rr e
jp nc,finbit ;jp likely faster in this case
bitret: ex de,hl ;return in HL and A
ld a,l
pop de
pop bc
ret
;
;should be worth having this per-byte reading slower (more stack use)
;to keep the per-bit reading above faster. Or is exx an option?
readbt: push bc
push de
call getbyte
pop de
pop bc
ld hl,bitbuf
ld (hl),a
inc hl ;point to bleft
ld (hl),8
jp getbit ;jp for speed
-------------------- readbits.txt --------------------
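
For anyone who finds the marker-bit trick easier to follow outside of
assembler, here is a rough C model of what the listing above appears to
do: bits are taken least significant first, rotated into the top of a
16-bit accumulator that starts as 8000h, and the result is right-aligned
by shifting until the marker bit drops out. The BitReader struct and the
function names are made up for illustration; only bitbuf, bleft and the
8000h marker come from the listing. Treat it as a sketch of the idea,
not a translation of UNZIP itself (there is no end-of-input handling,
for one).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for UNZIP's input state (bitbuf, bleft, getbyte). */
typedef struct {
    const uint8_t *data;   /* compressed input */
    size_t pos;            /* index of the next byte to fetch */
    uint8_t bitbuf;        /* bits of the current byte not yet consumed */
    int bleft;             /* number of bits still available in bitbuf */
} BitReader;

/* Fetch one bit, least significant bit of each byte first. */
unsigned next_bit(BitReader *br)
{
    if (br->bleft == 0) {              /* same effect as the jump to readbt */
        br->bitbuf = br->data[br->pos++];
        br->bleft = 8;
    }
    unsigned bit = br->bitbuf & 1u;
    br->bitbuf >>= 1;
    br->bleft--;
    return bit;
}

/* Read count bits (1..16) and return them right-aligned. */
uint16_t read_bits(BitReader *br, unsigned count)
{
    uint16_t acc = 0x8000;             /* marker bit, as in "ld de,8000h" */
    while (count-- > 0) {
        unsigned marker_out = acc & 1u;
        acc = (uint16_t)((acc >> 1) | (next_bit(br) << 15));
        if (marker_out)                /* marker left bit 0: 16 bits read */
            return acc;
    }
    for (;;) {                         /* "finbit": right-align the result */
        unsigned marker_out = acc & 1u;
        acc >>= 1;
        if (marker_out)
            return acc;
    }
}

/* Small self-check: 0B5h is 10110101b, so the low 3 bits are 5
   and the remaining 5 bits are 22. */
void read_bits_demo(void)
{
    static const uint8_t sample[] = { 0xB5, 0x01 };
    BitReader br = { sample, 0, 0, 0 };
    assert(read_bits(&br, 3) == 5);
    assert(read_bits(&br, 5) == 22);
}
```

The marker means no separate "bits read so far" counter is needed for
the final alignment, which is presumably why the assembler keeps it.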


I'm assuming the routine didn't really need to preserve flags across
a call, as it sometimes happened to do previously; I couldn't spot
anything that took advantage of that.

Also, I think it might be worth moving bitbuf/bleft to just before
"start" if possible, so both are very near to 0100h and thus
definitely in the same page, then the inc hl/dec hl ops above could
become slightly faster inc l/dec l ones instead.

-Rus.
Russell Marks
2020-10-14 12:22:17 UTC
Post by Russell Marks
[...]
Post by Tony Nicholson
https://github.com/agn453/UNZIP-CPM-Z80
[...]
Post by Russell Marks
I have to say - maybe directing this more at Tony/Martin? - just
looking at the code generally my first thought on optimising for speed
is that "readbits" is understandably a bit of a hotspot and could
Here's another version of readbits which should be slightly quicker
again. Disappointingly, a quick test in Virtual Kaypro suggests that
this version only makes deflate about 7% quicker overall (comparing to
UNZIP152), but it should benefit the other methods too at least.

-------------------- readbits2.txt --------------------
readbits:
push bc ; may not need to save bc?
ld b,a
ld c,80h ; bits rotate into C and A
xor a ; (rra is 4 cycles vs 8 for others)
ld hl,(bitbuf) ; keep bitbuf in L, bleft in H
getbit: dec h
jp m,readbt ; read next byte if needed
rr l
rr c
rra
jr c,bitret
djnz getbit
finbit: srl c
rra
jp nc,finbit ; jp likely faster in this case
bitret: ld (bitbuf),hl ; update bitbuf/bleft
ld h,c ; return bits in HL and A
ld l,a
pop bc
ret
;
; should be worth having this per-byte reading slower (more stack use)
; to keep the per-bit reading above faster. Or is exx an option?
;
readbt: push af
push bc
call getbyte
ld l,a ; new bitbuf
ld h,8 ; 8 bits left
pop bc
pop af
jp getbit ; jp for speed
-------------------- readbits2.txt --------------------

I did have a quick look at the "huffman" routine (which is basically
what deflate runs), but that looks hard to optimise much. Maybe the
odd unconditional jr could become jp (for 10 cycles rather than 12)
but the difference would be absolutely minimal.

-Rus.
Russell Marks
2020-10-14 14:17:02 UTC
Post by Russell Marks
[...]
Post by Tony Nicholson
https://github.com/agn453/UNZIP-CPM-Z80
[...]
Post by Russell Marks
I have to say - maybe directing this more at Tony/Martin? - just
looking at the code generally my first thought on optimising for speed
is that "readbits" is understandably a bit of a hotspot and could
Yet another version of readbits. I missed a rather obvious
optimisation or two, most notably that quite a few calls to readbits
only need a byte or less rather than a word. So I added a new
"rdbybits" to be used only when <=8 bits are required. This makes for
a more noticeable change vs. UNZIP152: it's about 20% quicker, which
is more like what I'd expect from optimising a hotspot routine. :-)

First up the revised/new routines, then the full patch against
UNZIP152 with obvious byte-sized readbits calls changed to use
rdbybits.

(There are two other changes in the patch - a simple TPA-size check as
I didn't notice one in there already, and a renaming of the "V" label
to "urV" to keep Unix zmac happy. I've already been using the latter
change personally and it seems harmless enough. The bad news is this
patch does push the COM size past 4k, but I think dropping the
TPA-size check would be enough to squeeze it into 4k again.)
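
The arithmetic behind that TPA-size check can be modelled in C as
below. This is only a sketch: the function name and the sample
addresses in the usage note are invented, and it assumes SP is still
near the top of the TPA when the check runs, which the patch comment
implies but which is not verified here.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_RESERVE 128u   /* matches the "ld hl,-128" in the patch */

/* Returns nonzero when (SP - 128) >= endaddr, using 16-bit arithmetic
   as the Z80 would. This corresponds to "sbc hl,de" leaving carry
   clear, i.e. the "jr nc,wasfil" branch being taken. */
int tpa_is_large_enough(uint16_t sp, uint16_t endaddr)
{
    uint16_t limit = (uint16_t)(sp - STACK_RESERVE);  /* wraps mod 65536 */
    return limit >= endaddr;
}
```

With made-up values, tpa_is_large_enough(0xD000, 0x5000) is true, while
tpa_is_large_enough(0x5000, 0x5000) is false because the 128-byte stack
allowance would overlap the data area.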

-------------------- readbits3.txt --------------------
readbits:
push bc ; may not need to save bc?
ld b,a
ld c,80h ; bits rotate into C and A
xor a ; (rra is 4 cycles vs 8 for others)
ld hl,(bitbuf) ; keep bitbuf in L, bleft in H
getbit: dec h
jp p,getbt2 ; skip if new byte not needed yet
push af
push bc
call getbyte
ld l,a ; new bitbuf
ld h,7 ; 8 bits left, pre-dec'd
pop bc
pop af
getbt2: rr l
rr c
rra
jr c,bitret
djnz getbit
finbit: srl c
rra
jp nc,finbit ; jp likely faster in this case
bitret: ld (bitbuf),hl ; update bitbuf/bleft
ld h,c ; return bits in HL and A
ld l,a
pop bc
ret
;
; rdbybits - faster version of readbits for <=8 bits
;
rdbybits:
push bc ; may not need to save bc?
ld b,a
ld a,80h ; bits rotate into A (rra faster)
ld hl,(bitbuf) ; keep bitbuf in L, bleft in H
rdbylp: dec h
jp p,rdby1 ; skip if new byte not needed yet
ld c,a
push bc
call getbyte
ld l,a ; new bitbuf
ld h,7 ; 8 bits left, pre-dec'd
pop bc
ld a,c
rdby1: rr l
rra
jr c,rdbyrt
djnz rdbylp
or a ; clear carry flag initially
rdby2: rra ; safe as dropped bits are all zeroes
jp nc,rdby2 ; jp likely faster in this case
rdbyrt: ld (bitbuf),hl ; update bitbuf/bleft
ld h,0 ; return bits in HL and A
ld l,a
pop bc
ret
-------------------- readbits3.txt --------------------
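
As a rough illustration of why the byte-sized routine can be cheaper,
here is a C model of the rdbybits idea: the same marker-bit scheme, but
with an 8-bit accumulator starting at 80h, so only one register-sized
value is rotated per bit. The BitSource struct and function names are
hypothetical, and the model ignores end-of-input; it is a sketch of the
technique rather than a port of the assembler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical input state, loosely mirroring bitbuf/bleft. */
typedef struct {
    const uint8_t *data;   /* compressed input */
    size_t pos;            /* index of the next byte to fetch */
    uint8_t bitbuf;        /* bits of the current byte not yet consumed */
    int bleft;             /* number of bits still available in bitbuf */
} BitSource;

/* Fetch one bit, least significant bit of each byte first. */
unsigned source_next_bit(BitSource *src)
{
    if (src->bleft == 0) {             /* refill, as the getbyte call does */
        src->bitbuf = src->data[src->pos++];
        src->bleft = 8;
    }
    unsigned bit = src->bitbuf & 1u;
    src->bitbuf >>= 1;
    src->bleft--;
    return bit;
}

/* Read count bits (1..8) into a single byte, right-aligned. */
uint8_t read_byte_bits(BitSource *src, unsigned count)
{
    uint8_t acc = 0x80;                /* marker bit, as in "ld a,80h" */
    while (count-- > 0) {
        unsigned marker_out = acc & 1u;
        acc = (uint8_t)((acc >> 1) | (source_next_bit(src) << 7));
        if (marker_out)                /* marker left bit 0: 8 bits read */
            return acc;
    }
    for (;;) {                         /* "rdby2": shift until marker drops */
        unsigned marker_out = acc & 1u;
        acc >>= 1;
        if (marker_out)
            return acc;
    }
}

/* Self-check, including a read that straddles two input bytes. */
void read_byte_bits_demo(void)
{
    static const uint8_t sample[] = { 0xB5, 0xC3 };
    BitSource src = { sample, 0, 0, 0 };
    assert(read_byte_bits(&src, 6) == 0x35);  /* low six bits of 0B5h */
    assert(read_byte_bits(&src, 4) == 14);    /* 2 bits of 0B5h, 2 of 0C3h */
}
```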

The patch:

-------------------- unzip152-readbits-better.diff --------------------
--- UNZIP152.Z80 2020-10-13 16:02:05.906861502 +0100
+++ unzip152-new2.z80 2020-10-14 14:46:25.667325030 +0100
@@ -150,6 +150,19 @@
cp '/'
jp z,usage
;
+; Check TPA size (this will need adjusting if warm-boot-only exit
+; is changed).
+;
+ ld hl,-128 ; allow for a decent stack size
+ add hl,sp
+ ld de,endaddr
+ or a
+ sbc hl,de ; check endaddr is less (i.e. hl is >=)
+ jr nc,wasfil
+ call ilprt
+ db 'Low mem',0 ; just a short error msg to give the idea
+ jp exit
+;
wasfil: ld de,altfcb
ld a,(de) ; output drive given?
ld (opfcb),a ; store it in output file control block
@@ -490,36 +503,62 @@
getcode:
ld a,(codesize)
readbits:
- ld hl,8000h
-bitlp: push af
- push hl
-getbit: ld hl,bleft
- dec (hl)
- jp m,readbt
- dec hl
- rr (hl)
- pop hl
- rr h
- rr l
- jr c,bitex
+ push bc ; may not need to save bc?
+ ld b,a
+ ld c,80h ; bits rotate into C and A
+ xor a ; (rra is 4 cycles vs 8 for others)
+ ld hl,(bitbuf) ; keep bitbuf in L, bleft in H
+getbit: dec h
+ jp p,getbt2 ; skip if new byte not needed yet
+ push af
+ push bc
+ call getbyte
+ ld l,a ; new bitbuf
+ ld h,7 ; 8 bits left, pre-dec'd
+ pop bc
pop af
- dec a
- jr nz,bitlp
-finbit: srl h
- rr l
- jr nc,finbit
- jr bitret
-bitex: pop af
-bitret: ld a,l
+getbt2: rr l
+ rr c
+ rra
+ jr c,bitret
+ djnz getbit
+finbit: srl c
+ rra
+ jp nc,finbit ; jp likely faster in this case
+bitret: ld (bitbuf),hl ; update bitbuf/bleft
+ ld h,c ; return bits in HL and A
+ ld l,a
+ pop bc
ret
;
-readbt: push hl
+; rdbybits - faster version of readbits for <=8 bits
+;
+rdbybits:
+ push bc ; may not need to save bc?
+ ld b,a
+ ld a,80h ; bits rotate into A (rra faster)
+ ld hl,(bitbuf) ; keep bitbuf in L, bleft in H
+rdbylp: dec h
+ jp p,rdby1 ; skip if new byte not needed yet
+ ld c,a
+ push bc
call getbyte
- pop hl
- ld (hl),8
- dec hl
- ld (hl),a
- jr getbit
+ ld l,a ; new bitbuf
+ ld h,7 ; 8 bits left, pre-dec'd
+ pop bc
+ ld a,c
+rdby1: rr l
+ rra
+ jr c,rdbyrt
+ djnz rdbylp
+ or a ; clear carry flag initially
+rdby2: rra ; safe as dropped bits are all zeroes
+ jp nc,rdby2 ; jp likely faster in this case
+rdbyrt: ld (bitbuf),hl ; update bitbuf/bleft
+ ld h,0 ; return bits in HL and A
+ ld l,a
+ pop bc
+ ret
;
scanfn: ld a,(de)
cp '.'
@@ -984,7 +1023,7 @@
lflp: push bc
push hl
ld a,6
- call readbits
+ call rdbybits
pop hl
pop de
ld (hl),a
@@ -999,7 +1038,7 @@
ldfllp: push hl
push bc
ld a,8
- call readbits
+ call rdbybits
pop bc
pop hl
ld (hl),a
@@ -1035,11 +1074,11 @@
or a
jr nz,ur2
ur4: ld a,8
- call readbits
+ call rdbybits
jr ur3
;
ur2: ld a,1
- call readbits
+ call rdbybits
dec l
jr z,ur4
call slenlch
@@ -1073,7 +1112,7 @@
ld a,l
or a
jr z,ur10
- ld (V),a
+ ld (urV),a
ld a,(L_table)
ld h,a
and l
@@ -1106,7 +1145,7 @@
jr nz,ur13
ld a,(D_shift)
ld b,a
- ld a,(V)
+ ld a,(urV)
ur14: srl a
djnz ur14
ld h,a
@@ -1191,7 +1230,7 @@
;
readlengths:
ld a,8
- call readbits
+ call rdbybits
ld d,h
ld e,d
inc hl
@@ -1211,11 +1250,11 @@
push de
push hl
ld a,4
- call readbits
+ call rdbybits
inc a
push af
ld a,4
- call readbits
+ call rdbybits
inc a
ld b,a
pop af
@@ -1412,7 +1451,7 @@
push de
push bc
ld a,1
- call readbits
+ call rdbybits
pop af
push af
or a
@@ -1487,7 +1526,7 @@
jr ui4
;
ui3: ld a,8
- call readbits
+ call rdbybits
ui4: call outb
jr ui1
;
@@ -1512,7 +1551,7 @@
jr nz,ui6
push hl
ld a,8
- call readbits
+ call rdbybits
pop de
add hl,de
ui6: ld de,(mml)
@@ -1529,7 +1568,7 @@
ld (treep),hl
nsloop: push hl
ld a,1
- call readbits
+ call rdbybits
pop hl
or a
jr z,nsleft
@@ -1730,19 +1769,19 @@
;
huffman:
ld a,5
- call readbits
+ call rdbybits
inc a
ld l,a
ld h,1
ld (hlit),hl

ld a,5
- call readbits
+ call rdbybits
inc a
ld (hdist),a

ld a,4
- call readbits
+ call rdbybits
add a,4
ld c,a

@@ -1754,7 +1793,7 @@
push bc
push de
ld a,3
- call readbits
+ call rdbybits
pop hl
ld c,(hl)
ld b,0
@@ -1805,7 +1844,7 @@
cp 010h
jr nz,hmn16
ld a,2
- call readbits
+ call rdbybits
pop hl
pop bc
add a,3
@@ -1823,7 +1862,7 @@
hmn16: cp 011h
jr nz,hmn17
ld a,3
- call readbits
+ call rdbybits
pop hl
pop bc
add a,3
@@ -1839,7 +1878,7 @@
hmn17: cp 012h
jr nz,hmn18
ld a,7
- call readbits
+ call rdbybits
pop hl
pop bc
add a,11
@@ -1965,11 +2004,11 @@
ret nz

ld a,1
- call readbits
+ call rdbybits
push af

ld a,2
- call readbits
+ call rdbybits
or a
jr nz,udnt0

@@ -2237,6 +2276,7 @@
ds 24
mtchfcb:
ds 11
+; note that as indicated above, bitbuf must be the byte before bleft
bitbuf: ds 1
vars:
bleft: ds 1
@@ -2250,7 +2290,7 @@
ds 1
D_shift:
ds 1
-V: ds 1
+urV: ds 1
nchar: ds 1
lchar: ds 1
ExState:
@@ -2311,5 +2351,5 @@
disttr: ds 4 * nrdist
endtr:
ds 8192 + 2 - (endtr - lenld)
-
+endaddr: ; must be no vars/data beyond this point
end
-------------------- unzip152-readbits-better.diff --------------------

-Rus.
Russell Marks
2020-10-15 02:02:45 UTC
Post by Russell Marks
[...]
Post by Tony Nicholson
https://github.com/agn453/UNZIP-CPM-Z80
[...]
Post by Russell Marks
I have to say - maybe directing this more at Tony/Martin? - just
looking at the code generally my first thought on optimising for speed
Here we go again. :-) I thought a constant in the degzip_portable.c
that the deflate code is based on looked familiar, and it is - so I
ported over the table-based CRC code from that. Now this is
"expensive" as the table is 1k long and pushes the COM file slightly
past 5k, but extracting my deflate test zip is 32% quicker with this
(combined with my previous changes) as compared with UNZIP152. It
would be possible to construct the table at runtime of course, but on
a Z80 I imagine a precalculated table might be for the best.
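
To make the trade-off concrete, here is a small C sketch comparing a
bit-at-a-time CRC-32 update (eight shift/xor steps per byte, similar in
spirit to the loop the patch removes) with the table-driven form quoted
in the patch comment. The function names are invented for this example,
and the table is built at runtime here purely to keep the sketch short.
The polynomial 0EDB88320h is the same constant that appears as the
0edh/0b8h/83h/20h xors in the old updcrc; the initial value and final
complement are the usual zip CRC-32 conventions, assumed rather than
checked against the UNZIP source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY 0xEDB88320u   /* reflected CRC-32 polynomial */

uint32_t crc_table[256];         /* 256 entries * 4 bytes = 1k */

/* Build the lookup table: entry n is the CRC state after clocking
   the single byte n through eight shift/xor steps. */
void crc32_build_table(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? (c >> 1) ^ CRC32_POLY : (c >> 1);
        crc_table[n] = c;
    }
}

/* Bit-at-a-time update: eight conditional xors per input byte. */
uint32_t crc32_update_bitwise(uint32_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int k = 0; k < 8; k++)
        crc = (crc & 1u) ? (crc >> 1) ^ CRC32_POLY : (crc >> 1);
    return crc;
}

/* Table-driven update: one lookup and a byte-wise shift per input byte. */
uint32_t crc32_update_table(uint32_t crc, uint8_t byte)
{
    return crc_table[(uint8_t)(crc ^ byte)] ^ (crc >> 8);
}

/* CRC of a whole buffer; use_table selects which update routine runs. */
uint32_t crc32_buffer(const uint8_t *buf, size_t len, int use_table)
{
    uint32_t crc = 0xFFFFFFFFu;          /* start from all ones */
    for (size_t i = 0; i < len; i++)
        crc = use_table ? crc32_update_table(crc, buf[i])
                        : crc32_update_bitwise(crc, buf[i]);
    return crc ^ 0xFFFFFFFFu;            /* final complement */
}
```

After crc32_build_table(), crc_table[1] is 77073096h, which matches the
second group of four bytes (096h,030h,007h,077h, stored low byte first)
in the crc32tab data of the patch.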

(While I'm posting I may as well note that I ran the version *before*
this table-CRC change against every zip on the Walnut Creek CP/M CD
earlier on, with no CRC errors. Obviously the goal there was to check
the non-deflate code is still working ok, and it seems to be.)

Here's the overall patch against UNZIP152.Z80:

--------------- unzip152-rdbybits-and-crc32tab.diff ---------------
--- UNZIP152.Z80 2020-10-13 16:02:05.906861502 +0100
+++ unzip152-new3.z80 2020-10-15 02:27:13.342645445 +0100
@@ -150,6 +150,19 @@
cp '/'
jp z,usage
;
+; Check TPA size (this will need adjusting if warm-boot-only exit
+; is changed).
+;
+ ld hl,-128 ; allow for a decent stack size
+ add hl,sp
+ ld de,endaddr
+ or a
+ sbc hl,de ; check endaddr is less (i.e. hl is >=)
+ jr nc,wasfil
+ call ilprt
+ db 'Low mem',0 ; just a short error msg to give the idea
+ jp exit
+;
wasfil: ld de,altfcb
ld a,(de) ; output drive given?
ld (opfcb),a ; store it in output file control block
@@ -490,36 +503,62 @@
getcode:
ld a,(codesize)
readbits:
- ld hl,8000h
-bitlp: push af
- push hl
-getbit: ld hl,bleft
- dec (hl)
- jp m,readbt
- dec hl
- rr (hl)
- pop hl
- rr h
- rr l
- jr c,bitex
+ push bc ; may not need to save bc?
+ ld b,a
+ ld c,80h ; bits rotate into C and A
+ xor a ; (rra is 4 cycles vs 8 for others)
+ ld hl,(bitbuf) ; keep bitbuf in L, bleft in H
+getbit: dec h
+ jp p,getbt2 ; skip if new byte not needed yet
+ push af
+ push bc
+ call getbyte
+ ld l,a ; new bitbuf
+ ld h,7 ; 8 bits left, pre-dec'd
+ pop bc
pop af
- dec a
- jr nz,bitlp
-finbit: srl h
- rr l
- jr nc,finbit
- jr bitret
-bitex: pop af
-bitret: ld a,l
+getbt2: rr l
+ rr c
+ rra
+ jr c,bitret
+ djnz getbit
+finbit: srl c
+ rra
+ jp nc,finbit ; jp likely faster in this case
+bitret: ld (bitbuf),hl ; update bitbuf/bleft
+ ld h,c ; return bits in HL and A
+ ld l,a
+ pop bc
ret
;
-readbt: push hl
+; rdbybits - faster version of readbits for <=8 bits
+;
+rdbybits:
+ push bc ; may not need to save bc?
+ ld b,a
+ ld a,80h ; bits rotate into A (rra faster)
+ ld hl,(bitbuf) ; keep bitbuf in L, bleft in H
+rdbylp: dec h
+ jp p,rdby1 ; skip if new byte not needed yet
+ ld c,a
+ push bc
call getbyte
- pop hl
- ld (hl),8
- dec hl
- ld (hl),a
- jr getbit
+ ld l,a ; new bitbuf
+ ld h,7 ; 8 bits left, pre-dec'd
+ pop bc
+ ld a,c
+rdby1: rr l
+ rra
+ jr c,rdbyrt
+ djnz rdbylp
+ or a ; clear carry flag initially
+rdby2: rra ; safe as dropped bits are all zeroes
+ jp nc,rdby2 ; jp likely faster in this case
+rdbyrt: ld (bitbuf),hl ; update bitbuf/bleft
+ ld h,0 ; return bits in HL and A
+ ld l,a
+ pop bc
+ ret
;
scanfn: ld a,(de)
cp '.'
@@ -716,34 +755,37 @@
ld (hl),a
ret
;
-updcrc: ld hl,(crc32)
+; based on this from crc32() in degzip_portable.c:
+; for (i = 0; i < len; i++)
+; crc = crc32_tab[(uint8_t)(crc ^ *b++)] ^ (crc >> 8);
+;
+updcrc: ld bc,(crc32)
+ xor c ; A=low byte of crc xor output byte
+ ld h,0
+ ld l,a
+ add hl,hl ; *2
+ add hl,hl ; *4
+ ld de,crc32tab
+ add hl,de
ld de,(crc32 + 2)
+ ; now DEBC is "crc", and HL points to low byte of
+ ; relevant crc32tab entry. Do the xor with "crc"/256,
+ ; starting from the low bytes.
+ ld a,(hl)
+ xor b
ld c,a
- ld b,8
-crclp: ld a,l
- xor c
- srl c
- srl d
- rr e
- rr h
- rr l
- rra
- jr nc,noxor
- ld a,d
- xor 0edh
- ld d,a
- ld a,e
- xor 0b8h
+ inc hl
+ ld a,(hl)
+ xor e
+ ld b,a
+ inc hl
+ ld a,(hl)
+ xor d
ld e,a
- ld a,h
- xor 83h
- ld h,a
- ld a,l
- xor 20h
- ld l,a
-noxor: djnz crclp
- ld (crc32),hl
+ inc hl
+ ld d,(hl) ; high byte is a simple copy
ld (crc32 + 2),de
+ ld (crc32),bc
ret
;
unshrink:
@@ -984,7 +1026,7 @@
lflp: push bc
push hl
ld a,6
- call readbits
+ call rdbybits
pop hl
pop de
ld (hl),a
@@ -999,7 +1041,7 @@
ldfllp: push hl
push bc
ld a,8
- call readbits
+ call rdbybits
pop bc
pop hl
ld (hl),a
@@ -1035,11 +1077,11 @@
or a
jr nz,ur2
ur4: ld a,8
- call readbits
+ call rdbybits
jr ur3
;
ur2: ld a,1
- call readbits
+ call rdbybits
dec l
jr z,ur4
call slenlch
@@ -1073,7 +1115,7 @@
ld a,l
or a
jr z,ur10
- ld (V),a
+ ld (urV),a
ld a,(L_table)
ld h,a
and l
@@ -1106,7 +1148,7 @@
jr nz,ur13
ld a,(D_shift)
ld b,a
- ld a,(V)
+ ld a,(urV)
ur14: srl a
djnz ur14
ld h,a
@@ -1191,7 +1233,7 @@
;
readlengths:
ld a,8
- call readbits
+ call rdbybits
ld d,h
ld e,d
inc hl
@@ -1211,11 +1253,11 @@
push de
push hl
ld a,4
- call readbits
+ call rdbybits
inc a
push af
ld a,4
- call readbits
+ call rdbybits
inc a
ld b,a
pop af
@@ -1412,7 +1454,7 @@
push de
push bc
ld a,1
- call readbits
+ call rdbybits
pop af
push af
or a
@@ -1487,7 +1529,7 @@
jr ui4
;
ui3: ld a,8
- call readbits
+ call rdbybits
ui4: call outb
jr ui1
;
@@ -1512,7 +1554,7 @@
jr nz,ui6
push hl
ld a,8
- call readbits
+ call rdbybits
pop de
add hl,de
ui6: ld de,(mml)
@@ -1529,7 +1571,7 @@
ld (treep),hl
nsloop: push hl
ld a,1
- call readbits
+ call rdbybits
pop hl
or a
jr z,nsleft
@@ -1730,19 +1772,19 @@
;
huffman:
ld a,5
- call readbits
+ call rdbybits
inc a
ld l,a
ld h,1
ld (hlit),hl

ld a,5
- call readbits
+ call rdbybits
inc a
ld (hdist),a

ld a,4
- call readbits
+ call rdbybits
add a,4
ld c,a

@@ -1754,7 +1796,7 @@
push bc
push de
ld a,3
- call readbits
+ call rdbybits
pop hl
ld c,(hl)
ld b,0
@@ -1805,7 +1847,7 @@
cp 010h
jr nz,hmn16
ld a,2
- call readbits
+ call rdbybits
pop hl
pop bc
add a,3
@@ -1823,7 +1865,7 @@
hmn16: cp 011h
jr nz,hmn17
ld a,3
- call readbits
+ call rdbybits
pop hl
pop bc
add a,3
@@ -1839,7 +1881,7 @@
hmn17: cp 012h
jr nz,hmn18
ld a,7
- call readbits
+ call rdbybits
pop hl
pop bc
add a,11
@@ -1965,11 +2007,11 @@
ret nz

ld a,1
- call readbits
+ call rdbybits
push af

ld a,2
- call readbits
+ call rdbybits
or a
jr nz,udnt0

@@ -2125,10 +2167,10 @@
counting:
db 0
init:
- db 0
- db 0
- dw 0,0
- dw -1,-1
+ db 0 ; for bleft
+ db 0 ; for wrtpt
+ dw 0,0 ; for outpos
+ dw -1,-1 ; for crc32
endinit:
inbufp: dw 0080h
readpt: db 80h
@@ -2211,6 +2253,135 @@
db 06dh, 0dbh, 0b6h, 06dh, 0dbh, 0b6h, 0cdh, 0dbh
db 0b6h, 06dh, 0dbh, 0b6h, 06dh, 0dbh, 0a8h, 06dh
db 0ceh, 08bh, 06dh, 03bh
+crc32tab: ; crc32_tab[] from degzip_portable.c, takes 1k
+ db 000h,000h,000h,000h,096h,030h,007h,077h
+ db 02Ch,061h,00Eh,0EEh,0BAh,051h,009h,099h
+ db 019h,0C4h,06Dh,007h,08Fh,0F4h,06Ah,070h
+ db 035h,0A5h,063h,0E9h,0A3h,095h,064h,09Eh
+ db 032h,088h,0DBh,00Eh,0A4h,0B8h,0DCh,079h
+ db 01Eh,0E9h,0D5h,0E0h,088h,0D9h,0D2h,097h
+ db 02Bh,04Ch,0B6h,009h,0BDh,07Ch,0B1h,07Eh
+ db 007h,02Dh,0B8h,0E7h,091h,01Dh,0BFh,090h
+ db 064h,010h,0B7h,01Dh,0F2h,020h,0B0h,06Ah
+ db 048h,071h,0B9h,0F3h,0DEh,041h,0BEh,084h
+ db 07Dh,0D4h,0DAh,01Ah,0EBh,0E4h,0DDh,06Dh
+ db 051h,0B5h,0D4h,0F4h,0C7h,085h,0D3h,083h
+ db 056h,098h,06Ch,013h,0C0h,0A8h,06Bh,064h
+ db 07Ah,0F9h,062h,0FDh,0ECh,0C9h,065h,08Ah
+ db 04Fh,05Ch,001h,014h,0D9h,06Ch,006h,063h
+ db 063h,03Dh,00Fh,0FAh,0F5h,00Dh,008h,08Dh
+ db 0C8h,020h,06Eh,03Bh,05Eh,010h,069h,04Ch
+ db 0E4h,041h,060h,0D5h,072h,071h,067h,0A2h
+ db 0D1h,0E4h,003h,03Ch,047h,0D4h,004h,04Bh
+ db 0FDh,085h,00Dh,0D2h,06Bh,0B5h,00Ah,0A5h
+ db 0FAh,0A8h,0B5h,035h,06Ch,098h,0B2h,042h
+ db 0D6h,0C9h,0BBh,0DBh,040h,0F9h,0BCh,0ACh
+ db 0E3h,06Ch,0D8h,032h,075h,05Ch,0DFh,045h
+ db 0CFh,00Dh,0D6h,0DCh,059h,03Dh,0D1h,0ABh
+ db 0ACh,030h,0D9h,026h,03Ah,000h,0DEh,051h
+ db 080h,051h,0D7h,0C8h,016h,061h,0D0h,0BFh
+ db 0B5h,0F4h,0B4h,021h,023h,0C4h,0B3h,056h
+ db 099h,095h,0BAh,0CFh,00Fh,0A5h,0BDh,0B8h
+ db 09Eh,0B8h,002h,028h,008h,088h,005h,05Fh
+ db 0B2h,0D9h,00Ch,0C6h,024h,0E9h,00Bh,0B1h
+ db 087h,07Ch,06Fh,02Fh,011h,04Ch,068h,058h
+ db 0ABh,01Dh,061h,0C1h,03Dh,02Dh,066h,0B6h
+ db 090h,041h,0DCh,076h,006h,071h,0DBh,001h
+ db 0BCh,020h,0D2h,098h,02Ah,010h,0D5h,0EFh
+ db 089h,085h,0B1h,071h,01Fh,0B5h,0B6h,006h
+ db 0A5h,0E4h,0BFh,09Fh,033h,0D4h,0B8h,0E8h
+ db 0A2h,0C9h,007h,078h,034h,0F9h,000h,00Fh
+ db 08Eh,0A8h,009h,096h,018h,098h,00Eh,0E1h
+ db 0BBh,00Dh,06Ah,07Fh,02Dh,03Dh,06Dh,008h
+ db 097h,06Ch,064h,091h,001h,05Ch,063h,0E6h
+ db 0F4h,051h,06Bh,06Bh,062h,061h,06Ch,01Ch
+ db 0D8h,030h,065h,085h,04Eh,000h,062h,0F2h
+ db 0EDh,095h,006h,06Ch,07Bh,0A5h,001h,01Bh
+ db 0C1h,0F4h,008h,082h,057h,0C4h,00Fh,0F5h
+ db 0C6h,0D9h,0B0h,065h,050h,0E9h,0B7h,012h
+ db 0EAh,0B8h,0BEh,08Bh,07Ch,088h,0B9h,0FCh
+ db 0DFh,01Dh,0DDh,062h,049h,02Dh,0DAh,015h
+ db 0F3h,07Ch,0D3h,08Ch,065h,04Ch,0D4h,0FBh
+ db 058h,061h,0B2h,04Dh,0CEh,051h,0B5h,03Ah
+ db 074h,000h,0BCh,0A3h,0E2h,030h,0BBh,0D4h
+ db 041h,0A5h,0DFh,04Ah,0D7h,095h,0D8h,03Dh
+ db 06Dh,0C4h,0D1h,0A4h,0FBh,0F4h,0D6h,0D3h
+ db 06Ah,0E9h,069h,043h,0FCh,0D9h,06Eh,034h
+ db 046h,088h,067h,0ADh,0D0h,0B8h,060h,0DAh
+ db 073h,02Dh,004h,044h,0E5h,01Dh,003h,033h
+ db 05Fh,04Ch,00Ah,0AAh,0C9h,07Ch,00Dh,0DDh
+ db 03Ch,071h,005h,050h,0AAh,041h,002h,027h
+ db 010h,010h,00Bh,0BEh,086h,020h,00Ch,0C9h
+ db 025h,0B5h,068h,057h,0B3h,085h,06Fh,020h
+ db 009h,0D4h,066h,0B9h,09Fh,0E4h,061h,0CEh
+ db 00Eh,0F9h,0DEh,05Eh,098h,0C9h,0D9h,029h
+ db 022h,098h,0D0h,0B0h,0B4h,0A8h,0D7h,0C7h
+ db 017h,03Dh,0B3h,059h,081h,00Dh,0B4h,02Eh
+ db 03Bh,05Ch,0BDh,0B7h,0ADh,06Ch,0BAh,0C0h
+ db 020h,083h,0B8h,0EDh,0B6h,0B3h,0BFh,09Ah
+ db 00Ch,0E2h,0B6h,003h,09Ah,0D2h,0B1h,074h
+ db 039h,047h,0D5h,0EAh,0AFh,077h,0D2h,09Dh
+ db 015h,026h,0DBh,004h,083h,016h,0DCh,073h
+ db 012h,00Bh,063h,0E3h,084h,03Bh,064h,094h
+ db 03Eh,06Ah,06Dh,00Dh,0A8h,05Ah,06Ah,07Ah
+ db 00Bh,0CFh,00Eh,0E4h,09Dh,0FFh,009h,093h
+ db 027h,0AEh,000h,00Ah,0B1h,09Eh,007h,07Dh
+ db 044h,093h,00Fh,0F0h,0D2h,0A3h,008h,087h
+ db 068h,0F2h,001h,01Eh,0FEh,0C2h,006h,069h
+ db 05Dh,057h,062h,0F7h,0CBh,067h,065h,080h
+ db 071h,036h,06Ch,019h,0E7h,006h,06Bh,06Eh
+ db 076h,01Bh,0D4h,0FEh,0E0h,02Bh,0D3h,089h
+ db 05Ah,07Ah,0DAh,010h,0CCh,04Ah,0DDh,067h
+ db 06Fh,0DFh,0B9h,0F9h,0F9h,0EFh,0BEh,08Eh
+ db 043h,0BEh,0B7h,017h,0D5h,08Eh,0B0h,060h
+ db 0E8h,0A3h,0D6h,0D6h,07Eh,093h,0D1h,0A1h
+ db 0C4h,0C2h,0D8h,038h,052h,0F2h,0DFh,04Fh
+ db 0F1h,067h,0BBh,0D1h,067h,057h,0BCh,0A6h
+ db 0DDh,006h,0B5h,03Fh,04Bh,036h,0B2h,048h
+ db 0DAh,02Bh,00Dh,0D8h,04Ch,01Bh,00Ah,0AFh
+ db 0F6h,04Ah,003h,036h,060h,07Ah,004h,041h
+ db 0C3h,0EFh,060h,0DFh,055h,0DFh,067h,0A8h
+ db 0EFh,08Eh,06Eh,031h,079h,0BEh,069h,046h
+ db 08Ch,0B3h,061h,0CBh,01Ah,083h,066h,0BCh
+ db 0A0h,0D2h,06Fh,025h,036h,0E2h,068h,052h
+ db 095h,077h,00Ch,0CCh,003h,047h,00Bh,0BBh
+ db 0B9h,016h,002h,022h,02Fh,026h,005h,055h
+ db 0BEh,03Bh,0BAh,0C5h,028h,00Bh,0BDh,0B2h
+ db 092h,05Ah,0B4h,02Bh,004h,06Ah,0B3h,05Ch
+ db 0A7h,0FFh,0D7h,0C2h,031h,0CFh,0D0h,0B5h
+ db 08Bh,09Eh,0D9h,02Ch,01Dh,0AEh,0DEh,05Bh
+ db 0B0h,0C2h,064h,09Bh,026h,0F2h,063h,0ECh
+ db 09Ch,0A3h,06Ah,075h,00Ah,093h,06Dh,002h
+ db 0A9h,006h,009h,09Ch,03Fh,036h,00Eh,0EBh
+ db 085h,067h,007h,072h,013h,057h,000h,005h
+ db 082h,04Ah,0BFh,095h,014h,07Ah,0B8h,0E2h
+ db 0AEh,02Bh,0B1h,07Bh,038h,01Bh,0B6h,00Ch
+ db 09Bh,08Eh,0D2h,092h,00Dh,0BEh,0D5h,0E5h
+ db 0B7h,0EFh,0DCh,07Ch,021h,0DFh,0DBh,00Bh
+ db 0D4h,0D2h,0D3h,086h,042h,0E2h,0D4h,0F1h
+ db 0F8h,0B3h,0DDh,068h,06Eh,083h,0DAh,01Fh
+ db 0CDh,016h,0BEh,081h,05Bh,026h,0B9h,0F6h
+ db 0E1h,077h,0B0h,06Fh,077h,047h,0B7h,018h
+ db 0E6h,05Ah,008h,088h,070h,06Ah,00Fh,0FFh
+ db 0CAh,03Bh,006h,066h,05Ch,00Bh,001h,011h
+ db 0FFh,09Eh,065h,08Fh,069h,0AEh,062h,0F8h
+ db 0D3h,0FFh,06Bh,061h,045h,0CFh,06Ch,016h
+ db 078h,0E2h,00Ah,0A0h,0EEh,0D2h,00Dh,0D7h
+ db 054h,083h,004h,04Eh,0C2h,0B3h,003h,039h
+ db 061h,026h,067h,0A7h,0F7h,016h,060h,0D0h
+ db 04Dh,047h,069h,049h,0DBh,077h,06Eh,03Eh
+ db 04Ah,06Ah,0D1h,0AEh,0DCh,05Ah,0D6h,0D9h
+ db 066h,00Bh,0DFh,040h,0F0h,03Bh,0D8h,037h
+ db 053h,0AEh,0BCh,0A9h,0C5h,09Eh,0BBh,0DEh
+ db 07Fh,0CFh,0B2h,047h,0E9h,0FFh,0B5h,030h
+ db 01Ch,0F2h,0BDh,0BDh,08Ah,0C2h,0BAh,0CAh
+ db 030h,093h,0B3h,053h,0A6h,0A3h,0B4h,024h
+ db 005h,036h,0D0h,0BAh,093h,006h,0D7h,0CDh
+ db 029h,057h,0DEh,054h,0BFh,067h,0D9h,023h
+ db 02Eh,07Ah,066h,0B3h,0B8h,04Ah,061h,0C4h
+ db 002h,01Bh,068h,05Dh,094h,02Bh,06Fh,02Ah
+ db 037h,0BEh,00Bh,0B4h,0A1h,08Eh,00Ch,0C3h
+ db 01Bh,0DFh,005h,05Ah,08Dh,0EFh,002h,02Dh
;
; uninitialized storage
;
@@ -2237,6 +2408,7 @@
ds 24
mtchfcb:
ds 11
+; note that as indicated above, bitbuf must be the byte before bleft
bitbuf: ds 1
vars:
bleft: ds 1
@@ -2250,7 +2422,7 @@
ds 1
D_shift:
ds 1
-V: ds 1
+urV: ds 1
nchar: ds 1
lchar: ds 1
ExState:
@@ -2311,5 +2483,5 @@
disttr: ds 4 * nrdist
endtr:
ds 8192 + 2 - (endtr - lenld)
-
+endaddr: ; must be no vars/data beyond this point
end
--------------- unzip152-rdbybits-and-crc32tab.diff ---------------

And I may as well include the C code to generate the table, again
based on degzip_portable.c:

-------------------- gentable.c --------------------
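#include <stdio.h>

/* Print the 256-entry CRC-32 table as Z80 "db" lines: two entries
 * (eight bytes, least significant byte first) per line. */
int main(void)
{
    unsigned long c,i,j;

    for(i=0;i<256;i++)
    {
        if((i&1)==0) printf("\tdb\t");
        c=i;
        for(j=0;j<8;j++)
            c=(c>>1)^((c&1)?0xedb88320UL:0);
        /* %lX because the arguments are unsigned long */
        printf("%03lXh,%03lXh,%03lXh,%03lXh",
            c&255,(c>>8)&255,(c>>16)&255,(c>>24)&255);
        putchar(((i&1)==1)?'\n':',');
    }
    return 0;
}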
-------------------- gentable.c --------------------
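
For anyone following along in C rather than Z80: a minimal sketch of how a table like this is typically used for a byte-at-a-time CRC-32 update. It builds the table at runtime with the same recurrence as gentable.c. The names crc_table, crc32_init and crc32_update are made up for illustration and are not taken from the unzip source or degzip_portable.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_table[256];

/* Build the 256-entry table using the same recurrence as gentable.c. */
static void crc32_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int j = 0; j < 8; j++)
            c = (c >> 1) ^ ((c & 1) ? 0xEDB88320u : 0);
        crc_table[i] = c;
    }
}

/* Feed len bytes into a running CRC-32; pass crc = 0 for the first call. */
static uint32_t crc32_update(uint32_t crc, const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);
    crc = ~crc;                       /* CRC-32 works on the inverted value */
    while (len--)
        crc = crc_table[(crc ^ *data++) & 0xFF] ^ (crc >> 8);
    return ~crc;
}
```

One table lookup and a few XORs per byte replace the eight shift/XOR steps of the bitwise loop, which is where the speedup comes from. The last table entry, 0x2D02EF8D, corresponds to the final four bytes of the last db line in the diff (08Dh,0EFh,002h,02Dh).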

-Rus.
Martin
2020-10-15 03:42:40 UTC
Permalink
Post by Russell Marks
Post by Russell Marks
[...]
Post by Tony Nicholson
https://github.com/agn453/UNZIP-CPM-Z80
[...]
Post by Russell Marks
I have to say - maybe directing this more at Tony/Martin? - just
looking at the code generally my first thought on optimising for speed
Here we go again. :-) I thought a constant in the degzip_portable.c
that the deflate code is based on looked familiar, and it is - so I
ported over the table-based CRC code from that. Now this is
"expensive" as the table is 1k long and pushes the COM file slightly
past 5k, but extracting my deflate test zip is 32% quicker with this
(combined with my previous changes) as compared with UNZIP152. It
would be possible to construct the table at runtime of course, but on
a Z80 I imagine a precalculated table might be for the best.
(While I'm posting I may as well note that I ran the version *before*
this table-CRC change against every zip on the Walnut Creek CP/M CD
earlier on, with no CRC errors. Obviously the goal there was to check
the non-deflate code is still working ok, and it seems to be.)
Studying your work lets me really feel your engineering. Great art!

And as UNZIP15 is based on the code from David Goodenough, your code
can be applied to any member of the family with little effort.

Took my favorite unzip tool and had fun... :-)
One can directly feel the difference!

Thanks!
Martin
Russell Marks
2020-10-15 10:59:30 UTC
Permalink
Martin <***@so.its.invalid> wrote:

[Re: unzip changes]
Post by Martin
And as UNZIP15 is based on the code from David Goodenough, your code
can be applied to any member of the family with little effort.
Yes, hopefully.
Post by Martin
One can directly feel the difference!
Thanks!
Well thanks to you and Tony for the deflate port etc., and of course
Keir Fraser for that C original. :-)

-Rus.
Tony Nicholson
2020-10-15 21:50:06 UTC
Permalink
Post by Russell Marks
[Re: unzip changes]
Post by Martin
And as UNZIP15 is based on the code from David Goodenough, your code
can be applied to any member of the family with little effort.
Yes, hopefully.
Post by Martin
One can directly feel the difference!
Thanks!
Well thanks to you and Tony for the deflate port etc., and of course
Keir Fraser for that C original. :-)
-Rus.
Thanks Rus for your efforts.

Martin, Keir and others deserve the credit more than me. You could say I'm
the consolidator of the patches! Later today I'll put together another release
on GitHub with your latest optimisations.

Tony
Tony Nicholson
2020-10-15 23:38:04 UTC
Permalink
On Friday, October 16, 2020 at 8:50:07 AM UTC+11, Tony Nicholson wrote:

[snip]
... Later today I'll put together another release
on GitHub with your latest optimisations.
I've bumped the version number to v1.5-3; the latest updated source file is UNZIP153.Z80.

Source and Z80 CP/M binary are available in a CP/M format library file from

https://github.com/agn453/UNZIP-CPM-Z80/blob/master/unzip/unzip153.lbr

The README at https://github.com/agn453/UNZIP-CPM-Z80 summarises the latest
changes.

Tony
Russell Marks
2020-10-16 11:53:48 UTC
Permalink
Post by Tony Nicholson
[snip]
... Later today I'll put together another release
on GitHub with your latest optimisations.
I've bumped the version number to v1.5-3; the latest updated source file is UNZIP153.Z80.
Source and Z80 CP/M binary are available in a CP/M format library file from
https://github.com/agn453/UNZIP-CPM-Z80/blob/master/unzip/unzip153.lbr
The README at https://github.com/agn453/UNZIP-CPM-Z80 summarises the latest
changes.
Thanks. FWIW, for a direct raw download the LBR link seems to need to be:

https://github.com/agn453/UNZIP-CPM-Z80/blob/master/unzip/unzip153.lbr?raw=true

Re: the change description, I think the speedup applies generally to
all methods - I haven't modified the inflate code itself aside from
changing some readbits calls to rdbybits. Similarly for the CRC, I
believe that's calculated for all methods. But that's nitpicking
really, and it's true that my speed testing has been focused
exclusively on the deflate method - using a test zip containing the
previous unzip source and binaries. :-)

(Admittedly using an emulated Kaypro 10 as I have been is almost ideal
for emphasising the effect of any changes - with its slow CPU and fast
hard disk it will inevitably be CPU-bound. It's just it happens to
also be a convenient CPU-speed-limiting test case with RTC emulation,
making benchmarking easier.)


I've been working on another small speedup, it's 9% faster than
UNZIP153 on that test zip so far, by using a separate small
read-one-bit routine and table-based final byte unrolling for
rdbybits. But this isn't exactly a massive improvement so I'll give it
a little while and see if I can think of anything else before posting
more. (For example I can't help thinking the RRD instruction ought to
be usable somehow, I'm just not sure how exactly.)

I wonder, would it be reasonable to use a macro in unzip? And if so,
what sort of syntax is accepted by the appropriate assemblers? I'm
curious because that read-one-bit routine goes like this:

;
; rd1bit - faster version which reads a single bit only
;
rd1bit:
ld hl,(bitbuf) ; keep bitbuf in L, bleft in H
rd1blp: dec h
jp p,rd1b1 ; skip if new byte not needed yet
call getbyte
ld l,a ; new bitbuf
ld h,7 ; 8 bits left, pre-dec'd
rd1b1: xor a
rr l
ld (bitbuf),hl ; update bitbuf/bleft
ld h,a ; A still zero
rla ; return bit in HL and A
ld l,a
ret
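
As a rough C model of what the routine above does (a sketch only: the struct and function names are hypothetical, and the src/remaining fields stand in for whatever getbyte reads from):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for unzip's bitbuf/bleft pair plus its input. */
struct bitreader {
    const unsigned char *src;   /* next input byte (getbyte's role) */
    size_t remaining;           /* input bytes not yet consumed */
    unsigned char bitbuf;       /* unread bits, next bit in bit 0 */
    int bleft;                  /* bits still valid in bitbuf */
};

/* Return the next bit of the stream, least significant bit first. */
static int read_one_bit(struct bitreader *r)
{
    if (--r->bleft < 0) {           /* dec h / jp p,rd1b1 */
        assert(r->remaining > 0);
        r->bitbuf = *r->src++;      /* call getbyte / ld l,a */
        r->remaining--;
        r->bleft = 7;               /* 8 bits left, pre-decremented */
    }
    int bit = r->bitbuf & 1;        /* rr l moves bit 0 into carry */
    r->bitbuf >>= 1;
    return bit;                     /* rla turns the carry into 0 or 1 */
}
```

Starting with bleft = 0 forces a byte fetch on the first call, after which each byte yields eight bits before the next fetch.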

For the times when that doesn't need to read a new byte, it takes 97
cycles total if you include the original CALL to the routine. Using a
macro to remove the CALL/RET overhead it would take 70, which would be
a fair bit quicker (no pun intended) and might be worth the extra 80
bytes total for the four places it's used. But if assembler syntax
varies for macros, I suppose it could be more trouble than it's worth.
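
The 97 and 70 figures can be checked by adding up the standard Z80 T-state counts for the fast path (no new byte fetched). A small C sketch of that sum, with made-up constant names:

```c
#include <assert.h>

/* Standard Z80 T-state counts for the instructions used by rd1bit. */
enum {
    T_CALL_NN   = 17,   /* call rd1bit */
    T_LD_HL_MEM = 16,   /* ld hl,(bitbuf) */
    T_DEC_H     = 4,
    T_JP_CC     = 10,   /* jp p,rd1b1 (same cost taken or not) */
    T_XOR_A     = 4,
    T_RR_L      = 8,
    T_LD_MEM_HL = 16,   /* ld (bitbuf),hl */
    T_LD_H_A    = 4,
    T_RLA       = 4,
    T_LD_L_A    = 4,
    T_RET       = 10
};

/* Fast path only: the branch skips the getbyte call. */
enum {
    RD1BIT_INLINE = T_LD_HL_MEM + T_DEC_H + T_JP_CC + T_XOR_A + T_RR_L
                  + T_LD_MEM_HL + T_LD_H_A + T_RLA + T_LD_L_A,
    RD1BIT_CALLED = RD1BIT_INLINE + T_CALL_NN + T_RET
};

static_assert(RD1BIT_INLINE == 70, "macro form, fast path");
static_assert(RD1BIT_CALLED == 97, "subroutine form, fast path");
```

The 27-cycle difference is exactly the CALL (17) plus the RET (10).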


On a different but related subject - I spotted what seems to be a
separate Z80 unzip implementation taking quite a different approach,
in the SymbOS OS's unzip app. The licence is far from clear to me, so
I would be very reluctant to use anything directly, but source appears
to be freely downloadable from symbos.org at least. It's probably not
that useful to us in practice, if nothing else I can't see a single
app on the website with a listed memory requirement of less than 128k,
but I thought it was interesting to see that a seemingly-independent
Z80 inflate implementation exists (albeit possibly based on gzip or
zlib).

-Rus.
Tony Nicholson
2020-10-16 22:11:49 UTC
Permalink
On Friday, October 16, 2020 at 10:53:50 PM UTC+11, Russell Marks wrote:

[snip]
Post by Russell Marks
https://github.com/agn453/UNZIP-CPM-Z80/blob/master/unzip/unzip153.lbr?raw=true
I've always done the right-click and Save link as... with my various web
browsers - but I'll add this to the README for the next update.
Post by Russell Marks
Re: the change description, I think the speedup applies generally to
all methods - I haven't modified the inflate code itself aside from
changing some readbits calls to rdbybits. Similarly for the CRC, I
believe that's calculated for all methods. But that's nitpicking
really, and it's true that my speed testing has been focused
exclusively on the deflate method - using a test zip containing the
previous unzip source and binaries. :-)
My bad - I'll adjust the text accordingly. If you have a GitHub account
you can also use the web interface to edit files in the repository then
submit them as a change-request (commit changes). You're welcome
to try this if you wish!
Post by Russell Marks
I've been working on another small speedup, it's 9% faster than
UNZIP153 on that test zip so far, by using a separate small
read-one-bit routine and table-based final byte unrolling for
rdbybits. But this isn't exactly a massive improvement so I'll give it
a little while and see if I can think of anything else before posting
more. (For example I can't help thinking the RRD instruction ought to
be usable somehow, I'm just not sure how exactly.)
I wonder, would it be reasonable to use a macro in unzip? And if so,
what sort of syntax is accepted by the appropriate assemblers? I'm
I use ZSM4 natively (Hector Peraza's Z80/Z180/Z280 Macro Assembler)
and this is fully compatible with the Microsoft M80 Macro Assembler.
Documentation is at -

https://github.com/hperaza/ZSM4/blob/master/docs/zsm4.pdf

Other assemblers (such as the SLR ones) or cross-compilation may need
tweaks. I tend to use SIMH AltairZ80 on my Mac then push the binaries
via Kermit or XMODEM to my various real hardware (S-100, Z180, Z280, etc) to
test.
Post by Russell Marks
;
; rd1bit - faster version which reads a single bit only
;
ld hl,(bitbuf) ; keep bitbuf in L, bleft in H
rd1blp: dec h
jp p,rd1b1 ; skip if new byte not needed yet
call getbyte
ld l,a ; new bitbuf
ld h,7 ; 8 bits left, pre-dec'd
rd1b1: xor a
rr l
ld (bitbuf),hl ; update bitbuf/bleft
ld h,a ; A still zero
rla ; return bit in HL and A
ld l,a
ret
For the times when that doesn't need to read a new byte, it takes 97
cycles total if you include the original CALL to the routine. Using a
macro to remove the CALL/RET overhead it would take 70, which would be
a fair bit quicker (no pun intended) and might be worth the extra 80
bytes total for the four places it's used. But if assembler syntax
varies for macros, I suppose it could be more trouble than it's worth.
Bear in mind that Z80-compatible enhanced processors (e.g. the
Z180 and Z280, or the ZX Spectrum Next FPGA) have different
(shorter) cycle times.

I'm all for squeezing performance out of things.

Tony
Martin
2020-10-17 05:52:01 UTC
Permalink
Post by Tony Nicholson
[snip]
Post by Russell Marks
https://github.com/agn453/UNZIP-CPM-Z80/blob/master/unzip/unzip153.lbr?raw=true
I've always done the right-click and Save link as... with my various web
browsers - but I'll add this to the README for the next update.
The raw download link usually is:
https://raw.githubusercontent.com/username/reponame/branch/path/to/file

So, here:
https://raw.githubusercontent.com/agn453/UNZIP-CPM-Z80/master/unzip/unzip153.lbr
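
If you want to build that link in a script or tool, the pattern is just four path components after the host. A minimal C sketch (raw_url is a made-up helper name, not part of any GitHub API):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a raw.githubusercontent.com link from its four components.
 * Returns 0 on success, -1 if the result did not fit in out. */
static int raw_url(char *out, size_t size, const char *user,
                   const char *repo, const char *branch, const char *path)
{
    assert(out != NULL && size > 0);
    int n = snprintf(out, size,
                     "https://raw.githubusercontent.com/%s/%s/%s/%s",
                     user, repo, branch, path);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

Calling it with "agn453", "UNZIP-CPM-Z80", "master" and "unzip/unzip153.lbr" gives the link above.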

Martin
Russell Marks
2020-10-17 10:55:42 UTC
Permalink
[...]
Post by Tony Nicholson
Post by Russell Marks
Re: the change description, I think the speedup applies generally to
all methods - I haven't modified the inflate code itself aside from
changing some readbits calls to rdbybits. Similarly for the CRC, I
believe that's calculated for all methods. But that's nitpicking
really, and it's true that my speed testing has been focused
exclusively on the deflate method - using a test zip containing the
previous unzip source and binaries. :-)
My bad - I'll adjust the text accordingly. If you have a GitHub account
you can also use the web interface to edit files in the repository then
submit them as a change-request (commit changes). You're welcome
to try this if you wish!
Perhaps I should make one somewhen, but I haven't yet.
Post by Tony Nicholson
Post by Russell Marks
I wonder, would it be reasonable to use a macro in unzip? And if so,
what sort of syntax is accepted by the appropriate assemblers? I'm
I use ZSM4 natively (Hector Peraza's Z80/Z180/Z280 Macro Assembler)
and this is fully compatible with the Microsoft M80 Macro Assembler.
Documentation is at -
https://github.com/hperaza/ZSM4/blob/master/docs/zsm4.pdf
Just looking at ZSM4 and Unix zmac initially then, it looks like the
basic macro..endm syntax is the same, but the local-label syntax
differs. This is frustrating as I only need one label for a jp - I
could hard-code a relative jump as data easily enough, but I don't
think "jp p,NN" has a relative form. I could refer to the current
address, but I'm not sure if zmac gives you a way to do that and the
ZSM4 documentation doesn't obviously mention it either. So I think
I'll just leave it as-is for now, as it probably wouldn't be a very
noticeable difference overall anyway (mainly being significant in the
context of this one routine/macro only).
Post by Tony Nicholson
Post by Russell Marks
; rd1bit - faster version which reads a single bit only
[...]
Post by Tony Nicholson
Post by Russell Marks
For the times when that doesn't need to read a new byte, it takes 97
cycles total if you include the original CALL to the routine. Using a
macro to remove the CALL/RET overhead it would take 70, which would be
a fair bit quicker (no pun intended) and might be worth the extra 80
bytes total for the four places it's used. But if assembler syntax
varies for macros, I suppose it could be more trouble than it's worth.
Bear in mind that implementations on Z80 compatible enhanced
processors (e.g. the Z180 and Z280 or the ZX Spectrum Next FPGA)
have different cycle times (shorter).
I suppose so, but I think realistically I have to target the Z80 and
hope results are basically useful on other CPUs. Also, it seems to me
to make sense to optimise for the slowest CPU (or emulated CPU), which
I imagine would generally be the Z80. Still, I'll change my references
to "cycles" to say "Z80 cycles" in the comments.

-Rus.
Udo Munk
2020-10-17 13:34:17 UTC
Permalink
...
Post by Russell Marks
differs. This is frustrating as I only need one label for a jp - I
could hard-code a relative jump as data easily enough, but I don't
think "jp p,NN" has a relative form. I could refer to the current
address, but I'm not sure if zmac gives you a way to do that and the
ZSM4 documentation doesn't obviously mention it either. So I think
...

You can avoid that with jp p,$+offset (or $-offset); you have to count the bytes for the offset
yourself and adjust it if the size changes between the jp location and the destination.
Russell Marks
2020-10-17 16:25:05 UTC
Permalink
Post by Udo Munk
...
Post by Russell Marks
differs. This is frustrating as I only need one label for a jp - I
could hard-code a relative jump as data easily enough, but I don't
think "jp p,NN" has a relative form. I could refer to the current
address, but I'm not sure if zmac gives you a way to do that and the
ZSM4 documentation doesn't obviously mention it either. So I think
...
You can avoid that with jp p,$+-offset, you have to count the bytes
for offset your self and adjust if size changes between the jp
location and the destination.
Ah, I see - this does seem to work with Unix zmac at least. The jump
is only over the jp itself and six more bytes, so that doesn't seem
too bad to manage if other assemblers accept the $+offset syntax.

Trying that now it seems to bring the changes to 12% faster than
UNZIP153 on my test zip, only a slight improvement on the previous 9%
but more than I'd expected to see overall.

-Rus.
Udo Munk
2020-10-17 17:10:37 UTC
Permalink
Post by Russell Marks
Ah, I see - this does seem to work with Unix zmac at least. The jump
is only over the jp itself and six more bytes, so that doesn't seem
too bad to manage if other assemblers accept the $+offset syntax.
Should work with any assembler; $ is usually the current address. I don't know
of an 8080 or Z80 assembler where this won't work.
Russell Marks
2020-10-18 17:06:21 UTC
Permalink
Post by Russell Marks
I've been working on another small speedup, it's 9% faster than
UNZIP153 on that test zip so far, by using a separate small
read-one-bit routine and table-based final byte unrolling for
rdbybits. But this isn't exactly a massive improvement so I'll give it
a little while and see if I can think of anything else before posting
more. (For example I can't help thinking the RRD instruction ought to
be usable somehow, I'm just not sure how exactly.)
Well, RRD was a non-starter, it's possible to use but not really in a
way which improves the speed. Or not that I spotted at least.

Anyway, what I did instead was mostly concentrate on backporting some
changes from unzip 1.8, which helpfully includes a simple but large
speedup merely by checking ^C less often. I was also able to adjust my
rdbybits change to avoid needing a table with no obvious impact on the
speed (by using self-modifying code and an unrolled loop), and
included rd1bit as a macro without needing to use any local symbols.
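
To illustrate the ^C throttling idea in C terms (a sketch, not the actual code: the Z80 version keeps the counter in the operand of its own "ld a,1" instruction, and real_checks here is a hypothetical stand-in for the BDOS console poll):

```c
#include <assert.h>

static unsigned char ckcon_count = 1;   /* plays the role of the "ld a,1" operand */
static unsigned long real_checks;       /* hypothetical: counts actual console polls */

/* Called once per input byte; only polls the console on every 16th call. */
static void ckcon(void)
{
    assert(ckcon_count < 16);
    ckcon_count = (unsigned char)((ckcon_count - 1) & 15);
    if (ckcon_count != 0)
        return;                 /* the cheap path, taken 15 times out of 16 */
    real_checks++;              /* here the real code would poll for ^C */
}
```

With the counter starting at 1, the very first call does a real check, and then calls 17, 33, and so on do.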

In addition, I fixed a buffer overrun not covered by unzip 1.8
changes, generally commented some of the code a bit more, and included
a provisional version-change/changes-list at the top to possibly
simplify things for Tony. :-)
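
The long-filename fix boils down to "keep at most 255 bytes, skip the rest". A minimal C sketch of that idea, assuming the name is already in memory (the real code reads it byte by byte via getstring/skpstring; read_name is a made-up name):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STRSIZ 256   /* same buffer size as "junk" in the source */

/* Copy at most STRSIZ-1 bytes of a name_len-byte filename into junk,
 * terminate it with a zero byte, and return how many bytes of the
 * name still have to be skipped in the input. */
static size_t read_name(char junk[STRSIZ], const unsigned char *in,
                        size_t name_len)
{
    size_t keep = name_len > STRSIZ - 1 ? STRSIZ - 1 : name_len;

    assert(in != NULL || name_len == 0);
    if (keep)
        memcpy(junk, in, keep);
    junk[keep] = '\0';
    return name_len - keep;
}
```

A 300-byte name therefore leaves 255 bytes in the buffer and 45 to skip, so the buffer can never be overrun however long the zip claims the name is.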

As for speed, in Virtual Kaypro my test zip now extracts 38% faster
than UNZIP153 (or 57% faster than UNZIP152), mostly due to the ^C
change. And I've repeated my testing against the Walnut Creek CP/M CD
zips, with no CRC errors.
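
On how those percentages combine: assuming "N% faster" here means the extraction time drops by N% (an assumption, but it is consistent with the figures quoted), successive reductions multiply rather than add. A small C sketch, with a made-up function name:

```c
#include <assert.h>

/* Each argument is a fractional time reduction, e.g. 0.30 for "30% faster".
 * Returns the overall fractional reduction after applying both in turn. */
static double combined_reduction(double first, double second)
{
    assert(first >= 0 && first < 1 && second >= 0 && second < 1);
    return 1.0 - (1.0 - first) * (1.0 - second);
}
```

So 12% followed by 30% comes to about 38% overall, and the earlier 32% (UNZIP152 to UNZIP153) followed by 38% comes to a little under 58%, in line with the roughly 57% quoted.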

This is probably the lot from me as far as optimisations go.
Eventually, I *might* try to add long-filenames support of some sort -
i.e. support for extracting zips with embedded paths in the filenames,
which can currently be a major pain to extract. This version does at
least tweak things enough to make that easier to do in future, for
up-to-255-char filenames (which isn't perfect but could be of some
use).

--------------- unzip153-to-unzip153-new.diff ---------------
--- unzip153.z80 2020-10-16 09:36:40.000000000 +0100
+++ unzip153-new.z80 2020-10-18 17:44:11.242184290 +0100
@@ -3,9 +3,34 @@
; Dissolves MS-DOS ZIP files.
;
Vers equ 15
-Revisn equ 3 ;;v1.5-3
+Revisn equ 4 ;;v1.5-4
;
;
+; Version 1.5-4 -- October 18, 2020 -- Russell Marks
+; Further slight optimisations to bit-readers,
+; fix long-filename buffer overrun,
+; backport some unzip 1.8 changes (buffer-overrun fixes, bit 7 strip on
+; output filenames, less frequent ^C checking, and low-memory message),
+; and add various comments.
+;
+; Use self-modifying code with an unrolled end loop in rdbybits,
+; and add a "rd1bit" macro, for about a 12% speed improvement
+; on overall extraction time compared to version 1.5-3.
+; Fix long-filename buffer overrun, for filenames longer than
+; 255 characters.
+; Backport Howard Goldstein's buffer-overrun fixes from
+; unzip 1.8 - previously, a 255-byte buffer was repeatedly used
+; to read up-to-65535-byte inputs.
+; Strip bit 7 on output filenames as in unzip 1.8.
+; Check ^C less frequently as in unzip 1.8 (but every 16 bytes
+; rather than every 128 there which was excessive and made
+; little difference to speed), for a further 30% speed
+; improvement on overall extraction time (or 38% total).
+; Adopt low-memory message from unzip 1.8.
+; Add some basic comments to code/data based mostly on the
+; https://en.wikipedia.org/wiki/Zip_(file_format)
+; format description.
+;
; Version 1.5-3 -- October 15, 2020 -- Russell Marks
; More optimisations to improve the speed of the UnDeflate
; method.
@@ -115,7 +140,7 @@
;
; Other
;
-STRSIZ equ 256
+STRSIZ equ 256 ; must be 256 exactly, see plfh for why
DLE equ 144
max_bits equ 13
init_bits equ 9
@@ -172,10 +197,7 @@
ld de,endaddr
or a
sbc hl,de ; check endaddr is less (i.e. hl is >=)
- jr nc,wasfil
- call ilprt
- db 'Low mem',0 ; just a short error msg to give the idea
- jp exit
+ jr c,nomem
;
wasfil: ld de,altfcb
ld a,(de) ; output drive given?
@@ -217,8 +239,12 @@
call bdos ; try and open ZIP file
inc a
jr nz,openok ; ok
- call ilprt ; complain and fall through to exit
+ call ilprt
db 'Couldn''t find ZIP file',CR,LF,0
+ jr exit
+;
+nomem: call ilprt ; complain and fall through to exit
+ db 'Not enough memory!',CR,LF,0
;
; All exits point here for possible future enhancements, such
; as elimination of warm boot.
@@ -229,14 +255,22 @@
db 'Bad signature in ZIP file',CR,LF,0
jr exit
;
+; Judging from https://en.wikipedia.org/wiki/Zip_(file_format)
+; this appears to read the file in a technically incorrect way,
+; by relying on the local file header only (as a zip-fixing
+; program might), and simply skipping past the central directory
+; entirely. This leaves us potentially extracting deleted files,
+; for example. It's probably not a real problem in most
+; cases, but it seemed worth noting.
+;
openok: call getword
- ld de,-(('K' shl 8) + 'P')
+ ld de,-(('K' shl 8) + 'P') ; magic number
add hl,de
ld a,h
or l
jr nz,sigerr
call getword
- dec l
+ dec l ; check for 01,02 (central directory)
jr nz,nocfhs
dec h
dec h
@@ -244,7 +278,7 @@
call pcfh
jr openok
;
-nocfhs: dec l
+nocfhs: dec l ; check for 03,04 (local file header)
dec l
jr nz,nolfhs
ld a,h
@@ -253,7 +287,7 @@
call plfh
jr openok
;
-nolfhs: dec l
+nolfhs: dec l ; check for 05,06 (end of central dir.)
dec l
jr nz,sigerr
ld a,h
@@ -262,7 +296,18 @@
call pecd
jr exit
;
-pcfh: ld b,12
+; (The belated-CRC type (07,08) is apparently not supported.)
+;
+; pcfh/pecd are not truly required, they only serve to skip past
+; the central directory and end-of-central-directory blocks. But
+; they do arguably serve as a small additional check of file
+; integrity. It would be faster to simply exit when we spot the
+; central directory signature (since the CD/EOCD are by definition
+; the last two things) - for large files this might be noticeable.
+;
+; pcfh - skip past central directory
+;
+pcfh: ld b,12 ; skip ahead to filename length entry
pcfhl1: push bc
call getword
pop bc
@@ -274,35 +319,39 @@
call getword
pop de
pop bc
- push hl
- push de
- push bc
- ld b,6
+ push hl ; file comment length
+ push de ; extra field length
+ push bc ; filename length
+ ld b,6 ; skip ahead to filename
pcfhl2: push bc
call getword
pop bc
djnz pcfhl2
pop hl
- ld de,junk
- call getstring
+ call skpstring ; skip past filename
pop hl
- ld de,junk
- call getstring
+ call skpstring ; skip past extra field
pop hl
- ld de,junk
- call getstring
+ call skpstring ; skip past file comment
ret
;
-pecd: ld b,8
+; pecd - skip past end-of-central-directory
+;
+pecd: ld b,8 ; skip ahead to comment length
pecdl: push bc
call getword
pop bc
djnz pecdl
- call getword
- ld de,junk
- call getstring
+ call getword ; comment length
+ call skpstring ; skip past comment
ret
;
+; plfh - read local file header, then extract/check file
+;
+; NB: As mentioned above, this is technically not the correct
+; approach to take (but it's almost certainly faster this way
+; and will do the right thing for most zips).
+;
plfh: ld de,lfh
ld hl,endlfh-lfh
call getstring
@@ -311,12 +360,41 @@
ld bc,33
ld (hl),b
ldir
+;
+; Read filename from LFH into "junk". Filenames of >255 (sic)
+; characters will be skipped after the 255th char.
+; Required as the format allows 65535-char filenames. (!)
+;
ld de,junk
ld hl,(fnl)
- call getstring
- ld de,junk + 20
+ ld a,h
+ or a
+ jr z,plfh2
+ ld hl,STRSIZ-1 ; 255, allow for trailing zero byte
+plfh2: call getstring ; rets DE pointing past last char read
+ ex de,hl
+ ld de,junk
+ or a
+ sbc hl,de
+ ex de,hl ; DE=number of chars read already
+ ld hl,(fnl)
+ or a ; probably unnecessary, but for clarity
+ sbc hl,de
+ call skpstring ; skip the rest of any long filename
ld hl,(efl)
- call getstring
+ call skpstring ; skip extra field
+;
+; Now that filenames of <=255 chars are retained to this point,
+; there is the possibility of adding some kind of support for
+; zips with embedded paths in filenames (which are very common).
+; It may be worth skipping ahead to the last directory separator
+; if present, the question is which ones to support; obviously
+; "\" and "/" are the most common examples. But this also risks
+; breaking some CP/M filenames. Maybe only do it optionally?
+; Or have an override to disable path-checking?
+;
+; Anyway, for now it's just a basic copy of filename to FCB.
+;
ld de,junk
ld hl,opfn
ld b,8
@@ -517,7 +595,6 @@
getcode:
ld a,(codesize)
readbits:
- push bc ; may not need to save bc?
ld b,a
ld c,80h ; bits rotate into C and A
xor a ; (rra is 4 cycles vs 8 for others)
@@ -542,15 +619,17 @@
bitret: ld (bitbuf),hl ; update bitbuf/bleft
ld h,c ; return bits in HL and A
ld l,a
- pop bc
ret
;
-; rdbybits - faster version of readbits for <=8 bits
+; rdbybits - faster version of readbits for <=8 bits.
+; Due to the implementation this must not ever be called with A>8.
+; (No caller seems to require saving BC, so I removed that for both
+; this and readbits.)
;
rdbybits:
- push bc ; may not need to save bc?
+ ld (rdbyop+1),a ; modify jr instruction at rdbyop
ld b,a
- ld a,80h ; bits rotate into A (rra faster)
+ xor a ; bits rotate into A (rra faster)
ld hl,(bitbuf) ; keep bitbuf in L, bleft in H
rdbylp: dec h
jp p,rdby1 ; skip if new byte not needed yet
@@ -563,17 +642,41 @@
ld a,c
rdby1: rr l
rra
- jr c,rdbyrt
djnz rdbylp
- or a ; clear carry flag initially
-rdby2: rra ; safe as dropped bits are all zeroes
- jp nc,rdby2 ; jp likely faster in this case
-rdbyrt: ld (bitbuf),hl ; update bitbuf/bleft
- ld h,0 ; return bits in HL and A
- ld l,a
- pop bc
+ ld (bitbuf),hl ; update bitbuf/bleft
+ or a
+rdbyop: jr rdbyr8
+rdbyr8: rra ; 8x rra, not all are used in practice but
+ rra ; this arrangement simplifies code above
+ rra
+ rra
+ rra
+ rra
+ rra
+ rra
+ ld h,b ; B still zero after the final djnz
+ ld l,a ; return bits in HL and A
ret
;
+; rd1bit - faster version which reads a single bit only.
+; The jp instruction here is awkward, due to differing
+; local-symbol syntax between assemblers.
+;
+rd1bit macro
+ ld hl,(bitbuf) ; keep bitbuf in L, bleft in H
+ dec h
+ jp p,$+9 ; jump to "xor a", past jp op plus 6 bytes:
+ call getbyte ; (3 bytes)
+ ld l,a ; (1 byte) new bitbuf
+ ld h,7 ; (2 bytes) 8 bits left, pre-dec'd
+ xor a ; jp op above jumps here
+ rr l
+ ld (bitbuf),hl ; update bitbuf/bleft
+ ld h,a ; A still zero
+ rla ; return bit in HL and A
+ ld l,a
+ endm
+;
scanfn: ld a,(de)
cp '.'
jr z,nocopy
@@ -582,6 +685,7 @@
inc de
dec b
jp m,scanfn
+ and 7fh ; remove high bit
ld (hl),a
inc hl
jr scanfn
@@ -607,6 +711,8 @@
inc hl
jr pstr
;
+; getstring must return DE pointing just past the last byte read.
+;
getstring:
ld a,h
or l
@@ -622,6 +728,16 @@
dec hl
jr getstring
;
+skpstring:
+ ld a,h
+ or l
+ ret z
+ push hl
+ call getbyte
+ pop hl
+ dec hl
+ jr skpstring
+;
getword:
call getbyte
push af
@@ -1094,8 +1210,7 @@
call rdbybits
jr ur3
;
-ur2: ld a,1
- call rdbybits
+ur2: rd1bit
dec l
jr z,ur4
call slenlch
@@ -1118,7 +1233,7 @@
or a
jr nz,ur5
ld a,l
- cp DLE
+ cp DLE
jr nz,ur9
ld a,1
ld (ExState),a
@@ -1374,7 +1489,7 @@
or c
ret z
dec bc
- add hl,de
+ add hl,de
ld a,(lbl)
cp (iy + _bitlength)
jr z,gt2
@@ -1467,8 +1582,7 @@
rt1: push hl
push de
push bc
- ld a,1
- call rdbybits
+ rd1bit
pop af
push af
or a
@@ -1584,8 +1698,7 @@
nextsymbol:
ld (treep),hl
nsloop: push hl
- ld a,1
- call rdbybits
+ rd1bit
pop hl
or a
jr z,nsleft
@@ -2020,8 +2133,7 @@
and 1
ret nz

- ld a,1
- call rdbybits
+ rd1bit
push af

ld a,2
@@ -2082,7 +2194,7 @@

udnext: pop af
or a
- jr z,udloop
+ jp z,udloop
ret

udpret: pop af
@@ -2100,8 +2212,15 @@
jr udbskp
;
; ckcon -- checks console for character; aborts if ^C
+; Only really checks every 16 calls, since this is called
+; from getbyte for every input byte.
;
-ckcon: ld e,0FFh ; check for character
+ckcon: ld a,1 ; modified below
+ dec a
+ and 15
+ ld (ckcon+1),a ; update LD A instruction above
+ ret nz ; check every 16 calls only
+ ld e,0FFh ; check for character
ld c,dircon
call bdos
or a
@@ -2112,7 +2231,7 @@
or a
jr z,ckcon1 ; (no)
call setout
-ckcon0: ld de,opfcb
+ckcon0: ld de,opfcb ; ckcon0 jumped to for write error
ld c,fclose ; yes, close it
call bdos
ld de,opfcb
@@ -2404,18 +2523,19 @@
outusr: ds 1
mode: ds 1
junk: ds STRSIZ
-lfh:
-vnte: ds 2
-gpbf: ds 2
-cm: ds 2
-lmft: ds 2
-lmfd: ds 2
-crc: ds 4
-cs: ds 4
-ucs: ds 4
-fnl: ds 2
-efl: ds 2
-endlfh: ds 1
+lfh: ; data read from local file header
+vnte: ds 2 ; version
+gpbf: ds 2 ; general purpose bit flag
+cm: ds 2 ; compression method
+lmft: ds 2 ; file last modification time
+lmfd: ds 2 ; file last modification date
+crc: ds 4 ; CRC-32 of uncompressed data
+cs: ds 4 ; compressed size
+ucs: ds 4 ; uncompressed size
+fnl: ds 2 ; file name length
+efl: ds 2 ; extra field length
+endlfh: ds 1 ; marker for end of lfh data; also,
+ ; zero byte is written here by getstring
opfcb: ds 1 ; output file control block
opfn: ds 8
opext: ds 3
--------------- unzip153-to-unzip153-new.diff ---------------

-Rus.
Tony Nicholson
2020-10-18 22:49:16 UTC
Permalink
Rus's latest optimisations to UNZIP for CP/M Z80 are up on GitHub.
I've bumped the release to v1.5-4.

https://raw.githubusercontent.com/agn453/UNZIP-CPM-Z80/master/unzip/unzip154.lbr

For now, I've kept each of the releases as snapshots in separately
named files (rather than keep the source file name the same and let
GitHub keep track of the changes). I may re-organise things in
future to make it easier for everyone to "just grab the latest version",
and if they're interested in the changes they can use the History
feature of GitHub to view the context differences.

Comments/suggestions/complaints welcome :)

Tony
Russell Marks
2020-10-19 12:50:13 UTC
Permalink
Post by Tony Nicholson
Rus's latest optimisations to UNZIP for CP/M Z80 are up on GitHub.
I've bumped the release to v1.5-4.
https://raw.githubusercontent.com/agn453/UNZIP-CPM-Z80/master/unzip/unzip154.lbr
For now, I've kept each of the releases as snapshots in separately
named files (rather than keep the source file name the same and let
GitHub keep track of the changes). I may re-organise things in
future to make it easier for everyone to "just grab the latest version",
and if they're interested in the changes they can use the History
feature of GitHub to view the context differences.
Comments/suggestions/complaints welcome :)
I suppose having recent derivatives of both 1.5 and 1.8 is confusing,
but it does arguably make sense. At least I narrowed the difference
somewhat with these changes (then messed that up again by adding
comments :-)).

-Rus.
Lawrence Nelson
2020-10-26 20:24:08 UTC
Permalink
Post by Russell Marks
Post by Tony Nicholson
Rus's latest optimisations to UNZIP for CP/M Z80 are up on GitHub.
I've bumped the release to v1.5-4.
https://raw.githubusercontent.com/agn453/UNZIP-CPM-Z80/master/unzip/unzip154.lbr
For now, I've kept each of the releases as snapshots in separately
named files (rather than keep the source file name the same and let
GitHub keep track of the changes). I may re-organise things in
future to make it easier for everyone to "just grab the latest version",
and if they're interested in the changes they can use the History
feature of GitHub to view the context differences.
Comments/suggestions/complaints welcome :)
I suppose having recent derivatives of both 1.5 and 1.8 is confusing,
but it does arguably make sense. At least I narrowed the difference
somewhat with these changes (then messed that up again by adding
comments :-)).
-Rus.
Many thanks to Martin, Russell Marks, Tony Nicholson et al. for the fantastic job in making UNZIP truly usable on CP/M machines. I have downloaded v1.52 and v1.54 from Tony's GitHub page and they work as advertised! I applied Martin's changes that resulted in v1.52 by hand to v1.8 to produce v1.82. Here are some timings I measured with v1.52, v1.54 and v1.82 when extracting UNZIP18.Z80 from UNZIP18.ZIP:
Version  Time
1.52     14.55s
1.54      0.82s
1.82      0.28s
So v1.52 is really slow! The main differences between v1.52 and v1.82 are the use of ZSLIB's bit-oriented I/O with variable-sized buffers, the use of direct console I/O, and the frequency of Ctrl-C sampling: v1.82 divides the sampling frequency by 128. That v1.82 is almost 3x faster than v1.54 was unexpected, but I ran the timings several times and got the same results. I am currently trying to apply Rus's changes to v1.82 to produce v1.84 and will report the results as soon as they are available.

-Lars
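
[Editor's sketch] The "divide the sampling frequency by 128" idea mentioned above can be illustrated in C. This is only a minimal sketch of the general technique, not the Z80 code from v1.82: the names poll_throttle, poll_due and POLL_DIVISOR are hypothetical, and the assumption is that the expensive step is a console status call made once per loop iteration.

```c
/* Assumed ratio, taken from the figure quoted in the post above. */
#define POLL_DIVISOR 128u

/* Per-loop state: iterations seen since the last real console poll. */
struct poll_throttle {
    unsigned int since_last;
};

/* Call once per loop iteration (for example, once per output byte).
   Returns 1 on every POLL_DIVISOR-th call, which is when the caller
   should perform the expensive console status check; returns 0 on
   all other calls so the hot loop skips the check entirely. */
int poll_due(struct poll_throttle *t)
{
    if (++t->since_last < POLL_DIVISOR)
        return 0;
    t->since_last = 0;
    return 1;
}
```

A decompression loop would start with a zeroed struct poll_throttle and only ask the operating system for console status when poll_due() returns 1, so 127 out of every 128 iterations avoid the system call.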
Tony Nicholson
2020-10-30 04:39:31 UTC
Permalink
On Tuesday, October 27, 2020 at 7:24:09 AM UTC+11, Lawrence Nelson wrote:

[snip]
Many thanks to Martin, Russell Marks, Tony Nicholson et al. for the fantastic job in making UNZIP truly usable on CP/M machines. I have downloaded v1.52 and v1.54 from Tony's GitHub page and they work as advertised! I applied Martin's changes that resulted in v1.52 by hand to v1.8 to produce v1.82. Here are some timings I measured with v1.52, v1.54 and v1.82 when extracting UNZIP18.Z80 from UNZIP18.ZIP:
Version  Time
1.52     14.55s
1.54      0.82s
1.82      0.28s
So v1.52 is really slow! The main differences between v1.52 and v1.82 are the use of ZSLIB's bit-oriented I/O with variable-sized buffers, the use of direct console I/O, and the frequency of Ctrl-C sampling: v1.82 divides the sampling frequency by 128. That v1.82 is almost 3x faster than v1.54 was unexpected, but I ran the timings several times and got the same results. I am currently trying to apply Rus's changes to v1.82 to produce v1.84 and will report the results as soon as they are available.
-Lars
Thanks to Lars Nelson, I've just added the Z-System (ZCPR 3.x) version v1.82 source
and binaries to the GitHub repository at

https://github.com/agn453/UNZIP-CPM-Z80

and a direct download link for the CP/M .LBR file is

https://raw.githubusercontent.com/agn453/UNZIP-CPM-Z80/master/unzip/unzip182.lbr

[For testing, I re-linked this natively under Z3PLUS Vers. 1.02 (the Z-System for
CP/M-Plus) with Z-system libraries from LIBS45A.LBR and ZSLIB36.LBR using the
ZSM4 assembler and Digital Research's LINK - see the submit file UNZIP182.SUB.
The binary produced is identical to the one Lars provided.]

Enjoy!

Tony
Tony Nicholson
2020-10-30 23:00:02 UTC
Permalink
Post by Tony Nicholson
Thanks to Lars Nelson, I've just added the Z-System (ZCPR 3.x) version v1.82 source
and binaries to the GitHub repository at
https://github.com/agn453/UNZIP-CPM-Z80
and a direct download link for the CP/M .LBR file is
https://raw.githubusercontent.com/agn453/UNZIP-CPM-Z80/master/unzip/unzip182.lbr
The v1.84 version for Z-system is now up on GitHub too (along with
copies of the Z-System library distributions used to build it). It has
the changes from v1.5-4 ported (with the exception of the filename >255
character check).

https://raw.githubusercontent.com/agn453/UNZIP-CPM-Z80/master/unzip/unzip184.lbr

Tony
Tony Nicholson
2021-06-15 23:41:25 UTC
Permalink
There's been an update to fix an issue with the Z-System version of UNZIP
for CP/M Z80. Previously some .ZIP files caused the program to hang
when trying to extract all files.

You'll find the latest version V1.8-7 available on my GitHub repository at

https://github.com/agn453/UNZIP-CPM-Z80

or download the source-code and updated binary in a CP/M library file
from

https://raw.githubusercontent.com/agn453/UNZIP-CPM-Z80/master/unzip/unzip187.lbr

Thanks go to Lars Nelson for finding and fixing this.

Tony
Tony Nicholson
2023-12-29 00:46:40 UTC
Permalink
Bumping this old thread of mine as a heads up -

I've just updated

https://github.com/agn453/UNZIP-CPM-Z80

to include Lars Nelson's latest fixes for the CP/M native version
of ZIP by Jonathon Harston.

This is a utility to create ZIP files under CP/M (using the Stored
file method only, with no compression). You can use it to move a
group of files back to your PC/Mac/Linux machine, then use the
built-in unzip there to extract them.
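
[Editor's sketch] To show what "Stored method only" means at the file-format level, here is a minimal C sketch that fills in the 30-byte fixed part of a ZIP local file header for an uncompressed entry. It is not taken from ZIP101; the field layout follows the published ZIP format, while the function name make_stored_lfh and its parameters are hypothetical. The caller is assumed to have computed the CRC-32 and to write the file name and the raw data right after the header.

```c
#define LFH_SIZE 30  /* fixed part of a ZIP local file header */

/* Store a 16-bit value little-endian. */
static void put16(unsigned char *p, unsigned int v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
}

/* Store a 32-bit value little-endian. */
static void put32(unsigned char *p, unsigned long v)
{
    put16(p, (unsigned int)(v & 0xffffUL));
    put16(p + 2, (unsigned int)((v >> 16) & 0xffffUL));
}

/* Fill buf[0..LFH_SIZE-1] for an entry written with method 0 (Stored).
   With no compression, compressed and uncompressed sizes are equal. */
void make_stored_lfh(unsigned char *buf, unsigned long crc,
                     unsigned long size, unsigned int name_len,
                     unsigned int dos_time, unsigned int dos_date)
{
    put32(buf + 0, 0x04034b50UL);  /* signature "PK\003\004" */
    put16(buf + 4, 10);            /* version needed to extract: 1.0 */
    put16(buf + 6, 0);             /* general purpose flags */
    put16(buf + 8, 0);             /* compression method 0 = Stored */
    put16(buf + 10, dos_time);     /* last modified time (MS-DOS) */
    put16(buf + 12, dos_date);     /* last modified date (MS-DOS) */
    put32(buf + 14, crc);          /* CRC-32 of the file data */
    put32(buf + 18, size);         /* compressed size */
    put32(buf + 22, size);         /* uncompressed size */
    put16(buf + 26, name_len);     /* file name length */
    put16(buf + 28, 0);            /* extra field length (none) */
}
```

For example, make_stored_lfh(h, crc, 300UL, 11u, 0u, 0x21u) describes a 300-byte file with an 11-character name dated 1980-01-01. A complete archive also needs a central directory and an end-of-central-directory record, which this sketch leaves out.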

You can grab ZIP101.COM (and the source code from zip101.lbr) from
GitHub and maybe upgrade your UNZIP utility from there too.

Have a Happy New Year all!

Tony

Martin
2020-10-17 09:56:43 UTC
Permalink
Am 10/15/2020 04:02 AM, Russell Marks schrieb:
[...]
Post by Russell Marks
And I may as well include the C code to generate the table, again
-------------------- gentable.c --------------------
#include <stdio.h>

int main(void)
{
  unsigned long c,i,j;

  for(i=0;i<256;i++)
  {
    if((i&1)==0) printf("\tdb\t");
    c=i;
    for(j=0;j<8;j++)
      c=(c>>1)^((c&1)?0xedb88320:0);
    printf("%03Xh,%03Xh,%03Xh,%03Xh",
           c&255,(c>>8)&255,(c>>16)&255,(c>>24)&255);
    putchar(((i&1)==1)?'\n':',');
  }
}
-------------------- gentable.c --------------------
-Rus.
Hitech-C as a 16-bit compiler does not translate this as expected.

c=(c>>1)^((c&1)?0xedb88320:0);

The constants on the right-hand side of the XOR are apparently evaluated as "int" (16 bits).
The unsigned extension to "long" before the XOR doesn't help, because the high word is already zero by then.

To fix it, change the line above to:

c=(c>>1)^((c&1)?0xedb88320L:0L);
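
[Editor's sketch] The fix above can be taken one step further so the generator does not depend on the width of int at all. The following is a minimal sketch, not Rus's or Martin's code: the names CRC32_POLY, crc32_entry and print_crc32_table are hypothetical. It uses the UL suffix on every wide constant and casts each byte to unsigned before printing, because the %X conversion expects an unsigned int argument rather than an unsigned long.

```c
#include <stdio.h>

/* Reflected CRC-32 polynomial used by ZIP.  The UL suffix asks for an
   unsigned long, which is at least 32 bits even where int is 16 bits. */
#define CRC32_POLY 0xedb88320UL

/* Compute the CRC-32 table entry for byte value i (0..255). */
unsigned long crc32_entry(unsigned int i)
{
    unsigned long c = (unsigned long)i;
    int j;

    for (j = 0; j < 8; j++)
        c = (c >> 1) ^ ((c & 1UL) ? CRC32_POLY : 0UL);
    return c & 0xffffffffUL;
}

/* Print all 256 entries as assembler "db" lines, low byte first and
   two entries per line, in the same layout as gentable.c. */
void print_crc32_table(FILE *out)
{
    unsigned int i;

    for (i = 0; i < 256; i++) {
        unsigned long c = crc32_entry(i);

        if ((i & 1) == 0)
            fprintf(out, "\tdb\t");
        fprintf(out, "%03Xh,%03Xh,%03Xh,%03Xh",
                (unsigned)(c & 255), (unsigned)((c >> 8) & 255),
                (unsigned)((c >> 16) & 255), (unsigned)((c >> 24) & 255));
        fputc((i & 1) ? '\n' : ',', out);
    }
}
```

Calling print_crc32_table(stdout) from main() emits the table. A quick sanity check is that entry 128 equals the polynomial itself, 0xedb88320, which is exactly the value that comes out wrong when the constant is cut down to 16 bits.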
Russell Marks
2020-10-17 16:22:40 UTC
Permalink
[...]
Post by Martin
Post by Russell Marks
And I may as well include the C code to generate the table, again
[...]
Post by Martin
Post by Russell Marks
c=(c>>1)^((c&1)?0xedb88320:0);
[...]
Post by Martin
Hitech-C as a 16-bit compiler does not translate this as expected.
I'll admit I hadn't considered the 16-bit case, which seems ironic
given the context. :-) Thanks for the fix.

-Rus.