vcswatch reports that
this package seems to have new commits in its VCS but has
not yet updated debian/changelog. You should consider updating
the Debian changelog and uploading this new version into the archive.
Here are the relevant commit logs:
commit 594cf0da393fb7f039617853ea0582b211e55e59
Author: Xavier Roche <roche@httrack.com>
Date: Mon Jun 22 22:27:18 2026 +0200
debian: override embedded-library for bundled minizip, lint under debian:sid (#419)
httrack statically links its own patched minizip (src/minizip): it carries a
zipFlush() API the system libminizip lacks, which htscache.c uses to flush the
cache .zip so an interrupted crawl leaves a valid archive, plus Android and
old-zlib portability fixes. The system library can't be substituted until that
lands upstream, so add justified lintian overrides for the resulting
embedded-library tag on libhttrack3 and proxytrack.
The tag never showed in CI because the deb job built and linted on the Ubuntu
runner, whose lintian predates the minizip fingerprint; it only fires on the
newer lintian the Debian buildds and UDD run. Build and lint the package inside
a debian:sid container instead, matching the upload target. That also keeps the
archive legal: a Debian dpkg-deb writes xz members where an Ubuntu host defaults
to zstd, which Debian's lintian rejects as a malformed deb. And being unable to
unpack a zstd member, lintian never scans the binaries the embedded-library
check reads, so the override would otherwise have looked unused.
Signed-off-by: Xavier Roche <roche@httrack.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
commit 3845cd1fb328febd43d59c84ce14e48f7903742e
Author: Xavier Roche <roche@httrack.com>
Date: Mon Jun 22 21:18:53 2026 +0200
Store the DNS cache in a coucal hashtable (#420)
The resolver cache was a hand-rolled singly-linked list with a dummy head
node: O(n) lookup, O(n^2) build, and each record carried its own next
pointer plus an inline copy of the hostname key. Swap it for coucal, the
hashtable already used for the backing cache and the ready slots, keyed by
hostname with the address record as the value.
coucal owns the records (freed through a value handler on coucal_delete)
and dups the key itself, so t_dnscache sheds both its next link and its
inline iadr string and becomes a pure address record. The state field
keeps the same pointer width (t_dnscache* -> coucal), so the installed
htsopt.h layout and the ABI are unchanged.
Behaviour is identical: same -1/0/>0 lookup contract, same negative
caching, same resolve-once semantics, all under the existing
opt->state.lock (coucal is not internally serialized against the FTP/web
threads). The DNS self-test exercises the full contract black-box and
passes unchanged.
Signed-off-by: Xavier Roche <roche@httrack.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
commit 94bffb080495e46d0408ce7e7aa59703f966c507
Author: Xavier Roche <roche@httrack.com>
Date: Mon Jun 22 20:56:18 2026 +0200
Fall back to the next address when a connect fails or stalls (#418)
* Resolve hosts to multiple addresses and cache the full list
The DNS cache kept a single address per host and the resolver copied only
the head of getaddrinfo's result, discarding the rest. That leaves no
fallback when the chosen address is unreachable (e.g. a blackholed IPv6 on a
dual-stack host) -- the root of the "stuck on connect" stalls.
Widen t_dnscache to hold up to HTS_MAXADDRNUM addresses in resolver order and
walk ai_next when resolving. New hts_dns_resolve_all() exposes the list;
hts_dns_resolve2() still returns the first address, so existing callers are
unchanged. newhttp_addr() connects to a chosen candidate index (newhttp() is
the index-0 wrapper), for the connect-fallback path added next.
No ABI change: t_dnscache is engine-internal (httrackp holds only a pointer;
no plugin reads its fields) and the htsblk/lien_back layout is untouched.
The DNS self-test now covers the list path: count, resolver order, the
family filter, and clamping past HTS_MAXADDRNUM.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
* Fall back to the next address when a connect fails or stalls
A slot connected to a single resolved address and waited the full slot
timeout (default 120s) if that address was dead -- a blackholed IPv6 on a
dual-stack host would stall the whole mirror. With the cache now holding the
full address list, retry the next address instead of failing.
In back_wait, a connecting slot probes the resolved address count once, then:
on a refused/failed connect (a new SO_ERROR check at connect completion, since
a failed non-blocking connect is reported writable too) it falls back
immediately; on a stalled connect it falls back after a short per-candidate
deadline (min(timeout, 10s)) rather than the full timeout. The last candidate
keeps the full timeout, so single-address hosts are unchanged. Per-slot state
(current index, count, connect start) lives in struct_back, parallel to the
slot array -- no htsblk/lien_back layout change, so the ABI is untouched.
back_connect_fallback_due() (the deadline decision) and newhttp_addr()'s
address selection are unit-tested through the DNS mock.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
* Add HTTRACK_DEBUG_RESOLVE and a deterministic connect-fallback test
Exercising the connect fallback needs a host that resolves to a dead address
first and a live one next, deterministically and offline. A true SYN
black-hole can't be simulated without root, but a refused address can.
HTTRACK_DEBUG_RESOLVE="host:ip[,ip...]" pins a host's resolution to a fixed
address list (curl --resolve style), reusing the PR1 resolver seam: an
addrinfo backend that synthesizes the listed addresses for the named host and
delegates other hosts to libc (copying into its own allocations so one
freeaddrinfo frees both). It is a debug/test hook, inert unless the env var is
set, and IPv6-build-only like the rest of the resolver list.
The new local crawl test binds the server to 127.0.0.1 and resolves a host to
127.0.0.2 (refused) then 127.0.0.1: the mirror only succeeds via the fallback.
A V6_SUPPORT substitution (mirroring HTTPS_SUPPORT) lets it skip on non-IPv6
builds.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
---------
Signed-off-by: Xavier Roche <roche@httrack.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
commit a5c86e7e89d1ea32665ba512698b8fcb87573412
Author: Xavier Roche <roche@httrack.com>
Date: Mon Jun 22 19:03:55 2026 +0200
Add a mockable resolver backend and a DNS resolver/cache self-test (#417)
Route the getaddrinfo resolver through a swappable backend pair
(getaddrinfo/freeaddrinfo) that defaults to the libc resolver, so the
self-test can script DNS answers in-process with no network. The pair is
needed because a fake allocates its own addrinfo chain and must free it
with a matching deallocator.
Drive it from a new 'httrack -#D' debug subcommand backed by
htsdns_selftest.c: a scripted getaddrinfo checks address family,
single-address selection, the -@i4/-@i6 family filter, negative caching,
and that a cached host is resolved only once. tests/01_engine-dns.test
runs it.
No behavior change: the default backend is the libc resolver, one
indirect call on the cold resolve path. The seam stays internal (no
HTSEXT_API), so the exported ABI is unchanged. This is groundwork for the
multi-address record and connect fallback that fix the dead-IPv6 connect
stall; the dual-stack assertion pins today's single-address behavior and
will widen with that change.
Signed-off-by: Xavier Roche <roche@httrack.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
commit 54f5717057aa6b0ccbd0fd3525cb031d73407a24
Author: Xavier Roche <roche@httrack.com>
Date: Mon Jun 22 09:03:59 2026 +0200
Silence coucal hashtable stats on the default log handler (#416)
coucal_delete() logs a per-table stats summary at info level. For tables
without their own handler (webhttrack's NewLangStr/NewLangStrKeys), these
went through default_coucal_loghandler, which printed every level, so
plain `webhttrack` startup dumped two "hashtable ... summary:" lines to
the console.
Drop info-and-below messages there unless debugging is on (hts_dgb_init,
i.e. HTS_LOG / hts_debug); warnings and critical errors still always
print.
Signed-off-by: Xavier Roche <roche@httrack.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
commit 40fc9de360dc7fe69641dd2825b58eb542de88b6
Author: Xavier Roche <roche@httrack.com>
Date: Mon Jun 22 08:04:19 2026 +0200
debian: rewrite copyright in DEP-5 format, credit all authors (#415)
The free-form debian/copyright credited only Xavier Roche, but the source
tree bundles code from other authors under additional licenses that went
unlisted: Yann Philippot (src/htsjava.*, GPL-3+), the minizip code by
Gilles Vollant, Mathias Svensson, Even Rouault and Info-ZIP (Zlib),
Colin Plumb's md5.c (public domain), and the coucal library (BSD-3-clause)
with Austin Appleby's murmurhash3.h (public domain).
Convert to machine-readable DEP-5 and add a Files stanza per component, as
requested in the Debian NEW review of 3.49.9-1.
Signed-off-by: Xavier Roche <roche@httrack.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>