merge 'master'

Mozi 2025-01-28 18:41:56 +00:00
commit 7695a04cac
79 changed files with 2655 additions and 2108 deletions

1
.gitignore vendored

@ -92,6 +92,7 @@ updates_key.pem
*.class
*.isorted
*.stackdump
uv.lock
# Generated
AUTHORS


@ -707,3 +707,32 @@ Sakura286
SamDecrock
stratus-ss
subrat-lima
gitninja1234
jkruse
xiaomac
wesson09
Crypto90
MutantPiggieGolem1
Sanceilaks
Strkmn
0x9fff00
4ft35t
7x11x13
b5i
cotko
d3d9
Dioarya
finch71
hexahigh
InvalidUsernameException
jixunmoe
knackku
krandor
kvk-2015
lonble
msm595
n10dollar
NecroRomnt
pjrobertson
subsense
test20140


@ -4,6 +4,163 @@
# To create a release, dispatch the https://github.com/yt-dlp/yt-dlp/actions/workflows/release.yml workflow on master
-->
### 2025.01.26
#### Core changes
- [Fix float comparison values in format filters](https://github.com/yt-dlp/yt-dlp/commit/f7d071e8aa3bf67ed7e0f881e749ca9ab50b3f8f) ([#11880](https://github.com/yt-dlp/yt-dlp/issues/11880)) by [bashonly](https://github.com/bashonly), [Dioarya](https://github.com/Dioarya)
- **utils**: `sanitize_path`: [Fix some incorrect behavior](https://github.com/yt-dlp/yt-dlp/commit/fc12e724a3b4988cfc467d2981887dde48c26b69) ([#11923](https://github.com/yt-dlp/yt-dlp/issues/11923)) by [Grub4K](https://github.com/Grub4K)
#### Extractor changes
- **1tv**: [Support sport1tv.ru domain](https://github.com/yt-dlp/yt-dlp/commit/61ae5dc34ac775d6c122575e21ef2153b1273a2b) ([#11889](https://github.com/yt-dlp/yt-dlp/issues/11889)) by [kvk-2015](https://github.com/kvk-2015)
- **abematv**: [Support season extraction](https://github.com/yt-dlp/yt-dlp/commit/c709cc41cbc16edc846e0a431cfa8508396d4cb6) ([#11771](https://github.com/yt-dlp/yt-dlp/issues/11771)) by [middlingphys](https://github.com/middlingphys)
- **bilibili**
- [Support space `/lists/` URLs](https://github.com/yt-dlp/yt-dlp/commit/465167910407449354eb48e9861efd0819f53eb5) ([#11964](https://github.com/yt-dlp/yt-dlp/issues/11964)) by [c-basalt](https://github.com/c-basalt)
- [Support space video list extraction without login](https://github.com/yt-dlp/yt-dlp/commit/78912ed9c81f109169b828c397294a6cf8eacf41) ([#12089](https://github.com/yt-dlp/yt-dlp/issues/12089)) by [grqz](https://github.com/grqz)
- **bilibilidynamic**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/9676b05715b61c8c5dd5598871e60d8807fb1a86) ([#11838](https://github.com/yt-dlp/yt-dlp/issues/11838)) by [finch71](https://github.com/finch71), [grqz](https://github.com/grqz)
- **bluesky**: [Prefer source format](https://github.com/yt-dlp/yt-dlp/commit/ccda63934df7de2823f0834218c4254c7c4d2e4c) ([#12154](https://github.com/yt-dlp/yt-dlp/issues/12154)) by [0x9fff00](https://github.com/0x9fff00)
- **crunchyroll**: [Remove extractors](https://github.com/yt-dlp/yt-dlp/commit/ff44ed53061e065804da6275d182d7928cc03a5e) ([#12195](https://github.com/yt-dlp/yt-dlp/issues/12195)) by [seproDev](https://github.com/seproDev)
- **dropout**: [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/164368610456e2d96b279f8b120dea08f7b1d74f) ([#12102](https://github.com/yt-dlp/yt-dlp/issues/12102)) by [bashonly](https://github.com/bashonly)
- **eggs**: [Add extractors](https://github.com/yt-dlp/yt-dlp/commit/20c765d02385a105c8ef13b6f7a737491d29c19a) ([#11904](https://github.com/yt-dlp/yt-dlp/issues/11904)) by [seproDev](https://github.com/seproDev), [subsense](https://github.com/subsense)
- **funimation**: [Remove extractors](https://github.com/yt-dlp/yt-dlp/commit/cdcf1e86726b8fa44f7e7126bbf1c18e1798d25c) ([#12167](https://github.com/yt-dlp/yt-dlp/issues/12167)) by [doe1080](https://github.com/doe1080)
- **goodgame**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/e7cc02b14d8d323f805d14325a9c95593a170d28) ([#12173](https://github.com/yt-dlp/yt-dlp/issues/12173)) by [NecroRomnt](https://github.com/NecroRomnt)
- **lbry**: [Support signed URLs](https://github.com/yt-dlp/yt-dlp/commit/de30f652ffb7623500215f5906844f2ae0d92c7b) ([#12138](https://github.com/yt-dlp/yt-dlp/issues/12138)) by [seproDev](https://github.com/seproDev)
- **naver**: [Fix m3u8 formats extraction](https://github.com/yt-dlp/yt-dlp/commit/b3007c44cdac38187fc6600de76959a7079a44d1) ([#12037](https://github.com/yt-dlp/yt-dlp/issues/12037)) by [kclauhk](https://github.com/kclauhk)
- **nest**: [Add extractors](https://github.com/yt-dlp/yt-dlp/commit/1ef3ee7500c4ab8c26f7fdc5b0ad1da4d16eec8e) ([#11747](https://github.com/yt-dlp/yt-dlp/issues/11747)) by [pabs3](https://github.com/pabs3), [seproDev](https://github.com/seproDev)
- **niconico**: series: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/bc88b904cd02314da41ce1b2fdf046d0680fe965) ([#11822](https://github.com/yt-dlp/yt-dlp/issues/11822)) by [test20140](https://github.com/test20140)
- **nrk**
- [Extract more formats](https://github.com/yt-dlp/yt-dlp/commit/89198bb23b4d03e0473ac408bfb50d67c2f71165) ([#12069](https://github.com/yt-dlp/yt-dlp/issues/12069)) by [hexahigh](https://github.com/hexahigh)
- [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/45732e2590a1bd0bc9608f5eb68c59341ca84f02) ([#12193](https://github.com/yt-dlp/yt-dlp/issues/12193)) by [hexahigh](https://github.com/hexahigh)
- **patreon**: [Extract attachment filename as `alt_title`](https://github.com/yt-dlp/yt-dlp/commit/e2e73b5c65593ec0a5e685663e6ec0f4aaffc1f1) ([#12000](https://github.com/yt-dlp/yt-dlp/issues/12000)) by [msm595](https://github.com/msm595)
- **pbs**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/13825ab77815ee6e1603abbecbb9f3795057b93c) ([#12024](https://github.com/yt-dlp/yt-dlp/issues/12024)) by [dirkf](https://github.com/dirkf), [krandor](https://github.com/krandor), [n10dollar](https://github.com/n10dollar)
- **piramidetv**: [Add extractors](https://github.com/yt-dlp/yt-dlp/commit/af2c821d74049b519895288aca23cee81fc4b049) ([#10777](https://github.com/yt-dlp/yt-dlp/issues/10777)) by [HobbyistDev](https://github.com/HobbyistDev), [kclauhk](https://github.com/kclauhk), [seproDev](https://github.com/seproDev)
- **redgifs**: [Support `/ifr/` URLs](https://github.com/yt-dlp/yt-dlp/commit/4850ce91d163579fa615c3c0d44c9bd64682c22b) ([#11805](https://github.com/yt-dlp/yt-dlp/issues/11805)) by [invertico](https://github.com/invertico)
- **rtvslo.si**: show: [Extract more metadata](https://github.com/yt-dlp/yt-dlp/commit/3fc46086562857d5493cbcff687f76e4e4ed303f) ([#12136](https://github.com/yt-dlp/yt-dlp/issues/12136)) by [cotko](https://github.com/cotko)
- **senategov**: [Fix extractors](https://github.com/yt-dlp/yt-dlp/commit/68221ecc87c6a3f3515757bac2a0f9674a38e3f2) ([#9361](https://github.com/yt-dlp/yt-dlp/issues/9361)) by [Grabien](https://github.com/Grabien), [seproDev](https://github.com/seproDev)
- **soundcloud**
- [Extract more metadata](https://github.com/yt-dlp/yt-dlp/commit/6d304133ab32bcd1eb78ff1467f1a41dd9b66c33) ([#11945](https://github.com/yt-dlp/yt-dlp/issues/11945)) by [7x11x13](https://github.com/7x11x13)
- user: [Add `/comments` page support](https://github.com/yt-dlp/yt-dlp/commit/7bfb4f72e490310d2681c7f4815218a2ebbc73ee) ([#11999](https://github.com/yt-dlp/yt-dlp/issues/11999)) by [7x11x13](https://github.com/7x11x13)
- **subsplash**: [Add extractors](https://github.com/yt-dlp/yt-dlp/commit/5d904b077d2f58ae44bdf208d2dcfcc3ff8347f5) ([#11054](https://github.com/yt-dlp/yt-dlp/issues/11054)) by [seproDev](https://github.com/seproDev), [subrat-lima](https://github.com/subrat-lima)
- **theatercomplextownppv**: [Support `live` URLs](https://github.com/yt-dlp/yt-dlp/commit/797d2472a299692e01ad1500e8c3b7bc1daa7fe4) ([#11720](https://github.com/yt-dlp/yt-dlp/issues/11720)) by [bashonly](https://github.com/bashonly)
- **vimeo**: [Fix thumbnail extraction](https://github.com/yt-dlp/yt-dlp/commit/9ff330948c92f6b2e1d9c928787362ab19cd6c62) ([#12142](https://github.com/yt-dlp/yt-dlp/issues/12142)) by [jixunmoe](https://github.com/jixunmoe)
- **vimp**: Playlist: [Add support for tags](https://github.com/yt-dlp/yt-dlp/commit/d4f5be1735c8feaeb3308666e0b878e9782f529d) ([#11688](https://github.com/yt-dlp/yt-dlp/issues/11688)) by [FestplattenSchnitzel](https://github.com/FestplattenSchnitzel)
- **weibo**: [Extend `_VALID_URL`](https://github.com/yt-dlp/yt-dlp/commit/a567f97b62ae9f6d6f5a9376c361512ab8dceda2) ([#12088](https://github.com/yt-dlp/yt-dlp/issues/12088)) by [4ft35t](https://github.com/4ft35t)
- **xhamster**: [Various improvements](https://github.com/yt-dlp/yt-dlp/commit/3b99a0f0e07f0120ab416f34a8f5ab75d4fdf1d1) ([#11738](https://github.com/yt-dlp/yt-dlp/issues/11738)) by [knackku](https://github.com/knackku)
- **xiaohongshu**: [Extract more formats](https://github.com/yt-dlp/yt-dlp/commit/f9f24ae376a9eaca777816479a4a29f6f0ce7681) ([#12147](https://github.com/yt-dlp/yt-dlp/issues/12147)) by [seproDev](https://github.com/seproDev)
- **youtube**
- [Download `tv` client Innertube config](https://github.com/yt-dlp/yt-dlp/commit/326fb1ffaf4e8349f1fe8ba2a81839652e044bff) ([#12168](https://github.com/yt-dlp/yt-dlp/issues/12168)) by [coletdjnz](https://github.com/coletdjnz)
- [Extract `media_type` for livestreams](https://github.com/yt-dlp/yt-dlp/commit/421bc72103d1faed473a451299cd17d6abb433bb) ([#11605](https://github.com/yt-dlp/yt-dlp/issues/11605)) by [nosoop](https://github.com/nosoop)
- [Restore convenience workarounds](https://github.com/yt-dlp/yt-dlp/commit/f0d4b8a5d6354b294bc9631cf15a7160b7bad5de) ([#12181](https://github.com/yt-dlp/yt-dlp/issues/12181)) by [bashonly](https://github.com/bashonly)
- [Update `ios` player client](https://github.com/yt-dlp/yt-dlp/commit/de82acf8769282ce321a86737ecc1d4bef0e82a7) ([#12155](https://github.com/yt-dlp/yt-dlp/issues/12155)) by [b5i](https://github.com/b5i)
- [Use different PO token for GVS and Player](https://github.com/yt-dlp/yt-dlp/commit/6b91d232e316efa406035915532eb126fbaeea38) ([#12090](https://github.com/yt-dlp/yt-dlp/issues/12090)) by [coletdjnz](https://github.com/coletdjnz)
- tab: [Improve shorts title extraction](https://github.com/yt-dlp/yt-dlp/commit/76ac023ff02f06e8c003d104f02a03deeddebdcd) ([#11997](https://github.com/yt-dlp/yt-dlp/issues/11997)) by [bashonly](https://github.com/bashonly), [d3d9](https://github.com/d3d9)
- **zdf**: [Fix extractors](https://github.com/yt-dlp/yt-dlp/commit/bb69f5dab79fb32c4ec0d50e05f7fa26d05d54ba) ([#11041](https://github.com/yt-dlp/yt-dlp/issues/11041)) by [InvalidUsernameException](https://github.com/InvalidUsernameException)
#### Misc. changes
- **cleanup**: Miscellaneous: [3b45319](https://github.com/yt-dlp/yt-dlp/commit/3b4531934465580be22937fecbb6e1a3a9e2334f) by [bashonly](https://github.com/bashonly), [lonble](https://github.com/lonble), [pjrobertson](https://github.com/pjrobertson), [seproDev](https://github.com/seproDev)
### 2025.01.15
#### Extractor changes
- **youtube**: [Do not use `web_creator` as a default client](https://github.com/yt-dlp/yt-dlp/commit/c8541f8b13e743fcfa06667530d13fee8686e22a) ([#12087](https://github.com/yt-dlp/yt-dlp/issues/12087)) by [bashonly](https://github.com/bashonly)
### 2025.01.12
#### Core changes
- [Fix filename sanitization with `--no-windows-filenames`](https://github.com/yt-dlp/yt-dlp/commit/8346b549150003df988538e54c9d8bc4de568979) ([#11988](https://github.com/yt-dlp/yt-dlp/issues/11988)) by [bashonly](https://github.com/bashonly)
- [Validate retries values are non-negative](https://github.com/yt-dlp/yt-dlp/commit/1f4e1e85a27c5b43e34d7706cfd88ffce1b56a4a) ([#11927](https://github.com/yt-dlp/yt-dlp/issues/11927)) by [Strkmn](https://github.com/Strkmn)
#### Extractor changes
- **drtalks**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/1f489f4a45691cac3f9e787d22a3a8a086229ba6) ([#10831](https://github.com/yt-dlp/yt-dlp/issues/10831)) by [pzhlkj6612](https://github.com/pzhlkj6612), [seproDev](https://github.com/seproDev)
- **plvideo**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/3c14e9191f3035b9a729d1d87bc0381f42de57cf) ([#10657](https://github.com/yt-dlp/yt-dlp/issues/10657)) by [Sanceilaks](https://github.com/Sanceilaks), [seproDev](https://github.com/seproDev)
- **vine**: [Remove extractors](https://github.com/yt-dlp/yt-dlp/commit/e2ef4fece6c9742d1733e3bae408c4787765f78c) ([#11700](https://github.com/yt-dlp/yt-dlp/issues/11700)) by [allendema](https://github.com/allendema)
- **xiaohongshu**: [Extend `_VALID_URL`](https://github.com/yt-dlp/yt-dlp/commit/763ed06ee69f13949397897bd42ff2ec3dc3d384) ([#11806](https://github.com/yt-dlp/yt-dlp/issues/11806)) by [HobbyistDev](https://github.com/HobbyistDev)
- **youtube**
- [Fix DASH formats incorrectly skipped in some situations](https://github.com/yt-dlp/yt-dlp/commit/0b6b7742c2e7f2a1fcb0b54ef3dd484bab404b3f) ([#11910](https://github.com/yt-dlp/yt-dlp/issues/11910)) by [coletdjnz](https://github.com/coletdjnz)
- [Refactor cookie auth](https://github.com/yt-dlp/yt-dlp/commit/75079f4e3f7dce49b61ef01da7adcd9876a0ca3b) ([#11989](https://github.com/yt-dlp/yt-dlp/issues/11989)) by [coletdjnz](https://github.com/coletdjnz)
- [Use `tv` instead of `mweb` client by default](https://github.com/yt-dlp/yt-dlp/commit/712d2abb32f59b2d246be2901255f84f1a4c30b3) ([#12059](https://github.com/yt-dlp/yt-dlp/issues/12059)) by [coletdjnz](https://github.com/coletdjnz)
#### Misc. changes
- **cleanup**: Miscellaneous: [dade5e3](https://github.com/yt-dlp/yt-dlp/commit/dade5e35c89adaad04408bfef766820dbca06ebe) by [grqz](https://github.com/grqz), [Grub4K](https://github.com/Grub4K), [seproDev](https://github.com/seproDev)
### 2024.12.23
#### Core changes
- [Don't sanitize filename on Unix when `--no-windows-filenames`](https://github.com/yt-dlp/yt-dlp/commit/6fc85f617a5850307fd5b258477070e6ee177796) ([#9591](https://github.com/yt-dlp/yt-dlp/issues/9591)) by [pukkandan](https://github.com/pukkandan)
- **update**
- [Check 64-bitness when upgrading ARM builds](https://github.com/yt-dlp/yt-dlp/commit/b91c3925c2059970daa801cb131c0c2f4f302e72) ([#11819](https://github.com/yt-dlp/yt-dlp/issues/11819)) by [bashonly](https://github.com/bashonly)
- [Fix endless update loop for `linux_exe` builds](https://github.com/yt-dlp/yt-dlp/commit/3d3ee458c1fe49dd5ebd7651a092119d23eb7000) ([#11827](https://github.com/yt-dlp/yt-dlp/issues/11827)) by [bashonly](https://github.com/bashonly)
#### Extractor changes
- **soundcloud**: [Various fixes](https://github.com/yt-dlp/yt-dlp/commit/d298693b1b266d198e8eeecb90ea17c4a031268f) ([#11820](https://github.com/yt-dlp/yt-dlp/issues/11820)) by [bashonly](https://github.com/bashonly)
- **youtube**
- [Add age-gate workaround for some embeddable videos](https://github.com/yt-dlp/yt-dlp/commit/09a6c687126f04e243fcb105a828787efddd1030) ([#11821](https://github.com/yt-dlp/yt-dlp/issues/11821)) by [bashonly](https://github.com/bashonly)
- [Fix `uploader_id` extraction](https://github.com/yt-dlp/yt-dlp/commit/1a8851b689763e5173b96f70f8a71df0e4a44b66) ([#11818](https://github.com/yt-dlp/yt-dlp/issues/11818)) by [bashonly](https://github.com/bashonly)
- [Player client maintenance](https://github.com/yt-dlp/yt-dlp/commit/65cf46cddd873fd229dbb0fc0689bca4c201c6b6) ([#11893](https://github.com/yt-dlp/yt-dlp/issues/11893)) by [bashonly](https://github.com/bashonly)
- [Skip iOS formats that require PO Token](https://github.com/yt-dlp/yt-dlp/commit/9f42e68a74f3f00b0253fe70763abd57cac4237b) ([#11890](https://github.com/yt-dlp/yt-dlp/issues/11890)) by [coletdjnz](https://github.com/coletdjnz)
### 2024.12.13
#### Extractor changes
- **patreon**: campaign: [Support /c/ URLs](https://github.com/yt-dlp/yt-dlp/commit/bc262bcad4d3683ceadf61a7eb87e233e72adef3) ([#11756](https://github.com/yt-dlp/yt-dlp/issues/11756)) by [bashonly](https://github.com/bashonly)
- **soundcloud**: [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/f4d3e9e6dc25077b79849a31a2f67f93fdc01e62) ([#11777](https://github.com/yt-dlp/yt-dlp/issues/11777)) by [bashonly](https://github.com/bashonly)
- **youtube**
- [Fix `release_date` extraction](https://github.com/yt-dlp/yt-dlp/commit/d5e2a379f2adcb28bc48c7d9e90716d7278f89d2) ([#11759](https://github.com/yt-dlp/yt-dlp/issues/11759)) by [MutantPiggieGolem1](https://github.com/MutantPiggieGolem1)
- [Fix signature function extraction for `2f1832d2`](https://github.com/yt-dlp/yt-dlp/commit/5460cd91891bf613a2065e2fc278d9903c37a127) ([#11801](https://github.com/yt-dlp/yt-dlp/issues/11801)) by [bashonly](https://github.com/bashonly)
- [Prioritize original language over auto-dubbed audio](https://github.com/yt-dlp/yt-dlp/commit/dc3c4fddcc653989dae71fc563d82a308fc898cc) ([#11803](https://github.com/yt-dlp/yt-dlp/issues/11803)) by [bashonly](https://github.com/bashonly)
- search_url: [Fix playlist searches](https://github.com/yt-dlp/yt-dlp/commit/f6c73aad5f1a67544bea137ebd9d1e22e0e56567) ([#11782](https://github.com/yt-dlp/yt-dlp/issues/11782)) by [Crypto90](https://github.com/Crypto90)
#### Misc. changes
- **cleanup**: [Make more playlist entries lazy](https://github.com/yt-dlp/yt-dlp/commit/54216696261bc07cacd9a837c501d9e0b7fed09e) ([#11763](https://github.com/yt-dlp/yt-dlp/issues/11763)) by [seproDev](https://github.com/seproDev)
### 2024.12.06
#### Core changes
- **cookies**: [Add `--cookies-from-browser` support for MS Store Firefox](https://github.com/yt-dlp/yt-dlp/commit/354cb4026cf2191e1a130ec2a627b95cabfbc60a) ([#11731](https://github.com/yt-dlp/yt-dlp/issues/11731)) by [wesson09](https://github.com/wesson09)
#### Extractor changes
- **bilibili**: [Fix HD formats extraction](https://github.com/yt-dlp/yt-dlp/commit/fca3eb5f8be08d5fab2e18b45b7281a12e566725) ([#11734](https://github.com/yt-dlp/yt-dlp/issues/11734)) by [grqz](https://github.com/grqz)
- **soundcloud**: [Fix formats extraction](https://github.com/yt-dlp/yt-dlp/commit/2feb28028ee48f2185d2d95076e62accb09b9e2e) ([#11742](https://github.com/yt-dlp/yt-dlp/issues/11742)) by [bashonly](https://github.com/bashonly)
- **youtube**
- [Fix `n` sig extraction for player `3bb1f723`](https://github.com/yt-dlp/yt-dlp/commit/a95ee6d8803fca9157adecf63732ab58bf87fd88) ([#11750](https://github.com/yt-dlp/yt-dlp/issues/11750)) by [bashonly](https://github.com/bashonly) (With fixes in [4bd2655](https://github.com/yt-dlp/yt-dlp/commit/4bd2655398aed450456197a6767639114a24eac2))
- [Fix signature function extraction](https://github.com/yt-dlp/yt-dlp/commit/4c85ccd1366c88cf93982f8350f58eed17355981) ([#11751](https://github.com/yt-dlp/yt-dlp/issues/11751)) by [bashonly](https://github.com/bashonly)
- [Player client maintenance](https://github.com/yt-dlp/yt-dlp/commit/2e49c789d3eebc39af8910705d65a98bca0e4c4f) ([#11724](https://github.com/yt-dlp/yt-dlp/issues/11724)) by [bashonly](https://github.com/bashonly)
### 2024.12.03
#### Core changes
- [Add `playlist_webpage_url` field](https://github.com/yt-dlp/yt-dlp/commit/7d6c259a03bc4707a319e5e8c6eff0278707874b) ([#11613](https://github.com/yt-dlp/yt-dlp/issues/11613)) by [seproDev](https://github.com/seproDev)
#### Extractor changes
- [Handle fragmented formats in `_remove_duplicate_formats`](https://github.com/yt-dlp/yt-dlp/commit/e0500cbf796323551bbabe5b8ed8c75a511ba47a) ([#11637](https://github.com/yt-dlp/yt-dlp/issues/11637)) by [Grub4K](https://github.com/Grub4K)
- **bilibili**
- [Always try to extract HD formats](https://github.com/yt-dlp/yt-dlp/commit/dc1687648077c5bf64863b307ecc5ab7e029bd8d) ([#10559](https://github.com/yt-dlp/yt-dlp/issues/10559)) by [grqz](https://github.com/grqz)
- [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/239f5f36fe04603bec59c8b975f6a792f10246db) ([#11667](https://github.com/yt-dlp/yt-dlp/issues/11667)) by [grqz](https://github.com/grqz) (With fixes in [f05a1cd](https://github.com/yt-dlp/yt-dlp/commit/f05a1cd1492fc98dc8d80d2081d632a1879913d2) by [bashonly](https://github.com/bashonly), [grqz](https://github.com/grqz))
- [Fix subtitles and chapters extraction](https://github.com/yt-dlp/yt-dlp/commit/a13a336aa6f906812701abec8101b73b73db8ff7) ([#11708](https://github.com/yt-dlp/yt-dlp/issues/11708)) by [xiaomac](https://github.com/xiaomac)
- **chaturbate**: [Fix support for non-public streams](https://github.com/yt-dlp/yt-dlp/commit/4b5eec0aaa7c02627f27a386591b735b90e681a8) ([#11624](https://github.com/yt-dlp/yt-dlp/issues/11624)) by [jkruse](https://github.com/jkruse)
- **dacast**: [Fix HLS AES formats extraction](https://github.com/yt-dlp/yt-dlp/commit/0a0d80800b9350d1a4c4b18d82cfb77ffbc3c507) ([#11644](https://github.com/yt-dlp/yt-dlp/issues/11644)) by [bashonly](https://github.com/bashonly)
- **dropbox**: [Fix password-protected video extraction](https://github.com/yt-dlp/yt-dlp/commit/00dcde728635633eee969ad4d498b9f233c4a94e) ([#11636](https://github.com/yt-dlp/yt-dlp/issues/11636)) by [bashonly](https://github.com/bashonly)
- **duoplay**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/62cba8a1bedbfc0ddde7267ae57b72bf5f7ea7b1) ([#11588](https://github.com/yt-dlp/yt-dlp/issues/11588)) by [bashonly](https://github.com/bashonly), [glensc](https://github.com/glensc)
- **facebook**: [Support more groups URLs](https://github.com/yt-dlp/yt-dlp/commit/e0f1ae813b36e783e2348ba2a1566e12f5cd8f6e) ([#11576](https://github.com/yt-dlp/yt-dlp/issues/11576)) by [grqz](https://github.com/grqz)
- **instagram**: [Support `share` URLs](https://github.com/yt-dlp/yt-dlp/commit/360aed810ad85db950df586282d256516c98cd2d) ([#11677](https://github.com/yt-dlp/yt-dlp/issues/11677)) by [grqz](https://github.com/grqz)
- **microsoftembed**: [Make format extraction non fatal](https://github.com/yt-dlp/yt-dlp/commit/2bea7936323ca4b6f3b9b1fdd892566223e30efa) ([#11654](https://github.com/yt-dlp/yt-dlp/issues/11654)) by [seproDev](https://github.com/seproDev)
- **mitele**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/cd0f934604587ed793e9177f6a127e5dcf99a7dd) ([#11683](https://github.com/yt-dlp/yt-dlp/issues/11683)) by [DarkZeros](https://github.com/DarkZeros)
- **stripchat**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/16336c51d0848a6868a4fa04e749fa03548b4913) ([#11596](https://github.com/yt-dlp/yt-dlp/issues/11596)) by [gitninja1234](https://github.com/gitninja1234)
- **tiktok**: [Deprioritize animated thumbnails](https://github.com/yt-dlp/yt-dlp/commit/910ecc422930bca14e2abe4986f5f92359e3cea8) ([#11645](https://github.com/yt-dlp/yt-dlp/issues/11645)) by [bashonly](https://github.com/bashonly)
- **vk**: [Fix extractors](https://github.com/yt-dlp/yt-dlp/commit/c038a7b187ba24360f14134842a7a2cf897c33b1) ([#11715](https://github.com/yt-dlp/yt-dlp/issues/11715)) by [bashonly](https://github.com/bashonly)
- **youtube**
- [Adjust player clients for site changes](https://github.com/yt-dlp/yt-dlp/commit/0d146c1e36f467af30e87b7af651bdee67b73500) ([#11663](https://github.com/yt-dlp/yt-dlp/issues/11663)) by [bashonly](https://github.com/bashonly)
- tab: [Fix playlists tab extraction](https://github.com/yt-dlp/yt-dlp/commit/fe70f20aedf528fdee332131bc9b6710e54e6f10) ([#11615](https://github.com/yt-dlp/yt-dlp/issues/11615)) by [seproDev](https://github.com/seproDev)
#### Networking changes
- **Request Handler**: websockets: [Support websockets 14.0+](https://github.com/yt-dlp/yt-dlp/commit/c7316373c0a886f65a07a51e50ee147bb3294c85) ([#11616](https://github.com/yt-dlp/yt-dlp/issues/11616)) by [coletdjnz](https://github.com/coletdjnz)
#### Misc. changes
- **cleanup**
- [Bump ruff to 0.8.x](https://github.com/yt-dlp/yt-dlp/commit/d8fb3490863653182864d2a53522f350d67a9ff8) ([#11608](https://github.com/yt-dlp/yt-dlp/issues/11608)) by [seproDev](https://github.com/seproDev)
- Miscellaneous
- [ccf0a6b](https://github.com/yt-dlp/yt-dlp/commit/ccf0a6b86b7f68a75463804fe485ec240b8635f0) by [bashonly](https://github.com/bashonly), [pzhlkj6612](https://github.com/pzhlkj6612)
- [2b67ac3](https://github.com/yt-dlp/yt-dlp/commit/2b67ac300ac8b44368fb121637d1743cea8c5b6b) by [bashonly](https://github.com/bashonly), [seproDev](https://github.com/seproDev)
### 2024.11.18
#### Important changes


@ -613,8 +613,7 @@ If you fork the project on GitHub, you can run your fork's [build workflow](.git
--no-restrict-filenames Allow Unicode characters, "&" and spaces in
filenames (default)
--windows-filenames Force filenames to be Windows-compatible
--no-windows-filenames Make filenames Windows-compatible only if
using Windows (default)
--no-windows-filenames Sanitize filenames only minimally
--trim-filenames LENGTH Limit the filename length (excluding
extension) to the specified number of
characters
@ -1761,7 +1760,7 @@ $ yt-dlp --replace-in-metadata "title,uploader" "[ _]" "-"
# EXTRACTOR ARGUMENTS
Some extractors accept additional arguments which can be passed using `--extractor-args KEY:ARGS`. `ARGS` is a `;` (semicolon) separated string of `ARG=VAL1,VAL2`. E.g. `--extractor-args "youtube:player-client=mediaconnect,web;formats=incomplete" --extractor-args "funimation:version=uncut"`
Some extractors accept additional arguments which can be passed using `--extractor-args KEY:ARGS`. `ARGS` is a `;` (semicolon) separated string of `ARG=VAL1,VAL2`. E.g. `--extractor-args "youtube:player-client=tv,mweb;formats=incomplete" --extractor-args "twitter:api=syndication"`
Note: In CLI, `ARG` can use `-` instead of `_`; e.g. `youtube:player-client` becomes `youtube:player_client`
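The `KEY:ARGS` syntax described above can be illustrated with a small parser. This is only a sketch of the documented string format, not yt-dlp's own implementation; the function name `parse_extractor_args` is hypothetical.

```python
def parse_extractor_args(value: str) -> tuple[str, dict[str, list[str]]]:
    """Split 'KEY:ARG=VAL1,VAL2;ARG2=VAL' into (key, {arg: [values]}).

    Hypothetical helper that mirrors the documented syntax only.
    """
    key, _, args = value.partition(":")
    parsed: dict[str, list[str]] = {}
    for item in args.split(";"):
        if not item:
            continue
        name, sep, vals = item.partition("=")
        # The CLI accepts '-' in place of '_' in argument names.
        name = name.strip().replace("-", "_")
        # An argument given without '=' carries no values.
        parsed[name] = vals.split(",") if sep else []
    return key, parsed


key, args = parse_extractor_args("youtube:player-client=tv,mweb;formats=incomplete")
print(key, args)
# youtube {'player_client': ['tv', 'mweb'], 'formats': ['incomplete']}
```

Each `--extractor-args` occurrence on the command line would be parsed separately in this sketch, one `KEY` per string.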
@ -1770,19 +1769,19 @@ The following extractors use this feature:
#### youtube
* `lang`: Prefer translated metadata (`title`, `description` etc) of this language code (case-sensitive). By default, the video primary language metadata is preferred, with a fallback to `en` translated. See [youtube.py](https://github.com/yt-dlp/yt-dlp/blob/c26f9b991a0681fd3ea548d535919cec1fbbd430/yt_dlp/extractor/youtube.py#L381-L390) for list of supported content language codes
* `skip`: One or more of `hls`, `dash` or `translated_subs` to skip extraction of the m3u8 manifests, dash manifests and [auto-translated subtitles](https://github.com/yt-dlp/yt-dlp/issues/4090#issuecomment-1158102032) respectively
* `player_client`: Clients to extract video data from. The main clients are `web`, `ios` and `android`, with variants `_music` and `_creator` (e.g. `ios_creator`); and `mweb`, `mediaconnect`, `android_vr`, `web_safari`, `web_embedded`, `tv` and `tv_embedded` with no variants. By default, `ios,mweb` is used, and `web_creator` is added as needed for age-gated videos when account age verification is required. Similarly, the `_music` variants are added for `music.youtube.com` URLs. Some clients, such as `web` and `android`, require a `po_token` for their formats to be downloadable. Some clients, such as the `_creator` variants, will only work with authentication. You can use `all` to use all the clients, and `default` for the default clients. You can prefix a client with `-` to exclude it, e.g. `youtube:player_client=all,-web`
* `player_client`: Clients to extract video data from. The main clients are `web`, `ios` and `android`, with variants `_music` and `_creator` (e.g. `ios_creator`); and `mweb`, `android_vr`, `web_safari`, `web_embedded`, `tv` and `tv_embedded` with no variants. By default, `tv,ios,web` is used, or `tv,web` is used when authenticating with cookies. The `web_music` client is added for `music.youtube.com` URLs when logged-in cookies are used. The `tv_embedded` and `web_creator` clients are added for age-restricted videos if account age-verification is required. Some clients, such as `web` and `web_music`, require a `po_token` for their formats to be downloadable. Some clients, such as the `_creator` variants, will only work with authentication. Not all clients support authentication via cookies. You can use `default` for the default clients, or you can use `all` for all clients (not recommended). You can prefix a client with `-` to exclude it, e.g. `youtube:player_client=default,-ios`
* `player_skip`: Skip some network requests that are generally needed for robust extraction. One or more of `configs` (skip client configs), `webpage` (skip initial webpage), `js` (skip js player). While these options can help reduce the number of requests needed or avoid some rate-limiting, they could cause some issues. See [#860](https://github.com/yt-dlp/yt-dlp/pull/860) for more details
* `player_params`: YouTube player parameters to use for player requests. Will overwrite any default ones set by yt-dlp.
* `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side)
* `max_comments`: Limit the amount of comments to gather. Comma-separated list of integers representing `max-comments,max-parents,max-replies,max-replies-per-thread`. Default is `all,all,all,all`
* E.g. `all,all,1000,10` will get a maximum of 1000 replies total, with up to 10 replies per thread. `1000,all,100` will get a maximum of 1000 comments, with a maximum of 100 replies total
* `formats`: Change the types of formats to return. `dashy` (convert HTTP to DASH), `duplicate` (identical content but different URLs or protocol; includes `dashy`), `incomplete` (cannot be downloaded completely - live dash and post-live m3u8)
* `formats`: Change the types of formats to return. `dashy` (convert HTTP to DASH), `duplicate` (identical content but different URLs or protocol; includes `dashy`), `incomplete` (cannot be downloaded completely - live dash and post-live m3u8), `missing_pot` (include formats that require a PO Token but are missing one)
* `innertube_host`: Innertube API host to use for all API requests; e.g. `studio.youtube.com`, `youtubei.googleapis.com`. Note that cookies exported from one subdomain will not work on others
* `innertube_key`: Innertube API key to use for all API requests. By default, no API key is used
* `raise_incomplete_data`: `Incomplete Data Received` raises an error instead of reporting a warning
* `data_sync_id`: Overrides the account Data Sync ID used in Innertube API requests. This may be needed if you are using an account with `youtube:player_skip=webpage,configs` or `youtubetab:skip=webpage`
* `visitor_data`: Overrides the Visitor Data used in Innertube API requests. This should be used with `player_skip=webpage,configs` and without cookies. Note: this may have adverse effects if used improperly. If a session from a browser is wanted, you should pass cookies instead (which contain the Visitor ID)
* `po_token`: Proof of Origin (PO) Token(s) to use for requesting video playback. Comma seperated list of PO Tokens in the format `CLIENT+PO_TOKEN`, e.g. `youtube:po_token=web+XXX,android+YYY`
* `po_token`: Proof of Origin (PO) Token(s) to use. Comma-separated list of PO Tokens in the format `CLIENT.CONTEXT+PO_TOKEN`, e.g. `youtube:po_token=web.gvs+XXX,web.player+XXX,web_safari.gvs+YYY`. Context can be either `gvs` (Google Video Server URLs) or `player` (Innertube player request)
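As a rough illustration of the `CLIENT.CONTEXT+PO_TOKEN` format, the sketch below splits a `po_token` value into per-client, per-context entries. The helper name `parse_po_tokens` and its strict validation are assumptions made for this example; yt-dlp's actual handling may be more lenient.

```python
VALID_CONTEXTS = ("gvs", "player")


def parse_po_tokens(value: str) -> dict[tuple[str, str], str]:
    """Map (client, context) -> token for a comma-separated po_token value.

    Hypothetical helper; it follows only the documented format.
    """
    tokens: dict[tuple[str, str], str] = {}
    for entry in value.split(","):
        target, plus, token = entry.partition("+")
        client, dot, context = target.partition(".")
        if not (plus and dot and client and token):
            raise ValueError(f"expected CLIENT.CONTEXT+PO_TOKEN, got {entry!r}")
        if context not in VALID_CONTEXTS:
            raise ValueError(f"unknown context {context!r} in {entry!r}")
        tokens[(client, context)] = token
    return tokens


print(parse_po_tokens("web.gvs+XXX,web.player+XXX,web_safari.gvs+YYY"))
# {('web', 'gvs'): 'XXX', ('web', 'player'): 'XXX', ('web_safari', 'gvs'): 'YYY'}
```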
#### youtubetab (YouTube playlists, channels, feeds, etc.)
* `skip`: One or more of `webpage` (skip initial webpage download), `authcheck` (allow the download of playlists requiring authentication when no initial webpage is downloaded. This may cause unwanted behavior, see [#1122](https://github.com/yt-dlp/yt-dlp/pull/1122) for more details)
@ -1796,13 +1795,6 @@ The following extractors use this feature:
* `is_live`: Bypass live HLS detection and manually set `live_status` - a value of `false` will set `not_live`, any other value (or no value) will set `is_live`
* `impersonate`: Target(s) to try and impersonate with the initial webpage request; e.g. `generic:impersonate=safari,chrome-110`. Use `generic:impersonate` to impersonate any available target, and use `generic:impersonate=false` to disable impersonation (default)
#### funimation
* `language`: Audio languages to extract, e.g. `funimation:language=english,japanese`
* `version`: The video version to extract - `uncut` or `simulcast`
#### crunchyrollbeta (Crunchyroll)
* `hardsub`: One or more hardsub versions to extract (in order of preference), or `all` (default: `None` = no hardsubs will be extracted), e.g. `crunchyrollbeta:hardsub=en-US,de-DE`
#### vikichannel
* `video_types`: Types of videos to download - one or more of `episodes`, `movies`, `clips`, `trailers`
@ -1860,7 +1852,7 @@ The following extractors use this feature:
* `cdn`: One or more CDN IDs to use with the API call for stream URLs, e.g. `gcp_cdn`, `gs_cdn_pc_app`, `gs_cdn_mobile_web`, `gs_cdn_pc_web`
#### soundcloud
* `formats`: Formats to request from the API. Requested values should be in the format of `{protocol}_{extension}` (omitting the bitrate), e.g. `hls_opus,http_aac`. The `*` character functions as a wildcard, e.g. `*_mp3`, and can be passed by itself to request all formats. Known protocols include `http`, `hls` and `hls-aes`; known extensions include `aac`, `opus` and `mp3`. Original `download` formats are always extracted. Default is `http_aac,hls_aac,http_opus,hls_opus,http_mp3,hls_mp3`
* `formats`: Formats to request from the API. Requested values should be in the format of `{protocol}_{codec}`, e.g. `hls_opus,http_aac`. The `*` character functions as a wildcard, e.g. `*_mp3`, and can be passed by itself to request all formats. Known protocols include `http`, `hls` and `hls-aes`; known codecs include `aac`, `opus` and `mp3`. Original `download` formats are always extracted. Default is `http_aac,hls_aac,http_opus,hls_opus,http_mp3,hls_mp3`
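
As a rough illustration of how `{protocol}_{codec}` values with the `*` wildcard could be matched against the default list, the sketch below uses the standard library's `fnmatch` module. The names `DEFAULT_FORMATS` and `select_formats` are hypothetical, and `fnmatch` is a stand-in that also gives `?` and `[...]` special meaning; the extractor's real matching logic may differ.

```python
from fnmatch import fnmatchcase

# Default order as listed in the README entry above
DEFAULT_FORMATS = ('http_aac', 'hls_aac', 'http_opus', 'hls_opus', 'http_mp3', 'hls_mp3')


def select_formats(requested, available=DEFAULT_FORMATS):
    """Return the available formats matching any comma-separated pattern."""
    patterns = requested.split(',')
    return [fmt for fmt in available
            if any(fnmatchcase(fmt, pattern) for pattern in patterns)]


if __name__ == '__main__':
    print(select_formats('*_mp3'))
    print(select_formats('hls_opus,http_aac'))
```

The result keeps the order of `available`, not the order in which patterns were requested.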
#### orfon (orf:on)
* `prefer_segments_playlist`: Prefer a playlist of program segments instead of a single complete video when available. If individual segments are desired, use `--concat-playlist never --extractor-args "orfon:prefer_segments_playlist"`

View file

@ -239,5 +239,11 @@
"action": "add",
"when": "52c0ffe40ad6e8404d93296f575007b05b04c686",
"short": "[priority] **Login with OAuth is no longer supported for YouTube**\nDue to a change made by the site, yt-dlp is no longer able to support OAuth login for YouTube. [Read more](https://github.com/yt-dlp/yt-dlp/issues/11462#issuecomment-2471703090)"
},
{
"action": "change",
"when": "76ac023ff02f06e8c003d104f02a03deeddebdcd",
"short": "[ie/youtube:tab] Improve shorts title extraction (#11997)",
"authors": ["bashonly", "d3d9"]
}
]

View file

@ -76,7 +76,7 @@ dev = [
]
static-analysis = [
"autopep8~=2.0",
"ruff~=0.7.0",
"ruff~=0.9.0",
]
test = [
"pytest~=8.1",
@ -186,6 +186,7 @@ ignore = [
"E501", # line-too-long
"E731", # lambda-assignment
"E741", # ambiguous-variable-name
"UP031", # printf-string-formatting
"UP036", # outdated-version-block
"B006", # mutable-argument-default
"B008", # function-call-in-default-argument
@ -194,6 +195,7 @@ ignore = [
"B023", # function-uses-loop-variable (false positives)
"B028", # no-explicit-stacklevel
"B904", # raise-without-from-inside-except
"A005", # stdlib-module-shadowing
"C401", # unnecessary-generator-set
"C402", # unnecessary-generator-dict
"PIE790", # unnecessary-placeholder
@ -258,9 +260,6 @@ select = [
"A002", # builtin-argument-shadowing
"C408", # unnecessary-collection-call
]
"yt_dlp/jsinterp.py" = [
"UP031", # printf-string-formatting
]
[tool.ruff.lint.isort]
known-first-party = [

View file

@ -171,6 +171,7 @@
- **BilibiliCheese**
- **BilibiliCheeseSeason**
- **BilibiliCollectionList**
- **BiliBiliDynamic**
- **BilibiliFavoritesList**
- **BiliBiliPlayer**
- **BilibiliPlaylist**
@ -303,10 +304,6 @@
- **CrowdBunker**
- **CrowdBunkerChannel**
- **Crtvg**
- **crunchyroll**: [*crunchyroll*](## "netrc machine")
- **crunchyroll:artist**: [*crunchyroll*](## "netrc machine")
- **crunchyroll:music**: [*crunchyroll*](## "netrc machine")
- **crunchyroll:playlist**: [*crunchyroll*](## "netrc machine")
- **CSpan**: C-SPAN
- **CSpanCongress**
- **CtsNews**: 華視新聞
@ -374,6 +371,7 @@
- **Dropbox**
- **Dropout**: [*dropout*](## "netrc machine")
- **DropoutSeason**
- **DrTalks**
- **DrTuber**
- **drtv**
- **drtv:live**
@ -392,6 +390,8 @@
- **Ebay**
- **egghead:course**: egghead.io course
- **egghead:lesson**: egghead.io lesson
- **eggs:artist**
- **eggs:single**
- **EinsUndEinsTV**: [*1und1tv*](## "netrc machine")
- **EinsUndEinsTVLive**: [*1und1tv*](## "netrc machine")
- **EinsUndEinsTVRecordings**: [*1und1tv*](## "netrc machine")
@ -476,9 +476,6 @@
- **FrontendMastersCourse**: [*frontendmasters*](## "netrc machine")
- **FrontendMastersLesson**: [*frontendmasters*](## "netrc machine")
- **FujiTVFODPlus7**
- **Funimation**: [*funimation*](## "netrc machine")
- **funimation:page**: [*funimation*](## "netrc machine")
- **funimation:show**: [*funimation*](## "netrc machine")
- **Funk**
- **Funker530**
- **Fux**
@ -891,6 +888,8 @@
- **nebula:video**: [*watchnebula*](## "netrc machine")
- **NekoHacker**
- **NerdCubedFeed**
- **Nest**
- **NestClip**
- **netease:album**: 网易云音乐 - 专辑
- **netease:djradio**: 网易云音乐 - 电台
- **netease:mv**: 网易云音乐 - MV
@ -1070,6 +1069,8 @@
- **Pinkbike**
- **Pinterest**
- **PinterestCollection**
- **PiramideTV**
- **PiramideTVChannel**
- **pixiv:sketch**
- **pixiv:sketch:user**
- **Pladform**
@ -1086,6 +1087,7 @@
- **pluralsight**: [*pluralsight*](## "netrc machine")
- **pluralsight:course**
- **PlutoTV**: (**Currently broken**)
- **PlVideo**: Платформа
- **PodbayFM**
- **PodbayFMChannel**
- **Podchaser**
@ -1394,6 +1396,8 @@
- **StretchInternet**
- **Stripchat**
- **stv:player**
- **Subsplash**
- **subsplash:playlist**
- **Substack**
- **SunPorno**
- **sverigesradio:episode**
@ -1641,8 +1645,6 @@
- **Vimm:stream**
- **ViMP**
- **ViMP:Playlist**
- **Vine**
- **vine:user**
- **Viously**
- **Viqeo**: (**Currently broken**)
- **Viu**

View file

@ -486,11 +486,11 @@ class TestFormatSelection(unittest.TestCase):
def test_format_filtering(self):
formats = [
{'format_id': 'A', 'filesize': 500, 'width': 1000},
{'format_id': 'B', 'filesize': 1000, 'width': 500},
{'format_id': 'C', 'filesize': 1000, 'width': 400},
{'format_id': 'D', 'filesize': 2000, 'width': 600},
{'format_id': 'E', 'filesize': 3000},
{'format_id': 'A', 'filesize': 500, 'width': 1000, 'aspect_ratio': 1.0},
{'format_id': 'B', 'filesize': 1000, 'width': 500, 'aspect_ratio': 1.33},
{'format_id': 'C', 'filesize': 1000, 'width': 400, 'aspect_ratio': 1.5},
{'format_id': 'D', 'filesize': 2000, 'width': 600, 'aspect_ratio': 1.78},
{'format_id': 'E', 'filesize': 3000, 'aspect_ratio': 0.56},
{'format_id': 'F'},
{'format_id': 'G', 'filesize': 1000000},
]
@ -549,6 +549,31 @@ class TestFormatSelection(unittest.TestCase):
ydl.process_ie_result(info_dict)
self.assertEqual(ydl.downloaded_info_dicts, [])
ydl = YDL({'format': 'best[aspect_ratio=1]'})
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'A')
ydl = YDL({'format': 'all[aspect_ratio > 1.00]'})
ydl.process_ie_result(info_dict)
downloaded_ids = [info['format_id'] for info in ydl.downloaded_info_dicts]
self.assertEqual(downloaded_ids, ['D', 'C', 'B'])
ydl = YDL({'format': 'all[aspect_ratio < 1.00]'})
ydl.process_ie_result(info_dict)
downloaded_ids = [info['format_id'] for info in ydl.downloaded_info_dicts]
self.assertEqual(downloaded_ids, ['E'])
ydl = YDL({'format': 'best[aspect_ratio=1.5]'})
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'C')
ydl = YDL({'format': 'all[aspect_ratio!=1]'})
ydl.process_ie_result(info_dict)
downloaded_ids = [info['format_id'] for info in ydl.downloaded_info_dicts]
self.assertEqual(downloaded_ids, ['E', 'D', 'C', 'B'])
@patch('yt_dlp.postprocessor.ffmpeg.FFmpegMergerPP.available', False)
def test_default_format_spec_without_ffmpeg(self):
ydl = YDL({})
@ -761,6 +786,13 @@ class TestYoutubeDL(unittest.TestCase):
test('%(width)06d.%%(ext)s', 'NA.%(ext)s')
test('%%(width)06d.%(ext)s', '%(width)06d.mp4')
# Sanitization options
test('%(title3)s', (None, 'foobartest'))
test('%(title5)s', (None, 'aei_A'), restrictfilenames=True)
test('%(title3)s', (None, 'foo_bar_test'), windowsfilenames=False, restrictfilenames=True)
if sys.platform != 'win32':
test('%(title3)s', (None, 'foobar\\test'), windowsfilenames=False)
# ID sanitization
test('%(id)s', '_abcd', info={'id': '_abcd'})
test('%(some_id)s', '_abcd', info={'some_id': '_abcd'})

View file

@ -249,17 +249,36 @@ class TestUtil(unittest.TestCase):
self.assertEqual(sanitize_path('abc/def...'), 'abc\\def..#')
self.assertEqual(sanitize_path('abc.../def'), 'abc..#\\def')
self.assertEqual(sanitize_path('abc.../def...'), 'abc..#\\def..#')
self.assertEqual(sanitize_path('../abc'), '..\\abc')
self.assertEqual(sanitize_path('../../abc'), '..\\..\\abc')
self.assertEqual(sanitize_path('./abc'), 'abc')
self.assertEqual(sanitize_path('./../abc'), '..\\abc')
self.assertEqual(sanitize_path('\\abc'), '\\abc')
self.assertEqual(sanitize_path('C:abc'), 'C:abc')
self.assertEqual(sanitize_path('C:abc\\..\\'), 'C:..')
self.assertEqual(sanitize_path('C:\\abc:%(title)s.%(ext)s'), 'C:\\abc#%(title)s.%(ext)s')
# Check with nt._path_normpath if available
try:
import nt
nt_path_normpath = getattr(nt, '_path_normpath', None)
except Exception:
nt_path_normpath = None
for test, expected in [
('C:\\', 'C:\\'),
('../abc', '..\\abc'),
('../../abc', '..\\..\\abc'),
('./abc', 'abc'),
('./../abc', '..\\abc'),
('\\abc', '\\abc'),
('C:abc', 'C:abc'),
('C:abc\\..\\', 'C:'),
('C:abc\\..\\def\\..\\..\\', 'C:..'),
('C:\\abc\\xyz///..\\def\\', 'C:\\abc\\def'),
('abc/../', '.'),
('./abc/../', '.'),
]:
result = sanitize_path(test)
assert result == expected, f'{test} was incorrectly resolved'
assert result == sanitize_path(result), f'{test} changed after sanitizing again'
if nt_path_normpath:
assert result == nt_path_normpath(test), f'{test} does not match nt._path_normpath'
def test_sanitize_url(self):
self.assertEqual(sanitize_url('//foo.bar'), 'http://foo.bar')
self.assertEqual(sanitize_url('httpss://foo.bar'), 'https://foo.bar')
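
The new `sanitize_path` cases are cross-checked against `nt._path_normpath` where that private function exists. For readers without a Windows interpreter, the sketch below runs a few of the same inputs through `ntpath.normpath`, which is importable on every platform. Using it as a stand-in is an assumption for illustration; it only shows the normalization the expected values encode and does not exercise `sanitize_path` itself.

```python
import ntpath

# A subset of the (input, expected) pairs from the test table above
CASES = {
    '../abc': '..\\abc',
    './abc': 'abc',
    'abc/../': '.',
    'C:abc\\..\\': 'C:',
    'C:\\abc\\xyz///..\\def\\': 'C:\\abc\\def',
}


def find_mismatches(cases=CASES):
    """Return the inputs whose ntpath.normpath result differs from the expectation."""
    return [raw for raw, expected in cases.items()
            if ntpath.normpath(raw) != expected]


if __name__ == '__main__':
    print(find_mismatches() or 'all cases agree')
```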

View file

@ -68,6 +68,16 @@ _SIG_TESTS = [
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'AOq0QJ8wRAIgXmPlOPSBkkUs1bYFYlJCfe29xx8j7v1pDL2QwbdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_EMu-m37KtXJoOySqa0',
),
(
'https://www.youtube.com/s/player/3bb1f723/player_ias.vflset/en_US/base.js',
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'MyOSJXtKI3m-uME_jv7-pT12gOFC02RFkGoqWpzE0Cs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
),
(
'https://www.youtube.com/s/player/2f1832d2/player_ias.vflset/en_US/base.js',
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'0QJ8wRAIgXmPlOPSBkkUs1bYFYlJCfe29xxAj7v1pDL0QwbdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_EMu-m37KtXJ2OySqa0q',
),
]
_NSIG_TESTS = [
@ -183,6 +193,14 @@ _NSIG_TESTS = [
'https://www.youtube.com/s/player/b12cc44b/player_ias.vflset/en_US/base.js',
'keLa5R2U00sR9SQK', 'N1OGyujjEwMnLw',
),
(
'https://www.youtube.com/s/player/3bb1f723/player_ias.vflset/en_US/base.js',
'gK15nzVyaXE9RsMP3z', 'ZFFWFLPWx9DEgQ',
),
(
'https://www.youtube.com/s/player/2f1832d2/player_ias.vflset/en_US/base.js',
'YWt1qdbe8SAfkoPHW5d', 'RrRjWQOJmBiP',
),
]
@ -254,8 +272,11 @@ def signature(jscode, sig_input):
def n_sig(jscode, sig_input):
funcname = YoutubeIE(FakeYDL())._extract_n_function_name(jscode)
return JSInterpreter(jscode).call_function(funcname, sig_input)
ie = YoutubeIE(FakeYDL())
funcname = ie._extract_n_function_name(jscode)
jsi = JSInterpreter(jscode)
func = jsi.extract_function_from_code(*ie._fixup_n_function_code(*jsi.extract_function_code(funcname)))
return func([sig_input])
make_sig_test = t_factory(

View file

@ -266,7 +266,9 @@ class YoutubeDL:
outtmpl_na_placeholder: Placeholder for unavailable meta fields.
restrictfilenames: Do not allow "&" and spaces in file names
trim_file_name: Limit length of filename (extension excluded)
windowsfilenames: Force the filenames to be windows compatible
windowsfilenames: True: Force filenames to be Windows compatible
False: Sanitize filenames only minimally
This option has no effect when running on Windows
ignoreerrors: Do not stop on download/postprocessing errors.
Can be 'only_download' to ignore only download errors.
Default is 'only_download' for CLI, but False for API
@ -281,7 +283,10 @@ class YoutubeDL:
lazy_playlist: Process playlist entries as they are received.
matchtitle: Download only matching titles.
rejecttitle: Reject downloads for matching titles.
logger: Log messages to a logging.Logger instance.
logger: A class having a `debug`, `warning` and `error` function where
each has a single string parameter, the message to be logged.
For compatibility reasons, both debug and info messages are passed to `debug`.
A debug message will have a prefix of `[debug] ` to discern it from info messages.
logtostderr: Print everything to stderr instead of stdout.
consoletitle: Display progress in the console window's titlebar.
writedescription: Write the video description to a .description file
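
The reworded `logger` description can be read as a small duck-typed interface. A minimal sketch of an object that satisfies it follows; the class name `ListLogger` and its attributes are hypothetical, and it simply records messages in lists. It relies on the `[debug] ` prefix described in the docstring to separate debug output from info output.

```python
class ListLogger:
    """Collects messages instead of printing them (illustrative only)."""

    def __init__(self):
        self.debug_messages = []
        self.info_messages = []
        self.warnings = []
        self.errors = []

    def debug(self, msg):
        # Both debug and info messages arrive here; the prefix tells them apart
        if msg.startswith('[debug] '):
            self.debug_messages.append(msg)
        else:
            self.info_messages.append(msg)

    def warning(self, msg):
        self.warnings.append(msg)

    def error(self, msg):
        self.errors.append(msg)
```

An instance of such a class would be supplied through the `logger` key of the params dict described in this docstring.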
@ -1116,7 +1121,7 @@ class YoutubeDL:
def raise_no_formats(self, info, forced=False, *, msg=None):
has_drm = info.get('_has_drm')
ignored, expected = self.params.get('ignore_no_formats_error'), bool(msg)
msg = msg or has_drm and 'This video is DRM protected' or 'No video formats found!'
msg = msg or (has_drm and 'This video is DRM protected') or 'No video formats found!'
if forced or not ignored:
raise ExtractorError(msg, video_id=info['id'], ie=info['extractor'],
expected=has_drm or ignored or expected)
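
The parentheses added to the `msg = ...` line are cosmetic: Python's `and` binds more tightly than `or`, so both spellings group the same way. The sketch below checks that on a small grid of inputs; the function names `message_implicit` and `message_explicit` are hypothetical wrappers around the two expressions from this hunk.

```python
import itertools


def message_implicit(msg, has_drm):
    # Old spelling, relying on operator precedence
    return msg or has_drm and 'This video is DRM protected' or 'No video formats found!'


def message_explicit(msg, has_drm):
    # New spelling, with the grouping written out
    return msg or (has_drm and 'This video is DRM protected') or 'No video formats found!'


def spellings_agree():
    """Compare both spellings over falsy and truthy combinations."""
    grid = itertools.product([None, '', 'custom'], [None, False, True])
    return all(message_implicit(m, d) == message_explicit(m, d) for m, d in grid)
```

The same reasoning applies to the other hunks in this commit that only wrap an `and` operand of an `or` in parentheses.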
@ -1192,8 +1197,7 @@ class YoutubeDL:
def prepare_outtmpl(self, outtmpl, info_dict, sanitize=False):
""" Make the outtmpl and info_dict suitable for substitution: ydl.escape_outtmpl(outtmpl) % info_dict
@param sanitize Whether to sanitize the output as a filename.
For backward compatibility, a function can also be passed
@param sanitize Whether to sanitize the output as a filename
"""
info_dict.setdefault('epoch', int(time.time())) # keep epoch consistent once set
@ -1309,14 +1313,23 @@ class YoutubeDL:
na = self.params.get('outtmpl_na_placeholder', 'NA')
def filename_sanitizer(key, value, restricted=self.params.get('restrictfilenames')):
def filename_sanitizer(key, value, restricted):
return sanitize_filename(str(value), restricted=restricted, is_id=(
bool(re.search(r'(^|[_.])id(\.|$)', key))
if 'filename-sanitization' in self.params['compat_opts']
else NO_DEFAULT))
sanitizer = sanitize if callable(sanitize) else filename_sanitizer
sanitize = bool(sanitize)
if callable(sanitize):
self.deprecation_warning('Passing a callable "sanitize" to YoutubeDL.prepare_outtmpl is deprecated')
elif not sanitize:
pass
elif (sys.platform != 'win32' and not self.params.get('restrictfilenames')
and self.params.get('windowsfilenames') is False):
def sanitize(key, value):
return str(value).replace('/', '\u29F8').replace('\0', '')
else:
def sanitize(key, value):
return filename_sanitizer(key, value, restricted=self.params.get('restrictfilenames'))
def _dumpjson_default(obj):
if isinstance(obj, (set, LazyList)):
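
The minimal branch above (non-Windows, `windowsfilenames` explicitly `False`, `restrictfilenames` unset) only makes a value safe to use as a single path component. Pulled out into a standalone function, it can be sketched as follows; the name `minimal_sanitize` is hypothetical, and the body mirrors the two `replace` calls in the hunk.

```python
def minimal_sanitize(value):
    """Replace the path separator with U+29F8 and drop NUL bytes."""
    return str(value).replace('/', '\u29F8').replace('\0', '')


if __name__ == '__main__':
    # Backslashes and other characters are left untouched in this mode
    print(minimal_sanitize('foo/bar\\test'))
```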
@ -1399,13 +1412,13 @@ class YoutubeDL:
if sanitize:
# If value is an object, sanitize might convert it to a string
# So we convert it to repr first
# So we manually convert it before sanitizing
if fmt[-1] == 'r':
value, fmt = repr(value), str_fmt
elif fmt[-1] == 'a':
value, fmt = ascii(value), str_fmt
if fmt[-1] in 'csra':
value = sanitizer(last_field, value)
value = sanitize(last_field, value)
key = '{}\0{}'.format(key.replace('%', '%\0'), outer_mobj.group('format'))
TMPL_DICT[key] = value
@ -2108,7 +2121,7 @@ class YoutubeDL:
m = operator_rex.fullmatch(filter_spec)
if m:
try:
comparison_value = int(m.group('value'))
comparison_value = float(m.group('value'))
except ValueError:
comparison_value = parse_filesize(m.group('value'))
if comparison_value is None:
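
Why the switch from `int` to `float` matters: `int('1.5')` raises `ValueError`, so a filter such as `aspect_ratio=1.5` previously fell through to the file size parser. A minimal sketch of the difference follows; the helper `parse_comparison_value` is hypothetical and returns `None` at the point where the real code would go on to try `parse_filesize`.

```python
def parse_comparison_value(text, convert):
    """Try a numeric conversion; None marks the fallback path."""
    try:
        return convert(text)
    except ValueError:
        return None  # the real code tries parse_filesize() next


if __name__ == '__main__':
    print(parse_comparison_value('1.5', int))    # conversion fails
    print(parse_comparison_value('1.5', float))  # conversion succeeds
```

Integer-looking values still compare as before, since `float('1') == 1` holds in Python.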
@ -2196,7 +2209,7 @@ class YoutubeDL:
def _default_format_spec(self, info_dict):
prefer_best = (
self.params['outtmpl']['default'] == '-'
or info_dict.get('is_live') and not self.params.get('live_from_start'))
or (info_dict.get('is_live') and not self.params.get('live_from_start')))
def can_merge():
merger = FFmpegMergerPP(self)
@ -2365,7 +2378,7 @@ class YoutubeDL:
vexts=[f['ext'] for f in video_fmts],
aexts=[f['ext'] for f in audio_fmts],
preferences=(try_call(lambda: self.params['merge_output_format'].split('/'))
or self.params.get('prefer_free_formats') and ('webm', 'mkv')))
or (self.params.get('prefer_free_formats') and ('webm', 'mkv'))))
filtered = lambda *keys: filter(None, (traverse_obj(fmt, *keys) for fmt in formats_info))
@ -3541,8 +3554,8 @@ class YoutubeDL:
and info_dict.get('container') == 'm4a_dash',
'writing DASH m4a. Only some players support this container',
FFmpegFixupM4aPP)
ffmpeg_fixup(downloader == 'hlsnative' and not self.params.get('hls_use_mpegts')
or info_dict.get('is_live') and self.params.get('hls_use_mpegts') is None,
ffmpeg_fixup((downloader == 'hlsnative' and not self.params.get('hls_use_mpegts'))
or (info_dict.get('is_live') and self.params.get('hls_use_mpegts') is None),
'Possible MPEG-TS in MP4 container or malformed AAC timestamps',
FFmpegFixupM3u8PP)
ffmpeg_fixup(downloader == 'dashsegments'

View file

@ -261,9 +261,11 @@ def validate_options(opts):
elif value in ('inf', 'infinite'):
return float('inf')
try:
return int(value)
int_value = int(value)
except (TypeError, ValueError):
validate(False, f'{name} retry count', value)
validate_positive(f'{name} retry count', int_value)
return int_value
opts.retries = parse_retries('download', opts.retries)
opts.fragment_retries = parse_retries('fragment', opts.fragment_retries)
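
The retry parsing now validates the integer after converting it instead of returning it directly. A standalone sketch of that flow is below. It is written under assumptions: the leading `None` check is not visible in this hunk, `ValueError` stands in for the option parser's own error reporting, and treating zero as acceptable while rejecting negatives is a guess at what `validate_positive` enforces.

```python
def parse_retries(name, value):
    """Sketch of the retry-count parsing, with ValueError as a stand-in error."""
    if value is None:  # assumption: this branch is outside the hunk shown
        return None
    if value in ('inf', 'infinite'):
        return float('inf')
    try:
        int_value = int(value)
    except (TypeError, ValueError):
        raise ValueError(f'invalid {name} retry count: {value!r}') from None
    # Assumption: zero is allowed, negative counts are rejected
    if int_value < 0:
        raise ValueError(f'{name} retry count must not be negative: {value!r}')
    return int_value
```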
@ -1062,7 +1064,7 @@ def _real_main(argv=None):
# If we only have a single process attached, then the executable was double clicked
# When using `pyinstaller` with `--onefile`, two processes get attached
is_onefile = hasattr(sys, '_MEIPASS') and os.path.basename(sys._MEIPASS).startswith('_MEI')
if attached_processes == 1 or is_onefile and attached_processes == 2:
if attached_processes == 1 or (is_onefile and attached_processes == 2):
print(parser._generate_error_message(
'Do not double-click the executable, instead call it from a command line.\n'
'Please read the README for further information on how to use yt-dlp: '
@ -1109,9 +1111,9 @@ def main(argv=None):
from .extractor import gen_extractors, list_extractors
__all__ = [
'main',
'YoutubeDL',
'parse_options',
'gen_extractors',
'list_extractors',
'main',
'parse_options',
]

View file

@ -534,19 +534,17 @@ def ghash(subkey, data):
__all__ = [
'aes_cbc_decrypt',
'aes_cbc_decrypt_bytes',
'aes_ctr_decrypt',
'aes_decrypt_text',
'aes_decrypt',
'aes_ecb_decrypt',
'aes_gcm_decrypt_and_verify',
'aes_gcm_decrypt_and_verify_bytes',
'aes_cbc_encrypt',
'aes_cbc_encrypt_bytes',
'aes_ctr_decrypt',
'aes_ctr_encrypt',
'aes_decrypt',
'aes_decrypt_text',
'aes_ecb_decrypt',
'aes_ecb_encrypt',
'aes_encrypt',
'aes_gcm_decrypt_and_verify',
'aes_gcm_decrypt_and_verify_bytes',
'key_expansion',
'pad_block',
'pkcs7_padding',

View file

@ -195,7 +195,10 @@ def _extract_firefox_cookies(profile, container, logger):
def _firefox_browser_dirs():
if sys.platform in ('cygwin', 'win32'):
yield os.path.expandvars(R'%APPDATA%\Mozilla\Firefox\Profiles')
yield from map(os.path.expandvars, (
R'%APPDATA%\Mozilla\Firefox\Profiles',
R'%LOCALAPPDATA%\Packages\Mozilla.Firefox_n80bbvh6b1yt2\LocalCache\Roaming\Mozilla\Firefox\Profiles',
))
elif sys.platform == 'darwin':
yield os.path.expanduser('~/Library/Application Support/Firefox/Profiles')
@ -1276,8 +1279,8 @@ class YoutubeDLCookieJar(http.cookiejar.MozillaCookieJar):
def _really_save(self, f, ignore_discard, ignore_expires):
now = time.time()
for cookie in self:
if (not ignore_discard and cookie.discard
or not ignore_expires and cookie.is_expired(now)):
if ((not ignore_discard and cookie.discard)
or (not ignore_expires and cookie.is_expired(now))):
continue
name, value = cookie.name, cookie.value
if value is None:

View file

@ -119,12 +119,12 @@ class HlsFD(FragmentFD):
self.to_screen(f'[{self.FD_NAME}] Fragment downloads will be delegated to {real_downloader.get_basename()}')
def is_ad_fragment_start(s):
return (s.startswith('#ANVATO-SEGMENT-INFO') and 'type=ad' in s
or s.startswith('#UPLYNK-SEGMENT') and s.endswith(',ad'))
return ((s.startswith('#ANVATO-SEGMENT-INFO') and 'type=ad' in s)
or (s.startswith('#UPLYNK-SEGMENT') and s.endswith(',ad')))
def is_ad_fragment_end(s):
return (s.startswith('#ANVATO-SEGMENT-INFO') and 'type=master' in s
or s.startswith('#UPLYNK-SEGMENT') and s.endswith(',segment'))
return ((s.startswith('#ANVATO-SEGMENT-INFO') and 'type=master' in s)
or (s.startswith('#UPLYNK-SEGMENT') and s.endswith(',segment')))
fragments = []

View file

@ -123,8 +123,8 @@ class YoutubeLiveChatFD(FragmentFD):
data,
lambda x: x['continuationContents']['liveChatContinuation'], dict) or {}
func = (info_dict['protocol'] == 'youtube_live_chat' and parse_actions_live
or frag_index == 1 and try_refresh_replay_beginning
func = ((info_dict['protocol'] == 'youtube_live_chat' and parse_actions_live)
or (frag_index == 1 and try_refresh_replay_beginning)
or parse_actions_replay)
return (True, *func(live_chat_continuation))
except HTTPError as err:

View file

@ -259,6 +259,7 @@ from .bilibili import (
BilibiliCheeseIE,
BilibiliCheeseSeasonIE,
BilibiliCollectionListIE,
BiliBiliDynamicIE,
BilibiliFavoritesListIE,
BiliBiliIE,
BiliBiliPlayerIE,
@ -443,12 +444,6 @@ from .crowdbunker import (
CrowdBunkerIE,
)
from .crtvg import CrtvgIE
from .crunchyroll import (
CrunchyrollArtistIE,
CrunchyrollBetaIE,
CrunchyrollBetaShowIE,
CrunchyrollMusicIE,
)
from .cspan import (
CSpanCongressIE,
CSpanIE,
@ -558,6 +553,7 @@ from .dropout import (
DropoutIE,
DropoutSeasonIE,
)
from .drtalks import DrTalksIE
from .drtuber import DrTuberIE
from .drtv import (
DRTVIE,
@ -587,6 +583,10 @@ from .egghead import (
EggheadCourseIE,
EggheadLessonIE,
)
from .eggs import (
EggsArtistIE,
EggsIE,
)
from .eighttracks import EightTracksIE
from .eitb import EitbIE
from .elementorembed import ElementorEmbedIE
@ -702,11 +702,6 @@ from .frontendmasters import (
FrontendMastersLessonIE,
)
from .fujitv import FujiTVFODPlus7IE
from .funimation import (
FunimationIE,
FunimationPageIE,
FunimationShowIE,
)
from .funk import FunkIE
from .funker530 import Funker530IE
from .fuyintv import FuyinTVIE
@ -1281,6 +1276,10 @@ from .nebula import (
)
from .nekohacker import NekoHackerIE
from .nerdcubed import NerdCubedFeedIE
from .nest import (
NestClipIE,
NestIE,
)
from .neteasemusic import (
NetEaseMusicAlbumIE,
NetEaseMusicDjRadioIE,
@ -1535,6 +1534,10 @@ from .pinterest import (
PinterestCollectionIE,
PinterestIE,
)
from .piramidetv import (
PiramideTVChannelIE,
PiramideTVIE,
)
from .pixivsketch import (
PixivSketchIE,
PixivSketchUserIE,
@ -1554,6 +1557,7 @@ from .pluralsight import (
PluralsightIE,
)
from .plutotv import PlutoTVIE
from .plvideo import PlVideoIE
from .podbayfm import (
PodbayFMChannelIE,
PodbayFMIE,
@ -1984,6 +1988,10 @@ from .streetvoice import StreetVoiceIE
from .stretchinternet import StretchInternetIE
from .stripchat import StripchatIE
from .stv import STVPlayerIE
from .subsplash import (
SubsplashIE,
SubsplashPlaylistIE,
)
from .substack import SubstackIE
from .sunporno import SunPornoIE
from .sverigesradio import (
@ -2357,10 +2365,6 @@ from .vimm import (
VimmIE,
VimmRecordingIE,
)
from .vine import (
VineIE,
VineUserIE,
)
from .viously import ViouslyIE
from .viqeo import ViqeoIE
from .viu import (

View file

@ -421,14 +421,15 @@ class AbemaTVIE(AbemaTVBaseIE):
class AbemaTVTitleIE(AbemaTVBaseIE):
_VALID_URL = r'https?://abema\.tv/video/title/(?P<id>[^?/]+)'
_VALID_URL = r'https?://abema\.tv/video/title/(?P<id>[^?/#]+)/?(?:\?(?:[^#]+&)?s=(?P<season>[^&#]+))?'
_PAGE_SIZE = 25
_TESTS = [{
'url': 'https://abema.tv/video/title/90-1597',
'url': 'https://abema.tv/video/title/90-1887',
'info_dict': {
'id': '90-1597',
'id': '90-1887',
'title': 'シャッフルアイランド',
'description': 'md5:61b2425308f41a5282a926edda66f178',
},
'playlist_mincount': 2,
}, {
@ -436,41 +437,54 @@ class AbemaTVTitleIE(AbemaTVBaseIE):
'info_dict': {
'id': '193-132',
'title': '真心が届く~僕とスターのオフィス・ラブ!?~',
'description': 'md5:9b59493d1f3a792bafbc7319258e7af8',
},
'playlist_mincount': 16,
}, {
'url': 'https://abema.tv/video/title/25-102',
'url': 'https://abema.tv/video/title/25-1nzan-whrxe',
'info_dict': {
'id': '25-102',
'title': 'ソードアート・オンライン アリシゼーション',
'id': '25-1nzan-whrxe',
'title': 'ソードアート・オンライン',
'description': 'md5:c094904052322e6978495532bdbf06e6',
},
'playlist_mincount': 24,
'playlist_mincount': 25,
}, {
'url': 'https://abema.tv/video/title/26-2mzbynr-cph?s=26-2mzbynr-cph_s40',
'info_dict': {
'title': '〈物語〉シリーズ',
'id': '26-2mzbynr-cph',
'description': 'md5:e67873de1c88f360af1f0a4b84847a52',
},
'playlist_count': 59,
}]
def _fetch_page(self, playlist_id, series_version, page):
def _fetch_page(self, playlist_id, series_version, season_id, page):
query = {
'seriesVersion': series_version,
'offset': str(page * self._PAGE_SIZE),
'order': 'seq',
'limit': str(self._PAGE_SIZE),
}
if season_id:
query['seasonId'] = season_id
programs = self._call_api(
f'v1/video/series/{playlist_id}/programs', playlist_id,
note=f'Downloading page {page + 1}',
query={
'seriesVersion': series_version,
'offset': str(page * self._PAGE_SIZE),
'order': 'seq',
'limit': str(self._PAGE_SIZE),
})
query=query)
yield from (
self.url_result(f'https://abema.tv/video/episode/{x}')
for x in traverse_obj(programs, ('programs', ..., 'id')))
def _entries(self, playlist_id, series_version):
def _entries(self, playlist_id, series_version, season_id):
return OnDemandPagedList(
functools.partial(self._fetch_page, playlist_id, series_version),
functools.partial(self._fetch_page, playlist_id, series_version, season_id),
self._PAGE_SIZE)
def _real_extract(self, url):
playlist_id = self._match_id(url)
playlist_id, season_id = self._match_valid_url(url).group('id', 'season')
series_info = self._call_api(f'v1/video/series/{playlist_id}', playlist_id)
return self.playlist_result(
self._entries(playlist_id, series_info['version']), playlist_id=playlist_id,
self._entries(playlist_id, series_info['version'], season_id), playlist_id=playlist_id,
playlist_title=series_info.get('title'),
playlist_description=series_info.get('content'))
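
The updated `_VALID_URL` captures an optional `s=` query parameter as the `season` group. The sketch below applies the same pattern with plain `re.match` to show what the two groups contain; the helper name `split_title_url` is hypothetical, and matching from the start of the string without anchoring the end is an assumption about how the pattern is applied.

```python
import re

# Pattern copied from the AbemaTVTitleIE hunk above
VALID_URL = r'https?://abema\.tv/video/title/(?P<id>[^?/#]+)/?(?:\?(?:[^#]+&)?s=(?P<season>[^&#]+))?'


def split_title_url(url):
    """Return (id, season) for a title URL, or None if it does not match."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('id', 'season') if mobj else None


if __name__ == '__main__':
    print(split_title_url('https://abema.tv/video/title/26-2mzbynr-cph?s=26-2mzbynr-cph_s40'))
    print(split_title_url('https://abema.tv/video/title/90-1887'))
```

When no `s=` parameter is present, the `season` group is `None`, which corresponds to the `if season_id:` guard in `_fetch_page`.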

View file

@ -232,7 +232,7 @@ Format: Marked,Start,End,Style,Name,MarginL,MarginR,MarginV,Effect,Text'''
error = self._parse_json(e.cause.response.read(), video_id)
message = error.get('message')
if e.cause.code == 403 and error.get('code') == 'player-bad-geolocation-country':
if e.cause.status == 403 and error.get('code') == 'player-bad-geolocation-country':
self.raise_geo_restricted(msg=message)
raise ExtractorError(message)
else:

View file

@ -4,7 +4,9 @@ import hashlib
import itertools
import json
import math
import random
import re
import string
import time
import urllib.parse
import uuid
@ -18,7 +20,6 @@ from ..utils import (
InAdvancePagedList,
OnDemandPagedList,
bool_or_none,
clean_html,
determine_ext,
filter_dict,
float_or_none,
@ -63,7 +64,7 @@ class BilibiliBaseIE(InfoExtractor):
'support_formats', lambda _, v: v['quality'] not in parsed_qualities))], delim=', ')
if missing_formats:
self.to_screen(
f'Format(s) {missing_formats} are missing; you have to login or '
f'Format(s) {missing_formats} are missing; you have to '
f'become a premium member to download them. {self._login_hint()}')
def extract_formats(self, play_info):
@ -165,14 +166,18 @@ class BilibiliBaseIE(InfoExtractor):
params['w_rid'] = hashlib.md5(f'{query}{self._get_wbi_key(video_id)}'.encode()).hexdigest()
return params
def _download_playinfo(self, bvid, cid, headers=None, qn=None):
params = {'bvid': bvid, 'cid': cid, 'fnval': 4048}
if qn:
params['qn'] = qn
def _download_playinfo(self, bvid, cid, headers=None, query=None):
params = {'bvid': bvid, 'cid': cid, 'fnval': 4048, **(query or {})}
if self.is_logged_in:
params.pop('try_look', None)
if qn := params.get('qn'):
note = f'Downloading video format {qn} for cid {cid}'
else:
note = f'Downloading video formats for cid {cid}'
return self._download_json(
'https://api.bilibili.com/x/player/wbi/playurl', bvid,
query=self._sign_wbi(params, bvid), headers=headers,
note=f'Downloading video formats for cid {cid} {qn or ""}')['data']
query=self._sign_wbi(params, bvid), headers=headers, note=note)['data']
def json2srt(self, json_data):
srt_data = ''
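
The reworked `_download_playinfo` merges caller-supplied query values into the base parameters and drops `try_look` for logged-in sessions. Isolated from the extractor, that merging step can be sketched as below; the function name `build_playinfo_params` and the `logged_in` argument are hypothetical substitutes for the method and `self.is_logged_in`.

```python
def build_playinfo_params(bvid, cid, query=None, logged_in=False):
    """Merge extra query values into the base playurl parameters."""
    params = {'bvid': bvid, 'cid': cid, 'fnval': 4048, **(query or {})}
    if logged_in:
        # try_look is only sent for anonymous requests
        params.pop('try_look', None)
    return params


if __name__ == '__main__':
    print(build_playinfo_params('BV1xx', 123, {'try_look': 1}))
    print(build_playinfo_params('BV1xx', 123, {'try_look': 1}, logged_in=True))
```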
@ -191,7 +196,7 @@ class BilibiliBaseIE(InfoExtractor):
}
video_info = self._download_json(
'https://api.bilibili.com/x/player/v2', video_id,
'https://api.bilibili.com/x/player/wbi/v2', video_id,
query={'aid': aid, 'cid': cid} if aid else {'bvid': video_id, 'cid': cid},
note=f'Extracting subtitle info {cid}', headers=self._HEADERS)
if traverse_obj(video_info, ('data', 'need_login_subtitle')):
@ -207,7 +212,7 @@ class BilibiliBaseIE(InfoExtractor):
def _get_chapters(self, aid, cid):
chapters = aid and cid and self._download_json(
'https://api.bilibili.com/x/player/v2', aid, query={'aid': aid, 'cid': cid},
'https://api.bilibili.com/x/player/wbi/v2', aid, query={'aid': aid, 'cid': cid},
note='Extracting chapters', fatal=False, headers=self._HEADERS)
return traverse_obj(chapters, ('data', 'view_points', ..., {
'title': 'content',
@ -286,7 +291,7 @@ class BilibiliBaseIE(InfoExtractor):
('data', 'interaction', 'graph_version', {int_or_none}))
cid_edges = self._get_divisions(video_id, graph_version, {1: {'cid': cid}}, 1)
for cid, edges in cid_edges.items():
play_info = self._download_playinfo(video_id, cid, headers=headers)
play_info = self._download_playinfo(video_id, cid, headers=headers, query={'try_look': 1})
yield {
**metainfo,
'id': f'{video_id}_{cid}',
@ -639,40 +644,29 @@ class BiliBiliIE(BilibiliBaseIE):
headers['Referer'] = url
initial_state = self._search_json(r'window\.__INITIAL_STATE__\s*=', webpage, 'initial state', video_id)
if traverse_obj(initial_state, ('error', 'trueCode')) == -403:
self.raise_login_required()
if traverse_obj(initial_state, ('error', 'trueCode')) == -404:
raise ExtractorError(
'This video may be deleted or geo-restricted. '
'You might want to try a VPN or a proxy server (with --proxy)', expected=True)
is_festival = 'videoData' not in initial_state
if is_festival:
video_data = initial_state['videoInfo']
else:
play_info_obj = self._search_json(
r'window\.__playinfo__\s*=', webpage, 'play info', video_id, fatal=False)
if not play_info_obj:
if traverse_obj(initial_state, ('error', 'trueCode')) == -403:
self.raise_login_required()
if traverse_obj(initial_state, ('error', 'trueCode')) == -404:
raise ExtractorError(
'This video may be deleted or geo-restricted. '
'You might want to try a VPN or a proxy server (with --proxy)', expected=True)
play_info = traverse_obj(play_info_obj, ('data', {dict}))
if not play_info:
if traverse_obj(play_info_obj, 'code') == 87007:
toast = get_element_by_class('tips-toast', webpage) or ''
msg = clean_html(
f'{get_element_by_class("belongs-to", toast) or ""}'
+ (get_element_by_class('level', toast) or ''))
raise ExtractorError(
f'This is a supporter-only video: {msg}. {self._login_hint()}', expected=True)
raise ExtractorError('Failed to extract play info')
video_data = initial_state['videoData']
video_id, title = video_data['bvid'], video_data.get('title')
# Bilibili anthologies are similar to playlists but all videos share the same video ID as the anthology itself.
page_list_json = not is_festival and traverse_obj(
page_list_json = (not is_festival and traverse_obj(
self._download_json(
'https://api.bilibili.com/x/player/pagelist', video_id,
fatal=False, query={'bvid': video_id, 'jsonp': 'jsonp'},
note='Extracting videos in anthology', headers=headers),
'data', expected_type=list) or []
'data', expected_type=list)) or []
is_anthology = len(page_list_json) > 1
part_id = int_or_none(parse_qs(url).get('p', [None])[-1])
@ -691,8 +685,6 @@ class BiliBiliIE(BilibiliBaseIE):
festival_info = {}
if is_festival:
play_info = self._download_playinfo(video_id, cid, headers=headers)
festival_info = traverse_obj(initial_state, {
'uploader': ('videoInfo', 'upName'),
'uploader_id': ('videoInfo', 'upMid', {str_or_none}),
@ -727,62 +719,79 @@ class BiliBiliIE(BilibiliBaseIE):
self._get_interactive_entries(video_id, cid, metainfo, headers=headers), **metainfo,
duration=traverse_obj(initial_state, ('videoData', 'duration', {int_or_none})),
__post_extractor=self.extract_comments(aid))
else:
formats = self.extract_formats(play_info)
if not traverse_obj(play_info, ('dash')):
# we only have legacy formats and need additional work
has_qn = lambda x: x in traverse_obj(formats, (..., 'quality'))
for qn in traverse_obj(play_info, ('accept_quality', lambda _, v: not has_qn(v), {int})):
formats.extend(traverse_obj(
self.extract_formats(self._download_playinfo(video_id, cid, headers=headers, qn=qn)),
lambda _, v: not has_qn(v['quality'])))
self._check_missing_formats(play_info, formats)
flv_formats = traverse_obj(formats, lambda _, v: v['fragments'])
if flv_formats and len(flv_formats) < len(formats):
# Flv and mp4 are incompatible due to `multi_video` workaround, so drop one
if not self._configuration_arg('prefer_multi_flv'):
dropped_fmts = ', '.join(
f'{f.get("format_note")} ({f.get("format_id")})' for f in flv_formats)
formats = traverse_obj(formats, lambda _, v: not v.get('fragments'))
if dropped_fmts:
self.to_screen(
f'Dropping incompatible flv format(s) {dropped_fmts} since mp4 is available. '
'To extract flv, pass --extractor-args "bilibili:prefer_multi_flv"')
else:
formats = traverse_obj(
# XXX: Filtering by extractor-arg is for testing purposes
formats, lambda _, v: v['quality'] == int(self._configuration_arg('prefer_multi_flv')[0]),
) or [max(flv_formats, key=lambda x: x['quality'])]
play_info = None
if self.is_logged_in:
play_info = traverse_obj(
self._search_json(r'window\.__playinfo__\s*=', webpage, 'play info', video_id, default=None),
('data', {dict}))
if not play_info:
play_info = self._download_playinfo(video_id, cid, headers=headers, query={'try_look': 1})
formats = self.extract_formats(play_info)
if traverse_obj(formats, (0, 'fragments')):
# We have flv formats, which are individual short videos with their own timestamps and metainfo
# Binary concatenation corrupts their timestamps, so we need a `multi_video` workaround
return {
**metainfo,
'_type': 'multi_video',
'entries': [{
'id': f'{metainfo["id"]}_{idx}',
'title': metainfo['title'],
'http_headers': metainfo['http_headers'],
'formats': [{
**fragment,
'format_id': formats[0].get('format_id'),
}],
'subtitles': self.extract_subtitles(video_id, cid) if idx == 0 else None,
'__post_extractor': self.extract_comments(aid) if idx == 0 else None,
} for idx, fragment in enumerate(formats[0]['fragments'])],
'duration': float_or_none(play_info.get('timelength'), scale=1000),
}
else:
return {
**metainfo,
'formats': formats,
'duration': float_or_none(play_info.get('timelength'), scale=1000),
'chapters': self._get_chapters(aid, cid),
'subtitles': self.extract_subtitles(video_id, cid),
'__post_extractor': self.extract_comments(aid),
}
if video_data.get('is_upower_exclusive'):
high_level = traverse_obj(initial_state, ('elecFullInfo', 'show_info', 'high_level', {dict})) or {}
msg = f'{join_nonempty("title", "sub_title", from_dict=high_level, delim="")}. {self._login_hint()}'
if not formats:
raise ExtractorError(f'This is a supporter-only video: {msg}', expected=True)
if '试看' in traverse_obj(play_info, ('accept_description', ..., {str})):
self.report_warning(
f'This is a supporter-only video, only the preview will be extracted: {msg}',
video_id=video_id)
if not traverse_obj(play_info, 'dash'):
# we only have legacy formats and need additional work
has_qn = lambda x: x in traverse_obj(formats, (..., 'quality'))
for qn in traverse_obj(play_info, ('accept_quality', lambda _, v: not has_qn(v), {int})):
formats.extend(traverse_obj(
self.extract_formats(self._download_playinfo(video_id, cid, headers=headers, query={'qn': qn})),
lambda _, v: not has_qn(v['quality'])))
self._check_missing_formats(play_info, formats)
flv_formats = traverse_obj(formats, lambda _, v: v['fragments'])
if flv_formats and len(flv_formats) < len(formats):
# Flv and mp4 are incompatible due to `multi_video` workaround, so drop one
if not self._configuration_arg('prefer_multi_flv'):
dropped_fmts = ', '.join(
f'{f.get("format_note")} ({f.get("format_id")})' for f in flv_formats)
formats = traverse_obj(formats, lambda _, v: not v.get('fragments'))
if dropped_fmts:
self.to_screen(
f'Dropping incompatible flv format(s) {dropped_fmts} since mp4 is available. '
'To extract flv, pass --extractor-args "bilibili:prefer_multi_flv"')
else:
formats = traverse_obj(
# XXX: Filtering by extractor-arg is for testing purposes
formats, lambda _, v: v['quality'] == int(self._configuration_arg('prefer_multi_flv')[0]),
) or [max(flv_formats, key=lambda x: x['quality'])]
if traverse_obj(formats, (0, 'fragments')):
# We have flv formats, which are individual short videos with their own timestamps and metainfo
# Binary concatenation corrupts their timestamps, so we need a `multi_video` workaround
return {
**metainfo,
'_type': 'multi_video',
'entries': [{
'id': f'{metainfo["id"]}_{idx}',
'title': metainfo['title'],
'http_headers': metainfo['http_headers'],
'formats': [{
**fragment,
'format_id': formats[0].get('format_id'),
}],
'subtitles': self.extract_subtitles(video_id, cid) if idx == 0 else None,
'__post_extractor': self.extract_comments(aid) if idx == 0 else None,
} for idx, fragment in enumerate(formats[0]['fragments'])],
'duration': float_or_none(play_info.get('timelength'), scale=1000),
}
return {
**metainfo,
'formats': formats,
'duration': float_or_none(play_info.get('timelength'), scale=1000),
'chapters': self._get_chapters(aid, cid),
'subtitles': self.extract_subtitles(video_id, cid),
'__post_extractor': self.extract_comments(aid),
}
class BiliBiliBangumiIE(BilibiliBaseIE):
@ -860,10 +869,16 @@ class BiliBiliBangumiIE(BilibiliBaseIE):
self.raise_login_required('This video is for premium members only')
headers['Referer'] = url
play_info = self._download_json(
'https://api.bilibili.com/pgc/player/web/v2/playurl', episode_id,
'Extracting episode', query={'fnval': '4048', 'ep_id': episode_id},
headers=headers)
play_info = (
self._search_json(
r'playurlSSRData\s*=', webpage, 'embedded page info', episode_id,
end_pattern='\n', default=None)
or self._download_json(
'https://api.bilibili.com/pgc/player/web/v2/playurl', episode_id,
'Extracting episode', query={'fnval': 12240, 'ep_id': episode_id},
headers=headers))
premium_only = play_info.get('code') == -10403
play_info = traverse_obj(play_info, ('result', 'video_info', {dict})) or {}
@ -1164,28 +1179,26 @@ class BilibiliSpaceBaseIE(BilibiliBaseIE):
class BilibiliSpaceVideoIE(BilibiliSpaceBaseIE):
_VALID_URL = r'https?://space\.bilibili\.com/(?P<id>\d+)(?P<video>/video)?/?(?:[?#]|$)'
_VALID_URL = r'https?://space\.bilibili\.com/(?P<id>\d+)(?P<video>(?:/upload)?/video)?/?(?:[?#]|$)'
_TESTS = [{
'url': 'https://space.bilibili.com/3985676/video',
'info_dict': {
'id': '3985676',
},
'playlist_mincount': 178,
'skip': 'login required',
}, {
'url': 'https://space.bilibili.com/313580179/video',
'info_dict': {
'id': '313580179',
},
'playlist_mincount': 92,
'skip': 'login required',
}]
def _real_extract(self, url):
playlist_id, is_video_url = self._match_valid_url(url).group('id', 'video')
if not is_video_url:
self.to_screen('A channel URL was given. Only the channel\'s videos will be downloaded. '
'To download audios, add a "/audio" to the URL')
'To download audios, add a "/upload/audio" to the URL')
def fetch_page(page_idx):
query = {
@ -1198,6 +1211,12 @@ class BilibiliSpaceVideoIE(BilibiliSpaceBaseIE):
'ps': 30,
'tid': 0,
'web_location': 1550101,
'dm_img_list': '[]',
'dm_img_str': base64.b64encode(
''.join(random.choices(string.printable, k=random.randint(16, 64))).encode())[:-2].decode(),
'dm_cover_img_str': base64.b64encode(
''.join(random.choices(string.printable, k=random.randint(32, 128))).encode())[:-2].decode(),
'dm_img_inter': '{"ds":[],"wh":[6093,6631,31],"of":[430,760,380]}',
}
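The two `dm_*_str` query values added in this hunk follow one recipe: pick a random printable string, base64-encode it, and slice off the last two characters. A minimal standalone sketch of that recipe follows; the helper name `fake_dm_img_str` and its `rng` parameter are illustrative assumptions and not part of the extractor, and the diff does not state why the final two characters are dropped.

```python
import base64
import random
import string


def fake_dm_img_str(min_len, max_len, rng=random):
    """Base64-encode a random printable string and drop the last two characters.

    Hypothetical helper mirroring the inline expressions in the hunk above.
    """
    raw = ''.join(rng.choices(string.printable, k=rng.randint(min_len, max_len)))
    return base64.b64encode(raw.encode())[:-2].decode()


# Same length ranges as the two parameters in the hunk above
dm_img_str = fake_dm_img_str(16, 64)
dm_cover_img_str = fake_dm_img_str(32, 128)
```

Because base64 output is always a multiple of four characters long, the sliced value always has a length of the form `4k - 2`.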
try:
@ -1208,14 +1227,14 @@ class BilibiliSpaceVideoIE(BilibiliSpaceBaseIE):
except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status == 412:
raise ExtractorError(
'Request is blocked by server (412), please add cookies, wait and try later.', expected=True)
'Request is blocked by server (412), please wait and try later.', expected=True)
raise
status_code = response['code']
if status_code == -401:
raise ExtractorError(
'Request is blocked by server (401), please add cookies, wait and try later.', expected=True)
elif status_code == -352 and not self.is_logged_in:
self.raise_login_required('Request is rejected, you need to login to access playlist')
'Request is blocked by server (401), please wait and try later.', expected=True)
elif status_code == -352:
raise ExtractorError('Request is rejected by server (352)', expected=True)
elif status_code != 0:
raise ExtractorError(f'Request failed ({status_code}): {response.get("message") or "Unknown error"}')
return response['data']
@ -1237,9 +1256,9 @@ class BilibiliSpaceVideoIE(BilibiliSpaceBaseIE):
class BilibiliSpaceAudioIE(BilibiliSpaceBaseIE):
_VALID_URL = r'https?://space\.bilibili\.com/(?P<id>\d+)/audio'
_VALID_URL = r'https?://space\.bilibili\.com/(?P<id>\d+)/(?:upload/)?audio'
_TESTS = [{
'url': 'https://space.bilibili.com/313580179/audio',
'url': 'https://space.bilibili.com/313580179/upload/audio',
'info_dict': {
'id': '313580179',
},
@ -1262,7 +1281,8 @@ class BilibiliSpaceAudioIE(BilibiliSpaceBaseIE):
}
def get_entries(page_data):
for entry in page_data.get('data', []):
# data is None when the playlist is empty
for entry in page_data.get('data') or []:
yield self.url_result(f'https://www.bilibili.com/audio/au{entry["id"]}', BilibiliAudioIE, entry['id'])
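The switch from `page_data.get('data', [])` to `page_data.get('data') or []` matters because `dict.get` only falls back to its default when the key is absent, not when the key is present with a null value. A small sketch of the difference, using made-up response dictionaries rather than real API output:

```python
def iter_entries(page_data):
    """Yield entries, treating a missing or null 'data' value as an empty list."""
    yield from page_data.get('data') or []


empty_page = {'data': None}  # hypothetical response for an empty playlist
normal_page = {'data': [{'id': 1}, {'id': 2}]}

# The default of dict.get is ignored when the key exists, so None leaks through
assert empty_page.get('data', []) is None

# The `or []` form is safe to iterate in every case
assert list(iter_entries(empty_page)) == []
assert [entry['id'] for entry in iter_entries(normal_page)] == [1, 2]
```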
metadata, paged_list = self._extract_playlist(fetch_page, get_metadata, get_entries)
@ -1286,30 +1306,43 @@ class BilibiliSpaceListBaseIE(BilibiliSpaceBaseIE):
class BilibiliCollectionListIE(BilibiliSpaceListBaseIE):
_VALID_URL = r'https?://space\.bilibili\.com/(?P<mid>\d+)/channel/collectiondetail/?\?sid=(?P<sid>\d+)'
_VALID_URL = [
r'https?://space\.bilibili\.com/(?P<mid>\d+)/channel/collectiondetail/?\?sid=(?P<sid>\d+)',
r'https?://space\.bilibili\.com/(?P<mid>\d+)/lists/(?P<sid>\d+)',
]
_TESTS = [{
'url': 'https://space.bilibili.com/2142762/channel/collectiondetail?sid=57445',
'url': 'https://space.bilibili.com/2142762/lists/3662502?type=season',
'info_dict': {
'id': '2142762_57445',
'title': '【完结】《底特律 变人》全结局流程解说',
'description': '',
'id': '2142762_3662502',
'title': '合集·《黑神话悟空》流程解说',
'description': '黑神话悟空 相关节目',
'uploader': '老戴在此',
'uploader_id': '2142762',
'timestamp': int,
'upload_date': str,
'thumbnail': 'https://archive.biliimg.com/bfs/archive/e0e543ae35ad3df863ea7dea526bc32e70f4c091.jpg',
'thumbnail': 'https://archive.biliimg.com/bfs/archive/22302e17dc849dd4533606d71bc89df162c3a9bf.jpg',
},
'playlist_mincount': 31,
'playlist_mincount': 62,
}, {
'url': 'https://space.bilibili.com/2142762/lists/3662502',
'only_matching': True,
}, {
'url': 'https://space.bilibili.com/2142762/channel/collectiondetail?sid=57445',
'only_matching': True,
}]
@classmethod
def suitable(cls, url):
return False if BilibiliSeriesListIE.suitable(url) else super().suitable(url)
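The new `/lists/<sid>` collection pattern also matches series URLs such as `/lists/<sid>?type=series`, which appears to be why `suitable()` defers to `BilibiliSeriesListIE` first. The sketch below reproduces that routing with the patterns from this diff; the `route` and `_first_match` helpers are hypothetical, and prefix matching via `re.match` is an assumption about how the patterns are applied.

```python
import re

SERIES_PATTERNS = [
    r'https?://space\.bilibili\.com/(?P<mid>\d+)/channel/seriesdetail/?\?\bsid=(?P<sid>\d+)',
    r'https?://space\.bilibili\.com/(?P<mid>\d+)/lists/(?P<sid>\d+)/?\?(?:[^#]+&)?type=series(?:[&#]|$)',
]
COLLECTION_PATTERNS = [
    r'https?://space\.bilibili\.com/(?P<mid>\d+)/channel/collectiondetail/?\?sid=(?P<sid>\d+)',
    r'https?://space\.bilibili\.com/(?P<mid>\d+)/lists/(?P<sid>\d+)',
]


def _first_match(patterns, url):
    """Return the first successful prefix match, or None."""
    return next(filter(None, (re.match(p, url) for p in patterns)), None)


def route(url):
    """Return (kind, mid, sid); series is tried first, mirroring the suitable() override."""
    for kind, patterns in (('series', SERIES_PATTERNS), ('collection', COLLECTION_PATTERNS)):
        if m := _first_match(patterns, url):
            return kind, m.group('mid'), m.group('sid')
    return None
```

With this ordering, `route('https://space.bilibili.com/1958703906/lists/547718?type=series')` is classified as a series, while the same path with `?type=season` or with no query string falls through to the collection patterns.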
def _real_extract(self, url):
mid, sid = self._match_valid_url(url).group('mid', 'sid')
playlist_id = f'{mid}_{sid}'
def fetch_page(page_idx):
return self._download_json(
'https://api.bilibili.com/x/polymer/space/seasons_archives_list',
playlist_id, note=f'Downloading page {page_idx}',
'https://api.bilibili.com/x/polymer/web-space/seasons_archives_list',
playlist_id, note=f'Downloading page {page_idx}', headers={'Referer': url},
query={'mid': mid, 'season_id': sid, 'page_num': page_idx + 1, 'page_size': 30})['data']
def get_metadata(page_data):
@ -1336,9 +1369,12 @@ class BilibiliCollectionListIE(BilibiliSpaceListBaseIE):
class BilibiliSeriesListIE(BilibiliSpaceListBaseIE):
_VALID_URL = r'https?://space\.bilibili\.com/(?P<mid>\d+)/channel/seriesdetail/?\?\bsid=(?P<sid>\d+)'
_VALID_URL = [
r'https?://space\.bilibili\.com/(?P<mid>\d+)/channel/seriesdetail/?\?\bsid=(?P<sid>\d+)',
r'https?://space\.bilibili\.com/(?P<mid>\d+)/lists/(?P<sid>\d+)/?\?(?:[^#]+&)?type=series(?:[&#]|$)',
]
_TESTS = [{
'url': 'https://space.bilibili.com/1958703906/channel/seriesdetail?sid=547718&ctype=0',
'url': 'https://space.bilibili.com/1958703906/lists/547718?type=series',
'info_dict': {
'id': '1958703906_547718',
'title': '直播回放',
@ -1351,6 +1387,9 @@ class BilibiliSeriesListIE(BilibiliSpaceListBaseIE):
'modified_date': str,
},
'playlist_mincount': 513,
}, {
'url': 'https://space.bilibili.com/1958703906/channel/seriesdetail?sid=547718&ctype=0',
'only_matching': True,
}]
def _real_extract(self, url):
@ -1369,7 +1408,7 @@ class BilibiliSeriesListIE(BilibiliSpaceListBaseIE):
def fetch_page(page_idx):
return self._download_json(
'https://api.bilibili.com/x/series/archives',
playlist_id, note=f'Downloading page {page_idx}',
playlist_id, note=f'Downloading page {page_idx}', headers={'Referer': url},
query={'mid': mid, 'series_id': sid, 'pn': page_idx + 1, 'ps': 30})['data']
def get_metadata(page_data):
@ -1848,6 +1887,47 @@ class BiliBiliPlayerIE(InfoExtractor):
ie=BiliBiliIE.ie_key(), video_id=video_id)
class BiliBiliDynamicIE(InfoExtractor):
_VALID_URL = r'https?://(?:t\.bilibili\.com|(?:www\.)?bilibili\.com/opus)/(?P<id>\d+)'
_TESTS = [{
'url': 'https://t.bilibili.com/998134289197432852',
'info_dict': {
'id': 'BV1TAmBYVEJr',
'ext': 'mp4',
'uploader_id': '1192648858',
'comment_count': int,
'_old_archive_ids': ['bilibili 113457567568273_part1'],
'thumbnail': 'http://i2.hdslb.com/bfs/archive/50091efd965d9f13ff6814f7ad374f90ab21e77d.jpg',
'duration': 929.238,
'upload_date': '20241110',
'uploader': '何同学工作室',
'like_count': int,
'view_count': int,
'title': '美国小朋友就玩这个何同学工作室11月开箱',
'description': '本期产品信息:\n机器狗\n气味模拟器\nCloudboom Strike LS\n无弦吉他\n蓝牙磁带音箱\n神奇画板',
'timestamp': 1731232800,
'tags': list,
'chapters': list,
},
}]
def _real_extract(self, url):
post_id = self._match_id(url)
# Without the newer chrome UA, the API will return an error (-352)
post_data = self._download_json(
'https://api.bilibili.com/x/polymer/web-dynamic/v1/detail', post_id,
query={'id': post_id}, headers={
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
})
video_url = traverse_obj(post_data, (
'data', 'item', (None, 'orig'), 'modules', 'module_dynamic',
(('major', ('archive', 'pgc')), ('additional', ('reserve', 'common'))),
'jump_url', {url_or_none}, any, {self._proto_relative_url}))
if not video_url or (self.suitable(video_url) and post_id == self._match_id(video_url)):
raise ExtractorError('No valid video URL found', expected=True)
return self.url_result(video_url)
class BiliIntlBaseIE(InfoExtractor):
_API_URL = 'https://api.bilibili.tv/intl/gateway'
_NETRC_MACHINE = 'biliintl'


@ -88,7 +88,7 @@ class BlueskyIE(InfoExtractor):
},
}, {
'url': 'https://bsky.app/profile/de1.pds.tentacle.expert/post/3l3w4tnezek2e',
'md5': '1af9c7fda061cf7593bbffca89e43d1c',
'md5': 'cc0110ed1f6b0247caac8234cc1e861d',
'info_dict': {
'id': '3l3w4tnezek2e',
'ext': 'mp4',
@ -133,6 +133,8 @@ class BlueskyIE(InfoExtractor):
'channel_follower_count': int,
'categories': ['Entertainment'],
'tags': [],
'chapters': list,
'heatmap': 'count:100',
},
'add_ie': ['Youtube'],
}, {
@ -184,14 +186,14 @@ class BlueskyIE(InfoExtractor):
},
},
}, {
'url': 'https://bsky.app/profile/alt.bun.how/post/3l7rdfxhyds2f',
'url': 'https://bsky.app/profile/cinny.bun.how/post/3l7rdfxhyds2f',
'md5': '8775118b235cf9fa6b5ad30f95cda75c',
'info_dict': {
'id': '3l7rdfxhyds2f',
'ext': 'mp4',
'uploader': 'cinnamon',
'uploader_id': 'alt.bun.how',
'uploader_url': 'https://bsky.app/profile/alt.bun.how',
'uploader_id': 'cinny.bun.how',
'uploader_url': 'https://bsky.app/profile/cinny.bun.how',
'channel_id': 'did:plc:7x6rtuenkuvxq3zsvffp2ide',
'channel_url': 'https://bsky.app/profile/did:plc:7x6rtuenkuvxq3zsvffp2ide',
'thumbnail': r're:https://video.bsky.app/watch/.*\.jpg$',
@ -284,17 +286,19 @@ class BlueskyIE(InfoExtractor):
services, ('service', lambda _, x: x['type'] == 'AtprotoPersonalDataServer',
'serviceEndpoint', {url_or_none}, any)) or 'https://bsky.social'
def _real_extract(self, url):
handle, video_id = self._match_valid_url(url).group('handle', 'id')
post = self._download_json(
def _extract_post(self, handle, post_id):
return self._download_json(
'https://public.api.bsky.app/xrpc/app.bsky.feed.getPostThread',
video_id, query={
'uri': f'at://{handle}/app.bsky.feed.post/{video_id}',
post_id, query={
'uri': f'at://{handle}/app.bsky.feed.post/{post_id}',
'depth': 0,
'parentHeight': 0,
})['thread']['post']
def _real_extract(self, url):
handle, video_id = self._match_valid_url(url).group('handle', 'id')
post = self._extract_post(handle, video_id)
entries = []
# app.bsky.embed.video.view/app.bsky.embed.external.view
entries.extend(self._extract_videos(post, video_id))
@ -341,6 +345,7 @@ class BlueskyIE(InfoExtractor):
formats.append({
'format_id': 'blob',
'quality': 1,
'url': update_url_query(
self._BLOB_URL_TMPL.format(endpoint), {'did': did, 'cid': video_cid}),
**traverse_obj(root, (*embed_path, 'aspectRatio', {


@ -31,6 +31,7 @@ from ..utils import (
update_url_query,
url_or_none,
)
from ..utils.traversal import traverse_obj
class BrightcoveLegacyIE(InfoExtractor):
@ -935,8 +936,8 @@ class BrightcoveNewIE(BrightcoveNewBaseIE):
if content_type == 'playlist':
return self.playlist_result(
[self._parse_brightcove_metadata(vid, vid.get('id'), headers)
for vid in json_data.get('videos', []) if vid.get('id')],
(self._parse_brightcove_metadata(vid, vid['id'], headers)
for vid in traverse_obj(json_data, ('videos', lambda _, v: v['id']))),
json_data.get('id'), json_data.get('name'),
json_data.get('description'))


@ -59,16 +59,15 @@ class ChaturbateIE(InfoExtractor):
'Accept': 'application/json',
}, fatal=False, impersonate=True) or {}
status = response.get('room_status')
if status != 'public':
if error := self._ERROR_MAP.get(status):
raise ExtractorError(error, expected=True)
self.report_warning('Falling back to webpage extraction')
return None
m3u8_url = response.get('url')
if not m3u8_url:
self.raise_geo_restricted()
status = response.get('room_status')
if error := self._ERROR_MAP.get(status):
raise ExtractorError(error, expected=True)
if status == 'public':
self.raise_geo_restricted()
self.report_warning(f'Got status "{status}" from API; falling back to webpage extraction')
return None
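The reordered checks now look for a stream URL first and only consult `room_status` when none is present. A rough sketch of the resulting decision order as a pure function follows; `classify_api_response`, the action labels, and the contents of `ERROR_MAP` are all illustrative assumptions, since the real `_ERROR_MAP` is not shown in this diff.

```python
def classify_api_response(response, error_map):
    """Return the action the reordered logic would take for an API response."""
    if response.get('url'):
        return 'extract'  # a stream URL wins regardless of room_status
    status = response.get('room_status')
    if error_map.get(status):
        return 'error'  # known status: raise its mapped message
    if status == 'public':
        return 'geo_restricted'  # public room but no URL was returned
    return 'fallback'  # unknown status: warn and try webpage extraction


# Hypothetical map; the real _ERROR_MAP entries are not part of this hunk
ERROR_MAP = {'private': 'Room is currently in a private show'}
```

Under the previous ordering, any status other than `public` skipped the URL check entirely; in this sketch a response carrying a URL is used even when its status is not `public`.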
return {
'id': video_id,


@ -1854,12 +1854,26 @@ class InfoExtractor:
@staticmethod
def _remove_duplicate_formats(formats):
format_urls = set()
seen_urls = set()
seen_fragment_urls = set()
unique_formats = []
for f in formats:
if f['url'] not in format_urls:
format_urls.add(f['url'])
fragments = f.get('fragments')
if callable(fragments):
unique_formats.append(f)
elif fragments:
fragment_urls = frozenset(
fragment.get('url') or urljoin(f['fragment_base_url'], fragment['path'])
for fragment in fragments)
if fragment_urls not in seen_fragment_urls:
seen_fragment_urls.add(fragment_urls)
unique_formats.append(f)
elif f['url'] not in seen_urls:
seen_urls.add(f['url'])
unique_formats.append(f)
formats[:] = unique_formats
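The rewritten `_remove_duplicate_formats` compares fragmented formats by the set of their fragment URLs instead of by the top-level `url`. A standalone sketch of the same logic with made-up sample formats is shown below; it substitutes the standard library's `urllib.parse.urljoin` for the project's own `urljoin` helper, which is an assumption that the two behave alike for these simple inputs.

```python
from urllib.parse import urljoin


def remove_duplicate_formats(formats):
    """Deduplicate in place: by URL, or by the set of fragment URLs when fragments exist."""
    seen_urls = set()
    seen_fragment_urls = set()
    unique_formats = []
    for f in formats:
        fragments = f.get('fragments')
        if callable(fragments):
            # Lazily generated fragments cannot be compared, so the format is always kept
            unique_formats.append(f)
        elif fragments:
            fragment_urls = frozenset(
                fragment.get('url') or urljoin(f['fragment_base_url'], fragment['path'])
                for fragment in fragments)
            if fragment_urls not in seen_fragment_urls:
                seen_fragment_urls.add(fragment_urls)
                unique_formats.append(f)
        elif f['url'] not in seen_urls:
            seen_urls.add(f['url'])
            unique_formats.append(f)
    formats[:] = unique_formats


# Hypothetical formats: 'b' repeats the URL of 'a'; 'd' has the same fragments as 'c'
formats = [
    {'format_id': 'a', 'url': 'https://example.com/v.mp4'},
    {'format_id': 'b', 'url': 'https://example.com/v.mp4'},
    {'format_id': 'c', 'url': 'https://example.com/m.mpd',
     'fragments': [{'url': 'https://example.com/1.m4s'}, {'url': 'https://example.com/2.m4s'}]},
    {'format_id': 'd', 'url': 'https://example.com/m.mpd', 'fragment_base_url': 'https://example.com/',
     'fragments': [{'path': '2.m4s'}, {'path': '1.m4s'}]},
    {'format_id': 'e', 'url': 'https://example.com/m.mpd',
     'fragments': [{'url': 'https://example.com/3.m4s'}]},
]
remove_duplicate_formats(formats)
```

Note that `e` survives even though it shares a manifest URL with `c`, because fragmented formats are no longer compared by `url`. Since a `frozenset` is used, fragment order is ignored, so `d` counts as a duplicate of `c`.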
def _is_valid_url(self, url, video_id, item='video', headers={}):
@ -3789,7 +3803,7 @@ class InfoExtractor:
def mark_watched(self, *args, **kwargs):
if not self.get_param('mark_watched', False):
return
if self.supports_login() and self._get_login_info()[0] is not None or self._cookies_passed:
if (self.supports_login() and self._get_login_info()[0] is not None) or self._cookies_passed:
self._mark_watched(*args, **kwargs)
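The parentheses added to the `mark_watched` condition only make the existing grouping explicit: in Python, `and` binds more tightly than `or`, so `a and b or c` already means `(a and b) or c`. A short check of that claim, with `implicit` and `explicit` as illustrative names:

```python
from itertools import product


def implicit(a, b, c):
    return a and b or c


def explicit(a, b, c):
    return (a and b) or c


# Both spellings agree for every combination of truth values
assert all(implicit(*v) == explicit(*v) for v in product([False, True], repeat=3))

# The other possible grouping, a and (b or c), would give a different result
assert (False and True or True) == True
assert (False and (True or True)) == False
```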
def _mark_watched(self, *args, **kwargs):


@ -1,692 +0,0 @@
import base64
import uuid
from .common import InfoExtractor
from ..networking import Request
from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
float_or_none,
format_field,
int_or_none,
jwt_decode_hs256,
parse_age_limit,
parse_count,
parse_iso8601,
qualities,
time_seconds,
traverse_obj,
url_or_none,
urlencode_postdata,
)
class CrunchyrollBaseIE(InfoExtractor):
_BASE_URL = 'https://www.crunchyroll.com'
_API_BASE = 'https://api.crunchyroll.com'
_NETRC_MACHINE = 'crunchyroll'
_SWITCH_USER_AGENT = 'Crunchyroll/1.8.0 Nintendo Switch/12.3.12.0 UE4/4.27'
_REFRESH_TOKEN = None
_AUTH_HEADERS = None
_AUTH_EXPIRY = None
_API_ENDPOINT = None
_BASIC_AUTH = 'Basic ' + base64.b64encode(':'.join((
't-kdgp2h8c3jub8fn0fq',
'yfLDfMfrYvKXh4JXS1LEI2cCqu1v5Wan',
)).encode()).decode()
_IS_PREMIUM = None
_LOCALE_LOOKUP = {
'ar': 'ar-SA',
'de': 'de-DE',
'': 'en-US',
'es': 'es-419',
'es-es': 'es-ES',
'fr': 'fr-FR',
'it': 'it-IT',
'pt-br': 'pt-BR',
'pt-pt': 'pt-PT',
'ru': 'ru-RU',
'hi': 'hi-IN',
}
def _set_auth_info(self, response):
CrunchyrollBaseIE._IS_PREMIUM = 'cr_premium' in traverse_obj(response, ('access_token', {jwt_decode_hs256}, 'benefits', ...))
CrunchyrollBaseIE._AUTH_HEADERS = {'Authorization': response['token_type'] + ' ' + response['access_token']}
CrunchyrollBaseIE._AUTH_EXPIRY = time_seconds(seconds=traverse_obj(response, ('expires_in', {float_or_none}), default=300) - 10)
def _request_token(self, headers, data, note='Requesting token', errnote='Failed to request token'):
try:
return self._download_json(
f'{self._BASE_URL}/auth/v1/token', None, note=note, errnote=errnote,
headers=headers, data=urlencode_postdata(data), impersonate=True)
except ExtractorError as error:
if not isinstance(error.cause, HTTPError) or error.cause.status != 403:
raise
if target := error.cause.response.extensions.get('impersonate'):
raise ExtractorError(f'Got HTTP Error 403 when using impersonate target "{target}"')
raise ExtractorError(
'Request blocked by Cloudflare. '
'Install the required impersonation dependency if possible, '
'or else navigate to Crunchyroll in your browser, '
'then pass the fresh cookies (with --cookies-from-browser or --cookies) '
'and your browser\'s User-Agent (with --user-agent)', expected=True)
def _perform_login(self, username, password):
if not CrunchyrollBaseIE._REFRESH_TOKEN:
CrunchyrollBaseIE._REFRESH_TOKEN = self.cache.load(self._NETRC_MACHINE, username)
if CrunchyrollBaseIE._REFRESH_TOKEN:
return
try:
login_response = self._request_token(
headers={'Authorization': self._BASIC_AUTH}, data={
'username': username,
'password': password,
'grant_type': 'password',
'scope': 'offline_access',
}, note='Logging in', errnote='Failed to log in')
except ExtractorError as error:
if isinstance(error.cause, HTTPError) and error.cause.status == 401:
raise ExtractorError('Invalid username and/or password', expected=True)
raise
CrunchyrollBaseIE._REFRESH_TOKEN = login_response['refresh_token']
self.cache.store(self._NETRC_MACHINE, username, CrunchyrollBaseIE._REFRESH_TOKEN)
self._set_auth_info(login_response)
def _update_auth(self):
if CrunchyrollBaseIE._AUTH_HEADERS and CrunchyrollBaseIE._AUTH_EXPIRY > time_seconds():
return
auth_headers = {'Authorization': self._BASIC_AUTH}
if CrunchyrollBaseIE._REFRESH_TOKEN:
data = {
'refresh_token': CrunchyrollBaseIE._REFRESH_TOKEN,
'grant_type': 'refresh_token',
'scope': 'offline_access',
}
else:
data = {'grant_type': 'client_id'}
auth_headers['ETP-Anonymous-ID'] = uuid.uuid4()
try:
auth_response = self._request_token(auth_headers, data)
except ExtractorError as error:
username, password = self._get_login_info()
if not username or not isinstance(error.cause, HTTPError) or error.cause.status != 400:
raise
self.to_screen('Refresh token has expired. Re-logging in')
CrunchyrollBaseIE._REFRESH_TOKEN = None
self.cache.store(self._NETRC_MACHINE, username, None)
self._perform_login(username, password)
return
self._set_auth_info(auth_response)
def _locale_from_language(self, language):
config_locale = self._configuration_arg('metadata', ie_key=CrunchyrollBetaIE, casesense=True)
return config_locale[0] if config_locale else self._LOCALE_LOOKUP.get(language)
def _call_base_api(self, endpoint, internal_id, lang, note=None, query={}):
self._update_auth()
if not endpoint.startswith('/'):
endpoint = f'/{endpoint}'
query = query.copy()
locale = self._locale_from_language(lang)
if locale:
query['locale'] = locale
return self._download_json(
f'{self._BASE_URL}{endpoint}', internal_id, note or f'Calling API: {endpoint}',
headers=CrunchyrollBaseIE._AUTH_HEADERS, query=query)
def _call_api(self, path, internal_id, lang, note='api', query={}):
if not path.startswith(f'/content/v2/{self._API_ENDPOINT}/'):
path = f'/content/v2/{self._API_ENDPOINT}/{path}'
try:
result = self._call_base_api(
path, internal_id, lang, f'Downloading {note} JSON ({self._API_ENDPOINT})', query=query)
except ExtractorError as error:
if isinstance(error.cause, HTTPError) and error.cause.status == 404:
return None
raise
if not result:
raise ExtractorError(f'Unexpected response when downloading {note} JSON')
return result
def _extract_chapters(self, internal_id):
# if no skip events are available, a 403 xml error is returned
skip_events = self._download_json(
f'https://static.crunchyroll.com/skip-events/production/{internal_id}.json',
internal_id, note='Downloading chapter info', fatal=False, errnote=False)
if not skip_events:
return None
chapters = []
for event in ('recap', 'intro', 'credits', 'preview'):
start = traverse_obj(skip_events, (event, 'start', {float_or_none}))
end = traverse_obj(skip_events, (event, 'end', {float_or_none}))
# some chapters have no start and/or ending time, they will just be ignored
if start is None or end is None:
continue
chapters.append({'title': event.capitalize(), 'start_time': start, 'end_time': end})
return chapters
def _extract_stream(self, identifier, display_id=None):
if not display_id:
display_id = identifier
self._update_auth()
headers = {**CrunchyrollBaseIE._AUTH_HEADERS, 'User-Agent': self._SWITCH_USER_AGENT}
try:
stream_response = self._download_json(
f'https://cr-play-service.prd.crunchyrollsvc.com/v1/{identifier}/console/switch/play',
display_id, note='Downloading stream info', errnote='Failed to download stream info', headers=headers)
except ExtractorError as error:
if self.get_param('ignore_no_formats_error'):
self.report_warning(error.orig_msg)
return [], {}
elif isinstance(error.cause, HTTPError) and error.cause.status == 420:
raise ExtractorError(
'You have reached the rate-limit for active streams; try again later', expected=True)
raise
available_formats = {'': ('', '', stream_response['url'])}
for hardsub_lang, stream in traverse_obj(stream_response, ('hardSubs', {dict.items}, lambda _, v: v[1]['url'])):
available_formats[hardsub_lang] = (f'hardsub-{hardsub_lang}', hardsub_lang, stream['url'])
requested_hardsubs = [('' if val == 'none' else val) for val in (self._configuration_arg('hardsub') or ['none'])]
hardsub_langs = [lang for lang in available_formats if lang]
if hardsub_langs and 'all' not in requested_hardsubs:
full_format_langs = set(requested_hardsubs)
self.to_screen(f'Available hardsub languages: {", ".join(hardsub_langs)}')
self.to_screen(
'To extract formats of a hardsub language, use '
'"--extractor-args crunchyrollbeta:hardsub=<language_code or all>". '
'See https://github.com/yt-dlp/yt-dlp#crunchyrollbeta-crunchyroll for more info',
only_once=True)
else:
full_format_langs = set(map(str.lower, available_formats))
audio_locale = traverse_obj(stream_response, ('audioLocale', {str}))
hardsub_preference = qualities(requested_hardsubs[::-1])
formats, subtitles = [], {}
for format_id, hardsub_lang, stream_url in available_formats.values():
if hardsub_lang.lower() in full_format_langs:
adaptive_formats, dash_subs = self._extract_mpd_formats_and_subtitles(
stream_url, display_id, mpd_id=format_id, headers=CrunchyrollBaseIE._AUTH_HEADERS,
fatal=False, note=f'Downloading {f"{format_id} " if hardsub_lang else ""}MPD manifest')
self._merge_subtitles(dash_subs, target=subtitles)
else:
continue # XXX: Update this if meta mpd formats work; will be tricky with token invalidation
for f in adaptive_formats:
if f.get('acodec') != 'none':
f['language'] = audio_locale
f['quality'] = hardsub_preference(hardsub_lang.lower())
formats.extend(adaptive_formats)
for locale, subtitle in traverse_obj(stream_response, (('subtitles', 'captions'), {dict.items}, ...)):
subtitles.setdefault(locale, []).append(traverse_obj(subtitle, {'url': 'url', 'ext': 'format'}))
# Invalidate stream token to avoid rate-limit
error_msg = 'Unable to invalidate stream token; you may experience rate-limiting'
if stream_token := stream_response.get('token'):
self._request_webpage(Request(
f'https://cr-play-service.prd.crunchyrollsvc.com/v1/token/{identifier}/{stream_token}/inactive',
headers=headers, method='PATCH'), display_id, 'Invalidating stream token', error_msg, fatal=False)
else:
self.report_warning(error_msg)
return formats, subtitles
class CrunchyrollCmsBaseIE(CrunchyrollBaseIE):
_API_ENDPOINT = 'cms'
_CMS_EXPIRY = None
def _call_cms_api_signed(self, path, internal_id, lang, note='api'):
if not CrunchyrollCmsBaseIE._CMS_EXPIRY or CrunchyrollCmsBaseIE._CMS_EXPIRY <= time_seconds():
response = self._call_base_api('index/v2', None, lang, 'Retrieving signed policy')['cms_web']
CrunchyrollCmsBaseIE._CMS_QUERY = {
'Policy': response['policy'],
'Signature': response['signature'],
'Key-Pair-Id': response['key_pair_id'],
}
CrunchyrollCmsBaseIE._CMS_BUCKET = response['bucket']
CrunchyrollCmsBaseIE._CMS_EXPIRY = parse_iso8601(response['expires']) - 10
if not path.startswith('/cms/v2'):
path = f'/cms/v2{CrunchyrollCmsBaseIE._CMS_BUCKET}/{path}'
return self._call_base_api(
path, internal_id, lang, f'Downloading {note} JSON (signed cms)', query=CrunchyrollCmsBaseIE._CMS_QUERY)
class CrunchyrollBetaIE(CrunchyrollCmsBaseIE):
IE_NAME = 'crunchyroll'
_VALID_URL = r'''(?x)
https?://(?:beta\.|www\.)?crunchyroll\.com/
(?:(?P<lang>\w{2}(?:-\w{2})?)/)?
watch/(?!concert|musicvideo)(?P<id>\w+)'''
_TESTS = [{
# Premium only
'url': 'https://www.crunchyroll.com/watch/GY2P1Q98Y/to-the-future',
'info_dict': {
'id': 'GY2P1Q98Y',
'ext': 'mp4',
'duration': 1380.241,
'timestamp': 1459632600,
'description': 'md5:a022fbec4fbb023d43631032c91ed64b',
'title': 'World Trigger Episode 73 To the Future',
'upload_date': '20160402',
'series': 'World Trigger',
'series_id': 'GR757DMKY',
'season': 'World Trigger',
'season_id': 'GR9P39NJ6',
'season_number': 1,
'episode': 'To the Future',
'episode_number': 73,
'thumbnail': r're:^https://www.crunchyroll.com/imgsrv/.*\.jpeg?$',
'chapters': 'count:2',
'age_limit': 14,
'like_count': int,
'dislike_count': int,
},
'params': {
'skip_download': 'm3u8',
'extractor_args': {'crunchyrollbeta': {'hardsub': ['de-DE']}},
'format': 'bv[format_id~=hardsub]',
},
}, {
# Premium only
'url': 'https://www.crunchyroll.com/watch/GYE5WKQGR',
'info_dict': {
'id': 'GYE5WKQGR',
'ext': 'mp4',
'duration': 366.459,
'timestamp': 1476788400,
'description': 'md5:74b67283ffddd75f6e224ca7dc031e76',
'title': 'SHELTER Porter Robinson presents Shelter the Animation',
'upload_date': '20161018',
'series': 'SHELTER',
'series_id': 'GYGG09WWY',
'season': 'SHELTER',
'season_id': 'GR09MGK4R',
'season_number': 1,
'episode': 'Porter Robinson presents Shelter the Animation',
'episode_number': 0,
'thumbnail': r're:^https://www.crunchyroll.com/imgsrv/.*\.jpeg?$',
'age_limit': 14,
'like_count': int,
'dislike_count': int,
},
'params': {'skip_download': True},
}, {
'url': 'https://www.crunchyroll.com/watch/GJWU2VKK3/cherry-blossom-meeting-and-a-coming-blizzard',
'info_dict': {
'id': 'GJWU2VKK3',
'ext': 'mp4',
'duration': 1420.054,
'description': 'md5:2d1c67c0ec6ae514d9c30b0b99a625cd',
'title': 'The Ice Guy and His Cool Female Colleague Episode 1 Cherry Blossom Meeting and a Coming Blizzard',
'series': 'The Ice Guy and His Cool Female Colleague',
'series_id': 'GW4HM75NP',
'season': 'The Ice Guy and His Cool Female Colleague',
'season_id': 'GY9PC21VE',
'season_number': 1,
'episode': 'Cherry Blossom Meeting and a Coming Blizzard',
'episode_number': 1,
'chapters': 'count:2',
'thumbnail': r're:^https://www.crunchyroll.com/imgsrv/.*\.jpeg?$',
'timestamp': 1672839000,
'upload_date': '20230104',
'age_limit': 14,
'like_count': int,
'dislike_count': int,
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.crunchyroll.com/watch/GM8F313NQ',
'info_dict': {
'id': 'GM8F313NQ',
'ext': 'mp4',
'title': 'Garakowa -Restore the World-',
'description': 'md5:8d2f8b6b9dd77d87810882e7d2ee5608',
'duration': 3996.104,
'age_limit': 13,
'thumbnail': r're:^https://www.crunchyroll.com/imgsrv/.*\.jpeg?$',
},
'params': {'skip_download': 'm3u8'},
'skip': 'no longer exists',
}, {
'url': 'https://www.crunchyroll.com/watch/G62PEZ2E6',
'info_dict': {
'id': 'G62PEZ2E6',
'description': 'md5:8d2f8b6b9dd77d87810882e7d2ee5608',
'age_limit': 13,
'duration': 65.138,
'title': 'Garakowa -Restore the World-',
},
'playlist_mincount': 5,
}, {
'url': 'https://www.crunchyroll.com/de/watch/GY2P1Q98Y',
'only_matching': True,
}, {
'url': 'https://beta.crunchyroll.com/pt-br/watch/G8WUN8VKP/the-ruler-of-conspiracy',
'only_matching': True,
}]
# We want to support lazy playlist filtering and movie listings cannot be inside a playlist
_RETURN_TYPE = 'video'
def _real_extract(self, url):
lang, internal_id = self._match_valid_url(url).group('lang', 'id')
# We need to use unsigned API call to allow ratings query string
response = traverse_obj(self._call_api(
f'objects/{internal_id}', internal_id, lang, 'object info', {'ratings': 'true'}), ('data', 0, {dict}))
if not response:
raise ExtractorError(f'No video with id {internal_id} could be found (possibly region locked?)', expected=True)
object_type = response.get('type')
if object_type == 'episode':
result = self._transform_episode_response(response)
elif object_type == 'movie':
result = self._transform_movie_response(response)
elif object_type == 'movie_listing':
first_movie_id = traverse_obj(response, ('movie_listing_metadata', 'first_movie_id'))
if not self._yes_playlist(internal_id, first_movie_id):
return self.url_result(f'{self._BASE_URL}/{lang}watch/{first_movie_id}', CrunchyrollBetaIE, first_movie_id)
def entries():
movies = self._call_api(f'movie_listings/{internal_id}/movies', internal_id, lang, 'movie list')
for movie_response in traverse_obj(movies, ('data', ...)):
yield self.url_result(
f'{self._BASE_URL}/{lang}watch/{movie_response["id"]}',
CrunchyrollBetaIE, **self._transform_movie_response(movie_response))
return self.playlist_result(entries(), **self._transform_movie_response(response))
else:
raise ExtractorError(f'Unknown object type {object_type}')
if not self._IS_PREMIUM and traverse_obj(response, (f'{object_type}_metadata', 'is_premium_only')):
message = f'This {object_type} is for premium members only'
if CrunchyrollBaseIE._REFRESH_TOKEN:
self.raise_no_formats(message, expected=True, video_id=internal_id)
else:
self.raise_login_required(message, method='password', metadata_available=True)
else:
result['formats'], result['subtitles'] = self._extract_stream(internal_id)
result['chapters'] = self._extract_chapters(internal_id)
def calculate_count(item):
return parse_count(''.join((item['displayed'], item.get('unit') or '')))
result.update(traverse_obj(response, ('rating', {
'like_count': ('up', {calculate_count}),
'dislike_count': ('down', {calculate_count}),
})))
return result
@staticmethod
def _transform_episode_response(data):
metadata = traverse_obj(data, (('episode_metadata', None), {dict}), get_all=False) or {}
return {
'id': data['id'],
'title': ' \u2013 '.join((
('{}{}'.format(
format_field(metadata, 'season_title'),
format_field(metadata, 'episode', ' Episode %s'))),
format_field(data, 'title'))),
**traverse_obj(data, {
'episode': ('title', {str}),
'description': ('description', {str}, {lambda x: x.replace(r'\r\n', '\n')}),
'thumbnails': ('images', 'thumbnail', ..., ..., {
'url': ('source', {url_or_none}),
'width': ('width', {int_or_none}),
'height': ('height', {int_or_none}),
}),
}),
**traverse_obj(metadata, {
'duration': ('duration_ms', {float_or_none(scale=1000)}),
'timestamp': ('upload_date', {parse_iso8601}),
'series': ('series_title', {str}),
'series_id': ('series_id', {str}),
'season': ('season_title', {str}),
'season_id': ('season_id', {str}),
'season_number': ('season_number', ({int}, {float_or_none})),
'episode_number': ('sequence_number', ({int}, {float_or_none})),
'age_limit': ('maturity_ratings', -1, {parse_age_limit}),
'language': ('audio_locale', {str}),
}, get_all=False),
}
@staticmethod
def _transform_movie_response(data):
metadata = traverse_obj(data, (('movie_metadata', 'movie_listing_metadata', None), {dict}), get_all=False) or {}
return {
'id': data['id'],
**traverse_obj(data, {
'title': ('title', {str}),
'description': ('description', {str}, {lambda x: x.replace(r'\r\n', '\n')}),
'thumbnails': ('images', 'thumbnail', ..., ..., {
'url': ('source', {url_or_none}),
'width': ('width', {int_or_none}),
'height': ('height', {int_or_none}),
}),
}),
**traverse_obj(metadata, {
'duration': ('duration_ms', {float_or_none(scale=1000)}),
'age_limit': ('maturity_ratings', -1, {parse_age_limit}),
}),
}
class CrunchyrollBetaShowIE(CrunchyrollCmsBaseIE):
IE_NAME = 'crunchyroll:playlist'
_VALID_URL = r'''(?x)
https?://(?:beta\.|www\.)?crunchyroll\.com/
(?P<lang>(?:\w{2}(?:-\w{2})?/)?)
series/(?P<id>\w+)'''
_TESTS = [{
'url': 'https://www.crunchyroll.com/series/GY19NQ2QR/Girl-Friend-BETA',
'info_dict': {
'id': 'GY19NQ2QR',
'title': 'Girl Friend BETA',
'description': 'md5:99c1b22ee30a74b536a8277ced8eb750',
# XXX: `thumbnail` does not get set from `thumbnails` in playlist
# 'thumbnail': r're:^https://www.crunchyroll.com/imgsrv/.*\.jpeg?$',
'age_limit': 14,
},
'playlist_mincount': 10,
}, {
'url': 'https://beta.crunchyroll.com/it/series/GY19NQ2QR',
'only_matching': True,
}]
def _real_extract(self, url):
lang, internal_id = self._match_valid_url(url).group('lang', 'id')
def entries():
seasons_response = self._call_cms_api_signed(f'seasons?series_id={internal_id}', internal_id, lang, 'seasons')
for season in traverse_obj(seasons_response, ('items', ..., {dict})):
episodes_response = self._call_cms_api_signed(
f'episodes?season_id={season["id"]}', season['id'], lang, 'episode list')
for episode_response in traverse_obj(episodes_response, ('items', ..., {dict})):
yield self.url_result(
f'{self._BASE_URL}/{lang}watch/{episode_response["id"]}',
CrunchyrollBetaIE, **CrunchyrollBetaIE._transform_episode_response(episode_response))
return self.playlist_result(
entries(), internal_id,
**traverse_obj(self._call_api(f'series/{internal_id}', internal_id, lang, 'series'), ('data', 0, {
'title': ('title', {str}),
'description': ('description', {lambda x: x.replace(r'\r\n', '\n')}),
'age_limit': ('maturity_ratings', -1, {parse_age_limit}),
'thumbnails': ('images', ..., ..., ..., {
'url': ('source', {url_or_none}),
'width': ('width', {int_or_none}),
'height': ('height', {int_or_none}),
}),
})))
class CrunchyrollMusicIE(CrunchyrollBaseIE):
IE_NAME = 'crunchyroll:music'
_VALID_URL = r'''(?x)
https?://(?:www\.)?crunchyroll\.com/
(?P<lang>(?:\w{2}(?:-\w{2})?/)?)
watch/(?P<type>concert|musicvideo)/(?P<id>\w+)'''
_TESTS = [{
'url': 'https://www.crunchyroll.com/de/watch/musicvideo/MV5B02C79',
'info_dict': {
'ext': 'mp4',
'id': 'MV5B02C79',
'display_id': 'egaono-hana',
'title': 'Egaono Hana',
'track': 'Egaono Hana',
'artists': ['Goose house'],
'thumbnail': r're:(?i)^https://www.crunchyroll.com/imgsrv/.*\.jpeg?$',
'genres': ['J-Pop'],
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.crunchyroll.com/watch/musicvideo/MV88BB7F2C',
'info_dict': {
'ext': 'mp4',
'id': 'MV88BB7F2C',
'display_id': 'crossing-field',
'title': 'Crossing Field',
'track': 'Crossing Field',
'artists': ['LiSA'],
'thumbnail': r're:(?i)^https://www.crunchyroll.com/imgsrv/.*\.jpeg?$',
'genres': ['Anime'],
},
'params': {'skip_download': 'm3u8'},
'skip': 'no longer exists',
}, {
'url': 'https://www.crunchyroll.com/watch/concert/MC2E2AC135',
'info_dict': {
'ext': 'mp4',
'id': 'MC2E2AC135',
'display_id': 'live-is-smile-always-364joker-at-yokohama-arena',
'title': 'LiVE is Smile Always-364+JOKER- at YOKOHAMA ARENA',
'track': 'LiVE is Smile Always-364+JOKER- at YOKOHAMA ARENA',
'artists': ['LiSA'],
'thumbnail': r're:(?i)^https://www.crunchyroll.com/imgsrv/.*\.jpeg?$',
'description': 'md5:747444e7e6300907b7a43f0a0503072e',
'genres': ['J-Pop'],
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.crunchyroll.com/de/watch/musicvideo/MV5B02C79/egaono-hana',
'only_matching': True,
}, {
'url': 'https://www.crunchyroll.com/watch/concert/MC2E2AC135/live-is-smile-always-364joker-at-yokohama-arena',
'only_matching': True,
}, {
'url': 'https://www.crunchyroll.com/watch/musicvideo/MV88BB7F2C/crossing-field',
'only_matching': True,
}]
_API_ENDPOINT = 'music'
def _real_extract(self, url):
lang, internal_id, object_type = self._match_valid_url(url).group('lang', 'id', 'type')
path, name = {
'concert': ('concerts', 'concert info'),
'musicvideo': ('music_videos', 'music video info'),
}[object_type]
response = traverse_obj(self._call_api(f'{path}/{internal_id}', internal_id, lang, name), ('data', 0, {dict}))
if not response:
raise ExtractorError(f'No video with id {internal_id} could be found (possibly region locked?)', expected=True)
result = self._transform_music_response(response)
if not self._IS_PREMIUM and response.get('isPremiumOnly'):
message = f'This {response.get("type") or "media"} is for premium members only'
if CrunchyrollBaseIE._REFRESH_TOKEN:
self.raise_no_formats(message, expected=True, video_id=internal_id)
else:
self.raise_login_required(message, method='password', metadata_available=True)
else:
result['formats'], _ = self._extract_stream(f'music/{internal_id}', internal_id)
return result
@staticmethod
def _transform_music_response(data):
return {
'id': data['id'],
**traverse_obj(data, {
'display_id': 'slug',
'title': 'title',
'track': 'title',
'artists': ('artist', 'name', all),
'description': ('description', {str}, {lambda x: x.replace(r'\r\n', '\n') or None}),
'thumbnails': ('images', ..., ..., {
'url': ('source', {url_or_none}),
'width': ('width', {int_or_none}),
'height': ('height', {int_or_none}),
}),
'genres': ('genres', ..., 'displayValue'),
'age_limit': ('maturity_ratings', -1, {parse_age_limit}),
}),
}
class CrunchyrollArtistIE(CrunchyrollBaseIE):
IE_NAME = 'crunchyroll:artist'
_VALID_URL = r'''(?x)
https?://(?:www\.)?crunchyroll\.com/
(?P<lang>(?:\w{2}(?:-\w{2})?/)?)
artist/(?P<id>\w{10})'''
_TESTS = [{
'url': 'https://www.crunchyroll.com/artist/MA179CB50D',
'info_dict': {
'id': 'MA179CB50D',
'title': 'LiSA',
'genres': ['Anime', 'J-Pop', 'Rock'],
'description': 'md5:16d87de61a55c3f7d6c454b73285938e',
},
'playlist_mincount': 83,
}, {
'url': 'https://www.crunchyroll.com/artist/MA179CB50D/lisa',
'only_matching': True,
}]
_API_ENDPOINT = 'music'
def _real_extract(self, url):
lang, internal_id = self._match_valid_url(url).group('lang', 'id')
response = traverse_obj(self._call_api(
f'artists/{internal_id}', internal_id, lang, 'artist info'), ('data', 0))
def entries():
for attribute, path in [('concerts', 'concert'), ('videos', 'musicvideo')]:
for internal_id in traverse_obj(response, (attribute, ...)):
yield self.url_result(f'{self._BASE_URL}/watch/{path}/{internal_id}', CrunchyrollMusicIE, internal_id)
return self.playlist_result(entries(), **self._transform_artist_response(response))
@staticmethod
def _transform_artist_response(data):
return {
'id': data['id'],
**traverse_obj(data, {
'title': 'name',
'description': ('description', {str}, {lambda x: x.replace(r'\r\n', '\n')}),
'thumbnails': ('images', ..., ..., {
'url': ('source', {url_or_none}),
'width': ('width', {int_or_none}),
'height': ('height', {int_or_none}),
}),
'genres': ('genres', ..., 'displayValue'),
}),
}
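The description fields above are cleaned with `x.replace(r'\r\n', '\n')`. Because the pattern is a raw string, it matches the four literal characters backslash, `r`, backslash, `n`, not a real carriage return plus line feed. A minimal sketch of that behaviour follows; the sample text and the helper name `normalize_description` are hypothetical, and it assumes the API delivers escaped sequences as plain text.

```python
# Hypothetical sample: assumes the API sends the escape sequences as
# literal characters (backslash + letter), not as control characters.
raw_description = 'First line\\r\\nSecond line'


def normalize_description(text):
    # r'\r\n' is four characters long: backslash, 'r', backslash, 'n'
    return text.replace(r'\r\n', '\n')


cleaned = normalize_description(raw_description)
print(cleaned.splitlines())
```

A string that already contains a real CRLF pair passes through unchanged, since the raw pattern never matches control characters.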


@ -1,7 +1,4 @@
import time
from .common import InfoExtractor
from ..networking import HEADRequest
from ..utils import int_or_none
@ -31,9 +28,6 @@ class CultureUnpluggedIE(InfoExtractor):
video_id = mobj.group('id')
display_id = mobj.group('display_id') or video_id
# request setClientTimezone.php to get the PHPSESSID cookie, which is needed to get valid JSON data in the next request
self._request_webpage(HEADRequest(
'http://www.cultureunplugged.com/setClientTimezone.php?timeOffset=%d' % -(time.timezone / 3600)), display_id)
movie_data = self._download_json(
f'http://www.cultureunplugged.com/movie-data/cu-{video_id}.json', display_id)


@ -1,3 +1,4 @@
import functools
import hashlib
import re
import time
@ -51,6 +52,15 @@ class DacastVODIE(DacastBaseIE):
'thumbnail': 'https://universe-files.dacast.com/26137208-5858-65c1-5e9a-9d6b6bd2b6c2',
},
'params': {'skip_download': 'm3u8'},
}, { # /uspaes/ in hls_url
'url': 'https://iframe.dacast.com/vod/f9823fc6-faba-b98f-0d00-4a7b50a58c5b/348c5c84-b6af-4859-bb9d-1d01009c795b',
'info_dict': {
'id': '348c5c84-b6af-4859-bb9d-1d01009c795b',
'ext': 'mp4',
'title': 'pl1-edyta-rubas-211124.mp4',
'uploader_id': 'f9823fc6-faba-b98f-0d00-4a7b50a58c5b',
'thumbnail': 'https://universe-files.dacast.com/4d0bd042-a536-752d-fc34-ad2fa44bbcbb.png',
},
}]
_WEBPAGE_TESTS = [{
'url': 'https://www.dacast.com/support/knowledgebase/how-can-i-embed-a-video-on-my-website/',
@ -74,6 +84,15 @@ class DacastVODIE(DacastBaseIE):
'params': {'skip_download': 'm3u8'},
}]
@functools.cached_property
def _usp_signing_secret(self):
player_js = self._download_webpage(
'https://player.dacast.com/js/player.js', None, 'Downloading player JS')
# Rotates every so often, but hardcode a fallback in case of JS change/breakage before rotation
return self._search_regex(
r'\bUSP_SIGNING_SECRET\s*=\s*(["\'])(?P<secret>(?:(?!\1).)+)', player_js,
'usp signing secret', group='secret', fatal=False) or 'odnInCGqhvtyRTtIiddxtuRtawYYICZP'
def _real_extract(self, url):
user_id, video_id = self._match_valid_url(url).group('user_id', 'id')
query = {'contentId': f'{user_id}-vod-{video_id}', 'provider': 'universe'}
@ -94,10 +113,10 @@ class DacastVODIE(DacastBaseIE):
if 'DRM_EXT' in hls_url:
self.report_drm(video_id)
elif '/uspaes/' in hls_url:
# From https://player.dacast.com/js/player.js
# Ref: https://player.dacast.com/js/player.js
ts = int(time.time())
signature = hashlib.sha1(
f'{10413792000 - ts}{ts}YfaKtquEEpDeusCKbvYszIEZnWmBcSvw').digest().hex()
f'{10413792000 - ts}{ts}{self._usp_signing_secret}'.encode()).digest().hex()
hls_aes['uri'] = f'https://keys.dacast.com/uspaes/{video_id}.key?s={signature}&ts={ts}'
for retry in self.RetryManager():
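The key URL signing in the hunk above can be sketched on its own. The formula (`10413792000 - ts`, then `ts`, then the secret, hashed with SHA-1) and the URL shape are taken from the diff; the function name `build_usp_key_url` and the example arguments are hypothetical. Note that `hashlib.sha1` only accepts bytes, which is why the string has to be encoded before hashing.

```python
import hashlib
import time


def build_usp_key_url(video_id, secret, ts=None):
    # Use the current Unix time unless a fixed timestamp is supplied
    ts = int(time.time()) if ts is None else ts
    payload = f'{10413792000 - ts}{ts}{secret}'
    # sha1() requires bytes; hexdigest() equals digest().hex()
    signature = hashlib.sha1(payload.encode()).hexdigest()
    return f'https://keys.dacast.com/uspaes/{video_id}.key?s={signature}&ts={ts}'


# Placeholder values, not a real video ID or secret
print(build_usp_key_url('example-video-id', 'example-secret', ts=1700000000))
```

Passing `ts` explicitly is only a convenience for reproducible output in this sketch; the extractor itself always uses the current time.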


@ -48,32 +48,30 @@ class DropboxIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
fn = urllib.parse.unquote(url_basename(url))
title = os.path.splitext(fn)[0]
password = self.get_param('videopassword')
content_id = None
for part in self._yield_decoded_parts(webpage):
if '/sm/password' in part:
webpage = self._download_webpage(
update_url('https://www.dropbox.com/sm/password', query=part.partition('?')[2]), video_id)
content_id = self._search_regex(r'content_id=([\w.+=/-]+)', part, 'content ID')
break
if (self._og_search_title(webpage, default=None) == 'Dropbox - Password Required'
or 'Enter the password for this link' in webpage):
if password:
response = self._download_json(
'https://www.dropbox.com/sm/auth', video_id, 'POSTing video password',
headers={'content-type': 'application/x-www-form-urlencoded; charset=UTF-8'},
data=urlencode_postdata({
'is_xhr': 'true',
't': self._get_cookies('https://www.dropbox.com')['t'].value,
'content_id': self._search_regex(r'content_id=([\w.+=/-]+)["\']', webpage, 'content id'),
'password': password,
'url': url,
}))
if response.get('status') != 'authed':
raise ExtractorError('Invalid password', expected=True)
elif not self._get_cookies('https://dropbox.com').get('sm_auth'):
if content_id:
password = self.get_param('videopassword')
if not password:
raise ExtractorError('Password protected video, use --video-password <password>', expected=True)
response = self._download_json(
'https://www.dropbox.com/sm/auth', video_id, 'POSTing video password',
data=urlencode_postdata({
'is_xhr': 'true',
't': self._get_cookies('https://www.dropbox.com')['t'].value,
'content_id': content_id,
'password': password,
'url': update_url(url, scheme='', netloc=''),
}))
if response.get('status') != 'authed':
raise ExtractorError('Invalid password', expected=True)
webpage = self._download_webpage(url, video_id)
formats, subtitles = [], {}
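The new `'url'` form field is built with `update_url(url, scheme='', netloc='')`, which appears to reduce the share link to its path and query. A rough standard-library stand-in is sketched below; it does not reproduce yt-dlp's `update_url` helper, and the function name `to_relative_url` and the sample link are hypothetical.

```python
from urllib.parse import urlparse


def to_relative_url(url):
    # Blank out scheme and host, keeping path, query and fragment
    return urlparse(url)._replace(scheme='', netloc='').geturl()


# Hypothetical share link used only for illustration
print(to_relative_url('https://www.dropbox.com/s/abc123/clip.mp4?dl=0'))
```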


@ -135,7 +135,7 @@ class DropoutIE(InfoExtractor):
self.raise_login_required(method='any')
raise ExtractorError(login_err, expected=True)
embed_url = self._search_regex(r'embed_url:\s*["\'](.+?)["\']', webpage, 'embed url')
embed_url = self._html_search_regex(r'embed_url:\s*["\'](.+?)["\']', webpage, 'embed url')
thumbnail = self._og_search_thumbnail(webpage)
watch_info = get_element_by_id('watch-info', webpage) or ''


@ -0,0 +1,51 @@
from .brightcove import BrightcoveNewIE
from .common import InfoExtractor
from ..utils import url_or_none
from ..utils.traversal import traverse_obj
class DrTalksIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?drtalks\.com/videos/(?P<id>[\w-]+)'
_TESTS = [{
'url': 'https://drtalks.com/videos/six-pillars-of-resilience-tools-for-managing-stress-and-flourishing/',
'info_dict': {
'id': '6366193757112',
'ext': 'mp4',
'uploader_id': '6314452011001',
'tags': ['resilience'],
'description': 'md5:9c6805aee237ee6de8052461855b9dda',
'timestamp': 1734546659,
'thumbnail': 'https://drtalks.com/wp-content/uploads/2024/12/Episode-82-Eva-Selhub-DrTalks-Thumbs.jpg',
'title': 'Six Pillars of Resilience: Tools for Managing Stress and Flourishing',
'duration': 2800.682,
'upload_date': '20241218',
},
}, {
'url': 'https://drtalks.com/videos/the-pcos-puzzle-mastering-metabolic-health-with-marcelle-pick/',
'info_dict': {
'id': '6364699891112',
'ext': 'mp4',
'title': 'The PCOS Puzzle: Mastering Metabolic Health with Marcelle Pick',
'description': 'md5:e87cbe00ca50135d5702787fc4043aaa',
'thumbnail': 'https://drtalks.com/wp-content/uploads/2024/11/Episode-34-Marcelle-Pick-OBGYN-NP-DrTalks.jpg',
'duration': 3515.2,
'tags': ['pcos'],
'upload_date': '20241114',
'timestamp': 1731592119,
'uploader_id': '6314452011001',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
next_data = self._search_nextjs_data(webpage, video_id)['props']['pageProps']['data']['video']
return self.url_result(
next_data['videos']['brightcoveVideoLink'], BrightcoveNewIE, video_id,
url_transparent=True,
**traverse_obj(next_data, {
'title': ('title', {str}),
'description': ('videos', 'summury', {str}),
'thumbnail': ('featuredImage', 'node', 'sourceUrl', {url_or_none}),
}))


@ -5,15 +5,16 @@ from ..utils import (
get_element_text_and_html_by_tag,
int_or_none,
join_nonempty,
parse_qs,
str_or_none,
try_call,
unified_timestamp,
)
from ..utils.traversal import traverse_obj
from ..utils.traversal import traverse_obj, value
class DuoplayIE(InfoExtractor):
_VALID_URL = r'https?://duoplay\.ee/(?P<id>\d+)/[\w-]+/?(?:\?(?:[^#]+&)?ep=(?P<ep>\d+))?'
_VALID_URL = r'https?://duoplay\.ee/(?P<id>\d+)(?:[/?#]|$)'
_TESTS = [{
'note': 'Siberi võmm S02E12',
'url': 'https://duoplay.ee/4312/siberi-vomm?ep=24',
@ -34,15 +35,16 @@ class DuoplayIE(InfoExtractor):
'episode_number': 12,
'episode_id': '24',
},
'skip': 'No video found',
}, {
'note': 'Empty title',
'url': 'https://duoplay.ee/17/uhikarotid?ep=14',
'md5': '6aca68be71112314738dd17cced7f8bf',
'md5': 'cba9f5dabf2582b224d80ac44fb80e47',
'info_dict': {
'id': '17_14',
'ext': 'mp4',
'title': 'Ühikarotid',
'thumbnail': r're:https://.+\.jpg(?:\?c=\d+)?$',
'title': 'Episode 14',
'thumbnail': r're:https?://.+\.jpg',
'description': 'md5:4719b418e058c209def41d48b601276e',
'upload_date': '20100916',
'timestamp': 1284661800,
@ -52,6 +54,8 @@ class DuoplayIE(InfoExtractor):
'season_number': 2,
'episode_id': '14',
'release_year': 2010,
'episode': 'Episode 14',
'episode_number': 14,
},
}, {
'note': 'Movie without expiry',
@ -68,10 +72,32 @@ class DuoplayIE(InfoExtractor):
'timestamp': 1671054000,
'release_year': 2018,
},
'skip': 'No video found',
}, {
'note': 'Episode url without show name',
'url': 'https://duoplay.ee/9644?ep=185',
'md5': '63f324b4fe2dbd8194dca16a6d52184a',
'info_dict': {
'id': '9644_185',
'ext': 'mp4',
'title': 'Episode 185',
'thumbnail': r're:https?://.+\.jpg',
'description': 'md5:ed25ba4e9e5d54bc291a4a0cdd241467',
'upload_date': '20241120',
'timestamp': 1732077000,
'episode': 'Episode 63',
'episode_id': '185',
'episode_number': 63,
'season': 'Season 2',
'season_number': 2,
'series': 'Telehommik',
'series_id': '9644',
},
}]
def _real_extract(self, url):
telecast_id, episode = self._match_valid_url(url).group('id', 'ep')
telecast_id = self._match_id(url)
episode = traverse_obj(parse_qs(url), ('ep', 0, {int_or_none}, {str_or_none}))
video_id = join_nonempty(telecast_id, episode, delim='_')
webpage = self._download_webpage(url, video_id)
video_player = try_call(lambda: extract_attributes(
@ -79,25 +105,33 @@ class DuoplayIE(InfoExtractor):
if not video_player or not video_player.get('manifest-url'):
raise ExtractorError('No video found', expected=True)
manifest_url = video_player['manifest-url']
session_token = self._download_json(
'https://sts.postimees.ee/session/register', video_id, 'Registering session',
'Unable to register session', headers={
'Accept': 'application/json',
'X-Original-URI': manifest_url,
})['session']
episode_attr = self._parse_json(video_player.get(':episode') or '', video_id, fatal=False) or {}
return {
'id': video_id,
'formats': self._extract_m3u8_formats(video_player['manifest-url'], video_id, 'mp4'),
'formats': self._extract_m3u8_formats(manifest_url, video_id, 'mp4', query={'s': session_token}),
**traverse_obj(episode_attr, {
'title': 'title',
'description': 'synopsis',
'title': ('title', {str}),
'description': ('synopsis', {str}),
'thumbnail': ('images', 'original'),
'timestamp': ('airtime', {lambda x: unified_timestamp(x + ' +0200')}),
'cast': ('cast', {lambda x: x.split(', ')}),
'cast': ('cast', filter, {lambda x: x.split(', ')}),
'release_year': ('year', {int_or_none}),
}),
**(traverse_obj(episode_attr, {
'title': (None, ('subtitle', ('episode_nr', {lambda x: f'Episode {x}' if x else None}))),
'series': 'title',
'title': (None, (('subtitle', {str}, filter), {value(f'Episode {episode}' if episode else None)})),
'series': ('title', {str}),
'series_id': ('telecast_id', {str_or_none}),
'season_number': ('season_id', {int_or_none}),
'episode': 'subtitle',
'episode': ('subtitle', {str}, filter),
'episode_number': ('episode_nr', {int_or_none}),
'episode_id': ('episode_id', {str_or_none}),
}, get_all=False) if episode_attr.get('category') != 'movies' else {}),
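With the relaxed `_VALID_URL`, the episode number no longer comes from a regex group; it is read from the `ep` query parameter and joined to the telecast ID. The sketch below approximates that flow with the standard library only. The helper names are hypothetical, and the integer round-trip is an assumption about how the `int_or_none`/`str_or_none` chain normalizes the value.

```python
import re
from urllib.parse import parse_qs, urlparse

# Pattern copied from the updated extractor
VALID_URL = r'https?://duoplay\.ee/(?P<id>\d+)(?:[/?#]|$)'


def episode_from_url(url):
    values = parse_qs(urlparse(url).query).get('ep', [])
    try:
        # Round-trip through int so non-numeric values are rejected
        return str(int(values[0]))
    except (IndexError, ValueError):
        return None


def build_video_id(url):
    telecast_id = re.match(VALID_URL, url).group('id')
    # Join only the parts that are present, as join_nonempty does
    return '_'.join(filter(None, (telecast_id, episode_from_url(url))))


print(build_video_id('https://duoplay.ee/9644?ep=185'))
print(build_video_id('https://duoplay.ee/4312/siberi-vomm'))
```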


@ -162,7 +162,7 @@ class DVTVIE(InfoExtractor):
items = re.findall(r'(?s)playlist\.push\(({.+?})\);', webpage)
if items:
return self.playlist_result(
[self._parse_video_metadata(i, video_id, timestamp) for i in items],
(self._parse_video_metadata(i, video_id, timestamp) for i in items),
video_id, self._html_search_meta('twitter:title', webpage))
item = self._search_regex(

yt_dlp/extractor/eggs.py Normal file

@ -0,0 +1,155 @@
import secrets
from .common import InfoExtractor
from .youtube import YoutubeIE
from ..utils import (
int_or_none,
parse_iso8601,
str_or_none,
url_or_none,
)
from ..utils.traversal import traverse_obj
class EggsBaseIE(InfoExtractor):
_API_HEADERS = {
'Accept': '*/*',
'apVersion': '8.2.00',
'deviceName': 'Android',
}
def _real_initialize(self):
self._API_HEADERS['deviceId'] = secrets.token_hex(8)
def _call_api(self, endpoint, video_id):
return self._download_json(
f'https://app-front-api.eggs.mu/v1/{endpoint}', video_id,
headers=self._API_HEADERS)
def _extract_music_info(self, data):
if yt_url := traverse_obj(data, ('youtubeUrl', {url_or_none})):
return self.url_result(yt_url, ie=YoutubeIE)
artist_name = traverse_obj(data, ('artist', 'artistName', {str_or_none}))
music_id = traverse_obj(data, ('musicId', {str_or_none}))
webpage_url = None
if artist_name and music_id:
webpage_url = f'https://eggs.mu/artist/{artist_name}/song/{music_id}'
return {
'id': music_id,
'vcodec': 'none',
'webpage_url': webpage_url,
'extractor_key': EggsIE.ie_key(),
'extractor': EggsIE.IE_NAME,
**traverse_obj(data, {
'title': ('musicTitle', {str}),
'url': ('musicDataPath', {url_or_none}),
'uploader': ('artist', 'displayName', {str}),
'uploader_id': ('artist', 'artistId', {str_or_none}),
'thumbnail': ('imageDataPath', {url_or_none}),
'view_count': ('numberOfMusicPlays', {int_or_none}),
'like_count': ('numberOfLikes', {int_or_none}),
'comment_count': ('numberOfComments', {int_or_none}),
'composers': ('composer', {str}, all),
'tags': ('tags', ..., {str}),
'timestamp': ('releaseDate', {parse_iso8601}),
'artist': ('artist', 'displayName', {str}),
})}
class EggsIE(EggsBaseIE):
IE_NAME = 'eggs:single'
_VALID_URL = r'https?://eggs\.mu/artist/[^/?#]+/song/(?P<id>[\da-f-]+)'
_TESTS = [{
'url': 'https://eggs.mu/artist/32_sunny_girl/song/0e95fd1d-4d61-4d5b-8b18-6092c551da90',
'info_dict': {
'id': '0e95fd1d-4d61-4d5b-8b18-6092c551da90',
'ext': 'm4a',
'title': 'シネマと信号',
'uploader': 'Sunny Girl',
'thumbnail': r're:https?://.*\.jpg(?:\?.*)?$',
'uploader_id': '1607',
'like_count': int,
'timestamp': 1731327327,
'composers': ['橘高連太郎'],
'view_count': int,
'comment_count': int,
'artists': ['Sunny Girl'],
'upload_date': '20241111',
'tags': ['SunnyGirl', 'シネマと信号'],
},
}, {
'url': 'https://eggs.mu/artist/KAMO_3pband/song/1d4bc45f-1af6-47a9-8b30-a70cae350b4f',
'info_dict': {
'id': '80cLKA2wnoA',
'ext': 'mp4',
'title': 'KAMO「いい女だから」Audio',
'uploader': 'KAMO',
'live_status': 'not_live',
'channel_id': 'UCsHLBw2__5Q9y55skXPotOg',
'channel_follower_count': int,
'description': 'md5:d260da711ecbec3e720293dc11401b87',
'availability': 'public',
'uploader_id': '@KAMO_band',
'upload_date': '20240925',
'thumbnail': 'https://i.ytimg.com/vi/80cLKA2wnoA/maxresdefault.jpg',
'comment_count': int,
'channel_url': 'https://www.youtube.com/channel/UCsHLBw2__5Q9y55skXPotOg',
'view_count': int,
'duration': 151,
'like_count': int,
'channel': 'KAMO',
'playable_in_embed': True,
'uploader_url': 'https://www.youtube.com/@KAMO_band',
'tags': [],
'timestamp': 1727271121,
'age_limit': 0,
'categories': ['People & Blogs'],
},
'add_ie': ['Youtube'],
'params': {'skip_download': 'Youtube'},
}]
def _real_extract(self, url):
song_id = self._match_id(url)
json_data = self._call_api(f'musics/{song_id}', song_id)
return self._extract_music_info(json_data)
class EggsArtistIE(EggsBaseIE):
IE_NAME = 'eggs:artist'
_VALID_URL = r'https?://eggs\.mu/artist/(?P<id>\w+)/?(?:[?#&]|$)'
_TESTS = [{
'url': 'https://eggs.mu/artist/32_sunny_girl',
'info_dict': {
'id': '32_sunny_girl',
'thumbnail': 'https://image-pro.eggs.mu/profile/1607.jpeg?updated_at=2024-04-03T20%3A06%3A00%2B09%3A00',
'description': 'Muddy Mine / 東京高田馬場CLUB PHASE / Gt.Vo 橘高 連太郎 / Ba.Cho 小野 ゆうき / Dr 大森 りゅうひこ',
'title': 'Sunny Girl',
},
'playlist_mincount': 18,
}, {
'url': 'https://eggs.mu/artist/KAMO_3pband',
'info_dict': {
'id': 'KAMO_3pband',
'description': '川崎発3ピースバンド',
'thumbnail': 'https://image-pro.eggs.mu/profile/35217.jpeg?updated_at=2024-11-27T16%3A31%3A50%2B09%3A00',
'title': 'KAMO',
},
'playlist_mincount': 2,
}]
def _real_extract(self, url):
artist_id = self._match_id(url)
artist_data = self._call_api(f'artists/{artist_id}', artist_id)
song_data = self._call_api(f'artists/{artist_id}/musics', artist_id)
return self.playlist_result(
traverse_obj(song_data, ('data', ..., {dict}, {self._extract_music_info})),
playlist_id=artist_id, **traverse_obj(artist_data, {
'title': ('displayName', {str}),
'description': ('profile', {str}),
'thumbnail': ('imageDataPath', {url_or_none}),
}))
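`EggsBaseIE._real_initialize` adds a random `deviceId` to the API headers via `secrets.token_hex(8)`, which yields 16 hexadecimal characters. A small sketch of the same idea follows; unlike the extractor, which writes into the shared `_API_HEADERS` dict, this version returns a fresh dict, and the names `BASE_HEADERS` and `build_api_headers` are hypothetical.

```python
import secrets

# Header defaults copied from the extractor
BASE_HEADERS = {
    'Accept': '*/*',
    'apVersion': '8.2.00',
    'deviceName': 'Android',
}


def build_api_headers():
    # Copy the defaults so each call gets its own dict and device ID
    return {**BASE_HEADERS, 'deviceId': secrets.token_hex(8)}


headers = build_api_headers()
print(sorted(headers))
```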


@ -12,7 +12,7 @@ from ..utils import (
class FirstTVIE(InfoExtractor):
IE_NAME = '1tv'
IE_DESC = 'Первый канал'
_VALID_URL = r'https?://(?:www\.)?1tv\.ru/(?:[^/]+/)+(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?:www\.)?(?:sport)?1tv\.ru/(?:[^/?#]+/)+(?P<id>[^/?#]+)'
_TESTS = [{
# single format
@ -52,6 +52,9 @@ class FirstTVIE(InfoExtractor):
}, {
'url': 'http://www.1tv.ru/shows/tochvtoch-supersezon/vystupleniya/evgeniy-dyatlov-vladimir-vysockiy-koni-priveredlivye-toch-v-toch-supersezon-fragment-vypuska-ot-06-11-2016',
'only_matching': True,
}, {
'url': 'https://www.sport1tv.ru/sport/chempionat-rossii-po-figurnomu-kataniyu-2025',
'only_matching': True,
}]
def _real_extract(self, url):
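The updated `_VALID_URL` can be checked against the test URLs with plain `re`. This is only a sketch of how the pattern behaves; the helper name `match_id` is hypothetical and stands in for the extractor's own matching logic.

```python
import re

# Pattern copied from the updated extractor
VALID_URL = r'https?://(?:www\.)?(?:sport)?1tv\.ru/(?:[^/?#]+/)+(?P<id>[^/?#]+)'


def match_id(url):
    # Return the last path segment as the ID, or None when there is no match
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None


print(match_id('https://www.sport1tv.ru/sport/chempionat-rossii-po-figurnomu-kataniyu-2025'))
print(match_id('https://www.1tv.ru/sport'))
```

The pattern needs at least one directory segment before the ID, so a URL with a single path component does not match.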


@ -1,349 +0,0 @@
import random
import re
import string
from .common import InfoExtractor
from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
determine_ext,
int_or_none,
join_nonempty,
js_to_json,
make_archive_id,
orderedSet,
qualities,
str_or_none,
traverse_obj,
try_get,
urlencode_postdata,
)
class FunimationBaseIE(InfoExtractor):
_NETRC_MACHINE = 'funimation'
_REGION = None
_TOKEN = None
def _get_region(self):
region_cookie = self._get_cookies('https://www.funimation.com').get('region')
region = region_cookie.value if region_cookie else self.get_param('geo_bypass_country')
return region or traverse_obj(
self._download_json(
'https://geo-service.prd.funimationsvc.com/geo/v1/region/check', None, fatal=False,
note='Checking geo-location', errnote='Unable to fetch geo-location information'),
'region') or 'US'
def _perform_login(self, username, password):
if self._TOKEN:
return
try:
data = self._download_json(
'https://prod-api-funimationnow.dadcdigital.com/api/auth/login/',
None, 'Logging in', data=urlencode_postdata({
'username': username,
'password': password,
}))
FunimationBaseIE._TOKEN = data['token']
except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status == 401:
error = self._parse_json(e.cause.response.read().decode(), None)['error']
raise ExtractorError(error, expected=True)
raise
class FunimationPageIE(FunimationBaseIE):
IE_NAME = 'funimation:page'
_VALID_URL = r'https?://(?:www\.)?funimation(?:\.com|now\.uk)/(?:(?P<lang>[^/]+)/)?(?:shows|v)/(?P<show>[^/]+)/(?P<episode>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.funimation.com/shows/attack-on-titan-junior-high/broadcast-dub-preview/',
'info_dict': {
'id': '210050',
'ext': 'mp4',
'title': 'Broadcast Dub Preview',
# Other metadata is tested in FunimationIE
},
'params': {
'skip_download': 'm3u8',
},
'add_ie': ['Funimation'],
}, {
# Not available in US
'url': 'https://www.funimation.com/shows/hacksign/role-play/',
'only_matching': True,
}, {
# with lang code
'url': 'https://www.funimation.com/en/shows/hacksign/role-play/',
'only_matching': True,
}, {
'url': 'https://www.funimationnow.uk/shows/puzzle-dragons-x/drop-impact/simulcast/',
'only_matching': True,
}, {
'url': 'https://www.funimation.com/v/a-certain-scientific-railgun/super-powered-level-5',
'only_matching': True,
}]
def _real_initialize(self):
if not self._REGION:
FunimationBaseIE._REGION = self._get_region()
def _real_extract(self, url):
locale, show, episode = self._match_valid_url(url).group('lang', 'show', 'episode')
video_id = traverse_obj(self._download_json(
f'https://title-api.prd.funimationsvc.com/v1/shows/{show}/episodes/{episode}',
f'{show}_{episode}', query={
'deviceType': 'web',
'region': self._REGION,
'locale': locale or 'en',
}), ('videoList', ..., 'id'), get_all=False)
return self.url_result(f'https://www.funimation.com/player/{video_id}', FunimationIE.ie_key(), video_id)
class FunimationIE(FunimationBaseIE):
_VALID_URL = r'https?://(?:www\.)?funimation\.com/player/(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.funimation.com/player/210051',
'info_dict': {
'id': '210050',
'display_id': 'broadcast-dub-preview',
'ext': 'mp4',
'title': 'Broadcast Dub Preview',
'thumbnail': r're:https?://.*\.(?:jpg|png)',
'episode': 'Broadcast Dub Preview',
'episode_id': '210050',
'season': 'Extras',
'season_id': '166038',
'season_number': 99,
'series': 'Attack on Titan: Junior High',
'description': '',
'duration': 155,
},
'params': {
'skip_download': 'm3u8',
},
}, {
'note': 'player_id should be extracted with the relevant compat-opt',
'url': 'https://www.funimation.com/player/210051',
'info_dict': {
'id': '210051',
'display_id': 'broadcast-dub-preview',
'ext': 'mp4',
'title': 'Broadcast Dub Preview',
'thumbnail': r're:https?://.*\.(?:jpg|png)',
'episode': 'Broadcast Dub Preview',
'episode_id': '210050',
'season': 'Extras',
'season_id': '166038',
'season_number': 99,
'series': 'Attack on Titan: Junior High',
'description': '',
'duration': 155,
},
'params': {
'skip_download': 'm3u8',
'compat_opts': ['seperate-video-versions'],
},
}]
@staticmethod
def _get_experiences(episode):
for lang, lang_data in episode.get('languages', {}).items():
for video_data in lang_data.values():
for version, f in video_data.items():
yield lang, version.title(), f
def _get_episode(self, webpage, experience_id=None, episode_id=None, fatal=True):
"""Extract the episode, season and show objects, given either an episode ID or an experience ID."""
show = self._parse_json(
self._search_regex(
r'show\s*=\s*({.+?})\s*;', webpage, 'show data', fatal=fatal),
experience_id, transform_source=js_to_json, fatal=fatal) or {}
for season in show.get('seasons', []):
for episode in season.get('episodes', []):
if episode_id is not None:
if str(episode.get('episodePk')) == episode_id:
return episode, season, show
continue
for _, _, f in self._get_experiences(episode):
if f.get('experienceId') == experience_id:
return episode, season, show
if fatal:
raise ExtractorError('Unable to find episode information')
else:
self.report_warning('Unable to find episode information')
return {}, {}, {}
def _real_extract(self, url):
initial_experience_id = self._match_id(url)
webpage = self._download_webpage(
url, initial_experience_id, note=f'Downloading player webpage for {initial_experience_id}')
episode, season, show = self._get_episode(webpage, experience_id=int(initial_experience_id))
episode_id = str(episode['episodePk'])
display_id = episode.get('slug') or episode_id
formats, subtitles, thumbnails, duration = [], {}, [], 0
requested_languages, requested_versions = self._configuration_arg('language'), self._configuration_arg('version')
language_preference = qualities((requested_languages or [''])[::-1])
source_preference = qualities((requested_versions or ['uncut', 'simulcast'])[::-1])
only_initial_experience = 'seperate-video-versions' in self.get_param('compat_opts', [])
for lang, version, fmt in self._get_experiences(episode):
experience_id = str(fmt['experienceId'])
if (only_initial_experience and experience_id != initial_experience_id
or requested_languages and lang.lower() not in requested_languages
or requested_versions and version.lower() not in requested_versions):
continue
thumbnails.append({'url': fmt.get('poster')})
duration = max(duration, fmt.get('duration', 0))
format_name = f'{version} {lang} ({experience_id})'
self.extract_subtitles(
subtitles, experience_id, display_id=display_id, format_name=format_name,
episode=episode if experience_id == initial_experience_id else episode_id)
headers = {}
if self._TOKEN:
headers['Authorization'] = f'Token {self._TOKEN}'
page = self._download_json(
f'https://www.funimation.com/api/showexperience/{experience_id}/',
display_id, headers=headers, expected_status=403, query={
'pinst_id': ''.join(random.choices(string.digits + string.ascii_letters, k=8)),
}, note=f'Downloading {format_name} JSON')
sources = page.get('items') or []
if not sources:
error = try_get(page, lambda x: x['errors'][0], dict)
if error:
self.report_warning('{} said: Error {} - {}'.format(
self.IE_NAME, error.get('code'), error.get('detail') or error.get('title')))
else:
self.report_warning('No sources found for format')
current_formats = []
for source in sources:
source_url = source.get('src')
source_type = source.get('videoType') or determine_ext(source_url)
if source_type == 'm3u8':
current_formats.extend(self._extract_m3u8_formats(
source_url, display_id, 'mp4', m3u8_id='{}-{}'.format(experience_id, 'hls'), fatal=False,
note=f'Downloading {format_name} m3u8 information'))
else:
current_formats.append({
'format_id': f'{experience_id}-{source_type}',
'url': source_url,
})
for f in current_formats:
# TODO: Convert language to code
f.update({
'language': lang,
'format_note': version,
'source_preference': source_preference(version.lower()),
'language_preference': language_preference(lang.lower()),
})
formats.extend(current_formats)
if not formats and (requested_languages or requested_versions):
self.raise_no_formats(
'There are no video formats matching the requested languages/versions', expected=True, video_id=display_id)
self._remove_duplicate_formats(formats)
return {
'id': episode_id,
'_old_archive_ids': [make_archive_id(self, initial_experience_id)],
'display_id': display_id,
'duration': duration,
'title': episode['episodeTitle'],
'description': episode.get('episodeSummary'),
'episode': episode.get('episodeTitle'),
'episode_number': int_or_none(episode.get('episodeId')),
'episode_id': episode_id,
'season': season.get('seasonTitle'),
'season_number': int_or_none(season.get('seasonId')),
'season_id': str_or_none(season.get('seasonPk')),
'series': show.get('showTitle'),
'formats': formats,
'thumbnails': thumbnails,
'subtitles': subtitles,
'_format_sort_fields': ('lang', 'source'),
}
def _get_subtitles(self, subtitles, experience_id, episode, display_id, format_name):
if isinstance(episode, str):
webpage = self._download_webpage(
f'https://www.funimation.com/player/{experience_id}/', display_id,
fatal=False, note=f'Downloading player webpage for {format_name}')
episode, _, _ = self._get_episode(webpage, episode_id=episode, fatal=False)
for _, version, f in self._get_experiences(episode):
for source in f.get('sources') or []:
for text_track in source.get('textTracks') or []:
if not text_track.get('src'):
continue
sub_type = (text_track.get('type') or '').upper()
sub_type = sub_type if sub_type != 'FULL' else None
current_sub = {
'url': text_track['src'],
'name': join_nonempty(version, text_track.get('label'), sub_type, delim=' '),
}
lang = join_nonempty(text_track.get('language', 'und'),
version if version != 'Simulcast' else None,
sub_type, delim='_')
if current_sub not in subtitles.get(lang, []):
subtitles.setdefault(lang, []).append(current_sub)
return subtitles
class FunimationShowIE(FunimationBaseIE):
IE_NAME = 'funimation:show'
_VALID_URL = r'(?P<url>https?://(?:www\.)?funimation(?:\.com|now\.uk)/(?P<locale>[^/]+)?/?shows/(?P<id>[^/?#&]+))/?(?:[?#]|$)'
_TESTS = [{
'url': 'https://www.funimation.com/en/shows/sk8-the-infinity',
'info_dict': {
'id': '1315000',
'title': 'SK8 the Infinity',
},
'playlist_count': 13,
'params': {
'skip_download': True,
},
}, {
# without lang code
'url': 'https://www.funimation.com/shows/ouran-high-school-host-club/',
'info_dict': {
'id': '39643',
'title': 'Ouran High School Host Club',
},
'playlist_count': 26,
'params': {
'skip_download': True,
},
}]
def _real_initialize(self):
if not self._REGION:
FunimationBaseIE._REGION = self._get_region()
def _real_extract(self, url):
base_url, locale, display_id = self._match_valid_url(url).groups()
show_info = self._download_json(
'https://title-api.prd.funimationsvc.com/v2/shows/{}?region={}&deviceType=web&locale={}'.format(
display_id, self._REGION, locale or 'en'), display_id)
items_info = self._download_json(
'https://prod-api-funimationnow.dadcdigital.com/api/funimation/episodes/?limit=99999&title_id={}'.format(
show_info.get('id')), display_id)
vod_items = traverse_obj(items_info, ('items', ..., lambda k, _: re.match(r'(?i)mostRecent[AS]vod', k), 'item'))
return {
'_type': 'playlist',
'id': str_or_none(show_info['id']),
'title': show_info['name'],
'entries': orderedSet(
self.url_result(
'{}/{}'.format(base_url, vod_item.get('episodeSlug')), FunimationPageIE.ie_key(),
vod_item.get('episodeId'), vod_item.get('episodeName'))
for vod_item in sorted(vod_items, key=lambda x: x.get('episodeOrder', -1))),
}

View file

@ -1,40 +1,48 @@
from .common import InfoExtractor
from ..utils import (
clean_html,
int_or_none,
str_or_none,
traverse_obj,
url_or_none,
)
class GoodGameIE(InfoExtractor):
IE_NAME = 'goodgame:stream'
_VALID_URL = r'https?://goodgame\.ru/channel/(?P<id>\w+)'
_VALID_URL = r'https?://goodgame\.ru/(?!channel/)(?P<id>[\w.*-]+)'
_TESTS = [{
'url': 'https://goodgame.ru/channel/Pomi/#autoplay',
'url': 'https://goodgame.ru/TGW#autoplay',
'info_dict': {
'id': 'pomi',
'id': '7998',
'ext': 'mp4',
'title': r're:Reynor vs Special \(1/2,bo3\) Wardi Spring EU \- playoff \(финальный день\) \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
'channel_id': '1644',
'channel': 'Pomi',
'channel_url': 'https://goodgame.ru/channel/Pomi/',
'description': 'md5:4a87b775ee7b2b57bdccebe285bbe171',
'thumbnail': r're:^https?://.*\.jpg$',
'channel_id': '7998',
'title': r're:шоуматч Happy \(NE\) vs Fortitude \(UD\), потом ладдер и дс \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
'channel_url': 'https://goodgame.ru/TGW',
'thumbnail': 'https://hls.goodgame.ru/previews/7998_240.jpg',
'uploader': 'TGW',
'channel': 'JosephStalin',
'live_status': 'is_live',
'view_count': int,
'age_limit': 18,
'channel_follower_count': int,
'uploader_id': '2899',
'concurrent_view_count': int,
},
'params': {'skip_download': 'm3u8'},
'skip': 'May not be online',
}, {
'url': 'https://goodgame.ru/Mr.Gray',
'only_matching': True,
}, {
'url': 'https://goodgame.ru/HeDoPa3yMeHue*',
'only_matching': True,
}]
def _real_extract(self, url):
channel_name = self._match_id(url)
response = self._download_json(f'https://api2.goodgame.ru/v2/streams/{channel_name}', channel_name)
player_id = response['channel']['gg_player_src']
response = self._download_json(f'https://goodgame.ru/api/4/users/{channel_name}/stream', channel_name)
player_id = response['streamkey']
formats, subtitles = [], {}
if response.get('status') == 'Live':
if response.get('status'):
formats, subtitles = self._extract_m3u8_formats_and_subtitles(
f'https://hls.goodgame.ru/manifest/{player_id}_master.m3u8',
channel_name, 'mp4', live=True)
@ -45,13 +53,17 @@ class GoodGameIE(InfoExtractor):
'id': player_id,
'formats': formats,
'subtitles': subtitles,
'title': traverse_obj(response, ('channel', 'title')),
'channel': channel_name,
'channel_id': str_or_none(traverse_obj(response, ('channel', 'id'))),
'channel_url': response.get('url'),
'description': clean_html(traverse_obj(response, ('channel', 'description'))),
'thumbnail': traverse_obj(response, ('channel', 'thumb')),
'is_live': bool(formats),
'view_count': int_or_none(response.get('viewers')),
'age_limit': 18 if traverse_obj(response, ('channel', 'adult')) else None,
**traverse_obj(response, {
'title': ('title', {str}),
'channel': ('channelkey', {str}),
'channel_id': ('id', {str_or_none}),
'channel_url': ('link', {url_or_none}),
'uploader': ('streamer', 'username', {str}),
'uploader_id': ('streamer', 'id', {str_or_none}),
'thumbnail': ('preview', {url_or_none}, {self._proto_relative_url}),
'concurrent_view_count': ('viewers', {int_or_none}),
'channel_follower_count': ('followers', {int_or_none}),
'age_limit': ('adult', {bool}, {lambda x: 18 if x else None}),
}),
}
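
Review note: the new `_VALID_URL` in this hunk can be checked in isolation. The sketch below uses only `re`; the pattern is copied from the diff, while the `match_channel` helper is a made-up name for illustration and is not part of yt-dlp.

```python
import re

# Pattern copied from the new GoodGameIE._VALID_URL in this diff
VALID_URL = r'https?://goodgame\.ru/(?!channel/)(?P<id>[\w.*-]+)'


def match_channel(url):
    """Hypothetical helper: return the channel name, or None if the URL is not matched."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None


# New-style URLs taken from the test cases in this hunk
print(match_channel('https://goodgame.ru/TGW#autoplay'))
print(match_channel('https://goodgame.ru/Mr.Gray'))
print(match_channel('https://goodgame.ru/HeDoPa3yMeHue*'))
# Old-style /channel/ URLs are rejected by the negative lookahead
print(match_channel('https://goodgame.ru/channel/Pomi/#autoplay'))
```

The character class `[\w.*-]` is what lets names such as `Mr.Gray` and `HeDoPa3yMeHue*` through, and the `#autoplay` fragment is left out of the captured ID.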

View file

@ -254,7 +254,7 @@ class InstagramIOSIE(InfoExtractor):
class InstagramIE(InstagramBaseIE):
_VALID_URL = r'(?P<url>https?://(?:www\.)?instagram\.com(?:/[^/]+)?/(?:p|tv|reels?(?!/audio/))/(?P<id>[^/?#&]+))'
_VALID_URL = r'(?P<url>https?://(?:www\.)?instagram\.com(?:/(?!share/)[^/?#]+)?/(?:p|tv|reels?(?!/audio/))/(?P<id>[^/?#&]+))'
_EMBED_REGEX = [r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?instagram\.com/p/[^/]+/embed.*?)\1']
_TESTS = [{
'url': 'https://instagram.com/p/aye83DjauH/?foo=bar#abc',

View file

@ -39,7 +39,7 @@ class LaracastsBaseIE(InfoExtractor):
'description': ('body', {clean_html}),
'thumbnail': ('largeThumbnail', {url_or_none}),
'duration': ('length', {int_or_none}),
'date': ('dateSegments', 'published', {unified_strdate}),
'upload_date': ('dateSegments', 'published', {unified_strdate}),
}))
@ -54,7 +54,7 @@ class LaracastsIE(LaracastsBaseIE):
'title': 'Hello, Laravel',
'ext': 'mp4',
'duration': 519,
'date': '20240312',
'upload_date': '20240312',
'thumbnail': 'https://laracasts.s3.amazonaws.com/videos/thumbnails/youtube/30-days-to-learn-laravel-11-1.png',
'description': 'md5:ddd658bb241975871d236555657e1dd1',
'season_number': 1,

View file

@ -310,7 +310,13 @@ class LBRYIE(LBRYBaseIE):
if stream_type in self._SUPPORTED_STREAM_TYPES:
claim_id, is_live = result['claim_id'], False
streaming_url = self._call_api_proxy(
'get', claim_id, {'uri': uri}, 'streaming url')['streaming_url']
'get', claim_id, {
'uri': uri,
**traverse_obj(parse_qs(url), {
'signature': ('signature', 0),
'signature_ts': ('signature_ts', 0),
}),
}, 'streaming url')['streaming_url']
# GET request to v3 API returns original video/audio file if available
direct_url = re.sub(r'/api/v\d+/', '/api/v3/', streaming_url)
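
Review note: this hunk forwards `signature` and `signature_ts` from the page URL's query string to the `get` call. A minimal standalone sketch of that idea follows, assuming the standard library's `urllib.parse` as a stand-in for yt-dlp's `parse_qs` and `traverse_obj`. The `signature_params` helper and both URLs are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def signature_params(url):
    """Hypothetical helper: pick the first 'signature'/'signature_ts' value from a URL query."""
    query = parse_qs(urlparse(url).query)
    return {
        key: query[key][0]
        for key in ('signature', 'signature_ts')
        if query.get(key)
    }


# Made-up URL, for illustration only
page_url = 'https://odysee.com/@chan:1/video:2?signature=abc123&signature_ts=1700000000'
# Keys that are absent from the query are simply not added to the parameters
params = {'uri': 'lbry://example', **signature_params(page_url)}
print(params)
```

When the URL carries no signature, the helper returns an empty dict, so the request parameters reduce to `uri` alone, as before the change.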

View file

@ -26,6 +26,7 @@ class MicrosoftEmbedIE(InfoExtractor):
'timestamp': 1631658316,
'upload_date': '20210914',
},
'expected_warnings': ['Failed to parse XML: syntax error: line 1, column 0'],
}]
_API_URL = 'https://prod-video-cms-rt-microsoft-com.akamaized.net/vhs/api/videos/'
@ -36,11 +37,11 @@ class MicrosoftEmbedIE(InfoExtractor):
formats = []
for source_type, source in metadata['streams'].items():
if source_type == 'smooth_Streaming':
formats.extend(self._extract_ism_formats(source['url'], video_id, 'mss'))
formats.extend(self._extract_ism_formats(source['url'], video_id, 'mss', fatal=False))
elif source_type == 'apple_HTTP_Live_Streaming':
formats.extend(self._extract_m3u8_formats(source['url'], video_id, 'mp4'))
formats.extend(self._extract_m3u8_formats(source['url'], video_id, 'mp4', fatal=False))
elif source_type == 'mPEG_DASH':
formats.extend(self._extract_mpd_formats(source['url'], video_id))
formats.extend(self._extract_mpd_formats(source['url'], video_id, fatal=False))
else:
formats.append({
'format_id': source_type,

View file

@ -80,9 +80,9 @@ class MiTeleIE(TelecincoBaseIE):
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
pre_player = self._parse_json(self._search_regex(
r'window\.\$REACTBASE_STATE\.prePlayer_mtweb\s*=\s*({.+})',
webpage, 'Pre Player'), display_id)['prePlayer']
pre_player = self._search_json(
r'window\.\$REACTBASE_STATE\.prePlayer_mtweb\s*=',
webpage, 'Pre Player', display_id)['prePlayer']
title = pre_player['title']
video_info = self._parse_content(pre_player['video'], url)
content = pre_player.get('content') or {}

View file

@ -72,6 +72,7 @@ class NaverBaseIE(InfoExtractor):
'abr': int_or_none(bitrate.get('audio')),
'filesize': int_or_none(stream.get('size')),
'protocol': 'm3u8_native' if stream_type == 'HLS' else None,
'extra_param_to_segment_url': urllib.parse.urlencode(query, doseq=True) if stream_type == 'HLS' else None,
})
extract_formats(get_list('video'), 'H264')
@ -168,6 +169,26 @@ class NaverIE(NaverBaseIE):
'duration': 277,
'thumbnail': r're:^https?://.*\.jpg',
},
}, {
'url': 'https://tv.naver.com/v/67838091',
'md5': '126ea384ab033bca59672c12cca7a6be',
'info_dict': {
'id': '67838091',
'ext': 'mp4',
'title': '[라인W 날씨] 내일 아침 서울 체감 -19도…호남·충남 대설',
'description': 'md5:fe026e25634c85845698aed4b59db5a7',
'timestamp': 1736347853,
'upload_date': '20250108',
'uploader': 'KBS뉴스',
'uploader_id': 'kbsnews',
'uploader_url': 'https://tv.naver.com/kbsnews',
'view_count': int,
'like_count': int,
'comment_count': int,
'duration': 69,
'thumbnail': r're:^https?://.*\.jpg',
},
'params': {'format': 'HLS_144P'},
}, {
'url': 'http://tvcast.naver.com/v/81652',
'only_matching': True,

117
yt_dlp/extractor/nest.py Normal file
View file

@ -0,0 +1,117 @@
from .common import InfoExtractor
from ..utils import ExtractorError, float_or_none, update_url_query, url_or_none
from ..utils.traversal import traverse_obj
class NestIE(InfoExtractor):
_VALID_URL = r'https?://video\.nest\.com/(?:embedded/)?live/(?P<id>\w+)'
_EMBED_REGEX = [rf'<iframe [^>]*\bsrc=[\'"](?P<url>{_VALID_URL})']
_TESTS = [{
'url': 'https://video.nest.com/embedded/live/4fvYdSo8AX?autoplay=0',
'info_dict': {
'id': '4fvYdSo8AX',
'ext': 'mp4',
'title': 'startswith:Outside ',
'alt_title': 'Outside',
'description': '<null>',
'location': 'Los Angeles',
'availability': 'public',
'thumbnail': r're:https?://',
'live_status': 'is_live',
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'https://video.nest.com/live/4fvYdSo8AX',
'only_matching': True,
}]
_WEBPAGE_TESTS = [{
'url': 'https://www.pacificblue.biz/noyo-harbor-webcam/',
'info_dict': {
'id': '4fvYdSo8AX',
'ext': 'mp4',
'title': 'startswith:Outside ',
'alt_title': 'Outside',
'description': '<null>',
'location': 'Los Angeles',
'availability': 'public',
'thumbnail': r're:https?://',
'live_status': 'is_live',
},
'params': {
# m3u8 download
'skip_download': True,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
item = self._download_json(
'https://video.nest.com/api/dropcam/cameras.get_by_public_token',
video_id, query={'token': video_id})['items'][0]
uuid = item.get('uuid')
stream_domain = item.get('live_stream_host')
if not stream_domain or not uuid:
raise ExtractorError('Unable to construct playlist URL')
thumb_domain = item.get('nexus_api_nest_domain_host')
return {
'id': video_id,
**traverse_obj(item, {
'description': ('description', {str}),
'title': (('title', 'name', 'where'), {str}, filter, any),
'alt_title': ('name', {str}),
'location': ((('timezone', {lambda x: x.split('/')[1].replace('_', ' ')}), 'where'), {str}, filter, any),
}),
'thumbnail': update_url_query(
f'https://{thumb_domain}/get_image',
{'uuid': uuid, 'public': video_id}) if thumb_domain else None,
'availability': self._availability(is_private=item.get('is_public') is False),
'formats': self._extract_m3u8_formats(
f'https://{stream_domain}/nexus_aac/{uuid}/playlist.m3u8',
video_id, 'mp4', live=True, query={'public': video_id}),
'is_live': True,
}
class NestClipIE(InfoExtractor):
_VALID_URL = r'https?://video\.nest\.com/(?:embedded/)?clip/(?P<id>\w+)'
_EMBED_REGEX = [rf'<iframe [^>]*\bsrc=[\'"](?P<url>{_VALID_URL})']
_TESTS = [{
'url': 'https://video.nest.com/clip/f34c9dd237a44eca9a0001af685e3dff',
'info_dict': {
'id': 'f34c9dd237a44eca9a0001af685e3dff',
'ext': 'mp4',
'title': 'NestClip video #f34c9dd237a44eca9a0001af685e3dff',
'thumbnail': 'https://clips.dropcam.com/f34c9dd237a44eca9a0001af685e3dff.jpg',
'timestamp': 1735413474.468,
'upload_date': '20241228',
},
}, {
'url': 'https://video.nest.com/embedded/clip/34e0432adc3c46a98529443d8ad5aa76',
'info_dict': {
'id': '34e0432adc3c46a98529443d8ad5aa76',
'ext': 'mp4',
'title': 'Shootout at Veterans Boulevard at Fleur De Lis Drive',
'thumbnail': 'https://clips.dropcam.com/34e0432adc3c46a98529443d8ad5aa76.jpg',
'upload_date': '20230817',
'timestamp': 1692262897.191,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_json(
'https://video.nest.com/api/dropcam/videos.get_by_filename', video_id,
query={'filename': f'{video_id}.mp4'})
return {
'id': video_id,
**traverse_obj(data, ('items', 0, {
'title': ('title', {str}),
'thumbnail': ('thumbnail_url', {url_or_none}),
'url': ('download_url', {url_or_none}),
'timestamp': ('start_time', {float_or_none}),
})),
}
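
Review note: `NestIE` derives `location` from the camera's `timezone` field with a small lambda. The sketch below reproduces that transformation as a plain function. `location_from_timezone` is a hypothetical name, and the explicit guard for values without a `/` is an addition of this sketch rather than something taken from the diff.

```python
def location_from_timezone(timezone):
    """Hypothetical helper mirroring the lambda in NestIE: 'Area/Some_City' -> 'Some City'."""
    parts = timezone.split('/')
    if len(parts) < 2:
        # Added guard: a value such as 'UTC' has no city component
        return None
    return parts[1].replace('_', ' ')


print(location_from_timezone('America/Los_Angeles'))
print(location_from_timezone('UTC'))
```

Note that only the second path component is used, so a three-part zone such as `America/Argentina/Buenos_Aires` would yield `Argentina`, not the city.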

View file

@ -592,8 +592,8 @@ class NiconicoPlaylistBaseIE(InfoExtractor):
@staticmethod
def _parse_owner(item):
return {
'uploader': traverse_obj(item, ('owner', 'name')),
'uploader_id': traverse_obj(item, ('owner', 'id')),
'uploader': traverse_obj(item, ('owner', ('name', ('user', 'nickname')), {str}, any)),
'uploader_id': traverse_obj(item, ('owner', 'id', {str})),
}
def _fetch_page(self, list_id, page):
@ -666,7 +666,7 @@ class NiconicoPlaylistIE(NiconicoPlaylistBaseIE):
mylist.get('name'), mylist.get('description'), **self._parse_owner(mylist))
class NiconicoSeriesIE(InfoExtractor):
class NiconicoSeriesIE(NiconicoPlaylistBaseIE):
IE_NAME = 'niconico:series'
_VALID_URL = r'https?://(?:(?:www\.|sp\.)?nicovideo\.jp(?:/user/\d+)?|nico\.ms)/series/(?P<id>\d+)'
@ -675,6 +675,9 @@ class NiconicoSeriesIE(InfoExtractor):
'info_dict': {
'id': '110226',
'title': 'ご立派ァ!のシリーズ',
'description': '楽しそうな外人の吹き替えをさせたら終身名誉ホモガキの右に出る人はいませんね…',
'uploader': 'アルファるふぁ',
'uploader_id': '44113208',
},
'playlist_mincount': 10,
}, {
@ -682,6 +685,9 @@ class NiconicoSeriesIE(InfoExtractor):
'info_dict': {
'id': '12312',
'title': 'バトルスピリッツ お勧めカード紹介(調整中)',
'description': '',
'uploader': '野鳥',
'uploader_id': '2275360',
},
'playlist_mincount': 103,
}, {
@ -689,19 +695,21 @@ class NiconicoSeriesIE(InfoExtractor):
'only_matching': True,
}]
def _call_api(self, list_id, resource, query):
return self._download_json(
f'https://nvapi.nicovideo.jp/v2/series/{list_id}', list_id,
f'Downloading {resource}', query=query,
headers=self._API_HEADERS)['data']
def _real_extract(self, url):
list_id = self._match_id(url)
webpage = self._download_webpage(url, list_id)
series = self._call_api(list_id, 'list', {
'pageSize': 1,
})['detail']
title = self._search_regex(
(r'<title>「(.+)(全',
r'<div class="TwitterShareButton"\s+data-text="(.+)\s+https:'),
webpage, 'title', fatal=False)
if title:
title = unescapeHTML(title)
json_data = next(self._yield_json_ld(webpage, None, fatal=False))
return self.playlist_from_matches(
traverse_obj(json_data, ('itemListElement', ..., 'url')), list_id, title, ie=NiconicoIE)
return self.playlist_result(
self._entries(list_id), list_id,
series.get('title'), series.get('description'), **self._parse_owner(series))
class NiconicoHistoryIE(NiconicoPlaylistBaseIE):

View file

@ -12,6 +12,7 @@ from ..utils import (
parse_iso8601,
str_or_none,
try_get,
update_url_query,
url_or_none,
urljoin,
)
@ -27,6 +28,12 @@ class NRKBaseIE(InfoExtractor):
)/'''
def _extract_nrk_formats(self, asset_url, video_id):
asset_url = update_url_query(asset_url, {
# Remove 'adap' to return all streams (known values are: small, large, small_h265, large_h265)
'adap': [],
# Disable subtitles since they are fetched separately
's': 0,
})
if re.match(r'https?://[^/]+\.akamaihd\.net/i/', asset_url):
return self._extract_akamai_formats(asset_url, video_id)
asset_url = re.sub(r'(?:bw_(?:low|high)=\d+|no_audio_only)&?', '', asset_url)
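
Review note: passing `'adap': []` in this hunk is meant to drop the `adap` parameter from the asset URL, while `'s': 0` overrides the subtitle flag. The sketch below shows the same effect with the standard library only. `update_query` is a hypothetical stand-in for yt-dlp's `update_url_query`, and the asset URL is made up.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def update_query(url, updates):
    """Hypothetical stand-in for update_url_query: merge `updates` into the URL's query."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    query.update(updates)
    # With doseq=True an empty list emits no key=value pair, so that key is dropped
    return parsed._replace(query=urlencode(query, doseq=True)).geturl()


# Made-up asset URL, for illustration only
print(update_query(
    'https://example.invalid/master.m3u8?adap=small_h265&s=1',
    {'adap': [], 's': 0}))
```

The empty list is the whole trick: there is nothing to serialize for `adap`, so the parameter disappears instead of being sent with an empty value.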
@ -58,7 +65,10 @@ class NRKBaseIE(InfoExtractor):
return self._download_json(
urljoin('https://psapi.nrk.no/', path),
video_id, note or f'Downloading {item} JSON',
fatal=fatal, query=query)
fatal=fatal, query=query, headers={
# Needed for working stream URLs, see https://github.com/yt-dlp/yt-dlp/issues/12192
'Accept': 'application/vnd.nrk.psapi+json; version=9; player=tv-player; device=player-core',
})
class NRKIE(NRKBaseIE):
@ -77,13 +87,17 @@ class NRKIE(NRKBaseIE):
_TESTS = [{
# video
'url': 'http://www.nrk.no/video/PS*150533',
'md5': 'f46be075326e23ad0e524edfcb06aeb6',
'md5': '2b88a652ad2e275591e61cf550887eec',
'info_dict': {
'id': '150533',
'ext': 'mp4',
'title': 'Dompap og andre fugler i Piip-Show',
'description': 'md5:d9261ba34c43b61c812cb6b0269a5c8f',
'duration': 262,
'upload_date': '20140325',
'thumbnail': r're:^https?://gfx\.nrk\.no/.*$',
'timestamp': 1395751833,
'alt_title': 'md5:d9261ba34c43b61c812cb6b0269a5c8f',
},
}, {
# audio
@ -95,6 +109,10 @@ class NRKIE(NRKBaseIE):
'title': 'Slik høres internett ut når du er blind',
'description': 'md5:a621f5cc1bd75c8d5104cb048c6b8568',
'duration': 20,
'timestamp': 1398429565,
'alt_title': 'Cathrine Lie Wathne er blind, og bruker hurtigtaster for å navigere seg rundt på ulike nettsider.',
'thumbnail': 'https://gfx.nrk.no/urxQMSXF-WnbfjBH5ke2igLGyN27EdJVWZ6FOsEAclhA',
'upload_date': '20140425',
},
}, {
'url': 'nrk:ecc1b952-96dc-4a98-81b9-5296dc7a98d9',
@ -152,7 +170,7 @@ class NRKIE(NRKBaseIE):
return self._call_api(f'playback/{item}/{video_id}', video_id, item, query=query)
raise
# known values for preferredCdn: akamai, iponly, minicdn and telenor
# known values for preferredCdn: akamai, globalconnect and telenor
manifest = call_playback_api('manifest', {'preferredCdn': 'akamai'})
video_id = try_get(manifest, lambda x: x['id'], str) or video_id
@ -307,6 +325,13 @@ class NRKTVIE(InfoExtractor):
'ext': 'vtt',
}],
},
'upload_date': '20170627',
'timestamp': 1498591822,
'thumbnail': 'https://gfx.nrk.no/myRSc4vuFlahB60P3n6swwRTQUZI1LqJZl9B7icZFgzA',
'alt_title': 'md5:46923a6e6510eefcce23d5ef2a58f2ce',
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014',
@ -321,6 +346,13 @@ class NRKTVIE(InfoExtractor):
'series': '20 spørsmål',
'episode': '23. mai 2014',
'age_limit': 0,
'timestamp': 1584593700,
'thumbnail': 'https://gfx.nrk.no/u7uCe79SEfPVGRAGVp2_uAZnNc4mfz_kjXg6Bgek8lMQ',
'season_id': '126936',
'upload_date': '20200319',
'season': 'Season 2014',
'season_number': 2014,
'episode_number': 3,
},
}, {
'url': 'https://tv.nrk.no/program/mdfp15000514',

View file

@ -343,7 +343,7 @@ class NYTimesCookingIE(NYTimesBaseIE):
if media_ids:
media_ids.append(lead_video_id)
return self.playlist_result(
[self._extract_video(media_id) for media_id in media_ids], page_id, title, description)
map(self._extract_video, media_ids), page_id, title, description)
return {
**self._extract_video(lead_video_id),

View file

@ -63,6 +63,7 @@ class PatreonIE(PatreonBaseIE):
'info_dict': {
'id': '743933',
'ext': 'mp3',
'alt_title': 'cd166.mp3',
'title': 'Episode 166: David Smalley of Dogma Debate',
'description': 'md5:34d207dd29aa90e24f1b3f58841b81c7',
'uploader': 'Cognitive Dissonance Podcast',
@ -280,7 +281,7 @@ class PatreonIE(PatreonBaseIE):
video_id = self._match_id(url)
post = self._call_api(
f'posts/{video_id}', video_id, query={
'fields[media]': 'download_url,mimetype,size_bytes',
'fields[media]': 'download_url,mimetype,size_bytes,file_name',
'fields[post]': 'comment_count,content,embed,image,like_count,post_file,published_at,title,current_user_can_view',
'fields[user]': 'full_name,url',
'fields[post_tag]': 'value',
@ -317,6 +318,7 @@ class PatreonIE(PatreonBaseIE):
'ext': ext,
'filesize': size_bytes,
'url': download_url,
'alt_title': traverse_obj(media_attributes, ('file_name', {str})),
})
elif include_type == 'user':
@ -457,7 +459,7 @@ class PatreonCampaignIE(PatreonBaseIE):
_VALID_URL = r'''(?x)
https?://(?:www\.)?patreon\.com/(?:
(?:m|api/campaigns)/(?P<campaign_id>\d+)|
(?P<vanity>(?!creation[?/]|posts/|rss[?/])[\w-]+)
(?:c/)?(?P<vanity>(?!creation[?/]|posts/|rss[?/])[\w-]+)
)(?:/posts)?/?(?:$|[?#])'''
_TESTS = [{
'url': 'https://www.patreon.com/dissonancepod/',
@ -509,6 +511,26 @@ class PatreonCampaignIE(PatreonBaseIE):
'thumbnail': r're:^https?://.*$',
},
'playlist_mincount': 201,
}, {
'url': 'https://www.patreon.com/c/OgSog',
'info_dict': {
'id': '8504388',
'title': 'OGSoG',
'description': r're:(?s)Hello and welcome to our Patreon page. We are Mari, Lasercorn, .+',
'channel': 'OGSoG',
'channel_id': '8504388',
'channel_url': 'https://www.patreon.com/OgSog',
'uploader_url': 'https://www.patreon.com/OgSog',
'uploader_id': '72323575',
'uploader': 'David Moss',
'thumbnail': r're:https?://.+/.+',
'channel_follower_count': int,
'age_limit': 0,
},
'playlist_mincount': 331,
}, {
'url': 'https://www.patreon.com/c/OgSog/posts',
'only_matching': True,
}, {
'url': 'https://www.patreon.com/dissonancepod/posts',
'only_matching': True,
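
Review note: the optional `(?:c/)?` prefix added to `PatreonCampaignIE._VALID_URL` can be exercised on its own. The pattern below is copied from this hunk; the `campaign_match` helper is a hypothetical name used only for this sketch.

```python
import re

# Pattern copied from the updated PatreonCampaignIE._VALID_URL in this diff
VALID_URL = r'''(?x)
    https?://(?:www\.)?patreon\.com/(?:
        (?:m|api/campaigns)/(?P<campaign_id>\d+)|
        (?:c/)?(?P<vanity>(?!creation[?/]|posts/|rss[?/])[\w-]+)
    )(?:/posts)?/?(?:$|[?#])'''


def campaign_match(url):
    """Hypothetical helper: return (campaign_id, vanity), or None if the URL is not matched."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('campaign_id', 'vanity') if mobj else None


print(campaign_match('https://www.patreon.com/c/OgSog'))
print(campaign_match('https://www.patreon.com/c/OgSog/posts'))
print(campaign_match('https://www.patreon.com/dissonancepod/posts'))
# Single-post URLs are excluded by the negative lookahead on 'posts/'
print(campaign_match('https://www.patreon.com/posts/743933'))
```

Both the old vanity form and the new `/c/` form resolve to the same `vanity` group, so the rest of the extractor does not need to know which one was used.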

View file

@ -47,7 +47,7 @@ class PBSIE(InfoExtractor):
(r'video\.kpbs\.org', 'KPBS San Diego (KPBS)'), # http://www.kpbs.org/
(r'video\.kqed\.org', 'KQED (KQED)'), # http://www.kqed.org
(r'vids\.kvie\.org', 'KVIE Public Television (KVIE)'), # http://www.kvie.org
(r'video\.pbssocal\.org', 'PBS SoCal/KOCE (KOCE)'), # http://www.pbssocal.org/
(r'(?:video\.|www\.)pbssocal\.org', 'PBS SoCal/KOCE (KOCE)'), # http://www.pbssocal.org/
(r'video\.valleypbs\.org', 'ValleyPBS (KVPT)'), # http://www.valleypbs.org/
(r'video\.cptv\.org', 'CONNECTICUT PUBLIC TELEVISION (WEDH)'), # http://cptv.org
(r'watch\.knpb\.org', 'KNPB Channel 5 (KNPB)'), # http://www.knpb.org/
@ -185,12 +185,13 @@ class PBSIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://
(?:
# Direct video URL
(?:{})/(?:(?:vir|port)alplayer|video)/(?P<id>[0-9]+)(?:[?/]|$) |
# Article with embedded player (or direct video)
(?:www\.)?pbs\.org/(?:[^/]+/){{1,5}}(?P<presumptive_id>[^/]+?)(?:\.html)?/?(?:$|[?\#]) |
# Player
(?:video|player)\.pbs\.org/(?:widget/)?partnerplayer/(?P<player_id>[^/]+)
# Player
(?:video|player)\.pbs\.org/(?:widget/)?partnerplayer/(?P<player_id>[^/?#]+) |
# Direct video URL, or article with embedded player
(?:{})/(?:
(?:(?:vir|port)alplayer|video)/(?P<id>[0-9]+)(?:[?/#]|$) |
(?:[^/?#]+/){{1,5}}(?P<presumptive_id>[^/?#]+?)(?:\.html)?/?(?:$|[?#])
)
)
'''.format('|'.join(next(zip(*_STATIONS))))
@ -403,6 +404,19 @@ class PBSIE(InfoExtractor):
},
'expected_warnings': ['HTTP Error 403: Forbidden'],
},
{
'url': 'https://www.pbssocal.org/shows/newshour/clip/capehart-johnson-1715984001',
'info_dict': {
'id': '3091549094',
'ext': 'mp4',
'title': 'PBS NewsHour - Capehart and Johnson on the unusual Biden-Trump debate plans',
'description': 'Capehart and Johnson on how the Biden-Trump debates could shape the campaign season',
'display_id': 'capehart-johnson-1715984001',
'duration': 593,
'thumbnail': 'https://image.pbs.org/video-assets/mF3oSVn-asset-mezzanine-16x9-QeXjXPy.jpg',
'chapters': [],
},
},
{
'url': 'http://player.pbs.org/widget/partnerplayer/2365297708/?start=0&end=0&chapterbar=false&endscreen=false&topbar=true',
'only_matching': True,
@ -467,6 +481,7 @@ class PBSIE(InfoExtractor):
r"(?s)window\.PBS\.playerConfig\s*=\s*{.*?id\s*:\s*'([0-9]+)',",
r'<div[^>]+\bdata-cove-id=["\'](\d+)"', # http://www.pbs.org/wgbh/roadshow/watch/episode/2105-indianapolis-hour-2/
r'<iframe[^>]+\bsrc=["\'](?:https?:)?//video\.pbs\.org/widget/partnerplayer/(\d+)', # https://www.pbs.org/wgbh/masterpiece/episodes/victoria-s2-e1/
r'\bhttps?://player\.pbs\.org/[\w-]+player/(\d+)', # last pattern to avoid false positives
]
media_id = self._search_regex(

View file

@ -0,0 +1,99 @@
from .common import InfoExtractor
from ..utils import parse_iso8601, smuggle_url, unsmuggle_url, url_or_none
from ..utils.traversal import traverse_obj
class PiramideTVIE(InfoExtractor):
_VALID_URL = r'https?://piramide\.tv/video/(?P<id>[\w-]+)'
_TESTS = [{
'url': 'https://piramide.tv/video/wWtBAORdJUTh',
'info_dict': {
'id': 'wWtBAORdJUTh',
'ext': 'mp4',
'title': 'md5:79f9c8183ea6a35c836923142cf0abcc',
'description': '',
'thumbnail': 'https://cdn.jwplayer.com/v2/media/W86PgQDn/thumbnails/B9gpIxkH.jpg',
'channel': 'León Picarón',
'channel_id': 'leonpicaron',
'timestamp': 1696460362,
'upload_date': '20231004',
},
}, {
'url': 'https://piramide.tv/video/wcYn6li79NgN',
'info_dict': {
'id': 'wcYn6li79NgN',
'ext': 'mp4',
'title': 'ACEPTO TENER UN BEBE CON MI NOVIA\u2026? | Parte 1',
'description': '',
'channel': 'ARTA GAME',
'channel_id': 'arta_game',
'thumbnail': 'https://cdn.jwplayer.com/v2/media/cnEdGp5X/thumbnails/rHAaWfP7.jpg',
'timestamp': 1703434976,
'upload_date': '20231224',
},
}]
def _extract_video(self, video_id):
video_data = self._download_json(
f'https://hermes.piramide.tv/video/data/{video_id}', video_id, fatal=False)
formats, subtitles = self._extract_m3u8_formats_and_subtitles(
f'https://cdn.piramide.tv/video/{video_id}/manifest.m3u8', video_id, fatal=False)
next_video = traverse_obj(video_data, ('video', 'next_video', 'id', {str}))
return next_video, {
'id': video_id,
'formats': formats,
'subtitles': subtitles,
**traverse_obj(video_data, ('video', {
'id': ('id', {str}),
'title': ('title', {str}),
'description': ('description', {str}),
'thumbnail': ('media', 'thumbnail', {url_or_none}),
'channel': ('channel', 'name', {str}),
'channel_id': ('channel', 'id', {str}),
'timestamp': ('date', {parse_iso8601}),
})),
}
def _entries(self, video_id):
visited = set()
while True:
visited.add(video_id)
next_video, info = self._extract_video(video_id)
yield info
if not next_video or next_video in visited:
break
video_id = next_video
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
video_id = self._match_id(url)
if self._yes_playlist(video_id, video_id, smuggled_data):
return self.playlist_result(self._entries(video_id), video_id)
return self._extract_video(video_id)[1]
class PiramideTVChannelIE(InfoExtractor):
_VALID_URL = r'https?://piramide\.tv/channel/(?P<id>[\w-]+)'
_TESTS = [{
'url': 'https://piramide.tv/channel/thekalo',
'playlist_mincount': 10,
'info_dict': {
'id': 'thekalo',
},
}]
def _entries(self, channel_name):
videos = self._download_json(
f'https://hermes.piramide.tv/channel/list/{channel_name}/date/100000', channel_name)
for video in traverse_obj(videos, ('videos', lambda _, v: v['id'])):
yield self.url_result(smuggle_url(
f'https://piramide.tv/video/{video["id"]}', {'force_noplaylist': True}),
**traverse_obj(video, {
'id': ('id', {str}),
'title': ('title', {str}),
'description': ('description', {str}),
}))
def _real_extract(self, url):
channel_name = self._match_id(url)
return self.playlist_result(self._entries(channel_name), channel_name)
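
Review note: `PiramideTVIE._entries` follows each video's `next_video` link and uses a `visited` set so that a chain looping back on itself still terminates. The sketch below isolates that pattern with an in-memory mapping in place of the API calls. `walk_chain` and the sample data are hypothetical.

```python
def walk_chain(start_id, next_of):
    """Yield IDs by following next_of[id]; stop at the end of the chain or on a repeat."""
    visited = set()
    current = start_id
    while True:
        visited.add(current)
        yield current
        following = next_of.get(current)
        if not following or following in visited:
            break
        current = following


# Made-up chain that loops back to its start
print(list(walk_chain('a', {'a': 'b', 'b': 'c', 'c': 'a'})))
# Made-up chain that simply ends
print(list(walk_chain('x', {'x': 'y'})))
```

Because the function is a generator, entries are produced one at a time, which matches how the extractor hands `_entries` to `playlist_result` without building the full list first.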

@ -1,4 +1,5 @@
from .common import InfoExtractor
from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
traverse_obj,
@ -110,8 +111,8 @@ class PixivSketchUserIE(PixivSketchBaseIE):
if not traverse_obj(data, 'is_broadcasting'):
try:
self._call_api(user_id, 'users/current.json', url, 'Investigating reason for request failure')
except ExtractorError as ex:
if ex.cause and ex.cause.code == 401:
except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status == 401:
self.raise_login_required(f'Please log in, or use direct link like https://sketch.pixiv.net/@{user_id}/1234567890', method='cookies')
raise ExtractorError('This user is offline', expected=True)

130
yt_dlp/extractor/plvideo.py Normal file

@ -0,0 +1,130 @@
from .common import InfoExtractor
from ..utils import (
    float_or_none,
    int_or_none,
    parse_iso8601,
    parse_resolution,
    url_or_none,
)
from ..utils.traversal import traverse_obj


class PlVideoIE(InfoExtractor):
    IE_DESC = 'Платформа'
    _VALID_URL = r'https?://(?:www\.)?plvideo\.ru/(?:watch\?(?:[^#]+&)?v=|shorts/)(?P<id>[\w-]+)'
    _TESTS = [{
        'url': 'https://plvideo.ru/watch?v=Y5JzUzkcQTMK',
        'md5': 'fe8e18aca892b3b31f3bf492169f8a26',
        'info_dict': {
            'id': 'Y5JzUzkcQTMK',
            'ext': 'mp4',
            'thumbnail': 'https://img.plvideo.ru/images/fp-2024-images/v/cover/37/dd/37dd00a4c96c77436ab737e85947abd7/original663a4a3bb713e5.33151959.jpg',
            'title': 'Presidente de Cuba llega a Moscú en una visita de trabajo',
            'channel': 'RT en Español',
            'channel_id': 'ZH4EKqunVDvo',
            'media_type': 'video',
            'comment_count': int,
            'tags': ['rusia', 'cuba', 'russia', 'miguel díaz-canel'],
            'description': 'md5:a1a395d900d77a86542a91ee0826c115',
            'release_timestamp': 1715096124,
            'channel_is_verified': True,
            'like_count': int,
            'timestamp': 1715095911,
            'duration': 44320,
            'view_count': int,
            'dislike_count': int,
            'upload_date': '20240507',
            'modified_date': '20240701',
            'channel_follower_count': int,
            'modified_timestamp': 1719824073,
        },
    }, {
        'url': 'https://plvideo.ru/shorts/S3Uo9c-VLwFX',
        'md5': '7d8fa2279406c69d2fd2a6fc548a9805',
        'info_dict': {
            'id': 'S3Uo9c-VLwFX',
            'ext': 'mp4',
            'channel': 'Romaatom',
            'tags': 'count:22',
            'dislike_count': int,
            'upload_date': '20241130',
            'description': 'md5:452e6de219bf2f32bb95806c51c3b364',
            'duration': 58433,
            'modified_date': '20241130',
            'thumbnail': 'https://img.plvideo.ru/images/fp-2024-11-cover/S3Uo9c-VLwFX/f9318999-a941-482b-b700-2102a7049366.jpg',
            'media_type': 'shorts',
            'like_count': int,
            'modified_timestamp': 1732961458,
            'channel_is_verified': True,
            'channel_id': 'erJyyTIbmUd1',
            'timestamp': 1732961355,
            'comment_count': int,
            'title': 'Белоусов отменил приказы о кадровом резерве на гражданской службе',
            'channel_follower_count': int,
            'view_count': int,
            'release_timestamp': 1732961458,
        },
    }]

    def _real_extract(self, url):
        video_id = self._match_id(url)

        video_data = self._download_json(
            f'https://api.g1.plvideo.ru/v1/videos/{video_id}?Aud=18', video_id)

        is_live = False
        formats = []
        subtitles = {}
        automatic_captions = {}
        for quality, data in traverse_obj(video_data, ('item', 'profiles', {dict.items}, lambda _, v: url_or_none(v[1]['hls']))):
            formats.append({
                'format_id': quality,
                'ext': 'mp4',
                'protocol': 'm3u8_native',
                **traverse_obj(data, {
                    'url': 'hls',
                    'fps': ('fps', {float_or_none}),
                    'aspect_ratio': ('aspectRatio', {float_or_none}),
                }),
                **parse_resolution(quality),
            })
        if livestream_url := traverse_obj(video_data, ('item', 'livestream', 'url', {url_or_none})):
            is_live = True
            formats.extend(self._extract_m3u8_formats(livestream_url, video_id, 'mp4', live=True))
        for lang, url in traverse_obj(video_data, ('item', 'subtitles', {dict.items}, lambda _, v: url_or_none(v[1]))):
            if lang.endswith('-auto'):
                automatic_captions.setdefault(lang[:-5], []).append({
                    'url': url,
                })
            else:
                subtitles.setdefault(lang, []).append({
                    'url': url,
                })

        return {
            'id': video_id,
            'formats': formats,
            'subtitles': subtitles,
            'automatic_captions': automatic_captions,
            'is_live': is_live,
            **traverse_obj(video_data, ('item', {
                'id': ('id', {str}),
                'title': ('title', {str}),
                'description': ('description', {str}),
                'thumbnail': ('cover', 'paths', 'original', 'src', {url_or_none}),
                'duration': ('uploadFile', 'videoDuration', {int_or_none}),
                'channel': ('channel', 'name', {str}),
                'channel_id': ('channel', 'id', {str}),
                'channel_follower_count': ('channel', 'stats', 'subscribers', {int_or_none}),
                'channel_is_verified': ('channel', 'verified', {bool}),
                'tags': ('tags', ..., {str}),
                'timestamp': ('createdAt', {parse_iso8601}),
                'release_timestamp': ('publishedAt', {parse_iso8601}),
                'modified_timestamp': ('updatedAt', {parse_iso8601}),
                'view_count': ('stats', 'viewTotalCount', {int_or_none}),
                'like_count': ('stats', 'likeCount', {int_or_none}),
                'dislike_count': ('stats', 'dislikeCount', {int_or_none}),
                'comment_count': ('stats', 'commentCount', {int_or_none}),
                'media_type': ('type', {str}),
            })),
        }
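
The subtitle loop in `PlVideoIE._real_extract` routes language keys ending in `-auto` to `automatic_captions` (with the suffix removed) and everything else to `subtitles`. A small standalone sketch of that split, assuming the input is a plain `{lang: url}` dict; `split_subtitles` is a hypothetical helper name and the sample URLs are made up:

```python
AUTO_SUFFIX = '-auto'


def split_subtitles(subs):
    """Partition a {lang: url} mapping into manual and automatic caption dicts."""
    subtitles, automatic_captions = {}, {}
    for lang, url in subs.items():
        if lang.endswith(AUTO_SUFFIX):
            # Strip the suffix so 'en-auto' is filed under 'en'.
            key = lang[:-len(AUTO_SUFFIX)]
            automatic_captions.setdefault(key, []).append({'url': url})
        else:
            subtitles.setdefault(lang, []).append({'url': url})
    return subtitles, automatic_captions


manual, auto = split_subtitles({
    'ru': 'https://example.com/ru.vtt',
    'en-auto': 'https://example.com/en.vtt',
})
print(manual)
print(auto)
```

Using `len(AUTO_SUFFIX)` instead of the literal `5` keeps the slice correct if the suffix ever changes.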

@ -114,7 +114,7 @@ class RedGifsBaseInfoExtractor(InfoExtractor):
class RedGifsIE(RedGifsBaseInfoExtractor):
_VALID_URL = r'https?://(?:(?:www\.)?redgifs\.com/watch/|thumbs2\.redgifs\.com/)(?P<id>[^-/?#\.]+)'
_VALID_URL = r'https?://(?:(?:www\.)?redgifs\.com/(?:watch|ifr)/|thumbs2\.redgifs\.com/)(?P<id>[^-/?#\.]+)'
_TESTS = [{
'url': 'https://www.redgifs.com/watch/squeakyhelplesswisent',
'info_dict': {
@ -147,6 +147,22 @@ class RedGifsIE(RedGifsBaseInfoExtractor):
'age_limit': 18,
'tags': list,
},
}, {
'url': 'https://www.redgifs.com/ifr/squeakyhelplesswisent',
'info_dict': {
'id': 'squeakyhelplesswisent',
'ext': 'mp4',
'title': 'Hotwife Legs Thick',
'timestamp': 1636287915,
'upload_date': '20211107',
'uploader': 'ignored52',
'duration': 16,
'view_count': int,
'like_count': int,
'categories': list,
'age_limit': 18,
'tags': list,
},
}]
def _real_extract(self, url):

@ -176,6 +176,8 @@ class RTVSLOShowIE(InfoExtractor):
'info_dict': {
'id': '173250997',
'title': 'Ekipa Bled',
'description': 'md5:c88471e27a1268c448747a5325319ab7',
'thumbnail': 'https://img.rtvcdn.si/_up/ava/ava_misc/show_logos/173250997/logo_wide1.jpg',
},
'playlist_count': 18,
}]
@ -187,4 +189,7 @@ class RTVSLOShowIE(InfoExtractor):
return self.playlist_from_matches(
re.findall(r'<a [^>]*\bhref="(/arhiv/[^"]+)"', webpage),
playlist_id, self._html_extract_title(webpage),
getter=urljoin('https://365.rtvslo.si'), ie=RTVSLOIE)
getter=urljoin('https://365.rtvslo.si'), ie=RTVSLOIE,
description=self._og_search_description(webpage),
thumbnail=self._og_search_thumbnail(webpage),
)

@ -4,43 +4,12 @@ import urllib.parse
from .common import InfoExtractor
from ..utils import (
ExtractorError,
parse_qs,
unsmuggle_url,
UnsupportedError,
make_archive_id,
remove_end,
url_or_none,
)
_COMMITTEES = {
'ag': ('76440', 'http://ag-f.akamaihd.net'),
'aging': ('76442', 'http://aging-f.akamaihd.net'),
'approps': ('76441', 'http://approps-f.akamaihd.net'),
'arch': ('', 'http://ussenate-f.akamaihd.net'),
'armed': ('76445', 'http://armed-f.akamaihd.net'),
'banking': ('76446', 'http://banking-f.akamaihd.net'),
'budget': ('76447', 'http://budget-f.akamaihd.net'),
'cecc': ('76486', 'http://srs-f.akamaihd.net'),
'commerce': ('80177', 'http://commerce1-f.akamaihd.net'),
'csce': ('75229', 'http://srs-f.akamaihd.net'),
'dpc': ('76590', 'http://dpc-f.akamaihd.net'),
'energy': ('76448', 'http://energy-f.akamaihd.net'),
'epw': ('76478', 'http://epw-f.akamaihd.net'),
'ethics': ('76449', 'http://ethics-f.akamaihd.net'),
'finance': ('76450', 'http://finance-f.akamaihd.net'),
'foreign': ('76451', 'http://foreign-f.akamaihd.net'),
'govtaff': ('76453', 'http://govtaff-f.akamaihd.net'),
'help': ('76452', 'http://help-f.akamaihd.net'),
'indian': ('76455', 'http://indian-f.akamaihd.net'),
'intel': ('76456', 'http://intel-f.akamaihd.net'),
'intlnarc': ('76457', 'http://intlnarc-f.akamaihd.net'),
'jccic': ('85180', 'http://jccic-f.akamaihd.net'),
'jec': ('76458', 'http://jec-f.akamaihd.net'),
'judiciary': ('76459', 'http://judiciary-f.akamaihd.net'),
'rpc': ('76591', 'http://rpc-f.akamaihd.net'),
'rules': ('76460', 'http://rules-f.akamaihd.net'),
'saa': ('76489', 'http://srs-f.akamaihd.net'),
'smbiz': ('76461', 'http://smbiz-f.akamaihd.net'),
'srs': ('75229', 'http://srs-f.akamaihd.net'),
'uscc': ('76487', 'http://srs-f.akamaihd.net'),
'vetaff': ('76462', 'http://vetaff-f.akamaihd.net'),
}
from ..utils.traversal import traverse_obj
class SenateISVPIE(InfoExtractor):
@ -53,31 +22,46 @@ class SenateISVPIE(InfoExtractor):
'info_dict': {
'id': 'judiciary031715',
'ext': 'mp4',
'title': 'Integrated Senate Video Player',
'title': 'ISVP',
'thumbnail': r're:^https?://.*\.(?:jpg|png)$',
'_old_archive_ids': ['senategov judiciary031715'],
},
'params': {
# m3u8 download
'skip_download': True,
},
'expected_warnings': ['Failed to download m3u8 information'],
}, {
'url': 'http://www.senate.gov/isvp/?type=live&comm=commerce&filename=commerce011514.mp4&auto_play=false',
'info_dict': {
'id': 'commerce011514',
'ext': 'mp4',
'title': 'Integrated Senate Video Player',
'_old_archive_ids': ['senategov commerce011514'],
},
'params': {
# m3u8 download
'skip_download': True,
},
'skip': 'This video is not available.',
}, {
'url': 'http://www.senate.gov/isvp/?type=arch&comm=intel&filename=intel090613&hc_location=ufi',
# checksum differs each time
'info_dict': {
'id': 'intel090613',
'ext': 'mp4',
'title': 'Integrated Senate Video Player',
'title': 'ISVP',
'_old_archive_ids': ['senategov intel090613'],
},
'expected_warnings': ['Failed to download m3u8 information'],
}, {
'url': 'https://www.senate.gov/isvp/?auto_play=false&comm=help&filename=help090920&poster=https://www.help.senate.gov/assets/images/video-poster.png&stt=950',
'info_dict': {
'id': 'help090920',
'ext': 'mp4',
'title': 'ISVP',
'thumbnail': 'https://www.help.senate.gov/assets/images/video-poster.png',
'_old_archive_ids': ['senategov help090920'],
},
}, {
# From http://www.c-span.org/video/?96791-1
@ -85,60 +69,81 @@ class SenateISVPIE(InfoExtractor):
'only_matching': True,
}]
_COMMITTEES = {
'ag': ('76440', 'https://ag-f.akamaihd.net', '2036803', 'agriculture'),
'aging': ('76442', 'https://aging-f.akamaihd.net', '2036801', 'aging'),
'approps': ('76441', 'https://approps-f.akamaihd.net', '2036802', 'appropriations'),
'arch': ('', 'https://ussenate-f.akamaihd.net', '', 'arch'),
'armed': ('76445', 'https://armed-f.akamaihd.net', '2036800', 'armedservices'),
'banking': ('76446', 'https://banking-f.akamaihd.net', '2036799', 'banking'),
'budget': ('76447', 'https://budget-f.akamaihd.net', '2036798', 'budget'),
'cecc': ('76486', 'https://srs-f.akamaihd.net', '2036782', 'srs_cecc'),
'commerce': ('80177', 'https://commerce1-f.akamaihd.net', '2036779', 'commerce'),
'csce': ('75229', 'https://srs-f.akamaihd.net', '2036777', 'srs_srs'),
'dpc': ('76590', 'https://dpc-f.akamaihd.net', '', 'dpc'),
'energy': ('76448', 'https://energy-f.akamaihd.net', '2036797', 'energy'),
'epw': ('76478', 'https://epw-f.akamaihd.net', '2036783', 'environment'),
'ethics': ('76449', 'https://ethics-f.akamaihd.net', '2036796', 'ethics'),
'finance': ('76450', 'https://finance-f.akamaihd.net', '2036795', 'finance_finance'),
'foreign': ('76451', 'https://foreign-f.akamaihd.net', '2036794', 'foreignrelations'),
'govtaff': ('76453', 'https://govtaff-f.akamaihd.net', '2036792', 'hsgac'),
'help': ('76452', 'https://help-f.akamaihd.net', '2036793', 'help'),
'indian': ('76455', 'https://indian-f.akamaihd.net', '2036791', 'indianaffairs'),
'intel': ('76456', 'https://intel-f.akamaihd.net', '2036790', 'intelligence'),
'intlnarc': ('76457', 'https://intlnarc-f.akamaihd.net', '', 'internationalnarcoticscaucus'),
'jccic': ('85180', 'https://jccic-f.akamaihd.net', '2036778', 'jccic'),
'jec': ('76458', 'https://jec-f.akamaihd.net', '2036789', 'jointeconomic'),
'judiciary': ('76459', 'https://judiciary-f.akamaihd.net', '2036788', 'judiciary'),
'rpc': ('76591', 'https://rpc-f.akamaihd.net', '', 'rpc'),
'rules': ('76460', 'https://rules-f.akamaihd.net', '2036787', 'rules'),
'saa': ('76489', 'https://srs-f.akamaihd.net', '2036780', 'srs_saa'),
'smbiz': ('76461', 'https://smbiz-f.akamaihd.net', '2036786', 'smallbusiness'),
'srs': ('75229', 'https://srs-f.akamaihd.net', '2031966', 'srs_srs'),
'uscc': ('76487', 'https://srs-f.akamaihd.net', '2036781', 'srs_uscc'),
'vetaff': ('76462', 'https://vetaff-f.akamaihd.net', '2036785', 'veteransaffairs'),
}
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
qs = urllib.parse.parse_qs(self._match_valid_url(url).group('qs'))
if not qs.get('filename') or not qs.get('type') or not qs.get('comm'):
if not qs.get('filename') or not qs.get('comm'):
raise ExtractorError('Invalid URL', expected=True)
video_id = re.sub(r'.mp4$', '', qs['filename'][0])
filename = qs['filename'][0]
video_id = remove_end(filename, '.mp4')
webpage = self._download_webpage(url, video_id)
committee = qs['comm'][0]
if smuggled_data.get('force_title'):
title = smuggled_data['force_title']
else:
title = self._html_extract_title(webpage)
poster = qs.get('poster')
thumbnail = poster[0] if poster else None
video_type = qs['type'][0]
committee = video_type if video_type == 'arch' else qs['comm'][0]
stream_num, domain = _COMMITTEES[committee]
stream_num, stream_domain, stream_id, msl3 = self._COMMITTEES[committee]
urls_alternatives = [f'https://www-senate-gov-media-srs.akamaized.net/hls/live/{stream_id}/{committee}/{filename}/master.m3u8',
f'https://www-senate-gov-msl3archive.akamaized.net/{msl3}/{filename}_1/master.m3u8',
f'{stream_domain}/i/{filename}_1@{stream_num}/master.m3u8',
f'{stream_domain}/i/{filename}.mp4/master.m3u8']
formats = []
if video_type == 'arch':
filename = video_id if '.' in video_id else video_id + '.mp4'
m3u8_url = urllib.parse.urljoin(domain, 'i/' + filename + '/master.m3u8')
formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4', m3u8_id='m3u8')
else:
hdcore_sign = 'hdcore=3.1.0'
url_params = (domain, video_id, stream_num)
f4m_url = f'%s/z/%s_1@%s/manifest.f4m?{hdcore_sign}' % url_params
m3u8_url = '{}/i/{}_1@{}/master.m3u8'.format(*url_params)
for entry in self._extract_f4m_formats(f4m_url, video_id, f4m_id='f4m'):
# URLs without the extra param induce an 404 error
entry.update({'extra_param_to_segment_url': hdcore_sign})
formats.append(entry)
for entry in self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4', m3u8_id='m3u8'):
mobj = re.search(r'(?P<tag>(?:-p|-b)).m3u8', entry['url'])
if mobj:
entry['format_id'] += mobj.group('tag')
formats.append(entry)
subtitles = {}
for video_url in urls_alternatives:
formats, subtitles = self._extract_m3u8_formats_and_subtitles(video_url, video_id, ext='mp4', fatal=False)
if formats:
break
return {
'id': video_id,
'title': title,
'title': self._html_extract_title(webpage),
'formats': formats,
'thumbnail': thumbnail,
'subtitles': subtitles,
'thumbnail': traverse_obj(qs, ('poster', 0, {url_or_none})),
'_old_archive_ids': [make_archive_id(SenateGovIE, video_id)],
}
class SenateGovIE(InfoExtractor):
_IE_NAME = 'senate.gov'
_VALID_URL = r'https?:\/\/(?:www\.)?(help|appropriations|judiciary|banking|armed-services|finance)\.senate\.gov'
_SUBDOMAIN_RE = '|'.join(map(re.escape, (
'agriculture', 'aging', 'appropriations', 'armed-services', 'banking',
'budget', 'commerce', 'energy', 'epw', 'finance', 'foreign', 'help',
'intelligence', 'inaugural', 'judiciary', 'rules', 'sbc', 'veterans',
)))
_VALID_URL = rf'https?://(?:www\.)?(?:{_SUBDOMAIN_RE})\.senate\.gov'
_TESTS = [{
'url': 'https://www.help.senate.gov/hearings/vaccines-saving-lives-ensuring-confidence-and-protecting-public-health',
'info_dict': {
@ -147,6 +152,9 @@ class SenateGovIE(InfoExtractor):
'title': 'Vaccines: Saving Lives, Ensuring Confidence, and Protecting Public Health',
'description': 'The U.S. Senate Committee on Health, Education, Labor & Pensions',
'ext': 'mp4',
'age_limit': 0,
'thumbnail': 'https://www.help.senate.gov/assets/images/sharelogo.jpg',
'_old_archive_ids': ['senategov help090920'],
},
'params': {'skip_download': 'm3u8'},
}, {
@ -156,8 +164,12 @@ class SenateGovIE(InfoExtractor):
'display_id': 'watch?hearingid=B8A25434-5056-A066-6020-1F68CB75F0CD',
'title': 'Review of the FY2019 Budget Request for the U.S. Army',
'ext': 'mp4',
'age_limit': 0,
'thumbnail': 'https://www.appropriations.senate.gov/themes/appropriations/images/video-poster-flash-fit.png',
'_old_archive_ids': ['senategov appropsA051518'],
},
'params': {'skip_download': 'm3u8'},
'expected_warnings': ['Failed to download m3u8 information'],
}, {
'url': 'https://www.banking.senate.gov/hearings/21st-century-communities-public-transportation-infrastructure-investment-and-fast-act-reauthorization',
'info_dict': {
@ -166,32 +178,65 @@ class SenateGovIE(InfoExtractor):
'title': '21st Century Communities: Public Transportation Infrastructure Investment and FAST Act Reauthorization',
'description': 'The Official website of The United States Committee on Banking, Housing, and Urban Affairs',
'ext': 'mp4',
'thumbnail': 'https://www.banking.senate.gov/themes/banking/images/sharelogo.jpg',
'age_limit': 0,
'_old_archive_ids': ['senategov banking041521'],
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.agriculture.senate.gov/hearings/hemp-production-and-the-2018-farm-bill',
'only_matching': True,
}, {
'url': 'https://www.aging.senate.gov/hearings/the-older-americans-act-the-local-impact-of-the-law-and-the-upcoming-reauthorization',
'only_matching': True,
}, {
'url': 'https://www.budget.senate.gov/hearings/improving-care-lowering-costs-achieving-health-care-efficiency',
'only_matching': True,
}, {
'url': 'https://www.commerce.senate.gov/2024/12/communications-networks-safety-and-security',
'only_matching': True,
}, {
'url': 'https://www.energy.senate.gov/hearings/2024/2/full-committee-hearing-to-examine',
'only_matching': True,
}, {
'url': 'https://www.epw.senate.gov/public/index.cfm/hearings?ID=F63083EA-2C13-498C-B548-341BED68C209',
'only_matching': True,
}, {
'url': 'https://www.foreign.senate.gov/hearings/american-diplomacy-and-global-leadership-review-of-the-fy25-state-department-budget-request',
'only_matching': True,
}, {
'url': 'https://www.intelligence.senate.gov/hearings/foreign-threats-elections-2024-%E2%80%93-roles-and-responsibilities-us-tech-providers',
'only_matching': True,
}, {
'url': 'https://www.inaugural.senate.gov/52nd-inaugural-ceremonies/',
'only_matching': True,
}, {
'url': 'https://www.rules.senate.gov/hearings/02/07/2023/business-meeting',
'only_matching': True,
}, {
'url': 'https://www.sbc.senate.gov/public/index.cfm/hearings?ID=5B13AA6B-8279-45AF-B54B-94156DC7A2AB',
'only_matching': True,
}, {
'url': 'https://www.veterans.senate.gov/2024/5/frontier-health-care-ensuring-veterans-access-no-matter-where-they-live',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._generic_id(url)
webpage = self._download_webpage(url, display_id)
parse_info = parse_qs(self._search_regex(
r'<iframe class="[^>"]*streaminghearing[^>"]*"\s[^>]*\bsrc="([^">]*)', webpage, 'hearing URL'))
stream_num, stream_domain = _COMMITTEES[parse_info['comm'][-1]]
filename = parse_info['filename'][-1]
formats = self._extract_m3u8_formats(
f'{stream_domain}/i/{filename}_1@{stream_num}/master.m3u8',
display_id, ext='mp4')
url_info = next(SenateISVPIE.extract_from_webpage(self._downloader, url, webpage), None)
if not url_info:
raise UnsupportedError(url)
title = self._html_search_regex(
(*self._og_regexes('title'), r'(?s)<title>([^<]*?)</title>'), webpage, 'video title')
(*self._og_regexes('title'), r'(?s)<title>([^<]*?)</title>'), webpage, 'video title', fatal=False)
return {
'id': re.sub(r'.mp4$', '', filename),
**url_info,
'_type': 'url_transparent',
'display_id': display_id,
'title': re.sub(r'\s+', ' ', title.split('|')[0]).strip(),
'description': self._og_search_description(webpage, default=None),
'thumbnail': self._og_search_thumbnail(webpage, default=None),
'age_limit': self._rta_search(webpage),
'formats': formats,
}
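
The reworked `SenateISVPIE._real_extract` builds a list of candidate manifest URLs (`urls_alternatives`) and keeps the first one that yields formats. A standalone sketch of that first-success fallback, assuming a hypothetical `fetch` callable that returns an empty list on failure; the URLs and names below are illustrative only:

```python
def first_working(candidates, fetch):
    """Return (url, result) for the first candidate whose fetch is non-empty."""
    for url in candidates:
        result = fetch(url)
        if result:
            return url, result
    # Nothing worked: mirror the "empty formats" outcome.
    return None, []


# Hypothetical stand-in for a network call: only the second URL has formats.
FAKE_MANIFESTS = {'https://example.com/b/master.m3u8': ['720p', '360p']}


def fake_fetch(url):
    return FAKE_MANIFESTS.get(url, [])


print(first_working(
    ['https://example.com/a/master.m3u8', 'https://example.com/b/master.m3u8'],
    fake_fetch))
```

The order of the candidate list matters: the most likely host should come first so that fewer failed requests are made.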

@ -7,7 +7,6 @@ from .common import InfoExtractor, SearchInfoExtractor
from ..networking import HEADRequest
from ..networking.exceptions import HTTPError
from ..utils import (
KNOWN_EXTENSIONS,
ExtractorError,
float_or_none,
int_or_none,
@ -211,6 +210,7 @@ class SoundcloudBaseIE(InfoExtractor):
format_urls = set()
formats = []
has_drm = False
query = {'client_id': self._CLIENT_ID}
if secret_token:
query['secret_token'] = secret_token
@ -246,55 +246,24 @@ class SoundcloudBaseIE(InfoExtractor):
'url': format_url,
'quality': 10,
'format_note': 'Original',
'vcodec': 'none',
})
def invalid_url(url):
return not url or url in format_urls
def add_format(f, protocol, is_preview=False):
mobj = re.search(r'\.(?P<abr>\d+)\.(?P<ext>[0-9a-z]{3,4})(?=[/?])', stream_url)
if mobj:
for k, v in mobj.groupdict().items():
if not f.get(k):
f[k] = v
format_id_list = []
if protocol:
format_id_list.append(protocol)
ext = f.get('ext')
if ext == 'aac':
f.update({
'abr': 256,
'quality': 5,
'format_note': 'Premium',
})
for k in ('ext', 'abr'):
v = str_or_none(f.get(k))
if v:
format_id_list.append(v)
preview = is_preview or re.search(r'/(?:preview|playlist)/0/30/', f['url'])
if preview:
format_id_list.append('preview')
abr = f.get('abr')
if abr:
f['abr'] = int(abr)
if protocol in ('hls', 'hls-aes'):
protocol = 'm3u8' if ext == 'aac' else 'm3u8_native'
else:
protocol = 'http'
f.update({
'format_id': '_'.join(format_id_list),
'protocol': protocol,
'preference': -10 if preview else None,
})
formats.append(f)
# New API
for t in traverse_obj(info, ('media', 'transcodings', lambda _, v: url_or_none(v['url']))):
for t in traverse_obj(info, ('media', 'transcodings', lambda _, v: url_or_none(v['url']) and v['preset'])):
if extract_flat:
break
format_url = t['url']
preset = t['preset']
preset_base = preset.partition('_')[0]
protocol = traverse_obj(t, ('format', 'protocol', {str}))
protocol = traverse_obj(t, ('format', 'protocol', {str})) or 'http'
if protocol.startswith(('ctr-', 'cbc-')):
has_drm = True
continue
if protocol == 'progressive':
protocol = 'http'
if protocol != 'hls' and '/hls' in format_url:
@ -302,35 +271,60 @@ class SoundcloudBaseIE(InfoExtractor):
if protocol == 'encrypted-hls' or '/encrypted-hls' in format_url:
protocol = 'hls-aes'
ext = None
if preset := traverse_obj(t, ('preset', {str_or_none})):
ext = preset.split('_')[0]
if ext not in KNOWN_EXTENSIONS:
ext = mimetype2ext(traverse_obj(t, ('format', 'mime_type', {str})))
identifier = join_nonempty(protocol, ext, delim='_')
if not self._is_requested(identifier):
self.write_debug(f'"{identifier}" is not a requested format, skipping')
short_identifier = f'{protocol}_{preset_base}'
if preset_base == 'abr':
self.write_debug(f'Skipping broken "{short_identifier}" format')
continue
if not self._is_requested(short_identifier):
self.write_debug(f'"{short_identifier}" is not a requested format, skipping')
continue
# XXX: if not extract_flat, 429 error must be caught where _extract_info_dict is called
stream_url = traverse_obj(self._call_api(
format_url, track_id, f'Downloading {identifier} format info JSON',
format_url, track_id, f'Downloading {short_identifier} format info JSON',
query=query, headers=self._HEADERS), ('url', {url_or_none}))
if invalid_url(stream_url):
continue
format_urls.add(stream_url)
add_format({
mime_type = traverse_obj(t, ('format', 'mime_type', {str}))
codec = self._search_regex(r'codecs="([^"]+)"', mime_type, 'codec', default=None)
ext = {
'mp4a': 'm4a',
'opus': 'opus',
}.get(codec[:4] if codec else None) or mimetype2ext(mime_type, default=None)
if not ext or ext == 'm3u8':
ext = preset_base
is_premium = t.get('quality') == 'hq'
abr = int_or_none(
self._search_regex(r'(\d+)k$', preset, 'abr', default=None)
or self._search_regex(r'\.(\d+)\.(?:opus|mp3)[/?]', stream_url, 'abr', default=None)
or (256 if (is_premium and 'aac' in preset) else None))
is_preview = (t.get('snipped')
or '/preview/' in format_url
or re.search(r'/(?:preview|playlist)/0/30/', stream_url))
formats.append({
'format_id': join_nonempty(protocol, preset, is_preview and 'preview', delim='_'),
'url': stream_url,
'ext': ext,
}, protocol, t.get('snipped') or '/preview/' in format_url)
'acodec': codec,
'vcodec': 'none',
'abr': abr,
'protocol': 'm3u8_native' if protocol in ('hls', 'hls-aes') else 'http',
'container': 'm4a_dash' if ext == 'm4a' else None,
'quality': 5 if is_premium else 0 if (abr and abr >= 160) else -1,
'format_note': 'Premium' if is_premium else None,
'preference': -10 if is_preview else None,
})
for f in formats:
f['vcodec'] = 'none'
if not formats and info.get('policy') == 'BLOCK':
self.raise_geo_restricted(metadata_available=True)
if not formats:
if has_drm:
self.report_drm(track_id)
if info.get('policy') == 'BLOCK':
self.raise_geo_restricted(metadata_available=True)
user = info.get('user') or {}
@ -367,6 +361,7 @@ class SoundcloudBaseIE(InfoExtractor):
'uploader_url': user.get('permalink_url'),
'timestamp': unified_timestamp(info.get('created_at')),
'title': info.get('title'),
'track': info.get('title'),
'description': info.get('description'),
'thumbnails': thumbnails,
'duration': float_or_none(info.get('duration'), 1000),
@ -399,7 +394,7 @@ class SoundcloudIE(SoundcloudBaseIE):
(?:(?:(?:www\.|m\.)?soundcloud\.com/
(?!stations/track)
(?P<uploader>[\w\d-]+)/
(?!(?:tracks|albums|sets(?:/.+?)?|reposts|likes|spotlight)/?(?:$|[?#]))
(?!(?:tracks|albums|sets(?:/.+?)?|reposts|likes|spotlight|comments)/?(?:$|[?#]))
(?P<title>[\w\d-]+)
(?:/(?P<token>(?!(?:albums|sets|recommended))[^?]+?))?
(?:[?].*)?$)
@ -416,6 +411,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'id': '62986583',
'ext': 'opus',
'title': 'Lostin Powers - She so Heavy (SneakPreview) Adrian Ackers Blueprint 1',
'track': 'Lostin Powers - She so Heavy (SneakPreview) Adrian Ackers Blueprint 1',
'description': 'No Downloads untill we record the finished version this weekend, i was too pumped n i had to post it , earl is prolly gonna b hella p.o\'d',
'uploader': 'E.T. ExTerrestrial Music',
'uploader_id': '1571244',
@ -438,6 +434,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'id': '47127627',
'ext': 'opus',
'title': 'Goldrushed',
'track': 'Goldrushed',
'description': 'From Stockholm Sweden\r\nPovel / Magnus / Filip / David\r\nwww.theroyalconcept.com',
'uploader': 'The Royal Concept',
'uploader_id': '9615865',
@ -463,6 +460,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'id': '123998367',
'ext': 'mp3',
'title': 'Youtube - Dl Test Video \'\' Ä↭',
'track': 'Youtube - Dl Test Video \'\' Ä↭',
'description': 'test chars: "\'/\\ä↭',
'uploader': 'jaimeMF',
'uploader_id': '69767071',
@ -487,6 +485,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'id': '123998367',
'ext': 'mp3',
'title': 'Youtube - Dl Test Video \'\' Ä↭',
'track': 'Youtube - Dl Test Video \'\' Ä↭',
'description': 'test chars: "\'/\\ä↭',
'uploader': 'jaimeMF',
'uploader_id': '69767071',
@ -511,6 +510,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'id': '343609555',
'ext': 'wav',
'title': 'The Following',
'track': 'The Following',
'description': '',
'uploader': '80M',
'uploader_id': '312384765',
@ -536,6 +536,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'id': '340344461',
'ext': 'wav',
'title': 'Uplifting Only 238 [No Talking] (incl. Alex Feed Guestmix) (Aug 31, 2017) [wav]',
'track': 'Uplifting Only 238 [No Talking] (incl. Alex Feed Guestmix) (Aug 31, 2017) [wav]',
'description': 'md5:fa20ee0fca76a3d6df8c7e57f3715366',
'uploader': 'Ori Uplift Music',
'uploader_id': '12563093',
@ -561,6 +562,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'id': '309699954',
'ext': 'mp3',
'title': 'Sideways (Prod. Mad Real)',
'track': 'Sideways (Prod. Mad Real)',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'uploader': 'garyvee',
'uploader_id': '2366352',
@ -587,6 +589,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'id': '583011102',
'ext': 'opus',
'title': 'Mezzo Valzer',
'track': 'Mezzo Valzer',
'description': 'md5:f4d5f39d52e0ccc2b4f665326428901a',
'uploader': 'Giovanni Sarani',
'uploader_id': '3352531',
@ -662,6 +665,11 @@ class SoundcloudPlaylistBaseIE(SoundcloudBaseIE):
'playlistId': playlist_id,
'playlistSecretToken': token,
}, headers=self._HEADERS)
album_info = traverse_obj(playlist, {
'album': ('title', {str}),
'album_artist': ('user', 'username', {str}),
'album_type': ('set_type', {str}, {lambda x: x or 'playlist'}),
})
entries = []
for track in tracks:
track_id = str_or_none(track.get('id'))
@ -673,11 +681,17 @@ class SoundcloudPlaylistBaseIE(SoundcloudBaseIE):
if token:
url += '?secret_token=' + token
entries.append(self.url_result(
url, SoundcloudIE.ie_key(), track_id))
url, SoundcloudIE.ie_key(), track_id, url_transparent=True, **album_info))
return self.playlist_result(
entries, playlist_id,
playlist.get('title'),
playlist.get('description'))
playlist.get('description'),
**album_info,
**traverse_obj(playlist, {
'uploader': ('user', 'username', {str}),
'uploader_id': ('user', 'id', {str_or_none}),
}),
)
class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
@ -689,6 +703,11 @@ class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
'id': '2284613',
'title': 'The Royal Concept EP',
'description': 'md5:71d07087c7a449e8941a70a29e34671e',
'uploader': 'The Royal Concept',
'uploader_id': '9615865',
'album': 'The Royal Concept EP',
'album_artists': ['The Royal Concept'],
'album_type': 'ep',
},
'playlist_mincount': 5,
}, {
@ -782,7 +801,7 @@ class SoundcloudUserIE(SoundcloudPagedPlaylistBaseIE):
(?:(?:www|m)\.)?soundcloud\.com/
(?P<user>[^/]+)
(?:/
(?P<rsrc>tracks|albums|sets|reposts|likes|spotlight)
(?P<rsrc>tracks|albums|sets|reposts|likes|spotlight|comments)
)?
/?(?:[?#].*)?$
'''
@ -836,6 +855,13 @@ class SoundcloudUserIE(SoundcloudPagedPlaylistBaseIE):
'title': 'Grynpyret (Spotlight)',
},
'playlist_mincount': 1,
}, {
'url': 'https://soundcloud.com/one-thousand-and-one/comments',
'info_dict': {
'id': '992430331',
'title': '7x11x13-testing (Comments)',
},
'playlist_mincount': 1,
}]
_BASE_URL_MAP = {
@ -846,6 +872,7 @@ class SoundcloudUserIE(SoundcloudPagedPlaylistBaseIE):
'reposts': 'stream/users/%s/reposts',
'likes': 'users/%s/likes',
'spotlight': 'users/%s/spotlight',
'comments': 'users/%s/comments',
}
def _real_extract(self, url):
@ -966,6 +993,11 @@ class SoundcloudPlaylistIE(SoundcloudPlaylistBaseIE):
'id': '4110309',
'title': 'TILT Brass - Bowery Poetry Club, August \'03 [Non-Site SCR 02]',
'description': 're:.*?TILT Brass - Bowery Poetry Club',
'uploader': 'Non-Site Records',
'uploader_id': '33660914',
'album_artists': ['Non-Site Records'],
'album_type': 'playlist',
'album': 'TILT Brass - Bowery Poetry Club, August \'03 [Non-Site SCR 02]',
},
'playlist_count': 6,
}]
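
The new SoundCloud format code derives `abr` from three sources in order: a trailing `<n>k` in the preset name, a `.<n>.opus` or `.<n>.mp3` component in the stream URL, and finally a fixed 256 for premium AAC. A standalone sketch of that precedence using only `re`; `guess_abr` is a hypothetical name, and the preset strings and URLs in the usage lines are assumed examples rather than confirmed API values:

```python
import re


def guess_abr(preset, stream_url, is_premium=False):
    """Guess the audio bitrate (kbps) from the preset name, then the URL, then a premium default."""
    match = (re.search(r'(\d+)k$', preset)
             or re.search(r'\.(\d+)\.(?:opus|mp3)[/?]', stream_url))
    if match:
        return int(match.group(1))
    if is_premium and 'aac' in preset:
        return 256
    return None


print(guess_abr('aac_160k', 'https://example.com/stream'))
print(guess_abr('opus_0_0', 'https://example.com/track.64.opus/playlist.m3u8'))
print(guess_abr('aac_hq', 'https://example.com/stream', is_premium=True))
```

When none of the three sources applies, the sketch returns `None` rather than guessing.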

@ -207,7 +207,7 @@ class TheaterComplexTownVODIE(TheaterComplexTownBaseIE):
class TheaterComplexTownPPVIE(TheaterComplexTownBaseIE):
_VALID_URL = r'https?://(?:www\.)?theater-complex\.town/(?:(?:en|ja)/)?ppv/(?P<id>\w+)'
_VALID_URL = r'https?://(?:www\.)?theater-complex\.town/(?:(?:en|ja)/)?(?:ppv|live)/(?P<id>\w+)'
IE_NAME = 'theatercomplextown:ppv'
_TESTS = [{
'url': 'https://www.theater-complex.town/ppv/wytW3X7khrjJBUpKuV3jen',
@ -229,6 +229,9 @@ class TheaterComplexTownPPVIE(TheaterComplexTownBaseIE):
}, {
'url': 'https://www.theater-complex.town/ja/ppv/qwUVmLmGEiZ3ZW6it9uGys',
'only_matching': True,
}, {
'url': 'https://www.theater-complex.town/en/live/79akNM7bJeD5Fi9EP39aDp',
'only_matching': True,
}]
_API_PATH = 'events'

@ -0,0 +1,199 @@
import functools
import math

from .common import InfoExtractor
from ..utils import (
    InAdvancePagedList,
    int_or_none,
    parse_iso8601,
    try_call,
    url_or_none,
)
from ..utils.traversal import traverse_obj


class SubsplashBaseIE(InfoExtractor):
    def _get_headers(self, url, display_id):
        token = try_call(lambda: self._get_cookies(url)['ss-token-guest'].value)
        if not token:
            webpage, urlh = self._download_webpage_handle(url, display_id)
            token = (
                try_call(lambda: self._get_cookies(url)['ss-token-guest'].value)
                or urlh.get_header('x-api-token')
                or self._search_json(
                    r'<script[^>]+\bid="shoebox-tokens"[^>]*>', webpage, 'shoebox tokens',
                    display_id, default={}).get('apiToken')
                or self._search_regex(r'\\"tokens\\":{\\"guest\\":\\"([A-Za-z0-9._-]+)\\"', webpage, 'token', default=None))

        if not token:
            self.report_warning('Unable to extract auth token')
            return None
        return {'Authorization': f'Bearer {token}'}

    def _extract_video(self, data, video_id):
        formats = []
        video_data = traverse_obj(data, ('_embedded', 'video', '_embedded', {dict}))
        m3u8_url = traverse_obj(video_data, ('playlists', 0, '_links', 'related', 'href', {url_or_none}))
        if m3u8_url:
            formats.extend(self._extract_m3u8_formats(m3u8_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
        mp4_entry = traverse_obj(video_data, ('video-outputs', lambda _, v: url_or_none(v['_links']['related']['href']), any))
        if mp4_entry:
            formats.append({
                'url': mp4_entry['_links']['related']['href'],
                'format_id': 'direct',
                'quality': 1,
                **traverse_obj(mp4_entry, {
                    'height': ('height', {int_or_none}),
                    'width': ('width', {int_or_none}),
                    'filesize': ('file_size', {int_or_none}),
                }),
            })
        return {
            'id': video_id,
            'formats': formats,
            **traverse_obj(data, {
                'title': ('title', {str}),
                'description': ('summary_text', {str}),
                'thumbnail': ('_embedded', 'images', 0, '_links', 'related', 'href', {url_or_none}),
                'duration': ('_embedded', 'video', 'duration', {int_or_none(scale=1000)}),
                'timestamp': ('date', {parse_iso8601}),
                'release_timestamp': ('published_at', {parse_iso8601}),
                'modified_timestamp': ('updated_at', {parse_iso8601}),
            }),
        }


class SubsplashIE(SubsplashBaseIE):
    _VALID_URL = [
        r'https?://(?:www\.)?subsplash\.com/(?:u/)?[^/?#]+/[^/?#]+/(?:d/|mi/\+)(?P<id>\w+)',
        r'https?://(?:\w+\.)?subspla\.sh/(?P<id>\w+)',
    ]
    _TESTS = [{
        'url': 'https://subsplash.com/u/skywatchtv/media/d/5whnx5s-the-grand-delusion-taking-place-right-now',
        'md5': 'd468729814e533cec86f1da505dec82d',
        'info_dict': {
            'id': '5whnx5s',
            'ext': 'mp4',
            'title': 'THE GRAND DELUSION TAKING PLACE RIGHT NOW!',
            'description': 'md5:220a630865c3697b0ec9dcb3a70cbc33',
            'upload_date': '20240901',
            'duration': 1710,
            'thumbnail': r're:https?://.*\.(?:jpg|png)$',
            'modified_date': '20240901',
            'release_date': '20240901',
            'release_timestamp': 1725195600,
            'timestamp': 1725148800,
            'modified_timestamp': 1725195657,
        },
    }, {
        'url': 'https://subsplash.com/u/prophecywatchers/media/d/n4dr8b2-the-transhumanist-plan-for-humanity-billy-crone',
        'md5': '01982d58021af81c969958459bd81f13',
        'info_dict': {
            'id': 'n4dr8b2',
            'ext': 'mp4',
            'title': 'The Transhumanist Plan for Humanity | Billy Crone',
            'upload_date': '20240903',
            'duration': 1709,
            'thumbnail': r're:https?://.*\.(?:jpg|png)$',
            'timestamp': 1725321600,
            'modified_date': '20241010',
            'release_date': '20240903',
            'release_timestamp': 1725379200,
            'modified_timestamp': 1728577804,
        },
    }, {
        'url': 'https://subsplash.com/laiglesiadelcentro/vid/mi/+ecb6a6b?autoplay=true',
        'md5': '013c9b1e391dd4b34d8612439445deef',
        'info_dict': {
            'id': 'ecb6a6b',
            'ext': 'mp4',
            'thumbnail': r're:https?://.*\.(?:jpg|png)$',
            'release_timestamp': 1477095852,
            'title': 'En el Principio Era el Verbo | EVANGELIO DE JUAN | Ps. Gadiel Ríos',
            'timestamp': 1425772800,
            'upload_date': '20150308',
            'description': 'md5:f368221de93176654989ba66bb564798',
            'modified_timestamp': 1730258864,
            'modified_date': '20241030',
            'release_date': '20161022',
        },
    }, {
        'url': 'https://prophecywatchers.subspla.sh/8gps8cx',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        video_id = self._match_id(url)
        data = self._download_json(
            'https://core.subsplash.com/media/v1/media-items',
            video_id, headers=self._get_headers(url, video_id),
            query={
                'filter[short_code]': video_id,
                'include': 'images,audio.audio-outputs,audio.video,video.video-outputs,video.playlists,document,broadcast',
            })
        return self._extract_video(traverse_obj(data, ('_embedded', 'media-items', 0)), video_id)


class SubsplashPlaylistIE(SubsplashBaseIE):
    IE_NAME = 'subsplash:playlist'
    _VALID_URL = r'https?://(?:www\.)?subsplash\.com/[^/?#]+/(?:our-videos|media)/ms/\+(?P<id>\w+)'
    _PAGE_SIZE = 15
    _TESTS = [{
        'url': 'https://subsplash.com/skywatchtv/our-videos/ms/+dbyjzp8',
        'info_dict': {
            'id': 'dbyjzp8',
            'title': 'Five in Ten',
        },
        'playlist_mincount': 11,
    }, {
        'url': 'https://subsplash.com/prophecywatchers/media/ms/+n42mr48',
        'info_dict': {
            'id': 'n42mr48',
            'title': 'Road to Zion Series',
        },
        'playlist_mincount': 13,
    }, {
        'url': 'https://subsplash.com/prophecywatchers/media/ms/+918b9f6',
        'only_matching': True,
    }]

    def _entries(self, series_id, headers, page):
        data = self._download_json(
            'https://core.subsplash.com/media/v1/media-items', series_id, headers=headers,
            query={
                'filter[broadcast.status|broadcast.status]': 'null|on-demand',
                'filter[media_series]': series_id,
                'filter[status]': 'published',
                'include': 'images,audio.audio-outputs,audio.video,video.video-outputs,video.playlists,document',
                'page[number]': page + 1,
                'page[size]': self._PAGE_SIZE,
                'sort': '-position',
            }, note=f'Downloading page {page + 1}')

        for entry in traverse_obj(data, ('_embedded', 'media-items', lambda _, v: v['short_code'])):
            entry_id = entry['short_code']
            info = self._extract_video(entry, entry_id)
            yield {
                **info,
                'webpage_url': f'https://subspla.sh/{entry_id}',
                'extractor_key': SubsplashIE.ie_key(),
                'extractor': SubsplashIE.IE_NAME,
            }

    def _real_extract(self, url):
        display_id = self._match_id(url)
        headers = self._get_headers(url, display_id)

        data = self._download_json(
            'https://core.subsplash.com/media/v1/media-series', display_id, headers=headers,
            query={'filter[short_code]': display_id})
        series_data = traverse_obj(data, ('_embedded', 'media-series', 0, {
            'id': ('id', {str}),
            'title': ('title', {str}),
            'count': ('media_items_count', {int}),
        }))
        total_pages = math.ceil(series_data['count'] / self._PAGE_SIZE)

        return self.playlist_result(
            InAdvancePagedList(functools.partial(self._entries, series_data['id'], headers), total_pages, self._PAGE_SIZE),
            display_id, series_data['title'])
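
Review note (illustrative only, not part of the commit): `SubsplashPlaylistIE` derives the page count up front with `math.ceil(count / _PAGE_SIZE)` and then requests `page + 1`, since the paged list hands `_entries` a zero-based index while the API's `page[number]` appears to be one-based. A minimal standalone sketch of that arithmetic; `page_plan` is a hypothetical helper, not a yt-dlp function.

```python
import math

PAGE_SIZE = 15  # mirrors _PAGE_SIZE in SubsplashPlaylistIE


def page_plan(item_count, page_size=PAGE_SIZE):
    """List the one-based page numbers needed to cover item_count items."""
    total_pages = math.ceil(item_count / page_size)
    # Zero-based page index -> one-based page[number] query value.
    return [page + 1 for page in range(total_pages)]


plan_empty = page_plan(0)      # no items, so no requests
plan_exact = page_plan(30)     # exactly two full pages
plan_partial = page_plan(31)   # one extra item forces a third page
```

Knowing the total in advance is presumably why the extractor uses `InAdvancePagedList` here rather than a list that probes until an empty page comes back.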

View file

@ -413,15 +413,6 @@ class TikTokBaseIE(InfoExtractor):
for f in formats:
self._set_cookie(urllib.parse.urlparse(f['url']).hostname, 'sid_tt', auth_cookie.value)
thumbnails = []
for cover_id in ('cover', 'ai_dynamic_cover', 'animated_cover', 'ai_dynamic_cover_bak',
'origin_cover', 'dynamic_cover'):
for cover_url in traverse_obj(video_info, (cover_id, 'url_list', ...)):
thumbnails.append({
'id': cover_id,
'url': cover_url,
})
stats_info = aweme_detail.get('statistics') or {}
music_info = aweme_detail.get('music') or {}
labels = traverse_obj(aweme_detail, ('hybrid_label', ..., 'text'), expected_type=str)
@ -467,7 +458,17 @@ class TikTokBaseIE(InfoExtractor):
'formats': formats,
'subtitles': self.extract_subtitles(
aweme_detail, aweme_id, traverse_obj(author_info, 'uploader', 'uploader_id', 'channel_id')),
'thumbnails': thumbnails,
'thumbnails': [
{
'id': cover_id,
'url': cover_url,
'preference': -1 if cover_id in ('cover', 'origin_cover') else -2,
}
for cover_id in (
'cover', 'ai_dynamic_cover', 'animated_cover',
'ai_dynamic_cover_bak', 'origin_cover', 'dynamic_cover')
for cover_url in traverse_obj(video_info, (cover_id, 'url_list', ...))
],
'duration': (traverse_obj(video_info, (
(None, 'download_addr'), 'duration', {int_or_none(scale=1000)}, any))
or traverse_obj(music_info, ('duration', {int_or_none}))),
@ -600,11 +601,15 @@ class TikTokBaseIE(InfoExtractor):
'repost_count': 'shareCount',
'comment_count': 'commentCount',
}), expected_type=int_or_none),
'thumbnails': traverse_obj(aweme_detail, (
(None, 'video'), ('thumbnail', 'cover', 'dynamicCover', 'originCover'), {
'url': ({url_or_none}, {self._proto_relative_url}),
},
)),
'thumbnails': [
{
'id': cover_id,
'url': self._proto_relative_url(cover_url),
'preference': -2 if cover_id == 'dynamicCover' else -1,
}
for cover_id in ('thumbnail', 'cover', 'dynamicCover', 'originCover')
for cover_url in traverse_obj(aweme_detail, ((None, 'video'), cover_id, {url_or_none}))
],
}
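
Review note (illustrative only, not part of the commit): both TikTok hunks replace the flat thumbnail list with a nested comprehension that tags each cover with an `id` and a `preference`, so static covers rank above animated ones. The sketch below reproduces the first comprehension with plain dictionaries instead of `traverse_obj`; the sample `video_info` and its example URLs are made up, and "higher preference wins" is assumed from how yt-dlp is understood to sort thumbnails.

```python
COVER_IDS = (
    'cover', 'ai_dynamic_cover', 'animated_cover',
    'ai_dynamic_cover_bak', 'origin_cover', 'dynamic_cover')


def build_thumbnails(video_info):
    """Flatten every cover's url_list into thumbnail dicts with a preference."""
    return [
        {
            'id': cover_id,
            'url': cover_url,
            # Static covers get -1, animated/dynamic ones get -2.
            'preference': -1 if cover_id in ('cover', 'origin_cover') else -2,
        }
        for cover_id in COVER_IDS
        # Missing or null covers contribute nothing (plain-dict stand-in for traverse_obj).
        for cover_url in ((video_info.get(cover_id) or {}).get('url_list') or [])
    ]


# Hypothetical input shaped like the 'video' object of an aweme detail.
sample = {
    'cover': {'url_list': ['https://example.invalid/static.jpg']},
    'origin_cover': None,
    'dynamic_cover': {'url_list': ['https://example.invalid/animated.webp']},
}
thumbs = build_thumbnails(sample)
best = max(thumbs, key=lambda t: t['preference'])
```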

View file

@ -189,26 +189,6 @@ class TumblrIE(InfoExtractor):
'release_date': '20140227',
},
'add_ie': ['Vimeo'],
}, {
'url': 'http://sutiblr.tumblr.com/post/139638707273',
'md5': '2dd184b3669e049ba40563a7d423f95c',
'info_dict': {
'id': 'ir7qBEIKqvq',
'ext': 'mp4',
'title': 'Vine by sutiblr',
'alt_title': 'Vine by sutiblr',
'uploader': 'sutiblr',
'uploader_id': '1198993975374495744',
'upload_date': '20160220',
'like_count': int,
'comment_count': int,
'repost_count': int,
'thumbnail': r're:^https?://.*\.jpg',
'timestamp': 1455940159,
'view_count': int,
},
'add_ie': ['Vine'],
'skip': 'Vine is unavailable',
}, {
'url': 'https://silami.tumblr.com/post/84250043974/my-bad-river-flows-in-you-impression-on-maschine',
'md5': '3c92d7c3d867f14ccbeefa2119022277',
@ -366,7 +346,6 @@ class TumblrIE(InfoExtractor):
_providers = {
'instagram': 'Instagram',
'vimeo': 'Vimeo',
'vine': 'Vine',
'youtube': 'Youtube',
'dailymotion': 'Dailymotion',
'tiktok': 'TikTok',

View file

@ -24,8 +24,6 @@ class TVerIE(InfoExtractor):
'channel': 'テレビ朝日',
'id': 'ep83nf3w4p',
'ext': 'mp4',
'onair_label': '5月3日(火)放送分',
'ext_title': '家事ヤロウ!!! 売り場席巻のチーズSP財前直見×森泉親子の脱東京暮らし密着 テレビ朝日 5月3日(火)放送分',
},
'add_ie': ['BrightcoveNew'],
}, {

View file

@ -409,26 +409,6 @@ class TwitterCardIE(InfoExtractor):
},
'add_ie': ['Youtube'],
},
{
'url': 'https://twitter.com/i/cards/tfw/v1/665289828897005568',
'info_dict': {
'id': 'iBb2x00UVlv',
'ext': 'mp4',
'upload_date': '20151113',
'uploader_id': '1189339351084113920',
'uploader': 'ArsenalTerje',
'title': 'Vine by ArsenalTerje',
'timestamp': 1447451307,
'alt_title': 'Vine by ArsenalTerje',
'comment_count': int,
'like_count': int,
'thumbnail': r're:^https?://[^?#]+\.jpg',
'view_count': int,
'repost_count': int,
},
'add_ie': ['Vine'],
'params': {'skip_download': 'm3u8'},
},
{
'url': 'https://twitter.com/i/videos/tweet/705235433198714880',
'md5': '884812a2adc8aaf6fe52b15ccbfa3b88',
@ -567,25 +547,6 @@ class TwitterIE(TwitterBaseIE):
'age_limit': 0,
'_old_archive_ids': ['twitter 700207533655363584'],
},
}, {
'url': 'https://twitter.com/Filmdrunk/status/713801302971588609',
'md5': '89a15ed345d13b86e9a5a5e051fa308a',
'info_dict': {
'id': 'MIOxnrUteUd',
'ext': 'mp4',
'title': 'Dr.Pepperの飲み方 #japanese #バカ #ドクペ #電動ガン',
'uploader': 'TAKUMA',
'uploader_id': '1004126642786242560',
'timestamp': 1402826626,
'upload_date': '20140615',
'thumbnail': r're:^https?://.*\.jpg',
'alt_title': 'Vine by TAKUMA',
'comment_count': int,
'repost_count': int,
'like_count': int,
'view_count': int,
},
'add_ie': ['Vine'],
}, {
'url': 'https://twitter.com/captainamerica/status/719944021058060289',
'info_dict': {

View file

@ -50,6 +50,7 @@ class KnownDRMIE(UnsupportedInfoExtractor):
r'music\.amazon\.(?:\w{2}\.)?\w+',
r'(?:watch|front)\.njpwworld\.com',
r'qub\.ca/vrai',
r'(?:beta\.)?crunchyroll\.com',
)
_TESTS = [{
@ -153,6 +154,12 @@ class KnownDRMIE(UnsupportedInfoExtractor):
}, {
'url': 'https://www.qub.ca/vrai/l-effet-bocuse-d-or/saison-1/l-effet-bocuse-d-or-saison-1-bande-annonce-1098225063',
'only_matching': True,
}, {
'url': 'https://www.crunchyroll.com/watch/GY2P1Q98Y/to-the-future',
'only_matching': True,
}, {
'url': 'https://beta.crunchyroll.com/pt-br/watch/G8WUN8VKP/the-ruler-of-conspiracy',
'only_matching': True,
}]
def _real_extract(self, url):

View file

@ -14,59 +14,69 @@ class VideocampusSachsenIE(InfoExtractor):
'corporate.demo.vimp.com',
'dancehalldatabase.com',
'drehzahl.tv',
'educhannel.hs-gesundheit.de',
'educhannel.hs-gesundheit.de', # Hochschule für Gesundheit NRW
'emedia.ls.haw-hamburg.de',
'globale-evolution.net',
'hohu.tv',
'htvideos.hightechhigh.org',
'k210039.vimp.mivitec.net',
'media.cmslegal.com',
'media.hs-furtwangen.de',
'media.hwr-berlin.de',
'media.fh-swf.de', # Fachhochschule Südwestfalen
'media.hs-furtwangen.de', # Hochschule Furtwangen
'media.hwr-berlin.de', # Hochschule für Wirtschaft und Recht Berlin
'mediathek.dkfz.de',
'mediathek.htw-berlin.de',
'mediathek.htw-berlin.de', # Hochschule für Technik und Wirtschaft Berlin
'mediathek.polizei-bw.de',
'medien.hs-merseburg.de',
'mportal.europa-uni.de',
'medien.hs-merseburg.de', # Hochschule Merseburg
'mitmedia.manukau.ac.nz', # Manukau Institute of Technology Auckland (NZ)
'mportal.europa-uni.de', # Europa-Universität Viadrina
'pacific.demo.vimp.com',
'slctv.com',
'streaming.prairiesouth.ca',
'tube.isbonline.cn',
'univideo.uni-kassel.de',
'univideo.uni-kassel.de', # Universität Kassel
'ursula2.genetics.emory.edu',
'ursulablicklevideoarchiv.com',
'v.agrarumweltpaedagogik.at',
'video.eplay-tv.de',
'video.fh-dortmund.de',
'video.hs-offenburg.de',
'video.hs-pforzheim.de',
'video.hspv.nrw.de',
'video.fh-dortmund.de', # Fachhochschule Dortmund
'video.hs-nb.de', # Hochschule Neubrandenburg
'video.hs-offenburg.de', # Hochschule Offenburg
'video.hs-pforzheim.de', # Hochschule Pforzheim
'video.hspv.nrw.de', # Hochschule für Polizei und öffentliche Verwaltung NRW
'video.irtshdf.fr',
'video.pareygo.de',
'video.tu-freiberg.de',
'videocampus.sachsen.de',
'videoportal.uni-freiburg.de',
'videoportal.vm.uni-freiburg.de',
'video.tu-dortmund.de', # Technische Universität Dortmund
'video.tu-freiberg.de', # Technische Universität Bergakademie Freiberg
'videocampus.sachsen.de', # Video Campus Sachsen (gemeinsame Videoplattform sächsischer Universitäten, Hochschulen und der Berufsakademie Sachsen)
'videoportal.uni-freiburg.de', # Albert-Ludwigs-Universität Freiburg
'videoportal.vm.uni-freiburg.de', # Albert-Ludwigs-Universität Freiburg
'videos.duoc.cl',
'videos.uni-paderborn.de',
'videos.uni-paderborn.de', # Universität Paderborn
'vimp-bemus.udk-berlin.de',
'vimp.aekwl.de',
'vimp.hs-mittweida.de',
'vimp.oth-regensburg.de',
'vimp.ph-heidelberg.de',
'vimp.landesfilmdienste.de',
'vimp.oth-regensburg.de', # Ostbayerische Technische Hochschule Regensburg
'vimp.ph-heidelberg.de', # Pädagogische Hochschule Heidelberg
'vimp.sma-events.com',
'vimp.weka-fachmedien.de',
'vimpdesk.com',
'webtv.univ-montp3.fr',
'www.b-tu.de/media',
'www.b-tu.de/media', # Brandenburgische Technische Universität Cottbus-Senftenberg
'www.bergauf.tv',
'www.bigcitytv.de',
'www.cad-videos.de',
'www.drehzahl.tv',
'www.fh-bielefeld.de/medienportal',
'www.hohu.tv',
'www.hsbi.de/medienportal', # Hochschule Bielefeld
'www.logistic.tv',
'www.orvovideo.com',
'www.printtube.co.uk',
'www.rwe.tv',
'www.salzi.tv',
'www.signtube.co.uk',
'www.twb-power.com',
'www.wenglor-media.com',
'www2.univ-sba.dz',
)
@ -188,22 +198,23 @@ class VideocampusSachsenIE(InfoExtractor):
class ViMPPlaylistIE(InfoExtractor):
IE_NAME = 'ViMP:Playlist'
_VALID_URL = r'''(?x)(?P<host>https?://(?:{}))/(?:
album/view/aid/(?P<album_id>[0-9]+)|
(?P<mode>category|channel)/(?P<name>[\w-]+)/(?P<id>[0-9]+)
(?P<mode1>album)/view/aid/(?P<album_id>[0-9]+)|
(?P<mode2>category|channel)/(?P<name>[\w-]+)/(?P<channel_id>[0-9]+)|
(?P<mode3>tag)/(?P<tag_id>[0-9]+)
)'''.format('|'.join(map(re.escape, VideocampusSachsenIE._INSTANCES)))
_TESTS = [{
'url': 'https://vimp.oth-regensburg.de/channel/Designtheorie-1-SoSe-2020/3',
'info_dict': {
'id': 'channel-3',
'title': 'Designtheorie 1 SoSe 2020 :: Channels :: ViMP OTH Regensburg',
'title': 'Designtheorie 1 SoSe 2020 - Channels - ViMP OTH Regensburg',
},
'playlist_mincount': 9,
}, {
'url': 'https://www.fh-bielefeld.de/medienportal/album/view/aid/208',
'url': 'https://www.hsbi.de/medienportal/album/view/aid/208',
'info_dict': {
'id': 'album-208',
'title': 'KG Praktikum ABT/MEC :: Playlists :: FH-Medienportal',
'title': 'KG Praktikum ABT/MEC - Playlists - HSBI-Medienportal',
},
'playlist_mincount': 4,
}, {
@ -213,6 +224,13 @@ class ViMPPlaylistIE(InfoExtractor):
'title': 'Online-Seminare ONYX - BPS - Bildungseinrichtungen - VCS',
},
'playlist_mincount': 7,
}, {
'url': 'https://videocampus.sachsen.de/tag/26902',
'info_dict': {
'id': 'tag-26902',
'title': 'advanced mobile and v2x communication - Tags - VCS',
},
'playlist_mincount': 6,
}]
_PAGE_SIZE = 10
@ -220,34 +238,37 @@ class ViMPPlaylistIE(InfoExtractor):
webpage = self._download_webpage(
f'{host}/media/ajax/component/boxList/{url_part}', playlist_id,
query={'page': page, 'page_only': 1}, data=urlencode_postdata(data))
urls = re.findall(r'"([^"]+/video/[^"]+)"', webpage)
urls = re.findall(r'"([^"]*/video/[^"]+)"', webpage)
for url in urls:
yield self.url_result(host + url, VideocampusSachsenIE)
def _real_extract(self, url):
host, album_id, mode, name, playlist_id = self._match_valid_url(url).group(
'host', 'album_id', 'mode', 'name', 'id')
host, album_id, name, channel_id, tag_id, mode1, mode2, mode3 = self._match_valid_url(url).group(
'host', 'album_id', 'name', 'channel_id', 'tag_id', 'mode1', 'mode2', 'mode3')
webpage = self._download_webpage(url, album_id or playlist_id, fatal=False) or ''
mode = mode1 or mode2 or mode3
playlist_id = album_id or channel_id or tag_id
webpage = self._download_webpage(url, playlist_id, fatal=False) or ''
title = (self._html_search_meta('title', webpage, fatal=False)
or self._html_extract_title(webpage))
url_part = (f'aid/{album_id}' if album_id
else f'category/{name}/category_id/{playlist_id}' if mode == 'category'
else f'title/{name}/channel/{playlist_id}')
else f'category/{name}/category_id/{channel_id}' if mode == 'category'
else f'title/{name}/channel/{channel_id}' if mode == 'channel'
else f'tag/{tag_id}')
mode = mode or 'album'
data = {
'vars[mode]': mode,
f'vars[{mode}]': album_id or playlist_id,
'vars[context]': '4' if album_id else '1' if mode == 'category' else '3',
'vars[context_id]': album_id or playlist_id,
f'vars[{mode}]': playlist_id,
'vars[context]': '4' if album_id else '1' if mode == 'category' else '3' if mode == 'album' else '0',
'vars[context_id]': playlist_id,
'vars[layout]': 'thumb',
'vars[per_page][thumb]': str(self._PAGE_SIZE),
}
return self.playlist_result(
OnDemandPagedList(functools.partial(
self._fetch_page, host, url_part, album_id or playlist_id, data), self._PAGE_SIZE),
playlist_title=title, id=f'{mode}-{album_id or playlist_id}')
self._fetch_page, host, url_part, playlist_id, data), self._PAGE_SIZE),
playlist_title=title, id=f'{mode}-{playlist_id}')
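
Review note (illustrative only, not part of the commit): the new `ViMPPlaylistIE._VALID_URL` uses `mode1`, `mode2` and `mode3` because Python's `re` does not allow the same group name twice in one pattern, so each alternative needs its own name and the extractor coalesces them afterwards. A minimal sketch with a two-host subset of `_INSTANCES`; `HOSTS` and `playlist_key` are hypothetical names, and the album URL is invented for the example.

```python
import re

# Small subset of VideocampusSachsenIE._INSTANCES, for illustration only.
HOSTS = ('videocampus.sachsen.de', 'vimp.oth-regensburg.de')

VALID_URL = r'''(?x)(?P<host>https?://(?:{}))/(?:
    (?P<mode1>album)/view/aid/(?P<album_id>[0-9]+)|
    (?P<mode2>category|channel)/(?P<name>[\w-]+)/(?P<channel_id>[0-9]+)|
    (?P<mode3>tag)/(?P<tag_id>[0-9]+)
)'''.format('|'.join(map(re.escape, HOSTS)))


def playlist_key(url):
    """Return '<mode>-<id>' as used for the playlist id, or None if unmatched."""
    mobj = re.match(VALID_URL, url)
    if not mobj:
        return None
    # Only one alternative can match, so at most one group of each trio is set.
    mode = mobj.group('mode1') or mobj.group('mode2') or mobj.group('mode3')
    playlist_id = mobj.group('album_id') or mobj.group('channel_id') or mobj.group('tag_id')
    return f'{mode}-{playlist_id}'


tag_key = playlist_key('https://videocampus.sachsen.de/tag/26902')
channel_key = playlist_key('https://vimp.oth-regensburg.de/channel/Designtheorie-1-SoSe-2020/3')
album_key = playlist_key('https://videocampus.sachsen.de/album/view/aid/208')
other_key = playlist_key('https://videocampus.sachsen.de/video/abc/123')
```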

View file

@ -421,5 +421,5 @@ class VidyardIE(VidyardBaseIE):
return self._process_video_json(video_json['chapters'][0], video_id)
return self.playlist_result(
[self._process_video_json(chapter, video_id) for chapter in video_json['chapters']],
(self._process_video_json(chapter, video_id) for chapter in video_json['chapters']),
str(video_json['playerUuid']), video_json.get('name'))

View file

@ -28,6 +28,7 @@ from ..utils import (
try_get,
unified_timestamp,
unsmuggle_url,
url_or_none,
urlencode_postdata,
urlhandle_detect_ext,
urljoin,
@ -211,11 +212,7 @@ class VimeoBaseInfoExtractor(InfoExtractor):
'width': int_or_none(key),
'url': thumb,
})
thumbnail = video_data.get('thumbnail')
if thumbnail:
thumbnails.append({
'url': thumbnail,
})
thumbnails.extend(traverse_obj(video_data, (('thumbnail', 'thumbnail_url'), {'url': {url_or_none}})))
owner = video_data.get('owner') or {}
video_uploader_url = owner.get('url')
@ -388,7 +385,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'uploader_url': r're:https?://(?:www\.)?vimeo\.com/businessofsoftware',
'uploader_id': 'businessofsoftware',
'duration': 3610,
'thumbnail': 'https://i.vimeocdn.com/video/376682406-f34043e7b766af6bef2af81366eacd6724f3fc3173179a11a97a1e26587c9529-d_1280',
'thumbnail': 'https://i.vimeocdn.com/video/376682406-f34043e7b766af6bef2af81366eacd6724f3fc3173179a11a97a1e26587c9529-d',
},
'params': {
'format': 'best[protocol=https]',
@ -413,7 +410,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'duration': 10,
'comment_count': int,
'like_count': int,
'thumbnail': 'https://i.vimeocdn.com/video/440665496-b2c5aee2b61089442c794f64113a8e8f7d5763c3e6b3ebfaf696ae6413f8b1f4-d_1280',
'thumbnail': 'https://i.vimeocdn.com/video/440665496-b2c5aee2b61089442c794f64113a8e8f7d5763c3e6b3ebfaf696ae6413f8b1f4-d',
},
'params': {
'format': 'best[protocol=https]',
@ -437,7 +434,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'timestamp': 1380339469,
'upload_date': '20130928',
'duration': 187,
'thumbnail': 'https://i.vimeocdn.com/video/450239872-a05512d9b1e55d707a7c04365c10980f327b06d966351bc403a5d5d65c95e572-d_1280',
'thumbnail': 'https://i.vimeocdn.com/video/450239872-a05512d9b1e55d707a7c04365c10980f327b06d966351bc403a5d5d65c95e572-d',
'view_count': int,
'comment_count': int,
'like_count': int,
@ -463,7 +460,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'duration': 62,
'comment_count': int,
'like_count': int,
'thumbnail': 'https://i.vimeocdn.com/video/452001751-8216e0571c251a09d7a8387550942d89f7f86f6398f8ed886e639b0dd50d3c90-d_1280',
'thumbnail': 'https://i.vimeocdn.com/video/452001751-8216e0571c251a09d7a8387550942d89f7f86f6398f8ed886e639b0dd50d3c90-d',
'subtitles': {
'de': 'count:3',
'en': 'count:3',
@ -488,7 +485,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'uploader_url': r're:https?://(?:www\.)?vimeo\.com/user28849593',
'uploader_id': 'user28849593',
'duration': 118,
'thumbnail': 'https://i.vimeocdn.com/video/478636036-c18440305ef3df9decfb6bf207a61fe39d2d17fa462a96f6f2d93d30492b037d-d_1280',
'thumbnail': 'https://i.vimeocdn.com/video/478636036-c18440305ef3df9decfb6bf207a61fe39d2d17fa462a96f6f2d93d30492b037d-d',
},
'expected_warnings': ['Failed to parse XML: not well-formed'],
},
@ -509,7 +506,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'duration': 60,
'comment_count': int,
'view_count': int,
'thumbnail': 'https://i.vimeocdn.com/video/231174622-dd07f015e9221ff529d451e1cc31c982b5d87bfafa48c4189b1da72824ee289a-d_1280',
'thumbnail': 'https://i.vimeocdn.com/video/231174622-dd07f015e9221ff529d451e1cc31c982b5d87bfafa48c4189b1da72824ee289a-d',
'like_count': int,
'tags': 'count:11',
},
@ -531,7 +528,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'description': 'md5:f2edc61af3ea7a5592681ddbb683db73',
'upload_date': '20200225',
'duration': 176,
'thumbnail': 'https://i.vimeocdn.com/video/859377297-836494a4ef775e9d4edbace83937d9ad34dc846c688c0c419c0e87f7ab06c4b3-d_1280',
'thumbnail': 'https://i.vimeocdn.com/video/859377297-836494a4ef775e9d4edbace83937d9ad34dc846c688c0c419c0e87f7ab06c4b3-d',
'uploader_url': 'https://vimeo.com/frameworkla',
},
# 'params': {'format': 'source'},
@ -556,7 +553,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'duration': 321,
'comment_count': int,
'view_count': int,
'thumbnail': 'https://i.vimeocdn.com/video/22728298-bfc22146f930de7cf497821c7b0b9f168099201ecca39b00b6bd31fcedfca7a6-d_1280',
'thumbnail': 'https://i.vimeocdn.com/video/22728298-bfc22146f930de7cf497821c7b0b9f168099201ecca39b00b6bd31fcedfca7a6-d',
'like_count': int,
'tags': ['[the shining', 'vimeohq', 'cv', 'vimeo tribute]'],
},
@ -596,7 +593,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'uploader_id': 'user18948128',
'uploader': 'Jaime Marquínez Ferrándiz',
'duration': 10,
'thumbnail': 'https://i.vimeocdn.com/video/440665496-b2c5aee2b61089442c794f64113a8e8f7d5763c3e6b3ebfaf696ae6413f8b1f4-d_1280',
'thumbnail': 'https://i.vimeocdn.com/video/440665496-b2c5aee2b61089442c794f64113a8e8f7d5763c3e6b3ebfaf696ae6413f8b1f4-d',
},
'params': {
'format': 'best[protocol=https]',
@ -633,7 +630,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'description': str, # FIXME: Dynamic SEO spam description
'upload_date': '20150209',
'timestamp': 1423518307,
'thumbnail': 'https://i.vimeocdn.com/video/default_1280',
'thumbnail': 'https://i.vimeocdn.com/video/default',
'duration': 10,
'like_count': int,
'uploader_url': 'https://vimeo.com/user20132939',
@ -666,7 +663,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'license': 'by-nc',
'duration': 159,
'comment_count': int,
'thumbnail': 'https://i.vimeocdn.com/video/562802436-585eeb13b5020c6ac0f171a2234067938098f84737787df05ff0d767f6d54ee9-d_1280',
'thumbnail': 'https://i.vimeocdn.com/video/562802436-585eeb13b5020c6ac0f171a2234067938098f84737787df05ff0d767f6d54ee9-d',
'like_count': int,
'uploader_url': 'https://vimeo.com/aliniamedia',
'release_date': '20160329',
@ -686,7 +683,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'uploader': 'Firework Champions',
'upload_date': '20150910',
'timestamp': 1441901895,
'thumbnail': 'https://i.vimeocdn.com/video/534715882-6ff8e4660cbf2fea68282876d8d44f318825dfe572cc4016e73b3266eac8ae3a-d_1280',
'thumbnail': 'https://i.vimeocdn.com/video/534715882-6ff8e4660cbf2fea68282876d8d44f318825dfe572cc4016e73b3266eac8ae3a-d',
'uploader_url': 'https://vimeo.com/fireworkchampions',
'tags': 'count:6',
'duration': 229,
@ -715,7 +712,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'duration': 336,
'comment_count': int,
'view_count': int,
'thumbnail': 'https://i.vimeocdn.com/video/541243181-b593db36a16db2f0096f655da3f5a4dc46b8766d77b0f440df937ecb0c418347-d_1280',
'thumbnail': 'https://i.vimeocdn.com/video/541243181-b593db36a16db2f0096f655da3f5a4dc46b8766d77b0f440df937ecb0c418347-d',
'like_count': int,
'uploader_url': 'https://vimeo.com/karimhd',
'channel_url': 'https://vimeo.com/channels/staffpicks',
@ -740,7 +737,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'release_timestamp': 1627621014,
'duration': 976,
'comment_count': int,
'thumbnail': 'https://i.vimeocdn.com/video/1202249320-4ddb2c30398c0dc0ee059172d1bd5ea481ad12f0e0e3ad01d2266f56c744b015-d_1280',
'thumbnail': 'https://i.vimeocdn.com/video/1202249320-4ddb2c30398c0dc0ee059172d1bd5ea481ad12f0e0e3ad01d2266f56c744b015-d',
'like_count': int,
'uploader_url': 'https://vimeo.com/txwestcapital',
'release_date': '20210730',
@ -764,7 +761,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'uploader': 'Alex Howard',
'uploader_id': 'user54729178',
'uploader_url': 'https://vimeo.com/user54729178',
'thumbnail': r're:https://i\.vimeocdn\.com/video/1520099929-[\da-f]+-d_1280',
'thumbnail': r're:https://i\.vimeocdn\.com/video/1520099929-[\da-f]+-d',
'duration': 2636,
'chapters': [
{'start_time': 0, 'end_time': 10, 'title': '<Untitled Chapter 1>'},
@ -807,7 +804,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'like_count': int,
'view_count': int,
'comment_count': int,
'thumbnail': r're:https://i\.vimeocdn\.com/video/1018638656-[\da-f]+-d_1280',
'thumbnail': r're:https://i\.vimeocdn\.com/video/1018638656-[\da-f]+-d',
},
# 'params': {'format': 'Original'},
'expected_warnings': ['Failed to parse XML: not well-formed'],
@ -824,7 +821,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'uploader_id': 'rajavirdi',
'uploader_url': 'https://vimeo.com/rajavirdi',
'duration': 309,
'thumbnail': r're:https://i\.vimeocdn\.com/video/1716727772-[\da-f]+-d_1280',
'thumbnail': r're:https://i\.vimeocdn\.com/video/1716727772-[\da-f]+-d',
},
# 'params': {'format': 'source'},
'expected_warnings': ['Failed to parse XML: not well-formed'],

View file

@ -1,150 +0,0 @@
from .common import InfoExtractor
from ..utils import (
determine_ext,
format_field,
int_or_none,
unified_timestamp,
)
class VineIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?vine\.co/(?:v|oembed)/(?P<id>\w+)'
_EMBED_REGEX = [r'<iframe[^>]+src=[\'"](?P<url>(?:https?:)?//(?:www\.)?vine\.co/v/[^/]+/embed/(?:simple|postcard))']
_TESTS = [{
'url': 'https://vine.co/v/b9KOOWX7HUx',
'md5': '2f36fed6235b16da96ce9b4dc890940d',
'info_dict': {
'id': 'b9KOOWX7HUx',
'ext': 'mp4',
'title': 'Chicken.',
'alt_title': 'Vine by Jack',
'timestamp': 1368997951,
'upload_date': '20130519',
'uploader': 'Jack',
'uploader_id': '76',
'view_count': int,
'like_count': int,
'comment_count': int,
'repost_count': int,
},
}, {
'url': 'https://vine.co/v/e192BnZnZ9V',
'info_dict': {
'id': 'e192BnZnZ9V',
'ext': 'mp4',
'title': 'ยิ้ม~ เขิน~ อาย~ น่าร้ากอ้ะ >//< @n_whitewo @orlameena #lovesicktheseries #lovesickseason2',
'alt_title': 'Vine by Pimry_zaa',
'timestamp': 1436057405,
'upload_date': '20150705',
'uploader': 'Pimry_zaa',
'uploader_id': '1135760698325307392',
'view_count': int,
'like_count': int,
'comment_count': int,
'repost_count': int,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://vine.co/v/MYxVapFvz2z',
'only_matching': True,
}, {
'url': 'https://vine.co/v/bxVjBbZlPUH',
'only_matching': True,
}, {
'url': 'https://vine.co/oembed/MYxVapFvz2z.json',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_json(
f'https://archive.vine.co/posts/{video_id}.json', video_id)
def video_url(kind):
for url_suffix in ('Url', 'URL'):
format_url = data.get(f'video{kind}{url_suffix}')
if format_url:
return format_url
formats = []
for quality, format_id in enumerate(('low', '', 'dash')):
format_url = video_url(format_id.capitalize())
if not format_url:
continue
# DASH link returns plain mp4
if format_id == 'dash' and determine_ext(format_url) == 'mpd':
formats.extend(self._extract_mpd_formats(
format_url, video_id, mpd_id='dash', fatal=False))
else:
formats.append({
'url': format_url,
'format_id': format_id or 'standard',
'quality': quality,
})
self._check_formats(formats, video_id)
username = data.get('username')
alt_title = format_field(username, None, 'Vine by %s')
return {
'id': video_id,
'title': data.get('description') or alt_title or 'Vine video',
'alt_title': alt_title,
'thumbnail': data.get('thumbnailUrl'),
'timestamp': unified_timestamp(data.get('created')),
'uploader': username,
'uploader_id': data.get('userIdStr'),
'view_count': int_or_none(data.get('loops')),
'like_count': int_or_none(data.get('likes')),
'comment_count': int_or_none(data.get('comments')),
'repost_count': int_or_none(data.get('reposts')),
'formats': formats,
}
class VineUserIE(InfoExtractor):
IE_NAME = 'vine:user'
_VALID_URL = r'https?://vine\.co/(?P<u>u/)?(?P<user>[^/]+)'
_VINE_BASE_URL = 'https://vine.co/'
_TESTS = [{
'url': 'https://vine.co/itsruthb',
'info_dict': {
'id': 'itsruthb',
'title': 'Ruth B',
'description': '| Instagram/Twitter: itsruthb | still a lost boy from neverland',
},
'playlist_mincount': 611,
}, {
'url': 'https://vine.co/u/942914934646415360',
'only_matching': True,
}]
@classmethod
def suitable(cls, url):
return False if VineIE.suitable(url) else super().suitable(url)
def _real_extract(self, url):
mobj = self._match_valid_url(url)
user = mobj.group('user')
u = mobj.group('u')
profile_url = '{}api/users/profiles/{}{}'.format(
self._VINE_BASE_URL, 'vanity/' if not u else '', user)
profile_data = self._download_json(
profile_url, user, note='Downloading user profile data')
data = profile_data['data']
user_id = data.get('userId') or data['userIdStr']
profile = self._download_json(
f'https://archive.vine.co/profiles/{user_id}.json', user_id)
entries = [
self.url_result(
f'https://vine.co/v/{post_id}', ie='Vine', video_id=post_id)
for post_id in profile['posts']
if post_id and isinstance(post_id, str)]
return self.playlist_result(
entries, user, profile.get('username'), profile.get('description'))

View file

@ -17,10 +17,10 @@ from ..utils import (
get_element_html_by_id,
int_or_none,
join_nonempty,
parse_qs,
parse_resolution,
str_or_none,
str_to_int,
traverse_obj,
try_call,
unescapeHTML,
unified_timestamp,
@ -29,6 +29,7 @@ from ..utils import (
urlencode_postdata,
urljoin,
)
from ..utils.traversal import require, traverse_obj
class VKBaseIE(InfoExtractor):
@ -91,17 +92,17 @@ class VKBaseIE(InfoExtractor):
class VKIE(VKBaseIE):
IE_NAME = 'vk'
IE_DESC = 'VK'
_EMBED_REGEX = [r'<iframe[^>]+?src=(["\'])(?P<url>https?://vk\.com/video_ext\.php.+?)\1']
_EMBED_REGEX = [r'<iframe[^>]+?src=(["\'])(?P<url>https?://vk(?:(?:video)?\.ru|\.com)/video_ext\.php.+?)\1']
_VALID_URL = r'''(?x)
https?://
(?:
(?:
(?:(?:m|new)\.)?vk\.com/video_|
(?:(?:m|new)\.)?vk(?:(?:video)?\.ru|\.com)/video_|
(?:www\.)?daxab\.com/
)
ext\.php\?(?P<embed_query>.*?\boid=(?P<oid>-?\d+).*?\bid=(?P<id>\d+).*)|
(?:
(?:(?:m|new)\.)?vk\.com/(?:.+?\?.*?z=)?(?:video|clip)|
(?:(?:m|new)\.)?vk(?:(?:video)?\.ru|\.com)/(?:.+?\?.*?z=)?(?:video|clip)|
(?:www\.)?daxab\.com/embed/
)
(?P<videoid>-?\d+_\d+)(?:.*\blist=(?P<list_id>([\da-f]+)|(ln-[\da-zA-Z]+)))?
@ -110,7 +111,7 @@ class VKIE(VKBaseIE):
_TESTS = [
{
'url': 'http://vk.com/videos-77521?z=video-77521_162222515%2Fclub77521',
'url': 'https://vk.com/videos-77521?z=video-77521_162222515%2Fclub77521',
'info_dict': {
'id': '-77521_162222515',
'ext': 'mp4',
@ -127,7 +128,7 @@ class VKIE(VKBaseIE):
'params': {'skip_download': 'm3u8'},
},
{
'url': 'http://vk.com/video205387401_165548505',
'url': 'https://vk.com/video205387401_165548505',
'info_dict': {
'id': '205387401_165548505',
'ext': 'mp4',
@ -182,10 +183,10 @@ class VKIE(VKBaseIE):
'ext': 'mp4',
'title': "DSWD Awards 'Children's Joy Foundation, Inc.' Certificate of Registration and License to Operate",
'description': 'md5:bf9c26cfa4acdfb146362682edd3827a',
'duration': 178,
'duration': 179,
'upload_date': '20130117',
'uploader': "Children's Joy Foundation Inc.",
'uploader_id': 'thecjf',
'uploader_id': '@CJFIofficial',
'view_count': int,
'channel_id': 'UCgzCNQ11TmR9V97ECnhi3gw',
'availability': 'public',
@ -193,7 +194,7 @@ class VKIE(VKBaseIE):
'live_status': 'not_live',
'playable_in_embed': True,
'channel': 'Children\'s Joy Foundation Inc.',
'uploader_url': 'http://www.youtube.com/user/thecjf',
'uploader_url': 'https://www.youtube.com/@CJFIofficial',
'thumbnail': r're:https?://.+\.jpg$',
'tags': 'count:27',
'start_time': 0.0,
@ -201,6 +202,7 @@ class VKIE(VKBaseIE):
'channel_url': 'https://www.youtube.com/channel/UCgzCNQ11TmR9V97ECnhi3gw',
'channel_follower_count': int,
'age_limit': 0,
'timestamp': 1358394935,
},
},
{
@ -222,6 +224,7 @@ class VKIE(VKBaseIE):
'thumbnail': r're:https?://.+x1080$',
'tags': list,
},
'skip': 'This video has been deleted and is no longer available.',
},
{
'url': 'https://vk.com/clips-74006511?z=clip-74006511_456247211',
@ -235,13 +238,13 @@ class VKIE(VKBaseIE):
'timestamp': 1664995597,
'title': 'Clip by @madempress',
'upload_date': '20221005',
'uploader': 'Шальная императрица',
'uploader': 'Шальная Императрица',
'uploader_id': '-74006511',
},
},
{
# video key is extra_data not url\d+
'url': 'http://vk.com/video-110305615_171782105',
'url': 'https://vk.com/video-110305615_171782105',
'md5': 'e13fcda136f99764872e739d13fac1d1',
'info_dict': {
'id': '-110305615_171782105',
@ -273,6 +276,7 @@ class VKIE(VKBaseIE):
'params': {
'skip_download': True,
},
'skip': 'No formats found',
},
{
# live stream, hls and rtmp links, most likely already finished live
@ -312,7 +316,16 @@ class VKIE(VKBaseIE):
{
'url': 'https://vk.com/clip30014565_456240946',
'only_matching': True,
}]
},
{
'url': 'https://vkvideo.ru/video-127553155_456242961',
'only_matching': True,
},
{
'url': 'https://vk.ru/video-220754053_456242564',
'only_matching': True,
},
]
def _real_extract(self, url):
mobj = self._match_valid_url(url)
@ -338,7 +351,7 @@ class VKIE(VKBaseIE):
video_id = '{}_{}'.format(mobj.group('oid'), mobj.group('id'))
info_page = self._download_webpage(
'http://vk.com/video_ext.php?' + mobj.group('embed_query'), video_id)
'https://vk.com/video_ext.php?' + mobj.group('embed_query'), video_id)
error_message = self._html_search_regex(
[r'(?s)<!><div[^>]+class="video_layer_message"[^>]*>(.+?)</div>',
@ -432,7 +445,7 @@ class VKIE(VKBaseIE):
if m_opts_url:
opts_url = m_opts_url.group(1)
if opts_url.startswith('//'):
opts_url = 'http:' + opts_url
opts_url = 'https:' + opts_url
return self.url_result(opts_url)
data = player['params'][0]
@ -512,8 +525,11 @@ class VKIE(VKBaseIE):
class VKUserVideosIE(VKBaseIE):
IE_NAME = 'vk:uservideos'
IE_DESC = "VK - User's Videos"
_VALID_URL = r'https?://(?:(?:m|new)\.)?vk\.com/video/(?:playlist/)?(?P<id>[^?$#/&]+)(?!\?.*\bz=video)(?:[/?#&](?:.*?\bsection=(?P<section>\w+))?|$)'
_TEMPLATE_URL = 'https://vk.com/videos'
_BASE_URL_RE = r'https?://(?:(?:m|new)\.)?vk(?:video\.ru|\.com/video)'
_VALID_URL = [
rf'{_BASE_URL_RE}/playlist/(?P<id>-?\d+_\d+)',
rf'{_BASE_URL_RE}/(?P<id>@[^/?#]+)(?:/all)?/?(?!\?.*\bz=video)(?:[?#]|$)',
]
_TESTS = [{
'url': 'https://vk.com/video/@mobidevices',
'info_dict': {
@ -527,12 +543,20 @@ class VKUserVideosIE(VKBaseIE):
},
'playlist_mincount': 182,
}, {
'url': 'https://vk.com/video/playlist/-174476437_2',
'url': 'https://vkvideo.ru/playlist/-204353299_426',
'info_dict': {
'id': '-174476437_playlist_2',
'title': 'Анонсы',
'id': '-204353299_playlist_426',
},
'playlist_mincount': 108,
'playlist_mincount': 33,
}, {
'url': 'https://vk.com/video/@gorkyfilmstudio/all',
'only_matching': True,
}, {
'url': 'https://vkvideo.ru/@mobidevices',
'only_matching': True,
}, {
'url': 'https://vk.com/video/playlist/-174476437_2',
'only_matching': True,
}]
_VIDEO = collections.namedtuple('Video', ['owner_id', 'id'])
@ -552,7 +576,7 @@ class VKUserVideosIE(VKBaseIE):
v = self._VIDEO._make(video[:2])
video_id = '%d_%d' % (v.owner_id, v.id)
yield self.url_result(
'http://vk.com/video' + video_id, VKIE.ie_key(), video_id)
'https://vk.com/video' + video_id, VKIE.ie_key(), video_id)
if count >= total:
break
video_list_json = self._download_payload('al_video', page_id, {
@ -561,23 +585,25 @@ class VKUserVideosIE(VKBaseIE):
'oid': page_id,
'section': section,
})[0][section]
count += video_list_json['count']
new_count = video_list_json['count']
if not new_count:
self.to_screen(f'{page_id}: Skipping {total - count} unavailable videos')
break
count += new_count
video_list = video_list_json['list']
def _real_extract(self, url):
u_id, section = self._match_valid_url(url).groups()
u_id = self._match_id(url)
webpage = self._download_webpage(url, u_id)
if u_id.startswith('@'):
page_id = self._search_regex(r'data-owner-id\s?=\s?"([^"]+)"', webpage, 'page_id')
elif '_' in u_id:
page_id, section = u_id.split('_', 1)
section = f'playlist_{section}'
page_id = traverse_obj(
self._search_json(r'\bvar newCur\s*=', webpage, 'cursor data', u_id),
('oid', {int}, {str_or_none}, {require('page id')}))
section = traverse_obj(parse_qs(url), ('section', 0)) or 'all'
else:
raise ExtractorError('Invalid URL', expected=True)
if not section:
section = 'all'
page_id, _, section = u_id.partition('_')
section = f'playlist_{section}'
playlist_title = clean_html(get_element_by_class('VideoInfoPanel__title', webpage))
return self.playlist_result(self._entries(page_id, section), f'{page_id}_{section}', playlist_title)
@ -717,7 +743,7 @@ class VKWallPostIE(VKBaseIE):
class VKPlayBaseIE(InfoExtractor):
_BASE_URL_RE = r'https?://(?:vkplay\.live|live\.vkplay\.ru)/'
_BASE_URL_RE = r'https?://(?:vkplay\.live|live\.vk(?:play|video)\.ru)/'
_RESOLUTIONS = {
'tiny': '256x144',
'lowest': '426x240',
@ -797,6 +823,9 @@ class VKPlayIE(VKPlayBaseIE):
}, {
'url': 'https://live.vkplay.ru/lebwa/record/33a4e4ce-e3ef-49db-bb14-f006cc6fabc9/records',
'only_matching': True,
}, {
'url': 'https://live.vkvideo.ru/lebwa/record/33a4e4ce-e3ef-49db-bb14-f006cc6fabc9/records',
'only_matching': True,
}]
def _real_extract(self, url):
@ -839,6 +868,9 @@ class VKPlayLiveIE(VKPlayBaseIE):
}, {
'url': 'https://live.vkplay.ru/lebwa',
'only_matching': True,
}, {
'url': 'https://live.vkvideo.ru/panterka',
'only_matching': True,
}]
def _real_extract(self, url):
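
Review note on the VK host change above: the new alternation `vk(?:(?:video)?\.ru|\.com)` is meant to accept `vk.com`, `vk.ru` and `vkvideo.ru`. The sketch below is a reduced check of only the host part, not the extractor's full `_VALID_URL`; the names `VK_HOST_RE` and `is_vk_host` are hypothetical and exist only for this illustration.

```python
import re

# Reduced host-only pattern, copied from the alternation used in the updated regexes.
# VK_HOST_RE and is_vk_host are illustrative names, not part of yt-dlp.
VK_HOST_RE = re.compile(r'https?://(?:(?:m|new)\.)?vk(?:(?:video)?\.ru|\.com)/')


def is_vk_host(url):
    """Return True if the URL starts with one of the accepted VK hosts."""
    return VK_HOST_RE.match(url) is not None
```

The optional `video` group is attached only to the `.ru` branch, so a host such as `vkvideo.com` is deliberately not accepted.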

View file

@ -124,7 +124,7 @@ class WeiboBaseIE(InfoExtractor):
class WeiboIE(WeiboBaseIE):
_VALID_URL = r'https?://(?:m\.weibo\.cn/status|(?:www\.)?weibo\.com/\d+)/(?P<id>[a-zA-Z0-9]+)'
_VALID_URL = r'https?://(?:m\.weibo\.cn/(?:status|detail)|(?:www\.)?weibo\.com/\d+)/(?P<id>[a-zA-Z0-9]+)'
_TESTS = [{
'url': 'https://weibo.com/7827771738/N4xlMvjhI',
'info_dict': {
@ -164,6 +164,25 @@ class WeiboIE(WeiboBaseIE):
'like_count': int,
'repost_count': int,
},
}, {
'url': 'https://m.weibo.cn/detail/4189191225395228',
'info_dict': {
'id': '4189191225395228',
'ext': 'mp4',
'display_id': 'FBqgOmDxO',
'title': '柴犬柴犬的秒拍视频',
'description': '午睡当然是要甜甜蜜蜜的啦![坏笑] Instagramshibainu.gaku http://t.cn/RHbmjzW ',
'duration': 53,
'timestamp': 1514264429,
'upload_date': '20171226',
'thumbnail': r're:https://.*\.jpg',
'uploader': '柴犬柴犬',
'uploader_id': '5926682210',
'uploader_url': 'https://weibo.com/u/5926682210',
'view_count': int,
'like_count': int,
'repost_count': int,
},
}, {
'url': 'https://weibo.com/0/4224132150961381',
'note': 'no playback_list example',

View file

@ -20,7 +20,7 @@ from ..utils import (
class XHamsterIE(InfoExtractor):
_DOMAINS = r'(?:xhamster\.(?:com|one|desi)|xhms\.pro|xhamster\d+\.com|xhday\.com|xhvid\.com)'
_DOMAINS = r'(?:xhamster\.(?:com|one|desi)|xhms\.pro|xhamster\d+\.(?:com|desi)|xhday\.com|xhvid\.com)'
_VALID_URL = rf'''(?x)
https?://
(?:[^/?#]+\.)?{_DOMAINS}/
@ -31,7 +31,7 @@ class XHamsterIE(InfoExtractor):
'''
_TESTS = [{
'url': 'https://xhamster.com/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
'md5': '34e1ab926db5dc2750fed9e1f34304bb',
'md5': 'e009ea6b849b129e3bebaeb9cf0dee51',
'info_dict': {
'id': '1509445',
'display_id': 'femaleagent-shy-beauty-takes-the-bait',
@ -43,6 +43,11 @@ class XHamsterIE(InfoExtractor):
'uploader_id': 'ruseful2011',
'duration': 893,
'age_limit': 18,
'thumbnail': 'https://thumb-nss.xhcdn.com/a/u3Vr5F2vvcU3yK59_jJqVA/001/509/445/1280x720.8.jpg',
'uploader_url': 'https://xhamster.com/users/ruseful2011',
'description': '',
'view_count': int,
'comment_count': int,
},
}, {
'url': 'https://xhamster.com/videos/britney-spears-sexy-booty-2221348?hd=',
@ -56,6 +61,10 @@ class XHamsterIE(InfoExtractor):
'uploader': 'jojo747400',
'duration': 200,
'age_limit': 18,
'description': '',
'view_count': int,
'thumbnail': 'https://thumb-nss.xhcdn.com/a/kk5nio_iR-h4Z3frfVtoDw/002/221/348/1280x720.4.jpg',
'comment_count': int,
},
'params': {
'skip_download': True,
@ -73,6 +82,11 @@ class XHamsterIE(InfoExtractor):
'uploader_id': 'parejafree',
'duration': 72,
'age_limit': 18,
'comment_count': int,
'uploader_url': 'https://xhamster.com/users/parejafree',
'description': '',
'view_count': int,
'thumbnail': 'https://thumb-nss.xhcdn.com/a/xc8MSwVKcsQeRRiTT-saMQ/005/667/973/1280x720.2.jpg',
},
'params': {
'skip_download': True,
@ -122,6 +136,9 @@ class XHamsterIE(InfoExtractor):
}, {
'url': 'https://xhvid.com/videos/lk-mm-xhc6wn6',
'only_matching': True,
}, {
'url': 'https://xhamster20.desi/videos/my-verification-video-scottishmistress23-11937369',
'only_matching': True,
}]
def _real_extract(self, url):
@ -267,7 +284,7 @@ class XHamsterIE(InfoExtractor):
video, lambda x: x['rating']['likes'], int)),
'dislike_count': int_or_none(try_get(
video, lambda x: x['rating']['dislikes'], int)),
'comment_count': int_or_none(video.get('views')),
'comment_count': int_or_none(video.get('comments')),
'age_limit': age_limit if age_limit is not None else 18,
'categories': categories,
'formats': formats,

View file

@ -5,12 +5,13 @@ from ..utils import (
int_or_none,
js_to_json,
url_or_none,
urlhandle_detect_ext,
)
from ..utils.traversal import traverse_obj
class XiaoHongShuIE(InfoExtractor):
_VALID_URL = r'https?://www\.xiaohongshu\.com/explore/(?P<id>[\da-f]+)'
_VALID_URL = r'https?://www\.xiaohongshu\.com/(?:explore|discovery/item)/(?P<id>[\da-f]+)'
IE_DESC = '小红书'
_TESTS = [{
'url': 'https://www.xiaohongshu.com/explore/6411cf99000000001300b6d9',
@ -25,6 +26,18 @@ class XiaoHongShuIE(InfoExtractor):
'duration': 101.726,
'thumbnail': r're:https?://sns-webpic-qc\.xhscdn\.com/\d+/[a-z0-9]+/[\w]+',
},
}, {
'url': 'https://www.xiaohongshu.com/discovery/item/674051740000000007027a15?xsec_token=CBgeL8Dxd1ZWBhwqRd568gAZ_iwG-9JIf9tnApNmteU2E=',
'info_dict': {
'id': '674051740000000007027a15',
'ext': 'mp4',
'title': '相互喜欢就可以了',
'uploader_id': '63439913000000001901f49a',
'duration': 28.073,
'description': '#广州[话题]# #深圳[话题]# #香港[话题]# #街头采访[话题]# #是你喜欢的类型[话题]#',
'thumbnail': r're:https?://sns-webpic-qc\.xhscdn\.com/\d+/[\da-f]+/[^/]+',
'tags': ['广州', '深圳', '香港', '街头采访', '是你喜欢的类型'],
},
}]
def _real_extract(self, url):
@ -34,7 +47,7 @@ class XiaoHongShuIE(InfoExtractor):
r'window\.__INITIAL_STATE__\s*=', webpage, 'initial state', display_id, transform_source=js_to_json)
note_info = traverse_obj(initial_state, ('note', 'noteDetailMap', display_id, 'note'))
video_info = traverse_obj(note_info, ('video', 'media', 'stream', ('h264', 'av1', 'h265'), ...))
video_info = traverse_obj(note_info, ('video', 'media', 'stream', ..., ...))
formats = []
for info in video_info:
@ -44,18 +57,32 @@ class XiaoHongShuIE(InfoExtractor):
'height': ('height', {int_or_none}),
'vcodec': ('videoCodec', {str}),
'acodec': ('audioCodec', {str}),
'abr': ('audioBitrate', {int_or_none}),
'vbr': ('videoBitrate', {int_or_none}),
'abr': ('audioBitrate', {int_or_none(scale=1000)}),
'vbr': ('videoBitrate', {int_or_none(scale=1000)}),
'audio_channels': ('audioChannels', {int_or_none}),
'tbr': ('avgBitrate', {int_or_none}),
'tbr': ('avgBitrate', {int_or_none(scale=1000)}),
'format': ('qualityType', {str}),
'filesize': ('size', {int_or_none}),
'duration': ('duration', {float_or_none(scale=1000)}),
})
formats.extend(traverse_obj(info, (('mediaUrl', ('backupUrls', ...)), {
formats.extend(traverse_obj(info, (('masterUrl', ('backupUrls', ...)), {
lambda u: url_or_none(u) and {'url': u, **format_info}})))
if origin_key := traverse_obj(note_info, ('video', 'consumer', 'originVideoKey', {str})):
# Not using a head request because of false negatives
urlh = self._request_webpage(
f'https://sns-video-bd.xhscdn.com/{origin_key}', display_id,
'Checking original video availability', 'Original video is not available', fatal=False)
if urlh:
formats.append({
'format_id': 'direct',
'ext': urlhandle_detect_ext(urlh, default='mp4'),
'filesize': int_or_none(urlh.get_header('Content-Length')),
'url': urlh.url,
'quality': 1,
})
thumbnails = []
for image_info in traverse_obj(note_info, ('imageList', ...)):
thumbnail_info = traverse_obj(image_info, {

File diff suppressed because it is too large

View file

@ -5,7 +5,6 @@ from ..utils import (
NO_DEFAULT,
ExtractorError,
determine_ext,
extract_attributes,
float_or_none,
int_or_none,
join_nonempty,
@ -25,6 +24,11 @@ class ZDFBaseIE(InfoExtractor):
_GEO_COUNTRIES = ['DE']
_QUALITIES = ('auto', 'low', 'med', 'high', 'veryhigh', 'hd', 'fhd', 'uhd')
def _download_v2_doc(self, document_id):
return self._download_json(
f'https://zdf-prod-futura.zdf.de/mediathekV2/document/{document_id}',
document_id)
def _call_api(self, url, video_id, item, api_token=None, referrer=None):
headers = {}
if api_token:
@ -320,9 +324,7 @@ class ZDFIE(ZDFBaseIE):
return self._extract_entry(player['content'], player, content, video_id)
def _extract_mobile(self, video_id):
video = self._download_json(
f'https://zdf-cdn.live.cellular.de/mediathekV2/document/{video_id}',
video_id)
video = self._download_v2_doc(video_id)
formats = []
formitaeten = try_get(video, lambda x: x['document']['formitaeten'], list)
@ -374,7 +376,7 @@ class ZDFIE(ZDFBaseIE):
class ZDFChannelIE(ZDFBaseIE):
_VALID_URL = r'https?://www\.zdf\.de/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://www\.zdf\.de/(?:[^/?#]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.zdf.de/sport/das-aktuelle-sportstudio',
'info_dict': {
@ -387,18 +389,19 @@ class ZDFChannelIE(ZDFBaseIE):
'info_dict': {
'id': 'planet-e',
'title': 'planet e.',
'description': 'md5:87e3b9c66a63cf1407ee443d2c4eb88e',
},
'playlist_mincount': 50,
}, {
'url': 'https://www.zdf.de/gesellschaft/aktenzeichen-xy-ungeloest',
'info_dict': {
'id': 'aktenzeichen-xy-ungeloest',
'title': 'Aktenzeichen XY... ungelöst',
'entries': "lambda x: not any('xy580-fall1-kindermoerder-gesucht-100' in e['url'] for e in x)",
'title': 'Aktenzeichen XY... Ungelöst',
'description': 'md5:623ede5819c400c6d04943fa8100e6e7',
},
'playlist_mincount': 2,
}, {
'url': 'https://www.zdf.de/filme/taunuskrimi/',
'url': 'https://www.zdf.de/serien/taunuskrimi/',
'only_matching': True,
}]
@ -406,36 +409,41 @@ class ZDFChannelIE(ZDFBaseIE):
def suitable(cls, url):
return False if ZDFIE.suitable(url) else super().suitable(url)
def _og_search_title(self, webpage, fatal=False):
title = super()._og_search_title(webpage, fatal=fatal)
return re.split(r'\s+[-|]\s+ZDF(?:mediathek)?$', title or '')[0] or None
def _extract_entry(self, entry):
return self.url_result(
entry['sharingUrl'], ZDFIE, **traverse_obj(entry, {
'id': ('basename', {str}),
'title': ('titel', {str}),
'description': ('beschreibung', {str}),
'duration': ('length', {float_or_none}),
# TODO: seasonNumber and episodeNumber can be extracted but need to also be in ZDFIE
}))
def _entries(self, data, document_id):
for entry in traverse_obj(data, (
'cluster', lambda _, v: v['type'] == 'teaser',
# If 'brandId' differs, it is a 'You might also like' video. Filter these out
'teaser', lambda _, v: v['type'] == 'video' and v['brandId'] == document_id and v['sharingUrl'],
)):
yield self._extract_entry(entry)
def _real_extract(self, url):
channel_id = self._match_id(url)
webpage = self._download_webpage(url, channel_id)
document_id = self._search_regex(
r'docId\s*:\s*(["\'])(?P<doc_id>(?:(?!\1).)+)\1', webpage, 'document id', group='doc_id')
data = self._download_v2_doc(document_id)
matches = re.finditer(
rf'''<div\b[^>]*?\sdata-plusbar-id\s*=\s*(["'])(?P<p_id>[\w-]+)\1[^>]*?\sdata-plusbar-url=\1(?P<url>{ZDFIE._VALID_URL})\1''',
webpage)
main_video = traverse_obj(data, (
'cluster', lambda _, v: v['type'] == 'teaserContent',
'teaser', lambda _, v: v['type'] == 'video' and v['basename'] and v['sharingUrl'], any)) or {}
if self._downloader.params.get('noplaylist', False):
entry = next(
(self.url_result(m.group('url'), ie=ZDFIE.ie_key()) for m in matches),
None)
self.to_screen('Downloading just the main video because of --no-playlist')
if entry:
return entry
else:
self.to_screen(f'Downloading playlist {channel_id} - add --no-playlist to download just the main video')
if not self._yes_playlist(channel_id, main_video.get('basename')):
return self._extract_entry(main_video)
def check_video(m):
v_ref = self._search_regex(
r'''(<a\b[^>]*?\shref\s*=[^>]+?\sdata-target-id\s*=\s*(["']){}\2[^>]*>)'''.format(m.group('p_id')),
webpage, 'check id', default='')
v_ref = extract_attributes(v_ref)
return v_ref.get('data-target-video-type') != 'novideo'
return self.playlist_from_matches(
(m.group('url') for m in matches if check_video(m)),
channel_id, self._og_search_title(webpage, fatal=False))
return self.playlist_result(
self._entries(data, document_id), channel_id,
re.split(r'\s+[-|]\s+ZDF(?:mediathek)?$', self._og_search_title(webpage) or '')[0] or None,
join_nonempty(
'headline', 'text', delim='\n\n',
from_dict=traverse_obj(data, ('shortText', {dict}), default={})) or None)
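
Review note on the playlist title above: the inlined `re.split(...)` call strips a trailing ` - ZDF`, ` | ZDF` or `... ZDFmediathek` suffix from the OpenGraph title. A minimal standalone sketch of that expression follows; the wrapper name `strip_zdf_suffix` is hypothetical, and the sample titles are made up for illustration.

```python
import re


def strip_zdf_suffix(title):
    """Drop a trailing ' - ZDF', ' | ZDF' or '... ZDFmediathek' suffix.

    Returns None when the input is None or nothing remains after stripping.
    """
    # `title or ''` keeps re.split from failing when no title was found.
    return re.split(r'\s+[-|]\s+ZDF(?:mediathek)?$', title or '')[0] or None
```

Because `-` comes first in the character class `[-|]`, it is a literal hyphen rather than a range.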

View file

@ -1370,12 +1370,12 @@ def create_parser():
help='Allow Unicode characters, "&" and spaces in filenames (default)')
filesystem.add_option(
'--windows-filenames',
action='store_true', dest='windowsfilenames', default=False,
action='store_true', dest='windowsfilenames', default=None,
help='Force filenames to be Windows-compatible')
filesystem.add_option(
'--no-windows-filenames',
action='store_false', dest='windowsfilenames',
help='Make filenames Windows-compatible only if using Windows (default)')
help='Sanitize filenames only minimally')
filesystem.add_option(
'--trim-filenames', '--trim-file-names', metavar='LENGTH',
dest='trim_file_name', default=0, type=int,

View file

@ -183,4 +183,4 @@ def load_plugins(name, suffix):
sys.meta_path.insert(0, PluginFinder(f'{PACKAGE_NAME}.extractor', f'{PACKAGE_NAME}.postprocessor'))
__all__ = ['directories', 'load_plugins', 'PACKAGE_NAME', 'COMPAT_PACKAGE_NAME']
__all__ = ['COMPAT_PACKAGE_NAME', 'PACKAGE_NAME', 'directories', 'load_plugins']

View file

@ -44,4 +44,4 @@ def get_postprocessor(key):
globals().update(_PLUGIN_CLASSES)
__all__ = [name for name in globals() if name.endswith('PP')]
__all__.extend(('PostProcessor', 'FFmpegPostProcessor'))
__all__.extend(('FFmpegPostProcessor', 'PostProcessor'))

View file

@ -626,7 +626,7 @@ class FFmpegEmbedSubtitlePP(FFmpegPostProcessor):
sub_ext = sub_info['ext']
if sub_ext == 'json':
self.report_warning('JSON subtitles cannot be embedded')
elif ext != 'webm' or ext == 'webm' and sub_ext == 'vtt':
elif ext != 'webm' or (ext == 'webm' and sub_ext == 'vtt'):
sub_langs.append(lang)
sub_names.append(sub_info.get('name'))
sub_filenames.append(sub_info['filepath'])
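
Review note on the parenthesized condition above: since `and` binds tighter than `or` in Python, the added parentheses do not change behavior; they only make the grouping explicit. The sketch below compares both spellings over a few sample values. The function names and the sample extensions are assumptions chosen for this illustration.

```python
import itertools


def old_condition(ext, sub_ext):
    # Original spelling, relying on `and` binding tighter than `or`.
    return ext != 'webm' or ext == 'webm' and sub_ext == 'vtt'


def new_condition(ext, sub_ext):
    # Spelling after this commit, with the grouping made explicit.
    return ext != 'webm' or (ext == 'webm' and sub_ext == 'vtt')


# Sample values only; any other strings behave the same way.
SAMPLES = list(itertools.product(['webm', 'mp4', 'mkv'], ['vtt', 'srt', 'ass']))
```

In other words, WebM containers only accept `vtt` subtitles here, while every other container accepts any non-JSON subtitle format.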

View file

@ -65,9 +65,14 @@ def _get_variant_and_executable_path():
machine = '_legacy' if version_tuple(platform.mac_ver()[0]) < (10, 15) else ''
else:
machine = f'_{platform.machine().lower()}'
is_64bits = sys.maxsize > 2**32
# Ref: https://en.wikipedia.org/wiki/Uname#Examples
if machine[1:] in ('x86', 'x86_64', 'amd64', 'i386', 'i686'):
machine = '_x86' if platform.architecture()[0][:2] == '32' else ''
machine = '_x86' if not is_64bits else ''
# platform.machine() on 32-bit raspbian OS may return 'aarch64', so check "64-bitness"
# See: https://github.com/yt-dlp/yt-dlp/issues/11813
elif machine[1:] == 'aarch64' and not is_64bits:
machine = '_armv7l'
# sys.executable returns a /tmp/ path for staticx builds (linux_static)
# Ref: https://staticx.readthedocs.io/en/latest/usage.html#run-time-information
if static_exe_path := os.getenv('STATICX_PROG_PATH'):
@ -525,11 +530,16 @@ class Updater:
@functools.cached_property
def cmd(self):
"""The command-line to run the executable, if known"""
argv = None
# There is no sys.orig_argv in py < 3.10. Also, it can be [] when frozen
if getattr(sys, 'orig_argv', None):
return sys.orig_argv
argv = sys.orig_argv
elif getattr(sys, 'frozen', False):
return sys.argv
argv = sys.argv
# linux_static exe's argv[0] will be /tmp/staticx-NNNN/yt-dlp_linux if we don't fixup here
if argv and os.getenv('STATICX_PROG_PATH'):
argv = [self.filename, *argv[1:]]
return argv
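
Review note on the variant detection change earlier in this file: the 64-bit check now uses `sys.maxsize` instead of `platform.architecture()`, and a 32-bit interpreter on a machine reporting `aarch64` is mapped to the `_armv7l` build. The sketch below isolates that mapping as a pure function so it can be exercised without the real platform; `variant_suffix` and `IS_64BITS` are hypothetical names, and the macOS and other branches of the real function are left out.

```python
import sys

# True on a 64-bit Python build, regardless of what the kernel reports.
IS_64BITS = sys.maxsize > 2**32


def variant_suffix(machine, is_64bits):
    """Map a platform.machine() value to a release-file suffix (simplified sketch)."""
    machine = f'_{machine.lower()}'
    if machine[1:] in ('x86', 'x86_64', 'amd64', 'i386', 'i686'):
        machine = '_x86' if not is_64bits else ''
    # 32-bit userland on a 64-bit ARM kernel: fall back to the armv7l build
    elif machine[1:] == 'aarch64' and not is_64bits:
        machine = '_armv7l'
    return machine
```

Calling `variant_suffix(platform.machine(), IS_64BITS)` would mirror the non-macOS path, under the assumptions stated above.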
def restart(self):
"""Restart the executable"""

View file

@ -685,7 +685,8 @@ def _sanitize_path_parts(parts):
elif part == '..':
if sanitized_parts and sanitized_parts[-1] != '..':
sanitized_parts.pop()
sanitized_parts.append('..')
else:
sanitized_parts.append('..')
continue
# Replace invalid segments with `#`
# - trailing dots and spaces (`asdf...` => `asdf..#`)
@ -702,7 +703,8 @@ def sanitize_path(s, force=False):
if not force:
return s
root = '/' if s.startswith('/') else ''
return root + '/'.join(_sanitize_path_parts(s.split('/')))
path = '/'.join(_sanitize_path_parts(s.split('/')))
return root + path if root or path else '.'
normed = s.replace('/', '\\')
@ -721,7 +723,8 @@ def sanitize_path(s, force=False):
root = '\\' if normed[:1] == '\\' else ''
parts = normed.split('\\')
return root + '\\'.join(_sanitize_path_parts(parts))
path = '\\'.join(_sanitize_path_parts(parts))
return root + path if root or path else '.'
def sanitize_url(url, *, scheme='http'):
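
Review note on the `sanitize_path` hunks above: before this change a `..` segment was appended even after it had already cancelled the previous segment, and an empty result was returned as an empty string. The sketch below reproduces only the `..` handling and the `'.'` fallback for POSIX-style paths. It assumes that empty and `.` segments are skipped, and it omits the replacement of invalid characters; `collapse_parts` and `join_posix` are hypothetical names.

```python
def collapse_parts(parts):
    """Resolve '..' segments against preceding segments (simplified sketch)."""
    out = []
    for part in parts:
        if not part or part == '.':
            # Assumption: empty and '.' segments are dropped.
            continue
        if part == '..':
            if out and out[-1] != '..':
                out.pop()  # '..' cancels the previous real segment
            else:
                out.append('..')  # nothing to cancel, keep the '..'
            continue
        out.append(part)
    return out


def join_posix(s):
    """Join the collapsed parts, falling back to '.' for an empty relative path."""
    root = '/' if s.startswith('/') else ''
    path = '/'.join(collapse_parts(s.split('/')))
    return root + path if root or path else '.'
```

With the old logic, `a/../b` would have come out as `../b`, because the `..` was appended after popping `a`.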
@ -2683,8 +2686,8 @@ def merge_dicts(*dicts):
merged = {}
for a_dict in dicts:
for k, v in a_dict.items():
if (v is not None and k not in merged
or isinstance(v, str) and merged[k] == ''):
if ((v is not None and k not in merged)
or (isinstance(v, str) and merged[k] == '')):
merged[k] = v
return merged
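
Review note on `merge_dicts` above: the added parentheses only spell out the grouping Python already applies. The function body below is copied from the hunk so the merge rule can be tried in isolation: the first non-`None` value for a key wins, except that an empty string can later be replaced by another string.

```python
def merge_dicts(*dicts):
    merged = {}
    for a_dict in dicts:
        for k, v in a_dict.items():
            # Take the first non-None value, or replace an earlier '' with a string
            if ((v is not None and k not in merged)
                    or (isinstance(v, str) and merged[k] == '')):
                merged[k] = v
    return merged
```

The second clause is evaluated only when the first is false. For a string value that means the key is already present, and for any other value `isinstance` short-circuits first, so `merged[k]` cannot raise `KeyError`.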
@ -5330,7 +5333,7 @@ class FormatSorter:
settings = {
'vcodec': {'type': 'ordered', 'regex': True,
'order': ['av0?1', 'vp0?9.0?2', 'vp0?9', '[hx]265|he?vc?', '[hx]264|avc', 'vp0?8', 'mp4v|h263', 'theora', '', None, 'none']},
'order': ['av0?1', r'vp0?9\.0?2', 'vp0?9', '[hx]265|he?vc?', '[hx]264|avc', 'vp0?8', 'mp4v|h263', 'theora', '', None, 'none']},
'acodec': {'type': 'ordered', 'regex': True,
'order': ['[af]lac', 'wav|aiff', 'opus', 'vorbis|ogg', 'aac', 'mp?4a?', 'mp3', 'ac-?4', 'e-?a?c-?3', 'ac-?3', 'dts', '', None, 'none']},
'hdr': {'type': 'ordered', 'regex': True, 'field': 'dynamic_range',
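
Review note on the `vcodec` order entry above: in the old pattern `vp0?9.0?2` the unescaped `.` matches any character, so it accepts more strings than the intended `vp9.2` profile notation. The sketch below compares the two patterns with `re.match`; this is an assumption about how the order entries are applied, and the codec strings are made-up samples.

```python
import re

OLD_PATTERN = 'vp0?9.0?2'     # '.' matches any single character
NEW_PATTERN = r'vp0?9\.0?2'   # '.' must be a literal dot


def matches(pattern, codec):
    """Return True if the codec string matches the pattern from its start."""
    return re.match(pattern, codec) is not None
```

Both patterns accept `vp9.2` and `vp09.02`, but only the old one also accepts strings such as `vp09x02`.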

View file

@ -1,8 +1,8 @@
# Autogenerated by devscripts/update-version.py
__version__ = '2024.11.18'
__version__ = '2025.01.26'
RELEASE_GIT_HEAD = '7ea2787920cccc6b8ea30791993d114fbd564434'
RELEASE_GIT_HEAD = '3b4531934465580be22937fecbb6e1a3a9e2334f'
VARIANT = None
@ -12,4 +12,4 @@ CHANNEL = 'stable'
ORIGIN = 'yt-dlp/yt-dlp'
_pkg_version = '2024.11.18'
_pkg_version = '2025.01.26'