1
0
mirror of https://github.com/yt-dlp/yt-dlp synced 2025-12-19 15:45:39 +07:00

Compare commits

...

26 Commits

Author SHA1 Message Date
github-actions[bot]
abe10131fc Release 2024.08.01
Created by: bashonly

:ci skip all :ci run dl
2024-08-01 15:11:19 +00:00
bashonly
ffd7781d65 [cleanup] Misc (#10623)
Authored by: bashonly
2024-08-01 15:03:49 +00:00
sepro
efb42763de [ie/youtube] Change default player clients to ios,tv (#10457)
Closes #10046
Authored by: seproDev
2024-08-01 14:03:03 +00:00
Oğulcan Tokar
bb3936ae2b [ie/kick:clips] Add extractor (#10572)
Closes #8115
Authored by: luvyana
2024-08-01 00:00:52 +00:00
bashonly
d19fcb9342 [ie/youtube] Fix age-verification workaround (#10610)
Authored by: bashonly, Grub4K

Co-authored-by: Simon Sawicki <contact@grub4k.xyz>
2024-07-31 21:39:36 +00:00
bashonly
011b4a04db [ie/youtube] Fix n function name extraction for player 20dfca59 (#10611)
Closes #10608
Authored by: bashonly
2024-07-31 21:19:30 +00:00
szantnerb
7e3e4779ad [ie/mediaklikk] Fix extractor (#10605)
Closes #10588
Authored by: szantnerb
2024-07-31 02:22:44 +00:00
vvto33
5260696b1c [ie/tver] Support olympic URLs (#10600)
Closes #10583
Authored by: vvto33
2024-07-31 02:18:43 +00:00
bashonly
2f1ddfe12a [ie/olympics] Fix extractor (#10604)
Closes #10592
Authored by: bashonly
2024-07-31 01:50:20 +00:00
bashonly
4b69e1b53e [ie/mlbtv] Fix makeup game extraction (#10607)
Closes #10606
Authored by: bashonly
2024-07-30 23:17:05 +00:00
bashonly
0e539617a4 [ie/youtube] Player client maintenance (#10573)
- Add clients: android_producer, android_testsuite, android_vr, tv, web_safari
- Remove obsolete clients: android_embedded, ios_embedded, *_embedscreen

Authored by: bashonly
2024-07-30 21:27:06 +00:00
bashonly
fe15d3178e [ie/learningonscreen] Add extractor (#10590)
Authored by: bashonly, Grub4K

Co-authored-by: Simon Sawicki <contact@grub4k.xyz>
2024-07-30 09:09:55 +00:00
trainman261
94a1c5e642 [ie/cbc.ca:player] Fix extractor (#10302)
Closes #10170
Authored by: trainman261, bashonly

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
2024-07-29 21:58:26 +00:00
bashonly
2b6df93a24 [ie/vimeo:review] Fix password-protected video extraction (#10598)
Closes #10255
Authored by: bashonly
2024-07-29 21:55:06 +00:00
middlingphys
ef36d517f9 [ie/abematv] Fix availability extraction (#10569)
Authored by: middlingphys
2024-07-29 05:54:59 +00:00
bashonly
6daf2c27c0 [utils] unified_timestamp: Recognize Sunday (#10589)
Authored by: bashonly
2024-07-29 05:35:46 +00:00
bashonly
28d485714f [ie/tva] Fix extractor (#10567)
Closes #10555
Authored by: bashonly
2024-07-25 22:30:00 +00:00
bashonly
0b77286184 [ie/DiscoveryPlus] Support olympics URLs (#10566)
Closes #10564
Authored by: bashonly
2024-07-25 22:00:58 +00:00
github-actions[bot]
6b1e430d8e Release 2024.07.25
Created by: bashonly

:ci skip all :ci run dl
2024-07-25 03:29:27 +00:00
bashonly
f0993391e6 [ie/mlbtv] Fix extractor (#10515)
Closes #10510
Authored by: bashonly
2024-07-24 21:22:55 +00:00
bashonly
1a34a802f4 [ie/facebook] Fix extraction (#10531)
Closes #10532
Authored by: bashonly
2024-07-23 23:08:24 +00:00
bashonly
a0a1bc3d8d [ie/vimeo] Fix chapters extraction (#10544)
Closes #5308
Authored by: bashonly
2024-07-23 22:00:57 +00:00
bashonly
2f97779f33 [ie/tiktok] Fix and deprioritize JSON subtitles (#10516)
Fixes regression caused by 5ce582448e

Closes #10514
Authored by: bashonly
2024-07-23 21:49:31 +00:00
bashonly
713b4cd18f [ie/youtube] Fix n function name extraction for player 3400486c (#10542)
Authored by: bashonly
2024-07-23 21:25:49 +00:00
bashonly
a3bab4752a [ie/abematv] Adapt key retrieval to request handler framework (#10491)
Fixes a regression caused by a dependence on buggy behavior that was corrected in 150ecc45d9

Closes #10489
Authored by: bashonly
2024-07-18 20:43:31 +00:00
bashonly
e046db8a11 [build] Pin setuptools version (#10493)
https://github.com/pypa/setuptools/issues/4480#issuecomment-2236507819
https://github.com/pypa/setuptools/issues/4482

Authored by: bashonly
2024-07-18 20:33:28 +00:00
26 changed files with 1157 additions and 416 deletions

View File

@@ -653,3 +653,5 @@ LeSuisse
DunnesH DunnesH
iancmy iancmy
mokrueger mokrueger
luvyana
szantnerb

View File

@@ -4,6 +4,45 @@ # Changelog
# To create a release, dispatch the https://github.com/yt-dlp/yt-dlp/actions/workflows/release.yml workflow on master # To create a release, dispatch the https://github.com/yt-dlp/yt-dlp/actions/workflows/release.yml workflow on master
--> -->
### 2024.08.01
#### Core changes
- **utils**: `unified_timestamp`: [Recognize Sunday](https://github.com/yt-dlp/yt-dlp/commit/6daf2c27c0464fba98337be30de0b66d520d0db1) ([#10589](https://github.com/yt-dlp/yt-dlp/issues/10589)) by [bashonly](https://github.com/bashonly)
#### Extractor changes
- **abematv**: [Fix availability extraction](https://github.com/yt-dlp/yt-dlp/commit/ef36d517f9b05785d61abca7691d9ab7d63cc75c) ([#10569](https://github.com/yt-dlp/yt-dlp/issues/10569)) by [middlingphys](https://github.com/middlingphys)
- **cbc.ca**: player: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/94a1c5e642e468cebeb51f74c6c220434cb47d96) ([#10302](https://github.com/yt-dlp/yt-dlp/issues/10302)) by [bashonly](https://github.com/bashonly), [trainman261](https://github.com/trainman261)
- **discoveryplus**: [Support olympics URLs](https://github.com/yt-dlp/yt-dlp/commit/0b7728618417e1aa382722a4d29b916b594d4459) ([#10566](https://github.com/yt-dlp/yt-dlp/issues/10566)) by [bashonly](https://github.com/bashonly)
- **kick**: clips: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/bb3936ae2b3ce96d0b53f9e17cad1082058f032b) ([#10572](https://github.com/yt-dlp/yt-dlp/issues/10572)) by [luvyana](https://github.com/luvyana)
- **learningonscreen**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/fe15d3178e242803ae7a934b90137f13598eba2e) ([#10590](https://github.com/yt-dlp/yt-dlp/issues/10590)) by [bashonly](https://github.com/bashonly), [Grub4K](https://github.com/Grub4K)
- **mediaklikk**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/7e3e4779ad13e4511c9ba3869879e53f0267bd7a) ([#10605](https://github.com/yt-dlp/yt-dlp/issues/10605)) by [szantnerb](https://github.com/szantnerb)
- **mlbtv**: [Fix makeup game extraction](https://github.com/yt-dlp/yt-dlp/commit/4b69e1b53ea21e631cd5dd68ff531e2f1671ec17) ([#10607](https://github.com/yt-dlp/yt-dlp/issues/10607)) by [bashonly](https://github.com/bashonly)
- **olympics**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/2f1ddfe12a2c174bc777264c5c8ffe7ca0922d94) ([#10604](https://github.com/yt-dlp/yt-dlp/issues/10604)) by [bashonly](https://github.com/bashonly)
- **tva**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/28d485714fef88937c82635438afba5db81f9089) ([#10567](https://github.com/yt-dlp/yt-dlp/issues/10567)) by [bashonly](https://github.com/bashonly)
- **tver**: [Support olympic URLs](https://github.com/yt-dlp/yt-dlp/commit/5260696b1cba77161828941fdb38f09f14ac6c60) ([#10600](https://github.com/yt-dlp/yt-dlp/issues/10600)) by [vvto33](https://github.com/vvto33)
- **vimeo**: review: [Fix password-protected video extraction](https://github.com/yt-dlp/yt-dlp/commit/2b6df93a243bdfb9d6bb5c1e18020625cd02d465) ([#10598](https://github.com/yt-dlp/yt-dlp/issues/10598)) by [bashonly](https://github.com/bashonly)
- **youtube**
- [Change default player clients to `ios,tv`](https://github.com/yt-dlp/yt-dlp/commit/efb42763dec23ccf6a2e3bac3afbfefce8efd012) ([#10457](https://github.com/yt-dlp/yt-dlp/issues/10457)) by [seproDev](https://github.com/seproDev)
- [Fix `n` function name extraction for player `20dfca59`](https://github.com/yt-dlp/yt-dlp/commit/011b4a04db2a636c3ef0a0ad4e2d3ae482c9fd76) ([#10611](https://github.com/yt-dlp/yt-dlp/issues/10611)) by [bashonly](https://github.com/bashonly)
- [Fix age-verification workaround](https://github.com/yt-dlp/yt-dlp/commit/d19fcb934269465fd707e68a87f735ec6983e93d) ([#10610](https://github.com/yt-dlp/yt-dlp/issues/10610)) by [bashonly](https://github.com/bashonly), [Grub4K](https://github.com/Grub4K)
- [Player client maintenance](https://github.com/yt-dlp/yt-dlp/commit/0e539617a41913c7da1edd74fb6543c10ad727b3) ([#10573](https://github.com/yt-dlp/yt-dlp/issues/10573)) by [bashonly](https://github.com/bashonly)
#### Misc. changes
- **cleanup**: Miscellaneous: [ffd7781](https://github.com/yt-dlp/yt-dlp/commit/ffd7781d6588926f820b44a34b9e6e3068fb9f97) by [bashonly](https://github.com/bashonly)
### 2024.07.25
#### Extractor changes
- **abematv**: [Adapt key retrieval to request handler framework](https://github.com/yt-dlp/yt-dlp/commit/a3bab4752a2b3d56e5a59b4e0411bb8f695c010b) ([#10491](https://github.com/yt-dlp/yt-dlp/issues/10491)) by [bashonly](https://github.com/bashonly)
- **facebook**: [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/1a34a802f44a1dab8f642c79c3cc810e21541d3b) ([#10531](https://github.com/yt-dlp/yt-dlp/issues/10531)) by [bashonly](https://github.com/bashonly)
- **mlbtv**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/f0993391e6052ec8f7aacc286609564f226943b9) ([#10515](https://github.com/yt-dlp/yt-dlp/issues/10515)) by [bashonly](https://github.com/bashonly)
- **tiktok**: [Fix and deprioritize JSON subtitles](https://github.com/yt-dlp/yt-dlp/commit/2f97779f335ac069ecccd9c7bf81abf4a83cfe7a) ([#10516](https://github.com/yt-dlp/yt-dlp/issues/10516)) by [bashonly](https://github.com/bashonly)
- **vimeo**: [Fix chapters extraction](https://github.com/yt-dlp/yt-dlp/commit/a0a1bc3d8d8e3bb9a48a06e835815a0460e90e77) ([#10544](https://github.com/yt-dlp/yt-dlp/issues/10544)) by [bashonly](https://github.com/bashonly)
- **youtube**: [Fix `n` function name extraction for player `3400486c`](https://github.com/yt-dlp/yt-dlp/commit/713b4cd18f00556771af8cfdd9cea6cc1a09e948) ([#10542](https://github.com/yt-dlp/yt-dlp/issues/10542)) by [bashonly](https://github.com/bashonly)
#### Misc. changes
- **build**: [Pin `setuptools` version](https://github.com/yt-dlp/yt-dlp/commit/e046db8a116b1c320d4785daadd48ea0b22a3987) ([#10493](https://github.com/yt-dlp/yt-dlp/issues/10493)) by [bashonly](https://github.com/bashonly)
### 2024.07.16 ### 2024.07.16
#### Core changes #### Core changes

View File

@@ -1758,7 +1758,7 @@ # Replace all spaces and "_" in title and uploader with a `-`
# EXTRACTOR ARGUMENTS # EXTRACTOR ARGUMENTS
Some extractors accept additional arguments which can be passed using `--extractor-args KEY:ARGS`. `ARGS` is a `;` (semicolon) separated string of `ARG=VAL1,VAL2`. E.g. `--extractor-args "youtube:player-client=android_embedded,web;formats=incomplete" --extractor-args "funimation:version=uncut"` Some extractors accept additional arguments which can be passed using `--extractor-args KEY:ARGS`. `ARGS` is a `;` (semicolon) separated string of `ARG=VAL1,VAL2`. E.g. `--extractor-args "youtube:player-client=mediaconnect,web;formats=incomplete" --extractor-args "funimation:version=uncut"`
Note: In CLI, `ARG` can use `-` instead of `_`; e.g. `youtube:player-client"` becomes `youtube:player_client"` Note: In CLI, `ARG` can use `-` instead of `_`; e.g. `youtube:player-client"` becomes `youtube:player_client"`
@@ -1767,7 +1767,7 @@ # EXTRACTOR ARGUMENTS
#### youtube #### youtube
* `lang`: Prefer translated metadata (`title`, `description` etc) of this language code (case-sensitive). By default, the video primary language metadata is preferred, with a fallback to `en` translated. See [youtube.py](https://github.com/yt-dlp/yt-dlp/blob/c26f9b991a0681fd3ea548d535919cec1fbbd430/yt_dlp/extractor/youtube.py#L381-L390) for list of supported content language codes * `lang`: Prefer translated metadata (`title`, `description` etc) of this language code (case-sensitive). By default, the video primary language metadata is preferred, with a fallback to `en` translated. See [youtube.py](https://github.com/yt-dlp/yt-dlp/blob/c26f9b991a0681fd3ea548d535919cec1fbbd430/yt_dlp/extractor/youtube.py#L381-L390) for list of supported content language codes
* `skip`: One or more of `hls`, `dash` or `translated_subs` to skip extraction of the m3u8 manifests, dash manifests and [auto-translated subtitles](https://github.com/yt-dlp/yt-dlp/issues/4090#issuecomment-1158102032) respectively * `skip`: One or more of `hls`, `dash` or `translated_subs` to skip extraction of the m3u8 manifests, dash manifests and [auto-translated subtitles](https://github.com/yt-dlp/yt-dlp/issues/4090#issuecomment-1158102032) respectively
* `player_client`: Clients to extract video data from. The main clients are `web`, `ios` and `android`, with variants `_music`, `_embedded`, `_embedscreen`, `_creator` (e.g. `web_embedded`); and `mediaconnect`, `mweb`, `mweb_embedscreen` and `tv_embedded` (agegate bypass) with no variants. By default, `ios,web` is used, but `tv_embedded` and `creator` variants are added as required for age-gated videos. Similarly, the music variants are added for `music.youtube.com` urls. The `android` clients will always be given lowest priority since their formats are broken. You can use `all` to use all the clients, and `default` for the default clients. * `player_client`: Clients to extract video data from. The main clients are `web`, `ios` and `android`, with variants `_music` and `_creator` (e.g. `ios_creator`); and `mediaconnect`, `mweb`, `android_producer`, `android_testsuite`, `android_vr`, `web_safari`, `web_embedded`, `tv` and `tv_embedded` with no variants. By default, `ios,tv` is used, but `tv_embedded`, `web_creator` and `mediaconnect` are added as required for age-gated videos. Similarly, the music variants are added for `music.youtube.com` urls. Most `android` clients will be given lowest priority since their formats are broken. You can use `all` to use all the clients, and `default` for the default clients.
* `player_skip`: Skip some network requests that are generally needed for robust extraction. One or more of `configs` (skip client configs), `webpage` (skip initial webpage), `js` (skip js player). While these options can help reduce the number of requests needed or avoid some rate-limiting, they could cause some issues. See [#860](https://github.com/yt-dlp/yt-dlp/pull/860) for more details * `player_skip`: Skip some network requests that are generally needed for robust extraction. One or more of `configs` (skip client configs), `webpage` (skip initial webpage), `js` (skip js player). While these options can help reduce the number of requests needed or avoid some rate-limiting, they could cause some issues. See [#860](https://github.com/yt-dlp/yt-dlp/pull/860) for more details
* `player_params`: YouTube player parameters to use for player requests. Will overwrite any default ones set by yt-dlp. * `player_params`: YouTube player parameters to use for player requests. Will overwrite any default ones set by yt-dlp.
* `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side) * `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side)
@@ -1775,7 +1775,7 @@ #### youtube
* E.g. `all,all,1000,10` will get a maximum of 1000 replies total, with up to 10 replies per thread. `1000,all,100` will get a maximum of 1000 comments, with a maximum of 100 replies total * E.g. `all,all,1000,10` will get a maximum of 1000 replies total, with up to 10 replies per thread. `1000,all,100` will get a maximum of 1000 comments, with a maximum of 100 replies total
* `formats`: Change the types of formats to return. `dashy` (convert HTTP to DASH), `duplicate` (identical content but different URLs or protocol; includes `dashy`), `incomplete` (cannot be downloaded completely - live dash and post-live m3u8) * `formats`: Change the types of formats to return. `dashy` (convert HTTP to DASH), `duplicate` (identical content but different URLs or protocol; includes `dashy`), `incomplete` (cannot be downloaded completely - live dash and post-live m3u8)
* `innertube_host`: Innertube API host to use for all API requests; e.g. `studio.youtube.com`, `youtubei.googleapis.com`. Note that cookies exported from one subdomain will not work on others * `innertube_host`: Innertube API host to use for all API requests; e.g. `studio.youtube.com`, `youtubei.googleapis.com`. Note that cookies exported from one subdomain will not work on others
* `innertube_key`: Innertube API key to use for all API requests * `innertube_key`: Innertube API key to use for all API requests. By default, no API key is used
* `raise_incomplete_data`: `Incomplete Data Received` raises an error instead of reporting a warning * `raise_incomplete_data`: `Incomplete Data Received` raises an error instead of reporting a warning
#### youtubetab (YouTube playlists, channels, feeds, etc.) #### youtubetab (YouTube playlists, channels, feeds, etc.)

View File

@@ -66,7 +66,7 @@ build = [
"build", "build",
"hatchling", "hatchling",
"pip", "pip",
"setuptools", "setuptools>=71.0.2", # 71.0.0 broke pyinstaller
"wheel", "wheel",
] ]
dev = [ dev = [

View File

@@ -655,10 +655,11 @@ # Supported sites
- **Ketnet** - **Ketnet**
- **khanacademy** - **khanacademy**
- **khanacademy:unit** - **khanacademy:unit**
- **Kick** - **kick:clips**
- **kick:live**
- **kick:vod**
- **Kicker** - **Kicker**
- **KickStarter** - **KickStarter**
- **KickVOD**
- **kinja:embed** - **kinja:embed**
- **KinoPoisk** - **KinoPoisk**
- **Kommunetv** - **Kommunetv**
@@ -690,6 +691,7 @@ # Supported sites
- **Lcp** - **Lcp**
- **LcpPlay** - **LcpPlay**
- **Le**: 乐视网 - **Le**: 乐视网
- **LearningOnScreen**
- **Lecture2Go**: (**Currently broken**) - **Lecture2Go**: (**Currently broken**)
- **Lecturio**: [*lecturio*](## "netrc machine") - **Lecturio**: [*lecturio*](## "netrc machine")
- **LecturioCourse**: [*lecturio*](## "netrc machine") - **LecturioCourse**: [*lecturio*](## "netrc machine")
@@ -1140,7 +1142,6 @@ # Supported sites
- **QuantumTV**: [*quantumtv*](## "netrc machine") - **QuantumTV**: [*quantumtv*](## "netrc machine")
- **QuantumTVLive**: [*quantumtv*](## "netrc machine") - **QuantumTVLive**: [*quantumtv*](## "netrc machine")
- **QuantumTVRecordings**: [*quantumtv*](## "netrc machine") - **QuantumTVRecordings**: [*quantumtv*](## "netrc machine")
- **Qub**
- **R7**: (**Currently broken**) - **R7**: (**Currently broken**)
- **R7Article**: (**Currently broken**) - **R7Article**: (**Currently broken**)
- **Radiko** - **Radiko**
@@ -1517,9 +1518,9 @@ # Supported sites
- **tv5unis** - **tv5unis**
- **tv5unis:video** - **tv5unis:video**
- **tv8.it** - **tv8.it**
- **TVA**
- **TVANouvelles** - **TVANouvelles**
- **TVANouvellesArticle** - **TVANouvellesArticle**
- **tvaplus**: TVA+
- **TVC** - **TVC**
- **TVCArticle** - **TVCArticle**
- **TVer** - **TVer**

View File

@@ -444,6 +444,8 @@ def test_unified_timestamps(self):
self.assertEqual(unified_timestamp('Sep 11, 2013 | 5:49 AM'), 1378878540) self.assertEqual(unified_timestamp('Sep 11, 2013 | 5:49 AM'), 1378878540)
self.assertEqual(unified_timestamp('December 15, 2017 at 7:49 am'), 1513324140) self.assertEqual(unified_timestamp('December 15, 2017 at 7:49 am'), 1513324140)
self.assertEqual(unified_timestamp('2018-03-14T08:32:43.1493874+00:00'), 1521016363) self.assertEqual(unified_timestamp('2018-03-14T08:32:43.1493874+00:00'), 1521016363)
self.assertEqual(unified_timestamp('Sunday, 26 Nov 2006, 19:00'), 1164567600)
self.assertEqual(unified_timestamp('wed, aug 16, 2008, 12:00pm'), 1218931200)
self.assertEqual(unified_timestamp('December 31 1969 20:00:01 EDT'), 1) self.assertEqual(unified_timestamp('December 31 1969 20:00:01 EDT'), 1)
self.assertEqual(unified_timestamp('Wednesday 31 December 1969 18:01:26 MDT'), 86) self.assertEqual(unified_timestamp('Wednesday 31 December 1969 18:01:26 MDT'), 86)

View File

@@ -171,6 +171,14 @@
'https://www.youtube.com/s/player/b22ef6e7/player_ias.vflset/en_US/base.js', 'https://www.youtube.com/s/player/b22ef6e7/player_ias.vflset/en_US/base.js',
'b6HcntHGkvBLk_FRf', 'kNPW6A7FyP2l8A', 'b6HcntHGkvBLk_FRf', 'kNPW6A7FyP2l8A',
), ),
(
'https://www.youtube.com/s/player/3400486c/player_ias.vflset/en_US/base.js',
'lL46g3XifCKUZn1Xfw', 'z767lhet6V2Skl',
),
(
'https://www.youtube.com/s/player/20dfca59/player_ias.vflset/en_US/base.js',
'-fLCxedkAk4LUTK2', 'O8kfRq1y1eyHGw',
),
] ]

View File

@@ -939,6 +939,7 @@
KhanAcademyUnitIE, KhanAcademyUnitIE,
) )
from .kick import ( from .kick import (
KickClipIE,
KickIE, KickIE,
KickVODIE, KickVODIE,
) )
@@ -986,6 +987,7 @@
LcpIE, LcpIE,
LcpPlayIE, LcpPlayIE,
) )
from .learningonscreen import LearningOnScreenIE
from .lecture2go import Lecture2GoIE from .lecture2go import Lecture2GoIE
from .lecturio import ( from .lecturio import (
LecturioCourseIE, LecturioCourseIE,
@@ -2169,10 +2171,7 @@
TV5UnisVideoIE, TV5UnisVideoIE,
) )
from .tv24ua import TV24UAVideoIE from .tv24ua import TV24UAVideoIE
from .tva import ( from .tva import TVAIE
TVAIE,
QubIE,
)
from .tvanouvelles import ( from .tvanouvelles import (
TVANouvellesArticleIE, TVANouvellesArticleIE,
TVANouvellesIE, TVANouvellesIE,

View File

@@ -9,12 +9,12 @@
import struct import struct
import time import time
import urllib.parse import urllib.parse
import urllib.request
import urllib.response
import uuid import uuid
from .common import InfoExtractor from .common import InfoExtractor
from ..aes import aes_ecb_decrypt from ..aes import aes_ecb_decrypt
from ..networking import RequestHandler, Response
from ..networking.exceptions import TransportError
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
OnDemandPagedList, OnDemandPagedList,
@@ -26,37 +26,36 @@
traverse_obj, traverse_obj,
update_url_query, update_url_query,
) )
from ..utils.networking import clean_proxies
def add_opener(ydl, handler): # FIXME: Create proper API in .networking class AbemaLicenseRH(RequestHandler):
"""Add a handler for opening URLs, like _download_webpage""" _SUPPORTED_URL_SCHEMES = ('abematv-license',)
# https://github.com/python/cpython/blob/main/Lib/urllib/request.py#L426 _SUPPORTED_PROXY_SCHEMES = None
# https://github.com/python/cpython/blob/main/Lib/urllib/request.py#L605 _SUPPORTED_FEATURES = None
rh = ydl._request_director.handlers['Urllib'] RH_NAME = 'abematv_license'
if 'abematv-license' in rh._SUPPORTED_URL_SCHEMES:
return
headers = ydl.params['http_headers'].copy()
proxies = ydl.proxies.copy()
clean_proxies(proxies, headers)
opener = rh._get_instance(cookiejar=ydl.cookiejar, proxies=proxies)
assert isinstance(opener, urllib.request.OpenerDirector)
opener.add_handler(handler)
rh._SUPPORTED_URL_SCHEMES = (*rh._SUPPORTED_URL_SCHEMES, 'abematv-license')
_STRTABLE = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'
_HKEY = b'3AF0298C219469522A313570E8583005A642E73EDD58E3EA2FB7339D3DF1597E'
class AbemaLicenseHandler(urllib.request.BaseHandler): def __init__(self, *, ie: 'AbemaTVIE', **kwargs):
handler_order = 499 super().__init__(**kwargs)
STRTABLE = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'
HKEY = b'3AF0298C219469522A313570E8583005A642E73EDD58E3EA2FB7339D3DF1597E'
def __init__(self, ie: 'AbemaTVIE'):
# the protocol that this should really handle is 'abematv-license://'
# abematv_license_open is just a placeholder for development purposes
# ref. https://github.com/python/cpython/blob/f4c03484da59049eb62a9bf7777b963e2267d187/Lib/urllib/request.py#L510
setattr(self, 'abematv-license_open', getattr(self, 'abematv_license_open', None))
self.ie = ie self.ie = ie
def _send(self, request):
url = request.url
ticket = urllib.parse.urlparse(url).netloc
try:
response_data = self._get_videokey_from_ticket(ticket)
except ExtractorError as e:
raise TransportError(cause=e.cause) from e
except (IndexError, KeyError, TypeError) as e:
raise TransportError(cause=repr(e)) from e
return Response(
io.BytesIO(response_data), url,
headers={'Content-Length': str(len(response_data))})
def _get_videokey_from_ticket(self, ticket): def _get_videokey_from_ticket(self, ticket):
to_show = self.ie.get_param('verbose', False) to_show = self.ie.get_param('verbose', False)
media_token = self.ie._get_media_token(to_show=to_show) media_token = self.ie._get_media_token(to_show=to_show)
@@ -72,25 +71,17 @@ def _get_videokey_from_ticket(self, ticket):
'Content-Type': 'application/json', 'Content-Type': 'application/json',
}) })
res = decode_base_n(license_response['k'], table=self.STRTABLE) res = decode_base_n(license_response['k'], table=self._STRTABLE)
encvideokey = bytes_to_intlist(struct.pack('>QQ', res >> 64, res & 0xffffffffffffffff)) encvideokey = bytes_to_intlist(struct.pack('>QQ', res >> 64, res & 0xffffffffffffffff))
h = hmac.new( h = hmac.new(
binascii.unhexlify(self.HKEY), binascii.unhexlify(self._HKEY),
(license_response['cid'] + self.ie._DEVICE_ID).encode(), (license_response['cid'] + self.ie._DEVICE_ID).encode(),
digestmod=hashlib.sha256) digestmod=hashlib.sha256)
enckey = bytes_to_intlist(h.digest()) enckey = bytes_to_intlist(h.digest())
return intlist_to_bytes(aes_ecb_decrypt(encvideokey, enckey)) return intlist_to_bytes(aes_ecb_decrypt(encvideokey, enckey))
def abematv_license_open(self, url):
url = url.get_full_url() if isinstance(url, urllib.request.Request) else url
ticket = urllib.parse.urlparse(url).netloc
response_data = self._get_videokey_from_ticket(ticket)
return urllib.response.addinfourl(io.BytesIO(response_data), headers={
'Content-Length': str(len(response_data)),
}, url=url, code=200)
class AbemaTVBaseIE(InfoExtractor): class AbemaTVBaseIE(InfoExtractor):
_NETRC_MACHINE = 'abematv' _NETRC_MACHINE = 'abematv'
@@ -139,7 +130,7 @@ def _get_device_token(self):
if self._USERTOKEN: if self._USERTOKEN:
return self._USERTOKEN return self._USERTOKEN
add_opener(self._downloader, AbemaLicenseHandler(self)) self._downloader._request_director.add_handler(AbemaLicenseRH(ie=self, logger=None))
username, _ = self._get_login_info() username, _ = self._get_login_info()
auth_cache = username and self.cache.load(self._NETRC_MACHINE, username, min_ver='2024.01.19') auth_cache = username and self.cache.load(self._NETRC_MACHINE, username, min_ver='2024.01.19')
@@ -386,8 +377,7 @@ def _real_extract(self, url):
f'https://api.abema.io/v1/video/programs/{video_id}', video_id, f'https://api.abema.io/v1/video/programs/{video_id}', video_id,
note='Checking playability', note='Checking playability',
headers=headers) headers=headers)
ondemand_types = traverse_obj(api_response, ('terms', ..., 'onDemandType')) if not traverse_obj(api_response, ('label', 'free', {bool})):
if 3 not in ondemand_types:
# cannot acquire decryption key for these streams # cannot acquire decryption key for these streams
self.report_warning('This is a premium-only stream') self.report_warning('This is a premium-only stream')
availability = 'premium_only' availability = 'premium_only'

View File

@@ -1,4 +1,5 @@
import base64 import base64
import functools
import json import json
import re import re
import time import time
@@ -6,17 +7,24 @@
import xml.etree.ElementTree import xml.etree.ElementTree
from .common import InfoExtractor from .common import InfoExtractor
from ..networking import HEADRequest
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
float_or_none,
int_or_none, int_or_none,
join_nonempty, join_nonempty,
js_to_json, js_to_json,
mimetype2ext,
orderedSet, orderedSet,
parse_iso8601, parse_iso8601,
replace_extension,
smuggle_url, smuggle_url,
strip_or_none, strip_or_none,
traverse_obj, traverse_obj,
try_get, try_get,
update_url,
url_basename,
url_or_none,
) )
@@ -149,6 +157,7 @@ def _real_extract(self, url):
class CBCPlayerIE(InfoExtractor): class CBCPlayerIE(InfoExtractor):
IE_NAME = 'cbc.ca:player' IE_NAME = 'cbc.ca:player'
_VALID_URL = r'(?:cbcplayer:|https?://(?:www\.)?cbc\.ca/(?:player/play/(?:video/)?|i/caffeine/syndicate/\?mediaId=))(?P<id>(?:\d\.)?\d+)' _VALID_URL = r'(?:cbcplayer:|https?://(?:www\.)?cbc\.ca/(?:player/play/(?:video/)?|i/caffeine/syndicate/\?mediaId=))(?P<id>(?:\d\.)?\d+)'
_GEO_COUNTRIES = ['CA']
_TESTS = [{ _TESTS = [{
'url': 'http://www.cbc.ca/player/play/2683190193', 'url': 'http://www.cbc.ca/player/play/2683190193',
'md5': '64d25f841ddf4ddb28a235338af32e2c', 'md5': '64d25f841ddf4ddb28a235338af32e2c',
@@ -172,21 +181,20 @@ class CBCPlayerIE(InfoExtractor):
'description': 'md5:dd3b692f0a139b0369943150bd1c46a9', 'description': 'md5:dd3b692f0a139b0369943150bd1c46a9',
'timestamp': 1425704400, 'timestamp': 1425704400,
'upload_date': '20150307', 'upload_date': '20150307',
'uploader': 'CBCC-NEW', 'thumbnail': 'https://i.cbc.ca/ais/1.2985700,1717262248558/full/max/0/default.jpg',
'thumbnail': 'http://thumbnails.cbc.ca/maven_legacy/thumbnails/sonali-karnick-220.jpg',
'chapters': [], 'chapters': [],
'duration': 494.811, 'duration': 494.811,
'categories': ['AudioMobile/All in a Weekend Montreal'], 'categories': ['All in a Weekend Montreal'],
'tags': 'count:8', 'tags': 'count:11',
'location': 'Quebec', 'location': 'Quebec',
'series': 'All in a Weekend Montreal', 'series': 'All in a Weekend Montreal',
'season': 'Season 2015', 'season': 'Season 2015',
'season_number': 2015, 'season_number': 2015,
'media_type': 'Excerpt', 'media_type': 'Excerpt',
'genres': ['Other'],
}, },
}, { }, {
'url': 'http://www.cbc.ca/i/caffeine/syndicate/?mediaId=2164402062', 'url': 'http://www.cbc.ca/i/caffeine/syndicate/?mediaId=2164402062',
'md5': '33fcd8f6719b9dd60a5e73adcb83b9f6',
'info_dict': { 'info_dict': {
'id': '2164402062', 'id': '2164402062',
'ext': 'mp4', 'ext': 'mp4',
@@ -194,107 +202,168 @@ class CBCPlayerIE(InfoExtractor):
'description': 'Tim Mayer has beaten three different forms of cancer four times in five years.', 'description': 'Tim Mayer has beaten three different forms of cancer four times in five years.',
'timestamp': 1320410746, 'timestamp': 1320410746,
'upload_date': '20111104', 'upload_date': '20111104',
'uploader': 'CBCC-NEW', 'thumbnail': 'https://i.cbc.ca/ais/1.1711287,1717139372111/full/max/0/default.jpg',
'thumbnail': 'https://thumbnails.cbc.ca/maven_legacy/thumbnails/277/67/cancer_852x480_2164412612.jpg',
'chapters': [], 'chapters': [],
'duration': 186.867, 'duration': 186.867,
'series': 'CBC News: Windsor at 6:00', 'series': 'CBC News: Windsor at 6:00',
'categories': ['News/Canada/Windsor'], 'categories': ['Windsor'],
'location': 'Windsor', 'location': 'Windsor',
'tags': ['cancer'], 'tags': ['Cancer', 'News/Canada/Windsor', 'Windsor'],
'creators': ['Allison Johnson'],
'media_type': 'Excerpt', 'media_type': 'Excerpt',
'genres': ['News'],
}, },
'params': {'skip_download': 'm3u8'},
}, { }, {
# Redirected from http://www.cbc.ca/player/AudioMobile/All%20in%20a%20Weekend%20Montreal/ID/2657632011/ # Redirected from http://www.cbc.ca/player/AudioMobile/All%20in%20a%20Weekend%20Montreal/ID/2657632011/
'url': 'https://www.cbc.ca/player/play/1.2985700', 'url': 'https://www.cbc.ca/player/play/1.2985700',
'md5': 'e5e708c34ae6fca156aafe17c43e8b75', 'md5': 'e5e708c34ae6fca156aafe17c43e8b75',
'info_dict': { 'info_dict': {
'id': '2657631896', 'id': '1.2985700',
'ext': 'mp3', 'ext': 'mp3',
'title': 'CBC Montreal is organizing its first ever community hackathon!', 'title': 'CBC Montreal is organizing its first ever community hackathon!',
'description': 'The modern technology we tend to depend on so heavily, is never without it\'s share of hiccups and headaches. Next weekend - CBC Montreal will be getting members of the public for its first Hackathon.', 'description': 'The modern technology we tend to depend on so heavily, is never without it\'s share of hiccups and headaches. Next weekend - CBC Montreal will be getting members of the public for its first Hackathon.',
'timestamp': 1425704400, 'timestamp': 1425704400,
'upload_date': '20150307', 'upload_date': '20150307',
'uploader': 'CBCC-NEW', 'thumbnail': 'https://i.cbc.ca/ais/1.2985700,1717262248558/full/max/0/default.jpg',
'thumbnail': 'http://thumbnails.cbc.ca/maven_legacy/thumbnails/sonali-karnick-220.jpg',
'chapters': [], 'chapters': [],
'duration': 494.811, 'duration': 494.811,
'categories': ['AudioMobile/All in a Weekend Montreal'], 'categories': ['All in a Weekend Montreal'],
'tags': 'count:8', 'tags': 'count:11',
'location': 'Quebec', 'location': 'Quebec',
'series': 'All in a Weekend Montreal', 'series': 'All in a Weekend Montreal',
'season': 'Season 2015', 'season': 'Season 2015',
'season_number': 2015, 'season_number': 2015,
'media_type': 'Excerpt', 'media_type': 'Excerpt',
'genres': ['Other'],
}, },
}, { }, {
'url': 'https://www.cbc.ca/player/play/1.1711287', 'url': 'https://www.cbc.ca/player/play/1.1711287',
'md5': '33fcd8f6719b9dd60a5e73adcb83b9f6',
'info_dict': { 'info_dict': {
'id': '2164402062', 'id': '1.1711287',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Cancer survivor four times over', 'title': 'Cancer survivor four times over',
'description': 'Tim Mayer has beaten three different forms of cancer four times in five years.', 'description': 'Tim Mayer has beaten three different forms of cancer four times in five years.',
'timestamp': 1320410746, 'timestamp': 1320410746,
'upload_date': '20111104', 'upload_date': '20111104',
'uploader': 'CBCC-NEW', 'thumbnail': 'https://i.cbc.ca/ais/1.1711287,1717139372111/full/max/0/default.jpg',
'thumbnail': 'https://thumbnails.cbc.ca/maven_legacy/thumbnails/277/67/cancer_852x480_2164412612.jpg',
'chapters': [], 'chapters': [],
'duration': 186.867, 'duration': 186.867,
'series': 'CBC News: Windsor at 6:00', 'series': 'CBC News: Windsor at 6:00',
'categories': ['News/Canada/Windsor'], 'categories': ['Windsor'],
'location': 'Windsor', 'location': 'Windsor',
'tags': ['cancer'], 'tags': ['Cancer', 'News/Canada/Windsor', 'Windsor'],
'creators': ['Allison Johnson'],
'media_type': 'Excerpt', 'media_type': 'Excerpt',
'genres': ['News'],
}, },
'params': {'skip_download': 'm3u8'},
}, { }, {
# Has subtitles # Has subtitles
# These broadcasts expire after ~1 month, can find new test URL here: # These broadcasts expire after ~1 month, can find new test URL here:
# https://www.cbc.ca/player/news/TV%20Shows/The%20National/Latest%20Broadcast # https://www.cbc.ca/player/news/TV%20Shows/The%20National/Latest%20Broadcast
'url': 'https://www.cbc.ca/player/play/1.7159484', 'url': 'https://www.cbc.ca/player/play/video/9.6424403',
'md5': '6ed6cd0fc2ef568d2297ba68a763d455', 'md5': '8025909eaffcf0adf59922904def9a5e',
'info_dict': { 'info_dict': {
'id': '2324213316001', 'id': '9.6424403',
'ext': 'mp4', 'ext': 'mp4',
'title': 'The National | School boards sue social media giants', 'title': 'The National | N.W.T. wildfire emergency',
'description': 'md5:4b4db69322fa32186c3ce426da07402c', 'description': 'md5:ada33d36d1df69347ed575905bfd496c',
'timestamp': 1711681200, 'timestamp': 1718589600,
'duration': 2743.400, 'duration': 2692.833,
'subtitles': {'eng': [{'ext': 'vtt', 'protocol': 'm3u8_native'}]}, 'subtitles': {
'thumbnail': 'https://thumbnails.cbc.ca/maven_legacy/thumbnails/607/559/thumbnail.jpeg', 'en-US': [{
'uploader': 'CBCC-NEW', 'name': 'English Captions',
'url': 'https://cbchls.akamaized.net/delivery/news-shows/2024/06/17/NAT_JUN16-00-55-00/NAT_JUN16_cc.vtt',
}],
},
'thumbnail': 'https://i.cbc.ca/ais/6272b5c6-5e78-4c05-915d-0e36672e33d1,1714756287822/full/max/0/default.jpg',
'chapters': 'count:5', 'chapters': 'count:5',
'upload_date': '20240329', 'upload_date': '20240617',
'categories': 'count:4', 'categories': ['News', 'The National', 'The National Latest Broadcasts'],
'series': 'The National - Full Show', 'series': 'The National - Full Show',
'tags': 'count:1', 'tags': ['The National'],
'creators': ['News'],
'location': 'Canada', 'location': 'Canada',
'media_type': 'Full Program', 'media_type': 'Full Program',
'genres': ['News'],
}, },
}, { }, {
'url': 'https://www.cbc.ca/player/play/video/1.7194274', 'url': 'https://www.cbc.ca/player/play/video/1.7194274',
'md5': '188b96cf6bdcb2540e178a6caa957128', 'md5': '188b96cf6bdcb2540e178a6caa957128',
'info_dict': { 'info_dict': {
'id': '2334524995812', 'id': '1.7194274',
'ext': 'mp4', 'ext': 'mp4',
'title': '#TheMoment a rare white spirit moose was spotted in Alberta', 'title': '#TheMoment a rare white spirit moose was spotted in Alberta',
'description': 'md5:18ae269a2d0265c5b0bbe4b2e1ac61a3', 'description': 'md5:18ae269a2d0265c5b0bbe4b2e1ac61a3',
'timestamp': 1714788791, 'timestamp': 1714788791,
'duration': 77.678, 'duration': 77.678,
'subtitles': {'eng': [{'ext': 'vtt', 'protocol': 'm3u8_native'}]}, 'subtitles': {'eng': [{'ext': 'vtt', 'protocol': 'm3u8_native'}]},
'thumbnail': 'https://thumbnails.cbc.ca/maven_legacy/thumbnails/201/543/THE_MOMENT.jpg', 'thumbnail': 'https://i.cbc.ca/ais/1.7194274,1717224990425/full/max/0/default.jpg',
'uploader': 'CBCC-NEW', 'chapters': [],
'chapters': 'count:0',
'upload_date': '20240504',
'categories': 'count:3', 'categories': 'count:3',
'series': 'The National', 'series': 'The National',
'tags': 'count:15', 'tags': 'count:17',
'creators': ['encoder'],
'location': 'Canada', 'location': 'Canada',
'media_type': 'Excerpt', 'media_type': 'Excerpt',
'upload_date': '20240504',
'genres': ['News'],
},
}, {
'url': 'https://www.cbc.ca/player/play/video/9.6427282',
'info_dict': {
'id': '9.6427282',
'ext': 'mp4',
'title': 'Men\'s Soccer - Argentina vs Morocco',
'description': 'Argentina faces Morocco on the football pitch at Saint Etienne Stadium.',
'series': 'CBC Sports',
'media_type': 'Event Coverage',
'thumbnail': 'https://i.cbc.ca/ais/a4c5c0c2-99fa-4bd3-8061-5a63879c1b33,1718828053500/full/max/0/default.jpg',
'timestamp': 1721825400.0,
'upload_date': '20240724',
'duration': 10568.0,
'chapters': [],
'genres': [],
'tags': ['2024 Paris Olympic Games'],
'categories': ['Olympics Summer Soccer', 'Summer Olympics Replays', 'Summer Olympics Soccer Replays'],
'location': 'Canada',
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.cbc.ca/player/play/video/9.6459530',
'md5': '6c1bb76693ab321a2e99c347a1d5ecbc',
'info_dict': {
'id': '9.6459530',
'ext': 'mp4',
'title': 'Parts of Jasper incinerated as wildfire rages',
'description': 'md5:6f1caa8d128ad3f629257ef5fecf0962',
'series': 'The National',
'media_type': 'Excerpt',
'thumbnail': 'https://i.cbc.ca/ais/507c0086-31a2-494d-96e4-bffb1048d045,1721953984375/full/max/0/default.jpg',
'timestamp': 1721964091.012,
'upload_date': '20240726',
'duration': 952.285,
'chapters': [],
'genres': [],
'tags': 'count:23',
'categories': ['News (FAST)', 'News', 'The National', 'TV News Shows', 'The National '],
},
}, {
'url': 'https://www.cbc.ca/player/play/video/9.6420651',
'md5': '71a850c2c6ee5e912de169f5311bb533',
'info_dict': {
'id': '9.6420651',
'ext': 'mp4',
'title': 'Is it a breath of fresh air? Measuring air quality in Edmonton',
'description': 'md5:3922b92cc8b69212d739bd9dd095b1c3',
'series': 'CBC News Edmonton',
'media_type': 'Excerpt',
'thumbnail': 'https://i.cbc.ca/ais/73c4ab9c-7ad4-46ee-bb9b-020fdc01c745,1718214547576/full/max/0/default.jpg',
'timestamp': 1718220065.768,
'upload_date': '20240612',
'duration': 286.086,
'chapters': [],
'genres': ['News'],
'categories': ['News', 'Edmonton'],
'tags': 'count:7',
'location': 'Edmonton',
}, },
}, { }, {
'url': 'cbcplayer:1.7159484', 'url': 'cbcplayer:1.7159484',
@@ -307,23 +376,113 @@ class CBCPlayerIE(InfoExtractor):
'only_matching': True, 'only_matching': True,
}] }]
def _parse_param(self, asset_data, name):
return traverse_obj(asset_data, ('params', lambda _, v: v['name'] == name, 'value', {str}, any))
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
if '.' in video_id: webpage = self._download_webpage(f'https://www.cbc.ca/player/play/{video_id}', video_id)
webpage = self._download_webpage(f'https://www.cbc.ca/player/play/{video_id}', video_id) data = self._search_json(
video_id = self._search_json( r'window\.__INITIAL_STATE__\s*=', webpage, 'initial state', video_id)['video']['currentClip']
r'window\.__INITIAL_STATE__\s*=', webpage, assets = traverse_obj(
'initial state', video_id)['video']['currentClip']['mediaId'] data, ('media', 'assets', lambda _, v: url_or_none(v['key']) and v['type']))
if not assets and (media_id := traverse_obj(data, ('mediaId', {str}))):
# XXX: Deprecated; CBC is migrating off of ThePlatform
return {
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
'url': smuggle_url(
f'http://link.theplatform.com/s/ExhSPC/media/guid/2655402169/{media_id}?mbr=true&formats=MPEG4,FLV,MP3', {
'force_smil_url': True,
}),
'id': media_id,
'_format_sort_fields': ('res', 'proto'), # Prioritize direct http formats over HLS
}
is_live = traverse_obj(data, ('media', 'streamType', {str})) == 'Live'
formats, subtitles = [], {}
for sub in traverse_obj(data, ('media', 'textTracks', lambda _, v: url_or_none(v['src']))):
subtitles.setdefault(sub.get('language') or 'und', []).append({
'url': sub['src'],
'name': sub.get('label'),
})
for asset in assets:
asset_key = asset['key']
asset_type = asset['type']
if asset_type != 'medianet':
self.report_warning(f'Skipping unsupported asset type "{asset_type}": {asset_key}')
continue
asset_data = self._download_json(asset_key, video_id, f'Downloading {asset_type} JSON')
ext = mimetype2ext(self._parse_param(asset_data, 'contentType'))
if ext == 'm3u8':
fmts, subs = self._extract_m3u8_formats_and_subtitles(
asset_data['url'], video_id, 'mp4', m3u8_id='hls', live=is_live)
formats.extend(fmts)
# Avoid slow/error-prone webvtt-over-m3u8 if direct https vtt is available
if not subtitles:
self._merge_subtitles(subs, target=subtitles)
if is_live or not fmts:
continue
# Check for direct https mp4 format
best_video_fmt = traverse_obj(fmts, (
lambda _, v: v.get('vcodec') != 'none' and v['tbr'], all,
{functools.partial(sorted, key=lambda x: x['tbr'])}, -1, {dict})) or {}
base_url = self._search_regex(
r'(https?://[^?#]+?/)hdntl=', best_video_fmt.get('url'), 'base url', default=None)
if not base_url or '/live/' in base_url:
continue
mp4_url = base_url + replace_extension(url_basename(best_video_fmt['url']), 'mp4')
if self._request_webpage(
HEADRequest(mp4_url), video_id, 'Checking for https format',
errnote=False, fatal=False):
formats.append({
**best_video_fmt,
'url': mp4_url,
'format_id': 'https-mp4',
'protocol': 'https',
'manifest_url': None,
'acodec': None,
})
else:
formats.append({
'url': asset_data['url'],
'ext': ext,
'vcodec': 'none' if self._parse_param(asset_data, 'mediaType') == 'audio' else None,
})
chapters = traverse_obj(data, (
'media', 'chapters', lambda _, v: float(v['startTime']) is not None, {
'start_time': ('startTime', {functools.partial(float_or_none, scale=1000)}),
'end_time': ('endTime', {functools.partial(float_or_none, scale=1000)}),
'title': ('name', {str}),
}))
# Filter out pointless single chapters with start_time==0 and no end_time
if len(chapters) == 1 and not (chapters[0].get('start_time') or chapters[0].get('end_time')):
chapters = []
return { return {
'_type': 'url_transparent', **traverse_obj(data, {
'ie_key': 'ThePlatform', 'title': ('title', {str}),
'url': smuggle_url( 'description': ('description', {str.strip}),
f'http://link.theplatform.com/s/ExhSPC/media/guid/2655402169/{video_id}?mbr=true&formats=MPEG4,FLV,MP3', { 'thumbnail': ('image', 'url', {url_or_none}, {functools.partial(update_url, query=None)}),
'force_smil_url': True, 'timestamp': ('publishedAt', {functools.partial(float_or_none, scale=1000)}),
}), 'media_type': ('media', 'clipType', {str}),
'series': ('showName', {str}),
'season_number': ('media', 'season', {int_or_none}),
'duration': ('media', 'duration', {float_or_none}, {lambda x: None if is_live else x}),
'location': ('media', 'region', {str}),
'tags': ('tags', ..., 'name', {str}),
'genres': ('media', 'genre', all),
'categories': ('categories', ..., 'name', {str}),
}),
'id': video_id, 'id': video_id,
'_format_sort_fields': ('res', 'proto'), # Prioritize direct http formats over HLS 'formats': formats,
'subtitles': subtitles,
'chapters': chapters,
'is_live': is_live,
} }

View File

@@ -3150,7 +3150,7 @@ def _parse_ism_formats_and_subtitles(self, ism_doc, ism_url, ism_id=None):
}) })
return formats, subtitles return formats, subtitles
def _parse_html5_media_entries(self, base_url, webpage, video_id, m3u8_id=None, m3u8_entry_protocol='m3u8_native', mpd_id=None, preference=None, quality=None): def _parse_html5_media_entries(self, base_url, webpage, video_id, m3u8_id=None, m3u8_entry_protocol='m3u8_native', mpd_id=None, preference=None, quality=None, _headers=None):
def absolute_url(item_url): def absolute_url(item_url):
return urljoin(base_url, item_url) return urljoin(base_url, item_url)
@@ -3174,11 +3174,11 @@ def _media_formats(src, cur_media_type, type_info=None):
formats = self._extract_m3u8_formats( formats = self._extract_m3u8_formats(
full_url, video_id, ext='mp4', full_url, video_id, ext='mp4',
entry_protocol=m3u8_entry_protocol, m3u8_id=m3u8_id, entry_protocol=m3u8_entry_protocol, m3u8_id=m3u8_id,
preference=preference, quality=quality, fatal=False) preference=preference, quality=quality, fatal=False, headers=_headers)
elif ext == 'mpd': elif ext == 'mpd':
is_plain_url = False is_plain_url = False
formats = self._extract_mpd_formats( formats = self._extract_mpd_formats(
full_url, video_id, mpd_id=mpd_id, fatal=False) full_url, video_id, mpd_id=mpd_id, fatal=False, headers=_headers)
else: else:
is_plain_url = True is_plain_url = True
formats = [{ formats = [{
@@ -3272,6 +3272,8 @@ def _media_formats(src, cur_media_type, type_info=None):
}) })
for f in media_info['formats']: for f in media_info['formats']:
f.setdefault('http_headers', {})['Referer'] = base_url f.setdefault('http_headers', {})['Referer'] = base_url
if _headers:
f['http_headers'].update(_headers)
if media_info['formats'] or media_info['subtitles']: if media_info['formats'] or media_info['subtitles']:
entries.append(media_info) entries.append(media_info)
return entries return entries

View File

@@ -934,7 +934,7 @@ class TLCIE(DiscoveryPlusBaseIE):
class DiscoveryPlusIE(DiscoveryPlusBaseIE): class DiscoveryPlusIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:www\.)?discoveryplus\.com/(?!it/)(?:(?P<country>[a-z]{2})/)?video(?:/sport)?' + DPlayBaseIE._PATH_REGEX _VALID_URL = r'https?://(?:www\.)?discoveryplus\.com/(?!it/)(?:(?P<country>[a-z]{2})/)?video(?:/sport|/olympics)?' + DPlayBaseIE._PATH_REGEX
_TESTS = [{ _TESTS = [{
'url': 'https://www.discoveryplus.com/video/property-brothers-forever-home/food-and-family', 'url': 'https://www.discoveryplus.com/video/property-brothers-forever-home/food-and-family',
'info_dict': { 'info_dict': {
@@ -958,6 +958,9 @@ class DiscoveryPlusIE(DiscoveryPlusBaseIE):
}, { }, {
'url': 'https://www.discoveryplus.com/gb/video/sport/eurosport-1-british-eurosport-1-british-sport/6-hours-of-spa-review', 'url': 'https://www.discoveryplus.com/gb/video/sport/eurosport-1-british-eurosport-1-british-sport/6-hours-of-spa-review',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.discoveryplus.com/gb/video/olympics/dplus-sport-dplus-sport-sport/rugby-sevens-australia-samoa',
'only_matching': True,
}] }]
_PRODUCT = None _PRODUCT = None

View File

@@ -571,16 +571,21 @@ def process_formats(info):
# Formats larger than ~500MB will return error 403 unless chunk size is regulated # Formats larger than ~500MB will return error 403 unless chunk size is regulated
f.setdefault('downloader_options', {})['http_chunk_size'] = 250 << 20 f.setdefault('downloader_options', {})['http_chunk_size'] = 250 << 20
def extract_relay_data(_filter): def yield_all_relay_data(_filter):
return self._parse_json(self._search_regex( for relay_data in re.findall(rf'data-sjs>({{.*?{_filter}.*?}})</script>', webpage):
rf'data-sjs>({{.*?{_filter}.*?}})</script>', yield self._parse_json(relay_data, video_id, fatal=False) or {}
webpage, 'replay data', default='{}'), video_id, fatal=False) or {}
def extract_relay_prefetched_data(_filter): def extract_relay_data(_filter):
return traverse_obj(extract_relay_data(_filter), ( return next(filter(None, yield_all_relay_data(_filter)), {})
'require', (None, (..., ..., ..., '__bbox', 'require')),
def extract_relay_prefetched_data(_filter, target_keys=None):
path = 'data'
if target_keys is not None:
path = lambda k, v: k == 'data' and any(target in v for target in variadic(target_keys))
return traverse_obj(yield_all_relay_data(_filter), (
..., 'require', (None, (..., ..., ..., '__bbox', 'require')),
lambda _, v: any(key.startswith('RelayPrefetchedStreamCache') for key in v), lambda _, v: any(key.startswith('RelayPrefetchedStreamCache') for key in v),
..., ..., '__bbox', 'result', 'data', {dict}), get_all=False) or {} ..., ..., '__bbox', 'result', path, {dict}), get_all=False) or {}
if not video_data: if not video_data:
server_js_data = self._parse_json(self._search_regex([ server_js_data = self._parse_json(self._search_regex([
@@ -591,7 +596,8 @@ def extract_relay_prefetched_data(_filter):
if not video_data: if not video_data:
data = extract_relay_prefetched_data( data = extract_relay_prefetched_data(
r'"(?:dash_manifest|playable_url(?:_quality_hd)?)') r'"(?:dash_manifest|playable_url(?:_quality_hd)?)',
target_keys=('video', 'event', 'nodes', 'node', 'mediaset'))
if data: if data:
entries = [] entries = []

View File

@@ -1,9 +1,14 @@
import functools
from .common import InfoExtractor from .common import InfoExtractor
from ..networking import HEADRequest from ..networking import HEADRequest
from ..utils import ( from ..utils import (
UserNotLive, UserNotLive,
determine_ext,
float_or_none, float_or_none,
int_or_none,
merge_dicts, merge_dicts,
parse_iso8601,
str_or_none, str_or_none,
traverse_obj, traverse_obj,
unified_timestamp, unified_timestamp,
@@ -25,104 +30,192 @@ def _real_initialize(self):
def _call_api(self, path, display_id, note='Downloading API JSON', headers={}, **kwargs): def _call_api(self, path, display_id, note='Downloading API JSON', headers={}, **kwargs):
return self._download_json( return self._download_json(
f'https://kick.com/api/v1/{path}', display_id, note=note, f'https://kick.com/api/{path}', display_id, note=note,
headers=merge_dicts(headers, self._API_HEADERS), impersonate=True, **kwargs) headers=merge_dicts(headers, self._API_HEADERS), impersonate=True, **kwargs)
class KickIE(KickBaseIE): class KickIE(KickBaseIE):
IE_NAME = 'kick:live'
_VALID_URL = r'https?://(?:www\.)?kick\.com/(?!(?:video|categories|search|auth)(?:[/?#]|$))(?P<id>[\w-]+)' _VALID_URL = r'https?://(?:www\.)?kick\.com/(?!(?:video|categories|search|auth)(?:[/?#]|$))(?P<id>[\w-]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://kick.com/yuppy', 'url': 'https://kick.com/buddha',
'info_dict': { 'info_dict': {
'id': '6cde1-kickrp-joe-flemmingskick-info-heremust-knowmust-see21', 'id': '92722911-nopixel-40',
'ext': 'mp4', 'ext': 'mp4',
'title': str, 'title': str,
'description': str, 'description': str,
'channel': 'yuppy',
'channel_id': '33538',
'uploader': 'Yuppy',
'uploader_id': '33793',
'upload_date': str,
'live_status': 'is_live',
'timestamp': int, 'timestamp': int,
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:https?://.+\.jpg',
'categories': list, 'categories': list,
'upload_date': str,
'channel': 'buddha',
'channel_id': '32807',
'uploader': 'Buddha',
'uploader_id': '33057',
'live_status': 'is_live',
'concurrent_view_count': int,
'release_timestamp': int,
'age_limit': 18,
'release_date': str,
}, },
'skip': 'livestream', 'params': {'skip_download': 'livestream'},
# 'skip': 'livestream',
}, { }, {
'url': 'https://kick.com/kmack710', 'url': 'https://kick.com/xqc',
'only_matching': True, 'only_matching': True,
}] }]
@classmethod
def suitable(cls, url):
return False if KickClipIE.suitable(url) else super().suitable(url)
def _real_extract(self, url): def _real_extract(self, url):
channel = self._match_id(url) channel = self._match_id(url)
response = self._call_api(f'channels/{channel}', channel) response = self._call_api(f'v2/channels/{channel}', channel)
if not traverse_obj(response, 'livestream', expected_type=dict): if not traverse_obj(response, 'livestream', expected_type=dict):
raise UserNotLive(video_id=channel) raise UserNotLive(video_id=channel)
return { return {
'id': str(traverse_obj(
response, ('livestream', ('slug', 'id')), get_all=False, default=channel)),
'formats': self._extract_m3u8_formats(
response['playback_url'], channel, 'mp4', live=True),
'title': traverse_obj(
response, ('livestream', ('session_title', 'slug')), get_all=False, default=''),
'description': traverse_obj(response, ('user', 'bio')),
'channel': channel, 'channel': channel,
'channel_id': str_or_none(traverse_obj(response, 'id', ('livestream', 'channel_id'))),
'uploader': traverse_obj(response, 'name', ('user', 'username')),
'uploader_id': str_or_none(traverse_obj(response, 'user_id', ('user', 'id'))),
'is_live': True, 'is_live': True,
'timestamp': unified_timestamp(traverse_obj(response, ('livestream', 'created_at'))), 'formats': self._extract_m3u8_formats(response['playback_url'], channel, 'mp4', live=True),
'thumbnail': traverse_obj( **traverse_obj(response, {
response, ('livestream', 'thumbnail', 'url'), expected_type=url_or_none), 'id': ('livestream', 'slug', {str}),
'categories': traverse_obj(response, ('recent_categories', ..., 'name')), 'title': ('livestream', 'session_title', {str}),
'description': ('user', 'bio', {str}),
'channel_id': (('id', ('livestream', 'channel_id')), {int}, {str_or_none}, any),
'uploader': (('name', ('user', 'username')), {str}, any),
'uploader_id': (('user_id', ('user', 'id')), {int}, {str_or_none}, any),
'timestamp': ('livestream', 'created_at', {unified_timestamp}),
'release_timestamp': ('livestream', 'start_time', {unified_timestamp}),
'thumbnail': ('livestream', 'thumbnail', 'url', {url_or_none}),
'categories': ('recent_categories', ..., 'name', {str}),
'concurrent_view_count': ('livestream', 'viewer_count', {int_or_none}),
'age_limit': ('livestream', 'is_mature', {bool}, {lambda x: 18 if x else 0}),
}),
} }
class KickVODIE(KickBaseIE): class KickVODIE(KickBaseIE):
IE_NAME = 'kick:vod'
_VALID_URL = r'https?://(?:www\.)?kick\.com/video/(?P<id>[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12})' _VALID_URL = r'https?://(?:www\.)?kick\.com/video/(?P<id>[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12})'
_TESTS = [{ _TESTS = [{
'url': 'https://kick.com/video/58bac65b-e641-4476-a7ba-3707a35e60e3', 'url': 'https://kick.com/video/e74614f4-5270-4319-90ad-32179f19a45c',
'md5': '3870f94153e40e7121a6e46c068b70cb', 'md5': '3870f94153e40e7121a6e46c068b70cb',
'info_dict': { 'info_dict': {
'id': '58bac65b-e641-4476-a7ba-3707a35e60e3', 'id': 'e74614f4-5270-4319-90ad-32179f19a45c',
'ext': 'mp4', 'ext': 'mp4',
'title': '🤠REBIRTH IS BACK!!!!🤠!stake CODE JAREDFPS 🤠', 'title': r're:❎ MEGA DRAMA ❎ LIVE ❎ CLICK ❎ ULTIMATE SKILLS .+',
'description': 'md5:02b0c46f9b4197fb545ab09dddb85b1d', 'description': 'THE BEST AT ABSOLUTELY EVERYTHING. THE JUICER. LEADER OF THE JUICERS.',
'channel': 'jaredfps', 'channel': 'xqc',
'channel_id': '26608', 'channel_id': '668',
'uploader': 'JaredFPS', 'uploader': 'xQc',
'uploader_id': '26799', 'uploader_id': '676',
'upload_date': '20240402', 'upload_date': '20240724',
'timestamp': 1712097108, 'timestamp': 1721796562,
'duration': 33859.0, 'duration': 18566.0,
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*\.jpg',
'categories': ['Call of Duty: Warzone'], 'view_count': int,
'categories': ['VALORANT'],
'age_limit': 0,
}, },
'params': { 'params': {'skip_download': 'm3u8'},
'skip_download': 'm3u8',
},
'expected_warnings': [r'impersonation'],
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
response = self._call_api(f'video/{video_id}', video_id) response = self._call_api(f'v1/video/{video_id}', video_id)
return { return {
'id': video_id, 'id': video_id,
'formats': self._extract_m3u8_formats(response['source'], video_id, 'mp4'), 'formats': self._extract_m3u8_formats(response['source'], video_id, 'mp4'),
'title': traverse_obj( **traverse_obj(response, {
response, ('livestream', ('session_title', 'slug')), get_all=False, default=''), 'title': ('livestream', ('session_title', 'slug'), {str}, any),
'description': traverse_obj(response, ('livestream', 'channel', 'user', 'bio')), 'description': ('livestream', 'channel', 'user', 'bio', {str}),
'channel': traverse_obj(response, ('livestream', 'channel', 'slug')), 'channel': ('livestream', 'channel', 'slug', {str}),
'channel_id': str_or_none(traverse_obj(response, ('livestream', 'channel', 'id'))), 'channel_id': ('livestream', 'channel', 'id', {int}, {str_or_none}),
'uploader': traverse_obj(response, ('livestream', 'channel', 'user', 'username')), 'uploader': ('livestream', 'channel', 'user', 'username', {str}),
'uploader_id': str_or_none(traverse_obj(response, ('livestream', 'channel', 'user_id'))), 'uploader_id': ('livestream', 'channel', 'user_id', {int}, {str_or_none}),
'timestamp': unified_timestamp(response.get('created_at')), 'timestamp': ('created_at', {parse_iso8601}),
'duration': float_or_none(traverse_obj(response, ('livestream', 'duration')), scale=1000), 'duration': ('livestream', 'duration', {functools.partial(float_or_none, scale=1000)}),
'thumbnail': traverse_obj( 'thumbnail': ('livestream', 'thumbnail', {url_or_none}),
response, ('livestream', 'thumbnail'), expected_type=url_or_none), 'categories': ('livestream', 'categories', ..., 'name', {str}),
'categories': traverse_obj(response, ('livestream', 'categories', ..., 'name')), 'view_count': ('views', {int_or_none}),
'age_limit': ('livestream', 'is_mature', {bool}, {lambda x: 18 if x else 0}),
}),
}
class KickClipIE(KickBaseIE):
IE_NAME = 'kick:clips'
_VALID_URL = r'https?://(?:www\.)?kick\.com/[\w-]+/?\?(?:[^#]+&)?clip=(?P<id>clip_[\w-]+)'
_TESTS = [{
'url': 'https://kick.com/mxddy?clip=clip_01GYXVB5Y8PWAPWCWMSBCFB05X',
'info_dict': {
'id': 'clip_01GYXVB5Y8PWAPWCWMSBCFB05X',
'ext': 'mp4',
'title': 'Maddy detains Abd D:',
'channel': 'mxddy',
'channel_id': '133789',
'uploader': 'AbdCreates',
'uploader_id': '3309077',
'thumbnail': r're:^https?://.*\.jpeg',
'duration': 35,
'timestamp': 1682481453,
'upload_date': '20230426',
'view_count': int,
'like_count': int,
'categories': ['VALORANT'],
'age_limit': 18,
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://kick.com/destiny?clip=clip_01H9SKET879NE7N9RJRRDS98J3',
'info_dict': {
'id': 'clip_01H9SKET879NE7N9RJRRDS98J3',
'title': 'W jews',
'ext': 'mp4',
'channel': 'destiny',
'channel_id': '1772249',
'uploader': 'punished_furry',
'uploader_id': '2027722',
'duration': 49.0,
'upload_date': '20230908',
'timestamp': 1694150180,
'thumbnail': 'https://clips.kick.com/clips/j3/clip_01H9SKET879NE7N9RJRRDS98J3/thumbnail.png',
'view_count': int,
'like_count': int,
'categories': ['Just Chatting'],
'age_limit': 0,
},
'params': {'skip_download': 'm3u8'},
}]
def _real_extract(self, url):
clip_id = self._match_id(url)
clip = self._call_api(f'v2/clips/{clip_id}/play', clip_id)['clip']
clip_url = clip['clip_url']
if determine_ext(clip_url) == 'm3u8':
formats = self._extract_m3u8_formats(clip_url, clip_id, 'mp4')
else:
formats = [{'url': clip_url}]
return {
'id': clip_id,
'formats': formats,
**traverse_obj(clip, {
'title': ('title', {str}),
'channel': ('channel', 'slug', {str}),
'channel_id': ('channel', 'id', {int}, {str_or_none}),
'uploader': ('creator', 'username', {str}),
'uploader_id': ('creator', 'id', {int}, {str_or_none}),
'thumbnail': ('thumbnail_url', {url_or_none}),
'duration': ('duration', {float_or_none}),
'categories': ('category', 'name', {str}, all),
'timestamp': ('created_at', {parse_iso8601}),
'view_count': ('views', {int_or_none}),
'like_count': ('likes', {int_or_none}),
'age_limit': ('is_mature', {bool}, {lambda x: 18 if x else 0}),
}),
} }

View File

@@ -0,0 +1,78 @@
import functools
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
clean_html,
extract_attributes,
get_element_by_class,
get_element_html_by_id,
join_nonempty,
parse_duration,
unified_timestamp,
)
from ..utils.traversal import traverse_obj
class LearningOnScreenIE(InfoExtractor):
_VALID_URL = r'https?://learningonscreen\.ac\.uk/ondemand/index\.php/prog/(?P<id>\w+)'
_TESTS = [{
'url': 'https://learningonscreen.ac.uk/ondemand/index.php/prog/005D81B2?bcast=22757013',
'info_dict': {
'id': '005D81B2',
'ext': 'mp4',
'title': 'Planet Earth',
'duration': 3600.0,
'timestamp': 1164567600.0,
'upload_date': '20061126',
'thumbnail': 'https://stream.learningonscreen.ac.uk/trilt-cover-images/005D81B2-Planet-Earth-2006-11-26T190000Z-BBC4.jpg',
},
}]
def _real_initialize(self):
if not self._get_cookies('https://learningonscreen.ac.uk/').get('PHPSESSID-BOB-LIVE'):
self.raise_login_required(
'Use --cookies for authentication. See '
' https://github.com/yt-dlp/yt-dlp/wiki/FAQ#how-do-i-pass-cookies-to-yt-dlp '
'for how to manually pass cookies', method=None)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
details = traverse_obj(webpage, (
{functools.partial(get_element_html_by_id, 'programme-details')}, {
'title': ({functools.partial(re.search, r'<h2>([^<]+)</h2>')}, 1, {clean_html}),
'timestamp': (
{functools.partial(get_element_by_class, 'broadcast-date')},
{functools.partial(re.match, r'([^<]+)')}, 1, {unified_timestamp}),
'duration': (
{functools.partial(get_element_by_class, 'prog-running-time')},
{clean_html}, {parse_duration}),
}))
title = details.pop('title', None) or traverse_obj(webpage, (
{functools.partial(get_element_html_by_id, 'add-to-existing-playlist')},
{extract_attributes}, 'data-record-title', {clean_html}))
entries = self._parse_html5_media_entries(
'https://stream.learningonscreen.ac.uk', webpage, video_id, m3u8_id='hls', mpd_id='dash',
_headers={'Origin': 'https://learningonscreen.ac.uk', 'Referer': 'https://learningonscreen.ac.uk/'})
if not entries:
raise ExtractorError('No video found')
if len(entries) > 1:
duration = details.pop('duration', None)
for idx, entry in enumerate(entries, start=1):
entry.update(details)
entry['id'] = join_nonempty(video_id, idx)
entry['title'] = join_nonempty(title, idx)
return self.playlist_result(entries, video_id, title, duration=duration)
return {
**entries[0],
**details,
'id': video_id,
'title': title,
}

View File

@@ -133,7 +133,9 @@ def _real_extract(self, url):
r'<p+\b[^>]+\bclass="article_date">([^<]+)<', webpage, 'upload date', default=None)) r'<p+\b[^>]+\bclass="article_date">([^<]+)<', webpage, 'upload date', default=None))
player_data['video'] = player_data.pop('token') player_data['video'] = player_data.pop('token')
player_page = self._download_webpage('https://player.mediaklikk.hu/playernew/player.php', video_id, query=player_data) player_page = self._download_webpage(
'https://player.mediaklikk.hu/playernew/player.php', video_id,
query=player_data, headers={'Referer': url})
player_json = self._search_json( player_json = self._search_json(
r'\bpl\.setup\s*\(', player_page, 'player json', video_id, end_pattern=r'\);') r'\bpl\.setup\s*\(', player_page, 'player json', video_id, end_pattern=r'\);')
playlist_url = traverse_obj( playlist_url = traverse_obj(

View File

@@ -1,16 +1,21 @@
import json
import re import re
import urllib.parse import time
import uuid import uuid
from .common import InfoExtractor from .common import InfoExtractor
from ..networking.exceptions import HTTPError
from ..utils import ( from ..utils import (
ExtractorError,
determine_ext, determine_ext,
int_or_none, int_or_none,
join_nonempty, join_nonempty,
jwt_decode_hs256,
parse_duration, parse_duration,
parse_iso8601, parse_iso8601,
try_get, try_get,
url_or_none, url_or_none,
urlencode_postdata,
) )
from ..utils.traversal import traverse_obj from ..utils.traversal import traverse_obj
@@ -276,81 +281,225 @@ def _download_video_data(self, display_id):
class MLBTVIE(InfoExtractor): class MLBTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?mlb\.com/tv/g(?P<id>\d{6})' _VALID_URL = r'https?://(?:www\.)?mlb\.com/tv/g(?P<id>\d{6})'
_NETRC_MACHINE = 'mlb' _NETRC_MACHINE = 'mlb'
_TESTS = [{ _TESTS = [{
'url': 'https://www.mlb.com/tv/g661581/vee2eff5f-a7df-4c20-bdb4-7b926fa12638', 'url': 'https://www.mlb.com/tv/g661581/vee2eff5f-a7df-4c20-bdb4-7b926fa12638',
'info_dict': { 'info_dict': {
'id': '661581', 'id': '661581',
'ext': 'mp4', 'ext': 'mp4',
'title': '2022-07-02 - St. Louis Cardinals @ Philadelphia Phillies', 'title': '2022-07-02 - St. Louis Cardinals @ Philadelphia Phillies',
'release_date': '20220702',
'release_timestamp': 1656792300,
}, },
'params': { 'params': {'skip_download': 'm3u8'},
'skip_download': True, }, {
# makeup game: has multiple dates, need to avoid games with 'rescheduleDate'
'url': 'https://www.mlb.com/tv/g747039/vd22541c4-5a29-45f7-822b-635ec041cf5e',
'info_dict': {
'id': '747039',
'ext': 'mp4',
'title': '2024-07-29 - Toronto Blue Jays @ Baltimore Orioles',
'release_date': '20240729',
'release_timestamp': 1722280200,
}, },
'params': {'skip_download': 'm3u8'},
}] }]
_GRAPHQL_INIT_QUERY = '''\
mutation initSession($device: InitSessionInput!, $clientType: ClientType!, $experience: ExperienceTypeInput) {
initSession(device: $device, clientType: $clientType, experience: $experience) {
deviceId
sessionId
entitlements {
code
}
location {
countryCode
regionName
zipCode
latitude
longitude
}
clientExperience
features
}
}'''
_GRAPHQL_PLAYBACK_QUERY = '''\
mutation initPlaybackSession(
$adCapabilities: [AdExperienceType]
$mediaId: String!
$deviceId: String!
$sessionId: String!
$quality: PlaybackQuality
) {
initPlaybackSession(
adCapabilities: $adCapabilities
mediaId: $mediaId
deviceId: $deviceId
sessionId: $sessionId
quality: $quality
) {
playbackSessionId
playback {
url
token
expiration
cdn
}
}
}'''
_APP_VERSION = '7.8.2'
_device_id = None
_session_id = None
_access_token = None _access_token = None
_token_expiry = 0
@property
def _api_headers(self):
if (self._token_expiry - 120) <= time.time():
self.write_debug('Access token has expired; re-logging in')
self._perform_login(*self._get_login_info())
return {'Authorization': f'Bearer {self._access_token}'}
def _real_initialize(self): def _real_initialize(self):
if not self._access_token: if not self._access_token:
self.raise_login_required( self.raise_login_required(
'All videos are only available to registered users', method='password') 'All videos are only available to registered users', method='password')
def _set_device_id(self, username):
if not self._device_id:
self._device_id = self.cache.load(
self._NETRC_MACHINE, 'device_ids', default={}).get(username)
if self._device_id:
return
self._device_id = str(uuid.uuid4())
self.cache.store(self._NETRC_MACHINE, 'device_ids', {username: self._device_id})
def _perform_login(self, username, password): def _perform_login(self, username, password):
data = f'grant_type=password&username={urllib.parse.quote(username)}&password={urllib.parse.quote(password)}&scope=openid offline_access&client_id=0oa3e1nutA1HLzAKG356' try:
access_token = self._download_json( self._access_token = self._download_json(
'https://ids.mlb.com/oauth2/aus1m088yK07noBfh356/v1/token', None, 'https://ids.mlb.com/oauth2/aus1m088yK07noBfh356/v1/token', None,
headers={ 'Logging in', 'Unable to log in', headers={
'User-Agent': 'okhttp/3.12.1', 'User-Agent': 'okhttp/3.12.1',
'Content-Type': 'application/x-www-form-urlencoded', 'Content-Type': 'application/x-www-form-urlencoded',
}, data=data.encode())['access_token'] }, data=urlencode_postdata({
'grant_type': 'password',
'username': username,
'password': password,
'scope': 'openid offline_access',
'client_id': '0oa3e1nutA1HLzAKG356',
}))['access_token']
except ExtractorError as error:
if isinstance(error.cause, HTTPError) and error.cause.status == 400:
raise ExtractorError('Invalid username or password', expected=True)
raise
entitlement = self._download_webpage( self._token_expiry = traverse_obj(self._access_token, ({jwt_decode_hs256}, 'exp', {int})) or 0
f'https://media-entitlement.mlb.com/api/v3/jwt?os=Android&appname=AtBat&did={uuid.uuid4()}', None, self._set_device_id(username)
headers={
'User-Agent': 'okhttp/3.12.1',
'Authorization': f'Bearer {access_token}',
})
data = f'grant_type=urn:ietf:params:oauth:grant-type:token-exchange&subject_token={entitlement}&subject_token_type=urn:ietf:params:oauth:token-type:jwt&platform=android-tv' self._session_id = self._call_api({
self._access_token = self._download_json( 'operationName': 'initSession',
'https://us.edge.bamgrid.com/token', None, 'query': self._GRAPHQL_INIT_QUERY,
'variables': {
'device': {
'appVersion': self._APP_VERSION,
'deviceFamily': 'desktop',
'knownDeviceId': self._device_id,
'languagePreference': 'ENGLISH',
'manufacturer': '',
'model': '',
'os': '',
'osVersion': '',
},
'clientType': 'WEB',
},
}, None, 'session ID')['data']['initSession']['sessionId']
def _call_api(self, data, video_id, description='GraphQL JSON', fatal=True):
return self._download_json(
'https://media-gateway.mlb.com/graphql', video_id,
f'Downloading {description}', f'Unable to download {description}', fatal=fatal,
headers={ headers={
**self._api_headers,
'Accept': 'application/json', 'Accept': 'application/json',
'Authorization': 'Bearer bWxidHYmYW5kcm9pZCYxLjAuMA.6LZMbH2r--rbXcgEabaDdIslpo4RyZrlVfWZhsAgXIk', 'Content-Type': 'application/json',
'Content-Type': 'application/x-www-form-urlencoded', 'x-client-name': 'WEB',
}, data=data.encode())['access_token'] 'x-client-version': self._APP_VERSION,
}, data=json.dumps(data, separators=(',', ':')).encode())
def _extract_formats_and_subtitles(self, broadcast, video_id):
feed = traverse_obj(broadcast, ('homeAway', {str.title}))
medium = traverse_obj(broadcast, ('type', {str}))
language = traverse_obj(broadcast, ('language', {str.lower}))
format_id = join_nonempty(feed, medium, language)
response = self._call_api({
'operationName': 'initPlaybackSession',
'query': self._GRAPHQL_PLAYBACK_QUERY,
'variables': {
'adCapabilities': ['GOOGLE_STANDALONE_AD_PODS'],
'deviceId': self._device_id,
'mediaId': broadcast['mediaId'],
'quality': 'PLACEHOLDER',
'sessionId': self._session_id,
},
}, video_id, f'{format_id} broadcast JSON', fatal=False)
playback = traverse_obj(response, ('data', 'initPlaybackSession', 'playback', {dict}))
m3u8_url = traverse_obj(playback, ('url', {url_or_none}))
token = traverse_obj(playback, ('token', {str}))
if not (m3u8_url and token):
errors = '; '.join(traverse_obj(response, ('errors', ..., 'message', {str})))
if 'not entitled' in errors:
raise ExtractorError(errors, expected=True)
elif errors: # Only warn when 'blacked out' since radio formats are available
self.report_warning(f'API returned errors for {format_id}: {errors}')
else:
self.report_warning(f'No formats available for {format_id} broadcast; skipping')
return [], {}
cdn_headers = {'x-cdn-token': token}
fmts, subs = self._extract_m3u8_formats_and_subtitles(
m3u8_url.replace(f'/{token}/', '/'), video_id, 'mp4',
m3u8_id=format_id, fatal=False, headers=cdn_headers)
for fmt in fmts:
fmt['http_headers'] = cdn_headers
fmt.setdefault('format_note', join_nonempty(feed, medium, delim=' '))
fmt.setdefault('language', language)
if fmt.get('vcodec') == 'none' and fmt['language'] == 'en':
fmt['source_preference'] = 10
return fmts, subs
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
airings = self._download_json( data = self._download_json(
f'https://search-api-mlbtv.mlb.com/svc/search/v2/graphql/persisted/query/core/Airings?variables=%7B%22partnerProgramIds%22%3A%5B%22{video_id}%22%5D%2C%22applyEsniMediaRightsLabels%22%3Atrue%7D', 'https://statsapi.mlb.com/api/v1/schedule', video_id, query={
video_id)['data']['Airings'] 'gamePk': video_id,
'hydrate': 'broadcasts(all),statusFlags',
})
metadata = traverse_obj(data, (
'dates', ..., 'games',
lambda _, v: str(v['gamePk']) == video_id and not v.get('rescheduleDate'), any))
broadcasts = traverse_obj(metadata, (
'broadcasts', lambda _, v: v['mediaId'] and v['mediaState']['mediaStateCode'] != 'MEDIA_OFF'))
formats, subtitles = [], {} formats, subtitles = [], {}
for airing in traverse_obj(airings, lambda _, v: v['playbackUrls'][0]['href']): for broadcast in broadcasts:
format_id = join_nonempty('feedType', 'feedLanguage', from_dict=airing) fmts, subs = self._extract_formats_and_subtitles(broadcast, video_id)
m3u8_url = traverse_obj(self._download_json( formats.extend(fmts)
airing['playbackUrls'][0]['href'].format(scenario='browser~csai'), video_id, self._merge_subtitles(subs, target=subtitles)
note=f'Downloading {format_id} stream info JSON',
errnote=f'Failed to download {format_id} stream info, skipping',
fatal=False, headers={
'Authorization': self._access_token,
'Accept': 'application/vnd.media-service+json; version=2',
}), ('stream', 'complete', {url_or_none}))
if not m3u8_url:
continue
f, s = self._extract_m3u8_formats_and_subtitles(
m3u8_url, video_id, 'mp4', m3u8_id=format_id, fatal=False)
formats.extend(f)
self._merge_subtitles(s, target=subtitles)
return { return {
'id': video_id, 'id': video_id,
'title': traverse_obj(airings, (..., 'titles', 0, 'episodeName'), get_all=False), 'title': join_nonempty(
'is_live': traverse_obj(airings, (..., 'mediaConfig', 'productType'), get_all=False) == 'LIVE', traverse_obj(metadata, ('officialDate', {str})),
traverse_obj(metadata, ('teams', ('away', 'home'), 'team', 'name', {str}, all, {' @ '.join})),
delim=' - '),
'is_live': traverse_obj(broadcasts, (..., 'mediaState', 'mediaStateCode', {str}, any)) == 'MEDIA_ON',
'release_timestamp': traverse_obj(metadata, ('gameDate', {parse_iso8601})),
'formats': formats, 'formats': formats,
'subtitles': subtitles, 'subtitles': subtitles,
'http_headers': {'Authorization': f'Bearer {self._access_token}'},
} }

View File

@@ -1,9 +1,17 @@
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import int_or_none, try_get from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
int_or_none,
parse_iso8601,
try_get,
url_or_none,
)
from ..utils.traversal import traverse_obj
class OlympicsReplayIE(InfoExtractor): class OlympicsReplayIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?olympics\.com(?:/tokyo-2020)?/[a-z]{2}/(?:replay|video)/(?P<id>[^/#&?]+)' _VALID_URL = r'https?://(?:www\.)?olympics\.com/[a-z]{2}/(?:paris-2024/)?(?:replay|videos?|original-series/episode)/(?P<id>[\w-]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://olympics.com/fr/video/men-s-109kg-group-a-weightlifting-tokyo-2020-replays', 'url': 'https://olympics.com/fr/video/men-s-109kg-group-a-weightlifting-tokyo-2020-replays',
'info_dict': { 'info_dict': {
@@ -11,26 +19,98 @@ class OlympicsReplayIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': '+109kg (H) Groupe A - Haltérophilie | Replay de Tokyo 2020', 'title': '+109kg (H) Groupe A - Haltérophilie | Replay de Tokyo 2020',
'upload_date': '20210801', 'upload_date': '20210801',
'timestamp': 1627783200, 'timestamp': 1627797600,
'description': 'md5:c66af4a5bc7429dbcc43d15845ff03b3', 'description': 'md5:c66af4a5bc7429dbcc43d15845ff03b3',
'uploader': 'International Olympic Committee', 'thumbnail': 'https://img.olympics.com/images/image/private/t_1-1_1280/primary/nua4o7zwyaznoaejpbk2',
'duration': 7017.0,
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
}, { }, {
'url': 'https://olympics.com/tokyo-2020/en/replay/bd242924-4b22-49a5-a846-f1d4c809250d/mens-bronze-medal-match-hun-esp', 'url': 'https://olympics.com/en/original-series/episode/b-boys-and-b-girls-take-the-spotlight-breaking-life-road-to-paris-2024',
'only_matching': True, 'info_dict': {
'id': '32633650-c5ee-4280-8b94-fb6defb6a9b5',
'ext': 'mp4',
'title': 'B-girl Nicka - Breaking Life, Road to Paris 2024 | Episode 1',
'upload_date': '20240517',
'timestamp': 1715948200,
'description': 'md5:f63d728a41270ec628f6ac33ce471bb1',
'thumbnail': 'https://img.olympics.com/images/image/private/t_1-1_1280/primary/a3j96l7j6so3vyfijby1',
'duration': 1321.0,
},
}, {
'url': 'https://olympics.com/en/paris-2024/videos/men-s-preliminaries-gbr-esp-ned-rsa-hockey-olympic-games-paris-2024',
'info_dict': {
'id': '3d96db23-8eee-4b7c-8ef5-488a0361026c',
'ext': 'mp4',
'title': 'Men\'s Preliminaries GBR-ESP & NED-RSA | Hockey | Olympic Games Paris 2024',
'upload_date': '20240727',
'timestamp': 1722066600,
},
'skip': 'Geo-restricted to RU, BR, BT, NP, TM, BD, TL',
}, {
'url': 'https://olympics.com/en/paris-2024/videos/dnp-suni-lee-i-have-goals-and-i-have-expectations-for-myself-but-i-also-am-trying-to-give-myself-grace',
'info_dict': {
'id': 'a42f37ab-8a74-41d0-a7d9-af27b7b02a90',
'ext': 'mp4',
'title': 'md5:c7cfbc9918636a98e66400a812e4d407',
'upload_date': '20240729',
'timestamp': 1722288600,
},
}] }]
_GEO_BYPASS = False
def _extract_from_nextjs_data(self, webpage, video_id):
data = traverse_obj(self._search_nextjs_data(webpage, video_id, default={}), (
'props', 'pageProps', 'page', 'items',
lambda _, v: v['name'] == 'videoPlaylist', 'data', 'currentVideo', {dict}, any))
if not data:
return None
geo_countries = traverse_obj(data, ('countries', ..., {str}))
if traverse_obj(data, ('geoRestrictedVideo', {bool})):
self.raise_geo_restricted(countries=geo_countries)
is_live = traverse_obj(data, ('streamingStatus', {str})) == 'LIVE'
m3u8_url = traverse_obj(data, ('videoUrl', {url_or_none})) or data['streamUrl']
tokenized_url = m3u8_url if is_live else self._tokenize_url(m3u8_url, video_id)
try:
formats, subtitles = self._extract_m3u8_formats_and_subtitles(
tokenized_url, video_id, 'mp4', m3u8_id='hls')
except ExtractorError as e:
if isinstance(e.cause, HTTPError) and 'georestricted' in e.cause.msg:
self.raise_geo_restricted(countries=geo_countries)
raise
return {
'formats': formats,
'subtitles': subtitles,
'is_live': is_live,
**traverse_obj(data, {
'id': ('videoID', {str}),
'title': ('title', {str}),
'timestamp': ('contentDate', {parse_iso8601}),
}),
}
def _tokenize_url(self, url, video_id):
return self._download_json(
'https://olympics.com/tokenGenerator', video_id,
'Downloading tokenized m3u8 url', query={'url': url})
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
if info := self._extract_from_nextjs_data(webpage, video_id):
return info
title = self._html_search_meta(('title', 'og:title', 'twitter:title'), webpage) title = self._html_search_meta(('title', 'og:title', 'twitter:title'), webpage)
uuid = self._html_search_meta('episode_uid', webpage) video_uuid = self._html_search_meta('episode_uid', webpage)
m3u8_url = self._html_search_meta('video_url', webpage) m3u8_url = self._html_search_meta('video_url', webpage)
json_ld = self._search_json_ld(webpage, uuid) json_ld = self._search_json_ld(webpage, video_uuid)
thumbnails_list = json_ld.get('image') thumbnails_list = json_ld.get('image')
if not thumbnails_list: if not thumbnails_list:
thumbnails_list = self._html_search_regex( thumbnails_list = self._html_search_regex(
@@ -48,12 +128,12 @@ def _real_extract(self, url):
'width': width, 'width': width,
'height': int_or_none(try_get(width, lambda x: x * height_a / width_a)), 'height': int_or_none(try_get(width, lambda x: x * height_a / width_a)),
}) })
m3u8_url = self._download_json(
f'https://olympics.com/tokenGenerator?url={m3u8_url}', uuid, note='Downloading m3u8 url') formats, subtitles = self._extract_m3u8_formats_and_subtitles(
formats, subtitles = self._extract_m3u8_formats_and_subtitles(m3u8_url, uuid, 'mp4', m3u8_id='hls') self._tokenize_url(m3u8_url, video_uuid), video_uuid, 'mp4', m3u8_id='hls')
return { return {
'id': uuid, 'id': video_uuid,
'title': title, 'title': title,
'thumbnails': thumbnails, 'thumbnails': thumbnails,
'formats': formats, 'formats': formats,

View File

@@ -23,7 +23,6 @@
mimetype2ext, mimetype2ext,
parse_qs, parse_qs,
qualities, qualities,
remove_start,
srt_subtitles_timecode, srt_subtitles_timecode,
str_or_none, str_or_none,
traverse_obj, traverse_obj,
@@ -254,7 +253,16 @@ def _extract_web_data_and_status(self, url, video_id, fatal=True):
def _get_subtitles(self, aweme_detail, aweme_id, user_name): def _get_subtitles(self, aweme_detail, aweme_id, user_name):
# TODO: Extract text positioning info # TODO: Extract text positioning info
EXT_MAP = { # From lowest to highest preference
'creator_caption': 'json',
'srt': 'srt',
'webvtt': 'vtt',
}
preference = qualities(tuple(EXT_MAP.values()))
subtitles = {} subtitles = {}
# aweme/detail endpoint subs # aweme/detail endpoint subs
captions_info = traverse_obj( captions_info = traverse_obj(
aweme_detail, ('interaction_stickers', ..., 'auto_video_caption_info', 'auto_captions', ...), expected_type=dict) aweme_detail, ('interaction_stickers', ..., 'auto_video_caption_info', 'auto_captions', ...), expected_type=dict)
@@ -278,8 +286,8 @@ def _get_subtitles(self, aweme_detail, aweme_id, user_name):
if not caption.get('url'): if not caption.get('url'):
continue continue
subtitles.setdefault(caption.get('lang') or 'en', []).append({ subtitles.setdefault(caption.get('lang') or 'en', []).append({
'ext': remove_start(caption.get('caption_format'), 'web'),
'url': caption['url'], 'url': caption['url'],
'ext': EXT_MAP.get(caption.get('Format')),
}) })
# webpage subs # webpage subs
if not subtitles: if not subtitles:
@@ -288,9 +296,14 @@ def _get_subtitles(self, aweme_detail, aweme_id, user_name):
self._create_url(user_name, aweme_id), aweme_id, fatal=False) self._create_url(user_name, aweme_id), aweme_id, fatal=False)
for caption in traverse_obj(aweme_detail, ('video', 'subtitleInfos', lambda _, v: v['Url'])): for caption in traverse_obj(aweme_detail, ('video', 'subtitleInfos', lambda _, v: v['Url'])):
subtitles.setdefault(caption.get('LanguageCodeName') or 'en', []).append({ subtitles.setdefault(caption.get('LanguageCodeName') or 'en', []).append({
'ext': remove_start(caption.get('Format'), 'web'),
'url': caption['Url'], 'url': caption['Url'],
'ext': EXT_MAP.get(caption.get('Format')),
}) })
# Deprioritize creator_caption json since it can't be embedded or used by media players
for lang, subs_list in subtitles.items():
subtitles[lang] = sorted(subs_list, key=lambda x: preference(x['ext']))
return subtitles return subtitles
def _parse_url_key(self, url_key): def _parse_url_key(self, url_key):

View File

@@ -1,60 +1,29 @@
import functools import functools
import re import re
from .brightcove import BrightcoveNewIE
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import float_or_none, int_or_none, smuggle_url, strip_or_none from ..utils import float_or_none, int_or_none, smuggle_url, strip_or_none
from ..utils.traversal import traverse_obj from ..utils.traversal import traverse_obj
class TVAIE(InfoExtractor): class TVAIE(InfoExtractor):
_VALID_URL = r'https?://videos?\.tva\.ca/details/_(?P<id>\d+)' IE_NAME = 'tvaplus'
IE_DESC = 'TVA+'
_VALID_URL = r'https?://(?:www\.)?tvaplus\.ca/(?:[^/?#]+/)*[\w-]+-(?P<id>\d+)(?:$|[#?])'
_TESTS = [{ _TESTS = [{
'url': 'https://videos.tva.ca/details/_5596811470001', 'url': 'https://www.tvaplus.ca/tva/alerte-amber/saison-1/episode-01-1000036619',
'info_dict': {
'id': '5596811470001',
'ext': 'mp4',
'title': 'Un extrait de l\'épisode du dimanche 8 octobre 2017 !',
'uploader_id': '5481942443001',
'upload_date': '20171003',
'timestamp': 1507064617,
},
'params': {
# m3u8 download
'skip_download': True,
},
'skip': 'HTTP Error 404: Not Found',
}, {
'url': 'https://video.tva.ca/details/_5596811470001',
'only_matching': True,
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/5481942443001/default_default/index.html?videoId=%s'
def _real_extract(self, url):
video_id = self._match_id(url)
return {
'_type': 'url_transparent',
'id': video_id,
'url': smuggle_url(self.BRIGHTCOVE_URL_TEMPLATE % video_id, {'geo_countries': ['CA']}),
'ie_key': 'BrightcoveNew',
}
class QubIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?qub\.ca/(?:[^/]+/)*[0-9a-z-]+-(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.qub.ca/tvaplus/tva/alerte-amber/saison-1/episode-01-1000036619',
'md5': '949490fd0e7aee11d0543777611fbd53', 'md5': '949490fd0e7aee11d0543777611fbd53',
'info_dict': { 'info_dict': {
'id': '6084352463001', 'id': '6084352463001',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Ép 01. Mon dernier jour', 'title': 'Mon dernier jour',
'uploader_id': '5481942443001', 'uploader_id': '5481942443001',
'upload_date': '20190907', 'upload_date': '20190907',
'timestamp': 1567899756, 'timestamp': 1567899756,
'description': 'md5:9c0d7fbb90939420c651fd977df90145', 'description': 'md5:9c0d7fbb90939420c651fd977df90145',
'thumbnail': r're:https://.+\.jpg', 'thumbnail': r're:https://.+\.jpg',
'episode': 'Ép 01. Mon dernier jour', 'episode': 'Mon dernier jour',
'episode_number': 1, 'episode_number': 1,
'tags': ['alerte amber', 'alerte amber saison 1', 'surdemande'], 'tags': ['alerte amber', 'alerte amber saison 1', 'surdemande'],
'duration': 2625.963, 'duration': 2625.963,
@@ -64,23 +33,36 @@ class QubIE(InfoExtractor):
'channel': 'TVA', 'channel': 'TVA',
}, },
}, { }, {
'url': 'https://www.qub.ca/tele/video/lcn-ca-vous-regarde-rev-30s-ap369664-1009357943', 'url': 'https://www.tvaplus.ca/tva/le-baiser-du-barbu/le-baiser-du-barbu-886644190',
'only_matching': True, 'info_dict': {
'id': '6354448043112',
'ext': 'mp4',
'title': 'Le Baiser du barbu',
'uploader_id': '5481942443001',
'upload_date': '20240606',
'timestamp': 1717694023,
'description': 'md5:025b1219086c1cbf4bc27e4e034e8b57',
'thumbnail': r're:https://.+\.jpg',
'episode': 'Le Baiser du barbu',
'tags': ['fullepisode', 'films'],
'duration': 6053.504,
'series': 'Le Baiser du barbu',
'channel': 'TVA',
},
}] }]
# reference_id also works with old account_id(5481942443001) _BC_URL_TMPL = 'https://players.brightcove.net/5481942443001/default_default/index.html?videoId={}'
# BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/5813221784001/default_default/index.html?videoId=ref:%s'
def _real_extract(self, url): def _real_extract(self, url):
entity_id = self._match_id(url) entity_id = self._match_id(url)
webpage = self._download_webpage(url, entity_id) webpage = self._download_webpage(url, entity_id)
entity = self._search_nextjs_data(webpage, entity_id)['props']['initialProps']['pageProps']['fallbackData'] entity = self._search_nextjs_data(webpage, entity_id)['props']['pageProps']['staticEntity']
video_id = entity['videoId'] video_id = entity['videoId']
episode = strip_or_none(entity.get('name')) episode = strip_or_none(entity.get('name'))
return { return {
'_type': 'url_transparent', '_type': 'url_transparent',
'url': f'https://videos.tva.ca/details/_{video_id}', 'url': smuggle_url(self._BC_URL_TMPL.format(video_id), {'geo_countries': ['CA']}),
'ie_key': TVAIE.ie_key(), 'ie_key': BrightcoveNewIE.ie_key(),
'id': video_id, 'id': video_id,
'title': episode, 'title': episode,
'episode': episode, 'episode': episode,

View File

@@ -10,7 +10,7 @@
class TVerIE(InfoExtractor): class TVerIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?tver\.jp/(?:(?P<type>lp|corner|series|episodes?|feature|tokyo2020/video)/)+(?P<id>[a-zA-Z0-9]+)' _VALID_URL = r'https?://(?:www\.)?tver\.jp/(?:(?P<type>lp|corner|series|episodes?|feature|tokyo2020/video|olympic/paris2024/video)/)+(?P<id>[a-zA-Z0-9]+)'
_TESTS = [{ _TESTS = [{
'skip': 'videos are only available for 7 days', 'skip': 'videos are only available for 7 days',
'url': 'https://tver.jp/episodes/ep83nf3w4p', 'url': 'https://tver.jp/episodes/ep83nf3w4p',
@@ -23,6 +23,20 @@ class TVerIE(InfoExtractor):
'channel': 'テレビ朝日', 'channel': 'テレビ朝日',
}, },
'add_ie': ['BrightcoveNew'], 'add_ie': ['BrightcoveNew'],
}, {
'url': 'https://tver.jp/olympic/paris2024/video/6359578055112/',
'info_dict': {
'id': '6359578055112',
'ext': 'mp4',
'title': '堀米雄斗 金メダルで五輪連覇!「みんなの応援が最後に乗れたカギ」',
'timestamp': 1722279928,
'upload_date': '20240729',
'tags': ['20240729', 'japanese', 'japanmedal', 'paris'],
'uploader_id': '4774017240001',
'thumbnail': r're:https?://[^/?#]+boltdns\.net/[^?#]+/1920x1080/match/image\.jpg',
'duration': 670.571,
},
'params': {'skip_download': 'm3u8'},
}, { }, {
'url': 'https://tver.jp/corner/f0103888', 'url': 'https://tver.jp/corner/f0103888',
'only_matching': True, 'only_matching': True,
@@ -47,7 +61,15 @@ def _real_initialize(self):
def _real_extract(self, url): def _real_extract(self, url):
video_id, video_type = self._match_valid_url(url).group('id', 'type') video_id, video_type = self._match_valid_url(url).group('id', 'type')
if video_type not in {'series', 'episodes'}:
if video_type == 'olympic/paris2024/video':
# Player ID is taken from .content.brightcove.E200.pro.pc.account_id:
# https://tver.jp/olympic/paris2024/req/api/hook?q=https%3A%2F%2Folympic-assets.tver.jp%2Fweb-static%2Fjson%2Fconfig.json&d=
return self.url_result(smuggle_url(
self.BRIGHTCOVE_URL_TEMPLATE % ('4774017240001', video_id),
{'geo_countries': ['JP']}), 'BrightcoveNew')
elif video_type not in {'series', 'episodes'}:
webpage = self._download_webpage(url, video_id, note='Resolving to new URL') webpage = self._download_webpage(url, video_id, note='Resolving to new URL')
video_id = self._match_id(self._search_regex( video_id = self._match_id(self._search_regex(
(r'canonical"\s*href="(https?://tver\.jp/[^"]+)"', r'&link=(https?://tver\.jp/[^?&]+)[?&]'), (r'canonical"\s*href="(https?://tver\.jp/[^"]+)"', r'&link=(https?://tver\.jp/[^?&]+)[?&]'),

View File

@@ -49,6 +49,7 @@ class KnownDRMIE(UnsupportedInfoExtractor):
r'amazon\.(?:\w{2}\.)?\w+/gp/video', r'amazon\.(?:\w{2}\.)?\w+/gp/video',
r'music\.amazon\.(?:\w{2}\.)?\w+', r'music\.amazon\.(?:\w{2}\.)?\w+',
r'(?:watch|front)\.njpwworld\.com', r'(?:watch|front)\.njpwworld\.com',
r'qub\.ca/vrai',
) )
_TESTS = [{ _TESTS = [{
@@ -149,6 +150,9 @@ class KnownDRMIE(UnsupportedInfoExtractor):
}, { }, {
'url': 'https://front.njpwworld.com/p/s_series_00563_16_bs', 'url': 'https://front.njpwworld.com/p/s_series_00563_16_bs',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.qub.ca/vrai/l-effet-bocuse-d-or/saison-1/l-effet-bocuse-d-or-saison-1-bande-annonce-1098225063',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@@ -212,16 +212,6 @@ def _parse_config(self, config, video_id):
owner = video_data.get('owner') or {} owner = video_data.get('owner') or {}
video_uploader_url = owner.get('url') video_uploader_url = owner.get('url')
duration = int_or_none(video_data.get('duration'))
chapter_data = try_get(config, lambda x: x['embed']['chapters']) or []
chapters = [{
'title': current_chapter.get('title'),
'start_time': current_chapter.get('timecode'),
'end_time': next_chapter.get('timecode'),
} for current_chapter, next_chapter in zip(chapter_data, chapter_data[1:] + [{'timecode': duration}])]
if chapters and chapters[0]['start_time']: # Chapters may not start from 0
chapters[:0] = [{'title': '<Untitled>', 'start_time': 0, 'end_time': chapters[0]['start_time']}]
return { return {
'id': str_or_none(video_data.get('id')) or video_id, 'id': str_or_none(video_data.get('id')) or video_id,
'title': video_title, 'title': video_title,
@@ -229,8 +219,12 @@ def _parse_config(self, config, video_id):
'uploader_id': video_uploader_url.split('/')[-1] if video_uploader_url else None, 'uploader_id': video_uploader_url.split('/')[-1] if video_uploader_url else None,
'uploader_url': video_uploader_url, 'uploader_url': video_uploader_url,
'thumbnails': thumbnails, 'thumbnails': thumbnails,
'duration': duration, 'duration': int_or_none(video_data.get('duration')),
'chapters': chapters or None, 'chapters': sorted(traverse_obj(config, (
'embed', 'chapters', lambda _, v: int(v['timecode']) is not None, {
'title': ('title', {str}),
'start_time': ('timecode', {int_or_none}),
})), key=lambda c: c['start_time']) or None,
'formats': formats, 'formats': formats,
'subtitles': subtitles, 'subtitles': subtitles,
'live_status': live_status, 'live_status': live_status,
@@ -708,6 +702,39 @@ class VimeoIE(VimeoBaseInfoExtractor):
'skip_download': True, 'skip_download': True,
}, },
}, },
{
# chapters must be sorted, see: https://github.com/yt-dlp/yt-dlp/issues/5308
'url': 'https://player.vimeo.com/video/756714419',
'info_dict': {
'id': '756714419',
'ext': 'mp4',
'title': 'Dr Arielle Schwartz - Therapeutic yoga for optimum sleep',
'uploader': 'Alex Howard',
'uploader_id': 'user54729178',
'uploader_url': 'https://vimeo.com/user54729178',
'thumbnail': r're:https://i\.vimeocdn\.com/video/1520099929-[\da-f]+-d_1280',
'duration': 2636,
'chapters': [
{'start_time': 0, 'end_time': 10, 'title': '<Untitled Chapter 1>'},
{'start_time': 10, 'end_time': 106, 'title': 'Welcoming Dr Arielle Schwartz'},
{'start_time': 106, 'end_time': 305, 'title': 'What is therapeutic yoga?'},
{'start_time': 305, 'end_time': 594, 'title': 'Vagal toning practices'},
{'start_time': 594, 'end_time': 888, 'title': 'Trauma and difficulty letting go'},
{'start_time': 888, 'end_time': 1059, 'title': "Dr Schwartz' insomnia experience"},
{'start_time': 1059, 'end_time': 1471, 'title': 'A strategy for helping sleep issues'},
{'start_time': 1471, 'end_time': 1667, 'title': 'Yoga nidra'},
{'start_time': 1667, 'end_time': 2121, 'title': 'Wisdom in stillness'},
{'start_time': 2121, 'end_time': 2386, 'title': 'What helps us be more able to let go?'},
{'start_time': 2386, 'end_time': 2510, 'title': 'Practical tips to help ourselves'},
{'start_time': 2510, 'end_time': 2636, 'title': 'Where to find out more'},
],
},
'params': {
'http_headers': {'Referer': 'https://sleepsuperconference.com'},
'skip_download': 'm3u8',
},
'expected_warnings': ['Failed to parse XML: not well-formed'],
},
{ {
# user playlist alias -> https://vimeo.com/258705797 # user playlist alias -> https://vimeo.com/258705797
'url': 'https://vimeo.com/user26785108/newspiritualguide', 'url': 'https://vimeo.com/user26785108/newspiritualguide',
@@ -1240,7 +1267,7 @@ class VimeoGroupsIE(VimeoChannelIE): # XXX: Do not subclass from concrete IE
class VimeoReviewIE(VimeoBaseInfoExtractor): class VimeoReviewIE(VimeoBaseInfoExtractor):
IE_NAME = 'vimeo:review' IE_NAME = 'vimeo:review'
IE_DESC = 'Review pages on vimeo' IE_DESC = 'Review pages on vimeo'
_VALID_URL = r'(?P<url>https://vimeo\.com/[^/]+/review/(?P<id>[^/]+)/[0-9a-f]{10})' _VALID_URL = r'https?://vimeo\.com/(?P<user>[^/?#]+)/review/(?P<id>\d+)/(?P<hash>[\da-f]{10})'
_TESTS = [{ _TESTS = [{
'url': 'https://vimeo.com/user21297594/review/75524534/3c257a1b5d', 'url': 'https://vimeo.com/user21297594/review/75524534/3c257a1b5d',
'md5': 'c507a72f780cacc12b2248bb4006d253', 'md5': 'c507a72f780cacc12b2248bb4006d253',
@@ -1286,26 +1313,22 @@ class VimeoReviewIE(VimeoBaseInfoExtractor):
}] }]
def _real_extract(self, url): def _real_extract(self, url):
page_url, video_id = self._match_valid_url(url).groups() user, video_id, review_hash = self._match_valid_url(url).group('user', 'id', 'hash')
data = self._download_json( data_url = f'https://vimeo.com/{user}/review/data/{video_id}/{review_hash}'
page_url.replace('/review/', '/review/data/'), video_id) data = self._download_json(data_url, video_id)
if data.get('isLocked') is True: if data.get('isLocked') is True:
video_password = self._get_video_password() video_password = self._get_video_password()
viewer = self._download_json( viewer = self._download_json(
'https://vimeo.com/_rv/viewer', video_id) 'https://vimeo.com/_rv/viewer', video_id)
webpage = self._verify_video_password(video_id, video_password, viewer['xsrft']) self._verify_video_password(video_id, video_password, viewer['xsrft'])
clip_page_config = self._parse_json(self._search_regex( data = self._download_json(data_url, video_id)
r'window\.vimeo\.clip_page_config\s*=\s*({.+?});', clip_data = data['clipData']
webpage, 'clip page config'), video_id) config_url = clip_data['configUrl']
config_url = clip_page_config['player']['config_url']
clip_data = clip_page_config.get('clip') or {}
else:
clip_data = data['clipData']
config_url = clip_data['configUrl']
config = self._download_json(config_url, video_id) config = self._download_json(config_url, video_id)
info_dict = self._parse_config(config, video_id) info_dict = self._parse_config(config, video_id)
source_format = self._extract_original_format( source_format = self._extract_original_format(
page_url + '/action', video_id) f'https://vimeo.com/{user}/review/{video_id}/{review_hash}/action', video_id,
unlisted_hash=traverse_obj(config_url, ({parse_qs}, 'h', -1)))
if source_format: if source_format:
info_dict['formats'].append(source_format) info_dict['formats'].append(source_format)
info_dict['description'] = clean_html(clip_data.get('description')) info_dict['description'] = clean_html(clip_data.get('description'))

View File

@@ -72,133 +72,169 @@
# any clients starting with _ cannot be explicitly requested by the user # any clients starting with _ cannot be explicitly requested by the user
INNERTUBE_CLIENTS = { INNERTUBE_CLIENTS = {
'web': { 'web': {
'INNERTUBE_API_KEY': 'AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8',
'INNERTUBE_CONTEXT': { 'INNERTUBE_CONTEXT': {
'client': { 'client': {
'clientName': 'WEB', 'clientName': 'WEB',
'clientVersion': '2.20220801.00.00', 'clientVersion': '2.20240726.00.00',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 1,
},
# Safari UA returns pre-merged video+audio 144p/240p/360p/720p/1080p HLS formats
'web_safari': {
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'WEB',
'clientVersion': '2.20240726.00.00',
'userAgent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.5 Safari/605.1.15,gzip(gfe)',
}, },
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 1, 'INNERTUBE_CONTEXT_CLIENT_NAME': 1,
}, },
'web_embedded': { 'web_embedded': {
'INNERTUBE_API_KEY': 'AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8',
'INNERTUBE_CONTEXT': { 'INNERTUBE_CONTEXT': {
'client': { 'client': {
'clientName': 'WEB_EMBEDDED_PLAYER', 'clientName': 'WEB_EMBEDDED_PLAYER',
'clientVersion': '1.20220731.00.00', 'clientVersion': '1.20240723.01.00',
}, },
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 56, 'INNERTUBE_CONTEXT_CLIENT_NAME': 56,
}, },
'web_music': { 'web_music': {
'INNERTUBE_API_KEY': 'AIzaSyC9XL3ZjWddXya6X74dJoCTL-WEYFDNX30',
'INNERTUBE_HOST': 'music.youtube.com', 'INNERTUBE_HOST': 'music.youtube.com',
'INNERTUBE_CONTEXT': { 'INNERTUBE_CONTEXT': {
'client': { 'client': {
'clientName': 'WEB_REMIX', 'clientName': 'WEB_REMIX',
'clientVersion': '1.20220727.01.00', 'clientVersion': '1.20240724.00.00',
}, },
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 67, 'INNERTUBE_CONTEXT_CLIENT_NAME': 67,
}, },
'web_creator': { 'web_creator': {
'INNERTUBE_API_KEY': 'AIzaSyBUPetSUmoZL-OhlxA7wSac5XinrygCqMo',
'INNERTUBE_CONTEXT': { 'INNERTUBE_CONTEXT': {
'client': { 'client': {
'clientName': 'WEB_CREATOR', 'clientName': 'WEB_CREATOR',
'clientVersion': '1.20220726.00.00', 'clientVersion': '1.20240723.03.00',
}, },
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 62, 'INNERTUBE_CONTEXT_CLIENT_NAME': 62,
}, },
'android': { 'android': {
'INNERTUBE_API_KEY': 'AIzaSyA8eiZmM1FaDVjRy-df2KTyQ_vz_yYM39w',
'INNERTUBE_CONTEXT': { 'INNERTUBE_CONTEXT': {
'client': { 'client': {
'clientName': 'ANDROID', 'clientName': 'ANDROID',
'clientVersion': '19.09.37', 'clientVersion': '19.29.37',
'androidSdkVersion': 30, 'androidSdkVersion': 30,
'userAgent': 'com.google.android.youtube/19.09.37 (Linux; U; Android 11) gzip', 'userAgent': 'com.google.android.youtube/19.29.37 (Linux; U; Android 11) gzip',
'osName': 'Android',
'osVersion': '11',
}, },
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 3, 'INNERTUBE_CONTEXT_CLIENT_NAME': 3,
'REQUIRE_JS_PLAYER': False, 'REQUIRE_JS_PLAYER': False,
}, },
'android_embedded': {
'INNERTUBE_API_KEY': 'AIzaSyCjc_pVEDi4qsv5MtC2dMXzpIaDoRFLsxw',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'ANDROID_EMBEDDED_PLAYER',
'clientVersion': '19.09.37',
'androidSdkVersion': 30,
'userAgent': 'com.google.android.youtube/19.09.37 (Linux; U; Android 11) gzip',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 55,
'REQUIRE_JS_PLAYER': False,
},
'android_music': { 'android_music': {
'INNERTUBE_API_KEY': 'AIzaSyAOghZGza2MQSZkY_zfZ370N-PUdXEo8AI',
'INNERTUBE_CONTEXT': { 'INNERTUBE_CONTEXT': {
'client': { 'client': {
'clientName': 'ANDROID_MUSIC', 'clientName': 'ANDROID_MUSIC',
'clientVersion': '6.42.52', 'clientVersion': '7.11.50',
'androidSdkVersion': 30, 'androidSdkVersion': 30,
'userAgent': 'com.google.android.apps.youtube.music/6.42.52 (Linux; U; Android 11) gzip', 'userAgent': 'com.google.android.apps.youtube.music/7.11.50 (Linux; U; Android 11) gzip',
'osName': 'Android',
'osVersion': '11',
}, },
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 21, 'INNERTUBE_CONTEXT_CLIENT_NAME': 21,
'REQUIRE_JS_PLAYER': False, 'REQUIRE_JS_PLAYER': False,
}, },
'android_creator': { 'android_creator': {
'INNERTUBE_API_KEY': 'AIzaSyD_qjV8zaaUMehtLkrKFgVeSX_Iqbtyws8',
'INNERTUBE_CONTEXT': { 'INNERTUBE_CONTEXT': {
'client': { 'client': {
'clientName': 'ANDROID_CREATOR', 'clientName': 'ANDROID_CREATOR',
'clientVersion': '22.30.100', 'clientVersion': '24.30.100',
'androidSdkVersion': 30, 'androidSdkVersion': 30,
'userAgent': 'com.google.android.apps.youtube.creator/22.30.100 (Linux; U; Android 11) gzip', 'userAgent': 'com.google.android.apps.youtube.creator/24.30.100 (Linux; U; Android 11) gzip',
'osName': 'Android',
'osVersion': '11',
}, },
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 14, 'INNERTUBE_CONTEXT_CLIENT_NAME': 14,
'REQUIRE_JS_PLAYER': False, 'REQUIRE_JS_PLAYER': False,
}, },
# YouTube Kids videos aren't returned on this client for some reason
'android_vr': {
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'ANDROID_VR',
'clientVersion': '1.57.29',
'deviceMake': 'Oculus',
'deviceModel': 'Quest 3',
'androidSdkVersion': 32,
'userAgent': 'com.google.android.apps.youtube.vr.oculus/1.57.29 (Linux; U; Android 12L; eureka-user Build/SQ3A.220605.009.A1) gzip',
'osName': 'Android',
'osVersion': '12L',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 28,
'REQUIRE_JS_PLAYER': False,
},
'android_testsuite': {
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'ANDROID_TESTSUITE',
'clientVersion': '1.9',
'androidSdkVersion': 30,
'userAgent': 'com.google.android.youtube/1.9 (Linux; U; Android 11) gzip',
'osName': 'Android',
'osVersion': '11',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 30,
'REQUIRE_JS_PLAYER': False,
'PLAYER_PARAMS': '2AMB',
},
# This client only has legacy formats and storyboards
'android_producer': {
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'ANDROID_PRODUCER',
'clientVersion': '0.111.1',
'androidSdkVersion': 30,
'userAgent': 'com.google.android.apps.youtube.producer/0.111.1 (Linux; U; Android 11) gzip',
'osName': 'Android',
'osVersion': '11',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 91,
'REQUIRE_JS_PLAYER': False,
},
# iOS clients have HLS live streams. Setting device model to get 60fps formats. # iOS clients have HLS live streams. Setting device model to get 60fps formats.
# See: https://github.com/TeamNewPipe/NewPipeExtractor/issues/680#issuecomment-1002724558 # See: https://github.com/TeamNewPipe/NewPipeExtractor/issues/680#issuecomment-1002724558
'ios': { 'ios': {
'INNERTUBE_API_KEY': 'AIzaSyB-63vPrdThhKuerbB2N_l7Kwwcxj6yUAc',
'INNERTUBE_CONTEXT': { 'INNERTUBE_CONTEXT': {
'client': { 'client': {
'clientName': 'IOS', 'clientName': 'IOS',
'clientVersion': '19.09.3', 'clientVersion': '19.29.1',
'deviceModel': 'iPhone14,3', 'deviceMake': 'Apple',
'userAgent': 'com.google.ios.youtube/19.09.3 (iPhone14,3; U; CPU iOS 15_6 like Mac OS X)', 'deviceModel': 'iPhone16,2',
'userAgent': 'com.google.ios.youtube/19.29.1 (iPhone16,2; U; CPU iOS 17_5_1 like Mac OS X;)',
'osName': 'iPhone',
'osVersion': '17.5.1.21F90',
}, },
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 5, 'INNERTUBE_CONTEXT_CLIENT_NAME': 5,
'REQUIRE_JS_PLAYER': False, 'REQUIRE_JS_PLAYER': False,
}, },
'ios_embedded': {
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'IOS_MESSAGES_EXTENSION',
'clientVersion': '19.09.3',
'deviceModel': 'iPhone14,3',
'userAgent': 'com.google.ios.youtube/19.09.3 (iPhone14,3; U; CPU iOS 15_6 like Mac OS X)',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 66,
'REQUIRE_JS_PLAYER': False,
},
'ios_music': { 'ios_music': {
'INNERTUBE_API_KEY': 'AIzaSyBAETezhkwP0ZWA02RsqT1zu78Fpt0bC_s',
'INNERTUBE_CONTEXT': { 'INNERTUBE_CONTEXT': {
'client': { 'client': {
'clientName': 'IOS_MUSIC', 'clientName': 'IOS_MUSIC',
'clientVersion': '6.33.3', 'clientVersion': '7.08.2',
'deviceModel': 'iPhone14,3', 'deviceMake': 'Apple',
'userAgent': 'com.google.ios.youtubemusic/6.33.3 (iPhone14,3; U; CPU iOS 15_6 like Mac OS X)', 'deviceModel': 'iPhone16,2',
'userAgent': 'com.google.ios.youtubemusic/7.08.2 (iPhone16,2; U; CPU iOS 17_5_1 like Mac OS X;)',
'osName': 'iPhone',
'osVersion': '17.5.1.21F90',
}, },
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 26, 'INNERTUBE_CONTEXT_CLIENT_NAME': 26,
@@ -208,9 +244,12 @@
'INNERTUBE_CONTEXT': { 'INNERTUBE_CONTEXT': {
'client': { 'client': {
'clientName': 'IOS_CREATOR', 'clientName': 'IOS_CREATOR',
'clientVersion': '22.33.101', 'clientVersion': '24.30.100',
'deviceModel': 'iPhone14,3', 'deviceMake': 'Apple',
'userAgent': 'com.google.ios.ytcreator/22.33.101 (iPhone14,3; U; CPU iOS 15_6 like Mac OS X)', 'deviceModel': 'iPhone16,2',
'userAgent': 'com.google.ios.ytcreator/24.30.100 (iPhone16,2; U; CPU iOS 17_5_1 like Mac OS X;)',
'osName': 'iPhone',
'osVersion': '17.5.1.21F90',
}, },
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 15, 'INNERTUBE_CONTEXT_CLIENT_NAME': 15,
@@ -219,19 +258,26 @@
# mweb has 'ultralow' formats # mweb has 'ultralow' formats
# See: https://github.com/yt-dlp/yt-dlp/pull/557 # See: https://github.com/yt-dlp/yt-dlp/pull/557
'mweb': { 'mweb': {
'INNERTUBE_API_KEY': 'AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8',
'INNERTUBE_CONTEXT': { 'INNERTUBE_CONTEXT': {
'client': { 'client': {
'clientName': 'MWEB', 'clientName': 'MWEB',
'clientVersion': '2.20220801.00.00', 'clientVersion': '2.20240726.01.00',
}, },
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 2, 'INNERTUBE_CONTEXT_CLIENT_NAME': 2,
}, },
'tv': {
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'TVHTML5',
'clientVersion': '7.20240724.13.00',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 7,
},
# This client can access age restricted videos (unless the uploader has disabled the 'allow embedding' option) # This client can access age restricted videos (unless the uploader has disabled the 'allow embedding' option)
# See: https://github.com/zerodytrash/YouTube-Internal-Clients # See: https://github.com/zerodytrash/YouTube-Internal-Clients
'tv_embedded': { 'tv_embedded': {
'INNERTUBE_API_KEY': 'AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8',
'INNERTUBE_CONTEXT': { 'INNERTUBE_CONTEXT': {
'client': { 'client': {
'clientName': 'TVHTML5_SIMPLY_EMBEDDED_PLAYER', 'clientName': 'TVHTML5_SIMPLY_EMBEDDED_PLAYER',
@@ -249,6 +295,7 @@
}, },
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 95, 'INNERTUBE_CONTEXT_CLIENT_NAME': 95,
'REQUIRE_JS_PLAYER': False,
}, },
} }
@@ -262,7 +309,7 @@ def _split_innertube_client(client_name):
def short_client_name(client_name): def short_client_name(client_name):
main, *parts = _split_innertube_client(client_name)[0].replace('embedscreen', 'e_s').split('_') main, *parts = _split_innertube_client(client_name)[0].split('_')
return join_nonempty(main[:4], ''.join(x[0] for x in parts)).upper() return join_nonempty(main[:4], ''.join(x[0] for x in parts)).upper()
@@ -274,23 +321,18 @@ def build_innertube_clients():
priority = qualities(BASE_CLIENTS[::-1]) priority = qualities(BASE_CLIENTS[::-1])
for client, ytcfg in tuple(INNERTUBE_CLIENTS.items()): for client, ytcfg in tuple(INNERTUBE_CLIENTS.items()):
ytcfg.setdefault('INNERTUBE_API_KEY', 'AIzaSyDCU8hByM-4DrUqRUYnGn-3llEO78bcxq8')
ytcfg.setdefault('INNERTUBE_HOST', 'www.youtube.com') ytcfg.setdefault('INNERTUBE_HOST', 'www.youtube.com')
ytcfg.setdefault('REQUIRE_JS_PLAYER', True) ytcfg.setdefault('REQUIRE_JS_PLAYER', True)
ytcfg.setdefault('PLAYER_PARAMS', None)
ytcfg['INNERTUBE_CONTEXT']['client'].setdefault('hl', 'en') ytcfg['INNERTUBE_CONTEXT']['client'].setdefault('hl', 'en')
_, base_client, variant = _split_innertube_client(client) _, base_client, variant = _split_innertube_client(client)
ytcfg['priority'] = 10 * priority(base_client) ytcfg['priority'] = 10 * priority(base_client)
if not variant: if variant == 'embedded':
INNERTUBE_CLIENTS[f'{client}_embedscreen'] = embedscreen = copy.deepcopy(ytcfg)
embedscreen['INNERTUBE_CONTEXT']['client']['clientScreen'] = 'EMBED'
embedscreen['INNERTUBE_CONTEXT']['thirdParty'] = THIRD_PARTY
embedscreen['priority'] -= 3
elif variant == 'embedded':
ytcfg['INNERTUBE_CONTEXT']['thirdParty'] = THIRD_PARTY ytcfg['INNERTUBE_CONTEXT']['thirdParty'] = THIRD_PARTY
ytcfg['priority'] -= 2 ytcfg['priority'] -= 2
else: elif variant:
ytcfg['priority'] -= 3 ytcfg['priority'] -= 3
@@ -566,9 +608,6 @@ def _select_api_hostname(self, req_api_hostname, default_client=None):
return (self._configuration_arg('innertube_host', [''], ie_key=YoutubeIE.ie_key())[0] return (self._configuration_arg('innertube_host', [''], ie_key=YoutubeIE.ie_key())[0]
or req_api_hostname or self._get_innertube_host(default_client or 'web')) or req_api_hostname or self._get_innertube_host(default_client or 'web'))
def _extract_api_key(self, ytcfg=None, default_client='web'):
return self._ytcfg_get_safe(ytcfg, lambda x: x['INNERTUBE_API_KEY'], str, default_client)
def _extract_context(self, ytcfg=None, default_client='web'): def _extract_context(self, ytcfg=None, default_client='web'):
context = get_first( context = get_first(
(ytcfg, self._get_default_ytcfg(default_client)), 'INNERTUBE_CONTEXT', expected_type=dict) (ytcfg, self._get_default_ytcfg(default_client)), 'INNERTUBE_CONTEXT', expected_type=dict)
@@ -614,13 +653,15 @@ def _call_api(self, ep, query, video_id, fatal=True, headers=None,
real_headers.update({'content-type': 'application/json'}) real_headers.update({'content-type': 'application/json'})
if headers: if headers:
real_headers.update(headers) real_headers.update(headers)
api_key = (self._configuration_arg('innertube_key', [''], ie_key=YoutubeIE.ie_key(), casesense=True)[0]
or api_key or self._extract_api_key(default_client=default_client))
return self._download_json( return self._download_json(
f'https://{self._select_api_hostname(api_hostname, default_client)}/youtubei/v1/{ep}', f'https://{self._select_api_hostname(api_hostname, default_client)}/youtubei/v1/{ep}',
video_id=video_id, fatal=fatal, note=note, errnote=errnote, video_id=video_id, fatal=fatal, note=note, errnote=errnote,
data=json.dumps(data).encode('utf8'), headers=real_headers, data=json.dumps(data).encode('utf8'), headers=real_headers,
query={'key': api_key, 'prettyPrint': 'false'}) query=filter_dict({
'key': self._configuration_arg(
'innertube_key', [api_key], ie_key=YoutubeIE.ie_key(), casesense=True)[0],
'prettyPrint': 'false',
}, cndn=lambda _, v: v))
def extract_yt_initial_data(self, item_id, webpage, fatal=True): def extract_yt_initial_data(self, item_id, webpage, fatal=True):
return self._search_json(self._YT_INITIAL_DATA_RE, webpage, 'yt initial data', item_id, fatal=fatal) return self._search_json(self._YT_INITIAL_DATA_RE, webpage, 'yt initial data', item_id, fatal=fatal)
@@ -972,7 +1013,6 @@ def _extract_response(self, item_id, query, note='Downloading API JSON', headers
ep=ep, fatal=True, headers=headers, ep=ep, fatal=True, headers=headers,
video_id=item_id, query=query, note=note, video_id=item_id, query=query, note=note,
context=self._extract_context(ytcfg, default_client), context=self._extract_context(ytcfg, default_client),
api_key=self._extract_api_key(ytcfg, default_client),
api_hostname=api_hostname, default_client=default_client) api_hostname=api_hostname, default_client=default_client)
except ExtractorError as e: except ExtractorError as e:
if not isinstance(e.cause, network_exceptions): if not isinstance(e.cause, network_exceptions):
@@ -1295,6 +1335,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
} }
_SUBTITLE_FORMATS = ('json3', 'srv1', 'srv2', 'srv3', 'ttml', 'vtt') _SUBTITLE_FORMATS = ('json3', 'srv1', 'srv2', 'srv3', 'ttml', 'vtt')
_POTOKEN_EXPERIMENTS = ('51217476', '51217102') _POTOKEN_EXPERIMENTS = ('51217476', '51217102')
_BROKEN_CLIENTS = {
short_client_name(client): client
for client in ('android', 'android_creator', 'android_music')
}
_GEO_BYPASS = False _GEO_BYPASS = False
@@ -3129,12 +3173,35 @@ def _decrypt_nsig(self, s, video_id, player_url):
self.write_debug(f'Decrypted nsig {s} => {ret}') self.write_debug(f'Decrypted nsig {s} => {ret}')
return ret return ret
def _extract_n_function_name(self, jscode): def _extract_n_function_name(self, jscode, player_url=None):
# Examples (with placeholders nfunc, narray, idx):
# * .get("n"))&&(b=nfunc(b)
# * .get("n"))&&(b=narray[idx](b)
# * b=String.fromCharCode(110),c=a.get(b))&&c=narray[idx](c)
# * a.D&&(b="nn"[+a.D],c=a.get(b))&&(c=narray[idx](c),a.set(b,c),narray.length||nfunc("")
# * a.D&&(PL(a),b=a.j.n||null)&&(b=narray[0](b),a.set("n",b),narray.length||nfunc("")
funcname, idx = self._search_regex( funcname, idx = self._search_regex(
r'''(?x)(?:\.get\("n"\)\)&&\(b=|b=String\.fromCharCode\(110\),c=a\.get\(b\)\)&&\(c=) r'''(?x)
(?P<nfunc>[a-zA-Z0-9$]+)(?:\[(?P<idx>\d+)\])?\([a-zA-Z0-9]\)''', (?:
jscode, 'Initial JS player n function name', group=('nfunc', 'idx')) \.get\("n"\)\)&&\(b=|
if not idx: (?:
b=String\.fromCharCode\(110\)|
(?P<str_idx>[a-zA-Z0-9_$.]+)&&\(b="nn"\[\+(?P=str_idx)\]
),c=a\.get\(b\)\)&&\(c=|
\b(?P<var>[a-zA-Z0-9_$]+)=
)(?P<nfunc>[a-zA-Z0-9_$]+)(?:\[(?P<idx>\d+)\])?\([a-zA-Z]\)
(?(var),[a-zA-Z0-9_$]+\.set\("n"\,(?P=var)\),(?P=nfunc)\.length)''',
jscode, 'n function name', group=('nfunc', 'idx'), default=(None, None))
if not funcname:
self.report_warning(join_nonempty(
'Falling back to generic n function search',
player_url and f' player = {player_url}', delim='\n'))
return self._search_regex(
r'''(?xs)
;\s*(?P<name>[a-zA-Z0-9_$]+)\s*=\s*function\([a-zA-Z0-9_$]+\)
\s*\{(?:(?!};).)+?["']enhanced_except_''',
jscode, 'Initial JS player n function name', group='name')
elif not idx:
return funcname return funcname
return json.loads(js_to_json(self._search_regex( return json.loads(js_to_json(self._search_regex(
@@ -3150,7 +3217,7 @@ def _extract_n_function_code(self, video_id, player_url):
if func_code: if func_code:
return jsi, player_id, func_code return jsi, player_id, func_code
func_name = self._extract_n_function_name(jscode) func_name = self._extract_n_function_name(jscode, player_url=player_url)
func_code = jsi.extract_function_code(func_name) func_code = jsi.extract_function_code(func_name)
@@ -3654,9 +3721,10 @@ def _extract_player_response(self, client, video_id, master_ytcfg, player_ytcfg,
'videoId': video_id, 'videoId': video_id,
} }
pp_arg = self._configuration_arg('player_params', [None], casesense=True)[0] default_pp = traverse_obj(
if pp_arg: INNERTUBE_CLIENTS, (_split_innertube_client(client)[0], 'PLAYER_PARAMS', {str}))
yt_query['params'] = pp_arg if player_params := self._configuration_arg('player_params', [default_pp], casesense=True)[0]:
yt_query['params'] = player_params
yt_query.update(self._generate_player_context(sts)) yt_query.update(self._generate_player_context(sts))
return self._extract_response( return self._extract_response(
@@ -3668,8 +3736,8 @@ def _extract_player_response(self, client, video_id, master_ytcfg, player_ytcfg,
def _get_requested_clients(self, url, smuggled_data): def _get_requested_clients(self, url, smuggled_data):
requested_clients = [] requested_clients = []
android_clients = [] broken_clients = []
default = ['ios', 'web'] default = ['ios', 'tv']
allowed_clients = sorted( allowed_clients = sorted(
(client for client in INNERTUBE_CLIENTS if client[:1] != '_'), (client for client in INNERTUBE_CLIENTS if client[:1] != '_'),
key=lambda client: INNERTUBE_CLIENTS[client]['priority'], reverse=True) key=lambda client: INNERTUBE_CLIENTS[client]['priority'], reverse=True)
@@ -3680,18 +3748,21 @@ def _get_requested_clients(self, url, smuggled_data):
requested_clients.extend(allowed_clients) requested_clients.extend(allowed_clients)
elif client not in allowed_clients: elif client not in allowed_clients:
self.report_warning(f'Skipping unsupported client {client}') self.report_warning(f'Skipping unsupported client {client}')
elif client.startswith('android'): elif client in self._BROKEN_CLIENTS.values():
android_clients.append(client) broken_clients.append(client)
else: else:
requested_clients.append(client) requested_clients.append(client)
# Force deprioritization of broken Android clients for format de-duplication # Force deprioritization of _BROKEN_CLIENTS for format de-duplication
requested_clients.extend(android_clients) requested_clients.extend(broken_clients)
if not requested_clients: if not requested_clients:
requested_clients = default requested_clients = default
if smuggled_data.get('is_music_url') or self.is_music_url(url): if smuggled_data.get('is_music_url') or self.is_music_url(url):
requested_clients.extend( for requested_client in requested_clients:
f'{client}_music' for client in requested_clients if f'{client}_music' in INNERTUBE_CLIENTS) _, base_client, variant = _split_innertube_client(requested_client)
music_client = f'{base_client}_music'
if variant != 'music' and music_client in INNERTUBE_CLIENTS:
requested_clients.append(music_client)
return orderedSet(requested_clients) return orderedSet(requested_clients)
@@ -3785,14 +3856,27 @@ def append_client(*client_names):
f[STREAMING_DATA_CLIENT_NAME] = name f[STREAMING_DATA_CLIENT_NAME] = name
prs.append(pr) prs.append(pr)
# creator clients can bypass AGE_VERIFICATION_REQUIRED if logged in # tv_embedded can work around age-gate and age-verification IF the video is embeddable
if variant == 'embedded' and self._is_unplayable(pr) and self.is_authenticated: if self._is_agegated(pr) and variant != 'tv_embedded':
append_client(f'{base_client}_creator') append_client(f'tv_embedded.{base_client}')
elif self._is_agegated(pr):
if variant == 'tv_embedded': # Unauthenticated users will only get tv_embedded client formats if age-gated
append_client(f'{base_client}_embedded') if self._is_agegated(pr) and not self.is_authenticated:
elif not variant: self.to_screen(
append_client(f'tv_embedded.{base_client}', f'{base_client}_embedded') f'{video_id}: This video is age-restricted; some formats may be missing '
f'without authentication. {self._login_hint()}', only_once=True)
# EU countries require age-verification for accounts to access age-restricted videos
# If account is not age-verified, _is_agegated() will be truthy for non-embedded clients
# If embedding is disabled for the video, _is_unplayable() will be truthy for tv_embedded
embedding_is_disabled = variant == 'tv_embedded' and self._is_unplayable(pr)
if self.is_authenticated and (self._is_agegated(pr) or embedding_is_disabled):
self.to_screen(
f'{video_id}: This video is age-restricted and YouTube is requiring '
'account age-verification; some formats may be missing', only_once=True)
# web_creator and mediaconnect can work around the age-verification requirement
# _producer, _testsuite, & _vr variants can also work around age-verification
append_client('web_creator', 'mediaconnect')
if skipped_clients: if skipped_clients:
self.report_warning( self.report_warning(
@@ -3928,13 +4012,13 @@ def build_fragments(f):
f'{video_id}: Some formats are possibly damaged. They will be deprioritized', only_once=True) f'{video_id}: Some formats are possibly damaged. They will be deprioritized', only_once=True)
client_name = fmt.get(STREAMING_DATA_CLIENT_NAME) client_name = fmt.get(STREAMING_DATA_CLIENT_NAME)
# Android client formats are broken due to integrity check enforcement # _BROKEN_CLIENTS return videoplayback URLs that expire after 30 seconds
# Ref: https://github.com/yt-dlp/yt-dlp/issues/9554 # Ref: https://github.com/yt-dlp/yt-dlp/issues/9554
is_broken = client_name and client_name.startswith(short_client_name('android')) is_broken = client_name in self._BROKEN_CLIENTS
if is_broken: if is_broken:
self.report_warning( self.report_warning(
f'{video_id}: Android client formats are broken and may yield HTTP Error 403. ' f'{video_id}: {self._BROKEN_CLIENTS[client_name]} client formats are broken '
'They will be deprioritized', only_once=True) 'and may yield HTTP Error 403. They will be deprioritized', only_once=True)
name = fmt.get('qualityLabel') or quality.replace('audio_quality_', '') or '' name = fmt.get('qualityLabel') or quality.replace('audio_quality_', '') or ''
fps = int_or_none(fmt.get('fps')) or 0 fps = int_or_none(fmt.get('fps')) or 0

View File

@@ -1217,7 +1217,7 @@ def unified_timestamp(date_str, day_first=True):
return None return None
date_str = re.sub(r'\s+', ' ', re.sub( date_str = re.sub(r'\s+', ' ', re.sub(
r'(?i)[,|]|(mon|tues?|wed(nes)?|thu(rs)?|fri|sat(ur)?)(day)?', '', date_str)) r'(?i)[,|]|(mon|tues?|wed(nes)?|thu(rs)?|fri|sat(ur)?|sun)(day)?', '', date_str))
pm_delta = 12 if re.search(r'(?i)PM', date_str) else 0 pm_delta = 12 if re.search(r'(?i)PM', date_str) else 0
timezone, date_str = extract_timezone(date_str) timezone, date_str = extract_timezone(date_str)

View File

@@ -1,8 +1,8 @@
# Autogenerated by devscripts/update-version.py # Autogenerated by devscripts/update-version.py
__version__ = '2024.07.16' __version__ = '2024.08.01'
RELEASE_GIT_HEAD = '89a161e8c62569a662deda1c948664152efcb6b4' RELEASE_GIT_HEAD = 'ffd7781d6588926f820b44a34b9e6e3068fb9f97'
VARIANT = None VARIANT = None
@@ -12,4 +12,4 @@
ORIGIN = 'yt-dlp/yt-dlp' ORIGIN = 'yt-dlp/yt-dlp'
_pkg_version = '2024.07.16' _pkg_version = '2024.08.01'