mirror of https://github.com/yt-dlp/yt-dlp synced 2025-12-17 06:35:42 +07:00

Compare commits


35 Commits

Author SHA1 Message Date
github-actions
cc0619f62d Release 2023.07.06
Created by: pukkandan

:ci skip all :ci run dl
2023-07-06 18:57:59 +00:00
pukkandan
b532a34810 [docs] Minor fixes
Closes #7515
2023-07-06 23:32:19 +05:30
Simon Sawicki
3121512228 [core] Change how Cookie headers are handled
Cookies are now saved and loaded under `cookies` key in the info dict
instead of `http_headers.Cookie`. Cookies passed in headers are
auto-scoped to the input URLs with a warning.

Ref: https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-v8mc-9377-rwjj

Authored by: Grub4K
2023-07-06 23:14:39 +05:30
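To illustrate the auto-scoping described above, a minimal stdlib-only sketch (`scope_header_cookie_to_url` is a hypothetical helper name; the real logic is `YoutubeDL._apply_header_cookies`, shown in the diff further down):

```python
import copy
import http.cookiejar
import urllib.parse

def scope_header_cookie_to_url(jar, cookie, url):
    """Store an unscoped header cookie in `jar`, limited to the URL's host."""
    hostname = urllib.parse.urlparse(url).hostname
    if not hostname:
        return  # nothing to scope to
    scoped = copy.copy(cookie)
    scoped.domain = f'.{hostname}'  # now only sent to this domain and its subdomains
    jar.set_cookie(scoped)
```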
coletdjnz
f8b4bcc0a7 [core] Prevent Cookie leaks on HTTP redirect
Ref: https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-v8mc-9377-rwjj

Authored by: coletdjnz
2023-07-06 23:14:39 +05:30
bashonly
1ceb657bdd [fd/external] Scope cookies
- ffmpeg: Calculate cookies from cookiejar and pass with `-cookies` arg instead of `-headers`
- aria2c, curl, wget: Write cookiejar to file and use external FD built-in cookiejar support
- httpie: Calculate cookies from cookiejar instead of `http_headers`
- axel: Calculate cookies from cookiejar and disable http redirection if cookies are passed
- May break redirects, but axel simply doesn't have proper cookie support

Ref: https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-v8mc-9377-rwjj

Authored by: bashonly, coletdjnz
2023-07-06 23:14:38 +05:30
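The common pattern behind these changes: compute a `Cookie` header from the cookiejar for the one URL being downloaded, instead of forwarding whatever was in `http_headers`. A stdlib-only sketch (`cookie_header_for_url` is a hypothetical name; yt-dlp's cookiejar exposes an equivalent `get_cookie_header()` used e.g. for ffmpeg's `-cookies` arg):

```python
import http.cookiejar
import urllib.request

def cookie_header_for_url(jar: http.cookiejar.CookieJar, url: str) -> str:
    """Build a Cookie header containing only cookies scoped to `url`."""
    req = urllib.request.Request(url)
    jar.add_cookie_header(req)  # adds only cookies whose domain/path match the URL
    return req.get_header('Cookie', '')
```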
pukkandan
ad8902f616 [ie/vidlii] Handle relative URLs
Closes #7480
2023-07-06 21:40:09 +05:30
pukkandan
94ed638a43 [ie/youtube] Avoid false DRM detection (#7396)
Some master manifests contain a mix of DRM and non-DRM formats
2023-07-06 21:40:07 +05:30
pukkandan
bc344cd456 [core] Allow extractors to mark formats as potentially DRM (#7396)
This is useful for HLS where detecting whether the format is
actually DRM requires the child manifest to be downloaded.

Makes the error message when using `--test` inconsistent,
but doesn't really matter.
2023-07-06 21:40:01 +05:30
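A small sketch of the convention this introduces (hypothetical format dicts; the filter mirrors the `YoutubeDL.py` change shown further down): formats flagged `has_drm='maybe'` survive the usual DRM filter and are instead probed individually before selection:

```python
formats = [
    {'format_id': 'hls-1', 'has_drm': 'maybe'},  # child manifest needed to confirm
    {'format_id': 'hls-2', 'has_drm': True},     # definitely DRM: filtered out
    {'format_id': 'http-1'},                     # no DRM
]
# Same condition as the updated filter in YoutubeDL.py below
playable = [f for f in formats if not f.get('has_drm') or f['has_drm'] == 'maybe']
assert [f['format_id'] for f in playable] == ['hls-1', 'http-1']
```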
pukkandan
906c0bdcd8 [formats] Fix best fallback for storyboards
Partial fix for #7478
2023-07-06 21:39:58 +05:30
pukkandan
337734d4a8 [cleanup] Misc 2023-07-06 21:39:55 +05:30
pukkandan
fa44802809 [devscripts/make_changelog] Skip reverted commits 2023-07-06 20:22:04 +05:30
pukkandan
47bcd43724 [outtmpl] Pad playlist_index etc even when with internal formatting
Closes #7501
2023-07-06 20:22:03 +05:30
pukkandan
662ef1e910 [downloader/http] Avoid infinite loop when no data is received
Closes #7504
2023-07-06 20:22:00 +05:30
Jorge
6355b5f1e1 [misc] Add CodeQL workflow (#7497) 2023-07-06 20:21:46 +05:30
coletdjnz
90db9a3c00 [extractor/youtube:stories] Remove (#7459)
YouTube killed them

https://web.archive.org/web/20230630153050/https://support.google.com/youtube/thread/217640760
2023-07-06 19:02:41 +05:30
bashonly
49296437a8 [extractor/twitter] Fix unauthenticated extraction (#7476)
Closes #7473
Authored by: bashonly
2023-07-05 16:27:36 +00:00
bashonly
1cffd621cb [extractor/twitter:spaces] Fix extraction (#7512)
Closes #7455
Authored by: bashonly
2023-07-05 03:05:52 +00:00
RfadnjdExt
3b7f5300c5 [extractor/googledrive] Fix source format extraction (#7395)
Closes #7344
Authored by: RfadnjdExt
2023-07-05 02:17:13 +00:00
coletdjnz
4dc4d8473c [extractor/youtube] Ignore incomplete data for comment threads by default (#7475)
For both `--ignore-errors` and `--ignore-errors only_download`. Pass `--no-ignore-errors` to not ignore.

Closes https://github.com/yt-dlp/yt-dlp/issues/7474

Authored by: coletdjnz
2023-07-03 10:47:10 +00:00
c-basalt
8776349ef6 [extractor/vk] VKPlay, VKPlayLive: Add extractors (#7358)
Closes #7107
Authored by: c-basalt
2023-07-02 19:31:00 +00:00
urectanc
af1fd12f67 [extractor/stacommu] Add extractors (#7432)
Authored by: urectanc
2023-06-30 18:27:07 +00:00
coletdjnz
fcbc9ed760 [extractor/youtube:tab] Support shorts-only playlists (#7425)
Fixes https://github.com/yt-dlp/yt-dlp/issues/7424

Authored by: coletdjnz
Co-authored-by: pukkandan <pukkandan.ytdlp@gmail.com>
2023-06-29 23:26:27 +00:00
bashonly
a2be9781fb [extractor/Douyin] Fix extraction from webpage
Closes #7431
Authored by: bashonly
2023-06-27 16:50:02 -05:00
Xiao Han
8f05fbae2a [extractor/abc] Fix extraction (#7434)
Closes #6433
Authored by: meliber
2023-06-27 21:16:57 +00:00
Aman Salwan
5b4b92769a [extractor/crunchyroll:music] Fix _VALID_URL (#7439)
Closes #7419
Authored by: AmanSal1, rdamas

Co-authored-by: Robert Damas <robert.damas@byom.de>
2023-06-27 20:28:23 +00:00
pukkandan
91302ed349 [utils] clean_podcast_url: Handle protocol in redirect URL
Closes #7430
2023-06-26 16:19:49 +05:30
pukkandan
f393bbe724 [extractor/sbs] Python 3.7 compat
Closes #7410
2023-06-26 16:14:20 +05:30
pukkandan
8a8af356e3 [downloader/aria2c] Add --no-conf
Closes #7404
2023-06-26 16:13:31 +05:30
pukkandan
d949c10c45 [extractor/youtube] Process post_live over 2 hours 2023-06-26 07:25:52 +05:30
bashonly
ef8509c300 [extractor/kick] Fix _VALID_URL
Closes #7384
Authored by: bashonly
2023-06-25 17:04:42 -05:00
nnoboa
5e16cf92eb [extractor/AdultSwim] Extract subtitles from m3u8 (#7421)
Authored by: nnoboa
Closes #6191
2023-06-26 01:52:38 +05:30
bashonly
f0a1ff1181 [extractor/qdance] Add extractor (#7420)
Closes #7385
Authored by: bashonly
2023-06-25 18:13:28 +00:00
pukkandan
58786a10f2 [extractor/youtube] Add extractor-arg formats
Closes #7417
2023-06-25 20:14:37 +05:30
pukkandan
e59e20744e Bugfix for b4e0d75848 2023-06-22 23:45:53 +05:30
Simon
89bed01374 [extractor/youtube] Fix comments' is_favorited (#7390)
Authored by: bbilly1
Closes #7389
2023-06-22 23:38:42 +05:30
46 changed files with 1312 additions and 347 deletions

View File

@@ -18,7 +18,7 @@ body:
options:
- label: I'm reporting that yt-dlp is broken on a **supported** site
required: true
- - label: I've verified that I'm running yt-dlp version **2023.06.22** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
+ - label: I've verified that I'm running yt-dlp version **2023.07.06** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
required: true
- label: I've checked that all provided URLs are playable in a browser with the same IP and same login details
required: true
@@ -64,7 +64,7 @@ body:
[debug] Command-line config: ['-vU', 'test:youtube']
[debug] Portable config "yt-dlp.conf": ['-i']
[debug] Encodings: locale cp65001, fs utf-8, pref cp65001, out utf-8, error utf-8, screen utf-8
- [debug] yt-dlp version 2023.06.22 [9d339c4] (win32_exe)
+ [debug] yt-dlp version 2023.07.06 [9d339c4] (win32_exe)
[debug] Python 3.8.10 (CPython 64bit) - Windows-10-10.0.22000-SP0
[debug] Checking exe version: ffmpeg -bsfs
[debug] Checking exe version: ffprobe -bsfs
@@ -72,8 +72,8 @@ body:
[debug] Optional libraries: Cryptodome-3.15.0, brotli-1.0.9, certifi-2022.06.15, mutagen-1.45.1, sqlite3-2.6.0, websockets-10.3
[debug] Proxy map: {}
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
- Latest version: 2023.06.22, Current version: 2023.06.22
- yt-dlp is up to date (2023.06.22)
+ Latest version: 2023.07.06, Current version: 2023.07.06
+ yt-dlp is up to date (2023.07.06)
<more lines>
render: shell
validations:

View File

@@ -18,7 +18,7 @@ body:
options:
- label: I'm reporting a new site support request
required: true
- - label: I've verified that I'm running yt-dlp version **2023.06.22** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
+ - label: I've verified that I'm running yt-dlp version **2023.07.06** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
required: true
- label: I've checked that all provided URLs are playable in a browser with the same IP and same login details
required: true
@@ -76,7 +76,7 @@ body:
[debug] Command-line config: ['-vU', 'test:youtube']
[debug] Portable config "yt-dlp.conf": ['-i']
[debug] Encodings: locale cp65001, fs utf-8, pref cp65001, out utf-8, error utf-8, screen utf-8
- [debug] yt-dlp version 2023.06.22 [9d339c4] (win32_exe)
+ [debug] yt-dlp version 2023.07.06 [9d339c4] (win32_exe)
[debug] Python 3.8.10 (CPython 64bit) - Windows-10-10.0.22000-SP0
[debug] Checking exe version: ffmpeg -bsfs
[debug] Checking exe version: ffprobe -bsfs
@@ -84,8 +84,8 @@ body:
[debug] Optional libraries: Cryptodome-3.15.0, brotli-1.0.9, certifi-2022.06.15, mutagen-1.45.1, sqlite3-2.6.0, websockets-10.3
[debug] Proxy map: {}
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
- Latest version: 2023.06.22, Current version: 2023.06.22
- yt-dlp is up to date (2023.06.22)
+ Latest version: 2023.07.06, Current version: 2023.07.06
+ yt-dlp is up to date (2023.07.06)
<more lines>
render: shell
validations:

View File

@@ -18,7 +18,7 @@ body:
options:
- label: I'm requesting a site-specific feature
required: true
- - label: I've verified that I'm running yt-dlp version **2023.06.22** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
+ - label: I've verified that I'm running yt-dlp version **2023.07.06** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
required: true
- label: I've checked that all provided URLs are playable in a browser with the same IP and same login details
required: true
@@ -72,7 +72,7 @@ body:
[debug] Command-line config: ['-vU', 'test:youtube']
[debug] Portable config "yt-dlp.conf": ['-i']
[debug] Encodings: locale cp65001, fs utf-8, pref cp65001, out utf-8, error utf-8, screen utf-8
- [debug] yt-dlp version 2023.06.22 [9d339c4] (win32_exe)
+ [debug] yt-dlp version 2023.07.06 [9d339c4] (win32_exe)
[debug] Python 3.8.10 (CPython 64bit) - Windows-10-10.0.22000-SP0
[debug] Checking exe version: ffmpeg -bsfs
[debug] Checking exe version: ffprobe -bsfs
@@ -80,8 +80,8 @@ body:
[debug] Optional libraries: Cryptodome-3.15.0, brotli-1.0.9, certifi-2022.06.15, mutagen-1.45.1, sqlite3-2.6.0, websockets-10.3
[debug] Proxy map: {}
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
- Latest version: 2023.06.22, Current version: 2023.06.22
- yt-dlp is up to date (2023.06.22)
+ Latest version: 2023.07.06, Current version: 2023.07.06
+ yt-dlp is up to date (2023.07.06)
<more lines>
render: shell
validations:

View File

@@ -18,7 +18,7 @@ body:
options:
- label: I'm reporting a bug unrelated to a specific site
required: true
- - label: I've verified that I'm running yt-dlp version **2023.06.22** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
+ - label: I've verified that I'm running yt-dlp version **2023.07.06** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
required: true
- label: I've checked that all provided URLs are playable in a browser with the same IP and same login details
required: true
@@ -57,7 +57,7 @@ body:
[debug] Command-line config: ['-vU', 'test:youtube']
[debug] Portable config "yt-dlp.conf": ['-i']
[debug] Encodings: locale cp65001, fs utf-8, pref cp65001, out utf-8, error utf-8, screen utf-8
- [debug] yt-dlp version 2023.06.22 [9d339c4] (win32_exe)
+ [debug] yt-dlp version 2023.07.06 [9d339c4] (win32_exe)
[debug] Python 3.8.10 (CPython 64bit) - Windows-10-10.0.22000-SP0
[debug] Checking exe version: ffmpeg -bsfs
[debug] Checking exe version: ffprobe -bsfs
@@ -65,8 +65,8 @@ body:
[debug] Optional libraries: Cryptodome-3.15.0, brotli-1.0.9, certifi-2022.06.15, mutagen-1.45.1, sqlite3-2.6.0, websockets-10.3
[debug] Proxy map: {}
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
- Latest version: 2023.06.22, Current version: 2023.06.22
- yt-dlp is up to date (2023.06.22)
+ Latest version: 2023.07.06, Current version: 2023.07.06
+ yt-dlp is up to date (2023.07.06)
<more lines>
render: shell
validations:

View File

@@ -20,7 +20,7 @@ body:
required: true
- label: I've looked through the [README](https://github.com/yt-dlp/yt-dlp#readme)
required: true
- - label: I've verified that I'm running yt-dlp version **2023.06.22** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
+ - label: I've verified that I'm running yt-dlp version **2023.07.06** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
required: true
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766) and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar issues **including closed ones**. DO NOT post duplicates
required: true
@@ -53,7 +53,7 @@ body:
[debug] Command-line config: ['-vU', 'test:youtube']
[debug] Portable config "yt-dlp.conf": ['-i']
[debug] Encodings: locale cp65001, fs utf-8, pref cp65001, out utf-8, error utf-8, screen utf-8
- [debug] yt-dlp version 2023.06.22 [9d339c4] (win32_exe)
+ [debug] yt-dlp version 2023.07.06 [9d339c4] (win32_exe)
[debug] Python 3.8.10 (CPython 64bit) - Windows-10-10.0.22000-SP0
[debug] Checking exe version: ffmpeg -bsfs
[debug] Checking exe version: ffprobe -bsfs
@@ -61,7 +61,7 @@ body:
[debug] Optional libraries: Cryptodome-3.15.0, brotli-1.0.9, certifi-2022.06.15, mutagen-1.45.1, sqlite3-2.6.0, websockets-10.3
[debug] Proxy map: {}
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
- Latest version: 2023.06.22, Current version: 2023.06.22
- yt-dlp is up to date (2023.06.22)
+ Latest version: 2023.07.06, Current version: 2023.07.06
+ yt-dlp is up to date (2023.07.06)
<more lines>
render: shell

View File

@@ -26,7 +26,7 @@ body:
required: true
- label: I've looked through the [README](https://github.com/yt-dlp/yt-dlp#readme)
required: true
- - label: I've verified that I'm running yt-dlp version **2023.06.22** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
+ - label: I've verified that I'm running yt-dlp version **2023.07.06** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
required: true
- label: I've searched [known issues](https://github.com/yt-dlp/yt-dlp/issues/3766) and the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar questions **including closed ones**. DO NOT post duplicates
required: true
@@ -59,7 +59,7 @@ body:
[debug] Command-line config: ['-vU', 'test:youtube']
[debug] Portable config "yt-dlp.conf": ['-i']
[debug] Encodings: locale cp65001, fs utf-8, pref cp65001, out utf-8, error utf-8, screen utf-8
- [debug] yt-dlp version 2023.06.22 [9d339c4] (win32_exe)
+ [debug] yt-dlp version 2023.07.06 [9d339c4] (win32_exe)
[debug] Python 3.8.10 (CPython 64bit) - Windows-10-10.0.22000-SP0
[debug] Checking exe version: ffmpeg -bsfs
[debug] Checking exe version: ffprobe -bsfs
@@ -67,7 +67,7 @@ body:
[debug] Optional libraries: Cryptodome-3.15.0, brotli-1.0.9, certifi-2022.06.15, mutagen-1.45.1, sqlite3-2.6.0, websockets-10.3
[debug] Proxy map: {}
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
- Latest version: 2023.06.22, Current version: 2023.06.22
- yt-dlp is up to date (2023.06.22)
+ Latest version: 2023.07.06, Current version: 2023.07.06
+ yt-dlp is up to date (2023.07.06)
<more lines>
render: shell

.github/workflows/codeql.yml vendored Normal file
View File

@@ -0,0 +1,65 @@
name: "CodeQL"
on:
push:
branches: [ 'master', 'gh-pages', 'release' ]
pull_request:
# The branches below must be a subset of the branches above
branches: [ 'master' ]
schedule:
- cron: '59 11 * * 5'
jobs:
analyze:
name: Analyze
runs-on: ubuntu-latest
permissions:
actions: read
contents: read
security-events: write
strategy:
fail-fast: false
matrix:
language: [ 'python' ]
# CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python', 'ruby' ]
# Use only 'java' to analyze code written in Java, Kotlin or both
# Use only 'javascript' to analyze code written in JavaScript, TypeScript or both
# Learn more about CodeQL language support at https://aka.ms/codeql-docs/language-support
steps:
- name: Checkout repository
uses: actions/checkout@v3
# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@v2
with:
languages: ${{ matrix.language }}
# If you wish to specify custom queries, you can do so here or in a config file.
# By default, queries listed here will override any specified in a config file.
# Prefix the list here with "+" to use these queries and those in the config file.
# For more details on CodeQL's query packs, refer to: https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
# queries: security-extended,security-and-quality
# Autobuild attempts to build any compiled languages (C/C++, C#, Go, Java, or Swift).
# If this step fails, then you should remove it and run the build manually (see below)
- name: Autobuild
uses: github/codeql-action/autobuild@v2
# Command-line programs to run using the OS shell.
# 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun
# If the Autobuild fails above, remove it and uncomment the following three lines.
# modify them (or add more) to build your code; refer to the example below for guidance.
# - run: |
# echo "Run, Build Application using script"
# ./location_of_script_within_repo/buildscript.sh
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v2
with:
category: "/language:${{matrix.language}}"

View File

@@ -460,3 +460,10 @@ berkanteber
OverlordQ
rexlambert22
Ti4eeT4e
AmanSal1
bbilly1
meliber
nnoboa
rdamas
RfadnjdExt
urectanc

View File

@@ -4,11 +4,65 @@ # Changelog
# To create a release, dispatch the https://github.com/yt-dlp/yt-dlp/actions/workflows/release.yml workflow on master
-->
### 2023.07.06
#### Important changes
- Security: [[CVE-2023-35934](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-35934)] Fix [Cookie leak](https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-v8mc-9377-rwjj)
- `--add-header Cookie:` is deprecated and auto-scoped to input URL domains
- Cookies are scoped when passed to external downloaders
- Add `cookies` field to info.json and deprecate `http_headers.Cookie`
#### Core changes
- [Allow extractors to mark formats as potentially DRM](https://github.com/yt-dlp/yt-dlp/commit/bc344cd456380999c1ee74554dfd432a38f32ec7) ([#7396](https://github.com/yt-dlp/yt-dlp/issues/7396)) by [pukkandan](https://github.com/pukkandan)
- [Bugfix for b4e0d75848e9447cee2cd3646ce54d4744a7ff56](https://github.com/yt-dlp/yt-dlp/commit/e59e20744eb32ce4b6ea0dece7c673be8376a710) by [pukkandan](https://github.com/pukkandan)
- [Change how `Cookie` headers are handled](https://github.com/yt-dlp/yt-dlp/commit/3121512228487c9c690d3d39bfd2579addf96e07) by [Grub4K](https://github.com/Grub4K)
- [Prevent `Cookie` leaks on HTTP redirect](https://github.com/yt-dlp/yt-dlp/commit/f8b4bcc0a791274223723488bfbfc23ea3276641) by [coletdjnz](https://github.com/coletdjnz)
- **formats**: [Fix best fallback for storyboards](https://github.com/yt-dlp/yt-dlp/commit/906c0bdcd8974340d619e99ccd613c163eb0d0c2) by [pukkandan](https://github.com/pukkandan)
- **outtmpl**: [Pad `playlist_index` etc even when with internal formatting](https://github.com/yt-dlp/yt-dlp/commit/47bcd437247152e0af5b3ebc5592db7bb66855c2) by [pukkandan](https://github.com/pukkandan)
- **utils**: clean_podcast_url: [Handle protocol in redirect URL](https://github.com/yt-dlp/yt-dlp/commit/91302ed349f34dc26cc1d661bb45a4b71f4417f7) by [pukkandan](https://github.com/pukkandan)
#### Extractor changes
- **abc**: [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/8f05fbae2a79ce0713077ccc68b354e63216bf20) ([#7434](https://github.com/yt-dlp/yt-dlp/issues/7434)) by [meliber](https://github.com/meliber)
- **AdultSwim**: [Extract subtitles from m3u8](https://github.com/yt-dlp/yt-dlp/commit/5e16cf92eb496b7c1541a6b1d727cb87542984db) ([#7421](https://github.com/yt-dlp/yt-dlp/issues/7421)) by [nnoboa](https://github.com/nnoboa)
- **crunchyroll**: music: [Fix `_VALID_URL`](https://github.com/yt-dlp/yt-dlp/commit/5b4b92769afcc398475e481bfa839f1158902fe9) ([#7439](https://github.com/yt-dlp/yt-dlp/issues/7439)) by [AmanSal1](https://github.com/AmanSal1), [rdamas](https://github.com/rdamas)
- **Douyin**: [Fix extraction from webpage](https://github.com/yt-dlp/yt-dlp/commit/a2be9781fbf4d7e4db245c277ca2ecc41cf3a7b2) by [bashonly](https://github.com/bashonly)
- **googledrive**: [Fix source format extraction](https://github.com/yt-dlp/yt-dlp/commit/3b7f5300c577fef40464d46d4e4037a69d51fe82) ([#7395](https://github.com/yt-dlp/yt-dlp/issues/7395)) by [RfadnjdExt](https://github.com/RfadnjdExt)
- **kick**: [Fix `_VALID_URL`](https://github.com/yt-dlp/yt-dlp/commit/ef8509c300ea50da86aea447eb214d3d6f6db6bb) by [bashonly](https://github.com/bashonly)
- **qdance**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/f0a1ff118145b6449982ba401f9a9f656ecd8062) ([#7420](https://github.com/yt-dlp/yt-dlp/issues/7420)) by [bashonly](https://github.com/bashonly)
- **sbs**: [Python 3.7 compat](https://github.com/yt-dlp/yt-dlp/commit/f393bbe724b1fc6c7f754a5da507e807b2b40ad2) by [pukkandan](https://github.com/pukkandan)
- **stacommu**: [Add extractors](https://github.com/yt-dlp/yt-dlp/commit/af1fd12f675220df6793fc019dff320bc76e8080) ([#7432](https://github.com/yt-dlp/yt-dlp/issues/7432)) by [urectanc](https://github.com/urectanc)
- **twitter**
- [Fix unauthenticated extraction](https://github.com/yt-dlp/yt-dlp/commit/49296437a8e5fa91dacb5446e51ab588474c85d3) ([#7476](https://github.com/yt-dlp/yt-dlp/issues/7476)) by [bashonly](https://github.com/bashonly)
- spaces: [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/1cffd621cb371f1563563cfb2fe37d137e8a7bee) ([#7512](https://github.com/yt-dlp/yt-dlp/issues/7512)) by [bashonly](https://github.com/bashonly)
- **vidlii**: [Handle relative URLs](https://github.com/yt-dlp/yt-dlp/commit/ad8902f616ad2541f9b9626738f1393fad89a64c) by [pukkandan](https://github.com/pukkandan)
- **vk**: VKPlay, VKPlayLive: [Add extractors](https://github.com/yt-dlp/yt-dlp/commit/8776349ef6b1f644584a92dfa00a05208a48edc4) ([#7358](https://github.com/yt-dlp/yt-dlp/issues/7358)) by [c-basalt](https://github.com/c-basalt)
- **youtube**
- [Add extractor-arg `formats`](https://github.com/yt-dlp/yt-dlp/commit/58786a10f212bd63f9ad1d0b4d9e4d31c3b385e2) by [pukkandan](https://github.com/pukkandan)
- [Avoid false DRM detection](https://github.com/yt-dlp/yt-dlp/commit/94ed638a437fc766699d440e978982e24ce6a30a) ([#7396](https://github.com/yt-dlp/yt-dlp/issues/7396)) by [pukkandan](https://github.com/pukkandan)
- [Fix comments' `is_favorited`](https://github.com/yt-dlp/yt-dlp/commit/89bed013741a776506f60380b7fd89d27d0710b4) ([#7390](https://github.com/yt-dlp/yt-dlp/issues/7390)) by [bbilly1](https://github.com/bbilly1)
- [Ignore incomplete data for comment threads by default](https://github.com/yt-dlp/yt-dlp/commit/4dc4d8473c085900edc841c87c20041233d25b1f) ([#7475](https://github.com/yt-dlp/yt-dlp/issues/7475)) by [coletdjnz](https://github.com/coletdjnz)
- [Process `post_live` over 2 hours](https://github.com/yt-dlp/yt-dlp/commit/d949c10c45bfc359bdacd52e6a180169b8128958) by [pukkandan](https://github.com/pukkandan)
- stories: [Remove](https://github.com/yt-dlp/yt-dlp/commit/90db9a3c00ca80492c6a58c542e4cbf4c2710866) ([#7459](https://github.com/yt-dlp/yt-dlp/issues/7459)) by [pukkandan](https://github.com/pukkandan)
- tab: [Support shorts-only playlists](https://github.com/yt-dlp/yt-dlp/commit/fcbc9ed760be6e3455bbadfaf277b4504b06f068) ([#7425](https://github.com/yt-dlp/yt-dlp/issues/7425)) by [coletdjnz](https://github.com/coletdjnz)
#### Downloader changes
- **aria2c**: [Add `--no-conf`](https://github.com/yt-dlp/yt-dlp/commit/8a8af356e3bba98a7f7d333aff0777d5d92130c8) by [pukkandan](https://github.com/pukkandan)
- **external**: [Scope cookies](https://github.com/yt-dlp/yt-dlp/commit/1ceb657bdd254ad961489e5060f2ccc7d556b729) by [bashonly](https://github.com/bashonly), [coletdjnz](https://github.com/coletdjnz)
- **http**: [Avoid infinite loop when no data is received](https://github.com/yt-dlp/yt-dlp/commit/662ef1e910b72e57957f06589925b2332ba52821) by [pukkandan](https://github.com/pukkandan)
#### Misc. changes
- [Add CodeQL workflow](https://github.com/yt-dlp/yt-dlp/commit/6355b5f1e1e8e7f4ef866d71d51e03baf0e82f17) ([#7497](https://github.com/yt-dlp/yt-dlp/issues/7497)) by [pukkandan](https://github.com/pukkandan)
- **cleanup**: Miscellaneous: [337734d](https://github.com/yt-dlp/yt-dlp/commit/337734d4a8a6500bc65434843db346b5cbd05e81) by [pukkandan](https://github.com/pukkandan)
- **docs**: [Minor fixes](https://github.com/yt-dlp/yt-dlp/commit/b532a3481046e1eabb6232ee8196fb696c356ff6) by [pukkandan](https://github.com/pukkandan)
- **make_changelog**: [Skip reverted commits](https://github.com/yt-dlp/yt-dlp/commit/fa44802809d189fca0f4782263d48d6533384503) by [pukkandan](https://github.com/pukkandan)
### 2023.06.22
#### Core changes
- [Fix bug in db3ad8a67661d7b234a6954d9c6a4a9b1749f5eb](https://github.com/yt-dlp/yt-dlp/commit/d7cd97e8d8d42b500fea9abb2aa4ac9b0f98b2ad) by [pukkandan](https://github.com/pukkandan)
- [Improve `--download-sections`](https://github.com/yt-dlp/yt-dlp/commit/b4e0d75848e9447cee2cd3646ce54d4744a7ff56) by [pukkandan](https://github.com/pukkandan)
- Support negative time-ranges
- Add `*from-url` to obey time-ranges in URL
- [Indicate `filesize` approximated from `tbr` better](https://github.com/yt-dlp/yt-dlp/commit/0dff8e4d1e6e9fb938f4256ea9af7d81f42fd54f) by [pukkandan](https://github.com/pukkandan)
#### Extractor changes
@@ -19,7 +73,7 @@ #### Extractor changes
- **nebula**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/3f756c8c4095b942cf49788eb0862ceaf57847f2) ([#7156](https://github.com/yt-dlp/yt-dlp/issues/7156)) by [Lamieur](https://github.com/Lamieur), [rohieb](https://github.com/rohieb)
- **rheinmaintv**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/98cb1eda7a4cf67c96078980dbd63e6c06ad7f7c) ([#7311](https://github.com/yt-dlp/yt-dlp/issues/7311)) by [barthelmannk](https://github.com/barthelmannk)
- **youtube**
- - [Add `ios` to default clients used](https://github.com/yt-dlp/yt-dlp/commit/1e75d97db21152acc764b30a688e516f04b8a142)
+ - [Add `ios` to default clients used](https://github.com/yt-dlp/yt-dlp/commit/1e75d97db21152acc764b30a688e516f04b8a142) by [pukkandan](https://github.com/pukkandan)
- IOS is affected neither by 403 nor by nsig so helps mitigate them preemptively
- IOS also has higher bit-rate 'premium' formats though they are not labeled as such
- [Improve description parsing performance](https://github.com/yt-dlp/yt-dlp/commit/71dc18fa29263a1ff0472c23d81bfc8dd4422d48) ([#7315](https://github.com/yt-dlp/yt-dlp/issues/7315)) by [berkanteber](https://github.com/berkanteber), [pukkandan](https://github.com/pukkandan)
@@ -27,7 +81,7 @@ #### Extractor changes
- [Workaround 403 for android formats](https://github.com/yt-dlp/yt-dlp/commit/81ca451480051d7ce1a31c017e005358345a9149) by [pukkandan](https://github.com/pukkandan)
#### Misc. changes
- [Revert "Add automatic duplicate issue detection"](https://github.com/yt-dlp/yt-dlp/commit/a4486bfc1dc7057efca9dd3fe70d7fa25c56f700)
- [Revert "Add automatic duplicate issue detection"](https://github.com/yt-dlp/yt-dlp/commit/a4486bfc1dc7057efca9dd3fe70d7fa25c56f700) by [pukkandan](https://github.com/pukkandan)
- **cleanup**
- Miscellaneous
- [7f9c6a6](https://github.com/yt-dlp/yt-dlp/commit/7f9c6a63b16e145495479e9f666f5b9e2ee69e2f) by [bashonly](https://github.com/bashonly)

View File

@@ -12,7 +12,7 @@
[![License: Unlicense](https://img.shields.io/badge/-Unlicense-blue.svg?style=for-the-badge)](LICENSE "License")
[![CI Status](https://img.shields.io/github/actions/workflow/status/yt-dlp/yt-dlp/core.yml?branch=master&label=Tests&style=for-the-badge)](https://github.com/yt-dlp/yt-dlp/actions "CI Status")
[![Commits](https://img.shields.io/github/commit-activity/m/yt-dlp/yt-dlp?label=commits&style=for-the-badge)](https://github.com/yt-dlp/yt-dlp/commits "Commit History")
- [![Last Commit](https://img.shields.io/github/last-commit/yt-dlp/yt-dlp/master?label=&style=for-the-badge&display_timestamp=committer)](https://github.com/yt-dlp/yt-dlp/commits "Commit History")
+ [![Last Commit](https://img.shields.io/github/last-commit/yt-dlp/yt-dlp/master?label=&style=for-the-badge&display_timestamp=committer)](https://github.com/yt-dlp/yt-dlp/pulse/monthly "Last activity")
</div>
<!-- MANPAGE: END EXCLUDED SECTION -->
@@ -76,7 +76,7 @@
# NEW FEATURES
- * Forked from [**yt-dlc@f9401f2**](https://github.com/blackjack4494/yt-dlc/commit/f9401f2a91987068139c5f757b12fc711d4c0cee) and merged with [**youtube-dl@42f2d4**](https://github.com/yt-dlp/yt-dlp/commit/42f2d4) ([exceptions](https://github.com/yt-dlp/yt-dlp/issues/21))
+ * Forked from [**yt-dlc@f9401f2**](https://github.com/blackjack4494/yt-dlc/commit/f9401f2a91987068139c5f757b12fc711d4c0cee) and merged with [**youtube-dl@42f2d4**](https://github.com/ytdl-org/youtube-dl/commit/07af47960f3bb262ead02490ce65c8c45c01741e) ([exceptions](https://github.com/yt-dlp/yt-dlp/issues/21))
* **[SponsorBlock Integration](#sponsorblock-options)**: You can mark/remove sponsor sections in YouTube videos by utilizing the [SponsorBlock](https://sponsor.ajay.app) API
@@ -1323,7 +1323,7 @@ # OUTPUT TEMPLATE
- `extractor` (string): Name of the extractor
- `extractor_key` (string): Key name of the extractor
- `epoch` (numeric): Unix epoch of when the information extraction was completed
- - `autonumber` (numeric): Number that will be increased with each download, starting at `--autonumber-start`
+ - `autonumber` (numeric): Number that will be increased with each download, starting at `--autonumber-start`, padded with leading zeros to 5 digits
- `video_autonumber` (numeric): Number that will be increased with each video
- `n_entries` (numeric): Total number of extracted items in the playlist
- `playlist_id` (string): Identifier of the playlist that contains the video
@@ -1509,7 +1509,7 @@ # FORMAT SELECTION
## Filtering Formats
- You can also filter the video formats by putting a condition in brackets, as in `-f "best[height=720]"` (or `-f "[filesize>10M]"`).
+ You can also filter the video formats by putting a condition in brackets, as in `-f "best[height=720]"` (or `-f "[filesize>10M]"` since filters without a selector are interpreted as `best`).
The following numeric meta fields can be used with comparisons `<`, `<=`, `>`, `>=`, `=` (equals), `!=` (not equals):
@@ -1545,7 +1545,7 @@ ## Filtering Formats
**Note**: None of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by the website. Any other field made available by the extractor can also be used for filtering.
- Formats for which the value is not known are excluded unless you put a question mark (`?`) after the operator. You can combine format filters, so `-f "[height<=?720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s. You can also use the filters with `all` to download all formats that satisfy the filter, e.g. `-f "all[vcodec=none]"` selects all audio-only formats.
+ Formats for which the value is not known are excluded unless you put a question mark (`?`) after the operator. You can combine format filters, so `-f "bv[height<=?720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s. You can also use the filters with `all` to download all formats that satisfy the filter, e.g. `-f "all[vcodec=none]"` selects all audio-only formats.
Format selectors can also be grouped using parentheses; e.g. `-f "(mp4,webm)[height<480]"` will download the best pre-merged mp4 and webm formats with a height lower than 480.
@@ -1805,8 +1805,7 @@ #### youtube
* `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side)
* `max_comments`: Limit the amount of comments to gather. Comma-separated list of integers representing `max-comments,max-parents,max-replies,max-replies-per-thread`. Default is `all,all,all,all`
* E.g. `all,all,1000,10` will get a maximum of 1000 replies total, with up to 10 replies per thread. `1000,all,100` will get a maximum of 1000 comments, with a maximum of 100 replies total
- * `include_duplicate_formats`: Extract formats with identical content but different URLs or protocol. This is useful if some of the formats are unavailable or throttled.
- * `include_incomplete_formats`: Extract formats that cannot be downloaded completely (live dash and post-live m3u8)
+ * `formats`: Change the types of formats to return. `dashy` (convert HTTP to DASH), `duplicate` (identical content but different URLs or protocol; includes `dashy`), `incomplete` (cannot be downloaded completely - live dash and post-live m3u8)
* `innertube_host`: Innertube API host to use for all API requests; e.g. `studio.youtube.com`, `youtubei.googleapis.com`. Note that cookies exported from one subdomain will not work on others
* `innertube_key`: Innertube API key to use for all API requests
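For reference, the reworked `formats` extractor-arg above can also be set when embedding, via the `extractor_args` parameter; a minimal sketch (the URL is just an example):

```python
import yt_dlp

# CLI equivalent: yt-dlp --extractor-args "youtube:formats=incomplete" URL
ydl_opts = {'extractor_args': {'youtube': {'formats': ['incomplete']}}}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    info = ydl.extract_info('https://www.youtube.com/watch?v=BaW_jenozKc', download=False)
```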
@@ -1853,10 +1852,7 @@ #### tiktok
#### rokfinchannel
* `tab`: Which tab to download - one of `new`, `top`, `videos`, `podcasts`, `streams`, `stacks`
#### twitter
* `legacy_api`: Force usage of the legacy Twitter API instead of the GraphQL API for tweet extraction. Has no effect if login cookies are passed
- #### wrestleuniverse
+ #### stacommu, wrestleuniverse
* `device_id`: UUID value assigned by the website and used to enforce device limits for paid livestream content. Can be found in browser local storage
#### twitch
@@ -1954,7 +1950,7 @@ # EMBEDDING YT-DLP
ydl.download(URLS)
```
- Most likely, you'll want to use various options. For a list of options available, have a look at [`yt_dlp/YoutubeDL.py`](yt_dlp/YoutubeDL.py#L184).
+ Most likely, you'll want to use various options. For a list of options available, have a look at [`yt_dlp/YoutubeDL.py`](yt_dlp/YoutubeDL.py#L183) or `help(yt_dlp.YoutubeDL)` in a Python shell. If you are already familiar with the CLI, you can use [`devscripts/cli_to_api.py`](https://github.com/yt-dlp/yt-dlp/blob/master/devscripts/cli_to_api.py) to translate any CLI switches to `YoutubeDL` params.
**Tip**: If you are porting your code from youtube-dl to yt-dlp, one important point to look out for is that we do not guarantee the return value of `YoutubeDL.extract_info` to be json serializable, or even be a dictionary. It will be dictionary-like, but if you want to ensure it is a serializable dictionary, pass it through `YoutubeDL.sanitize_info` as shown in the [example below](#extracting-information)
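For example, a minimal runnable version of the embedding pattern referenced above (format selector and URL are just examples):

```python
import yt_dlp

URLS = ['https://www.youtube.com/watch?v=BaW_jenozKc']  # example URL

ydl_opts = {
    'format': 'bestvideo*+bestaudio/best',
    'outtmpl': '%(title)s [%(id)s].%(ext)s',
}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    error_code = ydl.download(URLS)  # non-zero if any download errored
```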

View File

@@ -1,12 +1,12 @@
[
{
"action": "add",
"when": "776d1c3f0c9b00399896dd2e40e78e9a43218109",
"when": "29cb20bd563c02671b31dd840139e93dd37150a1",
"short": "[priority] **A new release type has been added!**\n * [`nightly`](https://github.com/yt-dlp/yt-dlp/releases/tag/nightly) builds will be made after each push, containing the latest fixes (but also possibly bugs).\n * When using `--update`/`-U`, a release binary will only update to its current channel (either `stable` or `nightly`).\n * The `--update-to` option has been added allowing the user more control over program upgrades (or downgrades).\n * `--update-to` can change the release channel (`stable`, `nightly`) and also upgrade or downgrade to specific tags.\n * **Usage**: `--update-to CHANNEL`, `--update-to TAG`, `--update-to CHANNEL@TAG`"
},
{
"action": "add",
"when": "776d1c3f0c9b00399896dd2e40e78e9a43218109",
"when": "5038f6d713303e0967d002216e7a88652401c22a",
"short": "[priority] **YouTube throttling fixes!**"
},
{
@@ -38,13 +38,15 @@
},
{
"action": "change",
"when": "7b37e8b23691613f331bd4ebc9d639dd6f93c972",
"short": "Improve `--download-sections`\n - Support negative time-ranges\n - Add `*from-url` to obey time-ranges in URL"
"when": "b4e0d75848e9447cee2cd3646ce54d4744a7ff56",
"short": "Improve `--download-sections`\n - Support negative time-ranges\n - Add `*from-url` to obey time-ranges in URL",
"authors": ["pukkandan"]
},
{
"action": "change",
"when": "1e75d97db21152acc764b30a688e516f04b8a142",
"short": "[extractor/youtube] Add `ios` to default clients used\n - IOS is affected neither by 403 nor by nsig so helps mitigate them preemptively\n - IOS also has higher bit-rate 'premium' formats though they are not labeled as such"
"short": "[extractor/youtube] Add `ios` to default clients used\n - IOS is affected neither by 403 nor by nsig so helps mitigate them preemptively\n - IOS also has higher bit-rate 'premium' formats though they are not labeled as such",
"authors": ["pukkandan"]
},
{
"action": "change",
@@ -55,6 +57,12 @@
{
"action": "change",
"when": "a4486bfc1dc7057efca9dd3fe70d7fa25c56f700",
"short": "[misc] Revert \"Add automatic duplicate issue detection\""
"short": "[misc] Revert \"Add automatic duplicate issue detection\"",
"authors": ["pukkandan"]
},
{
"action": "add",
"when": "1ceb657bdd254ad961489e5060f2ccc7d556b729",
"short": "[priority] Security: [[CVE-2023-35934](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-35934)] Fix [Cookie leak](https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-v8mc-9377-rwjj)\n - `--add-header Cookie:` is deprecated and auto-scoped to input URL domains\n - Cookies are scoped when passed to external downloaders\n - Add `cookie` field to info.json and deprecate `http_headers.Cookie`"
}
]

View File

@@ -55,6 +55,7 @@ def commit_lookup(cls):
'dependencies',
'jsinterp',
'outtmpl',
+ 'formats',
'plugins',
'update',
'upstream',
@@ -68,9 +69,9 @@ def commit_lookup(cls):
'misc',
'test',
},
- cls.EXTRACTOR: {'extractor'},
- cls.DOWNLOADER: {'downloader'},
- cls.POSTPROCESSOR: {'postprocessor'},
+ cls.EXTRACTOR: {'extractor', 'ie'},
+ cls.DOWNLOADER: {'downloader', 'fd'},
+ cls.POSTPROCESSOR: {'postprocessor', 'pp'},
}.items()
for name in names
}
@@ -252,6 +253,7 @@ class CommitRange:
(?:\ \((?P<issues>\#\d+(?:,\ \#\d+)*)\))?
''', re.VERBOSE | re.DOTALL)
EXTRACTOR_INDICATOR_RE = re.compile(r'(?:Fix|Add)\s+Extractors?', re.IGNORECASE)
+ REVERT_RE = re.compile(r'(?i:Revert)\s+([\da-f]{40})')
FIXES_RE = re.compile(r'(?i:Fix(?:es)?(?:\s+bugs?)?(?:\s+in|\s+for)?|Revert)\s+([\da-f]{40})')
UPSTREAM_MERGE_RE = re.compile(r'Update to ytdl-commit-([\da-f]+)')
@@ -279,7 +281,7 @@ def _get_commits_and_fixes(self, default_author):
self.COMMAND, 'log', f'--format=%H%n%s%n%b%n{self.COMMIT_SEPARATOR}',
f'{self._start}..{self._end}' if self._start else self._end).stdout
- commits = {}
+ commits, reverts = {}, {}
fixes = defaultdict(list)
lines = iter(result.splitlines(False))
for i, commit_hash in enumerate(lines):
@@ -300,6 +302,11 @@ def _get_commits_and_fixes(self, default_author):
logger.debug(f'Reached Release commit, breaking: {commit}')
break
+ revert_match = self.REVERT_RE.fullmatch(commit.short)
+ if revert_match:
+ reverts[revert_match.group(1)] = commit
+ continue
fix_match = self.FIXES_RE.search(commit.short)
if fix_match:
commitish = fix_match.group(1)
@@ -307,6 +314,13 @@ def _get_commits_and_fixes(self, default_author):
commits[commit.hash] = commit
+ for commitish, revert_commit in reverts.items():
+ reverted = commits.pop(commitish, None)
+ if reverted:
+ logger.debug(f'{commit} fully reverted {reverted}')
+ else:
+ commits[revert_commit.hash] = revert_commit
for commitish, fix_commits in fixes.items():
if commitish in commits:
hashes = ', '.join(commit.hash[:HASH_LENGTH] for commit in fix_commits)
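To see what the new `REVERT_RE`/`FIXES_RE` patterns above accept, a self-contained sketch (the hash is a made-up placeholder):

```python
import re

# Patterns copied from the diff above
REVERT_RE = re.compile(r'(?i:Revert)\s+([\da-f]{40})')
FIXES_RE = re.compile(r'(?i:Fix(?:es)?(?:\s+bugs?)?(?:\s+in|\s+for)?|Revert)\s+([\da-f]{40})')

fake_hash = 'a' * 40  # placeholder commit hash
assert REVERT_RE.fullmatch(f'Revert {fake_hash}')  # revert commits are collected first
assert FIXES_RE.search(f'Bugfix for {fake_hash}')  # "Bugfix for <hash>" counts as a fix
```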

View File

@@ -8,7 +8,6 @@ ignore = E402,E501,E731,E741,W503
max_line_length = 120
per_file_ignores =
devscripts/lazy_load_template.py: F401
- yt_dlp/utils/__init__.py: F401, F403
[autoflake]

View File

@@ -1136,6 +1136,7 @@ # Supported sites
- **puhutv:serie**
- **Puls4**
- **Pyvideo**
+ - **QDance**: [*qdance*](## "netrc machine")
- **QingTing**
- **qqmusic**: QQ音乐
- **qqmusic:album**: QQ音乐 - 专辑
@@ -1363,6 +1364,8 @@ # Supported sites
- **sr:mediathek**: Saarländischer Rundfunk
- **SRGSSR**
- **SRGSSRPlay**: srf.ch, rts.ch, rsi.ch, rtr.ch and swissinfo.ch play sites
+ - **StacommuLive**: [*stacommu*](## "netrc machine")
+ - **StacommuVOD**: [*stacommu*](## "netrc machine")
- **StagePlusVODConcert**: [*stageplus*](## "netrc machine")
- **stanfordoc**: Stanford Open ClassRoom
- **StarTrek**
@@ -1647,6 +1650,8 @@ # Supported sites
- **vk**: [*vk*](## "netrc machine") VK
- **vk:uservideos**: [*vk*](## "netrc machine") VK - User's Videos
- **vk:wallpost**: [*vk*](## "netrc machine")
+ - **VKPlay**
+ - **VKPlayLive**
- **vm.tiktok**
- **Vocaroo**
- **Vodlocker**
@@ -1800,7 +1805,6 @@ # Supported sites
- **youtube:search:date**: YouTube search, newest videos first; "ytsearchdate:" prefix
- **youtube:search_url**: YouTube search URLs with sorting and filter support
- **youtube:shorts:pivot:audio**: YouTube Shorts audio pivot (Shorts using audio of a given video)
- - **youtube:stories**: YouTube channel stories; "ytstories:" prefix
- **youtube:subscriptions**: YouTube subscriptions feed; ":ytsubs" keyword (requires cookies)
- **youtube:tab**: YouTube Tabs
- **youtube:user**: YouTube user videos; "ytuser:" prefix

View File

@@ -684,7 +684,7 @@ def test(tmpl, expected, *, info=None, **params):
test('%(id)s.%(ext)s', '1234.mp4')
test('%(duration_string)s', ('27:46:40', '27-46-40'))
test('%(resolution)s', '1080p')
- test('%(playlist_index)s', '001')
+ test('%(playlist_index|)s', '001')
test('%(playlist_autonumber)s', '02')
test('%(autonumber)s', '00001')
test('%(autonumber+2)03d', '005', autonumber_start=3)
@@ -1213,6 +1213,62 @@ def _real_extract(self, url):
self.assertEqual(downloaded['extractor'], 'Video')
self.assertEqual(downloaded['extractor_key'], 'Video')
def test_header_cookies(self):
from http.cookiejar import Cookie
ydl = FakeYDL()
ydl.report_warning = lambda *_, **__: None
def cookie(name, value, version=None, domain='', path='', secure=False, expires=None):
return Cookie(
version or 0, name, value, None, False,
domain, bool(domain), bool(domain), path, bool(path),
secure, expires, False, None, None, rest={})
_test_url = 'https://yt.dlp/test'
def test(encoded_cookies, cookies, headers=False, round_trip=None, error=None):
def _test():
ydl.cookiejar.clear()
ydl._load_cookies(encoded_cookies, from_headers=headers)
if headers:
ydl._apply_header_cookies(_test_url)
data = {'url': _test_url}
ydl._calc_headers(data)
self.assertCountEqual(
map(vars, ydl.cookiejar), map(vars, cookies),
'Extracted cookiejar.Cookie is not the same')
if not headers:
self.assertEqual(
data.get('cookies'), round_trip or encoded_cookies,
'Cookie is not the same as round trip')
ydl.__dict__['_YoutubeDL__header_cookies'] = []
with self.subTest(msg=encoded_cookies):
if not error:
_test()
return
with self.assertRaisesRegex(Exception, error):
_test()
test('test=value; Domain=.yt.dlp', [cookie('test', 'value', domain='.yt.dlp')])
test('test=value', [cookie('test', 'value')], error='Unscoped cookies are not allowed')
test('cookie1=value1; Domain=.yt.dlp; Path=/test; cookie2=value2; Domain=.yt.dlp; Path=/', [
cookie('cookie1', 'value1', domain='.yt.dlp', path='/test'),
cookie('cookie2', 'value2', domain='.yt.dlp', path='/')])
test('test=value; Domain=.yt.dlp; Path=/test; Secure; Expires=9999999999', [
cookie('test', 'value', domain='.yt.dlp', path='/test', secure=True, expires=9999999999)])
test('test="value; "; path=/test; domain=.yt.dlp', [
cookie('test', 'value; ', domain='.yt.dlp', path='/test')],
round_trip='test="value\\073 "; Domain=.yt.dlp; Path=/test')
test('name=; Domain=.yt.dlp', [cookie('name', '', domain='.yt.dlp')],
round_trip='name=""; Domain=.yt.dlp')
test('test=value', [cookie('test', 'value', domain='.yt.dlp')], headers=True)
test('cookie1=value; Domain=.yt.dlp; cookie2=value', [], headers=True, error='Invalid syntax')
ydl.deprecated_feature = ydl.report_error
test('test=value', [], headers=True, error='Passing cookies as a header is a potential security risk')
if __name__ == '__main__':
unittest.main()

View File

@@ -0,0 +1,133 @@
#!/usr/bin/env python3
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import http.cookiejar
from test.helper import FakeYDL
from yt_dlp.downloader.external import (
Aria2cFD,
AxelFD,
CurlFD,
FFmpegFD,
HttpieFD,
WgetFD,
)
TEST_COOKIE = {
'version': 0,
'name': 'test',
'value': 'ytdlp',
'port': None,
'port_specified': False,
'domain': '.example.com',
'domain_specified': True,
'domain_initial_dot': False,
'path': '/',
'path_specified': True,
'secure': False,
'expires': None,
'discard': False,
'comment': None,
'comment_url': None,
'rest': {},
}
TEST_INFO = {'url': 'http://www.example.com/'}
class TestHttpieFD(unittest.TestCase):
def test_make_cmd(self):
with FakeYDL() as ydl:
downloader = HttpieFD(ydl, {})
self.assertEqual(
downloader._make_cmd('test', TEST_INFO),
['http', '--download', '--output', 'test', 'http://www.example.com/'])
# Test cookie header is added
ydl.cookiejar.set_cookie(http.cookiejar.Cookie(**TEST_COOKIE))
self.assertEqual(
downloader._make_cmd('test', TEST_INFO),
['http', '--download', '--output', 'test', 'http://www.example.com/', 'Cookie:test=ytdlp'])
class TestAxelFD(unittest.TestCase):
def test_make_cmd(self):
with FakeYDL() as ydl:
downloader = AxelFD(ydl, {})
self.assertEqual(
downloader._make_cmd('test', TEST_INFO),
['axel', '-o', 'test', '--', 'http://www.example.com/'])
# Test cookie header is added
ydl.cookiejar.set_cookie(http.cookiejar.Cookie(**TEST_COOKIE))
self.assertEqual(
downloader._make_cmd('test', TEST_INFO),
['axel', '-o', 'test', 'Cookie: test=ytdlp', '--max-redirect=0', '--', 'http://www.example.com/'])
class TestWgetFD(unittest.TestCase):
def test_make_cmd(self):
with FakeYDL() as ydl:
downloader = WgetFD(ydl, {})
self.assertNotIn('--load-cookies', downloader._make_cmd('test', TEST_INFO))
# Test cookiejar tempfile arg is added
ydl.cookiejar.set_cookie(http.cookiejar.Cookie(**TEST_COOKIE))
self.assertIn('--load-cookies', downloader._make_cmd('test', TEST_INFO))
class TestCurlFD(unittest.TestCase):
def test_make_cmd(self):
with FakeYDL() as ydl:
downloader = CurlFD(ydl, {})
self.assertNotIn('--cookie-jar', downloader._make_cmd('test', TEST_INFO))
# Test cookiejar tempfile arg is added
ydl.cookiejar.set_cookie(http.cookiejar.Cookie(**TEST_COOKIE))
self.assertIn('--cookie-jar', downloader._make_cmd('test', TEST_INFO))
class TestAria2cFD(unittest.TestCase):
def test_make_cmd(self):
with FakeYDL() as ydl:
downloader = Aria2cFD(ydl, {})
downloader._make_cmd('test', TEST_INFO)
self.assertFalse(hasattr(downloader, '_cookies_tempfile'))
# Test cookiejar tempfile arg is added
ydl.cookiejar.set_cookie(http.cookiejar.Cookie(**TEST_COOKIE))
cmd = downloader._make_cmd('test', TEST_INFO)
self.assertIn(f'--load-cookies={downloader._cookies_tempfile}', cmd)
@unittest.skipUnless(FFmpegFD.available(), 'ffmpeg not found')
class TestFFmpegFD(unittest.TestCase):
_args = []
def _test_cmd(self, args):
self._args = args
def test_make_cmd(self):
with FakeYDL() as ydl:
downloader = FFmpegFD(ydl, {})
downloader._debug_cmd = self._test_cmd
downloader._call_downloader('test', {**TEST_INFO, 'ext': 'mp4'})
self.assertEqual(self._args, [
'ffmpeg', '-y', '-hide_banner', '-i', 'http://www.example.com/',
'-c', 'copy', '-f', 'mp4', 'file:test'])
# Test cookies arg is added
ydl.cookiejar.set_cookie(http.cookiejar.Cookie(**TEST_COOKIE))
downloader._call_downloader('test', {**TEST_INFO, 'ext': 'mp4'})
self.assertEqual(self._args, [
'ffmpeg', '-y', '-hide_banner', '-cookies', 'test=ytdlp; path=/; domain=.example.com;\r\n',
'-i', 'http://www.example.com/', '-c', 'copy', '-f', 'mp4', 'file:test'])
if __name__ == '__main__':
unittest.main()

View File

@@ -132,6 +132,11 @@ def do_GET(self):
self._method('GET')
elif self.path.startswith('/headers'):
self._headers()
+ elif self.path.startswith('/308-to-headers'):
+ self.send_response(308)
+ self.send_header('Location', '/headers')
+ self.send_header('Content-Length', '0')
+ self.end_headers()
elif self.path == '/trailing_garbage':
payload = b'<html><video src="/vid.mp4" /></html>'
self.send_response(200)
@@ -270,6 +275,7 @@ def do_req(redirect_status, method):
self.assertEqual(do_req(303, 'PUT'), ('', 'GET'))
# 301 and 302 turn POST only into a GET
# XXX: we should also test if the Content-Type and Content-Length headers are removed
self.assertEqual(do_req(301, 'POST'), ('', 'GET'))
self.assertEqual(do_req(301, 'HEAD'), ('', 'HEAD'))
self.assertEqual(do_req(302, 'POST'), ('', 'GET'))
@@ -313,6 +319,31 @@ def test_cookiejar(self):
data = ydl.urlopen(sanitized_Request(f'http://127.0.0.1:{self.http_port}/headers')).read()
self.assertIn(b'Cookie: test=ytdlp', data)
def test_passed_cookie_header(self):
# We should accept a Cookie header being passed as in normal headers and handle it appropriately.
with FakeYDL() as ydl:
# Specified Cookie header should be used
res = ydl.urlopen(
sanitized_Request(f'http://127.0.0.1:{self.http_port}/headers',
headers={'Cookie': 'test=test'})).read().decode('utf-8')
self.assertIn('Cookie: test=test', res)
# Specified Cookie header should be removed on any redirect
res = ydl.urlopen(
sanitized_Request(f'http://127.0.0.1:{self.http_port}/308-to-headers', headers={'Cookie': 'test=test'})).read().decode('utf-8')
self.assertNotIn('Cookie: test=test', res)
# Specified Cookie header should override global cookiejar for that request
ydl.cookiejar.set_cookie(http.cookiejar.Cookie(
version=0, name='test', value='ytdlp', port=None, port_specified=False,
domain='127.0.0.1', domain_specified=True, domain_initial_dot=False, path='/',
path_specified=True, secure=False, expires=None, discard=False, comment=None,
comment_url=None, rest={}))
data = ydl.urlopen(sanitized_Request(f'http://127.0.0.1:{self.http_port}/headers', headers={'Cookie': 'test=test'})).read()
self.assertNotIn(b'Cookie: test=ytdlp', data)
self.assertIn(b'Cookie: test=test', data)
def test_no_compression_compat_header(self):
with FakeYDL() as ydl:
data = ydl.urlopen(

View File

@@ -1,9 +1,11 @@
import collections
import contextlib
import copy
import datetime
import errno
import fileinput
import functools
import http.cookiejar
import io
import itertools
import json
@@ -25,7 +27,7 @@
from .cache import Cache
from .compat import urllib # isort: split
from .compat import compat_os_name, compat_shlex_quote
- from .cookies import load_cookies
+ from .cookies import LenientSimpleCookie, load_cookies
from .downloader import FFmpegFD, get_suitable_downloader, shorten_protocol_name
from .downloader.rtmp import rtmpdump_version
from .extractor import gen_extractor_classes, get_info_extractor
@@ -673,6 +675,9 @@ def process_color_policy(stream):
if auto_init and auto_init != 'no_verbose_header':
self.print_debug_header()
+ self.__header_cookies = []
+ self._load_cookies(traverse_obj(self.params.get('http_headers'), 'cookie', casesense=False))  # compat
def check_deprecated(param, option, suggestion):
if self.params.get(param) is not None:
self.report_warning(f'{option} is deprecated. Use {suggestion} instead')
@@ -983,6 +988,7 @@ def trouble(self, message=None, tb=None, is_error=True):
ID='green',
DELIM='blue',
ERROR='red',
+ BAD_FORMAT='light red',
WARNING='yellow',
SUPPRESS='light black',
)
@@ -1271,21 +1277,20 @@ def create_key(outer_mobj):
return outer_mobj.group(0)
key = outer_mobj.group('key')
mobj = re.match(INTERNAL_FORMAT_RE, key)
- initial_field = mobj.group('fields') if mobj else ''
- value, replacement, default = None, None, na
+ value, replacement, default, last_field = None, None, na, ''
while mobj:
mobj = mobj.groupdict()
default = mobj['default'] if mobj['default'] is not None else default
value = get_value(mobj)
- replacement = mobj['replacement']
+ last_field, replacement = mobj['fields'], mobj['replacement']
if value is None and mobj['alternate']:
mobj = re.match(INTERNAL_FORMAT_RE, mobj['remaining'][1:])
else:
break
fmt = outer_mobj.group('format')
- if fmt == 's' and value is not None and key in field_size_compat_map.keys():
- fmt = f'0{field_size_compat_map[key]:d}d'
+ if fmt == 's' and value is not None and last_field in field_size_compat_map.keys():
+ fmt = f'0{field_size_compat_map[last_field]:d}d'
if None not in (value, replacement):
try:
@@ -1322,7 +1327,7 @@ def create_key(outer_mobj):
value = format_decimal_suffix(value, f'%{num_fmt}f%s' if num_fmt else '%d%s',
factor=1024 if '#' in flags else 1000)
elif fmt[-1] == 'S': # filename sanitization
- value, fmt = filename_sanitizer(initial_field, value, restricted='#' in flags), str_fmt
+ value, fmt = filename_sanitizer(last_field, value, restricted='#' in flags), str_fmt
elif fmt[-1] == 'c':
if value:
value = str(value)[0]
@@ -1341,7 +1346,7 @@ def create_key(outer_mobj):
elif fmt[-1] == 'a':
value, fmt = ascii(value), str_fmt
if fmt[-1] in 'csra':
- value = sanitizer(initial_field, value)
+ value = sanitizer(last_field, value)
key = '%s\0%s' % (key.replace('%', '%\0'), outer_mobj.group('format'))
TMPL_DICT[key] = value
@@ -1625,8 +1630,60 @@ def progress(msg):
self.to_screen('')
raise
def _load_cookies(self, data, *, from_headers=True):
"""Loads cookies from a `Cookie` header
This tries to work around the security vulnerability of passing cookies to every domain.
See: https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-v8mc-9377-rwjj
The unscoped cookies are saved for later to be stored in the jar with a limited scope.
@param data The Cookie header as string to load the cookies from
@param from_headers If `False`, allows Set-Cookie syntax in the cookie string (at least a domain will be required)
"""
for cookie in LenientSimpleCookie(data).values():
if from_headers and any(cookie.values()):
raise ValueError('Invalid syntax in Cookie Header')
domain = cookie.get('domain') or ''
expiry = cookie.get('expires')
if expiry == '': # 0 is valid
expiry = None
prepared_cookie = http.cookiejar.Cookie(
cookie.get('version') or 0, cookie.key, cookie.value, None, False,
domain, True, True, cookie.get('path') or '', bool(cookie.get('path')),
cookie.get('secure') or False, expiry, False, None, None, {})
if domain:
self.cookiejar.set_cookie(prepared_cookie)
elif from_headers:
self.deprecated_feature(
'Passing cookies as a header is a potential security risk; '
'they will be scoped to the domain of the downloaded urls. '
'Please consider loading cookies from a file or browser instead.')
self.__header_cookies.append(prepared_cookie)
else:
self.report_error('Unscoped cookies are not allowed; please specify some sort of scoping',
tb=False, is_error=False)
def _apply_header_cookies(self, url):
"""Applies stray header cookies to the provided url
This loads header cookies and scopes them to the domain provided in `url`.
While this is not ideal, it helps reduce the risk of them being sent
to an unintended destination while mostly maintaining compatibility.
"""
parsed = urllib.parse.urlparse(url)
if not parsed.hostname:
return
for cookie in map(copy.copy, self.__header_cookies):
cookie.domain = f'.{parsed.hostname}'
self.cookiejar.set_cookie(cookie)
@_handle_extraction_exceptions
def __extract_info(self, url, ie, download, extra_info, process):
self._apply_header_cookies(url)
try:
ie_result = ie.extract(url)
except UserNotLive as e:
@@ -2086,8 +2143,6 @@ def syntax_error(note, start):
allow_multiple_streams = {'audio': self.params.get('allow_multiple_audio_streams', False),
'video': self.params.get('allow_multiple_video_streams', False)}
- check_formats = self.params.get('check_formats') == 'selected'
def _parse_filter(tokens):
filter_parts = []
for type, string_, start, _, _ in tokens:
@@ -2260,10 +2315,19 @@ def _merge(formats_pair):
return new_dict
def _check_formats(formats):
- if not check_formats:
+ if (self.params.get('check_formats') is not None
+ or self.params.get('allow_unplayable_formats')):
yield from formats
return
- yield from self._check_formats(formats)
+ elif self.params.get('check_formats') == 'selected':
+ yield from self._check_formats(formats)
+ return
+ for f in formats:
+ if f.get('has_drm'):
+ yield from self._check_formats([f])
+ else:
+ yield f
def _build_selector_function(selector):
if isinstance(selector, list): # ,
@@ -2407,9 +2471,24 @@ def _calc_headers(self, info_dict):
if 'Youtubedl-No-Compression' in res: # deprecated
res.pop('Youtubedl-No-Compression', None)
res['Accept-Encoding'] = 'identity'
- cookies = self.cookiejar.get_cookie_header(info_dict['url'])
+ cookies = self.cookiejar.get_cookies_for_url(info_dict['url'])
if cookies:
res['Cookie'] = cookies
encoder = LenientSimpleCookie()
values = []
for cookie in cookies:
_, value = encoder.value_encode(cookie.value)
values.append(f'{cookie.name}={value}')
if cookie.domain:
values.append(f'Domain={cookie.domain}')
if cookie.path:
values.append(f'Path={cookie.path}')
if cookie.secure:
values.append('Secure')
if cookie.expires:
values.append(f'Expires={cookie.expires}')
if cookie.version:
values.append(f'Version={cookie.version}')
info_dict['cookies'] = '; '.join(values)
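# e.g. a Secure cookie SID=abc scoped to .example.com with path / serializes to:
#   info_dict['cookies'] == 'SID=abc; Domain=.example.com; Path=/; Secure'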
if 'X-Forwarded-For' not in res:
x_forwarded_for_ip = info_dict.get('__x_forwarded_for_ip')
@@ -2615,10 +2694,10 @@ def sanitize_numeric_fields(info):
if field_preference:
info_dict['_format_sort_fields'] = field_preference
# or None ensures --clean-infojson removes it
info_dict['_has_drm'] = any(f.get('has_drm') for f in formats) or None
info_dict['_has_drm'] = any( # or None ensures --clean-infojson removes it
f.get('has_drm') and f['has_drm'] != 'maybe' for f in formats) or None
if not self.params.get('allow_unplayable_formats'):
formats = [f for f in formats if not f.get('has_drm')]
formats = [f for f in formats if not f.get('has_drm') or f['has_drm'] == 'maybe']
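# e.g. has_drm values ('maybe', False) leave _has_drm as None (so --clean-infojson
# drops it) and keep both formats; a hard True sets _has_drm and drops the format
# here unless allow_unplayable_formats is given.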
if formats and all(f.get('acodec') == f.get('vcodec') == 'none' for f in formats):
self.report_warning(
@@ -2767,11 +2846,8 @@ def is_wellformed(f):
formats_to_download = list(format_selector({
'formats': formats,
'has_merged_format': any('none' not in (f.get('acodec'), f.get('vcodec')) for f in formats),
'incomplete_formats': (
# All formats are video-only or
all(f.get('vcodec') != 'none' and f.get('acodec') == 'none' for f in formats)
# all formats are audio-only
or all(f.get('vcodec') == 'none' and f.get('acodec') != 'none' for f in formats)),
'incomplete_formats': (all(f.get('vcodec') == 'none' for f in formats) # No formats with video
or all(f.get('acodec') == 'none' for f in formats)), # OR, No formats with audio
}))
if interactive_format_selection and not formats_to_download:
self.report_error('Requested format is not available', tb=False, is_error=False)
@@ -3419,6 +3495,8 @@ def download_with_info_file(self, info_filename):
infos = [self.sanitize_info(info, self.params.get('clean_infojson', True))
for info in variadic(json.loads('\n'.join(f)))]
for info in infos:
self._load_cookies(info.get('cookies'), from_headers=False)
self._load_cookies(traverse_obj(info.get('http_headers'), 'Cookie', casesense=False)) # compat
try:
self.__download_wrapper(self.process_ie_result)(info, download=True)
except (DownloadError, EntryNotInPlaylist, ReExtractInfo) as e:
@@ -3688,7 +3766,7 @@ def render_formats_table(self, info_dict):
def simplified_codec(f, field):
assert field in ('acodec', 'vcodec')
codec = f.get(field, 'unknown')
codec = f.get(field)
if not codec:
return 'unknown'
elif codec != 'none':
@@ -3723,14 +3801,13 @@ def simplified_codec(f, field):
simplified_codec(f, 'acodec'),
format_field(f, 'abr', '\t%dk', func=round),
format_field(f, 'asr', '\t%s', func=format_decimal_suffix),
join_nonempty(
self._format_out('UNSUPPORTED', 'light red') if f.get('ext') in ('f4f', 'f4m') else None,
self._format_out('DRM', 'light red') if f.get('has_drm') else None,
format_field(f, 'language', '[%s]'),
join_nonempty(format_field(f, 'format_note'),
format_field(f, 'container', ignore=(None, f.get('ext'))),
delim=', '),
delim=' '),
join_nonempty(format_field(f, 'language', '[%s]'), join_nonempty(
self._format_out('UNSUPPORTED', self.Styles.BAD_FORMAT) if f.get('ext') in ('f4f', 'f4m') else None,
(self._format_out('Maybe DRM', self.Styles.WARNING) if f.get('has_drm') == 'maybe'
else self._format_out('DRM', self.Styles.BAD_FORMAT) if f.get('has_drm') else None),
format_field(f, 'format_note'),
format_field(f, 'container', ignore=(None, f.get('ext'))),
delim=', '), delim=' '),
] for f in formats if f.get('preference') is None or f['preference'] >= -1000]
header_line = self._list_format_headers(
'ID', 'EXT', 'RESOLUTION', '\tFPS', 'HDR', 'CH', delim, '\tFILESIZE', '\tTBR', 'PROTO',


@@ -1327,6 +1327,13 @@ def get_cookie_header(self, url):
self.add_cookie_header(cookie_req)
return cookie_req.get_header('Cookie')
def get_cookies_for_url(self, url):
"""Generate a list of Cookie objects for a given url"""
# Policy `_now` attribute must be set before calling `_cookies_for_request`
# Ref: https://github.com/python/cpython/blob/3.7/Lib/http/cookiejar.py#L1360
self._policy._now = self._now = int(time.time())
return self._cookies_for_request(urllib.request.Request(escape_url(sanitize_url(url))))
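# Usage sketch (illustration only): unlike get_cookie_header(), this returns the
# Cookie objects themselves, so callers can re-serialize them per protocol:
#   for cookie in jar.get_cookies_for_url('https://example.com/video'):
#       print(cookie.name, cookie.value, cookie.domain, cookie.secure)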
def clear(self, *args, **kwargs):
with contextlib.suppress(KeyError):
return super().clear(*args, **kwargs)


@@ -32,6 +32,7 @@
timetuple_from_msec,
try_call,
)
from ..utils.traversal import traverse_obj
class FileDownloader:
@@ -255,7 +256,8 @@ def sanitize_open(self, filename, open_mode):
@wrap_file_access('remove')
def try_remove(self, filename):
os.remove(filename)
if os.path.isfile(filename):
os.remove(filename)
@wrap_file_access('rename')
def try_rename(self, old_filename, new_filename):
@@ -418,7 +420,6 @@ def download(self, filename, info_dict, subtitle=False):
"""Download to a filename using the info from info_dict
Return True on success and False otherwise
"""
nooverwrites_and_exists = (
not self.params.get('overwrites', True)
and os.path.exists(encodeFilename(filename))
@@ -452,6 +453,11 @@ def download(self, filename, info_dict, subtitle=False):
self.to_screen(f'[download] Sleeping {sleep_interval:.2f} seconds ...')
time.sleep(sleep_interval)
# Filter the `Cookie` header from the info_dict to prevent leaks.
# See: https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-v8mc-9377-rwjj
info_dict['http_headers'] = dict(traverse_obj(info_dict, (
'http_headers', {dict.items}, lambda _, pair: pair[0].lower() != 'cookie'))) or None
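# Equivalent plain-dict sketch of the traversal above (illustration only):
#   headers = info_dict.get('http_headers') or {}
#   info_dict['http_headers'] = {
#       k: v for k, v in headers.items() if k.lower() != 'cookie'} or None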
ret = self.real_download(filename, info_dict)
self._finish_multiline_status()
return ret, True


@@ -1,9 +1,10 @@
import enum
import json
import os.path
import os
import re
import subprocess
import sys
import tempfile
import time
import uuid
@@ -42,6 +43,7 @@ class ExternalFD(FragmentFD):
def real_download(self, filename, info_dict):
self.report_destination(filename)
tmpfilename = self.temp_name(filename)
self._cookies_tempfile = None
try:
started = time.time()
@@ -54,6 +56,9 @@ def real_download(self, filename, info_dict):
# should take place
retval = 0
self.to_screen('[%s] Interrupted by user' % self.get_basename())
finally:
if self._cookies_tempfile:
self.try_remove(self._cookies_tempfile)
if retval == 0:
status = {
@@ -125,6 +130,16 @@ def _configuration_args(self, keys=None, *args, **kwargs):
self.get_basename(), self.params.get('external_downloader_args'), self.EXE_NAME,
keys, *args, **kwargs)
def _write_cookies(self):
if not self.ydl.cookiejar.filename:
tmp_cookies = tempfile.NamedTemporaryFile(suffix='.cookies', delete=False)
tmp_cookies.close()
self._cookies_tempfile = tmp_cookies.name
self.to_screen(f'[download] Writing temporary cookies file to "{self._cookies_tempfile}"')
# real_download resets _cookies_tempfile; if it's None then save() will write to cookiejar.filename
self.ydl.cookiejar.save(self._cookies_tempfile)
return self.ydl.cookiejar.filename or self._cookies_tempfile
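# Usage sketch: each _make_cmd below hands the returned path to the external
# tool's own cookie option, e.g. ['--cookie-jar', self._write_cookies()] for curl
# and ['--load-cookies', self._write_cookies()] for wget/aria2c.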
def _call_downloader(self, tmpfilename, info_dict):
""" Either overwrite this or implement _make_cmd """
cmd = [encodeArgument(a) for a in self._make_cmd(tmpfilename, info_dict)]
@@ -184,6 +199,8 @@ class CurlFD(ExternalFD):
def _make_cmd(self, tmpfilename, info_dict):
cmd = [self.exe, '--location', '-o', tmpfilename, '--compressed']
if self.ydl.cookiejar.get_cookie_header(info_dict['url']):
cmd += ['--cookie-jar', self._write_cookies()]
if info_dict.get('http_headers') is not None:
for key, val in info_dict['http_headers'].items():
cmd += ['--header', f'{key}: {val}']
@@ -214,6 +231,9 @@ def _make_cmd(self, tmpfilename, info_dict):
if info_dict.get('http_headers') is not None:
for key, val in info_dict['http_headers'].items():
cmd += ['-H', f'{key}: {val}']
cookie_header = self.ydl.cookiejar.get_cookie_header(info_dict['url'])
if cookie_header:
cmd += ['-H', f'Cookie: {cookie_header}', '--max-redirect=0']
cmd += self._configuration_args()
cmd += ['--', info_dict['url']]
return cmd
@@ -223,7 +243,9 @@ class WgetFD(ExternalFD):
AVAILABLE_OPT = '--version'
def _make_cmd(self, tmpfilename, info_dict):
cmd = [self.exe, '-O', tmpfilename, '-nv', '--no-cookies', '--compression=auto']
cmd = [self.exe, '-O', tmpfilename, '-nv', '--compression=auto']
if self.ydl.cookiejar.get_cookie_header(info_dict['url']):
cmd += ['--load-cookies', self._write_cookies()]
if info_dict.get('http_headers') is not None:
for key, val in info_dict['http_headers'].items():
cmd += ['--header', f'{key}: {val}']
@@ -271,7 +293,7 @@ def _call_downloader(self, tmpfilename, info_dict):
return super()._call_downloader(tmpfilename, info_dict)
def _make_cmd(self, tmpfilename, info_dict):
cmd = [self.exe, '-c',
cmd = [self.exe, '-c', '--no-conf',
'--console-log-level=warn', '--summary-interval=0', '--download-result=hide',
'--http-accept-gzip=true', '--file-allocation=none', '-x16', '-j16', '-s16']
if 'fragments' in info_dict:
@@ -279,6 +301,8 @@ def _make_cmd(self, tmpfilename, info_dict):
else:
cmd += ['--min-split-size', '1M']
if self.ydl.cookiejar.get_cookie_header(info_dict['url']):
cmd += [f'--load-cookies={self._write_cookies()}']
if info_dict.get('http_headers') is not None:
for key, val in info_dict['http_headers'].items():
cmd += ['--header', f'{key}: {val}']
@@ -417,6 +441,14 @@ def _make_cmd(self, tmpfilename, info_dict):
if info_dict.get('http_headers') is not None:
for key, val in info_dict['http_headers'].items():
cmd += [f'{key}:{val}']
# httpie 3.1.0+ removes the Cookie header on redirect, so this should be safe for now. [1]
# If we ever need cookie handling for redirects, we can export the cookiejar into a session. [2]
# 1: https://github.com/httpie/httpie/security/advisories/GHSA-9w4w-cpc8-h2fq
# 2: https://httpie.io/docs/cli/sessions
cookie_header = self.ydl.cookiejar.get_cookie_header(info_dict['url'])
if cookie_header:
cmd += [f'Cookie:{cookie_header}']
return cmd
@@ -527,6 +559,11 @@ def _call_downloader(self, tmpfilename, info_dict):
selected_formats = info_dict.get('requested_formats') or [info_dict]
for i, fmt in enumerate(selected_formats):
cookies = self.ydl.cookiejar.get_cookies_for_url(fmt['url'])
if cookies:
args.extend(['-cookies', ''.join(
f'{cookie.name}={cookie.value}; path={cookie.path}; domain={cookie.domain};\r\n'
for cookie in cookies)])
if fmt.get('http_headers') and re.match(r'^https?://', fmt['url']):
# Trailing \r\n after each HTTP header is important to prevent warning from ffmpeg/avconv:
# [http @ 00000000003d2fa0] No trailing CRLF found in HTTP header.
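# Illustrative -cookies value built above, one CRLF-terminated entry per cookie:
#   'SID=abc; path=/; domain=.example.com;\r\n'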


@@ -300,9 +300,7 @@ def frag_progress_hook(s):
def _finish_frag_download(self, ctx, info_dict):
ctx['dest_stream'].close()
if self.__do_ytdl_file(ctx):
ytdl_filename = encodeFilename(self.ytdl_filename(ctx['filename']))
if os.path.isfile(ytdl_filename):
self.try_remove(ytdl_filename)
self.try_remove(self.ytdl_filename(ctx['filename']))
elapsed = time.time() - ctx['started']
to_file = ctx['tmpfilename'] != '-'


@@ -28,7 +28,16 @@ class HlsFD(FragmentFD):
FD_NAME = 'hlsnative'
@staticmethod
def can_download(manifest, info_dict, allow_unplayable_formats=False):
def _has_drm(manifest): # TODO: https://github.com/yt-dlp/yt-dlp/pull/5039
return bool(re.search('|'.join((
r'#EXT-X-(?:SESSION-)?KEY:.*?URI="skd://', # Apple FairPlay
r'#EXT-X-(?:SESSION-)?KEY:.*?KEYFORMAT="com\.apple\.streamingkeydelivery"', # Apple FairPlay
r'#EXT-X-(?:SESSION-)?KEY:.*?KEYFORMAT="com\.microsoft\.playready"', # Microsoft PlayReady
r'#EXT-X-FAXS-CM:', # Adobe Flash Access
)), manifest))
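# Example (illustration only): a manifest line such as
#   #EXT-X-SESSION-KEY:METHOD=SAMPLE-AES,URI="skd://key-id",KEYFORMAT="com.apple.streamingkeydelivery"
# matches both FairPlay patterns above, so _has_drm() returns True.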
@classmethod
def can_download(cls, manifest, info_dict, allow_unplayable_formats=False):
UNSUPPORTED_FEATURES = [
# r'#EXT-X-BYTERANGE', # playlists composed of byte ranges of media files [2]
@@ -50,13 +59,15 @@ def can_download(manifest, info_dict, allow_unplayable_formats=False):
]
if not allow_unplayable_formats:
UNSUPPORTED_FEATURES += [
r'#EXT-X-KEY:METHOD=(?!NONE|AES-128)', # encrypted streams [1]
r'#EXT-X-KEY:METHOD=(?!NONE|AES-128)', # encrypted streams [1], but not necessarily DRM
]
def check_results():
yield not info_dict.get('is_live')
for feature in UNSUPPORTED_FEATURES:
yield not re.search(feature, manifest)
if not allow_unplayable_formats:
yield not cls._has_drm(manifest)
return all(check_results())
def real_download(self, filename, info_dict):
@@ -81,14 +92,13 @@ def real_download(self, filename, info_dict):
message = ('Live HLS streams are not supported by the native downloader. If this is a livestream, '
f'please {install_ffmpeg}add "--downloader ffmpeg --hls-use-mpegts" to your command')
if not can_download:
has_drm = re.search('|'.join([
r'#EXT-X-FAXS-CM:', # Adobe Flash Access
r'#EXT-X-(?:SESSION-)?KEY:.*?URI="skd://', # Apple FairPlay
]), s)
if has_drm and not self.params.get('allow_unplayable_formats'):
self.report_error(
'This video is DRM protected; Try selecting another format with --format or '
'add --check-formats to automatically fallback to the next best format')
if self._has_drm(s) and not self.params.get('allow_unplayable_formats'):
if info_dict.get('has_drm') and self.params.get('test'):
self.to_screen(f'[{self.FD_NAME}] This format is DRM protected', skip_eol=True)
else:
self.report_error(
'This format is DRM protected; Try selecting another format with --format or '
'add --check-formats to automatically fallback to the next best format', tb=False)
return False
message = message or 'Unsupported features have been detected'
fd = FFmpegFD(self.ydl, self.params)


@@ -339,15 +339,15 @@ def retry(e):
elif speed:
ctx.throttle_start = None
if not is_test and ctx.chunk_size and ctx.content_len is not None and byte_counter < ctx.content_len:
ctx.resume_len = byte_counter
# ctx.block_size = block_size
raise NextFragment()
if ctx.stream is None:
self.to_stderr('\n')
self.report_error('Did not get any data blocks')
return False
if not is_test and ctx.chunk_size and ctx.content_len is not None and byte_counter < ctx.content_len:
ctx.resume_len = byte_counter
raise NextFragment()
if ctx.tmpfilename != '-':
ctx.stream.close()
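# Illustration of the reorder: a response with content_len > 0 but zero data
# blocks previously hit the byte_counter < content_len check first and raised
# NextFragment indefinitely; checking `ctx.stream is None` first now turns that
# into a hard 'Did not get any data blocks' error instead.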


@@ -15,7 +15,6 @@
YoutubeSearchURLIE,
YoutubeMusicSearchURLIE,
YoutubeSubscriptionsIE,
YoutubeStoriesIE,
YoutubeTruncatedIDIE,
YoutubeTruncatedURLIE,
YoutubeYtBeIE,
@@ -1531,6 +1530,7 @@
)
from .puls4 import Puls4IE
from .pyvideo import PyvideoIE
from .qdance import QDanceIE
from .qingting import QingTingIE
from .qqmusic import (
QQMusicIE,
@@ -1854,6 +1854,10 @@
SRGSSRPlayIE,
)
from .srmediathek import SRMediathekIE
from .stacommu import (
StacommuLiveIE,
StacommuVODIE,
)
from .stanfordoc import StanfordOpenClassroomIE
from .startv import StarTVIE
from .steam import (
@@ -2267,6 +2271,8 @@
VKIE,
VKUserVideosIE,
VKWallPostIE,
VKPlayIE,
VKPlayLiveIE,
)
from .vocaroo import VocarooIE
from .vodlocker import VodlockerIE


@@ -12,6 +12,7 @@
int_or_none,
parse_iso8601,
str_or_none,
traverse_obj,
try_get,
unescapeHTML,
update_url_query,
@@ -85,6 +86,15 @@ class ABCIE(InfoExtractor):
'uploader': 'Behind the News',
'uploader_id': 'behindthenews',
}
}, {
'url': 'https://www.abc.net.au/news/2023-06-25/wagner-boss-orders-troops-back-to-bases-to-avoid-bloodshed/102520540',
'info_dict': {
'id': '102520540',
'title': 'Wagner Group retreating from Russia, leader Prigozhin to move to Belarus',
'ext': 'mp4',
'description': 'Wagner troops leave Rostov-on-Don and\xa0Yevgeny Prigozhin will move to Belarus under a deal brokered by Belarusian President Alexander Lukashenko to end the mutiny.',
'thumbnail': 'https://live-production.wcms.abc-cdn.net.au/0c170f5b57f0105c432f366c0e8e267b?impolicy=wcms_crop_resize&cropH=2813&cropW=5000&xPos=0&yPos=249&width=862&height=485',
}
}]
def _real_extract(self, url):
@@ -107,7 +117,7 @@ def _real_extract(self, url):
video = True
if mobj is None:
mobj = re.search(r'(?P<type>)"sources": (?P<json_data>\[[^\]]+\]),', webpage)
mobj = re.search(r'(?P<type>)"(?:sources|files|renditions)":\s*(?P<json_data>\[[^\]]+\])', webpage)
if mobj is None:
mobj = re.search(
r'inline(?P<type>Video|Audio|YouTube)Data\.push\((?P<json_data>[^)]+)\);',
@@ -121,7 +131,8 @@ def _real_extract(self, url):
urls_info = self._parse_json(
mobj.group('json_data'), video_id, transform_source=js_to_json)
youtube = mobj.group('type') == 'YouTube'
video = mobj.group('type') == 'Video' or urls_info[0]['contentType'] == 'video/mp4'
video = mobj.group('type') == 'Video' or traverse_obj(
urls_info, (0, ('contentType', 'MIMEType')), get_all=False) == 'video/mp4'
if not isinstance(urls_info, list):
urls_info = [urls_info]


@@ -1473,7 +1473,7 @@ def extract_redirect_url(html, url=None, fatal=False):
elif 'automatically signed in with' in provider_redirect_page:
# Seems like comcast is rolling up new way of automatically signing customers
oauth_redirect_url = self._html_search_regex(
r'continue:\s*"(https://oauth.xfinity.com/oauth/authorize\?.+)"', provider_redirect_page,
r'continue:\s*"(https://oauth\.xfinity\.com/oauth/authorize\?.+)"', provider_redirect_page,
'oauth redirect (signed)')
# Just need to process the request. No useful data comes back
self._download_webpage(oauth_redirect_url, video_id, 'Confirming auto login')


@@ -170,8 +170,10 @@ def _real_extract(self, url):
continue
ext = determine_ext(asset_url, mimetype2ext(asset.get('mime_type')))
if ext == 'm3u8':
info['formats'].extend(self._extract_m3u8_formats(
asset_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
fmts, subs = self._extract_m3u8_formats_and_subtitles(
asset_url, video_id, 'mp4', m3u8_id='hls', fatal=False)
info['formats'].extend(fmts)
self._merge_subtitles(subs, target=info['subtitles'])
elif ext == 'f4m':
continue
# info['formats'].extend(self._extract_f4m_formats(


@@ -26,6 +26,7 @@
from ..compat import compat_etree_fromstring, compat_expanduser, compat_os_name
from ..cookies import LenientSimpleCookie
from ..downloader.f4m import get_base_url, remove_encrypted_media
from ..downloader.hls import HlsFD
from ..utils import (
IDENTITY,
JSON_LD_RE,
@@ -224,7 +225,8 @@ class InfoExtractor:
width : height ratio as float.
* no_resume The server does not support resuming the
(HTTP or RTMP) download. Boolean.
* has_drm The format has DRM and cannot be downloaded. Boolean
* has_drm True if the format has DRM and cannot be downloaded.
'maybe' if the format may have DRM and has to be tested before download.
* extra_param_to_segment_url A query string to append to each
fragment's URL, or to update each existing query string
with. Only applied by the native HLS/DASH downloaders.
@@ -1979,11 +1981,7 @@ def _parse_m3u8_formats_and_subtitles(
errnote=None, fatal=True, data=None, headers={}, query={},
video_id=None):
formats, subtitles = [], {}
has_drm = re.search('|'.join([
r'#EXT-X-FAXS-CM:', # Adobe Flash Access
r'#EXT-X-(?:SESSION-)?KEY:.*?URI="skd://', # Apple FairPlay
]), m3u8_doc)
has_drm = HlsFD._has_drm(m3u8_doc)
def format_url(url):
return url if re.match(r'^https?://', url) else urllib.parse.urljoin(m3u8_url, url)


@@ -490,8 +490,21 @@ class CrunchyrollMusicIE(CrunchyrollBaseIE):
_VALID_URL = r'''(?x)
https?://(?:www\.)?crunchyroll\.com/
(?P<lang>(?:\w{2}(?:-\w{2})?/)?)
watch/(?P<type>concert|musicvideo)/(?P<id>\w{10})'''
watch/(?P<type>concert|musicvideo)/(?P<id>\w+)'''
_TESTS = [{
'url': 'https://www.crunchyroll.com/de/watch/musicvideo/MV5B02C79',
'info_dict': {
'ext': 'mp4',
'id': 'MV5B02C79',
'display_id': 'egaono-hana',
'title': 'Egaono Hana',
'track': 'Egaono Hana',
'artist': 'Goose house',
'thumbnail': r're:(?i)^https://www.crunchyroll.com/imgsrv/.*\.jpeg?$',
'genre': ['J-Pop'],
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.crunchyroll.com/watch/musicvideo/MV88BB7F2C',
'info_dict': {
'ext': 'mp4',
@@ -519,11 +532,14 @@ class CrunchyrollMusicIE(CrunchyrollBaseIE):
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.crunchyroll.com/watch/musicvideo/MV88BB7F2C/crossing-field',
'url': 'https://www.crunchyroll.com/de/watch/musicvideo/MV5B02C79/egaono-hana',
'only_matching': True,
}, {
'url': 'https://www.crunchyroll.com/watch/concert/MC2E2AC135/live-is-smile-always-364joker-at-yokohama-arena',
'only_matching': True,
}, {
'url': 'https://www.crunchyroll.com/watch/musicvideo/MV88BB7F2C/crossing-field',
'only_matching': True,
}]
_API_ENDPOINT = 'music'


@@ -5,7 +5,9 @@
from ..utils import (
ExtractorError,
determine_ext,
extract_attributes,
get_element_by_class,
get_element_html_by_id,
int_or_none,
lowercase_escape,
try_get,
@@ -34,6 +36,7 @@ class GoogleDriveIE(InfoExtractor):
'ext': 'mp4',
'title': 'Big Buck Bunny.mp4',
'duration': 45,
'thumbnail': 'https://drive.google.com/thumbnail?id=0ByeS4oOUV-49Zzh4R1J6R09zazQ',
}
}, {
# video can't be watched anonymously due to view count limit reached,
@@ -207,10 +210,10 @@ def get_value(key):
'export': 'download',
})
def request_source_file(source_url, kind):
def request_source_file(source_url, kind, data=None):
return self._request_webpage(
source_url, video_id, note='Requesting %s file' % kind,
errnote='Unable to request %s file' % kind, fatal=False)
errnote='Unable to request %s file' % kind, fatal=False, data=data)
urlh = request_source_file(source_url, 'source')
if urlh:
def add_source_format(urlh):
@@ -237,14 +240,10 @@ def add_source_format(urlh):
urlh, url, video_id, note='Downloading confirmation page',
errnote='Unable to confirm download', fatal=False)
if confirmation_webpage:
confirm = self._search_regex(
r'confirm=([^&"\']+)', confirmation_webpage,
'confirmation code', default=None)
if confirm:
confirmed_source_url = update_url_query(source_url, {
'confirm': confirm,
})
urlh = request_source_file(confirmed_source_url, 'confirmed source')
confirmed_source_url = extract_attributes(
get_element_html_by_id('download-form', confirmation_webpage) or '').get('action')
if confirmed_source_url:
urlh = request_source_file(confirmed_source_url, 'confirmed source', data=b'')
if urlh and urlh.headers.get('Content-Disposition'):
add_source_format(urlh)
else:


@@ -527,7 +527,7 @@ def _extract_vms_player_js(self, webpage, video_id):
if player_js_cache:
return player_js_cache
webpack_js_url = self._proto_relative_url(self._search_regex(
r'<script src="((?:https?)?//stc.iqiyipic.com/_next/static/chunks/webpack-\w+\.js)"', webpage, 'webpack URL'))
r'<script src="((?:https?:)?//stc\.iqiyipic\.com/_next/static/chunks/webpack-\w+\.js)"', webpage, 'webpack URL'))
webpack_js = self._download_webpage(webpack_js_url, video_id, note='Downloading webpack JS', errnote='Unable to download webpack JS')
webpack_map = self._search_json(
r'["\']\s*\+\s*', webpack_js, 'JS locations', video_id,


@@ -30,7 +30,7 @@ def _call_api(self, path, display_id, note='Downloading API JSON', headers={}, *
class KickIE(KickBaseIE):
_VALID_URL = r'https?://(?:www\.)?kick\.com/(?!(?:video|categories|search|auth)(?:[/?#]|$))(?P<id>[\w_]+)'
_VALID_URL = r'https?://(?:www\.)?kick\.com/(?!(?:video|categories|search|auth)(?:[/?#]|$))(?P<id>[\w-]+)'
_TESTS = [{
'url': 'https://kick.com/yuppy',
'info_dict': {

yt_dlp/extractor/qdance.py (new file, 150 lines)

@@ -0,0 +1,150 @@
import json
import time
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
jwt_decode_hs256,
str_or_none,
traverse_obj,
try_call,
url_or_none,
)
class QDanceIE(InfoExtractor):
_NETRC_MACHINE = 'qdance'
_VALID_URL = r'https?://(?:www\.)?q-dance\.com/network/(?:library|live)/(?P<id>\d+)'
_TESTS = [{
'note': 'vod',
'url': 'https://www.q-dance.com/network/library/146542138',
'info_dict': {
'id': '146542138',
'ext': 'mp4',
'title': 'Sound Rush [LIVE] | Defqon.1 Weekend Festival 2022 | Friday | RED',
'display_id': 'sound-rush-live-v3-defqon-1-weekend-festival-2022-friday-red',
'description': 'Relive Defqon.1 - Primal Energy 2022 with the sounds of Sound Rush LIVE at the RED on Friday! 🔥',
'season': 'Defqon.1 Weekend Festival 2022',
'season_id': '31840632',
'series': 'Defqon.1',
'series_id': '31840378',
'thumbnail': 'https://images.q-dance.network/1674829540-20220624171509-220624171509_delio_dn201093-2.jpg',
'availability': 'premium_only',
'duration': 1829,
},
'params': {'skip_download': 'm3u8'},
}, {
'note': 'livestream',
'url': 'https://www.q-dance.com/network/live/149170353',
'info_dict': {
'id': '149170353',
'ext': 'mp4',
'title': r're:^Defqon\.1 2023 - Friday - RED',
'display_id': 'defqon-1-2023-friday-red',
'description': 'md5:3c73fbbd4044e578e696adfc64019163',
'season': 'Defqon.1 Weekend Festival 2023',
'season_id': '141735599',
'series': 'Defqon.1',
'series_id': '31840378',
'thumbnail': 'https://images.q-dance.network/1686849069-area-thumbs_red.png',
'availability': 'subscriber_only',
'live_status': 'is_live',
'channel_id': 'qdancenetwork.video_149170353',
},
'skip': 'Completed livestream',
}]
_access_token = None
_refresh_token = None
def _call_login_api(self, data, note='Logging in'):
login = self._download_json(
'https://members.id-t.com/api/auth/login', None, note, headers={
'content-type': 'application/json',
'brand': 'qdance',
'origin': 'https://www.q-dance.com',
'referer': 'https://www.q-dance.com/',
}, data=json.dumps(data, separators=(',', ':')).encode(),
expected_status=lambda x: True)
tokens = traverse_obj(login, ('data', {
'_id-t-accounts-token': ('accessToken', {str}),
'_id-t-accounts-refresh': ('refreshToken', {str}),
'_id-t-accounts-id-token': ('idToken', {str}),
}))
if not tokens.get('_id-t-accounts-token'):
error = ': '.join(traverse_obj(login, ('error', ('code', 'message'), {str})))
if 'validation_error' not in error:
raise ExtractorError(f'Q-Dance API said "{error}"')
msg = 'Invalid username or password' if 'email' in data else 'Refresh token has expired'
raise ExtractorError(msg, expected=True)
for name, value in tokens.items():
self._set_cookie('.q-dance.com', name, value)
def _perform_login(self, username, password):
self._call_login_api({'email': username, 'password': password})
def _real_initialize(self):
cookies = self._get_cookies('https://www.q-dance.com/')
self._refresh_token = try_call(lambda: cookies['_id-t-accounts-refresh'].value)
self._access_token = try_call(lambda: cookies['_id-t-accounts-token'].value)
if not self._access_token:
self.raise_login_required()
def _get_auth(self):
if (try_call(lambda: jwt_decode_hs256(self._access_token)['exp']) or 0) <= int(time.time() - 120):
if not self._refresh_token:
raise ExtractorError(
'Cannot refresh access token, login with yt-dlp or refresh cookies in browser')
self._call_login_api({'refreshToken': self._refresh_token}, note='Refreshing access token')
self._real_initialize()
return {'Authorization': self._access_token}
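# Sketch: jwt_decode_hs256 only base64-decodes the JWT payload (no signature
# verification), so the expiry is readable client-side:
#   jwt_decode_hs256(self._access_token)['exp']  # epoch seconds
# and _get_auth compares it against time.time() before deciding to refresh
# via the stored refreshToken cookie.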
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
data = self._search_nuxt_data(webpage, video_id, traverse=('data', 0, 'data'))
def extract_availability(level):
level = int_or_none(level) or 0
return self._availability(
needs_premium=(level >= 20), needs_subscription=(level >= 15), needs_auth=True)
info = traverse_obj(data, {
'title': ('title', {str.strip}),
'description': ('description', {str.strip}),
'display_id': ('slug', {str}),
'thumbnail': ('thumbnail', {url_or_none}),
'duration': ('durationInSeconds', {int_or_none}, {lambda x: x or None}),
'availability': ('subscription', 'level', {extract_availability}),
'is_live': ('type', {lambda x: x.lower() == 'live'}),
'artist': ('acts', ..., {str}),
'series': ('event', 'title', {str.strip}),
'series_id': ('event', 'id', {str_or_none}),
'season': ('eventEdition', 'title', {str.strip}),
'season_id': ('eventEdition', 'id', {str_or_none}),
'channel_id': ('pubnub', 'channelName', {str}),
})
stream = self._download_json(
f'https://dc9h6qmsoymbq.cloudfront.net/api/content/videos/{video_id}/url',
video_id, headers=self._get_auth(), expected_status=401)
m3u8_url = traverse_obj(stream, ('data', 'url', {url_or_none}))
if not m3u8_url and traverse_obj(stream, ('error', 'code')) == 'unauthorized':
raise ExtractorError('Your account does not have access to this content', expected=True)
formats = self._extract_m3u8_formats(
m3u8_url, video_id, fatal=False, live=True) if m3u8_url else []
if not formats:
self.raise_no_formats('No active streams found', expected=bool(info.get('is_live')))
return {
**info,
'id': video_id,
'formats': formats,
}


@@ -139,8 +139,8 @@ def _real_extract(self, url):
'release_year': ('releaseYear', {int_or_none}),
'duration': ('duration', ({float_or_none}, {parse_duration})),
'is_live': ('liveStream', {bool}),
'age_limit': (
('classificationID', 'contentRating'), {str.upper}, {self._AUS_TV_PARENTAL_GUIDELINES.get}),
'age_limit': (('classificationID', 'contentRating'), {str.upper}, {
lambda x: self._AUS_TV_PARENTAL_GUIDELINES.get(x)}), # dict.get is unhashable in py3.7
}, get_all=False),
**traverse_obj(media, {
'categories': (('genres', ...), ('taxonomy', ('genre', 'subgenre'), 'name'), {str}),

yt_dlp/extractor/stacommu.py (new file)

@@ -0,0 +1,148 @@
import time
from .wrestleuniverse import WrestleUniverseBaseIE
from ..utils import (
int_or_none,
traverse_obj,
url_or_none,
)
class StacommuBaseIE(WrestleUniverseBaseIE):
_NETRC_MACHINE = 'stacommu'
_API_HOST = 'api.stacommu.jp'
_LOGIN_QUERY = {'key': 'AIzaSyCR9czxhH2eWuijEhTNWBZ5MCcOYEUTAhg'}
_LOGIN_HEADERS = {
'Accept': '*/*',
'Content-Type': 'application/json',
'X-Client-Version': 'Chrome/JsCore/9.9.4/FirebaseCore-web',
'Referer': 'https://www.stacommu.jp/',
'Origin': 'https://www.stacommu.jp',
}
@WrestleUniverseBaseIE._TOKEN.getter
def _TOKEN(self):
if self._REAL_TOKEN and self._TOKEN_EXPIRY <= int(time.time()):
self._refresh_token()
return self._REAL_TOKEN
def _get_formats(self, data, path, video_id=None):
if not traverse_obj(data, path) and not data.get('canWatch') and not self._TOKEN:
self.raise_login_required(method='password')
return super()._get_formats(data, path, video_id)
def _extract_hls_key(self, data, path, decrypt):
encryption_data = traverse_obj(data, path)
if traverse_obj(encryption_data, ('encryptType', {int})) == 0:
return None
return traverse_obj(encryption_data, {'key': ('key', {decrypt}), 'iv': ('iv', {decrypt})})
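# Sketch: when encryptType != 0, the decrypted key/iv pair feeds the `hls_aes`
# field of the info dict, e.g.
#   {'key': '6bbaf241b8e1fd9f59ecf546a70e4ae7', 'iv': '1fc9002a23166c3bb1d240b953d09de9'}
# as asserted in the premium-episode test below.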
class StacommuVODIE(StacommuBaseIE):
_VALID_URL = r'https?://www\.stacommu\.jp/videos/episodes/(?P<id>[\da-zA-Z]+)'
_TESTS = [{
# not encrypted
'url': 'https://www.stacommu.jp/videos/episodes/aXcVKjHyAENEjard61soZZ',
'info_dict': {
'id': 'aXcVKjHyAENEjard61soZZ',
'ext': 'mp4',
'title': 'スタコミュAWARDの裏側、ほぼ全部見せます〜晴れ舞台の直前ドキドキ編〜',
'description': 'md5:6400275c57ae75c06da36b06f96beb1c',
'timestamp': 1679652000,
'upload_date': '20230324',
'thumbnail': 'https://image.stacommu.jp/6eLobQan8PFtBoU4RL4uGg/6eLobQan8PFtBoU4RL4uGg',
'cast': 'count:11',
'duration': 250,
},
'params': {
'skip_download': 'm3u8',
},
}, {
# encrypted; requires a premium account
'url': 'https://www.stacommu.jp/videos/episodes/3hybMByUvzMEqndSeu5LpD',
'info_dict': {
'id': '3hybMByUvzMEqndSeu5LpD',
'ext': 'mp4',
'title': 'スタプラフェス2023〜裏側ほぼ全部見せます〜10',
'description': 'md5:85494488ccf1dfa1934accdeadd7b340',
'timestamp': 1682506800,
'upload_date': '20230426',
'thumbnail': 'https://image.stacommu.jp/eMdXtEefR4kEyJJMpAFi7x/eMdXtEefR4kEyJJMpAFi7x',
'cast': 'count:55',
'duration': 312,
'hls_aes': {
'key': '6bbaf241b8e1fd9f59ecf546a70e4ae7',
'iv': '1fc9002a23166c3bb1d240b953d09de9',
},
},
'params': {
'skip_download': 'm3u8',
},
}]
_API_PATH = 'videoEpisodes'
def _real_extract(self, url):
video_id = self._match_id(url)
video_info = self._download_metadata(
url, video_id, 'ja', ('dehydratedState', 'queries', 0, 'state', 'data'))
hls_info, decrypt = self._call_encrypted_api(
video_id, ':watch', 'stream information', data={'method': 1})
return {
'id': video_id,
'formats': self._get_formats(hls_info, ('protocolHls', 'url', {url_or_none}), video_id),
'hls_aes': self._extract_hls_key(hls_info, 'protocolHls', decrypt),
**traverse_obj(video_info, {
'title': ('displayName', {str}),
'description': ('description', {str}),
'timestamp': ('watchStartTime', {int_or_none}),
'thumbnail': ('keyVisualUrl', {url_or_none}),
'cast': ('casts', ..., 'displayName', {str}),
'duration': ('duration', {int}),
}),
}
class StacommuLiveIE(StacommuBaseIE):
_VALID_URL = r'https?://www\.stacommu\.jp/live/(?P<id>[\da-zA-Z]+)'
_TESTS = [{
'url': 'https://www.stacommu.jp/live/d2FJ3zLnndegZJCAEzGM3m',
'info_dict': {
'id': 'd2FJ3zLnndegZJCAEzGM3m',
'ext': 'mp4',
'title': '仲村悠菜 2023/05/04',
'timestamp': 1683195647,
'upload_date': '20230504',
'thumbnail': 'https://image.stacommu.jp/pHGF57SPEHE2ke83FS92FN/pHGF57SPEHE2ke83FS92FN',
'duration': 5322,
'hls_aes': {
'key': 'efbb3ec0b8246f61adf1764c5a51213a',
'iv': '80621d19a1f19167b64cedb415b05d1c',
},
},
'params': {
'skip_download': 'm3u8',
},
}]
_API_PATH = 'events'
def _real_extract(self, url):
video_id = self._match_id(url)
video_info = self._call_api(video_id, msg='video information', query={'al': 'ja'}, auth=False)
hls_info, decrypt = self._call_encrypted_api(
video_id, ':watchArchive', 'stream information', data={'method': 1})
return {
'id': video_id,
'formats': self._get_formats(hls_info, ('hls', 'urls', ..., {url_or_none}), video_id),
'hls_aes': self._extract_hls_key(hls_info, 'hls', decrypt),
**traverse_obj(video_info, {
'title': ('displayName', {str}),
'timestamp': ('startTime', {int_or_none}),
'thumbnail': ('keyVisualUrl', {url_or_none}),
'duration': ('duration', {int_or_none}),
}),
}


@@ -1015,18 +1015,16 @@ def _real_extract(self, url):
self.to_screen(f'{e}; trying with webpage')
webpage = self._download_webpage(url, video_id)
render_data_json = self._search_regex(
r'<script [^>]*\bid=[\'"]RENDER_DATA[\'"][^>]*>(%7B.+%7D)</script>',
webpage, 'render data', default=None)
if not render_data_json:
render_data = self._search_json(
r'<script [^>]*\bid=[\'"]RENDER_DATA[\'"][^>]*>', webpage, 'render data', video_id,
contains_pattern=r'%7B(?s:.+)%7D', fatal=False, transform_source=compat_urllib_parse_unquote)
if not render_data:
# TODO: Run verification challenge code to generate signature cookies
cookies = self._get_cookies(self._WEBPAGE_HOST)
expected = not cookies.get('s_v_web_id') or not cookies.get('ttwid')
raise ExtractorError(
'Fresh cookies (not necessarily logged in) are needed', expected=expected)
render_data = self._parse_json(
render_data_json, video_id, transform_source=compat_urllib_parse_unquote)
return self._parse_aweme_video_web(get_first(render_data, ('aweme', 'detail')), url, video_id)


@@ -1,5 +1,6 @@
import json
import re
import urllib.error
from .common import InfoExtractor
from .periscope import PeriscopeBaseIE, PeriscopeIE
@@ -34,7 +35,6 @@ class TwitterBaseIE(InfoExtractor):
_GRAPHQL_API_BASE = 'https://twitter.com/i/api/graphql/'
_BASE_REGEX = r'https?://(?:(?:www|m(?:obile)?)\.)?(?:twitter\.com|twitter3e4tixl4xyajtrzo62zg5vztmjuricljdp2c5kshju4avyoid\.onion)/'
_AUTH = {'Authorization': 'Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA'}
_guest_token = None
_flow_token = None
_LOGIN_INIT_DATA = json.dumps({
@@ -145,14 +145,6 @@ def _search_dimensions_in_video_url(a_format, video_url):
def is_logged_in(self):
return bool(self._get_cookies(self._API_BASE).get('auth_token'))
def _fetch_guest_token(self, headers, display_id):
headers.pop('x-guest-token', None)
self._guest_token = traverse_obj(self._download_json(
f'{self._API_BASE}guest/activate.json', display_id,
'Downloading guest token', data=b'', headers=headers), 'guest_token')
if not self._guest_token:
raise ExtractorError('Could not retrieve guest token')
def _set_base_headers(self):
headers = self._AUTH.copy()
csrf_token = try_call(lambda: self._get_cookies(self._API_BASE)['ct0'].value)
@@ -183,12 +175,15 @@ def _perform_login(self, username, password):
if self.is_logged_in:
return
self._request_webpage('https://twitter.com/', None, 'Requesting cookies')
webpage = self._download_webpage('https://twitter.com/', None, 'Downloading login page')
headers = self._set_base_headers()
self._fetch_guest_token(headers, None)
guest_token = self._search_regex(
r'\.cookie\s*=\s*["\']gt=(\d+);', webpage, 'gt', default=None) or self._download_json(
f'{self._API_BASE}guest/activate.json', None, 'Downloading guest token',
data=b'', headers=headers)['guest_token']
headers.update({
'content-type': 'application/json',
'x-guest-token': self._guest_token,
'x-guest-token': guest_token,
'x-twitter-client-language': 'en',
'x-twitter-active-user': 'yes',
'Referer': 'https://twitter.com/',
@@ -285,37 +280,24 @@ def input_dict(subtask_id, text):
self.report_login()
def _call_api(self, path, video_id, query={}, graphql=False):
headers = self._set_base_headers()
if self.is_logged_in:
headers.update({
if not self.is_logged_in:
self.raise_login_required()
result = self._download_json(
(self._GRAPHQL_API_BASE if graphql else self._API_BASE) + path, video_id,
f'Downloading {"GraphQL" if graphql else "legacy API"} JSON', headers={
**self._set_base_headers(),
'x-twitter-auth-type': 'OAuth2Session',
'x-twitter-client-language': 'en',
'x-twitter-active-user': 'yes',
})
}, query=query, expected_status={400, 401, 403, 404} if graphql else {403})
for first_attempt in (True, False):
if not self.is_logged_in:
if not self._guest_token:
self._fetch_guest_token(headers, video_id)
headers['x-guest-token'] = self._guest_token
if result.get('errors'):
errors = ', '.join(set(traverse_obj(result, ('errors', ..., 'message', {str}))))
raise ExtractorError(
f'Error(s) while querying API: {errors or "Unknown error"}', expected=True)
allowed_status = {400, 401, 403, 404} if graphql else {403}
result = self._download_json(
(self._GRAPHQL_API_BASE if graphql else self._API_BASE) + path,
video_id, headers=headers, query=query, expected_status=allowed_status,
note=f'Downloading {"GraphQL" if graphql else "legacy API"} JSON')
if result.get('errors'):
errors = ', '.join(set(traverse_obj(result, ('errors', ..., 'message', {str}))))
if not self.is_logged_in and first_attempt and 'bad guest token' in errors.lower():
self.to_screen('Guest token has expired. Refreshing guest token')
self._guest_token = None
continue
raise ExtractorError(
f'Error(s) while querying API: {errors or "Unknown error"}', expected=True)
return result
return result
def _build_graphql_query(self, media_id):
raise NotImplementedError('Method must be implemented to support GraphQL')
@@ -457,6 +439,7 @@ class TwitterIE(TwitterBaseIE):
_VALID_URL = TwitterBaseIE._BASE_REGEX + r'(?:(?:i/web|[^/]+)/status|statuses)/(?P<id>\d+)(?:/(?:video|photo)/(?P<index>\d+))?'
_TESTS = [{
# comment_count, repost_count, view_count are only available with auth (applies to all tests)
'url': 'https://twitter.com/freethenipple/status/643211948184596480',
'info_dict': {
'id': '643211870443208704',
@@ -471,10 +454,7 @@ class TwitterIE(TwitterBaseIE):
'timestamp': 1442188653,
'upload_date': '20150913',
'uploader_url': 'https://twitter.com/freethenipple',
'comment_count': int,
'repost_count': int,
'like_count': int,
'view_count': int,
'tags': [],
'age_limit': 18,
},
@@ -505,8 +485,6 @@ class TwitterIE(TwitterBaseIE):
'timestamp': 1447395772,
'upload_date': '20151113',
'uploader_url': 'https://twitter.com/starwars',
'comment_count': int,
'repost_count': int,
'like_count': int,
'tags': ['TV', 'StarWars', 'TheForceAwakens'],
'age_limit': 0,
@@ -550,10 +528,7 @@ class TwitterIE(TwitterBaseIE):
'timestamp': 1455777459,
'upload_date': '20160218',
'uploader_url': 'https://twitter.com/jaydingeer',
'comment_count': int,
'repost_count': int,
'like_count': int,
'view_count': int,
'tags': ['Damndaniel'],
'age_limit': 0,
},
@@ -591,10 +566,7 @@ class TwitterIE(TwitterBaseIE):
'upload_date': '20160412',
'uploader_url': 'https://twitter.com/CaptainAmerica',
'thumbnail': r're:^https?://.*\.jpg',
'comment_count': int,
'repost_count': int,
'like_count': int,
'view_count': int,
'tags': [],
'age_limit': 0,
},
@@ -641,10 +613,7 @@ class TwitterIE(TwitterBaseIE):
'timestamp': 1505803395,
'upload_date': '20170919',
'uploader_url': 'https://twitter.com/Prefet971',
'comment_count': int,
'repost_count': int,
'like_count': int,
'view_count': int,
'tags': ['Maria'],
'age_limit': 0,
},
@@ -667,10 +636,7 @@ class TwitterIE(TwitterBaseIE):
'timestamp': 1527623489,
'upload_date': '20180529',
'uploader_url': 'https://twitter.com/LisPower1',
'comment_count': int,
'repost_count': int,
'like_count': int,
'view_count': int,
'tags': [],
'age_limit': 0,
},
@@ -692,10 +658,7 @@ class TwitterIE(TwitterBaseIE):
'timestamp': 1548184644,
'upload_date': '20190122',
'uploader_url': 'https://twitter.com/Twitter',
'comment_count': int,
'repost_count': int,
'like_count': int,
'view_count': int,
'tags': [],
'age_limit': 0,
},
@@ -713,6 +676,7 @@ class TwitterIE(TwitterBaseIE):
'view_count': int,
},
'add_ie': ['TwitterBroadcast'],
'skip': 'Requires authentication',
}, {
# unified card
'url': 'https://twitter.com/BrooklynNets/status/1349794411333394432?s=20',
@@ -729,8 +693,6 @@ class TwitterIE(TwitterBaseIE):
'timestamp': 1610651040,
'upload_date': '20210114',
'uploader_url': 'https://twitter.com/BrooklynNets',
'comment_count': int,
'repost_count': int,
'like_count': int,
'tags': [],
'age_limit': 0,
@@ -753,10 +715,7 @@ class TwitterIE(TwitterBaseIE):
'thumbnail': r're:^https?://.*\.jpg',
'duration': 30.03,
'timestamp': 1665025050,
'comment_count': int,
'repost_count': int,
'like_count': int,
'view_count': int,
'tags': [],
'age_limit': 0,
},
@@ -765,15 +724,13 @@ class TwitterIE(TwitterBaseIE):
'url': 'https://twitter.com/UltimaShadowX/status/1577719286659006464',
'info_dict': {
'id': '1577719286659006464',
'title': 'Ultima | #\u0432\u029f\u043c - Test',
'title': 'Ultima📛 | #вʟм - Test',
'description': 'Test https://t.co/Y3KEZD7Dad',
'uploader': 'Ultima | #\u0432\u029f\u043c',
'uploader': 'Ultima📛 | #вʟм',
'uploader_id': 'UltimaShadowX',
'uploader_url': 'https://twitter.com/UltimaShadowX',
'upload_date': '20221005',
'timestamp': 1664992565,
'comment_count': int,
'repost_count': int,
'like_count': int,
'tags': [],
'age_limit': 0,
@@ -795,10 +752,7 @@ class TwitterIE(TwitterBaseIE):
'duration': 21.321,
'timestamp': 1664477766,
'upload_date': '20220929',
'comment_count': int,
'repost_count': int,
'like_count': int,
'view_count': int,
'tags': ['HurricaneIan'],
'age_limit': 0,
},
@@ -825,6 +779,20 @@ class TwitterIE(TwitterBaseIE):
},
'skip': 'Requires authentication',
}, {
# Single Vimeo video result without auth
'url': 'https://twitter.com/Srirachachau/status/1395079556562706435',
'info_dict': {
'id': '551578322',
'ext': 'mp4',
'title': 'Dusty & The Mayor',
'uploader': 'Michael Chau',
'uploader_id': 'user29061007',
'uploader_url': 'https://vimeo.com/user29061007',
'duration': 478,
'thumbnail': 'https://i.vimeocdn.com/video/1139658575-0dfdce6e9a2401fe09feb24bf0d14e6f24a53c12f447ff688ace61009ad4c1ba-d_1280',
},
}, {
# Playlist result only with auth
'url': 'https://twitter.com/Srirachachau/status/1395079556562706435',
'playlist_mincount': 2,
'info_dict': {
@@ -842,6 +810,7 @@ class TwitterIE(TwitterBaseIE):
'uploader_url': 'https://twitter.com/Srirachachau',
'timestamp': 1621447860,
},
'skip': 'Requires authentication',
}, {
'url': 'https://twitter.com/DavidToons_/status/1578353380363501568',
'playlist_mincount': 2,
@@ -860,6 +829,7 @@ class TwitterIE(TwitterBaseIE):
'upload_date': '20221007',
'age_limit': 0,
},
'skip': 'Requires authentication',
}, {
'url': 'https://twitter.com/primevideouk/status/1578401165338976258',
'playlist_count': 2,
@@ -873,8 +843,6 @@ class TwitterIE(TwitterBaseIE):
'upload_date': '20221007',
'age_limit': 0,
'uploader_url': 'https://twitter.com/primevideouk',
'comment_count': int,
'repost_count': int,
'like_count': int,
'tags': ['TheRingsOfPower'],
},
@@ -889,11 +857,14 @@ class TwitterIE(TwitterBaseIE):
'uploader_id': 'MoniqueCamarra',
'live_status': 'was_live',
'release_timestamp': 1658417414,
'description': 'md5:acce559345fd49f129c20dbcda3f1201',
'timestamp': 1658407771464,
'description': 'md5:4dc8e972f1d8b3c6580376fabb02a3ad',
'timestamp': 1658407771,
'release_date': '20220721',
'upload_date': '20220721',
},
'add_ie': ['TwitterSpaces'],
'params': {'skip_download': 'm3u8'},
'skip': 'Requires authentication',
}, {
# URL specifies video number but --yes-playlist
'url': 'https://twitter.com/CTVJLaidlaw/status/1600649710662213632/video/1',
@@ -903,9 +874,7 @@ class TwitterIE(TwitterBaseIE):
'title': 'md5:be05989b0722e114103ed3851a0ffae2',
'timestamp': 1670459604.0,
'description': 'md5:591c19ce66fadc2359725d5cd0d1052c',
'comment_count': int,
'uploader_id': 'CTVJLaidlaw',
'repost_count': int,
'tags': ['colorectalcancer', 'cancerjourney', 'imnotaquitter'],
'upload_date': '20221208',
'age_limit': 0,
@@ -924,14 +893,11 @@ class TwitterIE(TwitterBaseIE):
'timestamp': 1670459604.0,
'uploader_id': 'CTVJLaidlaw',
'uploader': 'Jocelyn Laidlaw',
'repost_count': int,
'comment_count': int,
'tags': ['colorectalcancer', 'cancerjourney', 'imnotaquitter'],
'duration': 102.226,
'uploader_url': 'https://twitter.com/CTVJLaidlaw',
'display_id': '1600649710662213632',
'like_count': int,
'view_count': int,
'description': 'md5:591c19ce66fadc2359725d5cd0d1052c',
'upload_date': '20221208',
'age_limit': 0,
@@ -957,9 +923,6 @@ class TwitterIE(TwitterBaseIE):
'age_limit': 18,
'tags': [],
'like_count': int,
'repost_count': int,
'comment_count': int,
'view_count': int,
},
}, {
'url': 'https://twitter.com/hlo_again/status/1599108751385972737/video/2',
@@ -972,10 +935,7 @@ class TwitterIE(TwitterBaseIE):
'like_count': int,
'uploader_id': 'hlo_again',
'thumbnail': 'https://pbs.twimg.com/ext_tw_video_thumb/1599108643743473680/pu/img/UG3xjov4rgg5sbYM.jpg?name=orig',
'repost_count': int,
'duration': 9.531,
'comment_count': int,
'view_count': int,
'upload_date': '20221203',
'age_limit': 0,
'timestamp': 1670092210.0,
@@ -992,14 +952,11 @@ class TwitterIE(TwitterBaseIE):
'ext': 'mp4',
'uploader_url': 'https://twitter.com/MunTheShinobi',
'description': 'This is a genius ad by Apple. \U0001f525\U0001f525\U0001f525\U0001f525\U0001f525 https://t.co/cNsA0MoOml',
'view_count': int,
'thumbnail': 'https://pbs.twimg.com/ext_tw_video_thumb/1600009362759733248/pu/img/XVhFQivj75H_YxxV.jpg?name=orig',
'age_limit': 0,
'uploader': 'Mün The Shinobi',
'repost_count': int,
'upload_date': '20221206',
'title': 'Mün The Shinobi - This is a genius ad by Apple. \U0001f525\U0001f525\U0001f525\U0001f525\U0001f525',
'comment_count': int,
'like_count': int,
'tags': [],
'uploader_id': 'MunTheShinobi',
@@ -1007,14 +964,14 @@ class TwitterIE(TwitterBaseIE):
'timestamp': 1670306984.0,
},
}, {
# url to retweet id, legacy API
# url to retweet id
'url': 'https://twitter.com/liberdalau/status/1623739803874349067',
'info_dict': {
'id': '1623274794488659969',
'display_id': '1623739803874349067',
'ext': 'mp4',
'title': 'Johnny Bullets - Me after going viral to over 30million people: Whoopsie-daisy',
'description': 'md5:e873616a4a8fe0f93e71872678a672f3',
'description': 'md5:224d62f54b0cdef8e33d4c56c41ac503',
'uploader': 'Johnny Bullets',
'uploader_id': 'Johnnybull3ts',
'uploader_url': 'https://twitter.com/Johnnybull3ts',
@@ -1025,10 +982,7 @@ class TwitterIE(TwitterBaseIE):
'upload_date': '20230208',
'thumbnail': r're:https://pbs\.twimg\.com/ext_tw_video_thumb/.+',
'like_count': int,
'repost_count': int,
'comment_count': int,
},
'params': {'extractor_args': {'twitter': {'legacy_api': ['']}}},
}, {
# onion route
'url': 'https://twitter3e4tixl4xyajtrzo62zg5vztmjuricljdp2c5kshju4avyoid.onion/TwitterBlue/status/1484226494708662273',
@@ -1079,8 +1033,6 @@ def _graphql_to_legacy(self, data, twid):
if 'tombstone' in result:
cause = remove_end(traverse_obj(result, ('tombstone', 'text', 'text', {str})), '. Learn more')
if cause and 'adult content' in cause:
self.raise_login_required(cause)
raise ExtractorError(f'Twitter API says: {cause or "Unknown error"}', expected=True)
status = result.get('legacy', {})
@@ -1136,19 +1088,22 @@ def _build_graphql_query(self, media_id):
def _real_extract(self, url):
twid, selected_index = self._match_valid_url(url).group('id', 'index')
if self._configuration_arg('legacy_api') and not self.is_logged_in:
status = traverse_obj(self._call_api(f'statuses/show/{twid}.json', twid, {
'cards_platform': 'Web-12',
'include_cards': 1,
'include_reply_count': 1,
'include_user_entities': 0,
'tweet_mode': 'extended',
}), 'retweeted_status', None)
if not self.is_logged_in:
try:
status = self._download_json(
'https://cdn.syndication.twimg.com/tweet-result', twid, 'Downloading syndication JSON',
headers={'User-Agent': 'Googlebot'}, query={'id': twid})
self.to_screen(f'Some metadata is missing without authentication. {self._login_hint()}')
except ExtractorError as e:
if isinstance(e.cause, urllib.error.HTTPError) and e.cause.code == 404:
self.raise_login_required('Requested tweet may only be available when logged in')
raise
else:
result = self._call_graphql_api('zZXycP0V6H7m-2r0mOnFcA/TweetDetail', twid)
status = self._graphql_to_legacy(result, twid)
status = self._graphql_to_legacy(
self._call_graphql_api('zZXycP0V6H7m-2r0mOnFcA/TweetDetail', twid), twid)
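# Unauthenticated path sketch: the tweet JSON comes from
#   https://cdn.syndication.twimg.com/tweet-result?id=<twid>
# fetched with a Googlebot User-Agent; engagement counts are absent there,
# which is why the tests above drop comment_count/repost_count/view_count.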
title = description = status['full_text'].replace('\n', ' ')
title = description = traverse_obj(
status, (('full_text', 'text'), {lambda x: x.replace('\n', ' ')}), get_all=False) or ''
# strip 'https -_t.co_BJYgOjSeGA' junk from filenames
title = re.sub(r'\s+(https?://[^ ]+)', '', title)
user = status.get('user') or {}
@@ -1174,12 +1129,16 @@ def _real_extract(self, url):
def extract_from_video_info(media):
media_id = traverse_obj(media, 'id_str', 'id', expected_type=str_or_none)
if not media_id:
# workaround for non-authenticated responses
media_id = traverse_obj(media, (
'video_info', 'variants', ..., 'url',
{lambda x: re.search(r'_video/(\d+)/', x)[1]}), get_all=False)
self.write_debug(f'Extracting from video info: {media_id}')
video_info = media.get('video_info') or {}
formats = []
subtitles = {}
for variant in video_info.get('variants', []):
for variant in traverse_obj(media, ('video_info', 'variants', ...)):
fmts, subs = self._extract_variant_formats(variant, twid)
subtitles = self._merge_subtitles(subtitles, subs)
formats.extend(fmts)
@@ -1199,12 +1158,12 @@ def add_thumbnail(name, size):
add_thumbnail('orig', media.get('original_info') or {})
return {
'id': media_id,
'id': media_id or twid,
'formats': formats,
'subtitles': subtitles,
'thumbnails': thumbnails,
'view_count': traverse_obj(media, ('mediaStats', 'viewCount', {int_or_none})),
'duration': float_or_none(video_info.get('duration_millis'), 1000),
'duration': float_or_none(traverse_obj(media, ('video_info', 'duration_millis')), 1000),
# The codec of http formats are unknown
'_format_sort_fields': ('res', 'br', 'size', 'proto'),
}
@@ -1284,12 +1243,15 @@ def get_binding_value(k):
}
videos = traverse_obj(status, (
(None, 'quoted_status'), 'extended_entities', 'media', lambda _, m: m['type'] != 'photo', {dict}))
('mediaDetails', ((None, 'quoted_status'), 'extended_entities', 'media')),
lambda _, m: m['type'] != 'photo', {dict}))
if self._yes_playlist(twid, selected_index, video_label='URL-specified video number'):
selected_entries = (*map(extract_from_video_info, videos), *extract_from_card_info(status.get('card')))
else:
desired_obj = traverse_obj(status, ('extended_entities', 'media', int(selected_index) - 1, {dict}))
desired_obj = traverse_obj(status, (
('mediaDetails', ((None, 'quoted_status'), 'extended_entities', 'media')),
int(selected_index) - 1, {dict}), get_all=False)
if not desired_obj:
raise ExtractorError(f'Video #{selected_index} is unavailable', expected=True)
elif desired_obj.get('type') != 'video':
@@ -1436,7 +1398,10 @@ class TwitterSpacesIE(TwitterBaseIE):
'uploader': r're:Lucio Di Gaetano.*?',
'uploader_id': 'luciodigaetano',
'live_status': 'was_live',
'timestamp': 1659877956397,
'timestamp': 1659877956,
'upload_date': '20220807',
'release_timestamp': 1659904215,
'release_date': '20220807',
},
'params': {'skip_download': 'm3u8'},
}]
@@ -1482,26 +1447,31 @@ def _real_extract(self, url):
metadata = space_data['metadata']
live_status = try_call(lambda: self.SPACE_STATUS[metadata['state'].lower()])
is_live = live_status == 'is_live'
formats = []
if live_status == 'is_upcoming':
self.raise_no_formats('Twitter Space not started yet', expected=True)
elif live_status == 'post_live':
self.raise_no_formats('Twitter Space ended but not downloadable yet', expected=True)
else:
source = self._call_api(
f'live_video_stream/status/{metadata["media_key"]}', metadata['media_key'])['source']
# XXX: Native downloader does not work
elif not is_live and not metadata.get('is_space_available_for_replay'):
self.raise_no_formats('Twitter Space ended and replay is disabled', expected=True)
elif metadata.get('media_key'):
source = traverse_obj(
self._call_api(f'live_video_stream/status/{metadata["media_key"]}', metadata['media_key']),
('source', ('noRedirectPlaybackUrl', 'location'), {url_or_none}), get_all=False)
formats = self._extract_m3u8_formats(
traverse_obj(source, 'noRedirectPlaybackUrl', 'location'),
metadata['media_key'], 'm4a', 'm3u8', live=live_status == 'is_live',
headers={'Referer': 'https://twitter.com/'})
source, metadata['media_key'], 'm4a', live=is_live, fatal=False,
headers={'Referer': 'https://twitter.com/'}) if source else []
for fmt in formats:
fmt.update({'vcodec': 'none', 'acodec': 'aac'})
if not is_live:
fmt['container'] = 'm4a_dash'
participants = ', '.join(traverse_obj(
space_data, ('participants', 'speakers', ..., 'display_name'))) or 'nobody yet'
if not formats and live_status == 'post_live':
self.raise_no_formats('Twitter Space ended but not downloadable yet', expected=True)
return {
'id': space_id,
'title': metadata.get('title'),
@@ -1513,7 +1483,7 @@ def _real_extract(self, url):
'live_status': live_status,
'release_timestamp': try_call(
lambda: int_or_none(metadata['scheduled_start'], scale=1000)),
'timestamp': metadata.get('created_at'),
'timestamp': int_or_none(metadata.get('created_at'), scale=1000),
'formats': formats,
}
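# e.g. metadata['created_at'] == 1659877956397 (milliseconds) now yields
# timestamp == 1659877956 (seconds), matching the corrected test expectation above.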


@@ -70,6 +70,7 @@ def _real_extract(self, url):
r'src\s*:\s*(["\'])(?P<url>(?:https?://)?(?:(?!\1).)+)\1',
webpage) or []]
for source in sources:
source = urljoin(url, source)
height = int(self._search_regex(r'(\d+).mp4', source, 'height', default=360))
if self._request_webpage(HEADRequest(source), video_id, f'Checking {height}p url', errnote=False):
formats.append({


@@ -11,11 +11,13 @@
from .youtube import YoutubeIE
from ..utils import (
ExtractorError,
UserNotLive,
clean_html,
get_element_by_class,
get_element_html_by_id,
int_or_none,
join_nonempty,
parse_resolution,
str_or_none,
str_to_int,
try_call,
@@ -25,6 +27,7 @@
url_or_none,
urlencode_postdata,
urljoin,
traverse_obj,
)
@@ -701,3 +704,139 @@ def _real_extract(self, url):
return self.playlist_result(
entries, post_id, join_nonempty(uploader, f'Wall post {post_id}', delim=' - '),
clean_html(get_element_by_class('wall_post_text', webpage)))
class VKPlayBaseIE(InfoExtractor):
_RESOLUTIONS = {
'tiny': '256x144',
'lowest': '426x240',
'low': '640x360',
'medium': '852x480',
'high': '1280x720',
'full_hd': '1920x1080',
'quad_hd': '2560x1440',
}
def _extract_from_initial_state(self, url, video_id, path):
webpage = self._download_webpage(url, video_id)
video_info = traverse_obj(self._search_json(
r'<script[^>]+\bid="initial-state"[^>]*>', webpage, 'initial state', video_id),
path, expected_type=dict)
if not video_info:
raise ExtractorError('Unable to extract video info from html inline initial state')
return video_info
def _extract_formats(self, stream_info, video_id):
formats = []
for stream in traverse_obj(stream_info, (
'data', 0, 'playerUrls', lambda _, v: url_or_none(v['url']) and v['type'])):
url = stream['url']
format_id = str_or_none(stream['type'])
if format_id in ('hls', 'live_hls', 'live_playback_hls') or '.m3u8' in url:
formats.extend(self._extract_m3u8_formats(url, video_id, m3u8_id=format_id, fatal=False))
elif format_id == 'dash':
formats.extend(self._extract_mpd_formats(url, video_id, mpd_id=format_id, fatal=False))
elif format_id in ('live_dash', 'live_playback_dash'):
self.write_debug(f'Not extracting unsupported format "{format_id}"')
else:
formats.append({
'url': url,
'ext': 'mp4',
'format_id': format_id,
**parse_resolution(self._RESOLUTIONS.get(format_id)),
})
return formats
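# e.g. an entry {'url': 'https://...mp4', 'type': 'full_hd'} becomes a plain mp4
# format with width=1920 and height=1080 via parse_resolution(_RESOLUTIONS['full_hd']).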
def _extract_common_meta(self, stream_info):
return traverse_obj(stream_info, {
'id': ('id', {str_or_none}),
'title': ('title', {str}),
'release_timestamp': ('startTime', {int_or_none}),
'thumbnail': ('previewUrl', {url_or_none}),
'view_count': ('count', 'views', {int_or_none}),
'like_count': ('count', 'likes', {int_or_none}),
'categories': ('category', 'title', {str}, {lambda x: [x] if x else None}),
'uploader': (('user', ('blog', 'owner')), 'nick', {str}),
'uploader_id': (('user', ('blog', 'owner')), 'id', {str_or_none}),
'duration': ('duration', {int_or_none}),
'is_live': ('isOnline', {bool}),
'concurrent_view_count': ('count', 'viewers', {int_or_none}),
}, get_all=False)
class VKPlayIE(VKPlayBaseIE):
_VALID_URL = r'https?://vkplay\.live/(?P<username>[^/]+)/record/(?P<id>[a-f0-9\-]+)'
_TESTS = [{
'url': 'https://vkplay.live/zitsmann/record/f5e6e3b5-dc52-4d14-965d-0680dd2882da',
'info_dict': {
'id': 'f5e6e3b5-dc52-4d14-965d-0680dd2882da',
'ext': 'mp4',
'title': 'Atomic Heart (пробуем!) спасибо подписчику EKZO!',
'uploader': 'ZitsmanN',
'uploader_id': '13159830',
'release_timestamp': 1683461378,
'release_date': '20230507',
'thumbnail': r're:https://images.vkplay.live/public_video_stream/record/f5e6e3b5-dc52-4d14-965d-0680dd2882da/preview\?change_time=\d+',
'duration': 10608,
'view_count': int,
'like_count': int,
'categories': ['Atomic Heart'],
},
'params': {'skip_download': 'm3u8'},
}]
def _real_extract(self, url):
username, video_id = self._match_valid_url(url).groups()
record_info = traverse_obj(self._download_json(
f'https://api.vkplay.live/v1/blog/{username}/public_video_stream/record/{video_id}', video_id, fatal=False),
('data', 'record', {dict}))
if not record_info:
record_info = self._extract_from_initial_state(url, video_id, ('record', 'currentRecord', 'data'))
return {
**self._extract_common_meta(record_info),
'id': video_id,
'formats': self._extract_formats(record_info, video_id),
}
class VKPlayLiveIE(VKPlayBaseIE):
_VALID_URL = r'https?://vkplay\.live/(?P<id>[^/]+)/?(?:[#?]|$)'
_TESTS = [{
'url': 'https://vkplay.live/bayda',
'info_dict': {
'id': 'f02c321e-427b-408d-b12f-ae34e53e0ea2',
'ext': 'mp4',
'title': r're:эскапизм крута .*',
'uploader': 'Bayda',
'uploader_id': '12279401',
'release_timestamp': 1687209962,
'release_date': '20230619',
'thumbnail': r're:https://images.vkplay.live/public_video_stream/12279401/preview\?change_time=\d+',
'view_count': int,
'concurrent_view_count': int,
'like_count': int,
'categories': ['EVE Online'],
'live_status': 'is_live',
},
'skip': 'livestream',
'params': {'skip_download': True},
}]
def _real_extract(self, url):
username = self._match_id(url)
stream_info = self._download_json(
f'https://api.vkplay.live/v1/blog/{username}/public_video_stream', username, fatal=False)
if not stream_info:
stream_info = self._extract_from_initial_state(url, username, ('stream', 'stream', 'data', 'stream'))
formats = self._extract_formats(stream_info, username)
if not formats and not traverse_obj(stream_info, ('isOnline', {bool})):
raise UserNotLive(video_id=username)
return {
**self._extract_common_meta(stream_info),
'formats': formats,
}

View File

@@ -22,7 +22,7 @@ def _extract_packed(self, webpage):
packed = self._search_regex(
r'(eval\(function.+)', webpage, 'packed code')
unpacked = decode_packed_codes(packed)
digits = self._search_regex(r'\[((?:\d+,?)+)\]', unpacked, 'digits')
digits = self._search_regex(r'\[([\d,]+)\]', unpacked, 'digits')
digits = [int(digit) for digit in digits.split(',')]
key_digit = self._search_regex(
r'fromCharCode\(.+?(\d+)\)}', unpacked, 'key digit')
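The simplified digits regex is more than cosmetic: (?:\d+,?)+ nests one unbounded quantifier inside another, a pattern that can backtrack catastrophically on near-matches, whereas the character class [\d,]+ matches the same comma-separated digit list in linear time. Both capture identically (sample data invented for illustration):

    import re

    unpacked = 'var _0x = [104,116,116,112,115];'
    old_style = re.search(r'\[((?:\d+,?)+)\]', unpacked).group(1)
    new_style = re.search(r'\[([\d,]+)\]', unpacked).group(1)
    assert old_style == new_style == '104,116,116,112,115'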

View File

@@ -14,12 +14,14 @@
try_call,
url_or_none,
urlencode_postdata,
variadic,
)
class WrestleUniverseBaseIE(InfoExtractor):
_NETRC_MACHINE = 'wrestleuniverse'
_VALID_URL_TMPL = r'https?://(?:www\.)?wrestle-universe\.com/(?:(?P<lang>\w{2})/)?%s/(?P<id>\w+)'
_API_HOST = 'api.wrestle-universe.com'
_API_PATH = None
_REAL_TOKEN = None
_TOKEN_EXPIRY = None
@@ -67,24 +69,28 @@ def _perform_login(self, username, password):
'returnSecureToken': True,
'email': username,
'password': password,
}, separators=(',', ':')).encode())
}, separators=(',', ':')).encode(), expected_status=400)
token = traverse_obj(login, ('idToken', {str}))
if not token:
raise ExtractorError(
f'Unable to log in: {traverse_obj(login, ("error", "message"))}', expected=True)
self._REFRESH_TOKEN = traverse_obj(login, ('refreshToken', {str}))
if not self._REFRESH_TOKEN:
self.report_warning('No refresh token was granted')
self._TOKEN = traverse_obj(login, ('idToken', {str}))
self._TOKEN = token
def _real_initialize(self):
if WrestleUniverseBaseIE._DEVICE_ID:
if self._DEVICE_ID:
return
WrestleUniverseBaseIE._DEVICE_ID = self._configuration_arg('device_id', [None], ie_key='WrestleUniverse')[0]
if not WrestleUniverseBaseIE._DEVICE_ID:
WrestleUniverseBaseIE._DEVICE_ID = self.cache.load(self._NETRC_MACHINE, 'device_id')
if WrestleUniverseBaseIE._DEVICE_ID:
self._DEVICE_ID = self._configuration_arg('device_id', [None], ie_key=self._NETRC_MACHINE)[0]
if not self._DEVICE_ID:
self._DEVICE_ID = self.cache.load(self._NETRC_MACHINE, 'device_id')
if self._DEVICE_ID:
return
WrestleUniverseBaseIE._DEVICE_ID = str(uuid.uuid4())
self._DEVICE_ID = str(uuid.uuid4())
self.cache.store(self._NETRC_MACHINE, 'device_id', WrestleUniverseBaseIE._DEVICE_ID)
self.cache.store(self._NETRC_MACHINE, 'device_id', self._DEVICE_ID)
def _refresh_token(self):
refresh = self._download_json(
@@ -108,10 +114,10 @@ def _call_api(self, video_id, param='', msg='API', auth=True, data=None, query={
if data:
headers['Content-Type'] = 'application/json;charset=utf-8'
data = json.dumps(data, separators=(',', ':')).encode()
if auth:
if auth and self._TOKEN:
headers['Authorization'] = f'Bearer {self._TOKEN}'
return self._download_json(
f'https://api.wrestle-universe.com/v1/{self._API_PATH}/{video_id}{param}', video_id,
f'https://{self._API_HOST}/v1/{self._API_PATH}/{video_id}{param}', video_id,
note=f'Downloading {msg} JSON', errnote=f'Failed to download {msg} JSON',
data=data, headers=headers, query=query, fatal=fatal)
@@ -137,12 +143,13 @@ def decrypt(data):
}, query=query, fatal=fatal)
return api_json, decrypt
def _download_metadata(self, url, video_id, lang, props_key):
def _download_metadata(self, url, video_id, lang, props_keys):
metadata = self._call_api(video_id, msg='metadata', query={'al': lang or 'ja'}, auth=False, fatal=False)
if not metadata:
webpage = self._download_webpage(url, video_id)
nextjs_data = self._search_nextjs_data(webpage, video_id)
metadata = traverse_obj(nextjs_data, ('props', 'pageProps', props_key, {dict})) or {}
metadata = traverse_obj(nextjs_data, (
'props', 'pageProps', *variadic(props_keys, (str, bytes, dict, set)), {dict})) or {}
return metadata
def _get_formats(self, data, path, video_id=None):
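The props_key to props_keys rename works because variadic wraps a lone string into a 1-tuple while passing real sequences through, so callers can hand over either a single Next.js props key or a nested path. Roughly (the key names are made up):

    from yt_dlp.utils import variadic

    # A bare string becomes a one-element tuple for splicing into the path ...
    assert variadic('fallbackData', (str, bytes, dict, set)) == ('fallbackData',)
    # ... while an existing list/tuple path is passed through unchanged
    assert variadic(['props', 'fallbackData'], (str, bytes, dict, set)) == ['props', 'fallbackData']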

View File

@@ -2499,29 +2499,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'uploader_id': '@abaointokyo',
},
'params': {'skip_download': True}
}, {
# Story. Requires specific player params to work.
'url': 'https://www.youtube.com/watch?v=vv8qTUWmulI',
'info_dict': {
'id': 'vv8qTUWmulI',
'ext': 'mp4',
'availability': 'unlisted',
'view_count': int,
'channel_id': 'UCzIZ8HrzDgc-pNQDUG6avBA',
'upload_date': '20220526',
'categories': ['Education'],
'title': 'Story',
'channel': 'IT\'S HISTORY',
'description': '',
'duration': 12,
'playable_in_embed': True,
'age_limit': 0,
'live_status': 'not_live',
'tags': [],
'thumbnail': 'https://i.ytimg.com/vi_webp/vv8qTUWmulI/maxresdefault.webp',
'channel_url': 'https://www.youtube.com/channel/UCzIZ8HrzDgc-pNQDUG6avBA',
},
'skip': 'stories get removed after some period of time',
}, {
'url': 'https://www.youtube.com/watch?v=tjjjtzRLHvA',
'info_dict': {
@@ -3140,7 +3117,7 @@ def _extract_n_function_name(self, jscode):
return funcname
return json.loads(js_to_json(self._search_regex(
rf'var {re.escape(funcname)}\s*=\s*(\[.+?\])[,;]', jscode,
rf'var {re.escape(funcname)}\s*=\s*(\[.+?\])\s*[,;]', jscode,
f'Initial JS player n function list ({funcname}.{idx})')))[int(idx)]
def _extract_n_function_code(self, video_id, player_url):
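The added \s* makes the capture tolerant of whitespace between the array literal and the terminating comma or semicolon, presumably to cope with player builds that emit such spacing. A quick check of the loosened pattern (sample JS invented):

    import re

    jscode = 'var Xyz = ["nsig_a", "nsig_b"] ;'
    m = re.search(r'var Xyz\s*=\s*(\[.+?\])\s*[,;]', jscode)
    assert m.group(1) == '["nsig_a", "nsig_b"]'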
@@ -3356,7 +3333,7 @@ def _extract_comment(self, comment_renderer, parent=None):
info['author_is_uploader'] = author_is_uploader
comment_abr = traverse_obj(
comment_renderer, ('actionsButtons', 'commentActionButtonsRenderer'), expected_type=dict)
comment_renderer, ('actionButtons', 'commentActionButtonsRenderer'), expected_type=dict)
if comment_abr is not None:
info['is_favorited'] = 'creatorHeart' in comment_abr
@@ -3426,7 +3403,9 @@ def extract_thread(contents):
# Pinned comments may appear a second time in newest first sort
# See: https://github.com/yt-dlp/yt-dlp/issues/6712
continue
self.report_warning('Detected YouTube comments looping. Stopping comment extraction as we probably cannot get any more.')
self.report_warning(
'Detected YouTube comments looping. Stopping comment extraction '
f'{"for this thread " if parent else ""}as we probably cannot get any more.')
yield
else:
tracker['seen_comment_ids'].add(comment['id'])
@@ -3517,12 +3496,18 @@ def extract_thread(contents):
# Ignore incomplete data error for replies if retries didn't work.
# This is to allow any other parent comments and comment threads to be downloaded.
# See: https://github.com/yt-dlp/yt-dlp/issues/4669
if 'incomplete data' in str(e).lower() and parent and self.get_param('ignoreerrors') is True:
self.report_warning(
'Received incomplete data for a comment reply thread and retrying did not help. '
'Ignoring to let other comments be downloaded.')
else:
raise
if 'incomplete data' in str(e).lower() and parent:
if self.get_param('ignoreerrors') in (True, 'only_download'):
self.report_warning(
'Received incomplete data for a comment reply thread and retrying did not help. '
'Ignoring to let other comments be downloaded. Pass --no-ignore-errors to not ignore.')
return
else:
raise ExtractorError(
'Incomplete data received for comment reply thread. '
'Pass --ignore-errors to ignore and allow the rest of the comments to download.',
expected=True)
raise
is_forced_continuation = False
continuation = None
for continuation_items in traverse_obj(response, continuation_items_path, expected_type=list, default=[]):
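The rewritten branch ties incomplete comment data into the general error policy: with --ignore-errors (or the CLI default 'only_download') it degrades to a warning and the remaining comments are kept, while --no-ignore-errors raises an expected ExtractorError. A hedged sketch of the embedding-API equivalent (the URL is a placeholder):

    import yt_dlp

    opts = {
        # 'only_download' mirrors the CLI default; set False to make
        # incomplete comment-thread data a fatal, expected error
        'ignoreerrors': 'only_download',
        'getcomments': True,
    }
    with yt_dlp.YoutubeDL(opts) as ydl:
        info = ydl.extract_info('https://www.youtube.com/watch?v=PLACEHOLDER', download=False)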
@@ -3612,7 +3597,7 @@ def _extract_player_response(self, client, video_id, master_ytcfg, player_ytcfg,
yt_query = {
'videoId': video_id,
}
if smuggled_data.get('is_story') or _split_innertube_client(client)[0] == 'android':
if _split_innertube_client(client)[0] == 'android':
yt_query['params'] = self._PLAYER_PARAMS
yt_query.update(self._generate_player_context(sts))
@@ -3737,7 +3722,7 @@ def append_client(*client_names):
def _needs_live_processing(self, live_status, duration):
if (live_status == 'is_live' and self.get_param('live_from_start')
or live_status == 'post_live' and (duration or 0) > 4 * 3600):
or live_status == 'post_live' and (duration or 0) > 2 * 3600):
return live_status
def _extract_formats_and_subtitles(self, streaming_data, video_id, player_url, live_status, duration):
@@ -3752,7 +3737,12 @@ def _extract_formats_and_subtitles(self, streaming_data, video_id, player_url, l
'small', 'medium', 'large', 'hd720', 'hd1080', 'hd1440', 'hd2160', 'hd2880', 'highres'
])
streaming_formats = traverse_obj(streaming_data, (..., ('formats', 'adaptiveFormats'), ...))
all_formats = self._configuration_arg('include_duplicate_formats')
format_types = self._configuration_arg('formats')
all_formats = 'duplicate' in format_types
if self._configuration_arg('include_duplicate_formats'):
all_formats = True
self._downloader.deprecated_feature('[youtube] include_duplicate_formats extractor argument is deprecated. '
'Use formats=duplicate extractor argument instead')
def build_fragments(f):
return LazyList({
@@ -3892,18 +3882,23 @@ def build_fragments(f):
if single_stream and dct.get('ext'):
dct['container'] = dct['ext'] + '_dash'
if all_formats and dct['filesize']:
if (all_formats or 'dashy' in format_types) and dct['filesize']:
yield {
**dct,
'format_id': f'{dct["format_id"]}-dashy' if all_formats else dct['format_id'],
'protocol': 'http_dash_segments',
'fragments': build_fragments(dct),
}
dct['downloader_options'] = {'http_chunk_size': CHUNK_SIZE}
yield dct
if all_formats or 'dashy' not in format_types:
dct['downloader_options'] = {'http_chunk_size': CHUNK_SIZE}
yield dct
needs_live_processing = self._needs_live_processing(live_status, duration)
skip_bad_formats = not self._configuration_arg('include_incomplete_formats')
skip_bad_formats = 'incomplete' not in format_types
if self._configuration_arg('include_incomplete_formats'):
skip_bad_formats = False
self._downloader.deprecated_feature('[youtube] include_incomplete_formats extractor argument is deprecated. '
'Use formats=incomplete extractor argument instead')
skip_manifests = set(self._configuration_arg('skip'))
if (not self.get_param('youtube_include_hls_manifest', True)
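Both deprecated switches now feed the single 'formats' extractor argument read into format_types above, which also carries the new 'dashy' value used further down. A sketch of the replacement usage; on the command line this would be spelled --extractor-args "youtube:formats=duplicate,incomplete":

    import yt_dlp

    opts = {
        # Supersedes include_duplicate_formats / include_incomplete_formats
        'extractor_args': {'youtube': {'formats': ['duplicate', 'incomplete']}},
    }
    with yt_dlp.YoutubeDL(opts) as ydl:
        ydl.extract_info('https://www.youtube.com/watch?v=PLACEHOLDER', download=False)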
@@ -3915,7 +3910,7 @@ def build_fragments(f):
skip_manifests.add('dash')
if self._configuration_arg('include_live_dash'):
self._downloader.deprecated_feature('[youtube] include_live_dash extractor argument is deprecated. '
'Use include_incomplete_formats extractor argument instead')
'Use formats=incomplete extractor argument instead')
elif skip_bad_formats and live_status == 'is_live' and needs_live_processing != 'is_live':
skip_manifests.add('dash')
@@ -3932,9 +3927,12 @@ def process_manifest_format(f, proto, client_name, itag):
elif itag:
f['format_id'] = itag
if f.get('source_preference') is None:
f['source_preference'] = -1
if itag in ('616', '235'):
f['format_note'] = join_nonempty(f.get('format_note'), 'Premium', delim=' ')
f['source_preference'] = (f.get('source_preference') or -1) + 100
f['source_preference'] += 100
f['quality'] = q(itag_qualities.get(try_get(f, lambda f: f['format_id'].split('-')[0]), -1))
if f['quality'] == -1 and f.get('height'):
@@ -3943,6 +3941,10 @@ def process_manifest_format(f, proto, client_name, itag):
f['format_note'] = join_nonempty(f.get('format_note'), client_name, delim=', ')
if f.get('fps') and f['fps'] <= 1:
del f['fps']
if proto == 'hls' and f.get('has_drm'):
f['has_drm'] = 'maybe'
f['source_preference'] -= 5
return True
subtitles = {}
@@ -4015,8 +4017,6 @@ def _download_player_responses(self, url, smuggled_data, video_id, webpage_url):
webpage = None
if 'webpage' not in self._configuration_arg('player_skip'):
query = {'bpctr': '9999999999', 'has_verified': '1'}
if smuggled_data.get('is_story'): # XXX: Deprecated
query['pp'] = self._PLAYER_PARAMS
webpage = self._download_webpage(
webpage_url, video_id, fatal=False, query=query)
@@ -4044,6 +4044,10 @@ def _list_formats(self, video_id, microformats, video_details, player_responses,
else None)
streaming_data = traverse_obj(player_responses, (..., 'streamingData'))
*formats, subtitles = self._extract_formats_and_subtitles(streaming_data, video_id, player_url, live_status, duration)
if all(f.get('has_drm') for f in formats):
# If no format is definitely DRM-free, treat all formats as DRM-protected
for f in formats:
f['has_drm'] = True
return live_broadcast_details, live_status, streaming_data, formats, subtitles
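Taken together, the two DRM hunks work like this: an HLS format whose master manifest merely advertises DRM is tagged 'maybe' (truthy, and down-ranked by 5), and only when no format is known to be DRM-free does the extractor upgrade every 'maybe' to a definite True. In miniature:

    formats = [{'format_id': '96-hls', 'has_drm': 'maybe'},
               {'format_id': '248-dash', 'has_drm': 'maybe'}]
    # A single DRM-free format would make all() fail and leave 'maybe' as-is
    if all(f.get('has_drm') for f in formats):
        for f in formats:
            f['has_drm'] = True
    assert all(f['has_drm'] is True for f in formats)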
@@ -4228,7 +4232,7 @@ def is_bad_format(fmt):
for fmt in filter(is_bad_format, formats):
fmt['preference'] = (fmt.get('preference') or -1) - 10
fmt['format_note'] = join_nonempty(fmt.get('format_note'), '(Last 4 hours)', delim=' ')
fmt['format_note'] = join_nonempty(fmt.get('format_note'), '(Last 2 hours)', delim=' ')
if needs_live_processing:
self._prepare_live_from_start_formats(
@@ -4888,7 +4892,8 @@ def _extract_entries(self, parent_renderer, continuation_list):
'videoRenderer': lambda x: [self._video_entry(x)],
'playlistRenderer': lambda x: self._grid_entries({'items': [{'playlistRenderer': x}]}),
'channelRenderer': lambda x: self._grid_entries({'items': [{'channelRenderer': x}]}),
'hashtagTileRenderer': lambda x: [self._hashtag_tile_entry(x)]
'hashtagTileRenderer': lambda x: [self._hashtag_tile_entry(x)],
'richGridRenderer': lambda x: self._extract_entries(x, continuation_list),
}
for key, renderer in isr_content.items():
if key not in known_renderers:
@@ -6380,6 +6385,28 @@ class YoutubeTabIE(YoutubeTabBaseInfoExtractor):
'channel_is_verified': True,
},
'playlist_mincount': 10,
}, {
# Playlist with only shorts, shown as reel renderers
# FIXME: YouTube currently doesn't give a continuation for this,
# but may do so in the future
'url': 'https://www.youtube.com/playlist?list=UUxqPAgubo4coVn9Lx1FuKcg',
'info_dict': {
'id': 'UUxqPAgubo4coVn9Lx1FuKcg',
'channel_url': 'https://www.youtube.com/channel/UCxqPAgubo4coVn9Lx1FuKcg',
'view_count': int,
'uploader_id': '@BangyShorts',
'description': '',
'uploader_url': 'https://www.youtube.com/@BangyShorts',
'channel_id': 'UCxqPAgubo4coVn9Lx1FuKcg',
'channel': 'Bangy Shorts',
'uploader': 'Bangy Shorts',
'tags': [],
'availability': 'public',
'modified_date': '20230626',
'title': 'Uploads from Bangy Shorts',
},
'playlist_mincount': 100,
'expected_warnings': [r'[Uu]navailable videos (are|will be) hidden'],
}]
@classmethod
@@ -7104,22 +7131,6 @@ class YoutubeHistoryIE(YoutubeFeedsInfoExtractor):
}]
class YoutubeStoriesIE(InfoExtractor):
IE_DESC = 'YouTube channel stories; "ytstories:" prefix'
IE_NAME = 'youtube:stories'
_VALID_URL = r'ytstories:UC(?P<id>[A-Za-z0-9_-]{21}[AQgw])$'
_TESTS = [{
'url': 'ytstories:UCwFCb4jeqaKWnciAYM-ZVHg',
'only_matching': True,
}]
def _real_extract(self, url):
playlist_id = f'RLTD{self._match_id(url)}'
return self.url_result(
smuggle_url(f'https://www.youtube.com/playlist?list={playlist_id}&playnext=1', {'is_story': True}),
ie=YoutubeTabIE, video_id=playlist_id)
class YoutubeShortsAudioPivotIE(InfoExtractor):
IE_DESC = 'YouTube Shorts audio pivot (Shorts using audio of a given video)'
IE_NAME = 'youtube:shorts:pivot:audio'

View File

@@ -1,3 +1,4 @@
# flake8: noqa: F401, F403
import warnings
from ..compat.compat_utils import passthrough_module

View File

@@ -1556,7 +1556,12 @@ def redirect_request(self, req, fp, code, msg, headers, newurl):
new_method = req.get_method()
new_data = req.data
remove_headers = []
# Technically the Cookie header should be in unredirected_hdrs;
# however, in practice some callers set it in the normal headers anyway.
# Remove it here to prevent any leaks.
remove_headers = ['Cookie']
# A 303 must either use GET or HEAD for subsequent request
# https://datatracker.ietf.org/doc/html/rfc7231#section-6.4.4
if code == 303 and req.get_method() != 'HEAD':
@@ -1573,7 +1578,7 @@ def redirect_request(self, req, fp, code, msg, headers, newurl):
new_data = None
remove_headers.extend(['Content-Length', 'Content-Type'])
new_headers = {k: v for k, v in req.headers.items() if k.lower() not in remove_headers}
new_headers = {k: v for k, v in req.headers.items() if k.title() not in remove_headers}
return urllib.request.Request(
newurl, headers=new_headers, origin_req_host=req.origin_req_host,
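Switching the comparison from k.lower() to k.title() matters because remove_headers holds canonical-cased names such as 'Cookie' and 'Content-Length'; str.title() folds whatever casing the request used onto that canonical form, so the filter actually drops the headers it names. A standalone illustration:

    remove_headers = ['Cookie', 'Content-Length', 'Content-Type']
    req_headers = {'cookie': 'id=secret', 'CONTENT-LENGTH': '42', 'User-Agent': 'yt-dlp'}
    # 'cookie'.title() == 'Cookie' and 'CONTENT-LENGTH'.title() == 'Content-Length'
    new_headers = {k: v for k, v in req_headers.items() if k.title() not in remove_headers}
    assert new_headers == {'User-Agent': 'yt-dlp'}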
@@ -3758,8 +3763,6 @@ def __init__(self, chapters, ranges, from_info=False):
self.chapters, self.ranges, self.from_info = chapters, ranges, from_info
def __call__(self, info_dict, ydl):
if not any((self.ranges, self.chapters, self.from_info)):
yield {}
warning = ('There are no chapters matching the regex' if info_dict.get('chapters')
else 'Cannot match chapters since chapter information is unavailable')
@@ -3779,9 +3782,11 @@ def __call__(self, info_dict, ydl):
if self.from_info and (info_dict.get('start_time') or info_dict.get('end_time')):
yield {
'start_time': info_dict.get('start_time'),
'end_time': info_dict.get('end_time'),
'start_time': info_dict.get('start_time') or 0,
'end_time': info_dict.get('end_time') or float('inf'),
}
elif not self.ranges and not self.chapters:
yield {}
@staticmethod
def _handle_negative_timestamp(time, info):
@@ -5113,7 +5118,7 @@ def format_field(obj, field=None, template='%s', ignore=NO_DEFAULT, default='',
def clean_podcast_url(url):
return re.sub(r'''(?x)
url = re.sub(r'''(?x)
(?:
(?:
chtbl\.com/track|
@@ -5127,6 +5132,7 @@ def clean_podcast_url(url):
st\.fm # https://podsights.com/docs/
)/e
)/''', '', url)
return re.sub(r'^\w+://(\w+://)', r'\1', url)
_HEX_TABLE = '0123456789abcdef'
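The appended re.sub handles trackers that wrap a fully-qualified URL: stripping, say, a pdst.fm/e/ prefix from such a link leaves a doubled scheme, which the new final pattern collapses back to one. Sketch (target URL hypothetical):

    from yt_dlp.utils import clean_podcast_url

    wrapped = 'https://pdst.fm/e/https://example.com/episode.mp3'
    # The tracker sub leaves 'https://https://example.com/...'; the new
    # final sub reduces the doubled scheme to a single one
    assert clean_podcast_url(wrapped) == 'https://example.com/episode.mp3'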

View File

@@ -1,8 +1,8 @@
# Autogenerated by devscripts/update-version.py
__version__ = '2023.06.22'
__version__ = '2023.07.06'
RELEASE_GIT_HEAD = '812cdfa06c33a40e73a8e04b3e6f42c084666a43'
RELEASE_GIT_HEAD = 'b532a3481046e1eabb6232ee8196fb696c356ff6'
VARIANT = None