Unable to check if asset exists when using "use this filename" option #1489

Closed
jamesmacwhite opened this issue Aug 2, 2024 · 5 comments
@jamesmacwhite
Contributor

Description

I have a feed that imports assets into an entry. For reference, this is the feed configuration for the asset field.

image

Options configured:

Strategy:

✔ Create new elements
✔ Update existing elements

Asset field options:

✔ Create asset from URL
If exists: replace existing asset

Use filename for assets created from URL:
Specific field with filename value: A filename value is provided in the Feed Me mapping.

There's a bit of processing behind the scenes for this asset import, because the images come from a Microsoft SharePoint site (Microsoft Graph API). The image URLs provided directly are not publicly accessible, because it is an intranet site, but there is a way to generate a public image URL with a token that lasts for 1 hour. The feed is given an image URL that is generated when the feed is triggered, so it is public and valid, but only for 1 hour. The idea is that Feed Me imports the asset from this temporary URL when the feed starts and then stores it as an asset going forward.

The first issue, which is unrelated to Feed Me, is that the URL generated from the Microsoft Graph API looks like this horrible mess:

https://ukwest1-mediap.svc.ms/transform/thumbnail?provider=spo&inputFormat=png&cs=NWE2MWI5M2YtNzBkMy00OGRhLWFiNmUtNGFkOWZkM2UzNjgwfFNQTw&docid=https%3A%2F%2Fexample.sharepoint.com%2F_api%2Fv2.0%2Fdrives%2Fb!wt9_FRAbtEKE0np4yV82bRtcCkE_MnREpOFqIJDiJ3eHCqPq7QJfQqIyMBzSSIqY%2Fitems%2F01LVK66DAN6DS5N7XSFVA2FJWWB335ZNMV%3Ftempauth%3Dv1.eyJzaXRlaWQiOiIxNTdmZGZjMi0xYjEwLTQyYjQtODRkMi03YTc4Yzk1ZjM2NmQiLCJhcHBfZGlzcGxheW5hbWUiOiJOb3R0aW5naGFtIENvbGxlZ2UgQVBJIFNoYXJlUG9pbnQiLCJhcHBpZCI6IjVhNjFiOTNmLTcwZDMtNDhkYS1hYjZlLTRhZDlmZDNlMzY4MCIsImF1ZCI6IjAwMDAwMDAzLTAwMDAtMGZmMS1jZTAwLTAwMDAwMDAwMDAwMC9ub3R0aW5naGFtY29sbGVnZWFjdWsuc2hhcmVwb2ludC5jb21AMzYwODQ3YTgtZmRmMy00ZGU2LWI3YzgtNDU3MmM5MWM1YjgxIiwiZXhwIjoiMTcyMjYxMDgwMCJ9.CgoKBHNuaWQSAjg0EgYIwrQ6EAEaDTIwLjE5MC4xNTkuMjkqLEt2cjc1dmNmQzhCVy82NmR6TWFXaGk0ZndVT2g0aDlIY3N6bHJGUElac0U9MKkBOAFKEGhhc2hlZHByb29mdG9rZW5SHVsia21zaSIsImR2Y19jbXAiLCJkdmNfZG1qZCJdcikwaC5mfG1lbWJlcnNoaXB8MTAwMzdmZmVhNWYzMDJiOUBsaXZlLmNvbXoBMoIBEgmoRwg28_3mTRG3yEVyyRxbgZIBBUphbWVzmgEFV2hpdGWiASNqYW1lcy53aGl0ZUBub3R0aW5naGFtY29sbGVnZS5hYy51a6oBEDEwMDM3RkZFQTVGMzAyQjmyAR5hbGxzaXRlcy5yZWFkIGFsbHByb2ZpbGVzLnJlYWQ.cDOSv7_A78buJi0A1Oq0GgBBX5-rU9Nul8UjbbZ-SdI%26version%3DPublished&width=800&height=800&cb=63844722231

One of the first major problems is that there is no file extension to indicate this is an image, which is a problem when importing to image fields with specific file type restrictions. Likewise, a URL like this offers nothing remotely usable as a filename.
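
To illustrate why nothing can be inferred from a URL of this shape, here is a minimal Python sketch. It is not Feed Me's code (Feed Me is PHP), and the URL is a shortened, hypothetical stand-in for the one above. It shows that the last path segment carries no extension, and that the only format hint sits in the `inputFormat` query parameter.

```python
from os.path import splitext
from urllib.parse import parse_qs, urlparse

# Shortened, hypothetical stand-in for the SharePoint thumbnail URL above.
url = (
    "https://ukwest1-mediap.svc.ms/transform/thumbnail"
    "?provider=spo&inputFormat=png&width=800&height=800"
)

parsed = urlparse(url)

# The last path segment is "thumbnail", so there is no extension to infer.
basename = parsed.path.rsplit("/", 1)[-1]
_, extension = splitext(basename)

# The only hint about the file type is a query parameter.
input_format = parse_qs(parsed.query).get("inputFormat", [""])[0]

print(repr(basename), repr(extension), repr(input_format))
# prints: 'thumbnail' '' 'png'
```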

However, using the filename option, we are providing something that works. The origin API data provides the original filename of the generated image, even though this cannot be used as an image URL path, so a value such as "25938-Army-Careers.png" is provided to the "Use this filename for assets created from URL" option, which resolves this problem.

This allows the asset import to work and not get blocked by the lack of a file extension, but every time the feed runs the asset import doesn't appear to work entirely and logs this error (newsImage is the field handle of the asset field):

newsImage - Asset error: https://ukwest1-mediap.svc.ms/transform/thumbnail?provider=spo&inputFormat=png&cs=NWE2MWI5M2YtNzBkMy00OGRhLWFiNmUtNGFkOWZkM2UzNjgwfFNQTw&docid=https%3A%2F%2Fexample.sharepoint.com%2F_api%2Fv2.0%2Fdrives%2Fb!wt9_FRAbtEKE0np4yV82bRtcCkE_MnREpOFqIJDiJ3eHCqPq7QJfQqIyMBzSSIqY%2Fitems%2F01LVK66DAN6DS5N7XSFVA2FJWWB335ZNMV%3Ftempauth%3Dv1.eyJzaXRlaWQiOiIxNTdmZGZjMi0xYjEwLTQyYjQtODRkMi03YTc4Yzk1ZjM2NmQiLCJhcHBfZGlzcGxheW5hbWUiOiJOb3R0aW5naGFtIENvbGxlZ2UgQVBJIFNoYXJlUG9pbnQiLCJhcHBpZCI6IjVhNjFiOTNmLTcwZDMtNDhkYS1hYjZlLTRhZDlmZDNlMzY4MCIsImF1ZCI6IjAwMDAwMDAzLTAwMDAtMGZmMS1jZTAwLTAwMDAwMDAwMDAwMC9ub3R0aW5naGFtY29sbGVnZWFjdWsuc2hhcmVwb2ludC5jb21AMzYwODQ3YTgtZmRmMy00ZGU2LWI3YzgtNDU3MmM5MWM1YjgxIiwiZXhwIjoiMTcyMjYxMDgwMCJ9.CgoKBHNuaWQSAjg0EgYIwrQ6EAEaDTIwLjE5MC4xNTkuMjkqLEt2cjc1dmNmQzhCVy82NmR6TWFXaGk0ZndVT2g0aDlIY3N6bHJGUElac0U9MKkBOAFKEGhhc2hlZHByb29mdG9rZW5SHVsia21zaSIsImR2Y19jbXAiLCJkdmNfZG1qZCJdcikwaC5mfG1lbWJlcnNoaXB8MTAwMzdmZmVhNWYzMDJiOUBsaXZlLmNvbXoBMoIBEgmoRwg28_3mTRG3yEVyyRxbgZIBBUphbWVzmgEFV2hpdGWiASNqYW1lcy53aGl0ZUBub3R0aW5naGFtY29sbGVnZS5hYy51a6oBEDEwMDM3RkZFQTVGMzAyQjmyAR5hbGxzaXRlcy5yZWFkIGFsbHByb2ZpbGVzLnJlYWQ.cDOSv7_A78buJi0A1Oq0GgBBX5-rU9Nul8UjbbZ-SdI%26version%3DPublished&width=800&height=800&cb=63844722231 - Unable to check if news/84365-rapincel-WORLD-BOOK-DAY-20_2024-08-02-113032_iapr.png exists.

It appears that, because the "replace existing asset" conflict option is in use, Feed Me couldn't confirm whether the image existed. However, the image does exist: it was imported the very first time this item was imported. Any subsequent update triggers the same error, for any item.

I originally had "Use existing asset" set, but on every feed run it duplicated the same image with a _xxxx suffix and generated 1000s of images. With "replace existing asset" it throws an error but doesn't duplicate images each time.

I'm not sure if there's a bug here, or if the exotic asset importing setup goes outside the current scope of asset importing/handling.

Steps to reproduce

  1. Set up an asset field on an entry with an image type restriction.
  2. Set up a feed that imports an asset from an unfriendly URL lacking a file extension and filename.
  3. Create asset from URL, if exists "replace existing asset".
  4. Configure the "Use this filename" option with a clean value that includes the file extension.
  5. After the first import, Feed Me reports that it is unable to check if the image exists.

Additional info

  • Craft version: 5.2.9
  • PHP version: 8.2
  • Database driver & version: MySQL
  • Plugins & versions: Feed Me 6.2.1
@i-just
Contributor

i-just commented Sep 4, 2024

Hi, thanks for getting in touch and for all the info!

We already use the value mapped under “Use this filename for assets created from URL” to check if the asset exists: https://github.com/craftcms/feed-me/blob/6.2.1/src/fields/Assets.php#L183-L185. I have tested this, and I can see it working as expected.

In your case, I suspect the problem might be with the concatenation with getRemoteUrlExtension().
Technically, the value under “Use this filename for assets created from URL” should be just the filename without extension. The extension is supposed to be grabbed from the remote file.

That being said, it looks like, in the case of those very special URLs, we cannot determine the extension of the remote file, so providing a value under “Use this filename for assets created from URL” with an extension works as you’d expect it to. That takes care of the first import. On the second import, however, assuming the filename from your data is my-image.jpg, the filename used to check for existing assets could be something like my-image.jpg. (with a trailing dot). The existing file is then not found, and Feed Me attempts to create a new one.

That explains why the match is not being found, and I have raised a PR to adjust this check to account for an empty extension.
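
The mismatch can be sketched as follows. This is a hypothetical Python illustration, not Feed Me's actual PHP code and not necessarily what the PR does: `lookup_filename_naive` and `lookup_filename_guarded` are made-up names, and the empty string stands in for a remote extension that could not be determined.

```python
def lookup_filename_naive(filename: str, remote_extension: str) -> str:
    """Always join the mapped filename and the remote extension with a dot."""
    return f"{filename}.{remote_extension}"


def lookup_filename_guarded(filename: str, remote_extension: str) -> str:
    """Only append the extension when one was actually determined."""
    if not remote_extension:
        return filename
    return f"{filename}.{remote_extension}"


# Hypothetical set of filenames already stored in the volume.
existing_assets = {"my-image.jpg"}

# The remote extension could not be determined, so it is empty.
naive = lookup_filename_naive("my-image.jpg", "")
guarded = lookup_filename_guarded("my-image.jpg", "")

print(naive, naive in existing_assets)      # my-image.jpg. False
print(guarded, guarded in existing_assets)  # my-image.jpg True
```

With the naive join, the trailing dot means the lookup never matches the stored asset; with the guard, a mapped filename that already carries its extension is used unchanged.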

The Unable to check if <filename> exists. error comes from a flysystem package, not Feed Me, from the fileExists() method, which is supposed to check if a given file exists.

  • Could you let me know what filesystem you’re using?
  • If it’s not a local filesystem, could you switch to local and see if that changes things?
  • Finally, it would be great to know what Feed Me logs say about this. Could you let me know what you see when you click “Show details” for the node that triggered this error?

@i-just i-just self-assigned this Sep 4, 2024
@angrybrad
Member

Resolved in #1506 and will be included in the next v5 and v6 releases.

@jamesmacwhite
Contributor Author

Sorry I never replied to this! My apologies.

For info and future reference, if it helps, this was using an AWS filesystem, not local. The issue stems from the fact that the generated SharePoint image path provides no usable filename, so I have to craft one from the available image metadata (i.e. the original filename) and point Feed Me to it; otherwise it just can't handle it on its own. I personally think the generated file value from the Microsoft Graph API is horrible, so being able to supply a filename is absolutely needed here, which we do by manipulating the data.

These are the errors being reported, although they don't block the image field from being populated.

image

@jamesmacwhite
Contributor Author

We also trigger a "cannot find image" error, but the image does still get uploaded.

image

@jamesmacwhite
Contributor Author

jamesmacwhite commented Nov 5, 2024

@i-just I still think there are further areas around this that could be investigated. We still see errors like the above when feed imports occur, specifically for these images with horrible URLs. It also triggered some undesired behaviour related to Amazon CloudFront: Craft was constantly sending invalidations for these images because Feed Me thinks they are new or can't match them. Invalidations can be expensive, so this caused a bit of a spike in our AWS billing. The default cache max-age value wasn't set to a high value (which is completely on me), but Feed Me not being able to determine that these images exist as assets was also a contributing factor.

Because the URL provides no obvious extension, and MIME type sniffing isn't going to yield anything because there's nothing to go on, we have to pass a file extension in the filename. I suspect this might be the issue with the matching, but it's a catch-22: without doing this, we can't even get Feed Me to import the asset URL.

Unfortunately, we can't change the asset URL directly; we can only influence the result with a nicer filename through the "Use this filename" option. I think this is part of the problem, but we also rely on it for the assets to be created to begin with. A fun loop!
