I'm trying to recognize Munchkin cards from the card game. I've been trying a variety of image recognition APIs (Google Vision API, vize.ai, Azure's Computer Vision API, and more), but none of them seem to work well.
They can recognize a card when only one appears in the demo image, but when two cards appear together they fail to identify one or the other.
I've trained the APIs with a set of about 40 different images per card, with different angles, backgrounds, and lighting.
I've also tried OCR (via the Google Vision API), which works only for some cards, probably due to small lettering and a lack of detail on some of them.
Does anyone know of a way I can teach one of these APIs (or another) to read these cards better, or perhaps recognize cards in a different way?
The desired outcome is a user capturing an image while playing the game and the application understanding which cards are in front of them and returning the results.
Thank you.
What a coincidence! I've recently done something very similar – link to video – with great success! Specifically, I was trying to recognise and track Chinese-language Munchkin cards to replace them with English ones. I used iOS's ARKit 2 (requires an iPhone 6S or higher; or a relatively new iPad; and isn't supported on desktop).
I basically just followed the Augmented Reality Photo Frame demo 41 minutes into WWDC 2018's What's New in ARKit 2 presentation. My code below is a minor adaptation of theirs (merely swapping the overlaid video for a static image). The tedious part was scanning all the cards in both languages, cropping them out, and adding them as AR resources...
Here's my source code, ViewController.swift:
import UIKit
import SceneKit
import ARKit

class ViewController: UIViewController, ARSCNViewDelegate {

    @IBOutlet var sceneView: ARSCNView!

    override func viewDidLoad() {
        super.viewDidLoad()

        // Set the view's delegate
        sceneView.delegate = self

        // Show statistics such as fps and timing information
        sceneView.showsStatistics = true

        sceneView.scene = SCNScene()
    }

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)

        // Create a configuration
        let configuration = ARImageTrackingConfiguration()

        // Load the card scans from the AR resource group
        guard let trackingImages = ARReferenceImage.referenceImages(inGroupNamed: "card_scans", bundle: Bundle.main) else {
            print("Could not load images")
            return
        }

        // Set up the configuration
        configuration.trackingImages = trackingImages
        configuration.maximumNumberOfTrackedImages = 16

        // Run the view's session
        sceneView.session.run(configuration)
    }

    override func viewWillDisappear(_ animated: Bool) {
        super.viewWillDisappear(animated)

        // Pause the view's session
        sceneView.session.pause()
    }

    // MARK: - ARSCNViewDelegate

    // Create and configure nodes for anchors added to the view's session.
    public func renderer(_ renderer: SCNSceneRenderer, nodeFor anchor: ARAnchor) -> SCNNode? {
        let node = SCNNode()
        if let imageAnchor = anchor as? ARImageAnchor {
            // Create a plane matching the physical size of the detected card
            let plane = SCNPlane(width: imageAnchor.referenceImage.physicalSize.width,
                                 height: imageAnchor.referenceImage.physicalSize.height)

            print("Asset identified as: \(anchor.name ?? "nil")")

            // Set a UIImage as the plane's texture
            plane.firstMaterial?.diffuse.contents = UIImage(named: "replacementImage.png")

            let planeNode = SCNNode(geometry: plane)

            // Rotate the plane to match the anchor
            planeNode.eulerAngles.x = -.pi / 2
            node.addChildNode(planeNode)
        }
        return node
    }

    func session(_ session: ARSession, didFailWithError error: Error) {
        // Present an error message to the user
    }

    func sessionWasInterrupted(_ session: ARSession) {
        // Inform the user that the session has been interrupted, for example, by presenting an overlay
    }

    func sessionInterruptionEnded(_ session: ARSession) {
        // Reset tracking and/or remove existing anchors if consistent tracking is required
    }
}
Unfortunately, I hit a limitation: card recognition becomes rife with false positives as you add more cards as AR targets to distinguish between (to clarify: not the number of targets simultaneously on screen, but the size of the library of potential targets). While a 9-target library performed with a 100% success rate, this didn't scale to a 68-target library (which is all the Munchkin treasure cards): the app tended to flit between 1-3 potential guesses when faced with each target. Seeing the poor performance, I didn't put in the effort to add all 168 Munchkin cards in the end.
I used Chinese cards as the targets, which are all monochrome; I believe it could have performed better if I'd used the English cards as targets (as they are full-colour, and thus have richer histograms), but on my initial inspection of a 9-card set in each language, I was receiving as many warnings for the AR resources being hard to distinguish for English as I was for Chinese. So I don't think the performance would improve so far as to scale reliably to the full 168-card set.
Unity's Vuforia would be another option for approaching this, but it again has a hard limit of 50-100 targets. With an (eye-wateringly expensive) commercial licence, you can delegate target recognition to cloud servers, which could make this approach viable.
Thanks for investigating the OCR and ML approaches – they would've been my next ports of call. If you find any other promising approaches, please do leave a message here!
You are going in the wrong direction. As I understand it, you have an image, and inside that image there are several Munchkin cards (two in your example). This is not just a recognition problem: card detection is needed as well. So the task should be divided into a card detection task and a card text recognition task.
For each task you can use the following approach:
1. Card detection
Simple color segmentation (or, if you have enough time and patience, train an SSD to detect cards).
2. Card text recognition
Use Tesseract with an English dictionary (you could add a card-rotation step to improve accuracy); see the sketch below.
Hope that helps.
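To illustrate this two-stage pipeline in the iOS context of the thread, here is a minimal Swift sketch using Apple's Vision framework as a stand-in for the SSD/Tesseract combination (VNDetectRectanglesRequest for detection, VNRecognizeTextRequest for OCR; the thresholds, the "first line is the title" heuristic, and the iOS 15-style typed results properties are assumptions):

import UIKit
import Vision

// Two-stage sketch: find card-like rectangles, then run OCR inside each one.
// Vision stands in here for the SSD / Tesseract pipeline described above.
func recognizeCards(in image: CGImage) -> [String] {
    var titles: [String] = []

    // Stage 1: detect card-shaped rectangles in the full frame
    let rectRequest = VNDetectRectanglesRequest()
    rectRequest.maximumObservations = 8   // expect only a handful of cards
    rectRequest.minimumConfidence = 0.6
    try? VNImageRequestHandler(cgImage: image).perform([rectRequest])

    for card in rectRequest.results ?? [] {
        // boundingBox is normalized with its origin at the bottom-left
        let box = card.boundingBox
        let crop = CGRect(x: box.minX * CGFloat(image.width),
                          y: (1 - box.maxY) * CGFloat(image.height),
                          width: box.width * CGFloat(image.width),
                          height: box.height * CGFloat(image.height))
        guard let cardImage = image.cropping(to: crop) else { continue }

        // Stage 2: OCR the cropped card (Vision performs this synchronously)
        let textRequest = VNRecognizeTextRequest()
        textRequest.recognitionLevel = .accurate
        try? VNImageRequestHandler(cgImage: cardImage).perform([textRequest])

        // Treat the first recognized line as the candidate card title
        if let title = textRequest.results?.first?.topCandidates(1).first?.string {
            titles.append(title)
        }
    }
    return titles
}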
You can try this: https://learn.microsoft.com/en-us/azure/cognitive-services/computer-vision/quickstarts/csharp#OCR. It will detect the text, and you can then apply your own custom logic (based on the detected text) to handle actions.
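As a minimal illustration, calling that OCR endpoint from Swift might look like the sketch below (the v3.2 REST path, the placeholder resource name and key, and the regions > lines > words response shape are assumptions to verify against the linked quickstart):

import Foundation

// Sketch: POST an image URL to the Azure Computer Vision OCR endpoint and
// flatten the nested regions > lines > words response into one string.
func azureOCR(imageURL: String, key: String, completion: @escaping (String?) -> Void) {
    let endpoint = "https://YOUR-RESOURCE.cognitiveservices.azure.com/vision/v3.2/ocr"
    var request = URLRequest(url: URL(string: endpoint)!)
    request.httpMethod = "POST"
    request.setValue(key, forHTTPHeaderField: "Ocp-Apim-Subscription-Key")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try? JSONSerialization.data(withJSONObject: ["url": imageURL])

    URLSession.shared.dataTask(with: request) { data, _, _ in
        guard let data = data,
              let json = try? JSONSerialization.jsonObject(with: data) as? [String: Any],
              let regions = json["regions"] as? [[String: Any]] else {
            completion(nil)
            return
        }
        let text = regions
            .flatMap { $0["lines"] as? [[String: Any]] ?? [] }
            .flatMap { $0["words"] as? [[String: Any]] ?? [] }
            .compactMap { $0["text"] as? String }
            .joined(separator: " ")
        completion(text)
    }.resume()
}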
I get data from the accelerometer (CMMotionManager) and a workout session (HKWorkoutSession) and transfer it to the phone in real time, but at a random moment the watch falls asleep.
In the Info.plist I use WKBackgroundModes: workout-processing. The strap is tightened firmly; at first I thought the watch was losing contact and that was the cause. When I wrote the same functions earlier using WatchKit there was no such problem, but now with SwiftUI there is.
do {
    let workoutConfiguration = HKWorkoutConfiguration()
    workoutConfiguration.activityType = .mindAndBody
    workoutConfiguration.locationType = .unknown

    self.session = try HKWorkoutSession(healthStore: self.healthStore, configuration: workoutConfiguration)
    self.builder = self.session?.associatedWorkoutBuilder()
    self.builder?.dataSource = HKLiveWorkoutDataSource(healthStore: self.healthStore, workoutConfiguration: workoutConfiguration)

    self.session?.delegate = self
    self.builder?.delegate = self

    // Timer for updating state
    self.timerHealth = Timer.scheduledTimer(timeInterval: 1, target: self, selector: #selector(self.getHealth), userInfo: nil, repeats: true)

    self.session?.startActivity(with: self.startDate)
    self.builder?.beginCollection(withStart: self.startDate) { (success, error) in
        guard success else {
            print(error?.localizedDescription ?? "unknown error")
            return
        }
    }
} catch {
    print(error.localizedDescription)
    return
}
The timer prints the current time; at a random moment the output stops and is restored only after the screen is turned on.
Apple's documentation says that if workout processing is enabled, the application will continue to run in the background, but it does not. How do I set up background work? What did I miss?
Your app can get suspended when running in the background if it uses a lot of CPU. See https://developer.apple.com/documentation/healthkit/workouts_and_activity_rings/running_workout_sessions
To maintain high performance on Apple Watch, you must limit the amount of work your app performs in the background. If your app uses an excessive amount of CPU while in the background, watchOS may suspend it. Use Xcode's CPU report tool or the time profiler in Instruments to test your app's CPU usage. The system also generates a log with a backtrace whenever it terminates your app.
It would be good to check whether your SwiftUI app is doing more work than your WatchKit-based one, causing the suspension. You should also see a log file saved on the watch that could confirm this; it'll look like a crash log but should note that CPU time was exceeded.
I am writing a HomeKit app that successfully shows live data from my supported accessories in-app. I can read single values (HMCharacteristic.readValue) or use notifications to stay updated (HMCharacteristic.enableNotification).
Now I want to implement Widgets that show this data on the user's Home Screen. This consists of four steps:
A dynamic Intent fetches all the registered (and supported) Accessories from the HMHomeManager and enables the user to select one of them to be shown on the Widget.
Inside the IntentTimelineProvider's getTimeline function I can then again use the HMHomeManager to retrieve the Accessory I want to display on the Widget (based on the Accessory's UUID which is stored inside the getTimeline's configuration parameter - the Intent).
Still inside the getTimeline function I can choose the Services and Characteristics I need for displaying the Accessory's Widget from the HMHomeManager.
Up until here everything works fine.
However, when I try to read the values from the Characteristics I chose before using HMCharacteristic.readValue, the callback contains an error stating
Error Domain=HMErrorDomain Code=80 "Missing entitlement for API."
The Widget's Info.plist contains the 'Privacy - HomeKit Usage Description' field and the Target has the HomeKit capability.
After some research I came up with the following theory: obviously the whole WidgetKit API runs my code in the background, and it seems HomeKit does not allow full access from a background context. It does allow access to Homes/Services/Characteristics, but it does not allow reading or writing Characteristics (I guess to make sure app developers use HomeKit Automations and don't try to implement custom automations controlled by some background process of their app running on the iPhone).
My (simplified) getTimeline code:
func getTimeline(for configuration: SelectAccessoryIntent, in context: Context, completion: @escaping (Timeline<Entry>) -> ()) {
    // id stores the uuid of the accessory that was chosen by the user using the dynamic Intent
    if let id = configuration.accessory?.identifier {
        // Step 2: fetch the accessory
        // hm is a HMHomeManager
        let hm = HomeStore.shared.homeManager

        // Take a short nap until the connection to the local HomeKit instance is established
        // (otherwise hm.homes will return an empty array on first call)
        sleep(1)

        let accessories = hm.homes.flatMap({ h in h.accessories })
        if let a = accessories.filter({ a in a.uniqueIdentifier.uuidString == id }).first {
            // a holds our HMAccessory

            // Step 3: select the characteristic I want
            // (obviously the real code chooses a specific characteristic)
            let s: HMService = a.services.first!
            let c: HMCharacteristic = s.characteristics.first!

            // Step 4: read the characteristic's value
            c.readValue(completionHandler: { err in
                if let error = err {
                    print(error)
                } else {
                    print(c.value ?? "nil")
                }
                // Complete with the timeline
                completion(Timeline(entries: [RenderAccessoryEntry(date: Date(), configuration: configuration, value: c.value)], policy: .atEnd))
            })
        }
    }
}
My questions:
First: Is my theory correct?
If so: What can I do? Are there any entitlements that allow me to access HomeKit in the background, or similar? Do I need to perform the readValue call elsewhere? Or is it simply impossible to use the HomeKit API with WidgetKit on the current versions of HomeKit/WidgetKit/iOS, and the best I can do is hope they introduce this capability at some point in the future?
If not: What am I missing?
I have a video of three people speaking, and I would like to annotate the location of their eyes throughout it. I know that the Google Video Intelligence API has functionality for object tracking, but is it possible to handle such an eye-tracking process using the API?
The Google Video Intelligence API provides a face detection feature that lets you detect faces within video frames, as well as specific face attributes.
In general, you need to configure FaceDetectionConfig within the videos.annotate method, supplying the includeBoundingBoxes and includeAttributes fields in the JSON request body:
{
  "inputUri": "string",
  "inputContent": "string",
  "features": [
    "FACE_DETECTION"
  ],
  "videoContext": {
    "segments": [
      "object (VideoSegment)"
    ],
    "faceDetectionConfig": {
      "model": "string",
      "includeBoundingBoxes": true,
      "includeAttributes": true
    }
  },
  "outputUri": "string",
  "locationId": "string"
}
There is a detailed (Python) example from Google on how to track objects and print out the detected objects afterwards. You could combine this with the AIStreamer live object tracking feature, which lets you upload a live video stream and get results back.
Some ideas/steps you could follow:
Recognize the eyes in the first frame of the video.
Set/highlight a box around the eyes you are tracking.
Track the eyes as an object in the subsequent frames (a sketch of the first step follows below).
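As a rough sketch of the first step on a single frame, here's how the eyes could be located with Apple's Vision framework (an on-device stand-in for illustration; the Video Intelligence request above returns comparable per-frame face landmarks, and the typed results properties assume iOS 15):

import Vision
import CoreGraphics

// Detect faces in one frame and return the eye landmark points per face,
// converted from Vision's normalized coordinates into pixel coordinates.
func detectEyes(in frame: CGImage) -> [(left: [CGPoint], right: [CGPoint])] {
    let request = VNDetectFaceLandmarksRequest()
    try? VNImageRequestHandler(cgImage: frame).perform([request])

    let size = CGSize(width: frame.width, height: frame.height)
    return (request.results ?? []).compactMap { face in
        guard let left = face.landmarks?.leftEye,
              let right = face.landmarks?.rightEye else { return nil }
        return (left: left.pointsInImage(imageSize: size),
                right: right.pointsInImage(imageSize: size))
    }
}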
I am very new to Swift and programming. I'm trying to create a pattern of haptic feedback triggered by a UILongPressGestureRecognizer. When the user "long presses" the screen, I want the phone to vibrate three times with a 1 second delay between each vibration. I tried using "sleep" to accomplish the 1 second delays, but this didn't work. What is the best way to do this correctly?
var feedbackGenerator: UIImpactFeedbackGenerator? = nil

func performFeedbackPattern() {
    // Create the feedback generator
    feedbackGenerator = UIImpactFeedbackGenerator(style: .heavy)
    feedbackGenerator?.prepare()

    // Play the feedback three times with 1 second between each feedback
    feedbackGenerator?.impactOccurred()
    sleep(1)
    feedbackGenerator?.impactOccurred()
    sleep(1)
    feedbackGenerator?.impactOccurred()
}

@IBAction func gestureRecognizer(_ sender: UILongPressGestureRecognizer) {
    switch sender.state {
    case .began:
        performFeedbackPattern()
    default: break
    }
}
I recently did something similar and came up with a small pod you can take a look at.
Here is the link: https://github.com/iSapozhnik/Haptico
The idea is to build an OperationQueue with a bunch of Operations. One operation would be your haptic feedback, and another one a pause operation.
You can create an OperationQueue and add operations with haptic feedback. The feedback operation would look like this:
class HapticFeedbackOperation: Operation {
    override func main() {
        // Play the haptic feedback
        UIImpactFeedbackGenerator(style: .heavy).impactOccurred()
    }
}
You might want to add a delay between the operations, as in the sketch below.
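For example, the pause could be its own Operation on a serial queue (a sketch: PauseOperation is a hypothetical helper, and the feedback call hops back to the main thread, where UIKit feedback generators are meant to be used):

import UIKit

// A pause operation to insert between the haptic feedback operations
class PauseOperation: Operation {
    let duration: TimeInterval

    init(duration: TimeInterval) {
        self.duration = duration
        super.init()
    }

    override func main() {
        // Block this (background) operation thread to create the gap
        Thread.sleep(forTimeInterval: duration)
    }
}

// Serial queue: feedback, pause, feedback, pause, feedback
let queue = OperationQueue()
queue.maxConcurrentOperationCount = 1
for i in 0..<3 {
    queue.addOperation {
        // Trigger the haptic on the main thread
        DispatchQueue.main.sync {
            UIImpactFeedbackGenerator(style: .heavy).impactOccurred()
        }
    }
    if i < 2 { queue.addOperation(PauseOperation(duration: 1)) }
}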
Check out my open-source framework Haptica; it supports haptic feedback, AudioServices, and unique vibration patterns. It works with Swift 4.2 and Xcode 10.
I'm having trouble displaying a LiveCard.
public class RollTheDiceActivity extends Activity {

    private LiveCard mLiveCard;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_roll_the_dice);
        //                      ^^^^^^^^^^^^^^^^^^^^^^
        publishCard(this);
    }

    private void publishCard(Context context) {
        // Already published
        if (mLiveCard != null)
            return;

        String cardId = "my_card";
        TimelineManager tm = TimelineManager.from(context);
        mLiveCard = tm.getLiveCard(cardId);

        RemoteViews mRemoteViews = new RemoteViews(context.getPackageName(),
                R.layout.livecard_roll_the_dice);
        //               ^^^^^^^^^^^^^^^^^^^^^^^
        mLiveCard.setViews(mRemoteViews);

        Intent intent = new Intent(context, RollTheDiceActivity.class);
        mLiveCard.setAction(PendingIntent.getActivity(context, 0, intent, 0));
        mLiveCard.publish();
    }
}
I expected to see the contents of livecard_roll_the_dice instead of activity_roll_the_dice, since publishing should be instant and take over the screen.
Instead, the activity_roll_the_dice content is showing. I think this means that mLiveCard is either never published, or published but not pushed to the screen.
How do I show the contents of a published card on the screen?
In case it helps, I'm launching the app through a voice trigger from the home screen: "Ok Google, roll the dice"
Thank you!
Live cards are published in the background unless you pass PublishMode.REVEAL to the publish method to force it to be displayed. However, a larger problem is that live cards should be published by a background service rather than an activity. Live cards need to be owned by a long-running context in order to stay alive in the timeline while the user is navigating elsewhere in the timeline or in immersions.
So, if you want an activity to also publish a live card, you should put the live card code in a service and have the activity make a call to that service (e.g., using a binder) to publish the card.
Is there a reason you're using an activity at all and setting its content view when you expect the live card to be displayed immediately? If that's the behavior you want, you might consider removing the activity entirely and using a service instead, with the voice trigger attached to the service. The compass sample provides an example of this.
Calvin, the live card's "life" should be tied to something more "persistent", as the previous poster points out. If you look at my example code, it always uses a background service to control the life of the live card. Your activity will come and go (paused/resumed) whereas the live card is "pinned" and it's always there until you explicitly "unpublish" it.
One other thing I found that might save someone a bit of time with this problem!
When using RemoteViews for "low frequency updates" from a service to manage a LiveCard (rather than a DirectRenderingCallback for "high frequency" updates), make sure you DON'T call setDirectRenderingEnabled(true) on the LiveCard.
This will stop the RemoteViews from working at all! Removing setDirectRenderingEnabled from the LiveCard when using RemoteViews to manage its view resource fixed the live card not appearing for me.